C++ PROGRAMMING :

HybridList : Fast N lg (N) sort algorithm for lists

Copyright (c) Dan Kozub, 1997-1999

Management of large dictionaries is always a big problem. That is why I decided to create  efficient and simple for user algorithm to manage very long sorted lists.  

HybridList algorithm resembles skip-lists but it differs considerably in basic principles and memory usage. I hate the idea of  allocating  memory blocks for array of pointers and predict number of nodes for non-existing list.

You can also download Java version of the algorithm from JAVIR homepage. Java applet works just 1.5-2 times slower than C++ program.

Concept

There are two types of nodes: user bottom-nodes (blue circles) and top-nodes (green rectangles) that  create a kind of hierarchy. When user adds/removes bottom-node to/from the list, 'parent' top-node on the first level is checked to keep 10-30 bottom nodes. If node has more than 30 nodes, then new node is created and both nodes have 15 subnodes. If node has less than 10 subnodes, its right neighbor is removed. Therefore hierarchy is always quite balanced.

QList_scheme.gif (6718 bytes)

Main features

Simple example

#inlude "HybridList.h"

class SampleNode : public HListNode <SampleNode> { //defines Prev,Next members
    public:
int  Value;        //random value to sort


        SampleNode(){ Value = rand(); }


bool operator ==(SampleNode& comp){return Value ==comp.Value;}
bool operator > (SampleNode& comp){return Value >comp.Value;}
bool operator < (SampleNode& comp){return Value <comp.Value;}
};

bool TEST(){
HybridList<SampleNode>  List;
for(int i=0;i<1000;i++) List.InsertSorted(new SampleNode);


SampleNode to_find;
SampleNode * first_equal = List.FindEqual(to_find);
while(true)
{
SampleNode * head = List.RemoveHead();
if (!head) break; //end of list achieved
printf("%ld->",head->Value);
delete head;
}
return true;}

Speed

There is no sense to compare O(lg n) algorithm with O(n) algorithm. But even  O(lg n) algorithms, like binary trees, red-black trees, skip-lists have different speed due to balancing techniques, memory usage, comparison function, etc. My preliminary tests of red-black tree, that seems to be the fastest algorithm for dictionaries, shows that HybridList is more than twice as fast. But it seems to me to be not so important with regard to memory usage issue: HybridList node require only additional 4 byte for 'Next' pointer and red-black node require 12-16 bytes.

The following diagram shows time in microseconds for one operation (insert and find). Test program was written in C++ (MS Visual C++ 6.0) for Windows 95/NT. Hardware: Pentium 200, 64M RAM, Win98. Delete takes roughly the same time as insert.
3.000.000 nodes  were allocated as a big array of unlinked nodes and then added to the sorted list. Each item had 32-bit random value for comparison. Time of operation was calculated as average of adding last 100.000 nodes.

HList_search_time.gif (14308 bytes)

HybridList algorithm can be used as general purpose, stable, insertion sort.
The next diagram show  time in seconds to sort a set of random 32-bit integers.
As you can see standard Quicksort is about 3 times as fast as HybridList sort.
That means that HybridList sort is not efficient for sorting static arrays. It can be used when user data is already linked in a list, or when key is quite big (>10 bytes).

HList_sort_time.gif (11355 bytes)

Source

The algorithm is implemented in C++ . In principle it does not require any additional libraries or system support. But I think some insignificant changes will be needed to make it run on other platforms than Windows 95/NT.

Source code - 6K

Example: HybridList_Test.exe (106K) - Win32 Consol application to test speed of insertion and find operations


HOME