Problem:
You have an array of a billion numeric keys (e.g. 32-bit unsigned integers) and need the largest 11 of them.
Solutions:
- You could sort this array of keys using a comparison-based sorting algorithm (e.g. C++ std::sort, C# Array.Sort) and then read the top 11 array elements. This takes O(nlgn) time.
- You could sort using a non-comparison-based sorting algorithm (see Resources section below) and then read the top 11 array elements. This takes O(n) time.
- You could use the Top-K Radix Selection algorithm described in this blog, and then read the top 11 array elements. This takes O(n) time, and is many times faster than the other methods above, since it performs less work.
The first two solutions do more work than is necessary, as they sort all of the array elements, including the top 11 elements. This was not required by the problem statement. The whole array was not required to be sorted. The top 11 elements were also not required to be sorted.
The last solution does not sort the entire array and also does not provide the top 11 elements in sorted order. If the top 11 array elements need to be in sorted order, it’s much quicker to sort these 11 elements than to sort the entire billion-element array. The last solution is many times faster, is in-place, and is presented in this blog.
Performance
The following Table shows performance of LSD Radix Sort with each optimization when sorting and array of 10 million unsigned 32-bit integers on a single core of Intel Core Ultra 9 275HX laptop computer. The units are millions of unsigned 32-bit integers per second.
| Algorithm | Random | Presorted | Constant |
| Array::Sort | 20 | 292 | 255 |
| LSD Radix Sort | 129 | 120 | 145 |
| Top-K Radix Selection | 607 | 560 | 570 |
Top-K Radix Selection was setup to select randomly between 1 and 100 of the largest (top) elements within the array. Top-K Radix Selection is between 3-4X faster than LSD Radix Sort, and over 30X faster than Array::Sort in the worst case. It is always faster than either algorithm for the three data distributions tested.
Top-K Radix Selection is faster for two reasons:
- it performs less work by not having to process all of the bins
- memory accesses are always sequential for reading and writing of the array
Example Implementation
Top-K Radix Selection algorithm implementation is available in C# in the HPCsharp repository: https://github.com/DragonSpit/HPCsharp/blob/master/HPCsharp/SelectionTopKRadix.cs
C++ implementation is under development and will be part of the https://github.com/DragonSpit/ParallelAlgorithms repo.
How it works
To achieve in-place linear time performance, the following steps are performed:
- Use 16-bit digits, which then requires two passes for 32-bit keys.
- Count array elements for each of the 64K bins (2^16) based on the most significant digit (16-bits) of each array element. Turn the counts into start indexes for each bin (suffix sum).
- Determine which bin the k-th element is in. This is simple since we know that k-th element is at index array[k-1] and we know start indexes of each bin. End index of each bin is start index of the next bin minus one.
- Move all elements that belong to the k-th bin (in which k-th element is) to the k-th bin.
- Move all elements in the k-th bin which should be to the right of k-th bin out of k-th bin, to the right of k-th bin.
- Move all element which are to the left of the k-th bin and belong to the k-th bin into the k-th bin. And, elements which belong to the right of the k-th bin to the right of the k-th bin.
- Recurse into the k-th bin if it has more than 1 element and use the next most significant digit (next 16-bits). Continue to the recurse until all bits of each key have been processed.
The above steps are similar to MSD Radix Sort and Radix Selection algorithms.
Resources
LSD Radix Sort (non-comparison linear-time sort):
- C++ ParallelAlgorithms RadixSortLSDPowerOf2Radix_unsigned_TwoPhase_DeRandomize
- C# HPCsharp SortRadixLsd