Top-K Radix Selection Algorithm

Problem:

You have an array of a billion numeric keys (e.g. 32-bit unsigned integers) and need the largest 11 of them.

Solutions:

  • You could sort this array of keys using a comparison-based sorting algorithm (e.g. C++ std::sort, C# Array.Sort) and then read the top 11 array elements. This takes O(nlgn) time.
  • You could sort using a non-comparison-based sorting algorithm (see Resources section below) and then read the top 11 array elements. This takes O(n) time.
  • You could use the Top-K Radix Selection algorithm described in this blog, and then read the top 11 array elements. This takes O(n) time, and is many times faster than the other methods above, since it performs less work.

The first two solutions do more work than is necessary, as they sort all of the array elements, including the top 11 elements. This was not required by the problem statement. The whole array was not required to be sorted. The top 11 elements were also not required to be sorted.

The last solution does not sort the array and does not provide the top 11 elements in sorted order. If the top 11 array elements need to be in sorted order, it’s much quicker to sort these 11 elements than to sort the entire billion-element array. The last solution is many times faster and is presented in this blog.

Example Implementation

K-top Radix Selection algorithm implementation is available in C# in the HPCsharp repository: https://github.com/DragonSpit/HPCsharp/blob/master/HPCsharp/SelectionKtopRadix.cs

C++ implementation is under development and will be part of the https://github.com/DragonSpit/ParallelAlgorithms repo.

How it works

Coming soon…

Resources

LSD Radix Sort (non-comparison linear-time sort):

  • C++ ParallelAlgorithms RadixSortLSDPowerOf2Radix_unsigned_TwoPhase_DeRandomize
  • C# HPCsharp SortRadixLsd

Leave a comment