Selection is an interesting algorithm related to sorting. k-th value Selection, provides the k-th largest value within an unsorted array of values without sorting the array. It does this in linear time, while using comparisons, which is faster than comparison sorting algorithms are capable of.
Comparison-based Selection algorithm was developed over 50 years ago. Non-comparison (Radix) Selection algorithms are more recent (within the last decade). Both methods have linear time performance and are in-place. It’s amazing to see the ingenuity of multiple ways to solve a problem.
In two previous blogs I described two Selection algorithms which are implemented using non-comparison (Radix) technique instead of the comparison-based Selection. These are Radix Selection and top-k Radix Selection.
The first one returns the k-th largest value within an unsorted array, while the second one return the top-k largest values within an unsorted array. For example, given an array of 200 student grades, we would like the 4-th highest grade – this is k-th value Selection, or just Selection.
Sometimes we want the top 12 grades, which is what top-k Selection provides, without having to sort the 200 student grades. It also provides these top 12 grades in any order – i.e. these top 12 grades are not sorted.
k[]-th Selection
k[]-th Selection is another variation of Selection, where an array of values (k[] array) is given to the Selection algorithm along with the input array. For example, an array of 200 grades are provides to the Selection algorithm and we want it to provide the 50-th, 100-th, and 150-th highest grades. We could have accomplished the same result by calling k-th Selection three times, but with k[]-th Selection it takes a single function call.
k[]-th Selection, when implemented using comparisons, has an interesting property – the array elements in between k[]-th elements have values that are in between. For example, after calling Selection, the 200 grades will be rearranged from left (lowest grades) to right (highest grades), such that 49 rightmost values will be higher grades than the 50-th we asked for. Then the 49 leftmost values will the the lower grades than the 150-th we asked for. Also, grades in between 50-th and 100-th will be in that quartile, and grades between 100-th and 150-th will be within that quartile.
In other words, the 200 grades array was partitioned into four quartiles of 50 grades each. Within each quartile (of 50 grades) the grades are not in a sorted order however. What we now have is an array of 200 grades which has been split into the top 25% of grades, the bottom 25% of grades, and 25-to-50% range of grades, and 50-75% range of grades – i.e. four quartiles of grades.
This partitioning property is not always needed, as sometimes we just want the 50-th, 100-th and 150-th largest values, but is a useful property in some use cases. Python numpy.partition implements a similar method to k[]-th Selection, but instead let’s you pick the elements within the input array to partition on.
k[]-th Radix Selection
Like k-th value Selection and top-k Selection, k[]-th Selection can be implemented using Radix method instead of using comparisons (see algorithm implementation). k[]-th Radix Selection for arrays of 32-bit unsigned integers is the new additional algorithm implemented in the HPCsharp C# nuget package – version 3.19.1 and higher (C++ version is coming shortly).
This implementation is single-core (so far) and is about 10X faster than using Array.Sort, and about 4X faster than using the in-place MSD Radix Sort which Selection is based on.
The k[]-th Radix Selection is in-place, and places the k[] values of the array in their respective indexes.
The partitioning property of the k[]-th Radix Selection is under investigation.
Other data types can also be supported: signed integers of standard size, floating-point, user defined data types with numeric keys within them. HPCsharp C# nuget package supports Radix sorting for these data types, and it is possible to add support for these for Selection.
How It Works
k[]-th Radix Selection algorithm starts with the In-Place MSD Radix Sort. After permuting the input array based on the most significant digit, instead of recursing into every non-empty bin, only the bins which contain one or more of the k[] array elements is recursed into. This method continues for the rest of the less significant digits, recursively processing only the bins that contain one or more of the k[] array elements.