Radix Partition

Like the Selection algorithm, described in several previous blogs, Partition is another algorithm closely related to sorting. Given a single value or an array of values, the Partition algorithm splits the array into sections with useful statistical properties, without sorting it, in linear time – i.e. O(n).

Single Value Partition

For example, C++ nth_element function rearranges the array in such as way, that the n-th array element has a value as if the array has been sorted. It also rearranges the array elements in such a way that array elements to the left of the n-th array index are all smaller than or equal to the values to the right of the n-th index.

In other words, the element at the n-th index partitions the array into smaller or equal values on the left and equal or larger values on the right. The contents of each of the two resulting partitions are not in sorted order.

Multiple Value Partition

Extending the Partition concept to more than one partition element, where one partition element leads to two partition regions (to the left and to the right or the partition element) is natural. Two partition elements results in three partition regions (left, middle and right), and three partition elements leads to four partition regions, and so on…

Each partition region has elements whose values are in between the two partition elements that are on the left and right sides of that region. However, the elements within a partition region are not in sorted order.

Distinct Array Value

When array values are distinct, Partition splits the array precisely.

Using Comparisons

Traditional method used for Partitioning is based on the QuickSelect algorithm, which uses comparisons. Partitioning is a biproduct of the selection process. Or, it could be that partitioning is a necessary step to achieve selection when using comparisons. It would be powerful to develop a proof of this.

Using Radix to Partition

Comparisons is not the only way to achieve partitioning in linear time. In-place MSD Radix Sort is linear-time O(N) and is a good starting point.

AlgorithmRandomPresortedConstant
nth_element3.60 seconds2.6 seconds0.70 seconds
In-Place MSD Radix Sort2.90 seconds2.5 seconds2.6 seconds

The above table is for processing an array of 100 Million 32-bit unsigned integers. The two algorithms are nearly the same performance. In-Place MSD Radix Sort lags for the particular case of all array elements being constant.

However, the In-Place MSD Radix Sort performs too much work by sorting all elements of the array. To create a partition algorithm out of it, after performing the first level of the In-Place MSD Radix Sort, which based on the most significant digit, recurse only into those bins where the partitioning indexes are.

AlgorithmRandomPresortedConstant
nth_element3.60 seconds2.6 seconds0.70 seconds
Radix Partition0.70 seconds0.53 seconds2.22 seconds

Radix Partition is significantly faster (5X) for random and presorted data distributions, but is slower for the array filled with constants. It is linear time also.

Partition More Ways

Another useful partition variant, which Python implements, is to partition the array based of several values. Radix Partition supports this variant also. The implementation links provided below support single or multiple way partition, extending nth_element method to multi-way partitioning.

Implementations

C++ implementation is available here. C# implementation is available here.

More Detailed Explanation

Coming soon…

Leave a comment