Radix Partition

Like the Selection algorithm, described in several previous blog posts, Partition is another algorithm closely related to sorting. Given a single partition element or an array of partition elements, the Partition algorithm splits the array into sections with useful statistical properties, without sorting it, in linear time, i.e. O(n). For example, Partition can split an array into quartiles, i.e. top 25%, mid-upper 25%, mid-lower 25%, and lowest 25%. This requires specifying 3 partition elements. Partition then moves the array elements into their proper quartiles, without fully sorting the array.

Single Value Partition

For example, the C++ nth_element function rearranges the array in such a way that the n-th array element has the value it would have if the array had been sorted. It also rearranges the array so that all elements to the left of the n-th index are smaller than or equal to the elements to the right of the n-th index.

In other words, the element at the n-th index partitions the array into smaller or equal values on the left and equal or larger values on the right. The contents of each of the two resulting partitions are not in sorted order.
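The behavior described above can be tried out with a small wrapper around std::nth_element. This is a minimal sketch: the helper names partition_at and is_partitioned_at are made up for illustration and are not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: rearranges `a` so that a[n] holds the value it would
// hold in the sorted array, with smaller-or-equal values before index n and
// larger-or-equal values after it. Returns the value at index n.
inline std::uint32_t partition_at(std::vector<std::uint32_t>& a, std::size_t n)
{
    assert(n < a.size());
    std::nth_element(a.begin(), a.begin() + static_cast<std::ptrdiff_t>(n), a.end());
    return a[n];
}

// Hypothetical helper: checks the partition property around index n.
inline bool is_partitioned_at(const std::vector<std::uint32_t>& a, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] > a[n]) return false;          // left side must be <= a[n]
    for (std::size_t i = n + 1; i < a.size(); ++i)
        if (a[i] < a[n]) return false;          // right side must be >= a[n]
    return true;
}
```

Calling partition_at(v, v.size() / 2) places the median at the middle index; the order of elements inside the left and right regions is unspecified.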

Multiple Value Partition

Extending the Partition concept to more than one partition element is natural. One partition element leads to two partition regions (to the left and to the right of the partition element). Two partition elements result in three partition regions (left, middle and right), three partition elements lead to four partition regions, and so on…

Each partition region has elements whose values are in between the two partition elements that are on the left and right sides of that region. However, the elements within a partition region are not in sorted order.
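One simple way to get several partition regions with standard tools is to call std::nth_element repeatedly, each time on the part of the array to the right of the previous partition index. The sketch below assumes the partition indexes are given in strictly increasing order; multi_partition is a made-up name, and this is an illustration of the concept rather than the Radix Partition method discussed later.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: after the call, every index k in `ks` holds its
// sorted-order value, and each region between neighbouring partition indexes
// holds only values that lie between the two partition elements.
// Assumes `ks` is strictly increasing and every k < a.size().
inline void multi_partition(std::vector<std::uint32_t>& a,
                            const std::vector<std::size_t>& ks)
{
    auto at = [&a](std::size_t i) {
        return a.begin() + static_cast<std::ptrdiff_t>(i);
    };
    std::size_t start = 0;                      // left edge of the unrefined part
    for (std::size_t k : ks) {
        assert(k >= start && k < a.size());
        // Everything left of `start` is already <= everything right of it,
        // so only the range [start, end) needs to be rearranged.
        std::nth_element(at(start), at(k), a.end());
        start = k + 1;
    }
}
```

For quartiles of an array of n elements, ks would be {n / 4, n / 2, 3 * n / 4}. Each call is linear on average in the size of the remaining range, so the total work grows with the number of partition elements.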

Distinct Array Value

When array values are distinct, Partition splits the array precisely.

Using Comparisons

The traditional method used for Partitioning is based on the QuickSelect algorithm, which uses comparisons. Partitioning is a byproduct of the selection process. Or, it could be that partitioning is a necessary step to achieve selection when using comparisons. It would be powerful to develop a proof of this.
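To make the byproduct visible, here is a textbook-style QuickSelect sketch using a Lomuto partition step with the middle element as pivot. It is an illustration only, not the implementation behind nth_element, and the name quickselect is chosen here for the example. This simple version runs in linear time on average but degrades to quadratic time on some inputs, such as an array of identical values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the k-th smallest value (0-based). As a side effect the array ends
// up partitioned around index k: a[i] <= a[k] for i < k, a[i] >= a[k] for i > k.
inline std::uint32_t quickselect(std::vector<std::uint32_t>& a, std::size_t k)
{
    assert(k < a.size());
    std::size_t lo = 0, hi = a.size() - 1;
    while (lo < hi) {
        // Pick the middle element as pivot and park it at the end.
        std::swap(a[lo + (hi - lo) / 2], a[hi]);
        const std::uint32_t pivot = a[hi];
        std::size_t store = lo;
        for (std::size_t i = lo; i < hi; ++i) {
            if (a[i] < pivot) {                 // comparison decides the side
                std::swap(a[i], a[store]);
                ++store;
            }
        }
        std::swap(a[store], a[hi]);             // pivot lands in its sorted position
        if (k == store) break;
        if (k < store) hi = store - 1;          // continue in the left region only
        else           lo = store + 1;          // continue in the right region only
    }
    return a[k];
}
```

Each pass partitions the current range around a pivot and then discards the side that cannot contain index k, which is why the array is partitioned around k once the selection finishes.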

Using Radix to Partition

Comparisons are not the only way to achieve partitioning in linear time. In-place MSD Radix Sort runs in linear time, O(n), and is a good starting point.

Algorithm	Random	Presorted	Constant
nth_element	3.60 seconds	2.6 seconds	0.70 seconds
In-Place MSD Radix Sort	2.90 seconds	2.5 seconds	2.6 seconds

The above table is for processing an array of 100 million 32-bit unsigned integers. The two algorithms perform nearly the same on random and presorted data. In-Place MSD Radix Sort lags in the particular case where all array elements hold the same constant value.

However, the In-Place MSD Radix Sort performs too much work by sorting all elements of the array. To create a partition algorithm out of it, perform the first level of the In-Place MSD Radix Sort, which is based on the most significant digit, and then recurse only into those bins that contain the partitioning indexes.

Algorithm	Random	Presorted	Constant
nth_element	3.60 seconds	2.6 seconds	0.70 seconds
Radix Partition	0.80 seconds	0.60 seconds	1.20 seconds

Radix Partition is significantly faster (about 4X to 4.5X) for random and presorted data distributions, but is slower for an array filled with a constant value. It also runs in linear time.

Partition More Ways

Another useful partition variant, which Python implements, is to partition the array based on several values. Radix Partition supports this variant also. The implementation links provided below support single-way or multi-way partition, extending the nth_element method to multi-way partitioning.

Implementations

A C++ implementation is available here. A C# implementation is available here. Currently, the implementation is for an array of unsigned 32-bit integer values. This can be extended to other numeric data types, as well as user-defined data types, as is done for the In-Place MSD Radix Sort in the HPCsharp library. It can also be extended to strings, as is done in the “Algorithms” book by Sedgewick and Wayne (fourth edition).

More Detailed Explanation

My current implementation of Radix Partition is based on the In-Place MSD Radix Sort. Radix Partition starts out just like In-Place MSD Radix Sort, sorting the input array in-place based on the most significant digit of each array element – e.g. partitioning an array of 32-bit unsigned integers.

Once the array has been sorted by the most significant digit, all array elements whose most significant digit (e.g. the leftmost 8 bits out of 32 bits) is zero are placed on the left side of the array, to the right of these are the array elements whose most significant digit is one, and so on. However, instead of processing all of the bins by the next most significant digit, only the bins that contain at least one partition element get processed further. All other bins are not processed further.
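The first pass can be sketched as follows, assuming 8-bit digits: count how many keys fall into each of the 256 bins, turn the counts into bin boundaries, and then swap elements into their bins in place. The function names msd_digit_pass and bin_containing are made up for this illustration, and the code is a simplified sketch rather than the linked implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sorts `a` in place by the 8-bit digit found at bit position `shift`
// (24 selects the most significant digit of a 32-bit key).
// Returns the bin boundaries: bin d occupies indexes [start[d], start[d + 1]).
inline std::array<std::size_t, 257> msd_digit_pass(std::vector<std::uint32_t>& a,
                                                   unsigned shift)
{
    std::array<std::size_t, 257> start{};               // zero-initialized
    for (std::uint32_t x : a)
        ++start[((x >> shift) & 0xFFu) + 1];             // histogram, offset by one
    for (std::size_t d = 1; d <= 256; ++d)
        start[d] += start[d - 1];                        // prefix sum gives boundaries

    std::array<std::size_t, 256> next{};                 // next free slot in each bin
    std::copy(start.begin(), start.begin() + 256, next.begin());
    for (std::size_t d = 0; d < 256; ++d) {
        while (next[d] < start[d + 1]) {
            const std::size_t digit = (a[next[d]] >> shift) & 0xFFu;
            if (digit == d) {
                ++next[d];                               // already in the right bin
            } else {
                std::swap(a[next[d]], a[next[digit]]);   // send it to its own bin
                ++next[digit];
            }
        }
    }
    return start;
}

// Returns the bin whose index range contains `index`.
inline std::size_t bin_containing(const std::array<std::size_t, 257>& start,
                                  std::size_t index)
{
    assert(index < start[256]);
    auto it = std::upper_bound(start.begin(), start.end(), index);
    return static_cast<std::size_t>(it - start.begin()) - 1;
}
```

With the boundaries in hand, bin_containing(start, k) identifies the single bin that needs further processing for partition index k; every other bin can be left as it is.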

For example, when a partition index of 25 is requested, only the bin which contains index 25 is processed further, using the next most significant digit.

The insight is that only the bins within which a partition boundary lands need further refinement, while other bins don’t.

This method continues until the bins being refined contain zero or one element, or until all digits of the array elements (keys) have been used.

The main concept is that the bins between partition boundaries are already placed in their proper location. The bins which contain the partition boundary (or boundaries) need further refinement, which is done by processing these bins using the next most significant digits. Once all digits of the array elements (keys) have been used, the refinement has gone as far as it can go, since the full bit pattern of a key describes that key as precisely as possible, i.e. there is no further detail.
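Putting the pieces together, the whole procedure can be sketched as a short recursive function. This is a simplified illustration under stated assumptions (32-bit unsigned keys, 8-bit digits, made-up function names radix_partition and radix_partition_range), not the linked C++ or C# implementation, and it is not tuned for performance.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Bins a[lo, hi) by the 8-bit digit at `shift`, then recurses only into bins
// that contain one of the partition indexes ks[kb, ke) (sorted ascending).
inline void radix_partition_range(std::vector<std::uint32_t>& a,
                                  std::size_t lo, std::size_t hi, int shift,
                                  const std::vector<std::size_t>& ks,
                                  std::size_t kb, std::size_t ke)
{
    if (hi - lo <= 1 || kb == ke) return;        // nothing left to refine

    std::array<std::size_t, 257> start{};        // bin d is [start[d], start[d + 1])
    for (std::size_t i = lo; i < hi; ++i)
        ++start[((a[i] >> shift) & 0xFFu) + 1];
    start[0] = lo;
    for (std::size_t d = 1; d <= 256; ++d)
        start[d] += start[d - 1];

    std::array<std::size_t, 256> next{};         // next free slot in each bin
    std::copy(start.begin(), start.begin() + 256, next.begin());
    for (std::size_t d = 0; d < 256; ++d) {
        while (next[d] < start[d + 1]) {
            const std::size_t digit = (a[next[d]] >> shift) & 0xFFu;
            if (digit == d) {
                ++next[d];
            } else {
                std::swap(a[next[d]], a[next[digit]]);
                ++next[digit];
            }
        }
    }

    if (shift == 0) return;                      // all digits of the key are used

    std::size_t k = kb;
    for (std::size_t d = 0; d < 256 && k < ke; ++d) {
        const std::size_t first = k;
        while (k < ke && ks[k] < start[d + 1]) ++k;   // indexes landing in bin d
        if (k > first)                                // refine only such bins
            radix_partition_range(a, start[d], start[d + 1], shift - 8, ks, first, k);
    }
}

// Partitions `a` around every index in `ks` (single-way or multi-way).
inline void radix_partition(std::vector<std::uint32_t>& a, std::vector<std::size_t> ks)
{
    std::sort(ks.begin(), ks.end());
    assert(ks.empty() || ks.back() < a.size());
    radix_partition_range(a, 0, a.size(), 24, ks, 0, ks.size());
}
```

For example, radix_partition(v, {v.size() / 2}) behaves like nth_element at the median index, while radix_partition(v, {n / 4, n / 2, 3 * n / 4}) produces quartiles. The recursion is at most four levels deep for 32-bit keys, one level per digit.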

Algorithm Performance

Measure, Question, Improve, Do It Again…
