Optimizations of LSD Radix Sort for Different Input Data Distributions

I ran across a research paper recently on Radix Selection where the authors mentioned that LSD Radix Sort is known to have performance issues when sorting arrays of data with certain distributions. They did not elaborate what those distributions were. Also, Professor Sedgewick also mentioned that the worst case for Radix Sort is when the input array is filled with a constant value – i.e. every array element has the same value. In this blog, I’ll explore several problematic input data distributions for LSD Radix Sort, will show which parts of the algorithm are struggling, ponder potential reasons, and will explore solutions, showing their performance gains.

C++ Problematic Input Data Distributions

The following table shows performance of LSD Radix Sort running on a single core of an Intel Core Ultra 9 275HX processor, sorting 1 Billion 32-bit unsigned integers:

Random Data5.59 seconds
Presorted Data5.95 seconds
Constant Data10.1 seconds

It looks like there is a little bit of a problem with presorted data and a big performance issue with arrays filled with the same value (constant data) – nearly 2X performance decline.

Digging Deeper

The LSD Radix Sort implementation used is a two-phase implementation, where in the first phase a single counting/histogram pass over the array is performed to create a counting array for all the digits of the keys. The second phase performs as many permutations as there are digits in the keys. For 32-bit unsigned integer keys, the algorithm would perform a single counting pass, followed by 4 permutation passes over the input array.

The following table shows performance for each of these steps:

Counting/Histogram of Random Data0.924 seconds
Permutation of Random Data1.1 seconds
Counting/Histogram of Presorted Data1.5 seconds
Permutation of Presorted Data1.2 seconds
Counting/Histogram of Constant Data1.6 seconds
Permutation of Constant Data2.1 seconds

The above measurements are showing that something is going in the permutation code which is data dependent, especially in the case of constant data. Counting/Histogram also slows down for presorted and constant data cases.

Leave a comment