Fastest LSD Radix Sort in C++ on a Single CPU Core

I’ve been optimizing a variety of Radix algorithms for over a decade: LSD and MSD Radix Sort, Single Core and Multi-Core, and Radix Selection. Several of these optimization techniques can be applied to all of these algorithms. In this blog, I’ll discuss each of these optimizations and show how much performance is gained, for sequential single […]

Optimizations of LSD Radix Sort for Different Input Data Distributions

I recently ran across a research paper on Radix Selection where the authors mentioned that LSD Radix Sort is known to have performance issues when sorting arrays of data with certain distributions. They did not elaborate on what those distributions were. Professor Sedgewick also mentioned that the worst case for Radix Sort is when the […]

MSD Radix Sort Optimization

One performance optimization that was introduced in the Radix Selection algorithm can also be applied to the MSD Radix Sort – combining counting with the permutation phase. This optimization cannot be done during the first digit pass, since counting must be performed first to figure out the bins to permute the data into. However, during […]

Radix Selection Optimizations

In my previous blog post on Radix Selection, the algorithm which returns the k-th largest value from an unsorted array was shown to be significantly faster than sorting, because it performs less work. In this blog, I’ll explore further optimizations to make Radix Sort even faster. More Bits Per Digit Radix Sort (LSD and MSD […]

Radix Selection Algorithm

I’ve written many blogs and a book on sorting. There is a closely related algorithm called Selection, which provides the k-th element from an unsorted array. For example, the 17th highest test score from a college Physics class, or the 91st most popular book at the library. One way to accomplish Selection is to sort […]

Power Usage of Parallel Algorithms in C#

The Power Usage of Algorithms in C# blog presents the power usage and efficiency of several sequential (single core) C# algorithms. In this blog, the power usage and efficiency of parallel algorithms are explored. Power Measurement Setup: The same setup is used as in Power Usage of Algorithms in C#. Parallel Sorting: An array of […]

Power Usage of Algorithms in C#

Algorithms power the world – from search to streaming to AI. Each algorithmic domain has a variety of algorithms to choose from, with different run times and characteristics. But how much power do algorithms use? Does power usage vary between algorithms? Are some algorithms more power efficient than others? This blog answers these questions by […]

Testing the Tester

Unit testing and integration testing are ubiquitous for software development and chip design. Unit testing consists of numerous test cases which test various aspects of the unit/device under test. Typically, unit test cases make sure that all required behaviors are performed by the unit under test. One problem that comes up time and time again […]

Parallel Pattern of Bundling Small Work Items

Parallel programming frameworks like C++ Threading Building Blocks (TBB) and C# Task Parallel Library (TPL) are good at handling large work items using multi-core processors. Both provide mechanisms to break a large problem into smaller pieces (grains), to provide work for multiple cores in parallel, gaining performance acceleration. With 20-core laptops widely available along with […]

ParlayLib Parallel Algorithms Library

Professor Blelloch and his team at Carnegie Mellon University have designed and developed a parallel algorithms library over the last decade – ParlayLib. It provides numerous parallel algorithms targeting shared-memory multicore processors. It is similar to Intel’s Threading Building Blocks (TBB), providing a work-stealing scheduler, but also goes beyond with support for additional parallel primitives […]

Algorithm Performance

Measure, Question, Improve, Do It Again…