ParlayLib Parallel Algorithms Library

Professor Blelloch and his team at Carnegie Mellon University have designed and developed a parallel algorithms library over the last decade – ParlayLib. It provides numerous parallel algorithms targeting shared-memory multicore processors. It is similar to Intel’s Threading Building Blocks (TBB), providing a works-stealing scheduler, but also goes beyond with support for additional parallel primitives and algorithms.

ParlayLib implements many of the Parallel STL algorithms. It can also co-exist with TBB, OpenMP, and Cilk. This “Brief Announcement” paper is a wonderful introduction to ParlayLib capabilities.

Benchmark

ParlayLib provides an extensive benchmark suite, using Googles open source benchmarking library, which runs various algorithms multiple times and gathers statistics across them.

Getting Started

ParlayLib is simple to get started, supporting Linux and Windows. I put together a “Getting Started with ParlayLib” repository, which provides directions and shows the steps necessary to get a “Hello World!” ParlayLib application running on Linux and Windows.

The following Table shows performance for two ParlayLib algorithms when sorting an array of 1 billion 32-bit integers on a 12-th generation Intel i7-12700H laptop processor (14-core with hyperthreading):

AlgorithmRandomPre-sortedConstant
Parallel Merge Sort115394413
Parallel Integer Sort9437141720

The units of measure in the table are millions of integers/second.

The above measurement was performed on WSL (Ubuntu) on Windows. Parallel Integer Sort performed slower by more than twice slower on Windows, while Parallel Merge Sort performance was nearly the same.

The following Table shows performance for two ParlayLib algorithms when sorting an array of 1 billion unsigned 32-bit integers on a C7i.24xlarge AWS node (48-core with hyperthreading):

AlgorithmRandomPre-sortedConstant
Parallel Merge Sort51213031303
Parallel Integer Sort373630485568

The following Table shows performance on a C7i.48xlarge AWS node (96-core Intel CPU with hyperthreading):

AlgorithmRandomPre-SortedConstant
Parallel Merge Sort86514051314
Parallel Integer Sort419537093831

The following Table shows performance on a C7a.48xlarge AWS node (96-core AMD CPU with hyperthreading):

AlgorithmRandomPre-SortedConstant
Parallel Merge Sort67019291907
Parallel Integer Sort5382482910152

Azure Standard HB176-144rs v4 (144-core 4-th generation AMD EPYC CPU with 2.3 GByte 3D V-Cache without hyperthreading):

AlgorithmRandomPre-SortedConstant
Parallel Merge Sort93516131554
Parallel Integer Sort672569209891

Azure Standard Standard HB176rs v4 (176-core 4-th generation AMD EPYC CPU with 2.3 GByte 3D V-Cache without hyperthreading):

AlgorithmRandomPre-SortedConstant
Parallel Merge Sort100016911457
Parallel Integer Sort718971899756

Other Resources

Overview video

One thought on “ParlayLib Parallel Algorithms Library

Leave a comment