C++ includes a standard set of generic algorithms, formerly known as the STL (Standard Template Library). On Windows, Microsoft provides parallel versions of the standard algorithms. In the tables below these are marked std::, after the namespace of the execution policy passed as the first argument. Intel also provides parallel versions of the standard algorithms, marked dpl:: in the tables below.
The following table shows the performance of single-core serial and multi-core parallel algorithms on a 14-core i7-12700 laptop processor when processing an array of 100 million 32-bit integers:
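As a minimal sketch of what "the first argument" means here: the parallel overloads of the standard algorithms take an execution policy object ahead of the usual iterator arguments. The helper name `sort_with_policy` is hypothetical and only wraps `std::sort` for illustration. It is not taken from the benchmark code.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical helper: sorts `data` with whichever execution policy the
// caller passes in, e.g. std::execution::seq, std::execution::par or
// std::execution::par_unseq.
template <typename Policy>
void sort_with_policy(Policy&& policy, std::vector<int>& data)
{
    // The policy is the first argument; the rest matches the serial overload.
    std::sort(std::forward<Policy>(policy), data.begin(), data.end());
    assert(std::is_sorted(data.begin(), data.end()));
}
```

Calling `sort_with_policy(std::execution::par, v)` requests the multi-core version, while `std::execution::seq` requests the serial one. The Intel implementations follow the same calling pattern with policies from the dpl:: namespace.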
Algorithm | seq | unseq | par | par_unseq |
copy(std:: | 3395 | n/s | 3317 | 3080 |
copy(dpl:: | 2865 | 2518 | 3250 | 4181 |
count(std:: | 3758 | n/s | 7541 | 8403 |
count(dpl:: | 3284 | 3058 | 6083 | 8850 |
fill(std:: | 3030 | n/s | 3125 | 3056 |
fill(dpl:: | 2795 | 2889 | 4268 | 6459 |
merge(std:: | 142 | n/s | 153 | 155 |
merge(dpl:: | 156 | 155 | 1450 | 1431 |
sort(std:: | 11 | n/s | 73 | 74 |
sort(dpl:: | 11 | 11 | 109 | 107 |
stable_sort(std:: | 12 | n/s | 44 | 44 |
stable_sort(dpl:: | 10 | 12 | 115 | 117 |
The following table shows the performance of single-core serial and multi-core parallel algorithms on a 48-core Intel Xeon 8275CL AWS node when processing an array of 100 million 32-bit integers:
Algorithm | seq | unseq | par | par_unseq |
fill(std:: | | n/s | | |
fill(dpl:: | 2789 | 2804 | 2918 | 992- |
sort(std:: | 10 | n/s | 73 | 81 |
sort(dpl:: | 9 | 9 | 165 | 162 |
stable_sort(std:: | 10 | n/s | 63 | 63 |
stable_sort(dpl:: | 10 | 10 | 161 | 163 |
seq – sequential single-core version
unseq – single-core SIMD/SSE vectorized
par – multi-core version
par_unseq – multi-core SIMD/SSE vectorized
n/s – not supported – i.e. variation not implemented
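The column names in the legend correspond to the standard execution policy types in `<execution>`. The sketch below maps each policy object to its column label. The function name `policy_label` is hypothetical, and the `unseq` branch is guarded because that policy only exists from C++20 onward.

```cpp
#include <execution>
#include <string_view>
#include <type_traits>

// Hypothetical helper: returns the table column label for a standard
// execution policy object.
template <typename Policy>
constexpr std::string_view policy_label(const Policy&)
{
    if constexpr (std::is_same_v<Policy, std::execution::sequenced_policy>)
        return "seq";
    else if constexpr (std::is_same_v<Policy, std::execution::parallel_policy>)
        return "par";
    else if constexpr (std::is_same_v<Policy, std::execution::parallel_unsequenced_policy>)
        return "par_unseq";
#if defined(__cpp_lib_execution) && __cpp_lib_execution >= 201902L
    // std::execution::unseq was added in C++20.
    else if constexpr (std::is_same_v<Policy, std::execution::unsequenced_policy>)
        return "unseq";
#endif
    else
        return "unknown";
}
```

For example, `policy_label(std::execution::par_unseq)` yields `"par_unseq"`.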
Benchmark Environment
The benchmarks above were built in release mode in Visual Studio 2022 with the Intel C++ compiler, using the Intel oneAPI 2023.1 environment. This setup provides access to both the Microsoft std:: implementations and the Intel dpl:: implementations. All benchmarks were run on Windows 11.
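The timing method is not described here, so the following is only a sketch of how one such measurement could be taken, assuming wall-clock timing of a single call with `std::chrono::steady_clock`. The names `make_random_input` and `time_sort_seconds`, the fixed seed, and the choice of `std::sort` are all illustrative assumptions.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <execution>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: fills a vector with pseudo-random 32-bit integers.
// The seed is fixed so that repeated runs sort the same input.
std::vector<std::int32_t> make_random_input(std::size_t count)
{
    std::mt19937 gen(42);
    std::uniform_int_distribution<std::int32_t> dist;
    std::vector<std::int32_t> data(count);
    for (auto& x : data) x = dist(gen);
    return data;
}

// Hypothetical helper: times one std::sort call under the given execution
// policy and returns the elapsed wall-clock time in seconds.
template <typename Policy>
double time_sort_seconds(Policy&& policy, std::vector<std::int32_t>& data)
{
    const auto start = std::chrono::steady_clock::now();
    std::sort(std::forward<Policy>(policy), data.begin(), data.end());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A run comparable in size to the tables would call `make_random_input(100'000'000)` and then `time_sort_seconds(std::execution::par, data)`. Dividing the element count by the elapsed seconds gives a throughput figure. Each policy should be timed on a fresh copy of the unsorted input, since sorting already-sorted data is not a like-for-like comparison.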