Under Construction…
NVIDIA has added support for executing the standard C++ parallel algorithms on GPUs through the nvc++ compiler in its HPC SDK.
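No CUDA code is required to target the GPU: a standard algorithm invoked with an execution policy such as `std::execution::par` is offloaded when the program is compiled with `nvc++ -stdpar=gpu`. A minimal sketch (the array size is an arbitrary choice for illustration):

```cpp
#include <algorithm>
#include <cstdio>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> v(100'000'000);
    std::iota(v.begin(), v.end(), 0.0);  // fill with 0, 1, 2, ...

    // Compiled with nvc++ -stdpar=gpu, std::execution::par (and
    // par_unseq) offloads this reduction to the GPU; with an ordinary
    // host compiler the same code runs on CPU threads instead.
    double sum = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
    std::printf("sum = %.0f\n", sum);
    return 0;
}
```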
| Algorithm | seq | unseq | par | par_unseq | GPU Speedup |
|---|---:|---:|---:|---:|---:|
| std::max_element | 1600 | 1613 | 1620 | 1581 | 1.0 |
| std::adjacent_difference | 2052 | 2062 | 996 | | 0.5 |
| std::adjacent_find | 2963 | 2947 | 37 | | |
| std::all_of | 3652 | 3752 | 34 | | |
| std::any_of | 3652 | 3584 | 37 | | |
| std::count | 2999 | 2987 | 1627 | | |
| std::equal | 3839 | 3716 | 37 | | |
| std::copy | 4421 | 4525 | 1529 | | |
| std::merge | 201 | 197 | 387 | | |
| std::inplace_merge | 183 | 181 | | | |
| std::sort | 15 | 15 | 15 | Segmentation Fault | |
| std::stable_sort | 17 | 17 | 17 | Segmentation Fault | |
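The GPU Speedup column appears to be the ratio of the par and seq columns (1620/1600 ≈ 1.0 for max_element, 996/2052 ≈ 0.5 for adjacent_difference). Below is a sketch of the kind of timing loop that produces such throughput figures; the array size, element type, and elements-per-second units are my assumptions, not the exact harness behind the table:

```cpp
// Build per NVIDIA's instructions, e.g.: nvc++ -stdpar=gpu -O3 bench.cpp -o bench
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <execution>
#include <random>
#include <vector>

int main() {
    std::vector<unsigned> v(100'000'000);
    std::mt19937 gen(42);
    std::generate(v.begin(), v.end(), gen);  // random input for sort

    auto t0 = std::chrono::steady_clock::now();
    // The execution policy selects the measured variant: seq, unseq,
    // par, or par_unseq. Under nvc++ -stdpar=gpu, the parallel
    // policies run on the GPU.
    std::sort(std::execution::par, v.begin(), v.end());
    auto t1 = std::chrono::steady_clock::now();

    double sec = std::chrono::duration<double>(t1 - t0).count();
    std::printf("%.0f million elements/sec\n", v.size() / sec / 1.0e6);
    return 0;
}
```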
After following NVIDIA's instructions on the above site, GPU-accelerated C++ standard algorithms running under Windows 11 WSL (Ubuntu) are slower than the single-core CPU algorithms on a Dell Alienware laptop with a GeForce RTX 3060 laptop GPU.
I have contacted NVIDIA about these performance issues and the segmentation faults, to see if they can reproduce them and suggest a fix.