C++ Parallel STL on GPUs

Under Construction…

NVIDIA has added support for running the standard C++ parallel algorithms on GPUs.
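For readers who have not used the parallel algorithms before, here is a minimal sketch of what such a call looks like. The helper names `max_with_policy` and `count_with_policy` are hypothetical, made up for illustration; only the `std::` calls and execution policies are standard C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical helper (illustration only): forwards any standard execution
// policy to std::max_element and returns the largest value in `data`.
template <class Policy>
int max_with_policy(Policy&& policy, const std::vector<int>& data) {
    assert(!data.empty());  // dereferencing end() would be undefined behavior
    return *std::max_element(std::forward<Policy>(policy),
                             data.begin(), data.end());
}

// Hypothetical helper (illustration only): counts how many elements of
// `data` equal `value`, using the given execution policy.
template <class Policy>
long long count_with_policy(Policy&& policy, const std::vector<int>& data,
                            int value) {
    return std::count(std::forward<Policy>(policy),
                      data.begin(), data.end(), value);
}
```

Calling `max_with_policy(std::execution::par_unseq, v)` is the same source code whether it runs on CPU cores or on a GPU. With NVIDIA's `nvc++` compiler, the `-stdpar=gpu` option requests GPU offload for algorithms invoked with `par` or `par_unseq`, while `seq` stays on the CPU.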

Algorithm  seq  unseq  par  par_unseq  GPU  Speedup
max_element(std::  1600  1613  1620  1581  1.0
adjacent_difference(std::  2052  2062  996  0.5
adjacent_find(std::  2963  2947  37
all_of(std::  3652  3752  34
any_of(std::  3652  3584  37
count(std::  2999  2987  1627
equal(std::  3839  3716  37
copy(std::  4421  4525  1529
merge(std::  201  197  387
inplace_merge(std::  183  181
sort(std::  15  15  15  Segmentation Fault
stable_sort(std::  17  17  17  Segmentation Fault

After following NVIDIA’s instructions on the above site, GPU-accelerated C++ standard algorithms run slower than the single-core CPU algorithms. The test machine is a Dell Alienware laptop with a GeForce RTX 3060 Laptop GPU, running Ubuntu under WSL on Windows 11.
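A comparison like this comes down to timing the same call under each configuration. Below is a rough sketch of one way to do that with `std::chrono`. The helper names are hypothetical, and both the untimed warm-up run and the "millions of elements per second" metric are assumptions of this sketch, not a description of how the numbers above were measured.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (illustration only): runs `work` once untimed as a
// warm-up, then runs it again and returns the elapsed time in seconds.
template <class Work>
double seconds_to_run(Work&& work) {
    work();  // warm-up, so one-time startup costs are not counted
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Assumed metric: throughput in millions of elements per second.
double millions_per_second(std::size_t element_count, double seconds) {
    return static_cast<double>(element_count) / seconds / 1e6;
}
```

The warm-up call is there because the first use of a GPU from a process typically pays a one-time initialization cost, which can otherwise dominate a short measurement.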

I have contacted NVIDIA about these performance issues and the segmentation faults, to see whether they can reproduce them and suggest a fix.
