Under Construction…
NVIDIA has added support for executing the standard C++ parallel algorithms on GPUs through the nvc++ compiler in its HPC SDK.
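No CUDA code is required to target the GPU: a standard algorithm invoked with an execution policy such as `std::execution::par` is offloaded when the program is compiled with `nvc++ -stdpar=gpu`. A minimal sketch (the array size is an arbitrary choice for illustration):

```cpp
#include <algorithm>
#include <cstdio>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> v(100'000'000);
    std::iota(v.begin(), v.end(), 0.0);  // fill with 0, 1, 2, ...

    // Compiled with nvc++ -stdpar=gpu, std::execution::par (and
    // par_unseq) offloads this reduction to the GPU; with an ordinary
    // host compiler the same code runs on CPU threads instead.
    double sum = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
    std::printf("sum = %.0f\n", sum);
    return 0;
}
```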
| Algorithm | seq | unseq | par | par_unseq | GPU Speedup |
|---|---:|---:|---:|---:|---:|
| std::max_element | 1600 | 1613 | 1620 | 1581 | 1.0 |
| std::adjacent_difference | 2052 | 2062 | 996 | | 0.5 |
| std::adjacent_find | 2963 | 2947 | 37 | | |
| std::all_of | 3652 | 3752 | 34 | | |
| std::any_of | 3652 | 3584 | 37 | | |
| std::count | 2999 | 2987 | 1627 | | |
| std::equal | 3839 | 3716 | 37 | | |
| std::copy | 4421 | 4525 | 1529 | | |
| std::merge | 201 | 197 | 387 | | |
| std::inplace_merge | 183 | 181 | | | |
| std::sort | 15 | 15 | 15 | Segmentation Fault | |
| std::stable_sort | 17 | 17 | 17 | Segmentation Fault | |
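The GPU Speedup column appears to be the ratio of the par and seq columns (1620/1600 ≈ 1.0 for max_element, 996/2052 ≈ 0.5 for adjacent_difference). Below is a sketch of the kind of timing loop that produces such throughput figures; the array size, element type, and elements-per-second units are my assumptions, not the exact harness behind the table:

```cpp
// Build per NVIDIA's instructions, e.g.: nvc++ -stdpar=gpu -O3 bench.cpp -o bench
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <execution>
#include <random>
#include <vector>

int main() {
    std::vector<unsigned> v(100'000'000);
    std::mt19937 gen(42);
    std::generate(v.begin(), v.end(), gen);  // random input for sort

    auto t0 = std::chrono::steady_clock::now();
    // The execution policy selects the measured variant: seq, unseq,
    // par, or par_unseq. Under nvc++ -stdpar=gpu, the parallel
    // policies run on the GPU.
    std::sort(std::execution::par, v.begin(), v.end());
    auto t1 = std::chrono::steady_clock::now();

    double sec = std::chrono::duration<double>(t1 - t0).count();
    std::printf("%.0f million elements/sec\n", v.size() / sec / 1.0e6);
    return 0;
}
```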
After following NVIDIA's instructions on the above site, GPU-accelerated C++ standard algorithms running under Windows 11 WSL (Ubuntu) are slower than the single-core CPU algorithms on a Dell Alienware laptop with a GeForce RTX 3060 laptop GPU.
I have contacted NVIDIA about these performance issues and the segmentation faults, to see if they can reproduce them and suggest a fix.