This blog post is a re-post of my Dr. Dobb’s Journal article from March of 2011. All of the source code, including a working Visual Studio 2015 solution with examples, is on GitHub. In last month’s article in this series, a Parallel Merge algorithm was introduced and its performance was optimized to the point of being limited by system memory bandwidth. […] Read more "Parallel Merge Sort"
In late 1996 I developed a recursive hardware multiplier and presented it at the Synopsys User Group conference in 1998. I recently ran across the Karatsuba algorithm for fast multiplication, whose recursive application reminded me of my recursive multiplier. I was mainly after increasing performance for fairly small multipliers, ease of pipelining, and not in […] Read more "Recursive Multiplier in VHDL"
In this blog post I’ll gather introductory material that is useful when you’re starting out with OpenCL, including links to videos, introductory source code for first projects, and information on how to get Visual Studio set up for OpenCL on Windows. Video introduction to OpenCL is a nice introduction to OpenCL terminology and the overall concepts. It’s an hour […] Read more "OpenCL Introduction"
In my previous blog posts, pseudo-random number generators (PRNGs) running on a multi-core processor (CPU) or graphics processor (GPU) were shown to have vastly superior performance to those in the standard C++ libraries. Using several CPU cores and utilizing parallel instructions within each core paid off for CPU-based generators. Using hundreds of GPU cores took performance […] Read more "Faster Random Number Generator"
Radix Sort is a high-performance, linear-time sorting algorithm that does not use comparisons. Instead, Radix Sort looks at each digit of the key and processes the keys based on those digits. Two variations of the algorithm exist: least significant digit (LSD) and most significant digit (MSD). Each of the two variations has its own attributes, […] Read more "Radix Sort Implementations"
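As a rough illustration of the LSD variation mentioned in the Radix Sort teaser, here is a minimal C++ sketch. It assumes 32-bit unsigned keys and a radix of 256 (one byte per digit). The function name `lsd_radix_sort` and these parameter choices are assumptions made for illustration and are not taken from the article's implementations.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative LSD radix sort (hypothetical helper, not the article's code).
// Assumes 32-bit unsigned keys, processed one byte (radix 256) per pass,
// starting from the least significant byte.
void lsd_radix_sort(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> buffer(keys.size());

    for (int shift = 0; shift < 32; shift += 8) {
        // count[d + 1] holds how many keys have digit d in this pass.
        std::array<std::size_t, 257> count{};
        for (std::uint32_t k : keys) {
            ++count[((k >> shift) & 0xFFu) + 1];
        }

        // Prefix sum: count[d] becomes the first output index for digit d.
        for (std::size_t d = 0; d < 256; ++d) {
            count[d + 1] += count[d];
        }

        // Stable scatter: keys with equal digits keep their relative order,
        // which is what lets later passes build on earlier ones.
        for (std::uint32_t k : keys) {
            buffer[count[(k >> shift) & 0xFFu]++] = k;
        }

        keys.swap(buffer);
    }
}
```

Calling `lsd_radix_sort(v)` on a `std::vector<std::uint32_t>` sorts it in ascending order in four passes without comparing keys to each other, at the cost of a second buffer of the same size.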