I’ve made several attempts at parallelizing the LSD Radix Sort algorithm. This blog provides the key concepts and details of the latest attempt, which has succeeded and is beginning to pay dividends in higher performance. Performance Summary: The following table shows performance of three variations of the LSD Radix Sort on two different multi-core […]Read more "Parallel LSD Radix Sort"
C++ has included wonderful implementations of sorting algorithms over the years. Recently, with C++17 support for parallelism, sorting performance has skyrocketed by running on all of the available cores. The number of cores is predicted to grow by a double-digit percentage per year, as competition between Intel, AMD, ARM and other processor vendors heats up. The […]Read more "Faster C++ Sorting"
In graduate school one of my favorite topics to work on, with one of my graduate advisors (Dr. Robbins) at North Carolina State University (NCSU), was rendering stereoscopic (3-D) images. Several of these rendered images are posted below. The sample data comes from a scanning tunneling microscope and atomic force microscope, provided by the Material […]Read more "Stereoscopic Imaging of Scanning Tunneling Microscope"
C# has several fundamental abstractions that make parallel programming possible. By splitting work across multiple processor cores, gains in performance can be quite significant. And, with AMD and Intel shipping processors with more and more cores, using all of these compute resources is critical. Some of C#’s parallel facilities are: Parallel.Invoke() which runs several functions […]Read more "Parallel Divide-and-Conquer in C#"
In the initial blog I developed a faster checked summation of ulong arrays. Using integer operations when no arithmetic overflow is detected helped propel performance 15X higher, to 682 MegaUlongAdditions/second for a typical case. However, the worst case was still as slow as summation of Decimal arrays – 48 MegaUlongAdditions/second. In this blog I’ll develop […]Read more "Faster Checked Addition in C# (Part 2)"
Signed Integer Overflow Detection: A similar method can be developed for summation of signed long arrays. When adding signed integers, arithmetic overflow is possible. In fact, arithmetic underflow is also possible – where adding two negatives results in a positive value. When adding two signed integers, four cases are possible: both positive, both negative, first […]Read more "Faster Checked Signed Addition in C#"
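The sign-based reasoning in this teaser can be illustrated with a minimal sketch. It is written in C++ rather than C# and is not the post's actual implementation. The function name `add_overflows` is hypothetical. It assumes two's complement 64-bit integers, and it relies on one fact: a signed addition can only overflow or underflow when both operands have the same sign and the wrapped result has the opposite sign.

```cpp
#include <cstdint>

// Hypothetical helper (not from the post): adds a and b with wrap-around,
// stores the wrapped result in sum, and returns true if the mathematically
// correct result did not fit in a signed 64-bit integer.
bool add_overflows(std::int64_t a, std::int64_t b, std::int64_t& sum) {
    // Add in unsigned arithmetic, where wrap-around is well defined in C++.
    // Converting back to signed assumes two's complement representation.
    sum = static_cast<std::int64_t>(
        static_cast<std::uint64_t>(a) + static_cast<std::uint64_t>(b));

    // (a ^ sum) is negative when a and sum differ in sign; likewise (b ^ sum).
    // Both are negative only when a and b share a sign that sum does not have,
    // which covers "both positive -> negative" and "both negative -> positive".
    return ((a ^ sum) & (b ^ sum)) < 0;
}
```

Operands of mixed sign never trigger the check, since their sum always lies between the two operands and therefore fits.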
In the blog “Faster Checked Addition in C#” we saw how to add numbers safely in C# without using the checked keyword and without exceptions. This raised performance, since exceptions have quite a bit of overhead. In this blog, I’ll extend this idea to the data parallel SIMD/SSE instructions of Intel and AMD processors, […]Read more "Checked SIMD/SSE Addition in C#"
Several improvements to C# .Sum() were shown in the “Better C# .Sum() in Many Ways” blog and made available in the HPCsharp NuGet package. In this blog, I’ll explore more ways to improve .Sum() to raise its capabilities to the next level in performance while not giving up accuracy. BigInteger Summation: C# provides a BigInteger […]Read more "Better C# .Sum() in More Ways"
One of my pet peeves is not using hardware to its fullest ability, such as all of the available memory bandwidth. Copying is one of the simplest operations; it is memory-bandwidth bound and should use most of that bandwidth. In this blog, I’ll examine C# copy operations from List to Array, to see how fast […]Read more "Faster List.ToArray() and Copying in C#"
Adding Two Integers – Surprise! Adding two integers in C# can produce surprising results. For instance, int sum = 2147483647; // Int32.MaxValue == 0x7FFFFFFF sum = sum + 1; The resulting sum is expected to be 2147483648, but is -2147483648 instead. uint sum = 4294967295; // UInt32.MaxValue == 0xFFFFFFFF sum = sum + 1; The […]Read more "Faster Checked Addition in C#"