Several improvements to C# .Sum() were shown in the “Better C# .Sum() in Many Ways” blog and made available in the HPCsharp nuget package. In this blog, I’ll explore more ways to improve .Sum() to raise its capabilities to the next level in performance while not giving up accuracy.
C# provides a BigInteger data type to support arbitrarily large integers, but does not provide .Sum() for BigInteger arrays. Linq .Aggregate() function can be used instead to achieve summation. However, it doesn’t seem to support .AsParallel() to scale performance to multi-core.
HPCsharp adds .Sum() for BigInteger that uses a single CPU core, as well as scales to multiple cores for additional performance. The same familiar interfaces are provided, to simplify usage.
Long and ULong Summation to BigInteger
To avoid throwing overflow exceptions when summing arrays of long and ulong, HPCsharp adds the ability to accumulate and return BigInteger. Single core (scalar) and multi-core versions have been implemented, and scale well with the number of cores. These functions provide perfect accuracy results, since BigInteger can grow to an arbitrary size.
These .Sum() functions are nice for completeness. However, they are about 5X slower than summation to Decimal, which also provides a perfectly accurate result, since Decimal provides 96-bits of accuracy.
Faster Long and ULong Summation
Summation of long arrays to Decimal and BigInteger provide perfect accuracy, but are much slower than summation to a long result. Summation to a long result is fast, but can throw an overflow exception. Data-parallel SIMD/SSE long array summation to a long is even faster, and doesn’t throw an overflow exception, but produces wrong result quietly.
These quiet wrong results for SEE most likely are because SSE instructions in Intel Architecture processors are not capable of generating arithmetic overflow exceptions. C# doesn’t have a mechanism of its own for SSE instructions to check for arithmetic overflow during addition. Plus, dealing with exceptions slows performance, since each exception takes about 8.4 microseconds to catch, limiting performance to about 177 KiloExceptions/second.
I’ve developed a way to keep performance high and check for an overflow condition, to preserve perfect accuracy. In the blog “Faster Checked Addition in C#” this method is described for long and ulong arrays. It is implemented in HPCsharp for summation of long and ulong arrays to Decimal and to BigInteger. SSE implementations for ulong and long have also been added to HPCsharp, for additional performance.