In the blog “Faster Checked Addition in C#” we saw how to add numbers safely in C# without using the checked keyword and without exceptions. This improved performance, since exceptions carry quite a bit of overhead. In this blog, I’ll extend the idea to the data-parallel SIMD/SSE instructions of Intel and AMD processors, enabling checked SSE addition without overflow exceptions.
Intel and AMD processors provide an overflow flag that makes arithmetic overflow simple to detect. Within a checked block, C# uses this facility to detect overflow and throws an OverflowException whenever the processor's overflow flag gets set. However, this does not work for all C# code.
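To make the scalar behavior concrete, here is a minimal sketch of a checked addition that overflows. The class and method names (CheckedScalarDemo, TryCheckedAdd) are made up for illustration; only the checked block and OverflowException come from C# itself.

```csharp
using System;

public static class CheckedScalarDemo
{
    // Adds two ints inside a checked block. Returns false when the addition
    // overflows, instead of letting the OverflowException propagate.
    public static bool TryCheckedAdd(int a, int b, out int sum)
    {
        try
        {
            checked { sum = a + b; }   // throws OverflowException on overflow
            return true;
        }
        catch (OverflowException)
        {
            sum = 0;                   // no meaningful result on overflow
            return false;
        }
    }
}
```

Calling TryCheckedAdd(int.MaxValue, 1, out var s) takes the exception path, which is exactly the overhead the earlier blog set out to avoid.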
Intel and AMD processors provide hundreds of data-parallel instructions, which can operate on several items at once. These instructions are called SIMD (single instruction, multiple data), indicating that a single processor instruction operates on multiple data items. For example, a 256-bit wide instruction can perform eight 32-bit additions in parallel. On newer Intel and AMD processors, instructions up to 512 bits wide are available, which matches the width of a 64-byte cache line.
A C# checked block can contain SIMD/SSE instructions. However, C# does not check for overflow that occurs in SIMD/SSE operations, most likely because today’s processors do not provide overflow flags for SIMD/SSE instructions. To support SIMD/SSE overflow checking, we need to implement it in software ourselves.
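A small sketch of this gap, using System.Numerics.Vector<int> as the SIMD type (my choice for illustration; the class and method names are hypothetical). As far as I know, Vector<T> addition is not affected by the checked context, so each lane wraps around silently:

```csharp
using System;
using System.Numerics;

public static class CheckedSimdDemo
{
    // Adds int.MaxValue + 1 in every lane inside a checked block.
    // No exception is thrown; each lane wraps around to int.MinValue.
    public static Vector<int> AddInChecked()
    {
        var a = new Vector<int>(int.MaxValue); // every lane = int.MaxValue
        var b = new Vector<int>(1);            // every lane = 1
        checked
        {
            return a + b;                      // lane-wise add, silently wraps
        }
    }
}
```

The number of lanes, Vector<int>.Count, depends on the hardware the code runs on.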
Overflow Detection for SIMD/SSE
Implementing arithmetic overflow detection with SIMD/SSE data-parallel instructions is possible, and it has been added to the HPCsharp library for the ulong and long integer data types. This capability provides a significant performance boost when summing an array of ulong or long integers to produce a BigInteger or Decimal result.
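One way to detect overflow in software, shown here as a sketch rather than as HPCsharp's actual implementation, relies on the fact that an unsigned addition wrapped exactly when the result is smaller than one of its operands. The class name SimdCheckedSumSketch and its Sum method are hypothetical, and Vector<ulong> is assumed as the SIMD type.

```csharp
using System;
using System.Numerics;

public static class SimdCheckedSumSketch
{
    // Sums a ulong array into a BigInteger, counting per-lane overflows
    // in software instead of relying on exceptions.
    public static BigInteger Sum(ulong[] values)
    {
        int width = Vector<ulong>.Count;
        var low = Vector<ulong>.Zero;      // wrapped 64-bit sum per lane
        var carries = Vector<ulong>.Zero;  // overflow count per lane
        int i = 0;
        for (; i <= values.Length - width; i += width)
        {
            var v = new Vector<ulong>(values, i);
            var next = low + v;            // lane-wise add, may wrap
            // A lane wrapped if and only if its result is below the old sum.
            // LessThan sets such lanes to all ones (ulong.MaxValue).
            var overflowed = Vector.LessThan(next, low);
            // Subtracting all ones is the same as adding 1 (mod 2^64).
            carries -= overflowed;
            low = next;
        }
        BigInteger total = BigInteger.Zero;
        for (int lane = 0; lane < width; lane++)
        {
            // Each recorded overflow stands for a lost 2^64.
            total += low[lane] + ((BigInteger)carries[lane] << 64);
        }
        for (; i < values.Length; i++)
        {
            total += values[i];            // scalar tail
        }
        return total;
    }
}
```

This sketch is branch-free inside the loop, so it does the same work whether or not any lane overflows; an implementation that branches on overflow would behave differently.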
Performance of the SIMD/SSE implementation can be characterized by two extremes: arithmetic overflow happens on every addition, or overflow never happens. Most real-world scenarios fall between these extremes, depending on the percentage of additions within an array that overflow.