Faster Checked Addition in C#

Adding Two Integers – Surprise!

Adding two integers in C# can produce surprising results. For instance,

int sum = 2147483647;    // Int32.MaxValue == 0x7FFFFFFF
sum = sum + 1;

The resulting sum is expected to be 2147483648, but is -2147483648 instead.

uint sum = 4294967295;    // UInt32.MaxValue == 0xFFFFFFFF
sum = sum + 1;

The resulting unsigned integer (uint) sum is expected to be 4294967296, but is 0 instead.

This condition is called arithmetic overflow, and happens due to uint and int being limited to 32 bits, whereas the result of addition needs 33 bits to hold the entire result. Instead, only the lower 32 bits of the result are returned and the upper 1 bit is lost. This issue is not unique to C#, and happens in other languages such as C++.

Checked Key Word

C# provides a checked keyword to detect arithmetic overflow, for operations such as addition. For example,

int sum = 2147483647;    // Int32.MaxValue == 0x7FFFFFFF
sum = checked(sum + 1);

In the above case, the checked block surrounds a potentially dangerous operation – addition. Instead of producing an unexpected result, an overflow exception is thrown when the result exceeds 32 bits. Otherwise, when the result fits within 32 bits, no overflow exception is thrown, as in:

int sum = 3;
sum = checked(sum + 1);

The checked key word can also be used as a block, to check multiple C# code statements for arithmetic overflow.

Summation of Unsigned Long Array

Checked addition is useful when summation of an array is needed, and should be used whenever there is any chance of arithmetic overflow. As we’ve seen with integer values in examples above, this will cause an overflow exception in the following case:

ulong sum = UInt64.MaxValue;
sum = sum + 1;

To be able to sum up an array of unsigned long integers, we can do the following:

Decimal sum = UInt64.MaxValue;
sum = sum + 1;

Decimal has 96-bits of accuracy, while ulong has 64-bits. The above addition provides perfect accuracy without causing arithmetic overflow exception. However, it’s 20 times slower, since Decimal is not supported by Intel/AMD processors natively. Decimal addition is implemented in software.  If a million additions were done, such as summation of an array, being 20 times slower begs for a better way.

To sum up an array of ulong values, several methods can be used. C# Linq provides an Array.Sum() function, which returns an ulong result, and uses checked addition, throwing arithmetic overflow whenever the result gets to be larger than UInt64.MaxValue:

ulong sum = ulongArray.Sum();

HPCsharp also provides such a function, which is slightly faster than:

ulong sum = ulongArray.SumHpc();

This function also uses checked addition and can also throw an arithmetic overflow.

To avoid arithmetic overflow, Decimal addition can be used instead using Linq or HPCsharp, as follows:

Decimal sum = ulongArray.Sum(x => (decimal)x);  // Linq     summation
Decimal sum1 = ulongArray.SumToDecimal();       // HPCsharp summation

HPCsharp summation function is slightly faster than Linq, but both are more than 15X slower than summation which doesn’t use Decimal. However, using Decimal produces a correct result. This high of performance drop is motivation to develop a better solution.

Catching Exceptions

Below is an implementation of summation of ulong array to a decimal result, which provides perfect accuracy for any possible value of each array element:

public static decimal SumToDecimalFast(this ulong[] arrayToSum)
{
    decimal overallSum = 0;
    ulong ulongSum = 0;
    for (int i = 0; i < arrayToSum.Length; i++)
    {
        try
        {
            ulongSum = checked(ulongSum + arrayToSum[i]);
        }
        catch (OverflowException)
        {
            overallSum += ulongSum;
            overallSum += arrayToSum[i];
            ulongSum = 0;
        }
    }
    return overallSum + ulongSum;
}

Note how an internal sum is a ulong to provide the highest possible performance. Only when an exception occurs, the ulong sum transferred to the decimal sum, and the ulong sum is reset to zero. This implementation makes an assumption that exceptions will be rare. It performs very well when that assumption is true, reaching  400 Million ulong/sec summations when there are no exceptions (e.g. when all values are small).

If, however, every element of an array is UInt64.MaxValue, then an exception would occur on every addition. This is the worst case, which performs poorly, as seen from the following measurements:

  • 117 Thousand ulong/sec for addition using the above algorithm with an exception thrown on every addition

From the above measurement, we can see that 117K exceptions can be processed per second, or about 8.4 micro-seconds per exception.

However, HPCsharp has another version of ulong array summation – .SumToDecimal() – which converts each ulong value to a Decimal first, and then accumulates and returns a Decimal sum. This function will not throw exceptions when adding UInt64.MaxValue array elements, since these convert to Decimal and have room to spare, as Decimal has 96-bit of precision. This function performs:

  • 49 Million ulong/sec for addition to a Decimal accumulator

Exceptions slow performance down by 420X in the worst case, which motivates the development of further improvements, as I’ll show below.

Detecting Overflow without Exceptions

As performance measurements above show, exceptions are expensive in terms of time. Only if they are truly rare should they be used. As I will show below, it is possible to improve performance of the worst case by eliminating use of exceptions and yet detect arithmetic overflow, much quicker.

When adding two unsigned integers, we expect the sum to be bigger than either of the numbers being added. If however, the result ends up smaller than either of the values being added, then an overflow has occurred. This condition is simple to detect:

public static decimal SumToDecimalFaster(this ulong[] arrayToSum)
{
    decimal overallSum = 0;
    ulong ulongSum = 0;
    for (int i = 0; i < arrayToSum.Length; i++)
    {
        ulong newUlongSum = ulongSum + arrayToSum[i];
        if (newUlongSum >= ulongSum)
            ulongSum = newUlongSum;     // no numeric overflow, as the new unsigned sum increased
        else
        {
            overallSum += ulongSum;
            overallSum += arrayToSum[i];
            ulongSum = 0;
        }
    }
    return overallSum + ulongSum;
}

Once again, we measure performance for summation of an array with every element of UInt64.MaxValue, which would force an exception for every addition.

  • 48 Million ulong/sec, which is a 376X speedup over 117K ulong/sec when exceptions were thrown

This is 2% slower than HPCsharp’s SumToDecimal() function, in the worst case, due to the overhead for avoiding exceptions with the if statement above. In the best case, of all ulong array values being small to not cause a single arithmetic overflow:

  • 682 Million ulong/sec

Thus, performance of the algorithm which detects arithmetic overflow without exceptions is between 682 Million ulong/sec and 48 Million ulong/sec, depending if no overflow ever occurs, to overflow occurring on every addition, respectively. The worst case is quite unlikely, in real-world usage. However, arithmetic overflow must be handled correctly, to avoid unexpected surprises of incorrect results.

Other Integer Data Types

Just like for ulong, presented in this blog, overflow can be detected for other unsigned integer data types, such as uintushort, and byte. Implementing overflow detection is especially useful for long and ulong integer data types, because there is no bigger integer CPU-native data types available in most programming languages.

It’s also possible to do overflow and underflow detection for singed integers, which I describe in another blog and is included in HPCsharp nuget package.

Even Higher Performance

SIMD/SSE instructions are especially challenging for overflow detection, which I have developed in HPCsharp nuget package and describe in this blog. With SSE instruction performance goes even higher on a single core of the CPU.

Multi-core is supported for the above algorithm and SIMD/SSE implementations.

Summary and Conclusion

  • care must be taken when adding two integers in C#, which can produce surprising results due to arithmetic overflow
  • C# has a checked keyword to help catch arithmetic overflow, and throws an overflow exception. Linq.Sum() uses checked addition and throws when it occurs.
  • Catching this exception in summation of ulong[] arrays, performed poorly in the worst case – 177 KiloUlongs/sec – limited by time per exception.
  • Exceptions were replaced with logic  to catch arithmetic overflow directly, which is 376X faster. This speeds up the worst case to be limited by performance of addition of Decimals – 48 MegaUlongs/sec.
  • Typical case, of a few arithmetic overflows, sped up by 70% from 400 MegaUlongs/sec to 682 MegaUlongs/sec on a single core of CPU.

The final implementation ended up performing at the same speed as

Decimal sum = ulongArray.Sum(x => (decimal)x); // Linq summation

in the worst-case of arithmetic overflows on each addition. This worst-case is when each value within the array is greater than Ulong.MaxValue/2, causing arithmetic overflow on each addition.

For a typical case, of a few arithmetic overflows per array or none, performance is 14X faster (682 MegaUlongAdditions/second versus 48 MegaUlongAdditions/second) for decimal summation, which is a nice performance win!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s