Faster List.ToArray() and Copying in C#

One of my pet peeves is not using hardware to its fullest ability, such as all of memory bandwidth available. Copying is one of the simplest operations, which is memory bandwidth bound, and should use most of memory bandwidth. In this blog, I’ll examine C# copy operations from List to Array, to see how fast they perform, and then make them go faster.

Copying List to Array

To convert a List to an Array, C# provides a convenient ToArray() function:

var listSource = new List<int> { 5, 7, 16, 3 };
int[] arrayDestination = listSource.ToArray();

ToArray comes up often, whenever some method requires an Array, but you happen to be using a List, because List can grow/shrink while Array can’t. Many C# standard functions and APIs only operate on Array, because Array is a higher performance collection.

Notice how in the example above a new Array is created and returned by ToArray. This function runs at about 600 Million Integers/sec. During the copy, each element is read from the source List and written to the destination Array. Thus, the overall bandwidth use by List.ToArray() is 600 * 4 Million bytes/sec for reads plus 600 * 4 Million bytes/sec for writes. The total bandwidth used is 4800 Million bytes/sec = 4.8 GigaBytes/sec. This is only 11.5% out of the available 41.8 GigaBytes/sec of system memory bandwidth on my 6-core dual-memory-channel laptop.

C# also provides a function to copy a List to an Array:

var listSource = new List<int> { 5, 7, 16, 3 };
int[] arrayDestination = new int[4];
listSource.CopyTo(arrayDestination);

CopyTo copies the List to a pre-existing Array. Notice, how arrayDestination has already been created, of the right size, for CopyTo() to copy the List to. CopyTo() function runs at about 600 Million Integers/sec – the same performance as ToArray().

Higher Bandwidth

The first time arrayDestination is used during the CopyTo() the performance is about 600 Million Integers. But the second and subsequent times, the performance goes up to 2.6 GigaIntegers/sec – over 4X faster.  This level of performance is much better – 20.8 GigaBytes/sec, or 50% of the available bandwidth.

The reason for the improvement is because during the first use, the destination array gets paged into system memory. This paging in and initialization process takes some time, lowering performance. Once the array has been paged in, it can be used again and again, at 4X higher performance level. To maintain this high level of performance, we would need to continue re-using this destination array. In other words, re-using an array leads to higher level of performance, over creating a new array each time.

Multiple Cores Yet Slower

C# Linq supports the use of multiple processor cores by the use of AsParallel() :

var listSource = new List<int> { 5, 7, 16, 3 };
int[] arrayDestination = listSource.AsParallel().ToArray();

and

var listSource = new List<int> { 5, 7, 16, 3 };
int[] arrayDestination = new int[4];
listSource.AsParallel().CopyTo(arrayDestination);

AsParallel() uses all available processor cores in parallel. When I benchmarked copying a 50 Million Integer List to an Array, on my 6-core laptop, performance went down substantially to 75 Million Integers/sec – an 8X performance decline.

I don’t know the exact reason for this performance drop, and only guessing that AsParallel() breaks the array into too small of work chunks for each core to work on, leading to severe inefficiency. I posted a question on StackOverflows about this issue, and hopefully someone knows what to do about this issue.

C# Linq provides additional control for the number of cores to be used:

var listSource = new List<int> { 5, 7, 16, 3 };
int[] arrayDestination = listSource.AsParallel().WithDegreeOfParallelism(2).ToArray();

the example above limits parallelism to use two processor cores. Limiting to two cores still reduces performance, to 150 Million Integers/sec, just less dramatically by only 4X. There is definitely a problem here that needs to be addressed.

Higher Performance without Re-use

I’ve developed an improved method for parallel copying, which performs well on multiple processor cores, and have added it to the HPCsharp nuget package (available for free on http://www.nuget.org).

var listSource = new List<int> { 5, 7, 16, 3 };
int[] arrayDestination2 = listSource.ToArrayPar();

This ToArrayPar() function runs at 2.6 GigaIntegers/sec, while creating a new Array. This function reaches the same high level of performance, while creating a new destination array, without having that array paged in.  The performance gain is over 4X, on my 6-core laptop. On a 14-core Xeon, the gain is slightly over 2X.

This high level of performance is gained without having to worry about re-use of an Array – less to think about – and also useful when re-use isn’t possible.

Higher Performance with Re-Use

I showed the fastest performance using built-in C# CopyTo() function:

var listSource = new List<int> { 5, 7, 16, 3 };
int[] arrayDestination = new int[4];
listSource.CopyTo(arrayDestination);

when re-using the destination Array for the second and subsequent time, performance is as high as 2.6 GigaIntegers/sec using a single core.

HPCsharp raises performance by another 15% by using multiple processor cores, on my 6-core laptop:

var listSource = new List<int> { 5, 7, 16, 3 };
int[] arrayDestination = new int[4];
listSource.CopyToPar(arrayDestination);

This performance gain is not as large, because bandwidth limit is being reached on my dual-memory-channel laptop.

Gain in performance is much better on a 14-core Xeon system, where it is about 2.8X, reaching 46 GigaBytes/sec of memory bandwidth, which is over 50% of the available bandwidth on this quad-memory-channel system.

Conclusion

It’s possible to run over 50% of peak system memory bandwidth for List.ToArray() and List.CopyTo() functions whether the destination Array is new or being re-used. By using multiple cores with List.ToArrayPar() and List.CopyToPar() functions of HPCsharp nuget package, up to 4X performance gain can be achieved.

Key points:

  • Avoid using List.AsParallel().ToArray() since it’s slower than List.ToArray()
  • Highest level of performance is obtained by List.CopyToPar() from HPCsharp nuget package, when the destination Array is re-used over and over. It reaches 21 GigaBytes/sec on 2-memory-channel systems, and 46 GigaBytes/sec on 4-memory-channel system
  • Highest level of performance when re-use is not possible is obtained by List.ToArrayPar() from HPCsharp nuget package: 21 GigaBytes/sec on 2-memory-channel systems and 10 GigaBytes/sec on 4-memory-channel systems
  • Highest level of performance on a single core is obtained by List.CopyTo() when the destination array is re-used over and over, reaching 16 GigaBytes/sec
  • On a single core, when re-use isn’t possible List.ToArray() runs at nearly 5 GigaBytes/sec

 

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s