Faster Sorting in C#

C# has several built-in algorithms for sorting arrays, lists and other containers. Array.Sort and List.Sort have many variations for sorting full or partial containers, using an algorithm that runs on a single CPU core. The Linq library can also sort containers, using a single CPU core or multiple cores in parallel. These are powerful, flexible, high performance capabilities. If you need to sort even faster in C# (.NET) using the CPU, this blog describes high performance sorting algorithms provided in a free, open source library called HPCsharp – high performance computing in C#, or high performance C#, or or whatever else.

HPCsharp Sorting

HPCsharp library is a NuGet package, available on nuget.org, with source code on GitHub. Currently, it provides Insertion Sort, Merge Sort, and Radix Sort. Merge Sort has a parallel and a serial implementation. Parallel implementation perform substantially faster than sorting using parallel Linq. Radix Sort runs on a single core, at the moment, and competitive versus Array.Sort, List.Sort and even parallel Linq.

These sorting algorithms in HPCsharp provide the same familiar interfaces of the standard C# sorting, making it simple to check these out.

Parallel Merge Sort

ParallelMergeSortVsLinq36core

The above graph compares performance of parallel Linq sorting versus HPCsharp Parallel Merge Sort. The x-axis is the size of the UInt32 array being sorted. The y-axis is time in seconds it takes to sort that array. For example, parallel Linq takes 28 seconds to sort a 150 million element array, while HPCsharp parallel Merge Sort takes 1.4 seconds. This is a speedup of about 20 times, not 20%, but 20 times faster.

The benchmarks in the above graph were run on a 36-core machine in AWS – c5.18xlarge node. The graph shows that the parallel Merge Sort scales much better. On a quad-core CPU parallel Merge Sort is at least 4X faster than parallel Linq.

Parallel Merge Sort is not limited to only sorting arrays of integers, but can sort arrays and lists of user defined classes – in general any type that can be compared. You can provide your own comparison method. The sort is also stable, whereas C# built-in sort for arrays and lists is not. Linq sort can be stable when .AsStable is used.

Serial implementation of Merge Sort is also provided, which is slower than C# Array.Sort and List.Sort, but is faster than Linq sort, serial and parallel. In this case, Merge Sort is using a single core, while parallel Linq is using all cores on a quad-core CPU, and serial Merge Sort is faster.

ParallelMergeSortVsLinqSpeedup36core

The above graph shows speedup of Parallel Merge Sort versus Parallel Linq. The x-axis is the array size of UInt32’s. The y-axis is the speedup provided by the Parallel Merge Sort. Most of the points land in the 15X to 30X range, showing that the Parallel Merge Sort is that many times faster.

Radix Sort

 

Parallel Merge

At the heart of the parallel Merge Sort is a parallel Merge algorithm. This algorithm is O(N) linear and parallelizes well.

C# Built-in Sorting Algorithms Details

C# Array.Sort and List.Sort uses a hybrid sorting algoritm called introspective sort, which is a mixture of Insertion Sort, Heap Sort and Quicksort. “This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal. On average, this method is an O(n log n) operation, where n is Count; in the worst case it is an O(n ^ 2) operation” (msdn).

I’ll describe a library of sorting algorithms that has been developed over the last few years, which can sort faster while using a single CPU core, or multiple cores.

19 thoughts on “Faster Sorting in C#

    1. Hi David, HPCsharp is built as .NET Standard 2.0, which should be compatible with .NET Framework 4.6.1. The test suite for HPCsharp is a 4.6.2 project, and when I switched it over to 4.6.1 it built and ran.

      Like

      1. Hi.. when I tried to install it using nuget it failed because it could not find a version compatible with my system. Otherwise, I would love to use it (my system has an Xeon with 12 cores which should yield nice ‘sort’ performance – my main interest for now)

        Like

  1. Here’s is the nuget console:

    Attempting to gather dependency information for package ‘HPCsharp.3.9.2’ with respect to project ‘Utils’, targeting ‘.NETFramework,Version=v4.6’
    Gathering dependency information took 71.97 ms
    Attempting to resolve dependencies for package ‘HPCsharp.3.9.2’ with DependencyBehavior ‘Lowest’
    Resolving dependency information took 0 ms
    Resolving actions to install package ‘HPCsharp.3.9.2’
    Resolved actions to install package ‘HPCsharp.3.9.2’
    Install failed. Rolling back…
    Package ‘HPCsharp.3.9.2’ does not exist in project ‘Utils’
    Package ‘HPCsharp.3.9.2’ does not exist in folder ‘D:\Talenya\Code\Repos2 – Elastic Search\packages’
    Executing nuget actions took 129.58 ms
    Could not install package ‘HPCsharp 3.9.2’. You are trying to install this package into a project that targets ‘.NETFramework,Version=v4.6’, but the package does not contain any assembly references or content files that are compatible with that framework. For more information, contact the package author.
    Time Elapsed: 00:00:00.3224876
    ========== Finished ==========

    Like

  2. My guess is that 2.0 is compatible from an API point of view but may not be at the assembly level (meaning the ability to mix 4.6.1 code with 2.0 code).

    Like

  3. I tried clearing the nuget cache as suggested by the link you posted

    (https://stackoverflow.com/questions/34991703/nuget-package-installation-failure)

    Unfortunately, it is still not installing – giving the same error. Note that the text of the nuget error says”

    `You are trying to install this package into a project that targets ‘.NETFramework,Version=v4.6’`

    It is not actually referencing 4.6.1.

    The 7 answers posted in the link you provided say that the problem was seen in all 4.6 versions.

    Perhaps it has to do with the nuspec packaging information.

    Like

    1. Such a strange problem. I looked at the nuspec file, but it doesn’t hold anything interesting except for the version, copyright, and release notes. One thing I noticed in the HPCsharp project is that x64 is the target platform, most likely because the performance is higher. Could you try switching your platform to x64 and see if that helps?

      Like

      1. Hi Victor,

        My project is already X64. 😦

        I would really like to get to use your code in my project.

        I have code that uses List which can hold well over 1 million objects. I need to sort the list in descending order using a property of MyObject.

        The built-in List.Sort has no option to run in parallel.

        Because my code requires very high performance, I researched looking for a fast mature code that could do the job.. that’s how I got to your project (which looks great, by the way).

        Unfortunately, the company that I’m building the code for cannot, for the moment, move to .Net Standard.. so I am constrained.

        One of the things I will try later in the week, is to compile the code locally and manually move the dll to my code and see if that works.

        I really appreciate your help in trying to resolve the issue.

        [[ by the way if you wish to communicate directly, via: Ab3djmarcusTax at my consulting company’s domain Ab3M5IncTax (note that the first and trailing 3 letters should be removed in both values)]]

        -Regards
        David

        Like

      2. Hi. I wanted to let you know that I solved the problem. We were simply not paying close enough attention to the nuget messages. It was saying that my system was set to 4.6 (while I was thinking it was 4.6.1).

        My VS 2019 was not showing me 4.6.1 in the project properties pane. Once I downloaded and installed the SDK, VS allowed me to switch to 4.6.1. Once I switched, I was able to install your package.

        Thanks for your help. I plan to get to your package this coming week and compare it to another parallel sorter package. I’ll keep yo posted on the results I get.

        -Regards
        David

        Like

      3. Hi Victor,

        I had a few minutes to test-drive your implementation of a parallel sort. In my case I’m sorting an array of strings (in place):

        My method of testing: I ran a parallel sort (~1.5M strings) using Wiesław Šoltés code [see: https://gist.github.com/wieslawsoltes/6592526 ]:

        ParallelQuickSort(T[] array) where T : IComparable

        Then I ran your code on the original copy of the unsorted array:

        HPCsharp.ParallelAlgorithm.SortMergePar(array);

        My initial findings:

        [1] They did not return the same result!! The array had some strings with non-ASCII characters, It looks like their code is not handling the Unicode chars correctly (yours is keys1).

        keys1[39745]=’æther’ — keys2[39745]=’aether’

        Note the ‘æ’ -vs- ‘ae’

        [2] Your sort in place hides that it is sorting into an internal array and then copying the results back to the original (I say that based on my inspection of the code on GitHub).

        This may be OK for small arrays, but not for very large arrays.. it REALLY should sort in place (would run faster since no initial allocation and final copy required).

        [3] The timing of your code seems very good as the array size increases (your code is t1).

        array size= 1,496,918: t1= 0.5427242 secs — t2= 1.6247592 secs
        array size= 325,209: t1= 0.1688067 secs — t2= 0.3535007 secs
        array size= 19,881: t1= 0.0568745 secs — t2= 0.0553183 secs
        array size= 174: t1= 0.0001849 secs — t2= 0.0020397 secs
        array size= 11,604,392: t1= 4.4706853 secs — t2= 14.0935896 secs
        array size= 253: t1= 0.0003025 secs — t2= 0.0019154 secs
        array size= 56,023: t1= 0.0840893 secs — t2= 0.1237621 secs
        array size= 3,300,159: t1= 1.0914012 secs — t2= 7.5834776 secs

        My system (Windows 10 Pro) has a new 12-core i7 with turbo boosting.

        Note your timing advantage as the array size increases. 🙂

        Kudos!!

        -Regards
        David

        Like

      4. Hi David,
        Thank you for sharing benchmark results for your use case. It’s exciting to see such good performance on a 12-core machine.
        How did you determine which algorithm was correct? Did you compare correctness with C# .Sort()? Would you care to share comparison in performance with C# .Sort()?
        Your suggestion for implementing a truly in-place parallel merge sort is appreciated, especially given that I published a paper on such an algorithm 8 years ago (https://www.drdobbs.com/parallel/parallel-in-place-merge/240008783?pgno=1) in C++.
        If you study that paper, you’ll see that the in-place merge algorithm in the Standard Template Library (STL), which is part of the C++ standard, is about 3X slower than the not-in-place merge. This is the reason STL implements a dynamic algorithm strategy, where if memory is available then the not-in-place (faster) algorithm is chosen, otherwise when memory isn’t available, the slower in-place version is selected. This has been my plan for HPCsharp as well. Also, keep in mind that in-place merge is O(nlgn) whereas not-in-place merge is O(n). Thus, in-place will get slower as array size grows. You’ll also see that the parallel version did not provide much of a performance improvement on a quad-core CPU at that time. But, now with 12-core CPUs (and even higher), it is definitely worth revisiting.
        The current parallel merge sort implementation in HPCsharp provides an in-place interface, disclosing the need for an extra array in the embedded documentation of each function.
        Stars and tips are always welcome on https://github.com/DragonSpit/HPCsharp
        Regards,
        -Victor

        Like

  4. I finally had a chance to test your code on sorting List using a custom comparer which had about 15M objects. I am sorting against a ‘double’ property of myObject.

    To sort the List directly was a disaster. I stopped it before it even completed (it ran for over a minute and did not finish).

    I am sure that the reason is that moving/swapping elements in a List is incurring all the overhead inherent in the List object (especially assignment).

    So, the code I ultimately tested was:
    var array = myList.ToArray();

    And then performed the sort benchmarks.

    HPCsharp.SortMergePar(): 1.06 seconds.

    Array.Sort(array): 9.425

    I compared this to the original sort using:
    myList.Sort( (x, y) => …): 6.218 seconds.

    The HPCSharp and the Array.Sort steps include the time (.06 seconds) to generate the array from the List. So to see the elapsed times you need to subtract 0.06 from the first and second number. The List.Sort() was in-place.

    Like

    1. These are great result. Thank you for sharing.
      If you are sorting based of a key within a class, then give SortRadix() function a try, with the following usage:
      sortedUserArray = userArray.SortRadix(element => element.Key);
      an example of it is in the HPSsharpExamples solution in the repo.
      Sadly, currently the key is limited to an integer for this function. Interestingly, the SortRadixMsd and SortRadixMsdFunc functions are truly in-place and support sorting floating-point keys. However, they don’t support user-defined-types and sorting by a key within a user-defined-key. However, that should be possible to accomplish. While both Radix Sorts are mostly single-core functions with a portion of the algorithm running on multi-core (but not the entire algorithm), these beat MergeSort even the parallel version while using only a single core. One of the reasons is both of these RadixSot algorithms are O(N) linear-time algorithms and when the size of the array gets big enough, they start beating O(nlgn) MergeSort (even the parallel version).

      Like

      1. The data that I sort is almost entirely string-keyed data so radix sorting is not a real option unless you treat each char as a ‘bit’ (for the purpose of a radix). Also, in my case, the string keys are variable length (except one case where they happen to be 11-char).

        The sort times that I am seeing are in the single digit seconds which for now is ok. Down the road (perhaps even near the end of this year) the array sizes will increase by t at least an order of magnitude (or perhaps even 40x) at which point I’ll need to come back to this issue.

        By the way, I would love to see an option (a bool flag in the ”sort’ parameters) that tells your code to return the internal sorted array it created for the sorting. This would avoid having to copy the sorted results back on top of the original array. In my case, and I’m sure for others that use your code, I just need a sorted array. If it comes back in another array – and that saves time – its all for the beeter. I think this is trivial to do.

        By the way, strings can be made usable for radix sorting by logically extending them with ‘x’00’ when comparing against a longer string. If I have time in the coming weeks, I’ll look into supplying you with a string sorter for your great library.

        Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s