Hybrid In-Place Merge Sort

Two truly simple sorting algorithms are shown in my previous blog https://duvanenko.tech.blog/2021/11/26/two-simple-sorting-algorithms/. These algorithms are both in-place and perform surprisingly well compared to standard C++ sort, especially when sufficient memory is available. In this blog, let’s see if we can raise the performance bar higher by adding slightly more complexity.

Hybrid Bottom Up Merge Sort

Pure algorithms usually perform well, but hybrids have been used time and time again to perform even better. For instance, C++ standard combines Insertion Sort, Heap Sort, and Quicksort, into a high performance in-place sorting algorithm – called Introspective Sort.

Pure Bottom Up In-Place Merge Sort can also be combined with Insertion Sort, to create a higher performing sorting algorithm. Insertion Sort is in-place, which fits perfectly, for in-place usage. Insertion Sort is good at handling small sub-arrays, and that’s how we use it.

template< class _Type >
inline void merge_sort_bottom_up_inplace_hybrid(_Type* src, size_t l, size_t r)
    if ((r - l) <= 32) {
        insertion_sort(src + l, r - l + 1);
    size_t m = 32;
    for (size_t i = l; i <= r; i += m)
        insertion_sort(src + i, __min(m, r - m + 1));
    for (; m <= r - l; m = m + m)
        for (size_t i = l; i <= r - m; i += m + m)
            std::inplace_merge(src + i, src + i + m, src + __min(i + m + m, r + 1));

The above implementation starts out by splitting the input array into 32-element sub-array chunks, sorting each of them, before continuing on with the pure bottom up merge algorithm.

Hybrid Divide-And-Conquer Merge Sort

Divide-and-conquer algorithms are made for hybrids, especially to process the leaf nodes of the recursive tree, as shown in the implementation below:

template< class _Type >
inline void merge_sort_hybrid_inplace(_Type* src, size_t l, size_t r)
    if (r <= l) return;
    if ((r - l) <= 48) {
        insertion_sort(src + l, r - l + 1);

   size_t m = l + (r - l) / 2;   // computes the average without overflow

   merge_sort_hybrid_inplace(src, l,     m);
   merge_sort_hybrid_inplace(src, m + 1, r);

   std::inplace_merge(src + l, src + m + 1, src + r + 1);

Insertion Sort is added to the implementation to handle sub-array chunks once recursive subdivision reaches less than 48 elements.


Let’s see how well these two In-Place Merge Sort algorithms perform.

The bottom up in-place Merge Sort algorithm performance has been raised to be nearly identical to the standard C++ sort implementation. Divide-and-conquer algorithm performance also increased substantially and is equivalent to performance of the standard C++ sort. Both algorithms are in-place and both use a hybrid approach to raise performance by terminating recursion early.


Two hybrid in-place Merge Sort algorithms have been developed. These are simple and competitive in performance to C++ standard sort algorithm, whenever sufficient memory resources are available, due to the adaptive nature of inplace_merge implementation. The algorithms shown run on a single processor core.

However, when memory resources are not available the above algorithms most likely perform worse than the standard C++ sort.

This is a great starting point for future Parallel In-Place Merge Sort implementations, which will run on many processor cores, and will take performance even higher.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s