Merge Sort was invented by John von Neumann in 1945. Many implementations have been shown in books and on the internet. However, in-place implementations have been difficult to accomplish, due to the need for an in-place merge, which only recently became available in standard C++, and is not available in other languages.

In this blog, I’ll present two simple and compact in-place sort implementations in C++, which take advantage of the availability of a standard in-place merge algorithm.

### Top Down Algorithm

Top down merge sort, in-place and out-of-place, algorithms are typically implemented using divide-and-conquer approach, as shown below:

```
template< class _Type >
inline void merge_sort_inplace(_Type* src, size_t l, size_t r)
{
if (r <= l) return;
size_t m = l + ( r - l ) / 2; // compute average without overflow
merge_sort_inplace(src, l, m);
merge_sort_inplace(src, m + 1, r);
std::inplace_merge(src + l, src + m + 1, src + r + 1);
}
```

This implementation is based on “Algorithms in C++” by Robert Sedgewick, p. 354, updating it with the availability of a standard C++ in-place merge algorithm. The range being sorted has also been updated to use 64-bit unsigned indexes (size_t). Computation of the mid/split index (m), has also been improved to avoid arithmetic overflow.

### Bottom Up Algorithm

Bottom up algorithm has been implemented in several text books, such as “Algorithms in C++” by Robert Sedgewick, p. 360:

```
template< class _Type >
inline void merge_sort_bottom_up_inplace(_Type* src, size_t l, size_t r)
{
for (size_t m = 1; m <= r - l; m = m + m)
for (size_t i = l; i <= r - m; i += m + m)
std::inplace_merge(src+i, src+i+m, src + __min(i+m+m, r+1));
}
```

This implementation has also been updated to use the standard C++ in-place algorithm, and is amazingly compact.

### Performance

Both of the above In-Place Merge Sort algorithm implementations have O(n * (lgN)^2) performance, because in the worst case, the standard C++ in-place merge algorithm is O(N * lgN). However, the standard C++ implementation cleverly dynamically decides whether to use an in-place or out-of-place merge implementation under the hood. The out-of-place merge algorithm is O(N) and is much faster. It is used if enough memory is available to allocate the extra O(N) buffer of space, needed as a temporary work buffer. Otherwise, if there is not enough memory, the slower truly in-place merge algorithm is used.

Thus, if enough memory is available, the above implementations will run at O(N * lgN). But, if memory fails to allocate, it will run at O(n * (lgN)^2).

The following table shows performance sorting an array of 10 Million unsigned long integers (32-bit):

Algorithm | Random | Presorted | Constant |
---|---|---|---|

Merge Sort Top Down | 9.6 | 46 | 95 |

Merge Sort Bottom Up | 11 | 51 | 130 |

C++ Standard sort() | 11 | 36 | 2174 |

the sorted array is first filled with either random values (uniform distribution), nearly pre-sorted values, or constant values, to provide three different distributions. Benchmark was run on Intel i7-9750H CPU.

The above graph shows performance of C++ standard sort algorithm versus Merge Sort algorithms implemented above. C++ standard sort is an Introspective Sort, which is a hybrid algorithm, made of Insertion Sort, Heap Sort and Quick Sort. Both Merge Sort implementations are pure (non-hybrid) implementations.

### Order of Performance

C++ standard sort performs ** O**(N * lgN) comparisons and moves.

Both Merge Sort algorithms perform ** O**(N * lgN) if there is sufficient memory for inplace_merge() to use not-in-place-merge version, and

**(N * lg**

*O*^{2}N) when there is not sufficient extra memory available, in which case it falls back to a truly in-place merge algorithm.

The above graph shows this performance for sorting arrays of unsigned long (32-bit) integers, up to a billion of them. This benchmark was performed on a computer with 32 GBytes of system memory, which is sufficient to never cause inplace_merge() algorithm to resort to its truly in-place merge implementation. This is the reason the pure Merge Sort algorithms are competitive with C++ standard sort, as their order of performance match in that case.

Once the array gets big enough to where allocation of extra memory for inplace_merge() is not possible, and it resorts to its truly in-place merge implementation, the both Merge Sort implementations will slow down. I’ll work on benchmarking this condition, and will present performance results.

### Conclusion

In this blog we saw that two simple in-place Merge Sort algorithms can be implemented using a standard C++ inplace_merge. These two implementation have competitive performance whenever enough system memory is available to allocate additional memory inplace_merge to use its higher performance not-in-place-merge algorithm implementation. inplace_merge is an adaptive algorithm – adapting its behavior and performance depending on how much system memory resource is available.

### Resources

These and other sorting algorithms, including parallel/multi-core implementations are available in an open source and free C++ repo: https://github.com/DragonSpit/ParallelAlgorithms

C# implementations are available in: https://github.com/DragonSpit/HPCsharp free and open source repo and on nuget.org (HPCsharp) as a prebuilt nuget package for Windows and Linux.

## One thought on “Two Simple Sorting Algorithms”