Big-O by a Concrete Example

Computer Scientists develop algorithms for all kinds of tasks, such as sorting numbers, searching the web for information, showing web pages and interacting with them, and so on – thousands of different algorithms. Even a task such as sorting a bunch of numbers, has more than 20 different algorithms. When you need to choose one of these 20+ sorting algorithms, how do you know that you’ve picked the best one? Big O can help you decide.

Sorting Example

Let’s compare three algorithms for sorting an array of numbers, integers in this case. All of these algorithms are real, and are implemented in the HPCsharp free open source library, written in C#.

Array Size Insertion Sort Merge Sort Radix Sort
10 0.0000013 0.0000004 0.0000039
100 0.0000093 0.0000217 0.0000044
1,000 0.0007115 0.0000666 0.0000791
10,000 0.0898216 0.0010013 0.0001737
100,000 8.9331636 0.0120635 0.0016906
1,000,000 917.4813177 0.1879224 0.0219319
10,000,000 2.2152744 0.2432855
100,000,000 25.5504355 2.4523489

Array Size column shows the number of integers being sorted. For example, in the first row, an array of 10 integers was sorted, using three algorithms: Insertion Sort, Merge Sort and Radix Sort. These algorithms are given the same input array, and produce the same result – a sorted array of 10 integers. However, they take a different amount of time to get sorting done on my laptop. The time taken is shown in each column in seconds.

As the array size increased to 100, then to 1000, then to 10000, a pattern emerges – Insertion Sort taking the most time, followed by Merge Sort, and then Radix Sort taking the least amount of time. When 1,000,000 integers are sorted, Insertion Sort takes 917 seconds (or over 15 minutes) to finish, while Merge Sort takes 0.18 seconds, and Radix Sort takes 0.02 seconds.

Can you imagine if instead of sorting, the task was to load a Facebook page, or to render a frame of a video game. Nobody would wait 15 minutes for such a slow task! We would think something is wrong with that website or that video game. Time is one of the most critical resources for algorithms to use wisely.

The reason Insertion Sort is missing time measurements for array sizes of 10,000,000 and 100,000,000 is because it would take about 24 hours to sort 10 million items, and 100 days to sort 100 million. I wasn’t willing to wait that long, when Merge Sort and Radix Sort took just a few seconds.

Another Way To Look At It

The table below shows another way of looking at the performance data in the table above.

Array Size Insertion Sort Merge Sort Radix Sort
10 0.3 0.1 1.0
100 2.1 4.9 1.0
1,000 9.0 0.8 1.0
10,000 517.1 5.8 1.0
100,000 5,284.0 7.1 1.0
1,000,000 41,833.2 8.6 1.0
10,000,000 9.1 1.0
100,000,000 10.4 1.0

For Insertion Sort and Merge Sort, the table shows now many times these algorithms are slower than Radix Sort. Insertion Sort starts out faster and close to the performance of Radix Sort for small arrays, but as arrays grow larger, Insertion Sort performs hundreds of times slower and even thousands of times slower. Yet, it produces the exact same result. There is just no point of wasting time using Insertion Sorts when arrays are larger than a couple of hundred items.

Merge Sort on the other hand, starts out being 5X slower than Radix Sort, and then slowly ramps up to 10X slower. For Radix Sort to be ten times faster is substantial. Can you image if someone handed you a computer or a gaming system that was 10 times faster than the one you currently have. That’s the beauty of better algorithms – produce the same results using fewer resources, such as time, or maybe use less computer memory space, or less of some other critical resource.

The table below offers another way of looking at performance.

Array Size Insertion Sort

O(N*N)

Merge Sort

O(N*logN)

Radix Sort

O(N)

10 1.0 1.0 1.0
100 7.2 54.3 1.1
1,000 76.5 3.1 18.0
10,000 126.2 15.0 2.2
100,000 99.5 12.0 9.7
1,000,000 102.7 15.6 13.0
10,000,000 11.8 11.1
100,000,000 11.5 10.1

For the algorithms, each row indicates how many times it is slower than the row above it. For instance, for array of 1,000 items, Insertion Sort is 76.5 times slower than when sorting an array of 100 items. Radix Sort is 10.1 times slower sorting an array with 100,000,000, then sorting an array with 10,000,000 items.

Notice that Insertion Sort, starting with an array of 1,000 items, slows down about 100 times as the array size grows 10 times. This is because Insertion Sort is an O(N*N) algorithm – i.e. O(N squared) – where N is number of items in the array. Ten squared is one hundred. In other words, as the array size N increases, the time grows by (N squared), which makes this a slow algorithm as N gets large.

Radix Sort slows down by about 10 times when the array size grows by 10 times. Radix Sort is an O(N) algorithms, a linear growth algorithm, which is amazing for sorting. In other words, as the array size (N) gets larger, the run time grows by a similar amount.

Performance of the Merge Sort is somewhere in between that of Insertion Sort and Radix Sort, but much closer to the performance of the Radix Sort. Merge Sort is an O(N*logN) algorithm, whose time grows faster than O(N) by logN.

Which Algorithm is the Best?

Seems like the answer should be simple based on the above example – Radix Sort! Yes, if the only thing we care about is speed, and our arrays are large and made of integers. But, what if our arrays are made of custom data types and we need to sort based on several fields. What if the only Radix Sort available in our programming language sorts arrays of integers and doesn’t handle any other data type, and Merge Sort is available with the ability to handle the need?

We could implement our own Radix Sort to handle those data types, or we could use a Merge Sort if available, or another O(N*logN) sorting algorithm. This would run slightly slower, but may still be sufficient. Even Insertion Sort may do if we know that our arrays are always less than a few hundred items. But, we know to stay away from Insertion Sort for arrays bigger than a few hundred.

What if memory/space usage was also important? Insertion Sort is in-place, which means it does not need any extra space to do its work. While the most common implementations of Merge Sort and other O(N*logN) sorting algorithms need additional memory equal to that of the array being sorted. Radix Sort (least significant digit variety) also required additional memory equal to that of the array being sorted. Radix Sort (most significant digit) can be in-place, but is slower. Ah, the wonder of trade-offs, and is the beauty of the puzzle of choosing “the best algorithm”. It just depends on the needs.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s