Divide-and-conquer sorting algorithms

Previously, I wrote about the simple sorting algorithms which have a computational complexity of O(n2). Better sort algorithms exist, and the particular class of divide-and-conquer algorithms are interesting in this era of big data as they lend themselves well to parallelisation.

There are two ways to divide a list: 1) you just split it into equal size portions, and 2) you split so that all the elements in the first sublist are smaller than all the elements in the second, even though the two halves themselves may be unsorted. In the first case, splitting is easy, and all the computational effort is spent on recombining the sorted lists. In the second case, recombining is easy as you can simply append the two lists together, knowing that the result will be sorted. However, computational effort is required to split the list.

These two approaches give rise to mergesort and quicksort respectively, which both have an average computational complexity of O(n log n). In practice, a simple sorting algorithm like insertion sort works faster on very short lists (e.g. n<10) so these divide-and-conquer algorithms tend to split until the sublists are short and then use insertion sort on the final step.

Mergesort

In mergesort, a list is split in two by size, until sublists of length 0 or 1 are ultimately reached. These lists are, by definition, sorted.

[4, 5, 7, 3, 2, 1, 6, 9] -> [4], [5], [7], [3], [2], [1], [6], [9]

Merging sublists is done by iteratively comparing the first elements of each sublist, and moving the smallest to the end of a new merged list, until the two initial sublists are empty. Merging is first done on the sublists of length 1:

[4], [5] -> [4, 5]
[7], [3] -> [3, 7]
[2], [1] -> [1, 2]
[6], [9] -> [6, 9]

Then the sorted sublists of length 2 are merged into sublists of length 4:

[4, 5], [3, 7] -> [3, 4, 5, 7]
[1, 2], [6, 9] -> [1, 2, 6, 9]

And finally, the two sorted sublists of length 4 are merged to give the sorted list of length 8:

[3, 4, 5, 7], [1, 2, 6, 9] -> [1, 2, 3, 4, 5, 6, 7, 9]

Quicksort

In quicksort, a pivot is chosen, and the unsorted list is split into sublists of elements less than and greater than the pivot. Suppose the first element, 4, is chosen as the pivot, then the list is split in two with the pivot in the middle:

[4, 5, 7, 3, 2, 1, 6, 9] -> [3,2,1], 4, [5,7,6,9]

Now, to obtain the final list, we call quicksort again on each of the sublists, and concatenate the results:

[4, 5, 7, 3, 2, 1, 6, 9] -> 
    quicksort([3,2,1]) + [4] +  quicksort([5,7,6,9])

In the worst case, the pivot is badly chosen and the divide step splits the longer list into an empty list and a list of length n-1. This gives rise to a, rare, worst-case running time of O(n2).

Bucket sort is a variant of this, where there are predefined ranges, or ‘buckets’ that a list is split into. The buckets are sorted individually, and the sorted buckets concatenated to yield the final sorted list.

Advertisements
This entry was posted in Technology and tagged , , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s