Array and List Sorting


This post is an exploration of how the Array and List classes, from the System.Generic.Collections namespace, implement sorting in .NET framework 4.5+. My curiosity was peaked when no one I had discussed the topic with knew how it was implemented.

I purposefully chose to look into arrays and lists because these are the collections I use most frequently in day-to-day programming. I think it’s worth knowing and I hope some of my readers learn something new.

The Algorithm

The sorting algorithm used by the List and Array classes is comprised of more than one algorithm.

It comes as no surprise that the algorithm varies based upon input size. The .NET framework uses a variation of a hybrid sorting algorithm called introspective sort (intro sort for short) for it’s Array and List classes.

What the heck is that you ask? I had the same question! Let’s take a look from the Array.Sort MSDN page for a quick reference:

This method uses the introspective sort (introsort) algorithm as follows

  • If the partition size is fewer than 16 elements, it uses an insertion sort algorithm.
  • If the number of partitions exceeds 2 * LogN, where N is the range of the input array, it uses a Heapsort algorithm.
  • Otherwise, it uses a Quicksort algorithm.

The .NET framework variant of introspective sort  is comprised of 3 algorithms. One thing worth noting is that the algorithm’s stability varies based upon input size.

A sorting algorithm is stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted.

If your input is less than or equal 16 elements, then your sort is guaranteed to be stable. If your input size is greater than 16, you’re using one of two unstable sorting algorithms.

Note: The MSDN remarks on implementation is incorrect, per the source code, insertion sort is used if the input size is <= 16 elements.

Below is a quick overview of the key features of each of these algorithms, followed by the reasons behind favoring introspective sort over the .NET framework <4.5 implementation.

Insertion Sort

Insertion sort is a naive sorting algorithm, yielding O(n^2) worst case and average case performance.

Per wikipedia, insertion sort iterates, consuming one input element each repetition, and growing a sorted output list. The following image illustrates the algorithm at work.

The worst case space complexity for insertion sort is O(n).

This algorithm is used for sorting lists of n <= 16 elements.

Heapsort

Heapsort is a comparison-based sorting algorithm, similar to selection sort. It divides the inputs into a sorted and unsorted region and iteratively shrinks the unsorted region by moving the largest element to the sorted region.

Heapsort’s worst-case runtime is more favorable than Quicksorts at O(n log n). Heapsort operates in-place and is unstable.

An algorithm is said to be in-place if it transforms inputs using no auxiliary data structures. A small amount of extra storage space is allowed for auxiliary variables.

Since Heapsort is an in-place algorithm, the worst-case space complexity is O(1).

This algorithm is used when the introspective sort’s recursion depth exceeds a level based on the logarithm of the number of elements being sorted ( represented as 2 * Log of keys.Length ).

Quicksort

Quicksort (also called partition sort) is a divide and conquer comparison-based sorting algorithm. It operates by recursively dividing the input array into smaller sub-arrays containing low elements and high elements, picking a pivot, and sorting the arrays using that pivot.

On average, the algorithm has an O(n log n) runtime, however, in worst case scenarios it can make up to O(n^2) comparisons.

The worst case space complexity is O(n).

This algorithm is the default sorting method used when neither insertion sort or heapsort are used.

Why Introsort

So why did Microsoft switch from a quicksort only implementation (pre .NET framework 4.5) to an introsort implementation?

The reason has to do with quicksort’s potential time and space complexity. In best case scenarios, quicksort will have an O(n log n) runtime and an O(log n) space requirement. However, if your input is already sorted and you repeatedly pick the pivot with the smallest or largest element in the array for each partition, you wind up with a call tree that makes n – 1 nested calls, resulting in O(n^2) runtime and O(n) extra space. The switch to heapsort at log n occurs because it guarantees that the upper bound on runtime is O(n log n) and will take O(log n) extra space.

Introsort offers fast average performance and optimal worst-case performance which is why Microsoft chose to use it over the original quicksort only implementation.

Conclusion

The .NET framework version of introspective sort differs in that it opts to use an insertion sort for inputs with less than or equal to 16 elements. Sorting is stable for inputs with fewer than 16 elements and unstable for inputs with more than 16 elements.

Introspective sort was chosen because it provides fast average performance and optimal worst-case performance.


Leave a comment below.