When should we use Radix sort?

Tags: Performance, Algorithm, Sorting, Quicksort, Radix Sort

Performance Problem Overview


It seems Radix sort has a very good average case performance, i.e. O(kN): http://en.wikipedia.org/wiki/Radix_sort

Yet it seems like most people are still using Quick Sort - why is this?

Performance Solutions


Solution 1 - Performance

Radix sort is harder to generalize than most other sorting algorithms. It requires fixed-size keys and some standard way of breaking the keys into pieces. Thus it never finds its way into libraries.
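To illustrate the fixed-size-key requirement, here is a minimal Java sketch of an LSD radix sort for strings that all share the same length. The class name `FixedWidthLsdSort` is hypothetical, and the sketch assumes every character is below 256. Keys of mixed length would need padding or an MSD variant, which is part of why the algorithm is harder to generalize.

```java
import java.util.Arrays;

public final class FixedWidthLsdSort {
    // Sorts strings that all have exactly `width` characters, each in the range 0..255.
    // Each pass is a stable counting sort on one character position, last position first.
    public static void sort(String[] a, int width) {
        final int R = 256; // alphabet size (assumption: extended-ASCII characters only)
        int n = a.length;
        String[] aux = new String[n];
        for (int d = width - 1; d >= 0; d--) {
            int[] count = new int[R + 1];
            for (String s : a) count[s.charAt(d) + 1]++;          // histogram of this character position
            for (int r = 0; r < R; r++) count[r + 1] += count[r]; // starting offset of each bucket
            for (String s : a) aux[count[s.charAt(d)]++] = s;     // stable scatter into buckets
            System.arraycopy(aux, 0, a, 0, n);
        }
    }

    public static void main(String[] args) {
        String[] plates = {"4PGC938", "2IYE230", "3CIO720", "1ICK750", "1OHV845"};
        sort(plates, 7);
        System.out.println(Arrays.toString(plates)); // [1ICK750, 1OHV845, 2IYE230, 3CIO720, 4PGC938]
    }
}
```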

Solution 2 - Performance

The other answers here fail to give examples of when radix sort is actually used.

An example is when creating a "suffix array" using the skew DC3 algorithm (Kärkkäinen-Sanders-Burkhardt). The algorithm is only linear-time if the sorting algorithm is linear-time, and radix sort is necessary and useful here because the keys are short by construction (3-tuples of integers).
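A minimal sketch of the kind of radix sort DC3 relies on, assuming each triple's components are integers in `0..maxKey` (the class and method names are made up for illustration, and this is not the full suffix-array construction): three stable counting-sort passes, least significant component first, order the triples in O(n + maxKey) time.

```java
import java.util.Arrays;

public final class TripleRadixSort {
    // Returns the indices of `t` ordered by their 3-tuples (t[i][0], t[i][1], t[i][2]).
    // Assumption: every component lies in 0..maxKey, so each pass costs O(n + maxKey).
    public static int[] sortTriples(int[][] t, int maxKey) {
        int n = t.length;
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        int[] tmp = new int[n];
        // LSD order: last component first; each pass is a stable counting sort on one component.
        for (int c = 2; c >= 0; c--) {
            int[] count = new int[maxKey + 2];
            for (int idx : order) count[t[idx][c] + 1]++;
            for (int k = 0; k <= maxKey; k++) count[k + 1] += count[k];
            for (int idx : order) tmp[count[t[idx][c]]++] = idx;
            int[] swap = order; order = tmp; tmp = swap;
        }
        return order;
    }

    public static void main(String[] args) {
        int[][] triples = {{2, 1, 0}, {1, 2, 3}, {1, 2, 0}, {0, 3, 3}};
        System.out.println(Arrays.toString(sortTriples(triples, 3))); // [3, 2, 1, 0]
    }
}
```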

Solution 3 - Performance

Edited according to your comments:

  • Radix sort only applies to integers, fixed-size strings, floating-point numbers, and to "less than", "greater than" or "lexicographic order" comparison predicates, whereas comparison sorts can accommodate different orders.
  • k can be greater than log N.
  • Quicksort can be done in place; radix sort becomes less efficient when done in place.
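To make the third bullet concrete: one way to radix sort in place is an MSD variant often called American flag sort, sketched below for Java `int`s with 8-bit digits (the class name is hypothetical, and the sketch is untuned: no small-bucket cutoff, three 256-entry tables allocated per bucket). It needs no n-sized buffer, but it gives up stability and moves elements with scattered swaps, which is one reason in-place radix sorts tend to be slower than the buffered LSD form.

```java
import java.util.Arrays;

public final class InPlaceMsdRadixSort {
    // Byte of v at `shift`, with the sign bit flipped so negative numbers sort before positive ones.
    private static int digit(int v, int shift) {
        return ((v ^ Integer.MIN_VALUE) >>> shift) & 0xFF;
    }

    public static void sort(int[] a) {
        sort(a, 0, a.length, 24); // start at the most significant byte
    }

    // Sorts a[lo, hi) on the byte at `shift`, then recurses into each bucket on the next byte.
    private static void sort(int[] a, int lo, int hi, int shift) {
        if (hi - lo < 2 || shift < 0) return;
        int[] count = new int[256];
        for (int i = lo; i < hi; i++) count[digit(a[i], shift)]++;
        int[] start = new int[256];
        int[] next = new int[256];
        for (int b = 0, pos = lo; b < 256; b++) {
            start[b] = next[b] = pos;
            pos += count[b];
        }
        // Swap each element directly into its bucket: no n-sized buffer, but not stable.
        for (int b = 0; b < 256; b++) {
            int end = start[b] + count[b];
            while (next[b] < end) {
                int v = a[next[b]];
                int d = digit(v, shift);
                if (d == b) {
                    next[b]++;              // already in the right bucket
                } else {
                    a[next[b]] = a[next[d]]; // pull the occupant of d's next slot here
                    a[next[d]++] = v;        // and drop v into bucket d
                }
            }
        }
        for (int b = 0; b < 256; b++) sort(a, start[b], start[b] + count[b], shift - 8);
    }

    public static void main(String[] args) {
        int[] data = {170, -45, 75, -90, 802, 24, 2, 66, Integer.MIN_VALUE, Integer.MAX_VALUE};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```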

Solution 4 - Performance

Unless you have a huge list or extremely small keys, log(N) is usually smaller than k, and it is rarely much higher. So choosing a general-purpose sorting algorithm with O(N log N) average-case performance isn't necessarily worse than using radix sort.

Correction: As @Mehrdad pointed out in the comments, the argument above isn't sound: either the key size is constant, in which case radix sort is O(N), or the key size is k, in which case quicksort is O(k N log N). So in theory, radix sort really has better asymptotic runtime.

In practice, the runtimes will be dominated by terms like:

  • radix sort: c1 k N

  • quicksort: c2 k N log(N)

where c1 >> c2, because "extracting" bits out of a longer key is usually an expensive operation involving bit shifts and logical operations (or at least unaligned memory access), while modern CPUs can compare keys with 64, 128 or even 256 bits in one operation. So for many common cases, unless N is gigantic, c1 will be larger than c2 log(N).
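Setting the two cost terms equal gives a rough break-even point (taking the log as base 2; the ratio in the example below is hypothetical, not a measurement):

```latex
c_1 k N < c_2 k N \log_2 N
\iff \log_2 N > \frac{c_1}{c_2}
\iff N > 2^{c_1 / c_2}
```

For instance, if c1/c2 were 20, radix sort would only pull ahead for N above 2^20, roughly one million elements.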

Solution 5 - Performance

Radix sort takes O(k*n) time. But you have to ask what k is. k is the "number of digits" (a bit simplistic, but basically something like that).

So, how many digits do you have? Quick answer: more than log(n) (using the "digit size" as the base of the log), which makes the radix algorithm O(n log n).

Why is that? If you have fewer than log(n) digits, then you have fewer than n possible numbers. Hence you can simply use counting sort, which takes O(n) time (just count how many of each number you have). So I assume you have k > log(n) digits...
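The counting-sort argument can be sketched in a few lines of Java, assuming all values are non-negative and at most a known `maxValue` (the class name is hypothetical). Time is O(n + maxValue), so it is linear exactly when the number of possible values is not much larger than n.

```java
import java.util.Arrays;

public final class CountingSort {
    // Sorts values known to lie in 0..maxValue in O(n + maxValue) time and O(maxValue) extra memory.
    public static void sort(int[] a, int maxValue) {
        int[] count = new int[maxValue + 1];
        for (int v : a) count[v]++; // how many times each value occurs
        int i = 0;
        for (int v = 0; v <= maxValue; v++) {
            for (int c = count[v]; c > 0; c--) a[i++] = v; // write each value back count[v] times
        }
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        sort(data, 9);
        System.out.println(Arrays.toString(data)); // [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]
    }
}
```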

That's why people don't use Radix sort that much. Although there are cases where it's worthwhile using it, in most cases quick sort is much better.

Solution 6 - Performance

When n > 128, we should use RadixSort.

When sorting int32s, I choose radix 256, so k = log_256(2^32) = 4, which is significantly smaller than log_2(n).

In my test, radix sort is 7 times faster than quicksort in the best case.

public class RadixSort {
	private static final int radix=256, shifts[]={8,16,24}, mask=radix-1;
	private final int bar[]=new int[radix];
	private int s[] = new int[65536];// reuse one buffer instead of an extra temporary array t, to improve the CPU cache hit rate

	public void ensureSort(int len){
		if(s.length < len)
			s = new int[len];
	}	
	
	public void sort(int[] a){
		int n=a.length;
		ensureSort(n);
		for(int i=0;i<radix;i++)bar[i]=0;
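		for(int i=0;i<n;i++)bar[a[i]&mask]++;// bar holds the number of elements in each bucket
		for(int i=1;i<radix;i++)bar[i]+=bar[i-1];// bar now holds, for each bucket, the largest index + 1 of its elements in the sorted output
		for(int i=0;i<n;i++)s[--bar[a[i]&mask]]=a[i];// for an element in a bucket, take index x=bar[slot]-1 and set s[x]=a[i] (--bar[slot] also moves the index down for the bucket's other elements)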
		
		for(int i=0;i<radix;i++)bar[i]=0;
		for(int i=0;i<n;i++)bar[(s[i]>>8)&mask]++;
		for(int i=1;i<radix;i++)bar[i]+=bar[i-1];
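		for(int i=n-1;i>=0;i--)a[--bar[(s[i]>>8)&mask]]=s[i];// elements in the same bucket are already sorted by the lower bits and are written into the destination from high index to low, so the source is traversed in reverse to keep their existing order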
		
		for(int i=0;i<radix;i++)bar[i]=0;
		for(int i=0;i<n;i++)bar[(a[i]>>16)&mask]++;
		for(int i=1;i<radix;i++)bar[i]+=bar[i-1];
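		for(int i=n-1;i>=0;i--)s[--bar[(a[i]>>16)&mask]]=a[i];// elements in the same bucket are already sorted by the lower bits and are written into the destination from high index to low, so the source is traversed in reverse to keep their existing order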
		
		for(int i=0;i<radix;i++)bar[i]=0;
		for(int i=0;i<n;i++)bar[(s[i]>>24)&mask]++;
		for(int i=129;i<radix;i++)bar[i]+=bar[i-1];// buckets 128..255 hold negative numbers, which are smaller than positive ones
		bar[0] += bar[255];
		for(int i=1;i<128;i++)bar[i]+=bar[i-1];		
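		for(int i=n-1;i>=0;i--)a[--bar[(s[i]>>24)&mask]]=s[i];// elements in the same bucket are already sorted by the lower bits and are written into the destination from high index to low, so the source is traversed in reverse to keep their existing order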
	}
}

Solution 7 - Performance

Radix sort isn't a comparison-based sort and can only sort numeric types like integers (including pointer addresses) and floating-point numbers, and it's a bit difficult to support floating-point portably.
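One common way around the floating-point problem is to map each float's IEEE 754 bit pattern to an integer key whose unsigned order matches the float order: flip all bits of negative values and only the sign bit of non-negative ones. The Java sketch below (hypothetical class name) assumes no NaNs. Java fixes `float` to IEEE 754, whereas in C or C++ this trick depends on the platform's float format, which is where the portability concern comes in.

```java
public final class FloatRadixKey {
    // Maps a float to an int whose *unsigned* order matches Float.compare order.
    // Assumption: no NaN inputs. -0.0f maps below 0.0f, as in Float.compare.
    public static int toKey(float f) {
        int bits = Float.floatToRawIntBits(f);
        // negative floats: flip every bit (larger magnitude -> smaller key);
        // non-negative floats: flip only the sign bit so they land above all negatives
        return bits < 0 ? ~bits : bits ^ Integer.MIN_VALUE;
    }

    // Inverse of toKey, used to turn radix-sorted keys back into floats.
    public static float fromKey(int key) {
        int bits = key < 0 ? key ^ Integer.MIN_VALUE : ~key;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] xs = {-3.5f, Float.NEGATIVE_INFINITY, 0.0f, -0.0f, 1e-30f, 2.25f, Float.POSITIVE_INFINITY};
        for (float x : xs) {
            for (float y : xs) {
                int byFloat = Integer.signum(Float.compare(x, y));
                int byKey = Integer.signum(Integer.compareUnsigned(toKey(x), toKey(y)));
                if (byFloat != byKey) throw new AssertionError(x + " vs " + y);
            }
            if (Float.floatToRawIntBits(fromKey(toKey(x))) != Float.floatToRawIntBits(x)) {
                throw new AssertionError("round trip failed for " + x);
            }
        }
        System.out.println("unsigned key order matches Float.compare");
    }
}
```

Once floats are mapped this way, any unsigned 32-bit radix sort can order them, and `fromKey` restores the original values afterwards.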

It's probably because it has such a narrow range of applicability that many standard libraries choose to omit it. It can't even let you provide your own comparator, and some people don't want to sort integers directly so much as use the integers as indices into something else that serves as the sort key. Comparison-based sorts allow all that flexibility, so it's probably a case of just preferring a generalized solution that fits 99% of people's daily needs instead of going out of the way to cater to that 1%.

That said, in spite of the narrow applicability, in my domain I find more use for radix sorts than introsorts or quicksorts. I'm in that 1% and barely ever work with, say, string keys, but often find use cases for numbers that benefit from being sorted. It's because my codebase revolves around indices to entities and components (entity-component system) as well as things like indexed meshes and there's a whole lot of numeric data.

As a result, radix sort becomes useful for all kinds of things in my case. One common example is eliminating duplicate indices. In that case I don't really need the results to be sorted, but a radix sort can often eliminate duplicates faster than the alternatives.
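A minimal sketch of radix-sort-based deduplication, assuming the indices are non-negative Java `int`s (names are hypothetical; this is a plain single-threaded LSD sort, not the multithreaded one described here): sort, then keep the first element of each run of equal values.

```java
import java.util.Arrays;

public final class RadixDedup {
    // Stable LSD radix sort for non-negative ints, 8 bits per pass (4 passes).
    static void radixSort(int[] a) {
        int n = a.length;
        int[] buf = new int[n];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int v : a) count[((v >>> shift) & 0xFF) + 1]++;
            for (int b = 0; b < 256; b++) count[b + 1] += count[b];
            for (int v : a) buf[count[(v >>> shift) & 0xFF]++] = v;
            int[] t = a; a = buf; buf = t; // swap roles of source and destination
        }
        // 4 passes is an even number of swaps, so the sorted data ends in the caller's array
    }

    // Returns the distinct values of `indices` in ascending order (input is left untouched).
    public static int[] unique(int[] indices) {
        int[] a = indices.clone();
        radixSort(a);
        int m = 0;
        for (int i = 0; i < a.length; i++) {
            if (m == 0 || a[i] != a[m - 1]) a[m++] = a[i]; // keep the first of each run of equal values
        }
        return Arrays.copyOf(a, m);
    }

    public static void main(String[] args) {
        int[] idx = {7, 3, 7, 1, 3, 3, 42, 1};
        System.out.println(Arrays.toString(unique(idx))); // [1, 3, 7, 42]
    }
}
```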

Another is finding, say, a median split for a kd-tree along a given dimension. There, radix sorting the points' floating-point values for that dimension rapidly gives me a median position, in linear time, to split the tree node.
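A sketch of the kd-tree median step, assuming points are stored as `float[][]` with no NaN coordinates (names are hypothetical). It radix sorts point indices by a float-to-int key whose unsigned order matches float order, using four 8-bit passes, so the cost is linear in the number of points.

```java
public final class KdMedian {
    // Float -> int key whose unsigned order matches float order (assumption: no NaN).
    private static int key(float f) {
        int bits = Float.floatToRawIntBits(f);
        return bits < 0 ? ~bits : bits ^ Integer.MIN_VALUE;
    }

    // Returns point indices ordered by coordinate `dim`, via a stable LSD radix sort on the keys.
    public static int[] orderByDim(float[][] points, int dim) {
        int n = points.length;
        int[] keys = new int[n], order = new int[n];
        for (int i = 0; i < n; i++) { keys[i] = key(points[i][dim]); order[i] = i; }
        int[] buf = new int[n];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int i : order) count[((keys[i] >>> shift) & 0xFF) + 1]++;
            for (int b = 0; b < 256; b++) count[b + 1] += count[b];
            for (int i : order) buf[count[(keys[i] >>> shift) & 0xFF]++] = i;
            int[] t = order; order = buf; buf = t;
        }
        return order;
    }

    // Index of the median point along `dim`; the tree node would be split at this point.
    public static int medianIndex(float[][] points, int dim) {
        int[] order = orderByDim(points, dim);
        return order[order.length / 2];
    }

    public static void main(String[] args) {
        float[][] pts = {{0.5f, 9f}, {-2f, 1f}, {3.25f, 4f}, {1f, -7f}, {-0.75f, 2f}};
        // x order: -2, -0.75, 0.5, 1, 3.25, so the median is point 0
        System.out.println(medianIndex(pts, 0)); // 0
    }
}
```

A selection algorithm could find the median without a full sort; the sketch sorts because that matches the approach described above and also yields the two halves in order.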

Another is depth-sorting higher-level primitives by z for semi-proper alpha transparency if we aren't going to be doing it in a frag shader. That also applies to GUIs and vector graphics software to z-order elements.

Another is cache-friendly sequential access using a list of indices. If the indices are traversed many times, it often improves performance if I radix sort them in advance so that the traversal is done in sequential order instead of random order. The latter could zig-zag back and forth in memory, evicting data from cache lines only to reload the same memory region repeatedly within the same loop. When I radix sort the indices first prior to accessing them repeatedly, that ceases to happen and I can reduce cache misses considerably. This is actually my most common use for radix sorts and it's the key to my ECS being cache-friendly when systems want to access entities with two or more components.

In my case I have a multithreaded radix sort which I use quite often. Some benchmarks:

--------------------------------------------
- test_mt_sort
--------------------------------------------
Sorting 1,000,000 elements 32 times...

mt_radix_sort: {0.234000 secs}
-- small result: [ 22 48 59 77 79 80 84 84 93 98 ]

std::sort: {1.778000 secs}
-- small result: [ 22 48 59 77 79 80 84 84 93 98 ]

qsort: {2.730000 secs}
-- small result: [ 22 48 59 77 79 80 84 84 93 98 ]

I can average something like 6-7 ms to sort a million numbers one time on my dinky hardware which isn't as fast as I would like since 6-7 milliseconds can still be noticed by users sometimes in interactive contexts, but still a whole lot better than 55-85 ms as with the case of C++'s std::sort or C's qsort which would definitely lead to very obvious hiccups in frame rates. I've even heard of people implementing radix sorts using SIMD, though I have no idea how they managed that. I'm not smart enough to come up with such a solution, though even my naive little radix sort does quite well compared to the standard libraries.

Solution 8 - Performance

k = "length of the longest value in Array to be sorted"

n = "length of the array"

O(k*n) = "worst-case running time"

k * n = n^2 (if k = n)

So when using radix sort, make sure "the longest integer is shorter than the array size", or vice versa. Then you're going to beat Quicksort!

The drawback is that most of the time you cannot guarantee how big the integers will become, but if you have a fixed range of numbers, radix sort should be the way to go.
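To make the fixed-range point concrete, here is a Java sketch (hypothetical class name, non-negative values assumed) that derives the number of passes k from the largest value, using 8-bit digits: a maximum of 802 needs only 2 passes, while values up to 2^31 - 1 need 4.

```java
import java.util.Arrays;

public final class RangeAwareRadixSort {
    // Stable LSD radix sort for non-negative ints with 8-bit digits. The number of passes k
    // is derived from the largest value, so a small fixed range means fewer passes.
    public static void sort(int[] a) {
        int max = 0;
        for (int v : a) {
            if (v < 0) throw new IllegalArgumentException("negative value: " + v);
            max = Math.max(max, v);
        }
        int bits = 32 - Integer.numberOfLeadingZeros(max); // significant bits in max
        int passes = (bits + 7) / 8;                       // k = ceil(bits / 8)
        int[] src = a, dst = new int[a.length];
        for (int p = 0; p < passes; p++) {
            int shift = 8 * p;
            int[] count = new int[257];
            for (int v : src) count[((v >>> shift) & 0xFF) + 1]++;
            for (int b = 0; b < 256; b++) count[b + 1] += count[b];
            for (int v : src) dst[count[(v >>> shift) & 0xFF]++] = v;
            int[] t = src; src = dst; dst = t;
        }
        if (src != a) System.arraycopy(src, 0, a, 0, a.length); // odd pass count: copy back
    }

    public static void main(String[] args) {
        int[] data = {170, 45, 75, 90, 802, 24, 2, 66};
        sort(data); // max = 802 needs 10 bits, so k = 2 passes
        System.out.println(Arrays.toString(data)); // [2, 24, 45, 66, 75, 90, 170, 802]
    }
}
```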

Solution 9 - Performance

Here's a link which compares quicksort and radixsort:

Is radix sort faster than quicksort for integer arrays? (yes it is, 2-3x)

Here's another link which analyzes running times of several algorithms:

A Question of Sorts:

Which is faster on the same data; an O(n) sort or an O(nLog(n)) sort?

Answer: It depends. It depends on the amount of data being sorted. It depends on the hardware it's being run on, and it depends on the implementation of the algorithms.

Solution 10 - Performance

One example would be when you are sorting a very large set or array of integers. A radix sort and other types of distribution sort are extremely fast, since data elements are mainly being enqueued into an array of queues (at most 10 queues for a base-10 LSD radix sort) and remapped to a different index location of the same input data to be sorted. There are no nested loops, so the algorithm tends to behave more linearly as the number of input integers to be sorted becomes significantly larger. Unlike other sorting methods, like the extremely inefficient bubble sort, radix sort does not use comparison operations to sort. It's just a simple process of remapping integers to different index positions until the input is finally sorted. If you would like to test out an LSD radix sort for yourself, I have written one and stored it on GitHub; it can be easily tested in an online JS IDE such as Eloquent JavaScript's coding sandbox. Feel free to play around with it and watch how it behaves with different values of n. I've tested it with up to 900,000 unsorted integers with a runtime < 300 ms. Here is the link if you wish to play around with it.
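The gist itself isn't reproduced here; the Java sketch below is an independent illustration of the same idea (ten FIFO queues, one per decimal digit), assuming non-negative inputs. The class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class QueueLsdRadixSort {
    // LSD radix sort for non-negative ints using ten FIFO queues, one per decimal digit.
    public static void sort(int[] a) {
        List<ArrayDeque<Integer>> queues = new ArrayList<>();
        for (int d = 0; d < 10; d++) queues.add(new ArrayDeque<>());
        int max = 0;
        for (int v : a) max = Math.max(max, v);
        // `place` is a long so it cannot overflow when max has ten decimal digits
        for (long place = 1; max / place > 0; place *= 10) {
            for (int v : a) queues.get((int) (v / place % 10)).addLast(v); // enqueue by current digit
            int i = 0;
            for (ArrayDeque<Integer> q : queues) {
                while (!q.isEmpty()) a[i++] = q.pollFirst(); // dequeue in digit order (FIFO keeps it stable)
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {170, 45, 75, 90, 802, 24, 2, 66};
        sort(data);
        System.out.println(Arrays.toString(data)); // [2, 24, 45, 66, 75, 90, 170, 802]
    }
}
```

The boxing into `Integer` keeps the sketch close to the queue description but costs speed; a tuned version would use counting arrays instead of queues.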

https://gist.github.com/StBean/4af58d09021899f14dfa585df6c86df6

Solution 11 - Performance

For 32-bit integer sorting it will beat quicksort by 7-10 times, but on 1 billion elements it will take a noticeable amount of memory, a few GB. So you can use radix or counting sort only if your data size n is large but the original values in the data are small, or you can use it for sorting any huge integer list when you can trade memory for speed.

Solution 12 - Performance

Quicksort has an average of O(N log N), but it also has a worst case of O(N^2), so even though in most practical cases it won't get to N^2, there is always the risk that the input will be in a "bad order" for you. This risk doesn't exist in radix sort. I think this gives a great advantage to radix sort.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Howard | View Question on Stackoverflow
Solution 1 - Performance | Mark Ransom | View Answer on Stackoverflow
Solution 2 - Performance | user541686 | View Answer on Stackoverflow
Solution 3 - Performance | Alexandre C. | View Answer on Stackoverflow
Solution 4 - Performance | Niki | View Answer on Stackoverflow
Solution 5 - Performance | Guy | View Answer on Stackoverflow
Solution 6 - Performance | zhuwenbin | View Answer on Stackoverflow
Solution 7 - Performance | user4842163 | View Answer on Stackoverflow
Solution 8 - Performance | kiltek | View Answer on Stackoverflow
Solution 9 - Performance | Ivan Š | View Answer on Stackoverflow
Solution 10 - Performance | Anthony Poblacion | View Answer on Stackoverflow
Solution 11 - Performance | Tigran Sargsyan | View Answer on Stackoverflow
Solution 12 - Performance | Guy Nir | View Answer on Stackoverflow