The other day I was watching a new series of lectures about least-significant-digit (LSD) radix sort given by Robert Sedgewick and Kevin Wayne on Coursera, and it got me thinking: why don't I use it anywhere?

I got so comfortable with Java's mergesort and Comparable<T> that I never thought about optimizing that, even in some performance-critical code. So I thought I should at least check what the performance impact can be.

LSD radix sort is only useful in cases where the keys you are sorting on have a reasonable, fixed length, so for my tests I assumed long as THE key data type. The reasoning behind that was the following kind of data type:

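Something along these lines would do; the class and field names here are purely illustrative, the important part being the Date field used as the sort key:

import java.util.Date;

// Illustrative data type: the only thing that matters for the sort is the Date field.
public class Event {

    private final String name;
    private final Date timestamp;

    public Event(String name, Date timestamp) {
        this.name = name;
        this.timestamp = timestamp;
    }

    public String getName() {
        return name;
    }

    public Date getTimestamp() {
        return timestamp;
    }
}
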
Since a timestamp (a Date object) can be expressed as a long, and it's perfectly reasonable to sort by date, it will do.

There are two additional things I had to define to enable more general sorting:

An interface that allows extracting the sort key from an object (thus allowing various objects to be sorted by various keys, an equivalent of Comparator), and an implementation of that interface for my data type.
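A minimal sketch of that pair, assuming the Event type from above (the interface and class names are mine, not anything standard):

// The radix-sort equivalent of Comparator: tells the sort how to turn an object into a long key.
public interface LongKeyExtractor<T> {
    long extractKey(T item);
}

// Implementation for the Event type: the key is the timestamp in milliseconds since the epoch.
public class EventTimestampExtractor implements LongKeyExtractor<Event> {
    @Override
    public long extractKey(Event event) {
        return event.getTimestamp().getTime();
    }
}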

So now for the sort itself:
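The listing below is a minimal sketch of the approach described next, not production code: it sorts on 16-bit digits of the extracted long key, four passes in total (all names are mine):

import java.lang.reflect.Array;

public final class LsdRadixSort {

    private static final int BITS_PER_PASS = 16;                  // sort on 16-bit "digits"
    private static final int RADIX = 1 << BITS_PER_PASS;          // 65536 possible digit values
    private static final int MASK = RADIX - 1;
    private static final int PASSES = Long.SIZE / BITS_PER_PASS;  // 4 passes over a 64-bit key

    @SuppressWarnings("unchecked")
    public static <T> void sort(T[] items, LongKeyExtractor<T> extractor) {
        int n = items.length;

        // Extract all keys up front so the extractor is called once per element, not once per pass.
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = extractor.extractKey(items[i]);
        }

        long[] auxKeys = new long[n];
        T[] aux = (T[]) Array.newInstance(items.getClass().getComponentType(), n);

        for (int pass = 0; pass < PASSES; pass++) {
            int shift = pass * BITS_PER_PASS;
            boolean lastPass = (pass == PASSES - 1);
            int[] count = new int[RADIX + 1];

            // Count the frequency of each digit. On the most significant pass the sign bit
            // is flipped so that negative keys end up before positive ones.
            for (int i = 0; i < n; i++) {
                int digit = (int) ((keys[i] >>> shift) & MASK);
                if (lastPass) {
                    digit ^= 0x8000;
                }
                count[digit + 1]++;
            }

            // Turn the counts into starting indices.
            for (int d = 0; d < RADIX; d++) {
                count[d + 1] += count[d];
            }

            // Distribute keys and objects into the auxiliary arrays (this is the stable part).
            for (int i = 0; i < n; i++) {
                int digit = (int) ((keys[i] >>> shift) & MASK);
                if (lastPass) {
                    digit ^= 0x8000;
                }
                int target = count[digit]++;
                auxKeys[target] = keys[i];
                aux[target] = items[i];
            }

            // Swap the roles of the arrays instead of copying back. Because the number of
            // passes is even, the final pass writes into the original 'items' array.
            long[] keySwap = keys;
            keys = auxKeys;
            auxKeys = keySwap;
            T[] itemSwap = items;
            items = aux;
            aux = itemSwap;
        }
    }
}

With the extractor from before, the call is then just LsdRadixSort.sort(events, new EventTimestampExtractor());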

So here is how it works:

We start by extracting the keys to a separate array (to avoid extracting them on every pass). It's an optimization step, but it's worth doing, since calling the extractor method that many times would be expensive (of course, there is a memory trade-off). Then, for each double octet (16 bits), starting from the least significant one, we do a key-indexed counting sort on the extracted keys, moving the objects themselves into an auxiliary array as well. The result of each pass is an array sorted by that double byte. Because the sort is stable, we can start from the least significant digit and move 'up'.

The method is not overly complicated, but there are a couple of subtleties:

The XOR with the most significant bit of the most significant double byte (done in both the counting and the distribution loops) is required in order to sort negative numbers correctly; it effectively negates the sign bit.

Exchanging the auxiliary array with the initial array (and the two key arrays with each other) after each pass is needed in order to avoid copying all elements back from the auxiliary array, which is typically done in a one-pass key-indexed sort.

Also note that, thanks to the fact that we have an even number of passes (4), the input array serves as the auxiliary array in the last pass, so we end up with the sorted elements in the original array and there is no need to copy anything back.

So, is this any good? Here are the results:

Size         Comparable   LSD Radix
10           0 ms.        10 ms.
100          1 ms.        9 ms.
1_000        3 ms.        6 ms.
10_000       37 ms.       10 ms.
100_000      178 ms.      65 ms.
1_000_000    466 ms.      179 ms.
10_000_000   8882 ms.     2219 ms.

For a small number of elements it's definitely not worth it, but that was expected. Above a certain threshold, though, it definitely makes sense: it's faster than the system sort, and since Java uses mergesort for objects anyway, the memory requirements aren't much bigger than those of the system sort (the two auxiliary key arrays; in performance-critical code, if you are willing to sacrifice some purity for performance, even those can be avoided).

There is one low-hanging optimization that can double the performance for large arrays: if you can limit your key size to a smaller number of bytes, you can get away with fewer passes. For example, I tested with the Date converted to milliseconds since the start of the Unix epoch, which is a long, but afterwards did some tests with the key calculated as the number of milliseconds relative to January 1st, 2000 (since I knew my data set couldn't contain dates prior to that one) and lowered the resolution to seconds, which allowed me to fit it in a 4-byte int.
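A sketch of such a reduced key, under the assumption that no date precedes January 1st, 2000 and that one-second resolution is enough (the constant and method name are mine):

// Hypothetical reduced key: seconds since 2000-01-01 UTC fit comfortably in an int
// (good until roughly the year 2068), so only two 16-bit passes are needed instead of four.
private static final long EPOCH_2000_MILLIS = 946684800000L;   // 2000-01-01T00:00:00Z

public static int reducedKey(Date timestamp) {
    return (int) ((timestamp.getTime() - EPOCH_2000_MILLIS) / 1000L);
}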

It can't always be done, but as with many optimizations, a lot depends on the inherent properties of your data.
