Sorting an almost sorted array (elements misplaced by no more than k)

ArraysAlgorithmSorting

Arrays Problem Overview


I was asked this interview question recently:

> You're given an array that is almost sorted, in that each of the N elements may be misplaced by no more than k positions from the correct sorted order. Find a space-and-time efficient algorithm to sort the array.

I have an O(N log k) solution as follows.

Let's denote arr[0..n) to mean the elements of the array from index 0 (inclusive) to N (exclusive).

  • Sort arr[0..2k)
    • Now we know that arr[0..k) are in their final sorted positions...
    • ...but arr[k..2k) may still be misplaced by k!
  • Sort arr[k..3k)
    • Now we know that arr[k..2k) are in their final sorted positions...
    • ...but arr[2k..3k) may still be misplaced by k
  • Sort arr[2k..4k)
  • ....
  • Until you sort arr[ik..N), then you're done!
    • This final step may be cheaper than the other steps when you have less than 2k elements left

In each step, you sort at most 2k elements in O(k log k), putting at least k elements in their final sorted positions at the end of each step. There are O(N/k) steps, so the overall complexity is O(N log k).

My questions are:

  • Is O(N log k) optimal? Can this be improved upon?
  • Can you do this without (partially) re-sorting the same elements?

Arrays Solutions


Solution 1 - Arrays

As Bob Sedgewick showed in his dissertation work (and follow-ons), insertion sort absolutely crushes the "almost-sorted array". In this case your asymptotics look good but if k < 12 I bet insertion sort wins every time. I don't know that there's a good explanation for why insertion sort does so well, but the place to look would be in one of Sedgewick's textbooks entitled Algorithms (he has done many editions for different languages).

  • I have no idea whether O(N log k) is optimal, but more to the point, I don't really care—if k is small, it's the constant factors that matter, and if k is large, you may as well just sort the array.

  • Insertion sort will nail this problem without re-sorting the same elements.

Big-O notation is all very well for algorithm class, but in the real world, constants matter. It's all too easy to lose sight of this. (And I say this as a professor who has taught Big-O notation!)

Solution 2 - Arrays

If using only the comparison model, O(n log k) is optimal. Consider the case when k = n.

To answer your other question, yes it is possible to do this without sorting, by using heaps.

Use a min-heap of 2k elements. Insert 2k elements first, then remove min, insert next element etc.

This guarantees O(n log k) time and O(k) space and heaps usually have small enough hidden constants.

Solution 3 - Arrays

It was already pointed out that one of the asymptotically optimal solutions uses a min heap and I just wanted to provide code in Java:

public void sortNearlySorted(int[] nums, int k) {
  PriorityQueue<Integer> minHeap = new PriorityQueue<>();
  for (int i = 0; i < k; i++) {
    minHeap.add(nums[i]);
  }

  for (int i = 0; i < nums.length; i++) {
    if (i + k < nums.length) {
      minHeap.add(nums[i + k]);
    }
    nums[i] = minHeap.remove();
  }
}

Solution 4 - Arrays

Since k is apparently supposed to be pretty small, an insertion sort is probably the most obvious and generally accepted algorithm.

In an insertion sort on random elements, you have to scan through N elements, and you have to move each one an average of N/2 positions, giving ~N*N/2 total operations. The "/2" constant is ignored in a big-O (or similar) characterization, giving O(N2) complexity.

In the case you're proposing, the expected number of operations is ~N*K/2 -- but since k is a constant, the whole k/2 term is ignored in a big-O characterization, so the overall complexity is O(N).

Solution 5 - Arrays

Your solution is a good one if k is large enough. There is no better solution in terms of time complexity; each element might be out of place by k places, which means you need to learn log2 k bits of information to place it correctly, which means you need to make log2 k comparisons at least--so it's got to be a complexity of at least O(N log k).

However, as others have pointed out, if k is small, the constant terms are going to kill you. Use something that's very fast per operation, like insertion sort, in that case.

If you really wanted to be optimal, you'd implement both methods, and switch from one to the other depending on k.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionpolygenelubricantsView Question on Stackoverflow
Solution 1 - ArraysNorman RamseyView Answer on Stackoverflow
Solution 2 - ArraysAryabhattaView Answer on Stackoverflow
Solution 3 - ArraysIvaylo ToskovView Answer on Stackoverflow
Solution 4 - ArraysJerry CoffinView Answer on Stackoverflow
Solution 5 - ArraysRex KerrView Answer on Stackoverflow