Is there a way to measure how sorted a list is?

ArraysAlgorithmListSorting

Arrays Problem Overview


Is there is a way to measure how sorted a list is?

I mean, it's not about knowing if a list is sorted or not (boolean), but something like a ratio of "sortness", something like the coefficient of correlation in statistics.

For example,

  • If the items of a list are in ascending order, then its rate would be 1.0

  • If list is sorted descending, its rate would be -1.0

  • If list is almost sorted ascending, its rate would be 0.9 or some value close to 1.

  • If the list is not sorted at all (random), its rate would be close to 0

I'm writting a small library in Scala for practice. I think a sorting rate would be useful, but I don't find any information about something like that. Maybe I don't know adequate terms for the concept.

Arrays Solutions


Solution 1 - Arrays

You can simply count the number of inversions in the list.

Inversion

An inversion in a sequence of elements of type T is a pair of sequence elements that appear out of order according to some ordering < on the set of T's.

From Wikipedia: > Formally, let A(1), A(2), ..., A(n) be a sequence of n numbers.
If i < j and A(i) > A(j), then the pair (i,j) is called an inversion of A. > > The inversion number of a sequence is one common measure of its sortedness.
Formally, the inversion number is defined to be the number of inversions, that is, > > definition

To make these definitions clearer, consider the example sequence 9, 5, 7, 6. This sequence has the inversions (0,1), (0,2), (0,3), (2,3) and the inversion number 4.

If you want a value between 0 and 1, you can divide the inversion number by N choose 2.

To actually create an algorithm to compute this score for how sorted a list is, you have two approaches:

Approach 1 (Deterministic)

Modify your favorite sorting algorithm to keep track of how many inversions it is correcting as it runs. Though this is nontrivial and has varying implementations depending on the sorting algorithm you choose, you will end up with an algorithm that is no more expensive (in terms of complexity) than the sorting algorithm you started with.

If you take this route, be aware that it's not as simple as counting "swaps." Mergesort, for example, is worst case O(N log N), yet if it is run on a list sorted in descending order, it will correct all N choose 2 inversions. That's O(N^2) inversions corrected in O(N log N) operations. So some operations must inevitably be correcting more than one inversion at a time. You have to be careful with your implementation. Note: you can do this with O(N log N) complexity, it's just tricky.

Related: https://stackoverflow.com/questions/6523712/calculating-the-number-of-inversions-in-a-permutation

Approach 2 (Stochastic)

  • Randomly sample pairs (i,j), where i != j
  • For each pair, determine whether list[min(i,j)] < list[max(i,j)] (0 or 1)
  • Compute the average of these comparisons and then normalize by N choose 2

I would personally go with the stochastic approach unless you have a requirement of exactness - if only because it's so easy to implement.


If what you really want is a value (z') between -1 (sorted descending) to 1 (sorted ascending), you can simply map the value above (z), which is between 0 (sorted ascending) and 1 (sorted descending), to this range using this formula:

z' = -2 * z + 1

Solution 2 - Arrays

The traditional measure of how sorted a list (or other sequential structure) is, is the number of inversions.

The number of inversions is the number of pairs (a,b) st index of a < b AND b << a. For these purposes << represents whatever ordering relation you choose for your particular sort.

A fully sorted list has no inversions, and a completely reversed list has the maximum number of inversions.

Solution 3 - Arrays

You can use actual correlation.

Suppose that to each item in the sorted list, you assign an integer rank starting from zero. Note that a graph of the elements position index versus rank will look like dots in a straight line (correlation of 1.0 between the position and rank).

You can compute a correlation on this data. For a reverse sort you will get -1 and so on.

Solution 4 - Arrays

There has been great answers, and I would like to add a mathematical aspect for completeness:

  • You can measure how sorted a list is by measuring how much it is correlated to a sorted list. To do that, you may use the rank correlation (the most known being Spearman's), which is exactly the same as the usual correlation, but it uses the rank of elements in a list instead of the analog values of its items.

  • Many extensions exist, like a correlation coefficient (+1 for exact sort, -1 for exact inversion)

  • This allows you to have statistical properties for this measure, like the permutational central limit theorem, which allows you to know the distribution of this measure for random lists.

Solution 5 - Arrays

Apart from inversion count, for numeric lists, mean square distance from the sorted state is imaginable:

#! ruby
d = -> a { a.zip( a.sort ).map { |u, v| ( u - v ) ** 2 }.reduce( :+ ) ** 0.5 }

a = 8, 7, 3, 4, 10, 9, 6, 2, 5, 1
d.( a ) #=> 15.556
d.( a.sort ) #=> 0.0
d.( a.sort.reverse ) # => 18.166 is the worrst case

Solution 6 - Arrays

I am not sure of the "best" method, but a simple one would be to compare every element with the one after it, incrementing a counter if element2 > element 1 (or whatever you want to test) and then divide by the total number of elements. It should give you a percentage.

Solution 7 - Arrays

I would count comparisons and divide it to the total number of comparisons. Here is a simple Python example.

my_list = [1,4,5,6,9,-1,5,3,55,11,12,13,14]

right_comparison_count = 0

for i in range(len(my_list)-1):
    if my_list[i] < my_list[i+1]: # Assume you want to it ascending order
        right_comparison_count += 1

if right_comparison_count == 0:
    result = -1
else:
    result = float(right_comparison_count) / float((len(my_list) - 1))

print result

Solution 8 - Arrays

If you take your list, calculate the ranks of the values in that list and call the list of ranks Y and another list, X that contains the integers from 1 to length(Y), you can obtain exactly the measure of sortedness that you are looking for by calculating the correlation coefficient, r, between the two lists.

r = \frac{\sum ^n _{i=1}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum ^n _{i=1}(X_i - \bar{X})^2} \sqrt{\sum ^n _{i=1}(Y_i - \bar{Y})^2}} 

For a fully-sorted list, r = 1.0, for a reverse-sorted list, r=-1.0, and the r varies between these limits for varying degrees of sortedness.

A possible problem with this approach, depending on the application, is that calculating the rank of each item in the list is equivalent to sorting it, so it is an O(n log n) operation.

Solution 9 - Arrays

How about something like this?

#!/usr/bin/python3

def sign(x, y):
   if x < y:
      return 1
   elif x > y:
      return -1
   else:
      return 0

def mean(list_):
   return float(sum(list_)) / float(len(list_))

def main():
   list_ = [ 1, 2, 3, 4, 6, 5, 7, 8 ]
   signs = []
   # this zip is pairing up element 0, 1, then 1, 2, then 2, 3, etc...
   for elem1, elem2 in zip(list_[:-1], list_[1:]):
      signs.append(sign(elem1, elem2))

   # This should print 1 for a sorted list, -1 for a list that is in reverse order
   # and 0 for a run of the same numbers, like all 4's
   print(mean(signs))

main()

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJosellView Question on Stackoverflow
Solution 1 - ArraysTimothy ShieldsView Answer on Stackoverflow
Solution 2 - ArraysMarcinView Answer on Stackoverflow
Solution 3 - ArraysKazView Answer on Stackoverflow
Solution 4 - ArraysmeduzView Answer on Stackoverflow
Solution 5 - ArraysBoris StitnickyView Answer on Stackoverflow
Solution 6 - Arraysuser2369405View Answer on Stackoverflow
Solution 7 - ArraysibrahimView Answer on Stackoverflow
Solution 8 - ArraysSimonView Answer on Stackoverflow
Solution 9 - ArraysdstrombergView Answer on Stackoverflow