Numpy argsort - what is it doing?

PythonNumpy

Python Problem Overview


Why is numpy giving this result:

x = numpy.array([1.48,1.41,0.0,0.1])
print x.argsort()

>[2 3 1 0]

when I'd expect it to do this: >[3 2 0 1]

Clearly my understanding of the function is lacking.

Python Solutions


Solution 1 - Python

According to the documentation

> Returns the indices that would sort an array.

  • 2 is the index of 0.0.
  • 3 is the index of 0.1.
  • 1 is the index of 1.41.
  • 0 is the index of 1.48.

Solution 2 - Python

[2, 3, 1, 0] indicates that the smallest element is at index 2, the next smallest at index 3, then index 1, then index 0.

There are a number of ways to get the result you are looking for:

import numpy as np
import scipy.stats as stats

def using_indexed_assignment(x):
    "https://stackoverflow.com/a/5284703/190597 (Sven Marnach)"
    result = np.empty(len(x), dtype=int)
    temp = x.argsort()
    result[temp] = np.arange(len(x))
    return result

def using_rankdata(x):
    return stats.rankdata(x)-1

def using_argsort_twice(x):
    "https://stackoverflow.com/a/6266510/190597 (k.rooijers)"
    return np.argsort(np.argsort(x))

def using_digitize(x):
    unique_vals, index = np.unique(x, return_inverse=True)
    return np.digitize(x, bins=unique_vals) - 1

For example,

In [72]: x = np.array([1.48,1.41,0.0,0.1])

In [73]: using_indexed_assignment(x)
Out[73]: array([3, 2, 0, 1])

This checks that they all produce the same result:

x = np.random.random(10**5)
expected = using_indexed_assignment(x)
for func in (using_argsort_twice, using_digitize, using_rankdata):
    assert np.allclose(expected, func(x))

These IPython %timeit benchmarks suggests for large arrays using_indexed_assignment is the fastest:

In [50]: x = np.random.random(10**5)
In [66]: %timeit using_indexed_assignment(x)
100 loops, best of 3: 9.32 ms per loop

In [70]: %timeit using_rankdata(x)
100 loops, best of 3: 10.6 ms per loop

In [56]: %timeit using_argsort_twice(x)
100 loops, best of 3: 16.2 ms per loop

In [59]: %timeit using_digitize(x)
10 loops, best of 3: 27 ms per loop

For small arrays, using_argsort_twice may be faster:

In [78]: x = np.random.random(10**2)

In [81]: %timeit using_argsort_twice(x)
100000 loops, best of 3: 3.45 µs per loop

In [79]: %timeit using_indexed_assignment(x)
100000 loops, best of 3: 4.78 µs per loop

In [80]: %timeit using_rankdata(x)
100000 loops, best of 3: 19 µs per loop

In [82]: %timeit using_digitize(x)
10000 loops, best of 3: 26.2 µs per loop

Note also that stats.rankdata gives you more control over how to handle elements of equal value.

Solution 3 - Python

As the documentation says, argsort:

>Returns the indices that would sort an array.

That means the first element of the argsort is the index of the element that should be sorted first, the second element is the index of the element that should be second, etc.

What you seem to want is the rank order of the values, which is what is provided by scipy.stats.rankdata. Note that you need to think about what should happen if there are ties in the ranks.

Solution 4 - Python

numpy.argsort(a, axis=-1, kind='quicksort', order=None)

Returns the indices that would sort an array

Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as that index data along the given axis in sorted order.

Consider one example in python, having a list of values as

listExample  = [0 , 2, 2456,  2000, 5000, 0, 1]

Now we use argsort function:

import numpy as np
list(np.argsort(listExample))

The output will be

[0, 5, 6, 1, 3, 2, 4]

This is the list of indices of values in listExample if you map these indices to the respective values then we will get the result as follows:

[0, 0, 1, 2, 2000, 2456, 5000]

(I find this function very useful in many places e.g. If you want to sort the list/array but don't want to use list.sort() function (i.e. without changing the order of actual values in the list) you can use this function.)

For more details refer this link: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.argsort.html

Solution 5 - Python

input:
import numpy as np
x = np.array([1.48,1.41,0.0,0.1])
x.argsort().argsort()

output:
array([3, 2, 0, 1])

Solution 6 - Python

First, it was ordered the array. Then generate an array with the initial index of the array.

Solution 7 - Python

Just want to directly contrast the OP's original understanding against the actual implementation with code.

numpy.argsort is defined such that for 1D arrays:

x[x.argsort()] == numpy.sort(x) # this will be an array of True's

The OP originally thought that it was defined such that for 1D arrays:

x == numpy.sort(x)[x.argsort()] # this will not be True

Note: This code doesn't work in the general case (only works for 1D), this answer is purely for illustration purposes.

Solution 8 - Python

np.argsort returns the index of the sorted array given by the 'kind' (which specifies the type of sorting algorithm). However, when a list is used with np.argmax, it returns the index of the largest element in the list. While, np.sort, sorts the given array, list.

Solution 9 - Python

It returns indices according to the given array indices,[1.48,1.41,0.0,0.1],that means: 0.0 is the first element, in index [2]. 0.1 is the second element, in index[3]. 1.41 is the third element, in index [1]. 1.48 is the fourth element, in index[0]. Output:

[2,3,1,0]

Solution 10 - Python

For anyone wondering "why argsort", my answer is "using one array to sort another":

In [49]: a = np.array(list('asdf'))

In [50]: b = [3,2,0,1]

In [51]: np.argsort(b)
Out[51]: array([2, 3, 1, 0])

In [52]: a[np.argsort(b)]
Out[52]: array(['d', 'f', 's', 'a'], dtype='<U1')

This is great for columnar data, e.g. a column of names and a column of salaries, and you want to see the names of the N highest-paid people.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionuser1276273View Question on Stackoverflow
Solution 1 - PythonfalsetruView Answer on Stackoverflow
Solution 2 - PythonunutbuView Answer on Stackoverflow
Solution 3 - PythonBrenBarnView Answer on Stackoverflow
Solution 4 - PythonYogesh Awdhut GadadeView Answer on Stackoverflow
Solution 5 - PythonJMponyView Answer on Stackoverflow
Solution 6 - PythonRodrigo SaraguroView Answer on Stackoverflow
Solution 7 - PythonMultihunterView Answer on Stackoverflow
Solution 8 - PythonvivekView Answer on Stackoverflow
Solution 9 - Pythonnucsit026View Answer on Stackoverflow
Solution 10 - PythonAlex ShroyerView Answer on Stackoverflow