Is it possible to use argsort in descending order?

PythonNumpy

Python Problem Overview


Consider the following code:

avgDists = np.array([1, 8, 6, 9, 4])
ids = avgDists.argsort()[:n]

This gives me indices of the n smallest elements. Is it possible to use this same argsort in descending order to get the indices of n highest elements?

Python Solutions


Solution 1 - Python

If you negate an array, the lowest elements become the highest elements and vice-versa. Therefore, the indices of the n highest elements are:

(-avgDists).argsort()[:n]

Another way to reason about this, as mentioned in the comments, is to observe that the big elements are coming last in the argsort. So, you can read from the tail of the argsort to find the n highest elements:

avgDists.argsort()[::-1][:n]

Both methods are O(n log n) in time complexity, because the argsort call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you're working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you're working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.

Note that these methods do not always give equivalent results: if a stable sort implementation is requested to argsort, e.g. by passing the keyword argument kind='mergesort', then the first strategy will preserve the sorting stability, but the second strategy will break stability (i.e. the positions of equal items will get reversed).

Example timings:

Using a small array of 100 floats and a length 30 tail, the view method was about 15% faster

>>> avgDists = np.random.rand(100)
>>> n = 30
>>> timeit (-avgDists).argsort()[:n]
1.93 µs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
1.64 µs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
1.64 µs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For larger arrays, the argsort is dominant and there is no significant timing difference

>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> timeit (-avgDists).argsort()[:n]
21.9 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
21.7 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
21.9 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Please note that the comment from nedim below is incorrect. Whether to truncate before or after reversing makes no difference in efficiency, since both of these operations are only striding a view of the array differently and not actually copying data.

Solution 2 - Python

Just like Python, in that [::-1] reverses the array returned by argsort() and [:n] gives that last n elements:

>>> avgDists=np.array([1, 8, 6, 9, 4])
>>> n=3
>>> ids = avgDists.argsort()[::-1][:n]
>>> ids
array([3, 1, 2])

The advantage of this method is that ids is a view of avgDists:

>>> ids.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

(The 'OWNDATA' being False indicates this is a view, not a copy)

Another way to do this is something like:

(-avgDists).argsort()[:n]

The problem is that the way this works is to create negative of each element in the array:

>>> (-avgDists)
array([-1, -8, -6, -9, -4])

ANd creates a copy to do so:

>>> (-avgDists_n).flags['OWNDATA']
True

So if you time each, with this very small data set:

>>> import timeit
>>> timeit.timeit('(-avgDists).argsort()[:3]', setup="from __main__ import avgDists")
4.2879798610229045
>>> timeit.timeit('avgDists.argsort()[::-1][:3]', setup="from __main__ import avgDists")
2.8372560259886086

The view method is substantially faster (and uses 1/2 the memory...)

Solution 3 - Python

Instead of using np.argsort you could use np.argpartition - if you only need the indices of the lowest/highest n elements.

That doesn't require to sort the whole array but just the part that you need but note that the "order inside your partition" is undefined, so while it gives the correct indices they might not be correctly ordered:

>>> avgDists = [1, 8, 6, 9, 4]
>>> np.array(avgDists).argpartition(2)[:2]  # indices of lowest 2 items
array([0, 4], dtype=int64)

>>> np.array(avgDists).argpartition(-2)[-2:]  # indices of highest 2 items
array([1, 3], dtype=int64)

Solution 4 - Python

You can use the flip commands numpy.flipud() or numpy.fliplr() to get the indexes in descending order after sorting using the argsort command. Thats what I usually do.

Solution 5 - Python

As @Kanmani hinted, an easier to interpret implementation may use numpy.flip, as in the following:

import numpy as np
    
avgDists = np.array([1, 8, 6, 9, 4])
ids = np.flip(np.argsort(avgDists))
print(ids)

By using the visitor pattern rather than member functions, it is easier to read the order of operations.

Solution 6 - Python

You could create a copy of the array and then multiply each element with -1.
As an effect the before largest elements would become the smallest.
The indeces of the n smallest elements in the copy are the n greatest elements in the original.

Solution 7 - Python

With your example:

avgDists = np.array([1, 8, 6, 9, 4])

Obtain indexes of n maximal values:

ids = np.argpartition(avgDists, -n)[-n:]

Sort them in descending order:

ids = ids[np.argsort(avgDists[ids])[::-1]]

Obtain results (for n=4):

>>> avgDists[ids]
array([9, 8, 6, 4])

Solution 8 - Python

An elegant way could be as follows -

ids = np.flip(np.argsort(avgDists))

This will give you indices of elements sorted in descending order. Now you can use regular slicing...

top_n = ids[:n]

Solution 9 - Python

Another way is to use only a '-' in the argument for argsort as in : "df[np.argsort(-df[:, 0])]", provided df is the dataframe and you want to sort it by the first column (represented by the column number '0'). Change the column-name as appropriate. Of course, the column has to be a numeric one.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionshnView Question on Stackoverflow
Solution 1 - PythonwimView Answer on Stackoverflow
Solution 2 - PythondawgView Answer on Stackoverflow
Solution 3 - PythonMSeifertView Answer on Stackoverflow
Solution 4 - PythonKanmaniView Answer on Stackoverflow
Solution 5 - PythonAdam EricksonView Answer on Stackoverflow
Solution 6 - PythonMentholBonbonView Answer on Stackoverflow
Solution 7 - PythonAlexey AntonenkoView Answer on Stackoverflow
Solution 8 - PythonNiteshKView Answer on Stackoverflow
Solution 9 - PythonBiswajit GhoshalView Answer on Stackoverflow