What is the cleanest way to do a sort plus uniq on a Python list?

PythonUnique

Python Problem Overview


Consider a Python list my_list containing ['foo', 'foo', 'bar'].

What is the most Pythonic way to uniquify and sort a list ?
(think cat my_list | sort | uniq)

This is how I currently do it and while it works I'm sure there are better ways to do it.

my_list = []
...
my_list.append("foo")
my_list.append("foo")
my_list.append("bar")
...
my_list = set(my_list)
my_list = list(my_list)
my_list.sort()

Python Solutions


Solution 1 - Python

my_list = sorted(set(my_list))

Solution 2 - Python

# Python ≥ 2.4
# because of (generator expression) and itertools.groupby, sorted

import itertools

def sort_uniq(sequence):
    return (x[0] for x in itertools.groupby(sorted(sequence)))

Faster:

import itertools, operator
import sys

if sys.hexversion < 0x03000000:
    mapper= itertools.imap # 2.4 ≤ Python < 3
else:
    mapper= map # Python ≥ 3

def sort_uniq(sequence):
    return mapper(
        operator.itemgetter(0),
        itertools.groupby(sorted(sequence)))

Both versions return an generator, so you might want to supply the result to the list type:

sequence= list(sort_uniq(sequence))

Note that this will work with non-hashable items too:

>>> list(sort_uniq([[0],[1],[0]]))
[[0], [1]]

Solution 3 - Python

The straightforward solution is provided by Ignacio—sorted(set(foo)).

If you have unique data, there's a reasonable chance you don't just want to do sorted(set(...)) but rather to store a set all the time and occasionally pull out a sorted version of the values. (At that point, it starts sounding like the sort of thing people often use a database for, too.)

If you have a sorted list and you want to check membership on logarithmic and add an item in worst case linear time, you can use the bisect module.

If you want to keep this condition all the time and you want to simplify things or make some operations perform better, you might consider blist.sortedset.

Solution 4 - Python

Others have mentioned sorted(set(my_list)), which works for hashable values such as strings, numbers and tuples, but not for unhashable types such as lists.

To get a sorted list of values of any sortable type, without duplicates:

from itertools import izip, islice
def unique_sorted(values):
    "Return a sorted list of the given values, without duplicates."
    values = sorted(values)
    if not values:
        return []
    consecutive_pairs = izip(values, islice(values, 1, len(values)))
    result = [a for (a, b) in consecutive_pairs if a != b]
    result.append(values[-1])
    return result

This can be further simplified using the "pairwise" or "unique_justseen" recipes from the itertools documentation.

Solution 5 - Python

Can't say it is clean way to do that, but just for fun:

my_list = [x for x in sorted(my_list) if not x in locals()["_[1]"]]

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionknorvView Question on Stackoverflow
Solution 1 - PythonIgnacio Vazquez-AbramsView Answer on Stackoverflow
Solution 2 - PythontzotView Answer on Stackoverflow
Solution 3 - PythonMike GrahamView Answer on Stackoverflow
Solution 4 - PythontaleinatView Answer on Stackoverflow
Solution 5 - PythonandreypoppView Answer on Stackoverflow