Python list subtraction operation

PythonList

Python Problem Overview


I want to do something similar to this:

>>> x = [1,2,3,4,5,6,7,8,9,0]  
>>> x  
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0]  
>>> y = [1,3,5,7,9]  
>>> y  
[1, 3, 5, 7, 9]  
>>> y - x   # (should return [2,4,6,8,0])

But this is not supported by python lists What is the best way of doing it?

Python Solutions


Solution 1 - Python

Use a list comprehension:

[item for item in x if item not in y]

If you want to use the - infix syntax, you can just do:

class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(args)

    def __sub__(self, other):
        return self.__class__(*[item for item in self if item not in other])

you can then use it like:

x = MyList(1, 2, 3, 4)
y = MyList(2, 5, 2)
z = x - y   

But if you don't absolutely need list properties (for example, ordering), just use sets as the other answers recommend.

Solution 2 - Python

Use set difference

>>> z = list(set(x) - set(y))
>>> z
[0, 8, 2, 4, 6]

Or you might just have x and y be sets so you don't have to do any conversions.

Solution 3 - Python

if duplicate and ordering items are problem :

[i for i in a if not i in b or b.remove(i)]

a = [1,2,3,3,3,3,4]
b = [1,3]
result: [2, 3, 3, 3, 4]

Solution 4 - Python

That is a "set subtraction" operation. Use the set data structure for that.

In Python 2.7:

x = {1,2,3,4,5,6,7,8,9,0}
y = {1,3,5,7,9}
print x - y

Output:

>>> print x - y
set([0, 8, 2, 4, 6])

Solution 5 - Python

For many use cases, the answer you want is:

ys = set(y)
[item for item in x if item not in ys]

This is a hybrid between aaronasterling's answer and quantumSoup's answer.

aaronasterling's version does len(y) item comparisons for each element in x, so it takes quadratic time. quantumSoup's version uses sets, so it does a single constant-time set lookup for each element in x—but, because it converts both x and y into sets, it loses the order of your elements.

By converting only y into a set, and iterating x in order, you get the best of both worlds—linear time, and order preservation.*


However, this still has a problem from quantumSoup's version: It requires your elements to be hashable. That's pretty much built into the nature of sets.** If you're trying to, e.g., subtract a list of dicts from another list of dicts, but the list to subtract is large, what do you do?

If you can decorate your values in some way that they're hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:

ys = {tuple(item.items()) for item in y}
[item for item in x if tuple(item.items()) not in ys]

If your types are a bit more complicated (e.g., often you're dealing with JSON-compatible values, which are hashable, or lists or dicts whose values are recursively the same type), you can still use this solution. But some types just can't be converted into anything hashable.


If your items aren't, and can't be made, hashable, but they are comparable, you can at least get log-linear time (O(N*log M), which is a lot better than the O(N*M) time of the list solution, but not as good as the O(N+M) time of the set solution) by sorting and using bisect:

ys = sorted(y)
def bisect_contains(seq, item):
    index = bisect.bisect(seq, item)
    return index < len(seq) and seq[index] == item
[item for item in x if bisect_contains(ys, item)]

If your items are neither hashable nor comparable, then you're stuck with the quadratic solution.


* Note that you could also do this by using a pair of OrderedSet objects, for which you can find recipes and third-party modules. But I think this is simpler.

** The reason set lookups are constant time is that all it has to do is hash the value and see if there's an entry for that hash. If it can't hash the value, this won't work.

Solution 6 - Python

If the lists allow duplicate elements, you can use Counter from collections:

from collections import Counter
result = list((Counter(x)-Counter(y)).elements())

If you need to preserve the order of elements from x:

result = [ v for c in [Counter(y)] for v in x if not c[v] or c.subtract([v]) ]

Solution 7 - Python

I think the easiest way to achieve this is by using set().

>>> x = [1,2,3,4,5,6,7,8,9,0]  
>>> y = [1,3,5,7,9]  
>>> list(set(x)- set(y))
[0, 2, 4, 6, 8]

Solution 8 - Python

Looking up values in sets are faster than looking them up in lists:

[item for item in x if item not in set(y)]

I believe this will scale slightly better than:

[item for item in x if item not in y]

Both preserve the order of the lists.

Solution 9 - Python

The other solutions have one of a few problems:

  1. They don't preserve order, or
  2. They don't remove a precise count of elements, e.g. for x = [1, 2, 2, 2] and y = [2, 2] they convert y to a set, and either remove all matching elements (leaving [1] only) or remove one of each unique element (leaving [1, 2, 2]), when the proper behavior would be to remove 2 twice, leaving [1, 2], or
  3. They do O(m * n) work, where an optimal solution can do O(m + n) work

Alain was on the right track with Counter to solve #2 and #3, but that solution will lose ordering. The solution that preserves order (removing the first n copies of each value for n repetitions in the list of values to remove) is:

from collections import Counter

x = [1,2,3,4,3,2,1]  
y = [1,2,2]  
remaining = Counter(y)

out = []
for val in x:
    if remaining[val]:
        remaining[val] -= 1
    else:
        out.append(val)
# out is now [3, 4, 3, 1], having removed the first 1 and both 2s.

[Try it online!](https://tio.run/##VY3BDoJADETv@xWTeIGIJqAnE09@huFAoOgm0G6WlYA/j0URY5Mm7evM1I3hLnyYptpLi1KahspghTvY1okPuMiDA3ljBpxxTZMsOSRH7SxJc8CMX5rNm6e2sGz5pnQxRmNsjDzCrMtNLR590cAyhpOBlq2xuq56yj94rn@O3Rnp@0ZNRz@VZu8L54irSGX6zHnLIVIcL@N2DYqBDbZzsPTU4UleqNI8aolDN00v "Python 3 – Try It Online")

To make it remove the last copies of each element, just change the for loop to for val in reversed(x): and add out.reverse() immediately after exiting the for loop.

Constructing the Counter is O(n) in terms of y's length, iterating x is O(n) in terms of x's length, and Counter membership testing and mutation are O(1), while list.append is amortized O(1) (a given append can be O(n), but for many appends, the overall big-O averages O(1) since fewer and fewer of them require a reallocation), so the overall work done is O(m + n).

You can also test for to determine if there were any elements in y that were not removed from x by testing:

remaining = +remaining  # Removes all keys with zero counts from Counter
if remaining:
    # remaining contained elements with non-zero counts

Solution 10 - Python

Try this.

def subtract_lists(a, b):
    """ Subtracts two lists. Throws ValueError if b contains items not in a """
    # Terminate if b is empty, otherwise remove b[0] from a and recurse
    return a if len(b) == 0 else [a[:i] + subtract_lists(a[i+1:], b[1:]) 
                                  for i in [a.index(b[0])]][0]

>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> y = [1,3,5,7,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0]
>>> x = [1,2,3,4,5,6,7,8,9,0,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0, 9]     #9 is only deleted once
>>>

Solution 11 - Python

We can use set methods as well to find the difference between two list

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]
list(set(x).difference(y))
[0, 2, 4, 6, 8]

Solution 12 - Python

You could also try this if the values are unique anyway:

list(set(x) - set(y))

Solution 13 - Python

The answer provided by @aaronasterling looks good, however, it is not compatible with the default interface of list: x = MyList(1, 2, 3, 4) vs x = MyList([1, 2, 3, 4]). Thus, the below code can be used as a more python-list friendly:

class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(*args)

    def __sub__(self, other):
        return self.__class__([item for item in self if item not in other])

Example:

x = MyList([1, 2, 3, 4])
y = MyList([2, 5, 2])
z = x - y

Solution 14 - Python

This example subtracts two lists:

# List of pairs of points
list = []
list.append([(602, 336), (624, 365)])
list.append([(635, 336), (654, 365)])
list.append([(642, 342), (648, 358)])
list.append([(644, 344), (646, 356)])
list.append([(653, 337), (671, 365)])
list.append([(728, 13), (739, 32)])
list.append([(756, 59), (767, 79)])

itens_to_remove = []
itens_to_remove.append([(642, 342), (648, 358)])
itens_to_remove.append([(644, 344), (646, 356)])

print("Initial List Size: ", len(list))
        
for a in itens_to_remove:
    for b in list:
        if a == b :
            list.remove(b)

print("Final List Size: ", len(list))

Solution 15 - Python

from collections import Counter

y = Counter(y)
x = Counter(x)

print(list(x-y))

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestiondaydreamerView Question on Stackoverflow
Solution 1 - PythonaaronasterlingView Answer on Stackoverflow
Solution 2 - PythonquantumSoupView Answer on Stackoverflow
Solution 3 - PythonnguyênView Answer on Stackoverflow
Solution 4 - PythonSantaView Answer on Stackoverflow
Solution 5 - PythonabarnertView Answer on Stackoverflow
Solution 6 - PythonAlain T.View Answer on Stackoverflow
Solution 7 - PythonLoochieView Answer on Stackoverflow
Solution 8 - PythonrudolfbykerView Answer on Stackoverflow
Solution 9 - PythonShadowRangerView Answer on Stackoverflow
Solution 10 - Pythonuser3435376View Answer on Stackoverflow
Solution 11 - PythonPython DeveloperView Answer on Stackoverflow
Solution 12 - PythonDokwoNView Answer on Stackoverflow
Solution 13 - PythonHamid ZafarView Answer on Stackoverflow
Solution 14 - PythonJoao NicolauView Answer on Stackoverflow
Solution 15 - PythonabiView Answer on Stackoverflow