Slice Pandas dataframe by index values that are (not) in a list

PythonPandas

Python Problem Overview


I have a pandas dataframe, df.

I want to select all indices in df that are not in a list, blacklist.

Now, I use list comprehension to create the desired labels to slice.

ix=[i for i in df.index if i not in blacklist]  
df_select=df.loc[ix]

Works fine, but may be clumsy if I need to do this often.

Is there a better way to do this?

Python Solutions


Solution 1 - Python

Use isin on the index and invert the boolean index to perform label selection:

In [239]:

df = pd.DataFrame({'a':np.random.randn(5)})
df
Out[239]:
          a
0 -0.548275
1 -0.411741
2 -1.187369
3  1.028967
4 -2.755030
In [240]:

t = [2,4]
df.loc[~df.index.isin(t)]
Out[240]:
          a
0 -0.548275
1 -0.411741
3  1.028967

Solution 2 - Python

You could use set() to create the difference between your original indices and those that you want to remove:

df.loc[set(df.index) - set(blacklist)]

It has the advantage of being parsimonious, as well as being easier to read than a list comprehension.

Solution 3 - Python

Thanks to ASGM; I found that I needed to turn the set into a list to make it work with a MultiIndex:

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa":[1,2,3,4]}, index=mi1)
setValid = set(df1.index) - set([("a", 2)])
df1.loc[list(setValid)] # works
df1.loc[setValid] # fails

(sorry can't comment, insufficient rep)

Solution 4 - Python

If you are looking for a way to select all rows that are outside a condition you can use np.invert() given that the condition returns an array of booleans.

df.loc[np.invert(({condition 1}) & (condition 2))]

Solution 5 - Python

df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]

df.drop(blacklist,0)

Solution 6 - Python

import pandas as pd
df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]
#your current way ...
ix=[i for i in df.index if i not in blacklist]  
df_select=df.loc[ix]

# use a mask
mask = [True if x else False for x in df.index if x not in blacklist]
df.loc[mask]

http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-label actually, loc and iloc both take a boolean array, in this case the mask. from now on you can reuse this mask and should be more efficient.

Solution 7 - Python

You can use the np.setdiff1d function which finds the set difference of two arrays.

index = np.array(blacklist)
not_index = np.setdiff1d(df.index.to_numpy(), index)
df.iloc[not_index]

Solution 8 - Python

You could use difference() to obtain the difference between your original indices and those that you want to exclude:

df.loc[df.index.difference(blacklist), :]

It has the advantage of being easier to read.

Solution 9 - Python

Inside query you can access your variable blacklist using @:

df.query("index not in @blacklist")

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionlmart999View Question on Stackoverflow
Solution 1 - PythonEdChumView Answer on Stackoverflow
Solution 2 - PythonASGMView Answer on Stackoverflow
Solution 3 - PythonHagrid67View Answer on Stackoverflow
Solution 4 - PythonHector Garcia LView Answer on Stackoverflow
Solution 5 - PythonGeorge XiongView Answer on Stackoverflow
Solution 6 - PythonDyno FuView Answer on Stackoverflow
Solution 7 - PythonHàm NghiView Answer on Stackoverflow
Solution 8 - PythonAlexander MartinsView Answer on Stackoverflow
Solution 9 - PythonrachwaView Answer on Stackoverflow