Pandas How to filter a Series

PythonPandas

Python Problem Overview


I have a Series like this after doing groupby('name') and used mean() function on other column

name
383      3.000000
663      1.000000
726      1.000000
737      9.000000
833      8.166667

Could anyone please show me how to filter out the rows with 1.000000 mean values? Thank you and I greatly appreciate your help.

Python Solutions


Solution 1 - Python

In [5]:

import pandas as pd
 
test = {
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
}
 
s = pd.Series(test)
s = s[s != 1]
s
Out[0]:
383    3.000000
737    9.000000
833    8.166667
dtype: float64

Solution 2 - Python

From pandas version 0.18+ filtering a series can also be done as below

test = {
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
}

pd.Series(test).where(lambda x : x!=1).dropna()

Checkout: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements

Solution 3 - Python

As DACW pointed out, there are method-chaining improvements in pandas 0.18.1 that do what you are looking for very nicely.

Rather than using .where, you can pass your function to either the .loc indexer or the Series indexer [] and avoid the call to .dropna:

test = pd.Series({
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
})

test.loc[lambda x : x!=1]

test[lambda x: x!=1]

Similar behavior is supported on the DataFrame and NDFrame classes.

Solution 4 - Python

A fast way of doing this is to reconstruct using numpy to slice the underlying arrays. See timings below.

mask = s.values != 1
pd.Series(s.values[mask], s.index[mask])

0
383    3.000000
737    9.000000
833    8.166667
dtype: float64

naive timing

enter image description here

Solution 5 - Python

Another way is to first convert to a DataFrame and use the query method (assuming you have numexpr installed):

import pandas as pd

test = {
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
}

s = pd.Series(test)
s.to_frame(name='x').query("x != 1")

Solution 6 - Python

If you like a chained operation, you can also use compress function:

test = pd.Series({
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
})

test.compress(lambda x: x != 1)

# 383    3.000000
# 737    9.000000
# 833    8.166667
# dtype: float64

Solution 7 - Python

In my case I had a pandas Series where the values are tuples of characters:

Out[67]
0    (H, H, H, H)
1    (H, H, H, T)
2    (H, H, T, H)
3    (H, H, T, T)
4    (H, T, H, H)

Therefore I could use indexing to filter the series, but to create the index I needed apply. My condition is "find all tuples which have exactly one 'H'".

series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]

I admit it is not "chainable", (i.e. notice I repeat series_of_tuples twice; you must store any temporary series into a variable so you can call apply(...) on it).

There may also be other methods (besides .apply(...)) which can operate elementwise to produce a Boolean index.

Many other answers (including accepted answer) using the chainable functions like:

  • .compress()
  • .where()
  • .loc[]
  • []

These accept callables (lambdas) which are applied to the Series, not to the individual values in those series!

Therefore my Series of tuples behaved strangely when I tried to use my above condition / callable / lambda, with any of the chainable functions, like .loc[]:

series_of_tuples.loc[lambda x: x.count('H')==1]

Produces the error:

KeyError: 'Level H must be same as name (None)'

I was very confused, but it seems to be using the Series.count series_of_tuples.count(...) function , which is not what I wanted.

I admit that an alternative data structure may be better:

  • A Category datatype?
  • A Dataframe (each element of the tuple becomes a column)
  • A Series of strings (just concatenate the tuples together):

This creates a series of strings (i.e. by concatenating the tuple; joining the characters in the tuple on a single string)

series_of_tuples.apply(''.join)

So I can then use the chainable Series.str.count

series_of_tuples.apply(''.join).str.count('H')==1

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionKiem NguyenView Question on Stackoverflow
Solution 1 - PythonAndrewView Answer on Stackoverflow
Solution 2 - PythonDACWView Answer on Stackoverflow
Solution 3 - PythonGordon BeanView Answer on Stackoverflow
Solution 4 - PythonpiRSquaredView Answer on Stackoverflow
Solution 5 - PythonKamil SindiView Answer on Stackoverflow
Solution 6 - PythonPsidomView Answer on Stackoverflow
Solution 7 - PythonThe Red PeaView Answer on Stackoverflow