Pandas split DataFrame by column value

PythonPandasDataframeIndexingSplit

Python Problem Overview


I have DataFrame with column Sales.

How can I split it into 2 based on Sales value?

First DataFrame will have data with 'Sales' < s and second with 'Sales' >= s

Python Solutions


Solution 1 - Python

You can use boolean indexing:

df = pd.DataFrame({'Sales':[10,20,30,40,50], 'A':[3,4,7,6,1]})
print (df)
   A  Sales
0  3     10
1  4     20
2  7     30
3  6     40
4  1     50

s = 30

df1 = df[df['Sales'] >= s]
print (df1)
   A  Sales
2  7     30
3  6     40
4  1     50

df2 = df[df['Sales'] < s]
print (df2)
   A  Sales
0  3     10
1  4     20

It's also possible to invert mask by ~:

mask = df['Sales'] >= s
df1 = df[mask]
df2 = df[~mask]
print (df1)
   A  Sales
2  7     30
3  6     40
4  1     50

print (df2)
   A  Sales
0  3     10
1  4     20

print (mask)
0    False
1    False
2     True
3     True
4     True
Name: Sales, dtype: bool

print (~mask)
0     True
1     True
2    False
3    False
4    False
Name: Sales, dtype: bool

Solution 2 - Python

Using groupby you could split into two dataframes like

In [1047]: df1, df2 = [x for _, x in df.groupby(df['Sales'] < 30)]

In [1048]: df1
Out[1048]:
   A  Sales
2  7     30
3  6     40
4  1     50

In [1049]: df2
Out[1049]:
   A  Sales
0  3     10
1  4     20

Solution 3 - Python

Using "groupby" and list comprehension:

Storing all the split dataframe in list variable and accessing each of the seprated dataframe by their index.

DF = pd.DataFrame({'chr':["chr3","chr3","chr7","chr6","chr1"],'pos':[10,20,30,40,50],})
ans = [y for x, y in DF.groupby('chr', as_index=False)]

accessing the separated DF like this:

ans[0]
ans[1]
ans[len(ans)-1] # this is the last separated DF

accessing the column value of the separated DF like this:

ansI_chr=ans[i].chr 

Solution 4 - Python

One-liner using the walrus operator (Python 3.8):

df1, df2 = df[(mask:=df['Sales'] >= 30)], df[~mask]

Consider using copy to avoid SettingWithCopyWarning:

df1, df2 = df[(mask:=df['Sales'] >= 30)].copy(), df[~mask].copy()

Alternatively, you can use the method query:

df1, df2 = df.query('Sales >= 30').copy(), df.query('Sales < 30').copy()

Solution 5 - Python

I like to use this for speeding up searches or rolling average finds .apply(lambda x...) type functions so I split big files into dictionaries of dataframes:

df_dict = {sale_v: df[df['Sales'] == sale_v] for sale_v in df.Sales.unique()}

This should do it if you wanted to go based on categorical groups.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Question146 percent RussianView Question on Stackoverflow
Solution 1 - PythonjezraelView Answer on Stackoverflow
Solution 2 - PythonZeroView Answer on Stackoverflow
Solution 3 - PythonkeryruoView Answer on Stackoverflow
Solution 4 - PythonMykola ZotkoView Answer on Stackoverflow
Solution 5 - PythonRyan LView Answer on Stackoverflow