Create column of value_counts in Pandas dataframe
PythonPandasPython Problem Overview
I want to create a count of unique values from one of my Pandas dataframe columns and then add a new column with those counts to my original data frame. I've tried a couple different things. I created a pandas series and then calculated counts with the value_counts method. I tried to merge these values back to my original dataframe, but I the keys that I want to merge on are in the Index(ix/loc).
Color Value
Red 100
Red 150
Blue 50
I want to return something like:
Color Value Counts
Red 100 2
Red 150 2
Blue 50 1
Python Solutions
Solution 1 - Python
df['Counts'] = df.groupby(['Color'])['Value'].transform('count')
For example,
In [102]: df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})
In [103]: df
Out[103]:
Color Value
0 Red 100
1 Red 150
2 Blue 50
In [104]: df['Counts'] = df.groupby(['Color'])['Value'].transform('count')
In [105]: df
Out[105]:
Color Value Counts
0 Red 100 2
1 Red 150 2
2 Blue 50 1
Note that transform('count')
ignores NaNs. If you want to count NaNs, use transform(len)
.
To the anonymous editor: If you are getting an error while using transform('count')
it may be due to your version of Pandas being too old. The above works with pandas version 0.15 or newer.
Solution 2 - Python
One other option:
z = df['Color'].value_counts
z1 = z.to_dict() #converts to dictionary
df['Count_Column'] = df['Color'].map(z1)
This option will give you a column with repeated values of the counts, corresponding to the frequency of each value in the 'Color' column.
Solution 3 - Python
This answer uses Series.map
with Series.value_counts
. It was tested with Pandas 1.1.
df['counts'] = df['attribute'].map(df['attribute'].value_counts())
Credit: comment by sacuL
Solution 4 - Python
df['Counts'] = df.Color.groupby(df.Color).transform('count')
You can do this with any series: group it by itself and call transform('count')
:
>>> series = pd.Series(['Red', 'Red', 'Blue'])
>>> series.groupby(series).transform('count')
0 2
1 2
2 1
dtype: int64
Solution 5 - Python
My initial thought would be to use list comprehension as shown below but, as was pointed out in the comment, this is slower than the groupby
and transform
method. I will leave this answer to demonstrate WHAT NOT TO DO:
In [94]: df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})
In [95]: df['Counts'] = [sum(df['Color'] == df['Color'][i]) for i in xrange(len(df))]
In [96]: df
Out[100]:
Color Value Counts
0 Red 100 2
1 Red 150 2
2 Blue 50 1
[3 rows x 3 columns]
@unutbu's method gets complicated for DataFrames with several columns which make this simpler to code. If you are working with a small data frame, this is faster (see below), but otherwise, you should use NOT use this.
In [97]: %timeit df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]}); df['Counts'] = df.groupby(['Color']).transform('count')
100 loops, best of 3: 2.87 ms per loop
In [98]: %timeit df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]}); df['Counts'] = [sum(df['Color'] == df['Color'][i]) for i in xrange(len(df))]
1000 loops, best of 3: 1.03 ms per loop
Solution 6 - Python
Create a column containing the count of repeated values. The values are temporary calculations computed from other columns. Very fast. Credit to @ZakS.
sum_A_B = df['A']+df['B']
sum_A_B_dict = sum_A_B.value_counts().to_dict()
df['sum_A_B'] = sum_A_B.map(sum_A_B_dict)