Create column of value_counts in Pandas dataframe

PythonPandas

Python Problem Overview


I want to create a count of unique values from one of my Pandas dataframe columns and then add a new column with those counts to my original data frame. I've tried a couple different things. I created a pandas series and then calculated counts with the value_counts method. I tried to merge these values back to my original dataframe, but I the keys that I want to merge on are in the Index(ix/loc).

Color Value
Red   100
Red   150
Blue  50

I want to return something like:

Color Value Counts
Red   100   2
Red   150   2 
Blue  50    1

Python Solutions


Solution 1 - Python

df['Counts'] = df.groupby(['Color'])['Value'].transform('count')

For example,

In [102]: df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})

In [103]: df
Out[103]: 
  Color  Value
0   Red    100
1   Red    150
2  Blue     50

In [104]: df['Counts'] = df.groupby(['Color'])['Value'].transform('count')

In [105]: df
Out[105]: 
  Color  Value  Counts
0   Red    100       2
1   Red    150       2
2  Blue     50       1

Note that transform('count') ignores NaNs. If you want to count NaNs, use transform(len).


To the anonymous editor: If you are getting an error while using transform('count') it may be due to your version of Pandas being too old. The above works with pandas version 0.15 or newer.

Solution 2 - Python

One other option:

z = df['Color'].value_counts 

z1 = z.to_dict() #converts to dictionary

df['Count_Column'] = df['Color'].map(z1) 

This option will give you a column with repeated values of the counts, corresponding to the frequency of each value in the 'Color' column.

Solution 3 - Python

This answer uses Series.map with Series.value_counts. It was tested with Pandas 1.1.

df['counts'] = df['attribute'].map(df['attribute'].value_counts())

Credit: comment by sacuL

Solution 4 - Python

df['Counts'] = df.Color.groupby(df.Color).transform('count')

You can do this with any series: group it by itself and call transform('count'):

>>> series = pd.Series(['Red', 'Red', 'Blue'])
>>> series.groupby(series).transform('count')
0    2
1    2
2    1
dtype: int64

Solution 5 - Python

My initial thought would be to use list comprehension as shown below but, as was pointed out in the comment, this is slower than the groupby and transform method. I will leave this answer to demonstrate WHAT NOT TO DO:

In [94]: df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]})
In [95]: df['Counts'] = [sum(df['Color'] == df['Color'][i]) for i in xrange(len(df))]
In [96]: df
Out[100]: 
  Color  Value  Counts
0   Red    100       2
1   Red    150       2
2  Blue     50       1

[3 rows x 3 columns]

@unutbu's method gets complicated for DataFrames with several columns which make this simpler to code. If you are working with a small data frame, this is faster (see below), but otherwise, you should use NOT use this.

In [97]: %timeit df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]}); df['Counts'] = df.groupby(['Color']).transform('count')
100 loops, best of 3: 2.87 ms per loop
In [98]: %timeit df = pd.DataFrame({'Color': 'Red Red Blue'.split(), 'Value': [100, 150, 50]}); df['Counts'] = [sum(df['Color'] == df['Color'][i]) for i in xrange(len(df))]
1000 loops, best of 3: 1.03 ms per loop

Solution 6 - Python

Create a column containing the count of repeated values. The values are temporary calculations computed from other columns. Very fast. Credit to @ZakS.

sum_A_B = df['A']+df['B']
sum_A_B_dict = sum_A_B.value_counts().to_dict()
df['sum_A_B'] = sum_A_B.map(sum_A_B_dict) 

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionuser2592989View Question on Stackoverflow
Solution 1 - PythonunutbuView Answer on Stackoverflow
Solution 2 - PythonZakSView Answer on Stackoverflow
Solution 3 - PythonAsclepiusView Answer on Stackoverflow
Solution 4 - Python1''View Answer on Stackoverflow
Solution 5 - PythonSteven C. HowellView Answer on Stackoverflow
Solution 6 - PythonBSalitaView Answer on Stackoverflow