How to get value counts for multiple columns at once in Pandas DataFrame?

Python, NumPy, Pandas

Python Problem Overview


Given a Pandas DataFrame that has multiple columns with categorical values (0 or 1), is it possible to conveniently get the value_counts for every column at the same time?

For example, suppose I generate a DataFrame as follows:

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))

I can get a DataFrame like this:

   a  b  c  d
0  0  1  1  0
1  1  1  1  1
2  1  1  1  0
3  0  1  0  0
4  0  0  0  1
5  0  1  1  0
6  0  1  1  1
7  1  0  1  0
8  1  0  1  1
9  0  1  1  0

How do I conveniently get the value counts for every column at once and obtain the following result?

   a  b  c  d
0  6  3  2  6
1  4  7  8  4

My current solution is:

pieces = []
for col in df.columns:
    tmp_series = df[col].value_counts()
    tmp_series.name = col
    pieces.append(tmp_series)
df_value_counts = pd.concat(pieces, axis=1)

But there must be a simpler way, like stacking, pivoting, or groupby?

Python Solutions


Solution 1 - Python

Just call apply and pass pd.Series.value_counts:

In [212]:
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
df.apply(pd.Series.value_counts)
Out[212]:
   a  b  c  d
0  4  6  4  3
1  6  4  6  7
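
Note that if one of the values (0 or 1 here) never appears in a particular column, apply leaves a NaN there and the counts are cast to float. A small follow-up sketch (reusing the df from the question) that fills the gaps and restores integer counts:

df.apply(pd.Series.value_counts).fillna(0).astype(int)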

Solution 2 - Python

There is actually a fairly interesting and more advanced way of solving this problem with crosstab and melt:

df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

df

       a       b       c
0  table    lamp  mirror
1  chair  candle  mirror
2  chair   chair  mirror
3   lamp    lamp  mirror
4    bed     bed  mirror

We can first melt the DataFrame:

df1 = df.melt(var_name='columns', value_name='index')
df1

   columns   index
0        a   table
1        a   chair
2        a   chair
3        a    lamp
4        a     bed
5        b    lamp
6        b  candle
7        b   chair
8        b    lamp
9        b     bed
10       c  mirror
11       c  mirror
12       c  mirror
13       c  mirror
14       c  mirror

And then use the crosstab function to count the values for each column. This preserves the data type as ints, which wouldn't be the case with the currently selected answer:

pd.crosstab(index=df1['index'], columns=df1['columns'])

columns  a  b  c
index           
bed      1  1  0
candle   0  1  0
chair    2  1  0
lamp     1  2  0
mirror   0  0  5
table    1  0  0

Or in one line: because the melted frame's columns are deliberately named 'index' and 'columns', the ** unpacking expands them straight into crosstab's index and columns keyword arguments (this is the advanced part):

pd.crosstab(**df.melt(var_name='columns', value_name='index'))

Also, value_counts is available as a top-level function, so the currently selected answer can be simplified to the following. (Note that pd.value_counts has since been deprecated in newer pandas releases, where df.apply(pd.Series.value_counts) remains the recommended spelling.)

df.apply(pd.value_counts)

Solution 3 - Python

To get the counts only for specific columns:

df[['a', 'b']].apply(pd.Series.value_counts)

where df is the name of your dataframe and 'a' and 'b' are the columns for which you want to count the values.

Solution 4 - Python

You can also try this code:

# 'heart' is this answer's example DataFrame; substitute your own
for i in heart.columns:
    x = heart[i].value_counts()
    print("Column name is:", i, "and its value counts are:", x)

Solution 5 - Python

A solution that selects all the categorical columns and builds a DataFrame with all their value counts at once:

df = pd.DataFrame({
    'fruits': ['apple', 'mango', 'apple', 'mango', 'mango', 'pear', 'mango'],
    'vegetables': ['cucumber', 'eggplant', 'tomato', 'tomato', 'tomato', 'tomato', 'pumpkin'],
    'sauces': ['chili', 'chili', 'ketchup', 'ketchup', 'chili', '1000 islands', 'chili']})

cat_cols = df.select_dtypes(include=object).columns.tolist()
(pd.DataFrame(
    df[cat_cols]
    .melt(var_name='column', value_name='value')
    .value_counts())
.rename(columns={0: 'counts'})
.sort_values(by=['column', 'counts']))

                          counts
column      value               
fruits      pear               1
            apple              2
            mango              4
sauces      1000 islands       1
            ketchup            2
            chili              4
vegetables  pumpkin            1
            eggplant           1
            cucumber           1
            tomato             4
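
One caveat: on newer pandas versions (2.0+), DataFrame.value_counts returns a Series that is already named count, so renaming column 0 to counts may no longer apply. A version-agnostic sketch of the same idea, naming the Series explicitly before resetting the index:

(df[cat_cols]
 .melt(var_name='column', value_name='value')
 .value_counts()        # Series of counts indexed by (column, value)
 .rename('counts')      # give the Series an explicit name on any pandas version
 .reset_index()
 .sort_values(by=['column', 'counts']))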

Solution 6 - Python

Your own solution, wrapped into a single line, looks even simpler than using groupby, stacking, etc.:

pd.concat([df[column].value_counts() for column in df], axis = 1)

Solution 7 - Python

This is what worked for me:

for column in df.columns:
     print("\n" + column)
     print(df[column].value_counts())


Solution 8 - Python

You can use a lambda function:

df.apply(lambda x: x.value_counts())
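
The lambda form also makes it easy to pass arguments through to value_counts; for example, a small sketch that returns proportions instead of raw counts:

df.apply(lambda x: x.value_counts(normalize=True))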

Solution 9 - Python

I ran into this question while looking for a better way of doing what I was doing. It turns out that calling df.apply(pd.value_counts) on a DataFrame whose columns each have many distinct values results in a pretty substantial performance hit.

In this case, it is better to simply iterate over the non-numeric columns in a dictionary comprehension, and leave it as a dictionary:

types_to_count = {"object", "category", "string"}
result = {
    col: df[col].value_counts()
    for col in df.columns[df.dtypes.isin(types_to_count)]
}

The filtering by types_to_count helps to ensure you don't try to take the value_counts of continuous data.
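
A quick usage sketch under the same assumptions: each entry in result is an ordinary value_counts Series keyed by column name, so the counts can be inspected one column at a time:

for col, counts in result.items():
    print(col)
    print(counts.head(10))  # the ten most frequent values in this column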

Solution 10 - Python

Another solution that can be used:

df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
l1 = pd.DataFrame()  # start from an empty DataFrame so concat doesn't add a stray all-NaN column
for var in df.columns:
    l2 = df[var].value_counts()
    l1 = pd.concat([l1, l2], axis = 1)
l1

Solution 11 - Python

Sometimes columns form a hierarchy; in that case I recommend grouping by them and then counting:

# note: '_id' can be any column; it is only used by len() to count the rows in each group
cat_cols = ['column_1', 'column_2']
df.groupby(cat_cols).agg(count=('_id', lambda x: len(x)))

                        count
column_1   column_2          
category_1 Excelent        19
           Good            11
           Bad              1
category_2 Happy           48
           Good mood      158
           Serious         62
           Sad             10
           Depressed        8

Bonus: you can change len(x) to x.nunique() or any other aggregation you want.
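
For plain row counts, the same result can also be obtained without a lambda via groupby(...).size(); a minimal sketch assuming the same cat_cols:

df.groupby(cat_cols).size().to_frame('count')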

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: Xin
Solution 1: EdChum
Solution 2: Ted Petrou
Solution 3: mOna
Solution 4: Ajay Kumar
Solution 5: Serge Tochilov
Solution 6: Simon Osadchii
Solution 7: jcdevilleres
Solution 8: Mykola Zotko
Solution 9: PMende
Solution 10: Mujeebur Rahman
Solution 11: Abimael Domínguez