Plotting categorical data with pandas and matplotlib


Python Problem Overview


I have a data frame with categorical data:

     colour  direction
1    red     up
2    blue    up
3    green   down
4    red     left
5    red     right
6    yellow  down
7    blue    down

I want to generate some graphs, such as pie charts and histograms, based on the categories. Is this possible without creating dummy numeric variables? Something like:

df.plot(kind='hist')

Python Solutions


Solution 1 - Python

You can simply use value_counts on the series:

df['colour'].value_counts().plot(kind='bar')
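Since the question also asks about pie charts, here is a minimal sketch that draws the same counts as a bar chart and as a pie chart side by side. The figure size and titles are arbitrary choices for illustration, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sample data as in the question.
df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
    "direction": ["up", "up", "down", "left", "right", "down", "down"],
})

# Count how often each category occurs (sorted by count, descending).
counts = df["colour"].value_counts()

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(8, 4))
counts.plot(kind="bar", ax=ax_bar)
ax_bar.set_title("Bar chart")

counts.plot(kind="pie", ax=ax_pie)
ax_pie.set_title("Pie chart")
ax_pie.set_ylabel("")  # drop the axis label that the pie plot adds by default

fig.tight_layout()
# In a script, call plt.show() here to display the figure.
```

No dummy numeric variables are needed: value_counts produces the numbers, and the plot kind only changes how they are drawn.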


Solution 2 - Python

You might find the mosaic plot from statsmodels useful. It can also add statistical highlighting for the variances.

import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

plt.rcParams['font.size'] = 16.0  # larger font for the tile labels
fig, rects = mosaic(df, ['direction', 'colour'])


But beware of zero-sized cells (combinations of categories that never occur): they cause problems with the labels.

See this answer for details
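To see in advance which combinations would end up as zero-sized cells, one option is to cross-tabulate the two columns before plotting. This is only a sketch using pandas' crosstab; it is not part of the original answer and does not change how mosaic itself behaves.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
    "direction": ["up", "up", "down", "left", "right", "down", "down"],
})

# Each cell holds the number of rows with that (direction, colour) pair.
table = pd.crosstab(df["direction"], df["colour"])

# Collect the combinations that never occur in the data.
empty_cells = [
    (direction, colour)
    for direction in table.index
    for colour in table.columns
    if table.loc[direction, colour] == 0
]

print(table)
print(f"{len(empty_cells)} of {table.size} cells are empty")
```

With data this sparse, most combinations are unobserved, which is a hint that the mosaic labels may need manual attention.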

Solution 3 - Python

Group by the column and plot the size of each group:

df.groupby('colour').size().plot(kind='bar')
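The groupby approach yields the same counts as value_counts, but in a different order: groupby sorts by the group key, while value_counts sorts by frequency. A small sketch of the difference, with variable names chosen only for illustration:

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
    "direction": ["up", "up", "down", "left", "right", "down", "down"],
})

# groupby(...).size() orders the result by the group key (alphabetically here).
by_key = df.groupby("colour").size()

# value_counts() orders the result by count, largest first.
by_count = df["colour"].value_counts()

print(by_key)
print(by_count)

# To get a frequency-ordered bar chart from the groupby result, sort it first.
ax = by_key.sort_values(ascending=False).plot(kind="bar")
```

Which ordering is preferable depends on whether the chart should be read by category name or by frequency.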

Solution 4 - Python

You could also use countplot from seaborn. Seaborn builds on matplotlib and works directly with pandas data structures to provide a high-level plotting interface. It gives you good styling and correct axis labels for free.

import pandas as pd
import seaborn as sns
sns.set()

df = pd.DataFrame({'colour': ['red', 'blue', 'green', 'red', 'red', 'yellow', 'blue'],
                   'direction': ['up', 'up', 'down', 'left', 'right', 'down', 'down']})
sns.countplot(x=df['colour'], color='gray')


The bars can also be drawn in the colour that each category names, by mapping every category label to the matplotlib colour of the same name:

sns.countplot(x=df['colour'],
              palette={color: color for color in df['colour'].unique()})
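The palette trick works only because every category label here happens to be a valid matplotlib colour name. As a hedged sketch, the mapping can be built defensively so that labels that are not colours fall back to a default. The helper build_palette and the label "mauve-ish" are hypothetical and only serve the illustration.

```python
import pandas as pd
from matplotlib.colors import is_color_like


def build_palette(labels, fallback="gray"):
    """Map each unique label to itself if matplotlib accepts it as a colour,
    otherwise to the fallback colour. (Hypothetical helper.)"""
    return {
        label: label if is_color_like(label) else fallback
        for label in labels.unique()
    }


# "mauve-ish" is a made-up label that matplotlib does not recognise.
labels = pd.Series(["red", "blue", "green", "red", "mauve-ish"])
palette = build_palette(labels)
print(palette)
```

The resulting dictionary can be passed as the palette argument in the countplot call in place of the dictionary comprehension.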


Solution 5 - Python

To plot multiple categorical features as bar charts on the same plot, I would suggest:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {
        "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
        "direction": ["up", "up", "down", "left", "right", "down", "down"],
    }
)

categorical_features = ["colour", "direction"]
fig, ax = plt.subplots(1, len(categorical_features))
for i, categorical_feature in enumerate(df[categorical_features]):
    df[categorical_feature].value_counts().plot(kind="bar", ax=ax[i]).set_title(categorical_feature)
fig.show()
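One caveat with the loop above: when only one feature is listed, plt.subplots returns a single Axes object rather than an array, so indexing it with ax[i] fails. A possible variation, sketched here with a hypothetical helper named plot_counts, passes squeeze=False so that the result is always a 2-D array of Axes.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sample data as in the question.
df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
    "direction": ["up", "up", "down", "left", "right", "down", "down"],
})


def plot_counts(frame, features):
    """Draw one bar chart of category counts per feature. (Hypothetical helper.)"""
    # squeeze=False always returns a 2-D array of Axes, even for one subplot.
    fig, axes = plt.subplots(1, len(features), squeeze=False)
    for ax, feature in zip(axes[0], features):
        frame[feature].value_counts().plot(kind="bar", ax=ax)
        ax.set_title(feature)
    fig.tight_layout()
    return fig, axes


fig_two, axes_two = plot_counts(df, ["colour", "direction"])
fig_one, axes_one = plot_counts(df, ["colour"])  # a single feature also works
# In a script, call plt.show() here to display the figures.
```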


Solution 6 - Python

You can use value_counts with the sort option set to False. The counts are then not reordered by frequency, which preserves the ordering of the categories:

df['colour'].value_counts(sort=False).plot.bar(rot=0)
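If the bars should appear in one specific order regardless of how the counts come back, a possible approach is to reindex the counts explicitly. The order list below is a hypothetical choice made for this sketch.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
    "direction": ["up", "up", "down", "left", "right", "down", "down"],
})

# Hypothetical display order for the bars.
order = ["red", "yellow", "green", "blue"]

# reindex rearranges the counts to follow `order`; fill_value=0 covers
# any category in `order` that does not occur in the data.
counts = df["colour"].value_counts().reindex(order, fill_value=0)
print(counts)

ax = counts.plot.bar(rot=0)
```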


Solution 7 - Python

Using Plotly Express:

import plotly.express as px
px.bar(df["colour"].value_counts())

Solution 8 - Python

This builds on Roman's answer (Solution 5). In recent pandas versions, kind must be passed as a keyword argument (kind="bar"), because Series.plot no longer accepts positional arguments:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {
        "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
        "direction": ["up", "up", "down", "left", "right", "down", "down"],
    }
)

categorical_features = ["colour", "direction"]
fig, ax = plt.subplots(1, len(categorical_features))
for i, categorical_feature in enumerate(df[categorical_features]):
    df[categorical_feature].value_counts().plot(kind="bar", ax=ax[i]).set_title(categorical_feature)
fig.show()

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author    Original Content on Stack Overflow
Question             Ivan               View Question on Stack Overflow
Solution 1 - Python  Alexander          View Answer on Stack Overflow
Solution 2 - Python  Primer             View Answer on Stack Overflow
Solution 3 - Python  steboc             View Answer on Stack Overflow
Solution 4 - Python  Jarno              View Answer on Stack Overflow
Solution 5 - Python  Roman Orac         View Answer on Stack Overflow
Solution 6 - Python  msenior_           View Answer on Stack Overflow
Solution 7 - Python  Biman Pal          View Answer on Stack Overflow
Solution 8 - Python  ahtasham nazeer    View Answer on Stack Overflow