Plotting grouped data in same plot using Pandas
Tags: Python, Pandas, Matplotlib, Seaborn
Python Problem Overview
In Pandas, I am doing:
bp = p_df.groupby('class').plot(kind='kde')
p_df is a DataFrame object.
However, this is producing two plots, one for each class. How do I force one plot with both classes in the same plot?
Python Solutions
Solution 1 - Python
Version 1:
You can create your axes first, and then use the ax keyword of DataFrameGroupBy.plot to draw everything onto these axes:
import matplotlib.pyplot as plt
import pandas as pd
p_df = pd.DataFrame({"class": [1,1,2,2,1], "a": [2,3,2,3,2]})
fig, ax = plt.subplots(figsize=(8,6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)
This is the result:
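Unfortunately, the legend labels are not very useful here: the entries are named after the plotted columns rather than the groups, so the curves of the two classes cannot be told apart.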
Version 2:
Another way would be to loop through the groups and plot the curves manually:
import matplotlib.pyplot as plt
import pandas as pd
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
p_df = pd.DataFrame({"class": classes, "vals": vals})
fig, ax = plt.subplots(figsize=(8,6))
# draw one KDE curve per group on the shared axes, labeled with the group name
for label, df in p_df.groupby('class'):
    df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()
This way you can easily control the legend. This is the result:
Solution 2 - Python
Another approach is to use the seaborn module. It plots the two density estimates on the same axes without needing a variable to hold the axes (using some of the data frame setup from the other answer):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
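# %matplotlib inline  # uncomment when running in a Jupyter notebook
# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
# the data frame (the "indices" column was dropped: idx was never defined and is not used below)
df = pd.DataFrame({"cls": classes, "vals": vals})
# this is to plot the kde
sns.kdeplot(df.vals[df.cls == "c1"], label='c1')
sns.kdeplot(df.vals[df.cls == "c2"], label='c2')
# show the legend so the two labels are visible
plt.legend()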
# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.show()
This results in the following image.
Solution 3 - Python
import matplotlib.pyplot as plt
p_df.groupby('class').plot(kind='kde', ax=plt.gca())
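This answer comes without explanation. A minimal sketch of what it appears to rely on: plt.gca() returns the current Axes (creating one if none exists yet), so every group is drawn onto that same Axes instead of a new figure. The sample data is borrowed from Solution 1, and selecting the 'vals' column before plotting is an assumption made here so that only the value column is estimated.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample data reused from Solution 1 (Version 2)
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1, 3, 5, 1, 3] + [2, 6, 7, 5, 2]
p_df = pd.DataFrame({"class": classes, "vals": vals})

# Grab the current Axes once; each group's KDE is then drawn onto it
ax = plt.gca()
p_df.groupby("class")["vals"].plot(kind="kde", ax=ax)

# One curve per group ends up on the single shared Axes
n_curves = len(ax.get_lines())
print(n_curves)
```

Note that kind='kde' in pandas needs SciPy to be installed.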
Solution 4 - Python
Maybe you can try this:
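import matplotlib.pyplot as plt
# df is assumed to be a DataFrame with 'class' and 'vals' columns
fig, ax = plt.subplots(figsize=(10,8))
# 'class' is a Python keyword, so use df['class'] rather than df.class
classes = list(df['class'].unique())
for c in classes:
    df2 = df.loc[df['class'] == c]
    df2.vals.plot(kind="kde", ax=ax, label=c)
plt.legend()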
Solution 5 - Python
- There are two easy methods to plot each group in the same plot.
  - When using pandas.DataFrame.groupby, the column to be plotted (e.g. the aggregation column) should be specified.
  - Use seaborn.kdeplot or seaborn.displot and specify the hue parameter.
- Using pandas v1.2.4, matplotlib 3.4.2, seaborn 0.11.1
- The OP is specific to plotting the kde, but the steps are the same for many plot types (e.g. kind='line', sns.lineplot, etc.).
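As a hedged illustration of the last point, the same group-per-curve pattern with kind='line' might look like the sketch below. The frame df_demo and its x and y columns are hypothetical and exist only for this example; they are not part of the geyser data used in the rest of this answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: two groups measured at the same x positions
df_demo = pd.DataFrame({
    "kind": ["long", "long", "long", "short", "short", "short"],
    "x": [0, 1, 2, 0, 1, 2],
    "y": [3.6, 3.3, 4.5, 1.8, 2.3, 2.0],
})

fig, ax = plt.subplots(figsize=(8, 6))
# Same idea as for the kde: one call per group, all on the shared Axes
for name, group in df_demo.groupby("kind"):
    group.set_index("x")["y"].plot(kind="line", ax=ax, label=name)
ax.legend()

# The legend labels are the group names
labels = ax.get_legend_handles_labels()[1]
print(labels)
```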
Imports and Sample Data
- For the sample data, the groups are in the 'kind' column, and the kde of 'duration' will be plotted, ignoring 'waiting'.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('geyser')
# display(df.head())
   duration  waiting   kind
0     3.600       79   long
1     1.800       54  short
2     3.333       74   long
3     2.283       62  short
4     4.533       85   long
Plot with pandas.DataFrame.plot
- Reshape the data using .groupby or .pivot
.groupby
- Specify the aggregation column, ['duration'], and kind='kde'.
ax = df.groupby('kind')['duration'].plot(kind='kde', legend=True)
.pivot
ax = df.pivot(columns='kind', values='duration').plot(kind='kde')
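To see why .pivot yields one curve per group, it can help to look at the reshaped frame itself. The sketch below rebuilds only the five sample rows shown above (the name df_head is made up for this example) and pivots them, so each value of 'kind' becomes its own column.

```python
import pandas as pd

# The five rows from df.head() shown above, rebuilt by hand
df_head = pd.DataFrame({
    "duration": [3.600, 1.800, 3.333, 2.283, 4.533],
    "waiting": [79, 54, 74, 62, 85],
    "kind": ["long", "short", "long", "short", "long"],
})

# Wide format: one column per group, NaN where a row belongs to the other group
wide = df_head.pivot(columns="kind", values="duration")
print(wide)

# Number of missing cells in each group's column
missing_per_group = wide.isna().sum()
print(missing_per_group)
```

DataFrame.plot draws one curve per column, which is why the wide frame gives one kde per group. As far as I know, pandas drops the missing values in each column before estimating the density, so the NaN cells should not affect the curves.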
Plot with seaborn.kdeplot
- Specify hue='kind'
ax = sns.kdeplot(data=df, x='duration', hue='kind')
Plot with seaborn.displot
- Specify hue='kind' and kind='kde'
fig = sns.displot(data=df, kind='kde', x='duration', hue='kind')