Pandas dataframe total row
PythonPandasPython Problem Overview
I have a dataframe, something like:
foo bar qux
0 a 1 3.14
1 b 3 2.72
2 c 2 1.62
3 d 9 1.41
4 e 3 0.58
and I would like to add a 'total' row to the end of dataframe:
foo bar qux
0 a 1 3.14
1 b 3 2.72
2 c 2 1.62
3 d 9 1.41
4 e 3 0.58
5 total 18 9.47
I've tried to use the sum
command but I end up with a Series, which although I can convert back to a Dataframe, doesn't maintain the data types:
tot_row = pd.DataFrame(df.sum()).T
tot_row['foo'] = 'tot'
tot_row.dtypes:
foo object
bar object
qux object
I would like to maintain the data types from the original data frame as I need to apply other operations to the total row, something like:
baz = 2*tot_row['qux'] + 3*tot_row['bar']
Python Solutions
Solution 1 - Python
Append a totals row with
df.append(df.sum(numeric_only=True), ignore_index=True)
The conversion is necessary only if you have a column of strings or objects.
It's a bit of a fragile solution so I'd recommend sticking to operations on the dataframe, though. eg.
baz = 2*df['qux'].sum() + 3*df['bar'].sum()
Solution 2 - Python
df.loc["Total"] = df.sum()
works for me and I find it easier to remember. Am I missing something? Probably wasn't possible in earlier versions.
I'd actually like to add the total row only temporarily though. Adding it permanently is good for display but makes it a hassle in further calculations.
Just found
df.append(df.sum().rename('Total'))
This prints what I want in a Jupyter notebook and appears to leave the df itself untouched.
Solution 3 - Python
New Method
To get both row and column total:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [10,20],'b':[100,200],'c': ['a','b']})
df.loc['Column_Total']= df.sum(numeric_only=True, axis=0)
df.loc[:,'Row_Total'] = df.sum(numeric_only=True, axis=1)
print(df)
a b c Row_Total
0 10.0 100.0 a 110.0
1 20.0 200.0 b 220.0
Column_Total 30.0 300.0 NaN 330.0
Solution 4 - Python
Use DataFrame.pivot_table
with margins=True
:
import pandas as pd
data = [('a',1,3.14),('b',3,2.72),('c',2,1.62),('d',9,1.41),('e',3,.58)]
df = pd.DataFrame(data, columns=('foo', 'bar', 'qux'))
Original df
:
foo bar qux
0 a 1 3.14
1 b 3 2.72
2 c 2 1.62
3 d 9 1.41
4 e 3 0.58
Since pivot_table
requires some sort of grouping (without the index
argument, it'll raise a ValueError: No group keys passed!
), and your original index is vacuous, we'll use the foo
column:
df.pivot_table(index='foo',
margins=True,
margins_name='total', # defaults to 'All'
aggfunc=sum)
VoilĂ !
bar qux
foo
a 1 3.14
b 3 2.72
c 2 1.62
d 9 1.41
e 3 0.58
total 18 9.47
Solution 5 - Python
Alternative way (verified on Pandas 0.18.1):
import numpy as np
total = df.apply(np.sum)
total['foo'] = 'tot'
df.append(pd.DataFrame(total.values, index=total.keys()).T, ignore_index=True)
Result:
foo bar qux
0 a 1 3.14
1 b 3 2.72
2 c 2 1.62
3 d 9 1.41
4 e 3 0.58
5 tot 18 9.47
Solution 6 - Python
Building on JMZ answer
df.append(df.sum(numeric_only=True), ignore_index=True)
if you want to continue using your current index you can name the sum series using .rename() as follows:
df.append(df.sum().rename('Total'))
This will add a row at the bottom of the table.
Solution 7 - Python
Building on answer from Matthias Kauer.
To add row total:
df.loc["Row_Total"] = df.sum()
To add column total,
df.loc[:,"Column_Total"] = df.sum(axis=1)
Solution 8 - Python
This is the way that I do it, by transposing and using the assign method in combination with a lambda function. It makes it simple for me.
df.T.assign(GrandTotal = lambda x: x.sum(axis=1)).T
Solution 9 - Python
Following helped for me to add a column total and row total to a dataframe.
Assume dft1 is your original dataframe... now add a column total and row total with the following steps.
from io import StringIO
import pandas as pd
#create dataframe string
dfstr = StringIO(u"""
a;b;c
1;1;1
2;2;2
3;3;3
4;4;4
5;5;5
""")
#create dataframe dft1 from string
dft1 = pd.read_csv(dfstr, sep=";")
## add a column total to dft1
dft1['Total'] = dft1.sum(axis=1)
## add a row total to dft1 with the following steps
sum_row = dft1.sum(axis=0) #get sum_row first
dft1_sum=pd.DataFrame(data=sum_row).T #change it to a dataframe
dft1_sum=dft1_sum.reindex(columns=dft1.columns) #line up the col index to dft1
dft1_sum.index = ['row_total'] #change row index to row_total
dft1.append(dft1_sum) # append the row to dft1
Solution 10 - Python
Actually all proposed solutions render the original DataFrame unusable for any further analysis and can invalidate following computations, which will be easy to overlook and could lead to false results.
This is because you add a row to the data, which Pandas cannot differentiate from an additional row of data.
Example:
import pandas as pd
data = [1, 5, 6, 8, 9]
df = pd.DataFrame(data)
df
df.describe()
yields
0 | |
---|---|
0 | 1 |
1 | 5 |
2 | 6 |
3 | 8 |
4 | 9 |
0 | |
---|---|
count | 5 |
mean | 5.8 |
std | 3.11448 |
min | 1 |
25% | 5 |
50% | 6 |
75% | 8 |
max | 9 |
After
df.loc['Totals']= df.sum(numeric_only=True, axis=0)
the dataframe looks like this
0 | |
---|---|
0 | 1 |
1 | 5 |
2 | 6 |
3 | 8 |
4 | 9 |
Totals | 29 |
This looks nice, but the new row is treated as if it was an additional data item, so df.describe
will produce false results:
0 | |
---|---|
count | 6 |
mean | 9.66667 |
std | 9.87252 |
min | 1 |
25% | 5.25 |
50% | 7 |
75% | 8.75 |
max | 29 |
So: Watch out! and apply this only after doing all other analyses of the data or work on a copy of the DataFrame!