Add row to a data frame with total sum for each column
RDataframeR Problem Overview
I have a data frame where I would like to add an additional row that totals up the values for each column. For example, Let's say I have this data:
x <- data.frame(Language=c("C++", "Java", "Python"),
Files=c(4009, 210, 35),
LOC=c(15328,876, 200),
stringsAsFactors=FALSE)
Data looks like this:
Language Files LOC
1 C++ 4009 15328
2 Java 210 876
3 Python 35 200
My instinct is to do this:
y <- rbind(x, c("Total", colSums(x[,2:3])))
And this works, it computes the totals:
> y
Language Files LOC
1 C++ 4009 15328
2 Java 210 876
3 Python 35 200
4 Total 4254 16404
The problem is that the Files and LOC columns have all been converted to strings:
> y$LOC
[1] "15328" "876" "200" "16404"
I understand that this is happening because I created a vector c("Total", colSums(x[,2:3])
with inputs that are both numbers and strings, and it's converting all the elements to a common type so that all of the vector elements are the same. Then the same thing happens to the Files and LOC columns.
What's a better way to do this?
R Solutions
Solution 1 - R
See adorn_totals()
from the janitor package:
library(janitor)
x %>%
adorn_totals("row")
#> Language Files LOC
#> C++ 4009 15328
#> Java 210 876
#> Python 35 200
#> Total 4254 16404
The numeric columns remain of class numeric.
Disclaimer: I created this package, including adorn_totals()
which is made for precisely this task.
Solution 2 - R
A tidyverse
way to do this would be to use bind_rows
(or eventually add_row
) and summarise
to compute the sums. Here the issue is that we want sums for all but one, so a trick would be:
summarise_all(x, ~if(is.numeric(.)) sum(.) else "Total")
In one line:
x %>%
bind_rows(summarise_all(., ~if(is.numeric(.)) sum(.) else "Total"))
Edit with dplyr >=1.0
One can also use across()
, which is slightly more verbose in this case:
x %>%
bind_rows(summarise(.,
across(where(is.numeric), sum),
across(where(is.character), ~"Total")))
Solution 3 - R
Here's a way that gets you what you want, but there may very well be a more elegant solution.
rbind(x, data.frame(Language = "Total", t(colSums(x[, -1]))))
For the record, I prefer Chase's answer if you don't absolutely need the Language
column.
Solution 4 - R
Do you need the Language column in your data, or is it more appropriate to think of that column as the row.names
? That would change your data.frame from 4 observations of 3 variables to 4 observations of 2 variables (Files & LOC).
x <- data.frame(Files = c(4009, 210, 35), LOC = c(15328,876, 200),
row.names = c("C++", "Java", "Python"), stringsAsFactors = FALSE)
x["Total" ,] <- colSums(x)
> x
Files LOC
C++ 4009 15328
Java 210 876
Python 35 200
Total 4254 16404
Solution 5 - R
Try this
y[4,] = c("Total", colSums(y[,2:3]))
Solution 6 - R
If (1) we don't need the "Language"
heading on the first column then we can represent it using row names and if (2) it is ok to label the last row as "Sum"
rather than "Total"
then we can use addmargins
like this:
rownames(x) <- x$Language
addmargins(as.table(as.matrix(x[-1])), 1)
giving:
Files LOC
C++ 4009 15328
Java 210 876
Python 35 200
Sum 4254 16404
If we do want the first column labelled "Language"
and the total row labelled "Total"
then its a bit longer:
rownames(x) <- x$Language
Total <- sum
xa <- addmargins(as.table(as.matrix(x[-1])), 1, FUN = Total)
data.frame(Language = rownames(xa), as.matrix(xa[]), row.names = NULL)
giving:
Language Files LOC
1 C++ 4009 15328
2 Java 210 876
3 Python 35 200
4 Total 4254 16404
Solution 7 - R
Try this
library(tibble)
x %>% add_row( Language="Total",Files = sum(.$Files),LOC = sum(.$LOC) )
Solution 8 - R
Extending the answer of Nicolas Ratto, if you were to have a lot more columns you could use
x %>% add_row(Language = "Total", summarise(., across(where(is.numeric), sum)))
Solution 9 - R
df %>% bind_rows(purrr::map_dbl(.,sum))
Solution 10 - R
Are you sure you really want to have the column totals in your data frame? To me, the data frame's interpretation now depends on the row. For example,
- Rows 1-(n-1): how many files are associated with a particular language
- Row n: how many files are associated with all languages
This gets more confusing if you start to subset your data. For example, suppose you want to know which languages have more than 100 Files:
> x = data.frame(Files=c(4009, 210, 35),
LOC=c(15328,876, 200),
row.names=c("C++", "Java", "Python"),
stringsAsFactors=FALSE)
> x["Total" ,] = colSums(x)
> x[x$Files > 100,]
Files LOC
C++ 4009 15328
Java 210 876
Total 4254 16404#But this refers to all languages!
The Total
row is now wrong!
Personally I would work out the column sums and store them in a separate vector.
Solution 11 - R
Since you mention this is a last step before exporting for presentation, you may have column names that will include spaces in them for clarity (i.e. "Grand Total"). If so, the following will insure that the created data.frame will rbind to the original dataset without an error caused by mismatched column names:
dfTotals <- data.frame(Language="Total",t(colSums(x[,-1]))))
colnames(dfTotals) <- names(x)
rbind(x, dfTotals)
Solution 12 - R
Your original instinct would work if you coerced your columns to numeric:
y$LOC <- as.numeric(y$LOC)
y$Files <- as.numeric(y$Files)
And then apply colSums() and rbind().