dplyr: how to reference columns by column index rather than column name using mutate?


R Problem Overview


Using dplyr, you can do something like this:

library(dplyr)

iris %>% head %>% mutate(sum = Sepal.Length + Sepal.Width)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

But above, I referenced the columns by their names. How can I use 1 and 2, which are the column indices, to achieve the same result?

I have the following, but I feel it's not as elegant:

iris %>% head %>% mutate(sum=apply(select(.,1,2),1,sum))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

R Solutions


Solution 1 - R

Inside the pipe, . is the whole data frame, so you can try extracting the columns by position with [[:

iris %>% head %>% mutate(sum = .[[1]] + .[[2]])

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3
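One caveat: because . is the whole data frame that was piped in, .[[1]] ignores any group_by(), and on grouped data (e.g. after group_by(Species)) it fails with a size error. A possible workaround, sketched here under the assumption that you only need to add two columns, is to translate the positions into names first and then refer to them through dplyr's .data pronoun, which is evaluated per group. The variables idx and cols are illustrative names, not part of the original answer.

```r
library(dplyr)

# Illustrative: the positions of the two columns to add.
idx <- c(1, 2)
# Translate the positions into column names once, outside mutate().
cols <- names(iris)[idx]

res <- iris %>%
  group_by(Species) %>%
  # .data[["name"]] is looked up in the current group's data, so this
  # works on grouped data where .[[1]] + .[[2]] would fail.
  mutate(sum = .data[[cols[1]]] + .data[[cols[2]]]) %>%
  ungroup()

head(res)
```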

Solution 2 - R

I'm a bit late to the game, but my personal strategy in cases like this is to write my own tidyverse-compliant function that will do exactly what I want. By tidyverse-compliant, I mean that the first argument of the function is a data frame and that the output is a vector that can be added to the data frame.

# Element-wise sum of two columns of x; col1 and col2 may be positions or names
sum_cols <- function(x, col1, col2){
  x[[col1]] + x[[col2]]
}

iris %>%
  head %>%
  mutate(sum = sum_cols(x = ., col1 = 1, col2 = 2))
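Because [[ accepts either a position or a name, the same helper works with column names as well. A quick check, assuming dplyr is installed; sum_cols() is redefined so the snippet runs on its own, and by_index and by_name are illustrative names:

```r
library(dplyr)

# Same helper as above: [[ accepts a position (1) or a name ("Sepal.Length").
sum_cols <- function(x, col1, col2) {
  x[[col1]] + x[[col2]]
}

by_index <- iris %>% head %>% mutate(sum = sum_cols(x = ., col1 = 1, col2 = 2))
by_name <- iris %>%
  head %>%
  mutate(sum = sum_cols(x = ., col1 = "Sepal.Length", col2 = "Sepal.Width"))

# Both calls add the same sum column.
identical(by_index, by_name)
```

Like the .[[1]] + .[[2]] approach, the helper receives the whole piped data frame through ., so it is meant for ungrouped data.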

Solution 3 - R

What do you think about this version?
Inspired by @SavedByJesus's answer.

applySum <- function(df, ...) {
  assertthat::assert_that(...length() > 0, msg = "one or more column indexes are required")
  mutate(df, Sum = apply(as.data.frame(df[, c(...)]), 1, sum))
}

iris %>%
  head(2) %>%
  applySum(1, 2)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
#
### you can select and sum more than two columns with the same function
#
iris %>%
  head(2) %>%
  applySum(1, 2, 3, 4)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species  Sum
1          5.1         3.5          1.4         0.2  setosa 10.2
2          4.9         3.0          1.4         0.2  setosa  9.5
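A possible variation, sketched below, keeps the same interface but replaces apply(..., 1, sum) with the vectorised rowSums() and drops the assertthat dependency in favour of base stopifnot(). The name rowSumCols is hypothetical. List-style indexing, df[c(...)], always returns a data frame even for a single column, so the as.data.frame() wrapper is not needed.

```r
library(dplyr)

# Hypothetical variant of applySum(): rowSums() instead of apply(..., 1, sum).
# Like applySum(), it indexes df directly, so it is meant for ungrouped data.
rowSumCols <- function(df, ...) {
  stopifnot(...length() > 0)  # at least one column index is required
  mutate(df, Sum = rowSums(df[c(...)]))
}

res <- iris %>% head(2) %>% rowSumCols(1, 2, 3, 4)
res
```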

Solution 4 - R

To address the issue that @pluke raises in the comments: dplyr doesn't really support column indices.

It's not a perfect solution, but you can use base R to get around this, for example by overwriting the first column in place:

iris[1] <- iris[1] + iris[2]
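If you would rather stay inside a dplyr pipeline, one hedged sketch is to look up the target column's name from its index and use rlang's injection syntax (!! together with :=) so that mutate() overwrites that column. The variables first and second are illustrative, and the example works on a copy (df) rather than modifying iris itself.

```r
library(dplyr)

df <- head(iris)

# Translate the positions into names (illustrative variables).
first <- names(df)[1]
second <- names(df)[2]

# !!first := ... uses the string stored in `first` as the output name,
# so column 1 is overwritten in place, like df[1] <- df[1] + df[2].
out <- df %>% mutate(!!first := .data[[first]] + .data[[second]])

head(out)
```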

Solution 5 - R

An alternative to reusing . in mutate() that respects grouping is dplyr::cur_data_all(). From help(cur_data_all):

> cur_data_all() gives the current data for the current group (including grouping variables)

Consider the following:

iris %>% group_by(Species) %>% mutate(sum = .[[1]] + .[[2]]) %>% head
#Error: Problem with `mutate()` column `sum`.
#ℹ `sum = .[[1]] + .[[2]]`.
#ℹ `sum` must be size 50 or 1, not 150.
#ℹ The error occurred in group 1: Species = setosa.

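Using cur_data_all() instead avoids the problem, because it holds only the rows of the current group (the example below is run on ungrouped data). Note that select() returns a one-column data frame rather than a vector, which is why the new column's header below reads Sepal.Length instead of sum: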

iris %>% mutate(sum = select(cur_data_all(),1) + select(cur_data_all(),2)) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length
#1          5.1         3.5          1.4         0.2  setosa          8.6
#2          4.9         3.0          1.4         0.2  setosa          7.9
#3          4.7         3.2          1.3         0.2  setosa          7.9
#4          4.6         3.1          1.5         0.2  setosa          7.7
#5          5.0         3.6          1.4         0.2  setosa          8.6
#6          5.4         3.9          1.7         0.4  setosa          9.3

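The same approach works with the extract operator ([[), which returns a plain vector, so the new column is named sum as expected. This example uses cur_data(), which is like cur_data_all() but excludes the grouping variables: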

iris %>% mutate(sum = cur_data()[[1]] + cur_data()[[2]]) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
#1          5.1         3.5          1.4         0.2  setosa 8.6
#2          4.9         3.0          1.4         0.2  setosa 7.9
#3          4.7         3.2          1.3         0.2  setosa 7.9
#4          4.6         3.1          1.5         0.2  setosa 7.7
#5          5.0         3.6          1.4         0.2  setosa 8.6
#6          5.4         3.9          1.7         0.4  setosa 9.3
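In dplyr 1.1.0 and later, cur_data() and cur_data_all() are deprecated in favour of pick(), which accepts tidyselect column specifications, including positions. A sketch of the grouped case with pick(), assuming dplyr >= 1.1.0 is installed:

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where pick() was introduced

res <- iris %>%
  group_by(Species) %>%
  # pick(1, 2) returns a data frame holding the selected columns for the
  # current group only; rowSums() turns it into one sum per row.
  mutate(sum = rowSums(pick(1, 2))) %>%
  ungroup()

head(res)
```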

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type     Original Author    Original Content on Stackoverflow
Question         Alby               View Question on Stackoverflow
Solution 1 - R   jeremycg           View Answer on Stackoverflow
Solution 2 - R   SavedByJESUS       View Answer on Stackoverflow
Solution 3 - R   benaja             View Answer on Stackoverflow
Solution 4 - R   Nina Sonneborn     View Answer on Stackoverflow
Solution 5 - R   Ian Campbell       View Answer on Stackoverflow