Selecting columns in R data frame based on those *not* in a vector

R, Dataframe, Subset

R Problem Overview


I'm familiar with being able to extract columns from an R data frame (or matrix) like so:

df.2 <- df[, c("name1", "name2", "name3")]

But can one use a ! or other tool to select all but those listed columns?

For background, I have a data frame with quite a few column vectors and I'd like to avoid:

  • Typing out the majority of the names when I could just remove a minority
  • Using the much shorter df.2 <- df[, c(1,3,5)] because when my .csv file changes, my code goes to heck since the numbering isn't the same anymore. I'm new to R and think I've learned the hard way not to use number vectors for larger df's that might change.

I tried:

df.2 <- df[, !c("name1", "name2", "name3")]
df.2 <- df[, !=c("name1", "name2", "name3")]

And just as I was typing this, found out that this works:

df.2 <- df[, !names(df) %in% c("name1", "name2", "name3")]

Is there a better way than this last one?

R Solutions


Solution 1 - R

An alternative to grep (see Solution 2) is which:

df.2 <- df[, -which(names(df) %in% c("name1", "name2", "name3"))]
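
One caveat with the negative-which form: if none of the listed names exist, which() returns integer(0), and a negative empty index selects no columns at all rather than all of them. Here is a minimal sketch of that edge case, using a made-up toy data frame (the column names and the missing_names vector are purely illustrative):

```r
# Toy data frame; the column names are made up for illustration.
df <- data.frame(name1 = 1:3, name2 = 4:6, other = 7:9)

# Normal case: drops name1 and name2, leaving only "other".
dropped <- df[, -which(names(df) %in% c("name1", "name2")), drop = FALSE]

# Edge case: none of these names exist, so which() returns integer(0)
# and the negative index ends up selecting *no* columns.
missing_names <- c("not_here", "nor_this")
empty <- df[, -which(names(df) %in% missing_names), drop = FALSE]

# The logical form from the question keeps every column in that case.
safe <- df[, !names(df) %in% missing_names, drop = FALSE]

ncol(empty)  # 0
ncol(safe)   # 3
```

If the list of names to drop can ever be empty or stale, the logical-index form is probably the safer choice.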

Solution 2 - R

You can make a shorter call that is also more generalizable with negative-grep:

df.2 <- df[, -grep("^name[1-3]$", names(df))]

Since grep returns numeric positions, you can use negative vector indexing to remove columns. You could add further digits to the character class or use more complex patterns.
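
grep() can also hand back the names to keep directly, through its invert and value arguments, which avoids negative indexing altogether. A small sketch on a made-up data frame (the columns name10 and other are assumptions added to show what the anchors do):

```r
# Toy data frame; "name10" and "other" are included to test the pattern.
df <- data.frame(name1 = 1, name2 = 2, name3 = 3, name10 = 4, other = 5)

# [1-3] is a character class matching a single digit from 1 to 3;
# the anchors ^ and $ stop "name10" from matching.
# invert = TRUE returns the non-matching elements, value = TRUE returns names.
keep <- grep("^name[1-3]$", names(df), invert = TRUE, value = TRUE)

df.2 <- df[, keep, drop = FALSE]
names(df.2)
```

Because keep is a character vector of names, nothing unexpected happens when the pattern matches no columns: every column is simply kept.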

Solution 3 - R

dplyr::select() has several options for dropping specific columns:

library(dplyr)

drop_columns <- c('cyl','disp','hp')
mtcars %>% 
  select(-one_of(drop_columns)) %>% 
  head(2)

              mpg drat    wt  qsec vs am gear carb
Mazda RX4      21  3.9 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21  3.9 2.875 17.02  0  1    4    4

Negating specific column names, the following drops the column "hp" and the columns from "qsec" through "gear":

mtcars %>% 
  select(-hp, -(qsec:gear)) %>% 
  head(2)

              mpg cyl disp drat    wt carb
Mazda RX4      21   6  160  3.9 2.620    4
Mazda RX4 Wag  21   6  160  3.9 2.875    4

You could also negate contains(), starts_with(), ends_with(), or matches():

mtcars %>% 
  select(-contains('t')) %>%
  select(-starts_with('a')) %>% 
  select(-ends_with('b')) %>% 
  select(-matches('^m.+g$')) %>% 
  head(2)

              cyl disp  hp  qsec vs gear
Mazda RX4       6  160 110 16.46  0    4
Mazda RX4 Wag   6  160 110 17.02  0    4
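
In more recent dplyr releases (1.0.0 onward, as far as I know), one_of() is marked as superseded in favour of any_of() and all_of(). A hedged sketch of the any_of() variant, which skips names that are not present (the name "not_a_column" is deliberately made up):

```r
library(dplyr)

# "not_a_column" is deliberately absent from mtcars.
drop_columns <- c("cyl", "disp", "hp", "not_a_column")

# any_of() silently ignores names that do not exist in the data;
# all_of() would raise an error for the missing name instead.
kept <- mtcars %>%
  select(-any_of(drop_columns))

names(kept)
```

Use all_of() instead when a missing column should be treated as a mistake rather than ignored.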

Solution 4 - R

Old thread, but here's another solution:

df.2 <- subset(df, select=-c(name1, name2, name3))

This was posted in another similar thread (though I can't find it right now). Should be sustainable code in the situation you describe, and is probably easier to read and edit than some of the other options.
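
For a concrete run, here is the same call on the built-in mtcars data set. The unquoted names work because subset() evaluates select with the column names standing in for their positions. Note that the help page for subset() describes it as a convenience function for interactive use, so inside functions the `[` forms may be the safer choice.

```r
# Drop three columns from the built-in mtcars data set by unquoted name.
df.2 <- subset(mtcars, select = -c(cyl, disp, hp))

names(df.2)
```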

Solution 5 - R

You could make a custom function to do this if you're using it for your own use to manipulate data. I may do something like this:

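rm.col <- function(df, ...) {
    # Capture the unevaluated column names passed via ...
    x <- substitute(...())
    # Convert each captured symbol to a character string
    # (trimws() is base R; the original Trim() is not)
    z <- trimws(unlist(lapply(x, as.character)))
    df[, !names(df) %in% z]
}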

rm.col(mtcars, hp, mpg)

The first argument is the data frame. The remaining ... arguments are the names of any columns you wish to remove.

Solution 6 - R

The easiest way that comes to my mind:

filtered_df <- df[, setdiff(names(df), c("name1", "name2"))]

Essentially, you are computing the set difference between the full list of column names and the subset you want to filter out (name1 and name2 above).
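
Two properties of the setdiff() approach are worth seeing in action: names that are not in the data frame are simply ignored, and the surviving columns keep their original order. The sketch below uses a made-up data frame, and "zzz" is an intentionally non-existent column. It also adds drop = FALSE, which is my addition rather than part of the answer above, to keep the result a data frame when only one column remains.

```r
# Toy data frame; column names are made up for illustration.
df <- data.frame(name1 = 1:2, name2 = 3:4, name3 = 5:6)

# "zzz" is not a column; setdiff() simply ignores it.
keep <- setdiff(names(df), c("name1", "name2", "zzz"))

# Only one column is left, so drop = FALSE keeps the result a data frame.
filtered_df <- df[, keep, drop = FALSE]

# Without drop = FALSE, the single remaining column collapses to a vector.
as_vector <- df[, keep]
```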

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author    Original Content on Stackoverflow
Question          Hendy              View Question on Stackoverflow
Solution 1 - R    harkmug            View Answer on Stackoverflow
Solution 2 - R    IRTFM              View Answer on Stackoverflow
Solution 3 - R    sbha               View Answer on Stackoverflow
Solution 4 - R    mflo-ByeSE         View Answer on Stackoverflow
Solution 5 - R    Tyler Rinker       View Answer on Stackoverflow
Solution 6 - R    Ahmed Osman        View Answer on Stackoverflow