How to drop columns by name pattern in R?

R Problem Overview


I have this dataframe:

state county city region mmatrix X1 X2 X3 A1 A2  A3    B1 B2 B3 C1 C2 C3
    1      1    1      1  111010  1  0  0  2 20 200  Push  8 12 NA NA NA
    1      2    1      1  111010  1  0  0  4 NA 400 Shove  9 NA

Now I want to exclude columns whose names end with a certain string, say "1" (e.g. A1 and B1). I wrote this code:

df_redacted <- df[, -grep("\\1$", colnames(df))]

However, this seems to delete every column. How can I modify the code so that it deletes only the columns that match the pattern (i.e. those ending with "1" or any other string)?

The solution has to handle a dataframe that has both numerical and categorical values.

R Solutions


Solution 1 - R

I found a simple answer using dplyr/tidyverse. The following drops every column whose name contains "This":

library(dplyr) 
df_new <- df %>% select(-contains("This"))
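One detail worth knowing: as far as I know, contains() ignores case by default (its ignore.case argument defaults to TRUE), so "This" would also match a column named this_b. A minimal sketch with a made-up data frame (the column names are purely illustrative):

```r
library(dplyr)

# Hypothetical data frame mixing numeric, character and logical columns
df <- data.frame(
  ID     = 1:3,
  This_a = c(1.5, 2.5, 3.5),
  this_b = c("x", "y", "z"),
  other  = c(TRUE, FALSE, TRUE)
)

# Default matching is case-insensitive: This_a and this_b are both dropped
loose <- df %>% select(-contains("This"))

# Case-sensitive matching: only This_a is dropped
strict <- df %>% select(-contains("This", ignore.case = FALSE))

print(names(loose))   # "ID" "other"
print(names(strict))  # "ID" "this_b" "other"
```

If the match has to be exact about case, passing ignore.case = FALSE is probably the safer choice.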

Solution 2 - R

Your code works like a charm if I apply it to a minimal example and just search for the string "A":

df <- data.frame(ID = 1:10,
                 A1 = rnorm(10),
                 A2 = rnorm(10),
                 B1 = letters[1:10],
                 B2 = letters[11:20])
df[, -grep("A", colnames(df))]

So your problem lies in the regular expression rather than in how you drop columns. If I run your code, I get an error:

df[, -grep("\\3$", colnames(df))]
Error in grep("\\3$", colnames(df)) : 
  invalid regular expression '\3$', reason 'Invalid back reference'
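
The reason, as I understand it: inside an R string, "\\1" becomes the regex \1, which is a back reference to the first capture group, and this pattern has no such group. A digit is not a special character, so it needs no escaping. A small base R sketch on a hypothetical vector of column names:

```r
cols <- c("ID", "A1", "A2", "B1", "B2")

# "\\1$" is the regex \1$ : a back reference without any capture group.
# Any error is caught so that the script keeps running.
bad <- tryCatch(
  suppressWarnings(grep("\\1$", cols)),
  error = function(e) conditionMessage(e)
)
print(bad)

# A plain "1" followed by the end-of-string anchor is all that is needed
good <- grep("1$", cols)
print(good)  # 2 4

# A back reference is valid once a group exists: "(.)\\1$" matches names
# whose last two characters are identical
doubled <- grep("(.)\\1$", c("A11", "A12"))
print(doubled)  # 1
```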

Update: Why don't you just use the following expression?

df[, -grep("1$", colnames(df))]
   ID         A2 B2
1   1  2.0957940  k
2   2 -1.7177042  l
3   3 -0.0448357  m
4   4  1.2899925  n
5   5  0.7569659  o
6   6 -0.5048024  p
7   7  0.6929080  q
8   8 -0.5116399  r
9   9 -1.2621066  s
10 10  0.7664955  t
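
One caveat with the negative-index idiom: when grep() finds no match it returns integer(0), and df[, -integer(0)] selects zero columns rather than all of them. Base R also collapses a single remaining column to a vector unless drop = FALSE is given. A logical mask built with grepl() avoids both problems. A minimal sketch; drop_cols is a hypothetical helper name, not a base R function:

```r
# Small data frame with numeric and character columns
df <- data.frame(
  ID = 1:3,
  A1 = c(0.5, 1.5, 2.5),
  A2 = c(10, 20, 30),
  B1 = c("a", "b", "c"),
  B2 = c("k", "l", "m"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep every column whose name does NOT match the pattern
drop_cols <- function(data, pattern) {
  keep <- !grepl(pattern, colnames(data))
  data[, keep, drop = FALSE]
}

matched   <- drop_cols(df, "1$")    # drops A1 and B1
unmatched <- drop_cols(df, "zzz$")  # nothing matches, so nothing is dropped

# The pitfall: a pattern without matches wipes out every column
pitfall <- df[, -grep("zzz$", colnames(df))]

print(names(matched))    # "ID" "A2" "B2"
print(names(unmatched))  # "ID" "A1" "A2" "B1" "B2"
print(ncol(pitfall))     # 0
```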

Solution 3 - R

Here is an additional answer, since I stumbled across this question while looking for a data.table solution to the problem.

library(data.table)
dt <- data.table(df)
drop.cols <- grep("1$", colnames(dt))
dt[, (drop.cols) := NULL]
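
Keep in mind that := modifies the data.table in place. A possible variant, sketched under the assumption that you prefer to select by name rather than by position and want to skip the assignment when nothing matches (the sample data is made up):

```r
library(data.table)

df <- data.frame(
  ID = 1:3,
  A1 = c(0.5, 1.5, 2.5),
  A2 = c(10, 20, 30),
  B1 = c("a", "b", "c"),
  B2 = c("k", "l", "m"),
  stringsAsFactors = FALSE
)

# as.data.table() copies the data frame, so df itself stays untouched
dt <- as.data.table(df)

# value = TRUE returns the matching column names instead of their positions
drop_names <- grep("1$", names(dt), value = TRUE)

# Only assign NULL when there is something to drop
if (length(drop_names) > 0) {
  dt[, (drop_names) := NULL]
}

print(names(dt))  # "ID" "A2" "B2"
print(names(df))  # "ID" "A1" "A2" "B1" "B2"
```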

Solution 4 - R

To exclude columns whose names end with an arbitrary string, you can use:

# Search string to exclude
strng <- "1"
df <- data.frame(matrix(runif(25, max = 10), nrow = 5))
colnames(df) <- paste("EX", 1:5)
df_red <- df[, -grep(paste0(strng, "$"), colnames(df), perl = TRUE)]

df
#       EX 1     EX 2        EX 3     EX 4     EX 5
# 1 7.332913 4.972780 1.175947853 6.428073 8.625763
# 2 2.730271 3.734072 6.031157537 1.305951 8.012606
# 3 9.450122 3.259247 2.856123205 5.067294 7.027795
# 4 9.682430 5.295177 0.002015966 9.322912 7.424568
# 5 1.225359 1.577659 4.013616377 5.092042 5.130887

df_red
#       EX 2        EX 3     EX 4     EX 5
# 1 4.972780 1.175947853 6.428073 8.625763
# 2 3.734072 6.031157537 1.305951 8.012606
# 3 3.259247 2.856123205 5.067294 7.027795
# 4 5.295177 0.002015966 9.322912 7.424568
# 5 1.577659 4.013616377 5.092042 5.130887
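
Because the search string is pasted into a regular expression, any regex metacharacter in it (such as ".") keeps its special meaning. If the suffix should be taken literally, base R's endsWith() (available since R 3.3.0) might be a simpler fit. A sketch with hypothetical column names chosen to show the difference:

```r
df <- data.frame(
  a.1 = 1:3,
  b11 = c("x", "y", "z"),
  c2  = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

# Suffix to exclude; note the regex metacharacter "."
strng <- ".1"

# As a regex, ".1$" means "any character, then 1": it hits a.1 AND b11
regex_hits <- grepl(paste0(strng, "$"), colnames(df))

# As a literal suffix, ".1" only hits a.1
literal_hits <- endsWith(colnames(df), strng)

df_red <- df[, !literal_hits, drop = FALSE]

print(regex_hits)     # TRUE TRUE FALSE
print(literal_hits)   # TRUE FALSE FALSE
print(names(df_red))  # "b11" "c2"
```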

Solution 5 - R

If you are specifically looking for a pattern that appears at the end of the column name, to drop those columns, you can use the following command:

library(dplyr) 
df_new <- df %>% select(-ends_with("linear"))

All columns whose names end with the string "linear" will be dropped.
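
Applied to the suffix from the question, this could look as follows. ends_with() takes a literal suffix, while matches() takes a regular expression, which helps when only some of the columns ending in "1" should go. A minimal sketch on a made-up data frame with numeric and character columns:

```r
library(dplyr)

df <- data.frame(
  ID = 1:3,
  A1 = c(0.5, 1.5, 2.5),
  A2 = c(10, 20, 30),
  B1 = c("a", "b", "c"),
  C1 = c("k", "l", "m"),
  stringsAsFactors = FALSE
)

# Literal suffix: drops A1, B1 and C1
by_suffix <- df %>% select(-ends_with("1"))

# Regular expression: drops only A1 and B1, keeps C1
by_regex <- df %>% select(-matches("[AB]1$"))

print(names(by_suffix))  # "ID" "A2"
print(names(by_regex))   # "ID" "A2" "C1"
```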

Solution 6 - R

You can expand this further by using a regex for a broader pattern search. I have a data frame with a bunch of columns named "name", "upper_name" and "lower_name", which represent confidence intervals for a bunch of series, but I don't need them all. Using a regex, you can do the following:

pattern = "(upper_[a-z]*)|(lower_[a-z]*)"
policyData <- policyData[, -grep(pattern = pattern, colnames(policyData))]

The "|" lets me include an "or" in the regex, so I can do it once with a single pattern rather than searching for each pattern separately.
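
Note that the pattern above is not anchored, so it also matches names that merely contain "upper_" or "lower_" somewhere. If only prefixes are meant, anchoring with "^" may be closer to the intent. A sketch on a made-up policyData (the column flower_count is hypothetical and only there to show the difference):

```r
policyData <- data.frame(
  gdp          = c(1.0, 2.0),
  upper_gdp    = c(1.2, 2.2),
  lower_gdp    = c(0.8, 1.8),
  flower_count = c(3L, 4L)
)

# Pattern from the answer: matches "upper_..." or "lower_..." anywhere in the name
loose_pattern <- "(upper_[a-z]*)|(lower_[a-z]*)"

# Anchored alternative: the name must START with upper_ or lower_
anchored_pattern <- "^(upper|lower)_"

kept_loose <- policyData[, !grepl(loose_pattern, colnames(policyData)), drop = FALSE]
kept_anchored <- policyData[, !grepl(anchored_pattern, colnames(policyData)), drop = FALSE]

print(names(kept_loose))     # "gdp"
print(names(kept_anchored))  # "gdp" "flower_count"
```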

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author    Original Content on Stackoverflow
Question          histelheim         View Question on Stackoverflow
Solution 1 - R    Samuel Saari       View Answer on Stackoverflow
Solution 2 - R    Christoph_J        View Answer on Stackoverflow
Solution 3 - R    hannes101          View Answer on Stackoverflow
Solution 4 - R    Simon O'Hanlon     View Answer on Stackoverflow
Solution 5 - R    Sandy              View Answer on Stackoverflow
Solution 6 - R    Bryan Butler       View Answer on Stackoverflow