Extracting specific columns from a data frame


R Problem Overview


I have an R data frame with 6 columns, and I want to create a new dataframe that only has three of the columns.

Assuming my data frame is df, and I want to extract columns A, B, and E, this is the only command I can figure out:

 data.frame(df$A,df$B,df$E)

Is there a more compact way of doing this?

R Solutions


Solution 1 - R

You can subset using a vector of column names. I strongly prefer this approach over those that treat column names as if they are object names (e.g. subset()), especially when programming in functions, packages, or applications.

# data for reproducible example
# (and to avoid confusion from trying to subset `stats::df`)
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])
# subset
df[c("A","B","E")]

Note there's no comma (i.e. it's not df[,c("A","B","E")]). That's because the comma form drops a single-column result to a vector: df[,"A"] returns a vector, not a data frame. But df["A"] will always return a data frame.

str(df["A"])
## 'data.frame':	1 obs. of  1 variable:
## $ A: int 1
str(df[,"A"])  # vector
##  int 1

Thanks to David Dorchies for pointing out that df[,"A"] returns a vector instead of a data.frame, and to Antoine Fabri for suggesting a better alternative (above) to my original solution (below).

# subset (original solution--not recommended)
df[,c("A","B","E")]  # returns a data.frame
df[,"A"]             # returns a vector
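Since the column names are an ordinary character vector, the selection is easy to wrap in a function. This is a minimal sketch; keep_columns() is a hypothetical helper introduced here for illustration, not something from base R or the original answer:

```r
# Hypothetical helper (not part of base R): keep only the named columns
keep_columns <- function(data, cols) {
  data[cols]  # single bracket, no comma: the result is always a data frame
}

df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])

wanted <- c("A", "B", "E")    # names can be built at run time
res <- keep_columns(df, wanted)
one <- keep_columns(df, "A")  # still a data frame, even for one column
```

Because the names are passed as data rather than typed as bare symbols, the same call works when the column list comes from a config file or another function.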

Solution 2 - R

Using the dplyr package, if your data.frame is called df1:

library(dplyr)

df1 %>%
  select(A, B, E)

This can also be written without the %>% pipe as:

select(df1, A, B, E)
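If the column names are held in a character vector rather than typed out, select() can take them through all_of(). A short sketch, assuming dplyr 1.0.0 or later is installed; the values in df1 are made up for the example:

```r
library(dplyr)

# Example data (hypothetical values)
df1 <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8, E = 9:10)

cols <- c("A", "B", "E")

# all_of() selects the columns named in `cols` and
# raises an error if any of those names is missing
picked <- df1 %>% select(all_of(cols))
```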

Solution 3 - R

This is the role of the subset() function:

> dat <- data.frame(A=c(1,2),B=c(3,4),C=c(5,6),D=c(7,7),E=c(8,8),F=c(9,9)) 
> subset(dat, select=c("A", "B"))
  A B
1 1 3
2 2 4

Solution 4 - R

There are two obvious choices: Joshua Ulrich's df[,c("A","B","E")] or

df[,c(1,2,5)]

as in

> df <- data.frame(A=c(1,2),B=c(3,4),C=c(5,6),D=c(7,7),E=c(8,8),F=c(9,9)) 
> df
  A B C D E F
1 1 3 5 7 8 9
2 2 4 6 7 8 9
> df[,c(1,2,5)]
  A B E
1 1 3 8
2 2 4 8
> df[,c("A","B","E")]
  A B E
1 1 3 8
2 2 4 8
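Numeric positions select the wrong columns without warning if the column order ever changes, so it can help to derive them from the names. A small sketch with the same df; the variable names pos, by_pos, and by_name are illustrative only:

```r
df <- data.frame(A = c(1, 2), B = c(3, 4), C = c(5, 6),
                 D = c(7, 7), E = c(8, 8), F = c(9, 9))

# Look up the positions of the wanted columns by name
pos <- match(c("A", "B", "E"), names(df))

by_pos  <- df[, pos]                 # positional selection
by_name <- df[, c("A", "B", "E")]    # name-based selection
```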

Solution 5 - R

Where df1 is your original data frame:

df2 <- subset(df1, select = c(1, 2, 5))

Solution 6 - R

For some reason only

df[, (names(df) %in% c("A","B","E"))]

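worked for me. All of the above syntaxes yielded "undefined columns selected". That error is raised when at least one requested name is not a column of the data frame; the %in% form avoids it because it keeps only the names that match and silently ignores the rest.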
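One way to check whether a missing or misspelled column name is behind that error is to compare the requested names with names(df). The sketch below uses a made-up data frame that deliberately lacks column E:

```r
df <- data.frame(A = 1:2, B = 3:4, C = 5:6)   # note: no column "E"
wanted <- c("A", "B", "E")

# Name-based indexing fails when any requested name is absent
msg <- tryCatch(df[, wanted], error = function(e) conditionMessage(e))

# List the requested names that do not exist in df
missing_cols <- setdiff(wanted, names(df))

# The %in% form returns only the columns that do exist
found <- df[, names(df) %in% wanted, drop = FALSE]
```

Here missing_cols contains "E", which points at the mismatch, while found holds only columns A and B.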

Solution 7 - R

You can also use the sqldf package, which runs SQL select statements on R data frames:

library(sqldf)

df1 <- sqldf("select A, B, E from df")

The output is a data frame df1 with columns A, B, and E.

Solution 8 - R

You can use with():

with(df, data.frame(A, B, E))

Solution 9 - R

df <- dplyr::select(df, A, B, E)

You can also assign the result to a new name instead of overwriting df:

data <- dplyr::select(df, A, B, E)

Solution 10 - R

[ and subset() are not interchangeable:

[ returns a vector if only one column is selected, whereas subset() returns a data frame.

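df <- data.frame(a = "a", b = "b")

# one column: [ gives a vector, subset() gives a data frame, so this is FALSE
identical(
  df[, c("a")],
  subset(df, select = "a")
)

# two columns: both sides are data frames
identical(
  df[, c("a", "b")],
  subset(df, select = c("a", "b"))
)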
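If a data frame is wanted from [ even for a single column, the drop argument can be set explicitly. A short sketch with the same df; the names vec, kept, and sub are only for illustration:

```r
df <- data.frame(a = "a", b = "b")

vec  <- df[, "a"]                 # drop = TRUE by default: a vector
kept <- df[, "a", drop = FALSE]   # a one-column data frame
sub  <- subset(df, select = "a")  # also a one-column data frame
```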

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Aren Cambre | View Question on Stack Overflow
Solution 1 - R | Joshua Ulrich | View Answer on Stack Overflow
Solution 2 - R | Sam Firke | View Answer on Stack Overflow
Solution 3 - R | Stéphane Laurent | View Answer on Stack Overflow
Solution 4 - R | Henry | View Answer on Stack Overflow
Solution 5 - R | Richard Ball | View Answer on Stack Overflow
Solution 6 - R | so860 | View Answer on Stack Overflow
Solution 7 - R | Aman Burman | View Answer on Stack Overflow
Solution 8 - R | moodymudskipper | View Answer on Stack Overflow
Solution 9 - R | Mohamed Rahouma | View Answer on Stack Overflow
Solution 10 - R | fxi | View Answer on Stack Overflow