How do you delete a column by name in data.table?
Problem Overview
To get rid of a column named "foo" in a data.frame, I can do:
df <- df[-grep('foo', colnames(df))]
However, once df is converted to a data.table object, there is no way to just remove a column.
Example:
df <- data.frame(id = 1:100, foo = rnorm(100))
df2 <- df[-grep('foo', colnames(df))] # works
df3 <- data.table(df)
df3[-grep('foo', colnames(df3))]
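But once it is converted to a data.table object, this no longer works: inside [ ], data.table treats a lone index as the row argument i, so the result of grep() removes a row rather than the column.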
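The difference between row and column indexing can be checked directly. This is a minimal sketch on the toy table from the question; `rows_dropped` and `cols_dropped` are illustrative names. It shows that the bare index drops a row, while the same index placed in the j position (with `with = FALSE`) drops the column.

```r
library(data.table)

df3 <- data.table(id = 1:100, foo = rnorm(100))
idx <- grep("foo", colnames(df3))  # 2: the position of column "foo"

# A bare index inside [ ] is the row argument i,
# so this drops row 2 instead of column 2
rows_dropped <- df3[-idx]
dim(rows_dropped)  # 99 2

# Putting the index after the comma (the j argument) selects columns
cols_dropped <- df3[, -idx, with = FALSE]
dim(cols_dropped)  # 100 1
```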
R Solutions
Solution 1 - R
Any of the following will remove column foo from the data.table df3:
# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[,foo:=NULL]
df3[, c("foo","bar"):=NULL] # remove two columns
myVar = "foo"
df3[, (myVar):=NULL] # lookup myVar contents
# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]
# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]
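Since these methods are alternatives, running them one after another on the same column would fail once the column is gone. The sketch below applies Method 1, the `(myVar)` lookup and Method 2a to different columns of a small made-up table, so every line runs; the table and its column names are assumptions for illustration.

```r
library(data.table)

dt <- data.table(id = 1:3, foo = 4:6, bar = 7:9, baz = 10:12)

# Method 1: delete a column in place (no copy, no reassignment)
dt[, foo := NULL]

# Wrapping the variable in () makes := use its contents ("bar")
# instead of looking for a column literally called myVar
myVar <- "bar"
dt[, (myVar) := NULL]

# Method 2a: delete the column(s) whose names match an anchored regex
dt[, grep("^baz$", colnames(dt)) := NULL]

names(dt)  # "id"
```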
data.table also supports the following syntax:
## Method 3 (returns a copy, which you could then assign to df3)
df3[, !"foo"]
If you actually want to remove column "foo" from df3 itself (as opposed to just printing a view of df3 minus column "foo"), you'd really want to use Method 1 instead.
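To make the copy-versus-in-place distinction concrete, here is a short sketch on a made-up two-column table (`without_foo` is just an illustrative name). Method 3 leaves df3 untouched, and only the `:=` call actually changes it.

```r
library(data.table)

df3 <- data.table(id = 1:3, foo = 4:6)

# Method 3 returns a new data.table without "foo"; df3 keeps both columns
without_foo <- df3[, !"foo"]
names(without_foo)  # "id"
names(df3)          # "id" "foo"

# Method 1 removes the column from df3 itself
df3[, foo := NULL]
names(df3)          # "id"
```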
(Note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo" if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)
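The effect of anchoring the pattern is easy to see on a table with deliberately similar names. The one-row table below is made up for illustration.

```r
library(data.table)

dt <- data.table(foo = 1, fool = 2, buffoon = 3, bar = 4)

grep("foo", colnames(dt))    # 1 2 3: "fool" and "buffoon" also match
grep("^foo$", colnames(dt))  # 1: only the exact name "foo"

# With the anchored pattern, only "foo" is removed
dt[, grep("^foo$", colnames(dt)) := NULL]
names(dt)  # "fool" "buffoon" "bar"
```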
Less safe options, fine for interactive use:
The next two idioms will also work if df3 contains a column matching "foo", but will fail in a probably unexpected way if it does not. If, for instance, you use either of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.
As a consequence, they are best suited for interactive use where you might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you actually want to remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are the best options.
# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]
Lastly, there are approaches using with=FALSE, though data.table is gradually moving away from this argument, so it's now discouraged where you can avoid it. They are shown here so you know the option exists in case you really do need it:
# Method 5a (like Method 3)
df3[, !"foo", with=FALSE]
# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]
# Method 5c (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]
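If you are going to use `with=FALSE` anyway, a plain base R `setdiff()` on the column names is another way to build the list of columns to keep. This is a swapped-in alternative, not one of the methods above, and `to_drop` is a hypothetical variable name. Like Methods 5a to 5c, it returns a copy rather than modifying df3.

```r
library(data.table)

df3 <- data.table(id = 1:3, foo = 4:6, bar = 7:9)

# Hypothetical variable holding the names to exclude
to_drop <- c("foo", "bar")

# Keep every column not listed in to_drop; names in to_drop that
# don't exist in df3 are simply ignored by setdiff()
kept <- df3[, setdiff(names(df3), to_drop), with = FALSE]
names(kept)  # "id"
```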
Solution 2 - R
You can also use set for this, which avoids the overhead of [.data.table in loops:
dt <- data.table( a=letters, b=LETTERS, c=seq(26), d=letters, e=letters )
set( dt, j=c(1L,3L,5L), value=NULL )
> dt[1:5]
b d
1: A a
2: B b
3: C c
4: D d
5: E e
If you want to do it by column name, which(colnames(dt) %in% c("a","c","e")) should work for j.
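When set() is called inside a loop, one thing to watch is that each deletion shifts the positions of the columns to its right. A minimal sketch, reusing the example table above, looks up each column's position by name on every iteration rather than computing all positions once up front.

```r
library(data.table)

dt <- data.table(a = letters, b = LETTERS, c = seq(26), d = letters, e = letters)

# Look up each position just before deleting it, because earlier
# deletions shift the positions of later columns
for (nm in c("a", "c", "e")) {
  set(dt, j = which(colnames(dt) == nm), value = NULL)
}
names(dt)  # "b" "d"
```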
Solution 3 - R
I simply do it the data.frame way:
DT$col = NULL
It works fast and, as far as I could see, doesn't cause any problems.
UPDATE: this is not the best method if your DT is very large, as using the $<- operator leads to the object being copied. So better use:
DT[, col:=NULL]
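Because `:=` deletes by reference, it changes the caller's table even when it runs inside a function, with no reassignment. The sketch below demonstrates this; `drop_col` is a hypothetical helper name used only for illustration.

```r
library(data.table)

DT <- data.table(col = 1:3, keep = 4:6)

# Hypothetical helper: := modifies x in place, and x refers to the same
# object as the caller's DT, so DT loses the column too
drop_col <- function(x) {
  x[, col := NULL]
  invisible(NULL)
}

drop_col(DT)
names(DT)  # "keep"
```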
Solution 4 - R
A very simple option if you have many individual columns to delete from a data.table and want to avoid typing in all the column names (use with care):
dt <- dt[, -c(1,4,6,17,83,104)]
This removes columns by column number instead of by name.
It's not as efficient, because it builds a new copy of the table instead of deleting columns by reference, but if you're working with fewer than, say, 500,000 rows it works fine.
Solution 5 - R
Suppose your DT has columns col1, col2, col3, col4, col5, coln.
To delete a subset of them without quoting each name:
vx <- as.character(bquote(c(col1, col2, col3, coln)))[-1]
DT[, paste0(vx):=NULL]
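Here is what the bquote() step produces, on a made-up one-row table with those column names. It also shows `(vx) := NULL`, which is a somewhat simpler way to make `:=` use the contents of vx than `paste0(vx)`.

```r
library(data.table)

DT <- data.table(col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, coln = 6)

# bquote() returns the unevaluated call c(col1, col2, col3, coln);
# as.character() gives c("c", "col1", "col2", "col3", "coln"),
# and [-1] drops the leading "c"
vx <- as.character(bquote(c(col1, col2, col3, coln)))[-1]
vx  # "col1" "col2" "col3" "coln"

# Parentheses tell := to use the contents of vx as column names
DT[, (vx) := NULL]
names(DT)  # "col4" "col5"
```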
Solution 6 - R
Here is a function for when you want to set a number of columns to NULL given their column names:
deleteColsFromDataTable <- function(train, toDeleteColNames) {
  # := deletes each column by reference, so the caller's data.table
  # is modified as well; the returned object is that same table
  for (myNm in toDeleteColNames)
    train[, (myNm) := NULL]
  return(train)
}
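Since `:=` accepts a whole vector of names, the loop can also be collapsed into a single call. The sketch below is a hypothetical variant (`drop_cols_by_name` is an illustrative name, not part of data.table) that also skips names that are not present, so a typo doesn't cause an error.

```r
library(data.table)

# Hypothetical helper: drops all listed columns that exist in one := call
drop_cols_by_name <- function(DT, cols) {
  cols <- intersect(cols, names(DT))  # silently ignore absent names
  if (length(cols) > 0L) DT[, (cols) := NULL]
  invisible(DT)
}

train <- data.table(id = 1:3, x = 4:6, y = 7:9)
drop_cols_by_name(train, c("x", "y", "not_there"))
names(train)  # "id"
```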
Solution 7 - R
DT[,c:=NULL] # remove column c
Solution 8 - R
For a data.table, assigning the columns to NULL removes them:
DT[, c("col1", "col2", "col3", "col4")] <- NULL  # note the comma: DT is a data.table
... which is equivalent to:
DT$col1 <- NULL
DT$col2 <- NULL
DT$col3 <- NULL
DT$col4 <- NULL
The equivalent for a data.frame is:
DF[c("col1", "col2", "col3", "col4")] <- NULL  # note the missing comma: DF is a data.frame
Q. Why is there a comma in the version for data.table, and no comma in the version for data.frame?
A. As data.frames are stored as a list of columns, you can skip the comma. You could also add it, but then you need to assign a list of NULLs: DF[, c("col1", "col2", "col3")] <- list(NULL).
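To compare the two side by side, here is a small sketch with made-up tables. It shows the no-comma data.frame form next to the idiomatic data.table `:=` form, which deletes by reference instead of going through the `<-` assignment.

```r
library(data.table)

# data.frame: a list of columns, so list-style NULL assignment (no comma) works
DF <- data.frame(col1 = 1:2, col2 = 3:4, col3 = 5:6, col4 = 7:8)
DF[c("col1", "col2")] <- NULL
names(DF)  # "col3" "col4"

# data.table: the idiomatic equivalent is := NULL, applied by reference
DT <- data.table(col1 = 1:2, col2 = 3:4, col3 = 5:6, col4 = 7:8)
DT[, c("col1", "col2") := NULL]
names(DT)  # "col3" "col4"
```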