Select / assign to data.table when variable names are stored in a character vector

Rdata.table

R Problem Overview


How do you refer to variables in a data.table if the variable names are stored in a character vector? For instance, this works for a data.frame:

df <- data.frame(col1 = 1:3)
colname <- "col1"
df[colname] <- 4:6
df
#   col1
# 1    4
# 2    5
# 3    6

How can I perform this same operation for a data.table, either with or without := notation? The obvious thing of dt[ , list(colname)] doesn't work (nor did I expect it to).

R Solutions


Solution 1 - R

Two ways to programmatically select variable(s):

  1. with = FALSE:

     DT = data.table(col1 = 1:3)
     colname = "col1"
     DT[, colname, with = FALSE] 
     #    col1
     # 1:    1
     # 2:    2
     # 3:    3
    
  2. 'dot dot' (..) prefix:

     DT[, ..colname]    
     #    col1
     # 1:    1
     # 2:    2
     # 3:    3
    

For further description of the 'dot dot' (..) notation, see New Features in 1.10.2 (it is currently not described in help text).

To assign to variable(s), wrap the LHS of := in parentheses:

DT[, (colname) := 4:6]    
#    col1
# 1:    4
# 2:    5
# 3:    6

The latter is known as a column plonk, because you replace the whole column vector by reference. If a subset i was present, it would subassign by reference. The parens around (colname) is a shorthand introduced in version v1.9.4 on CRAN Oct 2014. Here is the news item:

> Using with = FALSE with := is now deprecated in all cases, given that wrapping the LHS of := with parentheses has been preferred for some time.

> colVar = "col1" DT[, (colVar) := 1] # please change to this DT[, c("col1", "col2") := 1] # no change DT[, 2:4 := 1] # no change DT[, c("col1","col2") := list(sum(a), mean(b))] # no change DT[, :=(...), by = ...] # no change

See also Details section in ?`:=`:

DT[i, (colnamevector) := value]
# [...] The parens are enough to stop the LHS being a symbol

And to answer further question in comment, here's one way (as usual there are many ways) :

DT[, colname := cumsum(get(colname)), with = FALSE]
#    col1
# 1:    4
# 2:    9
# 3:   15 

or, you might find it easier to read, write and debug just to eval a paste, similar to constructing a dynamic SQL statement to send to a server :

expr = paste0("DT[,",colname,":=cumsum(",colname,")]")
expr
# [1] "DT[,col1:=cumsum(col1)]"

eval(parse(text=expr))
#    col1
# 1:    4
# 2:   13
# 3:   28

If you do that a lot, you can define a helper function EVAL :

EVAL = function(...)eval(parse(text=paste0(...)),envir=parent.frame(2))

EVAL("DT[,",colname,":=cumsum(",colname,")]")
#    col1
# 1:    4
# 2:   17
# 3:   45

Now that data.table 1.8.2 automatically optimizes j for efficiency, it may be preferable to use the eval method. The get() in j prevents some optimizations, for example.

Or, there is set(). A low overhead, functional form of :=, which would be fine here. See ?set.

set(DT, j = colname, value = cumsum(DT[[colname]]))
DT
#    col1
# 1:    4
# 2:   21
# 3:   66

Solution 2 - R

*This is not an answer really, but I don't have enough street cred to post comments :/

Anyway, for anyone who might be looking to actually create a new column in a data table with a name stored in a variable, I've got the following to work. I have no clue as to it's performance. Any suggestions for improvement? Is it safe to assume a nameless new column will always be given the name V1?

colname <- as.name("users")
# Google Analytics query is run with chosen metric and resulting data is assigned to DT
DT2 <- DT[, sum(eval(colname, .SD)), by = country]
setnames(DT2, "V1", as.character(colname))

Notice I can reference it just fine in the sum() but can't seem to get it to assign in the same step. BTW, the reason I need to do this is colname will be based on user input in a Shiny app.

Solution 3 - R

Retrieve multiple columns from data.table via variable or function:

library(data.table)

x <- data.table(this=1:2,that=1:2,whatever=1:2)

# === explicit call
x[, .(that, whatever)]
x[, c('that', 'whatever')]

# === indirect via  variable
# ... direct assignment
mycols <- c('that','whatever')
# ... same as result of a function call
mycols <- grep('a', colnames(x), value=TRUE)

x[, ..mycols]
x[, .SD, .SDcols=mycols]

# === direct 1-liner usage
x[, .SD, .SDcols=c('that','whatever')]
x[, .SD, .SDcols=grep('a', colnames(x), value=TRUE)]

which all yield

   that whatever
1:    1        1
2:    2        2

I find the .SDcols way the most elegant.

Solution 4 - R

For multiple columns and a function applied on column values.

When updating the values from a function, the RHS must be a list object, so using a loop on .SD with lapply will do the trick.

The example below converts integer columns to numeric columns

a1 <- data.table(a=1:5, b=6:10, c1=letters[1:5])
sapply(a1, class)  # show classes of columns
#         a           b          c1 
# "integer"   "integer" "character" 

# column name character vector
nm <- c("a", "b")

# Convert columns a and b to numeric type
a1[, j = (nm) := lapply(.SD, as.numeric ), .SDcols = nm ]

sapply(a1, class)
#         a           b          c1 
# "numeric"   "numeric" "character" 

Solution 5 - R

You could try this:

colname <- as.name("COL_NAME")
DT2 <- DT[, list(COL_SUM=sum(eval(colname, .SD))), by = c(group)]

Solution 6 - R

With development version 1.14.3, [tag:data.table] has gained a new interface for programming on data.table, see item 10 in New Features. It uses the new env = parameter.

library(data.table) # development version 1.14.3 used
dt <- data.table(col1 = 1:3)
colname <- "col1"

dt[, cn := cn + 3L, env = list(cn = colname)][]

> col1 > > 1: 4 > 2: 5 > 3: 6

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionfrankcView Question on Stackoverflow
Solution 1 - RMatt DowleView Answer on Stackoverflow
Solution 2 - Refh0888View Answer on Stackoverflow
Solution 3 - RC.KView Answer on Stackoverflow
Solution 4 - RSathishView Answer on Stackoverflow
Solution 5 - Rshrilata murthyView Answer on Stackoverflow
Solution 6 - RUweView Answer on Stackoverflow