Convert a data frame to a data.table without copy


R Problem Overview


I have a large data frame (on the order of several GB) that I'd like to convert to a data.table. Using as.data.table creates a copy of the data frame, which means the available memory must be at least twice the size of the data. Is there a way to do the conversion without a copy?

Here's a simple example to demonstrate:

library(data.table)
N <- 1e6
K <- 1e2
data <- as.data.frame(rep(data.frame(rnorm(N)), K))

gc(reset=TRUE)
tracemem(data)
data <- as.data.table(data)
gc()

With output:

library(data.table)
# data.table 1.8.10  For help type: help("data.table")
N <- 1e6
K <- 1e2
data <- as.data.frame(rep(data.frame(rnorm(N)), K))

gc(reset=TRUE)
#             used  (Mb) gc trigger   (Mb)  max used  (Mb)
# Ncells    303759  16.3     597831   32.0    303759  16.3
# Vcells 100442572 766.4  402928632 3074.2 100442572 766.4
tracemem(data)
# [1] "<0x363fda0>"
data <- as.data.table(data)
# tracemem[0x363fda0 -> 0x31e4260]: copy as.data.table.data.frame as.data.table 
gc()
#             used  (Mb) gc trigger   (Mb)  max used   (Mb)
# Ncells    304519  16.3     597831   32.0    306162   16.4
# Vcells 100444242 766.4  322342905 2459.3 200933219 1533.0

R Solutions


Solution 1 - R

Conversion without a copy is available from data.table v1.9.0 onward, through the setDT function. From NEWS:

> o Following this S.O. post, a function setDT is now implemented that takes a list (named and/or unnamed), data.frame (or data.table) as input and returns the same object as a data.table by reference (without any copy). See ?setDT examples for more.

This follows the data.table naming convention: all set* functions modify their input by reference. := is the only other operator that also modifies by reference.

require(data.table) # v1.9.0+
setDT(data) # converts data which is a data.frame to data.table *by reference*
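
One way to check that setDT leaves the column data where it is: compare the memory address of each column before and after the conversion. The sketch below is illustrative only. It assumes data.table v1.9.0 or later and uses a small made-up data frame named df rather than the multi-GB data from the question. address() is a helper exported by data.table.

```r
library(data.table)  # assumes v1.9.0 or later, where setDT is available

# Small illustrative data frame (hypothetical; stands in for the large one)
df <- data.frame(x = rnorm(1e5), y = rnorm(1e5))

# Record where each column vector lives in memory before converting
addr_before <- vapply(df, address, character(1))

# Convert in place; note there is no assignment back to df
setDT(df)

# Record the column addresses again after the conversion
addr_after <- vapply(df, address, character(1))

# df should now be a data.table, and the column vectors should not have moved
print(is.data.table(df))
print(identical(addr_before, addr_after))
```

If both checks print TRUE, the column vectors were reused rather than duplicated. Because setDT works by reference, writing df <- setDT(df) is unnecessary, although it is harmless.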

See history for older (now outdated) answers.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | ytsaig | View Question on Stackoverflow
Solution 1 - R | Arun | View Answer on Stackoverflow