How to append rows to an R data frame

R Problem Overview

I have looked around StackOverflow, but I cannot find a solution specific to my problem, which involves appending rows to an R data frame.

I am initializing an empty 2-column data frame, as follows.

df = data.frame(x = numeric(), y = character())

Then, my goal is to iterate through a list of values and, in each iteration, append a value to the end of the list. I started with the following code.

for (i in 1:10) {
    df$x = rbind(df$x, i)
    df$y = rbind(df$y, toString(i))
}

I also attempted the functions c, append, and merge without success. Please let me know if you have any suggestions.

Update from comment: I don't presume to know how R was meant to be used, but I wanted to ignore the additional line of code that would be required to update the indices on every iteration and I cannot easily preallocate the size of the data frame because I don't know how many rows it will ultimately take. Remember that the above is merely a toy example meant to be reproducible. Either way, thanks for your suggestion!

R Solutions

Solution 1 - R

Update

Not knowing what you are trying to do, I'll share one more suggestion: Preallocate vectors of the type you want for each column, insert values into those vectors, and then, at the end, create your data.frame.

Continuing with Julian's f3 (a preallocated data.frame) as the fastest option so far, defined as:

# pre-allocate space
f3 <- function(n){
  df <- data.frame(x = numeric(n), y = character(n), stringsAsFactors = FALSE)
  for(i in 1:n){
    df$x[i] <- i
    df$y[i] <- toString(i)
  }
  df
}

Here's a similar approach, but one where the data.frame is created as the last step.

# Use preallocated vectors
f4 <- function(n) {
  x <- numeric(n)
  y <- character(n)
  for (i in 1:n) {
    x[i] <- i
    y[i] <- i
  }
  data.frame(x, y, stringsAsFactors=FALSE)
}

microbenchmark from the "microbenchmark" package will give us more comprehensive insight than system.time:

library(microbenchmark)
microbenchmark(f1(1000), f3(1000), f4(1000), times = 5)
# Unit: milliseconds
#      expr         min          lq      median         uq         max neval
#  f1(1000) 1024.539618 1029.693877 1045.972666 1055.25931 1112.769176     5
#  f3(1000)  149.417636  150.529011  150.827393  151.02230  160.637845     5
#  f4(1000)    7.872647    7.892395    7.901151    7.95077    8.049581     5

f1() (the approach below) is incredibly inefficient because of how often it calls data.frame and because growing objects that way is generally slow in R. f3() is much improved due to preallocation, but the data.frame structure itself might be part of the bottleneck here. f4() tries to bypass that bottleneck without compromising the approach you want to take.

Original answer

This is really not a good idea, but if you wanted to do it this way, I guess you can try:

for (i in 1:10) {
  df <- rbind(df, data.frame(x = i, y = toString(i)))
}

Note that in your code, there is one other problem:

You should use stringsAsFactors if you want the characters to not get converted to factors. Use: df = data.frame(x = numeric(), y = character(), stringsAsFactors = FALSE)

Solution 2 - R

Let's benchmark the three solutions proposed:

# use rbind
f1 <- function(n){
  df <- data.frame(x = numeric(), y = character())
  for(i in 1:n){
	df <- rbind(df, data.frame(x = i, y = toString(i)))
  }
  df
}
# use list
f2 <- function(n){
  df <- data.frame(x = numeric(), y = character(), stringsAsFactors = FALSE)
  for(i in 1:n){
	df[i,] <- list(i, toString(i))
  }
  df
}
# pre-allocate space
f3 <- function(n){
  df <- data.frame(x = numeric(1000), y = character(1000), stringsAsFactors = FALSE)
  for(i in 1:n){
	df$x[i] <- i
	df$y[i] <- toString(i)
  }
  df
}
system.time(f1(1000))
#   user  system elapsed 
#   1.33    0.00    1.32 
system.time(f2(1000))
#   user  system elapsed 
#   0.19    0.00    0.19 
system.time(f3(1000))
#   user  system elapsed 
#   0.14    0.00    0.14

The best solution is to pre-allocate space (as intended in R). The next-best solution is to use list, and the worst solution (at least based on these timing results) appears to be rbind.

Solution 3 - R

Suppose you simply don't know the size of the data.frame in advance. It can well be a few rows, or a few millions. You need to have some sort of container, that grows dynamically. Taking in consideration my experience and all related answers in SO I come with 4 distinct solutions:

rbindlist to the data.frame
Use data.table's fast set operation and couple it with manually doubling the table when needed.
Use RSQLite and append to the table held in memory.
data.frame's own ability to grow and use custom environment (which has reference semantics) to store the data.frame so it will not be copied on return.

Here is a test of all the methods for both small and large number of appended rows. Each method has 3 functions associated with it:

create(first_element) that returns the appropriate backing object with first_element put in.
append(object, element) that appends the element to the end of the table (represented by object).
access(object) gets the data.frame with all the inserted elements.

`rbindlist` to the data.frame

That is quite easy and straight-forward:

create.1<-function(elems)
{
  return(as.data.table(elems))
}

append.1<-function(dt, elems)
{ 
  return(rbindlist(list(dt,  elems),use.names = TRUE))
}

access.1<-function(dt)
{
  return(dt)
}

`data.table::set` + manually doubling the table when needed.

I will store the true length of the table in a rowcount attribute.

create.2<-function(elems)
{
  return(as.data.table(elems))
}

append.2<-function(dt, elems)
{
  n<-attr(dt, 'rowcount')
  if (is.null(n))
    n<-nrow(dt)
  if (n==nrow(dt))
  {
    tmp<-elems[1]
    tmp[[1]]<-rep(NA,n)
    dt<-rbindlist(list(dt, tmp), fill=TRUE, use.names=TRUE)
    setattr(dt,'rowcount', n)
  }
  pos<-as.integer(match(names(elems), colnames(dt)))
  for (j in seq_along(pos))
  {
    set(dt, i=as.integer(n+1), pos[[j]], elems[[j]])
  }
  setattr(dt,'rowcount',n+1)
  return(dt)
}

access.2<-function(elems)
{
  n<-attr(elems, 'rowcount')
  return(as.data.table(elems[1:n,]))
}

SQL should be optimized for fast record insertion, so I initially had high hopes for `RSQLite` solution

This is basically copy&paste of Karsten W. answer on similar thread.

create.3<-function(elems)
{
  con <- RSQLite::dbConnect(RSQLite::SQLite(), ":memory:")
  RSQLite::dbWriteTable(con, 't', as.data.frame(elems))
  return(con)
}

append.3<-function(con, elems)
{ 
  RSQLite::dbWriteTable(con, 't', as.data.frame(elems), append=TRUE)
  return(con)
}

access.3<-function(con)
{
  return(RSQLite::dbReadTable(con, "t", row.names=NULL))
}

`data.frame`'s own row-appending + custom environment.

create.4<-function(elems)
{
  env<-new.env()
  env$dt<-as.data.frame(elems)
  return(env)
}

append.4<-function(env, elems)
{ 
  env$dt[nrow(env$dt)+1,]<-elems
  return(env)
}

access.4<-function(env)
{
  return(env$dt)
}

The test suite:

For convenience I will use one test function to cover them all with indirect calling. (I checked: using do.call instead of calling the functions directly doesn't makes the code run measurable longer).

test<-function(id, n=1000)
{
  n<-n-1
  el<-list(a=1,b=2,c=3,d=4)
  o<-do.call(paste0('create.',id),list(el))
  s<-paste0('append.',id)
  for (i in 1:n)
  {
    o<-do.call(s,list(o,el))
  }
  return(do.call(paste0('access.', id), list(o)))
}

Let's see the performance for n=10 insertions.

I also added a 'placebo' functions (with suffix 0) that don't perform anything - just to measure the overhead of the test setup.

r<-microbenchmark(test(0,n=10), test(1,n=10),test(2,n=10),test(3,n=10), test(4,n=10))
autoplot(r)

For 1E5 rows (measurements done on Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz):

nr  function      time
4   data.frame    228.251 
3   sqlite        133.716
2   data.table      3.059
1   rbindlist     169.998 
0   placebo         0.202

It looks like the SQLite-based sulution, although regains some speed on large data, is nowhere near data.table + manual exponential growth. The difference is almost two orders of magnitude!

Summary

If you know that you will append rather small number of rows (n<=100), go ahead and use the simplest possible solution: just assign the rows to the data.frame using bracket notation and ignore the fact that the data.frame is not pre-populated.

For everything else use data.table::set and grow the data.table exponentially (e.g. using my code).

Solution 4 - R

Update with purrr, tidyr & dplyr

As the question is already dated (6 years), the answers are missing a solution with newer packages tidyr and purrr. So for people working with these packages, I want to add a solution to the previous answers - all quite interesting, especially .

The biggest advantage of purrr and tidyr are better readability IMHO. purrr replaces lapply with the more flexible map() family, tidyr offers the super-intuitive method add_row - just does what it says :)

map_df(1:1000, function(x) { df %>% add_row(x = x, y = toString(x)) })

This solution is short and intuitive to read, and it's relatively fast:

system.time(
   map_df(1:1000, function(x) { df %>% add_row(x = x, y = toString(x)) })
)
   user  system elapsed 
   0.756   0.006   0.766

It scales almost linearly, so for 1e5 rows, the performance is:

system.time(
  map_df(1:100000, function(x) { df %>% add_row(x = x, y = toString(x)) })
)
   user  system elapsed 
 76.035   0.259  76.489

which would make it rank second right after data.table (if your ignore the placebo) in the benchmark by @Adam Ryczkowski:

nr  function      time
4   data.frame    228.251 
3   sqlite        133.716
2   data.table      3.059
1   rbindlist     169.998 
0   placebo         0.202

Solution 5 - R

A more generic solution for might be the following.

    extendDf <- function (df, n) {
    withFactors <- sum(sapply (df, function(X) (is.factor(X)) )) > 0
    nr          <- nrow (df)
    colNames    <- names(df)
    for (c in 1:length(colNames)) {
        if (is.factor(df[,c])) {
            col         <- vector (mode='character', length = nr+n) 
            col[1:nr]   <- as.character(df[,c])
            col[(nr+1):(n+nr)]<- rep(col[1], n)  # to avoid extra levels
            col         <- as.factor(col)
        } else {
            col         <- vector (mode=mode(df[1,c]), length = nr+n)
            class(col)  <- class (df[1,c])
            col[1:nr]   <- df[,c] 
        }
        if (c==1) {
            newDf       <- data.frame (col ,stringsAsFactors=withFactors)
        } else {
            newDf[,c]   <- col 
        }
    }
    names(newDf) <- colNames
    newDf
}

The function extendDf() extends a data frame with n rows.

As an example:

aDf <- data.frame (l=TRUE, i=1L, n=1, c='a', t=Sys.time(), stringsAsFactors = TRUE)
extendDf (aDf, 2)
#      l i n c                   t
# 1  TRUE 1 1 a 2016-07-06 17:12:30
# 2 FALSE 0 0 a 1970-01-01 01:00:00
# 3 FALSE 0 0 a 1970-01-01 01:00:00

system.time (eDf <- extendDf (aDf, 100000))
#    user  system elapsed 
#   0.009   0.002   0.010
system.time (eDf <- extendDf (eDf, 100000))
#    user  system elapsed 
#   0.068   0.002   0.070

Solution 6 - R

Lets take a vector 'point' which has numbers from 1 to 5

point = c(1,2,3,4,5)

if we want to append a number 6 anywhere inside the vector then below command may come handy

i) Vectors

new_var = append(point, 6 ,after = length(point))

ii) columns of a table

new_var = append(point, 6 ,after = length(mtcars$mpg))

The command append takes three arguments:

the vector/column to be modified.
value to be included in the modified vector.
a subscript, after which the values are to be appended.

simple...!! Apologies in case of any...!

Solution 7 - R

My solution is almost the same as the original answer but it doesn't worked for me.

So, I gave names for the columns and it works:

painel <- rbind(painel, data.frame("col1" = xtweets$created_at,
                                   "col2" = xtweets$text))

Content Type	Original Author	Original Content on Stackoverflow
Question	Gyan Veda	View Question on Stackoverflow
Solution 1 - R	A5C1D2H2I1M1N2O1R2T1	View Answer on Stackoverflow
Solution 2 - R	Julián Urbano	View Answer on Stackoverflow
Solution 3 - R	Adam Ryczkowski	View Answer on Stackoverflow
Solution 4 - R	Agile Bean	View Answer on Stackoverflow
Solution 5 - R	Pisca46	View Answer on Stackoverflow
Solution 6 - R	Praneeth Krishna	View Answer on Stackoverflow
Solution 7 - R	Bruno Gomes	View Answer on Stackoverflow

How to append rows to an R data frame

R Problem Overview

R Solutions

Solution 1 - R

Update

Original answer

Solution 2 - R

Solution 3 - R

`rbindlist` to the data.frame

`data.table::set` + manually doubling the table when needed.

SQL should be optimized for fast record insertion, so I initially had high hopes for `RSQLite` solution

`data.frame`'s own row-appending + custom environment.

The test suite:

Summary

Solution 4 - R

Update with purrr, tidyr & dplyr

Solution 5 - R

Solution 6 - R

Solution 7 - R

Correct use for angular-translate in controllers

Merge Images Side by Side (Horizontally)

Attributions

R Problem Overview

R Solutions

Solution 1 - R

Update

Original answer

Solution 2 - R

Solution 3 - R

rbindlist to the data.frame

data.table::set + manually doubling the table when needed.

SQL should be optimized for fast record insertion, so I initially had high hopes for RSQLite solution

data.frame's own row-appending + custom environment.

The test suite:

Summary

Solution 4 - R

Update with purrr, tidyr & dplyr

Solution 5 - R

Solution 6 - R

Solution 7 - R

Correct use for angular-translate in controllers

Merge Images Side by Side (Horizontally)

Attributions

`rbindlist` to the data.frame

`data.table::set` + manually doubling the table when needed.

SQL should be optimized for fast record insertion, so I initially had high hopes for `RSQLite` solution

`data.frame`'s own row-appending + custom environment.