Repeat rows of a data.frame N times

R Problem Overview

I have the following data frame:

data.frame(a = c(1,2,3),b = c(1,2,3))
  a b
1 1 1
2 2 2
3 3 3

I want to repeat the rows n times. For example, here the rows are repeated 3 times:

Is there an easy function to do this in R? Thanks!

R Solutions

Solution 1 - R

EDIT: updated to a better modern R answer.

You can use replicate(), then rbind the result back together. The rownames are automatically altered to run from 1:nrows.

d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))

A more traditional way is to use indexing, but here the rowname altering is not quite so neat (but more informative):

 d[rep(seq_len(nrow(d)), n), ]

Here are improvements on the above, the first two using purrr functional programming, idiomatic purrr:

purrr::map_dfr(seq_len(3), ~d)

and less idiomatic purrr (identical result, though more awkward):

purrr::map_dfr(seq_len(3), function(x) d)

and finally via indexing rather than list apply using dplyr:

d %>% slice(rep(row_number(), 3))

Solution 2 - R

For data.frame objects, this solution is several times faster than @mdsummer's and @wojciech-sobala's.

d[rep(seq_len(nrow(d)), n), ]

For data.table objects, @mdsummer's is a bit faster than applying the above after converting to data.frame. For large n this might flip. microbenchmark .

Full code:

packages <- c("data.table", "ggplot2", "RUnit", "microbenchmark")
lapply(packages, require, character.only=T)

Repeat1 <- function(d, n) {
  return(do.call("rbind", replicate(n, d, simplify = FALSE)))
}

Repeat2 <- function(d, n) {
  return(Reduce(rbind, list(d)[rep(1L, times=n)]))
}

Repeat3 <- function(d, n) {
  if ("data.table" %in% class(d)) return(d[rep(seq_len(nrow(d)), n)])
  return(d[rep(seq_len(nrow(d)), n), ])
}

Repeat3.dt.convert <- function(d, n) {
  if ("data.table" %in% class(d)) d <- as.data.frame(d)
  return(d[rep(seq_len(nrow(d)), n), ])
}

# Try with data.frames
mtcars1 <- Repeat1(mtcars, 3)
mtcars2 <- Repeat2(mtcars, 3)
mtcars3 <- Repeat3(mtcars, 3)

checkEquals(mtcars1, mtcars2)
#  Only difference is row.names having ".k" suffix instead of "k" from 1 & 2
checkEquals(mtcars1, mtcars3)

# Works with data.tables too
mtcars.dt <- data.table(mtcars)
mtcars.dt1 <- Repeat1(mtcars.dt, 3)
mtcars.dt2 <- Repeat2(mtcars.dt, 3)
mtcars.dt3 <- Repeat3(mtcars.dt, 3)

# No row.names mismatch since data.tables don't have row.names
checkEquals(mtcars.dt1, mtcars.dt2)
checkEquals(mtcars.dt1, mtcars.dt3)

# Time test
res <- microbenchmark(Repeat1(mtcars, 10),
                      Repeat2(mtcars, 10),
                      Repeat3(mtcars, 10),
                      Repeat1(mtcars.dt, 10),
                      Repeat2(mtcars.dt, 10),
                      Repeat3(mtcars.dt, 10),
                      Repeat3.dt.convert(mtcars.dt, 10))
print(res)
ggsave("repeat_microbenchmark.png", autoplot(res))

Solution 3 - R

The package dplyr contains the function bind_rows() that directly combines all data frames in a list, such that there is no need to use do.call() together with rbind():

df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
library(dplyr)
bind_rows(replicate(3, df, simplify = FALSE))

For a large number of repetions bind_rows() is also much faster than rbind():

library(microbenchmark)
microbenchmark(rbind = do.call("rbind", replicate(1000, df, simplify = FALSE)),
               bind_rows = bind_rows(replicate(1000, df, simplify = FALSE)),
               times = 20)
## Unit: milliseconds
##       expr       min        lq      mean   median        uq       max neval cld
##      rbind 31.796100 33.017077 35.436753 34.32861 36.773017 43.556112    20   b
##  bind_rows  1.765956  1.818087  1.881697  1.86207  1.898839  2.321621    20  a

Solution 4 - R

With the [tag:data.table]-package, you could use the special symbol .I together with rep:

df <- data.frame(a = c(1,2,3), b = c(1,2,3))
dt <- as.data.table(df)

n <- 3

dt[rep(dt[, .I], n)]

which gives:

> a b > 1: 1 1 > 2: 2 2 > 3: 3 3 > 4: 1 1 > 5: 2 2 > 6: 3 3 > 7: 1 1 > 8: 2 2 > 9: 3 3

Solution 5 - R

d <- data.frame(a = c(1,2,3),b = c(1,2,3))
r <- Reduce(rbind, list(d)[rep(1L, times=3L)])

Solution 6 - R

Even simpler:

library(data.table)
my_data <- data.frame(a = c(1,2,3),b = c(1,2,3))
rbindlist(replicate(n = 3, expr = my_data, simplify = FALSE)

Solution 7 - R

Just use simple indexing with repeat function.

mydata<-data.frame(a = c(1,2,3),b = c(1,2,3)) #creating your data frame  
n<-10           #defining no. of time you want repetition of the rows of your dataframe

mydata<-mydata[rep(rownames(mydata),n),] #use rep function while doing indexing 
rownames(mydata)<-1:NROW(mydata)    #rename rows just to get cleaner look of data

Solution 8 - R

For time execution purposes, i would like to suggest a comparison of different way of rbind:

> mydata <- data.frame(a=1:200,b=201:400,c=301:500)
> microbenchmark(rbind = do.call("rbind",replicate(n=100,mydata,simplify = FALSE)),
+                bind_rows = bind_rows(replicate(n=100,mydata,simplify = FALSE)),
+                rbindlist = rbindlist(replicate(n=100,exp= mydata,simplify = FALSE)),
+                times= 2000)
Unit: microseconds
      expr    min      lq      mean  median      uq      max neval
     rbind 5760.7 6723.10 8642.6930 7132.30 7761.05 240720.3  2000
 bind_rows  976.4 1186.90 1430.7741 1308.85 1469.80  15817.9  2000
 rbindlist  263.6  347.85  465.5894  392.90  459.95  10974.2  2000

Content Type	Original Author	Original Content on Stackoverflow
Question	Michael	View Question on Stackoverflow
Solution 1 - R	mdsumner	View Answer on Stackoverflow
Solution 2 - R	Max Ghenis	View Answer on Stackoverflow
Solution 3 - R	Stibu	View Answer on Stackoverflow
Solution 4 - R	Jaap	View Answer on Stackoverflow
Solution 5 - R	Wojciech Sobala	View Answer on Stackoverflow
Solution 6 - R	Arturo Sbr	View Answer on Stackoverflow
Solution 7 - R	user5433002	View Answer on Stackoverflow
Solution 8 - R	A. chahid	View Answer on Stackoverflow