Repeat rows of a data.frame N times
RDataframeR Problem Overview
I have the following data frame:
data.frame(a = c(1,2,3),b = c(1,2,3))
a b
1 1 1
2 2 2
3 3 3
I want to repeat the rows n times. For example, here the rows are repeated 3 times:
a b
1 1 1
2 2 2
3 3 3
4 1 1
5 2 2
6 3 3
7 1 1
8 2 2
9 3 3
Is there an easy function to do this in R? Thanks!
R Solutions
Solution 1 - R
EDIT: updated to a better modern R answer.
You can use replicate()
, then rbind
the result back together. The rownames are automatically altered to run from 1:nrows.
d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))
A more traditional way is to use indexing, but here the rowname altering is not quite so neat (but more informative):
d[rep(seq_len(nrow(d)), n), ]
Here are improvements on the above, the first two using purrr
functional programming, idiomatic purrr:
purrr::map_dfr(seq_len(3), ~d)
and less idiomatic purrr (identical result, though more awkward):
purrr::map_dfr(seq_len(3), function(x) d)
and finally via indexing rather than list apply using dplyr
:
d %>% slice(rep(row_number(), 3))
Solution 2 - R
For data.frame
objects, this solution is several times faster than @mdsummer's and @wojciech-sobala's.
d[rep(seq_len(nrow(d)), n), ]
For data.table
objects, @mdsummer's is a bit faster than applying the above after converting to data.frame
. For large n this might flip.
.
Full code:
packages <- c("data.table", "ggplot2", "RUnit", "microbenchmark")
lapply(packages, require, character.only=T)
Repeat1 <- function(d, n) {
return(do.call("rbind", replicate(n, d, simplify = FALSE)))
}
Repeat2 <- function(d, n) {
return(Reduce(rbind, list(d)[rep(1L, times=n)]))
}
Repeat3 <- function(d, n) {
if ("data.table" %in% class(d)) return(d[rep(seq_len(nrow(d)), n)])
return(d[rep(seq_len(nrow(d)), n), ])
}
Repeat3.dt.convert <- function(d, n) {
if ("data.table" %in% class(d)) d <- as.data.frame(d)
return(d[rep(seq_len(nrow(d)), n), ])
}
# Try with data.frames
mtcars1 <- Repeat1(mtcars, 3)
mtcars2 <- Repeat2(mtcars, 3)
mtcars3 <- Repeat3(mtcars, 3)
checkEquals(mtcars1, mtcars2)
# Only difference is row.names having ".k" suffix instead of "k" from 1 & 2
checkEquals(mtcars1, mtcars3)
# Works with data.tables too
mtcars.dt <- data.table(mtcars)
mtcars.dt1 <- Repeat1(mtcars.dt, 3)
mtcars.dt2 <- Repeat2(mtcars.dt, 3)
mtcars.dt3 <- Repeat3(mtcars.dt, 3)
# No row.names mismatch since data.tables don't have row.names
checkEquals(mtcars.dt1, mtcars.dt2)
checkEquals(mtcars.dt1, mtcars.dt3)
# Time test
res <- microbenchmark(Repeat1(mtcars, 10),
Repeat2(mtcars, 10),
Repeat3(mtcars, 10),
Repeat1(mtcars.dt, 10),
Repeat2(mtcars.dt, 10),
Repeat3(mtcars.dt, 10),
Repeat3.dt.convert(mtcars.dt, 10))
print(res)
ggsave("repeat_microbenchmark.png", autoplot(res))
Solution 3 - R
The package dplyr
contains the function bind_rows()
that directly combines all data frames in a list, such that there is no need to use do.call()
together with rbind()
:
df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
library(dplyr)
bind_rows(replicate(3, df, simplify = FALSE))
For a large number of repetions bind_rows()
is also much faster than rbind()
:
library(microbenchmark)
microbenchmark(rbind = do.call("rbind", replicate(1000, df, simplify = FALSE)),
bind_rows = bind_rows(replicate(1000, df, simplify = FALSE)),
times = 20)
## Unit: milliseconds
## expr min lq mean median uq max neval cld
## rbind 31.796100 33.017077 35.436753 34.32861 36.773017 43.556112 20 b
## bind_rows 1.765956 1.818087 1.881697 1.86207 1.898839 2.321621 20 a
Solution 4 - R
With the [tag:data.table]-package, you could use the special symbol .I
together with rep
:
df <- data.frame(a = c(1,2,3), b = c(1,2,3))
dt <- as.data.table(df)
n <- 3
dt[rep(dt[, .I], n)]
which gives:
> a b > 1: 1 1 > 2: 2 2 > 3: 3 3 > 4: 1 1 > 5: 2 2 > 6: 3 3 > 7: 1 1 > 8: 2 2 > 9: 3 3
Solution 5 - R
d <- data.frame(a = c(1,2,3),b = c(1,2,3))
r <- Reduce(rbind, list(d)[rep(1L, times=3L)])
Solution 6 - R
Even simpler:
library(data.table)
my_data <- data.frame(a = c(1,2,3),b = c(1,2,3))
rbindlist(replicate(n = 3, expr = my_data, simplify = FALSE)
Solution 7 - R
Just use simple indexing with repeat function.
mydata<-data.frame(a = c(1,2,3),b = c(1,2,3)) #creating your data frame
n<-10 #defining no. of time you want repetition of the rows of your dataframe
mydata<-mydata[rep(rownames(mydata),n),] #use rep function while doing indexing
rownames(mydata)<-1:NROW(mydata) #rename rows just to get cleaner look of data
Solution 8 - R
For time execution purposes, i would like to suggest a comparison of different way of rbind:
> mydata <- data.frame(a=1:200,b=201:400,c=301:500)
> microbenchmark(rbind = do.call("rbind",replicate(n=100,mydata,simplify = FALSE)),
+ bind_rows = bind_rows(replicate(n=100,mydata,simplify = FALSE)),
+ rbindlist = rbindlist(replicate(n=100,exp= mydata,simplify = FALSE)),
+ times= 2000)
Unit: microseconds
expr min lq mean median uq max neval
rbind 5760.7 6723.10 8642.6930 7132.30 7761.05 240720.3 2000
bind_rows 976.4 1186.90 1430.7741 1308.85 1469.80 15817.9 2000
rbindlist 263.6 347.85 465.5894 392.90 459.95 10974.2 2000