Use a value from the previous row in an R data.table calculation
Problem Overview
I want to create a new column in a data.table calculated from the current value of one column and the previous value of another. Is it possible to access previous rows?
E.g.:
> DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
> DT
A B C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
> DT[, D := C + BPreviousRow] # What is the correct code here?
The correct answer should be
> DT
A B C D
1: 1 10 100 NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
R Solutions
Solution 1 - R
With shift(), implemented in v1.9.6, this is quite straightforward.
DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]
From NEWS:
> 5. New function shift() implements fast lead/lag of vector, list, data.frames or data.tables. It takes a type argument which can be either "lag" (default) or "lead". It enables very convenient usage along with := or set(). For example: DT[, (cols) := shift(.SD, 1L), by=id]. Please have a look at ?shift for more info.
See history for previous answers.
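The NEWS entry mentions by=id, so here is a minimal sketch of how the lag behaves within groups. The table DT2 and its id column are made up for illustration; only shift() and its n, fill and type arguments come from data.table itself.

```r
library(data.table)

# Hypothetical example data: two groups identified by `id`
DT2 <- data.table(id = c("a", "a", "a", "b", "b"),
                  B  = c(10, 20, 30, 40, 50),
                  C  = c(100, 200, 300, 400, 500))

# Lag B within each group, so the first row of every group gets NA
DT2[, D := C + shift(B), by = id]

# type = "lead" looks at the next row instead; fill replaces the NA padding
DT2[, nextB := shift(B, 1L, fill = 0, type = "lead"), by = id]

print(DT2)
```

Without by=, the first row of group "b" would pick up the last B of group "a", which is usually not what you want for grouped data.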
Solution 2 - R
Using dplyr you could do:
library(dplyr)
mutate(DT, D = lag(B) + C)
Which gives:
# A B C D
#1: 1 10 100 NA
#2: 2 20 200 210
#3: 3 30 300 320
#4: 4 40 400 430
#5: 5 50 500 540
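One thing worth checking with the dplyr route: unlike :=, mutate() returns a new object rather than updating DT by reference. A small sketch, assuming dplyr and data.table are both installed (the name res is just for illustration):

```r
library(data.table)
library(dplyr)

DT <- data.table(A = 1:5, B = 1:5 * 10, C = 1:5 * 100)

# mutate() returns a modified copy; DT itself stays as it was
# dplyr::lag is spelled out to avoid any confusion with stats::lag
res <- mutate(DT, D = dplyr::lag(B) + C)

"D" %in% names(DT)    # is D present in the original table?
"D" %in% names(res)   # is D present in the result?
res$D
```

So if you want to keep the new column, assign the result back, e.g. DT <- mutate(DT, ...).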
Solution 3 - R
Several folks have answered the specific question. The code below is a general-purpose function that I use in situations like this and that may be helpful. Rather than just getting the prior row, you can go as many rows into the "past" or "future" as you'd like.
rowShift <- function(x, shiftLen = 1L) {
  r <- (1L + shiftLen):(length(x) + shiftLen)
  r[r < 1] <- NA
  return(x[r])
}
# Create column D by adding column C and the value from the previous row of column B:
DT[, D := C + rowShift(B,-1)]
# Get the Old Faithful eruption length from two events ago, and three events in the future:
as.data.table(faithful)[1:5,list(eruptLengthCurrent=eruptions,
eruptLengthTwoPrior=rowShift(eruptions,-2),
eruptLengthThreeFuture=rowShift(eruptions,3))]
## eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
##1: 3.600 NA 2.283
##2: 1.800 NA 4.533
##3: 3.333 3.600 NA
##4: 2.283 1.800 NA
##5: 4.533 3.333 NA
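As a quick sanity check, rowShift() can be compared against data.table's shift(). This is only a sketch on a made-up vector x; the function body is copied from the answer above.

```r
library(data.table)

# rowShift() as defined in the answer above
rowShift <- function(x, shiftLen = 1L) {
  r <- (1L + shiftLen):(length(x) + shiftLen)
  r[r < 1] <- NA
  return(x[r])
}

x <- c(10, 20, 30, 40, 50)   # illustrative data

# A negative shiftLen looks back, like shift(type = "lag")
lag_ok <- identical(rowShift(x, -1), shift(x, 1L, type = "lag"))

# A positive shiftLen looks ahead, like shift(type = "lead")
lead_ok <- identical(rowShift(x, 2), shift(x, 2L, type = "lead"))

c(lag_ok, lead_ok)
```

Note the sign convention is the opposite of what you might expect from shift(): negative values of shiftLen reach into the past.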
Solution 4 - R
Based on @Steve Lianoglou's comment above, why not just:
DT[, D:= C + c(NA, B[.I - 1]) ]
# A B C D
# 1: 1 10 100 NA
# 2: 2 20 200 210
# 3: 3 30 300 320
# 4: 4 40 400 430
# 5: 5 50 500 540
And avoid using seq_len or head or any other function.
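The reason B[.I - 1] works is that R silently drops a zero index, so the result is one element shorter than B. A base R sketch of what happens, where idx stands in for what .I - 1 evaluates to when there is no by=:

```r
B <- c(10, 20, 30, 40, 50)
idx <- seq_along(B) - 1L   # 0 1 2 3 4

# Index 0 selects nothing, so only four values come back
prev <- B[idx]
length(prev)

# Prepending NA restores the original length
lagged <- c(NA, prev)
lagged
```

One caveat: with by=, .I holds row numbers in the whole table while B is only the group's subset, so this trick should not be expected to line up for grouped calculations.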
Solution 5 - R
Following Arun's solution, a similar result can be obtained without referring to .N
> DT[, D := C + c(NA, head(B, -1))][]
A B C D
1: 1 10 100 NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
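The same idea extends to lags of more than one row. Here is a rough sketch; lag_head is a hypothetical helper name, not part of data.table or base R.

```r
# Hypothetical helper: lag x by n positions using only c(), rep() and head()
lag_head <- function(x, n = 1L) {
  stopifnot(n >= 0)
  # Pad the front with n NAs, then trim back to the original length
  head(c(rep(NA, n), x), length(x))
}

B <- c(10, 20, 30, 40, 50)
lag_head(B)        # same values as c(NA, head(B, -1))
lag_head(B, 2L)    # two rows back
```

Padding first and trimming afterwards also behaves sensibly when n is larger than the vector: the result is then all NA.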
Solution 6 - R
Here is my intuitive solution:
#create data frame
df <- data.frame(A=1:5, B=seq(10,50,10), C=seq(100,500, 100))
#subtract the shift from num rows
shift <- 1 #in this case the shift is 1
invshift <- nrow(df) - shift
#Now create the new column
df$D <- c(NA, head(df$B, invshift)+tail(df$C, invshift))
Here invshift, the number of rows minus 1, is 4. nrow(df) gives the number of rows in a data frame (for a plain vector, use length() instead). Similarly, to take still earlier values, subtract 2, 3, etc. from nrow(df) and put the corresponding number of NAs at the beginning.
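To make the "still earlier values" case concrete, here is a sketch with a shift of 2. The names k and D2 are mine, chosen for illustration.

```r
df <- data.frame(A = 1:5, B = seq(10, 50, 10), C = seq(100, 500, 100))

k <- 2                      # look two rows back instead of one
invshift <- nrow(df) - k    # rows that have a value k rows earlier

# k leading NAs, then B from the first invshift rows
# plus C from the last invshift rows
df$D2 <- c(rep(NA, k), head(df$B, invshift) + tail(df$C, invshift))
df
```

The number of NAs at the front always has to match the shift, otherwise the new column will not have nrow(df) elements.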
Solution 7 - R
It can be done in a loop.
# Create the column D, initialised to NA (the first row has no previous row)
DT[, D := NA_real_]
# For every row after the first
for (i in seq_len(nrow(DT))[-1L]) {
  # D = C + B from the previous row; set() updates the cell by reference
  set(DT, i, "D", DT$C[i] + DT$B[i - 1L])
}
Using a for loop, you can even use the previous row's value of this new column itself, DT$D[i - 1].
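As an illustration of a column that depends on its own previous value, here is a sketch with a made-up recurrence, E[i] = C[i] + E[i - 1]. The column name E and the recurrence are assumptions for the example; set() is data.table's by-reference assignment function.

```r
library(data.table)

DT <- data.table(A = 1:5, B = 1:5 * 10, C = 1:5 * 100)

# Hypothetical recurrence: E[1] = C[1], E[i] = C[i] + E[i - 1]
DT[, E := NA_real_]
set(DT, 1L, "E", DT$C[1L])
for (i in seq_len(nrow(DT))[-1L]) {
  # Each step reads the value written in the previous iteration
  set(DT, i, "E", DT$C[i] + DT$E[i - 1L])
}

# For this particular recurrence the loop reproduces cumsum(C)
all.equal(DT$E, cumsum(DT$C))
```

A loop is the slow option, so it is mostly worth it when the recurrence cannot be expressed with a vectorised function such as cumsum() or shift().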