subtract value from previous row by group
RDataframeLagR Problem Overview
In R, let's say I have this data frame:
Data
id date value
2380 10/30/12 21.01
2380 10/31/12 22.04
2380 11/1/12 22.65
2380 11/2/12 23.11
20100 10/30/12 35.21
20100 10/31/12 37.07
20100 11/1/12 38.17
20100 11/2/12 38.97
20103 10/30/12 57.98
20103 10/31/12 60.83
And I want to subtract the previous value from the current value, by group ID date, to create this:
id date value diff
2380 10/30/12 21.01 0
2380 10/31/12 22.04 1.03
2380 11/1/12 22.65 0.61
2380 11/2/12 23.11 0.46
20100 10/30/12 35.21 0
20100 10/31/12 37.07 1.86
20100 11/1/12 38.17 1.1
20100 11/2/12 38.97 0.8
20103 10/30/12 57.98 0
20103 10/31/12 60.83 2.85
R Solutions
Solution 1 - R
With dplyr
:
library(dplyr)
data %>%
group_by(id) %>%
arrange(date) %>%
mutate(diff = value - lag(value, default = first(value)))
For clarity you can arrange
by date
and grouping column (as per comment by lawyer)
data %>%
group_by(id) %>%
arrange(date, .by_group = TRUE) %>%
mutate(diff = value - lag(value, default = first(value)))
or lag
with order_by
:
data %>%
group_by(id) %>%
mutate(diff = value - lag(value, default = first(value), order_by = date))
With data.table
:
library(data.table)
dt <- as.data.table(data)
setkey(dt, id, date)
dt[, diff := value - shift(value, fill = first(value)), by = id]
Solution 2 - R
You can do this with the ave
function:
data$diff <- ave(data$value, data$id, FUN=function(x) c(0, diff(x)))
data
# id date value diff
# 1 2380 2012-10-30 00:15:51 21.01 0.00
# 2 2380 2012-10-31 00:31:03 22.04 1.03
# 3 2380 2012-11-01 00:16:02 22.65 0.61
# 4 2380 2012-11-02 00:15:32 23.11 0.46
# 5 20100 2012-10-30 00:15:38 35.21 0.00
# 6 20100 2012-10-31 00:15:48 37.07 1.86
# 7 20100 2012-11-01 00:15:49 38.17 1.10
# 8 20100 2012-11-02 00:15:19 38.97 0.80
# 9 20103 2012-10-30 10:27:34 57.98 0.00
# 10 20103 2012-10-31 12:24:42 60.83 2.85
The first argument is the data to be operated on, the second argument is the group, and the last argument is the function to be applied to the data from each group.
Solution 3 - R
Awesome answers! Just wanted to add that if you want to make your data consecutively to work with the code above you can do that with order, e.g.:
data <- data[with{data, order(id, date)), ]
data$diff <- ave(data$value, data$id, FUN=function(x) c(0, diff(x)))