subtract value from previous row by group

RDataframeLag

R Problem Overview


In R, let's say I have this data frame:

Data
id		date		value
2380	10/30/12	21.01
2380	10/31/12	22.04
2380	11/1/12		22.65
2380	11/2/12		23.11
20100	10/30/12	35.21
20100	10/31/12	37.07
20100	11/1/12		38.17
20100	11/2/12		38.97
20103	10/30/12	57.98
20103	10/31/12	60.83 

And I want to subtract the previous value from the current value, by group ID date, to create this:

id		date		value	diff
2380	10/30/12	21.01	0
2380	10/31/12	22.04	1.03
2380	11/1/12		22.65	0.61
2380	11/2/12		23.11	0.46
20100	10/30/12	35.21	0
20100	10/31/12	37.07	1.86
20100	11/1/12		38.17	1.1
20100	11/2/12		38.97	0.8
20103	10/30/12	57.98	0
20103	10/31/12	60.83	2.85

R Solutions


Solution 1 - R

With dplyr:

library(dplyr)

data %>%
    group_by(id) %>%
    arrange(date) %>%
    mutate(diff = value - lag(value, default = first(value)))

For clarity you can arrange by date and grouping column (as per comment by lawyer)

data %>%
    group_by(id) %>%
    arrange(date, .by_group = TRUE) %>%
    mutate(diff = value - lag(value, default = first(value)))

or lag with order_by:

data %>%
    group_by(id) %>%
    mutate(diff = value - lag(value, default = first(value), order_by = date))

With data.table:

library(data.table)

dt <- as.data.table(data)
setkey(dt, id, date)
dt[, diff := value - shift(value, fill = first(value)), by = id]

Solution 2 - R

You can do this with the ave function:

data$diff <- ave(data$value, data$id, FUN=function(x) c(0, diff(x)))
data
#       id                date value diff
# 1   2380 2012-10-30 00:15:51 21.01 0.00
# 2   2380 2012-10-31 00:31:03 22.04 1.03
# 3   2380 2012-11-01 00:16:02 22.65 0.61
# 4   2380 2012-11-02 00:15:32 23.11 0.46
# 5  20100 2012-10-30 00:15:38 35.21 0.00
# 6  20100 2012-10-31 00:15:48 37.07 1.86
# 7  20100 2012-11-01 00:15:49 38.17 1.10
# 8  20100 2012-11-02 00:15:19 38.97 0.80
# 9  20103 2012-10-30 10:27:34 57.98 0.00
# 10 20103 2012-10-31 12:24:42 60.83 2.85

The first argument is the data to be operated on, the second argument is the group, and the last argument is the function to be applied to the data from each group.

Solution 3 - R

Awesome answers! Just wanted to add that if you want to make your data consecutively to work with the code above you can do that with order, e.g.:

data <- data[with{data, order(id, date)), ]
data$diff <- ave(data$value, data$id, FUN=function(x) c(0, diff(x)))

See: https://stackoverflow.com/questions/67383811/calculate-difference-between-values-in-rows-by-two-grouping-variables

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionhaithamView Question on Stackoverflow
Solution 1 - Rzero323View Answer on Stackoverflow
Solution 2 - RjosliberView Answer on Stackoverflow
Solution 3 - RKimView Answer on Stackoverflow