calculating number of days between 2 columns of dates in data frame

RDate

R Problem Overview


I have a data frame which has two columns of dates in the format yyyy/mm/dd. I am trying to calculate the number of days between these two dates for each observation within the data frame (and create a new variable with this number of days in it).

So far I have tried using the answer given here:

https://stackoverflow.com/questions/5046708/calculate-the-number-of-weekdays-between-2-dates-in-r

but editing the code so that it calculates number of total days rather than just the number of week days. This just came up with error saying:

Error in del/by : non-numeric argument to binary operator
In addition: Warning message:
In Ops.factor(to, from) : - not meaningful for factors

I have also attempted using this code:

finish <- as.Date(survey$date, format="%yyyy/%mm/%dd")
start <- as.Date(survey$tx_start, format="%yyyy/%mm/%dd")
date_diff<-as.data.frame(finish-start)

with the plan of using "cbind" or something equivalent to combine the data frame "days" to my data frame "survey" which has data in it.

Although this does not give me any errors, the observations within the "finish" and "start" objects are all "NA_real_", and the date_diff data frame therefore has all the observations listed as NA.

If someone could point me in the right direction that would be great! All the other question I have found do not seem to be dealing with dates within variables but as individual dates, and applying those techniques to variables hasn`t been working for me.

R Solutions


Solution 1 - R

Without your seeing your data (you can use the output of dput(head(survey)) to show us) this is a shot in the dark:

survey <- data.frame(date=c("2012/07/26","2012/07/25"),tx_start=c("2012/01/01","2012/01/01"))

survey$date_diff <- as.Date(as.character(survey$date), format="%Y/%m/%d")-
                  as.Date(as.character(survey$tx_start), format="%Y/%m/%d")
survey
       date   tx_start date_diff
1 2012/07/26 2012/01/01  207 days
2 2012/07/25 2012/01/01  206 days

Solution 2 - R

You could find the difference between dates in columns in a data frame by using the function difftime as follows:

df$diff_in_days<- difftime(df$datevar1 ,df$datevar2 , units = c("days"))

Solution 3 - R

You need to use the as.Date formats correctly.

Eg.

x = '2012/07/25'
xd = as.Date(x,'%Y/%m/%d')
xd    # Prints "2012-07-25"

R date formats are similary to *nix ones.

Doing a typeof(xd) shows it as a double ie. days since 1970.

Solution 4 - R

In Ronald's example, if the date formats are different (as displayed below) then modify the format parameter

survey <- data.frame(date=c("2012-07-26","2012-07-25"),tx_start=c("2012-01-01","2012-01-01"))

survey$date_diff <- as.Date(as.character(survey$date), format="%Y-%m-%d")-
              as.Date(as.character(survey$tx_start), format="%Y-%m-%d")

survey:

   date      tx_start     date_diff
1 2012-07-26     2012-01-01    207 days
2 2012-07-25     2012-01-01    206 days

Solution 5 - R

Following Ronald Example I would like to add that it should be considered if the origin and end dates must be included or not in the days count between two dates. I faced the same problem and ended up using a third option with apply. It could be memory inefficient but helps to understand the problem:

   survey <- data.frame(date=c("2012/07/26","2012/07/25"),tx_start=c("2012/01/01","2012/01/01"))

survey$diff_1 <- as.numeric(
  as.Date(as.character(survey$date), format="%Y/%m/%d")-
    as.Date(as.character(survey$tx_start), format="%Y/%m/%d")
)

survey$diff_2<- as.numeric(
  difftime(survey$date ,survey$tx_start , units = c("days"))
)

survey$diff_3 <- apply(X = survey[,c("date", "tx_start")],
                       MARGIN = 1,
                       FUN = function(x)
                         length(
                           seq.Date(
                             from = as.Date(x[2]),
                             to = as.Date(x[1]),
                             by = "day")
                           )
                       )

This gives the following date differences:

        date   tx_start diff_1   diff_2 diff_3
1 2012/07/26 2012/01/01    207 206.9583    208
2 2012/07/25 2012/01/01    206 205.9583    207

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionTimothy AlstonView Question on Stackoverflow
Solution 1 - RRolandView Answer on Stackoverflow
Solution 2 - R26laneView Answer on Stackoverflow
Solution 3 - RPete855217View Answer on Stackoverflow
Solution 4 - RMotomanView Answer on Stackoverflow
Solution 5 - RalvaroprView Answer on Stackoverflow