Combine Date and Time columns using python pandas
Python Problem Overview
I have a pandas dataframe with the following columns:
data = {'Date': ['01-06-2013', '02-06-2013', '02-06-2013', '02-06-2013', '02-06-2013', '03-06-2013', '03-06-2013', '03-06-2013', '03-06-2013', '04-06-2013'],
'Time': ['23:00:00', '01:00:00', '21:00:00', '22:00:00', '23:00:00', '01:00:00', '21:00:00', '22:00:00', '23:00:00', '01:00:00']}
import pandas as pd
df = pd.DataFrame(data)
Date Time
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
How do I combine data['Date'] and data['Time'] to get the following? Is there a way of doing it using pd.to_datetime?
Date
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
Python Solutions
Solution 1 - Python
It's worth mentioning that you may have been able to read this in directly, e.g. if you were using read_csv with parse_dates=[['Date', 'Time']]. (Newer pandas releases deprecate this nested-list form of parse_dates, so check your version.)
Assuming these are just strings, you could simply add them together (with a space), allowing you to use to_datetime, which works without specifying the format= parameter:
In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object
In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]
Alternatively, you can skip the + ' ', but then the format= parameter must be used. Additionally, pandas is good at inferring the format to be converted to a datetime; however, specifying the exact format is faster.
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
Note: surprisingly (for me), this works fine with NaNs, which are converted to NaT, but it is worth checking that the conversion did what you expected (perhaps using the errors='raise' argument).
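To make the NaN behaviour concrete, here is a small sketch. The two-row frame and the deliberately malformed string are made up for illustration; errors='coerce' and errors='raise' are the documented options of to_datetime.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has a missing date.
df_nan = pd.DataFrame({'Date': ['01-06-2013', np.nan],
                       'Time': ['23:00:00', '01:00:00']})

# String concatenation propagates the missing value...
combined = df_nan['Date'] + df_nan['Time']

# ...and to_datetime turns it into NaT instead of raising.
parsed = pd.to_datetime(combined, format='%m-%d-%Y%H:%M:%S')

# A malformed (non-missing) string is different: errors='raise' (the default)
# raises a ValueError, while errors='coerce' silently yields NaT.
bad = pd.Series(['13-45-2013 23:00:00'])
coerced = pd.to_datetime(bad, format='%m-%d-%Y %H:%M:%S', errors='coerce')
try:
    pd.to_datetime(bad, format='%m-%d-%Y %H:%M:%S', errors='raise')
    raised = False
except ValueError:
    raised = True
```

So missing inputs pass through quietly, while genuinely bad strings only pass quietly if you ask for errors='coerce'.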
Timing comparison using %%timeit:
# sample dataframe with 10000000 rows using df from the OP
df = pd.concat([df for _ in range(1000000)]).reset_index(drop=True)
%%timeit
pd.to_datetime(df['Date'] + ' ' + df['Time'])
[result]:
1.73 s ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
[result]:
1.33 s ± 9.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
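One thing to check in the outputs above: the dates were parsed month-first, so '01-06-2013' became 6 January 2013. If the strings are actually day-first (the consecutive values 01-06 through 04-06 suggest 1 to 4 June, though that is an assumption about the OP's data), a sketch like this parses them accordingly. Only the format string, or the dayfirst flag, differs from the code above.

```python
import pandas as pd

df_demo = pd.DataFrame({'Date': ['01-06-2013', '02-06-2013'],
                        'Time': ['23:00:00', '01:00:00']})

# %d before %m: read the first field as the day, the second as the month.
day_first = pd.to_datetime(df_demo['Date'] + ' ' + df_demo['Time'],
                           format='%d-%m-%Y %H:%M:%S')

# The same interpretation without an explicit format, via the dayfirst flag.
day_first_flag = pd.to_datetime(df_demo['Date'] + ' ' + df_demo['Time'],
                                dayfirst=True)
```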
Solution 2 - Python
The accepted answer works for columns that are of datatype string. For completeness: I came across this question when searching for how to do this when the columns are of datatypes date and time.
from datetime import datetime  # pd.datetime is no longer available in current pandas
df.apply(lambda r: datetime.combine(r['date_column_name'], r['time_column_name']), axis=1)
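A self-contained version of this approach, for trying it out. The column names and the two sample rows are made up; since pd.datetime is not available in current pandas, the sketch imports datetime from the standard library instead.

```python
from datetime import date, time, datetime

import pandas as pd

# Hypothetical frame holding datetime.date and datetime.time objects.
df_obj = pd.DataFrame({
    'date_column_name': [date(2013, 6, 1), date(2013, 6, 2)],
    'time_column_name': [time(23, 0), time(1, 0)],
})

# Combine row by row; axis=1 passes each row to the lambda.
combined = df_obj.apply(
    lambda r: datetime.combine(r['date_column_name'], r['time_column_name']),
    axis=1,
)
```

This runs a Python-level function for every row, so it will be slower than the vectorised string or timedelta approaches on large frames.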
Solution 3 - Python
Cast the columns to str if their types differ (datetime, timestamp or str) and use to_datetime:
df.loc[:,'Date'] = pd.to_datetime(df.Date.astype(str)+' '+df.Time.astype(str))
Result :
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
Solution 4 - Python
You can use this to merge date and time into the same column of the dataframe.
import pandas as pd
data_file = 'data.csv' #path of your file
Reading .csv file with merged columns Date_Time:
data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']])
You can use this line to keep both of the other columns as well.
data.set_index(['Date', 'Time'], drop=False)
Solution 5 - Python
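I don't have enough reputation to comment on jka.ne's answer, so I'm posting this here. I had to amend jka.ne's line for it to work, calling .time() on the time value (needed when that column holds full datetime values rather than plain time objects):
from datetime import datetime
df.apply(lambda r: datetime.combine(r['date_column_name'], r['time_column_name'].time()), axis=1)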
This might help others.
Also, I have tested a different approach, using replace instead of combine:
def combine_date_time(df, datecol, timecol):
    return df.apply(lambda row: row[datecol].replace(
                        hour=row[timecol].hour,
                        minute=row[timecol].minute),
                    axis=1)
which in the OP's case would be:
combine_date_time(df, 'Date', 'Time')
I have timed both approaches on a relatively large dataset (more than 500,000 rows). They have similar runtimes, but using combine is faster (59s for replace vs 50s for combine).
Solution 6 - Python
You can also convert to datetime without string concatenation, by combining to_datetime and to_timedelta, which create datetime and timedelta objects, respectively. Combined with pd.DataFrame.pop, you can remove the source Series simultaneously:
df['DateTime'] = pd.to_datetime(df.pop('Date')) + pd.to_timedelta(df.pop('Time'))
print(df)
DateTime
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
print(df.dtypes)
DateTime datetime64[ns]
dtype: object
Solution 7 - Python
The answer really depends on what your column types are. In my case, I had datetime and timedelta.
> df[['Date','Time']].dtypes
Date datetime64[ns]
Time timedelta64[ns]
If this is your case, then you just need to add the columns:
> df['Date'] + df['Time']
Solution 8 - Python
First make sure to have the right data types:
df["Date"] = pd.to_datetime(df["Date"])
df["Time"] = pd.to_timedelta(df["Time"])
Then you easily combine them:
df["DateTime"] = df["Date"] + df["Time"]
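Put together on the OP's string data, this might look like the sketch below. The explicit format='%d-%m-%Y' assumes the dates are day-first; use '%m-%d-%Y' instead if they are month-first.

```python
import pandas as pd

df_demo = pd.DataFrame({'Date': ['01-06-2013', '02-06-2013'],
                        'Time': ['23:00:00', '01:00:00']})

dates = pd.to_datetime(df_demo['Date'], format='%d-%m-%Y')  # timestamps at midnight
times = pd.to_timedelta(df_demo['Time'])                    # offsets from midnight

# datetime64 + timedelta64 adds element-wise, with no string handling involved.
df_demo['DateTime'] = dates + times
```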
Solution 9 - Python
Use the combine function:
datetime.datetime.combine(date, time)
Solution 10 - Python
Here is a one-liner to do it. You simply concatenate the two strings in each of the columns with a " " space in between.
Say df is your dataframe and the columns are 'Time' and 'Date', and your new column is DateAndTime.
df['DateAndTime'] = df['Date'].str.cat(df['Time'],sep=" ")
And if you also want to handle the entries as datetime objects, you can do this. You can tweak the format string to match your data.
df['DateAndTime'] = pd.to_datetime(df['DateAndTime'], format="%m/%d/%Y %I:%M:%S %p")
Cheers!! Happy Data Crunching.
Solution 11 - Python
My dataset had 1-second resolution data for a few days, and parsing with the methods suggested here was very slow. Instead I used:
dates = pandas.to_datetime(df.Date, cache=True)
times = pandas.to_timedelta(df.Time)
datetimes = dates + times
Note that the use of cache=True makes parsing the dates very efficient, since there are only a couple of unique dates in my files, which is not true for a combined date and time column.
Solution 12 - Python
DATA:
<TICKER>,<PER>,<DATE>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>
SPFB.RTS,1,20190103,100100,106580.0000000,107260.0000000,106570.0000000,107230.0000000,3726
CODE:
data.columns = ['ticker', 'per', 'date', 'time', 'open', 'high', 'low', 'close', 'vol']
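# Assign with data['datetime']: setting data.datetime as an attribute does not create a column.
data['datetime'] = pd.to_datetime(data['date'].astype(str) + ' ' + data['time'].astype(str), format='%Y%m%d %H%M%S')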
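One thing to watch with this compact format: when read_csv parses the time column as integers, a time such as 090500 loses its leading zero, and '%H%M%S' may then fail or misparse. A sketch that pads the time back to six digits is shown below. The second data row is hypothetical, added only to exercise that case, and the columns are trimmed for brevity.

```python
import io

import pandas as pd

# First row is taken from the sample above; the second row is made up.
csv_text = (
    "<TICKER>,<PER>,<DATE>,<TIME>,<CLOSE>\n"
    "SPFB.RTS,1,20190103,100100,107230.0\n"
    "SPFB.RTS,1,20190103,090500,107300.0\n"
)
data = pd.read_csv(io.StringIO(csv_text))
data.columns = ['ticker', 'per', 'date', 'time', 'close']

# zfill(6) restores the leading zero dropped by integer parsing (90500 -> '090500').
time_str = data['time'].astype(str).str.zfill(6)
data['datetime'] = pd.to_datetime(data['date'].astype(str) + ' ' + time_str,
                                  format='%Y%m%d %H%M%S')
```

Reading the column with dtype=str in read_csv would be another way to keep the zeros; the padding approach is just the smaller change to the code above.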
Solution 13 - Python
I think the best solution is to parse dates within read_csv (or other read_ functions) directly. It is not obvious how to manage two columns in date_parser, but here it is (note that date_parser is deprecated as of pandas 2.0, so this applies to older versions):
from datetime import datetime
date_parser = lambda x, y: datetime.strptime(f"{x}T{y}", "%d-%m-%YT%H:%M:%S")
date = pd.read_csv('data.csv', parse_dates={'date': ['Date', 'Time']}, date_parser=date_parser)
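Because date_parser is deprecated in pandas 2.0 and later, an alternative on newer versions is to read the two columns as plain strings and combine them afterwards. A minimal sketch, with an in-memory CSV standing in for 'data.csv' and a made-up Value column:

```python
import io

import pandas as pd

# Stand-in for 'data.csv'; the Value column is hypothetical.
csv_text = (
    "Date,Time,Value\n"
    "01-06-2013,23:00:00,1\n"
    "02-06-2013,01:00:00,2\n"
)
df_csv = pd.read_csv(io.StringIO(csv_text))

# Same day-first format as the strptime call above, with a space as separator.
df_csv['date'] = pd.to_datetime(df_csv['Date'] + ' ' + df_csv['Time'],
                                format='%d-%m-%Y %H:%M:%S')
df_csv = df_csv.drop(columns=['Date', 'Time'])
```

This costs one extra line compared with parsing inside read_csv, but it does not depend on the deprecated arguments.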