How do I convert strings in a Pandas data frame to a 'date' data type?

PythonDatePandas

Python Problem Overview


I have a Pandas data frame, one of the column contains date strings in the format YYYY-MM-DD

For e.g. '2013-10-28'

At the moment the dtype of the column is object.

How do I convert the column values to Pandas date format?

Python Solutions


Solution 1 - Python

Essentially equivalent to @waitingkuo, but I would use pd.to_datetime here (it seems a little cleaner, and offers some additional functionality e.g. dayfirst):

In [11]: df
Out[11]:
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [12]: pd.to_datetime(df['time'])
Out[12]:
0   2013-01-01 00:00:00
1   2013-01-02 00:00:00
2   2013-01-03 00:00:00
Name: time, dtype: datetime64[ns]

In [13]: df['time'] = pd.to_datetime(df['time'])

In [14]: df
Out[14]:
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00

Handling ValueErrors
If you run into a situation where doing

df['time'] = pd.to_datetime(df['time'])

Throws a

ValueError: Unknown string format

That means you have invalid (non-coercible) values. If you are okay with having them converted to pd.NaT, you can add an errors='coerce' argument to to_datetime:

df['time'] = pd.to_datetime(df['time'], errors='coerce')

Solution 2 - Python

Use astype

In [31]: df
Out[31]: 
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [32]: df['time'] = df['time'].astype('datetime64[ns]')

In [33]: df
Out[33]: 
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00

Solution 3 - Python

I imagine a lot of data comes into Pandas from CSV files, in which case you can simply convert the date during the initial CSV read:

dfcsv = pd.read_csv('xyz.csv', parse_dates=[0]) where the 0 refers to the column the date is in.
You could also add , index_col=0 in there if you want the date to be your index.

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Solution 4 - Python

Now you can do df['column'].dt.date

Note that for datetime objects, if you don't see the hour when they're all 00:00:00, that's not pandas. That's iPython notebook trying to make things look pretty.

Solution 5 - Python

If you want to get the DATE and not DATETIME format:

df["id_date"] = pd.to_datetime(df["id_date"]).dt.date

Solution 6 - Python

Another way to do this and this works well if you have multiple columns to convert to datetime.

cols = ['date1','date2']
df[cols] = df[cols].apply(pd.to_datetime)

Solution 7 - Python

It may be the case that dates need to be converted to a different frequency. In this case, I would suggest setting an index by dates.

#set an index by dates
df.set_index(['time'], drop=True, inplace=True)

After this, you can more easily convert to the type of date format you will need most. Below, I sequentially convert to a number of date formats, ultimately ending up with a set of daily dates at the beginning of the month.

#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)

#Convert to monthly dates
df.index = df.index.to_period(freq='M')

#Convert to strings
df.index = df.index.strftime('%Y-%m')

#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)

For brevity, I don't show that I run the following code after each line above:

print(df.index)
print(df.index.dtype)
print(type(df.index))

This gives me the following output:

Index(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='object', name='time')
object
<class 'pandas.core.indexes.base.Index'>

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='datetime64[ns]', name='time', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

PeriodIndex(['2013-01', '2013-01', '2013-01'], dtype='period[M]', name='time', freq='M')
period[M]
<class 'pandas.core.indexes.period.PeriodIndex'>

Index(['2013-01', '2013-01', '2013-01'], dtype='object')
object
<class 'pandas.core.indexes.base.Index'>

DatetimeIndex(['2013-01-01', '2013-01-01', '2013-01-01'], dtype='datetime64[ns]', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

Solution 8 - Python

For the sake of completeness, another option, which might not be the most straightforward one, a bit similar to the one proposed by @SSS, but using rather the datetime library is:

import datetime
df["Date"] = df["Date"].apply(lambda x: datetime.datetime.strptime(x, '%Y-%d-%m').date())

Solution 9 - Python

 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   startDay        110526 non-null  object
 1   endDay          110526 non-null  object

import pandas as pd

df['startDay'] = pd.to_datetime(df.startDay)

df['endDay'] = pd.to_datetime(df.endDay)

 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   startDay        110526 non-null  datetime64[ns]
 1   endDay          110526 non-null  datetime64[ns]

Solution 10 - Python

Try to convert one of the rows into timestamp using the pd.to_datetime function and then use .map to map the formular to the entire column

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionuser7289View Question on Stackoverflow
Solution 1 - PythonAndy HaydenView Answer on Stackoverflow
Solution 2 - PythonwaitingkuoView Answer on Stackoverflow
Solution 3 - PythonfantabolousView Answer on Stackoverflow
Solution 4 - PythonszeitlinView Answer on Stackoverflow
Solution 5 - PythonDavid Valenzuela UrrutiaView Answer on Stackoverflow
Solution 6 - PythonSSSView Answer on Stackoverflow
Solution 7 - PythonTed M.View Answer on Stackoverflow
Solution 8 - PythonrubebopView Answer on Stackoverflow
Solution 9 - PythondonDreyView Answer on Stackoverflow
Solution 10 - PythonMwanaidi NicoleView Answer on Stackoverflow