How do I tell if a column in a pandas dataframe is of type datetime? How do I tell if a column is numerical?

Python, Pandas, Numpy, Dataframe

Python Problem Overview


I am trying to filter the columns in a pandas dataframe based on whether they are of type date or not. I can figure out which ones are, but then would have to parse that output or manually select columns. I want to select date columns automatically. Here's what I have so far as an example - I'd want to only select the 'date_col' column in this case.

import pandas as pd
df = pd.DataFrame([['Feb-2017', 1, 2],
                   ['Mar-2017', 1, 2],
                   ['Apr-2017', 1, 2],
                   ['May-2017', 1, 2]], 
                  columns=['date_str', 'col1', 'col2'])
df['date_col'] = pd.to_datetime(df['date_str'])
df.dtypes

Out:

date_str            object
col1                 int64
col2                 int64
date_col    datetime64[ns]
dtype: object

Python Solutions


Solution 1 - Python

I just encountered this issue and found that @charlie-haley's answer isn't quite general enough for my use case. In particular, np.datetime64 doesn't seem to match datetime64[ns, UTC].

df['date_col'] = pd.to_datetime(df['date_str'], utc=True)
print(df.date_col.dtype)  # datetime64[ns, UTC]

You could also extend the list of dtypes to include other types, but that doesn't seem like a good solution for future compatibility, so I ended up using the is_datetime64_any_dtype function from the pandas API instead.

In:

from pandas.api.types import is_datetime64_any_dtype as is_datetime

df[[column for column in df.columns if is_datetime(df[column])]]

Out:

                   date_col
0 2017-02-01 00:00:00+00:00
1 2017-03-01 00:00:00+00:00
2 2017-04-01 00:00:00+00:00
3 2017-05-01 00:00:00+00:00
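
For convenience, the same check can be wrapped in a small helper that also splits out the numeric columns. This is a minimal sketch (the helper name is made up), using pandas.api.types.is_numeric_dtype alongside is_datetime64_any_dtype:

from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

def split_columns_by_kind(frame):
    # Hypothetical helper: returns (datetime columns, numeric columns).
    # is_datetime64_any_dtype covers both tz-naive and tz-aware datetimes.
    datetime_cols = [col for col in frame.columns if is_datetime64_any_dtype(frame[col])]
    numeric_cols = [col for col in frame.columns if is_numeric_dtype(frame[col])]
    return datetime_cols, numeric_cols

split_columns_by_kind(df)  # (['date_col'], ['col1', 'col2'])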

Solution 2 - Python

Pandas has a cool function called select_dtypes, which can take either exclude or include (or both) as parameters and filters the dataframe's columns based on dtype. So in this case, you would want to include columns of dtype np.datetime64. To filter by integers, you would use [np.int64, np.int32, np.int16]; for floats, [np.float64, np.float32, np.float16]; and to select all numerical columns at once, just [np.number]. (The bare np.int and np.float aliases have been removed from recent NumPy versions, so stick to the sized types or np.number.)

import numpy as np

df.select_dtypes(include=[np.datetime64])

Out:

    date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01

In:

df.select_dtypes(include=[np.number])

Out:

   col1  col2
0     1     2
1     1     2
2     1     2
3     1     2
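
As a side note (not part of the original answer), select_dtypes also accepts the string aliases documented by pandas, which avoids importing NumPy just for the dtype objects; roughly:

df.select_dtypes(include=['datetime', 'datetimetz'])  # tz-naive plus tz-aware datetime columns
df.select_dtypes(include=['number'])                  # all integer and float columns
df.select_dtypes(exclude=['number'])                  # everything that is not numeric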

Solution 3 - Python

A slightly uglier NumPy alternative:

In [102]: df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
Out[102]:
    date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01

In [103]: df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]
Out[103]:
   col1  col2
0     1     2
1     1     2
2     1     2
3     1     2
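
The same idea can be kept as a reusable boolean mask over df.dtypes (a small variation, not from the original answer; like the answer above, it targets tz-naive datetime64[ns] columns):

import numpy as np

# Boolean Series indexed by column name: True where the dtype is a datetime64.
is_datetime_col = df.dtypes.apply(lambda t: np.issubdtype(t, np.datetime64))
df.loc[:, is_datetime_col]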

Solution 4 - Python

This code automatically identifies the date columns and changes their datatype from object to datetime64[ns]. Once you have the date datatype you can easily perform other operations.

# Try to parse every object (string) column as dates; columns that
# cannot be parsed are left unchanged.
for col in data.columns:
    if data[col].dtype == 'object':
        try:
            data[col] = pd.to_datetime(data[col])
        except (ValueError, TypeError):
            pass  # not a date-like column
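
A quick usage sketch with made-up data (not from the original answer), just to show the effect:

data = pd.DataFrame({'when': ['Feb-2017', 'Mar-2017'],
                     'label': ['a', 'b'],
                     'n': [1, 2]})

# ... run the loop above ...

data.dtypes  # 'when' becomes datetime64[ns]; 'label' stays object; 'n' stays int64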

Solution 5 - Python

Recently I needed to check whether any element of a column was a date or a number.

My approach was to try to convert the column to the target type (datetime or numeric), then check whether any element is null:

pd.to_datetime(data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce')

output:

0   2010-09-16
1   2010-09-16
2   2018-06-04
3          NaT
4          NaT
5   2018-11-30

Then use isnull() to check whether any elements failed to convert:

pd.to_datetime(data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce').isnull().any()

This will return True because at least one element is null/NaT.

To check for numerics

data_temp.eval('col_name').astype(str).str.isnumeric().all()

This will return True if all elements in the column are numeric. Note that str.isnumeric() only matches digit-only strings, so negative numbers or decimals such as '-3' or '1.5' would make it return False.

Both will return a numpy.bool_, but it can easily be converted to bool if needed

type(pd.to_datetime(data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce').isnull().any())

output:

numpy.bool_

--

type(bool(pd.to_datetime(data_temp.eval('col_name'), format='%d/%m/%Y', errors='coerce').isnull().any()))

output:

bool
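
A variant of the numeric check (not from the original answer) that follows the same coerce-then-check pattern and also accepts negatives and decimals:

# All elements are numeric if none of them become NaN after coercion.
pd.to_numeric(data_temp.eval('col_name'), errors='coerce').notnull().all()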

Solution 6 - Python

This should work for datetimes that have timezone info in them. Here I take each timezone-aware datetime column, convert it to UTC, convert it to the Asia/Kolkata timezone, and finally drop the timezone info so the column becomes a plain (naive) datetime.

My initial datetime value looks like this:
> 2021-06-15 23:35:00+05:30

for col, dtype in data.dtypes.items():
    if isinstance(dtype, pd.DatetimeTZDtype):                   # timezone-aware datetime column
        data[col] = pd.to_datetime(data[col], utc=True)          # normalise to UTC
        data[col] = data[col].dt.tz_convert(tz='Asia/Kolkata')   # convert to local time
        data[col] = data[col].dt.tz_localize(tz=None)            # drop tz info -> naive datetime64[ns]
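
A minimal before/after sketch with made-up data (not part of the original answer):

data = pd.DataFrame({'ts': pd.to_datetime(['2021-06-15 23:35:00'])})
data['ts'] = data['ts'].dt.tz_localize('Asia/Kolkata')
data.dtypes  # ts    datetime64[ns, Asia/Kolkata]

# ... run the loop above ...

data.dtypes  # ts    datetime64[ns]  (naive, shown in Asia/Kolkata wall-clock time)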

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author              Original Content on Stackoverflow
Question             Charlie Haley                View Question on Stackoverflow
Solution 1 - Python  jsignell                     View Answer on Stackoverflow
Solution 2 - Python  Charlie Haley                View Answer on Stackoverflow
Solution 3 - Python  MaxU - stop genocide of UA   View Answer on Stackoverflow
Solution 4 - Python  Bhagwat Chate                View Answer on Stackoverflow
Solution 5 - Python  Guilherme                    View Answer on Stackoverflow
Solution 6 - Python  Abhishek Sahu                View Answer on Stackoverflow