Check if dataframe column is Categorical

Python, Pandas

Python Problem Overview


I can't seem to get a simple dtype check working with Pandas' improved Categoricals in v0.15+. Basically I just want something like is_categorical(column) -> True/False.

import pandas as pd
import numpy as np
import random

df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': random.sample('abcdef', 6)
})
df['cat_column'] = pd.Categorical(df['cat_column'])

We can see that the dtype for the categorical column is 'category':

df.cat_column.dtype
Out[20]: category

And normally we can do a dtype check by just comparing to the name of the dtype:

df.x.dtype == 'float64'
Out[21]: True

But this doesn't seem to work when trying to check if the x column is categorical:

df.x.dtype == 'category'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-94d2608815c4> in <module>()
----> 1 df.x.dtype == 'category'

TypeError: data type "category" not understood

Is there any way to do these types of checks in pandas v0.15+?

Python Solutions


Solution 1 - Python

Use the dtype's name property for the comparison instead. It always works because it is just a string:

>>> import numpy as np
>>> arr = np.array([1, 2, 3, 4])
>>> arr.dtype.name
'int64'

>>> import pandas as pd
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat.dtype.name
'category'

To sum up, this gives you a simple, straightforward function:

def is_categorical(array_like):
    return array_like.dtype.name == 'category'
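A quick usage sketch of this function against a DataFrame shaped like the one in the question. Fixed letters replace random.sample so the result is deterministic; the flags variable is just for illustration:

```python
import numpy as np
import pandas as pd


def is_categorical(array_like):
    # dtype.name is a plain string, so this comparison never raises
    return array_like.dtype.name == 'category'


df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': pd.Categorical(list('fbdaec')),
})

# Check every column in one pass
flags = {name: is_categorical(df[name]) for name in df.columns}
print(flags)  # {'x': False, 'y': False, 'cat_column': True}
```

The same function also works on a bare pd.Categorical or a NumPy array, since both expose a dtype with a name.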

Solution 2 - Python

First, the string representation of the dtype is 'category' and not 'categorical', so this works:

In [41]: df.cat_column.dtype == 'category'
Out[41]: True

But, as you noticed, this comparison raises a TypeError for other (NumPy) dtypes, so you would have to wrap it in a try/except block.
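A minimal sketch of that try/except wrapper. Whether NumPy raises TypeError or simply returns False for this comparison depends on the NumPy version, so the wrapper is written to give False in either case. The function name is hypothetical:

```python
import pandas as pd


def is_categorical(series):
    """Compare the dtype to the string 'category', treating errors as False."""
    try:
        # CategoricalDtype == 'category' is True; NumPy dtypes either
        # return False or, on some older NumPy versions, raise TypeError
        return bool(series.dtype == 'category')
    except TypeError:
        return False


df = pd.DataFrame({
    'x': [0.0, 10.0, 20.0],
    'cat_column': pd.Categorical(['a', 'b', 'c']),
})
print(is_categorical(df['cat_column']))  # True
print(is_categorical(df['x']))           # False
```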


Other ways to check, using pandas' type-checking functions:

In [42]: isinstance(df.cat_column.dtype, pd.api.types.CategoricalDtype)
Out[42]: True

In [43]: pd.api.types.is_categorical_dtype(df.cat_column)
Out[43]: True

For non-categorical columns, those statements will return False instead of raising an error. For example:

In [44]: pd.api.types.is_categorical_dtype(df.x)
Out[44]: False
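To apply the isinstance check to every column at once, one option is to iterate over df.dtypes, which maps each column name to its dtype object. This is a small sketch, and the variable name categorical_columns is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [0.0, 10.0, 20.0],
    'label': ['p', 'q', 'r'],  # plain strings, not a pandas Categorical
    'cat_column': pd.Categorical(['a', 'b', 'c']),
})

# Keep only the columns whose dtype is a CategoricalDtype instance
categorical_columns = [
    name for name, dtype in df.dtypes.items()
    if isinstance(dtype, pd.api.types.CategoricalDtype)
]
print(categorical_columns)  # ['cat_column']
```

Note that a column of plain strings is not reported, because only the category dtype counts.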

For much older versions of pandas, replace pd.api.types in the above snippets with pd.core.common.

Solution 3 - Python

Just putting this here because pandas.DataFrame.select_dtypes() is what I was actually looking for:

df['column'].name in df.select_dtypes(include='category').columns

Thanks to @Jeff.
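A minimal sketch of the select_dtypes approach. Besides the membership test above, the same method can list all categorical columns, or everything except them via its exclude argument:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1.5, 2.5, 3.5],
    'cat_column': pd.Categorical(['a', 'b', 'a']),
})

# Names of all columns with the 'category' dtype
cat_cols = df.select_dtypes(include='category').columns.tolist()
print(cat_cols)  # ['cat_column']

# Membership test for a single column
is_cat = 'cat_column' in df.select_dtypes(include='category').columns
print(is_cat)  # True

# Everything except the categorical columns
non_cat_cols = df.select_dtypes(exclude='category').columns.tolist()
print(non_cat_cols)  # ['x']
```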

Solution 4 - Python

In my pandas version (v1.0.3), a shorter version of joris' answer is available.

import pandas as pd

df = pd.DataFrame({'noncat': [1, 2, 3], 'categ': pd.Categorical(['A', 'B', 'C'])})

print(isinstance(df.noncat.dtype, pd.CategoricalDtype))  # False
print(isinstance(df.categ.dtype, pd.CategoricalDtype))   # True

print(pd.CategoricalDtype.is_dtype(df.noncat)) # False
print(pd.CategoricalDtype.is_dtype(df.categ))  # True
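Because is_dtype is a classmethod, it can also be given a dtype object or a dtype string such as 'category' instead of a Series. A quick sketch, written against recent pandas versions; treat it as illustrative on older releases:

```python
import pandas as pd

df = pd.DataFrame({'noncat': [1, 2, 3], 'categ': pd.Categorical(['A', 'B', 'C'])})

# A dtype object taken from a column
from_dtype = pd.CategoricalDtype.is_dtype(df.categ.dtype)
# Dtype aliases given as strings
from_cat_str = pd.CategoricalDtype.is_dtype('category')
from_int_str = pd.CategoricalDtype.is_dtype('int64')

print(from_dtype, from_cat_str, from_int_str)  # True True False
```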

Solution 5 - Python

I ran into this thread looking for exactly the same functionality and found another option, right from the pandas documentation.

It looks like the canonical way to check whether a pandas DataFrame column is a categorical Series is the following:

hasattr(column_to_check, 'cat')

So, as per the example given in the initial question, this would be:

hasattr(df.cat_column, 'cat')  # True
hasattr(df.x, 'cat')           # False
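A slightly fuller sketch of the hasattr check. Accessing .cat on a non-categorical Series raises AttributeError, which hasattr turns into False, and once the check passes the categorical-only attributes are safe to use:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [0.0, 10.0, 20.0],
    'cat_column': pd.Categorical(['b', 'a', 'b']),
})

has_cat = hasattr(df.cat_column, 'cat')
has_x = hasattr(df.x, 'cat')
print(has_cat, has_x)  # True False

if has_cat:
    # Categories are stored sorted for unordered string data
    categories = list(df.cat_column.cat.categories)
    print(categories)  # ['a', 'b']
```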

Solution 6 - Python

Nowadays you can use:

pandas.api.types.is_categorical_dtype(series)

Docs here: https://pandas.pydata.org/docs/reference/api/pandas.api.types.is_categorical_dtype.html

Available since at least pandas 1.0
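Note that is_categorical_dtype was deprecated in pandas 2.1 in favour of isinstance(dtype, pd.CategoricalDtype), so recent versions emit a deprecation warning for the call above. A small replacement sketch that accepts a Series, an Index, a Categorical, or a bare dtype (the helper name is_categorical is hypothetical):

```python
import pandas as pd


def is_categorical(obj):
    # Use obj.dtype when present (Series, Index, Categorical),
    # otherwise assume obj is already a dtype object
    dtype = getattr(obj, 'dtype', obj)
    return isinstance(dtype, pd.CategoricalDtype)


s_cat = pd.Series(['a', 'b', 'a'], dtype='category')
s_num = pd.Series([1.0, 2.0, 3.0])

print(is_categorical(s_cat))        # True
print(is_categorical(s_num))        # False
print(is_categorical(s_cat.dtype))  # True
```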

Solution 7 - Python

Building on @Jeff Tratner's answer: since a column does not have to satisfy df.cat_column.dtype == 'category' to be considered categorical, I propose treating every dtype in the categorical_dtypes list below as categorical:

def is_cat(column):
    categorical_dtypes = ['object', 'category', 'bool']
    return column.dtype.name in categorical_dtypes
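Keep in mind that this treats plain object and bool columns as categorical too, which is broader than pandas' own category dtype. A small comparison sketch; the column names are illustrative, and the object column is created with an explicit dtype so it stays object on newer pandas versions that infer a dedicated string dtype:

```python
import pandas as pd


def is_cat(column):
    # Broader notion of "categorical": object, category or bool dtypes
    return column.dtype.name in ['object', 'category', 'bool']


df = pd.DataFrame({
    'text': pd.Series(['a', 'b', 'c'], dtype=object),  # object dtype
    'flag': [True, False, True],                       # bool dtype
    'cat': pd.Categorical(['a', 'b', 'c']),            # category dtype
    'num': [1.0, 2.0, 3.0],                            # float64 dtype
})

# For each column: (broad is_cat check, strict pandas CategoricalDtype check)
results = {
    name: (is_cat(df[name]), isinstance(df[name].dtype, pd.CategoricalDtype))
    for name in df.columns
}
print(results)
```

Only the 'cat' column passes both checks; which definition is appropriate depends on whether you need the .cat accessor or just "low-cardinality, non-numeric" data.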


Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Marius | View Question on Stackoverflow
Solution 1 - Python | Jeff Tratner | View Answer on Stackoverflow
Solution 2 - Python | joris | View Answer on Stackoverflow
Solution 3 - Python | jorijnsmit | View Answer on Stackoverflow
Solution 4 - Python | DieterDP | View Answer on Stackoverflow
Solution 5 - Python | Pierre Massé | View Answer on Stackoverflow
Solution 6 - Python | jcdude | View Answer on Stackoverflow
Solution 7 - Python | Miguel Gonzalez | View Answer on Stackoverflow