Get list of pandas dataframe columns based on data type

Python, Pandas, Dtype

Python Problem Overview


If I have a dataframe with the following columns:

1. NAME                                     object
2. On_Time                                      object
3. On_Budget                                    object
4. %actual_hr                                  float64
5. Baseline Start Date                  datetime64[ns]
6. Forecast Start Date                  datetime64[ns] 

I would like to be able to say: for this dataframe, give me a list of the columns which are of type 'object' or of type 'datetime'?

I have a function which converts numbers ('float64') to two decimal places, and I would like to use this list of dataframe columns, of a particular type, and run it through this function to convert them all to 2dp.

Maybe something like:

col_list = []
for c in df.columns:
    if df[c].dtype == "something":
        col_list.append(c)

Python Solutions


Solution 1 - Python

If you want a list of columns of a certain type, you can use groupby:

>>> df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))
>>> df
   A       B  C  D   E
0  1  2.3456  c  d  78

[1 rows x 5 columns]
>>> df.dtypes
A      int64
B    float64
C     object
D     object
E      int64
dtype: object
>>> g = df.columns.to_series().groupby(df.dtypes).groups
>>> g
{dtype('int64'): ['A', 'E'], dtype('float64'): ['B'], dtype('O'): ['C', 'D']}
>>> {k.name: v for k, v in g.items()}
{'object': ['C', 'D'], 'int64': ['A', 'E'], 'float64': ['B']}
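
Applied to columns like those in the question (the data here is assumed for illustration), the groups dict makes it easy to collect the 'object' and 'datetime64[ns]' columns in one list:

```python
import pandas as pd

df = pd.DataFrame({"NAME": ["a"],
                   "%actual_hr": [1.2345],
                   "Baseline Start Date": [pd.Timestamp("2014-01-01")]})

# Group column names by dtype, then key the groups by dtype name
g = df.columns.to_series().groupby(df.dtypes).groups
by_name = {k.name: list(v) for k, v in g.items()}

wanted = by_name.get("object", []) + by_name.get("datetime64[ns]", [])
print(wanted)  # ['NAME', 'Baseline Start Date']
```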

Solution 2 - Python

As of pandas v0.14.1, you can use select_dtypes() to select columns by dtype:

In [2]: df = pd.DataFrame({'NAME': list('abcdef'),
    'On_Time': [True, False] * 3,
    'On_Budget': [False, True] * 3})

In [3]: df.select_dtypes(include=['bool'])
Out[3]:
  On_Budget On_Time
0     False    True
1      True   False
2     False    True
3      True   False
4     False    True
5      True   False

In [4]: mylist = list(df.select_dtypes(include=['bool']).columns)

In [5]: mylist
Out[5]: ['On_Budget', 'On_Time']
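
The same method covers the question's case directly; per the pandas docs, the string 'datetime' selects datetime64 columns (sample data assumed):

```python
import pandas as pd

df = pd.DataFrame({"NAME": ["a", "b"],
                   "%actual_hr": [1.2345, 2.3456],
                   "Baseline Start Date": pd.to_datetime(["2014-01-01", "2014-02-01"])})

# 'object' matches plain strings, 'datetime' matches datetime64[ns] columns
cols = list(df.select_dtypes(include=["object", "datetime"]).columns)
print(cols)  # ['NAME', 'Baseline Start Date']
```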

Solution 3 - Python

Using dtype will give you a single column's data type:

dataframe['column1'].dtype

If you want the data types of all columns at once, use the plural, dtypes:

dataframe.dtypes
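
A minimal sketch of both (column names are assumed): dtype returns a single numpy dtype, while dtypes returns a Series of dtypes indexed by column name.

```python
import pandas as pd

df = pd.DataFrame({"NAME": ["a"], "%actual_hr": [1.5]})

print(df["NAME"].dtype)  # object
print(df.dtypes)         # one dtype per column, indexed by column name
```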

Solution 4 - Python

list(df.select_dtypes(['object']).columns)

This should do the trick

Solution 5 - Python

You can use boolean mask on the dtypes attribute:

In [11]: df = pd.DataFrame([[1, 2.3456, 'c']])

In [12]: df.dtypes
Out[12]: 
0      int64
1    float64
2     object
dtype: object

In [13]: msk = df.dtypes == np.float64  # or object, etc.

In [14]: msk
Out[14]: 
0    False
1     True
2    False
dtype: bool

You can look at just those columns with the desired dtype:

In [15]: df.loc[:, msk]
Out[15]: 
        1
0  2.3456

Now you can use round (or whatever) and assign it back:

In [16]: np.round(df.loc[:, msk], 2)
Out[16]: 
      1
0  2.35

In [17]: df.loc[:, msk] = np.round(df.loc[:, msk], 2)

In [18]: df
Out[18]: 
   0     1  2
0  1  2.35  c
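
Putting the mask together with the question's round-to-2dp goal, a minimal sketch (column names assumed for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"NAME": ["a"], "%actual_hr": [1.23456]})

msk = df.dtypes == np.float64                 # boolean mask over columns
df.loc[:, msk] = np.round(df.loc[:, msk], 2)  # round only the float columns
print(df)  # %actual_hr is now 1.23; NAME is untouched
```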

Solution 6 - Python

The most direct way to get a list of columns of certain dtype e.g. 'object':

df.select_dtypes(include='object').columns

For example:

>>> df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))
>>> df.dtypes

A      int64
B    float64
C     object
D     object
E      int64
dtype: object

To get all 'object' dtype columns:

>>> df.select_dtypes(include='object').columns

Index(['C', 'D'], dtype='object')

For just the list:

>>> list(df.select_dtypes(include='object').columns)

['C', 'D']   

Solution 7 - Python

Use df.info(verbose=True), where df is a pandas dataframe; by default verbose=False.
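
A quick sketch of what that prints. The buf= argument of DataFrame.info captures the summary as text instead of writing to stdout, which can be handy for logging:

```python
import io
import pandas as pd

df = pd.DataFrame({"NAME": ["a"], "%actual_hr": [1.5]})

# info() prints each column's non-null count and dtype, plus dtype totals
buf = io.StringIO()
df.info(verbose=True, buf=buf)
print(buf.getvalue())
```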

Solution 8 - Python

If you want a list of only the object columns you could do:

non_numerics = [x for x in df.columns \
                if not (df[x].dtype == np.float64 \
                        or df[x].dtype == np.int64)]

and then if you want to get another list of only the numerics:

numerics = [x for x in df.columns if x not in non_numerics]

Solution 9 - Python

If after 6 years you still have the issue, this should solve it :)

cols = [c for c in df.columns if df[c].dtype in ['object', 'datetime64[ns]']]

Solution 10 - Python

I came up with this three-liner.

Essentially, here's what it does:

  1. Fetch the column names and their respective data types.
  2. Optionally write them out to a CSV.

inp = pd.read_csv('filename.csv') # read input; add read_csv arguments as needed
columns = pd.DataFrame({'column_names': inp.columns, 'datatypes': inp.dtypes})
columns.to_csv('columns_list.csv', encoding='utf-8') # encoding is optional

This made my life much easier in trying to generate schemas on the fly. Hope this helps

Solution 11 - Python

For yoshiserry:

def col_types(x):
    dtypes = x.dtypes
    dtypes_col = dtypes.index
    dtypes_type = dtypes.values
    column_types = dict(zip(dtypes_col, dtypes_type))
    return column_types

Solution 12 - Python

I use infer_objects()

> Docstring: Attempt to infer better dtypes for object columns.
>
> Attempts soft conversion of object-dtyped columns, leaving non-object
> and unconvertible columns unchanged. The inference rules are the same
> as during normal Series/DataFrame construction.

df.infer_objects().dtypes
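
For instance, an object column that actually holds integers gets upgraded (this mirrors the example in the pandas docs):

```python
import pandas as pd

# Slicing off the string row leaves column A as dtype object,
# even though the remaining values are all ints
df = pd.DataFrame({"A": ["a", 1, 2, 3]}).iloc[1:]

print(df.dtypes)                  # A is object
print(df.infer_objects().dtypes)  # A is inferred as int64
```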

Solution 13 - Python

df = pd.DataFrame({'float': [1.0],
                   'int': [1],
                   'bool_1': [False],
                   'datetime': [pd.Timestamp('20180310')],
                   'bool_2': [True],
                   'string': ['foo']})
df.dtypes

# float              float64
# int                  int64
# bool_1                bool
# datetime    datetime64[ns]
# bool_2                bool
# string              object
# dtype: object


[column for column, is_type in (df.dtypes==bool).items() if is_type]
# ['bool_1', 'bool_2']

Solution 14 - Python

Many of the posted solutions use df.select_dtypes which unnecessarily creates a temporary intermediate dataframe. If all you want is "a list of the columns which are of" non-numeric (not float32/int64/complex128/etc.) types, just do one of these (remove the "not" if you do want just the numeric types):

import numpy as np
[c for c in df.columns if not np.issubdtype(df[c].dtype, np.number)]
from pandas.api.types import is_numeric_dtype
[c for c in df.columns if not is_numeric_dtype(df[c])]

Note: if you want to distinguish floating (float32/float64) from integer and complex then you could use np.floating instead of np.number in the first of the two solutions above or in the first of the two just below.

If you want the result to be a pd.Index rather than just a list of column name strings as above, here are two ways (first is based on @juanpa.arrivillaga):

import numpy as np
df.columns[[not np.issubdtype(dt, np.number) for dt in df.dtypes]]
from pandas.api.types import is_numeric_dtype
df.columns[[not is_numeric_dtype(df[c]) for c in df.columns]]

Some other methods may consider a bool column to be numeric, but the solutions above do not (tested with numpy 1.22.3 / pandas 1.4.2).
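
A small check of that claim, with an assumed mix of dtypes including a bool column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"f": [1.5], "i": [1], "b": [True], "s": ["x"]})

# bool is not a subdtype of np.number, so 'b' lands in the non-numeric list
non_numeric = [c for c in df.columns
               if not np.issubdtype(df[c].dtype, np.number)]
print(non_numeric)  # ['b', 's']
```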

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: yoshiserry
Solution 1 - Python: DSM
Solution 2 - Python: qmorgan
Solution 3 - Python: Ashish25
Solution 4 - Python: Tanmoy
Solution 5 - Python: Andy Hayden
Solution 6 - Python: MLKing
Solution 7 - Python: Koo
Solution 8 - Python: user4322543
Solution 9 - Python: Rafael Neves
Solution 10 - Python: geekidharsh
Solution 11 - Python: itthrill
Solution 12 - Python: as - if
Solution 13 - Python: Oleks
Solution 14 - Python: dabru