No numeric types to aggregate - change in groupby() behaviour?

PythonPandas

Python Problem Overview


I have a problem with some groupy code which I'm quite sure once ran (on an older pandas version). On 0.9, I get No numeric types to aggregate errors. Any ideas?

In [31]: data
Out[31]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)

In [32]: latedges = linspace(-90., 90., 73)

In [33]: lats_new = linspace(-87.5, 87.5, 72)

In [34]: def _get_gridbox_label(x, bins, labels):
   ....:             return labels[searchsorted(bins, x) - 1]
   ....: 

In [35]: lat_bucket = lambda x: _get_gridbox_label(x, latedges, lats_new)

In [36]: data.T.groupby(lat_bucket).mean()
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-36-ed9c538ac526> in <module>()
----> 1 data.T.groupby(lat_bucket).mean()

/usr/lib/python2.7/site-packages/pandas/core/groupby.py in mean(self)
    295         """
    296         try:
--> 297             return self._cython_agg_general('mean')
    298         except DataError:
    299             raise

/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
   1415 
   1416     def _cython_agg_general(self, how, numeric_only=True):
-> 1417         new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   1418         return self._wrap_agged_blocks(new_blocks)
   1419 

/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_blocks(self, how, numeric_only)
   1455 
   1456         if len(new_blocks) == 0:
-> 1457             raise DataError('No numeric types to aggregate')
   1458 
   1459         return new_blocks

DataError: No numeric types to aggregate

Python Solutions


Solution 1 - Python

How are you generating your data?

See how the output shows that your data is of 'object' type? the groupby operations specifically check whether each column is a numeric dtype first.

In [31]: data
Out[31]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)

look ↑


Did you initialize an empty DataFrame first and then filled it? If so that's probably why it changed with the new version as before 0.9 empty DataFrames were initialized to float type but now they are of object type. If so you can change the initialization to DataFrame(dtype=float).

You can also call frame.astype(float)

Solution 2 - Python

I got this error generating a data frame consisting of timestamps and data:

df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp))

Adding the suggested solution works for me:

df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp), dtype=float))

Thanks Chang She!

Example:

                     data
2005-01-01 00:10:00  7.53
2005-01-01 00:20:00  7.54
2005-01-01 00:30:00  7.62
2005-01-01 00:40:00  7.68
2005-01-01 00:50:00  7.81
2005-01-01 01:00:00  7.95
2005-01-01 01:10:00  7.96
2005-01-01 01:20:00  7.95
2005-01-01 01:30:00  7.98
2005-01-01 01:40:00  8.06
2005-01-01 01:50:00  8.04
2005-01-01 02:00:00  8.06
2005-01-01 02:10:00  8.12
2005-01-01 02:20:00  8.12
2005-01-01 02:30:00  8.25
2005-01-01 02:40:00  8.27
2005-01-01 02:50:00  8.17
2005-01-01 03:00:00  8.21
2005-01-01 03:10:00  8.29
2005-01-01 03:20:00  8.31
2005-01-01 03:30:00  8.25
2005-01-01 03:40:00  8.19
2005-01-01 03:50:00  8.17
2005-01-01 04:00:00  8.18
                     data
2005-01-01 00:00:00  7.636000
2005-01-01 01:00:00  7.990000
2005-01-01 02:00:00  8.165000
2005-01-01 03:00:00  8.236667
2005-01-01 04:00:00  8.180000

Solution 3 - Python

I got this done by :

data_frame.groupby(COL1).COL2.apply(np.mean).reset_index()

Solution 4 - Python

Got the same problem here, searched for so long just to realize my values were not floats but strings.

Here is what solved my issue:

df["column_name"] = pd.to_numeric(df["column_name"], downcast="float")

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionandreas-hView Question on Stackoverflow
Solution 1 - PythonChang SheView Answer on Stackoverflow
Solution 2 - Pythonuser4952560View Answer on Stackoverflow
Solution 3 - Pythonhassan firasatView Answer on Stackoverflow
Solution 4 - PythonAugustinView Answer on Stackoverflow