Converting strings to floats in a DataFrame

Python Problem Overview


How can I convert a DataFrame column containing strings and NaN values to floats? There is also another column whose values are a mix of strings and floats; how can I convert that entire column to floats?

Python Solutions


Solution 1 - Python

> NOTE: DataFrame.convert_objects has now been deprecated. You should use pd.Series.astype(float) or pd.to_numeric as described in the other answers.

This is available in 0.11. It forces conversion (or sets the value to NaN), so it works even when astype fails. It also operates Series by Series, so it won't convert, say, a column made up entirely of strings.

In [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))

In [11]: df
Out[11]: 
     A    B
0  1.0  1.0
1    1  foo

In [12]: df.dtypes
Out[12]: 
A    object
B    object
dtype: object

In [13]: df.convert_objects(convert_numeric=True)
Out[13]: 
   A   B
0  1   1
1  1 NaN

In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]: 
A    float64
B    float64
dtype: object
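
Since convert_objects is deprecated, the same result can probably be reached today by applying pd.to_numeric to each column. The following is a minimal sketch under that assumption, reusing the frame from the session above; the name `converted` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["1.0", "1"], "B": ["1.0", "foo"]})

# Apply pd.to_numeric column by column; errors="coerce" turns
# unparseable values such as "foo" into NaN instead of raising.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
```

Unlike convert_objects, this coerces every column it is applied to, so a column of pure text would end up entirely NaN. Select only the columns you want converted if that matters.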

Solution 2 - Python

You can try df.column_name = df.column_name.astype(float). As for the NaN values, astype(float) leaves them as NaN; if you want them replaced, specify the replacement with the .fillna method.

Example:

In [12]: df
Out[12]: 
     a    b
0  0.1  0.2
1  NaN  0.3
2  0.4  0.5

In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)

In [14]: df.a = df.a.astype(float).fillna(0.0)

In [15]: df
Out[15]: 
     a    b
0  0.1  0.2
1  0.0  0.3
2  0.4  0.5

In [16]: df.a.values
Out[16]: array([ 0.1,  0. ,  0.4])
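
For reference, here is a self-contained sketch of the session above that also shows where astype(float) stops working. The frame is rebuilt by hand, and the names `as_float`, `filled` and `failed` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["0.1", np.nan, "0.4"], "b": [0.2, 0.3, 0.5]})

# astype(float) parses the numeric strings and leaves NaN as NaN.
as_float = df["a"].astype(float)

# Filling is a separate, optional step; 0.0 is only a placeholder choice.
filled = as_float.fillna(0.0)

# astype(float) raises on text that is not a number.
try:
    pd.Series(["0.1", "foo"]).astype(float)
    failed = False
except ValueError:
    failed = True
```

If the column may contain non-numeric text, pd.to_numeric with errors='coerce' is likely the safer choice.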

Solution 3 - Python

In newer versions of pandas (0.17 and up), you can use the to_numeric function. It converts one column (Series) at a time; to convert a whole DataFrame, apply it to each column, e.g. df.apply(pd.to_numeric). Its errors parameter lets you choose how to treat values that can't be converted to numbers:

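import pandas as pd
s = pd.Series(['1.0', '2', -3])
pd.to_numeric(s)                   # float64: 1.0, 2.0, -3.0
s = pd.Series(['apple', '1.0', '2', -3])
pd.to_numeric(s, errors='ignore')  # returns the input unchanged (deprecated since pandas 2.2)
pd.to_numeric(s, errors='coerce')  # 'apple' becomes NaN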
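
The second half of the question (a column mixing strings and floats) can be handled the same way. This is a small sketch with made-up data; `mixed`, `raised` and `coerced` are illustrative names.

```python
import pandas as pd

# A column mixing numeric strings, real floats, and one bad value.
mixed = pd.Series(["1.5", 2.0, "3", "apple"])

# Default errors="raise": the bad value stops the conversion.
try:
    pd.to_numeric(mixed)
    raised = False
except ValueError:
    raised = True

# errors="coerce": the bad value becomes NaN and the result is float64.
coerced = pd.to_numeric(mixed, errors="coerce")
print(coerced.tolist())
```

Coercing is convenient but silent, so it may be worth counting the NaN values afterwards (for example with coerced.isna().sum()) to see how many entries failed to parse.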

Solution 4 - Python

df['MyColumnName'] = df['MyColumnName'].astype('float64') 

Solution 5 - Python

You have to replace empty strings ('') with np.nan before converting to float, i.e.:

df['a'] = df.a.replace('', np.nan).astype(float)
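
A slightly broader sketch of the same idea, assuming the column may also hold whitespace-only strings (that is an assumption, not something stated in the answer). The data and the names `cleaned` and `coerced` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["1.5", "", "  ", "2"]})

# Treat empty or whitespace-only strings as missing, then convert.
cleaned = df["a"].replace(r"^\s*$", np.nan, regex=True).astype(float)

# Alternative: let to_numeric coerce anything unparseable to NaN.
coerced = pd.to_numeric(df["a"], errors="coerce")
```

The replace-then-astype route still raises on other bad text, which can be useful if you want unexpected values to fail loudly; the to_numeric route hides them as NaN.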

Solution 6 - Python

Here is an example:

                                      GHI  Temp  Power  Day_Type
2016-03-15 06:00:00  -7.99999952505459e-7  18.3      0       NaN
2016-03-15 06:01:00  -7.99999952505459e-7  18.2      0       NaN
2016-03-15 06:02:00  -7.99999952505459e-7  18.3      0       NaN
2016-03-15 06:03:00  -7.99999952505459e-7  18.3      0       NaN
2016-03-15 06:04:00  -7.99999952505459e-7  18.3      0       NaN

If these are all string values, as they were in my case, convert the desired columns to floats:

df_inv_29['GHI'] = df_inv_29.GHI.astype(float)
df_inv_29['Temp'] = df_inv_29.Temp.astype(float)
df_inv_29['Power'] = df_inv_29.Power.astype(float)

Your dataframe will now have float values :-)
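
The three assignments above can also be written as a single call, since astype accepts a mapping from column name to dtype. This is a sketch on a hypothetical two-row copy of the sample data, not the original df_inv_29 frame.

```python
import pandas as pd

# Hypothetical frame mirroring the sample: every value read in as a string.
df = pd.DataFrame(
    {
        "GHI": ["-7.99999952505459e-7", "-7.99999952505459e-7"],
        "Temp": ["18.3", "18.2"],
        "Power": ["0", "0"],
    },
    index=pd.to_datetime(["2016-03-15 06:00:00", "2016-03-15 06:01:00"]),
)

cols = ["GHI", "Temp", "Power"]

# A dict passed to astype converts several columns in one call.
df = df.astype({col: float for col in cols})
print(df.dtypes)
```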

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author   Original Content on Stack Overflow
Question             Neer              View Question on Stack Overflow
Solution 1 - Python  Jeff              View Answer on Stack Overflow
Solution 2 - Python  root              View Answer on Stack Overflow
Solution 3 - Python  Salvador Dali     View Answer on Stack Overflow
Solution 4 - Python  Claude COULOMBE   View Answer on Stack Overflow
Solution 5 - Python  Paul Mwaniki      View Answer on Stack Overflow
Solution 6 - Python  ArmandduPlessis   View Answer on Stack Overflow