Convert number strings with commas in pandas DataFrame to float

PythonPandas

Python Problem Overview


I have a DataFrame that contains numbers as strings with commas for the thousands marker. I need to convert them to floats.

a = [['1,200', '4,200'], ['7,000', '-0.03'], [ '5', '0']]
df=pandas.DataFrame(a)

I am guessing I need to use locale.atof. Indeed

df[0].apply(locale.atof)

works as expected. I get a Series of floats.

But when I apply it to the DataFrame, I get an error.

df.apply(locale.atof)

>TypeError: ("cannot convert the series to ", u'occurred at index 0')

and

df[0:1].apply(locale.atof)

gives another error:

>ValueError: ('invalid literal for float(): 1,200', u'occurred at index 0')

So, how do I convert this DataFrame of strings to a DataFrame of floats?

Python Solutions


Solution 1 - Python

If you're reading in from csv then you can use the thousands arg:

df.read_csv('foo.tsv', sep='\t', thousands=',')

This method is likely to be more efficient than performing the operation as a separate step.


You need to set the locale first:

In [ 9]: import locale

In [10]: from locale import atof

In [11]: locale.setlocale(locale.LC_NUMERIC, '')
Out[11]: 'en_GB.UTF-8'

In [12]: df.applymap(atof)
Out[12]:
      0        1
0  1200  4200.00
1  7000    -0.03
2     5     0.00

Solution 2 - Python

You can convert one column at a time like this :

df['colname'] = df['colname'].str.replace(',', '').astype(float)

Solution 3 - Python

You may use the pandas.Series.str.replace method:

df.iloc[:,:].str.replace(',', '').astype(float)

This method can remove or replace the comma in the string.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionpheonView Question on Stackoverflow
Solution 1 - PythonAndy HaydenView Answer on Stackoverflow
Solution 2 - Pythonghollah kiokoView Answer on Stackoverflow
Solution 3 - Pythonshen keView Answer on Stackoverflow