converting currency with $ to numbers in Python pandas

PythonPython 2.7Pandas

Python Problem Overview


I have the following data in pandas dataframe:

	state	     1st	    2nd	            3rd
0	California	$11,593,820	$109,264,246	$8,496,273
1	New York	$10,861,680	$45,336,041	    $6,317,300
2	Florida	    $7,942,848	$69,369,589	    $4,697,244
3	Texas	    $7,536,817	$61,830,712	    $5,736,941

I want to perform some simple analysis (e.g., sum, groupby) with three columns (1st, 2nd, 3rd), but the data type of those three columns is object (or string).

So I used the following code for data conversion:

data = data.convert_objects(convert_numeric=True)

But, conversion does not work, perhaps, due to the dollar sign. Any suggestion?

Python Solutions


Solution 1 - Python

@EdChum's answer is clever and works well. But since there's more than one way to bake a cake.... why not use regex? For example:

df[df.columns[1:]] = df[df.columns[1:]].replace('[\$,]', '', regex=True).astype(float)

To me, that is a little bit more readable.

Solution 2 - Python

You can also use locale as follows

import locale
import pandas as pd
locale.setlocale(locale.LC_ALL,'')
df['1st']=df.1st.map(lambda x: locale.atof(x.strip('$')))

Note the above code was tested in Python 3 and Windows environment

Solution 3 - Python

You can use the vectorised str methods to replace the unwanted characters and then cast the type to int:

In [81]:
df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str.replace('$','')).apply(lambda x: x.str.replace(',','')).astype(np.int64)
df

Out[81]:
            state       1st        2nd      3rd
index                                          
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941

dtype change is now confirmed:

In [82]:

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
state    4 non-null object
1st      4 non-null int64
2nd      4 non-null int64
3rd      4 non-null int64
dtypes: int64(3), object(1)
memory usage: 160.0+ bytes

Another way:

In [108]:

df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str[1:].str.split(',').str.join('')).astype(np.int64)
df
Out[108]:
            state       1st        2nd      3rd
index                                          
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941

Solution 4 - Python

To convert into integer, use:

carSales["Price"] = carSales["Price"].replace("[$,]", "", regex=True).astype(int)

Solution 5 - Python

You can use the methodstr.replace and the regex '\D' to remove all nondigit characters or '[^-.0-9]' to keep minus signs, decimal points and digits:

for col in df.columns[1:]:
    df[col] = pd.to_numeric(df[col].str.replace('[^-.0-9]', ''))

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionkevinView Question on Stackoverflow
Solution 1 - PythondagrhaView Answer on Stackoverflow
Solution 2 - PythonsushmitView Answer on Stackoverflow
Solution 3 - PythonEdChumView Answer on Stackoverflow
Solution 4 - Pythonrohit01View Answer on Stackoverflow
Solution 5 - PythonMykola ZotkoView Answer on Stackoverflow