python pandas replacing strings in dataframe with numbers

PythonReplaceDataframePandas

Python Problem Overview


Is there any way to use the mapping function or something better to replace values in an entire dataframe?

I only know how to perform the mapping on series.

I would like to replace the strings in the 'tesst' and 'set' column with a number for example set = 1, test =2

Here is a example of my dataset: (Original dataset is very large)

ds_r
  respondent  brand engine  country  aware  aware_2  aware_3  age tesst   set
0          a  volvo      p      swe      1        0        1   23   set   set
1          b  volvo   None      swe      0        0        1   45   set   set
2          c    bmw      p       us      0        0        1   56  test  test
3          d    bmw      p       us      0        1        1   43  test  test
4          e    bmw      d  germany      1        0        1   34   set   set
5          f   audi      d  germany      1        0        1   59   set   set
6          g  volvo      d      swe      1        0        0   65  test   set
7          h   audi      d      swe      1        0        0   78  test   set
8          i  volvo      d       us      1        1        1   32   set   set

Final result should be

 ds_r
  respondent  brand engine  country  aware  aware_2  aware_3  age  tesst  set
0          a  volvo      p      swe      1        0        1   23      1    1
1          b  volvo   None      swe      0        0        1   45      1    1
2          c    bmw      p       us      0        0        1   56      2    2
3          d    bmw      p       us      0        1        1   43      2    2
4          e    bmw      d  germany      1        0        1   34      1    1
5          f   audi      d  germany      1        0        1   59      1    1
6          g  volvo      d      swe      1        0        0   65      2    1
7          h   audi      d      swe      1        0        0   78      2    1
8          i  volvo      d       us      1        1        1   32      1    1

Python Solutions


Solution 1 - Python

What about DataFrame.replace?

In [9]: mapping = {'set': 1, 'test': 2}

In [10]: df.replace({'set': mapping, 'tesst': mapping})
Out[10]: 
   Unnamed: 0 respondent  brand engine  country  aware  aware_2  aware_3  age  \
0           0          a  volvo      p      swe      1        0        1   23   
1           1          b  volvo   None      swe      0        0        1   45   
2           2          c    bmw      p       us      0        0        1   56   
3           3          d    bmw      p       us      0        1        1   43   
4           4          e    bmw      d  germany      1        0        1   34   
5           5          f   audi      d  germany      1        0        1   59   
6           6          g  volvo      d      swe      1        0        0   65   
7           7          h   audi      d      swe      1        0        0   78   
8           8          i  volvo      d       us      1        1        1   32   

  tesst set  
0     2   1  
1     1   2  
2     2   1  
3     1   2  
4     2   1  
5     1   2  
6     2   1  
7     1   2  
8     2   1  

As @Jeff pointed out in the comments, in pandas versions < 0.11.1, manually tack .convert_objects() onto the end to properly convert tesst and set to int64 columns, in case that matters in subsequent operations.

Solution 2 - Python

I know this is old, but adding for those searching as I was. Create a dataframe in pandas, df in this code

ip_addresses = df.source_ip.unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))

That will give you a dictionary map of the ip addresses without having to write it out.

Solution 3 - Python

You can use the applymap DataFrame function to do this:

In [26]: df = DataFrame({"A": [1,2,3,4,5], "B": ['a','b','c','d','e'],
                         "C": ['b','a','c','c','d'], "D": ['a','c',7,9,2]})
In [27]: df
Out[27]:
   A  B  C  D
0  1  a  b  a
1  2  b  a  c
2  3  c  c  7
3  4  d  c  9
4  5  e  d  2

In [28]: mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

In [29]: df.applymap(lambda s: mymap.get(s) if s in mymap else s)
Out[29]:
   A  B  C  D
0  1  1  2  1
1  2  2  1  3
2  3  3  3  7
3  4  4  3  9
4  5  5  4  2

Solution 4 - Python

The simplest way to replace any value in the dataframe:

df=df.replace(to_replace="set",value="1")
df=df.replace(to_replace="test",value="2")

Hope this will help.

Solution 5 - Python

To convert Strings like 'volvo','bmw' into integers first convert it to a dataframe then pass it to pandas.get_dummies()

  df  = DataFrame.from_csv("myFile.csv")
  df_transform = pd.get_dummies( df )
  print( df_transform )

Better alternative: passing a dictionary to map() of a pandas series (df.myCol) (by specifying the column brand for example)

df.brand = df.brand.map( {'volvo':0 , 'bmw':1, 'audi':2} )

Solution 6 - Python

You can also do this with pandas rename_categories. You would first need to define the column as dtype="category" e.g.

In [66]: s = pd.Series(["a","b","c","a"], dtype="category")

In [67]: s
Out[67]: 
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]

and then rename them:

In [70]: s.cat.rename_categories([1,2,3])
Out[70]: 
0    1
1    2
2    3
3    1
dtype: category
Categories (3, int64): [1, 2, 3]

You can also pass a dict-like object to map the renaming, e.g.:

In [72]: s.cat.rename_categories({1: 'x', 2: 'y', 3: 'z'})

Solution 7 - Python

When no of features are not much :

mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
df.applymap(lambda s: mymap.get(s) if s in mymap else s)

When it's not possible manually :

temp_df2 = pd.DataFrame({'data': data.data.unique(), 'data_new':range(len(data.data.unique()))})# create a temporary dataframe 
data = data.merge(temp_df2, on='data', how='left')# Now merge it by assigning different values to different strings.

Solution 8 - Python

You can build dictionary from column values itself and fill like below

x=df['Item_Type'].value_counts()
item_type_mapping={}
item_list=x.index
for i in range(0,len(item_list)):
    item_type_mapping[item_list[i]]=i

df['Item_Type']=df['Item_Type'].map(lambda x:item_type_mapping[x]) 

Solution 9 - Python

df.replace(to_replace=['set', 'test'], value=[1, 2]) from @Ishnark comment on accepted answer.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionjonasView Question on Stackoverflow
Solution 1 - PythonDan AllanView Answer on Stackoverflow
Solution 2 - PythonBrandonView Answer on Stackoverflow
Solution 3 - PythonbdiamanteView Answer on Stackoverflow
Solution 4 - PythonKapilfreemanView Answer on Stackoverflow
Solution 5 - PythonSamer AyoubView Answer on Stackoverflow
Solution 6 - PythontsandoView Answer on Stackoverflow
Solution 7 - PythonAkash KandpalView Answer on Stackoverflow
Solution 8 - PythonManoj Kumar DhakadView Answer on Stackoverflow
Solution 9 - PythonChapoView Answer on Stackoverflow