Is there a simple way to change a column of yes/no to 1/0 in a Pandas dataframe?
PythonPandasDataframeSeriesPython Problem Overview
I read a csv file into a pandas dataframe, and would like to convert the columns with binary answers from strings of yes/no to integers of 1/0. Below, I show one of such columns ("sampleDF" is the pandas dataframe).
In [13]: sampleDF.housing[0:10]
Out[13]:
0 no
1 no
2 yes
3 no
4 no
5 no
6 no
7 no
8 yes
9 yes
Name: housing, dtype: object
Help is much appreciated!
Python Solutions
Solution 1 - Python
method 1
sample.housing.eq('yes').mul(1)
method 2
pd.Series(np.where(sample.housing.values == 'yes', 1, 0),
sample.index)
method 3
sample.housing.map(dict(yes=1, no=0))
method 4
pd.Series(map(lambda x: dict(yes=1, no=0)[x],
sample.housing.values.tolist()), sample.index)
method 5
pd.Series(np.searchsorted(['no', 'yes'], sample.housing.values), sample.index)
All yield
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0
8 1
9 1
timing
given sample
timing
long sample
sample = pd.DataFrame(dict(housing=np.random.choice(('yes', 'no'), size=100000)))
Solution 2 - Python
Try this:
sampleDF['housing'] = sampleDF['housing'].map({'yes': 1, 'no': 0})
Solution 3 - Python
# produces True/False
sampleDF['housing'] = sampleDF['housing'] == 'yes'
The above returns True/False values which are essentially 1/0, respectively. Booleans support sum functions, etc. If you really need it to be 1/0 values, you can use the following.
housing_map = {'yes': 1, 'no': 0}
sampleDF['housing'] = sampleDF['housing'].map(housing_map)
Solution 4 - Python
%timeit
sampleDF['housing'] = sampleDF['housing'].apply(lambda x: 0 if x=='no' else 1)
1.84 ms ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> Replaces 'yes' with 1, 'no' with 0 for the df column specified.
Solution 5 - Python
Use sklearn's LabelEncoder
from sklearn.preprocessing import LabelEncoder
lb = LabelEncoder()
sampleDF['housing'] = lb.fit_transform(sampleDF['housing'])
Solution 6 - Python
yes there is you can change yes/no values of your column to 1/0 by using following code snippet
sampleDF = sampleDF.replace(to_replace = ['yes','no'],value = ['1','0'])
sampleDF
by using first line you can replace the values with 1/0 by using second line you can see the changes by printing it
Solution 7 - Python
For a dataset names data and a column named Paid
;
data = data.replace({'Paid': {'yes': 1, 'no': 0}})
all the yes
will change to 1
and all the no
will be replaced by 0
Solution 8 - Python
Generic way:
import pandas as pd
string_data = string_data.astype('category')
numbers_data = string_data.cat.codes
reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html
Solution 9 - Python
You can convert a series from Boolean to integer explicitly:
sampleDF['housing'] = sampleDF['housing'].eq('yes').astype(int)
Solution 10 - Python
The easy way to do that use pandas as below:
housing = pd.get_dummies(sampleDF['housing'],drop_first=True)
after that drop this filed from main df
sampleDF.drop('housing',axis=1,inplace=True)
now merge new one in you df
sampleDF= pd.concat([sampleDF,housing ],axis=1)
Solution 11 - Python
A simple and intuitive way to convert the whole dataframe to 0's and 1's might be:
sampleDF = sampleDF.replace(to_replace = "yes", value = 1)
sampleDF = sampleDF.replace(to_replace = "no", value = 0)
Solution 12 - Python
sampleDF['housing'] = sampleDF['housing'].map(lambda x: 1 if x == 'yes' else 0)
sampleDF['housing'] = sampleDF['housing'].astype(int)
This will work.
Solution 13 - Python
Try the following:
sampleDF['housing'] = sampleDF['housing'].str.lower().replace({'yes': 1, 'no': 0})
Solution 14 - Python
I used the preprocesssing function from sklearn. First you create an encoder.
e = preprocessing.LabelEncoder()
Then for each attribute or characteristic in the data use the label encoder to transform it into an integer value
size = le.fit_transform(list(data["size"]))
color = le.fit_transform(list(data["color"]))
It's converting a list of all the "size" or "color" attributes, and converting that into a list of their corresponding integer values. To put all of this into one list, use the zip function.
It is not going to be in the same format as the csv file; it will be a giant list of everything.
data = list(zip(buying, size))
Hopefully I explained that somewhat clearly.
Solution 15 - Python
You can also try :
sampleDF["housing"] = (sampleDF["housing"]=="Yes")*1
Solution 16 - Python
This is just a bool to int.
Try this.
sampleDF.housing = (sampleDF.housing == 'yes').astype(int)
Solution 17 - Python
Try this, it will work.
sampleDF.housing.replace(['no', 'yes'], [0,1], inplace = True)
Solution 18 - Python
use pandas.Series.map
sampleDF.map({'yes':1,'no':0})
Solution 19 - Python
comprehension array
sampleDF['housing'] = [int(v == 'yes') for v in sampleDF['housing']]