How to lowercase a pandas dataframe string column if it has missing values?

Tags: Python, String, Pandas, Missing Data

Python Problem Overview


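The following code does not work: the lambda raises an AttributeError on the NaN entry, because NaN is a float and has no lower() method.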

import pandas as pd
import numpy as np
df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['x'])
xLower = df["x"].map(lambda x: x.lower())

How should I change it to get xLower = ['one', 'two', np.nan]? Efficiency matters because the real data frame is huge.

Python Solutions


Solution 1 - Python

Use the pandas vectorized string methods. As the documentation says:

> these methods exclude missing/NA values automatically

.str.lower() is the very first example there:

>>> df['x'].str.lower()
0    one
1    two
2    NaN
Name: x, dtype: object
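
As a minimal sketch, here is the same call applied to the question's data, with the result stored under the xLower name the question asks for. The missing value passes through as NaN instead of raising an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['x'])

# .str.lower() lowercases the strings and leaves the missing value as NaN
xLower = df['x'].str.lower()

print(xLower.tolist())
```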

Solution 2 - Python

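If the column holds numbers as well as strings, .str.lower() returns NaN for every non-string entry, because a number is not a string. To keep such values as text, fill the missing values and convert everything to strings before lowering, for example with fillna('').astype(str).str.lower(). Note that astype(str) on its own turns NaN into the text 'nan', and that to_string(na_rep='') returns one formatted string rather than a Series, so neither is enough by itself:

import pandas as pd
import numpy as np
df = pd.DataFrame(['ONE', 'Two', np.nan, 2], columns=['x'])
# NaN becomes '' and 2 becomes '2' before lowercasing
xSecureLower = df['x'].fillna('').astype(str).str.lower()
# plain .str.lower() yields NaN for the number 2
xLower = df['x'].str.lower()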

then we have:

>>> xSecureLower
0    one
1    two
2   
3      2
Name: x, dtype: object

and not

>>> xLower
0    one
1    two
2    NaN
3    NaN
Name: x, dtype: object

Edit:

If you don't want to lose the NaNs, map is the better choice (from the answer by @wojciech-walczak and a comment by @cs95). It looks something like this:

xSecureLower = df['x'].map(lambda x: x.lower() if isinstance(x,str) else x)
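
To make the trade-off concrete, the sketch below runs two variants on the mixed column from this answer. The names as_text and kept are only illustrative, and the first variant assumes that replacing NaN with an empty string is acceptable for your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['ONE', 'Two', np.nan, 2], columns=['x'])

# Variant 1: everything becomes text; NaN is replaced by '' first
as_text = df['x'].fillna('').astype(str).str.lower()

# Variant 2: only real strings are lowercased; NaN and 2 are kept unchanged
kept = df['x'].map(lambda v: v.lower() if isinstance(v, str) else v)

print(as_text.tolist())  # ['one', 'two', '', '2']
print(kept.tolist())
```

The second variant keeps the number 2 as an integer and the missing value as NaN, which is usually what you want if the column is processed further.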

Solution 3 - Python

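You can also try this, which lowercases every string cell in the whole DataFrame (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

df = df.applymap(lambda s: s.lower() if isinstance(s, str) else s)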
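
A hedged sketch of the same idea that runs on both older and newer pandas versions. The helper lower_if_str and the extra numeric column n are assumptions added for illustration; the code picks DataFrame.map when it is available and falls back to applymap otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': ['ONE', 'Two', np.nan], 'n': [1, 2, 3]})

def lower_if_str(value):
    """Lowercase strings; return anything else (NaN, numbers) unchanged."""
    return value.lower() if isinstance(value, str) else value

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = df.map if hasattr(df, 'map') else df.applymap
lowered = elementwise(lower_if_str)

print(lowered)
```

Because the function is called once per cell in Python, this is likely to be slower than .str.lower() on a single column, so it is mainly useful when many columns need the same treatment.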

Solution 4 - Python

Pandas >= 0.25: Remove Case Distinctions with str.casefold

Starting from v0.25, I recommend using the vectorized string method str.casefold if you're dealing with Unicode data (it works whether the values are plain strings or unicode objects):

s = pd.Series(['lower', 'CAPITALS', np.nan, 'SwApCaSe'])
s.str.casefold()

0       lower
1    capitals
2         NaN
3    swapcase
dtype: object

Also see related GitHub issue GH25405.

casefold lends itself to more aggressive case-folding comparison. It also handles NaNs gracefully (just as str.lower does).

But why is this better?

The difference shows up with Unicode text. Take the example from the Python str.casefold docs:

> Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".

Compare the output of lower:

s = pd.Series(["der Fluß"])
s.str.lower()

0    der fluß
dtype: object

with that of casefold:

s.str.casefold()

0    der fluss
dtype: object

Also see https://stackoverflow.com/questions/45745661/python-lower-vs-casefold-in-string-matching-and-converting-to-lowercase.
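
The practical effect is easiest to see in a case-insensitive comparison. The sketch below uses made-up sample values and a made-up target string; it only illustrates how lower and casefold differ when matching.

```python
import numpy as np
import pandas as pd

s = pd.Series(['Straße', 'STRASSE', np.nan])
target = 'strasse'

# lower() keeps 'ß', so only the second entry matches the target
match_lower = s.str.lower() == target

# casefold() maps 'ß' to 'ss', so both spellings match
match_casefold = s.str.casefold() == target

print(match_lower.tolist())     # [False, True, False]
print(match_casefold.tolist())  # [True, True, False]
```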

Solution 5 - Python

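Apply a lambda function, guarding against non-string values so that NaN does not raise an AttributeError:
df['original_category'] = df['original_category'].apply(lambda x: x.lower() if isinstance(x, str) else x)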

Solution 6 - Python

A possible solution:

import pandas as pd
import numpy as np

df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['x'])
xLower = df["x"].map(lambda x: x.lower() if isinstance(x, str) else x)
print(xLower)

And a result:

0    one
1    two
2    NaN
Name: x, dtype: object

Not sure about the efficiency though.
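
One way to settle the efficiency question for your own data is to time both approaches. The sketch below is only a rough benchmark on hypothetical test data (300,000 rows, one third missing). The function names are illustrative, and the numbers will vary with the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 300,000 rows, one third of them missing
s = pd.Series(['ONE', 'Two', np.nan] * 100_000)

def with_str_accessor():
    # vectorized string method; skips missing values automatically
    return s.str.lower()

def with_map():
    # element-wise Python lambda with an explicit type check
    return s.map(lambda v: v.lower() if isinstance(v, str) else v)

for func in (with_str_accessor, with_map):
    seconds = timeit.timeit(func, number=5)
    print(f'{func.__name__}: {seconds / 5:.4f} s per call')
```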

Solution 7 - Python

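Maybe use a list comprehension:

import pandas as pd
import numpy as np
df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['Name'])
# lowercase strings only; str(np.nan).lower() would turn NaN into the text 'nan'
df['Name'] = [i.lower() if isinstance(i, str) else i for i in df['Name']]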

print(df)

Solution 8 - Python

Select the column from your DataFrame (here called data) and simply apply str.lower:

col = data['x']
lowered = col.str.lower()

Solution 9 - Python

Use the apply function, leaving non-string values such as NaN untouched:

xLower = df['x'].apply(lambda x: x.lower() if isinstance(x, str) else x)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | P.Escondido | View Question on Stackoverflow
Solution 1 - Python | behzad.nouri | View Answer on Stackoverflow
Solution 2 - Python | Mike W | View Answer on Stackoverflow
Solution 3 - Python | Farid | View Answer on Stackoverflow
Solution 4 - Python | cs95 | View Answer on Stackoverflow
Solution 5 - Python | Aravinda_gn | View Answer on Stackoverflow
Solution 6 - Python | Wojciech Walczak | View Answer on Stackoverflow
Solution 7 - Python | deepesh | View Answer on Stackoverflow
Solution 8 - Python | Ch HaXam | View Answer on Stackoverflow
Solution 9 - Python | Ashutosh Shankar | View Answer on Stackoverflow