Conditional Replace Pandas

PythonPandasReplaceConditional StatementsSeries

Python Problem Overview


I have a DataFrame, and I want to replace the values in a particular column that exceed a value with zero. I had thought this was a way of achieving this:

df[df.my_channel > 20000].my_channel = 0

If I copy the channel into a new data frame it's simple:

df2 = df.my_channel 

df2[df2 > 20000] = 0

This does exactly what I want, but seems not to work with the channel as part of the original DataFrame.

Python Solutions


Solution 1 - Python

.ix indexer works okay for pandas version prior to 0.20.0, but since pandas 0.20.0, the .ix indexer is deprecated, so you should avoid using it. Instead, you can use .loc or iloc indexers. You can solve this problem by:

mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0

Or, in one line,

df.loc[df.my_channel > 20000, 'my_channel'] = 0

mask helps you to select the rows in which df.my_channel > 20000 is True, while df.loc[mask, column_name] = 0 sets the value 0 to the selected rows where maskholds in the column which name is column_name.

Update: In this case, you should use loc because if you use iloc, you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.

Solution 2 - Python

Try

df.loc[df.my_channel > 20000, 'my_channel'] = 0

Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc.

Solution 3 - Python

np.where function works as follows:

df['X'] = np.where(df['Y']>=50, 'yes', 'no')

In your case you would want:

import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)

Solution 4 - Python

The reason your original dataframe does not update is because chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:

> When setting values in a pandas object, care must be taken to avoid > what is called chained indexing.

You have a few alternatives:-

loc + Boolean indexing

loc may be used for setting values and supports Boolean masks:

df.loc[df['my_channel'] > 20000, 'my_channel'] = 0
mask + Boolean indexing

You can assign to your series:

df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)

Or you can update your series in place:

df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)
np.where + Boolean indexing

You can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only specified values.

df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])

Solution 5 - Python

Try this:

df.my_channel = df.my_channel.where(df.my_channel <= 20000, other= 0)

or

df.my_channel = df.my_channel.mask(df.my_channel > 20000, other= 0)

Solution 6 - Python

I would use lambda function on a Series of a DataFrame like this:

f = lambda x: 0 if x>100 else 1
df['my_column'] = df['my_column'].map(f)

I do not assert that this is an efficient way, but it works fine.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionBMichellView Question on Stackoverflow
Solution 1 - PythonlmiguelvargasfView Answer on Stackoverflow
Solution 2 - PythonlowtechView Answer on Stackoverflow
Solution 3 - PythonseeiespiView Answer on Stackoverflow
Solution 4 - PythonjppView Answer on Stackoverflow
Solution 5 - PythonR. ShamsView Answer on Stackoverflow
Solution 6 - PythonOzkan SerttasView Answer on Stackoverflow