Compare two columns using pandas

PythonPandasIf StatementDataframe

Python Problem Overview


Using this as a starting point:

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

Out[8]: 
  one  two three
0   10  1.2   4.2
1   15  70   0.03
2    8   5     0

I want to use something like an if statement within pandas.

if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']

Basically, check each row via the if statement, create new column.

The docs say to use .all but there is no example...

Python Solutions


Solution 1 - Python

You could use np.where. If cond is a boolean array, and A and B are arrays, then

C = np.where(cond, A, B)

defines C to be equal to A where cond is True, and B where cond is False.

import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three'])
                     , df['one'], np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

If you have more than one condition, then you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then

conditions = [    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]

choices = [df['one'], df['two']]

df['que'] = np.select(conditions, choices, default=np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

If we can assume that df['one'] >= df['two'] when df['one'] < df['two'] is False, then the conditions and choices could be simplified to

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]

choices = [df['two'], df['one']]

(The assumption may not be true if df['one'] or df['two'] contain NaNs.)


Note that

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:

df2 = df.astype(float)

This changes the results, however, since strings compare character-by-character, while floats are compared numerically.

In [61]: '10' <= '4.2'
Out[61]: True

In [62]: 10 <= 4.2
Out[62]: False

Solution 2 - Python

You can use .equals for columns or entire dataframes.

df['col1'].equals(df['col2'])

If they're equal, that statement will return True, else False.

Solution 3 - Python

You could use apply() and do something like this

df['que'] = df.apply(lambda x : x['one'] if x['one'] >= x['two'] and x['one'] <= x['three'] else "", axis=1)

or if you prefer not to use a lambda

def que(x):
    if x['one'] >= x['two'] and x['one'] <= x['three']:
        return x['one']
    return ''
df['que'] = df.apply(que, axis=1)

Solution 4 - Python

One way is to use a Boolean series to index the column df['one']. This gives you a new column where the True entries have the same value as the same row as df['one'] and the False values are NaN.

The Boolean series is just given by your if statement (although it is necessary to use & instead of and):

>>> df['que'] = df['one'][(df['one'] >= df['two']) & (df['one'] <= df['three'])]
>>> df
	one	two	three	que
0	10	1.2	4.2	     10
1	15	70	0.03	NaN
2	8	5	0	    NaN

If you want the NaN values to be replaced by other values, you can use the fillna method on the new column que. I've used 0 instead of the empty string here:

>>> df['que'] = df['que'].fillna(0)
>>> df
    one	two	three	que
0	10	1.2	  4.2    10
1	15	 70	 0.03	  0
2	 8	  5	    0	  0

Solution 5 - Python

Wrap each individual condition in parentheses, and then use the & operator to combine the conditions:

df.loc[(df['one'] >= df['two']) & (df['one'] <= df['three']), 'que'] = df['one']

You can fill the non-matching rows by just using ~ (the "not" operator) to invert the match:

df.loc[~ ((df['one'] >= df['two']) & (df['one'] <= df['three'])), 'que'] = ''

You need to use & and ~ rather than and and not because the & and ~ operators work element-by-element.

The final result:

df
Out[8]: 
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03    
2   8    5     0  

Solution 6 - Python

Use np.select if you have multiple conditions to be checked from the dataframe and output a specific choice in a different column

conditions=[(condition1),(condition2)]
choices=["choice1","chocie2"]

df["new column"]=np.select=(condtion,choice,default=)

Note: No of conditions and no of choices should match, repeat text in choice if for two different conditions you have same choices

Solution 7 - Python

Use lambda expression:

df[df.apply(lambda x: x['col1'] != x['col2'], axis = 1)]

Solution 8 - Python

I'd like to add this answer for those who are trying to compare the equality of values in two columns that have NaN values, and get False when both values are NaN. By definition, NaN != NaN (See: https://stackoverflow.com/questions/29528092/numpy-isnanvalue-not-the-same-as-value-numpy-nan).

If you want the two NaN comparison to return True, you can use:

df['compare'] = (df["col_1"] == df["col_2"]) | (df["col_1"].isna() & df["col_2"].isna())

Solution 9 - Python

You can use the Series method where:

df['que'] = df['one'].where((df['one'] >= df['two']) & (df['one'] <= df['three']))

Result:

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

Solution 10 - Python

I think the closest to the OP's intuition is an inline if statement:

df['que'] = (df['one'] if ((df['one'] >= df['two']) and (df['one'] <= df['three'])) 

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMerlinView Question on Stackoverflow
Solution 1 - PythonunutbuView Answer on Stackoverflow
Solution 2 - Pythonccook5760View Answer on Stackoverflow
Solution 3 - PythonBob HaffnerView Answer on Stackoverflow
Solution 4 - PythonAlex RileyView Answer on Stackoverflow
Solution 5 - PythonMariusView Answer on Stackoverflow
Solution 6 - Pythonpsn1997View Answer on Stackoverflow
Solution 7 - Pythonaze45sq6dView Answer on Stackoverflow
Solution 8 - PythonRafael LView Answer on Stackoverflow
Solution 9 - PythonMykola ZotkoView Answer on Stackoverflow
Solution 10 - PythonNic ScozzaroView Answer on Stackoverflow