Pandas: Check if row exists with certain values


Python Problem Overview


I have a two-dimensional (or more) pandas DataFrame like this:

>>> import pandas as pd
>>> df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  2  3
2  4  5

Now suppose I have a numpy array like np.array([2,3]) and want to check whether any row in df matches the contents of my array. Here the answer should obviously be True, but e.g. np.array([1,2]) should return False, as there is no row with both 1 in column A and 2 in column B.

Surely this is easy, but I don't see it right now.

Python Solutions


Solution 1 - Python

It turns out to be really easy; the following does the job here:

>>> ((df['A'] == 2) & (df['B'] == 3)).any()
True
>>> ((df['A'] == 1) & (df['B'] == 2)).any()
False

Maybe somebody can come up with a better solution that allows passing in the array and the list of columns to match directly.
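A minimal sketch of such a helper, assuming the values are given in the same order as the column list. The name `row_exists` is hypothetical, not a pandas API. It selects the listed columns, compares every row with the values element-wise, and reports whether any row matches in all of them.

```python
import numpy as np
import pandas as pd


def row_exists(df: pd.DataFrame, values, columns) -> bool:
    """Hypothetical helper: True if some row has df[columns] equal to values."""
    # Select only the columns to match, in the order the values are given.
    subset = df[list(columns)]
    # Compare every row with the values (the 1-D array is matched to the columns),
    # keep rows where all columns are equal, then ask whether any such row exists.
    return bool((subset == np.asarray(values)).all(axis=1).any())


df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])
print(row_exists(df, np.array([2, 3]), ['A', 'B']))  # True
print(row_exists(df, np.array([1, 2]), ['A', 'B']))  # False
print(row_exists(df, [4], ['A']))                    # True
```

Passing the columns explicitly means the array does not have to cover every column of df, and the column order in the call decides which value is compared with which column.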

Note that the parentheses around df['A'] == 2 are not optional, since the & operator binds more tightly than the == operator.
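A short demonstration of why the parentheses matter: without them, Python parses `df['A'] == 2 & df['B'] == 3` as the chained comparison `df['A'] == (2 & df['B']) == 3`, which needs the truth value of a whole Series and therefore fails.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])

# With parentheses: two boolean Series combined element-wise with &.
print(((df['A'] == 2) & (df['B'] == 3)).any())  # True

# Without parentheses, & is applied first: df['A'] == (2 & df['B']) == 3.
# A chained comparison calls bool() on an intermediate Series, which pandas rejects.
try:
    mask = df['A'] == 2 & df['B'] == 3
except ValueError as err:
    print("ValueError:", err)
```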

Solution 2 - Python

An easier way is:

import numpy as np
a = np.array([2,3])
# compare every row with a; True if some row matches a in all columns
(df == a).all(axis=1).any()
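To see what each step of `(df == a).all(1).any()` does, the intermediate results can be printed. This assumes `a` has exactly one entry per column of `df`, in column order; pandas raises a `ValueError` if the lengths differ.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])
a = np.array([2, 3])

# Step 1: a is matched against the columns, so each row is compared with [2, 3].
eq = df == a
print(eq)

# Step 2: a row matches only if every column in it compared equal.
row_match = eq.all(axis=1)
print(row_match.tolist())  # [False, True, False]

# Step 3: the array occurs in df if at least one row matches.
print(row_match.any())     # True

# A length mismatch is rejected rather than silently broadcast.
try:
    _ = df == np.array([2, 3, 4])
except ValueError as err:
    print("ValueError:", err)
```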

Solution 3 - Python

If you also want to return the index where the matches occurred:

index_list = df[(df['A'] == 2) & (df['B'] == 3)].index.tolist()
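The same idea combines with the broadcasting comparison to return the matching index labels without naming each column. A sketch, assuming `a` lists one value per column of `df` in column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])
a = np.array([2, 3])

# Boolean mask: True for rows whose values equal a in every column.
mask = (df == a).all(axis=1)

# Index labels of the matching rows (an empty list when nothing matches).
index_list = df.index[mask.to_numpy()].tolist()
print(index_list)  # [1]

# Positional row numbers instead of labels, useful when the index is not 0..n-1.
positions = np.flatnonzero(mask.to_numpy())
print(positions.tolist())  # [1]
```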

Solution 4 - Python

An answer that works with DataFrames that have more columns, so you don't need to check each column manually:

import pandas as pd
import numpy as np

# define variables
df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])
a = np.array([2,3])

def check_if_np_array_is_in_df(df, a):
    # transform a into a one-row DataFrame with the same columns as df
    da = pd.DataFrame(np.expand_dims(a, axis=0), columns=df.columns)

    # drop duplicates from df so that repeated rows in df are not counted
    ddf = df.drop_duplicates()

    # appending a creates exactly one extra duplicate if a is already a row of ddf
    combined = pd.concat([ddf, da])
    result = combined.shape[0] - combined.drop_duplicates().shape[0]
    return result  # 1 if a is a row of df, otherwise 0

print(check_if_np_array_is_in_df(df, a))      # 1
print(check_if_np_array_is_in_df(df, [1,3]))  # 0
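When many candidate rows have to be checked against the same DataFrame, a different technique (not the drop_duplicates trick above) may be simpler: build a Python set of row tuples once, then test membership. The helper name `is_row_in_df` is hypothetical, and the sketch assumes the candidate values are listed in the same order as the DataFrame's columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])

# Build the set of rows once; name=None makes itertuples yield plain tuples.
row_set = set(df.itertuples(index=False, name=None))


def is_row_in_df(values) -> bool:
    """Hypothetical helper: membership test against the precomputed row set."""
    # tuple() turns a NumPy array or a list into a hashable key.
    return tuple(values) in row_set


print(is_row_in_df(np.array([2, 3])))  # True
print(is_row_in_df([1, 3]))            # False
```

Building the set costs one pass over df, after which each lookup is a hash-table check instead of a full scan of the DataFrame.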

Solution 5 - Python

To find rows where a single column equals a certain value:

df[df['column name'] == value]
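Applied to the question's DataFrame, the single-column filter can be turned into a yes/no answer with `.any()`. A common pitfall is shown as well: the `in` operator on a Series checks the index labels, not the values, so it should be applied to `.values` instead.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])

# Rows where column A equals 4, and a boolean "does such a row exist?".
print(df[df['A'] == 4])
print((df['A'] == 4).any())  # True

# Pitfall: `in` on a Series looks at the index labels (0, 1, 2), not the values.
print(4 in df['A'])          # False
print(4 in df['A'].values)   # True
```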

To find rows where multiple columns equal different values (note the parentheses around each comparison):

df[(df["Col1"] == Value1) & (df["Col2"] == Value2)]  # chain further "& (...)" terms as needed
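Applied to the question, the filtered result can be turned into a True/False check with `.empty`; `DataFrame.query` is an alternative spelling that avoids the parentheses. A sketch using the question's column names.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])

# Boolean indexing with one parenthesized comparison per column.
matches = df[(df['A'] == 2) & (df['B'] == 3)]
print(not matches.empty)  # True

# The same filter written as a query string; "and" is handled by query's parser.
print(not df.query("A == 2 and B == 3").empty)  # True
print(not df.query("A == 1 and B == 2").empty)  # False
```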

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author   Original Content on Stackoverflow
Question              Robin             View Question on Stackoverflow
Solution 1 - Python   Robin             View Answer on Stackoverflow
Solution 2 - Python   acushner          View Answer on Stackoverflow
Solution 3 - Python   sparrow           View Answer on Stackoverflow
Solution 4 - Python   Yannick Pezeu     View Answer on Stackoverflow
Solution 5 - Python   Pantzaris         View Answer on Stackoverflow