Boolean Series key will be reindexed to match DataFrame index
Problem Overview
Here is how I encountered the warning:
df.loc[a_list][df.a_col.isnull()]
The type of a_list is Int64Index; it contains a list of row indexes. All of these row indexes belong to df.
The df.a_col.isnull() part is a condition I need for filtering.
If I execute the following commands individually, I do not get any warnings:
df.loc[a_list]
df[df.a_col.isnull()]
But if I put them together as df.loc[a_list][df.a_col.isnull()], I get the following warning (although I can still see the result):
> Boolean Series key will be reindexed to match DataFrame index
What does this warning mean? Does it affect the result that is returned?
Python Solutions
Solution 1 - Python
Your approach will work despite the warning, but it's best not to rely on implicit, unclear behavior.
Solution 1, make the selection of indices in a_list a boolean mask:
df[df.index.isin(a_list) & df.a_col.isnull()]
Solution 2, do it in two steps:
df2 = df.loc[a_list]
df2[df2.a_col.isnull()]
Solution 3, if you want a one-liner, use the trick that NaN never compares equal to itself:
df.loc[a_list].query('a_col != a_col')
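To see the three solutions side by side, here is a minimal sketch on a toy DataFrame. The data, the extra column b_col, and the names res1, res2 and res3 are assumptions made up for illustration; only df, a_list and a_col come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a_col is missing at row labels 1, 3 and 4.
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan, np.nan], "b_col": [10, 20, 30, 40, 50]}
)
a_list = pd.Index([0, 1, 3])  # row labels to keep; all of them exist in df

# Solution 1: one boolean mask, built on df itself, so nothing needs realigning.
res1 = df[df.index.isin(a_list) & df.a_col.isnull()]

# Solution 2: subset first, then build the mask from the subset.
df2 = df.loc[a_list]
res2 = df2[df2.a_col.isnull()]

# Solution 3: NaN != NaN, so the query keeps only rows where a_col is missing.
res3 = df.loc[a_list].query("a_col != a_col")

print(res1.index.tolist(), res2.index.tolist(), res3.index.tolist())
```

One difference to keep in mind: Solution 1 returns rows in the order of df, while Solutions 2 and 3 return them in the order of a_list. In this toy example the two orders happen to coincide.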
The warning comes from the fact that the boolean vector df.a_col.isnull() has the length of df, while df.loc[a_list] has the length of a_list, i.e. it is shorter. Therefore, some indices in df.a_col.isnull() are not in df.loc[a_list].
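The mismatch can be checked directly. This is a small sketch on assumed toy data; the values and the names mask, subset and extra are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, same shape of problem as in the question.
df = pd.DataFrame({"a_col": [1.0, np.nan, 3.0, np.nan, np.nan]})
a_list = pd.Index([0, 1, 3])

mask = df.a_col.isnull()   # one boolean per row of df
subset = df.loc[a_list]    # one row per label in a_list

print(len(mask), len(subset))  # 5 3

# Labels that the mask has but the subset does not.
extra = mask.index.difference(subset.index)
print(extra.tolist())  # [2, 4]
```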
What pandas does is reindex the boolean series on the index of the calling dataframe. In effect, it takes from df.a_col.isnull() the values corresponding to the indices in a_list. This works, but the behavior is implicit and could easily change in the future, which is what the warning is about.
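The implicit step can also be written out by hand. The sketch below reindexes the mask explicitly with Series.reindex and compares the result with the chained form from the question. The toy data and the names aligned, explicit, implicit and caught are assumptions for illustration, and whether the chained form emits the warning depends on the installed pandas version.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"a_col": [1.0, np.nan, 3.0, np.nan, np.nan]})
a_list = pd.Index([0, 1, 3])

subset = df.loc[a_list]
mask = df.a_col.isnull()

# Explicit version: align the mask to the subset's index before filtering.
aligned = mask.reindex(subset.index)
explicit = subset[aligned]

# Chained version from the question; record any warnings instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = subset[mask]

print([str(w.message) for w in caught])
print(explicit.index.tolist(), implicit.index.tolist())
```

Because the alignment is spelled out, the explicit form does not depend on how pandas handles a mismatched boolean key.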