pandas comparison raises TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
Python Problem Overview
I have the following structure to my dataFrame:
Index: 1008 entries, Trial1.0 to Trial3.84
Data columns (total 5 columns):
CHUNK_NAME 1008 non-null values
LAMBDA 1008 non-null values
BETA 1008 non-null values
HIT_RATE 1008 non-null values
AVERAGE_RECIPROCAL_HITRATE 1008 non-null values
chunks=['300_321','322_343','344_365','366_387','388_408','366_408','344_408','322_408','300_408']
lam_beta=[(lambda1,beta1),(lambda1,beta2),(lambda1,beta3),...(lambda1,beta_n),(lambda2,beta1),(lambda2,beta2)...(lambda2,beta_n),........]
my_df.ix[my_df.CHUNK_NAME==chunks[0]&my_df.LAMBDA==lam_beta[0][0]]
I want to get the rows of the DataFrame for a particular chunk, let's say chunks[0], and a particular lambda value. So in this case, the output should be all rows in the DataFrame having CHUNK_NAME='300_321' and LAMBDA=lambda1. There would be n rows returned, one for each beta value. But instead I get the following error. Any help in solving this problem would be appreciated.
TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
Python Solutions
Solution 1 - Python
& has higher precedence than ==, so without parentheses Python evaluates chunks[0] & my_df.LAMBDA first. Write:
my_df.ix[(my_df.CHUNK_NAME==chunks[0])&(my_df.LAMBDA==lam_beta[0][0])]
         ^                           ^ ^                            ^
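As a minimal sketch of the parenthesized form, the snippet below runs against a small made-up frame. The sample values for my_df, chunks, and lam_beta are assumptions for illustration, not the asker's data, and .loc is used because .ix was removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
my_df = pd.DataFrame({
    "CHUNK_NAME": ["300_321", "300_321", "322_343"],
    "LAMBDA": [0.1, 0.2, 0.1],
    "BETA": [1.0, 2.0, 1.0],
})
chunks = ["300_321", "322_343"]
lam_beta = [(0.1, 1.0), (0.1, 2.0), (0.2, 1.0)]

# Each comparison is wrapped in parentheses so that == is applied before &.
mask = (my_df.CHUNK_NAME == chunks[0]) & (my_df.LAMBDA == lam_beta[0][0])
selected = my_df.loc[mask]
print(selected)
```

Building the mask as a separate variable is optional, but it keeps the indexing expression short and makes each condition easy to inspect.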
Solution 2 - Python
One way to make sure you don't get into trouble with operator precedence is to use the wrapper methods of comparison operators. For example, use the eq method instead of the == operator.
Other wrappers are:
ne: !=
le: <=
lt: <
ge: >=
gt: >
So the expression in OP would be:
my_df.loc[my_df.CHUNK_NAME.eq(chunks[0]) & my_df.LAMBDA.eq(lam_beta[0][0])]
The wrappers can do more than the comparison operators. You can choose the axis along which to compare. Also, if you're dealing with a MultiIndex object, you can choose the level.
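The axis option can be sketched as follows. The thresholds Series is a hypothetical set of per-row limits invented for this illustration; the level option for a MultiIndex is not shown.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5.0, 6.0]})

# Hypothetical per-row thresholds, indexed like df's rows (0 and 1).
thresholds = pd.Series([2, 5])

# axis=0 aligns the Series with the row index, so every value in a row
# is compared against that row's own threshold.
row_wise = df.lt(thresholds, axis=0)
print(row_wise)
```

With the plain < operator, a Series is aligned against the column labels instead, so this row-wise comparison would need the wrapper method.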
Example:
For df:
   a  b    c
0  1  3  5.0
1  2  4  6.0
the following line:
out = df.loc[df['a']<3 & df['c']==5]
results in the following error:
> TypeError: Cannot perform 'rand_' with a dtyped [float64] array and
> scalar of type [bool]
However, if we use the equivalent wrappers:
out = df.loc[df['a'].lt(3) & df['c'].eq(5)]
Output:
   a  b    c
0  1  3  5.0
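To check the comparison end to end, here is a small runnable sketch that rebuilds the example frame, confirms that the unparenthesized expression fails, and compares the parenthesized and wrapper forms. The variable names raised, with_parens, and with_wrappers are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5.0, 6.0]})

# Without parentheses, 3 & df["c"] is evaluated first and fails
# because c is a float64 column.
try:
    df.loc[df["a"] < 3 & df["c"] == 5]
    raised = False
except TypeError:
    raised = True

# Both of these apply the comparisons before combining them with &.
with_parens = df.loc[(df["a"] < 3) & (df["c"] == 5)]
with_wrappers = df.loc[df["a"].lt(3) & df["c"].eq(5)]

print(raised)
print(with_parens.equals(with_wrappers))
```

Either form works; the wrappers simply avoid having to remember the extra parentheses.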