Tilde sign in pandas DataFrame
Python Problem Overview
I'm new to Python/pandas and came across this code snippet:
df = df[~df['InvoiceNo'].str.contains('C')]
What does the tilde sign (~) do in this context?
Python Solutions
Solution 1 - Python
It means bitwise NOT. Applied to a boolean mask, it inverts the mask: False values become True and True values become False.
Sample:
import pandas as pd

df = pd.DataFrame({'InvoiceNo': ['aaC', 'ff', 'lC'],
                   'a': [1, 2, 5]})
print(df)
  InvoiceNo  a
0       aaC  1
1        ff  2
2        lC  5
# check whether each value in the column contains C
print(df['InvoiceNo'].str.contains('C'))
0     True
1    False
2     True
Name: InvoiceNo, dtype: bool
# invert the mask
print(~df['InvoiceNo'].str.contains('C'))
0    False
1     True
2    False
Name: InvoiceNo, dtype: bool
Filter by boolean indexing:
df = df[~df['InvoiceNo'].str.contains('C')]
print(df)
  InvoiceNo  a
1        ff  2
So the output is all rows of the DataFrame that do not contain C in the InvoiceNo column.
Solution 2 - Python
It's used to invert a boolean Series; see the pandas documentation on boolean indexing.
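To make the inversion concrete, here is a minimal sketch on a hand-built boolean Series (the values are made up for illustration). It also shows why Python's `not` keyword cannot be used in place of `~`:

```python
import numpy as np
import pandas as pd

# Illustrative boolean mask; the values are assumptions, not from a real dataset.
mask = pd.Series([True, False, True], name="InvoiceNo")

# ~ flips each element of the boolean Series.
inverted = ~mask
print(inverted.tolist())  # [False, True, False]

# np.logical_not gives the same element-wise result.
same = np.logical_not(mask)
print(inverted.equals(same))  # True

# `not` asks for one truth value for the whole Series, which is ambiguous,
# so pandas raises ValueError instead of inverting element-wise.
try:
    not mask
    not_error = None
except ValueError as exc:
    not_error = type(exc).__name__
print(not_error)  # ValueError
```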
Solution 3 - Python
df = df[~df['InvoiceNo'].str.contains('C')]
The code above removes every row of the DataFrame whose string value in the InvoiceNo column contains the letter "C".
The tilde (~) works as a NOT (!) operator in this scenario.
More generally, the same pattern is often used to drop rows that have null values in a column, by negating a null-check mask such as isnull().
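A small sketch of the null-value case, assuming a hypothetical row with a missing InvoiceNo added to the sample data. Depending on the pandas version and column dtype, `str.contains` may return missing values for such rows, which `~` cannot invert cleanly; passing `na=False` is one way to keep the mask strictly boolean:

```python
import numpy as np
import pandas as pd

# Sample data with one missing InvoiceNo (added here purely for illustration).
df = pd.DataFrame({"InvoiceNo": ["aaC", "ff", np.nan, "lC"],
                   "a": [1, 2, 3, 5]})

# na=False treats the missing value as "does not contain C",
# so the mask is plain True/False and safe to invert.
mask = df["InvoiceNo"].str.contains("C", na=False)
kept = df[~mask]
print(kept["a"].tolist())  # [2, 3]

# The same ~ pattern applied to a null check drops rows with missing values.
non_null = df[~df["InvoiceNo"].isnull()]
print(non_null["a"].tolist())  # [1, 2, 5]
```

Note that with `na=False` the row with the missing InvoiceNo is kept by the first filter; whether that is desirable depends on the data.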
Solution 4 - Python
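The tilde ~ is the bitwise NOT operator. Applied to a boolean Series it works element-wise: True becomes False and False becomes True. (On plain Python integers it flips every bit, so ~1 is -2 and ~0 is -1, not 0 and 1.) So you will get the rows of df whose InvoiceNo value does not contain the string 'C'.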
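The difference between booleans and integers can be checked directly. The sketch below uses made-up values and suggests that a 0/1 integer mask should be converted to bool before it is inverted:

```python
import pandas as pd

# On plain Python integers, ~x flips every bit, which equals -x - 1.
int_results = (~1, ~0)
print(int_results)  # (-2, -1)

# On a boolean Series, ~ acts as an element-wise logical NOT.
flags = pd.Series([True, False])
print((~flags).tolist())  # [False, True]

# On an integer Series holding 0/1, ~ is still bitwise, so the result
# is not a usable mask.
ints = pd.Series([1, 0])
print((~ints).tolist())  # [-2, -1]

# Converting to bool first restores the expected inversion.
print((~ints.astype(bool)).tolist())  # [False, True]
```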