Get rows based on distinct values from one column

Python, Pandas

Python Problem Overview


How can I get the rows by distinct values in COL2?

For example, I have the dataframe below:

COL1   COL2
a.com  22
b.com  45
c.com  34
e.com  45
f.com  56
g.com  22
h.com  45

I want to get the rows based on unique values in COL2:

COL1  COL2
a.com 22
b.com 45
c.com 34
f.com 56

So, how can I get that? I would appreciate it very much if anyone can provide any help.

Python Solutions


Solution 1 - Python

Use drop_duplicates and pass COL2 as the column to check for duplicates:

df = df.drop_duplicates('COL2')
# same as:
# df = df.drop_duplicates('COL2', keep='first')
print(df)
    COL1  COL2
0  a.com    22
1  b.com    45
2  c.com    34
4  f.com    56
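
For a version that can be run on its own, here is a minimal sketch that rebuilds the sample frame from the question and applies the same call. The names unique_rows and renumbered, and the final reset_index(drop=True) step, are illustrative additions rather than part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Keep the first row seen for each distinct COL2 value
unique_rows = df.drop_duplicates(subset="COL2")

# Optional: renumber the index from 0 instead of keeping the original labels
renumbered = unique_rows.reset_index(drop=True)

print(unique_rows)
print(renumbered)
```

Assigning the result to a new name leaves the original df untouched, which is convenient when several variants are compared against the same data.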

Alternatively, starting again from the original DataFrame, you can keep the last occurrence of each value:

df = df.drop_duplicates('COL2', keep='last')
print(df)
    COL1  COL2
2  c.com    34
4  f.com    56
5  g.com    22
6  h.com    45

Or, again starting from the original DataFrame, drop every row whose COL2 value occurs more than once:

df = df.drop_duplicates('COL2', keep=False)
print(df)
    COL1  COL2
2  c.com    34
4  f.com    56
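
To see which rows each keep option treats as duplicates, it can help to inspect the boolean mask returned by DataFrame.duplicated, which accepts the same subset and keep arguments. The sketch below assumes the sample data from the question; the names mask_first, mask_last, mask_all and the kept_* frames are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# True marks a row that the matching drop_duplicates call would drop
mask_first = df.duplicated(subset="COL2", keep="first")
mask_last = df.duplicated(subset="COL2", keep="last")
mask_all = df.duplicated(subset="COL2", keep=False)

# Filtering with the inverted mask selects the rows that survive
kept_first = df[~mask_first]
kept_last = df[~mask_last]
kept_unique = df[~mask_all]

print(kept_first)
print(kept_last)
print(kept_unique)
```

Every result here is computed from the same unmodified df, so the three variants can be compared directly.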

Solution 2 - Python

You can also combine groupby with the first and last methods. Note that the result is sorted by COL2 and gets a new index. To get the first row from each group:

df.groupby('COL2', as_index=False).first()

Output:

   COL2   COL1
0    22  a.com
1    34  c.com
2    45  b.com
3    56  f.com

To get the last row from each group:

df.groupby('COL2', as_index=False).last()

Output:

   COL2   COL1
0    22  g.com
1    34  c.com
2    45  h.com
3    56  f.com
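
Two variations may be useful here, sketched below on the sample data from the question. Passing sort=False keeps the groups in order of first appearance, and head(1) or tail(1) return the original rows with their original index. As far as I know, first and last pick the first or last non-null value in each column, so with missing data they may not correspond to a single original row, whereas head and tail always return whole rows. The variable names are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# sort=False keeps groups in order of first appearance instead of sorting by COL2
first_unsorted = df.groupby("COL2", as_index=False, sort=False).first()

# head(1) / tail(1) return whole rows, keeping the original index and column order
first_rows = df.groupby("COL2").head(1)
last_rows = df.groupby("COL2").tail(1)

print(first_unsorted)
print(first_rows)
print(last_rows)
```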

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author  Original Content on Stackoverflow
Question             import.zee       View Question on Stackoverflow
Solution 1 - Python  jezrael          View Answer on Stackoverflow
Solution 2 - Python  Mykola Zotko     View Answer on Stackoverflow