Print the unique values in every column in a pandas DataFrame

Python, For Loop, Pandas

Python Problem Overview


I have a dataframe (df) and want to print the unique values from each column in the dataframe.

I need to substitute the loop variable (column_name) into the print statement:

column_list = df.columns.values.tolist()
for column_name in column_list:
    print(df."[column_name]".unique()

Update

When I use the code below, I get "Unexpected EOF Parsing" with no extra details:

column_list = sorted_data.columns.values.tolist()
for column_name in column_list:
      print(sorted_data[column_name].unique()

YS-L, what is the difference between your syntax (above) and the code below?

for column_name in sorted_data:
      print(column_name)
      s = sorted_data[column_name].unique()
      for i in s:
        print(str(i))

Python Solutions


Solution 1 - Python

It can be written more concisely like this:

for col in df:
    print(df[col].unique())

Generally, you can access a column of the DataFrame through indexing using the [] operator (e.g. df['col']), or through attribute (e.g. df.col).

Attribute access makes the code a bit more concise when the target column name is known beforehand, but it has several caveats: for example, it does not work when the column name is not a valid Python identifier (e.g. df.123) or clashes with a built-in DataFrame attribute (e.g. df.index). The [] notation, on the other hand, should always work.
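A minimal sketch of the two access styles and the caveats just described. The DataFrame and its column names (city, index, 123) are made up for illustration and chosen to trigger both problems:

```python
import pandas as pd

# Hypothetical data; the column names are picked to trigger the caveats.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "index": [10, 20, 10],   # clashes with the DataFrame.index attribute
    "123": ["a", "b", "a"],  # not a valid Python identifier
})

# Attribute access works for an ordinary column name.
city_values = df.city.unique()

# df.index is the row index here, not the "index" column.
row_index = df.index

# Bracket access reaches every column, whatever its name.
index_values = df["index"].unique()
numeric_name_values = df["123"].unique()

print(list(city_values))
print(list(index_values))
print(list(numeric_name_values))
```

Because a loop over columns only has the name as a string, bracket access is the form that fits the question's use case.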

Solution 2 - Python

The most upvoted answer uses a loop, so here is a one-line alternative using the pandas apply() method and a lambda function.

print(df.apply(lambda col: col.unique()))

Solution 3 - Python

This returns a Series indexed by column name, where each entry holds that column's array of unique values:

pd.Series({col:df[col].unique() for col in df})
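As a rough illustration of what that expression produces, here is a sketch on a small made-up DataFrame (the column names fuel and doors are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "fuel": ["Diesel", "Petrol", "Diesel", "CNG"],
    "doors": [3, 5, 3, 3],
})

# One entry per column; each value is that column's array of unique values.
uniques = pd.Series({col: df[col].unique() for col in df})

for col, values in uniques.items():
    print(col, list(values))
```

Each entry can then be looked up by column name, for example uniques["fuel"].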

Solution 4 - Python

If you're trying to create multiple separate dataframes as mentioned in your comments, create a dictionary of dataframes:

df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

Then you can access any dataframe easily using the name of the column:

df_dict['column_name']
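A short sketch of building and using such a dictionary, assuming a small made-up DataFrame with columns named fuel and doors:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "fuel": ["Diesel", "Petrol", "Diesel"],
    "doors": [3, 5, 3],
})

# Map each column name to a one-column DataFrame of its unique values.
df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

# Look up the DataFrame for one column by its name.
fuel_df = df_dict["fuel"]
print(fuel_df)
```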

Solution 5 - Python

If you only need the number of unique values in each column, this can be made even more concise:

df.describe(include='all').loc['unique', :]

Pandas describe gives a few key statistics about each column, but we can just grab the 'unique' statistic (the count of distinct values) and leave it at that.

Note that this will give a unique count of NaN for numeric columns. If you want to include those columns as well, you can do something like this:

df.astype('object').describe(include='all').loc['unique', :]
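The sketch below runs that expression on a small made-up DataFrame and compares it with DataFrame.nunique(), which is not part of this answer but returns the per-column counts directly:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "fuel": ["Diesel", "Petrol", "Diesel", "CNG"],
    "doors": [3, 5, 3, 3],
})

# Count of distinct values per column via describe(), after casting to object
# so that numeric columns also get a 'unique' statistic.
described = df.astype("object").describe(include="all").loc["unique", :]

# DataFrame.nunique() gives the per-column counts without the detour.
counts = df.nunique()

print(described)
print(counts)
```

Both results are counts, not the unique values themselves, so they complement the loop-based answers rather than replace them.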

Solution 6 - Python

The code below prints the unique values for each field. I find it very useful when you want to take a deeper look at the data frame:

for col in list(df):
    print(col)
    print(df[col].unique())

You can also sort the unique values if you want them to be sorted:

import numpy as np
for col in list(df):
    print(col)
    print(np.sort(df[col].unique()))

Solution 7 - Python

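cu = []  # unique values of each column
i = []   # matching column names
for cn in card.columns[:7]:  # first seven columns of the DataFrame `card`
    cu.append(card[cn].unique())
    i.append(cn)

# One row per column name, then transpose so each column lists its unique values
pd.DataFrame(cu, index=i).T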

Solution 8 - Python

Simply do this:

for i in df.columns:
    print(df[i].unique())

Solution 9 - Python

I was looking for a solution to this problem as well, and the code below proved more helpful in my situation:

for col in df:
    print(col)
    print(df[col].unique())
    print('\n')

It produces output like this:

Fuel_Type
['Diesel' 'Petrol' 'CNG']


HP
[ 90 192  69 110  97  71 116  98  86  72 107  73]


Met_Color
[1 0]

Solution 10 - Python

For a single column, it can be written in short as:

for val in df['column_name'].unique():
    print(val)

Solution 11 - Python

Even better, here is code to view all the unique values as a single dataframe, transposed so that each original column becomes a row:

columns=[*df.columns]
unique_values={}
for i in columns:
    unique_values[i]=df[i].unique()
unique=pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in unique_values.items() ]))
unique.fillna('').T
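The reason for wrapping each array in pd.Series is that columns usually have different numbers of unique values. A minimal sketch of that padding behavior, on a made-up DataFrame (column names are assumptions):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "fuel": ["Diesel", "Petrol", "Diesel", "CNG"],
    "doors": [3, 5, 3, 3],
})

# Wrap each array of unique values in a Series so that columns of different
# lengths can share one DataFrame; shorter ones are padded with NaN.
unique_values = {col: pd.Series(df[col].unique()) for col in df.columns}
unique_table = pd.DataFrame(unique_values)

# Transpose so that each original column becomes a row.
print(unique_table.T)
```

Here doors has two unique values and fuel has three, so the doors column ends with one NaN, which is the padding the answer's fillna('') call blanks out.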

Solution 12 - Python

The best way to do that:

Series.unique()

For example, students.age.unique() returns the distinct values that occur in the age column of the students data frame.

To get only the number of distinct values:

Series.nunique()
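One detail worth knowing when comparing the two: unique() keeps a missing value in its result, while nunique() leaves it out of the count unless dropna=False is passed. A small sketch with made-up ages:

```python
import pandas as pd

# Hypothetical data with one missing value.
ages = pd.Series([21, 22, 22, None, 21])

distinct = ages.unique()                 # the missing value appears as NaN
n_without_nan = ages.nunique()           # NaN is excluded by default
n_with_nan = ages.nunique(dropna=False)  # NaN is counted as a value

print(distinct, n_without_nan, n_with_nan)
```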

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | yoshiserry | View Question on Stack Overflow
Solution 1 - Python | YS-L | View Answer on Stack Overflow
Solution 2 - Python | Rahul Mandre | View Answer on Stack Overflow
Solution 3 - Python | shoaib sipai | View Answer on Stack Overflow
Solution 4 - Python | A.Kot | View Answer on Stack Overflow
Solution 5 - Python | mgoldwasser | View Answer on Stack Overflow
Solution 6 - Python | Simon Lo | View Answer on Stack Overflow
Solution 7 - Python | bhavin | View Answer on Stack Overflow
Solution 8 - Python | Ashish Saini | View Answer on Stack Overflow
Solution 9 - Python | user15590289 | View Answer on Stack Overflow
Solution 10 - Python | Shanmukh Sain | View Answer on Stack Overflow
Solution 11 - Python | Khoush | View Answer on Stack Overflow
Solution 12 - Python | Khaled Jallouli | View Answer on Stack Overflow