print the unique values in every column in a pandas dataframe
PythonFor LoopPandasPython Problem Overview
I have a dataframe (df) and want to print the unique values from each column in the dataframe.
I need to substitute the variable (i) [column name] into the print statement
column_list = df.columns.values.tolist()
for column_name in column_list:
print(df."[column_name]".unique()
Update
When I use this: I get "Unexpected EOF Parsing" with no extra details.
column_list = sorted_data.columns.values.tolist()
for column_name in column_list:
print(sorted_data[column_name].unique()
What is the difference between your syntax YS-L (above) and the below:
for column_name in sorted_data:
print(column_name)
s = sorted_data[column_name].unique()
for i in s:
print(str(i))
Python Solutions
Solution 1 - Python
It can be written more concisely like this:
for col in df:
print(df[col].unique())
Generally, you can access a column of the DataFrame through indexing using the []
operator (e.g. df['col']
), or through attribute (e.g. df.col
).
Attribute accessing makes the code a bit more concise when the target column name is known beforehand, but has several caveats -- for example, it does not work when the column name is not a valid Python identifier (e.g. df.123
), or clashes with the built-in DataFrame attribute (e.g. df.index
). On the other hand, the []
notation should always work.
Solution 2 - Python
Most upvoted answer is a loop solution, hence adding a one line solution using pandas apply() method and lambda function.
print(df.apply(lambda col: col.unique()))
Solution 3 - Python
This will get the unique values in proper format:
pd.Series({col:df[col].unique() for col in df})
Solution 4 - Python
If you're trying to create multiple separate dataframes as mentioned in your comments, create a dictionary of dataframes:
df_dict = dict(zip([i for i in df.columns] , [pd.DataFrame(df[i].unique(), columns=[i]) for i in df.columns]))
Then you can access any dataframe easily using the name of the column:
df_dict[column name]
Solution 5 - Python
We can make this even more concise:
df.describe(include='all').loc['unique', :]
Pandas describe gives a few key statistics about each column, but we can just grab the 'unique' statistic and leave it at that.
Note that this will give a unique count of NaN
for numeric columns - if you want to include those columns as well, you can do something like this:
df.astype('object').describe(include='all').loc['unique', :]
Solution 6 - Python
The code below could provide you a list of unique values for each field, I find it very useful when you want to take a deeper look at the data frame:
for col in list(df):
print(col)
print(df[col].unique())
You can also sort the unique values if you want them to be sorted:
import numpy as np
for col in list(df):
print(col)
print(np.sort(df[col].unique()))
Solution 7 - Python
cu = []
i = []
for cn in card.columns[:7]:
cu.append(card[cn].unique())
i.append(cn)
pd.DataFrame( cu, index=i).T
Solution 8 - Python
Simply do this:
for i in df.columns:
print(df[i].unique())
Solution 9 - Python
I was seeking for a solution to this problem as well, and the code below proved to be more helpful in my situation,
for col in df:
print(col)
print(df[col].unique())
print('\n')
It gives something like below:
Fuel_Type
['Diesel' 'Petrol' 'CNG']
HP
[ 90 192 69 110 97 71 116 98 86 72 107 73]
Met_Color
[1 0]
Solution 10 - Python
Or in short it can be written as:
for val in df['column_name'].unique():
print(val)
Solution 11 - Python
Even better. Here's code to view all the unique values as a dataframe column-wise transposed:
columns=[*df.columns]
unique_values={}
for i in columns:
unique_values[i]=df[i].unique()
unique=pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in unique_vals.items() ]))
unique.fillna('').T
Solution 12 - Python
The best way to do that:
Series.unique()
For example students.age.unique() the output will be the different values that occurred in the age column of the students' data frame.
To get only the number of how many different values:
Series.nunique()