Create a pandas DataFrame from a dictionary of dictionaries
Dictionary Problem Overview
I have a dictionary of dictionaries of the form:
{'user': {'movie': rating}}
For example,
{'Jill': {'Avenger: Age of Ultron': 7.0,
          'Django Unchained': 6.5,
          'Gone Girl': 9.0,
          'Kill the Messenger': 8.0},
 'Toby': {'Avenger: Age of Ultron': 8.5,
          'Django Unchained': 9.0,
          'Zoolander': 2.0}}
I want to convert this dict of dicts into a pandas dataframe in which the first column is the user name and the remaining columns are the movie ratings, i.e.
user  Gone_Girl  Horrible_Bosses_2  Django_Unchained  Zoolander  etc.
However, some users did not rate every movie, so those movies are missing from that user's inner dictionary. In these cases it would be nice to fill the entry with NaN.
As of now, I iterate over the keys, fill a list, and then use this list to create a dataframe:
import pandas as pd

data = []
for key in movie_user_preferences:
    try:
        data.append((key,
                     movie_user_preferences[key]['Gone Girl'],
                     movie_user_preferences[key]['Horrible Bosses 2'],
                     movie_user_preferences[key]['Django Unchained'],
                     movie_user_preferences[key]['Zoolander'],
                     movie_user_preferences[key]['Avenger: Age of Ultron'],
                     movie_user_preferences[key]['Kill the Messenger']))
    # a missing rating raises KeyError, so the whole user is skipped
    except KeyError:
        pass
df = pd.DataFrame(data=data, columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained', 'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])
But this only gives me a dataframe of users who rated all the movies in the set.
My goal is twofold: first, to build the data list by iterating over the movie labels rather than using the brute-force approach shown above; second, to create a dataframe that includes all users and places null values wherever a user has no rating for a movie.
Dictionary Solutions
Solution 1 - Dictionary
You can pass the dict of dicts to the DataFrame constructor:
In [11]: d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0}, 'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0, 'Avenger: Age of Ultron': 8.5}}
In [12]: pd.DataFrame(d)
Out[12]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0
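Or use the from_dict method. By default the outer keys (the users) become columns, as above; passing orient='index' makes them rows instead, which gives one row per user: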
In [13]: pd.DataFrame.from_dict(d)
Out[13]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0
In [14]: pd.DataFrame.from_dict(d, orient='index')
Out[14]:
      Django Unchained  Gone Girl  Kill the Messenger  Avenger: Age of Ultron  Zoolander
Jill               6.5          9                   8                     7.0        NaN
Toby               9.0        NaN                 NaN                     8.5          2
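To get the layout asked for in the question (a leading user column and underscore-style movie columns), the orient='index' result can be post-processed. The following is only a minimal sketch, assuming the dictionary d from above; the helper name to_column_label is made up for illustration and is not part of pandas.

```python
import pandas as pd

d = {
    'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0,
             'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0},
    'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0,
             'Avenger: Age of Ultron': 8.5},
}


def to_column_label(title):
    # Hypothetical helper: 'Avenger: Age of Ultron' -> 'Avenger_Age_of_Ultron'
    return title.replace(':', '').replace(' ', '_')


df = (
    pd.DataFrame.from_dict(d, orient='index')  # one row per user
    .rename(columns=to_column_label)           # tidy the movie labels
    .rename_axis('user')                       # name the index ...
    .reset_index()                             # ... and turn it into a column
)
print(df)
```

Movies a user did not rate show up as NaN without any extra handling, because pandas aligns the inner dictionaries on the union of their keys.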
Solution 2 - Dictionary
This brute-force approach also appears to work, but iterating over the movie labels would still be more robust in my opinion.
data = []
for key in movie_user_preferences:
    prefs = movie_user_preferences[key]
    # note: 'NaN' is a string placeholder here, not a float NaN,
    # so any column containing it is stored with object dtype
    data.append((key,
                 prefs['Gone Girl'] if 'Gone Girl' in prefs else 'NaN',
                 prefs['Horrible Bosses 2'] if 'Horrible Bosses 2' in prefs else 'NaN',
                 prefs['Django Unchained'] if 'Django Unchained' in prefs else 'NaN',
                 prefs['Zoolander'] if 'Zoolander' in prefs else 'NaN',
                 prefs['Avenger: Age of Ultron'] if 'Avenger: Age of Ultron' in prefs else 'NaN',
                 prefs['Kill the Messenger'] if 'Kill the Messenger' in prefs else 'NaN'))
df = pd.DataFrame(data=data, columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained', 'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])
      user  Gone_Girl  Horrible_Bosses_2  Django_Unchained  Zoolander  \
0      Sam          6                  3               7.5          7
1      Max         10                  6               7.0         10
2   Robert        NaN                  5               7.0          9
3     Toby        NaN                NaN               9.0          2
4    Julia        6.5                NaN               6.0        6.5
5  William          7                  4               8.0          4
6     Jill          9                NaN               6.5        NaN

   Avenger_Age_of_Ultron  Kill_the_Messenger
0                   10.0                 5.5
1                    7.0                   5
2                    8.0                   9
3                    8.5                 NaN
4                   10.0                   6
5                    6.0                 6.5
6                    7.0                   8
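The label iteration mentioned at the start of this solution could look roughly like the sketch below. It rests on a few assumptions: the movies list is taken from the column names in the question, the sample movie_user_preferences holds only the two users shown there, and float('nan') is used in place of the string 'NaN' so that the rating columns stay numeric.

```python
import pandas as pd

# Sample data: only the two users shown in the question.
movie_user_preferences = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# Assumed list of labels, in the column order used in the question.
movies = ['Gone Girl', 'Horrible Bosses 2', 'Django Unchained',
          'Zoolander', 'Avenger: Age of Ultron', 'Kill the Messenger']

rows = []
for user, ratings in movie_user_preferences.items():
    row = {'user': user}
    for movie in movies:
        column = movie.replace(':', '').replace(' ', '_')
        # dict.get returns the fallback when the user did not rate the movie
        row[column] = ratings.get(movie, float('nan'))
    rows.append(row)

df = pd.DataFrame(rows)
print(df)
```

Adding a movie then only means extending the movies list, and every user is kept regardless of how many ratings are missing.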