Nested dictionary to multiindex dataframe where dictionary keys are column labels

Python, Dictionary, Pandas, Dataframe, Multi Index

Python Problem Overview


Say I have a dictionary that looks like this:

dictionary = {'A' : {'a': [1,2,3,4,5],
                     'b': [6,7,8,9,1]},

              'B' : {'a': [2,3,4,5,6],
                     'b': [7,8,9,1,2]}}

and I want a dataframe that looks something like this:

     A   B
     a b a b
  0  1 6 2 7
  1  2 7 3 8
  2  3 8 4 9
  3  4 9 5 1
  4  5 1 6 2

Is there a convenient way to do this? If I try:

In [99]:

DataFrame(dictionary)

Out[99]:
     A	             B
a	[1, 2, 3, 4, 5]	[2, 3, 4, 5, 6]
b	[6, 7, 8, 9, 1]	[7, 8, 9, 1, 2]

I get a dataframe where each element is a list. What I need is a MultiIndex on the columns, where each level corresponds to the keys of the nested dict, with one row per list element, as shown above. I think I can work out a very crude solution, but I'm hoping there is something simpler.

Python Solutions


Solution 1 - Python

Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest thing is to convert your dictionary to the right format before trying to pass it to DataFrame:

>>> import pandas
>>> # On Python 3 use .items(); the Python 2 spelling was .iteritems()
>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
 ('A', 'b'): [6, 7, 8, 9, 1],
 ('B', 'a'): [2, 3, 4, 5, 6],
 ('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
   A     B   
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  2

[5 rows x 4 columns]
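
For reference, here is the same idea as a self-contained Python 3 script, with a couple of lookups to show that the tuple keys really end up as a two-level column index. This is only a sketch; the names outer_key, inner_key and inner_dict are my own and are not part of the original answer.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# Flatten the nested dict into {(outer_key, inner_key): values}.
reform = {(outer_key, inner_key): values
          for outer_key, inner_dict in dictionary.items()
          for inner_key, values in inner_dict.items()}

df = pd.DataFrame(reform)

# Tuple keys become a two-level MultiIndex on the columns.
print(df.columns.nlevels)
print(df['A'])          # sub-frame holding the inner columns 'a' and 'b'
print(df[('B', 'b')])   # a single column, returned as a Series
```

Selecting with the outer key alone returns a smaller frame, while the full tuple returns one column.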

Solution 2 - Python

This answer is a little late to the game, but...

You're looking for the functionality in .stack:

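import pandas as pd
df = pd.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
# break the lists out into columns; each row is one (outer, inner) key pair
df = pd.DataFrame(df[0].values.tolist(), index=df.index)
# transpose so the key pairs become the MultiIndex column labels
df = df.T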
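
Here is a self-contained sketch of the stack approach. It ends with a transpose so that the (outer, inner) key pairs label the columns, as in the layout the question asks for. The intermediate name stacked is my own.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# Rows are the outer keys, columns the inner keys, each cell holds a list.
# stack() then moves the inner keys into a second index level.
stacked = pd.DataFrame.from_dict(dictionary, orient="index").stack()

# Expand every list into its own row, keyed by the (outer, inner) pair,
# then transpose so those pairs become the column labels.
df = pd.DataFrame(stacked.tolist(), index=stacked.index).T

print(df)
```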

Solution 3 - Python

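import pandas as pd

# one DataFrame per outer key; concat uses the dict keys as the outer column level
dict_of_df = {k: pd.DataFrame(v) for k, v in dictionary.items()}
df = pd.concat(dict_of_df, axis=1)

Note that the column order may be lost on Python < 3.6, where dicts do not preserve insertion order.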
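
A runnable sketch of the concat approach follows. If the column order matters and you cannot rely on dict ordering, one option (my suggestion, not part of the original answer) is to sort the columns afterwards with sort_index(axis=1).

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# Build one small DataFrame per outer key.
dict_of_df = {k: pd.DataFrame(v) for k, v in dictionary.items()}

# Concatenating a dict along axis=1 uses its keys as the outer column level.
df = pd.concat(dict_of_df, axis=1)

# Optional: sort the column labels so the order is deterministic.
df_sorted = df.sort_index(axis=1)

print(df_sorted)
```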

Solution 4 - Python

If the lists in the dictionary are not all the same length, you can adapt BrenBarn's method (Solution 1).

>>> dictionary = {'A' : {'a': [1,2,3,4,5],
                         'b': [6,7,8,9,1]},
                 'B' : {'a': [2,3,4,5,6],
                        'b': [7,8,9,1]}}

>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
 {('A', 'a'): [1, 2, 3, 4, 5],
  ('A', 'b'): [6, 7, 8, 9, 1],
  ('B', 'a'): [2, 3, 4, 5, 6],
  ('B', 'b'): [7, 8, 9, 1]}

>>> df = pandas.DataFrame.from_dict(reform, orient='index').transpose()
>>> df.columns = pandas.MultiIndex.from_tuples(df.columns)
>>> df
   A     B   
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  NaN
[5 rows x 4 columns]
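
As an alternative sketch for the unequal-length case (a different technique from the from_dict call above, so treat it as a suggestion only), you can wrap each list in a pd.Series. pandas then aligns the columns on the row index and pads the short one with NaN, and the tuple keys give the MultiIndex columns directly.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1]}}

reform = {(outer_key, inner_key): values
          for outer_key, inner_dict in dictionary.items()
          for inner_key, values in inner_dict.items()}

# Each list becomes a Series; shorter ones are padded with NaN on alignment.
df = pd.DataFrame({key: pd.Series(values) for key, values in reform.items()})

print(df)
```

The padded column is stored as floats, because NaN cannot be held in an integer column.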

Solution 5 - Python

This recursive function should work:

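def reform_dict(dictionary, t=tuple(), reform=None):
    # Use None rather than a mutable default ({}), otherwise results
    # from one call leak into the next.
    if reform is None:
        reform = {}
    for key, val in dictionary.items():
        t = t + (key,)
        if isinstance(val, dict):
            # descend one level, carrying the key path so far
            reform_dict(val, t, reform)
        else:
            reform.update({t: val})
        t = t[:-1]
    return reform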
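
A usage sketch, since the function is given without a call. It restates the function so the snippet runs on its own (using a None default for the accumulator), and it assumes every list sits at the same nesting depth so that all key tuples have the same length. The three-level nested dict is made-up example data.

```python
import pandas as pd

def reform_dict(dictionary, t=(), reform=None):
    """Flatten a nested dict into {key_path_tuple: leaf_value}."""
    if reform is None:
        reform = {}
    for key, val in dictionary.items():
        path = t + (key,)
        if isinstance(val, dict):
            reform_dict(val, path, reform)   # descend, keeping the path so far
        else:
            reform[path] = val               # leaf: store under the full path
    return reform

# Hypothetical data with three levels of nesting.
nested = {'A': {'a': {'x': [1, 2], 'y': [3, 4]}},
          'B': {'a': {'x': [5, 6], 'y': [7, 8]}}}

flat = reform_dict(nested)
df = pd.DataFrame(flat)   # three-level MultiIndex on the columns

print(df)
```

The recursion is what lets this handle any depth, where the dict comprehension in Solution 1 is fixed at two levels.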

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author   Original Content on Stackoverflow
Question             pbreach           View Question on Stackoverflow
Solution 1 - Python  BrenBarn          View Answer on Stackoverflow
Solution 2 - Python  Vira              View Answer on Stackoverflow
Solution 3 - Python  user8227892       View Answer on Stackoverflow
Solution 4 - Python  Dimitri           View Answer on Stackoverflow
Solution 5 - Python  madsentail        View Answer on Stackoverflow