How to convert a dataframe to a dictionary

Python Problem Overview

I have a dataframe with two columns and intend to convert it to a dictionary. The first column will be the key and the second will be the value.

Dataframe:

    id    value
0    0     10.2
1    1      5.7
2    2      7.4

How can I do this?

Python Solutions

Solution 1 - Python

If lakes is your DataFrame, you can do something like

area_dict = dict(zip(lakes.id, lakes.value))
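As a minimal sketch of the line above, assuming the DataFrame holds the sample data from the question (the name lakes comes from this answer, not from the question):

```python
import pandas as pd

# Sample data copied from the question; `lakes` is this answer's name for it.
lakes = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# zip pairs each id with the value in the same row; dict() builds the mapping.
area_dict = dict(zip(lakes["id"], lakes["value"]))
```

Bracket access (lakes["id"]) is used here instead of attribute access because it also works when a column name clashes with a DataFrame attribute or method.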

Solution 2 - Python

See the docs for to_dict. You can use it like this:

df.set_index('id').to_dict()

If you only need one column, select it first so that the column name does not become an extra level in the dict (in this case you are actually calling Series.to_dict()):

df.set_index('id')['value'].to_dict()
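The difference between the two calls can be illustrated with a short sketch, assuming df holds the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# DataFrame.to_dict() nests the mapping under each column name.
nested = df.set_index("id").to_dict()

# Selecting the column first gives a Series, so to_dict() returns a flat mapping.
flat = df.set_index("id")["value"].to_dict()
```

Here nested is {'value': {0: 10.2, 1: 5.7, 2: 7.4}}, while flat is {0: 10.2, 1: 5.7, 2: 7.4}.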

Solution 3 - Python

mydict = dict(zip(df.id, df.value))

Solution 4 - Python

If you want a simple way to preserve duplicates, you could use groupby:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}

Solution 5 - Python

The answers by joris in this thread and by punchagan in the duplicated thread are very elegant. However, they will not give correct results if the column used for the keys contains any duplicated values.

For example:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}

If you have duplicated entries and do not want to lose them, you can use this ugly but working code:

>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}

Solution 6 - Python

Here is what I think is the simplest solution:

df.set_index('id').T.to_dict('records')

Example:

df = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')

If you have multiple values, like val1, val2, val3, etc., and you want them as lists, then use the below code:

df.set_index('id').T.to_dict('list')

The 'records' and 'list' orientations are described in the to_dict documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
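To make the shape of the results concrete, here is a small sketch. The data is hypothetical: the ids are unique, and the column names val1 and val2 are taken from the sentence above.

```python
import pandas as pd

# Hypothetical data with unique ids and two value columns.
df = pd.DataFrame({"id": ["a", "b"], "val1": [1, 2], "val2": [10, 20]})

# After transposing, each id is a column, so 'list' collects that id's values.
as_lists = df.set_index("id").T.to_dict("list")

# With a single value column, 'records' returns a list holding one dict.
single = df[["id", "val1"]].set_index("id").T.to_dict("records")
```

Here as_lists is {'a': [1, 10], 'b': [2, 20]} and single is [{'a': 1, 'b': 2}], so the mapping itself is single[0].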

Solution 7 - Python

You can use a dict comprehension:

my_dict = {row[0]: row[1] for row in df.values}
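One thing to watch for with df.values is dtype upcasting. The sketch below assumes the sample data from the question; the itertuples variant is an alternative added for comparison, not part of this answer.

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# df.values is one NumPy array with a common dtype, so the integer ids
# come back as floats when the other column holds floats.
from_values = {row[0]: row[1] for row in df.values}

# itertuples reads each column separately, so the ids stay integers.
from_tuples = {row.id: row.value for row in df.itertuples(index=False)}
```

Both dicts compare equal, because 0 == 0.0 in Python, but the key types differ, which can matter when the dict is printed or serialized.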

Solution 8 - Python

With pandas, if lakes is your DataFrame, it can be done as:

area_dict = lakes.to_dict('records')
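Note that the 'records' orientation returns a list with one dict per row rather than a single key-to-value mapping. A minimal sketch, assuming the sample data from the question; the final comprehension is one possible follow-up step and is not part of this answer:

```python
import pandas as pd

lakes = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# orient="records" yields a list with one {column: value} dict per row.
records = lakes.to_dict("records")

# One possible follow-up step to get an id -> value mapping from that list.
area_dict = {row["id"]: row["value"] for row in records}
```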

Solution 9 - Python

In some versions the code below might not work:

mydict = dict(zip(df.id, df.value))

so make it explicit:

id_ = df.id.values
value = df.value.values
mydict = dict(zip(id_, value))

Note that I used id_ because id is the name of a Python built-in function (it is not a reserved word, but shadowing it is best avoided).

Solution 10 - Python

Here is an example for converting a dataframe with three columns A, B, and C (let's say A and B are the geographical coordinates of longitude and latitude and C the country region/state/etc., which is more or less the case).

I want a dictionary in which each pair of A, B values is a key and the value of C in the same row is the dictionary value. Each pair of A, B values is guaranteed to be unique due to previous filtering, although the same value of C may occur for different pairs. So I would do:

mydict = dict(zip(zip(df['A'],df['B']), df['C']))

Using pandas to_dict() also works:

mydict = df.set_index(['A','B']).to_dict(orient='dict')['C']

(Neither A nor B is set as the index before the line that creates the dictionary is executed.)

Both approaches are fast (less than one second on a dataframe with 85k rows on a ~2015 fast dual-core laptop).
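A small sketch of both approaches on hypothetical data (the coordinates and region names below are made up for illustration) suggests that they produce the same tuple-keyed dictionary:

```python
import pandas as pd

# Hypothetical coordinates (A = longitude, B = latitude) and regions (C).
df = pd.DataFrame({
    "A": [2.35, 13.4, 2.29],
    "B": [48.85, 52.52, 48.86],
    "C": ["Ile-de-France", "Berlin", "Ile-de-France"],
})

# Inner zip builds the (A, B) key tuples; outer zip pairs them with C.
via_zip = dict(zip(zip(df["A"], df["B"]), df["C"]))

# A two-level index becomes tuple keys when converted with to_dict.
via_to_dict = df.set_index(["A", "B"]).to_dict(orient="dict")["C"]
```

A lookup then takes a tuple, for example via_zip[(13.4, 52.52)].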

Solution 11 - Python

Another (slightly shorter) solution for not losing duplicate entries:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
...     ptest_slice = ptest[ptest['id'] == i]
...     pdict[i] = ptest_slice['value'].tolist()
...

>>> pdict
{'b': [3], 'a': [1, 2]}

Solution 12 - Python

You can also do this if you want to play around with pandas. However, I like punchagan's way.

# replicating your dataframe
lake = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lake.set_index('co tp', inplace=True)

# to get key value using pandas
area_dict = lake.set_index('area').T.to_dict('records')[0]
print(area_dict)

output: {10: 7, 20: 5, 30: 2, 40: 3}

Solution 13 - Python

If 'lakes' is your DataFrame, you can also do something like:

# Your dataframe
lakes = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lakes.set_index('co tp', inplace=True)

My solution:

area_dict = lakes.set_index("area")["count"].to_dict()

or @punchagan's solution (which I prefer):

# lakes.count is the DataFrame.count method, so select the column with brackets
area_dict = dict(zip(lakes["area"], lakes["count"]))

Both should work.

Solution 14 - Python

You need a list as a dictionary value. This code will do the trick.

from collections import defaultdict
mydict = defaultdict(list)
for k, v in zip(df.id.values, df.value.values):
    mydict[k].append(v)
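For reference, here is a self-contained sketch of the same idea, assuming the duplicated-id sample used in the other answers:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# Each missing key starts as an empty list, so repeated ids accumulate values.
mydict = defaultdict(list)
for k, v in zip(df["id"], df["value"]):
    mydict[k].append(v)

# Convert to a plain dict if the defaultdict behaviour is no longer wanted.
plain = dict(mydict)
```

Here plain is {'a': [1, 2], 'b': [3]}.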

Solution 15 - Python

If you set the index, the resulting dictionary will contain unique key-value pairs:

# LabelEncoder comes from scikit-learn
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
df['airline_enc'] = encoder.fit_transform(df['airline'])
dictAirline = df[['airline_enc','airline']].set_index('airline_enc').to_dict()

Solution 16 - Python

This is what you need:

area_dict = lakes.to_dict(orient='records')

Solution 17 - Python

def get_dict_from_pd(df, key_col, row_col):
    result = dict()
    for i in set(df[key_col].values):
        is_i = df[key_col] == i
        result[i] = list(df[is_i][row_col].values)
    return result

This is my solution; a basic loop.

Solution 18 - Python

This is my solution:

import pandas as pd
df = pd.read_excel('dic.xlsx')
df_T = df.set_index('id').T
dic = df_T.to_dict('records')
print(dic)

Content Type	Original Author	Original Content on Stackoverflow
Question	perigee	View Question on Stackoverflow
Solution 1 - Python	punchagan	View Answer on Stackoverflow
Solution 2 - Python	joris	View Answer on Stackoverflow
Solution 3 - Python	praful gupta	View Answer on Stackoverflow
Solution 4 - Python	DSM	View Answer on Stackoverflow
Solution 5 - Python	dalloliogm	View Answer on Stackoverflow
Solution 6 - Python	Gil Baggio	View Answer on Stackoverflow
Solution 7 - Python	Dongwan Kim	View Answer on Stackoverflow
Solution 8 - Python	AnandSin	View Answer on Stackoverflow
Solution 9 - Python	Vincent Appiah	View Answer on Stackoverflow
Solution 10 - Python	Alexandre Dias	View Answer on Stackoverflow
Solution 11 - Python	user1376377	View Answer on Stackoverflow
Solution 12 - Python	Samlex	View Answer on Stackoverflow
Solution 13 - Python	Allan	View Answer on Stackoverflow
Solution 14 - Python	Dmitry	View Answer on Stackoverflow
Solution 15 - Python	Golden Lion	View Answer on Stackoverflow
Solution 16 - Python	Heeda	View Answer on Stackoverflow
Solution 17 - Python	SummersKing	View Answer on Stackoverflow
Solution 18 - Python	Hamoon	View Answer on Stackoverflow