How to create a dictionary of two pandas DataFrame columns

Python Problem Overview

What is the most efficient way to organise the following pandas Dataframe:

data =

Position	Letter
1	        a
2	        b
3	        c
4	        d
5	        e

into a dictionary like alphabet[1 : 'a', 2 : 'b', 3 : 'c', 4 : 'd', 5 : 'e']?

Python Solutions

Solution 1 - Python

In [9]: pd.Series(df.Letter.values,index=df.Position).to_dict()
Out[9]: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

Speed comparion (using Wouter's method)

In [6]: df = pd.DataFrame(randint(0,10,10000).reshape(5000,2),columns=list('AB'))

In [7]: %timeit dict(zip(df.A,df.B))
1000 loops, best of 3: 1.27 ms per loop

In [8]: %timeit pd.Series(df.A.values,index=df.B).to_dict()
1000 loops, best of 3: 987 us per loop

Solution 2 - Python

I found a faster way to solve the problem, at least on realistically large datasets using: df.set_index(KEY).to_dict()[VALUE]

Proof on 50,000 rows:

df = pd.DataFrame(np.random.randint(32, 120, 100000).reshape(50000,2),columns=list('AB'))
df['A'] = df['A'].apply(chr)

%timeit dict(zip(df.A,df.B))
%timeit pd.Series(df.A.values,index=df.B).to_dict()
%timeit df.set_index('A').to_dict()['B']

Output:

100 loops, best of 3: 7.04 ms per loop  # WouterOvermeire
100 loops, best of 3: 9.83 ms per loop  # Jeff
100 loops, best of 3: 4.28 ms per loop  # Kikohs (me)

Solution 3 - Python

In Python 3.6 the fastest way is still the WouterOvermeire one. Kikohs' proposal is slower than the other two options.

import timeit

setup = '''
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(32, 120, 100000).reshape(50000,2),columns=list('AB'))
df['A'] = df['A'].apply(chr)
'''

timeit.Timer('dict(zip(df.A,df.B))', setup=setup).repeat(7,500)
timeit.Timer('pd.Series(df.A.values,index=df.B).to_dict()', setup=setup).repeat(7,500)
timeit.Timer('df.set_index("A").to_dict()["B"]', setup=setup).repeat(7,500)

Results:

1.1214002349999777 s  # WouterOvermeire
1.1922008498571748 s  # Jeff
1.7034366211428602 s  # Kikohs

Solution 4 - Python

dict (zip(data['position'], data['letter']))

this will give you:

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

Solution 5 - Python

TL;DR

>>> import pandas as pd
>>> df = pd.DataFrame({'Position':[1,2,3,4,5], 'Letter':['a', 'b', 'c', 'd', 'e']})
>>> dict(sorted(df.values.tolist())) # Sort of sorted... 
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
>>> from collections import OrderedDict
>>> OrderedDict(df.values.tolist())
OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)])

In Long

Explaining solution: dict(sorted(df.values.tolist()))

Given:

df = pd.DataFrame({'Position':[1,2,3,4,5], 'Letter':['a', 'b', 'c', 'd', 'e']})

[out]:

 Letter	Position
0	a	1
1	b	2
2	c	3
3	d	4
4	e	5

Try:

# Get the values out to a 2-D numpy array, 
df.values

[out]:

array([['a', 1],
       ['b', 2],
       ['c', 3],
       ['d', 4],
       ['e', 5]], dtype=object)

Then optionally:

# Dump it into a list so that you can sort it using `sorted()`
sorted(df.values.tolist()) # Sort by key

Or:

# Sort by value:
from operator import itemgetter
sorted(df.values.tolist(), key=itemgetter(1))

[out]:

[['a', 1], ['b', 2], ['c', 3], ['d', 4], ['e', 5]]

Lastly, cast the list of list of 2 elements into a dict.

dict(sorted(df.values.tolist()))

[out]:

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

Answering @sbradbio comment:

If there are multiple values for a specific key and you would like to keep all of them, it's the not the most efficient but the most intuitive way is:

from collections import defaultdict
import pandas as pd

multivalue_dict = defaultdict(list)

df = pd.DataFrame({'Position':[1,2,4,4,4], 'Letter':['a', 'b', 'd', 'e', 'f']})

for idx,row in df.iterrows():
    multivalue_dict[row['Position']].append(row['Letter'])

[out]:

>>> print(multivalue_dict)
defaultdict(list, {1: ['a'], 2: ['b'], 4: ['d', 'e', 'f']})

Solution 6 - Python

Here are two other ways tested with the following df.

df = pd.DataFrame(np.random.randint(0,10,10000).reshape(5000,2),columns=list('AB'))

using to_records()

dict(df.to_records(index=False))

using MultiIndex.from_frame()

dict(pd.MultiIndex.from_frame(df))

Time of each.

24.6 ms ± 847 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.86 ms ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Solution 7 - Python

I like the Wouter method, however the behaviour with duplicate values might not be what is expected and this scenario is not discussed one way or the other by the OP unfortunately. Wouter, will always choose the last value for each key encountered. So in other words, it will keep overwriting the value for each key.

The expected behavior in my mind would be more like https://stackoverflow.com/questions/60085490 where a list is kept for each key.

So for the case of keeping duplicates, let me submit df.groupby('Position')['Letter'].apply(list).to_dict() (Or perhaps even a set instead of a list)

Content Type	Original Author	Original Content on Stackoverflow
Question	user1083734	View Question on Stackoverflow
Solution 1 - Python	Jeff	View Answer on Stackoverflow
Solution 2 - Python	Kirell	View Answer on Stackoverflow
Solution 3 - Python	pakobill	View Answer on Stackoverflow
Solution 4 - Python	user18475123	View Answer on Stackoverflow
Solution 5 - Python	alvas	View Answer on Stackoverflow
Solution 6 - Python	rhug123	View Answer on Stackoverflow
Solution 7 - Python	LaptopProfile	View Answer on Stackoverflow

How to create a dictionary of two pandas DataFrame columns

Python Problem Overview

Python Solutions

Solution 1 - Python

Solution 2 - Python

Solution 3 - Python

Solution 4 - Python

Solution 5 - Python

TL;DR

In Long

Solution 6 - Python

Solution 7 - Python

How to add Google Analytics Tracking ID to GitHub Pages

Textarea Auto height

Attributions

Python Problem Overview

Python Solutions

Solution 1 - Python

Solution 2 - Python

Solution 3 - Python

Solution 4 - Python

Solution 5 - Python

TL;DR

In Long

Related

Solution 6 - Python

Solution 7 - Python

How to add Google Analytics Tracking ID to GitHub Pages

Textarea Auto height

Attributions