Transform a Counter object into a Pandas DataFrame

Python, Pandas, Dataframe, Counter

Python Problem Overview


I used Counter on a list to compute this variable:

final = Counter(event_container)

print final gives:

Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'rt_home_start_app': 46, 'fb_view_wishlist': 39, 'fb_view_product': 37, 'fb_search': 29, 'rt_view_product': 23, 'fb_view_cart': 22, 'rt_search': 12, 'rt_view_cart': 12, 'add_to_cart': 2, 'create_campaign': 1, 'fb_connect': 1, 'sale': 1, 'guest_sale': 1, 'remove_from_cart': 1, 'rt_transaction_confirmation': 1, 'login': 1})

Now I want to convert final into a Pandas DataFrame, but when I do:

final_df = pd.DataFrame(final)

I get an error.

I guess final is not a proper dictionary, so how can I convert final to a dictionary? Or is there another way to convert final to a DataFrame?
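A small sketch to narrow down the problem, using a short stand-in Counter (the sample list is assumed for illustration). It suggests that the Counter is already a dictionary, since Counter subclasses dict, and that the failure comes from pandas receiving a dict whose values are all scalars, not from the Counter type itself:

```python
from collections import Counter

import pandas as pd

# A small stand-in for the Counter in the question (sample data assumed)
final = Counter(['fb_search', 'login', 'fb_search', 'sale'])

# Counter is a dict subclass, so dict() just produces a plain copy
plain = dict(final)
print(isinstance(final, dict), plain)

# Every value is a scalar, so pandas cannot tell how many rows to build
try:
    pd.DataFrame(plain)
except ValueError as exc:
    print('ValueError:', exc)
```

Converting to a plain dict therefore does not help on its own; the data has to be reshaped (or given an index), which is what the solutions below do.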

Python Solutions


Solution 1 - Python

You can construct the DataFrame using from_dict with orient='index', then call reset_index so you get a two-column DataFrame:

In [40]:
from collections import Counter
import pandas as pd
d = Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'rt_home_start_app': 46, 'fb_view_wishlist': 39, 'fb_view_product': 37, 'fb_search': 29, 'rt_view_product': 23, 'fb_view_cart': 22, 'rt_search': 12, 'rt_view_cart': 12, 'add_to_cart': 2, 'create_campaign': 1, 'fb_connect': 1, 'sale': 1, 'guest_sale': 1, 'remove_from_cart': 1, 'rt_transaction_confirmation': 1, 'login': 1})
df = pd.DataFrame.from_dict(d, orient='index').reset_index()
df

Out[40]:
                          index   0
0                         login   1
1   rt_transaction_confirmation   1
2                  fb_view_cart  22
3                    fb_connect   1
4               rt_view_product  23
5                     fb_search  29
6                          sale   1
7               fb_view_listing  76
8                   add_to_cart   2
9                  rt_view_cart  12
10                fb_homescreen  63
11              fb_view_product  37
12            rt_home_start_app  46
13             fb_view_wishlist  39
14              create_campaign   1
15                    rt_search  12
16                   guest_sale   1
17             remove_from_cart   1
18              rt_view_listing  50

You can rename the columns to something more meaningful:

In [43]:
df = df.rename(columns={'index':'event', 0:'count'})
df

Out[43]:
                          event  count
0                         login      1
1   rt_transaction_confirmation      1
2                  fb_view_cart     22
3                    fb_connect      1
4               rt_view_product     23
5                     fb_search     29
6                          sale      1
7               fb_view_listing     76
8                   add_to_cart      2
9                  rt_view_cart     12
10                fb_homescreen     63
11              fb_view_product     37
12            rt_home_start_app     46
13             fb_view_wishlist     39
14              create_campaign      1
15                    rt_search     12
16                   guest_sale      1
17             remove_from_cart      1
18              rt_view_listing     50
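If you also want the rows ordered by frequency, one possible extension (not part of the original answer) chains sort_values after the rename. This is a minimal sketch on a four-entry sample Counter, assumed for illustration; ignore_index=True requires pandas 1.0 or later:

```python
from collections import Counter

import pandas as pd

# Small sample Counter (assumed for illustration)
d = Counter({'login': 1, 'fb_view_cart': 22, 'fb_view_listing': 76, 'add_to_cart': 2})

df = (
    pd.DataFrame.from_dict(d, orient='index')  # keys -> index, counts -> column 0
    .reset_index()                             # move the keys into an 'index' column
    .rename(columns={'index': 'event', 0: 'count'})
    # Sort from most to least frequent and renumber the rows 0..n-1
    .sort_values('count', ascending=False, ignore_index=True)
)
print(df)
```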

Solution 2 - Python

Another option is to use the DataFrame.from_records method:

import pandas as pd
from collections import Counter

c = Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'rt_home_start_app': 46, 'fb_view_wishlist': 39, 'fb_view_product': 37, 'fb_search': 29, 'rt_view_product': 23, 'fb_view_cart': 22, 'rt_search': 12, 'rt_view_cart': 12, 'add_to_cart': 2, 'create_campaign': 1, 'fb_connect': 1, 'sale': 1, 'guest_sale': 1, 'remove_from_cart': 1, 'rt_transaction_confirmation': 1, 'login': 1})

df = pd.DataFrame.from_records(list(dict(c).items()), columns=['page','count'])

It's a one-liner, and its speed seems to be about the same as the from_dict approach.

Or use this variant to sort the rows from most to least common. Again, the performance is about the same.

df = pd.DataFrame.from_records(c.most_common(), columns=['page','count'])
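Because most_common also accepts an integer n, the same pattern can keep only the top entries. A minimal sketch, using a shortened sample Counter assumed for illustration:

```python
from collections import Counter

import pandas as pd

# Shortened sample Counter (assumed for illustration)
c = Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'login': 1, 'sale': 1})

# most_common(n) returns only the n highest counts, already sorted descending
top3 = pd.DataFrame.from_records(c.most_common(3), columns=['page', 'count'])
print(top3)
```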

Solution 3 - Python

If you want a two-column layout, with the keys as the index and the counts in a single column, set the keyword argument orient='index' when creating the DataFrame from a dictionary with from_dict:

final_df = pd.DataFrame.from_dict(final, orient='index')

See the documentation for DataFrame.from_dict.
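To make the resulting shape concrete, here is a minimal sketch on a three-entry sample Counter (assumed for illustration). The final rename step is optional, and the names 'count' and 'event' are arbitrary choices:

```python
from collections import Counter

import pandas as pd

# Small sample Counter (assumed for illustration)
final = Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'login': 1})

final_df = pd.DataFrame.from_dict(final, orient='index')
# The keys become the index and the counts land in one column named 0
print(final_df.shape, final_df.columns.tolist())

# Optionally give the column and the index readable names
final_df = final_df.rename(columns={0: 'count'}).rename_axis('event')
print(final_df)
```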

Solution 4 - Python

I found it more useful to convert the Counter into a pandas Series that is already sorted by count, with the items as the index, so I used zip:

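import pandas as pd
from collections import Counter

def counter_to_series(counter):
    # Unpacking zip() of an empty list fails, so handle an empty Counter first
    if not counter:
        return pd.Series(dtype='int64')  # explicit dtype avoids pandas' empty-Series warning
    # most_common() with no argument returns every (item, count) pair, highest count first
    items, counts = zip(*counter.most_common())
    return pd.Series(counts, index=items)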

The most_common method of a Counter returns a list of (item, count) tuples, sorted from most to least common. Unpacking the result of zip into two names raises a ValueError when the counter has no items, so an empty Counter must be handled beforehand.
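A shorter route to a similar Series, swapped in here rather than taken from the answer above, relies on the fact that a Counter is a dict, so pd.Series can take it directly and sort_values can order it. A minimal sketch with a sample list assumed for illustration; an empty Counter then gives an empty Series rather than an exception:

```python
from collections import Counter

import pandas as pd

# Sample data (assumed for illustration): b appears 3 times, a twice, c once
counter = Counter(['b', 'a', 'b', 'c', 'b', 'a'])

# Keys become the index; sort so the most common item comes first
series = pd.Series(counter).sort_values(ascending=False)
print(series)
```

Note that sort_values does not guarantee the order of tied counts, whereas most_common keeps tied items in the order they were first encountered.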

Solution 5 - Python

The error you got was probably "If using all scalar values, you must pass an index." To fix this, just provide an index (e.g., "count") and then transpose:

final_df = pd.DataFrame(final, index=['count']).transpose()

Done. You can rename the index afterwards if you wish.
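A minimal sketch of the whole fix, including the optional index rename, on a three-entry sample Counter (the entries and the index name 'event' are assumed for illustration):

```python
from collections import Counter

import pandas as pd

# Small sample Counter (assumed for illustration)
final = Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'login': 1})

# Passing an index gives one row labelled 'count' with one column per key ...
wide = pd.DataFrame(final, index=['count'])
# ... and transposing turns each key into a row; rename_axis names the index
final_df = wide.transpose().rename_axis('event')
print(final_df)
```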

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | woshitom | View Question on Stackoverflow
Solution 1 - Python | EdChum | View Answer on Stackoverflow
Solution 2 - Python | pvasek | View Answer on Stackoverflow
Solution 3 - Python | galath | View Answer on Stackoverflow
Solution 4 - Python | Suzana | View Answer on Stackoverflow
Solution 5 - Python | David R | View Answer on Stackoverflow