Item frequency count in Python

Tags: Python, Count, Frequency, Counting

Python Problem Overview


Assume I have a list of words, and I want to find the number of times each word appears in that list.

An obvious way to do this is:

words = "apple banana apple strawberry banana lemon"
uniques = set(words.split())
freqs = [(item, words.split().count(item)) for item in uniques]
print(freqs)

But I don't think this code is very good, because the program runs through the word list more than once: once to build the set, and then again for every unique word, since each count() call scans the whole list.

Of course, I could write a function to run through the list and do the counting, but that wouldn't be so Pythonic. So, is there a more efficient and Pythonic way?

Python Solutions


Solution 1 - Python

The Counter class in the collections module is purpose-built to solve this type of problem:

from collections import Counter
words = "apple banana apple strawberry banana lemon"
Counter(words.split())
# Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})

Solution 2 - Python

defaultdict to the rescue!

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"

d = defaultdict(int)
for word in words.split():
    d[word] += 1

This runs in O(n).
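The single pass is possible because defaultdict(int) calls int() to create a 0 the first time a missing key is touched. A minimal sketch wrapping the loop in a function (count_words is a hypothetical name) and converting the result back to a plain dict:

```python
from collections import defaultdict


def count_words(text):
    """Count word occurrences in a single pass over text.split()."""
    d = defaultdict(int)  # a missing key is created with int(), i.e. 0
    for word in text.split():
        d[word] += 1
    return dict(d)  # plain dict: later lookups of unknown words raise KeyError again


freqs = count_words("apple banana apple strawberry banana lemon")
print(freqs)  # {'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1}
```

Converting to dict is optional; it mainly avoids accidentally adding zero-count keys when the result is read later.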

Solution 3 - Python

words = "apple banana apple strawberry banana lemon"
freqs = {}
for word in words.split():
    freqs[word] = freqs.get(word, 0) + 1  # fetch and increment OR initialize

I think this gives the same result as Triptych's solution, but without importing collections. It is also a bit like Selinap's solution, but more readable imho, and almost identical to Thomas Weigel's solution, but without using exceptions.

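This could, however, be slower than using defaultdict() from the collections library, since the value is fetched, incremented, and then assigned again instead of just being incremented. Then again, += on a dict item also fetches the value, adds to it, and stores it back (a __getitem__ followed by a __setitem__), so any difference comes mostly from the extra .get() method call.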
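Whether the .get() version is actually slower is easy to check rather than guess. This is a rough timeit sketch; the function names and the repeated sample list are made up for the comparison, and the printed timings will vary with Python version and machine, so no numbers are claimed here.

```python
import timeit
from collections import defaultdict

words = ("apple banana apple strawberry banana lemon " * 1000).split()


def count_with_get(seq):
    freqs = {}
    for word in seq:
        freqs[word] = freqs.get(word, 0) + 1  # fetch (or 0), add, store
    return freqs


def count_with_defaultdict(seq):
    freqs = defaultdict(int)
    for word in seq:
        freqs[word] += 1  # also fetch, add, store; missing keys start at int() == 0
    return freqs


# Both approaches must agree before their timings are worth comparing
assert count_with_get(words) == count_with_defaultdict(words)

for func in (count_with_get, count_with_defaultdict):
    seconds = timeit.timeit(lambda: func(words), number=100)
    print(f"{func.__name__}: {seconds:.3f} s")
```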

Solution 4 - Python

Standard approach:

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"
words = words.split()
result = defaultdict(int)
for word in words:
    result[word] += 1

print(result)

groupby one-liner:

from itertools import groupby

words = "apple banana apple strawberry banana lemon"
words = words.split()

result = dict((key, len(list(group))) for key, group in groupby(sorted(words)))
print(result)

Solution 5 - Python

If you don't want to use the standard dictionary method (looping through the list and incrementing the proper dict key), you can try this:

>>> from itertools import groupby
>>> myList = words.split() # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time, because the list has to be sorted first.
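The sort is not optional: groupby only merges runs of adjacent equal items. A short sketch on the same sample words shows what happens without it:

```python
from itertools import groupby

my_list = "apple banana apple strawberry banana lemon".split()

# groupby only merges *adjacent* equal items, so unsorted input splits runs
unsorted_runs = [(k, len(list(g))) for k, g in groupby(my_list)]
print(unsorted_runs)
# [('apple', 1), ('banana', 1), ('apple', 1), ('strawberry', 1), ('banana', 1), ('lemon', 1)]

# Sorting first (the O(n log n) step) puts equal words next to each other
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(my_list))]
print(sorted_runs)
# [('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]
```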

Solution 6 - Python

Without defaultdict:

words = "apple banana apple strawberry banana lemon"
my_count = {}
for word in words.split():
    try:
        my_count[word] += 1
    except KeyError:
        my_count[word] = 1

Solution 7 - Python

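# split() with no argument already returns a list and copes with repeated spaces
user_input = input().split()

for word in user_input:
    # note: a repeated word is printed once per occurrence
    print('{} {}'.format(word, user_input.count(word)))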
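The loop above prints a line for every occurrence, so a repeated word shows up more than once. A minimal sketch that prints each distinct word once, in first-seen order, with the same "word count" format (count_lines is a hypothetical helper; in a real program you would pass it input() instead of the literal string):

```python
from collections import Counter


def count_lines(line):
    """Return 'word count' strings, one per distinct word, in first-seen order."""
    counts = Counter(line.split())  # split() with no argument skips extra spaces
    return ["{} {}".format(word, n) for word, n in counts.items()]


for out in count_lines("apple banana  apple strawberry banana lemon"):
    print(out)
```

This relies on Counter keeping insertion order, which holds for dicts (and therefore Counter) on Python 3.7+.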

Solution 8 - Python

words = "apple banana apple strawberry banana lemon"
w = words.split()
e = list(set(w))
word_freqs = {}
for i in e:
    word_freqs[i] = w.count(i)
print(word_freqs)

Hope this helps!

Solution 9 - Python

Can't you just use count?

words = 'the quick brown fox jumps over the lazy gray dog'
words.count('z')
#output: 1
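One caveat: str.count counts substrings, not words, so it overcounts when the word also appears inside a longer word. A small sketch with a made-up sentence:

```python
text = "the theme of the day"

# str.count counts non-overlapping *substrings*, so 'the' inside 'theme' is included
print(text.count("the"))          # 3

# Counting whole words means splitting into a list first
print(text.split().count("the"))  # 2
```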

Solution 10 - Python

I happened to be working on a Spark exercise; here is my solution.

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

print({n: float(tokens.count(n)) / float(len(tokens)) for n in tokens})

# output of the above:

{'quick': 0.16666666666666666, 'brown': 0.16666666666666666, 'fox': 0.16666666666666666, 'jumps': 0.16666666666666666, 'lazy': 0.16666666666666666, 'dog': 0.16666666666666666}
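Calling tokens.count(n) for every token rescans the list each time, which is quadratic and also repeats work for duplicate tokens. A sketch of the same relative frequencies computed from a single Counter pass; the extra 'fox' token is added here only to show a count above 1:

```python
from collections import Counter

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog', 'fox']

counts = Counter(tokens)  # one pass instead of one tokens.count() per token
total = len(tokens)
rel_freqs = {word: n / total for word, n in counts.items()}
print(rel_freqs['fox'] == 2 / 7)  # True
```

On Python 3 the / operator already performs true division, so the float() calls are not needed.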

Solution 11 - Python

Use reduce() to convert the list to a single dict.

from functools import reduce

words = "apple banana apple strawberry banana lemon"
reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})

returns (on Python 3.7+, where dicts keep insertion order)

{'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1}
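The "update(...) or d" trick works because dict.update returns None, so the or expression hands the dict back to reduce. The same reduction can be written with a named step function, which some may find easier to read (add_word is a hypothetical name):

```python
from functools import reduce


def add_word(counts, word):
    """Reducer step: return the accumulator dict with word's count incremented."""
    counts[word] = counts.get(word, 0) + 1
    return counts  # reduce passes this back in as `counts` for the next word


words = "apple banana apple strawberry banana lemon"
result = reduce(add_word, words.split(), {})
print(result)  # {'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1}
```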

Solution 12 - Python

line = input()  # read the words from standard input
text = line.split()

for word in text:
    freq = text.count(word)
    print(word, freq)

Solution 13 - Python

I had a similar assignment on Zybook; this is the solution that worked for me:

def build_dictionary(words):
    counts = dict()
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1  # first occurrence of this word
    return counts


if __name__ == '__main__':
    words = input().split()
    your_dictionary = build_dictionary(words)
    sorted_keys = sorted(your_dictionary.keys())
    for key in sorted_keys:
        print(key + ':' + str(your_dictionary[key]))
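For comparison, a minimal sketch that produces the same alphabetically sorted "word:count" lines with Counter. It assumes the words arrive as a list (here a literal instead of input()), and format_counts is a hypothetical name:

```python
from collections import Counter


def format_counts(words):
    """Return 'word:count' lines sorted alphabetically by word."""
    counts = Counter(words)
    return [word + ':' + str(counts[word]) for word in sorted(counts)]


for line in format_counts("banana apple banana".split()):
    print(line)
```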

Solution 14 - Python

The answer below makes an extra pass over the words (one loop sets every count to 0, a second loop does the counting), but it is another method:

def func(tup):
    # sort key: the count, i.e. the last element of a (word, count) pair
    return tup[-1]


def print_words(filename):
    with open(filename, 'r') as f:
        whole_content = f.read().lower()
    print(whole_content)
    list_content = whole_content.split()
    counts = {}
    for one_word in list_content:
        counts[one_word] = 0
    for one_word in list_content:
        counts[one_word] += 1
    print(counts.items())
    print(sorted(counts.items(), key=func))
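The code above sorts the (word, count) pairs by count in ascending order. If the most frequent words should come first, one option is a sort key that negates the count and breaks ties alphabetically. This sketch uses a made-up sentence in place of a file:

```python
from collections import Counter

text = "The cat saw the dog and the dog saw the cat"
counts = Counter(text.lower().split())

# Sort by count, highest first; break ties alphabetically by word
by_freq = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(by_freq)  # [('the', 4), ('cat', 2), ('dog', 2), ('saw', 2), ('and', 1)]
```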

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Daniyar | View Question on Stackoverflow
Solution 1 - Python | sykora | View Answer on Stackoverflow
Solution 2 - Python | Kenan Banks | View Answer on Stackoverflow
Solution 3 - Python | hopla | View Answer on Stackoverflow
Solution 4 - Python | nosklo | View Answer on Stackoverflow
Solution 5 - Python | Nick Presta | View Answer on Stackoverflow
Solution 6 - Python | Thomas Weigel | View Answer on Stackoverflow
Solution 7 - Python | dB_19 | View Answer on Stackoverflow
Solution 8 - Python | Varun Shaandhesh | View Answer on Stackoverflow
Solution 9 - Python | Antonio | View Answer on Stackoverflow
Solution 10 - Python | javaidiot | View Answer on Stackoverflow
Solution 11 - Python | Gadi | View Answer on Stackoverflow
Solution 12 - Python | PanamaPHat | View Answer on Stackoverflow
Solution 13 - Python | B1029 | View Answer on Stackoverflow
Solution 14 - Python | Prabhu S | View Answer on Stackoverflow