What hash algorithm does Python's dictionary mapping use?

PythonHashmap

Python Problem Overview


I was messing around with making a command line parser and was wondering what kind of hash algorithm python dict's use?

The way I have it set up, I have a pattern match algorithm which matches tokenized input sequences with a dictionary key. Some of the keys are relatively long (length 5 or 6 tuples of 6-7 character strings). I was wondering if there was a point at which long dictionary keys significantly reduce the efficiency of key retrieval.

Python Solutions


Solution 1 - Python

The hash that it uses depends on the object being used as a key -- each class can define its own __hash__() method, and the value that it returns for a particular instance is what is used for the dictionary.

Python itself provides the hash implementation for str and tuple types. A quick look at the source should reveal the exact algorithm for those.

The hash of a tuple is based on the hashes of its contents. The algorithm is essentially this (simplified slightly):

def hash(tuple):
    mult = 1000003
    x = 0x345678
    for index, item in enumerate(tuple):
        x = ((x ^ hash(item)) * mult) & (1<<32)
        mult += (82520 + (len(tuple)-index)*2)
    return x + 97531

For strings, the interpreter also iterates over every character, combining them with this (again, slightly simplified) algorithm:

def hash(string):
    x = string[0] << 7
    for chr in string[1:]:
        x = ((1000003 * x) ^ chr) & (1<<32)
    return x

A bigger issue to worry about is avoiding hash collisions. Colliding hash keys will cause a linear search as the dictionary tries to find a place to store the new object (this is now being recognized as a security issue, and tha behavior may be changing in upcoming python versions)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJoel CornettView Question on Stackoverflow
Solution 1 - PythonIan ClellandView Answer on Stackoverflow