Probability of collision when using a 32 bit hash

AlgorithmHashCollisionProbabilityCrc

Algorithm Problem Overview


I have a 10 character string key field in a database. I've used CRC32 to hash this field but I'm worry about duplicates. Could somebody show me the probability of collision in this situation?

p.s. my string field is unique in the database. If the number of string fields is 1 million, what is probability of collision ?

Algorithm Solutions


Solution 1 - Algorithm

Solution 2 - Algorithm

In the case you cite, at least one collision is essentially guaranteed. The probability of at least one collision is about 1 - 3x10-51. The average number of collisions you would expect is about 116.

In general, the average number of collisions in k samples, each a random choice among n possible values is:

N(n,k)~=k(k-1)/(2n)

The probability of at least one collision is:

p(n,k)~=1-e^(-k(k-1)/(2n))

In your case, n = 232 and k = 106.

The probability of a three-way collision in your case is about 0.01. See the Birthday Problem.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionnguyenngoc101View Question on Stackoverflow
Solution 1 - AlgorithmAdam MorrisView Answer on Stackoverflow
Solution 2 - AlgorithmMark AdlerView Answer on Stackoverflow