MD5 vs CRC32: Which one's better for common use?

Algorithm, Hash

Algorithm Problem Overview


Recently I read somewhere that although both CRC32 and MD5 are sufficiently uniform and stable, CRC32 is more efficient than MD5. MD5 seems to be a very commonly used hashing algorithm but if CRC32 is faster/more memory efficient then why not use that?

Algorithm Solutions


Solution 1 - Algorithm

MD5 is a one-way hash algorithm. One-way hash algorithms are often used in cryptography because they are designed so that it is hard to find an input that produces a specific hash value. In particular, it should be hard to construct two different inputs that give the same hash. They are therefore often used to show that a piece of data has not been intentionally altered since the hash was produced. Because MD5 is a one-way hash algorithm, its emphasis is on security over speed. Unfortunately, MD5 is now considered insecure.

CRC32 is designed to detect accidental changes to data and is commonly used in networks and storage devices. The purpose of this algorithm is not to protect against intentional changes, but rather to catch accidents such as network errors and disk write errors. Its emphasis is therefore more on speed than on security.
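To make the difference in output size concrete, here is a minimal Python sketch that computes both values for the same input. It assumes the standard-library `zlib.crc32` and `hashlib.md5`; the sample string and the helper name `summarize` are only illustrations.

```python
import hashlib
import zlib


def summarize(data: bytes) -> tuple[int, str]:
    """Return (CRC32 as an unsigned int, MD5 as a hex string) for data."""
    crc = zlib.crc32(data)                # 32-bit checksum
    md5_hex = hashlib.md5(data).hexdigest()  # 128-bit digest, 32 hex chars
    return crc, md5_hex


crc, md5_hex = summarize(b"The quick brown fox jumps over the lazy dog")
print(f"CRC32: {crc:08x}")   # 8 hex characters = 32 bits
print(f"MD5:   {md5_hex}")   # 32 hex characters = 128 bits
```

The CRC32 value fits in a single 32-bit integer, while the MD5 digest is four times as long, which is part of why accidental collisions are so much rarer with MD5.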

Solution 2 - Algorithm

From Wikipedia's article on MD5 (emphasis mine):

> MD5 is a widely used **cryptographic hash function**

Now CRC32:

> CRC is an **error-detecting code**

So, as you can see, CRC32 is not a hashing algorithm. That means you should not use it for hashing, because it was not built for that.

And I think it doesn't make much sense to talk about common use, because similar algorithms are used for different purposes, each with significantly different requirements. There is no single algorithm that's best for common use; instead, you should choose the algorithm that's best suited to your specific use.

Solution 3 - Algorithm

It depends on your goals. Here are some examples of what can be done with CRC32 versus MD5:

Detecting duplicate files

If you want to check if two files are the same, CRC32 checksum is the way to go because it's faster than MD5. But be careful: CRC only reliably tells you if the binaries are different; it doesn't tell you if they're identical. If you get different hashes for two files, they cannot be the same file, so you can reject them as being duplicates very quickly.
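One way to apply this idea is sketched below in Python: CRC32 is used only as a cheap first filter, and candidates with equal checksums are confirmed by comparing the actual bytes. The function name `find_duplicates` and the in-memory `blobs` mapping are assumptions made for illustration; real files would be read from disk.

```python
import zlib
from collections import defaultdict
from itertools import combinations


def find_duplicates(blobs):
    """Return sorted (name, name) pairs whose contents are byte-for-byte equal.

    blobs maps a name to its contents as bytes (a stand-in for real files).
    """
    # Step 1: bucket by CRC32. Different checksums prove different contents.
    buckets = defaultdict(list)
    for name, data in blobs.items():
        buckets[zlib.crc32(data)].append(name)

    # Step 2: equal checksums only mean "possibly identical", so confirm.
    pairs = []
    for names in buckets.values():
        for a, b in combinations(sorted(names), 2):
            if blobs[a] == blobs[b]:
                pairs.append((a, b))
    return sorted(pairs)


print(find_duplicates({"a.bin": b"hello", "b.bin": b"hello", "c.bin": b"world"}))
```

The byte comparison in step 2 is what turns "probably the same" into "the same"; the checksum only saves work by ruling out most pairs early.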

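No matter what your keys are, the CRC32 checksum will be one of 2^32 different values. Assuming random sample files, the probability of collision between the hashes of two given files is 1 / 2^32. The probability that one given file collides with any of the other N - 1 files is approximately (N - 1) / 2^32, and the probability that any two of the N files collide is approximately N(N - 1) / 2^33 while that value is small (the birthday problem).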
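For a feel of how quickly collisions become likely among many files, the usual birthday-problem approximation can be evaluated directly. This is a rough sketch that assumes checksums are uniformly distributed over 2^32 values; the function name `p_any_collision` is hypothetical.

```python
import math

SPACE = 2 ** 32  # number of distinct CRC32 values


def p_any_collision(n: int, space: int = SPACE) -> float:
    """Approximate chance that at least two of n random values collide.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * space)).
    """
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (2, 1_000, 77_163, 1_000_000):
    print(f"{n:>9} files: {p_any_collision(n):.6f}")
```

Under these assumptions the chance of some collision reaches roughly one half at around 77,000 files, far sooner than the 1 / 2^32 figure for a single pair might suggest.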

Detecting malicious software

If security is an issue, such as downloading a file and checking the source's hash against yours to make sure the binary has not been tampered with, then CRC is a poor option. This is because attackers can make malware that has the same CRC checksum. In this case, an MD5 digest is more secure, since CRC was not made for security. Two different binaries are far more likely to have the same CRC checksum than the same MD5 digest.

Securing passwords for user authentication

One-way hashing is usually easier, faster, and more secure than reversible (two-way) encryption, so it's a common method to store passwords. Basically, the password is combined with other data (a salt), and the hash is computed over the combined data. Random salts greatly reduce the chances of two identical passwords producing the same stored hash. By default, the same password will have the same hash for most algorithms, so you must add your own randomness. Of course, the salt must be saved alongside the hash so that it can be reused at login.

To log a user in, you just take the information they give you when they log in. You use their username to get their salt from a database. You then combine this salt with the password they supplied to compute a new hash. If it matches the one in the database, then their login is successful. Since you're storing these passwords, they must be VERY secure, which means a CRC checksum is out of the question.

Cryptographic digests are more expensive to compute than CRC checksums. Also, better hashes like SHA-256 are more secure, but slower to compute and take up more database space (their digests are longer).
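The salt-then-hash flow described above might look roughly like this in Python. Note the substitution: instead of a bare MD5 or SHA-256, this sketch uses the standard library's `hashlib.pbkdf2_hmac` with SHA-256, a key-derivation function commonly used for passwords. The function names and the iteration count are assumptions for illustration, not recommendations from the answer.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware


def hash_password(password, salt=None):
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# At registration: store both salt and digest for the user.
salt, stored = hash_password("correct horse")
# At login: look up salt and digest by username, then verify.
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because each user gets a different random salt, two users with the same password end up with different stored digests.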

Solution 4 - Algorithm

One big difference between CRC32 and MD5 is that it is usually easy to pick a CRC32 checksum and then come up with a message that hashes to that checksum, even if there are constraints imposed on the message, whereas MD5 is specifically designed to make this sort of thing difficult (although it is showing its age - this is now possible in some situations).

If you are in a situation where it is possible that an adversary might decide to sit down and create a load of messages with specified CRC32 hashes, to mimic other messages, or just to make a hash table perform very badly because everything hashes to the same value, then MD5 would be a better option. (Even better, IMHO, would be HMAC-MD5 with a keyed value that is unique to the module using it and unknown outside it).
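A keyed digest of the kind mentioned here can be sketched with Python's standard `hmac` module. The key handling is an assumption: `SECRET_KEY` stands in for a value that is private to the module, and `keyed_digest` is a hypothetical helper name.

```python
import hashlib
import hmac
import os

# Hypothetical per-module secret; generated at import time and never exposed.
SECRET_KEY = os.urandom(32)


def keyed_digest(message: bytes, key: bytes = SECRET_KEY) -> bytes:
    """Return the 16-byte HMAC-MD5 of message under key."""
    return hmac.new(key, message, hashlib.md5).digest()


print(keyed_digest(b"some hash table key").hex())
```

Without knowing the key, an adversary cannot predict which messages will map to the same value, which is what defeats the "make everything hash to one bucket" attack.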

Solution 5 - Algorithm

CRCs are used to guard against random errors, for example in data transmission.

Cryptographic hash functions are designed to guard against intelligent adversaries forging the message, though MD5 has been broken in that respect.

Solution 6 - Algorithm

You should use MD5, which is 128 bits long. CRC32 is only 32 bits long, and its purpose is to detect errors, not to hash things. If you need only a 32-bit hash function, you can take any 32 bits of the MD5 output (the LSBs, the MSBs, or whatever you like).
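Taking 32 bits out of an MD5 digest is a one-liner in most languages. The Python sketch below uses the first four bytes interpreted as a big-endian integer; which four bytes and which byte order are arbitrary choices, and `md5_32` is a hypothetical name.

```python
import hashlib


def md5_32(data: bytes) -> int:
    """Return the first 4 bytes of the MD5 digest as an unsigned 32-bit int."""
    digest = hashlib.md5(data).digest()       # 16 bytes
    return int.from_bytes(digest[:4], "big")  # keep the leading 32 bits


print(f"{md5_32(b'hello'):08x}")
```

Whatever slice is chosen, it needs to be used consistently everywhere the truncated value is compared.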

Solution 7 - Algorithm

Actually, CRC32 is not faster than MD5 is.

Please take a look at: <https://3v4l.org/2MAUr>

That PHP script runs several hashing algorithms and measures the time each algorithm takes to calculate the hashes. It shows that, in PHP, MD5 is generally the fastest hashing algorithm around. And it shows that even SHA1 is faster than CRC32 in most of the test cases.

So, anyway, if you want to do some quick error detection, or look for random changes, I would always advise going with MD5, as it simply does it all.
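Relative speed depends heavily on the runtime and on how each algorithm is implemented there, so it is worth measuring in your own environment. The following is a small Python analogue of such a timing script, not a port of the linked PHP code; `time_hashes` is a hypothetical helper and the payload size and repeat count are arbitrary.

```python
import hashlib
import timeit
import zlib


def time_hashes(payload: bytes, repeats: int = 100) -> dict:
    """Return seconds spent running each algorithm `repeats` times on payload."""
    candidates = {
        "crc32": lambda: zlib.crc32(payload),
        "md5": lambda: hashlib.md5(payload).digest(),
        "sha1": lambda: hashlib.sha1(payload).digest(),
    }
    return {
        name: timeit.timeit(fn, number=repeats)
        for name, fn in candidates.items()
    }


# Print the algorithms from fastest to slowest for a 100 KB payload.
results = time_hashes(b"x" * 100_000)
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:>6}: {seconds:.4f} s")
```

The ordering you see may differ from the PHP results, since it reflects the underlying C libraries and any hardware acceleration available on the machine.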

Solution 8 - Algorithm

The primary reason CRC32 (or CRC8, or CRC16) is used for any purpose whatsoever is that it can be cheaply implemented in hardware as a means of detecting "random" corruption of data. Even in software implementations, it can be useful as a means of detecting random corruption of data from hardware causes (such as a noisy communications line or unreliable flash media). It is not tamper-resistant, nor is it generally suitable for testing whether two arbitrary files are likely to be the same: if each chunk of data in a file is immediately followed by a CRC32 of that chunk (some data formats do that), each chunk will have the same effect on the overall file's CRC as would a chunk of all zero bytes, regardless of what data was stored in that chunk.
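The chunk-plus-CRC effect can be demonstrated with a short Python sketch. It assumes the CRC-32 variant implemented by `zlib.crc32`, with each chunk's CRC appended in little-endian byte order, and it uses chunks of matching lengths in both files; `with_chunk_crcs` is a hypothetical helper.

```python
import struct
import zlib


def with_chunk_crcs(chunks):
    """Concatenate chunks, each followed by its own CRC32 (little-endian)."""
    out = bytearray()
    for chunk in chunks:
        out += chunk
        out += struct.pack("<I", zlib.crc32(chunk))
    return bytes(out)


# Two files with completely different chunk contents but the same chunk sizes.
file_a = with_chunk_crcs([b"hello world!", b"AAAA"])
file_b = with_chunk_crcs([b"HELLO WORLD?", b"ZZZZ"])

print(file_a != file_b)                          # the contents differ
print(zlib.crc32(file_a) == zlib.crc32(file_b))  # the whole-file CRCs match
```

Because every chunk is "cancelled out" by its own trailing CRC, the whole-file CRC in this layout depends on the chunk sizes rather than on the data, which is why it cannot be used to tell such files apart.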

If one has the means to calculate a CRC32 quickly, it might be helpful in conjunction with other checksum or hash methods, if different files that had identical CRC's would be likely to differ in one of the other hashes and vice versa, but on many machines other checksum or hash methods are likely to be easier to compute relative to the amount of protection they provide.

Solution 9 - Algorithm

One man's common is another man's infrequent. Common varies depending on which field you are working in.

If you are doing very quick transmissions or working out hash codes for small items, then CRCs are better since they are a lot faster and the chances of getting the same 16 or 32 bit CRC for wrong data are slim.

If it is megabytes of data, for instance a Linux ISO, then you could lose a few megabytes and still end up with the same CRC. That is not so likely with MD5. For that reason MD5 is normally used for huge transfers. It is slower but more reliable.

So basically, if you are going to do one huge transmission and check at the end whether you have the correct result, use MD5. If you are going to transmit in small chunks, then use CRC.

Solution 10 - Algorithm

I would say if you don't know what to choose, go for MD5.

It's less probable to cause you a headache.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | bytefire | View Question on Stackoverflow |
| Solution 1 - Algorithm | Ebbe M. Pedersen | View Answer on Stackoverflow |
| Solution 2 - Algorithm | svick | View Answer on Stackoverflow |
| Solution 3 - Algorithm | Victor Stoddard | View Answer on Stackoverflow |
| Solution 4 - Algorithm | mcdowella | View Answer on Stackoverflow |
| Solution 5 - Algorithm | starblue | View Answer on Stackoverflow |
| Solution 6 - Algorithm | 0x90 | View Answer on Stackoverflow |
| Solution 7 - Algorithm | Rednael | View Answer on Stackoverflow |
| Solution 8 - Algorithm | supercat | View Answer on Stackoverflow |
| Solution 9 - Algorithm | cup | View Answer on Stackoverflow |
| Solution 10 - Algorithm | Marinos An | View Answer on Stackoverflow |