Why do seemingly empty files and strings produce md5sums?

String Problem Overview


Consider the following:

% md5sum /dev/null
d41d8cd98f00b204e9800998ecf8427e  /dev/null
% touch empty; md5sum empty
d41d8cd98f00b204e9800998ecf8427e  empty
% echo '' | md5sum
68b329da9893e34099c7d8ad5cb9c940  -
% perl -e 'print chr(0)' | md5sum
93b885adfe0da089cdf634904fd59f71  -
% md5sum ''
md5sum: : No such file or directory

First of all, I'm surprised by the output of all these commands. If anything, I would expect the sum to be the same for all of them.

String Solutions


Solution 1 - String

The md5sum of "nothing" (a zero-length stream of characters) is d41d8cd98f00b204e9800998ecf8427e, which you're seeing in your first two examples.

The third and fourth examples are processing a single character. In the "echo" case, it's a newline, i.e.

$ echo -ne '\n' | md5sum
68b329da9893e34099c7d8ad5cb9c940  -

In the perl example, it's a single byte with value 0x00, i.e.

$ echo -ne '\x00' | md5sum
93b885adfe0da089cdf634904fd59f71  -

You can reproduce the empty checksum using "echo" as follows:

$ echo -n '' | md5sum
d41d8cd98f00b204e9800998ecf8427e  -

...and using Perl as follows:

$ perl -e 'print ""' | md5sum
d41d8cd98f00b204e9800998ecf8427e  -

In every case, checksumming the same data gives the same output, while different data should produce a wildly different checksum, even if only a single character differs. That is the whole point of a checksum.
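One way to see why the sums differ is to count the bytes each command actually sends to md5sum. The following is a minimal sketch, assuming a POSIX shell with GNU coreutils (md5sum, wc, mktemp); the helper name `show` is made up for illustration. It uses printf rather than echo, because echo's handling of -n and backslash escapes varies between shells.

```shell
#!/bin/sh
# show LABEL: read stdin, then print its byte count and its MD5 digest.
# "show" is a hypothetical helper name, not a standard tool.
show() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                            # capture stdin so it can be read twice
    bytes=$(wc -c < "$tmp" | tr -d ' ')     # tr strips padding some wc versions add
    sum=$(md5sum < "$tmp" | cut -d' ' -f1)  # keep only the hex digest
    rm -f "$tmp"
    printf '%-12s %s byte(s)  %s\n' "$1" "$bytes" "$sum"
}

printf ''     | show empty      # zero bytes, like /dev/null or a touched file
printf '\n'   | show newline    # the single byte that echo '' sends
printf '\000' | show nul        # the single byte that perl -e 'print chr(0)' sends
```

The first line reports 0 bytes and the other two report 1 byte each, which is why only the first matches the checksum of /dev/null.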

Solution 2 - String

Why do seemingly empty files and strings produce md5sums?

Because the "sum" in md5sum is somewhat misleading. MD5 is not like, e.g., a CRC32 checksum, which is zero for an empty file.
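The contrast with CRC32 can be checked from the shell. This is only a sketch: it assumes gzip, tail, head, od and md5sum are available, and it relies on the gzip file format (RFC 1952), where the last 8 bytes of the stream are the CRC-32 of the input followed by the input size. The helper name `crc32hex` is hypothetical.

```shell
#!/bin/sh
# crc32hex: print the CRC-32 of stdin as 8 hex digits (hypothetical helper).
# Assumption: the value is read from the gzip trailer, where it is stored
# little-endian, so for non-zero values the bytes appear in reversed order.
crc32hex() {
    gzip -c | tail -c 8 | head -c 4 | od -An -tx1 | tr -d ' \n'
}

printf 'CRC-32 of empty input: %s\n' "$(printf '' | crc32hex)"
printf 'MD5    of empty input: %s\n' "$(printf '' | md5sum | cut -d' ' -f1)"
```

The CRC-32 line shows all zeros, while the MD5 line shows the familiar d41d8cd98f00b204e9800998ecf8427e.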

MD5 is one of the message digest algorithms. You can imagine it as a box that produces a fixed-length, random-looking value (a hash) depending on its internal state. You change the internal state by feeding in data.

And that box's initial internal state is predefined, so it yields a random-looking hash value even when no data is fed in. For MD5, that value happens to be d41d8cd98f00b204e9800998ecf8427e.
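A small sketch of that "box" behaviour, assuming GNU md5sum and a head that supports -c: the digest is always 32 hex characters (128 bits), whether zero bytes or many are fed in. The helper name `md5hex` is made up for illustration.

```shell
#!/bin/sh
# md5hex: print only the hex digest of stdin (hypothetical helper name).
md5hex() {
    md5sum | cut -d' ' -f1
}

empty=$(printf '' | md5hex)                 # no data fed in at all
one=$(printf 'a' | md5hex)                  # a single byte
many=$(head -c 100000 /dev/zero | md5hex)   # 100000 zero bytes

# ${#var} is the string length; every digest is 32 hex characters.
printf '%s  (%s chars)  no input\n'     "$empty" "${#empty}"
printf '%s  (%s chars)  one byte\n'     "$one"   "${#one}"
printf '%s  (%s chars)  100000 bytes\n' "$many"  "${#many}"
```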

Solution 3 - String

No need for surprise. The first two produce truly empty inputs to md5sum. The echo produces a newline (echo -n '' should produce an empty output; I don't have a Linux machine here to check). The perl produces a single zero byte (not to be confused with C, where a zero byte marks the end of a string). The last command is looking for a file with the empty string as its file name.
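The difference between hashing a piece of text and hashing the file named by an argument can be sketched as follows, assuming GNU md5sum and mktemp; the scratch directory and the file name `empty` are only for illustration.

```shell
#!/bin/sh
# Use a scratch directory so nothing in the current directory is touched.
dir=$(mktemp -d) || exit 1
: > "$dir/empty"                  # create a zero-length file named "empty"

# An argument is a file name: this hashes the file's contents (0 bytes).
by_name=$(md5sum "$dir/empty" | cut -d' ' -f1)
# Piped text is data: this hashes the five bytes e-m-p-t-y.
by_text=$(printf 'empty' | md5sum | cut -d' ' -f1)

printf 'file contents: %s\n' "$by_name"
printf 'literal text:  %s\n' "$by_text"

# '' is also taken as a file name, and no file can have the empty name.
md5sum '' 2>/dev/null || echo "md5sum '' failed: no file has the empty name"

rm -rf "$dir"
```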

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author    Original Content on Stack Overflow
Question              Daniel             View Question on Stack Overflow
Solution 1 - String   Graeme             View Answer on Stack Overflow
Solution 2 - String   mykhal             View Answer on Stack Overflow
Solution 3 - String   Gene               View Answer on Stack Overflow