du counting hardlinks towards filesize?

Linux, Diskspace, Du

Linux Problem Overview


I have a backup system that creates directories named after Unix timestamps and then makes incremental backups using hardlinks (--link-dest in rsync). Typically the first backup is very large, and later backups are only a fraction of that size.

This is my output of my current backups:

root@athos:/media/awesomeness_drive# du -sh lantea_home/*
31G	lantea_home/1384197192
17M	lantea_home/1384205953
17M	lantea_home/1384205979
17M	lantea_home/1384206056
17M	lantea_home/1384206195
17M	lantea_home/1384207349
3.1G	lantea_home/1384207678
14M	lantea_home/1384208111
14M	lantea_home/1384208128
16M	lantea_home/1384232401
15G	lantea_home/1384275601
43M	lantea_home/1384318801

Everything seems correct, however, take for example the last directory, lantea_home/1384318801:

root@athos:/media/awesomeness_drive# du -sh lantea_home/1384318801/
28G	lantea_home/1384318801/

I consistently get this behavior. Why is the directory considered 28G by the second du command?

Note - the output remains the same with the -P and -L flags.

Linux Solutions


Solution 1 - Linux

Hardlinks are real references to the same file (represented by its inode). There is no difference between the "original" file and a hardlink pointing to it: both names have the same status, and both are references to the same inode. Removing one of them leaves the other intact. Only removing the last hardlink actually deletes the file and frees the disk space.
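As a minimal sketch of the point above, the following script creates a file, adds a second hardlink, and checks that both names share one inode and that the data survives removing the first name. The scratch directory and the file names original and link are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory; nothing here touches real backups.
work=$(mktemp -d)

echo "hello" > "$work/original"
ln "$work/original" "$work/link"      # second name for the same inode

# ls -i prints the inode number in the first column.
ino_original=$(ls -i "$work/original" | awk '{print $1}')
ino_link=$(ls -i "$work/link" | awk '{print $1}')
echo "inodes: $ino_original $ino_link"

# Removing one name leaves the data reachable through the other.
rm "$work/original"
content=$(cat "$work/link")
echo "after rm: $content"

rm -rf "$work"
```

The two inode numbers printed should be identical, and the content is still readable after the first name is gone.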

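So if you ask du about one directory only, it does not care that there are hardlinks elsewhere pointing to the same contents. It simply sums the sizes of all the files it finds there. Within a single invocation, however, du counts each inode only once, so a file that has already been counted is not counted again (not every program is that clever). That is why, in du -sh lantea_home/*, the last directory shows only 43M: everything it shares with the earlier directories was already counted under them. Run on lantea_home/1384318801/ alone, du has seen none of those files before and reports the full 28G.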

So in effect, directory A might have a du size of 28G and directory B a size of 29G, yet together they occupy only 30G, and if you ask du for the size of A and B in one command, that is the total you will get.
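The effect can be reproduced on a small scale. This is a sketch under stated assumptions: the directories A and B and the file data are hypothetical stand-ins for two backup snapshots, and the exact sizes reported depend on the filesystem.

```shell
#!/bin/sh
# Hypothetical layout: two "backup" directories sharing one hardlinked file.
work=$(mktemp -d)
mkdir "$work/A" "$work/B"
head -c 1048576 /dev/urandom > "$work/A/data"   # 1 MiB of real data
ln "$work/A/data" "$work/B/data"                # same inode, second name
sync                                            # make sure block counts are up to date

# One invocation over both directories: the shared inode is charged to A only.
b_after_a=$(du -sk "$work/A" "$work/B" | awk 'NR==2 {print $1}')

# Separate invocation: B is charged the full size of the file.
b_alone=$(du -sk "$work/B" | awk '{print $1}')

echo "B listed after A: ${b_after_a}K"
echo "B on its own:     ${b_alone}K"

rm -rf "$work"
```

B should appear far smaller when listed after A than when measured on its own, which mirrors the 43M versus 28G difference in the question.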

[Figure: disk usage by several directories when hardlinks are involved.]

Solution 2 - Linux

With the -l switch (--count-links), du counts hardlinked files every time they appear, in every subdirectory, so you can see how big each whole backup is, not only its incremental delta.
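A small sketch of the difference -l makes, again using hypothetical directories A and B that share one hardlinked file. Exact sizes vary by filesystem; only the comparison matters.

```shell
#!/bin/sh
# Hypothetical layout: A and B each hold a name for the same 1 MiB file.
work=$(mktemp -d)
mkdir "$work/A" "$work/B"
head -c 1048576 /dev/urandom > "$work/A/data"
ln "$work/A/data" "$work/B/data"
sync                                   # make sure block counts are up to date

# Default: B only shows what was not already counted under A.
plain=$(du -sk "$work/A" "$work/B" | awk 'NR==2 {print $1}')

# With -l: the hardlinked file is counted again under B.
counted=$(du -skl "$work/A" "$work/B" | awk 'NR==2 {print $1}')

echo "B without -l: ${plain}K"
echo "B with -l:    ${counted}K"

rm -rf "$work"
```

Applied to the question's layout, the equivalent would be du -shl lantea_home/*, which should report each snapshot at its full size in a single run.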

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Dan LaManna | View Question on Stackoverflow
Solution 1 - Linux | Alfe | View Answer on Stackoverflow
Solution 2 - Linux | Tobias | View Answer on Stackoverflow