Maximum number of files/directories on Linux?


Linux Problem Overview


I'm developing a LAMP online store, which will allow admins to upload multiple images for each item.

My concern is that right off the bat there will be 20,000 items, meaning roughly 60,000 images.

Questions:

  1. What is the maximum number of files and/or directories on Linux?

  2. What is the usual way of handling this situation (best practice)?

My idea was to make a directory for each item, based on its unique ID, but then I'd still have 20000 directories in a main uploads directory, and it will grow indefinitely as old items won't be removed.

Thanks for any help.

Linux Solutions


Solution 1 - Linux

ext[234] filesystems have a fixed maximum number of inodes; every file or directory requires one inode. You can see the current count and limits with df -i. For example, on a 15GB ext3 filesystem, created with the default settings:

Filesystem           Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda           1933312 134815 1798497    7% /
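The same counters can be read programmatically. Below is a minimal sketch, assuming a Unix-like system where Python's os.statvfs is available; the function name inode_usage is made up for illustration, and the percentage may be rounded slightly differently from what df prints.

```python
import os

def inode_usage(path="/"):
    """Return (total, used, free, percent_used) inode counts for the
    filesystem that contains `path`. Hypothetical helper for illustration."""
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes on the filesystem
    free = st.f_ffree    # number of free inodes
    used = total - free
    # Some filesystems report 0 total inodes, so guard the division.
    percent = round(100 * used / total) if total else 0
    return total, used, free, percent

if __name__ == "__main__":
    total, used, free, percent = inode_usage("/")
    print(f"Inodes: {total}  IUsed: {used}  IFree: {free}  IUse%: {percent}%")
```

Running this against the uploads directory before and after a bulk import gives a rough idea of how many inodes the images actually consume.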

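There's no separate limit on the total number of directories beyond this. One per-directory caveat: ext3 caps a single directory at roughly 32,000 subdirectories because of its link-count limit, which matters if every item gets its own directory under one parent. Keep in mind, too, that every directory and every non-empty file requires at least one filesystem block (typically 4 KB), even a directory with only a single item in it.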

As you can see, though, 80,000 inodes is unlikely to be a problem. And with the dir_index option (which can be enabled with tune2fs), lookups in large directories are not a big deal. However, note that many administrative tools (such as ls or rm) can have a hard time dealing with directories that contain too many files. As such, it's recommended to split your files up so that you don't have more than a few hundred to a thousand items in any given directory. An easy way to do this is to hash whatever ID you're using, and use the first few hex digits as intermediate directories.

For example, say you have item ID 12345, and it hashes to 'DEADBEEF02842.......'. You might store your files under /storage/root/d/e/12345. With two levels of 16 directories each, the items are spread over 256 leaf directories, so each one holds roughly 1/256th of the total.
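The hashing scheme just described could be sketched as follows. This is an illustration under assumptions, not the answer author's code: the function name sharded_dir and the choice of SHA-1 are made up here (any stable hash would do, since it is only used to spread items evenly), and /storage/root is the example root from the text.

```python
import hashlib
from pathlib import PurePosixPath

def sharded_dir(item_id, root="/storage/root", levels=2):
    """Build a directory path for `item_id` using the first `levels` hex
    digits of its hash as intermediate directories (one digit per level)."""
    # SHA-1 is an assumption; the hash only needs to be stable and well spread.
    digest = hashlib.sha1(str(item_id).encode("utf-8")).hexdigest()
    shard_parts = list(digest[:levels])  # e.g. ['d', 'e'] for a hash starting with "de"
    return PurePosixPath(root, *shard_parts, str(item_id))

if __name__ == "__main__":
    # Prints something of the form /storage/root/<hex>/<hex>/12345
    print(sharded_dir(12345))
```

Because the path is computed from the ID alone, no lookup table is needed to find an item's directory later.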

Solution 2 - Linux

If your server's filesystem has the dir_index feature turned on (see tune2fs(8) for details on checking and enabling it), then you can reasonably store upwards of 100,000 files in a directory before performance degrades. (dir_index has been the default for new filesystems in most distributions for several years now, so only an old filesystem is likely to lack it.)

That said, adding another directory level to reduce the number of files in a directory by a factor of 16 or 256 would drastically improve the chances of things like ls * working without over-running the kernel's maximum argv size.

Typically, this is done by something like:

/a/a1111
/a/a1112
...
/b/b1111
...
/c/c6565
...

i.e., prepending a letter or digit to the path, based on some feature you can compute from the name. (Using the first two characters of the md5sum or sha1sum of the file name is one common approach, but if you have unique object IDs, then 'a' + id % 16 is an easy enough mechanism to determine which directory to use.)
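A minimal sketch of the 'a' + id % 16 idea, assuming integer object IDs. The helper names bucket_letter and bucket_path are hypothetical, and the layout (one single-letter directory per bucket at the root) simply mirrors the listing above.

```python
def bucket_letter(object_id, buckets=16):
    """Map an integer ID to a letter: 'a' for remainder 0, 'b' for 1, and so on."""
    if not 1 <= buckets <= 26:
        raise ValueError("buckets must be between 1 and 26 to stay within a-z")
    return chr(ord("a") + object_id % buckets)

def bucket_path(object_id, filename):
    """Place `filename` under the single-letter directory chosen for `object_id`."""
    return f"/{bucket_letter(object_id)}/{filename}"

if __name__ == "__main__":
    print(bucket_path(17, "17.jpg"))  # prints /b/17.jpg, since 17 % 16 == 1
```

With 16 buckets the letters run from a to p; sequential IDs rotate through the buckets, so the directories fill evenly.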

Solution 3 - Linux

60,000 is nothing, and neither is 20,000. But you should group these 20,000 directories by some means in order to speed up access to them, for example in groups of 100 or 1000, by taking the number of the directory and dividing it by 100, 500, 1000, or whatever suits you.

E.g., I have a project where the files have numbers. I group them in 1000s, so I have

id/1/1332
id/3/3256
id/12/12334
id/350/350934
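The grouping above amounts to an integer division of the file number by the group size. A small sketch, assuming numeric IDs; the function name grouped_path is made up for illustration:

```python
def grouped_path(file_id, group_size=1000, root="id"):
    """Return root/<group>/<file_id>, where group = file_id // group_size."""
    return f"{root}/{file_id // group_size}/{file_id}"

if __name__ == "__main__":
    for file_id in (1332, 3256, 12334, 350934):
        print(grouped_path(file_id))
    # prints:
    # id/1/1332
    # id/3/3256
    # id/12/12334
    # id/350/350934
```

Unlike a hash-based layout, this keeps neighbouring IDs together, which makes the tree easy to browse by hand.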

You might actually hit a hard limit: some filesystems use 32-bit inode numbers, so you are limited to 2^32 inodes per filesystem.

Solution 4 - Linux

In addition to the general answers (basically "don't bother that much", "tune your filesystem", and "organize your directory with subdirectories containing a few thousand files each"):

If the individual images are small (e.g. less than a few kilobytes), instead of putting them in a folder, you could also put them in a database (e.g. in MySQL as a BLOB) or perhaps inside a GDBM indexed file. Then each small item won't consume an inode (on many filesystems, each inode takes up at least a few kilobytes). You could also do that based on a threshold (e.g. put images bigger than 4 KB in individual files, and smaller ones in a database or GDBM file). Of course, don't forget to back up your data (and define a backup strategy).
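The threshold idea could look roughly like the sketch below. It swaps in SQLite (Python's built-in sqlite3 module) for MySQL or GDBM purely to keep the example self-contained; the table name small_images, the helper names, and the exact 4096-byte cutoff are all assumptions for illustration.

```python
import sqlite3
from pathlib import Path

SMALL_LIMIT = 4096  # bytes; assumed cutoff, following the 4 KB example above

def open_store(db_path):
    """Open (or create) the database that holds the small images."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS small_images "
        "(name TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn

def save_image(conn, files_dir, name, data):
    """Store small images as BLOBs and larger ones as files.
    Returns 'db' or 'file' to show where the image went."""
    if len(data) <= SMALL_LIMIT:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO small_images (name, data) VALUES (?, ?)",
                (name, data),
            )
        return "db"
    target = Path(files_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return "file"

def load_image(conn, files_dir, name):
    """Look in the database first, then fall back to the filesystem."""
    row = conn.execute(
        "SELECT data FROM small_images WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return bytes(row[0])
    return (Path(files_dir) / name).read_bytes()
```

Whichever store is used, the database file now has to be part of the backup as well, not just the uploads directory.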

Solution 5 - Linux

The year is 2014. I come back in time to add this answer. Lots of big/small files? You can use Amazon S3 and other alternatives based on Ceph like DreamObjects, where there are no directory limits to worry about.

I hope this helps someone decide from all the alternatives.

Solution 6 - Linux

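md5($id) ==> 0123456789ABCDEF   (shortened for illustration; a real MD5 hash is 32 lowercase hex characters)

$file_path = items/012/345/678/9AB/CDE/F.jpg

Each directory level is named by 3 hex characters, so one node has at most 16^3 = 4096 subnodes, which keeps lookups fast.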
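A rough sketch of that chunking, written in Python rather than PHP. The function name chunked_path is hypothetical, and splitting the entire hash is taken literally from the example above; in practice one might use only the first chunk or two as directories.

```python
import hashlib

def chunked_path(item_id, root="items", chunk=3, ext=".jpg"):
    """Split the MD5 hex digest of `item_id` into `chunk`-character path segments."""
    # MD5 is used only to spread files evenly, not for security.
    digest = hashlib.md5(str(item_id).encode("utf-8")).hexdigest()
    segments = [digest[i:i + chunk] for i in range(0, len(digest), chunk)]
    return f"{root}/" + "/".join(segments) + ext

if __name__ == "__main__":
    print(chunked_path(12345))
```

With a full 32-character digest and 3-character segments this yields 11 segments, the last one being 2 characters long.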

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author        Original Content on Stackoverflow
Question            CodeVirtuoso           View Question on Stackoverflow
Solution 1 - Linux  bdonlan                View Answer on Stackoverflow
Solution 2 - Linux  sarnold                View Answer on Stackoverflow
Solution 3 - Linux  glglgl                 View Answer on Stackoverflow
Solution 4 - Linux  Basile Starynkevitch   View Answer on Stackoverflow
Solution 5 - Linux  Abhishek Dujari        View Answer on Stackoverflow
Solution 6 - Linux  gibz                   View Answer on Stackoverflow