Counting the number of files in a directory using Java

Tags: Java, Performance, File, Directory

Java Problem Overview


How do I count the number of files in a directory using Java? For simplicity, let's assume that the directory doesn't have any sub-directories.

I know the standard method:

new File(<directory path>).listFiles().length

But this effectively goes through all the files in the directory, which might take a long time if the number of files is large. Also, I don't care about the actual files in the directory unless their number is greater than some fixed large number (say 5000).

I am guessing, but doesn't the directory (or its i-node in case of Unix) store the number of files contained in it? If I could get that number straight away from the file system, it would be much faster. I need to do this check for every HTTP request on a Tomcat server before the back-end starts doing the real processing. Therefore, speed is of paramount importance.

I could run a daemon every once in a while to clear the directory. I know that, so please don't give me that solution.

Java Solutions


Solution 1 - Java

Ah... the rationale for not having a straightforward method in Java for this is file storage abstraction: some filesystems may not have the number of files in a directory readily available, and that count may not even have any meaning at all (for example in distributed or P2P filesystems, filesystems that store file lists as a linked list, or database-backed filesystems). So yes,

new File(<directory path>).list().length

is probably your best bet.
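
One detail worth keeping in mind with this one-liner: File.list() returns null rather than throwing when the path is not a directory or cannot be read, so calling .length on the result directly can fail with a NullPointerException. A minimal sketch that guards against this follows; the class and method names are illustrative and not part of the original answer, and returning -1 for an unlistable path is just one possible convention.

```java
import java.io.File;

public class DirEntryCount {
    /** Returns the number of entries in dir, or -1 if dir cannot be listed. */
    public static int countEntries(File dir) {
        // list() returns names only, so no File object is built per entry
        String[] names = dir.list();
        return names == null ? -1 : names.length;
    }

    public static void main(String[] args) {
        // Counts the directory given as the first argument, or the current one
        System.out.println(countEntries(new File(args.length > 0 ? args[0] : ".")));
    }
}
```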

Solution 2 - Java

Since Java 8, you can do that in three lines:

try (Stream<Path> files = Files.list(Paths.get("your/path/here"))) {
    long count = files.count();
}

Regarding the 5000 child nodes and inode aspects:

This method iterates over the entries, but as Varkhan suggested, you probably can't do better short of using JNI or calling system commands directly, and even then you can never be sure those approaches don't do the same thing!

However, let's dig into this a little:

Looking at the JDK 8 source, Files.list exposes a stream that uses an Iterable from Files.newDirectoryStream, which delegates to FileSystemProvider.newDirectoryStream.

On UNIX systems (decompiled sun.nio.fs.UnixFileSystemProvider.class), it loads an iterator: a sun.nio.fs.UnixSecureDirectoryStream is used (with file locks while iterating through the directory).

So, there is an iterator that will loop through the entries here.

Now, let's look at the counting mechanism.

The actual count is performed by the count/sum reduction API exposed by Java 8 streams. In theory, this API can run in parallel without much effort (using multithreading). However, the stream is created with parallelism disabled, so that is not an option here.

The upside of this approach is that it does not load an array of entries into memory: the entries are counted by an iterator as they are read by the underlying filesystem API.
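
Because the entries are pulled lazily, the same stream can also be cut short when only a threshold matters, as in the question's "more than 5000" check. The following is a hedged sketch rather than part of the original answer: ThresholdCheck and hasMoreThan are hypothetical names, and it relies on Stream.limit being a short-circuiting operation on a sequential stream.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class ThresholdCheck {
    /** True if dir holds more than max entries; needs at most max + 1 of them. */
    public static boolean hasMoreThan(Path dir, long max) throws IOException {
        // try-with-resources closes the underlying directory handle
        try (Stream<Path> entries = Files.list(dir)) {
            // limit() lets the pipeline stop once max + 1 entries have been seen
            return entries.limit(max + 1).count() > max;
        }
    }
}
```

Calling ThresholdCheck.hasMoreThan(Paths.get("your/path/here"), 5000) would then answer the question's check without counting a much larger directory to the end.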

Finally, for information: conceptually, a directory node in a filesystem is not required to store the number of files it contains; it can simply hold the list of its child nodes (a list of inodes). I'm not an expert on filesystems, but I believe that UNIX filesystems work just like that. So you can't assume there is a way to get this information directly (i.e., there can always be some list of child nodes hidden somewhere).

Solution 3 - Java

Unfortunately, I believe that is already the best way (although list() is slightly better than listFiles(), since it doesn't construct File objects).

Solution 4 - Java

This might not be appropriate for your application, but you could always try a native call (using JNI or JNA), or exec a platform-specific command and read the output before falling back to list().length. On *nix, you could exec ls -1a | wc -l (note: that's dash-one-a for the first command, and dash-lowercase-L for the second). I'm not sure what would be right on Windows; perhaps just run dir and look for the summary.

Before bothering with something like this, I'd strongly recommend you create a directory with a very large number of files and just see if list().length really does take too long. As this blogger suggests, you may not want to sweat this.

I'd probably go with Varkhan's answer myself.
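
The measurement suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: ListTiming, populate, and timeList are hypothetical names, the 10,000-file size is arbitrary, and a single nanoTime measurement is a crude indicator rather than a proper benchmark (filesystem caches and JIT warm-up will affect it). The temporary files are not cleaned up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ListTiming {
    /** Creates fileCount empty files in a fresh temp directory and returns it. */
    public static Path populate(int fileCount) throws IOException {
        Path dir = Files.createTempDirectory("list-timing");
        for (int i = 0; i < fileCount; i++) {
            Files.createFile(dir.resolve("f" + i + ".tmp"));
        }
        return dir;
    }

    /** Times one list() call; returns {entryCount, elapsedNanos}. */
    public static long[] timeList(Path dir) {
        long start = System.nanoTime();
        String[] names = dir.toFile().list();
        long elapsed = System.nanoTime() - start;
        // list() returns null if the directory cannot be read
        return new long[] { names == null ? -1 : names.length, elapsed };
    }

    public static void main(String[] args) throws IOException {
        Path dir = populate(10_000); // arbitrary size chosen for illustration
        long[] result = timeList(dir);
        System.out.printf("%d entries listed in %.2f ms%n", result[0], result[1] / 1e6);
    }
}
```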

Solution 5 - Java

Since you don't really need the total number, and in fact want to perform an action after a certain number (in your case 5000), you can use java.nio.file.Files.newDirectoryStream. The benefit is that you can exit early instead of having to go through the entire directory just to get a count.

public boolean isOverMax() {
    Path dir = Paths.get("C:/foo/bar");
    int count = 0; // MAX_FILES is assumed to be a constant defined elsewhere (e.g. 5000)

    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
        for (Path p : stream) {
            // more than MAX_FILES entries seen: stop iterating
            if (++count > MAX_FILES) {
                return true;
            }
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }

    return false;
}

The interface doc for DirectoryStream also has some good examples.

Solution 6 - Java

If you have directories containing a really large number of files (>100,000), here is a (non-portable) way to go:

String directoryPath = "a path";

// The -f flag is important: with it, ls does not sort its output,
// which is much faster
String[] params = { "/bin/sh", "-c",
    "ls -f " + directoryPath + " | wc -l" };
Process process = Runtime.getRuntime().exec(params);
BufferedReader reader = new BufferedReader(new InputStreamReader(
    process.getInputStream()));
int fileCount = Integer.parseInt(reader.readLine().trim()) - 2; // accounting for .. and .
reader.close();
System.out.println(fileCount);
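
The snippet above concatenates the path into the shell command, so a path containing spaces or shell metacharacters would break it. A hedged variant is sketched below; it is not the original author's code. It passes the directory to sh as a positional argument ("$1") instead. LsCount and count are hypothetical names, and the approach remains POSIX-only. It also assumes no file name contains a newline, and if ls fails, wc reports 0 and the method returns -2.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class LsCount {
    /** Counts entries via `ls -f | wc -l`, excluding "." and "..". POSIX-only. */
    public static int count(String directoryPath) throws IOException, InterruptedException {
        // "$1" is filled from the argument after the script name ("sh"),
        // so the path is never spliced into the shell command text
        Process process = new ProcessBuilder(
                "/bin/sh", "-c", "ls -f -- \"$1\" | wc -l", "sh", directoryPath)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            process.waitFor();
            if (line == null) {
                throw new IOException("no output from ls | wc");
            }
            return Integer.parseInt(line.trim()) - 2; // subtract "." and ".."
        }
    }
}
```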

Solution 7 - Java

Using Sigar should help. Sigar has native hooks to get the stats:

new Sigar().getDirStat(dir).getTotal()

Solution 8 - Java

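This method works very well for me. It walks the directory tree recursively and counts both files and folders; filesCount, allStats, and FilePropBean are defined elsewhere and are not shown here.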

// Recursive method to recover files and folders and to print the information
public static void listFiles(String directoryName) {

	File file = new File(directoryName);
	File[] fileList = file.listFiles(); // List files inside the main dir
	String extension;
	String fileName;

	if (fileList != null) {
		for (int i = 0; i < fileList.length; i++) {
			extension = "";
			if (fileList[i].isFile()) {
				fileName = fileList[i].getName();

				if (fileName.lastIndexOf(".") != -1 && fileName.lastIndexOf(".") != 0) {
					extension = fileName.substring(fileName.lastIndexOf(".") + 1);
					System.out.println("THE " + fileName + "  has the extension =   " + extension);
				} else {
					extension = "Unknown";
					System.out.println("extension2 =    " + extension);
				}

				filesCount++;
				allStats.add(new FilePropBean(filesCount, fileList[i].getName(), fileList[i].length(), extension,
						fileList[i].getParent()));
			} else if (fileList[i].isDirectory()) {
				filesCount++;
				extension = "";
				allStats.add(new FilePropBean(filesCount, fileList[i].getName(), fileList[i].length(), extension,
						fileList[i].getParent()));
				listFiles(String.valueOf(fileList[i]));
			}
		}
	}
}

Solution 9 - Java

Unfortunately, as mmyers said, File.list() is about as fast as you are going to get using Java. If speed is as important as you say, you may want to consider doing this particular operation using JNI. You can then tailor your code to your particular situation and filesystem.

Solution 10 - Java

public void shouldGetTotalFilesCount() {
    // of(...) and listRoots() are static imports of Stream.of and File.listRoots
    Integer reduce = of(listRoots()).parallel().map(this::getFilesCount).reduce(0, ((a, b) -> a + b));
}

private int getFilesCount(File directory) {
    File[] files = directory.listFiles();
    // listFiles() returns null for anything that is not a listable directory; count it as one file
    return Objects.isNull(files) ? 1 : Stream.of(files)
            .parallel()
            .reduce(0, (Integer acc, File p) -> acc + getFilesCount(p), (a, b) -> a + b);
}

Solution 11 - Java

Count files in directory and all subdirectories.

var path = Path.of("your/path/here");
try (var paths = Files.walk(path)) { // close the stream to release directory handles
    long count = paths.filter(Files::isRegularFile).count();
}

Solution 12 - Java

In Spring Batch, I did the following:

private int getFilesCount() throws IOException {
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource[] resources = resolver.getResources("file:" + projectFilesFolder + "/**/input/splitFolder/*.csv");
    return resources.length;
}

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | euphoria83 | View Question on Stack Overflow
Solution 1 - Java | Varkhan | View Answer on Stack Overflow
Solution 2 - Java | superbob | View Answer on Stack Overflow
Solution 3 - Java | Michael Myers | View Answer on Stack Overflow
Solution 4 - Java | Marty Lamb | View Answer on Stack Overflow
Solution 5 - Java | mateuscb | View Answer on Stack Overflow
Solution 6 - Java | Renaud | View Answer on Stack Overflow
Solution 7 - Java | user2162827 | View Answer on Stack Overflow
Solution 8 - Java | Maged Almaweri | View Answer on Stack Overflow
Solution 9 - Java | Sebastian Celis | View Answer on Stack Overflow
Solution 10 - Java | Sergii Povzaniuk | View Answer on Stack Overflow
Solution 11 - Java | René Winkler | View Answer on Stack Overflow
Solution 12 - Java | Santhosh Hirekerur | View Answer on Stack Overflow