List all files from a directory recursively with Java


Java Problem Overview


I have this function that prints the names of all the files in a directory recursively. The problem is that my code is very slow because it has to access a remote network device on every iteration.

My plan is to first load all the files from the directory recursively and then after that go through all files with the regex to filter out all the files I don't want. Does anyone have a better suggestion?

public static void printFnames(String sDir) {
    File[] faFiles = new File(sDir).listFiles();
    if (faFiles == null) {
        return; // not a directory, or an I/O error occurred
    }
    for (File file : faFiles) {
        if (file.getName().matches("^(.*?)")) {
            System.out.println(file.getAbsolutePath());
        }
        if (file.isDirectory()) {
            printFnames(file.getAbsolutePath());
        }
    }
}

This is just a test. Later on I'm not going to use the code like this; instead I'm going to add the path and modification date of every file that matches an advanced regex to an array.

Java Solutions


Solution 1 - Java

Assuming this is actual production code you'll be writing, I suggest using an existing solution to this sort of problem: Apache Commons IO, specifically FileUtils.listFiles(). It handles nested directories and filters (based on name, modification time, etc.).

For example, for your regex:

Collection<File> files = FileUtils.listFiles(
  dir, 
  new RegexFileFilter("^(.*?)"), 
  DirectoryFileFilter.DIRECTORY
);

This will recursively search for files matching the ^(.*?) regex, returning the results as a collection.

It's worth noting that this will be no faster than rolling your own code, because it does the same thing: trawling a filesystem in Java is just slow. The difference is that the Apache Commons version is already written and well tested.

Solution 2 - Java

In Java 8, it's a 1-liner via Files.find() with an arbitrarily large depth (eg 999) and BasicFileAttributes of isRegularFile()

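public static void printFnames(String sDir) throws IOException {
    // Close the stream so the open directory handles are released.
    try (Stream<Path> paths = Files.find(Paths.get(sDir), 999, (p, bfa) -> bfa.isRegularFile())) {
        paths.forEach(System.out::println);
    }
}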

To add more filtering, enhance the lambda, for example all jpg files modified in the last 24 hours:

(p, bfa) -> bfa.isRegularFile()
  && p.getFileName().toString().matches(".*\\.jpg")
  && bfa.lastModifiedTime().toMillis() > System.currentTimeMillis() - 86400000
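
For completeness, here is a minimal self-contained sketch of the jpg filter above. The class name RecentJpgFinder and the method name findRecentJpgs are made up for illustration; only Files.find() and the BasicFileAttributes calls come from the answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answer.
class RecentJpgFinder {

    // Returns every regular .jpg file under root modified in the last 24 hours.
    static List<Path> findRecentJpgs(Path root) throws IOException {
        long cutoff = System.currentTimeMillis() - 24L * 60 * 60 * 1000;
        // Files.find returns a lazy stream that keeps directories open, so close it.
        try (Stream<Path> found = Files.find(root, Integer.MAX_VALUE,
                (p, bfa) -> bfa.isRegularFile()
                        && p.getFileName().toString().endsWith(".jpg")
                        && bfa.lastModifiedTime().toMillis() > cutoff)) {
            return found.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        findRecentJpgs(root).forEach(System.out::println);
    }
}
```

Collecting into a list inside the try block means the caller never sees a stream that has already been closed.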

Solution 3 - Java

This is a very simple recursive method to get all files from a given root.

It uses the Java 7 NIO Path class.

private List<String> getFileNames(List<String> fileNames, Path dir) {
    try(DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
        for (Path path : stream) {
            if(path.toFile().isDirectory()) {
                getFileNames(fileNames, path);
            } else {
                fileNames.add(path.toAbsolutePath().toString());
                System.out.println(path.getFileName());
            }
        }
    } catch(IOException e) {
        e.printStackTrace();
    }
    return fileNames;
} 

Solution 4 - Java

With Java 7, a faster way to walk through a directory tree was introduced with the Paths and Files functionality. It is much faster than the "old" File way.

This would be the code to walk through a tree and check path names with a regular expression:

public final void test() throws IOException, InterruptedException {
	final Path rootDir = Paths.get("path to your directory where the walk starts");
	
	// Walk through the rootDir directory
	Files.walkFileTree(rootDir, new FileVisitor<Path>() {
		// First (minor) speed up. Compile regular expression pattern only one time.
		private Pattern pattern = Pattern.compile("^(.*?)");
		
		@Override
		public FileVisitResult preVisitDirectory(Path path,
				BasicFileAttributes atts) throws IOException {
			
			boolean matches = pattern.matcher(path.toString()).matches();
			
			// TODO: Put here your business logic when matches equals true/false
			
			return (matches)? FileVisitResult.CONTINUE:FileVisitResult.SKIP_SUBTREE;
		}

		@Override
		public FileVisitResult visitFile(Path path, BasicFileAttributes mainAtts)
				throws IOException {
			
			boolean matches = pattern.matcher(path.toString()).matches();
			
			// TODO: Put here your business logic when matches equals true/false
			
			return FileVisitResult.CONTINUE;
		}

		@Override
		public FileVisitResult postVisitDirectory(Path path,
				IOException exc) throws IOException {
			// TODO Auto-generated method stub
			return FileVisitResult.CONTINUE;
		}

		@Override
		public FileVisitResult visitFileFailed(Path path, IOException exc)
				throws IOException {
			exc.printStackTrace();

			// If the root directory has failed it makes no sense to continue
			return path.equals(rootDir)? FileVisitResult.TERMINATE:FileVisitResult.CONTINUE;
		}
	});
}

Solution 5 - Java

The fast way to get the content of a directory using Java 7 NIO :

import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.FileSystems;
import java.nio.file.Path;

...

Path dir = FileSystems.getDefault().getPath( filePath );
// try-with-resources closes the stream even if the loop throws
try (DirectoryStream<Path> stream = Files.newDirectoryStream( dir )) {
   for (Path path : stream) {
      System.out.println( path.getFileName() );
   }
}

Solution 6 - Java

Java's interface for reading filesystem folder contents is not very performant (as you've discovered). JDK 7 fixes this with a completely new interface for this sort of thing, which should bring native level performance to these sorts of operations.

The core issue is that Java makes a native system call for every single file. On a low latency interface, this is not that big of a deal, but on a network with even moderate latency, it really adds up. If you profile your algorithm above, you'll find that the bulk of the time is spent in the pesky isDirectory() call. That's because you are incurring a round trip for every single call to isDirectory(). Most modern OSes can provide this sort of information when the list of files/folders is originally requested (as opposed to querying each individual file path for its properties).
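
As a rough sketch of what the JDK 7 interface looks like in practice: Files.walkFileTree() hands each file's BasicFileAttributes to the visitor, so the type check and the modification time can be read from that object instead of issuing a separate isDirectory() or lastModified() call per file. Whether this actually saves a network round trip depends on the platform and file system provider. The class name AttributeWalk and the method name collect are invented for this example.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class AttributeWalk {

    // Maps each regular file whose name matches namePattern to its modification
    // time, using only the attributes the walker passes to visitFile.
    static Map<Path, FileTime> collect(Path root, Pattern namePattern) throws IOException {
        Map<Path, FileTime> result = new LinkedHashMap<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()
                        && namePattern.matcher(file.getFileName().toString()).matches()) {
                    result.put(file, attrs.lastModifiedTime());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable entries instead of aborting the whole walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        collect(root, Pattern.compile(".*\\.txt"))
                .forEach((path, time) -> System.out.println(path + "  " + time));
    }
}
```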

If you can't wait for JDK7, one strategy for addressing this latency is to go multi-threaded and use an ExecutorService with a maximum # of threads to perform your recursion. It's not great (you have to deal with locking of your output data structures), but it'll be a heck of a lot faster than doing this single threaded.
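
One possible shape for the multi-threaded strategy is sketched below, assuming a fixed-size pool and one task per directory. The names ParallelLister, list and scan are hypothetical. A concurrent queue stands in for explicit locking of the output, and a counter of outstanding directories signals when the walk is complete, so no task ever blocks waiting for its children. The sketch uses DirectoryStream for brevity; on a pre-JDK 7 runtime the same structure would work with File.listFiles().

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class, not part of the original answer.
class ParallelLister {

    // Lists all non-directory entries under root using at most maxThreads workers.
    static List<Path> list(Path root, int maxThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        Queue<Path> files = new ConcurrentLinkedQueue<>(); // thread-safe output
        AtomicInteger pending = new AtomicInteger(1);      // directories not yet finished
        CountDownLatch done = new CountDownLatch(1);
        try {
            scan(root, pool, files, pending, done);
            done.await(); // released when the last directory has been scanned
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(files);
    }

    private static void scan(Path dir, ExecutorService pool, Queue<Path> files,
                             AtomicInteger pending, CountDownLatch done) {
        pool.execute(() -> {
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path entry : stream) {
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        pending.incrementAndGet(); // count the child before queuing it
                        scan(entry, pool, files, pending, done);
                    } else {
                        files.add(entry);
                    }
                }
            } catch (IOException | DirectoryIteratorException e) {
                // Unreadable directory: skip it and keep going.
            } finally {
                if (pending.decrementAndGet() == 0) {
                    done.countDown();
                }
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        list(root, 8).forEach(System.out::println);
    }
}
```

The order of the returned paths is not deterministic, so sort the list if a stable order matters.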

In all of your discussions about this sort of thing, I highly recommend that you compare against the best you could do using native code (or even a command line script that does roughly the same thing). Saying that it takes an hour to traverse a network structure doesn't really mean that much. Telling us that you can do it natively in 7 seconds, but it takes an hour in Java, will get people's attention.

Solution 7 - Java

This works just fine, and it's recursive:

File root = new File("ROOT PATH");
getFilesRecursive(root);


private static void getFilesRecursive(File pFile)
{
	File[] children = pFile.listFiles();
	if (children == null)
	{
		return; // not a directory, or it could not be read
	}
	for(File file : children)
	{
		if(file.isDirectory())
		{
			getFilesRecursive(file);
		}
		else
		{
			// do your thing, e.g. save the file in a HashMap
			// and use it as per your requirement
		}
	}
}

Solution 8 - Java

I personally like this version of FileUtils. Here's an example that finds all mp3s or flacs in a directory or any of its subdirectories:

String[] types = {"mp3", "flac"};
Collection<File> files2 = FileUtils.listFiles(new File("/path/to/your/dir"), types, true);

Solution 9 - Java

This will work fine

public void displayAll(File path){		
	if(path.isFile()){
		System.out.println(path.getName());
	}else{
		System.out.println(path.getName());			
		File files[] = path.listFiles();
		for(File dirOrFile: files){
			displayAll(dirOrFile);
		}
	}
}

Solution 10 - Java

Java 8

public static void main(String[] args) throws IOException {

    Path start = Paths.get("C:\\data\\");
    try (Stream<Path> stream = Files.walk(start, Integer.MAX_VALUE)) {
        List<String> collect = stream
            .map(String::valueOf)
            .sorted()
            .collect(Collectors.toList());

        collect.forEach(System.out::println);
    }
}

Solution 11 - Java

This function lists the name and path of every file in a directory and its subdirectories.

public void listFile(String pathname) {
	File f = new File(pathname);
	File[] listfiles = f.listFiles();
	for (int i = 0; i < listfiles.length; i++) {
		if (listfiles[i].isDirectory()) {
			File[] internalFile = listfiles[i].listFiles();
			for (int j = 0; j < internalFile.length; j++) {
				System.out.println(internalFile[j]);
				if (internalFile[j].isDirectory()) {
					String name = internalFile[j].getAbsolutePath();
					listFile(name);
				}

			}
		} else {
			System.out.println(listfiles[i]);
		}

	}

}

Solution 12 - Java

public class GetFilesRecursive {
	public static List <String> getFilesRecursively(File dir){
		List <String> ls = new ArrayList<String>();
		for (File fObj : dir.listFiles()) {
			if(fObj.isDirectory()) {
				ls.add(String.valueOf(fObj));
				ls.addAll(getFilesRecursively(fObj));				
			} else {
				ls.add(String.valueOf(fObj));		
			}
		}

		return ls;
	}
	public static List <String> getListOfFiles(String fullPathDir) {
		List <String> ls = new ArrayList<String> ();
		File f = new File(fullPathDir);
		if (f.exists()) {
			if(f.isDirectory()) {
				ls.add(String.valueOf(f));
				ls.addAll(getFilesRecursively(f));
			}
		} else {
			ls.add(fullPathDir);
		}
		return ls;
	}

	public static void main(String[] args) {
		List <String> ls = getListOfFiles("/Users/srinivasab/Documents");
		for (String file:ls) {
			System.out.println(file);
		}
		System.out.println(ls.size());
	}
}

Solution 13 - Java

> it feels like it's stupid access the filesystem and get the contents for every subdirectory instead of getting everything at once.

Your feeling is wrong. That's how filesystems work. There is no faster way (except when you have to do this repeatedly or for different patterns, you can cache all the file paths in memory, but then you have to deal with cache invalidation i.e. what happens when files are added/removed/renamed while the app runs).
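
The caching idea mentioned in parentheses could look roughly like this: walk the tree once, keep the paths in memory, and run any number of patterns against the cached list. PathCache and matching are hypothetical names, and the sketch deliberately ignores cache invalidation, so the list goes stale as soon as files change on disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answer.
class PathCache {

    private final List<String> paths;

    // Walks the tree once and remembers every regular file path.
    // Files.walk may throw UncheckedIOException if a directory cannot be read.
    PathCache(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.filter(Files::isRegularFile)
                    .map(Path::toString)
                    .collect(Collectors.toList());
        }
    }

    // Filters the cached paths in memory; no file system access happens here.
    List<String> matching(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return paths.stream()
                .filter(p -> pattern.matcher(p).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        PathCache cache = new PathCache(Paths.get(args.length > 0 ? args[0] : "."));
        cache.matching(".*\\.java").forEach(System.out::println);
        cache.matching(".*\\.txt").forEach(System.out::println);
    }
}
```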

Solution 14 - Java

Just so you know isDirectory() is quite a slow method. I'm finding it quite slow in my file browser. I'll be looking into a library to replace it with native code.

Solution 15 - Java

The most efficient way I found of dealing with millions of folders and files is to capture the directory listing through a DOS command into a file and parse it. Once you have the parsed data, you can do analysis and compute statistics.
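
A minimal sketch of this approach, assuming Windows and the built-in dir command with the /s (recurse), /b (bare paths) and /a-d (exclude directories) switches. As a variation on the answer, it reads the command's output directly instead of going through an intermediate file. DirCommandListing, parse and listWithDir are hypothetical names, and the default charset may not match the console code page for non-ASCII file names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class DirCommandListing {

    // Parses a bare listing: one path per line, blank lines ignored.
    static List<String> parse(BufferedReader reader) throws IOException {
        List<String> paths = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isEmpty()) {
                paths.add(line);
            }
        }
        return paths;
    }

    // Windows only: runs "dir /s /b /a-d <root>" and parses what it prints.
    static List<String> listWithDir(String root) throws IOException, InterruptedException {
        ProcessBuilder builder =
                new ProcessBuilder("cmd", "/c", "dir", "/s", "/b", "/a-d", root);
        // Send error messages to our own stderr so they are not parsed as paths.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            List<String> paths = parse(reader);
            process.waitFor();
            return paths;
        }
    }

    public static void main(String[] args) throws Exception {
        listWithDir(args.length > 0 ? args[0] : ".").forEach(System.out::println);
    }
}
```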

Solution 16 - Java

import java.io.*;

public class MultiFolderReading {

    public void checkNoOfFiles(String filename) throws IOException {

        File dir = new File(filename);
        File[] files = dir.listFiles(); // files array stores the list of files

        for (int i = 0; i < files.length; i++) {
            if (files[i].isFile()) { // check whether files[i] is a file or a directory
                System.out.println("File::" + files[i].getName());
                System.out.println();
            } else if (files[i].isDirectory()) {
                System.out.println("Directory::" + files[i].getName());
                System.out.println();
                checkNoOfFiles(files[i].getAbsolutePath());
            }
        }
    }

    public static void main(String[] args) throws IOException {

        MultiFolderReading mf = new MultiFolderReading();
        String str = "E:\\file";
        mf.checkNoOfFiles(str);
    }
}

Solution 17 - Java

In Guava you don't have to wait for a Collection to be returned to you but can actually iterate over the files. It is easy to imagine an IDoSomethingWithThisFile interface in the signature of the function below:

public static void collectFilesInDir(File dir) {
    TreeTraverser<File> traverser = Files.fileTreeTraverser();
    FluentIterable<File> filesInPreOrder = traverser.preOrderTraversal(dir);
    for (File f: filesInPreOrder)
        System.out.printf("File: %s\n", f.getPath());
}

TreeTraverser also allows you to choose between various traversal styles.

Solution 18 - Java

Another, optimized version of the code:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class GetFilesRecursive {
	public static List <String> getFilesRecursively(File dir){
		List <String> ls = new ArrayList<String>();
		if (dir.isDirectory())
			for (File fObj : dir.listFiles()) {
				if(fObj.isDirectory()) {
					ls.add(String.valueOf(fObj));
					ls.addAll(getFilesRecursively(fObj));				
				} else {
					ls.add(String.valueOf(fObj));		
				}
			}
		else
			ls.add(String.valueOf(dir));

		return ls;
	}

	public static void main(String[] args) {
		List <String> ls = getFilesRecursively(new File("/Users/srinivasab/Documents"));
		for (String file:ls) {
			System.out.println(file);
		}
		System.out.println(ls.size());
	}
}

Solution 19 - Java

One more example of listing files and directories using Java 8 filter

public static void main(String[] args) {

    System.out.println("Files!!");
    // try-with-resources closes the directory handles held by the stream
    try (Stream<Path> paths = Files.walk(Paths.get("."))) {
        paths.filter(Files::isRegularFile)
                .filter(c -> {
                    // endsWith avoids the substring() exception on short names
                    String name = c.getFileName().toString();
                    return name.endsWith(".jpg") || name.endsWith(".jpeg");
                })
                .forEach(System.out::println);
    } catch (IOException e) {
        System.out.println("Could not walk the directory: " + e.getMessage());
    }

    System.out.println("\nDirectories!!\n");
    try (Stream<Path> paths = Files.walk(Paths.get("."))) {
        paths.filter(Files::isDirectory)
                .forEach(System.out::println);
    } catch (IOException e) {
        System.out.println("Could not walk the directory: " + e.getMessage());
    }
}

Solution 20 - Java

I tested several methods on a test folder containing 60K files in 284 folders on Windows 11:

public class App {

    public static void main(String[] args) throws Exception {
        Path path = Paths.get("E:\\书籍");
        // 1.walkFileTree
        long start1 = System.currentTimeMillis();
        Files.walkFileTree(path, new SimpleFileVisitor<Path>() {

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // if(pathMatcher.matches(file))
                // files.add(file.toFile());

                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // System.out.println(dir.getFileName());
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                return FileVisitResult.CONTINUE;
            }

        });
        long end1 = System.currentTimeMillis();
        
        // 2.listFiles
        long start2 = System.currentTimeMillis();
        search(path.toFile());
        long end2 = System.currentTimeMillis();

        // 3.newDirectoryStream
        long start3 = System.currentTimeMillis();
        getFileNames(path);
        long end3 = System.currentTimeMillis();

        System.out.println("\rElapsed time:" + (end1 - start1));
        System.out.println("\rElapsed time:" + (end2 - start2));
        System.out.println("\rElapsed time:" + (end3 - start3));

    }


    private static void getFileNames(Path dir) {
        try(DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path path : stream) {
                if(Files.isDirectory(path)) {
                    getFileNames(path);
                }
            }
        } catch(IOException e) {
            e.printStackTrace();
        }
    } 

    public static void search(File file) {
        Queue<File> q = new LinkedList<>();
        q.offer(file);
        while (!q.isEmpty()) {
            try {
                for (File childfile : q.poll().listFiles()) {
                    // System.out.println(childfile.getName());
                    if (childfile.isDirectory()) {
                        q.offer(childfile);
                    }
                }
            } catch (Exception e) {

            }
        }
    }
}

result (msec):

walkFileTree    listFiles    newDirectoryStream
68              451          493
64              464          482
61              478          457
67              477          488
59              474          466

Known performance issues:

> If you profile your algorithm above, you'll find that the bulk of the time is spent in the pesky isDirectory() call - that's because you are incurring a round trip for every single call to isDirectory(). (Kevin Day)

  1. listFiles() creates a new File object for every entry

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author           Original Content on Stackoverflow
Question              Hultner                   View Question on Stackoverflow
Solution 1 - Java     skaffman                  View Answer on Stackoverflow
Solution 2 - Java     Bohemian                  View Answer on Stackoverflow
Solution 3 - Java     Dan                       View Answer on Stackoverflow
Solution 4 - Java     jboi                      View Answer on Stackoverflow
Solution 5 - Java     RealHowTo                 View Answer on Stackoverflow
Solution 6 - Java     Kevin Day                 View Answer on Stackoverflow
Solution 7 - Java     Prathamesh sawant         View Answer on Stackoverflow
Solution 8 - Java     thouliha                  View Answer on Stackoverflow
Solution 9 - Java     Mam's                     View Answer on Stackoverflow
Solution 10 - Java    Niraj Sonawane            View Answer on Stackoverflow
Solution 11 - Java    Vishal Mokal              View Answer on Stackoverflow
Solution 12 - Java    Sri                       View Answer on Stackoverflow
Solution 13 - Java    Michael Borgwardt         View Answer on Stackoverflow
Solution 14 - Java    Daniel Ryan               View Answer on Stackoverflow
Solution 15 - Java    Kiran                     View Answer on Stackoverflow
Solution 16 - Java    prajakta                  View Answer on Stackoverflow
Solution 17 - Java    Marcus Junius Brutus      View Answer on Stackoverflow
Solution 18 - Java    Sri                       View Answer on Stackoverflow
Solution 19 - Java    Uddhav P. Gautam          View Answer on Stackoverflow
Solution 20 - Java    motilyfor                 View Answer on Stackoverflow