Retrieving files from a directory that contains a large number of files
C#, File, Directory, GetFiles
C# Problem Overview
I have a directory that contains nearly 14,000,000 audio samples in *.wav format.
All plain storage, no subdirectories.
I want to loop through the files, but when I use DirectoryInfo.GetFiles()
on that folder the whole application freezes for minutes!
Can this be done another way? Perhaps read 1000 files, process them, then take the next 1000, and so on?
C# Solutions
Solution 1 - C#
Have you tried the EnumerateFiles method of the DirectoryInfo class?
As MSDN says:
> The EnumerateFiles and GetFiles methods differ as follows: When you
> use EnumerateFiles, you can start enumerating the collection of
> FileInfo objects before the whole collection is returned; when you
> use GetFiles, you must wait for the whole array of FileInfo objects to
> be returned before you can access the array. Therefore, when you are
> working with many files and directories, EnumerateFiles can be more
> efficient.
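To make the quoted difference concrete, here is a minimal sketch, assuming the samples match *.wav. The class name, method name and `limit` parameter are made up for illustration; only DirectoryInfo.EnumerateFiles comes from the answer.

```csharp
using System;
using System.IO;

public static class WavPreview
{
    // Hypothetical helper: sums the sizes of the first `limit` .wav files.
    // EnumerateFiles yields FileInfo objects lazily, so the loop can start
    // (and stop) before the rest of the directory has been read.
    public static long TotalBytesOfFirst(string folder, int limit)
    {
        long total = 0;
        int seen = 0;
        foreach (FileInfo file in new DirectoryInfo(folder).EnumerateFiles("*.wav"))
        {
            if (seen >= limit) break; // stop early; remaining entries are never fetched
            total += file.Length;
            seen++;
        }
        return total;
    }
}
```

With GetFiles the same loop could not run its first iteration until the whole array had been built.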
Solution 2 - C#
In .NET 4.0, Directory.EnumerateFiles(...) is IEnumerable<string> (rather than the string[] of Directory.GetFiles(...)), so it can stream entries rather than buffer them all; i.e.
foreach (var file in Directory.EnumerateFiles(path))
{
    // ...
}
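If you really want the "1000 at a time" shape from the question, the streaming enumeration can be grouped into batches. This is only a sketch: InBatches and ProcessAll are hypothetical helper names, and the batch size of 1000 and the *.wav pattern are taken from the question rather than being anything special.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchedEnumeration
{
    // Groups a lazy sequence into lists of at most `size` items.
    public static IEnumerable<List<string>> InBatches(IEnumerable<string> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var batch = new List<string>(size);
        foreach (string item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<string>(size);
            }
        }
        if (batch.Count > 0) yield return batch; // final partial batch
    }

    // Hands each batch of up to 1000 paths to the caller's callback.
    public static void ProcessAll(string path, Action<List<string>> processBatch)
    {
        foreach (List<string> batch in InBatches(Directory.EnumerateFiles(path, "*.wav"), 1000))
            processBatch(batch);
    }
}
```

Only one batch is held in memory at a time, so the 14M paths never have to exist as a single array.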
Solution 3 - C#
You are hitting a limitation of the Windows file system itself. When the number of files in a directory grows very large (and 14M is way beyond that threshold), accessing the directory becomes incredibly slow. It doesn't really matter whether you read one file at a time or 1000; the cost is in the directory access itself.
One way to solve this is to create subdirectories and break your files apart into groups. If each directory holds 1000-5000 files (I'm guessing, but you can experiment with actual numbers), then you should get decent performance opening, creating and deleting files.
This is why applications like Doxygen, which creates a file for every class, follow this scheme and put everything into 2 levels of subdirectories which use random names.
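A rough sketch of how such a split could look. All names here are made up, and the 64 x 64 layout is just one choice: it gives 4096 buckets, which for 14M files would average around 3,400 per directory if the names hash evenly. It picks the bucket from a hash of the file name rather than a random name, so a file's location can be recomputed later.

```csharp
using System;
using System.IO;

public static class Bucketing
{
    // FNV-1a style hash over the name's characters. Used instead of
    // string.GetHashCode, which is not guaranteed to be stable between runs.
    private static uint StableHash(string text)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in text)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // Maps a file name to root/xx/yy/name, with 64 x 64 = 4096 possible buckets.
    public static string BucketPathFor(string root, string fileName)
    {
        uint hash = StableHash(fileName);
        string level1 = (hash % 64).ToString("x2");
        string level2 = ((hash / 64) % 64).ToString("x2");
        return Path.Combine(root, level1, level2, fileName);
    }

    // One-off migration: moves every file from the flat folder into its bucket.
    public static void MoveIntoBuckets(string flatFolder, string root)
    {
        foreach (string file in Directory.EnumerateFiles(flatFolder))
        {
            string target = BucketPathFor(root, Path.GetFileName(file));
            Directory.CreateDirectory(Path.GetDirectoryName(target));
            File.Move(file, target);
        }
    }
}
```

The migration still has to walk the flat directory once, and moving files out of a directory while enumerating it is something I would test on a copy first.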
Solution 4 - C#
Use the Win32 API file-search functions (FindFirstFile / FindNextFile) to walk the directory one entry at a time instead of waiting for the whole listing.
You can also call Directory.GetFiles inside a System.Threading.Tasks.Task (TPL) to prevent your UI from freezing.
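A minimal sketch of the Task approach, assuming .NET 4.5 or later for Task.Run (on 4.0 you would use Task.Factory.StartNew). Note that it swaps GetFiles for EnumerateFiles so the background work can also be cancelled between entries; the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundScan
{
    // Runs the enumeration on a thread-pool thread so the calling (UI) thread stays free.
    public static Task<int> CountWavFilesAsync(string folder, CancellationToken token)
    {
        return Task.Run(() =>
        {
            int count = 0;
            foreach (string file in Directory.EnumerateFiles(folder, "*.wav"))
            {
                token.ThrowIfCancellationRequested();
                count++; // placeholder for real per-file processing
            }
            return count;
        }, token);
    }
}
```

This keeps the UI responsive, but the scan itself is no faster; it just happens somewhere the user does not feel it.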
Solution 5 - C#
Enjoy.
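// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
public List<string> LoadPathToAllFiles(string pathToFolder, int numberOfFilesToReturn)
{
    var dirInfo = new DirectoryInfo(pathToFolder);
    // EnumerateFiles is lazy, so Take() stops the scan after the requested number of files.
    return dirInfo.EnumerateFiles()
                  .Take(numberOfFilesToReturn)
                  .Select(f => f.FullName)
                  .ToList();
}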
Solution 6 - C#
I hit this issue of accessing a large number of files in a single directory a lot of the time. Subdirectories are a good option, but sometimes even they stop helping much. What I now do is create an index file: a text file with the names of all the files in the directory (provided I am the one creating files in that directory). I then read the index file and open the actual files from the directory for processing.
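Roughly, the index idea could look like the sketch below. The one-name-per-line format and every name in it are assumptions for illustration, not a description of my actual code.

```csharp
using System.Collections.Generic;
using System.IO;

public static class FileIndex
{
    // Writes the data file, then appends its name to a plain-text index (one name per line).
    public static void SaveWithIndex(string folder, string indexPath, string fileName, byte[] data)
    {
        File.WriteAllBytes(Path.Combine(folder, fileName), data);
        File.AppendAllText(indexPath, fileName + "\n");
    }

    // Streams names from the index line by line instead of listing the directory.
    public static IEnumerable<string> IndexedPaths(string folder, string indexPath)
    {
        foreach (string name in File.ReadLines(indexPath))
        {
            if (name.Length > 0)
                yield return Path.Combine(folder, name);
        }
    }
}
```

Opening a file by its full path does not require listing the directory, which is the whole point. The index only stays accurate if every writer goes through SaveWithIndex (or something like it).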