How to get all files under a specific directory in MATLAB?

MatlabFileRecursionFile IoDirectory

Matlab Problem Overview


I need to get all those files under D:\dic and loop over them to further process individually.

Does MATLAB support this kind of operations?

It can be done in other scripts like PHP,Python...

Matlab Solutions


Solution 1 - Matlab

Update: Given that this post is quite old, and I've modified this utility a lot for my own use during that time, I thought I should post a new version. My newest code can be found on The MathWorks File Exchange: dirPlus.m. You can also get the source from GitHub.

I made a number of improvements. It now gives you options to prepend the full path or return just the file name (incorporated from Doresoom and Oz Radiano) and apply a regular expression pattern to the file names (incorporated from Peter D). In addition, I added the ability to apply a validation function to each file, allowing you to select them based on criteria other than just their names (i.e. file size, content, creation date, etc.).


NOTE: In newer versions of MATLAB (R2016b and later), the dir function has recursive search capabilities! So you can do this to get a list of all *.m files in all subfolders of the current folder:

dirData = dir('**/*.m');

Old code: (for posterity)

Here's a function that searches recursively through all subdirectories of a given directory, collecting a list of all file names it finds:

function fileList = getAllFiles(dirName)

  dirData = dir(dirName);      %# Get the data for the current directory
  dirIndex = [dirData.isdir];  %# Find the index for directories
  fileList = {dirData(~dirIndex).name}';  %'# Get a list of the files
  if ~isempty(fileList)
    fileList = cellfun(@(x) fullfile(dirName,x),...  %# Prepend path to files
                       fileList,'UniformOutput',false);
  end
  subDirs = {dirData(dirIndex).name};  %# Get a list of the subdirectories
  validIndex = ~ismember(subDirs,{'.','..'});  %# Find index of subdirectories
                                               %#   that are not '.' or '..'
  for iDir = find(validIndex)                  %# Loop over valid subdirectories
    nextDir = fullfile(dirName,subDirs{iDir});    %# Get the subdirectory path
    fileList = [fileList; getAllFiles(nextDir)];  %# Recursively call getAllFiles
  end

end

After saving the above function somewhere on your MATLAB path, you can call it in the following way:

fileList = getAllFiles('D:\dic');

Solution 2 - Matlab

You're looking for dir to return the directory contents.

To loop over the results, you can simply do the following:

dirlist = dir('.');
for i = 1:length(dirlist)
    dirlist(i)
end

This should give you output in the following format, e.g.:

name: 'my_file'
date: '01-Jan-2010 12:00:00'
bytes: 56
isdir: 0
datenum: []

Solution 3 - Matlab

I used the code mentioned in this great answer and expanded it to support 2 additional parameters which I needed in my case. The parameters are file extensions to filter on and a flag indicating whether to concatenate the full path to the name of the file or not.

I hope it is clear enough and someone will finds it beneficial.

function fileList = getAllFiles(dirName, fileExtension, appendFullPath)

  dirData = dir([dirName '/' fileExtension]);      %# Get the data for the current directory
  dirWithSubFolders = dir(dirName);
  dirIndex = [dirWithSubFolders.isdir];  %# Find the index for directories
  fileList = {dirData.name}';  %'# Get a list of the files
  if ~isempty(fileList)
    if appendFullPath
      fileList = cellfun(@(x) fullfile(dirName,x),...  %# Prepend path to files
                       fileList,'UniformOutput',false);
    end
  end
  subDirs = {dirWithSubFolders(dirIndex).name};  %# Get a list of the subdirectories
  validIndex = ~ismember(subDirs,{'.','..'});  %# Find index of subdirectories
                                               %#   that are not '.' or '..'
  for iDir = find(validIndex)                  %# Loop over valid subdirectories
    nextDir = fullfile(dirName,subDirs{iDir});    %# Get the subdirectory path
    fileList = [fileList; getAllFiles(nextDir, fileExtension, appendFullPath)];  %# Recursively call getAllFiles
  end

end

Example for running the code:

fileList = getAllFiles(dirName, '*.xml', 0); %#0 is false obviously

Solution 4 - Matlab

You can use regexp or strcmp to eliminate . and .. Or you could use the isdir field if you only want files in the directory, not folders.

list=dir(pwd);  %get info of files/folders in current directory
isfile=~[list.isdir]; %determine index of files vs folders
filenames={list(isfile).name}; %create cell array of file names

or combine the last two lines:

filenames={list(~[list.isdir]).name};

For a list of folders in the directory excluding . and ..

dirnames={list([list.isdir]).name};
dirnames=dirnames(~(strcmp('.',dirnames)|strcmp('..',dirnames)));

From this point, you should be able to throw the code in a nested for loop, and continue searching each subfolder until your dirnames returns an empty cell for each subdirectory.

Solution 5 - Matlab

This answer does not directly answer the question but may be a good solution outside of the box.

I upvoted gnovice's solution, but want to offer another solution: Use the system dependent command of your operating system:

tic
asdfList = getAllFiles('../TIMIT_FULL/train');
toc
% Elapsed time is 19.066170 seconds.

tic
[status,cmdout] = system('find ../TIMIT_FULL/train/ -iname "*.wav"');
C = strsplit(strtrim(cmdout));
toc
% Elapsed time is 0.603163 seconds.

Positive:

  • Very fast (in my case for a database of 18000 files on linux).
  • You can use well tested solutions.
  • You do not need to learn or reinvent a new syntax to select i.e. *.wav files.

Negative:

  • You are not system independent.
  • You rely on a single string which may be hard to parse.

Solution 6 - Matlab

I don't know a single-function method for this, but you can use genpath to recurse a list of subdirectories only. This list is returned as a semicolon-delimited string of directories, so you'll have to separate it using strread, i.e.

dirlist = strread(genpath('/path/of/directory'),'%s','delimiter',';')

If you don't want to include the given directory, remove the first entry of dirlist, i.e. dirlist(1)=[]; since it is always the first entry.

Then get the list of files in each directory with a looped dir.

filenamelist=[];
for d=1:length(dirlist)
    % keep only filenames
    filelist=dir(dirlist{d});
    filelist={filelist.name};

    % remove '.' and '..' entries
    filelist([strmatch('.',filelist,'exact');strmatch('..',filelist,'exact'))=[];
    % or to ignore all hidden files, use filelist(strmatch('.',filelist))=[];

    % prepend directory name to each filename entry, separated by filesep*
    for f=1:length(filelist)
        filelist{f}=[dirlist{d} filesep filelist{f}];
    end

    filenamelist=[filenamelist filelist];
end

filesep returns the directory separator for the platform on which MATLAB is running.

This gives you a list of filenames with full paths in the cell array filenamelist. Not the neatest solution, I know.

Solution 7 - Matlab

This is a handy function for getting filenames, with the specified format (usually .mat) in a root folder!

    function filenames = getFilenames(rootDir, format)
        % Get filenames with specified `format` in given `foler` 
        %
        % Parameters
        % ----------
        % - rootDir: char vector
        %   Target folder
        % - format: char vector = 'mat'
        %   File foramt
        
        % default values
        if ~exist('format', 'var')
            format = 'mat';
        end
        
        format = ['*.', format];
        filenames = dir(fullfile(rootDir, format));
        filenames = arrayfun(...
            @(x) fullfile(x.folder, x.name), ...
            filenames, ...
            'UniformOutput', false ...
        );
    end

In your case, you can use the following snippet :)

filenames = getFilenames('D:/dic/**');
for i = 1:numel(filenames)
    filename = filenames{i};
    % do your job!
end

Solution 8 - Matlab

With little modification but almost similar approach to get the full file path of each sub folder

dataFolderPath = 'UCR_TS_Archive_2015/';

dirData = dir(dataFolderPath);      %# Get the data for the current directory
dirIndex = [dirData.isdir];  %# Find the index for directories
fileList = {dirData(~dirIndex).name}';  %'# Get a list of the files
if ~isempty(fileList)
    fileList = cellfun(@(x) fullfile(dataFolderPath,x),...  %# Prepend path to files
        fileList,'UniformOutput',false);
end
subDirs = {dirData(dirIndex).name};  %# Get a list of the subdirectories
validIndex = ~ismember(subDirs,{'.','..'});  %# Find index of subdirectories
%#   that are not '.' or '..'
for iDir = find(validIndex)                  %# Loop over valid subdirectories
    nextDir = fullfile(dataFolderPath,subDirs{iDir});    %# Get the subdirectory path
    getAllFiles = dir(nextDir);
    for k = 1:1:size(getAllFiles,1)
        validFileIndex = ~ismember(getAllFiles(k,1).name,{'.','..'});
        if(validFileIndex)
            filePathComplete = fullfile(nextDir,getAllFiles(k,1).name);
            fprintf('The Complete File Path: %s\n', filePathComplete);
        end
    end
end  

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionGtkerView Question on Stackoverflow
Solution 1 - MatlabgnoviceView Answer on Stackoverflow
Solution 2 - MatlabJames BView Answer on Stackoverflow
Solution 3 - MatlabOz RadianoView Answer on Stackoverflow
Solution 4 - MatlabDoresoomView Answer on Stackoverflow
Solution 5 - MatlabLukasView Answer on Stackoverflow
Solution 6 - MatlabJS NgView Answer on Stackoverflow
Solution 7 - MatlabYasView Answer on Stackoverflow
Solution 8 - MatlabTanmoy MondalView Answer on Stackoverflow