glob pattern matching in .NET

C#, .NET, Glob

C# Problem Overview


Is there a built-in mechanism in .NET to match patterns other than Regular Expressions? I'd like to match using UNIX style (glob) wildcards (* = any number of any character).

I'd like to use this for an end-user-facing control. I fear that permitting all RegEx capabilities will be very confusing.

C# Solutions


Solution 1 - C#

I like my code a little more semantic, so I wrote this extension method:

using System.Text.RegularExpressions;

namespace Whatever
{
    public static class StringExtensions
    {
        /// <summary>
        /// Compares the string against a given pattern.
        /// </summary>
        /// <param name="str">The string.</param>
        /// <param name="pattern">The pattern to match, where "*" means any sequence of characters, and "?" means any single character.</param>
        /// <returns><c>true</c> if the string matches the given pattern; otherwise <c>false</c>.</returns>
        public static bool Like(this string str, string pattern)
        {
            return new Regex(
                "^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$",
                RegexOptions.IgnoreCase | RegexOptions.Singleline
            ).IsMatch(str);
        }
    }
}

(change the namespace and/or copy the extension method to your own string extensions class)

Using this extension, you can write statements like this:

if (File.Name.Like("*.jpg"))
{
   ....
}

Just sugar to make your code a little more legible :-)

Solution 2 - C#

Just for the sake of completeness: since 2016, .NET Core has had a NuGet package called Microsoft.Extensions.FileSystemGlobbing that supports advanced globbing of paths.

Typical uses include searching nested folder structures and files with wildcards, which is very common in web development scenarios:

  • wwwroot/app/**/*.module.js
  • wwwroot/app/**/*.js

This works much like the patterns that .gitignore files use to determine which files to exclude from source control.
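As a rough sketch of how the package is typically used, assuming the Microsoft.Extensions.FileSystemGlobbing NuGet package is referenced: the Matcher class collects include and exclude patterns, and GetResultsInFullPath walks a directory and returns the matching files. The class name GlobbingSketch, the method name FindScripts, and the exclude pattern are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.FileSystemGlobbing;

public static class GlobbingSketch
{
    // Returns the full paths of all files under rootDirectory that match the
    // include pattern and are not ruled out by the exclude pattern.
    public static List<string> FindScripts(string rootDirectory)
    {
        var matcher = new Matcher();
        matcher.AddInclude("wwwroot/app/**/*.js");      // pattern taken from the list above
        matcher.AddExclude("wwwroot/app/**/*.min.js");  // assumed exclusion, for illustration only
        return matcher.GetResultsInFullPath(rootDirectory).ToList();
    }
}
```

Calling GlobbingSketch.FindScripts(projectRoot) would then list every non-minified .js file nested below wwwroot/app, wherever projectRoot points.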

Solution 3 - C#

I found the actual code for you:

Regex.Escape( wildcardExpression ).Replace( @"\*", ".*" ).Replace( @"\?", "." );
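To make the conversion concrete, here is a small sketch that wraps the expression above and anchors the result so the whole input has to match. The class and method names (WildcardRegexSketch, ToRegexPattern, IsMatch) are hypothetical, and the anchoring and regex options are assumptions borrowed from the other answers on this page.

```csharp
using System.Text.RegularExpressions;

public static class WildcardRegexSketch
{
    // Escapes every regex metacharacter first, then turns the escaped
    // wildcards back into their regex equivalents and anchors the result.
    public static string ToRegexPattern(string wildcardExpression)
    {
        string body = Regex.Escape(wildcardExpression)
            .Replace(@"\*", ".*")   // "*" => any sequence of characters
            .Replace(@"\?", ".");   // "?" => exactly one character
        return "^" + body + "$";
    }

    // Case-insensitive match of the whole input against the wildcard expression.
    public static bool IsMatch(string input, string wildcardExpression)
    {
        return Regex.IsMatch(
            input,
            ToRegexPattern(wildcardExpression),
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

With this sketch, ToRegexPattern("*.txt") returns ^.*\.txt$, so "notes.TXT" matches while "notes.txt.bak" does not.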

Solution 4 - C#

The 2- and 3-argument variants of the listing methods like GetFiles() and EnumerateDirectories() take a search string as their second argument that supports filename globbing, with both * and ?.

using System;
using System.IO;

class GlobTestMain
{
    static void Main(string[] args)
    {
        // The second argument is the search pattern; it supports * and ?.
        string[] exes = Directory.GetFiles(Environment.CurrentDirectory, "*.exe");
        foreach (string file in exes)
        {
            Console.WriteLine(Path.GetFileName(file));
        }
    }
}

would yield

GlobTest.exe
GlobTest.vshost.exe

The docs state that there are some caveats when matching extensions. They also state that 8.3 file names (which may be generated automatically behind the scenes) are matched, which can result in "duplicate" matches for some patterns.

The methods that support this are GetFiles(), GetDirectories(), and GetFileSystemEntries(). The Enumerate variants also support this.
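For the 3-argument variants, a minimal sketch might look like the following. It uses Directory.EnumerateFiles with SearchOption.AllDirectories to apply the search pattern in subdirectories as well. The class and method names (SearchPatternSketch, FindFileNames) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SearchPatternSketch
{
    // Returns the names (without directory) of all files under root,
    // including subdirectories, whose names match searchPattern.
    public static List<string> FindFileNames(string root, string searchPattern)
    {
        return Directory
            .EnumerateFiles(root, searchPattern, SearchOption.AllDirectories)
            .Select(path => Path.GetFileName(path))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

Note that the pattern is applied to file names only; it cannot contain wildcards in directory parts, which is where the other solutions on this page come in.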

Solution 5 - C#

If you use VB.NET, you can use the Like operator, which has glob-like syntax.

http://www.getdotnetcode.com/gdncstore/free/Articles/Intoduction%20to%20the%20VB%20NET%20Like%20Operator.htm

Solution 6 - C#

I wrote a FileSelector class that does selection of files based on filenames. It also selects files based on time, size, and attributes. If you just want filename globbing then you express the name in forms like "*.txt" and similar. If you want the other parameters then you specify a boolean logic statement like "name = *.xls and ctime < 2009-01-01" - implying an .xls file created before January 1st 2009. You can also select based on the negative: "name != *.xls" means all files that are not xls.

Check it out. Open source. Liberal license. Free to use elsewhere.

Solution 7 - C#

I have written a globbing library for .NET Standard, with tests and benchmarks. My goal was to produce a library for .NET with minimal dependencies that doesn't use Regex and outperforms it.

You can find it here:

Solution 8 - C#

If you want to avoid regular expressions this is a basic glob implementation:

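using System;

public static class Globber
{
    // Case-insensitive match of value against a pattern in which
    // '*' matches any run of characters and '?' matches exactly one.
    public static bool Glob(this string value, string pattern)
    {
        int pos = 0;

        while (pattern.Length != pos)
        {
            switch (pattern[pos])
            {
                case '?':
                    // '?' must consume one character, so fail if value has run out.
                    if (value.Length <= pos)
                    {
                        return false;
                    }
                    break;

                case '*':
                    // Try every possible tail of value against the rest of the pattern.
                    for (int i = value.Length; i >= pos; i--)
                    {
                        if (Glob(value.Substring(i), pattern.Substring(pos + 1)))
                        {
                            return true;
                        }
                    }
                    return false;

                default:
                    if (value.Length <= pos || char.ToUpper(pattern[pos]) != char.ToUpper(value[pos]))
                    {
                        return false;
                    }
                    break;
            }

            pos++;
        }

        return value.Length == pos;
    }
}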

Use it like this:

Assert.IsTrue("text.txt".Glob("*.txt"));
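If the recursion and Substring allocations are a concern, the same * and ? semantics can also be sketched iteratively. This is a different technique from the code above (a single pass that remembers the last * and backtracks to it on a mismatch), not the answer author's method. The names IterativeGlob and IsMatch are hypothetical.

```csharp
public static class IterativeGlob
{
    // Case-insensitive wildcard match: '*' matches any run of characters,
    // '?' matches exactly one character.
    public static bool IsMatch(string value, string pattern)
    {
        int v = 0, p = 0;       // current positions in value and pattern
        int starP = -1;         // position of the most recent '*' in pattern
        int starV = 0;          // position in value where that '*' currently stops

        while (v < value.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Remember the star; initially let it match nothing.
                starP = p++;
                starV = v;
            }
            else if (p < pattern.Length &&
                     (pattern[p] == '?' ||
                      char.ToUpperInvariant(pattern[p]) == char.ToUpperInvariant(value[v])))
            {
                p++;
                v++;
            }
            else if (starP >= 0)
            {
                // Mismatch: let the last star swallow one more character and retry.
                p = starP + 1;
                v = ++starV;
            }
            else
            {
                return false;
            }
        }

        // Only trailing stars may remain in the pattern.
        while (p < pattern.Length && pattern[p] == '*')
        {
            p++;
        }
        return p == pattern.Length;
    }
}
```

For example, IterativeGlob.IsMatch("text.txt", "*.txt") is true, while IterativeGlob.IsMatch("", "?") is false.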

Solution 9 - C#

Based on previous posts, I threw together a C# class:

using System;
using System.Text.RegularExpressions;

public class FileWildcard
{
    Regex mRegex;

    public FileWildcard(string wildcard)
    {
        string pattern = string.Format("^{0}$", Regex.Escape(wildcard)
            .Replace(@"\*", ".*").Replace(@"\?", "."));
        mRegex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
    public bool IsMatch(string filenameToCompare)
    {
        return mRegex.IsMatch(filenameToCompare);
    }
}

Using it would go something like this:

FileWildcard w = new FileWildcard("*.txt");
if (w.IsMatch("Doug.Txt"))
   Console.WriteLine("We have a match");

The matching is NOT the same as the System.IO.Directory.GetFiles() method, so don't use them together.

Solution 10 - C#

From C# you can use .NET's LikeOperator.LikeString method. That's the backing implementation for VB's LIKE operator. It supports patterns using *, ?, #, [charlist], and [!charlist].

You can use the LikeString method from C# by adding a reference to the Microsoft.VisualBasic.dll assembly, which is included with every version of the .NET Framework. Then you invoke the LikeString method just like any other static .NET method:

using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
...
bool isMatch = LikeOperator.LikeString("I love .NET!", "I love *", CompareMethod.Text);
// isMatch should be true.
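The other pattern elements listed above can be tried the same way. The following is a small sketch, assuming the Microsoft.VisualBasic assembly is referenced; the wrapper class LikeSketch and its method IsLike are hypothetical names used only to keep the calls short.

```csharp
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;

public static class LikeSketch
{
    // Thin wrapper around LikeOperator.LikeString using text
    // (case-insensitive) comparison.
    public static bool IsLike(string input, string pattern)
    {
        return LikeOperator.LikeString(input, pattern, CompareMethod.Text);
    }
}
```

With this wrapper, IsLike("file7.txt", "file#.txt") is true because # matches a single digit, IsLike("fileA.txt", "file#.txt") is false, IsLike("cat.txt", "[bc]at.*") is true, and IsLike("rat.txt", "[!bc]at.*") is true because [!bc] matches any single character other than b or c.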

Solution 11 - C#

https://www.nuget.org/packages/Glob.cs

https://github.com/mganss/Glob.cs

A GNU Glob for .NET.

You can get rid of the package reference after installing and just compile the single Glob.cs source file.

And because it is an implementation of GNU Glob, it is cross-platform, and cross-language once you find a similar implementation for the other language. Enjoy!

Solution 12 - C#

I don't know if the .NET framework has glob matching, but couldn't you replace the * with .*? and use regexes?

Solution 13 - C#

Just out of curiosity I glanced at Microsoft.Extensions.FileSystemGlobbing, and it was dragging in quite large dependencies on quite a few libraries, so I wondered why I couldn't try to write something similar myself.

Well, easier said than done. I quickly noticed that it was not such a trivial function after all: for example, "*.txt" should match files only in the current directory, while "**.txt" should also harvest subfolders.

Microsoft also tests some odd pattern sequences like "./*.txt". I'm not sure who actually needs the "./" kind of string, since it is removed anyway during processing. (https://github.com/aspnet/FileSystem/blob/dev/test/Microsoft.Extensions.FileSystemGlobbing.Tests/PatternMatchingTests.cs)

Anyway, I've coded my own function. There will be two copies of it: one in SVN (I might fix bugs there later on), and one copied here for demo purposes. I recommend copying from the SVN link.

SVN Link:

https://sourceforge.net/p/syncproj/code/HEAD/tree/SolutionProjectBuilder.cs#l800 (Search for matchFiles function if not jumped correctly).

And here is the local copy of the function:

/// <summary>
/// Matches files from folder _dir using a glob file pattern.
/// In glob file pattern matching, * matches any file or folder name, ** matches any path (including sub-folders),
/// and ? matches any single character.
///
/// There is also a third-party library for similar matching, 'Microsoft.Extensions.FileSystemGlobbing',
/// but it was dragging in a lot of dependencies, so I decided to do without it.
/// </summary>
/// <returns>List of files that match the pattern, relative to _dir</returns>
static public String[] matchFiles( String _dir, String filePattern )
{
    if (filePattern.IndexOfAny(new char[] { '*', '?' }) == -1)      // Fast path: with no wildcard, the pattern is simply a file path.
    {
        String path = Path.Combine(_dir, filePattern);
        if (File.Exists(path))
            return new String[] { filePattern };
        return new String[] { };
    }

    String dir = Path.GetFullPath(_dir);        // Make it absolute, so we can extract relative paths later on.
    String[] pattParts = filePattern.Replace("/", "\\").Split('\\');
    List<String> scanDirs = new List<string>();
    scanDirs.Add(dir);

    //
    //  By default, "*" in a glob pattern matches any file / folder name,
    //  i.e. any characters except the folder separator - in regex that's "[^\\]*".
    //  Glob matching also allows a double asterisk "**", which recurses into subfolders.
    //  Here we split the pattern into its parts and match each part separately.
    //
    for (int iPatt = 0; iPatt < pattParts.Length; iPatt++)
    {
        bool bIsLast = iPatt == (pattParts.Length - 1);
        bool bRecurse = false;

        String regex1 = Regex.Escape(pattParts[iPatt]);         // Escape special regex control characters ("*" => "\*", "." => "\.")
        String pattern = Regex.Replace(regex1, @"\\\*(\\\*)?", delegate (Match m)
            {
                if (m.ToString().Length == 4)   // "**" => "\*\*" (escaped) - we need to recurse into sub-folders.
                {
                    bRecurse = true;
                    return ".*";
                }
                else
                    return @"[^\\]*";
            }).Replace(@"\?", ".");

        if (pattParts[iPatt] == "..")                           // Special kind of control, just to scan upper folder.
        {
            for (int i = 0; i < scanDirs.Count; i++)
                scanDirs[i] = scanDirs[i] + "\\..";
            
            continue;
        }
            
        // Anchor the pattern so that e.g. "*.txt" does not also match "a.txt.bak".
        Regex re = new Regex("^" + pattern + "$", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        int nScanItems = scanDirs.Count;
        for (int i = 0; i < nScanItems; i++)
        {
            String[] items;
            if (!bIsLast)
                items = Directory.GetDirectories(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
            else
                items = Directory.GetFiles(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);

            foreach (String path in items)
            {
                String matchSubPath = path.Substring(scanDirs[i].Length + 1);
                if (re.Match(matchSubPath).Success)
                    scanDirs.Add(path);
            }
        }
        scanDirs.RemoveRange(0, nScanItems);    // Remove items what we have just scanned.
    } //for

    //  Make relative and return.
    return scanDirs.Select( x => x.Substring(dir.Length + 1) ).ToArray();
} //matchFiles

If you find any bugs, I'll be glad to fix them.

Solution 14 - C#

I wrote a solution that does it. It does not depend on any library and it does not support "!" or "[]" operators. It supports the following search patterns:

C:\Logs\*.txt

C:\Logs\**\*P1?\**\asd*.pdf

    /// <summary>
    /// Finds files for the given glob path. It supports ** * and ? operators. It does not support !, [] or ![] operators
    /// </summary>
    /// <param name="path">the path</param>
    /// <returns>The files that match the glob</returns>
    private ICollection<FileInfo> FindFiles(string path)
    {
        List<FileInfo> result = new List<FileInfo>();
        //The name of the file can be any but the following chars '<','>',':','/','\','|','?','*','"'
        const string folderNameCharRegExp = @"[^\<\>:/\\\|\?\*" + "\"]";
        const string folderNameRegExp = folderNameCharRegExp + "+";
        //We obtain the file pattern
        string filePattern = Path.GetFileName(path);
        List<string> pathTokens = new List<string>(Path.GetDirectoryName(path).Split('\\', '/'));
        //We obtain the root path from which the rest of the files will be obtained
        string rootPath = null;
        bool containsWildcardsInDirectories = false;
        for (int i = 0; i < pathTokens.Count; i++)
        {
            if (!pathTokens[i].Contains("*")
                && !pathTokens[i].Contains("?"))
            {
                if (rootPath != null)
                    rootPath += "\\" + pathTokens[i];
                else
                    rootPath = pathTokens[i];
                pathTokens.RemoveAt(0);
                i--;
            }
            else
            {
                containsWildcardsInDirectories = true;
                break;
            }
        }
        if (Directory.Exists(rootPath))
        {
            //We build the regular expression that the folders should match
            string regularExpression = rootPath.Replace("\\", "\\\\").Replace(":", "\\:").Replace(" ", "\\s");
            foreach (string pathToken in pathTokens)
            {
                if (pathToken == "**")
                {
                    regularExpression += string.Format(CultureInfo.InvariantCulture, @"(\\{0})*", folderNameRegExp);
                }
                else
                {
                    regularExpression += @"\\" + pathToken.Replace("*", folderNameCharRegExp + "*").Replace(" ", "\\s").Replace("?", folderNameCharRegExp);
                }
            }
            Regex globRegEx = new Regex(regularExpression, RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
            string[] directories = Directory.GetDirectories(rootPath, "*", containsWildcardsInDirectories ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
            foreach (string directory in directories)
            {
                if (globRegEx.Matches(directory).Count > 0)
                {
                    DirectoryInfo directoryInfo = new DirectoryInfo(directory);
                    result.AddRange(directoryInfo.GetFiles(filePattern));
                }
            }
       
        }
        return result;
    }

Solution 15 - C#

Unfortunately the accepted answer will not handle escaped input correctly, because string .Replace("\*", ".*") fails to distinguish between "*" and "\*" - it will happily replace "*" in both of these strings, leading to incorrect results.

Instead, a basic tokenizer can be used to convert the glob path into a regex pattern, which can then be matched against a filename using Regex.Match. This is a more robust and flexible solution.

Here is a method to do this. It handles ?, *, and **, and surrounds each of these globs with a capture group, so the values of each glob can be inspected after the Regex has been matched.

static string GlobbedPathToRegex(ReadOnlySpan<char> pattern, ReadOnlySpan<char> dirSeparatorChars)
{
    StringBuilder builder = new StringBuilder();
    builder.Append('^');

    ReadOnlySpan<char> remainder = pattern;

    while (remainder.Length > 0)
    {
        int specialCharIndex = remainder.IndexOfAny('*', '?');

        if (specialCharIndex >= 0)
        {
            ReadOnlySpan<char> segment = remainder.Slice(0, specialCharIndex);

            if (segment.Length > 0)
            {
                string escapedSegment = Regex.Escape(segment.ToString());
                builder.Append(escapedSegment);
            }

            char currentCharacter = remainder[specialCharIndex];
            char nextCharacter = specialCharIndex < remainder.Length - 1 ? remainder[specialCharIndex + 1] : '\0';

            switch (currentCharacter)
            {
                case '*':
                    if (nextCharacter == '*')
                    {
                        // We have a ** glob expression
                        // Match any character, 0 or more times.
                        builder.Append("(.*)");

                        // Skip over **
                        remainder = remainder.Slice(specialCharIndex + 2);
                    }
                    else
                    {
                        // We have a * glob expression
                        // Match any character that isn't a dirSeparatorChar, 0 or more times.
                        if(dirSeparatorChars.Length > 0) {
                            builder.Append($"([^{Regex.Escape(dirSeparatorChars.ToString())}]*)");
                        }
                        else {
                            builder.Append("(.*)");
                        }

                        // Skip over *
                        remainder = remainder.Slice(specialCharIndex + 1);
                    }
                    break;
                case '?':
                    builder.Append("(.)"); // Regex equivalent of ?

                    // Skip over ?
                    remainder = remainder.Slice(specialCharIndex + 1);
                    break;
            }
        }
        else
        {
            // No more special characters, append the rest of the string
            string escapedSegment = Regex.Escape(remainder.ToString());
            builder.Append(escapedSegment);
            remainder = ReadOnlySpan<char>.Empty;
        }
    }

    builder.Append('$');

    return builder.ToString();
}

Then, to use it:

string testGlobPathInput = "/Hello/Test/Blah/**/test*123.fil?";
string globPathRegex = GlobbedPathToRegex(testGlobPathInput, "/"); // Could use "\\/" directory separator chars on Windows

Console.WriteLine($"Globbed path: {testGlobPathInput}");
Console.WriteLine($"Regex conversion: {globPathRegex}");

string testPath = "/Hello/Test/Blah/All/Hail/The/Hypnotoad/test_somestuff_123.file";
Console.WriteLine($"Test Path: {testPath}");
var regexGlobPathMatch = Regex.Match(testPath, globPathRegex);

Console.WriteLine($"Match: {regexGlobPathMatch.Success}");

for(int i = 0; i < regexGlobPathMatch.Groups.Count; i++) {
    Console.WriteLine($"Group [{i}]: {regexGlobPathMatch.Groups[i]}");
}

Output:

Globbed path: /Hello/Test/Blah/**/test*123.fil?
Regex conversion: ^/Hello/Test/Blah/(.*)/test([^/]*)123\.fil(.)$
Test Path: /Hello/Test/Blah/All/Hail/The/Hypnotoad/test_somestuff_123.file
Match: True
Group [0]: /Hello/Test/Blah/All/Hail/The/Hypnotoad/test_somestuff_123.file
Group [1]: All/Hail/The/Hypnotoad
Group [2]: _somestuff_
Group [3]: e

I have created a gist here as a canonical version of this method:

https://gist.github.com/crozone/9a10156a37c978e098e43d800c6141ad

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      | Original Author       | Original Content on Stackoverflow
Question          | dmo                   | View Question on Stackoverflow
Solution 1 - C#   | mindplay.dk           | View Answer on Stackoverflow
Solution 2 - C#   | cleftheris            | View Answer on Stackoverflow
Solution 3 - C#   | Jonathan C Dickinson  | View Answer on Stackoverflow
Solution 4 - C#   | Dan Mangiarelli       | View Answer on Stackoverflow
Solution 5 - C#   | torial                | View Answer on Stackoverflow
Solution 6 - C#   | Cheeso                | View Answer on Stackoverflow
Solution 7 - C#   | Darrell               | View Answer on Stackoverflow
Solution 8 - C#   | Tony Edgecombe        | View Answer on Stackoverflow
Solution 9 - C#   | Doug Clutter          | View Answer on Stackoverflow
Solution 10 - C#  | Bill Menees           | View Answer on Stackoverflow
Solution 11 - C#  | Matthew Sheeran       | View Answer on Stackoverflow
Solution 12 - C#  | Ferruccio             | View Answer on Stackoverflow
Solution 13 - C#  | TarmoPikaro           | View Answer on Stackoverflow
Solution 14 - C#  | Jon                   | View Answer on Stackoverflow
Solution 15 - C#  | Ryan                  | View Answer on Stackoverflow