How to make a valid Windows filename from an arbitrary string?

C#, Windows, Filenames

C# Problem Overview


I've got a string like "Foo: Bar" that I want to use as a filename, but on Windows the ":" char isn't allowed in a filename.

Is there a method that will turn "Foo: Bar" into something like "Foo- Bar"?

C# Solutions


Solution 1 - C#

Try something like this:

string fileName = "something";
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
   fileName = fileName.Replace(c, '_');
}

Edit:

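Since GetInvalidFileNameChars() returns several dozen characters on Windows (the control characters are included), it's better to use a StringBuilder instead of a simple string; the original version scans the string once per invalid character, so it will take longer and consume more memory.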

Solution 2 - C#

fileName = fileName.Replace(":", "-");

However, ":" is not the only illegal character on Windows. You will also have to handle:

/, \, :, *, ?, ", <, > and |

These are all included in the array returned by System.IO.Path.GetInvalidFileNameChars().

Also (on Windows), a filename cannot consist solely of "." characters (".", "..", "...", and so on are all invalid). Be careful when naming files that end with ".", for example:

echo "test" > .test.

This will generate a file named ".test", because Windows strips the trailing dot.
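To avoid surprises from that silent stripping, one option is to trim trailing dots (and trailing spaces, which the Win32 layer drops in the same way) yourself before creating the file. This is only a minimal sketch; the class name TrailingDots and the "_" fallback are assumptions for illustration, not part of the original answer.

```csharp
using System;

public static class TrailingDots
{
    // Removes trailing '.' and ' ' characters, which Windows would drop anyway.
    // Falls back to "_" (an arbitrary choice) if nothing is left, e.g. for "..".
    public static string Trim(string fileName)
    {
        string trimmed = fileName.TrimEnd('.', ' ');
        return trimmed.Length == 0 ? "_" : trimmed;
    }
}
```

With this, TrailingDots.Trim(".test.") yields ".test", which matches the name Windows would actually create.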

Lastly, if you really want to do things correctly, there are some special file names you need to look out for. On Windows you can't create files named:

CON, PRN, AUX, CLOCK$, NUL
COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
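A check for these names could look roughly like the sketch below. The class and method names (ReservedNames, IsReserved, Avoid) and the choice to prefix an underscore are hypothetical. The sketch compares only the part before the first dot, because Windows also treats a reserved name followed by an extension (such as NUL.txt) as the device.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNames
{
    // The names listed above, compared case-insensitively.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "CLOCK$", "NUL",
            "COM0", "COM1", "COM2", "COM3", "COM4",
            "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT0", "LPT1", "LPT2", "LPT3", "LPT4",
            "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };

    // True if the part before the first dot is a reserved device name.
    public static bool IsReserved(string fileName)
    {
        string stem = fileName.Split('.')[0].TrimEnd(' ');
        return Reserved.Contains(stem);
    }

    // One possible fix-up (an assumption): prefix an underscore.
    public static string Avoid(string fileName)
    {
        return IsReserved(fileName) ? "_" + fileName : fileName;
    }
}
```

Which replacement to use for a reserved name depends on the application; prefixing is just one choice that keeps the original text readable.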

Solution 3 - C#

This isn't more efficient, but it's more fun :)

var fileName = "foo:bar";
var invalidChars = System.IO.Path.GetInvalidFileNameChars();
var cleanFileName = new string(fileName.Where(m => !invalidChars.Contains(m)).ToArray<char>());

Solution 4 - C#

In case anyone wants an optimized version based on StringBuilder, use this. Includes rkagerer's trick as an option.

static char[] _invalids;

/// <summary>Replaces characters in <c>text</c> that are not allowed in 
/// file names with the specified replacement character.</summary>
/// <param name="text">Text to make into a valid filename. The same string is returned if it is valid already.</param>
/// <param name="replacement">Replacement character, or null to simply remove bad characters.</param>
/// <param name="fancy">Whether to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
/// <returns>A string that can be used as a filename. If the output string would otherwise be empty, returns "_".</returns>
public static string MakeValidFileName(string text, char? replacement = '_', bool fancy = true)
{
	StringBuilder sb = new StringBuilder(text.Length);
	var invalids = _invalids ?? (_invalids = Path.GetInvalidFileNameChars());
	bool changed = false;
	for (int i = 0; i < text.Length; i++) {
		char c = text[i];
		if (invalids.Contains(c)) {
			changed = true;
			var repl = replacement ?? '\0';
			if (fancy) {
				if (c == '"')       repl = '”'; // U+201D right double quotation mark
				else if (c == '\'') repl = '’'; // U+2019 right single quotation mark
				else if (c == '/')  repl = '⁄'; // U+2044 fraction slash
			}
			if (repl != '\0')
				sb.Append(repl);
		} else
			sb.Append(c);
	}
	if (sb.Length == 0)
		return "_";
	return changed ? sb.ToString() : text;
}

Solution 5 - C#

Here's a version of the accepted answer that uses LINQ's Enumerable.Aggregate:

string fileName = "something";

fileName = Path.GetInvalidFileNameChars()
    .Aggregate(fileName, (current, c) => current.Replace(c, '_'));

Solution 6 - C#

Here's a slight twist on Diego's answer.

If you're not afraid of Unicode, you can retain a bit more fidelity by replacing the invalid characters with valid Unicode symbols that resemble them. Here's the code I used in a recent project involving lumber cutlists:

static string MakeValidFilename(string text) {
  text = text.Replace('\'', '’'); // U+2019 right single quotation mark
  text = text.Replace('"',  '”'); // U+201D right double quotation mark
  text = text.Replace('/', '⁄');  // U+2044 fraction slash
  foreach (char c in System.IO.Path.GetInvalidFileNameChars()) {
    text = text.Replace(c, '_');
  }
  return text;
}

This produces filenames like 1⁄2” spruce.txt instead of 1_2_ spruce.txt

Yes, it really works (the original answer included an Explorer screenshot showing the file).

Caveat Emptor

I knew this trick would work on NTFS but was surprised to find it also works on FAT and FAT32 partitions. That's because long filenames are stored in Unicode, even as far back as Windows 95/NT. I tested on Win7, XP, and even a Linux-based router and they showed up OK. Can't say the same for inside a DOSBox.

That said, before you go nuts with this, consider whether you really need the extra fidelity. The Unicode look-alikes could confuse people or old programs, e.g. older OSes that rely on code pages.

Solution 7 - C#

Diego has the correct solution, but there is one very small mistake in it. The overload being used should be string.Replace(char, char); there isn't a string.Replace(char, string).

I can't edit the answer or I would have just made the minor change.

So it should be:

string fileName = "something";
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
   fileName = fileName.Replace(c, '_');
}

Solution 8 - C#

A simple one-liner:

var validFileName = Path.GetInvalidFileNameChars().Aggregate(fileName, (f, c) => f.Replace(c, '_'));

You can wrap it in an extension method if you want to reuse it.

public static string ToValidFileName(this string fileName) => Path.GetInvalidFileNameChars().Aggregate(fileName, (f, c) => f.Replace(c, '_'));

Solution 9 - C#

Here's a version that uses StringBuilder and IndexOfAny with bulk append for full efficiency. It also returns the original string rather than create a duplicate string.

Last but not least, it has a switch statement that returns look-alike characters which you can customize any way you wish. Check out Unicode.org's confusables lookup to see what options you might have, depending on the font.

public static string GetSafeFilename(string arbitraryString)
{
    var invalidChars = System.IO.Path.GetInvalidFileNameChars();
    var replaceIndex = arbitraryString.IndexOfAny(invalidChars, 0);
    if (replaceIndex == -1) return arbitraryString;
        
    var r = new StringBuilder();
    var i = 0;

    do
    {
        r.Append(arbitraryString, i, replaceIndex - i);

        switch (arbitraryString[replaceIndex])
        {
            case '"':
                r.Append("''");
                break;
            case '<':
                r.Append('\u02c2'); // '˂' (modifier letter left arrowhead)
                break;
            case '>':
                r.Append('\u02c3'); // '˃' (modifier letter right arrowhead)
                break;
            case '|':
                r.Append('\u2223'); // '∣' (divides)
                break;
            case ':':
                r.Append('-');
                break;
            case '*':
                r.Append('\u2217'); // '∗' (asterisk operator)
                break;
            case '\\':
            case '/':
                r.Append('\u2044'); // '⁄' (fraction slash)
                break;
            case '\0':
            case '\f':
            case '?':
                break;
            case '\t':
            case '\n':
            case '\r':
            case '\v':
                r.Append(' ');
                break;
            default:
                r.Append('_');
                break;
        }

        i = replaceIndex + 1;
        replaceIndex = arbitraryString.IndexOfAny(invalidChars, i);
    } while (replaceIndex != -1);

    r.Append(arbitraryString, i, arbitraryString.Length - i);

    return r.ToString();
}

It doesn't check for ., .., or reserved names like CON because it isn't clear what the replacement should be.

Solution 10 - C#

Another simple solution:

private string MakeValidFileName(string original, char replacementChar = '_')
{
  var invalidChars = new HashSet<char>(Path.GetInvalidFileNameChars());
  return new string(original.Select(c => invalidChars.Contains(c) ? replacementChar : c).ToArray());
}

Solution 11 - C#

After cleaning up my code and refactoring a little, I created an extension method for the string type:

public static string ToValidFileName(this string s, char replaceChar = '_', char[] includeChars = null)
{
  var invalid = Path.GetInvalidFileNameChars();
  if (includeChars != null) invalid = invalid.Union(includeChars).ToArray();
  // Contains() comes from System.Linq; the original used a custom In() helper that isn't shown.
  return string.Join(string.Empty, s.Select(o => invalid.Contains(o) ? replaceChar : o));
}

Now it's easier to use with:

var name = @"Any string you want using ? / \ or even +.zip";
var validFileName = name.ToValidFileName();

If you want to replace with a different char than "_" you can use:

var validFileName = name.ToValidFileName(replaceChar:'#');

And you can add extra characters to replace. For example, if you don't want spaces or commas:

var validFileName = name.ToValidFileName(includeChars: new [] { ' ', ',' });

Hope it helps...

Cheers

Solution 12 - C#

I needed a system that couldn't create collisions so I couldn't map multiple characters to one. I ended up with:

public static class Extension
{
    /// <summary>
    /// Characters allowed in a file name. Note that curly braces don't show up here
    /// because they are used for escaping invalid characters.
    /// </summary>
    private static readonly HashSet<char> CleanFileNameChars = new HashSet<char>
    {
        ' ', '!', '#', '$', '%', '&', '\'', '(', ')', '+', ',', '-', '.',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '=', '@',
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
        '[', ']', '^', '_', '`',
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    };

    /// <summary>
    /// Creates a clean file name from one that may contain invalid characters in 
    /// a way that will not collide.
    /// </summary>
    /// <param name="dirtyFileName">
    /// The file name that may contain invalid filename characters.
    /// </param>
    /// <returns>
    /// A file name that does not contain invalid filename characters.
    /// </returns>
    /// <remarks>
    /// <para>
    /// Escapes invalid characters by converting their ASCII values to hexadecimal
    /// and wrapping that value in curly braces. Curly braces are escaped by doubling
    /// them, for example '{' => "{{".
    /// </para>
    /// <para>
    /// Note that although NTFS allows unicode characters in file names, this
    /// method does not.
    /// </para>
    /// </remarks>
    public static string CleanFileName(this string dirtyFileName)
    {
        string EscapeHexString(char c) =>
            "{" + (c > 255 ? $"{(uint)c:X4}" : $"{(uint)c:X2}") + "}";

        return string.Join(string.Empty,
                           dirtyFileName.Select(
                               c =>
                                   c == '{' ? "{{" :
                                   c == '}' ? "}}" :
                                   CleanFileNameChars.Contains(c) ? $"{c}" :
                                   EscapeHexString(c)));
    }
}
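Because the escaping is collision-free, it can also be reversed. The decoder below is not part of the original answer; it is a sketch of what an inverse might look like, assuming the input was produced by CleanFileName above. The names CleanFileNameDecoder and Decode are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CleanFileNameDecoder
{
    // Reverses the scheme: "{{" -> '{', "}}" -> '}', "{3A}" -> ':'.
    public static string Decode(string clean)
    {
        var sb = new StringBuilder(clean.Length);
        for (int i = 0; i < clean.Length; i++)
        {
            char c = clean[i];
            if (c != '{' && c != '}')
            {
                sb.Append(c);
                continue;
            }
            // A doubled brace is an escaped literal brace.
            if (i + 1 < clean.Length && clean[i + 1] == c)
            {
                sb.Append(c);
                i++;
                continue;
            }
            if (c == '}')
                throw new FormatException("Unescaped '}' at index " + i);
            // Otherwise '{' opens a hex escape that runs to the next '}'.
            int close = clean.IndexOf('}', i + 1);
            if (close < 0)
                throw new FormatException("Unterminated escape at index " + i);
            string hex = clean.Substring(i + 1, close - i - 1);
            sb.Append((char)ushort.Parse(hex, NumberStyles.AllowHexSpecifier,
                                         CultureInfo.InvariantCulture));
            i = close;
        }
        return sb.ToString();
    }
}
```

For example, "Foo: Bar" is encoded as "Foo{3A} Bar" by CleanFileName, and Decode turns that back into "Foo: Bar".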

Solution 13 - C#

I needed to do this today... in my case, I needed to concatenate a customer name with the date and time for a final .kmz file. My final solution was this:

string name = "Whatever name with valid/invalid chars";
char[] invalid = System.IO.Path.GetInvalidFileNameChars();
// Contains() comes from System.Linq; the original used a custom In() helper that isn't shown.
string validFileName = string.Join(string.Empty,
                           string.Format("{0}.{1:G}.kmz", name, DateTime.Now)
                           .Select(o => invalid.Contains(o) ? '_' : o));

You can even make it replace spaces if you add the space char to the invalid array.

Maybe it's not the fastest, but as performance wasn't an issue, I found it elegant and understandable.

Cheers!

Solution 14 - C#

There are no complete answers in this topic yet. The author said: "...I want to use as a filename...". Removing or replacing invalid characters is not enough to make a string usable as a filename. You should at least check that:

  1. A file with that name doesn't already exist in the folder where you want to create the new one.
  2. The total path to the file (path to folder + filename + extension) is no longer than MAX_PATH (260 characters). Yes, there are ways around this limit on recent versions of Windows, but if you want your app to work reliably, you should check it.
  3. You don't use any of the special filenames (see the answer by @Phil Price).

Probably the best way would be to:

  1. Remove bad characters using one of the other answers here.
  2. Make sure the total path is less than 260 characters (if not, remove the last N chars).
  3. Make sure a file with the given filename doesn't exist (if it does, replace the last N chars until you find an available filename).
  4. Make sure you don't use any reserved filenames (if you do, replace the last N chars until you find a proper and available filename).

As always, things are more complicated than they look. It may be better to use an existing function, such as GetTempFileNameW.
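The length and collision steps above could be sketched as follows. This is an illustration under stated assumptions, not a complete solution: the names UniqueFileNames and MakeAvailable are hypothetical, the input is assumed to be already stripped of invalid characters, a " (n)" suffix is used instead of replacing the last N chars, and reserved names are not checked. The exists parameter defaults to File.Exists and is there only so the logic can be tried without touching the disk.

```csharp
using System;
using System.IO;

public static class UniqueFileNames
{
    // Classic Win32 MAX_PATH; the limit includes the terminating null.
    private const int MaxPath = 260;

    public static string MakeAvailable(string directory, string fileName,
                                       Func<string, bool> exists = null)
    {
        exists = exists ?? File.Exists;
        string ext = Path.GetExtension(fileName);
        string stem = Path.GetFileNameWithoutExtension(fileName);

        for (int n = 0; n < 1000; n++)
        {
            string suffix = n == 0 ? "" : " (" + n + ")";
            // directory + separator + stem + suffix + ext must stay below MaxPath.
            int room = MaxPath - 1
                       - (directory.Length + 1 + suffix.Length + ext.Length);
            if (room < 1)
                throw new PathTooLongException("Directory path leaves no room for a name.");
            // Truncate the stem if needed (may leave a trailing space or dot).
            string s = stem.Length > room ? stem.Substring(0, room) : stem;
            string candidate = Path.Combine(directory, s + suffix + ext);
            if (!exists(candidate))
                return candidate;
        }
        throw new IOException("No free file name found.");
    }
}
```

Note that checking for existence and then creating the file is inherently racy; opening the file with FileMode.CreateNew and retrying on failure is the more robust variant of the same idea.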

Solution 15 - C#

Here is another solution that I have been using for the last ~10 years. It is very similar to the previous solutions, without the 'fancy' parts. The main method takes the special characters as input, since I was also using it for other purposes, e.g. getting web-compatible names, especially back when renaming files for SharePoint/OneDrive.

I'm not sure how much it improves the speed, but I also chose to check the filename for any special characters with IndexOfAny() BEFORE using the StringBuilder.

private static string SanitizeFilename(this string filename) 
   => filename.RemoveOrReplaceSpecialCharacters(Path.GetInvalidFileNameChars(), '_');

private static string RemoveOrReplaceSpecialCharacters(this string str, char[] specialCharacters, char? replaceChar)
{
    if (string.IsNullOrEmpty(str))
        return str;
    if (specialCharacters == null || specialCharacters.Length == 0)
        return str;

    // IndexOfAny returns -1 when none of the special characters occur.
    if (str.IndexOfAny(specialCharacters) < 0)
        return str;

    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
    {
        if (!specialCharacters.Contains(c))
            sb.Append(c);
        else if (replaceChar.HasValue)
            sb.Append(replaceChar.Value);
    }
    return sb.ToString();         
}

I tried also

return new string(str.Except(specialCharacters).ToArray());

but it behaved strangely: Enumerable.Except is a set operation, so besides removing the special characters it also drops repeated characters. For instance, "hello-world" comes out as "helowrd" when specifying - as the single special char.

Solution 16 - C#

You can do this with a sed command:

 sed -e "
 s/[?()\[\]=+<>:;©®”,*|]/_/g
 s/"$'\t'"/ /g
 s/–/-/g
 s/\"/_/g
 s/[[:cntrl:]]/_/g"

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Ken | View Question on Stackoverflow
Solution 1 - C# | Diego Jancic | View Answer on Stackoverflow
Solution 2 - C# | Phil Price | View Answer on Stackoverflow
Solution 3 - C# | Joseph Gabriel | View Answer on Stackoverflow
Solution 4 - C# | Qwertie | View Answer on Stackoverflow
Solution 5 - C# | DavidG | View Answer on Stackoverflow
Solution 6 - C# | rkagerer | View Answer on Stackoverflow
Solution 7 - C# | leggetter | View Answer on Stackoverflow
Solution 8 - C# | Moch Yusup | View Answer on Stackoverflow
Solution 9 - C# | jnm2 | View Answer on Stackoverflow
Solution 10 - C# | GDemartini | View Answer on Stackoverflow
Solution 11 - C# | Joan Vilariño | View Answer on Stackoverflow
Solution 12 - C# | mheyman | View Answer on Stackoverflow
Solution 13 - C# | Joan Vilariño | View Answer on Stackoverflow
Solution 14 - C# | Ezh | View Answer on Stackoverflow
Solution 15 - C# | EricBDev | View Answer on Stackoverflow
Solution 16 - C# | D W | View Answer on Stackoverflow