Is there a way of making strings file-path safe in C#?

C#, .NET, String, Filepath

C# Problem Overview


My program will take arbitrary strings from the internet and use them for file names. Is there a simple way to remove the bad characters from these strings or do I need to write a custom function for this?

C# Solutions


Solution 1 - C#

Ugh, I hate it when people try to guess at which characters are valid. Besides being completely non-portable (always thinking about Mono), both of the earlier comments missed more than 25 invalid characters.

foreach (var c in Path.GetInvalidFileNameChars()) 
{ 
  fileName = fileName.Replace(c, '-'); 
}

Or in VB:

'Clean just a filename
Dim filename As String = "salmnas dlajhdla kjha;dmas'lkasn"
For Each c In IO.Path.GetInvalidFileNameChars
	filename = filename.Replace(c, "")
Next

'See also IO.Path.GetInvalidPathChars

Solution 2 - C#

To strip invalid characters:

static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();

// Builds a string out of valid chars
var validFilename = new string(filename.Where(ch => !invalidFileNameChars.Contains(ch)).ToArray());

To replace invalid characters:

static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();

// Builds a string out of valid chars and an _ for invalid ones
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? '_' : ch).ToArray());

To replace invalid characters (and avoid potential name conflict like Hell* vs Hell$):

static readonly IList<char> invalidFileNameChars = Path.GetInvalidFileNameChars();

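// Builds a string out of valid chars and replaces each invalid char with a distinct char, offset from 'A' (65) by its index in the array.
// Only indexes 0-25 land on 'A'-'Z'; on Windows the array is longer, so higher indexes run into punctuation, including '\' at 65 + 27 = 92.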
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? Convert.ToChar(invalidFileNameChars.IndexOf(ch) + 65) : ch).ToArray());
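
If the goal is to avoid collisions such as Hell* vs Hell$, another option is to escape rather than substitute. The following is only a sketch, not part of the answer above: `FileNameEscaper` and `Escape` are hypothetical names, and the `%XXXX` format is an assumption chosen for illustration. Each invalid char is written as `%` plus its four-digit hex code, and `%` itself is escaped too, so two different inputs cannot produce the same output.

```csharp
using System;
using System.IO;
using System.Text;

static class FileNameEscaper
{
    // Looked up once and reused for every call.
    private static readonly char[] Invalid = Path.GetInvalidFileNameChars();

    // Hypothetical helper: writes each invalid char (and '%' itself) as %XXXX.
    public static string Escape(string name)
    {
        var sb = new StringBuilder(name.Length);
        foreach (char ch in name)
        {
            if (ch == '%' || Array.IndexOf(Invalid, ch) >= 0)
                sb.Append('%').Append(((int)ch).ToString("X4"));
            else
                sb.Append(ch);
        }
        return sb.ToString();
    }
}
```

For example, `FileNameEscaper.Escape("a/b")` gives `a%002Fb`, and a literal `100%` becomes `100%0025`, so it cannot be confused with an escaped character.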

Solution 3 - C#

This question has been asked many (https://stackoverflow.com/questions/1862993) times (https://stackoverflow.com/questions/1976007) before (https://stackoverflow.com/questions/62771) and, as pointed out many times before, IO.Path.GetInvalidFileNameChars is not adequate.

First, there are many names like PRN and CON that are reserved and not allowed for filenames. There are other names not allowed only at the root folder. Names that end in a period are also not allowed.

Second, there are a variety of length limitations. Read the full list for NTFS here: http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx

Third, you can attach to filesystems that have other limitations. For example, ISO 9660 filenames cannot start with "-" but can contain it.

Fourth, what do you do if two processes "arbitrarily" pick the same name?
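
The first point can be checked separately from the character filter. Here is a minimal sketch, assuming the Windows naming rules. `WindowsNameRules` and `IsProblematic` are hypothetical names, and the trailing-space check is an addition from the same Windows rules rather than something stated above. It does not cover length limits or the rules of other filesystems.

```csharp
using System;
using System.Collections.Generic;

static class WindowsNameRules
{
    // Reserved device names on Windows; the comparison ignores case.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };

    // Hypothetical helper: true if the name is empty, ends in '.' or ' ',
    // or is a reserved device name with or without an extension.
    public static bool IsProblematic(string name)
    {
        if (string.IsNullOrEmpty(name)) return true;

        char last = name[name.Length - 1];
        if (last == '.' || last == ' ') return true;

        // "CON.txt" is treated like "CON", so compare the part before the first dot.
        int dot = name.IndexOf('.');
        string stem = dot < 0 ? name : name.Substring(0, dot);
        return Reserved.Contains(stem);
    }
}
```

A name that passes Path.GetInvalidFileNameChars() can still fail here: `WindowsNameRules.IsProblematic("con.txt")` is true, while `"console.txt"` is fine.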

In general, using externally-generated names for file names is a bad idea. I suggest generating your own private file names and storing human-readable names internally.
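
One rough way to do that is shown below. This is only a sketch: `NameIndex`, its methods, and the `.dat` extension are hypothetical, and a real program would persist the mapping (in a database or index file) rather than keep it in memory.

```csharp
using System;
using System.Collections.Generic;

sealed class NameIndex
{
    // Maps the generated on-disk name to the original human-readable name.
    private readonly Dictionary<string, string> _displayNames =
        new Dictionary<string, string>();

    // Generates a private file name from a GUID and remembers the display name.
    public string Register(string displayName)
    {
        string diskName = Guid.NewGuid().ToString("N") + ".dat";
        _displayNames.Add(diskName, displayName);
        return diskName;
    }

    public string GetDisplayName(string diskName)
    {
        return _displayNames[diskName];
    }
}
```

Because the on-disk name never contains external input, the invalid-character, reserved-name and collision questions above do not arise for it. The external string is only ever stored as data.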

Solution 4 - C#

I agree with Grauenwolf and would highly recommend Path.GetInvalidFileNameChars().

Here's my C# contribution:

string file = @"38?/.\}[+=n a882 a.a*/|n^%$ ad#(-))";
Array.ForEach(Path.GetInvalidFileNameChars(), 
      c => file = file.Replace(c.ToString(), String.Empty));

p.s. -- this is more cryptic than it should be -- I was trying to be concise.

Solution 5 - C#

Here's my version:

static string GetSafeFileName(string name, char replace = '_') {
  char[] invalids = Path.GetInvalidFileNameChars();
  return new string(name.Select(c => invalids.Contains(c) ? replace : c).ToArray());
}

I'm not sure how the result of GetInvalidFileNameChars is calculated, but the "Get" suggests it's non-trivial, so I cache the results. Further, this only traverses the input string once instead of multiple times, like the solutions above that iterate over the set of invalid chars, replacing them in the source string one at a time. Also, I like the Where-based solutions, but I prefer to replace invalid chars instead of removing them. Finally, my replacement is exactly one character to avoid converting characters to strings as I iterate over the string.

I say all that w/o doing the profiling -- this one just "felt" nice to me. : )
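
For what it's worth, here is a sketch of the same single-pass idea with the invalid set cached across calls in a static field. `SafeNames` and `Replace` are hypothetical names, and like the version above this has not been profiled.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class SafeNames
{
    // Built once per process; HashSet lookups avoid scanning the array for every char.
    private static readonly HashSet<char> Invalid =
        new HashSet<char>(Path.GetInvalidFileNameChars());

    // Walks the input once, swapping each invalid char for the replacement.
    public static string Replace(string name, char replacement = '_')
    {
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
            sb.Append(Invalid.Contains(c) ? replacement : c);
        return sb.ToString();
    }
}
```

Usage is the same as before: `SafeNames.Replace("a/b")` returns `a_b`.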

Solution 6 - C#

Here's the function that I am using now (thanks jcollum for the C# example):

public static string MakeSafeFilename(string filename, char replaceChar)
{
    foreach (char c in System.IO.Path.GetInvalidFileNameChars())
    {
        filename = filename.Replace(c, replaceChar);
    }
    return filename;
}

I just put this in a "Helpers" class for convenience.

Solution 7 - C#

If you want to quickly strip out all special characters, which sometimes makes file names more readable for users, this works nicely:

string myCrazyName = "q`w^e!r@t#y$u%i^o&p*a(s)d_f-g+h=j{k}l|z:x\"c<v>b?n[m]q\\w;e'r,t.y/u";
string safeName = Regex.Replace(
    myCrazyName,
    @"\W",  /*Matches any nonword character. For ASCII input, equivalent to '[^A-Za-z0-9_]'*/
    "",
    RegexOptions.IgnoreCase);
// safeName == "qwertyuiopasd_fghjklzxcvbnmqwertyu"

Solution 8 - C#

static class Utils
{
	public static string MakeFileSystemSafe(this string s)
	{
		return new string(s.Where(IsFileSystemSafe).ToArray());
	}
	
	public static bool IsFileSystemSafe(char c)
	{
		return !Path.GetInvalidFileNameChars().Contains(c);
	}
}

Solution 9 - C#

Here's what I just added to ClipFlair's (http://github.com/Zoomicon/ClipFlair) StringExtensions static class (Utils.Silverlight project), based on info gathered from the links to related stackoverflow questions posted by Dour High Arch above:

public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
{
  return Regex.Replace(s,
    "[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]",
    replacement, //can even use a replacement string of any length
    RegexOptions.IgnoreCase);
    //not using System.IO.Path.InvalidPathChars (deprecated insecure API)
}

Solution 10 - C#

Why not convert the string to a Base64 equivalent like this:

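string UnsafeFileName = "salmnas dlajhdla kjha;dmas'lkasn";
// Standard Base64 can emit '/', which is not valid in a file name, so use the URL-safe characters instead
string SafeFileName = Convert.ToBase64String(Encoding.UTF8.GetBytes(UnsafeFileName))
    .Replace('/', '_').Replace('+', '-');

If you want to convert it back so you can read it:

UnsafeFileName = Encoding.UTF8.GetString(
    Convert.FromBase64String(SafeFileName.Replace('_', '/').Replace('-', '+')));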

I used this to save PNG files with a unique name from a random description.

Solution 11 - C#

private void textBoxFileName_KeyPress(object sender, KeyPressEventArgs e)
{
   e.Handled = CheckFileNameSafeCharacters(e);
}

/// <summary>
/// This is a good function for making sure that a user who is naming a file uses proper characters
/// </summary>
/// <param name="e"></param>
/// <returns></returns>
internal static bool CheckFileNameSafeCharacters(System.Windows.Forms.KeyPressEventArgs e)
{
    if (e.KeyChar == (char)24 || 
        e.KeyChar == (char)3 || 
        e.KeyChar == (char)22 || 
        e.KeyChar == (char)26 || 
        e.KeyChar == (char)25)//Control-X, C, V, Z and Y
            return false;
    if (e.KeyChar == '\b')//backspace
        return false;

    char[] charArray = Path.GetInvalidFileNameChars();
    if (charArray.Contains(e.KeyChar))
       return true;//Stop the character from being entered into the control since it is not valid in a file name
    else
        return false;            
}

Solution 12 - C#

From my older projects, I've found this solution, which has been working perfectly for over 2 years. It replaces illegal chars with "!" and then collapses any run of "!!" into a single "!". Substitute your own replacement char.

    public string GetSafeFilename(string filename)
    {
        string res = string.Join("!", filename.Split(Path.GetInvalidFileNameChars()));

        while (res.IndexOf("!!") >= 0)
            res = res.Replace("!!", "!");

        return res;
    }

Solution 13 - C#

I find using this to be quick and easy to understand:

<Extension()>
Public Function MakeSafeFileName(FileName As String) As String
	Return FileName.Where(Function(x) Not IO.Path.GetInvalidFileNameChars.Contains(x)).ToArray
End Function

This works because a string is an IEnumerable of chars and there is a string constructor that takes a char array.

Solution 14 - C#

Many answers suggest using Path.GetInvalidFileNameChars(), which seems like a bad solution to me. I encourage you to use whitelisting instead of blacklisting, because hackers will eventually find a way to bypass a blacklist.

Here is an example of code you could use:

    string whitelist = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.";
    foreach (char c in filename)
    {
        if (!whitelist.Contains(c))
        {
            filename = filename.Replace(c, '-');
        }
    }
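
The same whitelist idea can also be written as a single pass over the input. This is only a sketch: `WhitelistNames` and `Sanitize` are hypothetical names, and the wider whitelist (digits, '_' and '-' in addition to letters and '.') is an assumption, so adjust it to your own needs.

```csharp
using System.Text;

static class WhitelistNames
{
    // Assumed whitelist: ASCII letters, digits, '.', '_' and '-'.
    private static bool IsAllowed(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
    }

    // Replaces every char outside the whitelist in one pass.
    public static string Sanitize(string name, char replacement = '-')
    {
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
            sb.Append(IsAllowed(c) ? c : replacement);
        return sb.ToString();
    }
}
```

For example, `WhitelistNames.Sanitize("my file?.txt")` returns `my-file-.txt`. Note that a whitelist containing '.' still lets names such as `..` through, so the result should not be trusted as a path on its own.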

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Martin Doms | View Question on Stackoverflow
Solution 1 - C# | Jonathan Allen | View Answer on Stackoverflow
Solution 2 - C# | Squirrel | View Answer on Stackoverflow
Solution 3 - C# | Dour High Arch | View Answer on Stackoverflow
Solution 4 - C# | Aaron Wagner | View Answer on Stackoverflow
Solution 5 - C# | csells | View Answer on Stackoverflow
Solution 6 - C# | sidewinderguy | View Answer on Stackoverflow
Solution 7 - C# | Keith | View Answer on Stackoverflow
Solution 8 - C# | Ronnie Overby | View Answer on Stackoverflow
Solution 9 - C# | George Birbilis | View Answer on Stackoverflow
Solution 10 - C# | Bart Vanseer | View Answer on Stackoverflow
Solution 11 - C# | ecklerpa | View Answer on Stackoverflow
Solution 12 - C# | Roni Tovi | View Answer on Stackoverflow
Solution 13 - C# | cjbarth | View Answer on Stackoverflow
Solution 14 - C# | AnonBird | View Answer on Stackoverflow