What characters should be restricted from a Unix file name?


Validation Problem Overview


Consider a Save As dialog with a free-text field where the user enters a file name and then clicks a Save button. The software validates the name and saves the file if the name is valid.

On a Unix file system, what rules should be applied in the validation such that:

  • The name will not be difficult to manipulate later in terms of escaping special characters, etc.
  • The rules are not so restrictive that saving a file becomes user-unfriendly.

So basically, what is the minimum set of characters that should be restricted from a Unix file name?

Validation Solutions


Solution 1 - Validation

The minimum set is the slash ('/') and the NUL byte ('\0').
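
As a rough sketch of that minimal rule (the function name `is_valid_minimal` is made up for illustration, and the extra rejection of empty names, `.` and `..` is an assumption added here, not part of the answer):

```python
def is_valid_minimal(name: str) -> bool:
    """Return True if name avoids the two characters a Unix file name cannot contain."""
    if name in ("", ".", ".."):
        # Assumption beyond the answer: an empty name and the special
        # directory entries "." and ".." cannot name a new file.
        return False
    return "/" not in name and "\0" not in name


print(is_valid_minimal("report 2024.txt"))  # True
print(is_valid_minimal("a/b.txt"))          # False
```

Everything else, including spaces, quotes and newlines, passes this check, which is exactly why the other answers suggest stricter rules.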

Solution 2 - Validation

Firstly, what you're describing is blacklisting. Your better option is to whitelist characters, as it is easier (from a user's perspective) to have characters added to the allowed set later than to have them taken away.

In terms of what would be good in a unix environment:

  • a-z
  • A-Z
  • 0-9
  • underscore (_)
  • dash (-)
  • period (.)

These should cover your basics. Spaces can be okay, but they make things difficult. Windows users love them; Unix/Linux users don't. So choose according to your target audience.
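
A whitelist like the one above could be checked with a regular expression. This is only a sketch; the names `SAFE_NAME` and `is_whitelisted` are invented for the example:

```python
import re

# ASCII letters, digits, underscore, dash, and period only.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def is_whitelisted(name: str) -> bool:
    # fullmatch requires the whole name to consist of allowed characters,
    # and the "+" rejects the empty string.
    return SAFE_NAME.fullmatch(name) is not None


print(is_whitelisted("my_file-1.txt"))  # True
print(is_whitelisted("my file.txt"))    # False
```

Note that `.` and `..` would pass this check, so a real validator would probably want to reject those separately.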

Solution 3 - Validation

Although the accepted answer may be technically true, I think there is a benefit to also restricting characters that can be annoying for scripting and similar work:

  • forward slash (/)
  • backslash (\)
  • NULL (\0)
  • backtick (`)
  • a leading dash (-)
  • star (*)
  • pipe (|)
  • semicolon (;)
  • quotation marks (" or ')
  • colon (:)

(Maybe space as well, though I'm reluctant to add that.)

As you can see, you might just be better off whitelisting, as @Gavin suggests.
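
For comparison, here is roughly what the blacklist from this answer might look like in code. The names `AWKWARD_CHARS` and `find_problems` are hypothetical, and space is left out because the answer is undecided about it:

```python
# The characters listed in this answer: / \ NUL ` * | ; " ' :
AWKWARD_CHARS = set("/\\\0`*|;\"':")


def find_problems(name: str) -> list[str]:
    """Return a human-readable list of reasons the name is awkward."""
    problems = []
    if name.startswith("-"):
        # A leading dash makes the name look like a command-line option.
        problems.append("starts with a dash")
    for ch in sorted(set(name) & AWKWARD_CHARS):
        problems.append(f"contains {ch!r}")
    return problems


print(find_problems("notes.txt"))  # []
print(find_problems("-rf"))        # ['starts with a dash']
```

Returning the reasons rather than a bare True/False lets the dialog tell the user what to change.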

Solution 4 - Validation

Often forgotten: the colon (:) is not a good idea, since it's commonly used in stuff like $PATH, i.e. the list of directories where executables are found "automatically". This can cause confusion with DOS/Windows directory names, where of course the colon is used in drive names.
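
A small illustration of the $PATH problem, using a made-up directory name that contains a colon:

```python
# A hypothetical directory whose name contains a colon.
directories = ["/usr/bin", "/opt/tools:v2/bin"]

# PATH-style variables join directories with ":" on Unix.
path_value = ":".join(directories)

# Splitting the value back apart yields three entries instead of two.
print(path_value.split(":"))
# ['/usr/bin', '/opt/tools', 'v2/bin']
```

The colon-separated format has no escaping mechanism, so such a directory cannot be listed in the variable at all.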

Solution 5 - Validation

Do not forget that you can add a dot (.) at the beginning to hide files and folders... Otherwise, I'd follow a *NIX name convention (from Wikipedia):

Most UNIX file systems

  • Case handling: case-sensitive, case-preserving.
  • Allowed character set: any.
  • Reserved characters: /, null.
  • Max length: 255.
  • Notes: A leading . indicates that ls and file managers will not show the file by default.

Source: the Wikipedia article about file names.
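
The length limit and the hidden-file convention can be checked along these lines. This is a sketch with invented helper names; it assumes the 255 limit is counted in encoded bytes rather than characters, which is the usual case on Unix file systems but worth verifying for your target:

```python
MAX_NAME_BYTES = 255  # the "Max length" figure quoted above


def name_length_ok(name: str, encoding: str = "utf-8") -> bool:
    # Assumption: the limit applies to the encoded byte length,
    # so non-ASCII characters count for more than one.
    return len(name.encode(encoding)) <= MAX_NAME_BYTES


def is_hidden(name: str) -> bool:
    # A leading dot hides the file from ls and file managers by default.
    return name.startswith(".")


print(name_length_ok("a" * 255))  # True
print(name_length_ok("a" * 256))  # False
print(is_hidden(".bashrc"))       # True
```

With UTF-8, a name of 128 two-byte characters such as "é" already exceeds the limit, even though it is well under 255 characters.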

Solution 6 - Validation

Encode FTW

As Bombe points out in their answer, restricting user input is at least frustrating, if not downright annoying. However, as developers we should assume that every interaction with our code is malicious and treat it as such.

To solve both problems in a practical application, rather than whitelisting or blacklisting certain characters, we should simply not use the user input as the file name.

Instead, use a safe name (hex chars [a-f0-9] only for ultimate safety) of our own devising, either encoded from the user input (e.g. PHP's bin2hex), or a randomly generated ID (e.g. PHP's uniqid) which is then mapped by some method (take your pick) to the user input.

Encoding/decoding can be done on the fly with no reliance on a mapping, so it is practically ideal. The user never needs to know what the file is really called; as long as they can get/set the file, and it appears to be called what they wanted, everyone's a winner.

By this methodology, the user can call their file whatever they like, hackers will be the only people frustrated, and your file system will love you :-)
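
The answer mentions PHP's bin2hex; the same idea could be sketched in Python as below. The function names are made up for the example, and UTF-8 is an assumed encoding:

```python
def encode_name(user_name: str) -> str:
    """Turn any user-supplied name into a string of [0-9a-f] characters."""
    return user_name.encode("utf-8").hex()


def decode_name(stored_name: str) -> str:
    """Recover the user-supplied name from its hex form."""
    return bytes.fromhex(stored_name).decode("utf-8")


stored = encode_name("my report: final/v2?.txt")
print(stored)               # only characters 0-9 and a-f
print(decode_name(stored))  # my report: final/v2?.txt
```

One trade-off to keep in mind: hex encoding doubles the length, so under a 255-byte name limit the user's name could be at most 127 bytes. The random-ID variant avoids that, at the cost of needing a mapping.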

Solution 7 - Validation

Let the user enter whatever name they want. Artificially restricting the range of characters will only annoy users and serves no real purpose.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | barrymc | View Question on Stack Overflow
Solution 1 - Validation | mouviciel | View Answer on Stack Overflow
Solution 2 - Validation | Gavin Miller | View Answer on Stack Overflow
Solution 3 - Validation | ThinkBonobo | View Answer on Stack Overflow
Solution 4 - Validation | unwind | View Answer on Stack Overflow
Solution 5 - Validation | Tobias Wärre | View Answer on Stack Overflow
Solution 6 - Validation | Fred Gandt | View Answer on Stack Overflow
Solution 7 - Validation | Bombe | View Answer on Stack Overflow