What characters should be restricted from a Unix file name?
Validation Problem Overview
Consider a Save As dialog with a free-text field where the user types a file name and then clicks a Save button. The software then validates the file name and saves the file if the name is valid.
On a Unix file system, what rules should be applied in the validation such that:
- The name will not be difficult to handle later, e.g. because special characters need escaping.
- The rules are not so restrictive that saving a file becomes frustrating for the user.
So basically, what is the minimum set of characters that should be restricted from a Unix file name?
Validation Solutions
Solution 1 - Validation
The minimum set is the slash ('/') and the null character ('\0').
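A minimal sketch of that rule in Python, assuming the name is a single path component. The function name is hypothetical, and rejecting the empty string is an extra check added here because an empty name cannot be created.

```python
def is_valid_unix_name(name: str) -> bool:
    """Minimal check: only '/' and NUL are forbidden in a Unix file name."""
    # The empty-name check is an added assumption, not part of the answer above.
    if name == "":
        return False
    # '/' separates path components; NUL terminates C strings in the kernel API.
    return "/" not in name and "\0" not in name
```

Everything else, including spaces, `*`, and newlines, passes this check, which is exactly why the other answers suggest stricter rules.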
Solution 2 - Validation
Firstly, what you're describing is blacklisting. A better option is to whitelist characters, since from the user's perspective it is easier to have allowed characters added later than to have them taken away.
In terms of what would be good in a unix environment:
- a-z
- A-Z
- 0-9
- underscore (_)
- dash (-)
- period (.)
That should cover the basics. Spaces can be okay, but they make things harder, since names containing them must be quoted in the shell. Windows users love them; Unix/Linux users don't. Choose according to your target audience.
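A sketch of the whitelist as a regular expression, assuming ASCII-only names. The `allow_spaces` parameter is a hypothetical switch for the "depending on your target audience" choice.

```python
import re

# Whitelist: ASCII letters, digits, underscore, period, dash ('-' last so it is literal).
_SAFE = re.compile(r"[A-Za-z0-9_.-]+")
# Same set plus a literal space, for audiences that expect spaces.
_SAFE_WITH_SPACES = re.compile(r"[A-Za-z0-9_. -]+")


def is_whitelisted(name: str, allow_spaces: bool = False) -> bool:
    """Return True if every character of a non-empty name is in the whitelist."""
    pattern = _SAFE_WITH_SPACES if allow_spaces else _SAFE
    # fullmatch requires the whole string to match, not just a prefix.
    return pattern.fullmatch(name) is not None
```

Explicit ASCII ranges are used instead of `\w`, because `\w` also matches non-ASCII letters such as `ï` in Python 3.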
Solution 3 - Validation
Although the accepted answer may be technically correct, I think there's a benefit to restricting some characters that can be annoying in scripts and elsewhere:
- forward slash (/)
- backslash (\)
- NULL (\0)
- backtick (`)
- leading dash (-)
- asterisk (*)
- pipe (|)
- semicolon (;)
- quotes (" or ')
- colon (:)
(Maybe space too, though I'm reluctant to add that.)
As you can see, you might just be better off whitelisting, as @Gavin suggests.
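For comparison, the blacklist above could be checked like this. It is a sketch: the function name is hypothetical, and space is left out because the list leaves it undecided.

```python
# Characters from the list above: / \ NUL ` * | ; " ' :
FORBIDDEN = set('/\\\0`*|;"\':')


def passes_blacklist(name: str) -> bool:
    """Reject empty names, names starting with '-', and names with a forbidden character."""
    if not name or name.startswith("-"):
        # A leading dash makes the name look like a command-line option.
        return False
    # isdisjoint accepts any iterable, so it checks each character of the string.
    return FORBIDDEN.isdisjoint(name)
```

The set of forbidden characters keeps growing as more edge cases come to mind, which is the argument for whitelisting instead.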
Solution 4 - Validation
Often forgotten: the colon (:) is not a good idea, since it is used as the separator in variables such as $PATH, the list of directories searched for executables. It can also cause confusion with DOS/Windows paths, where the colon appears in drive names (e.g. C:).
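A small illustration of the $PATH problem, assuming the list is split on `:` the way Unix shells do. The helper name and the directory `/home/me/my:tools` are hypothetical.

```python
from typing import List


def split_path_list(value: str) -> List[str]:
    """Split a $PATH-style value on ':', the Unix list separator."""
    return value.split(":")


# A directory whose name contains a colon is silently split into two entries.
print(split_path_list("/usr/bin:/home/me/my:tools"))
# ['/usr/bin', '/home/me/my', 'tools']
```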
Solution 5 - Validation
Don't forget that a leading dot (.) hides files and folders. Otherwise, I'd follow the *NIX naming conventions (from Wikipedia):
Most UNIX file systems
- Case handling: case-sensitive, case-preserving
- Allowed character set: any
- Reserved characters: / and null
- Max length: 255
- Notes: A leading . means that ls and file managers will not show the file by default
Solution 6 - Validation
Encode FTW
As Bombe points out in their answer, restricting user input is frustrating at best and downright annoying at worst. However, as developers we should assume that every interaction with our code is malicious and treat it as such.
To solve both problems in a practical application, rather than whitelisting or blacklisting certain characters, we should simply not use the user input as the file name.
Instead, use a safe name of our own devising (hex characters [a-f0-9] only, for ultimate safety), either encoded from the user input (e.g. PHP's bin2hex) or a generated unique ID (e.g. PHP's uniqid) that is then mapped to the user input by whatever method you prefer.
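A sketch of the ID-plus-mapping variant in Python. It swaps in the standard library's `secrets.token_hex` for PHP's uniqid, and the in-memory `NameRegistry` class is hypothetical; a real application would persist the mapping, e.g. in a database.

```python
import secrets
from typing import Dict


class NameRegistry:
    """Hypothetical map from safe on-disk IDs to user-chosen display names."""

    def __init__(self) -> None:
        self._by_id: Dict[str, str] = {}

    def register(self, display_name: str) -> str:
        # 16 random bytes -> 32 lowercase hex characters, only [0-9a-f].
        file_id = secrets.token_hex(16)
        self._by_id[file_id] = display_name
        return file_id

    def display_name(self, file_id: str) -> str:
        return self._by_id[file_id]
```

The file is stored under the returned ID, while the user only ever sees `display_name`.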
Encoding and decoding can be done on the fly without any mapping, so this approach is practically ideal. The user never needs to know what the file is really called; as long as they can get and set the file, and it appears to have the name they wanted, everyone's a winner.
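A Python analogue of the bin2hex approach, as a sketch with hypothetical function names. It assumes names are encoded as UTF-8 before hex encoding.

```python
def to_safe_name(user_name: str) -> str:
    """Like PHP's bin2hex: UTF-8 bytes to lowercase hex, only [0-9a-f]."""
    return user_name.encode("utf-8").hex()


def from_safe_name(safe_name: str) -> str:
    """Inverse (like PHP's hex2bin): hex back to bytes, then to the original string."""
    return bytes.fromhex(safe_name).decode("utf-8")


print(to_safe_name("a/b"))  # 612f62
```

Hex encoding doubles the byte length, so with a 255-byte name limit the original name can be at most 127 bytes.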
By this methodology, the user can call their file whatever they like, hackers will be the only people frustrated, and your file system will love you :-)
Solution 7 - Validation
Let users enter whatever name they want. Artificially restricting the range of characters will only annoy them and serves no real purpose.
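If arbitrary names are allowed, the escaping concern from the question can usually be handled where the name is used instead. The sketch below uses Python's standard `shlex.quote`. The helper name and the `ls -l` command are only illustrative, and `--` relies on the common (not universal) convention that it ends option parsing, which covers names with a leading dash.

```python
import shlex


def shell_command_for(name: str) -> str:
    """Build a shell command line that handles any file name safely."""
    # shlex.quote wraps the name in single quotes, escaping embedded quotes.
    # "--" stops most utilities from treating a leading '-' as an option.
    return "ls -l -- " + shlex.quote(name)


print(shell_command_for("my file; rm -rf ~"))
# ls -l -- 'my file; rm -rf ~'
```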