Regular expression for letters, numbers and - _

Regex Problem Overview


I'm having trouble checking in PHP whether a value consists only of the following characters:

  • letters (upper or lowercase)
  • numbers (0-9)
  • underscore (_)
  • dash (-)
  • point (.)
  • no spaces or other characters

a few examples:

  • OK: "screen123.css"
  • OK: "screen-new-file.css"
  • OK: "screen_new.js"
  • NOT OK: "screen new file.css"

I guess I need a regex for this, since I need to throw an error when a given string contains any characters other than the ones mentioned above.

Regex Solutions


Solution 1 - Regex

The pattern you want is something like (see it on rubular.com):

^[a-zA-Z0-9_.-]*$

Explanation:

  • ^ is the beginning of the line anchor
  • $ is the end of the line anchor
  • [...] is a character class definition
  • * is "zero-or-more" repetition

Note that the literal dash - is placed last in the character class definition; between two other characters it would have a different meaning (i.e. a range). The . also has a different meaning outside character class definitions (it matches almost any character), but inside, it's just a literal .
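To illustrate why the position of the dash matters, here is a minimal sketch. The variable names are made up for this example. The second pattern moves the dash between . and _, which turns it into the range from . to _ in ASCII; that range happens to include @ but not the dash itself.

```php
<?php

// Dash last in the class: it is a literal dash.
$literalDash = '/^[a-zA-Z0-9_.-]*$/';

// Dash between '.' and '_': it now defines the ASCII range '.' through '_'
// (which contains characters such as '/', ':', '@' and '['), and the dash
// itself is no longer allowed.
$rangeDash = '/^[a-zA-Z0-9.-_]*$/';

$withDash = preg_match($literalDash, 'screen-new.css'); // 1
$withAt   = preg_match($literalDash, 'screen@new.css'); // 0

$rangeWithDash = preg_match($rangeDash, 'screen-new.css'); // 0
$rangeWithAt   = preg_match($rangeDash, 'screen@new.css'); // 1
```

Putting the dash first or last in the class (or escaping it) avoids this kind of accidental range.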

In PHP

Here's a snippet to show how you can use this pattern:

<?php

$arr = array(
  'screen123.css',
  'screen-new-file.css',
  'screen_new.js',
  'screen new file.css'
);

foreach ($arr as $s) {
  // preg_match() returns 1 when the pattern matches, 0 when it does not
  if (preg_match('/^[\w.-]*$/', $s)) {
    print "$s is a match\n";
  } else {
    print "$s is NO match!!!\n";
  }
}

?>

The above prints (as seen on ideone.com):

screen123.css is a match
screen-new-file.css is a match
screen_new.js is a match
screen new file.css is NO match!!!

Note that the pattern is slightly different: it uses \w, the shorthand character class for "word characters" (letters, digits and underscore), in place of a-zA-Z0-9_.
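As a quick check, the sketch below runs both spellings of the pattern over the sample names from the question and records whether they ever disagree. The variable names are hypothetical. This only covers plain ASCII input without pattern modifiers; under some settings (for example the u modifier or certain locales) \w may accept more than a-zA-Z0-9_, so the explicit class is the stricter choice if that matters.

```php
<?php

$explicit  = '/^[a-zA-Z0-9_.-]*$/';
$shorthand = '/^[\w.-]*$/';

$samples = array(
    'screen123.css',
    'screen-new-file.css',
    'screen_new.js',
    'screen new file.css',
);

// Stays true as long as both patterns give the same verdict for every sample.
$patternsAgree = true;
foreach ($samples as $s) {
    if (preg_match($explicit, $s) !== preg_match($shorthand, $s)) {
        $patternsAgree = false;
    }
}
```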

Note on specification

This seems to follow your specification, but note that it will also match strings like "....." that consist only of dots, which may or may not be what you want. If you can be more specific about what you want to match, the regex will be slightly more complicated.

The above regex also matches the empty string. If you need at least one character, then use + (one-or-more) instead of * (zero-or-more) for repetition.
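A small sketch of the two edge cases just mentioned, the empty string and a string made only of dots. The variable names are made up for illustration.

```php
<?php

$zeroOrMore = '/^[a-zA-Z0-9_.-]*$/';
$oneOrMore  = '/^[a-zA-Z0-9_.-]+$/';

// The empty string is accepted by * but rejected by +.
$emptyWithStar = preg_match($zeroOrMore, ''); // 1
$emptyWithPlus = preg_match($oneOrMore, '');  // 0

// Both variants accept a string made only of dots, since '.' is an allowed character.
$dotsWithPlus = preg_match($oneOrMore, '.....'); // 1
```

Switching to + only rules out the empty string; rejecting names like "....." needs a more structured pattern.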

In any case, you can further clarify your specification (always helps when asking regex question), but hopefully you can also learn how to write the pattern yourself given the above information.

Solution 2 - Regex

You can use

^[\w.-]+$

The + makes sure the string has at least one character. The ^ and $ are needed to anchor the match to the beginning and end of the string; without them, a string with a valid run in the middle, such as @@@@xyz%%%%, would still count as a match.

\w already covers letters (upper and lower case), digits and the underscore, so only . and - need to be added to the character class.
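The effect of the anchors can be sketched with the @@@@xyz%%%% example. The variable names here are hypothetical.

```php
<?php

$unanchored = '/[\w.-]+/';
$anchored   = '/^[\w.-]+$/';

$input = '@@@@xyz%%%%';

// Without anchors, preg_match() finds the valid run "xyz" inside the string.
$foundInside = preg_match($unanchored, $input, $m); // 1
$matchedPart = $m[0];                                // "xyz"

// With anchors, the whole string must consist of allowed characters.
$wholeString = preg_match($anchored, $input); // 0
```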

P.S. thanks for the note in the comment about preventing - to denote a range.

Solution 3 - Regex

This is the pattern you are looking for

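/^[\w.-]*$/

What this means:

  • ^ Start of string
  • [...] Match any one of the characters inside
  • \w Any word character: 0-9, a-z, A-Z and _
  • .- A literal . and a literal - (the dash goes last so it cannot be read as a range, which newer PCRE2-based PHP versions can reject as an invalid range right after \w; _ is already covered by \w)
  • * Zero or more of the preceding class, with no upper limit
  • $ End of string

If you want to limit the number of characters:

/^[\w.-]{0,5}$/

{0,5} means 0 to 5 characters.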
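A minimal sketch of the length limit, written here with the dash as the last character of the class so it is always read as a literal. The variable names are made up for this example.

```php
<?php

$maxFive = '/^[\w.-]{0,5}$/';

$fiveChars = preg_match($maxFive, 'a.css');  // 1 (exactly 5 characters)
$sixChars  = preg_match($maxFive, 'ab.css'); // 0 (6 characters is too long)
$noChars   = preg_match($maxFive, '');       // 1 (0 characters is allowed by {0,5})
```

Using {1,5} instead would also reject the empty string.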

Solution 4 - Regex

To actually cover your pattern, i.e. valid file names according to your rules, I think you need a little more. Note that this doesn't match legal file names from a system perspective; that would be system-dependent and more liberal in what it accepts. This is intended to match your acceptable patterns.

^([a-zA-Z0-9]+[_-])*[a-zA-Z0-9]+\.[a-zA-Z0-9]+$

Explanation:

  • ^ Match the start of a string. This (plus the end match) forces the string to conform to the exact expression, not merely contain a substring matching the expression.
  • ([a-zA-Z0-9]+[_-])* Zero or more occurrences of one or more letters or numbers followed by an underscore or dash. This causes all names that contain a dash or underscore to have letters or numbers between them.
  • [a-zA-Z0-9]+ One or more letters or numbers. This covers all names that do not contain an underscore or a dash.
  • \. A literal period (dot). Forces the file name to have an extension and, by exclusion from the rest of the pattern, only allow the period to be used between the name and the extension. If you want more than one extension that could be handled as well using the same technique as for the dash/underscore, just at the end.
  • [a-zA-Z0-9]+ One or more letters or numbers. The extension must be at least one character long and must contain only letters and numbers. This is typical, but if you wanted to allow underscores, that could be addressed as well. You could also supply a length range such as {2,3} instead of the one-or-more + matcher, if that were more appropriate.
  • $ Match the end of the string. See the note on ^ above.
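To see how this stricter pattern behaves, the sketch below wraps it in a small helper and tries a few names. The function name isStrictFileName is made up for this example.

```php
<?php

// Hypothetical helper: true when $name follows the stricter
// "name parts joined by - or _, then one dot, then an extension" layout.
function isStrictFileName(string $name): bool
{
    $pattern = '/^([a-zA-Z0-9]+[_-])*[a-zA-Z0-9]+\.[a-zA-Z0-9]+$/';
    return preg_match($pattern, $name) === 1;
}

$ok1 = isStrictFileName('screen123.css');       // true
$ok2 = isStrictFileName('screen-new-file.css'); // true
$ok3 = isStrictFileName('screen_new.js');       // true

$bad1 = isStrictFileName('screen new file.css'); // false: contains spaces
$bad2 = isStrictFileName('.....');               // false: no letters or numbers
$bad3 = isStrictFileName('screen--new.css');     // false: two dashes in a row
$bad4 = isStrictFileName('screen.min.css');      // false: only one dot is allowed
```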

Solution 5 - Regex

Something like this should work

$code = "screen new file.css";
if (!preg_match("/^[-_a-zA-Z0-9.]+$/", $code))
{
	echo "not valid";
}

This will echo "not valid"
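One caveat when a pattern like this is used for validation: in PHP, $ without modifiers also matches just before a newline at the very end of the string, so a value with a trailing newline can slip through. A sketch of two possible ways to tighten that, using \z or the D modifier (the variable names are hypothetical):

```php
<?php

$loose  = '/^[-_a-zA-Z0-9.]+$/';   // $ also matches before a final "\n"
$strict = '/^[-_a-zA-Z0-9.]+\z/';  // \z matches only at the very end of the string
$withD  = '/^[-_a-zA-Z0-9.]+$/D';  // D modifier makes $ match only at the very end

$input = "screen.css\n";

$looseResult  = preg_match($loose, $input);  // 1: the trailing newline is tolerated
$strictResult = preg_match($strict, $input); // 0
$withDResult  = preg_match($withD, $input);  // 0
```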

Solution 6 - Regex

[A-Za-z0-9_.-]*

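This will also match empty strings; if you do not want that, replace the final * with a +. As written, the pattern has no ^ and $ anchors, so to validate a whole string it needs to be wrapped as ^[A-Za-z0-9_.-]+$.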

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Jorre | View Question on Stackoverflow
Solution 1 - Regex | polygenelubricants | View Answer on Stackoverflow
Solution 2 - Regex | nonopolarity | View Answer on Stackoverflow
Solution 3 - Regex | Fletcher Rippon | View Answer on Stackoverflow
Solution 4 - Regex | tvanfosson | View Answer on Stackoverflow
Solution 5 - Regex | Tom | View Answer on Stackoverflow
Solution 6 - Regex | Mad Scientist | View Answer on Stackoverflow