Difference between regex [A-z] and [a-zA-Z]
Java Problem Overview
I am using a regex to program an input validator for a text box where I only want alphabetical characters. I was wondering whether [A-z] and [a-zA-Z] are equivalent, or whether there are differences performance-wise.
I keep seeing [a-zA-Z] in my searches and no mention of [A-z].
I am using Java's String.matches(regex).
Java Solutions
Solution 1 - Java
[A-z] will match ASCII characters in the range from A to z, while [a-zA-Z] will match ASCII characters in the range from A to Z and in the range from a to z. At first glance, these might seem equivalent. However, if you look at a table of ASCII characters, you'll see that A-z includes six other characters. Specifically, they are [, \, ], ^, _, and ` (which you clearly don't want).
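As a rough illustration of the point above, the following sketch compares the two character classes using String.matches, as in the question. The class name, method names, and sample inputs are made up for this example and do not come from the original post.

```java
public class RangeCheck {
    // Hypothetical helper: validates with the loose class [A-z]
    static boolean lettersLoose(String s) {
        return s.matches("[A-z]+");
    }

    // Hypothetical helper: validates with the strict class [a-zA-Z]
    static boolean lettersStrict(String s) {
        return s.matches("[a-zA-Z]+");
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration only
        String[] samples = {"Hello", "snake_case", "a^b", "[tag]"};
        for (String s : samples) {
            System.out.println(s + " loose=" + lettersLoose(s)
                    + " strict=" + lettersStrict(s));
        }
    }
}
```

Only "Hello" should pass the strict check, while all four samples pass the loose one, because _, ^, [ and ] all fall inside the A-z range.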
Solution 2 - Java
When you take a look at the ASCII table, you will see the following:
A = 65
Z = 90
a = 97
z = 122
So, [A-z] will match every char from 65 to 122. This includes the characters with codes 91 to 96 as well:
[\]^_`
By contrast, [A-Za-z] will match only the alphabet, without the extra characters above.
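To check the numbers above, here is a minimal sketch that walks the codes from 65 ('A') to 122 ('z') and collects every character accepted by [A-z] but rejected by [A-Za-z]. The class and method names are invented for this example.

```java
public class AsciiGap {
    // Hypothetical helper: returns the characters that [A-z] matches
    // but [A-Za-z] does not, in code order
    static String extraChars() {
        StringBuilder extras = new StringBuilder();
        for (char c = 'A'; c <= 'z'; c++) {
            String s = String.valueOf(c);
            if (s.matches("[A-z]") && !s.matches("[A-Za-z]")) {
                extras.append(c);
            }
        }
        return extras.toString();
    }

    public static void main(String[] args) {
        // Print each extra character next to its numeric code
        for (char c : extraChars().toCharArray()) {
            System.out.println((int) c + " = " + c);
        }
    }
}
```

The collected characters should be exactly the six listed above, at codes 91 through 96.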
Solution 3 - Java
a-z matches 'a' to 'z', and A-Z matches 'A' to 'Z'. A-z matches all of these as well as the characters between 'Z' and 'a', which are [ \ ] ^ _ `
Refer to http://www.asciitable.com/
Solution 4 - Java
Take a look at the ASCII table. You'll see that there are some characters between Z and a, so you will match more than you intended to.
Solution 5 - Java
The square brackets create a character class, and the hyphen is shorthand for adding every character between the two provided characters. For example, [A-F] can be written [ABCDEF].
The character class [A-z] will match every character between those characters, which in ASCII includes some other characters such as '[', '\' and ']'.
An alternative to specifying both cases would be to make the regular expression case-insensitive. In many regex flavors this is the /i modifier; Java has no /i syntax, so there the equivalent is the inline flag (?i) or Pattern.CASE_INSENSITIVE.
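In Java, a case-insensitive match might look something like the sketch below. It shows two options: compiling with Pattern.CASE_INSENSITIVE, and using the inline (?i) flag, which also works with String.matches. The class, field, and method names are assumptions made for this example.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveLetters {
    // Compiled once and reused; by default CASE_INSENSITIVE
    // applies to US-ASCII letters only
    private static final Pattern LETTERS =
            Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper using the compiled pattern and flag
    static boolean withFlag(String s) {
        return LETTERS.matcher(s).matches();
    }

    // Hypothetical helper using the inline (?i) flag with String.matches
    static boolean withInlineFlag(String s) {
        return s.matches("(?i)[a-z]+");
    }

    public static void main(String[] args) {
        System.out.println(withFlag("HelloWorld"));
        System.out.println(withInlineFlag("Hello_World"));
    }
}
```

Either form accepts upper- and lower-case letters while still rejecting the punctuation between Z and a. Whether this reads better than [a-zA-Z] is largely a matter of taste.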
Solution 6 - Java
Take a look at the ASCII chart (the first 128 Java char values coincide with ASCII): there are six punctuation characters situated between Z and a, namely these:
[\]^_`