List of all special characters that need to be escaped in a regex

JavaRegex

Java Problem Overview


I am trying to create an application that matches a message template with a message that a user is trying to send. I am using Java regex for matching the message. The template/message may contain special characters.

How would I get the complete list of special characters that need to be escaped in order for my regex to work and match in the maximum possible cases?

Is there a universal solution for escaping all special characters in Java regex?

Java Solutions


Solution 1 - Java

You can look at the javadoc of the Pattern class: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

You need to escape any char listed there if you want the regular char and not the special meaning.

As a maybe simpler solution, you can put the template between \Q and \E - everything between them is considered as escaped.

Solution 2 - Java

  • Java characters that have to be escaped in regular expressions are:
    \.[]{}()<>*+-=!?^$|
  • Two of the closing brackets (] and }) are only need to be escaped after opening the same type of bracket.
  • In []-brackets some characters (like + and -) do sometimes work without escape.

Solution 3 - Java

To escape you could just use this from Java 1.5:

Pattern.quote("$test");

You will match exacty the word $test

Solution 4 - Java

According to the String Literals / Metacharacters documentation page, they are:

<([{\^-=$!|]})?*+.>

Also it would be cool to have that list refereed somewhere in code, but I don't know where that could be...

Solution 5 - Java

Combining what everyone said, I propose the following, to keep the list of characters special to RegExp clearly listed in their own String, and to avoid having to try to visually parse thousands of "\\"'s. This seems to work pretty well for me:

final String regExSpecialChars = "<([{\\^-=$!|]})?*+.>";
final String regExSpecialCharsRE = regExSpecialChars.replaceAll( ".", "\\\\$0");
final Pattern reCharsREP = Pattern.compile( "[" + regExSpecialCharsRE + "]");

String quoteRegExSpecialChars( String s)
{
    Matcher m = reCharsREP.matcher( s);
    return m.replaceAll( "\\\\$0");
}

Solution 6 - Java

although the answer is for Java, but the code can be easily adapted from this Kotlin String extension I came up with (adapted from that @brcolow provided):

private val escapeChars = charArrayOf(
    '<',
    '(',
    '[',
    '{',
    '\\',
    '^',
    '-',
    '=',
    '$',
    '!',
    '|',
    ']',
    '}',
    ')',
    '?',
    '*',
    '+',
    '.',
    '>'
)

fun String.escapePattern(): String {
    return this.fold("") {
      acc, chr ->
        acc + if (escapeChars.contains(chr)) "\\$chr" else "$chr"
    }
}

fun main() {
    println("(.*)".escapePattern())
}

prints \(\.\*\)

check it in action here https://pl.kotl.in/h-3mXZkNE

Solution 7 - Java

On @Sorin's suggestion of the Java Pattern docs, it looks like chars to escape are at least:

\.[{(*+?^$|

Solution 8 - Java

The Pattern.quote(String s) sort of does what you want. However it leaves a little left to be desired; it doesn't actually escape the individual characters, just wraps the string with \Q...\E.

There is not a method that does exactly what you are looking for, but the good news is that it is actually fairly simple to escape all of the special characters in a Java regular expression:

regex.replaceAll("[\\W]", "\\\\$0")

Why does this work? Well, the documentation for Pattern specifically says that its permissible to escape non-alphabetic characters that don't necessarily have to be escaped:

> It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.

For example, ; is not a special character in a regular expression. However, if you escape it, Pattern will still interpret \; as ;. Here are a few more examples:

  • > becomes \> which is equivalent to >
  • [ becomes \[ which is the escaped form of [
  • 8 is still 8.
  • \) becomes \\\) which is the escaped forms of \ and ( concatenated.

Note: The key is is the definition of "non-alphabetic", which in the documentation really means "non-word" characters, or characters outside the character set [a-zA-Z_0-9].

Solution 9 - Java

on the other side of the coin, you should use "non-char" regex that looks like this if special characters = allChars - number - ABC - space in your app context.

String regepx = "[^\\s\\w]*";

Solution 10 - Java

Assuming that you have and trust (to be authoritative) the list of escape characters Java regex uses (would be nice if these characters were exposed in some Pattern class member) you can use the following method to escape the character if it is indeed necessary:

private static final char[] escapeChars = { '<', '(', '[', '{', '\\', '^', '-', '=', '$', '!', '|', ']', '}', ')', '?', '*', '+', '.', '>' };

private static String regexEscape(char character) {
    for (char escapeChar : escapeChars) {
        if (character == escapeChar) {
            return "\\" + character;
        }
    }
    return String.valueOf(character);
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAvinash NairView Question on Stackoverflow
Solution 1 - JavaSorinView Answer on Stackoverflow
Solution 2 - JavaTobi G.View Answer on Stackoverflow
Solution 3 - JavamadxView Answer on Stackoverflow
Solution 4 - JavaBohdanView Answer on Stackoverflow
Solution 5 - JavaNeuroDuckView Answer on Stackoverflow
Solution 6 - JavapocesarView Answer on Stackoverflow
Solution 7 - Javauser3033429View Answer on Stackoverflow
Solution 8 - JavawheelerView Answer on Stackoverflow
Solution 9 - JavaBo6BearView Answer on Stackoverflow
Solution 10 - JavabrcolowView Answer on Stackoverflow