Is Java Regex Thread Safe?

Java, Regex, Multithreading

Java Problem Overview


I have a function that uses Pattern#compile and a Matcher to search a list of strings for a pattern.

This function is used in multiple threads. Each thread has a unique pattern that is passed to Pattern#compile when the thread is created. The number of threads and patterns is dynamic, meaning that I can add more Patterns and threads during configuration.

Do I need to make this function synchronized if it uses regex? Is regex in Java thread safe?

Java Solutions


Solution 1 - Java

Yes. From the Java API documentation for the Pattern class:

> Instances of this (Pattern) class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use.

If the code is performance-critical, try reusing a Matcher within a single thread by calling its reset() method instead of creating a new instance for each input. This resets the state of the Matcher instance, making it usable for the next regex operation. In fact, it is the state maintained in the Matcher instance that makes it unsafe for concurrent access.
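The reuse described above could look roughly like the following. This is a minimal sketch, not code from the original answer: the class name MatcherReuse and the method countMatching are hypothetical, and the only API relied on is Matcher.reset(CharSequence), which clears the previous state and points the matcher at new input.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class MatcherReuse {

    /** Counts the inputs that contain a match, reusing one Matcher for the whole loop. */
    static int countMatching(Pattern pattern, List<String> inputs) {
        // The Matcher is a local variable, so it never leaves the calling thread.
        Matcher matcher = pattern.matcher("");
        int count = 0;
        for (String input : inputs) {
            // reset(CharSequence) discards the previous state and switches to the new input.
            if (matcher.reset(input).find()) {
                count++;
            }
        }
        return count;
    }
}
```

Because the Matcher is created and used inside one method call, each thread that calls countMatching gets its own instance, so no synchronization is needed even when the Pattern argument is shared.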

Solution 2 - Java

Thread-safety with regular expressions in Java

> SUMMARY:
>
> The Java regular expression API has been designed to allow a single compiled pattern to be shared across multiple match operations.
>
> You can safely call Pattern.matcher() on the same pattern from different threads and safely use the matchers concurrently. Pattern.matcher() is safe to construct matchers without synchronization. Although the method isn't synchronized, internal to the Pattern class, a volatile variable called compiled is always set after constructing a pattern and read at the start of the call to matcher(). This forces any thread referring to the Pattern to correctly "see" the contents of that object.
>
> On the other hand, you shouldn't share a Matcher between different threads. Or at least, if you ever did, you should use explicit synchronization.
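As an illustration of the summary above, here is a minimal sketch in which several pool threads share one compiled Pattern while each task builds its own Matcher. The class name SharedPatternDemo, the method containsDigits, the digit regex, and the pool size of 4 are all assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

// Hypothetical demo class, named for illustration only.
final class SharedPatternDemo {

    // One compiled Pattern shared by every task; Pattern instances are immutable.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Checks each input on a pool thread and reports whether it contains a run of digits. */
    static List<Boolean> containsDigits(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String input : inputs) {
                // Each task calls matcher() itself, so every Matcher stays local to its task.
                Callable<Boolean> task = () -> DIGITS.matcher(input).find();
                futures.add(pool.submit(task));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The shared state here is only the Pattern. Nothing in the sketch is synchronized, because no Matcher is ever visible to more than one thread.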

Solution 3 - Java

While you need to remember that thread safety has to take into account the surrounding code as well, you appear to be in luck. The fact that Matchers are created using the Pattern's matcher factory method and lack public constructors is a positive sign. Likewise, you use the compile static method to create the encompassing Pattern.

So, in short, if you do something like the example:

Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();

you should be doing pretty well.

Follow-up to the code example for clarity: note that this example strongly implies that the Matcher thus created is confined to the thread that created it, together with the Pattern and the test. In other words, you should not expose the Matcher to any other threads.

Frankly, that's the risk of any thread-safety question. The reality is that any code can be made thread-unsafe if you try hard enough. Fortunately, there are wonderful books that teach us a whole bunch of ways that we could ruin our code. If we stay away from those mistakes, we greatly reduce our own probability of threading problems.

Solution 4 - Java

A quick look at the code for Matcher.java shows a bunch of member variables, including the text that is being matched, arrays for groups, a few indexes for maintaining location, and a few booleans for other state. This all points to a stateful Matcher that would not behave well if accessed by multiple Threads. So does the JavaDoc:

> Instances of this class are not safe for use by multiple concurrent threads.

This is only an issue if, as @Bob Cross points out, you go out of your way to allow use of your Matcher in separate Threads. If you need to do this, and you think that synchronization will be an issue for your code, an option you have is to use a ThreadLocal storage object to maintain a Matcher per working thread.
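The ThreadLocal option mentioned above might be sketched as follows. The class name PerThreadMatcher, the method isWord, and the word regex are hypothetical; the sketch assumes Java 8 or later for ThreadLocal.withInitial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class PerThreadMatcher {

    // The compiled Pattern is immutable and shared by all threads.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Each thread lazily gets its own Matcher built from the shared Pattern.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> WORD.matcher(""));

    /** Returns true if the whole input is a single word, using the calling thread's Matcher. */
    static boolean isWord(String input) {
        // reset(input) clears the previous state before matching the new input.
        return MATCHER.get().reset(input).matches();
    }
}
```

One trade-off to keep in mind: the per-thread Matcher keeps a reference to the last input it was reset with, and in a thread pool it lives as long as the worker thread does. Whether this beats simply calling matcher() each time is worth measuring rather than assuming.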

Solution 5 - Java

To sum up, you can reuse the compiled Pattern(s) (for example, by keeping them in static variables) and ask them for new Matchers whenever you need to validate a string against those regex patterns:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Validation helpers
 */
public final class Validators {

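  // Regular expression source for a basic e-mail format check
  private static final String EMAIL_REGEX = "^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$";

  // Compiled once; Pattern is immutable, so all threads can share this instance
  private static final Pattern EMAIL_PATTERN = Pattern.compile(EMAIL_REGEX);

  private Validators() {
    // Utility class, not meant to be instantiated
  }

  /**
   * Check if e-mail is valid
   */
  public static boolean isValidEmail(String email) {
    // A new Matcher per call keeps the mutable matching state local to the calling thread
    Matcher matcher = EMAIL_PATTERN.matcher(email);
    return matcher.matches();
  }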

}

See http://zoomicon.wordpress.com/2012/06/01/validating-e-mails-using-regular-expressions-in-java/ (near the end) regarding the regex pattern used above for validating e-mails, in case it doesn't fit one's needs for e-mail validation as posted here.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | jmq | View Question on Stackoverflow |
| Solution 1 - Java | Vineet Reynolds | View Answer on Stackoverflow |
| Solution 2 - Java | KV Prajapati | View Answer on Stackoverflow |
| Solution 3 - Java | Bob Cross | View Answer on Stackoverflow |
| Solution 4 - Java | akf | View Answer on Stackoverflow |
| Solution 5 - Java | George Birbilis | View Answer on Stackoverflow |