Regular expression to find two strings anywhere in input


Regex Problem Overview


How do I write a regular expression to match two given strings, at any position in the string?

For example, if I am searching for cat and mat, it should match:

The cat slept on the mat in front of the fire.
At 5:00 pm, I found the cat scratching the wool off the mat.

No matter what precedes these strings.

Regex Solutions


Solution 1 - Regex

/^.*?\bcat\b.*?\bmat\b.*?$/m

Using the m modifier (which ensures the beginning/end metacharacters match on line breaks rather than at the very beginning and end of the string):

  • ^ matches the line beginning
  • .*? matches anything on the line before...
  • \b matches the first occurrence of a word boundary (as @codaddict discussed)
  • then the string cat and another word boundary; note that underscores are treated as "word" characters, so _cat_ would not match*;
  • .*?: any characters before...
  • boundary, mat, boundary
  • .*?: any remaining characters before...
  • $: the end of the line.

It's important to use \b to ensure the specified words aren't part of longer words. Non-greedy wildcards (.*?) are also preferable to greedy ones (.*): on a string like "There is a cat on top of the mat which is under the cat." the greedy form first tries the last occurrence of "cat" and only reaches the first one by backtracking, whereas the non-greedy form finds the first occurrence directly.
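As a rough illustration, here is how the pattern above could be tried in Python. This is only a sketch: the question does not name a language, so the use of Python's re module is an assumption, re.MULTILINE stands in for the m modifier, and the names BOTH_IN_ORDER, sample and matched_lines are made up for the example.

```python
import re

# The Solution 1 pattern; re.MULTILINE plays the role of the /m modifier.
BOTH_IN_ORDER = re.compile(r"^.*?\bcat\b.*?\bmat\b.*?$", re.MULTILINE)

# Hypothetical input: two lines that should match and two that should not.
sample = (
    "The cat slept on the mat in front of the fire.\n"
    "At 5:00 pm, I found the cat scratching the wool off the mat.\n"
    "There is a caterpillar on the mat.\n"
    "The _cat_ ignored the mat.\n"
)

# Each match is one whole line that contains cat and then mat as whole words.
matched_lines = [m.group(0) for m in BOTH_IN_ORDER.finditer(sample)]
for line in matched_lines:
    print(line)
```

Only the first two lines are printed: "caterpillar" fails the word-boundary test, and "_cat_" fails because the underscore counts as a word character.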

* If you want to be able to match _cat_, you can use:

/^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$/m

which matches either underscores or word boundaries around the specified words. (?:...) is a non-capturing group, which can help with performance and avoids conflicting captures.
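A small Python check of the underscore-tolerant variant might look like the following. It is a sketch under the assumption that Python's re module is used; UNDERSCORE_OK and the three test strings are illustrative names and data, not part of the original answer.

```python
import re

# Variant that accepts either a word boundary or an underscore around each word.
UNDERSCORE_OK = re.compile(
    r"^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$", re.MULTILINE
)

plain = UNDERSCORE_OK.search("The cat slept on the mat.")
underscored = UNDERSCORE_OK.search("The _cat_ slept on the _mat_.")
longer_word = UNDERSCORE_OK.search("The caterpillar slept on the mat.")

# The first two searches find a match; "caterpillar" is still rejected.
print(plain is not None, underscored is not None, longer_word is not None)
```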

Edit: A question was raised in the comments about whether the solution would work for phrases rather than just words. The answer is, absolutely yes. The following would match "A line which includes both the first phrase and the second phrase":

/^.*?(?:\b|_)first phrase here(?:\b|_).*?(?:\b|_)second phrase here(?:\b|_).*?$/m

Edit 2: If order doesn't matter you can use:

/^.*?(?:\b|_)(first(?:\b|_).*?(?:\b|_)second|second(?:\b|_).*?(?:\b|_)first)(?:\b|_).*?$/m
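To see the order-independent pattern at work, one could run something like this in Python. It is a sketch only: Python is an assumption, and EITHER_ORDER and the sample sentences are hypothetical.

```python
import re

# The "Edit 2" pattern: first ... second, or second ... first, on one line.
EITHER_ORDER = re.compile(
    r"^.*?(?:\b|_)(first(?:\b|_).*?(?:\b|_)second|second(?:\b|_).*?(?:\b|_)first)(?:\b|_).*?$",
    re.MULTILINE,
)

forward = EITHER_ORDER.search("the first item, then the second item")
backward = EITHER_ORDER.search("the second item, then the first item")
only_one = EITHER_ORDER.search("the first item and nothing else")

# Both orders match; a line with only one of the words does not.
print(forward is not None, backward is not None, only_one is not None)
```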

If performance is really an issue, a lookaround-based pattern (if your regex engine supports lookaround) might perform better than the above, though it probably won't. I'll leave both the arguably more complex lookaround version and the performance testing as an exercise for the reader.

Edited per @Alan Moore's comment. I didn't have a chance to test it, but I'll take your word for it.

Solution 2 - Regex

(.*word1.*word2.*)|(.*word2.*word1.*)

Solution 3 - Regex

If you absolutely need to only use one regex then

/(?=.*?(string1))(?=.*?(string2))/is

i modifier: case-insensitive matching

.*?: lazy match of any characters (as few as possible)

(?=...): positive lookahead; the enclosed pattern must match somewhere ahead, but no characters are consumed

s modifier: . (period) also matches line breaks
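The lookahead approach can be sketched in Python as follows, assuming the re module, where re.IGNORECASE and re.DOTALL correspond to the i and s modifiers. BOTH_ANY_ORDER and the sample text are illustrative names only.

```python
import re

# Two lookaheads tried at the same position: each must find its string
# somewhere ahead, so the order of the two strings does not matter.
BOTH_ANY_ORDER = re.compile(
    r"(?=.*?(string1))(?=.*?(string2))", re.IGNORECASE | re.DOTALL
)

text = "line one has STRING2\nline two has string1\n"
match = BOTH_ANY_ORDER.search(text)

# The overall match is empty (lookaheads consume nothing); the two
# capture groups hold the strings as they appear in the text.
print(repr(match.group(0)), match.group(1), match.group(2))

# A text that lacks one of the strings produces no match at all.
missing = BOTH_ANY_ORDER.search("only string1 here")
```

Because the match itself is zero-width, this form is best suited to yes/no tests or to reading the capture groups, not to extracting the surrounding text.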

Solution 4 - Regex

You can try:

\bcat\b.*\bmat\b

\b is an anchor and matches a word boundary. It will look for words cat and mat anywhere in the string with mat following cat. It will not match:

There's a caterpillar on the mat.

but will match

The cat slept on the mat in front of the fire

If you want to match strings which have letters cat followed by mat, you can try:

cat.*mat

This will match both the above example strings.
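The difference between the two patterns can be checked with a short Python sketch. Python's re module is an assumption here, and WHOLE_WORDS, SUBSTRINGS and the two sentences are names and data chosen for the example.

```python
import re

WHOLE_WORDS = re.compile(r"\bcat\b.*\bmat\b")  # cat and mat as whole words
SUBSTRINGS = re.compile(r"cat.*mat")           # the letters cat, later mat

caterpillar = "The caterpillar is on the mat."
sleeping_cat = "The cat slept on the mat in front of the fire"

# The word-boundary version rejects "caterpillar"; the plain version accepts it.
strict_on_caterpillar = WHOLE_WORDS.search(caterpillar)
loose_on_caterpillar = SUBSTRINGS.search(caterpillar)

# Unlike the anchored Solution 1 pattern, only the span from cat to mat
# is matched, not the whole line.
strict_on_cat = WHOLE_WORDS.search(sleeping_cat)
print(strict_on_cat.group(0))
```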

Solution 5 - Regex

This is fairly light on processing power:

(string1(.|\n)*string2)|(string2(.|\n)*string1)

I used this in Visual Studio 2013 to find all files that contained both string1 and string2.
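For readers outside Visual Studio, the same idea can be sketched in Python. The first pattern is the one from this answer; the second is an assumed alternative that relies on Python's re.DOTALL flag instead of (.|\n). The names ACROSS_LINES, ACROSS_LINES_DOTALL and file_text are hypothetical.

```python
import re

# Pattern from the answer: (.|\n)* lets the gap between the strings span lines.
ACROSS_LINES = re.compile(r"(string1(.|\n)*string2)|(string2(.|\n)*string1)")

# Assumed alternative: with re.DOTALL, "." itself matches newlines.
ACROSS_LINES_DOTALL = re.compile(
    r"string1.*string2|string2.*string1", re.DOTALL
)

file_text = (
    "first line mentions string2\n"
    "second line\n"
    "third line mentions string1\n"
)

found_original = ACROSS_LINES.search(file_text) is not None
found_dotall = ACROSS_LINES_DOTALL.search(file_text) is not None

# Without newline support, "." stops at the end of the first line.
found_single_line = re.search(r"string2.*string1", file_text) is not None
print(found_original, found_dotall, found_single_line)
```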

Solution 6 - Regex

You don't have to use a regex. In your favourite language, split each line on whitespace, go over the resulting words, and check for cat and mat. For example, in Python:

>>> import string
>>> for line in open("file"):
...     # Strip surrounding punctuation so that "mat." still counts as "mat"
...     words = {w.strip(string.punctuation) for w in line.split()}
...     if {"cat", "mat"} <= words:
...         print("found: ", line.rstrip())
...
found:  The cat slept on the mat in front of the fire.
found:  At 5:00 pm, I found the cat scratching the wool off the mat.

Solution 7 - Regex

This works for searching files that contain both String1 and String2:

(((.|\n)*)String1((.|\n)*)String2)|(((.|\n)*)String2((.|\n)*)String1)

That is: match any number of characters or line feeds, followed by String1, followed by any number of characters or line feeds, followed by String2; OR match any number of characters or line feeds, followed by String2, followed by any number of characters or line feeds, followed by String1.
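A file search along these lines could be sketched in Python as below. This is an illustration under assumptions, not the answerer's method: the helper files_with_both, the "*.txt" filter and the UTF-8 decoding are all hypothetical choices, and only the pattern itself comes from the answer.

```python
import re
from pathlib import Path

# Pattern copied from the answer; (.|\n)* lets each gap span line feeds.
BOTH_STRINGS = re.compile(
    r"(((.|\n)*)String1((.|\n)*)String2)|(((.|\n)*)String2((.|\n)*)String1)"
)


def files_with_both(root):
    """Return the .txt files under root whose contents hold both strings."""
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        # Read the whole file so the pattern can span several lines.
        contents = path.read_text(encoding="utf-8", errors="ignore")
        if BOTH_STRINGS.search(contents):
            hits.append(path)
    return hits
```

Calling files_with_both(".") would list the matching files under the current directory. On large files the nested (.|\n)* groups can backtrack heavily, so a plain substring test for each string may be the cheaper option.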

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Phanindra K | View Question on Stack Overflow
Solution 1 - Regex | eyelidlessness | View Answer on Stack Overflow
Solution 2 - Regex | Johan | View Answer on Stack Overflow
Solution 3 - Regex | Kevin Johnson | View Answer on Stack Overflow
Solution 4 - Regex | codaddict | View Answer on Stack Overflow
Solution 5 - Regex | Michael Socha | View Answer on Stack Overflow
Solution 6 - Regex | ghostdog74 | View Answer on Stack Overflow
Solution 7 - Regex | Don | View Answer on Stack Overflow