Regex to match string containing two names in any order


Regex Problem Overview


I need a logical AND in regex.

Something like

jack AND james

should match the following strings:

  • 'hi jack here is james'

  • 'hi james here is jack'

Regex Solutions


Solution 1 - Regex

You can do checks using positive lookaheads. Here is a summary from the indispensable regular-expressions.info:

> Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions...lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not.

It then goes on to explain that positive lookaheads are used to assert that what follows matches a certain expression without taking up characters in that matching expression.
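
To make the "does not consume characters" point concrete, here is a small JavaScript sketch. The sample string and variable names are only illustrative. A lookahead on its own produces an empty match at the position where the assertion holds, and whatever follows it in the pattern starts matching from that same position:

```javascript
// A lookahead matches a position, not text: the match itself is empty.
const sample = "hi jack";
const lookaheadOnly = /(?=jack)/.exec(sample);

console.log(lookaheadOnly.index);      // position where "jack" begins
console.log(lookaheadOnly[0].length);  // length of the matched text

// Because nothing is consumed, the rest of the pattern starts matching
// from the very position the lookahead was checked at.
const lookaheadThenText = /(?=jack)\w+/.exec(sample);
console.log(lookaheadThenText[0]);     // the word found at that position
```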

So here is an expression using two consecutive positive lookaheads to assert that the phrase contains jack and james in either order:

^(?=.*\bjack\b)(?=.*\bjames\b).*$


The expressions in parentheses starting with ?= are the positive lookaheads. I'll break down the pattern:

  1. ^ asserts the start of the expression to be matched.
  2. (?=.*\bjack\b) is the first positive lookahead saying that what follows must match .*\bjack\b.
  3. .* means any character zero or more times.
  4. \b means a word boundary: the position between a word character and a non-word character (white space, punctuation, etc.), or the start or end of the expression next to a word character.
  5. jack is literally those four characters in a row (the same for james in the next positive lookahead).
  6. $ asserts the end of the expression to be matched.
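
A short sketch of how \b behaves in JavaScript, since it is the piece that keeps jack from matching inside longer words. The sample strings are made up for illustration:

```javascript
// \b sits between a word character (letter, digit, underscore) and a
// non-word character, or at the start/end of the string next to a word character.
const wholeWord = /\bjack\b/;

const boundaryChecks = [
  "jack",      // the whole string is the word
  "hi jack!",  // a space before and punctuation after both count as boundaries
  "jackson",   // "jack" is followed by a word character
  "hijack",    // "jack" is preceded by a word character
].map(s => [s, wholeWord.test(s)]);

boundaryChecks.forEach(([s, ok]) => console.log(JSON.stringify(s), ok));
```

Only the first two strings should pass, because in the last two jack is glued to another word character.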

So the first lookahead says "what follows (and is not itself a lookahead or lookbehind) must be an expression that starts with zero or more of any characters followed by a word boundary and then jack and another word boundary," and the second lookahead says "what follows must be an expression that starts with zero or more of any characters followed by a word boundary and then james and another word boundary." After the two lookaheads is .* which simply matches any characters zero or more times and $ which matches the end of the expression.

"start with anything then jack or james then end with anything" satisfies the first lookahead because there are a number of characters then the word jack, and it satisfies the second lookahead because there are a number of characters (which just so happens to include jack, but that is not necessary to satisfy the second lookahead) then the word james. Neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".

I think you get the idea, but just to be absolutely clear, here it is with jack and james reversed, i.e. "start with anything then james or jack then end with anything"; it satisfies the first lookahead because there are a number of characters then the word jack, and it satisfies the second lookahead because there are a number of characters then the word james. This time the text before jack happens to include james, but that is not necessary to satisfy the first lookahead. As before, neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".
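
As a quick check of the walkthrough above, the pattern can be run in JavaScript against the two strings from the question plus two negative cases. The negative cases and variable names are my own additions for illustration:

```javascript
const bothNames = /^(?=.*\bjack\b)(?=.*\bjames\b).*$/;

const samples = [
  "hi jack here is james",    // both names, jack first
  "hi james here is jack",    // both names, james first
  "hi jack",                  // only one of the names
  "hi jackson here is james", // "jack" only appears inside a longer word
];

// One boolean per sample, in the same order as the array above.
const results = samples.map(s => bothNames.test(s));
console.log(results);
```

The first two strings should match and the last two should not.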

This approach has the advantage that you can easily specify multiple conditions.

^(?=.*\bjack\b)(?=.*\bjames\b)(?=.*\bjason\b)(?=.*\bjules\b).*$
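
Because every condition is just one more lookahead, the pattern can also be generated from a list of words. The sketch below is one possible way to do that in JavaScript; escapeRegExp and allWordsPattern are hypothetical helper names, not part of any library:

```javascript
// Hypothetical helper: escape regex metacharacters so a word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one positive lookahead per required word,
// so the words may appear in any order.
function allWordsPattern(words) {
  const lookaheads = words
    .map(w => `(?=.*\\b${escapeRegExp(w)}\\b)`)
    .join("");
  return new RegExp(`^${lookaheads}.*$`);
}

const fourNames = allWordsPattern(["jack", "james", "jason", "jules"]);
console.log(fourNames.source);
console.log(fourNames.test("jules, jason, james and jack")); // all four present
console.log(fourNames.test("jules, jason and james"));       // jack is missing
```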

Solution 2 - Regex

Try:

james.*jack

That only matches james followed by jack. To accept either order, join the two orders with | (or):

james.*jack|jack.*james
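
A minimal JavaScript check of the difference between the two patterns, using one of the strings from the question (variable names are illustrative):

```javascript
const oneOrder = /james.*jack/;
const eitherOrder = /james.*jack|jack.*james/;
const text = "hi jack here is james";

console.log(oneOrder.test(text));          // requires james before jack
console.log(eitherOrder.test(text));       // the second alternative covers jack first
console.log(eitherOrder.test("hi james")); // one name alone is not enough
```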

Solution 3 - Regex

Explanation of the pattern I am going to write:

. matches any single character (a letter, a digit, and so on), usually excluding line breaks.

* means zero or more occurrences of whatever is written just before it.

| means 'or'.

So,

james.*jack

would find james, then any number of characters, then jack.

Since you want either jack.*james or james.*jack, the pattern is:

jack.*james|james.*jack
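
One caveat worth illustrating: in JavaScript, . does not match line breaks by default, so the pattern fails when the two names sit on different lines. The s (dotAll) flag changes that. A small sketch, with a made-up two-line string:

```javascript
const twoLines = "jack\njames";

const defaultDot = /jack.*james|james.*jack/;
const dotAll = /jack.*james|james.*jack/s; // "s" lets . match line breaks too

console.log(defaultDot.test(twoLines)); // .* stops at the line break
console.log(dotAll.test(twoLines));     // .* can now cross the line break
```

Other regex flavors have their own way of enabling this behavior, so check the documentation for the one you use.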

Solution 4 - Regex

It's short and sweet:

(?=.*jack)(?=.*james)

Test Cases:

[
  "xxx james xxx jack xxx",
  "jack xxx james ",
  "jack xxx jam ",
  "  jam and jack",
  "jack",
  "james",
]
.forEach(s => console.log(/(?=.*james)(?=.*jack)/.test(s)) )

Solution 5 - Regex

You can do:

\bjack\b.*\bjames\b|\bjames\b.*\bjack\b
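
The word boundaries are what separate this pattern from the plain alternation without \b. A brief JavaScript comparison, using a made-up string in which both names only occur inside longer words:

```javascript
const loose = /jack.*james|james.*jack/;
const wholeWords = /\bjack\b.*\bjames\b|\bjames\b.*\bjack\b/;

const tricky = "jackson met jameson";

console.log(loose.test(tricky));                // finds "jack" and "james" as substrings
console.log(wholeWords.test(tricky));           // neither name appears as a whole word
console.log(wholeWords.test("jack met james")); // both appear as whole words
```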

Solution 6 - Regex

The expression in this answer does that for one jack and one james in any order.

Here, we'd explore other scenarios.

METHOD 1: One jack and One james

If two jack or two james should not be allowed, so that only exactly one jack and one james is valid, we can design an expression similar to:

^(?!.*\bjack\b.*\bjack\b)(?!.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$

Here, the repeated names are excluded by these two negative lookaheads:

(?!.*\bjack\b.*\bjack\b)

and,

(?!.*\bjames\b.*\bjames\b)
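
A minimal JavaScript sketch of the "exactly one of each" expression, run on a few hand-picked strings (the samples and variable names are illustrative):

```javascript
const exactlyOnce =
  /^(?!.*\bjack\b.*\bjack\b)(?!.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$/;

const onceSamples = [
  "hi jack here is james",       // one of each
  "hi james here is jack",       // one of each, other order
  "hi jack jack here is james",  // jack appears twice
  "hi jack here is james james", // james appears twice
  "hi jack",                     // james is missing
];

// One boolean per sample, in the same order as the array above.
const onceResults = onceSamples.map(s => exactlyOnce.test(s));
console.log(onceResults);
```

Only the first two strings should pass: the negative lookaheads reject the repeats, and the positive lookaheads reject the string with a missing name.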

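We can also merge the two negative lookaheads into one with alternation. The two positive lookaheads have to stay separate, because joining them with | would require only one of the two names:

^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$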

If you wish to simplify, update, or explore the expression, it is explained in the top right panel of regex101.com. You can also watch the matching steps or modify them in its debugger, which demonstrates how a RegEx engine might consume some sample input strings step by step and perform the matching process.


RegEx Circuit

jex.im visualizes regular expressions.

[Image: regular expression circuit diagram]

Test

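// Repeated names are rejected by the negative lookahead; each name gets its
// own positive lookahead so that both are required (joining them with | would
// accept a line containing only one of the names).
const regex = /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$/gm;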
const str = `hi jack here is james
hi james here is jack
hi james jack here is jack james
hi jack james here is james jack
hi jack jack here is jack james
hi james james here is james jack
hi jack jack jack here is james
`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}


METHOD 2: One jack and One james in a specific order

The expression can also be designed to require first a james and then a jack, similar to the following one:

^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b.*\bjack\b).*$

and vice versa:

^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjack\b.*\bjames\b).*$
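
The two order-specific expressions can be compared side by side in JavaScript. This is only a sketch; the sample strings and variable names are made up for illustration:

```javascript
const jamesThenJack =
  /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b.*\bjack\b).*$/;
const jackThenJames =
  /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjack\b.*\bjames\b).*$/;

const orderSamples = [
  "hi james here is jack",       // james first
  "hi jack here is james",       // jack first
  "hi james james here is jack", // james repeated
];

// Each row: [sample, matches james-then-jack, matches jack-then-james]
const orderResults = orderSamples.map(s => [
  s,
  jamesThenJack.test(s),
  jackThenJames.test(s),
]);
orderResults.forEach(row => console.log(row));
```

Each of the first two strings should satisfy exactly one of the expressions, and the third should satisfy neither because james is repeated.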

Solution 7 - Regex

You can use quantifiers instead, since lookaround is not supported by every regex flavor.

(\bjames\b){1,}.*(\bjack\b){1,}|(\bjack\b){1,}.*(\bjames\b){1,}
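
A quick JavaScript check of this lookaround-free pattern. Note that {1,} means "one or more", so each group simply has to occur at least once. The sample strings are illustrative:

```javascript
const noLookaround =
  /(\bjames\b){1,}.*(\bjack\b){1,}|(\bjack\b){1,}.*(\bjames\b){1,}/;

const quantifierSamples = [
  "hi jack here is james", // jack first
  "hi james here is jack", // james first
  "hi james",              // only one name
  "jackson met james",     // "jack" only inside a longer word
];

// One boolean per sample, in the same order as the array above.
const quantifierResults = quantifierSamples.map(s => noLookaround.test(s));
console.log(quantifierResults);
```

On these samples it should accept the first two strings and reject the last two.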

Solution 8 - Regex

Vim has a branch operator \& that is useful when searching for a line containing a set of words, in any order. Moreover, extending the set of required words is trivial.

For example,

/.*jack\&.*james

will match a line containing jack and james, in any order.

See this answer for more information on usage. I am not aware of any other regex flavor that implements branching; the operator is not even documented in the Regular Expression Wikipedia entry.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Meloun | View Question on Stackoverflow
Solution 1 - Regex | Alin Purcaru | View Answer on Stackoverflow
Solution 2 - Regex | icyrock.com | View Answer on Stackoverflow
Solution 3 - Regex | Shubham Sharma | View Answer on Stackoverflow
Solution 4 - Regex | Shivam Agrawal | View Answer on Stackoverflow
Solution 5 - Regex | codaddict | View Answer on Stackoverflow
Solution 6 - Regex | Emma | View Answer on Stackoverflow
Solution 7 - Regex | XPMai | View Answer on Stackoverflow
Solution 8 - Regex | Firstrock | View Answer on Stackoverflow