Regex to detect one of several strings
Regex Problem Overview
I've got a list of email addresses belonging to several domains. I'd like a regex that will match addresses belonging to three specific domains (for this example: foo, bar, and baz).
So these would match:
- a@foo
- a@bar
- b@baz
This would not:
- a@fnord
Ideally, these would not match either (though it's not critical for this particular problem):
- a@foobar
- b@foofoo
Abstracting the problem a bit: I want to match a string that contains at least one of a given list of substrings.
Regex Solutions
Solution 1 - Regex
Use the pipe symbol to indicate "or":
/a@(foo|bar|baz)\b/
If you don't want the capture-group, use the non-capturing grouping symbol:
/a@(?:foo|bar|baz)\b/
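As a rough illustration, here is how the non-capturing pattern behaves in Python's re module. The helper name matches_domain is made up for this sketch, and the literal a before the @ is kept exactly as in the pattern above.

```python
import re

# Same pattern as above, minus the /.../ delimiters; "a" is still a literal local part.
DOMAIN_RE = re.compile(r"a@(?:foo|bar|baz)\b")

def matches_domain(address):
    """Return True if the pattern is found anywhere in address (hypothetical helper)."""
    return DOMAIN_RE.search(address) is not None

for address in ["a@foo", "a@bar", "a@fnord", "a@foobar"]:
    print(address, matches_domain(address))
```

The \b keeps a@foobar from matching, but it still accepts strings such as a@foo.com or a@foo-bar, because . and - are non-word characters.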
(Of course I'm assuming "a" is OK for the front of the email address! You should replace that with a suitable regex.)
Solution 2 - Regex
^(a|b)@(foo|bar|baz)$
Use this if you have a strictly defined list of local parts and domains. The start (^) and end ($) anchors mean the whole string has to be one of those exact addresses, so a@foobar will not match.
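A small Python sketch of the anchored pattern, assuming each address is tested as its own string. The function name is_listed_address is hypothetical.

```python
import re

# Anchored pattern from this answer: the entire string must be <a or b>@<foo, bar or baz>.
STRICT_RE = re.compile(r"^(a|b)@(foo|bar|baz)$")

def is_listed_address(address):
    """Return True only when the whole string is one of the listed addresses."""
    return STRICT_RE.match(address) is not None

for address in ["a@foo", "b@baz", "a@foobar", "c@foo"]:
    print(address, is_listed_address(address))
```

Because of the anchors, both the local part and the domain have to be in the list: c@foo is rejected just like a@foobar.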
Solution 3 - Regex
Use:
/@(foo|bar|baz)\.?$/i
Note the differences from other answers:
- \.? matches 0 or 1 dots, in case the domains in the e-mail address are "fully qualified",
- $ indicates that the string must end with this sequence,
- /i makes the test case-insensitive.
Note, this assumes that each e-mail address is on a line on its own.
If the address being matched could be anywhere in the string, then drop the $ and replace it with \s+ (which matches one or more white space characters).
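For comparison, a sketch of the end-anchored, case-insensitive pattern in Python, where re.IGNORECASE plays the role of the /i flag. The helper domain_of is an invented name, and it assumes one address per string.

```python
import re

# End-anchored pattern from this answer; re.IGNORECASE stands in for the /i flag.
END_RE = re.compile(r"@(foo|bar|baz)\.?$", re.IGNORECASE)

def domain_of(address):
    """Return the matched domain in lower case, or None if there is no match."""
    m = END_RE.search(address)
    return m.group(1).lower() if m else None

for address in ["a@foo", "a@FOO", "b@baz.", "a@foobar", "a@fnord"]:
    print(address, domain_of(address))
```

Since nothing constrains the text before the @, any local part is accepted here.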
Solution 4 - Regex
This should be more generic: the a shouldn't be part of the pattern, although the @ should.
/@(foo|bar|baz)(?:\W|$)/
Here is a good reference on regex.
Edit: changed the ending to allow end of pattern or a word break, now assuming foo/bar/baz are full domain names.
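A sketch of how this pattern could be used to pull the listed domains out of a longer piece of text in Python. The function find_domains and the sample text are made up for illustration.

```python
import re

# Pattern from this answer: the domain must be followed by a non-word character or the end.
ANY_LOCAL_RE = re.compile(r"@(foo|bar|baz)(?:\W|$)")

def find_domains(text):
    """Return every listed domain that appears after an @ in text, in order."""
    return [m.group(1) for m in ANY_LOCAL_RE.finditer(text)]

print(find_domains("x@foo, y@foobar z@baz"))
```

Here y@foobar is skipped because the character after foo is a word character, while x@foo and z@baz are picked up regardless of their local parts.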
Solution 5 - Regex
If the previous (and logical) answers about '|' don't suit you, have a look at
http://metacpan.org/pod/Regex::PreSuf
Module description: create regular expressions from word lists.
Solution 6 - Regex
You don't need a regex to find whether a string contains at least one of a given list of substrings. In Python:
def contain(string_, substrings):
    return any(s in string_ for s in substrings)
The above is slow for a large string_ and many substrings. GNU fgrep can efficiently search for multiple patterns at the same time.
Using regex
import re

def contain(string_, substrings):
    # Escape each substring so it is matched literally, then join them with "|" (or).
    regex = '|'.join("(?:%s)" % re.escape(s) for s in substrings)
    return re.search(regex, string_) is not None
Solution 7 - Regex
OK, I know you asked for a regex answer. But have you considered just splitting the string on the '@' character, taking the second array value (the domain), and doing a simple match test?
if (splitString[1] == "foo" || splitString[1] == "bar" || splitString[1] == "baz")
{
    // Do something!
}
Seems to me that RegEx is overkill. Of course my assumption is that your case is really as simple as you have listed.
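The same idea as a minimal Python sketch (Python is used only to stay consistent with the other examples in this thread). The names ALLOWED_DOMAINS and has_allowed_domain are assumptions, not part of the original answer.

```python
# Hypothetical set of accepted domains, taken from the question's example.
ALLOWED_DOMAINS = {"foo", "bar", "baz"}

def has_allowed_domain(address, allowed=ALLOWED_DOMAINS):
    """Split on the last '@' and check the domain against the allowed set."""
    _local, sep, domain = address.rpartition("@")
    # sep is empty when the string contains no '@' at all.
    return bool(sep) and domain in allowed

for address in ["a@foo", "b@baz", "a@fnord", "a@foobar", "no-at-sign"]:
    print(address, has_allowed_domain(address))
```

An exact set lookup also rules out a@foobar and b@foofoo without needing any word-boundary tricks.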