What is the difference between re.search and re.match?

Python, Regex, Search, Match

Python Problem Overview


What is the difference between the search() and match() functions in the Python re module?

I've read the documentation (current documentation), but I never seem to remember it. I keep having to look it up and re-learn it. I'm hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I'll have a better place to return with my question and it will take less time to re-learn it.

Python Solutions


Solution 1 - Python

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

> If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
>
> Note: If you want to locate a match anywhere in string, use search() instead.

re.search searches the entire string, as the documentation says:

> Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

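So if you need to match at the beginning of the string, use match. To match the entire string, use match with an end anchor such as \Z (or re.fullmatch() in Python 3.4+), because match alone only anchors the start. match can also fail faster, since it never tries positions past the start. Otherwise use search.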
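Since match() anchors only the start of the string, checking that a whole string has some shape also needs an end anchor. A minimal sketch, assuming Python 3.4+ for re.fullmatch(); the date-like pattern and sample strings are arbitrary illustrations:

```python
import re

text = "2024-01-15 extra"

# match() anchors only the start, so trailing text is silently ignored.
print(bool(re.match(r"\d{4}-\d{2}-\d{2}", text)))      # True

# Anchoring the end with \Z rejects the trailing text.
print(bool(re.match(r"\d{4}-\d{2}-\d{2}\Z", text)))    # False

# re.fullmatch() (Python 3.4+) requires the whole string to match.
print(bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", text)))          # False
print(bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2024-01-15")))  # True

# '$' also matches just before a trailing newline, unlike \Z.
print(bool(re.match(r"\d{4}-\d{2}-\d{2}$", "2024-01-15\n")))   # True
print(bool(re.match(r"\d{4}-\d{2}-\d{2}\Z", "2024-01-15\n")))  # False
```

The last two lines show why \Z (or fullmatch) is the stricter choice when a trailing newline must be rejected.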

The documentation has a specific section for match vs. search that also covers multiline strings:

> Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).
>
> Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The “match” operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

Now, enough talk. Time to see some example code:

# example code (Python 3):
import re

string_with_newlines = """something
someotherthing"""

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother',
               string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines,
               re.MULTILINE))  # also won't match
print(re.search('someother',
                string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines,
                re.MULTILINE))  # also finds something

m = re.compile('thing$', re.MULTILINE)

print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
# The flag was already given to compile(); passing re.MULTILINE here
# would be taken as the pos argument, so it is omitted.
print(m.search(string_with_newlines))  # also matches
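The quoted documentation also mentions the pos argument of compiled patterns. A short sketch (the sample string "xxsome" is an arbitrary illustration) showing that match() treats pos as its anchor point, whereas '^' still refers to the real start of the string and is not satisfied just because the search begins at pos:

```python
import re

text = "xxsome"

plain = re.compile("some")
caret = re.compile("^some")

# match() with pos=2 anchors at index 2, so "some" is found there.
print(plain.match(text, 2))    # a match object with span (2, 6)

# '^' does not treat pos as the start of the string, so this finds nothing.
print(caret.search(text, 2))   # None

# Slicing builds a new string whose start really is index 0.
print(caret.search(text[2:]))  # a match object with span (0, 4)
```

Slicing copies the string, so for large inputs the pos argument combined with match() is the cheaper way to anchor at an offset.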

Solution 2 - Python

search ⇒ find something anywhere in the string and return a match object.

match ⇒ find something at the beginning of the string and return a match object.
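Both functions return either a match object or None, so the usual pattern is to test the result before using it. A small sketch with an arbitrary sample string:

```python
import re

text = "say hello world"

found = re.search(r"hello", text)  # scans forward until "hello" is found
if found:
    # The match object records what matched and where.
    print(found.group(), found.start(), found.end())  # hello 4 9

anchored = re.match(r"hello", text)  # only tries index 0
print(anchored)  # None
```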

Solution 3 - Python

> match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.

This comment from @ivan_bilan under the accepted answer above got me wondering whether such a hack actually speeds anything up, so let's find out how many tons of performance you will really gain.

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    # Build a random lowercase word of LENGTH characters.
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, ..., 10M words) which gave me the following plot:

[Figure: match vs. search regex speed test, line plot]

As you can see, searching for the pattern 'python' is faster than matching the pattern '(.*?)python(.*?)'.

Python is smart. Avoid trying to be smarter.
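For readers who want to rerun the comparison, here is an illustrative variant using timeit and precompiled patterns instead of time.time() and module-level calls. It is not the benchmark behind the plot: the list size, seed, and repetition counts are arbitrary assumptions, and no timings are claimed here. It also checks that both approaches agree on which words contain 'python'.

```python
import random
import re
import string
import timeit

LENGTH = 10
LIST_SIZE = 10_000  # much smaller than the original test, for a quick run

random.seed(0)
wordlist = [''.join(random.choices(string.ascii_lowercase, k=LENGTH))
            for _ in range(LIST_SIZE)]
wordlist.append('xxpythonxx')  # guarantee at least one hit

search_re = re.compile('python')
match_re = re.compile('(.*?)python(.*?)')


def run_search():
    return [search_re.search(w) is not None for w in wordlist]


def run_match():
    return [match_re.match(w) is not None for w in wordlist]


# Both approaches must agree on which words contain 'python'.
assert run_search() == run_match()

for name, fn in [('search', run_search), ('match', run_match)]:
    # Take the best of several repeats to reduce noise from other processes.
    seconds = min(timeit.repeat(fn, number=5, repeat=3))
    print(f'{name}: {seconds:.4f} s (best of 3 repeats, 5 runs each)')
```

Note that the two are not fully interchangeable even when they agree on a hit: the match version's group() includes the prefix captured by (.*?).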

Solution 4 - Python

re.search searches for the pattern throughout the string, whereas re.match does not search at all: it can only match the pattern at the start of the string.

Solution 5 - Python

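The following example shows how re.match and re.search behave:

import re

a = "123abc"
t_match = re.match("[a-z]+", a)    # tries only at index 0, where "1" is not a letter
t_search = re.search("[a-z]+", a)  # scans ahead and finds "abc"

re.match returns None, but re.search returns a match object whose group() is 'abc'.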

Solution 6 - Python

The difference is, re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, as John D. Cook remarks, re.match() "behaves as if every pattern has ^ prepended." In other words, outside MULTILINE mode, re.match('pattern') is equivalent to re.search('^pattern'). So it anchors a pattern's left side. But it does not anchor the right side: that still requires a terminating $ (or \Z, or re.fullmatch() in Python 3.4+).
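A quick sketch of where that equivalence holds and where it breaks; the sample strings are arbitrary illustrations:

```python
import re

samples = ["abc", "xabc", "x\nabc"]

for s in samples:
    via_match = re.match("abc", s) is not None
    via_search = re.search("^abc", s) is not None
    # Without MULTILINE, '^' means "start of string", so both agree.
    assert via_match == via_search

# With MULTILINE, '^' also matches after a newline, so they diverge.
s = "x\nabc"
print(re.match("abc", s, re.MULTILINE))                # None
print(re.search("^abc", s, re.MULTILINE) is not None)  # True

# Neither anchors the right side: trailing text is allowed.
print(re.match("abc", "abcdef") is not None)           # True
```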

Frankly, given the above, I think re.match() should be deprecated. I would be interested to know reasons it should be retained.

Solution 7 - Python

Much shorter:

  • search scans through the whole string.

  • match scans only the beginning of the string.

The following example shows it:

>>> import re
>>> a = "123abc"
>>> print(re.match("[a-z]+", a))
None
>>> re.search("[a-z]+", a).group()
'abc'

Solution 8 - Python

re.match attempts to match a pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.
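"Until it finds a match" is worth taking literally: search() stops at the first (leftmost) match. A small sketch with an arbitrary sample string, using finditer()/findall() when every occurrence is needed:

```python
import re

text = "cat bat rat"

# search() returns only the first (leftmost) match and stops there.
first = re.search(r"[a-z]at", text)
print(first.group(), first.span())  # cat (0, 3)

# To collect every non-overlapping match, use finditer() or findall().
print([m.group() for m in re.finditer(r"[a-z]at", text)])  # ['cat', 'bat', 'rat']

# match() never moves past index 0, so a pattern that first occurs later fails.
print(re.match(r"bat", text))  # None
```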

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Daryl Spitzer | View Question on Stackoverflow
Solution 1 - Python | nosklo | View Answer on Stackoverflow
Solution 2 - Python | Dhanasekaran Anbalagan | View Answer on Stackoverflow
Solution 3 - Python | Jeyekomon | View Answer on Stackoverflow
Solution 4 - Python | xilun | View Answer on Stackoverflow
Solution 5 - Python | ldR | View Answer on Stackoverflow
Solution 6 - Python | CODE-REaD | View Answer on Stackoverflow
Solution 7 - Python | U12-Forward | View Answer on Stackoverflow
Solution 8 - Python | cschol | View Answer on Stackoverflow