Find out how many times a regex matches in a string in Python

PythonRegex

Python Problem Overview


Is there a way that I can find out how many matches of a regex are in a string in Python? For example, if I have the string "It actually happened when it acted out of turn."

I want to know how many times "t a" appears in the string. In that string, "t a" appears twice. I want my function to tell me it appeared twice. Is this possible?

Python Solutions


Solution 1 - Python

import re
len(re.findall(pattern, string_to_search))

Solution 2 - Python

The existing solutions based on findall are fine for non-overlapping matches (and no doubt optimal except maybe for HUGE number of matches), although alternatives such as sum(1 for m in re.finditer(thepattern, thestring)) (to avoid ever materializing the list when all you care about is the count) are also quite possible. Somewhat idiosyncratic would be using subn and ignoring the resulting string...:

def countnonoverlappingrematches(pattern, thestring):
  return re.subn(pattern, '', thestring)[1]

the only real advantage of this latter idea would come if you only cared to count (say) up to 100 matches; then, re.subn(pattern, '', thestring, 100)[1] might be practical (returning 100 whether there are 100 matches, or 1000, or even larger numbers).

Counting overlapping matches requires you to write more code, because the built-in functions in question are all focused on NON-overlapping matches. There's also a problem of definition, e.g, with pattern being 'a+' and thestring being 'aa', would you consider this to be just one match, or three (the first a, the second one, both of them), or...?

Assuming for example that you want possibly-overlapping matches starting at distinct spots in the string (which then would give TWO matches for the example in the previous paragraph):

def countoverlappingdistinct(pattern, thestring):
  total = 0
  start = 0
  there = re.compile(pattern)
  while True:
    mo = there.search(thestring, start)
    if mo is None: return total
    total += 1
    start = 1 + mo.start()

Note that you do have to compile the pattern into a RE object in this case: function re.search does not accept a start argument (starting position for the search) the way method search does, so you'd have to be slicing thestring as you go -- definitely more effort than just having the next search start at the next possible distinct starting point, which is what I'm doing in this function.

Solution 3 - Python

I know this is a question about regex. I just thought I'd mention the count method for future reference if someone wants a non-regex solution.

>>> s = "It actually happened when it acted out of turn."
>>> s.count('t a')
2

Which return the number of non-overlapping occurrences of the substring

Solution 4 - Python

You can find overlapping matches by using a noncapturing subpattern:

def count_overlapping(pattern, string):
    return len(re.findall("(?=%s)" % pattern, string))

Solution 5 - Python

Have you tried this?

 len( pattern.findall(source) )

Solution 6 - Python

import re
print len(re.findall(r'ab',u'ababababa'))

Solution 7 - Python

To avoid creating a list of matches one may also use re.sub with a callable as replacement. It will be called on each match, incrementing internal counter.

class Counter(object):
    def __init__(self):
        self.matched = 0
    def __call__(self, matchobj):
        self.matched += 1

counter = Counter()
re.sub(some_pattern, counter, text)

print counter.matched

Solution 8 - Python

this works fine

ptr_str = lambda pattern,string1 :print(f'pattern = {pattern} times = {len(re.findall(pattern,string1))}')
pattern = 'AGATC'
str='AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGATCATAGGTTATATTGT'
ptr_str(pattern,string1)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDanView Question on Stackoverflow
Solution 1 - PythonSilentGhostView Answer on Stackoverflow
Solution 2 - PythonAlex MartelliView Answer on Stackoverflow
Solution 3 - PythonNadia AlramliView Answer on Stackoverflow
Solution 4 - PythonAnts AasmaView Answer on Stackoverflow
Solution 5 - PythonS.LottView Answer on Stackoverflow
Solution 6 - Pythonc_harmView Answer on Stackoverflow
Solution 7 - PythonjanislawView Answer on Stackoverflow
Solution 8 - PythonRahulView Answer on Stackoverflow