Finding multiple occurrences of a string within a string in Python

PythonString

Python Problem Overview


How do I find multiple occurrences of a string within a string in Python? Consider this:

>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>> 

So the first occurrence of ll is at 1 as expected. How do I find the next occurrence of it?

Same question is valid for a list. Consider:

>>> x = ['ll', 'ok', 'll']

How do I find all the ll with their indexes?

Python Solutions


Solution 1 - Python

Using regular expressions, you can use re.finditer to find all (non-overlapping) occurences:

>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
         print('ll found', m.start(), m.end())

ll found 1 3
ll found 10 12
ll found 16 18

Alternatively, if you don't want the overhead of regular expressions, you can also repeatedly use str.find to get the next index:

>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
        index = text.find('ll', index)
        if index == -1:
            break
        print('ll found at', index)
        index += 2 # +2 because len('ll') == 2

ll found at  1
ll found at  10
ll found at  16

This also works for lists and other sequences.

Solution 2 - Python

I think what you are looking for is string.count

"Allowed Hello Hollow".count('ll')
>>> 3

Hope this helps
NOTE: this only captures non-overlapping occurences

Solution 3 - Python

For the list example, use a comprehension:

>>> l = ['ll', 'xx', 'll']
>>> print [n for (n, e) in enumerate(l) if e == 'll']
[0, 2]

Similarly for strings:

>>> text = "Allowed Hello Hollow"
>>> print [n for n in xrange(len(text)) if text.find('ll', n) == n]
[1, 10, 16]

this will list adjacent runs of "ll', which may or may not be what you want:

>>> text = 'Alllowed Hello Holllow'
>>> print [n for n in xrange(len(text)) if text.find('ll', n) == n]
[1, 2, 11, 17, 18]

Solution 4 - Python

FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.

The first uses str.index and checks for ValueError:

def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass

The second tests uses str.find and checks for the sentinel of -1 by using iter:

def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    return iter(next_index(len(sub)).next, -1)

To apply any of these functions to a list, tuple or other iterable of strings, you can use a higher-level function —one that takes a function as one of its arguments— like this one:

def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)

Solution 5 - Python

For your list example:

In [1]: x = ['ll','ok','ll']

In [2]: for idx, value in enumerate(x):
   ...:     if value == 'll':
   ...:         print idx, value       
0 ll
2 ll

If you wanted all the items in a list that contained 'll', you could also do that.

In [3]: x = ['Allowed','Hello','World','Hollow']

In [4]: for idx, value in enumerate(x):
   ...:     if 'll' in value:
   ...:         print idx, value
   ...:         
   ...:         
0 Allowed
1 Hello
3 Hollow

Solution 6 - Python

>>> for n,c in enumerate(text):
...   try:
...     if c+text[n+1] == "ll": print n
...   except: pass
...
1
10
16

Solution 7 - Python

This version should be linear in length of the string, and should be fine as long as the sequences aren't too repetitive (in which case you can replace the recursion with a while loop).

def find_all(st, substr, start_pos=0, accum=[]):
    ix = st.find(substr, start_pos)
    if ix == -1:
        return accum
    return find_all(st, substr, start_pos=ix + 1, accum=accum + [ix])

bstpierre's list comprehension is a good solution for short sequences, but looks to have quadratic complexity and never finished on a long text I was using.

findall_lc = lambda txt, substr: [n for n in xrange(len(txt))
                                   if txt.find(substr, n) == n]

For a random string of non-trivial length, the two functions give the same result:

import random, string; random.seed(0)
s = ''.join([random.choice(string.ascii_lowercase) for _ in range(100000)])

>>> find_all(s, 'th') == findall_lc(s, 'th')
True
>>> findall_lc(s, 'th')[:4]
[564, 818, 1872, 2470]

But the quadratic version is about 300 times slower

%timeit find_all(s, 'th')
1000 loops, best of 3: 282 µs per loop

%timeit findall_lc(s, 'th')    
10 loops, best of 3: 92.3 ms per loop

Solution 8 - Python

This code might not be the shortest/most efficient but it is simple and understandable

def findall(f, s):
    l = []
    i = -1
    while True:
        i = s.find(f, i+1)
        if i == -1:
            return l
        l.append(s.find(f, i))

findall('test', 'test test test test')
# [0, 5, 10, 15]

Solution 9 - Python

For the first version, checking a string:

def findall(text, sub):
    """Return all indices at which substring occurs in text"""
    return [
        index
        for index in range(len(text) - len(sub) + 1)
        if text[index:].startswith(sub)
    ]

print(findall('Allowed Hello Hollow', 'll'))
# [1, 10, 16]

No need to import re. This should run in linear time, as it only loops through the string once (and stops before the end, once there aren't enough characters left to fit the substring). I also find it quite readable, personally.

Note that this will find overlapping occurrences:

print(findall('aaa', 'aa'))
# [0, 1]

Solution 10 - Python

Brand new to programming in general and working through an online tutorial. I was asked to do this as well, but only using the methods I had learned so far (basically strings and loops). Not sure if this adds any value here, and I know this isn't how you would do it, but I got it to work with this:

needle = input()
haystack = input()
counter = 0
n=-1
for i in range (n+1,len(haystack)+1):
   for j in range(n+1,len(haystack)+1):
      n=-1
      if needle != haystack[i:j]:
         n = n+1
         continue
      if needle == haystack[i:j]:
         counter = counter + 1
print (counter)

Solution 11 - Python

The following function finds all the occurrences of a string inside another while informing the position where each occurrence is found.

You can call the function using the test cases in the table below. You can try with words, spaces and numbers all mixed up.

The function works well with overlapping characters.

theString aString
"661444444423666455678966" "55"
"661444444423666455678966" "44"
"6123666455678966" "666"
"66123666455678966" "66"

Calling examples:

1. print("Number of occurrences: ", find_all("123666455556785555966", "5555"))
   
   output:
           Found in position:  7
           Found in position:  14
           Number of occurrences:  2
   
2. print("Number of occurrences: ", find_all("Allowed Hello Hollow", "ll "))

   output:
          Found in position:  1
          Found in position:  10
          Found in position:  16
          Number of occurrences:  3

3. print("Number of occurrences: ", find_all("Aaa bbbcd$#@@abWebbrbbbbrr 123", "bbb"))

   output:
         Found in position:  4
         Found in position:  21
         Number of occurrences:  2
         

def find_all(theString, aString):
    count = 0
    i = len(aString)
    x = 0

    while x < len(theString) - (i-1): 
        if theString[x:x+i] == aString:        
            print("Found in position: ", x)
            x=x+i
            count=count+1
        else:
            x=x+1
    return count

Solution 12 - Python

#!/usr/local/bin python3
#-*- coding: utf-8 -*-

main_string = input()
sub_string = input()

count = counter = 0

for i in range(len(main_string)):
    if main_string[i] == sub_string[0]:
	    k = i + 1
	    for j in range(1, len(sub_string)):
		    if k != len(main_string) and main_string[k] == sub_string[j]:
			    count += 1
			    k += 1
	    if count == (len(sub_string) - 1):
		    counter += 1
	    count = 0

print(counter) 

This program counts the number of all substrings even if they are overlapped without the use of regex. But this is a naive implementation and for better results in worst case it is advised to go through either Suffix Tree, KMP and other string matching data structures and algorithms.

Solution 13 - Python

Here is my function for finding multiple occurrences. Unlike the other solutions here, it supports the optional start and end parameters for slicing, just like str.index:

def all_substring_indexes(string, substring, start=0, end=None):
    result = []
    new_start = start
    while True:
        try:
            index = string.index(substring, new_start, end)
        except ValueError:
            return result
        else:
            result.append(index)
            new_start = index + len(substring)

Solution 14 - Python

A simple iterative code which returns a list of indices where the substring occurs.

	    def allindices(string, sub):
		   l=[]
		   i = string.find(sub)
		   while i >= 0:
		      l.append(i)
		      i = string.find(sub, i + 1)
		   return l

Solution 15 - Python

You can split to get relative positions then sum consecutive numbers in a list and add (string length * occurence order) at the same time to get the wanted string indexes.

>>> key = 'll'
>>> text = "Allowed Hello Hollow"
>>> x = [len(i) for i in text.split(key)[:-1]]
>>> [sum(x[:i+1]) + i*len(key) for i in range(len(x))]
[1, 10, 16]
>>> 

Solution 16 - Python

Maybe not so Pythonic, but somewhat more self-explanatory. It returns the position of the word looked in the original string.

def retrieve_occurences(sequence, word, result, base_counter):
     indx = sequence.find(word)
     if indx == -1:
         return result
     result.append(indx + base_counter)
     base_counter += indx + len(word)
     return retrieve_occurences(sequence[indx + len(word):], word, result, base_counter)

Solution 17 - Python

I think there's no need to test for length of text; just keep finding until there's nothing left to find. Like this:

    >>> text = 'Allowed Hello Hollow'
    >>> place = 0
    >>> while text.find('ll', place) != -1:
            print('ll found at', text.find('ll', place))
            place = text.find('ll', place) + 2


    ll found at 1
    ll found at 10
    ll found at 16

Solution 18 - Python

You can also do it with conditional list comprehension like this:

string1= "Allowed Hello Hollow"
string2= "ll"
print [num for num in xrange(len(string1)-len(string2)+1) if string1[num:num+len(string2)]==string2]
# [1, 10, 16]

Solution 19 - Python

I had randomly gotten this idea just a while ago. Using a While loop with string splicing and string search can work, even for overlapping strings.

findin = "algorithm alma mater alison alternation alpines"
search = "al"
inx = 0
num_str = 0

while True:
    inx = findin.find(search)
    if inx == -1: #breaks before adding 1 to number of string
        break
    inx = inx + 1
    findin = findin[inx:] #to splice the 'unsearched' part of the string
    num_str = num_str + 1 #counts no. of string

if num_str != 0:
    print("There are ",num_str," ",search," in your string.")
else:
    print("There are no ",search," in your string.")

I'm an amateur in Python Programming (Programming of any language, actually), and am not sure what other issues it could have, but I guess it's working fine?

I guess lower() could be used somewhere in it too if needed.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionuser225312View Question on Stackoverflow
Solution 1 - PythonpokeView Answer on Stackoverflow
Solution 2 - PythoninspectorG4dgetView Answer on Stackoverflow
Solution 3 - PythonbstpierreView Answer on Stackoverflow
Solution 4 - PythonintuitedView Answer on Stackoverflow
Solution 5 - PythonchaunceyView Answer on Stackoverflow
Solution 6 - Pythonghostdog74View Answer on Stackoverflow
Solution 7 - PythonbeardcView Answer on Stackoverflow
Solution 8 - Pythonuser15716305View Answer on Stackoverflow
Solution 9 - PythonCrazyChuckyView Answer on Stackoverflow
Solution 10 - PythonAaron SemeniukView Answer on Stackoverflow
Solution 11 - PythonjhrozoView Answer on Stackoverflow
Solution 12 - Pythonpmsh.93View Answer on Stackoverflow
Solution 13 - PythonElias ZamariaView Answer on Stackoverflow
Solution 14 - PythonFReeze FRancisView Answer on Stackoverflow
Solution 15 - PythonKenlyView Answer on Stackoverflow
Solution 16 - PythonblasrodriView Answer on Stackoverflow
Solution 17 - PythonrdoView Answer on Stackoverflow
Solution 18 - PythonStefan GruenwaldView Answer on Stackoverflow
Solution 19 - PythonMystearica Primal FendeView Answer on Stackoverflow