Finding all positions of substring in a larger string in C#

C#.Netasp.netString

C# Problem Overview


I have a large string I need to parse, and I need to find all the instances of extract"(me,i-have lots. of]punctuation, and store the index of each to a list.

So say this piece of string was in the beginning and middle of the larger string, both of them would be found, and their indexes would be added to the List. and the List would contain 0 and the other index whatever it would be.

I've been playing around, and the string.IndexOf does almost what I'm looking for, and I've written some code - but it's not working and I've been unable to figure out exactly what is wrong:

List<int> inst = new List<int>();
int index = 0;
while (index < source.LastIndexOf("extract\"(me,i-have lots. of]punctuation", 0) + 39)
{
    int src = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
    inst.Add(src);
    index = src + 40;
}
  • inst = The list
  • source = The large string

Any better ideas?

C# Solutions


Solution 1 - C#

Here's an example extension method for it:

public static List<int> AllIndexesOf(this string str, string value) {
    if (String.IsNullOrEmpty(value))
        throw new ArgumentException("the string to find may not be empty", "value");
    List<int> indexes = new List<int>();
    for (int index = 0;; index += value.Length) {
        index = str.IndexOf(value, index);
        if (index == -1)
            return indexes;
        indexes.Add(index);
    }
}

If you put this into a static class and import the namespace with using, it appears as a method on any string, and you can just do:

List<int> indexes = "fooStringfooBar".AllIndexesOf("foo");

For more information on extension methods, http://msdn.microsoft.com/en-us/library/bb383977.aspx

Also the same using an iterator:

public static IEnumerable<int> AllIndexesOf(this string str, string value) {
    if (String.IsNullOrEmpty(value))
        throw new ArgumentException("the string to find may not be empty", "value");
    for (int index = 0;; index += value.Length) {
        index = str.IndexOf(value, index);
        if (index == -1)
            break;
        yield return index;
    }
}

Solution 2 - C#

Why don't you use the built in RegEx class:

public static IEnumerable<int> GetAllIndexes(this string source, string matchString)
{
   matchString = Regex.Escape(matchString);
   foreach (Match match in Regex.Matches(source, matchString))
   {
      yield return match.Index;
   }
}

If you do need to reuse the expression then compile it and cache it somewhere. Change the matchString param to a Regex matchExpression in another overload for the reuse case.

Solution 3 - C#

using LINQ

public static IEnumerable<int> IndexOfAll(this string sourceString, string subString)
{
    return Regex.Matches(sourceString, subString).Cast<Match>().Select(m => m.Index);
}

Solution 4 - C#

Polished version + case ignoring support:

public static int[] AllIndexesOf(string str, string substr, bool ignoreCase = false)
{
	if (string.IsNullOrWhiteSpace(str) ||
	    string.IsNullOrWhiteSpace(substr))
	{
		throw new ArgumentException("String or substring is not specified.");
	}

	var indexes = new List<int>();
	int index = 0;

	while ((index = str.IndexOf(substr, index, ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal)) != -1)
	{
		indexes.Add(index++);
	}

	return indexes.ToArray();
}

Solution 5 - C#

It could be done in efficient time complexity using KMP algorithm in O(N + M) where N is the length of text and M is the length of the pattern.

This is the implementation and usage:

static class StringExtensions
{
    public static IEnumerable<int> AllIndicesOf(this string text, string pattern)
    {
        if (string.IsNullOrEmpty(pattern))
        {
            throw new ArgumentNullException(nameof(pattern));
        }
        return Kmp(text, pattern);
    }

    private static IEnumerable<int> Kmp(string text, string pattern)
    {
        int M = pattern.Length;
        int N = text.Length;

        int[] lps = LongestPrefixSuffix(pattern);
        int i = 0, j = 0; 

        while (i < N)
        {
            if (pattern[j] == text[i])
            {
                j++;
                i++;
            }
            if (j == M)
            {
                yield return i - j;
                j = lps[j - 1];
            }

            else if (i < N && pattern[j] != text[i])
            {
                if (j != 0)
                {
                    j = lps[j - 1];
                }
                else
                {
                    i++;
                }
            }
        }
    }

    private static int[] LongestPrefixSuffix(string pattern)
    {
        int[] lps = new int[pattern.Length];
        int length = 0;
        int i = 1;

        while (i < pattern.Length)
        {
            if (pattern[i] == pattern[length])
            {
                length++;
                lps[i] = length;
                i++;
            }
            else
            {
                if (length != 0)
                {
                    length = lps[length - 1];
                }
                else
                {
                    lps[i] = length;
                    i++;
                }
            }
        }
        return lps;
    }

and this is an example of how to use it:

static void Main(string[] args)
    {
        string text = "this is a test";
        string pattern = "is";
        foreach (var index in text.AllIndicesOf(pattern))
        {
            Console.WriteLine(index); // 2 5
        }
    }

Solution 6 - C#

public List<int> GetPositions(string source, string searchString)
{
    List<int> ret = new List<int>();
    int len = searchString.Length;
    int start = -len;
    while (true)
    {
        start = source.IndexOf(searchString, start + len);
        if (start == -1)
        {
            break;
        }
        else
        {
            ret.Add(start);
        }
    }
    return ret;
}

Call it like this:

List<int> list = GetPositions("bob is a chowder head bob bob sldfjl", "bob");
// list will contain 0, 22, 26

Solution 7 - C#

@csam is correct in theory, although his code will not complie and can be refractored to

public static IEnumerable<int> IndexOfAll(this string sourceString, string matchString)
{
    matchString = Regex.Escape(matchString);
    return from Match match in Regex.Matches(sourceString, matchString) select match.Index;
}

Solution 8 - C#

Hi nice answer by @Matti Virkkunen

public static List<int> AllIndexesOf(this string str, string value) {
    if (String.IsNullOrEmpty(value))
        throw new ArgumentException("the string to find may not be empty", "value");
    List<int> indexes = new List<int>();
    for (int index = 0;; index += value.Length) {
        index = str.IndexOf(value, index);
        if (index == -1)
            return indexes;
        indexes.Add(index);
        index--;
    }
}

But this covers tests cases like AOOAOOA where substring

are AOOA and AOOA

Output 0 and 3

Solution 9 - C#

Without Regex, using string comparison type:

string search = "123aa456AA789bb9991AACAA";
string pattern = "AA";
Enumerable.Range(0, search.Length)
   .Select(index => { return new { Index = index, Length = (index + pattern.Length) > search.Length ? search.Length - index : pattern.Length }; })
   .Where(searchbit => searchbit.Length == pattern.Length && pattern.Equals(search.Substring(searchbit.Index, searchbit.Length),StringComparison.OrdinalIgnoreCase))
   .Select(searchbit => searchbit.Index)

This returns {3,8,19,22}. Empty pattern would match all positions.

For multiple patterns:

string search = "123aa456AA789bb9991AACAA";
string[] patterns = new string[] { "aa", "99" };
patterns.SelectMany(pattern => Enumerable.Range(0, search.Length)
   .Select(index => { return new { Index = index, Length = (index + pattern.Length) > search.Length ? search.Length - index : pattern.Length }; })
   .Where(searchbit => searchbit.Length == pattern.Length && pattern.Equals(search.Substring(searchbit.Index, searchbit.Length), StringComparison.OrdinalIgnoreCase))
   .Select(searchbit => searchbit.Index))

This returns {3, 8, 19, 22, 15, 16}

Solution 10 - C#

I noticed that at least two proposed solutions don't handle overlapping search hits. I didn't check the one marked with the green checkmark. Here is one that handles overlapping search hits:

    public static List<int> GetPositions(this string source, string searchString)
    {
        List<int> ret = new List<int>();
        int len = searchString.Length;
        int start = -1;
        while (true)
        {
            start = source.IndexOf(searchString, start +1);
            if (start == -1)
            {
                break;
            }
            else
            {
                ret.Add(start);
            }
        }
        return ret;
    }

Solution 11 - C#

public static Dictionary<string, IEnumerable<int>> GetWordsPositions(this string input, string[] Susbtrings)
{
    Dictionary<string, IEnumerable<int>> WordsPositions = new Dictionary<string, IEnumerable<int>>();
    IEnumerable<int> IndexOfAll = null;
    foreach (string st in Susbtrings)
    {
        IndexOfAll = Regex.Matches(input, st).Cast<Match>().Select(m => m.Index);
        WordsPositions.Add(st, IndexOfAll);

    }
    return WordsPositions;
}

Solution 12 - C#

you can use linq to select and enumerate all elements, then find by any string:

I've created a class:

class Pontos 
{
	//index on string
    public int Pos { get; set; }
	//caractere 
	public string Caractere { get; set; }    		
}

And use like this:

int count = 0;

var pontos = texto.Select(y => new Pontos { Pos = count++, Caractere = y.ToString() }).Where(x=>x.Caractere == ".").ToList();

then: input string: enter image description here

output list: enter image description here

PS: SeForNumero is another field of my class, I need this for my own purposes, but is not necessary to this use.

Solution 13 - C#

Based on the code I've used for finding multiple instances of a string within a larger string, your code would look like:

List<int> inst = new List<int>();
int index = 0;
while (index >=0)
{
    index = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
    inst.Add(index);
    index++;
}

Solution 14 - C#

I found this example and incorporated it into a function:

    public static int solution1(int A, int B)
    {
        // Check if A and B are in [0...999,999,999]
        if ( (A >= 0 && A <= 999999999) && (B >= 0 && B <= 999999999))
        {
            if (A == 0 && B == 0)
            {
                return 0;
            }
            // Make sure A < B
            if (A < B)
            {                    
                // Convert A and B to strings
                string a = A.ToString();
                string b = B.ToString();
                int index = 0;

                // See if A is a substring of B
                if (b.Contains(a))
                {
                    // Find index where A is
                    if (b.IndexOf(a) != -1)
                    {                            
                        while ((index = b.IndexOf(a, index)) != -1)
                        {
                            Console.WriteLine(A + " found at position " + index);
                            index++;
                        }
                        Console.ReadLine();
                        return b.IndexOf(a);
                    }
                    else
                        return -1;
                }
                else
                {
                    Console.WriteLine(A + " is not in " + B + ".");
                    Console.ReadLine();

                    return -1;
                }
            }
            else
            {
                Console.WriteLine(A + " must be less than " + B + ".");
               // Console.ReadLine();

                return -1;
            }                
        }
        else
        {
            Console.WriteLine("A or B is out of range.");
            //Console.ReadLine();

            return -1;
        }
    }

    static void Main(string[] args)
    {
        int A = 53, B = 1953786;
        int C = 78, D = 195378678;
        int E = 57, F = 153786;

        solution1(A, B);
        solution1(C, D);
        solution1(E, F);

        Console.WriteLine();
    }

Returns:

53 found at position 2

78 found at position 4
78 found at position 7

57 is not in 153786

Solution 15 - C#

How is this alternative implementation?

 public static class MyExtensions
    {
        public static int HowMany(this string str, char needle)
        {
            int counter = 0;
            int nextIndex = 0;
            for (; nextIndex != -1; )
            {
                nextIndex = str.IndexOf(needle, nextIndex);
                if (nextIndex != -1)
                {
                    counter++;
                    //step over to the next char
                    nextIndex++;
                }
            }
            return counter;
        }
    }

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestioncaesayView Question on Stackoverflow
Solution 1 - C#Matti VirkkunenView Answer on Stackoverflow
Solution 2 - C#csaamView Answer on Stackoverflow
Solution 3 - C#ehoscaView Answer on Stackoverflow
Solution 4 - C#net_progView Answer on Stackoverflow
Solution 5 - C#M.KhooryaniView Answer on Stackoverflow
Solution 6 - C#MusiGenesisView Answer on Stackoverflow
Solution 7 - C#arame3333View Answer on Stackoverflow
Solution 8 - C#Pranay DeepView Answer on Stackoverflow
Solution 9 - C#SeanView Answer on Stackoverflow
Solution 10 - C#Kevin BakerView Answer on Stackoverflow
Solution 11 - C#miguel quiros garciaView Answer on Stackoverflow
Solution 12 - C#JohnnyView Answer on Stackoverflow
Solution 13 - C#CorinView Answer on Stackoverflow
Solution 14 - C#Mark SView Answer on Stackoverflow
Solution 15 - C#Zhi ChenView Answer on Stackoverflow