Match at every second occurrence
Regex Problem Overview
Is there a way to specify a regular expression to match every 2nd occurrence of a pattern in a string?
Examples
- searching for a against string abcdabcd should find one occurrence at position 5
- searching for ab against string abcdabcd should find one occurrence at position 5
- searching for dab against string abcdabcd should find no occurrences
- searching for a against string aaaa should find two occurrences at positions 2 and 4
Regex Solutions
Solution 1 - Regex
Use grouping: match the first occurrence, lazily skip ahead, and capture the second occurrence in group 1.
foo.*?(foo)
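A minimal sketch of this idea, assuming Python's re module (the question does not name a language). The helper name second_occurrence_start is hypothetical, and it assumes the pattern contains no capturing groups of its own, since those would shift the group numbering. It reports a 1-based position to match the examples in the question.

```python
import re

def second_occurrence_start(pattern, text):
    """Return the 1-based start of the second occurrence of pattern, or None."""
    # The first copy is wrapped in a non-capturing group; group 1 captures
    # the second copy. The lazy gap .*? skips as few characters as possible,
    # and re.DOTALL lets that gap span newlines.
    m = re.search(f"(?:{pattern}).*?({pattern})", text, re.DOTALL)
    return m.start(1) + 1 if m else None

print(second_occurrence_start("foo", "foo bar foo baz foo"))
print(second_occurrence_start("dab", "abcdabcd"))
```

The second call prints None because "dab" occurs only once in the input.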
Solution 2 - Regex
Suppose the pattern you want is abc+d. You want to match the second occurrence of this pattern in a string.
You would construct the following regex:
abc+d.*?(abc+d)
This would match strings of the form <your-pattern>...<your-pattern>, with the second occurrence captured in group 1. Since the gap uses the reluctant quantifier *?, there cannot be another match of the pattern between the two.
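To illustrate why the reluctant quantifier matters, here is a small Python sketch (Python's re module and the sample text are assumptions, not part of the original answer). It compares the lazy gap with a greedy one on a string containing three occurrences of abc+d.

```python
import re

text = "abcd x abccd y abcccd"
pattern = r"abc+d"

# Lazy gap: stops at the nearest following occurrence, i.e. the second one.
lazy = re.search(rf"{pattern}.*?({pattern})", text)
# Greedy gap: runs as far as it can, so group 1 ends up on the last occurrence.
greedy = re.search(rf"{pattern}.*({pattern})", text)

print(lazy.group(1), lazy.start(1))      # abccd 7
print(greedy.group(1), greedy.start(1))  # abcccd 15
```

With the greedy gap the capture skips over the second occurrence entirely, which is why the lazy form is the one to use here.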
Solution 3 - Regex
Would something like
(pattern.*?(pattern))*
work for you?
Edit:
The problem with this is that it uses the non-greedy operator *?, which can require a lot of backtracking along the string instead of just looking at each character once. In practice, this could be slow when the gaps between occurrences are large.
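One practical wrinkle: in many engines, including Python's re, a repeated group such as (pattern.*?(pattern))* keeps only its last capture, and the outer * also allows an empty match. A possible workaround, sketched here in Python under the assumption that the pattern has no capturing groups of its own, is to apply the inner pair repeatedly with re.finditer. The function name every_second_occurrence is hypothetical; positions are 1-based to match the question's examples.

```python
import re

def every_second_occurrence(pattern, text):
    """Return 1-based start positions of the 2nd, 4th, 6th, ... occurrences."""
    # Each match consumes one pair of occurrences; group 1 is the second
    # member of the pair. finditer resumes scanning after each pair.
    pair = re.compile(f"(?:{pattern}).*?({pattern})", re.DOTALL)
    return [m.start(1) + 1 for m in pair.finditer(text)]

print(every_second_occurrence("a", "abcdabcd"))    # [5]
print(every_second_occurrence("ab", "abcdabcd"))   # [5]
print(every_second_occurrence("dab", "abcdabcd"))  # []
print(every_second_occurrence("a", "aaaa"))        # [2, 4]
```

These four calls reproduce the four examples given in the problem overview.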
Solution 4 - Regex
If you're using C#, you can get all the matches at once with Regex.Matches(), which returns a MatchCollection, and check the zero-based index of each item: index % 2 != 0 selects every second match.
If you want to find the occurrence in order to replace it, use one of the overloads of Regex.Replace() that takes a MatchEvaluator, e.g. Regex.Replace(String, String, MatchEvaluator). Here's the code:
using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "abcdabcd";

            // Replace every second "a" with "m"
            string replacedString = Regex.Replace(
                input,
                "a",
                new SecondOccurrenceFinder("m").MatchEvaluator);

            Console.WriteLine(replacedString); // prints "abcdmbcd"
            Console.Read(); // keep the console window open
        }

        class SecondOccurrenceFinder
        {
            private readonly string _replaceWith;
            private readonly MatchEvaluator _matchEvaluator;
            private int _matchIndex; // number of matches seen so far

            public SecondOccurrenceFinder(string replaceWith)
            {
                _replaceWith = replaceWith;
                _matchEvaluator = new MatchEvaluator(IsSecondOccurrence);
            }

            public MatchEvaluator MatchEvaluator
            {
                get { return _matchEvaluator; }
            }

            // Called once per match: replaces the 2nd, 4th, 6th, ... match
            // and returns every other match unchanged.
            public string IsSecondOccurrence(Match m)
            {
                _matchIndex++;
                if (_matchIndex % 2 == 0)
                    return _replaceWith;
                else
                    return m.Value;
            }
        }
    }
}
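For readers not using C#, the same counting idea can be sketched in Python, where re.sub accepts a callable in place of a MatchEvaluator. This is an illustrative sketch, not a translation endorsed by the answer's author; the function names are hypothetical.

```python
import re
from itertools import count

def replace_every_second(pattern, replacement, text):
    """Replace the 2nd, 4th, 6th, ... non-overlapping match of pattern."""
    counter = count(1)  # running 1-based match number

    def evaluate(match):
        # Even-numbered matches are replaced; odd-numbered ones are kept.
        return replacement if next(counter) % 2 == 0 else match.group(0)

    return re.sub(pattern, evaluate, text)

def every_second_start(pattern, text):
    """0-based starts of every second match (odd zero-based match index)."""
    return [m.start() for i, m in enumerate(re.finditer(pattern, text))
            if i % 2 != 0]

print(replace_every_second("a", "m", "abcdabcd"))  # abcdmbcd
print(every_second_start("a", "aaaa"))             # [1, 3]
```

Unlike the pure-regex answers, this counts matches in code, so the pattern itself stays simple and no lazy gap is needed.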
Solution 5 - Regex
Back references can find interesting solutions here. This regex:
([a-z]+).*(\1)
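will find a repeated sequence of lowercase letters: starting from the leftmost position where any repeat exists, it captures the longest run that appears again later in the string.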
This one will find a sequence of 3 letters that is repeated:
([a-z]{3}).*(\1)
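A short Python sketch of how these back references behave (Python's re module and the sample strings are assumptions). It shows that the greedy gap .* lands on the last repeat, that a lazy gap .*? would land on the nearest one, and that the variable-length version reports the repeat found from the leftmost starting position, which need not be the longest repeat anywhere in the string.

```python
import re

# Fixed-length version: a 3-letter sequence that appears again later.
greedy = re.search(r"([a-z]{3}).*(\1)", "abcabcabc")
lazy = re.search(r"([a-z]{3}).*?(\1)", "abcabcabc")
print(greedy.group(1), greedy.start(2))  # abc 6  (last repeat)
print(lazy.group(1), lazy.start(2))      # abc 3  (nearest repeat)

# Variable-length version: "cde" repeats too and is longer, but the
# engine commits to the leftmost start, where only "ab" repeats.
m = re.search(r"([a-z]+).*(\1)", "abxxabcdecde")
print(m.group(1), m.start(1), m.start(2))  # ab 0 4
```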
Solution 6 - Regex
There's no "direct" way of doing so, but you can specify the pattern twice, as in a[^a]*a, which matches up to and including the second "a". Note that the negated character class only works when the pattern is a single character.
The alternative is to use your programming language (Perl? C#? ...) to match the first occurrence and then the second one.
EDIT: I've seen that others responded using the "non-greedy" operators, which might be a good way to go, assuming you have them in your regex library!
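Both suggestions can be sketched in Python (an assumption, since the answer leaves the language open). The first part uses the negated character class; the second does the two-step search in code, which also works for multi-character patterns. The helper name second_match is hypothetical.

```python
import re

# Negated character class: valid only for a single-character pattern.
m = re.search(r"a[^a]*a", "abcdabcd")
print(m.group(0), m.end())  # abcda 5

def second_match(pattern, text):
    """Find the first match, then search again from just past it."""
    regex = re.compile(pattern)
    first = regex.search(text)
    if first is None:
        return None
    # Resume after the first match, moving at least one character
    # forward so an empty match is not found twice.
    return regex.search(text, max(first.end(), first.start() + 1))

second = second_match("ab", "abcdabcd")
print(second.start())                    # 4 (0-based)
print(second_match("dab", "abcdabcd"))   # None
```

The two-step version needs no lazy quantifier, so it is an option for regex libraries that lack one.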