Regex Until But Not Including

RegexSearchRegex Lookarounds

Regex Problem Overview


For regex what is the syntax for search until but not including? Kinda like:

Haystack:
The quick red fox jumped over the lazy brown dog

Expression:
.*?quick -> and then everything until it hits the letter "z" but do not include z

Regex Solutions


Solution 1 - Regex

The explicit way of saying "search until X but not including X" is:

(?:(?!X).)*

where X can be any regular expression.

In your case, though, this might be overkill - here the easiest way would be

[^z]*

This will match anything except z and therefore stop right before the next z.

So .*?quick[^z]* will match The quick fox jumps over the la.

However, as soon as you have more than one simple letter to look out for, (?:(?!X).)* comes into play, for example

(?:(?!lazy).)* - match anything until the start of the word lazy.

This is using a lookahead assertion, more specifically a negative lookahead.

.*?quick(?:(?!lazy).)* will match The quick fox jumps over the .

Explanation:

(?:        # Match the following but do not capture it:
 (?!lazy)  # (first assert that it's not possible to match "lazy" here
 .         # then match any character
)*         # end of group, zero or more repetitions.

Furthermore, when searching for keywords, you might want to surround them with word boundary anchors: \bfox\b will only match the complete word fox but not the fox in foxy.

Note

If the text to be matched can also include linebreaks, you will need to set the "dot matches all" option of your regex engine. Usually, you can achieve that by prepending (?s) to the regex, but that doesn't work in all regex engines (notably JavaScript).

Alternative solution:

In many cases, you can also use a simpler, more readable solution that uses a lazy quantifier. By adding a ? to the * quantifier, it will try to match as few characters as possible from the current position:

.*?(?=(?:X)|$)

will match any number of characters, stopping right before X (which can be any regex) or the end of the string (if X doesn't match). You may also need to set the "dot matches all" option for this to work. (Note: I added a non-capturing group around X in order to reliably isolate it from the alternation)

Solution 2 - Regex

A lookahead regex syntax can help you to achieve your goal. Thus a regex for your example is

.*?quick.*?(?=z)

And it's important to notice the .*? lazy matching before the (?=z) lookahead: the expression matches a substring until a first occurrence of the z letter.

Here is C# code sample:

const string text = "The quick red fox jumped over the lazy brown dogz";

string lazy = new Regex(".*?quick.*?(?=z)").Match(text).Value;
Console.WriteLine(lazy); // The quick red fox jumped over the la

string greedy = new Regex(".*?quick.*(?=z)").Match(text).Value;
Console.WriteLine(greedy); // The quick red fox jumped over the lazy brown dog

Solution 3 - Regex

Try this

(.*?quick.*?)z

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionNoodleOfDeathView Question on Stackoverflow
Solution 1 - RegexTim PietzckerView Answer on Stackoverflow
Solution 2 - RegexIgor KustovView Answer on Stackoverflow
Solution 3 - RegexMaxView Answer on Stackoverflow