Regex not operator

RegexString

Regex Problem Overview


Is there an NOT operator in Regexes? Like in that string : "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)"

I want to delete all \([0-9a-zA-z _\.\-:]*\) but not the one where it is a year: (2001).

So what the regex should return must be: (2001) name.

NOTE: something like \((?![\d]){4}[0-9a-zA-z _\.\-:]*\) does not work for me (the (20019) somehow also matches...)

Regex Solutions


Solution 1 - Regex

Not quite, although generally you can usually use some workaround on one of the forms

  • [^abc], which is character by character not a or b or c,
  • or negative lookahead: a(?!b), which is a not followed by b
  • or negative lookbehind: (?<!a)b, which is b not preceeded by a

Solution 2 - Regex

No, there's no direct not operator. At least not the way you hope for.

You can use a zero-width negative lookahead, however:

\((?!2001)[0-9a-zA-z _\.\-:]*\)

The (?!...) part means "only match if the text following (hence: lookahead) this doesn't (hence: negative) match this. But it doesn't actually consume the characters it matches (hence: zero-width).

There are actually 4 combinations of lookarounds with 2 axes:

  • lookbehind / lookahead : specifies if the characters before or after the point are considered
  • positive / negative : specifies if the characters must match or must not match.

Solution 3 - Regex

You could capture the (2001) part and replace the rest with nothing.

public static string extractYearString(string input) {
    return input.replaceAll(".*\(([0-9]{4})\).*", "$1");
}

var subject = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)";
var result = extractYearString(subject);
System.out.println(result); // <-- "2001"

.*\(([0-9]{4})\).* means

  • .* match anything
  • \( match a ( character
  • ( begin capture
  • [0-9]{4} any single digit four times
  • ) end capture
  • \) match a ) character
  • .* anything (rest of string)

Solution 4 - Regex

Here is an alternative:

(\(\d{4}\))((?:\s*\([0-9a-zA-z _\.\-:]*\))*)([^()]*)(( ?\([0-9a-zA-z _\.\-:]*\))*)

Repetitive patterns are embedded in a single group with this construction, where the inner group is not a capturing one: ((:?pattern)*), which enable to have control on the group numbers of interrest.

Then you get what you want with: \1\3

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSonnenhutView Question on Stackoverflow
Solution 1 - RegexJohan SjöbergView Answer on Stackoverflow
Solution 2 - RegexJoachim SauerView Answer on Stackoverflow
Solution 3 - RegexbirgerspView Answer on Stackoverflow
Solution 4 - RegexlalebardeView Answer on Stackoverflow