How do you debug a regex?

Regex, Debugging, PCRE

Regex Problem Overview


Regular expressions can become quite complex, and the lack of white space makes them difficult to read. I can't step through a regular expression with a debugger. So how do experts debug complex regular expressions?

Regex Solutions


Solution 1 - Regex

You buy RegexBuddy and use its built-in debug feature. If you work with regexes more than twice a year, the time saved will quickly pay for it. RegexBuddy will also help you create simple and complex regular expressions, and can even generate the code for you in a variety of languages.

Also, according to the developer, this tool runs nearly flawlessly on Linux when used with WINE.

Solution 2 - Regex

With Perl 5.10, use re 'debug';. (Or debugcolor, but I can't format the output properly on Stack Overflow.)

$ perl -Mre=debug -e'"foobar"=~/(.)\1/'
Compiling REx "(.)\1"
Final program:
1: OPEN1 (3)
3:   REG_ANY (4)
4: CLOSE1 (6)
6: REF1 (8)
8: END (0)
minlen 1
Matching REx "(.)\1" against "foobar"
0 <> <foobar>             |  1:OPEN1(3)
0 <> <foobar>             |  3:REG_ANY(4)
1 <f> <oobar>             |  4:CLOSE1(6)
1 <f> <oobar>             |  6:REF1(8)
failed...
1 <f> <oobar>             |  1:OPEN1(3)
1 <f> <oobar>             |  3:REG_ANY(4)
2 <fo> <obar>             |  4:CLOSE1(6)
2 <fo> <obar>             |  6:REF1(8)
3 <foo> <bar>             |  8:END(0)
Match successful!
Freeing REx: "(.)\1"
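
For readers working in Python rather than Perl, the `re` module has a loosely comparable `re.DEBUG` flag. It is not an equivalent of the trace above: it prints the parsed pattern when the regex is compiled and does not show the matching steps. The helper name `dump_regex` below is made up for this sketch, and the exact text printed varies between Python versions.

```python
import contextlib
import io
import re


def dump_regex(pattern: str) -> str:
    """Return the text that re.DEBUG prints while compiling `pattern`."""
    buffer = io.StringIO()
    # re.DEBUG writes to standard output, so capture it for later inspection.
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


if __name__ == "__main__":
    # Same pattern as the Perl example: any character followed by itself.
    print(dump_regex(r"(.)\1"))
```

The dump lists the group, the "any character" node and the back-reference, which is often enough to spot a pattern that was parsed differently from what you intended.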

Also, you can add whitespace and comments to regexes to make them more readable. In Perl, this is done with the /x modifier. With PCRE, there is the PCRE_EXTENDED flag.

"foobar" =~ /
    (.)  # any character, followed by a
    \1   # repeat of previously matched character
/x;

pcre *pat = pcre_compile("(.)  # any character, followed by a\n"
                         "\\1  # repeat of previously matched character\n",
                         PCRE_EXTENDED,
                         ...);
pcre_exec(pat, NULL, "foobar", ...);
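
The same idea carries over to Python, where the corresponding flag is `re.VERBOSE` (also spelled `re.X`). This is a small sketch of the same pattern, not part of the original answer; the name `DOUBLED_CHAR` is only illustrative.

```python
import re

# re.VERBOSE ignores unescaped whitespace and allows '#' comments in the pattern.
DOUBLED_CHAR = re.compile(
    r"""
    (.)   # any character, followed by a
    \1    # repeat of the previously matched character
    """,
    re.VERBOSE,
)

match = DOUBLED_CHAR.search("foobar")
print(match.group(0) if match else None)  # prints "oo"
```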

Solution 3 - Regex

I'll add another so that I don't forget it: Debuggex

It's good because it's very visual.

Solution 4 - Regex

When I get stuck on a regex I usually turn to this: https://regexr.com/

It's perfect for quickly testing where something is going wrong.

Solution 5 - Regex

I use Kodos - The Python Regular Expression Debugger:

> Kodos is a Python GUI utility for creating, testing and debugging regular expressions for the Python programming language. Kodos should aid any developer to efficiently and effortlessly develop regular expressions in Python. Since Python's implementation of regular expressions is based on the PCRE standard, Kodos should benefit developers in other programming languages that also adhere to the PCRE standard (Perl, PHP, etc...).
>
> (...)

Runs on Linux, Unix, Windows, Mac.

Solution 6 - Regex

I think they don't. If your regexp is so complicated and problematic that you need a debugger, you should write a dedicated parser or use another method. It will be much more readable and maintainable.
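
As an illustration of "another method", here is a minimal sketch in Python. The task (checking a dotted-quad IPv4 address) and the function name `is_ipv4` are assumptions chosen for the example and do not come from the answer above. A regex for the 0-255 range is notoriously fiddly, while plain string handling states each rule on its own line.

```python
def is_ipv4(text: str) -> bool:
    """Check a dotted-quad IPv4 address with plain string handling."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Accept ASCII digits only: no sign, no spaces, no empty fields.
        if not (part.isascii() and part.isdigit()):
            return False
        # Reject leading zeros such as "01".
        if len(part) > 1 and part[0] == "0":
            return False
        if int(part) > 255:
            return False
    return True


if __name__ == "__main__":
    for candidate in ("192.168.0.1", "256.1.1.1", "1.2.3", "01.2.3.4"):
        print(candidate, is_ipv4(candidate))
```

Each rejected case can be stepped through with an ordinary debugger, which is the point the answer is making.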

Solution 7 - Regex

There is an excellent free tool, the Regex Coach. The latest version is only available for Windows; its author Dr. Edmund Weitz stopped maintaining the Linux version because too few people downloaded it, but there is an older version for Linux on the download page.

Solution 8 - Regex

I've just seen a presentation of Regexp::Debugger by its creator, Damian Conway. Very impressive stuff: run it in place or through a command-line tool (rxrx), interactively or on a "logged" execution file (stored in JSON), step forward and backward at any point, stop on breakpoints or events, get colored output (user configurable), heat maps on the regexp and string for optimization, etc.

Available on CPAN for free: http://search.cpan.org/~dconway/Regexp-Debugger/lib/Regexp/Debugger.pm

Solution 9 - Regex

I use this online tool to debug my regex:

https://www.regextester.com/

But yeah, it can't beat RegexBuddy.

Solution 10 - Regex

I debug my regexes with my own eyes. That's why I use the /x modifier, write comments for them and split them into parts. Read Jeffrey Friedl's Mastering Regular Expressions to learn how to develop fast and readable regular expressions. Regex debugging tools just encourage voodoo programming.
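
Splitting a regex into parts can look like the following Python sketch. The date format and the names `YEAR`, `MONTH`, `DAY` and `DATE` are assumptions picked for illustration; the answer itself is about Perl's /x, and this only shows the "named pieces" half of the advice.

```python
import re

# Hypothetical example: an ISO-style date assembled from small, named pieces.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>0[1-9]|1[0-2])"
DAY = r"(?P<day>0[1-9]|[12]\d|3[01])"

# The full pattern reads almost like the format it describes.
DATE = re.compile(rf"^{YEAR}-{MONTH}-{DAY}$")

if __name__ == "__main__":
    print(DATE.match("2024-02-29").groupdict())
    print(DATE.match("2024-13-01"))  # month out of range, so no match
```

Because each piece is an ordinary string, it can be checked on its own (for example `re.fullmatch(MONTH, "13")`) before the pieces are combined.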

Solution 11 - Regex

As for me, I usually use the pcretest utility, which can dump the byte code of any regex; usually that is much easier to read (for me at least). Example:

PCRE version 8.30-PT1 2012-01-01

  re> /ab|c[de]/iB
------------------------------------------------------------------
  0   7 Bra
  3  /i ab
  7  38 Alt
 10  /i c
 12     [DEde]
 45  45 Ket
 48     End
------------------------------------------------------------------

Solution 12 - Regex

I use:

http://regexlib.com/RETester.aspx

You can also try Regex Hero (uses Silverlight):

http://regexhero.net/tester/

Solution 13 - Regex

If I'm feeling stuck, I like to go backward and generate the regex directly from a sample text using txt2re (although I usually end up tweaking the resulting regex by hand).

Solution 14 - Regex

If you're a Mac user, I just came across this one:

http://atastypixel.com/blog/reginald-regex-explorer/

It's free and simple to use, and it's been a great help for me in getting to grips with regexes in general.

Solution 15 - Regex

Solution 16 - Regex

Writing regexes using a notation like PCRE's is like writing assembler: it's fine if you can just see the corresponding finite-state automaton in your head, but it can get difficult to maintain very quickly.

The reasons for not using a debugger are much the same as for not using a debugger with a programming language: you can fix local mistakes, but it won't help you solve the design problems that led you to make the local mistakes in the first place.

The more reflective way is to use data representations to generate regexps in your programming language, and have appropriate abstractions to build them. Olin Shivers's introduction to his Scheme regexp notation gives an excellent overview of the issues faced in designing these data representations.
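
To make the idea of building regexps from abstractions concrete, here is a deliberately small Python sketch. It is not Shivers's notation, which is much richer: it only uses plain functions that return pattern strings. Every name in it (`lit`, `seq`, `alt`, `many`, `opt`, `capture`, `PRICE`) is invented for this example.

```python
import re


def lit(text: str) -> str:
    """Match `text` literally, escaping any metacharacters."""
    return re.escape(text)


def seq(*parts: str) -> str:
    """Match each part in order."""
    return "".join(parts)


def alt(*parts: str) -> str:
    """Match any one of the parts."""
    return "(?:" + "|".join(parts) + ")"


def many(part: str) -> str:
    """Match one or more repetitions of `part`."""
    return "(?:" + part + ")+"


def opt(part: str) -> str:
    """Match `part` zero or one time."""
    return "(?:" + part + ")?"


def capture(name: str, part: str) -> str:
    """Wrap `part` in a named capturing group."""
    return f"(?P<{name}>{part})"


# A price such as "$12.50" or "EUR 7": currency, optional space, amount.
DIGITS = many(r"\d")
PRICE = seq(
    capture("currency", alt(lit("$"), lit("EUR"))),
    r"\s?",
    capture("amount", seq(DIGITS, opt(seq(lit("."), DIGITS)))),
)

if __name__ == "__main__":
    print(PRICE)
    print(re.fullmatch(PRICE, "$12.50").groupdict())
```

The generated pattern is harder to read than a hand-written one, but nobody has to read it: grouping and escaping are handled once, inside the helpers, instead of at every use.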

Solution 17 - Regex

I often use pcretest - hardly a "debugger" but it works over a text-only SSH connection and parses exactly the regex dialect I need: my (C++) code links to libpcre, so there's no difficulty with subtle differences in what's magic and what isn't, etc.

In general I agree with the poster above for whom needing a regex debugger is a code smell. For me the hardest part of using regexes is usually not the regex itself, but the multiple layers of quoting needed to make them work.
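
The quoting problem shows up in most host languages. The sketch below uses Python rather than the C++ of the answer, purely as an illustration: it shows two spellings of one pattern, and `re.escape` for text that must match literally. The variable names are invented for the example.

```python
import re

# Subject text: the four characters a, backslash, n, b.
text = "a\\nb"

# Two spellings of the same three-character pattern (backslash, backslash, n).
# The regex engine reads that pattern as "a literal backslash, then n".
plain = "\\\\n"  # ordinary string: Python halves the backslashes first
raw = r"\\n"     # raw string: backslashes reach the regex engine untouched

# re.escape adds the regex-level quoting for text that must match literally.
needle = "1+1=2 (really?)"
literal = re.escape(needle)

if __name__ == "__main__":
    print(plain == raw)                       # same pattern either way
    print(re.search(raw, text).group(0))      # backslash followed by n
    print(re.search(needle, needle))          # unescaped: metacharacters interfere
    print(re.fullmatch(literal, needle) is not None)
```

Printing the pattern string just before it is handed to the regex engine is often the quickest way to see which layer of quoting went wrong.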

Solution 18 - Regex

I often use the Ruby-based regexp tester Rubular

and also in Emacs use M-x re-builder

Firefox also has a useful extension

Solution 19 - Regex

I use the Rx Toolkit included with ActiveState Komodo.

Solution 20 - Regex

Solution 21 - Regex

For me, after having eyeballed the regex (I'm fairly fluent, and nearly always use /x or equivalent), I might debug rather than test if I'm unsure whether I would hit some degenerate matching (i.e. something that backtracks excessively), to see if I could solve such issues by, for example, modifying the greediness of an operator.

To do that, I'd use one of the methods mentioned above: pcretest, RegexBuddy (if my current workplace has licensed it) or similar, and sometimes I time it in LINQPad if I'm working with C# regexes.

(The Perl trick is a new one for me, so I will probably add that to my regex toolkit too.)
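
Timing a suspect pattern needs very little code. The sketch below uses Python instead of LINQPad and a textbook degenerate pattern; neither comes from the answer above. Note that the fix shown is removing a nested quantifier, not changing greediness. The names `time_search`, `SLOW` and `FAST` are invented for the example.

```python
import re
import time


def time_search(pattern: str, text: str) -> tuple[bool, float]:
    """Return (matched, seconds) for one re.search call."""
    start = time.perf_counter()
    matched = re.search(pattern, text) is not None
    return matched, time.perf_counter() - start


# Nested quantifiers: on a failing input the engine tries many ways of
# splitting the run of "a" characters before giving up.
SLOW = r"^(a+)+$"
# Accepts the same strings, but with no nested quantifier.
FAST = r"^a+$"

# Kept short on purpose so that the slow case still finishes quickly.
subject = "a" * 18 + "b"

if __name__ == "__main__":
    for name, pattern in (("nested", SLOW), ("flat", FAST)):
        matched, seconds = time_search(pattern, subject)
        print(f"{name:>6}: matched={matched} in {seconds:.4f}s")
```

Growing `subject` by a few characters at a time and watching how the timing changes is a cheap way to tell whether a pattern backtracks excessively on failing input.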

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | rook | View Question on Stackoverflow |
| Solution 1 - Regex | Mick | View Answer on Stackoverflow |
| Solution 2 - Regex | ephemient | View Answer on Stackoverflow |
| Solution 3 - Regex | kevin | View Answer on Stackoverflow |
| Solution 4 - Regex | thetaiko | View Answer on Stackoverflow |
| Solution 5 - Regex | Pascal Thivent | View Answer on Stackoverflow |
| Solution 6 - Regex | Valentin Rocher | View Answer on Stackoverflow |
| Solution 7 - Regex | APC | View Answer on Stackoverflow |
| Solution 8 - Regex | Yves | View Answer on Stackoverflow |
| Solution 9 - Regex | gfe | View Answer on Stackoverflow |
| Solution 10 - Regex | codeholic | View Answer on Stackoverflow |
| Solution 11 - Regex | dark100 | View Answer on Stackoverflow |
| Solution 12 - Regex | Leniel Maccaferri | View Answer on Stackoverflow |
| Solution 13 - Regex | eggsyntax | View Answer on Stackoverflow |
| Solution 14 - Regex | jayp | View Answer on Stackoverflow |
| Solution 15 - Regex | Skilldrick | View Answer on Stackoverflow |
| Solution 16 - Regex | Charles Stewart | View Answer on Stackoverflow |
| Solution 17 - Regex | Bernd Jendrissek | View Answer on Stackoverflow |
| Solution 18 - Regex | ocodo | View Answer on Stackoverflow |
| Solution 19 - Regex | Czechnology | View Answer on Stackoverflow |
| Solution 20 - Regex | Eugeniu Torica | View Answer on Stackoverflow |
| Solution 21 - Regex | ChrisF | View Answer on Stackoverflow |