How do you unit test regular expressions?

Regex, Unit Testing, TDD

Regex Problem Overview


I'm new to TDD, and regular expressions seem like a special case. Is there a special way to unit test them, or can I just treat them as regular functions?

Regex Solutions


Solution 1 - Regex

You should always test your regexen, much like any other chunk of code. At their simplest, they're a function that takes a string and returns a bool or an array of values.

Here are some suggestions on what to think about when designing unit tests for regexen. These are not hard and fast prescriptions for unit test design, but guidelines to shape your thinking. As always, weigh the cost of a failure against the time required to implement all the tests. (I find that 'implementing' the test is the easy part! :-] )

Points to consider:

  • Think of every group (the parentheses) as a curly brace.
  • Think of every | as a condition. Make sure to test for each branch.
  • Think of every modifier (*, +, ? ) as a different path.
  • (Side note to the above: remember the difference between the greedy *, +, ? and the lazy *?, +?, ??.)
  • For \d, \s, \w, and their negations, try several characters from each class.
  • For * and +, test zero occurrences, exactly one, and more than one for each.
  • For important 'control' strings (e.g., literal strings the regex looks for), test what happens if they show up in the wrong places. The results may surprise you.
  • If you have real world data, use as much of it as you can.
  • If you don't, make sure to test both the simple and complex forms that should be valid.
  • Make sure to test what regex control characters do when they appear in the input.
  • Make sure to verify that the empty string is properly accepted/rejected.
  • Make sure to verify that strings made up of each of the different kinds of whitespace characters are properly accepted or rejected.
  • Make sure that case insensitivity is handled properly (the i flag). This has bitten me more times than almost anything else in text parsing (other than spaces).
  • If you use the x, m, or s options, make sure you understand what they do and test for them (their behavior can differ from one regex flavor to another).
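The checklist above can be turned into a table of inputs and expected results. Here is a minimal sketch in Python, assuming a hypothetical pattern `COLOR_RE` (a color keyword with an optional "-digits" suffix); the pattern and the helper names are made up for illustration and are not from the original answer.

```python
import re

# Hypothetical pattern (not from the original answer): a color keyword
# with an optional "-<digits>" suffix, matched case-insensitively.
COLOR_RE = re.compile(r"(red|blue)(-\d+)?", re.IGNORECASE)


def is_color(text):
    """Return True only if the whole string matches COLOR_RE."""
    return COLOR_RE.fullmatch(text) is not None


# (input, expected) pairs, one per path through the pattern.
CASES = [
    ("red", True),          # first branch of the |
    ("blue", True),         # second branch of the |
    ("green", False),       # no branch matches
    ("red-7", True),        # optional group present, + with one digit
    ("red-123", True),      # + with several digits
    ("red-", False),        # + with zero digits
    ("", False),            # empty string
    ("RED-7", True),        # the i flag
    (" red", False),        # leading space
    ("red\n", False),       # trailing newline
    ("red.7", False),       # wrong separator
    ("red-\u0661", True),   # \d also matches non-ASCII digits in Python 3 str patterns
]


def failing_cases():
    """Return the cases whose actual result differs from the expected one."""
    return [(text, expected) for text, expected in CASES
            if is_color(text) != expected]
```

The last row is the kind of surprise the list warns about: in Python 3, \d in a str pattern matches any Unicode decimal digit unless the pattern is compiled with `re.ASCII` or uses `[0-9]` instead.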

For a regex that returns lists, also remember:

  • Verify that the data you expect is returned, in the right order, in the right fields.
  • Verify that slightly modified (invalid) inputs do not still return data as if they were valid.
  • Verify that mixed anonymous groups and named groups parse correctly (e.g., (?<name> thing1 ( thing2) )) - this behavior can be different based on the regex engine you're using.
  • Once again, give lots of real world trials.
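For a regex that returns fields, the points above might look like the following Python sketch. The date pattern `DATE_RE` and the function `extract_dates` are hypothetical examples, not part of the original answer.

```python
import re

# Hypothetical pattern: ISO-style dates with named groups.
# The \b anchors stop the pattern from matching inside a longer run of digits.
DATE_RE = re.compile(r"\b(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\b")


def extract_dates(text):
    """Return one dict per date found, in the order the dates appear."""
    return [m.groupdict() for m in DATE_RE.finditer(text)]


# Expected data, in the right order, in the right fields.
assert extract_dates("from 2024-01-05 to 2023-12-31") == [
    {"year": "2024", "month": "01", "day": "05"},
    {"year": "2023", "month": "12", "day": "31"},
]

# Slightly modified inputs must not produce data.
for almost_valid in ["2024-1-05", "2024/01/05", "12024-01-05", "2024-01-051"]:
    assert extract_dates(almost_valid) == []
```

Without the \b anchors, "12024-01-05" would yield a match for "2024-01-05", which is exactly the kind of near miss worth pinning down in a test.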

If you use any advanced features, such as non-backtracking groups, make sure you understand completely how the feature works, and using the guidelines above, build example strings that should match and strings that should fail for each of them.

Depending on your regex library implementation, the way groups are captured may be different as well. Perl 5 numbers groups in the order of their opening parentheses; C# does the same for unnamed groups but numbers named groups after them, and so on. Make sure to experiment with your flavor to know exactly what it does.

Then, integrate them right in with your other unit tests, either in their own module or alongside the module that contains the regex. For particularly nasty regexen, you may find you need lots and lots of tests to verify that the pattern and all the features you use are correct. If the regex makes up a large part (or nearly all) of the work that the method is doing, I will use the advice above to fashion inputs to test that function and not the regex directly. That way, if you later decide that the regex is not the way to go, or you want to break it up, you can preserve the behavior the regex provided without changing the interface - i.e., the method that invokes the regex.
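One way to "experiment with your flavor" is to pin the observed numbering down in a test. The sketch below shows what Python's `re` module does (named and unnamed groups share one numbering, by opening parenthesis); the pattern is a made-up example, and other engines may number the same groups differently.

```python
import re

# Made-up pattern mixing a named group, a nested unnamed group,
# and a trailing unnamed group.
MIXED_RE = re.compile(r"(?P<outer>a(b))(c)")

m = MIXED_RE.fullmatch("abc")

# In Python, groups are numbered by the position of their opening
# parenthesis, whether they are named or not.
assert m.groups() == ("ab", "b", "c")
assert m.group("outer") == m.group(1) == "ab"
assert MIXED_RE.groupindex["outer"] == 1
```

If the code is ever ported to another engine, a test like this fails loudly instead of letting fields silently swap places.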

As long as you really know how a regex feature is supposed to work in your flavor of regex, you should be able to develop decent test cases for it. Just make sure you really, really, really do understand how the feature works!

Solution 2 - Regex

Just throw a bunch of values at it, checking that you get the right result (whether that's match/no-match or a particular replacement value etc).

Importantly, if there are any corner cases where you're unsure whether they'll work, capture them in a unit test and explain in a comment why they work. That way someone else who wants to change the regex will be able to check that the corner case still works, and the comment will give them a hint on how to fix it if it breaks.
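As an illustration of a corner case captured with its explanation, here is a small Python sketch. The two helper functions are hypothetical; the behavior of `$` they document is that of Python's `re` module.

```python
import re


def is_number_loose(text):
    """Hypothetical validator anchored with ^ and $."""
    return re.match(r"^\d+$", text) is not None


def is_number_strict(text):
    """Hypothetical validator that must consume the entire string."""
    return re.fullmatch(r"\d+", text) is not None


# Corner case: in Python, $ matches at the end of the string AND just
# before a newline at the end of the string, so a trailing "\n" gets
# through the ^...$ version.
assert is_number_loose("123\n") is True

# fullmatch() has to consume every character, so the newline is rejected.
assert is_number_strict("123\n") is False
```

Anyone who later swaps one form for the other will see from the comments which behavior was intended.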

Solution 3 - Regex

Presumably your regular expressions are contained within a method of a class. For example:

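using System.Text.RegularExpressions;

public bool ValidateEmailAddress( string emailAddr )
{
    // Validate the email address against the pattern held in ValidEmailRegEx
    // (assumed here to be a string field or property of this class).
    return Regex.IsMatch( emailAddr, this.ValidEmailRegEx );
}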

You can now write tests for this method. The point is that the regex is an implementation detail - your test needs to test the interface, which in this case is just the validate email method.
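The same idea sketched in Python: the tests call only the public function and never touch the pattern. The function name mirrors the example above, and the pattern is a deliberately simple placeholder, not a recommended email regex.

```python
import re

# Implementation detail: a deliberately simple, hypothetical pattern.
# It could be replaced by a parser later without changing any test.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_email_address(email_addr):
    """Public interface: True if email_addr looks like an email address."""
    return _EMAIL_RE.fullmatch(email_addr) is not None


# The tests exercise only the interface.
assert validate_email_address("user@example.com")
assert not validate_email_address("user@example")
assert not validate_email_address("user@@example.com")
assert not validate_email_address("a b@example.com")
assert not validate_email_address("")
```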

Solution 4 - Regex

I would create a set of input values with expected output values, much like every other test case.

Also, I can thoroughly recommend the free regex tool Expresso. It's a fantastic regex editor/debugger that has saved me days of pain in the past.

Solution 5 - Regex

Consider writing the tests first, and only writing as much of the regexp as is needed to pass each test. If you need to expand your regexp, do it by adding failing tests.

Solution 6 - Regex

I always test them just as I do any other function. Make sure they match things you think they should match and that they don't match things they shouldn't.

Solution 7 - Regex

I like to test the regexp against an opposite regex. I execute both against each test input and make sure that the intersection is empty, i.e., that no input is matched by both.
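A minimal Python sketch of this approach, assuming a hypothetical "digits only" pattern and a hand-written opposite. Checking that no input is missed by both patterns is an extra step added here, not something the answer asks for.

```python
import re

# Hypothetical pair: VALID_RE accepts one or more ASCII digits,
# OPPOSITE_RE accepts the empty string or anything containing a non-digit.
VALID_RE = re.compile(r"[0-9]+")
OPPOSITE_RE = re.compile(r"(.*[^0-9].*)?", re.DOTALL)

SAMPLES = ["", "0", "123", "12a", "a", " 1", "1\n", "1.5", "\u0661"]


def overlaps(samples):
    """Inputs matched by both patterns (should be empty)."""
    return [s for s in samples
            if VALID_RE.fullmatch(s) and OPPOSITE_RE.fullmatch(s)]


def gaps(samples):
    """Inputs matched by neither pattern (extra check, should be empty)."""
    return [s for s in samples
            if not VALID_RE.fullmatch(s) and not OPPOSITE_RE.fullmatch(s)]
```

An input that lands in `overlaps` or `gaps` shows that one of the two patterns does not mean what its author thought.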

Solution 8 - Regex

I think a simple input/output test is sufficient. As time goes by and you run into cases where your regex fails, don't forget to add those cases to the tests when you fix them.

Solution 9 - Regex

Use a fixture in your unit test library of choice and follow the usual TDD approach:

  • Check: Tests are green
  • Break the tests by adding a test for the next "feature"
  • Make it green by adjusting the regex (without breaking existing tests)
  • Refactor regex for better readability (e.g. named groups, character classes instead of character ranges, ...)

Here is a sample fixture stub using Spock as the test runner:

@Grab('org.spockframework:spock-core:1.3-groovy-2.5')
@GrabExclude('org.codehaus.groovy:groovy-nio')
@GrabExclude('org.codehaus.groovy:groovy-macro')
@GrabExclude('org.codehaus.groovy:groovy-sql')
@GrabExclude('org.codehaus.groovy:groovy-xml')

import spock.lang.Unroll

class RegexSpec extends spock.lang.Specification {
  String REGEX = /[-+]?\d+(\.\d+)?([eE][-+]?\d+)?/

  @Unroll
  def 'matching example #example for case "#description" should yield #isMatchExpected'(String description, String example, Boolean isMatchExpected) {
    expect:
    isMatchExpected == (example ==~ REGEX)

    where:
    description                                  | example        || isMatchExpected
    "empty string"                               | ""             || false
    "single non-digit"                           | "a"            || false
    "single digit"                               | "1"            || true
    "integer"                                    | "123"          || true
    "integer, negative sign"                     | "-123"         || true
    "integer, positive sign"                     | "+123"         || true
    "float"                                      | "123.12"       || true
    "float with exponent extension but no value" | "123.12e"      || false
    "float with exponent"                        | "123.12e12"    || true
    "float with uppercase exponent"              | "123.12E12"    || true
    "float with non-integer exponent"            | "123.12e12.12" || false
    "float with exponent, positive sign"         | "123.12e+12"   || true
    "float with exponent, negative sign"         | "123.12e-12"   || true
  }
}

It can be run as a stand-alone Groovy script like this:

groovy regex-test.groovy

Disclaimer: The snippet is taken from a series of blog posts I wrote some weeks ago.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type       | Original Author | Original Content on Stackoverflow
Question           | Jader Dias      | View Question on Stackoverflow
Solution 1 - Regex | Robert P        | View Answer on Stackoverflow
Solution 2 - Regex | Jon Skeet       | View Answer on Stackoverflow
Solution 3 - Regex | ng5000          | View Answer on Stackoverflow
Solution 4 - Regex | Andrew Rollings | View Answer on Stackoverflow
Solution 5 - Regex | Andrew Grimm    | View Answer on Stackoverflow
Solution 6 - Regex | Bill the Lizard | View Answer on Stackoverflow
Solution 7 - Regex | mandel          | View Answer on Stackoverflow
Solution 8 - Regex | DaSteph         | View Answer on Stackoverflow
Solution 9 - Regex | JFK             | View Answer on Stackoverflow