Where can I find unit tests for regular expressions in multiple languages?
Problem Overview
I'm building a regex helper at http://www.debuggex.com. The amount of detail I want to show requires me to write my own parser and matcher.
To make sure my parser and matcher work correctly, I've written my own unit tests for the JavaScript flavor of regexes, but these only cover the edge cases I know about. I would like to use a standard test suite, and was recently pointed to http://hg.ecmascript.org/tests/test262/summary, which I will use.
My question is: where can I find similar test suites for other regex flavors? I'd like to support other flavors in the future. I have not been able to find anything by googling (the word "test" pollutes the results with regex testers). I am looking for test suites for Python, PHP, Perl, Java, Ruby, and .NET.
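One way to consume a suite like this is to run a table of cases against the engine under test and report every disagreement. The sketch below is only an illustration: the table format, the function names `reference_match` and `run_cases`, and the use of Python's built-in `re` module as a stand-in engine are all assumptions, and each real suite listed in the answers has its own format.

```python
import re

# Hypothetical test-case format (not the format of any real suite):
# (pattern, subject, expected span of the first match, or None for no match)
CASES = [
    (r"a+", "baaac", (1, 4)),
    (r"^abc$", "abc", (0, 3)),
    (r"(a|ab)c", "abc", (0, 3)),  # needs backtracking into the alternation
    (r"x", "abc", None),
]


def reference_match(pattern, subject):
    """Span of the first match according to Python's built-in re module."""
    m = re.search(pattern, subject)
    return m.span() if m else None


def run_cases(matcher, cases):
    """Return the cases where `matcher` disagrees with the expected result."""
    failures = []
    for pattern, subject, expected in cases:
        got = matcher(pattern, subject)
        if got != expected:
            failures.append((pattern, subject, expected, got))
    return failures


if __name__ == "__main__":
    # Sanity check: the reference engine should pass its own table.
    print(run_cases(reference_match, CASES))  # → []
```

To test a custom parser and matcher, you would pass your own `matcher(pattern, subject)` function in place of `reference_match`.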
Solutions
Solution 1
Most of those languages are open source. Any decent project should keep its test cases in its repository; otherwise I would be pretty concerned.
- Python's regex tests
- PHP's regex tests
- Perl's regex tests (these look really extensive)
- OpenJDK's unit tests (an open-source implementation of Java)
- Ruby's regex tests
- Mono's regex tests (an open-source implementation of .NET)
- .NET Core's regex tests
- RE2's tests (C++ regex engine developed at Google)
- C test suite (developed by AT&T Research)
- PCRE regex tests (Perl Compatible Regular Expressions C library)
- JavaScript regex tests (Ecma Technical Committee 39 compatibility suite)
I also found an extensive chart on this page which might be of some help to you.
Solution 2
To have a complete list on one page, here are the ones I found that the accepted answer omits:
Solution 3
Regex test suites for more languages:
- D's standard library regex tests (look for `tests.*d` files)
- Go's regex tests (look for `.*test.*go` files)
- GNU grep's tests (command-line C regex engine)
- regex-posix-unittest (POSIX regex test suite written in Haskell)
- ICU's regex tests (C/C++ and Java libraries for Unicode; look for files named `re[_g].*txt`)
- Rust's regex tests
- Tcl's regex tests (look for the `reg.*test` files)
- TRE's regex tests (C regex engine which aims for strict POSIX compliance)
- V8's regex tests (V8 is the JavaScript engine of Chrome; search for files named `.*regexp.*js`)
- WebKit's regex tests (JavaScript tests are in script-tests folders)
- Yarr's regex tests (C++ regex engine of WebKit's JavaScriptCore)
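Several entries above identify the test files by a name pattern. A minimal sketch for locating them in a local checkout follows; it assumes the patterns are meant as regular expressions that must match the whole file name (an interpretation, not something the list states), and `matching_names` and `find_test_files` are hypothetical helper names.

```python
import os
import re

# File-name patterns from the list above. Treating them as Python regexes
# that must match the whole file name is an assumption.
PATTERNS = {
    "D": r"tests.*d",
    "Go": r".*test.*go",
    "ICU": r"re[_g].*txt",
    "Tcl": r"reg.*test",
    "V8": r".*regexp.*js",
}


def matching_names(names, pattern):
    """Return the names that fully match `pattern`, preserving order."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n)]


def find_test_files(root, pattern):
    """Walk a checked-out repository and yield paths whose file name matches."""
    rx = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if rx.fullmatch(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Illustrative file names, not a listing of any real repository.
    names = ["regexp_test.go", "regexp.go", "exec_test.go", "README"]
    print(matching_names(names, PATTERNS["Go"]))  # → ['regexp_test.go', 'exec_test.go']
```

For a real checkout you would call something like `list(find_test_files("path/to/checkout", PATTERNS["Go"]))`, where the path is a placeholder.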
Bonus
- Regfuzz (C toolkit for testing regular expression robustness using randomly generated and invalid regexes)
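The general idea behind that kind of robustness testing can be sketched in a few lines. This is an independent illustration, not Regfuzz's own approach: it generates random, often malformed patterns and flags any failure other than a clean rejection. The token alphabet, the use of Python's `re` module as the engine under test, and the function names are assumptions.

```python
import random
import re
import warnings

# Small token alphabet mixing literals and metacharacters so that many
# generated patterns are invalid (unbalanced parentheses, dangling quantifiers).
ALPHABET = list("ab.*+?|()[]{}^$\\") + ["{2}", "{1,3}", "[a-c]", "(?:"]


def random_pattern(rng, max_tokens=8):
    """Build a random, possibly invalid, pattern string."""
    return "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, max_tokens)))


def fuzz(iterations=1000, seed=0):
    """Compile random patterns and search random subjects with them.

    re.error is the expected outcome for malformed patterns; any other
    exception is recorded as a potential bug in the engine under test.
    """
    rng = random.Random(seed)
    stats = {"compiled": 0, "rejected": 0, "crashes": []}
    for _ in range(iterations):
        pattern = random_pattern(rng)
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")  # e.g. "possible nested set"
                rx = re.compile(pattern)
        except re.error:
            stats["rejected"] += 1
            continue
        except Exception as exc:  # anything other than re.error is suspicious
            stats["crashes"].append((pattern, repr(exc)))
            continue
        stats["compiled"] += 1
        subject = "".join(rng.choice("ab") for _ in range(rng.randint(0, 10)))
        try:
            rx.search(subject)
        except Exception as exc:
            stats["crashes"].append((pattern, repr(exc)))
    return stats


if __name__ == "__main__":
    result = fuzz()
    print(result["compiled"], result["rejected"], len(result["crashes"]))
```

Subjects are kept short so that patterns prone to heavy backtracking still finish quickly; a real fuzzer would also need timeouts.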