What's the Time Complexity of Average Regex algorithms?

Regex Problem Overview

I'm not new to using regular expressions, and I understand the basic theory they're based on--finite state machines.

I'm not so good at algorithmic analysis though and don't understand how a regex compares to say, a basic linear search. I'm asking because on the surface it seems like a linear array search. (If the regex is simple.)

Where could I go to learn more about implementing a regex engine?

Regex Solutions

Solution 1 - Regex

This is one of the most popular outlines: Regular Expression Matching Can Be Simple And Fast . Running a DFA-compiled regular expression against a string is indeed O(n), but can require up to O(2^m) construction time/space (where m = regular expression size).

Solution 2 - Regex

Are you familiar with the term Deterministic/Non-Deterministic Finite Automata?

Real regular expressions (when I say real I'm refering to those regex that recognize Regular Languages, and not the regex that almost every programming language include with backreferences, etc) can be converted into a DFA/NFA and both can be implemented in a mechanical way in a programming language (a NFA can be converted into a DFA)

What you have to do is:

Find a way to convert a regex into an automaton
Implement the recognition of the automaton in the programming language of your preference

That way, given a regex you can convert it to a DFA and run it to see if it matches or not a specified text.

This can be implemented in O(n), because DFA don't go backward (like a Turing Machine), so it matches the string or not. That is supposing you won't take in count overlapped matches, otherwise you will have to go back and start matching again...

Solution 3 - Regex

The classic regular expression can be implemented in a way which is fast in practice but has really bad worst case behaviour (the standard DFA) or in a way which has guaranteed reasonable worst case behaviour (keeping it as an NFA). The standard DFA can be extended to support lots of extra matching characters and flags, which make use of the fact that it is basically back-tracking search.

Examples of the standard approach are everywhere (e.g. built into Perl). There is an example that claims good worst case behaviour at http://code.google.com/p/re2/ - in fact it is even better than I expected in the worst case, so they may have found an extra trick or two.

If you are at all interested in this, or care about writing programs that can be made to lock up solid given pathological inputs, read http://swtch.com/~rsc/regexp/regexp1.html.

Content Type	Original Author	Original Content on Stackoverflow
Question	avgvstvs	View Question on Stackoverflow
Solution 1 - Regex	porges	View Answer on Stackoverflow
Solution 2 - Regex	Oscar Mederos	View Answer on Stackoverflow
Solution 3 - Regex	mcdowella	View Answer on Stackoverflow

What's the Time Complexity of Average Regex algorithms?

Regex Problem Overview

Regex Solutions

Solution 1 - Regex

Solution 2 - Regex

Solution 3 - Regex

Mac OS X doesn't allow to name files starting with a dot. How do I name the .htaccess file?

Pyaudio installation error - 'command 'gcc' failed with exit status 1'

Attributions