When not to use RegexOptions.Compiled

C#, Regex

C# Problem Overview


I understand the advantage of using RegexOptions.Compiled: it improves the application's execution time by compiling the regular expression instead of interpreting it at run time. However, it is not recommended for applications that are already slow to start up.

But if my application can tolerate a slight increase in start-up time, what are the other scenarios in which I should NOT use RegexOptions.Compiled?

As a note, I call this method several times:

private static string GetName(string objString)
{
    return Regex.Replace(objString, "[^a-zA-Z&-]+", "");
}

So, this method is called with different values for 'objString' (although values for objString may repeat as well).

Do you think it's good/not good to use RegexOptions.Compiled here? Any web link would be really helpful.
Thank you!


EDIT

I tested my web app with both

  • RegexOptions.Compiled, and
  • a Regex instantiated as a class variable

But I couldn't find any big difference in the time taken by my web application. The only thing I noticed in both scenarios is that the first load of the application takes about twice as long as successive page loads, irrespective of whether I use RegexOptions.Compiled or not.

Any comments on why my web application takes longer to process the Regex the first time, with the time dropping to about half or less on subsequent loads? Is there some built-in caching or other .NET feature helping here? P.S. The behaviour is the same whether or not I use RegexOptions.Compiled.

C# Solutions


Solution 1 - C#

For any specific performance question like this, the best way to find out which way is faster is to test both and see.

In general, compiling a regex is unlikely to have much benefit unless you're using the regex a lot, or on very large strings. (Or both.) I think it's more of an optimization to try after you've determined that you have a performance problem and you think this might help, than one to try randomly.

For some general discussion on the drawbacks of RegexOptions.Compiled, see this blog post by Jeff Atwood; it's very old (from the days of .NET Framework 1.1), but from what I understand, none of the major relevant facts have changed since it was written.


Solution 2 - C#

Two things to keep in mind: compiling a regex with RegexOptions.Compiled costs both CPU time and memory.

With that in mind, there's basically just one situation in which you should not use RegexOptions.Compiled:

  • Your regular expression only runs a handful of times and the net speedup at runtime doesn't justify the cost of compilation.

There are too many variables to predict and draw a line in the sand, so to speak. It'd really require testing to determine the optimal approach. Or, if you don't feel like testing, then don't use Compiled until you do.
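If you do want to test, a rough sketch along these lines may be enough to see where the costs fall. It times construction (plus the first call) separately from repeated calls, for whichever options you pass in. The RegexTiming class and Measure method are names invented for this illustration, and the input string and iteration count are whatever you choose to pass.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // Pattern taken from the question.
    private const string Pattern = "[^a-zA-Z&-]+";

    // Returns the time spent constructing the Regex (plus its first call)
    // and the time spent on the remaining calls, both in milliseconds.
    public static (double ConstructMs, double RunMs) Measure(
        RegexOptions options, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        var regex = new Regex(Pattern, options); // includes compilation when Compiled is set
        regex.Replace(input, "");                // first call, so one-time costs land in this figure
        double constructMs = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            regex.Replace(input, "");
        return (constructMs, sw.Elapsed.TotalMilliseconds);
    }
}
```

Calling Measure once with RegexOptions.None and once with RegexOptions.Compiled, on input that resembles your real data, gives the two pairs of numbers to compare. Treat the results as indicative only; a single Stopwatch run is noisy.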

Now, if you do choose RegexOptions.Compiled it's important that you're not wasteful with it.

Often the best way to go about it is to define your object as a static variable that can be reused over and over. For example...

public static readonly Regex NameRegex = new Regex(@"[^a-zA-Z&-]+", RegexOptions.Compiled);

The one problem with this approach is that if you're declaring this globally, then it may be a waste if your application doesn't always use it, or doesn't use it upon startup. So a slightly different approach would be to use lazy loading as I describe in the article I wrote yesterday.

So in this case it'd be something like this...

public static readonly Lazy<Regex> NameRegex =
    new Lazy<Regex>(() => new Regex("[^a-zA-Z&-]+", RegexOptions.Compiled));

Then you simply reference NameRegex.Value whenever you want to use this regular expression and it's only instantiated when it's first accessed.
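As a minimal sketch of that usage, applied to the method from the question: the NameCleaner class name is made up for illustration, while GetName and the pattern come from the question itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameCleaner
{
    // The Regex is constructed (and compiled) the first time .Value is read.
    private static readonly Lazy<Regex> NameRegex =
        new Lazy<Regex>(() => new Regex("[^a-zA-Z&-]+", RegexOptions.Compiled));

    public static string GetName(string objString)
    {
        // Every call after the first reuses the same Regex instance.
        return NameRegex.Value.Replace(objString, "");
    }
}
```

With this in place, NameCleaner.GetName("John-Doe 123 & Co.") returns "John-Doe&Co", and the compilation cost is paid only if GetName is actually called.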


RegexOptions.Compiled in the Real World

On a couple of my sites, I'm using Regex routes for ASP.NET MVC. And this scenario is a perfect use for RegexOptions.Compiled. The routes are defined when the web application starts up, and are then reused for all subsequent requests as long as the application keeps running. So these regular expressions are instantiated and compiled once and reused millions of times.

Solution 3 - C#

From a BCL blog post, compiling increases the startup time by an order of magnitude, but decreases subsequent runtimes by about 30%. Using these numbers, compilation should be considered for a pattern that you expect to be evaluated more than about 30 times. (Of course, like any performance optimization, both alternatives should be measured for acceptability.)
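To make the break-even arithmetic explicit, here is a small sketch. It assumes, purely for illustration, that the interpreted start-up cost is about the same as one interpreted evaluation; that assumption is mine, not the blog post's, and the BreakEven class is a hypothetical helper.

```csharp
using System;

public static class BreakEven
{
    // Number of evaluations after which the compiled regex has paid back
    // its extra start-up cost. All inputs use the same (arbitrary) time unit.
    public static double Calls(double startInterpreted, double startCompiled,
                               double perCallInterpreted, double perCallCompiled)
    {
        double extraStartup = startCompiled - startInterpreted;
        double savedPerCall = perCallInterpreted - perCallCompiled;
        if (savedPerCall <= 0)
            return double.PositiveInfinity; // compiling never pays off
        return extraStartup / savedPerCall;
    }
}
```

Plugging in the figures quoted above, BreakEven.Calls(1, 10, 1, 0.7) divides an extra start-up cost of 9 units by a saving of 0.3 units per call, which comes to roughly 30 evaluations. With your own measured numbers the threshold may look quite different.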

If performance is critical for a simple expression called repeatedly, you may want to avoid using regular expressions altogether. I tried running some variants about 5 million times each (note: edited from a previous version to correct the regular expression):

    static string GetName1(string objString)
    {
        return Regex.Replace(objString, "[^a-zA-Z&-]+", "");
    }

    static string GetName2(string objString)
    {
        return Regex.Replace(objString, "[^a-zA-Z&-]+", "", RegexOptions.Compiled);
    }

    static string GetName3(string objString)
    {
        var sb = new StringBuilder(objString.Length);
        foreach (char c in objString)
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '-' || c == '&')
                sb.Append(c);
        return sb.ToString();
    }


    static string GetName4(string objString)
    {
        char[] c = objString.ToCharArray();
        int pos = 0;
        int writ = 0;
        while (pos < c.Length)
        {
            char curr = c[pos];
            if ((curr >= 'A' && curr <= 'Z') || (curr >= 'a' && curr <= 'z') || curr == '-' || curr == '&')
            {
                c[writ++] = c[pos];
            }
            pos++;
        }
        return new string(c, 0, writ);
    }


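    // Requires compiling with unsafe code enabled. stackalloc suits short inputs such as
    // the 30-character test strings; a very long string could overflow the stack.
    unsafe static string GetName5(string objString)
    {
        char* buf = stackalloc char[objString.Length];
        int writ = 0;
        fixed (char* sp = objString)
        {
            char* pos = sp;
            // Relies on the string's terminating '\0'; an embedded '\0' would end the scan early.
            while (*pos != '\0')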
            {
                char curr = *pos;
                if ((curr >= 'A' && curr <= 'Z') ||
                    (curr >= 'a' && curr <= 'z') ||
                     curr == '-' || curr == '&')
                    buf[writ++] = curr;
                pos++;
            }
        }
        return new string(buf, 0, writ);
    }

Executing independently for 5 million random ASCII strings, 30 characters each, consistently gave these numbers:

   Method 1: 32.3  seconds (interpreted regex)
   Method 2: 24.4  seconds (compiled regex)
   Method 3:  1.82 seconds (StringBuilder concatenation)
   Method 4:  1.64 seconds (char[] manipulation)
   Method 5:  1.54 seconds (unsafe char* manipulation)

That is, compilation provided about a 25% performance benefit for a very large number of evaluations of this pattern, with the first execution being about 3 times slower. Methods that operated on the underlying character arrays (methods 4 and 5) were about 15 times faster than the compiled regular expression.

While method 4 or method 5 may provide some performance benefit over regular expressions, the other methods may provide other benefits (maintainability, readability, flexibility, etc.). This simple test does suggest that, in this case, compiling the regex has a modest performance benefit over interpreting it for a large number of evaluations.

Solution 4 - C#

Compilation generally only improves performance if you are saving the Regex object that you create. Since you are not, in your example, saving the Regex, you should not compile it.

You might want to restructure the code this way (note that I rewrote the regex to what I think you want; having the start-of-line caret in a repeating group doesn't make a whole lot of sense, and I assume a name prefix ends with a dash):

    private static readonly Regex CompiledRegex = new Regex("^[a-zA-Z]+-", RegexOptions.Compiled);
    private static string GetNameCompiled(string objString)
    {
        return CompiledRegex.Replace(objString, "");
    }

I wrote some test code for this also:

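    // Needs: using System; using System.Diagnostics; using System.Text.RegularExpressions;
    // Assert.AreEqual comes from a unit-test framework (for example NUnit or MSTest).
    public static void TestSpeed()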
    {
        var testData = "fooooo-bar";
        var timer = new Stopwatch();

        timer.Start();
        for (var i = 0; i < 10000; i++)
            Assert.AreEqual("bar", GetNameCompiled(testData));
        timer.Stop();
        Console.WriteLine("Compiled took " + timer.ElapsedMilliseconds + "ms");
        timer.Reset();

        timer.Start();
        for (var i = 0; i < 10000; i++)
            Assert.AreEqual("bar", GetName(testData));
        timer.Stop();
        Console.WriteLine("Uncompiled took " + timer.ElapsedMilliseconds + "ms");
        timer.Reset();

    }

    private static readonly Regex CompiledRegex = new Regex("^[a-zA-Z]+-", RegexOptions.Compiled);
    private static string GetNameCompiled(string objString)
    {
        return CompiledRegex.Replace(objString, "");
    }

    private static string GetName(string objString)
    {
        return Regex.Replace(objString, "^[a-zA-Z]+-", "");
    }

On my machine, I get:

    Compiled took 21ms
    Uncompiled took 37ms
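When reading timings like these, it may help to know that the static Regex methods (such as Regex.Replace) keep an internal cache of recently used patterns, so the uncompiled GetName does not re-parse its pattern on every call. The cache size is exposed as Regex.CacheSize, which defaults to 15 entries. The sketch below only illustrates that property; the StaticRegexCache class and EnsureCacheSize method are hypothetical names, and whether raising the limit helps depends on how many distinct patterns your application passes to the static methods.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticRegexCache
{
    public static string GetName(string objString)
    {
        // The static call looks the pattern up in the Regex class's internal cache,
        // so repeated calls with the same pattern and options reuse the parsed regex.
        return Regex.Replace(objString, "[^a-zA-Z&-]+", "");
    }

    public static void EnsureCacheSize(int distinctPatterns)
    {
        // Regex.CacheSize is the maximum number of entries kept for static calls.
        if (Regex.CacheSize < distinctPatterns)
            Regex.CacheSize = distinctPatterns;
    }
}
```

This cache applies only to the static methods; a Regex you construct yourself is not cached, which is why storing it in a static field, as above, matters.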

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type     Original Author   Original Content on Stackoverflow
Question         inutan            View Question on Stackoverflow
Solution 1 - C#  ruakh             View Answer on Stackoverflow
Solution 2 - C#  Steve Wortham     View Answer on Stackoverflow
Solution 3 - C#  drf               View Answer on Stackoverflow
Solution 4 - C#  Chris Shain       View Answer on Stackoverflow