Commons Lang StringUtils.replace performance vs String.replace

Java

Java Problem Overview


When I compared performance of Apache's StringUtils.replace() vs String.replace() I was surprised to know that the former is about 4 times faster. I used Google's Caliper framework to measure performance. Here's my test

public class Performance extends SimpleBenchmark {
	String s = "111222111222";

	public int timeM1(int n) {
		int res = 0;
		for (int x = 0; x < n; x++) {
			res += s.replace("111", "333").length();
		}
		return res;
	}

	public int timeM2(int n) {
		int res = 0;
		for (int x = 0; x < n; x++) {
			res += StringUtils.replace(s, "111", "333", -1).length();
		}
		return res;
	}

	public static void main(String... args) {
		Runner.main(Performance.class, args);
	}
}

output

 0% Scenario{vm=java, trial=0, benchmark=M1} 9820,93 ns; ?=1053,91 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M2} 2594,67 ns; ?=58,12 ns @ 10 trials

benchmark   us linear runtime
       M1 9,82 ==============================
       M2 2,59 =======

Why is that? Both methods seem to do the same work, StringUtils.replace() is even more flexible.

Java Solutions


Solution 1 - Java

In modern Java, this is not the case anymore. String.replace was improved in Java-9 moving from regular expression to StringBuilder, and was improved even more in Java-13 moving to direct allocation of the target byte[] array calculating its exact size in advance. Thanks to internal JDK features used, like the ability to allocate an uninitialized array, ability to access String coder and ability to use private String constructor which avoids copying, it's unlikely that current implementation can be beaten by a third-party implementation.

Here are my benchmarking results for your test using JDK 8, JDK 9 and JDK 13 (caliper:0.5-rc1; commons-lang3:3.9)

Java 8 (4x slower indeed):

 0% Scenario{vm=java, trial=0, benchmark=M1} 291.42 ns; σ=6.56 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M2} 70.34 ns; σ=0.15 ns @ 3 trials

benchmark    ns linear runtime
       M1 291.4 ==============================
       M2  70.3 =======

Java 9 (almost equal performance):

 0% Scenario{vm=java, trial=0, benchmark=M2} 99,15 ns; σ=8,34 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M1} 103,43 ns; σ=9,01 ns @ 10 trials

benchmark    ns linear runtime
       M2  99,1 ============================
       M1 103,4 ==============================

Java 13 (standard method is 38% faster):

 0% Scenario{vm=java, trial=0, benchmark=M2} 91,64 ns; σ=5,12 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M1} 57,38 ns; σ=2,51 ns @ 10 trials

benchmark   ns linear runtime
       M2 91,6 ==============================
       M1 57,4 ==================

Solution 2 - Java

From the source code of java.lang.String1:

public String replace(CharSequence target, CharSequence replacement) {
   return Pattern
            .compile(target.toString(), Pattern.LITERAL)
            .matcher(this )
            .replaceAll(
                    Matcher.quoteReplacement(replacement.toString()));
}

String.replace(CharSequence target, CharSequence replacement) is implemented with java.util.regex.Pattern, therefore, it is not surprising that it is slower that StringUtils.replace(String text, String searchString, String replacement)2, which is implemented with indexOf and StringBuffer.

public static String replace(String text, String searchString, String replacement) {
    return replace(text, searchString, replacement, -1);
}

public static String replace(String text, String searchString, String replacement, int max) {
    if (isEmpty(text) || isEmpty(searchString) || replacement == null || max == 0) {
        return text;
    }
    int start = 0;
    int end = text.indexOf(searchString, start);
    if (end == -1) {
        return text;
    }
    int replLength = searchString.length();
    int increase = replacement.length() - replLength;
    increase = (increase < 0 ? 0 : increase);
    increase *= (max < 0 ? 16 : (max > 64 ? 64 : max));
    StringBuffer buf = new StringBuffer(text.length() + increase);
    while (end != -1) {
        buf.append(text.substring(start, end)).append(replacement);
        start = end + replLength;
        if (--max == 0) {
            break;
        }
        end = text.indexOf(searchString, start);
    }
    buf.append(text.substring(start));
    return buf.toString();
}
Footnote

1 The version that I links to and copied source code from is JDK 7

2 The version that I links to and copied source code from is common-lang-2.5

Solution 3 - Java

Try this one, you'll notice that it's extremely performant than Apache's one:

public static String replace (String source, String os, String ns) {
    if (source == null) {
        return null;
    }
    int i = 0;
    if ((i = source.indexOf(os, i)) >= 0) {
        char[] sourceArray = source.toCharArray();
        char[] nsArray = ns.toCharArray();
        int oLength = os.length();
        StringBuilder buf = new StringBuilder (sourceArray.length);
        buf.append (sourceArray, 0, i).append(nsArray);
        i += oLength;
        int j = i;
        // Replace all remaining instances of oldString with newString.
        while ((i = source.indexOf(os, i)) > 0) {
            buf.append (sourceArray, j, i - j).append(nsArray);
            i += oLength;
            j = i;
        }
        buf.append (sourceArray, j, sourceArray.length - j);
        source = buf.toString();
        buf.setLength (0);
    }
    return source;
}

Solution 4 - Java

on my test with JMH:https://github.com/qxo/Benchmark4StringReplace The beset is loukili's way:

java -jar target/benchmarks.jar  StringReplaceBenchmark -wi 3 -i 6 -f 1 -tu ms
Benchmark                                      Mode  Cnt     Score     Error   Units
StringReplaceBenchmark.test4String            thrpt    6  1255.017 ± 230.012  ops/ms
StringReplaceBenchmark.test4StringUtils       thrpt    6  4068.229 ±  67.708  ops/ms
StringReplaceBenchmark.test4fast              thrpt    6  4821.035 ±  97.790  ops/ms
StringReplaceBenchmark.test4lang3StringUtils  thrpt    6  3186.007 ± 102.786  ops/ms

Solution 5 - Java

> Why is that? Both methods seem to do the same work.

You would need to look at the source-code and do some serious investigation with a profiler to get a good (technical) answer to that.

However, one possible explanation is that StringUtils.replace and String.replace have been tuned for different use-cases. You are only looking at one case ... with a pretty small string, and a replacement string that is the same size as the substring being replaced.

Another possible explanation is that the Apache developers simply spent more time on tuning. (And lets not blame the Java developers for that. They have been working under severe staffing constraints for a long time. In the big scheme of things, there are many tasks more important than performance tuning String.replace.)


In fact, looking at the source code, it looks like the Java 7 version just uses the regular expression-based replace under the hood. By contrast, the Apache version is going to considerable lengths to avoid copying. Based on that evidence, I'd expect the performance difference between the two versions to be relatively smaller for large target strings. And I suspect the Java 7 version might even be better in some cases.

(Either non-technical explanation is plausible too, based on the evidence in the code.)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionEvgeniy DorofeevView Question on Stackoverflow
Solution 1 - JavaTagir ValeevView Answer on Stackoverflow
Solution 2 - JavanhahtdhView Answer on Stackoverflow
Solution 3 - JavaloukiliView Answer on Stackoverflow
Solution 4 - JavaqxoView Answer on Stackoverflow
Solution 5 - JavaStephen CView Answer on Stackoverflow