Should I use Java's String.format() if performance is important?

Tags: Java, String, Performance, String Formatting, Micro Optimization

Java Problem Overview


We have to build Strings all the time for log output and so on. Over the JDK versions we have learned when to use StringBuffer (many appends, thread safe) and StringBuilder (many appends, non-thread-safe).

What's the advice on using String.format()? Is it efficient, or are we forced to stick with concatenation for one-liners where performance is important?

e.g. ugly old style,

String s = "What do you get if you multiply " + varSix + " by " + varNine + "?";

vs. tidy new style (String.format, which is possibly slower),

String s = String.format("What do you get if you multiply %d by %d?", varSix, varNine);

Note: my specific use case is the hundreds of 'one-liner' log strings throughout my code. They don't involve a loop, so StringBuilder is too heavyweight. I'm interested in String.format() specifically.

Java Solutions


Solution 1 - Java

I took hhafez's code and added a memory test:

private static void test() {
    Runtime runtime = Runtime.getRuntime();
    long memory;
    ...
    memory = runtime.freeMemory();
    // for loop code
    memory = memory-runtime.freeMemory();

I ran this separately for each approach (the '+' operator, String.format and StringBuilder with a final toString() call), so the memory measured for one approach is not affected by the others. I also added more concatenations, building the string as "Blah" + i + "Blah" + i + "Blah" + i + "Blah".

The results are as follows (average of 5 runs each):

Approach        Time (ms)   Memory allocated (long)
'+' operator          747   320,504
String.format       16484   373,312
StringBuilder         769    57,344

We can see that String '+' and StringBuilder are practically identical time-wise, but StringBuilder is much more efficient in memory use. This matters when many log calls (or any other statements involving strings) happen within an interval too short for the garbage collector to clean up the many string instances resulting from the '+' operator.

One more note: don't forget to check the logging level before constructing the message.
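A minimal sketch of that level check, assuming java.util.logging is the logging framework (the answer does not name one). The class name LevelGuardDemo, the helper buildMessage and the counter buildCount are hypothetical and exist only to show that no string is built when the level is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LevelGuardDemo {
    private static final Logger LOG = Logger.getLogger(LevelGuardDemo.class.getName());

    // Hypothetical counter: how often the message string was actually built.
    static int buildCount = 0;

    static String buildMessage(int varSix, int varNine) {
        buildCount++;
        return "What do you get if you multiply " + varSix + " by " + varNine + "?";
    }

    static void logGuarded(int varSix, int varNine) {
        // Build the string only if FINE messages would actually be published.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(buildMessage(varSix, varNine));
        }
    }

    static void logLazily(int varSix, int varNine) {
        // Since Java 8, Logger.fine(Supplier<String>) defers construction too.
        LOG.fine(() -> buildMessage(varSix, varNine));
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE is below INFO, so it is disabled
        logGuarded(6, 9);
        logLazily(6, 9);
        System.out.println("messages built: " + buildCount);
    }
}
```

With the level set to INFO, neither call builds the message, so the cost of the disabled log statement is reduced to the level check itself.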

Conclusions:

  1. I'll keep on using StringBuilder.
  2. I have too much time or too little life.

Solution 2 - Java

I wrote a small class to test which of the two performs better, and + comes out ahead of format by a factor of 5 to 6. Try it yourself:

public class StringTest{

    public static void main( String[] args ){
	int i = 0;
	long prev_time = System.currentTimeMillis();
	long time;

	for( i = 0; i< 100000; i++){
	    String s = "Blah" + i + "Blah";
	}
	time = System.currentTimeMillis() - prev_time;

	System.out.println("Time after for loop " + time);
	
	prev_time = System.currentTimeMillis();
	for( i = 0; i<100000; i++){
	    String s = String.format("Blah %d Blah", i);
	}
	time = System.currentTimeMillis() - prev_time;
	System.out.println("Time after for loop " + time);
	
    }
}

Running the above for different N shows that both behave linearly, but String.format is 5-30 times slower.

The reason is that in the current implementation String.format first parses the input with regular expressions and then fills in the parameters. Concatenation with plus, on the other hand, gets optimized by javac (not by the JIT) and uses StringBuilder.append directly.

[Figure: Runtime comparison]

Solution 3 - Java

All the benchmarks presented here have some flaws, so their results are not reliable.

I was surprised that nobody used JMH for benchmarking, so I did.

Results:

Benchmark             Mode  Cnt     Score     Error  Units
MyBenchmark.testOld  thrpt   20  9645.834 ± 238.165  ops/s  // using +
MyBenchmark.testNew  thrpt   20   429.898 ±  10.551  ops/s  // using String.format

Units are operations per second; higher is better. The benchmark ran on the OpenJDK IcedTea 2.5.4 Java Virtual Machine.

So, old style (using +) is much faster.

Solution 4 - Java

Your old ugly style is automatically compiled by javac 1.6 as:

StringBuilder sb = new StringBuilder("What do you get if you multiply ");
sb.append(varSix);
sb.append(" by ");
sb.append(varNine);
sb.append("?");
String s =  sb.toString();

So there is absolutely no difference between this and using a StringBuilder.

String.format is a lot more heavyweight, since it creates a new Formatter, parses your input format string, creates a StringBuilder, appends everything to it and calls toString().

Solution 5 - Java

Java's String.format works like so:

  1. It parses the format string, exploding it into a list of format chunks.
  2. It iterates over the format chunks, rendering them into a StringBuilder, which is basically an array that resizes itself as necessary by copying into a new array. This is necessary because we don't yet know how large the final String will be.
  3. StringBuilder.toString() copies its internal buffer into a new String.

If the final destination for this data is a stream (e.g. rendering a webpage or writing to a file), you can assemble the format chunks directly into your stream:

new PrintStream(outputStream, autoFlush, encoding).format("hello %s", "world");

I speculate that the optimizer will optimize away the format string processing. If so, you're left with equivalent amortized performance to manually unrolling your String.format into a StringBuilder.
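As a rough illustration of formatting straight into a stream, here is a self-contained sketch. The class name StreamFormatDemo and the in-memory ByteArrayOutputStream are assumptions made so the example can run on its own; in real code the sink would be a file or socket stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class StreamFormatDemo {
    // Formats directly into the stream; no intermediate result String is
    // created by the caller.
    static String render(String name) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, "UTF-8")) {
            out.format("hello %s", name); // java.util.Formatter syntax: %s, not {0}
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every JVM", e);
        }
        try {
            // Read the bytes back only to show what was written.
            return sink.toString("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("world")); // prints "hello world"
    }
}
```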

Solution 6 - Java

To expand on and correct the first answer above: translation is not actually what String.format helps with.
What String.format helps with is printing a date/time (or a numeric format, etc.) where there are localization (l10n) differences (i.e. some countries will print 04Feb2009 and others will print Feb042009).
With translation, you're just talking about moving any externalizable strings (like error messages and whatnot) into a property bundle so that you can use the right bundle for the right language, using ResourceBundle and MessageFormat.
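The distinction can be sketched as follows. This is an illustration only: the class name LocalizationDemo and both helper methods are hypothetical, number formatting stands in for the date example, and the pattern strings are passed in directly where a real application would look them up in a ResourceBundle.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class LocalizationDemo {
    // Localization: the same value rendered with locale-specific separators.
    static String amount(Locale locale, double value) {
        return String.format(locale, "%,.2f", value);
    }

    // Translation: the pattern differs per language; MessageFormat fills in {0}.
    // In a real application the pattern would come from a ResourceBundle.
    static String greet(String pattern, String name) {
        return MessageFormat.format(pattern, name);
    }

    public static void main(String[] args) {
        System.out.println(amount(Locale.US, 1234.5));      // 1,234.50
        System.out.println(amount(Locale.GERMANY, 1234.5)); // 1.234,50
        System.out.println(greet("Hello, {0}!", "world"));  // Hello, world!
        System.out.println(greet("Hallo, {0}!", "Welt"));   // Hallo, Welt!
    }
}
```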

Looking at all the above, I'd say that performance-wise, String.format vs. plain concatenation comes down to what you prefer. If you prefer looking at calls to .format over concatenation, then by all means, go with that.
After all, code is read a lot more than it's written.

Solution 7 - Java

In your example, performance probably isn't too different, but there are other issues to consider, namely memory fragmentation. Every concatenation operation creates a new string, even if it's temporary (it takes time to GC it and it's more work). String.format() is just more readable and it involves less fragmentation.

Also, if you're using a particular format a lot, don't forget you can use the Formatter class directly (all String.format() does is instantiate a single-use Formatter instance).
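A minimal sketch of using Formatter directly, assuming several lines are rendered into one buffer. The class name FormatterReuseDemo and the method multiplicationLines are hypothetical. Note that this only saves the per-call Formatter and intermediate String; the format string is still parsed on every format() call.

```java
import java.util.Formatter;

final class FormatterReuseDemo {
    static String multiplicationLines(int rows) {
        StringBuilder sb = new StringBuilder();
        // One Formatter appends every formatted line to the same StringBuilder.
        try (Formatter formatter = new Formatter(sb)) {
            for (int i = 1; i <= rows; i++) {
                formatter.format("%d x 9 = %d%n", i, i * 9);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(multiplicationLines(3));
    }
}
```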

Also, something else you should be aware of: be careful of using substring(). For example:

String getSmallString() {
  String largeString = loadLargeString(); // hypothetical helper: load from file; say 2M in size
  return largeString.substring(100, 300);
}

On JDKs before 7u6, that large string is still in memory, because substring() returned a view that shared the parent's character array (newer JDKs copy the characters instead). A better version is:

  return new String(largeString.substring(100, 300));

or

  return String.format("%s", largeString.substring(100, 300));

The second form is probably more useful if you're doing other stuff at the same time.

Solution 8 - Java

Generally you should use String.format because it's relatively fast and it supports globalization (assuming you're actually trying to write something that is read by the user). It also makes it easier to globalize if you're translating one string instead of three or more per statement (especially for languages that have drastically different grammatical structures).

If you never plan on translating anything, then either rely on Java's built-in conversion of + operators into StringBuilder, or use StringBuilder explicitly.

Solution 9 - Java

Another perspective, from the logging point of view only.

I see a lot of discussion related to logging in this thread, so I thought I'd add my experience as an answer. Maybe someone will find it useful.

I guess the motivation for logging with a formatter comes from avoiding string concatenation. Basically, you do not want the overhead of string concatenation if you are not going to log the message.

You do not really need to concatenate or format unless you are actually going to log. Let's say I define a method like this:

public void logDebug(Throwable t, String... args) { // varargs must be the last parameter
    if (debugOn) {
        // concatenate all args only here, when debug logging is enabled
        // log the final debug message
    }
}

With this approach the concatenation or formatter is not called at all if it's a debug message and debugOn = false.

Though it will still be better to use StringBuilder instead of formatter here. The main motivation is to avoid any of that.

At the same time I do not like adding "if" block for each logging statement since

  • It affects readability
  • It reduces coverage in my unit tests, which is confusing when you want to make sure every line is tested.

Therefore I prefer to create a logging utility class with methods like above and use it everywhere without worrying about performance hit and any other issues related to it.
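A possible shape for such a utility, as a sketch only: the class name DebugLog, the lastMessage field and the use of System.out are assumptions for illustration, and the parts are typed as Object so that callers can pass values without converting them first. The varargs array is still allocated on every call; only the concatenation is skipped when debugging is off.

```java
final class DebugLog {
    static boolean debugOn = false;

    // Kept only so the example can show what was logged (hypothetical field).
    static String lastMessage;

    static void logDebug(Throwable t, Object... parts) {
        if (!debugOn) {
            return; // nothing is concatenated when debug logging is off
        }
        StringBuilder sb = new StringBuilder();
        for (Object part : parts) {
            sb.append(part); // toString() of each part is called only here
        }
        if (t != null) {
            sb.append(" (").append(t).append(')');
        }
        lastMessage = sb.toString();
        System.out.println(lastMessage);
    }

    public static void main(String[] args) {
        logDebug(null, "skipped: ", 1);             // debugOn is false: no output
        debugOn = true;
        logDebug(null, "six times nine = ", 6 * 9); // prints "six times nine = 54"
    }
}
```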

Solution 10 - Java

I just modified hhafez's test to include StringBuilder. StringBuilder is 33 times faster than String.format using jdk 1.6.0_10 client on XP. Using the -server switch lowers the factor to 20.

public class StringTest {

   public static void main( String[] args ) {
      test();
      test();
   }

   private static void test() {
      int i = 0;
      long prev_time = System.currentTimeMillis();
      long time;

      for ( i = 0; i < 1000000; i++ ) {
         String s = "Blah" + i + "Blah";
      }
      time = System.currentTimeMillis() - prev_time;

      System.out.println("Time after for loop " + time);

      prev_time = System.currentTimeMillis();
      for ( i = 0; i < 1000000; i++ ) {
         String s = String.format("Blah %d Blah", i);
      }
      time = System.currentTimeMillis() - prev_time;
      System.out.println("Time after for loop " + time);

      prev_time = System.currentTimeMillis();
      for ( i = 0; i < 1000000; i++ ) {
         new StringBuilder("Blah").append(i).append("Blah");
      }
      time = System.currentTimeMillis() - prev_time;
      System.out.println("Time after for loop " + time);
   }
}

While this might sound drastic, I consider it to be relevant only in rare cases, because the absolute numbers are pretty low: 4 s for 1 million simple String.format calls is sort of ok - as long as I use them for logging or the like.

Update: As pointed out by sjbotha in the comments, the StringBuilder test is invalid, since it is missing a final .toString().

The correct speed-up factor from String.format(.) to StringBuilder is 23 on my machine (16 with the -server switch).

Solution 11 - Java

Here is a modified version of hhafez's entry. It includes a StringBuilder option.

public class BLA
{
public static final String BLAH = "Blah ";
public static final String BLAH2 = " Blah";
public static final String BLAH3 = "Blah %d Blah";


public static void main(String[] args) {
	int i = 0;
	long prev_time = System.currentTimeMillis();
	long time;
	int numLoops = 1000000;
	
	for( i = 0; i< numLoops; i++){
		String s = BLAH + i + BLAH2;
	}
	time = System.currentTimeMillis() - prev_time;

	System.out.println("Time after for loop " + time);

	prev_time = System.currentTimeMillis();
	for( i = 0; i<numLoops; i++){
		String s = String.format(BLAH3, i);
	}
	time = System.currentTimeMillis() - prev_time;
	System.out.println("Time after for loop " + time);

	prev_time = System.currentTimeMillis();
	for( i = 0; i<numLoops; i++){
		StringBuilder sb = new StringBuilder();
		sb.append(BLAH);
		sb.append(i);
		sb.append(BLAH2);
		String s = sb.toString();
	}
	time = System.currentTimeMillis() - prev_time;
	System.out.println("Time after for loop " + time);
		
}

}

Time after for loop 391
Time after for loop 4163
Time after for loop 227

Solution 12 - Java

The answer to this depends very much on how your specific Java compiler optimizes the bytecode it generates. Strings are immutable and, theoretically, each "+" operation can create a new one. But, your compiler almost certainly optimizes away interim steps in building long strings. It's entirely possible that both lines of code above generate the exact same bytecode.

The only real way to know is to test the code iteratively in your current environment. Write a quick-and-dirty app that concatenates strings both ways in a loop and see how the timings compare.

Solution 13 - Java

Consider using "hello".concat( "world!" ) for small number of strings in concatenation. It could be even better for performance than other approaches.

If you have more than 3 strings, then consider using StringBuilder, or just String, depending on the compiler that you use.
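The two options can be sketched side by side. The class name ConcatDemo and its methods are hypothetical; keep in mind that String.concat accepts only a String argument and throws a NullPointerException if that argument is null.

```java
final class ConcatDemo {
    // Two strings: String.concat copies both into one new String.
    static String joinTwo(String a, String b) {
        return a.concat(b);
    }

    // More strings: a single StringBuilder avoids intermediate Strings.
    static String joinMany(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinTwo("hello", "world!"));       // helloworld!
        System.out.println(joinMany("a", "b", "c", "d"));     // abcd
    }
}
```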

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author
Question            Air
Solution 1 - Java   Itamar
Solution 2 - Java   hhafez
Solution 3 - Java   Adam Stelmaszczyk
Solution 4 - Java   Raphaël
Solution 5 - Java   Dustin Getz
Solution 6 - Java   dw.mackie
Solution 7 - Java   cletus
Solution 8 - Java   Orion Adrian
Solution 9 - Java   software.wikipedia
Solution 10 - Java  the.duckman
Solution 11 - Java  ANON
Solution 12 - Java  Yes - that Jake.
Solution 13 - Java  Saša