Support for Compressed Strings being Dropped in HotSpot JVM?

Java, Performance, JVM, Java 7

Java Problem Overview


The Oracle page Java HotSpot VM Options lists -XX:+UseCompressedStrings as available and on by default. However, in Java 6 update 29 it is off by default, and in Java 7 update 2 it produces a warning:

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseCompressedStrings; support was removed in 7.0

Does anyone know the thinking behind removing this option?


https://stackoverflow.com/questions/8832822/sorting-lines-of-an-enormous-file-txt-in-java/8833257#8833257

With -mx2g, this example took 4.541 seconds with the option on and 5.206 seconds with it off in Java 6 update 29, so it is hard to see that the option hurts performance.

Note: Java 7 update 2 requires 2.0 GB, whereas Java 6 update 29 requires 1.8 GB without compressed strings and only 1.0 GB with them.

Java Solutions


Solution 1 - Java

Originally, this option was added to improve SPECjBB performance. The gains come from reduced memory bandwidth requirements between the processor and DRAM: loading and storing bytes in a byte[] consumes half the bandwidth of loading and storing chars in a char[].

However, this comes at a price. The code has to determine whether the internal array is a byte[] or a char[]. This takes CPU time, and if the workload is not constrained by memory bandwidth, it can cause a performance regression. There is also a code maintenance price due to the added complexity.
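To make the trade-off concrete, here is a rough sketch of a string type with two possible backing arrays. It is not the HotSpot implementation; the class name DualString and all of its methods are made up for illustration. It only shows where the extra branch sits and how much array payload is saved.

```java
// Hypothetical illustration only: NOT how HotSpot implemented UseCompressedStrings.
class DualString {
    private final byte[] bytes; // non-null when every char is 7-bit ASCII
    private final char[] chars; // non-null otherwise

    private DualString(byte[] bytes, char[] chars) {
        this.bytes = bytes;
        this.chars = chars;
    }

    static DualString of(String s) {
        // Scan once to decide which representation is usable.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return new DualString(null, s.toCharArray());
            }
        }
        byte[] b = new byte[s.length()];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) s.charAt(i);
        }
        return new DualString(b, null);
    }

    boolean isCompressed() {
        return bytes != null;
    }

    int length() {
        return bytes != null ? bytes.length : chars.length;
    }

    // This branch on every access is the CPU cost the answer describes.
    char charAt(int index) {
        return bytes != null ? (char) bytes[index] : chars[index];
    }

    // Size of the array contents only; object and array headers are ignored.
    int payloadBytes() {
        return bytes != null ? bytes.length : chars.length * 2;
    }
}
```

Every operation that touches the contents needs a check like the one in charAt, which is also where the maintenance cost comes from.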

Because there weren't enough production-like workloads that showed significant gains (except perhaps SPECjBB), the option was removed.

There is another angle to this. The option reduces heap usage: for applicable Strings, it cuts their memory usage in half. This angle wasn't considered when the option was removed. For workloads that are constrained by memory capacity (i.e. they have to run with limited heap space and GC takes a lot of time), this option can prove useful.

If enough production-like workloads constrained by memory capacity can be found to justify the option's inclusion, then maybe the option will be brought back.

Edit 3/20/2013: In an average server heap dump, Strings take up 25% of the space. Most Strings are compressible. If the option is reintroduced, it could save half of this space (i.e. ~12% of the heap)!

Edit 3/10/2016: A feature similar to compressed strings is coming back in JDK 9 via JEP 254.
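The arithmetic behind that estimate can be written down as a small sketch. The method name and the second parameter are my own; the 25% figure is the one quoted above, and the share of Strings that are compressible is an assumption you would have to measure for your own heap. It also assumes, as the answer does, that a compressible String shrinks to half its size.

```java
// Back-of-the-envelope estimate; all names here are hypothetical.
class StringHeapSavings {
    /**
     * @param stringShare       fraction of the heap occupied by Strings (0.25 in the answer)
     * @param compressibleShare fraction of that String data that is compressible (assumed)
     * @return fraction of the whole heap saved if compressible Strings shrink by half
     */
    static double heapFractionSaved(double stringShare, double compressibleShare) {
        return stringShare * compressibleShare * 0.5;
    }

    public static void main(String[] args) {
        // All Strings compressible: the upper bound behind the "~12%" figure.
        System.out.println(heapFractionSaved(0.25, 1.0));
        // An assumed 80% compressible share, for comparison.
        System.out.println(heapFractionSaved(0.25, 0.8));
    }
}
```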

Solution 2 - Java

Just to add, for those interested...

The java.lang.CharSequence interface (which java.lang.String implements) allows more compact representations of Strings than UTF-16.

Apps that manipulate a lot of strings should probably be written to accept CharSequence, so that they work with java.lang.String as well as with more compact representations.

8-bit (UTF-8), or even 5-, 6-, or 7-bit encoded strings, or even compressed strings, can be represented as CharSequence.

CharSequences can also be a lot more efficient to manipulate: subsequences can be defined as views (pointers) onto the original content, for example, instead of copying it.

For example, in concurrent-trees, a suffix tree of ten of Shakespeare's plays requires 2 GB of RAM using CharSequence-based nodes, and would require 249 GB of RAM using char[] or String-based nodes.
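As a minimal sketch of both points (a one-byte-per-character representation and subsequences as views), something like the following works. The class Latin1Sequence is made up for this answer and is not taken from concurrent-trees or the JDK; it assumes the text fits in ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class: one byte per character, subSequence without copying.
class Latin1Sequence implements CharSequence {
    private final byte[] data; // shared between a sequence and all of its views
    private final int offset;
    private final int length;

    private Latin1Sequence(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    static Latin1Sequence of(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException("not representable in ISO-8859-1: index " + i);
            }
        }
        return new Latin1Sequence(s.getBytes(StandardCharsets.ISO_8859_1), 0, s.length());
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // Mask to undo sign extension of bytes above 0x7F.
        return (char) (data[offset + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        // A view: same backing array, different window.
        return new Latin1Sequence(data, offset + start, end - start);
    }

    @Override
    public String toString() {
        return new String(data, offset, length, StandardCharsets.ISO_8859_1);
    }
}
```

Code written against CharSequence can take this or a plain String interchangeably. The usual caveat with views applies: a small view keeps the whole backing array reachable.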

Solution 3 - Java

Since there were upvotes, I figure I wasn't missing something obvious, so I have logged it as a bug (at the very least, it is an omission in the documentation).

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7129417

(Should be visible in a couple of days)

Solution 4 - Java

Java 9 runs the "sorting lines of an enormous file.txt in Java" example twice as fast on my machine as Java 6, and it only needs 1 GB of memory, because -XX:+CompactStrings is enabled by default. Also, in Java 6 compressed strings only worked for 7-bit ASCII characters, whereas Java 9 supports Latin1 (ISO-8859-1). Some operations like charAt(idx) might be slightly slower, though. With the new design, other encodings could also be supported in the future.

I wrote about this in The Java Specialists' Newsletter.
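To illustrate the difference in coverage, here is a small sketch that classifies text the way each scheme is described above: 7-bit ASCII for the Java 6 option, Latin1 for Java 9. The class and method names are hypothetical, and this only mirrors the character-range rule, not the JVM's actual code path.

```java
// Hypothetical helper that mirrors the character ranges described above.
class StringEncodingCheck {
    // Would qualify under the Java 6 rule as described: every char is 7-bit ASCII.
    static boolean fitsAscii7(CharSequence s) {
        return allCharsAtMost(s, 0x7F);
    }

    // Would qualify under the Java 9 rule as described: every char fits in Latin1.
    static boolean fitsLatin1(CharSequence s) {
        return allCharsAtMost(s, 0xFF);
    }

    private static boolean allCharsAtMost(CharSequence s, int max) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > max) {
                return false;
            }
        }
        return true;
    }
}
```

So text such as "café" would stay uncompressed under the 7-bit rule but fits in one byte per character under Latin1, while a string containing "€" fits neither.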

Solution 5 - Java

In OpenJDK 7 (1.7.0_147-icedtea, Ubuntu 11.10), the JVM simply fails with an

> Unrecognized VM option 'UseCompressedStrings'

when JAVA_OPTS (or command line) contains -XX:+UseCompressedStrings.

It seems Oracle really removed the option.
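If you would rather check from inside a running JVM than watch the startup output, a sketch along these lines should work on HotSpot-based JVMs. It uses the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it is an assumption that your JVM provides it; the class name VmFlagProbe is made up.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; assumes a HotSpot-based JVM (Java 7 or later).
class VmFlagProbe {
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return flagName + "=" + bean.getVMOption(flagName).getValue();
        } catch (IllegalArgumentException e) {
            // getVMOption throws this when no VM option of that name exists.
            return flagName + " is not recognized by this JVM";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("UseCompressedStrings"));
        System.out.println(describe("CompactStrings"));
    }
}
```

What it prints depends on the JVM version you run it on.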

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Peter Lawrey | View Question on Stackoverflow
Solution 1 - Java | Nathan | View Answer on Stackoverflow
Solution 2 - Java | npgall | View Answer on Stackoverflow
Solution 3 - Java | Peter Lawrey | View Answer on Stackoverflow
Solution 4 - Java | Heinz Kabutz | View Answer on Stackoverflow
Solution 5 - Java | Rodrigo Coacci | View Answer on Stackoverflow