Is it good practice to use java.lang.String.intern()?


Java Problem Overview


The Javadoc about String.intern() doesn't give much detail. (In a nutshell: It returns a canonical representation of the string, allowing interned strings to be compared using ==)

  • When would I use this function in favor of String.equals()?

  • Are there side effects not mentioned in the Javadoc, i.e. more or less optimization by the JIT compiler?

  • Are there further uses of String.intern()?

Java Solutions


Solution 1 - Java

This has (almost) nothing to do with string comparison. String interning is intended to save memory when your application holds many strings with the same content. By using String.intern(), the application ends up with only one instance of each such string in the long run. A side effect is that you can perform fast reference-equality comparison instead of ordinary string comparison, but this is usually not advisable because it is really easy to break by forgetting to intern even a single instance.
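To make the fragility concrete, here is a minimal sketch. The class name InternEqualityDemo, the helper sameReference and the sample value are made up for illustration; new String(...) stands in for any string created at runtime (user input, I/O).

```java
class InternEqualityDemo {

    /** Compares by reference; only meaningful if BOTH arguments were interned. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "order-42";               // literals are interned by the JVM
        String fromInput = new String("order-42"); // stands in for text read at runtime

        System.out.println(literal.equals(fromInput));                  // true
        System.out.println(sameReference(literal, fromInput));          // false: one side was never interned
        System.out.println(sameReference(literal, fromInput.intern())); // true
    }
}
```

The middle line is the trap: the contents are equal, but one forgotten intern() call makes == report false.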

Solution 2 - Java

> When would I use this function in favor of String.equals()

When you need speed, since you can then compare strings by reference (== is faster than equals()).

> Are there side effects not mentioned in the Javadoc?

The primary disadvantage is that you have to remember to make sure that you actually do intern() all of the strings that you're going to compare. It's easy to forget to intern() all strings and then you can get confusingly incorrect results. Also, for everyone's sake, please be sure to very clearly document that you're relying on the strings being internalized.

The second disadvantage, if you decide to internalize strings, is that the intern() method is relatively expensive. It has to manage the pool of unique strings, so it does a fair bit of work (even if the string has already been internalized). So be careful in your code design: for example, intern() all appropriate strings on input so that you don't have to worry about it afterwards.

(from JGuru)

Third disadvantage (Java 6 or earlier only): interned Strings live in PermGen space, which is usually quite small; you may run into an OutOfMemoryError even with plenty of free heap space.

(from Michael Borgwardt)
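The advice to intern everything at the input boundary could look roughly like the sketch below. The class InternOnInput, the method parseFields and the comma-separated format are assumptions chosen for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

class InternOnInput {

    /**
     * Hypothetical parser: every field is interned exactly once, here,
     * so that code further down may rely on reference equality.
     */
    static List<String> parseFields(String line) {
        List<String> fields = new ArrayList<>();
        for (String raw : line.split(",")) {
            fields.add(raw.trim().intern());
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> first = parseFields("EUR, USD, EUR");
        List<String> second = parseFields("USD,EUR");

        System.out.println(first.get(0) == first.get(2));  // true
        System.out.println(first.get(1) == second.get(0)); // true
    }
}
```

Keeping the intern() call in one documented place limits the risk of comparing an interned string with a non-interned one.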

Solution 3 - Java

Strings interned with String.intern() are definitely garbage collected in modern JVMs.
The following NEVER runs out of memory, because of GC activity:

// java -cp . -Xmx128m UserOfIntern

import java.util.Random;

public class UserOfIntern {
    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(random.nextLong());
        while (true) {
            String s = String.valueOf(random.nextLong());
            s = s.intern();
        }
    }
}

See more (from me) on the myth of non GCed String.intern().

Solution 4 - Java

I have recently written an article about String.intern() implementation in Java 6, 7 and 8: String.intern in Java 6, 7 and 8 - string pooling.

I hope it contains enough information about the current state of string pooling in Java.

In a nutshell:

  • Avoid String.intern() in Java 6, because it goes into PermGen
  • Prefer String.intern() in Java 7 & Java 8: it uses 4-5x less memory than rolling your own object pool
  • Be sure to tune -XX:StringTableSize (the default is probably too small; set it to a prime number)

Solution 5 - Java

Comparing strings with == is much faster than with equals()

It is about 5 times faster, but since string comparison usually accounts for only a small percentage of an application's total execution time, the overall gain is much smaller than that: it is diluted to a few percent.

String.intern() pulls the string away from the heap and puts it in PermGen

Interned strings are put in a different storage area: the Permanent Generation, an area of the JVM reserved for non-user objects such as classes, methods and other internal JVM objects. This area is limited in size and much more precious than the heap. Because it is smaller than the heap, you are more likely to use up all of its space and get an OutOfMemoryError. (This applies up to Java 6; since Java 7, interned strings live in the main heap.)

Interned strings are garbage collected

In newer JVM versions, interned strings are also garbage collected once no object references them.

Keeping these three points in mind, you could deduce that String.intern() is useful only in the few situations where you do a lot of string comparison. However, it is better not to use interned strings if you don't know exactly what you are doing.

Solution 6 - Java

> When would I use this function in favor of String.equals()

Given they do different things, probably never.

Interning strings for performance reasons so that you can compare them for reference equality is only going to be of benefit if you are holding references to the strings for a while - strings coming from user input or IO won't be interned.

That means in your application you receive input from an external source and process it into an object which has a semantic value - an identifier say - but that object has a type indistinguishable from the raw data, and has different rules as to how the programmer should use it.

It's almost always better to create a UserId type which is interned (it's easy to create a thread-safe generic interning mechanism) and acts like an open enum, than to overload the java.lang.String type with reference semantics if it happens to be a User ID.

That way you don't get confusion between whether or not a particular String has been interned, and you can encapsulate any additional behaviour you require in the open enum.
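One way such an interned UserId type might look is sketched below. This is an assumption about the design rather than the answer's actual code: the pool is a ConcurrentHashMap, and the factory name of and the accessor value are made up.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class UserId {

    // One canonical UserId per distinct raw value. Entries are never removed
    // in this sketch, so the pool only suits a bounded set of identifiers.
    private static final ConcurrentMap<String, UserId> POOL = new ConcurrentHashMap<>();

    private final String value;

    private UserId(String value) {
        this.value = value;
    }

    /** Returns the canonical instance for the given raw identifier (thread-safe). */
    static UserId of(String raw) {
        return POOL.computeIfAbsent(raw, UserId::new);
    }

    String value() {
        return value;
    }

    @Override
    public String toString() {
        return "UserId(" + value + ")";
    }

    public static void main(String[] args) {
        UserId a = UserId.of("alice");
        UserId b = UserId.of(new String("alice")); // a different String object, same content

        System.out.println(a == b);                // true
        System.out.println(a == UserId.of("bob")); // false
    }
}
```

Because the constructor is private, every UserId in the program is canonical. Reference comparison is therefore safe by construction, and a raw String can never be mistaken for an interned identifier.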

Solution 7 - Java

I am not aware of any advantages, and if there were any, one would think that equals() would itself use intern() internally (which it doesn't).

Busting intern() myths

Solution 8 - Java

Daniel Brückner is absolutely right. String interning is meant to save memory (heap). Our system currently has a giant hashmap for holding certain data. As the system scales, the hashmap grows big enough to exhaust the heap (as we've tested). Interning all the duplicated strings in all the objects in the hashmap saves us a significant amount of heap space.

Also, as of Java 7, interned strings no longer live in PermGen but in the heap instead. So you don't need to worry about its size, and yes, they get garbage collected:

> In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
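The memory effect of interning duplicated values can be illustrated by counting distinct String objects by identity. This is only a sketch: the class InternDedupDemo, the helper distinctInstances and the sample data are made up, and new String(...) simulates freshly parsed input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternDedupDemo {

    /** Counts distinct String objects (by identity, not by content). */
    static int distinctInstances(List<String> values) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(values);
        return identities.size();
    }

    public static void main(String[] args) {
        String[] countries = {"DE", "FR", "US"};
        List<String> plain = new ArrayList<>();
        List<String> interned = new ArrayList<>();

        for (int i = 0; i < 9_000; i++) {
            String value = new String(countries[i % countries.length]); // simulates parsed input
            plain.add(value);
            interned.add(value.intern());
        }

        System.out.println(distinctInstances(plain));    // 9000
        System.out.println(distinctInstances(interned)); // 3
    }
}
```

In a real system only the interned references would be kept, so the 9000 temporary copies become garbage and three objects remain.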

Solution 9 - Java

> Are there side effects not mentioned in the Javadoc, i.e. more or less optimization by the JIT compiler?

I don't know about the JIT level, but there is direct bytecode support for the string pool, which is implemented magically and efficiently with a dedicated CONSTANT_String_info struct (unlike most other objects which have more generic representations).

JVMS

JVMS 7 5.1 says:

> A string literal is a reference to an instance of class String, and is derived from a CONSTANT_String_info structure (§4.4.3) in the binary representation of a class or interface. The CONSTANT_String_info structure gives the sequence of Unicode code points constituting the string literal.

> The Java programming language requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class String (JLS §3.10.5). In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus, the following expression must have the value true:

("a" + "b" + "c").intern() == "abc"

> To derive a string literal, the Java Virtual Machine examines the sequence of code points given by the CONSTANT_String_info structure.

> - If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.

> - Otherwise, a new instance of class String is created containing the sequence of Unicode code points given by the CONSTANT_String_info structure; a reference to that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.
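The quoted rule can be checked with a small program. The class LiteralPoolDemo and the helper concat are hypothetical names. The sketch relies on two language rules: javac folds constant expressions such as "a" + "b" + "c" into a single literal, and a string concatenation that is not a constant expression produces a newly created String.

```java
class LiteralPoolDemo {

    /** Concatenation of non-constant operands, evaluated at run time. */
    static String concat(String left, String right) {
        return left + right;
    }

    public static void main(String[] args) {
        String literal = "abc";
        String folded = "a" + "b" + "c";    // constant expression: compiled as the literal "abc"
        String runtime = concat("ab", "c"); // built at run time: a new String object

        System.out.println(literal == folded);           // true
        System.out.println(literal == runtime);          // false
        System.out.println(literal == runtime.intern()); // true
    }
}
```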

Bytecode

It is also instructive to look at the bytecode implementation on OpenJDK 7.

If we decompile:

public class StringPool {
    public static void main(String[] args) {
        String a = "abc";
        String b = "abc";
        String c = new String("abc");
        System.out.println(a);
        System.out.println(b);
        System.out.println(a == c);
    }
}

we have on the constant pool:

#2 = String             #32   // abc
[...]
#32 = Utf8               abc

and main:

 0: ldc           #2          // String abc
 2: astore_1
 3: ldc           #2          // String abc
 5: astore_2
 6: new           #3          // class java/lang/String
 9: dup
10: ldc           #2          // String abc
12: invokespecial #4          // Method java/lang/String."<init>":(Ljava/lang/String;)V
15: astore_3
16: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
19: aload_1
20: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
23: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
26: aload_2
27: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
30: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
33: aload_1
34: aload_3
35: if_acmpne     42
38: iconst_1
39: goto          43
42: iconst_0
43: invokevirtual #7          // Method java/io/PrintStream.println:(Z)V

Note how:

  • 0 and 3: the same ldc #2 constant is loaded (the literals)
  • 12: a new string instance is created (with #2 as argument)
  • 35: a and c are compared as regular objects with if_acmpne

The representation of constant strings is quite magic in the bytecode, and the JVMS quote above seems to say that whenever the Utf8 pointed to is the same, identical instances are loaded by ldc.

I have done similar tests for fields, and:

  • static final String s = "abc" points to the constant table through the ConstantValue Attribute
  • non-final fields don't have that attribute, but can still be initialized with ldc

Bonus: compare that to the Integer pool, which does not have direct bytecode support (i.e. no CONSTANT_String_info analogue).
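For comparison, a minimal sketch of the Integer pool (the class IntegerCacheDemo and the helper sameBoxedInstance are illustrative names). The pooling lives in library code: Integer.valueOf is documented to always cache values from -128 to 127 and may cache others, so only the first result below is guaranteed.

```java
class IntegerCacheDemo {

    /** Boxes the same int twice and reports whether both boxes are one object. */
    static boolean sameBoxedInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameBoxedInstance(127));     // true: inside the guaranteed cache range
        System.out.println(sameBoxedInstance(100_000)); // not guaranteed either way
    }
}
```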

Solution 10 - Java

I would consider intern() and == comparison instead of equals() only when equals() comparison is a bottleneck across many string comparisons. This is highly unlikely to help with a small number of comparisons, because intern() is not free. After aggressively interning strings, you will find calls to intern() getting slower and slower.

Solution 11 - Java

A kind of memory leak can come from the use of substring() when the result is small compared to the source string and the object has a long life.

The normal solution is to use new String(s.substring(...)), but when you have a class that stores the result of a potential/likely substring(...) call and have no control over the caller, you might consider storing the intern() of the String arguments passed to the constructor. This releases the potentially large buffer.
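Both variants are sketched below with hypothetical names (SubstringCopyDemo, detachedCopy, detachedInterned). The leak described above applies to older JVMs in which substring() shares the source string's character array. To my knowledge, substring() has copied the characters since Java 7 update 6, so on current JVMs the extra copy is redundant.

```java
class SubstringCopyDemo {

    /** Copies the substring so it cannot keep the large source string alive. */
    static String detachedCopy(String source, int begin, int end) {
        return new String(source.substring(begin, end));
    }

    /** Alternative from the answer: keep the interned instance instead. */
    static String detachedInterned(String source, int begin, int end) {
        return source.substring(begin, end).intern();
    }

    public static void main(String[] args) {
        String document = "<user>alice</user>"; // imagine a much larger document

        System.out.println(detachedCopy(document, 6, 11));                // alice
        System.out.println(detachedInterned(document, 6, 11) == "alice"); // true
    }
}
```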

Solution 12 - Java

String interning is useful in the case where the equals() method is being invoked often because the equals() method does a quick check to see if the objects are the same at the beginning of the method.

if (this == anObject) {
    return true;
}

This usually occurs when searching through a Collection, though other code may also do string equality checks.

There is a cost involved in interning, though: I ran a microbenchmark of some code and found that the interning process increased the runtime by a factor of 10.

The best place to do the interning is usually when you are reading keys that are stored outside of the code, as strings in the code are automatically interned. This would normally happen during the initialization stages of your application, in order to prevent the first-user penalty.

Another place where it can be done is when processing user input that could be used to do key lookups. This normally happens in your request processor; note that the interned strings should be passed down.

Aside from that, there isn't much point in interning in the rest of the code, as it generally won't give any benefit.

Solution 13 - Java

I would vote for it not being worth the maintenance hassle.

Most of the time there will be no need, and no performance benefit, unless your code does a lot of work with substrings, in which case the String class will use the original string plus an offset to save memory. If your code uses substrings a lot, then I suspect that interning will just cause your memory requirements to explode.

Solution 14 - Java

http://kohlerm.blogspot.co.uk/2009/01/is-javalangstringintern-really-evil.html

asserts that String.equals() uses "==" to compare String objects first; then, according to

http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html

it compares the lengths of the Strings, and then the contents.

(By the way, product code strings in a sales catalogue are liable to be all the same length - BIC0417 is a bicyclist's safety helmet, TIG0003 is a live adult male tiger - you probably need all sorts of licences to order one of those. And maybe you better order a safety helmet at the same time.)

So it sounds as though you get a benefit from replacing your Strings with their intern() version, but you get safety - and readability and standard compliance - -without- using "==" for equals() in your programming. And most of what I'm going to say depends on that being true, if it is true.

But does String.equals() test that you passed it a String and not some other object, before using "==" ? I'm not qualified to say, but I would guess not, because overwhelmingly most such equals() operations will be String to String, so that test is almost always passed. Indeed, prioritising "==" inside String.equals() implies a confidence that you frequently are comparing the String to the same actual object.

I hope no one is surprised that the following lines produce a result of "false":

	Integer i = 1;
	System.out.println("1".equals(i));

But if you change i to i.toString() in the second line, of course it's true.

Venues where you might hope for a benefit from interning include Set and Map, obviously. I hope that interned strings have their hashcodes cached... I think that would be a requirement. And I hope I haven't just given away an idea that could earn me a million dollars. :-)

As for memory, it's also obvious that that is an important limit if your volume of Strings is large, or if you want the memory used by your program code to be very small. If your volume of -distinct- Strings is very large, then it may be time to consider using dedicated database program code to manage them, and a separate database server. Likewise, if you can improve a small program (that needs to run in 10000 instances simultaneously) by having it not store its Strings itself at all.

It feels wasteful to create a new String and then discard it straight away for its intern() substitute, but there isn't a clear alternative, except for keeping the duplicate String. So really the execution cost is of searching for your string in the intern pool and then allowing the garbage collector to dispose of the original. And if it's a string literal then it comes intern-ed already anyway.

I am wondering whether intern() can be abused by malicious program code to detect whether certain Strings and their object references already exist in the intern() pool, and therefore exist elsewhere in the Java session, when that shouldn't be known. But that would only be possible when the program code is already being used in a trusting way, I guess. Still, it is something to consider about the third-party libraries that you include in your program to store and remember your ATM PIN numbers!

Solution 15 - Java

The real reason to use intern() is not any of the above. You get to use it after you get an out-of-memory error. Lots of the strings in a typical program are the result of String.substring() on some other big string (think of extracting a user name from a 100K XML file). In the Java implementation, the substring holds a reference to the original string plus the start and end positions within that huge string. (The idea behind it is to reuse the same big string.)

After 1000 big files, from which you only save 1000 short names, you will keep all 1000 files in memory! Solution: in this scenario just use smallsubstring.intern().

Solution 16 - Java

I am using intern() to save memory: I hold a large amount of String data in memory, and moving to intern() saved a massive amount of it. Unfortunately, although it uses a lot less memory, the memory it does use is PermGen memory rather than heap, and it is difficult to explain to customers how to increase the allocation of this type of memory.

So is there an alternative to intern() for reducing memory consumption? (The == versus equals() performance benefit is not an issue for me.)

Solution 17 - Java

Let's face it: the main use-case scenario is when you read a stream of data (either through an input stream, or from a JDBC ResultSet) and there is a myriad of little Strings that are repeated all throughout.

Here is a little trick that gives you some control over what kind of mechanism you'd like to use to internalize Strings and other immutables, and an example implementation:

/**
 * Extends the notion of String.intern() to different mechanisms and
 * different types. For example, an implementation can use an
 * LRUCache<T,?>, or a WeakHashMap.
 */
public interface Internalizer<T> {
	public T get(T obj);
}
public static class LRUInternalizer<T> implements Internalizer<T> {
	private final LRUCache<T, T> cache;
	public LRUInternalizer(int size) {
		cache = new LRUCache<T, T>(size) {
			private static final long serialVersionUID = 1L;
			@Override
			protected T retrieve(T key) {
				return key;
			}
		};
	}
	@Override
	public T get(T obj) {
		return cache.get(obj);
	}
}
public class PermGenInternalizer implements Internalizer<String> {
	@Override
	public String get(String obj) {
		return obj.intern();
	}
}

I use that often when I read fields from streams or from ResultSets. Note: LRUCache is a simple cache based on LinkedHashMap<K,V>. It automatically calls the user-supplied retrieve() method for all cache misses.
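The LRUCache class itself is not shown in this answer. Going only by the description above (based on LinkedHashMap, calls retrieve() on a miss), a minimal sketch could look as follows. Every detail here is an assumption, including the constructor argument being the maximum number of entries. The sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

abstract class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxSize; // assumed meaning of the constructor argument

    protected LRUCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
        this.maxSize = maxSize;
    }

    /** User-supplied loader, called for every cache miss. */
    protected abstract V retrieve(K key);

    @Override
    @SuppressWarnings("unchecked")
    public V get(Object key) {
        V value = super.get(key);
        if (value == null) {
            // Miss: load the value and remember it. Callers are expected to pass keys of type K.
            value = retrieve((K) key);
            put((K) key, value);
        }
        return value;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // evict the least recently used entry once over capacity
    }
}
```

With retrieve() returning the key itself, as in LRUInternalizer, the first instance seen for a given content becomes the canonical one until it is evicted.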

The way to use this is to create one LRUInternalizer before your read (or reads), use it to internalize Strings and other small immutable objects, then free it. For example:

Internalizer<String> internalizer = new LRUInternalizer<>(2048);
// ... get some object "input" that streams fields
for (String s : input.nextField()) {
    s = internalizer.get(s);
    // store s...
}

Solution 18 - Java

I am using it in order to cache the contents of approximately 36000 codes which link to associated names. I intern the strings in the cache because many of the codes point to the same string.

By interning the strings in my cache, I am ensuring that codes that point to the same string actually point to the same memory, thereby saving me RAM space.

If the interned strings were actually garbage collected, it would not work for me at all. This would basically negate the purpose of interning. Mine won't be garbage collected because I am holding a reference to each and every string in the cache.

Solution 19 - Java

The cost of interning a string is much more than the time saved in a single stringA.equals(B) comparison. Only use it (for performance reasons) when you are repeatedly using the same unchanged string variables. For example if you regularly iterate over a stable list of strings to update some maps keyed on the same string field you can get a nice saving.

I would suggest using string interning to tweak performance when you are optimising specific parts of your code.

Also remember that Strings are immutable, and don't make the silly mistake of

String a = SOME_RANDOM_VALUE;
a.intern();

Remember to do

String a = SOME_RANDOM_VALUE.intern();

Solution 20 - Java

If you are looking for an unlimited replacement for String.intern() that is also garbage collected, the following works well for me.

private static WeakHashMap<String, WeakReference<String>> internStrings = new WeakHashMap<>();
public static String internalize(String k) {
	synchronized (internStrings) {
		WeakReference<String> weakReference = internStrings.get(k);
		String v = weakReference != null ? weakReference.get() : null;
		if (v == null) {
			v = k;
			internStrings.put(v, new WeakReference<String>(v));
		}
		return v;
	}
}

Of course, if you can roughly estimate how many different strings there will be, then simply use String.intern() with -XX:StringTableSize=highEnoughValue.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Daniel Rikowski | View Question on Stackoverflow
Solution 1 - Java | Daniel Brückner | View Answer on Stackoverflow
Solution 2 - Java | dfa | View Answer on Stackoverflow
Solution 3 - Java | Gili Nachum | View Answer on Stackoverflow
Solution 4 - Java | mik1 | View Answer on Stackoverflow
Solution 5 - Java | aleroot | View Answer on Stackoverflow
Solution 6 - Java | Pete Kirkham | View Answer on Stackoverflow
Solution 7 - Java | objects | View Answer on Stackoverflow
Solution 8 - Java | xli | View Answer on Stackoverflow
Solution 9 - Java | Ciro Santilli Путлер Капут 六四事 | View Answer on Stackoverflow
Solution 10 - Java | Mikko Maunu | View Answer on Stackoverflow
Solution 11 - Java | eremmel | View Answer on Stackoverflow
Solution 12 - Java | Archimedes Trajano | View Answer on Stackoverflow
Solution 13 - Java | wm_eddie | View Answer on Stackoverflow
Solution 14 - Java | Robert Carnegie | View Answer on Stackoverflow
Solution 15 - Java | asaf | View Answer on Stackoverflow
Solution 16 - Java | Paul Taylor | View Answer on Stackoverflow
Solution 17 - Java | Pierre D | View Answer on Stackoverflow
Solution 18 - Java | Rodney P. Barbati | View Answer on Stackoverflow
Solution 19 - Java | grumblebee | View Answer on Stackoverflow
Solution 20 - Java | bdruemen | View Answer on Stackoverflow