String interning in .NET Framework - What are the benefits and when to use interning

Tags: C#, .NET, String, Performance, String Interning

C# Problem Overview


I want to understand the process and internals of string interning in the .NET Framework. I would also like to know the benefits of interning and the scenarios where string interning should be used to improve performance. I have studied interning in Jeffrey Richter's CLR book, but I am still confused and would like to understand it in more detail.

[Edit] To ask a more specific question, here is some sample code:

private void MethodA()
{
    string s = "String"; // line 1 - interned literal as explained in the answer

    //s = string.Intern(s); // line 2 - what would happen in line 3 if we uncomment this line, will it make any difference?
}

private bool MethodB(string compareThis)
{
    if (compareThis == "String") // line 3 - will this line use interning (with and without uncommenting line 2 above)?
    {
        return true;
    }
    return false;
}

C# Solutions


Solution 1 - C#

In general, interning is something that just happens, automatically, when you use literal string values. Interning provides the benefit of only having one copy of the literal in memory, no matter how often it's used.

That said, there is rarely a reason to intern strings that you generate at runtime, or even to think about string interning in normal development.

There are potentially some benefits if you're going to be doing a lot of work with comparisons of potentially identical runtime generated strings (as interning can speed up comparisons via ReferenceEquals). However, this is a highly specialized usage, and would require a fair amount of profiling and testing, and wouldn't be an optimization I'd consider unless there was a measured problem in place.
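
To make the difference between value equality and reference equality concrete, here is a minimal sketch. The helper BuildKey and the "customer-" prefix are made up for illustration; the only library calls used are string.Intern and Object.ReferenceEquals.

```csharp
using System;

public static class InternBasics
{
    // Hypothetical helper: the text is built at run time, so each call
    // returns a new string object rather than a compile-time literal.
    public static string BuildKey(int id) => "customer-" + id;

    public static void Main()
    {
        string a = BuildKey(42);
        string b = BuildKey(42);

        // Same characters, so value equality holds...
        Console.WriteLine(a == b);                  // True
        // ...but they are two separate objects on the heap.
        Console.WriteLine(ReferenceEquals(a, b));   // False

        // string.Intern returns the pool's single instance for that text.
        string ia = string.Intern(a);
        string ib = string.Intern(b);
        Console.WriteLine(ReferenceEquals(ia, ib)); // True
    }
}
```

Only after both strings have gone through string.Intern is a ReferenceEquals comparison safe, which is part of why this is a specialized optimization.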

Solution 2 - C#

Interning is an internal implementation detail. Unlike boxing, I do not think there is any benefit in knowing more than what you have read in Richter's book.

The micro-optimisation benefits of interning strings manually are minimal, so it is generally not recommended.

This probably describes it:

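using System;

class Program
{
	const string SomeString = "Some String"; // compile-time literal: gets interned

	static void Main(string[] args)
	{
		var s1 = SomeString; // reference to the interned string
		var s2 = SomeString; // same reference as s1
		var s = "String";
		var s3 = "Some " + s; // concatenated at run time: a new object, not interned

		// == compares by value; it returns early when both references are identical
		Console.WriteLine(s1 == s2); // True - same reference, so no character comparison is needed
		Console.WriteLine(s1 == s3); // True - different references, so the characters are compared
	}
}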

Solution 3 - C#

This is an "old" question, but I have a different angle on it.

If you're going to have a lot of long-lived strings from a small pool, interning can improve memory efficiency.

In my case, I was interning another type of object in a static dictionary because they were reused frequently, and this served as a fast cache before persisting them to disk.

Most of the fields in these objects are strings, and the pool of values is fairly small (much smaller than the number of instances, anyway).

If these were transient objects, it wouldn't matter because the string fields would be garbage collected often. But because references to them were being held, their memory usage started to accumulate (even when no new unique values were being added).

So interning the objects reduced the memory usage substantially, and so did interning their string values while they were being interned.
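
A minimal sketch of that idea follows. The Reading type, its field names and the Fresh helper are hypothetical stand-ins, not the original code; the point is only that the constructor passes each string field through string.Intern, so long-lived instances share one string object per distinct value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type; the property names are illustrative only.
public sealed class Reading
{
    public string Sensor { get; }
    public string Unit { get; }
    public double Value { get; }

    public Reading(string sensor, string unit, double value)
    {
        // Canonicalise the repeated text so that every instance with the
        // same sensor or unit shares a single string object.
        Sensor = string.Intern(sensor);
        Unit = string.Intern(unit);
        Value = value;
    }
}

public static class CacheDemo
{
    // Simulates values arriving from outside the program as fresh objects.
    public static string Fresh(string text) => new string(text.ToCharArray());

    public static void Main()
    {
        var cache = new List<Reading>();
        for (int i = 0; i < 3; i++)
        {
            cache.Add(new Reading(Fresh("boiler-1"), Fresh("celsius"), 20.0 + i));
        }

        // All cached instances point at the same string objects.
        Console.WriteLine(ReferenceEquals(cache[0].Sensor, cache[2].Sensor)); // True
        Console.WriteLine(ReferenceEquals(cache[0].Unit, cache[1].Unit));     // True
    }
}
```

This only pays off when the set of distinct values is small compared with the number of instances, as described above.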

Solution 4 - C#

Interned strings have the following characteristics:

  • Two interned strings that are identical will have the same address in memory.
  • Memory occupied by interned strings is unlikely to be freed until the CLR terminates, which in practice usually means until your application exits.
  • Interning a string involves calculating a hash and looking it up in a dictionary which consumes CPU cycles.
  • If multiple threads intern strings at the same time they will block each other because accesses to the dictionary of interned strings are serialized.

The consequences of these characteristics are:

  • You can test two interned strings for equality by just comparing the address pointer which is a lot faster than comparing each character in the string. This is especially true if the strings are very long and start with the same characters. You can compare interned strings with the Object.ReferenceEquals method, but it is safer to use the string == operator: it checks for reference equality first and falls back to comparing the characters, so it still gives the right answer if one of the strings turns out not to be interned.

  • If you use the same string many times in your application, your application will only store one copy of the string in memory reducing the memory required to run your application.

  • If you intern many different strings this will allocate memory for those strings that will never be freed, and your application will consume ever increasing amounts of memory.

  • If you have a very large number of interned strings, string interning can become slow, and threads will block each other when accessing the interned string dictionary.

You should use string interning only if:

  1. The set of strings you are interning is fairly small.
  2. You compare these strings many times for each time that you intern them.
  3. You really care about minute performance optimizations.
  4. You don't have many threads aggressively interning strings.
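
If the worry is growing the pool with strings that will never be freed, string.IsInterned can be used to look a string up without adding it. The sketch below is one possible way to use it; the class name PoolProbe and the helper UsePooledIfPresent are made up for illustration.

```csharp
using System;

public static class PoolProbe
{
    // Returns the pooled instance if the text is already interned, otherwise
    // the original string unchanged. Nothing new is added to the pool.
    public static string UsePooledIfPresent(string s) => string.IsInterned(s) ?? s;

    public static void Main()
    {
        // A GUID-based string is, for practical purposes, not in the pool yet.
        string unique = "key-" + Guid.NewGuid();

        Console.WriteLine(string.IsInterned(unique) == null);                   // True: not pooled
        Console.WriteLine(ReferenceEquals(UsePooledIfPresent(unique), unique)); // True: returned as-is

        string pooled = string.Intern(unique);            // now the text is pooled
        string copy = new string(unique.ToCharArray());   // separate object, same text
        Console.WriteLine(ReferenceEquals(UsePooledIfPresent(copy), pooled));   // True: pooled instance
    }
}
```

The lookup still costs a hash computation, so it addresses the memory concern in the list above rather than the CPU one.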

Solution 5 - C#

String interning affects memory consumption.

For example, if you read strings and keep them in a list for caching, and the exact same string occurs 10 times, it is stored only once in memory if string.Intern is used. Otherwise it is stored 10 times.

In the example below, the string.Intern variant consumes about 44 MB, while the variant without interning (the commented-out line) consumes 1195 MB.

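// Requires: using System; using System.Collections.Generic; using System.Diagnostics;
static void Main(string[] args)
{
    var list = new List<string>();

    for (int i = 0; i < 5 * 1000 * 1000; i++)
    {
        var s = ReadFromDb();
        list.Add(string.Intern(s)); // interned: all entries share one string object
        //list.Add(s);              // not interned: every entry is a separate string object
    }

    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64 / 1024 / 1024 + " MB");
}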

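// Simulates a database read. The "+ 1" forces the concatenation to happen at run time,
// so every call returns a new string object with the same content.
private static string ReadFromDb()
{
    return "abcdefghijklmnopqrstuvyxz0123456789abcdefghijklmnopqrstuvyxz0123456789abcdefghijklmnopqrstuvyxz0123456789" + 1;
}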
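
Memory figures like the ones above vary by machine and runtime. A deterministic way to see the same effect is to count how many distinct string objects the list would keep alive. The sketch below does that under a few assumptions: ReadRow is a shortened stand-in for ReadFromDb, and ByReference and CountObjects are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class DistinctObjects
{
    // Compares strings by object identity instead of by content.
    private sealed class ByReference : IEqualityComparer<string>
    {
        public bool Equals(string x, string y) => ReferenceEquals(x, y);
        public int GetHashCode(string s) => RuntimeHelpers.GetHashCode(s);
    }

    // Stand-in for ReadFromDb: the concatenation runs on every call,
    // so each call returns a new string object with the same text.
    public static string ReadRow(int suffix) => "abcdefghijklmnopqrstuvwxyz" + suffix;

    // Counts how many separate string objects would be kept alive.
    public static int CountObjects(int rows, bool intern)
    {
        var seen = new HashSet<string>(new ByReference());
        for (int i = 0; i < rows; i++)
        {
            string s = ReadRow(1);
            seen.Add(intern ? string.Intern(s) : s);
        }
        return seen.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountObjects(1000, intern: false)); // 1000
        Console.WriteLine(CountObjects(1000, intern: true));  // 1
    }
}
```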

Interning also improves the performance of equality comparisons, because == returns as soon as it sees that both operands are the same object. In the example below, the interned version takes about 1 time unit while the non-interned version takes about 7.

static void Main(string[] args)
{
    var a = string.Intern(ReadFromDb());
    var b = string.Intern(ReadFromDb());
    //var a = ReadFromDb();
    //var b = ReadFromDb();

    int equals = 0;
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 250 * 1000 * 1000; i++)
    {
        if (a == b) equals++;
    }
    stopwatch.Stop();

    Console.WriteLine(stopwatch.Elapsed + ", equals: " + equals);
}

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author   Original Content on Stack Overflow
Question          VSS               View Question on Stack Overflow
Solution 1 - C#   Reed Copsey       View Answer on Stack Overflow
Solution 2 - C#   Aliostad          View Answer on Stack Overflow
Solution 3 - C#   harpo             View Answer on Stack Overflow
Solution 4 - C#   bikeman868        View Answer on Stack Overflow
Solution 5 - C#   J. Andersen       View Answer on Stack Overflow