How is the StringBuilder class implemented? Does it internally create new string objects each time we append?

C#, .NET, String, StringBuilder

C# Problem Overview


How is the StringBuilder class implemented? Does it internally create new string objects each time we append?

C# Solutions


Solution 1 - C#

In .NET 2.0 it uses the String class internally. String is only immutable from the outside: it has internal methods that mutate a string in place, and StringBuilder lives in the same assembly (mscorlib), so it can call them.

In .NET 4.0 StringBuilder was changed to use a char[].

In 2.0 StringBuilder looked like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal IntPtr m_currentThread;
    internal int m_MaxCapacity;
    internal volatile string m_StringValue; // HERE ----------------------
    private const string MaxCapacityField = "m_MaxCapacity";
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";

But in 4.0 it looks like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal char[] m_ChunkChars; // HERE --------------------------------
    internal int m_ChunkLength;
    internal int m_ChunkOffset;
    internal StringBuilder m_ChunkPrevious;
    internal int m_MaxCapacity;
    private const string MaxCapacityField = "m_MaxCapacity";
    internal const int MaxChunkSize = 0x1f40;
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";

So evidently it was changed from using a string to using a char[].

EDIT: Updated answer to reflect changes in .NET 4 (that I only just discovered).

Solution 2 - C#

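The accepted answer misses the mark by a mile. The significant change to StringBuilder in 4.0 is not the change from an unsafe string to char[] - it's the fact that StringBuilder is now actually a linked list of StringBuilder instances. Each instance holds one chunk of characters and a reference to the previous chunk (the m_ChunkChars and m_ChunkPrevious fields).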


The reason for this change should be obvious: now there is never a need to reallocate the buffer (an expensive operation, since, along with allocating more memory, you also have to copy all the contents from the old buffer to the new one).

This means calling ToString() is now slightly slower, since the final string needs to be computed, but doing a large number of Append() operations is now significantly faster. This fits in with the typical use-case for StringBuilder: a lot of calls to Append(), followed by a single call to ToString().


You can find benchmarks here. The conclusion? The new linked-list StringBuilder uses marginally more memory, but is significantly faster for the typical use-case.
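The chunk idea described above can be sketched as follows. This is a hypothetical illustration, not the framework's code: the class name ChunkedBuilder, its field names and the small ChunkSize of 16 are all assumptions chosen to make the chunking easy to see.

```csharp
using System;

// Hypothetical sketch of a chunked builder; NOT the actual .NET implementation.
public sealed class ChunkedBuilder
{
    private const int ChunkSize = 16;   // assumed small size so chunking is visible

    private char[] _chars;              // characters of the current (last) chunk
    private int _used;                  // characters used in the current chunk
    private int _offset;                // total characters held by all previous chunks
    private ChunkedBuilder _previous;   // preceding chunk, or null for the first one

    public ChunkedBuilder()
    {
        _chars = new char[ChunkSize];
    }

    // Creates a node that takes over the state of a full chunk.
    private ChunkedBuilder(ChunkedBuilder source)
    {
        _chars = source._chars;
        _used = source._used;
        _offset = source._offset;
        _previous = source._previous;
    }

    public int Length { get { return _offset + _used; } }

    public ChunkedBuilder Append(string value)
    {
        if (value == null) return this;

        int index = 0;
        while (index < value.Length)
        {
            if (_used == _chars.Length) StartNewChunk();

            // Copy as much as fits into the current chunk.
            int count = Math.Min(_chars.Length - _used, value.Length - index);
            value.CopyTo(index, _chars, _used, count);
            _used += count;
            index += count;
        }
        return this;
    }

    // Links the full chunk into the chain; existing characters are not copied.
    private void StartNewChunk()
    {
        _previous = new ChunkedBuilder(this);
        _offset += _used;
        _chars = new char[ChunkSize];
        _used = 0;
    }

    // Walks the chain backwards and copies every chunk to its final position.
    public override string ToString()
    {
        var result = new char[Length];
        for (ChunkedBuilder chunk = this; chunk != null; chunk = chunk._previous)
        {
            Array.Copy(chunk._chars, 0, result, chunk._offset, chunk._used);
        }
        return new string(result);
    }
}
```

Note the trade-off the answer describes: Append never copies characters that are already stored, while ToString has to visit every chunk to assemble the final string.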

Solution 3 - C#

Not really - it uses an internal character buffer. Only when the buffer capacity is exhausted does it allocate a new buffer. An Append operation simply adds to this buffer; a string object is created when the ToString() method is called on it. Hence it is advisable for many string concatenations, as each traditional string concatenation would create a new string. You can also specify an initial capacity for the StringBuilder, if you have a rough idea of the final size, to avoid multiple allocations.
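As a small illustration of the initial-capacity advice, here is a sketch that sizes the builder up front using the StringBuilder(int capacity) constructor. The CapacityDemo class and the JoinWithCommas helper are made-up names for this example.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Hypothetical helper: joins the parts with commas using a pre-sized builder.
    public static string JoinWithCommas(string[] parts)
    {
        // Upper bound on the result length: every part plus one separator each.
        int capacity = 0;
        foreach (string part in parts)
        {
            capacity += part.Length + 1;
        }

        // Requesting the capacity up front means the builder should not need to grow.
        var sb = new StringBuilder(capacity);
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(parts[i]);
        }

        // The result string is materialized here.
        return sb.ToString();
    }
}
```

For example, CapacityDemo.JoinWithCommas(new[] { "a", "bb", "ccc" }) requests a capacity of 9 characters and returns "a,bb,ccc".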

Edit: People are pointing out that my understanding is wrong. Please ignore the answer (I'd rather not delete it - it will stand as proof of my ignorance :-)

Solution 4 - C#

I have made a small sample to demonstrate how StringBuilder works in .NET 4. The contract is:

public interface ISimpleStringBuilder
{
    ISimpleStringBuilder Append(string value);
    ISimpleStringBuilder Clear();
    int Length { get; }
    int Capacity { get; }
}

And this is a very basic implementation:

public class SimpleStringBuilder : ISimpleStringBuilder
{
    public const int DefaultCapacity = 32;

    private char[] _internalBuffer;

    public int Length { get; private set; }
    public int Capacity { get; private set; }

    public SimpleStringBuilder(int capacity)
    {
        Capacity = capacity;
        _internalBuffer = new char[capacity];
        Length = 0;
    }

    public SimpleStringBuilder() : this(DefaultCapacity) { }

    public ISimpleStringBuilder Append(string value)
    {
        // check if space is available for the additional data
        InternalEnsureCapacity(value.Length);

        // copy the characters directly into the free part of the buffer
        value.CopyTo(0, _internalBuffer, Length, value.Length);
        Length += value.Length;

        return this;
    }

    public ISimpleStringBuilder Clear()
    {
        _internalBuffer = new char[Capacity];
        Length = 0;
        return this;
    }

    public override string ToString()
    {
        // use only the first Length characters; the rest of the buffer is unused ('\0')
        return new string(_internalBuffer, 0, Length);
    }

    private void InternalExpandBuffer()
    {
        // double capacity by default
        Capacity *= 2;

        // copy the used part of the old buffer to the new array
        var tmpBuffer = new char[Capacity];
        System.Array.Copy(_internalBuffer, tmpBuffer, Length);
        _internalBuffer = tmpBuffer;
    }

    private void InternalEnsureCapacity(int additionalLengthRequired)
    {
        while (Length + additionalLengthRequired > Capacity)
        {
            // not enough space in the current buffer: double capacity
            InternalExpandBuffer();
        }
    }
}

This code is not thread-safe, performs no input validation and does not use the internal (unsafe) magic of System.String. It does, however, demonstrate the idea behind the StringBuilder class.

Some unit tests and the full sample code can be found on GitHub.

Solution 5 - C#

If I look at .NET 2 in .NET Reflector, I find this:

public StringBuilder Append(string value)
{
    if (value != null)
    {
        string stringValue = this.m_StringValue;
        IntPtr currentThread = Thread.InternalGetCurrentThread();
        if (this.m_currentThread != currentThread)
        {
            stringValue = string.GetStringForStringBuilder(stringValue, stringValue.Capacity);
        }
        int length = stringValue.Length;
        int requiredLength = length + value.Length;
        if (this.NeedsAllocation(stringValue, requiredLength))
        {
            string newString = this.GetNewString(stringValue, requiredLength);
            newString.AppendInPlace(value, length);
            this.ReplaceString(currentThread, newString);
        }
        else
        {
            stringValue.AppendInPlace(value, length);
            this.ReplaceString(currentThread, stringValue);
        }
    }
    return this;
}

So it is a mutated string instance...

EDIT: Except in .NET 4, where it is a char[].

Solution 6 - C#

If you want to see one possible implementation (similar to the one shipped with the Microsoft implementation up to v3.5), you can look at the source of the Mono one on GitHub.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Ram | View Question on Stackoverflow
Solution 1 - C# | Brian Rasmussen | View Answer on Stackoverflow
Solution 2 - C# | BlueRaja - Danny Pflughoeft | View Answer on Stackoverflow
Solution 3 - C# | VinayC | View Answer on Stackoverflow
Solution 4 - C# | oleksii | View Answer on Stackoverflow
Solution 5 - C# | Yves M. | View Answer on Stackoverflow
Solution 6 - C# | Julien Roncaglia | View Answer on Stackoverflow