How to replace multiple white spaces with one white space

C#, String, Whitespace

C# Problem Overview


Let's say I have a string such as:

"Hello     how are   you           doing?"

I would like a function that turns multiple spaces into one space.

So I would get:

"Hello how are you doing?"

I know I could use regex or call

string s = "Hello     how are   you           doing?".Replace("  ", " ");

But I would have to call it multiple times to make sure all sequential whitespaces are replaced with only one.

Is there already a built-in method for this?

C# Solutions


Solution 1 - C#

string cleanedString = System.Text.RegularExpressions.Regex.Replace(dirtyString,@"\s+"," ");

Solution 2 - C#

This question isn't as simple as other posters have made it out to be (and as I originally believed it to be), because the question isn't quite as precise as it needs to be.

There's a difference between "space" and "whitespace". If you only mean spaces, then you should use a regex of " {2,}". If you mean any whitespace, that's a different matter. Should all whitespace be converted to spaces? What should happen to space at the start and end?
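To make the distinction concrete, here is a minimal sketch comparing the two patterns on the same input. The class and method names are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceVersusWhitespace
{
    // Collapses runs of two or more U+0020 spaces; tabs and newlines are left alone.
    public static string CollapseSpaces(string input) =>
        Regex.Replace(input, " {2,}", " ");

    // Replaces every run of whitespace (spaces, tabs, newlines, ...) with one space.
    public static string CollapseWhitespace(string input) =>
        Regex.Replace(input, @"\s+", " ");

    public static void Main()
    {
        string sample = " a  b\t\tc\nd ";
        // Brackets make the leading and trailing characters visible.
        Console.WriteLine("[" + CollapseSpaces(sample) + "]");
        Console.WriteLine("[" + CollapseWhitespace(sample) + "]");
    }
}
```

With the spaces-only pattern the tabs and the newline survive untouched; with \s+ they are all turned into single spaces. Neither version trims the single space at the start or end.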

For the benchmark below, I've assumed that you only care about spaces, and you don't want to do anything to single spaces, even at the start and end.

Note that correctness is almost always more important than performance. The Split/Join solution removes any leading/trailing whitespace (even just single spaces), which is incorrect with respect to your specified requirements (which may be incomplete, of course).
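The trimming side effect is easy to see on a small input. The following is a sketch only; TrimSideEffect and its method names are hypothetical, and the two methods mirror the spaces-only approaches discussed here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimSideEffect
{
    // Split on spaces, drop the empty pieces, and re-join with single spaces.
    public static string WithSplitAndJoin(string input) =>
        string.Join(" ", input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

    // Only touch runs of two or more spaces.
    public static string WithRegex(string input) =>
        Regex.Replace(input, " {2,}", " ");

    public static void Main()
    {
        string sample = " Hello   world ";
        Console.WriteLine("[" + WithSplitAndJoin(sample) + "]"); // leading/trailing space is gone
        Console.WriteLine("[" + WithRegex(sample) + "]");        // leading/trailing space is kept
    }
}
```

Which behaviour is right depends on the requirement; the point is only that the two approaches are not interchangeable.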

The benchmark uses MiniBench.

using System;
using System.Text.RegularExpressions;
using MiniBench;

internal class Program
{
    public static void Main(string[] args)
    {

        int size = int.Parse(args[0]);
        int gapBetweenExtraSpaces = int.Parse(args[1]);
        
        char[] chars = new char[size];
        for (int i=0; i < size/2; i += 2)
        {
            // Make sure there actually *is* something to do
            chars[i*2] = (i % gapBetweenExtraSpaces == 1) ? ' ' : 'x';
            chars[i*2 + 1] = ' ';
        }
        // Just to make sure we don't have a \0 at the end
        // for odd sizes
        chars[chars.Length-1] = 'y';
        
        string bigString = new string(chars);
        // Assume that one form works :)
        string normalized = NormalizeWithSplitAndJoin(bigString);

        
        var suite = new TestSuite<string, string>("Normalize")
            .Plus(NormalizeWithSplitAndJoin)
            .Plus(NormalizeWithRegex)
            .RunTests(bigString, normalized);
        
        suite.Display(ResultColumns.All, suite.FindBest());
    }

    private static readonly Regex MultipleSpaces = 
        new Regex(@" {2,}", RegexOptions.Compiled);
    
    static string NormalizeWithRegex(string input)
    {
        return MultipleSpaces.Replace(input, " ");
    }
    
    // Guessing as the post doesn't specify what to use
    private static readonly char[] Whitespace =
        new char[] { ' ' };
    
    static string NormalizeWithSplitAndJoin(string input)
    {
        string[] split = input.Split
            (Whitespace, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", split);
    }
}

A few test runs:

c:\Users\Jon\Test>test 1000 50
============ Normalize ============
NormalizeWithSplitAndJoin  1159091 0:30.258 22.93
NormalizeWithRegex        26378882 0:30.025  1.00

c:\Users\Jon\Test>test 1000 5
============ Normalize ============
NormalizeWithSplitAndJoin  947540 0:30.013 1.07
NormalizeWithRegex        1003862 0:29.610 1.00


c:\Users\Jon\Test>test 1000 1001
============ Normalize ============
NormalizeWithSplitAndJoin  1156299 0:29.898 21.99
NormalizeWithRegex        23243802 0:27.335  1.00

Here the first number is the number of iterations, the second is the time taken, and the third is a scaled score with 1.0 being the best.

That shows that in at least some cases (including this one) a regular expression can outperform the Split/Join solution, sometimes by a very significant margin.

However, if you change to an "all whitespace" requirement, then Split/Join does appear to win. As is so often the case, the devil is in the detail...
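For reference, an "all whitespace" Split/Join variant might look like the sketch below. It relies on the documented behaviour that String.Split treats white-space characters as delimiters when the separator array is null; the class and method names are hypothetical, and this is not necessarily the exact variant that was measured.

```csharp
using System;

public static class AllWhitespaceSplit
{
    public static string NormalizeAllWhitespace(string input)
    {
        // A null separator array makes Split use white-space characters as delimiters.
        // The cast picks the char[] overload unambiguously.
        string[] parts = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        Console.WriteLine("[" + NormalizeAllWhitespace("  a \t\n b  ") + "]");
    }
}
```

As with the spaces-only version, this also strips leading and trailing whitespace.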

Solution 3 - C#

A regular expression would be the easiest way. If you write the regex correctly, you won't need multiple calls.

Change it to this:

s = System.Text.RegularExpressions.Regex.Replace(s, @"\s{2,}", " ");

Solution 4 - C#

While the existing answers are fine, I'd like to point out one approach which doesn't work:

public static string DontUseThisToCollapseSpaces(string text)
{
    while (text.IndexOf("  ") != -1)
    {
        text = text.Replace("  ", " ");
    }
    return text;
}

This can loop forever. Anyone care to guess why? (I only came across this when it was asked as a newsgroup question a few years ago... someone actually ran into it as a problem.)
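The answer leaves the cause as a puzzle. One commonly cited explanation, stated here as an assumption rather than something the answer confirms, is that IndexOf(string) performs a culture-sensitive search while Replace(string, string) performs an ordinal one. The two can then disagree when the text contains characters that the culture-sensitive comparison ignores, for instance a zero-width non-joiner (U+200C) sitting between two spaces: IndexOf keeps reporting a match that Replace never removes. A minimal sketch that makes both calls ordinal follows; CollapseSpacesOrdinal is a hypothetical name.

```csharp
using System;

public static class OrdinalCollapse
{
    public static string CollapseSpacesOrdinal(string text)
    {
        // Ordinal search compares raw char values, which is also what Replace does,
        // so the loop condition and the replacement always agree.
        while (text.IndexOf("  ", StringComparison.Ordinal) != -1)
        {
            text = text.Replace("  ", " ");
        }
        return text;
    }

    public static void Main()
    {
        Console.WriteLine(CollapseSpacesOrdinal("Hello     how are   you"));
    }
}
```

Even with that change, the loop makes several passes over the string, so a single regex or a hand-written scan is usually the better choice.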

Solution 5 - C#

Here is the solution I work with, without Regex or String.Split.

public static string TrimWhiteSpace(this string Value)
{
    StringBuilder sbOut = new StringBuilder();
    if (!string.IsNullOrEmpty(Value))
    {
        bool IsWhiteSpace = false;
        for (int i = 0; i < Value.Length; i++)
        {
            if (char.IsWhiteSpace(Value[i])) //Comparison with WhiteSpace
            {
                if (!IsWhiteSpace) //Comparison with previous Char
                {
                    sbOut.Append(Value[i]);
                    IsWhiteSpace = true;
                }
            }
            else
            {
                IsWhiteSpace = false;
                sbOut.Append(Value[i]);
            }
        }
    }
    return sbOut.ToString();
}

So you can write:

string cleanedString = dirtyString.TrimWhiteSpace();

Solution 6 - C#

A fast extra whitespace remover by Felipe Machado. (Modified by RW for multi-space removal)

static string DuplicateWhiteSpaceRemover(string str)
{
    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;
    bool lastWasWS = false; //Added line
    for (int i = 0; i < len; i++)
    {
        var ch = src[i];
        switch (ch)
        {
            case '\u0020': //SPACE
            case '\u00A0': //NO-BREAK SPACE
            case '\u1680': //OGHAM SPACE MARK
            case '\u2000': //EN QUAD
            case '\u2001': //EM QUAD
            case '\u2002': //EN SPACE
            case '\u2003': //EM SPACE
            case '\u2004': //THREE-PER-EM SPACE
            case '\u2005': //FOUR-PER-EM SPACE
            case '\u2006': //SIX-PER-EM SPACE
            case '\u2007': //FIGURE SPACE
            case '\u2008': //PUNCTUATION SPACE
            case '\u2009': //THIN SPACE
            case '\u200A': //HAIR SPACE
            case '\u202F': //NARROW NO-BREAK SPACE
            case '\u205F': //MEDIUM MATHEMATICAL SPACE
            case '\u3000': //IDEOGRAPHIC SPACE
            case '\u2028': //LINE SEPARATOR
            case '\u2029': //PARAGRAPH SEPARATOR
            case '\u0009': //[ASCII Tab]
            case '\u000A': //[ASCII Line Feed]
            case '\u000B': //[ASCII Vertical Tab]
            case '\u000C': //[ASCII Form Feed]
            case '\u000D': //[ASCII Carriage Return]
            case '\u0085': //NEXT LINE
                if (lastWasWS == false) //Added line
                {
                    src[dstIdx++] = ' '; // Updated by Ryan
                    lastWasWS = true; //Added line
                }
                continue;
            default:
                lastWasWS = false; //Added line 
                src[dstIdx++] = ch;
                break;
        }
    }
    return new string(src, 0, dstIdx);
}

The benchmarks...

|                           | Time  |   TEST 1    |   TEST 2    |   TEST 3    |   TEST 4    |   TEST 5    |
| Function Name             |(ticks)| dup. spaces | spaces+tabs | spaces+CR/LF| " " -> " "  | " " -> " "  |
|---------------------------|-------|-------------|-------------|-------------|-------------|-------------|
| SwitchStmtBuildSpaceOnly  |   5.2 |    PASS     |    FAIL     |    FAIL     |    PASS     |    PASS     |
| InPlaceCharArraySpaceOnly |   5.6 |    PASS     |    FAIL     |    FAIL     |    PASS     |    PASS     |
| DuplicateWhiteSpaceRemover|   7.0 |    PASS     |    PASS     |    PASS     |    PASS     |    PASS     |
| SingleSpacedTrim          |  11.8 |    PASS     |    PASS     |    PASS     |    FAIL     |    FAIL     |
| Fubo(StringBuilder)       |    13 |    PASS     |    FAIL     |    FAIL     |    PASS     |    PASS     |
| User214147                |    19 |    PASS     |    PASS     |    PASS     |    FAIL     |    FAIL     | 
| RegExWithCompile          |    28 |    PASS     |    FAIL     |    FAIL     |    PASS     |    PASS     |
| SwitchStmtBuild           |    34 |    PASS     |    FAIL     |    FAIL     |    PASS     |    PASS     |
| SplitAndJoinOnSpace       |    55 |    PASS     |    FAIL     |    FAIL     |    FAIL     |    FAIL     |
| RegExNoCompile            |   120 |    PASS     |    PASS     |    PASS     |    PASS     |    PASS     |
| RegExBrandon              |   137 |    PASS     |    FAIL     |    PASS     |    PASS     |    PASS     |

Benchmark notes: Release mode, no debugger attached, i7 processor, average of 4 runs, only short strings tested.

SwitchStmtBuildSpaceOnly by Felipe Machado 2015 and modified by Sunsetquest

InPlaceCharArraySpaceOnly by Felipe Machado 2015 and modified by Sunsetquest

SwitchStmtBuild by Felipe Machado 2015 and modified by Sunsetquest

SwitchStmtBuild2 by Felipe Machado 2015 and modified by Sunsetquest

SingleSpacedTrim by David S 2013

Fubo(StringBuilder) by fubo 2014

SplitAndJoinOnSpace by Jon Skeet 2009

RegExWithCompile by Jon Skeet 2009

User214147 by user214147

RegExBrandon by Brandon

RegExNoCompile by Tim Hoolihan

Benchmark code is on GitHub.

Solution 7 - C#

As already pointed out, this is easily done with a regular expression. I'll just add that you might want to append a .Trim() call to get rid of leading/trailing whitespace.
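A minimal sketch of that combination, assuming the "any whitespace" pattern \s+ from the earlier answers (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CollapseThenTrim
{
    // Collapse every whitespace run to one space, then strip both ends.
    public static string CollapseAndTrim(string input) =>
        Regex.Replace(input, @"\s+", " ").Trim();

    public static void Main()
    {
        Console.WriteLine("[" + CollapseAndTrim("  Hello \t world  ") + "]");
    }
}
```

Trimming after the replacement means at most one space has to be removed from each end.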

Solution 8 - C#

I'm sharing what I use, because it appears I've come up with something different. I've been using this for a while and it is fast enough for me, though I'm not sure how it stacks up against the others. I use it in a delimited file writer and run large DataTables through it one field at a time.

    public static string NormalizeWhiteSpace(string S)
    {
        string s = S.Trim();
        bool iswhite = false;
        int sLength = s.Length;
        StringBuilder sb = new StringBuilder(sLength);
        foreach (char c in s)
        {
            if(Char.IsWhiteSpace(c))
            {
                if (iswhite)
                {
                    //Continuing whitespace ignore it.
                    continue;
                }
                else
                {
                    //New WhiteSpace
                   
                    //Replace whitespace with a single space.
                    sb.Append(" ");
                    //Set iswhite to True and any following whitespace will be ignored
                    iswhite = true;
                }  
            }
            else
            {
                sb.Append(c);
                //reset iswhite to false
                iswhite = false;
            }
        }
        return sb.ToString();
    }

Solution 9 - C#

VB.NET

String.Join(" ", Linha.Split(" "c).Where(Function(x) x <> ""))

C#

string.Join(" ", Linha.Split(' ').Where(x => x != ""));

Enjoy the power of LINQ =D

Solution 10 - C#

Using the test program that Jon Skeet posted, I tried to see if I could get a hand written loop to run faster.
I can beat NormalizeWithSplitAndJoin every time, but only beat NormalizeWithRegex with inputs of 1000, 5.

static string NormalizeWithLoop(string input)
{
    StringBuilder output = new StringBuilder(input.Length);

    char lastChar = '*';  // anything other than a space
    for (int i = 0; i < input.Length; i++)
    {
        char thisChar = input[i];
        if (!(lastChar == ' ' && thisChar == ' '))
            output.Append(thisChar);

        lastChar = thisChar;
    }

    return output.ToString();
}

I have not looked at the machine code the JIT compiler produces. However, I expect the bottleneck is the time taken by the calls to StringBuilder.Append(), and doing much better would require unsafe code.

So Regex.Replace() is very fast and hard to beat!!

Solution 11 - C#

// \s+ matches runs of whitespace; \W+ would also replace punctuation such as "?"
Regex regex = new Regex(@"\s+");
string outputString = regex.Replace(inputString, " ");

Solution 12 - C#

Smallest solution (note that this is JavaScript, not C#):

var regExp = /\s+/g,
    newString = oldString.replace(regExp, ' ');

Solution 13 - C#

You can try this:

    /// <summary>
    /// Remove all extra spaces and tabs between words in the specified string!
    /// </summary>
    /// <param name="str">The specified string.</param>
    public static string RemoveExtraSpaces(string str)
    {
        str = str.Trim();
        StringBuilder sb = new StringBuilder();
        bool space = false;
        foreach (char c in str)
        {
            // char.IsWhiteSpace already covers tabs, so no separate check is needed
            if (char.IsWhiteSpace(c)) { space = true; }
            else { if (space) { sb.Append(' '); } sb.Append(c); space = false; }
        }
        return sb.ToString();
    }

Solution 14 - C#

Replacement groups provide a simpler approach to replacing a run of repeated whitespace characters with a single instance of the same character:

    public static void WhiteSpaceReduce()
    {
        string t1 = "a b   c d";
        string t2 = "a b\n\nc\nd";

        Regex whiteReduce = new Regex(@"(?<firstWS>\s)(?<repeatedWS>\k<firstWS>+)");
        Console.WriteLine("{0}", t1);
        //Console.WriteLine("{0}", whiteReduce.Replace(t1, x => x.Value.Substring(0, 1))); 
        Console.WriteLine("{0}", whiteReduce.Replace(t1, @"${firstWS}"));
        Console.WriteLine("\nNext example ---------");
        Console.WriteLine("{0}", t2);
        Console.WriteLine("{0}", whiteReduce.Replace(t2, @"${firstWS}"));
        Console.WriteLine();
    }

Note that the second example keeps a single \n, whereas the accepted answer would replace the end of line with a space.

If you need to replace any combination of whitespace characters with the first one, replace the back-reference \k<firstWS> with \s in the pattern.
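The two variants can be sketched side by side as follows. This is an illustration under the reading that "any combination" means swapping the back-reference for \s; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhiteSpaceReduceVariants
{
    // Matches a whitespace character followed by one or more copies of the same character.
    private static readonly Regex SameChar = new Regex(@"(?<firstWS>\s)\k<firstWS>+");

    // Matches a whitespace character followed by one or more whitespace characters of any kind.
    private static readonly Regex AnyRun = new Regex(@"(?<firstWS>\s)\s+");

    // Both keep only the first character of each matched run.
    public static string ReduceSame(string input) => SameChar.Replace(input, "${firstWS}");
    public static string ReduceAny(string input) => AnyRun.Replace(input, "${firstWS}");

    public static void Main()
    {
        string mixed = "a \n b";
        Console.WriteLine(ReduceSame(mixed) == mixed); // no character repeats, so nothing changes
        Console.WriteLine(ReduceAny(mixed) == "a b");  // the whole run collapses to its first char
    }
}
```

In both variants a lone whitespace character is never matched, so single spaces, tabs, and newlines are left exactly as they were.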

Solution 15 - C#

string.Join(" ", s.Split(' ').Where(r => r != ""));

Solution 16 - C#

There is no built-in way to do this. You can try this:

private static readonly char[] whitespace = new char[] { ' ', '\n', '\t', '\r', '\f', '\v' };
public static string Normalize(string source)
{
   return String.Join(" ", source.Split(whitespace, StringSplitOptions.RemoveEmptyEntries));
}

This will remove leading and trailing whitespace as well as collapse any internal whitespace to a single space character. If you really only want to collapse spaces, then the solutions using a regular expression are better; otherwise this solution is better. (See the analysis done by Jon Skeet.)

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type     | Original Author   |
|------------------|-------------------|
| Question         | Matt              |
| Solution 1 - C#  | Tim Hoolihan      |
| Solution 2 - C#  | Jon Skeet         |
| Solution 3 - C#  | Brandon           |
| Solution 4 - C#  | Jon Skeet         |
| Solution 5 - C#  | fubo              |
| Solution 6 - C#  | SunsetQuest       |
| Solution 7 - C#  | MAK               |
| Solution 8 - C#  | user214147        |
| Solution 9 - C#  | Patryk Moura      |
| Solution 10 - C# | Ian Ringrose      |
| Solution 11 - C# | Michael D.        |
| Solution 12 - C# | sycoted           |
| Solution 13 - C# | LL99              |
| Solution 14 - C# | Dan               |
| Solution 15 - C# | Guillermo Gimenez |
| Solution 16 - C# | Scott Dorman      |