Split a string that has white spaces, unless they are enclosed within "quotes"?

C#, Split

C# Problem Overview


To make things simple:

string streamR = sr.ReadLine();  // sr.ReadLine() returns:
                                 //     one "two two"

I want to save these as two different strings, splitting on every space EXCEPT the spaces found between quotation marks. Therefore, what I need is:

string 1 = one
string 2 = two two

So far, the following code is what I have found that works, but it also splits on the spaces within the quotes.

// streamR only contains two strings
string[] splitter = streamR.Split(' ');
str1 = splitter[0];
// Only set str2 if the length is > 1
str2 = splitter.Length > 1 ? splitter[1] : string.Empty;

The output of this becomes

one
two

I have looked into https://stackoverflow.com/questions/554013/regular-expression-to-split-on-spaces-unless-in-quotes, but I can't get the regex to work or understand the code, especially how to split the input so that I end up with two different strings. All the code there gives me a compile error (I am using System.Text.RegularExpressions).

C# Solutions


Solution 1 - C#

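// Requires: using System.Linq; using System.Text.RegularExpressions;
string input = "one \"two two\" three \"four four\" five six";
// Match either a quoted run or a run of non-space characters.
// Note: quoted matches keep their surrounding quotation marks.
var parts = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
                .Cast<Match>()
                .Select(m => m.Value)
                .ToList();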
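
The matches above include the quotation marks themselves, so "two two" comes back with its quotes. Here is a minimal sketch of one way to drop them, assuming quotes are never nested or escaped. The names QuoteSplitter and SplitKeepingQuoted are hypothetical, and the pattern is a variation on the one above (it uses capture groups and also accepts an empty quoted string), not the answer's own code:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteSplitter
{
    // Hypothetical helper: splits on spaces, treats "..." as a single token,
    // and returns that token without its surrounding quotation marks.
    public static List<string> SplitKeepingQuoted(string input)
    {
        // Group 1 captures the text inside a quoted run; group 2 captures a bare word.
        return Regex.Matches(input, "\"([^\"]*)\"|([^ ]+)")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value)
                    .ToList();
    }
}
```

Calling QuoteSplitter.SplitKeepingQuoted("one \"two two\"") yields the two strings one and two two, which is the shape the question asks for.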

Solution 2 - C#

You can even do that without Regex: a LINQ expression with String.Split can do the job.

You can first split your string by " and then split only the elements with an even index in the resulting array by a space.

var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

For the string:

This is a test for "Splitting a string" that has white spaces, unless they are "enclosed within quotes"

It gives the result:

This
is
a
test
for
Splitting a string
that
has
white
spaces,
unless
they
are
enclosed within quotes

UPDATE

string myString = "WordOne \"Word Two\"";
var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();

UPDATE 2

How do you define a quoted portion of the string?

We will assume that the string before the first " is non-quoted.

Then, the string placed between the first " and before the second " is quoted. The string between the second " and the third " is non-quoted. The string between the third and the fourth is quoted, ...

The general rule is: Each string between the (2n-1)th (odd number) " and (2n)th (even number) " is quoted. (1)

What is the relation with String.Split?

String.Split with the default StringSplitOptions (defined as StringSplitOptions.None) creates a list containing one string and then adds a new string to the list for each splitting character found.

So, before the first ", the string is at index 0 in the split array; between the first and second ", the string is at index 1; between the second and third, index 2; between the third and fourth, index 3, ...

The general rule is: The string between the nth and (n+1)th " is at index n in the array. (2)

Given (1) and (2), we can conclude that quoted portions are at odd indices in the split array.
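
Rules (1) and (2) can be checked with a small sketch that labels each piece of Split('"') from its index alone. The class and method names (SplitIndexDemo, Describe) are hypothetical, and the sketch assumes the quotation marks in the input are balanced:

```csharp
using System;

public static class SplitIndexDemo
{
    // Hypothetical helper: labels each piece produced by Split('"') as quoted or
    // non-quoted using only the parity of its index, per rules (1) and (2).
    public static string[] Describe(string input)
    {
        // The default options keep empty entries, which keeps the index parity intact.
        string[] pieces = input.Split('"');
        var lines = new string[pieces.Length];
        for (int i = 0; i < pieces.Length; i++)
        {
            string kind = i % 2 == 1 ? "quoted" : "non-quoted";
            lines[i] = $"{i}: [{pieces[i]}] {kind}";
        }
        return lines;
    }
}
```

For the input one "two two" three, the pieces are [one ] at index 0, [two two] at index 1 and [ three] at index 2, so only the odd index holds quoted text. A string that starts with a quotation mark produces an empty piece at index 0, which is why empty entries must not be removed at this stage.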

Solution 3 - C#

A custom parser might be more suitable for this.

This is something I wrote once when I had a specific (and very strange) parsing requirement that involved parentheses and spaces, but it is generic enough that it should work with virtually any delimiter and text qualifier.

public static IEnumerable<String> ParseText(String line, Char delimiter, Char textQualifier)
{

    if (line == null)
        yield break;

    else
    {
        Char prevChar = '\0';
        Char nextChar = '\0';
        Char currentChar = '\0';

        Boolean inString = false;

        StringBuilder token = new StringBuilder();

        for (int i = 0; i < line.Length; i++)
        {
            currentChar = line[i];

            if (i > 0)
                prevChar = line[i - 1];
            else
                prevChar = '\0';

            if (i + 1 < line.Length)
                nextChar = line[i + 1];
            else
                nextChar = '\0';

            if (currentChar == textQualifier && (prevChar == '\0' || prevChar == delimiter) && !inString)
            {
                inString = true;
                continue;
            }

            if (currentChar == textQualifier && (nextChar == '\0' || nextChar == delimiter) && inString)
            {
                inString = false;
                continue;
            }

            if (currentChar == delimiter && !inString)
            {
                yield return token.ToString();
                token.Clear();  // reset the buffer for the next token
                continue;
            }

            token = token.Append(currentChar);

        }

        yield return token.ToString();

    } 
}

The usage would be:

var parsedText = ParseText(streamR, ' ', '"');

Solution 4 - C#

You can use the TextFieldParser class that is part of the Microsoft.VisualBasic.FileIO namespace. (You'll need to add a reference to Microsoft.VisualBasic to your project.):

string inputString = "This is \"a test\" of the parser.";

using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(inputString)))
{
    using (var tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser(ms))
    {
        tfp.Delimiters = new string[] { " " };
        tfp.HasFieldsEnclosedInQuotes = true;
        string[] output = tfp.ReadFields();

        for (int i = 0; i < output.Length; i++)
        {
            Console.WriteLine("{0}:{1}", i, output[i]);
        }
    }
}

Which generates the output:

0:This
1:is
2:a test
3:of
4:the
5:parser.
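
One caveat with the code above: Encoding.ASCII replaces any non-ASCII character with a question mark before the parser ever sees it. A minimal sketch that avoids the byte round trip by handing the parser a StringReader instead, using the TextFieldParser constructor that accepts a TextReader. The class and method names (FieldParserDemo, SplitLine) are hypothetical:

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FieldParserDemo
{
    // Hypothetical helper: parses one line with TextFieldParser, reading
    // straight from the string so no text encoding is involved.
    public static string[] SplitLine(string line)
    {
        using (var reader = new StringReader(line))
        using (var parser = new TextFieldParser(reader))
        {
            parser.Delimiters = new[] { " " };
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null when there is no data left to read.
            return parser.ReadFields();
        }
    }
}
```

The parser settings are the same as in the answer; only the input source differs.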

Solution 5 - C#

With support for doubled (escaped) quotation marks inside a quoted token.

String:

a "b b" "c ""c"" c"

Result:

a 
"b b"
"c ""c"" c"

Code:

var list = Regex.Matches(value, @"\""(\""\""|[^\""])+\""|[^ ]+",
                         RegexOptions.ExplicitCapture)
                .Cast<Match>()
                .Select(m => m.Value)
                .ToList();

Optionally, remove the surrounding quotes and unescape the doubled ones by adding this after .Select(m => m.Value):

.Select(s => s.StartsWith("\"") ? s.Substring(1, s.Length - 2).Replace("\"\"", "\"") : s)

Result

a 
b b
c "c" c
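
Putting the two pieces together, here is a sketch of a complete method that uses the same pattern. The names DoubledQuoteSplitter, Split and Unquote are hypothetical, and the length check in Unquote is an added guard (an assumption about desired behavior) so that a lone quotation mark does not make Substring throw:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DoubledQuoteSplitter
{
    // Hypothetical helper: tokenizes on spaces, keeps quoted runs together,
    // then strips the outer quotes and turns "" back into ".
    public static List<string> Split(string value)
    {
        return Regex.Matches(value, @"\""(\""\""|[^\""])+\""|[^ ]+", RegexOptions.ExplicitCapture)
                    .Cast<Match>()
                    .Select(m => Unquote(m.Value))
                    .ToList();
    }

    private static string Unquote(string token)
    {
        // Only unquote tokens that both start and end with a quotation mark.
        bool quoted = token.Length >= 2 && token[0] == '"' && token[token.Length - 1] == '"';
        return quoted
            ? token.Substring(1, token.Length - 2).Replace("\"\"", "\"")
            : token;
    }
}
```

For the example string above, DoubledQuoteSplitter.Split returns the three tokens shown in the second result listing.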

Solution 6 - C#

There's just a tiny problem with Squazz's answer: it works for his string, but not if you add more items. E.g.

string myString = "WordOne \"Word Two\" Three";

In that case, the removal of the last quotation mark would get us four results, not three.

That's easily fixed, though: just count the number of escape characters and, if the count is odd, strip the last one (adapt as per your requirements).

    public static List<String> Split(this string myString, char separator, char escapeCharacter)
    {
        int nbEscapeCharacters = myString.Count(c => c == escapeCharacter);
        if (nbEscapeCharacters % 2 != 0) // odd number of escape characters
        {
            int lastIndex = myString.LastIndexOf("" + escapeCharacter, StringComparison.Ordinal);
            myString = myString.Remove(lastIndex, 1); // remove the last escape character
        }
        var result = myString.Split(escapeCharacter)
                             .Select((element, index) => index % 2 == 0  // If even index
                                                   ? element.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                                   : new string[] { element })  // Keep the entire item
                             .SelectMany(element => element).ToList();
        return result;
    }

I also turned it into an extension method and made separator and escape character configurable.

Solution 7 - C#

OP wanted to

> ... remove all spaces EXCEPT for the spaces found between quotation marks

The solution from Cédric Bignon almost did this, but didn't take into account that there could be an uneven number of quotation marks. Starting out by checking for this, and then removing the excess ones, ensures that we only stop splitting if the element really is encapsulated by quotation marks.

string myString = "WordOne \"Word Two";
int placement = myString.LastIndexOf("\"", StringComparison.Ordinal);
if (placement >= 0)
    myString = myString.Remove(placement, 1);

var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();

Credit for the logic goes to Cédric Bignon, I only added a safeguard.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Teachme | View Question on Stackoverflow
Solution 1 - C# | I4V | View Answer on Stackoverflow
Solution 2 - C# | Cédric Bignon | View Answer on Stackoverflow
Solution 3 - C# | psubsee2003 | View Answer on Stackoverflow
Solution 4 - C# | John Koerner | View Answer on Stackoverflow
Solution 5 - C# | kux | View Answer on Stackoverflow
Solution 6 - C# | user3566056 | View Answer on Stackoverflow
Solution 7 - C# | Squazz | View Answer on Stackoverflow