How can I strip punctuation from a string?

Tags: C#, String

C# Problem Overview


For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.

But in the general case, what's the best way to strip punctuation in any language?

I should add: Ideally, the solutions won't require you to enumerate all the possible punctuation marks.

Related: Strip Punctuation in Python

C# Solutions


Solution 1 - C#

new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray());

Solution 2 - C#

Why not simply:

string s = "sxrdct?fvzguh,bij.";
var sb = new StringBuilder();

foreach (char c in s) { if (!char.IsPunctuation(c)) sb.Append(c); }

s = sb.ToString();

The usage of RegEx is normally slower than simple char operations. And those LINQ operations look like overkill to me. And you can't use such code in .NET 2.0...

Solution 3 - C#

Describes intent, easiest to read (IMHO) and best performing:

 s = s.StripPunctuation();

to implement:

public static class StringExtension
{
    public static string StripPunctuation(this string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (!char.IsPunctuation(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}

This is using Hades32's algorithm which was the best performing of the bunch posted.

Solution 4 - C#

Assuming "best" means "simplest" I suggest using something like this:

String stripped = input.replaceAll("\\p{Punct}+", "");

This example is for Java, but all sufficiently modern Regex engines should support this (or something similar).

Edit: the Unicode-Aware version would be this:

String stripped = input.replaceAll("\\p{P}+", "");

The first version only looks at punctuation characters contained in ASCII.

Solution 5 - C#

You can use the Regex.Replace method:

 Regex.Replace(yourString, patternMatchingPunctuation, string.Empty)

Since this returns a string, your method will look something like this:

 string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");

You can replace "[?!]" with something more sophisticated if you want:

(\p{P})

This should find any punctuation.
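A minimal sketch of the \p{P} version in C#, using only the standard System.Text.RegularExpressions namespace. The class and method names (RegexPunctuation, Strip) are made up for illustration. Note that \p{P} covers the Unicode punctuation categories only, so symbols such as $ or + stay in the string.

```csharp
using System.Text.RegularExpressions;

public static class RegexPunctuation
{
    // \p{P} matches any character in a Unicode punctuation category.
    // The instance is cached so the pattern is not re-parsed on every call.
    private static readonly Regex Punctuation = new Regex(@"\p{P}+");

    // Hypothetical helper: returns the input with all punctuation runs removed.
    public static string Strip(string input)
    {
        return Punctuation.Replace(input, string.Empty);
    }
}
```

For example, RegexPunctuation.Strip("Hello!?!?!?!") returns "Hello".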

Solution 6 - C#

This thread is so old, but I'd be remiss not to post a more elegant (IMO) solution.

string inputSansPunc = input.Where(c => !char.IsPunctuation(c)).Aggregate("", (current, c) => current + c);

It's LINQ sans WTF.

Solution 7 - C#

Based off GWLlosa's idea, I was able to come up with the supremely ugly, but working:

string s = "cat!";
s = s.ToCharArray().ToList()
    .Where(x => !char.IsPunctuation(x))
    .Aggregate(string.Empty, new Func<string, char, string>(
        delegate(string acc, char c) { return acc + c; }));

Solution 8 - C#

The most braindead simple way of doing it would be using string.Replace.

The other way I would imagine is Regex.Replace, with a regular expression that contains all the appropriate punctuation marks.
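A rough sketch of the string.Replace route, assuming a small hand-picked set of marks. The class name, method name, and the Marks list are hypothetical, and anything not in the list is left untouched, which is the main drawback of this approach.

```csharp
public static class ReplacePunctuation
{
    // Hypothetical list: only these marks are removed; extend it as needed.
    private const string Marks = ".,;:!?";

    public static string Strip(string input)
    {
        // One Replace call per mark; each call allocates a new string.
        foreach (char mark in Marks)
        {
            input = input.Replace(mark.ToString(), string.Empty);
        }
        return input;
    }
}
```

For example, ReplacePunctuation.Strip("Hello, world!") returns "Hello world".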

Solution 9 - C#

Here's a slightly different approach using LINQ. I like AviewAnew's, but this avoids the Aggregate.

        string myStr = "Hello there..';,]';';., Get rid of Punction";

        var s = from ch in myStr
                where !Char.IsPunctuation(ch)
                select ch;

        // Build the string directly from the filtered characters; an ASCII
        // round-trip would turn every non-ASCII character into '?'.
        var stringResult = new string(s.ToArray());

Solution 10 - C#

If you want to use this for tokenizing text you can use:

new string(myText.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray())
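As a sketch of how that line might feed a tokenizer: the hypothetical helper below replaces punctuation with spaces and then splits on spaces. It assumes tokens are separated by spaces only; tabs and newlines would need to be added to the separator array.

```csharp
using System;
using System.Linq;

public static class PunctuationTokenizer
{
    // Hypothetical helper: splits text into tokens, treating punctuation as a separator.
    public static string[] Tokenize(string text)
    {
        // Replace each punctuation character with a space.
        string spaced = new string(text.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray());

        // Split on spaces and drop the empty entries left by consecutive separators.
        return spaced.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For example, PunctuationTokenizer.Tokenize("Hello, world!") yields the two tokens "Hello" and "world".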

Solution 11 - C#

For anyone who would like to do this via RegEx:

This code shows the full RegEx replace process and gives a sample Regex that only keeps letters, numbers, and spaces in a string - replacing ALL other characters with an empty string:

// Regex that removes everything except the letters a-z (either case), digits, and spaces
System.Text.RegularExpressions.Regex TitleRegex =
    new System.Text.RegularExpressions.Regex("[^a-z0-9 ]+",
        System.Text.RegularExpressions.RegexOptions.IgnoreCase);

string ParsedString = TitleRegex.Replace(stringToParse, String.Empty);

return ParsedString;

Solution 12 - C#

I faced the same issue and was concerned about the performance impact of calling IsPunctuation for every single character.

I found this post: http://www.dotnetperls.com/char-ispunctuation.

In short: char.IsPunctuation handles Unicode on top of ASCII, so it matches many more characters than the common ASCII marks. By definition, this method is heavier than a check limited to a handful of ASCII characters.

The bottom line is that I finally didn't go for it because of its performance impact on my ETL process.

I went for the custom implementation from dotnetperls.
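For illustration only, here is a sketch of what an ASCII-only check could look like. This is not the dotnetperls code; the names are hypothetical, and the case list is the set of ASCII characters that fall in the Unicode punctuation categories. Whether it is actually faster than char.IsPunctuation for a given workload would need to be measured.

```csharp
using System.Text;

public static class AsciiPunctuation
{
    // Hypothetical helper: true only for ASCII characters in a Unicode punctuation category.
    // Symbols such as $ + < = > ^ ` | ~ are deliberately not listed.
    public static bool IsAsciiPunctuation(char c)
    {
        switch (c)
        {
            case '!': case '"': case '#': case '%': case '&': case '\'':
            case '(': case ')': case '*': case ',': case '-': case '.':
            case '/': case ':': case ';': case '?': case '@': case '[':
            case '\\': case ']': case '_': case '{': case '}':
                return true;
            default:
                return false;
        }
    }

    // Copies every character that is not ASCII punctuation into a new string.
    public static string Strip(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (!IsAsciiPunctuation(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, AsciiPunctuation.Strip("a,b.c!") returns "abc", while non-ASCII punctuation is left in place.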

And just FYI, here is some code deduced from the previous answers to get the list of all punctuation characters (excluding the control ones):

var punctuationCharacters = new List<char>();

// Walk every UTF-16 code unit; int avoids overflow at char.MaxValue.
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
    var character = Convert.ToChar(i);

    if (char.IsPunctuation(character) && !char.IsControl(character))
    {
        punctuationCharacters.Add(character);
    }
}

// Joined without a separator, since the comma is itself a punctuation character.
var allPunctuationCharacters = string.Join("", punctuationCharacters);

Console.WriteLine(allPunctuationCharacters);

Cheers, Andrew

Solution 13 - C#

// PHP: ereg_replace was removed in PHP 7; preg_replace is the current equivalent.
$newstr = preg_replace('/[[:punct:]]/', '', $oldstr);

Solution 14 - C#

For long strings I use this:

var normalized = input
                .Where(c => !char.IsPunctuation(c))
                .Aggregate(new StringBuilder(),
                           (current, next) => current.Append(next), sb => sb.ToString());

This performs much better than string concatenation (though I agree it's less intuitive).

Solution 15 - C#

This is simple Python code for removing punctuation from a string given by the user.

Import the required library:

    import string

Read the input string from the user, then remove each punctuation character:

    strs = input('Enter your string:')

    for c in string.punctuation:
        strs = strs.replace(c, "")
    print(f"\n Your String without punctuation:{strs}")

Solution 16 - C#

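#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int main() {
    string strOne = "H,e.l/l!o W#o@r^l&d!!!";
    int punct_count = 0;

    cout << "before : " << strOne << endl;
    for (string::size_type ix = 0; ix < strOne.size(); ) {
        // Cast to unsigned char: passing a negative char to ispunct is undefined behavior.
        if (ispunct(static_cast<unsigned char>(strOne[ix]))) {
            ++punct_count;
            strOne.erase(ix, 1);  // do not advance: the next character has shifted into ix
        } else {
            ++ix;
        }
    }
    cout << "after : " << strOne << endl;
    return 0;
}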

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Tom Ritter | View Question on Stack Overflow
Solution 1 - C# | GWLlosa | View Answer on Stack Overflow
Solution 2 - C# | Hades32 | View Answer on Stack Overflow
Solution 3 - C# | Brian Low | View Answer on Stack Overflow
Solution 4 - C# | Joachim Sauer | View Answer on Stack Overflow
Solution 5 - C# | Anton | View Answer on Stack Overflow
Solution 6 - C# | Nick Vaccaro | View Answer on Stack Overflow
Solution 7 - C# | Tom Ritter | View Answer on Stack Overflow
Solution 8 - C# | TheTXI | View Answer on Stack Overflow
Solution 9 - C# | JoshBerke | View Answer on Stack Overflow
Solution 10 - C# | Chris Marisic | View Answer on Stack Overflow
Solution 11 - C# | S. Justin Gengo | View Answer on Stack Overflow
Solution 12 - C# | Andrew | View Answer on Stack Overflow
Solution 13 - C# | Ash Youssef | View Answer on Stack Overflow
Solution 14 - C# | Shay Ben-Sasson | View Answer on Stack Overflow
Solution 15 - C# | M Kailas | View Answer on Stack Overflow
Solution 16 - C# | brain | View Answer on Stack Overflow