How can I strip punctuation from a string?
Problem Overview
For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.
But in the general case, what's the best way to strip punctuation in any language?
I should add: ideally, the solutions won't require you to enumerate all the possible punctuation marks.
Related: Strip Punctuation in Python
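For the general, language-independent case, a common approach is to look up each character's Unicode general category instead of listing marks by hand. The sketch below spells out in C# the seven punctuation categories that char.IsPunctuation is documented to check, using CharUnicodeInfo.GetUnicodeCategory from System.Globalization. The class name PunctuationByCategory is made up for illustration.

```csharp
using System.Globalization;
using System.Text;

public static class PunctuationByCategory
{
    // True when the character's Unicode general category is one of the seven
    // punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    public static bool IsPunctuationCategory(char c)
    {
        switch (CharUnicodeInfo.GetUnicodeCategory(c))
        {
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.DashPunctuation:
            case UnicodeCategory.OpenPunctuation:
            case UnicodeCategory.ClosePunctuation:
            case UnicodeCategory.InitialQuotePunctuation:
            case UnicodeCategory.FinalQuotePunctuation:
            case UnicodeCategory.OtherPunctuation:
                return true;
            default:
                return false;
        }
    }

    // Copies every character whose category is not punctuation.
    public static string Strip(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (!IsPunctuationCategory(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

The same idea carries over to other languages: any library that exposes the Unicode character database (for example regex \p{P} classes) lets you avoid a hand-written list of marks.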
C# Solutions
Solution 1 - C#
new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray());
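A self-contained version of this one-liner, assuming the input is a string (string implements IEnumerable&lt;char&gt;, so LINQ's Where applies to it directly) and System.Linq is imported. LinqStrip is a made-up name.

```csharp
using System.Linq;

public static class LinqStrip
{
    // Keeps every character for which char.IsPunctuation is false,
    // then builds a new string from the filtered sequence.
    public static string StripPunctuation(string input) =>
        new string(input.Where(c => !char.IsPunctuation(c)).ToArray());
}
```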
Solution 2 - C#
Why not simply:
string s = "sxrdct?fvzguh,bij.";
var sb = new StringBuilder();
foreach (char c in s)
{
    if (!char.IsPunctuation(c))
        sb.Append(c);
}
s = sb.ToString();
Using a Regex is normally slower than simple char operations, and those LINQ operations look like overkill to me. Also, you can't use LINQ in .NET 2.0 (it arrived with .NET 3.5)...
Solution 3 - C#
Describes intent, easiest to read (IMHO) and best performing:
s = s.StripPunctuation();
To implement it:
using System.Text;

public static class StringExtension
{
    public static string StripPunctuation(this string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (!char.IsPunctuation(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
This uses Hades32's algorithm, which was the best performing of the bunch posted.
Solution 4 - Java
Assuming "best" means "simplest", I suggest using something like this:
String stripped = input.replaceAll("\\p{Punct}+", "");
This example is for Java, but all sufficiently modern regex engines should support this (or something similar).
Edit: the Unicode-aware version would be this:
String stripped = input.replaceAll("\\p{P}+", "");
The first version only looks at punctuation characters contained in ASCII.
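In C# the same idea uses Regex.Replace. As far as I know, the .NET regex engine recognizes Unicode general categories such as \p{P} but not the POSIX-style \p{Punct} name, so the Unicode form is the one to use there. A minimal sketch; RegexStrip is a made-up name.

```csharp
using System.Text.RegularExpressions;

public static class RegexStrip
{
    // \p{P} matches any character in a Unicode punctuation category;
    // the trailing + lets one replacement remove a whole run such as "?!?".
    public static string StripPunctuation(string input) =>
        Regex.Replace(input, @"\p{P}+", string.Empty);
}
```

Because \p{P} covers the same Unicode punctuation categories as char.IsPunctuation, symbols such as $ and + are kept.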
Solution 5 - C#
You can use the Regex.Replace method:
Regex.Replace(yourString, regularExpressionWithPunctuationMarks, string.Empty)
Since this returns a string, your method will look something like this:
string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");
You can replace "[?!]" with something more sophisticated if you want:
@"\p{P}"
This matches any character in a Unicode punctuation category.
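If the replacement runs in a hot loop, one option is to build the Regex once and reuse it. The static Regex.Replace does keep a small internal cache of recently used patterns, so this is mainly about skipping that lookup and being able to opt into RegexOptions.Compiled. A sketch with a made-up class name:

```csharp
using System.Text.RegularExpressions;

public static class CachedRegexStrip
{
    // Constructed once and reused; RegexOptions.Compiled trades a slower
    // first construction for faster matching on later calls.
    private static readonly Regex Punctuation =
        new Regex(@"\p{P}+", RegexOptions.Compiled);

    public static string StripPunctuation(string input) =>
        Punctuation.Replace(input, string.Empty);
}
```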
Solution 6 - C#
This thread is so old, but I'd be remiss not to post a more elegant (IMO) solution.
string inputSansPunc = input.Where(c => !char.IsPunctuation(c)).Aggregate("", (current, c) => current + c);
It's LINQ sans WTF.
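Aggregate with string concatenation copies the accumulated string on every step, so its cost grows quadratically with the input length. An equally short alternative, assuming .NET 4.0 or later where string.Concat&lt;T&gt;(IEnumerable&lt;T&gt;) is available (ConcatStrip is a made-up name):

```csharp
using System.Linq;

public static class ConcatStrip
{
    // string.Concat<T>(IEnumerable<T>) builds the result in one pass,
    // unlike Aggregate("", (acc, c) => acc + c), which allocates a new
    // intermediate string for every character kept.
    public static string StripPunctuation(string input) =>
        string.Concat(input.Where(c => !char.IsPunctuation(c)));
}
```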
Solution 7 - C#
Based on GWLlosa's idea, I was able to come up with the supremely ugly, but working:
> string s = "cat!";
> s = s.ToCharArray().ToList
Solution 8 - C#
The most braindead simple way of doing it would be to call string.Replace once for each punctuation mark.
The other way I can imagine is Regex.Replace, with a regular expression that contains all the appropriate punctuation marks.
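The one-Replace-per-mark approach can be sketched as below. The mark list is a hypothetical example, and it shows the drawback the question wants to avoid: any mark not in the list (parentheses here) survives.

```csharp
public static class ReplaceStrip
{
    // Hypothetical, hand-picked list; anything not listed here is kept.
    private static readonly string[] Marks = { ".", ",", "!", "?", ";", ":", "'", "\"" };

    public static string StripListedMarks(string input)
    {
        // One string.Replace call per mark, each scanning the whole string.
        foreach (string mark in Marks)
            input = input.Replace(mark, string.Empty);
        return input;
    }
}
```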
Solution 9 - C#
Here's a slightly different approach using LINQ. I like AviewAnew's, but this avoids the Aggregate.
string myStr = "Hello there..';,]';';., Get rid of Punction";
var s = from ch in myStr
where !Char.IsPunctuation(ch)
select ch;
// Build the string directly from the filtered chars; a round trip through
// ASCII bytes would turn every non-ASCII character into '?'
var stringResult = new string(s.ToArray());
Solution 10 - C#
If you want to use this for tokenizing text you can use:
new string(myText.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray())
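Replacing punctuation with a space rather than deleting it keeps apart words that were joined only by punctuation. A sketch of the splitting step that could follow (Tokenizer and Tokenize are illustrative names):

```csharp
using System;
using System.Linq;

public static class Tokenizer
{
    // Punctuation becomes a space so that "end.Start" still splits into two
    // tokens; RemoveEmptyEntries then drops the gaps left by runs like ", ".
    public static string[] Tokenize(string text)
    {
        string spaced = new string(text.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray());
        return spaced.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```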
Solution 11 - C#
For anyone who would like to do this via RegEx:
This code shows the full RegEx replace process and gives a sample Regex that only keeps ASCII letters (a to z, either case), digits, and spaces in a string, replacing ALL other characters with an empty string:
//Regex to remove every character that is not an ASCII letter, digit or space
System.Text.RegularExpressions.Regex TitleRegex = new
System.Text.RegularExpressions.Regex("[^a-z0-9 ]+",
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
string ParsedString = TitleRegex.Replace(stringToParse, String.Empty);
return ParsedString;
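Note that [^a-z0-9 ] with IgnoreCase treats only ASCII letters as letters, so accented or non-Latin letters are removed too. If that matters, a Unicode-aware variant can use the \p{L} (letter) and \p{N} (number) categories instead. A sketch with a made-up class name:

```csharp
using System.Text.RegularExpressions;

public static class UnicodeAlnumFilter
{
    // \p{L} = any letter, \p{N} = any number; everything else except a plain
    // space is removed, so "Café" keeps its é, unlike with [^a-z0-9 ].
    public static string KeepLettersDigitsSpaces(string input) =>
        Regex.Replace(input, @"[^\p{L}\p{N} ]+", string.Empty);
}
```

Combining marks (category M) are still removed, so text in decomposed form loses its accents; normalizing with string.Normalize() first is one way around that.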
Solution 12 - C#
I faced the same issue and was concerned about the performance impact of calling IsPunctuation for every single character.
I found this post: http://www.dotnetperls.com/char-ispunctuation.
In short: char.IsPunctuation also handles Unicode on top of ASCII, so it matches every character in the Unicode punctuation categories, not just the ASCII ones. According to that post, this makes the method comparatively heavy and expensive.
In the end, I didn't use it because of its performance impact on my ETL process.
I went for the custom implementation from dotnetperls instead.
And just FYI, here is some code, deduced from the previous answers, that gets the list of all punctuation characters:
using System;
using System.Collections.Generic;

var punctuationCharacters = new List<char>();
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
    var character = Convert.ToChar(i);
    // char.IsPunctuation never matches control characters (category Cc),
    // so the IsControl check is only a safeguard
    if (char.IsPunctuation(character) && !char.IsControl(character))
    {
        punctuationCharacters.Add(character);
    }
}
// Concatenate all the characters with no separator and print them
var allPunctuationCharacters = string.Join("", punctuationCharacters);
Console.WriteLine(allPunctuationCharacters);
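The dotnetperls implementation itself isn't reproduced here. As an assumption of what a faster custom check could look like, the sketch below precomputes a 128-entry table for ASCII and falls back to char.IsPunctuation for everything else. Recent .NET versions may already special-case Latin-1 inside char.IsPunctuation, so measure before preferring this over the built-in call. FastAsciiStrip is a made-up name.

```csharp
using System.Text;

public static class FastAsciiStrip
{
    // Hypothetical fast path: a 128-entry table answers the ASCII case with one
    // array read; anything outside ASCII falls back to char.IsPunctuation.
    private static readonly bool[] AsciiPunctuation = BuildTable();

    private static bool[] BuildTable()
    {
        var table = new bool[128];
        for (int i = 0; i < table.Length; i++)
            table[i] = char.IsPunctuation((char)i);
        return table;
    }

    public static string StripPunctuation(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool isPunct = c < 128 ? AsciiPunctuation[c] : char.IsPunctuation(c);
            if (!isPunct)
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```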
Cheers, Andrew
Solution 13 - PHP
$newstr = preg_replace('/[[:punct:]]+/', '', $oldstr); // ereg_replace was deprecated in PHP 5.3 and removed in PHP 7
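POSIX [[:punct:]] and char.IsPunctuation do not agree on ASCII: the POSIX class (in the C locale) covers all 32 printable ASCII characters that are neither letters, digits nor space, while .NET classifies nine of them ($ + &lt; = &gt; ^ ` | ~) as symbols rather than punctuation. A C# sketch that matches the POSIX set; PosixPunct is a made-up name.

```csharp
using System.Text;

public static class PosixPunct
{
    // In the ASCII range, POSIX [[:punct:]] covers 32 characters. char.IsPunctuation
    // accepts 23 of them; the other 9 ($ + < = > ^ ` | ~) are Unicode symbols,
    // so char.IsSymbol is needed as well.
    public static bool IsAsciiPunct(char c) =>
        c < 128 && (char.IsPunctuation(c) || char.IsSymbol(c));

    public static string StripAsciiPunct(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (!IsAsciiPunct(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```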
Solution 14 - C#
For long strings I use this:
var normalized = input
.Where(c => !char.IsPunctuation(c))
.Aggregate(new StringBuilder(),
(current, next) => current.Append(next), sb => sb.ToString());
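This performs much better than string concatenation, because characters are appended to one StringBuilder instead of a new string being allocated for every character (though I agree it's less intuitive).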
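For long strings, another option is to skip StringBuilder and copy the kept characters into a char array sized for the worst case, then build the string from the filled part. This is a sketch of that alternative, not a measured improvement; BufferStrip is a made-up name.

```csharp
public static class BufferStrip
{
    // Copies kept characters into a buffer sized for the worst case (nothing
    // removed), then creates the result string from the filled prefix.
    public static string StripPunctuation(string input)
    {
        var buffer = new char[input.Length];
        int count = 0;
        foreach (char c in input)
        {
            if (!char.IsPunctuation(c))
                buffer[count++] = c;
        }
        return new string(buffer, 0, count);
    }
}
```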
Solution 15 - Python
This is simple code for removing punctuation from a string entered by the user.
# Import required library
import string

# Ask input from user in string format
strs = input('Enter your string:')
# Remove each ASCII punctuation character listed in string.punctuation
for c in string.punctuation:
    strs = strs.replace(c, "")
print(f"\n Your String without punctuation:{strs}")
Solution 16 - C++
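#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int main() {
    string strOne = "H,e.l/l!o W#o@r^l&d!!!";
    cout << "before : " << strOne << endl;
    // ispunct has undefined behaviour for negative char values, so take unsigned char
    auto isPunct = [](unsigned char ch) { return ispunct(ch) != 0; };
    // Erase-remove idiom: shift the kept characters forward, then trim the tail
    auto newEnd = remove_if(strOne.begin(), strOne.end(), isPunct);
    auto punct_count = strOne.end() - newEnd;
    strOne.erase(newEnd, strOne.end());
    cout << "after : " << strOne << endl;
    cout << "removed : " << punct_count << endl;
    return 0;
}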