Remove all special characters from a string in R?

RegexStringRCharacter

Regex Problem Overview


How to remove all special characters from string in R and replace them with spaces ?

Some special characters to remove are : ~!@#$%^&*(){}_+:"<>?,./;'[]-=

I've tried regex with [:punct:] pattern but it removes only punctuation marks.

Question 2 : And how to remove characters from foreign languages like : â í ü Â á ą ę ś ć ?

Answer : Use [^[:alnum:]] to remove~!@#$%^&*(){}_+:"<>?,./;'[]-= and use [^a-zA-Z0-9] to remove also â í ü Â á ą ę ś ć in regex or regexpr functions.

Solution in base R :

x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" 
gsub("[[:punct:]]", "", x)  # no libraries needed

Regex Solutions


Solution 1 - Regex

You need to use regular expressions to identify the unwanted characters. For the most easily readable code, you want the str_replace_all from the stringr package, though gsub from base R works just as well.

The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.

x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" #or whatever
str_replace_all(x, "[[:punct:]]", " ")

(The base R equivalent is gsub("[[:punct:]]", " ", x).)

An alternative is to swap out all non-alphanumeric characters.

str_replace_all(x, "[^[:alnum:]]", " ")

Note that the definition of what constitutes a letter or a number or a punctuatution mark varies slightly depending upon your locale, so you may need to experiment a little to get exactly what you want.

Solution 2 - Regex

Instead of using regex to remove those "crazy" characters, just convert them to ASCII, which will remove accents, but will keep the letters.

astr <- "Ábcdêãçoàúü"
iconv(astr, from = 'UTF-8', to = 'ASCII//TRANSLIT')

which results in

[1] "Abcdeacoauu"

Solution 3 - Regex

Convert the Special characters to apostrophe,

Data  <- gsub("[^0-9A-Za-z///' ]","'" , Data ,ignore.case = TRUE)

Below code it to remove extra ''' apostrophe

Data <- gsub("''","" , Data ,ignore.case = TRUE)

Use gsub(..) function for replacing the special character with apostrophe

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionQbikView Question on Stackoverflow
Solution 1 - RegexRichie CottonView Answer on Stackoverflow
Solution 2 - RegexFelipe AlvarengaView Answer on Stackoverflow
Solution 3 - RegexUMESH NITNAWAREView Answer on Stackoverflow