Remove non-alphanumeric characters from a string

Javascript Regex

Javascript Problem Overview


I want to convert the following string to the provided output.

Input:  "\\test\red\bob\fred\new"
Output: "testredbobfrednew"

I've not found any solution that will handle special characters like \r, \n, \b, etc.

Basically I just want to get rid of anything that is not alphanumeric. Here is what I've tried...

Attempt 1: "\\test\red\bob\fred\new".replace(/[_\W]+/g, "");
Output 1:  "testedobredew"

Attempt 2: "\\test\red\bob\fred\new".replace(/['`~!@#$%^&*()_|+-=?;:'",.<>\{\}\[\]\\\/]/gi, "");
Output 2:  "testedobred [newline] ew"

Attempt 3: "\\test\red\bob\fred\new".replace(/[^a-zA-Z0-9]/, "");
Output 3:  "testedobred [newline] ew"

Attempt 4: "\\test\red\bob\fred\new".replace(/[^a-z0-9\s]/gi, '');
Output 4:  "testedobred [newline] ew"

One other attempt with multiple steps

function cleanID(id) {
	id = id.toUpperCase();
	id = id.replace( /\t/ , "T");
	id = id.replace( /\n/ , "N");
	id = id.replace( /\r/ , "R");
	id = id.replace( /\b/ , "B");
	id = id.replace( /\f/ , "F");
	return id.replace( /[^a-zA-Z0-9]/ , "");
}

with results

Attempt 1: cleanID("\\test\red\bob\fred\new");
Output 1: "BTESTREDOBFREDNEW"

Any help would be appreciated.

Working Solution:

Final Attempt 1: return JSON.stringify("\\test\red\bob\fred\new").replace( /\W/g , '');
Output 1: "testredbobfrednew"

Javascript Solutions


Solution 1 - Javascript

##Removing non-alphanumeric chars

The following regex strips non-alphanumeric chars from an input string:

input.replace(/\W/g, '')

Note that \W is equivalent to [^0-9a-zA-Z_], so it treats the underscore as a word character and leaves it in place. To also remove underscores use e.g.:

input.replace(/[^0-9a-z]/gi, '')
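As a small illustration of the difference between the two patterns, here is a sketch run against a made-up sample string (the variable names are only for this example):

```javascript
// Hypothetical sample containing an underscore, a hyphen, a space and "!".
const sample = "snake_case-id 42!";

// \W leaves the underscore in place because "_" counts as a word character.
const keepUnderscore = sample.replace(/\W/g, "");

// The explicit negated set removes the underscore as well.
const dropUnderscore = sample.replace(/[^0-9a-z]/gi, "");

console.log(keepUnderscore); // snake_caseid42
console.log(dropUnderscore); // snakecaseid42
```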

##The input is malformed

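The test string contains escape sequences (\r, \b, \f, \n). Each one is a single control character, which is not alphanumeric, so the regex removes it, and the letter that was part of the escape goes with it.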
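One way to see what the literal from the question really contains is to inspect its length and character codes. This is only a diagnostic sketch; the variable names are made up for the example:

```javascript
// The literal exactly as written in the question.
const unescaped = "\\test\red\bob\fred\new";

// Collect the code of every control character (code below 32) in the string.
const controlCodes = Array.from(unescaped, (ch) => ch.charCodeAt(0))
  .filter((code) => code < 32);

console.log(unescaped.length); // 18, not the 23 characters typed between the quotes
console.log(controlCodes); // [ 13, 8, 12, 10 ] = CR, backspace, form feed, LF
console.log(unescaped.replace(/\W/g, "")); // testedobredew
```

The last line reproduces Output 1 from the question: the letters r, b, f and n were never in the string to begin with.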

A backslash in the string needs escaping if it's to be taken literally:

"\\test\\red\\bob\\fred\\new".replace(/\W/g, '')
"testredbobfrednew" // output

##Handling malformed strings

If you're not able to escape the input string correctly (why not?), or it's coming from some kind of untrusted/misconfigured source - you can do something like this:

JSON.stringify("\\test\red\bob\fred\new").replace(/\W/g, '')
"testredbobfrednew" // output

Note that the JSON representation of a string includes the surrounding quotes:

JSON.stringify("\\test\red\bob\fred\new")
""\\test\red\bob\fred\new""

But they are also removed by the replacement regex.
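The JSON.stringify approach has side effects worth knowing about. The sketch below wraps it in a hypothetical helper, stripViaJson (not part of the original answer), and shows that any control character without a short escape is serialized as a \uXXXX sequence, whose letters and digits then survive the replacement:

```javascript
// Hypothetical helper wrapping the approach above.
function stripViaJson(input) {
  return JSON.stringify(input).replace(/\W/g, "");
}

// Works for the question's input: \r, \b, \f and \n are re-escaped as two characters.
console.log(stripViaJson("\\test\red\bob\fred\new")); // testredbobfrednew

// A BEL character (U+0007) is serialized as \u0007, so "u0007" leaks into the result.
console.log(stripViaJson("a\u0007b")); // au0007b

// A genuine tab in the data also turns into the letter "t".
console.log(stripViaJson("a\tb")); // atb
```

So this is a workaround for mistyped literals rather than a general-purpose sanitizer.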

Solution 2 - Javascript

All of the current answers still have quirks. The best thing I could come up with was:

string.replace(/[^A-Za-z0-9]/g, '');

Here's an example that captures every key I could find on the keyboard:

var string = '123abcABC-_*(!@#$%^&*()_-={}[]:\"<>,.?/~`';
var stripped = string.replace(/[^A-Za-z0-9]/g, '');
console.log(stripped);

Outputs: '123abcABC'.

Solution 3 - Javascript

The problem is not with how you replace the characters, the problem is with how you input the string.

It's only the first backslash in the input that is a backslash character, the others are part of the control characters \r, \b, \f and \n.

As those backslashes are not separate characters, but part of the notation used to write a single control character, they can't be removed separately. I.e. you can't remove the backslash from \n as it's not two separate characters; it's the way that you write the control character LF, or line feed.

If you actually want to turn that input into the desired output, you would need to replace each control character with the corresponding letter, e.g. replace the character \n with the character n.

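To replace the backspace character you need to use a character set, [\b], as \b outside a character set means a word boundary in a regular expression. The other control characters are wrapped in a set the same way for consistency: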

var input = "\\test\red\bob\fred\new";

var output = input
    .replace(/[\r]/g, 'r')
    .replace(/[\b]/g, 'b')
    .replace(/[\f]/g, 'f')
    .replace(/[\n]/g, 'n')
    .replace(/\\/g, '');

Demo: http://jsfiddle.net/SAp4W/
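The same idea can be written as a single pass with a lookup table. This is a sketch, not the answer's original code; CONTROL_TO_LETTER and restoreEscapedLetters are names invented for the example:

```javascript
// Maps each control character back to the letter used in its escape sequence.
const CONTROL_TO_LETTER = {
  "\r": "r",
  "\b": "b",
  "\f": "f",
  "\n": "n",
};

function restoreEscapedLetters(input) {
  return input
    // Inside a character set, \b means backspace rather than word boundary.
    .replace(/[\r\b\f\n]/g, (ch) => CONTROL_TO_LETTER[ch])
    // Then drop everything that is still not a letter or digit.
    .replace(/[^0-9a-z]/gi, "");
}

console.log(restoreEscapedLetters("\\test\red\bob\fred\new")); // testredbobfrednew
```

Keep in mind that this cannot tell a mistyped \n apart from a real line break in the data: both become the letter n.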

Solution 4 - Javascript

You can try this regex:

value.replace(/[\W_]/g, '');

Solution 5 - Javascript

To keep Arabic letters alongside English letters, you can use:

// Output: نصعربي
"ن$%^&*(ص ع___ربي".replace(/[^0-9a-z\u0600-\u06FF]/gi, '');

Solution 6 - Javascript

Here is an example that you can use:

function removeNonAlphaNumeric(str){
    return str.replace(/[\W_]/g,"");
}

removeNonAlphaNumeric("0_0 (: /-\ :) 0-0"); // "0000"

Solution 7 - Javascript

If you need to support another language in addition to English, add the relevant Unicode block range to the character set. Here is an example for Cyrillic:

.replace(/[^0-9A-Za-z_\u0400-\u04FF]/gi, '')
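A runnable sketch of the block-range idea follows. The helper stripExcept and its extraRanges parameter are hypothetical, introduced only to show how a range can be plugged into the character set; unlike the pattern above, it does not keep underscores:

```javascript
// extraRanges is regex source text for additional ranges, e.g. "\\u0400-\\u04FF".
// It is assumed to come from trusted code, not from user input.
function stripExcept(str, extraRanges = "") {
  const pattern = new RegExp("[^0-9a-z" + extraRanges + "]", "gi");
  return str.replace(pattern, "");
}

// With the Cyrillic block (U+0400 to U+04FF) the letters are kept.
console.log(stripExcept("Привет, мир! 123", "\\u0400-\\u04FF")); // Приветмир123

// Without it, only ASCII letters and digits survive.
console.log(stripExcept("Привет, мир! 123")); // 123
```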

Solution 8 - Javascript

This removes all non-alphanumeric characters, preserves capitalization, and preserves spaces between words.

function alpha_numeric_filter (string) {

  const alpha_numeric = Array.from('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' + ' ')

  const json_string = JSON.stringify(string)

  let filterd_string = ''

  for (let i = 0; i < json_string.length; i++) {

    let char = json_string[i]
    let index = alpha_numeric.indexOf(char)
    if (index > -1) {
      filterd_string += alpha_numeric[index]
    }

  }
  
  return filterd_string
  
}

const input = "\\test\red\bob\fred\new"
console.log(alpha_numeric_filter(input)) //=> testredbobfrednew

const complex_string = "/_&_This!&!! is!@#$% a%^&*() Sentence+=-[]{} 123:;\|\\]||~`/.,><"
console.log(alpha_numeric_filter(complex_string)) //=> This is a Sentence 123

Solution 9 - Javascript

You can use \p{L} or \p{Letter} to find letters from any language and \d to find digits.

str.replace(/[^\p{L}\d]/gu, '')

^ negates the character set: match anything that is neither \p{L} nor \d

Flags:

  • g (global) to perform as many replacements as necessary
  • u (unicode) to enable Unicode property escapes (like \p{L}).

Example:

function removeNonAlphaNumeric (str) {
  return str.replace(/[^\p{L}\d]/gu, '')
}

const sequences = [
  'asdé5kfjdk?',
  'uQjoFß^ßI$jI',
  '无论3如何?!',
  'фв@#ео1'
]

for (const seq of sequences) {
  console.log(removeNonAlphaNumeric(seq))
}
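One detail to be aware of: \d matches only the ASCII digits 0-9, even with the u flag. If digits from other scripts should be kept too, a possible variant is to use \p{Nd} (the Decimal_Number category) instead. The function name below is made up for this sketch:

```javascript
// Keeps letters and decimal digits from any script.
function removeNonLettersOrDigits(str) {
  return str.replace(/[^\p{L}\p{Nd}]/gu, "");
}

// "\u0663" is ARABIC-INDIC DIGIT THREE, a decimal digit outside the ASCII range.
const sample = "a\u0663b?";

console.log(sample.replace(/[^\p{L}\d]/gu, "")); // ab (the non-ASCII digit is dropped)
console.log(removeNonLettersOrDigits(sample) === "a\u0663b"); // true (the digit is kept)
console.log(removeNonLettersOrDigits("фв@#ео1")); // фвео1
```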

Solution 10 - Javascript

If you want the string to contain the text \test\red\bob\fred\new, you must escape every backslash (\) in the literal. When you write "\\test\\red\\bob\\fred\\new", the resulting string contains single backslashes. You can verify this by printing the string.
Once the backslashes are escaped, myString.replace(/\W/g,'') works as expected.
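As a quick check, the sketch below prints the escaped literal and compares it with a String.raw template literal, which is another way to write backslashes literally in source code. It only helps for literals you type yourself, not for strings that arrive at runtime:

```javascript
// Every backslash doubled in an ordinary string literal.
const viaEscapes = "\\test\\red\\bob\\fred\\new";

// String.raw leaves backslashes in a template literal untouched.
const viaRaw = String.raw`\test\red\bob\fred\new`;

console.log(viaEscapes); // \test\red\bob\fred\new
console.log(viaEscapes === viaRaw); // true
console.log(viaRaw.replace(/\W/g, "")); // testredbobfrednew
```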

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Bobby Cannon | View Question on Stackoverflow
Solution 1 - Javascript | AD7six | View Answer on Stackoverflow
Solution 2 - Javascript | Deminetix | View Answer on Stackoverflow
Solution 3 - Javascript | Guffa | View Answer on Stackoverflow
Solution 4 - Javascript | myrcutio | View Answer on Stackoverflow
Solution 5 - Javascript | Abdulrahman Hashem | View Answer on Stackoverflow
Solution 6 - Javascript | ravi kishore | View Answer on Stackoverflow
Solution 7 - Javascript | Roman | View Answer on Stackoverflow
Solution 8 - Javascript | Flavio | View Answer on Stackoverflow
Solution 9 - Javascript | Ledorub | View Answer on Stackoverflow
Solution 10 - Javascript | shift66 | View Answer on Stackoverflow