Replace all non-alphanumeric characters, newlines, and multiple whitespace with one space

Tags: Javascript, Regex, Replace, Alphanumeric

Javascript Problem Overview


I'm looking for a neat regex solution to replace

  • All non-alphanumeric characters
  • All newlines
  • All runs of multiple whitespace

with a single space.


For those playing at home, the following does work:

text.replace(/[^a-z0-9]/gmi, " ").replace(/\s+/g, " ");
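A minimal runnable sketch of the two-step approach above, applied to the sample input from this question (the `normalize` helper name is hypothetical). The `m` flag is dropped because it only changes how `^` and `$` behave, and neither pattern uses them.

```javascript
// Sample input from the question, with its embedded newlines.
const input = "234&^%,Me,2 2013 1080p x264 5 1 BluRay\nS01(*&asd 05\nS1E5\n1x05\n1x5";

// Hypothetical helper wrapping the two-step replace from the question.
function normalize(text) {
  return text
    .replace(/[^a-z0-9]/gi, " ") // every non-alphanumeric char (newlines included) becomes a space
    .replace(/\s+/g, " ");       // then collapse each run of spaces into one
}

console.log(normalize(input));
// prints: 234 Me 2 2013 1080p x264 5 1 BluRay S01 asd 05 S1E5 1x05 1x5
```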

My thinking is that regex is probably powerful enough to achieve this in one statement. The components I think I'd need are:

  • [^a-z0-9] - to remove non-alphanumeric characters
  • \s+ - to match any run of whitespace
  • \r?\n|\r - to match all newlines
  • /gmi - global, multi-line, case-insensitive

However, I can't seem to combine these into a single regex correctly (the following doesn't work):

text.replace(/[^a-z0-9]|\s+|\r?\n|\r/gmi, " ");
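Why the combined pattern falls short: alternation tries the branches left to right at each position, and `[^a-z0-9]` (with no quantifier) already matches any single whitespace character, including `\r` and `\n`. So the `\s+` branch is never reached, each character is replaced on its own, and a run of separators becomes a run of spaces. A small sketch on part of the sample input:

```javascript
const sample = "234&^%,Me\nS01(*&asd";

// Pattern from the question: the first branch wins for every separator character.
const combined = sample.replace(/[^a-z0-9]|\s+|\r?\n|\r/gmi, " ");

// Adding a quantifier lets one match swallow the whole run of separators.
const quantified = sample.replace(/[^a-z0-9]+/gi, " ");

console.log(JSON.stringify(combined));   // "234    Me S01   asd"
console.log(JSON.stringify(quantified)); // "234 Me S01 asd"
```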

Input

234&^%,Me,2 2013 1080p x264 5 1 BluRay
S01(*&asd 05
S1E5
1x05
1x5

Desired Output

234 Me 2 2013 1080p x264 5 1 BluRay S01 asd 05 S1E5 1x05 1x5

Javascript Solutions


Solution 1 - Javascript

Be aware that \W does not match the underscore. A short equivalent of [^a-zA-Z0-9] is [\W_].

text.replace(/[\W_]+/g," ");

\W is the negation of the shorthand \w, which matches the word characters [A-Za-z0-9_] (including the underscore).
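A quick sketch of the underscore difference, using a made-up input string: `\W+` keeps underscores because `_` is a word character, while `[\W_]+` treats it as a separator and behaves like `[^a-zA-Z0-9]+`.

```javascript
const text = "snake_case, CamelCase & more";

// \W+ leaves underscores alone, because _ counts as a word character.
const withW = text.replace(/\W+/g, " ");

// [\W_]+ also treats _ as a separator, just like the explicit [^a-zA-Z0-9]+.
const withWUnderscore = text.replace(/[\W_]+/g, " ");
const explicit = text.replace(/[^a-zA-Z0-9]+/g, " ");

console.log(withW);                        // snake_case CamelCase more
console.log(withWUnderscore);              // snake case CamelCase more
console.log(withWUnderscore === explicit); // true
```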

Example at regex101.com

Solution 2 - Javascript

Jonny 5 beat me to it. I was going to suggest using \W+ on its own, without \s, as in text.replace(/\W+/g, " "), since \W already covers whitespace.

Solution 3 - Javascript

Since the [^a-z0-9] character class matches everything that is not alphanumeric, it matches whitespace characters too!

text.replace(/[^a-z0-9]+/gi, " ");
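A small sketch of why the `i` flag matters with this pattern (the input string is made up): without it, `[^a-z0-9]` also matches uppercase letters, so they get swallowed into the separator runs.

```javascript
const text = "1080p BluRay,\nS1E5";

const withI = text.replace(/[^a-z0-9]+/gi, " ");

// Without i, uppercase letters fall outside a-z and are treated as separators.
const withoutI = text.replace(/[^a-z0-9]+/g, " ");

console.log(withI);    // 1080p BluRay S1E5
console.log(withoutI); // 1080p lu ay 1 5
```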

Solution 4 - Javascript

Well, I think you just need to add a quantifier to each pattern. Also, the carriage-return part is a little odd:

text.replace(/[^a-z0-9]+|\s+/gmi, " ");

Edit: \s matches \r and \n too.
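A sketch checking the edit above on a few made-up strings: `\s` covers `\r`, `\n`, and tabs, and because every whitespace character is also outside `[a-z0-9]`, the first branch always matches it before `\s+` gets a chance. Dropping the `\s+` branch therefore should not change the result.

```javascript
// \s covers carriage returns, line feeds, and tabs, so a separate \r?\n|\r branch is unnecessary.
const coversNewlines = /^\s+$/.test("\r\n\t ");
console.log(coversNewlines); // true

const samples = ["a\r\nb", "x  -- y", "S01(*&asd 05\nS1E5", "tab\there"];

// Compare the answer's pattern with the same pattern minus the \s+ branch.
const allSame = samples.every(
  (s) => s.replace(/[^a-z0-9]+|\s+/gmi, " ") === s.replace(/[^a-z0-9]+/gi, " ")
);
console.log(allSame); // true
```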

Solution 5 - Javascript

Update

Please be aware that the browser landscape changes rapidly; these benchmarks are likely woefully out of date, and possibly misleading, by the time you read this.


This is an old post of mine, and the other answers are good for the most part. However, I decided to benchmark each solution, plus another obvious one (just for fun), to see whether the regex patterns performed differently across browsers and string sizes.

I used jsPerf in:

  • Chrome 65.0.3325 on Windows 10
  • Edge 16.16299.0 on Windows 10

The regex patterns I tested were:

  • /[\W_]+/g
  • /[^a-z0-9]+/gi
  • /[^a-zA-Z0-9]+/g

I ran them against strings of random characters with these lengths:

  • length 5000
  • length 1000
  • length 200

Example JavaScript I used:

var newstr = str.replace(/[\W_]+/g," ");

Each run consisted of 50 or more samples per regex, and I ran them 5 times in each browser.

Let's race our horses!

Results

                                Chrome                    Edge
Chars   Pattern                 Ops/Sec  Deviation        Ops/Sec  Deviation
----------------------------------------------------------------------------
5,000   /[\W_]+/g             19,977.80       1.09      10,820.40       1.32
5,000   /[^a-z0-9]+/gi        19,901.60       1.49      10,902.00       1.20
5,000   /[^a-zA-Z0-9]+/g      19,559.40       1.96      10,916.80       1.13
----------------------------------------------------------------------------
1,000   /[\W_]+/g             96,239.00       1.65      52,358.80       1.41
1,000   /[^a-z0-9]+/gi        97,584.40       1.18      52,105.00       1.60
1,000   /[^a-zA-Z0-9]+/g      96,965.80       1.10      51,864.60       1.76
----------------------------------------------------------------------------
  200   /[\W_]+/g            480,318.60       1.70     261,030.40       1.80
  200   /[^a-z0-9]+/gi       476,177.80       2.01     261,751.60       1.96
  200   /[^a-zA-Z0-9]+/g     486,423.00       0.80     258,774.20       2.15

Truth be told, the three patterns were nearly indistinguishable within each browser (taking the deviation into account). Running the benchmark more times might make the results a little clearer, but probably not by much.

Throughput scaled per character (Ops/Sec × string length, i.e. characters processed per second)

                                Chrome                      Edge
Chars   Pattern                 Ops/Sec    Chars/Sec        Ops/Sec    Chars/Sec
--------------------------------------------------------------------------------
5,000   /[\W_]+/g             19,977.80   99,889,000      10,820.40   54,102,000
5,000   /[^a-z0-9]+/gi        19,901.60   99,508,000      10,902.00   54,510,000
5,000   /[^a-zA-Z0-9]+/g      19,559.40   97,797,000      10,916.80   54,584,000
--------------------------------------------------------------------------------
1,000   /[\W_]+/g             96,239.00   96,239,000      52,358.80   52,358,800
1,000   /[^a-z0-9]+/gi        97,584.40   97,584,400      52,105.00   52,105,000
1,000   /[^a-zA-Z0-9]+/g      96,965.80   96,965,800      51,864.60   51,864,600
--------------------------------------------------------------------------------
  200   /[\W_]+/g            480,318.60   96,063,720     261,030.40   52,206,080
  200   /[^a-z0-9]+/gi       476,177.80   95,235,560     261,751.60   52,350,320
  200   /[^a-zA-Z0-9]+/g     486,423.00   97,284,600     258,774.20   51,754,840
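The scaled column is each Ops/Sec value multiplied by the string length n, i.e. an estimate of characters processed per second. Worked through for two of the Chrome rows:

```latex
\begin{aligned}
\text{chars/sec} &= \text{ops/sec} \times n \\
19{,}977.80 \times 5{,}000 &= 99{,}889{,}000 \\
480{,}318.60 \times 200 &= 96{,}063{,}720
\end{aligned}
```

Because the scaled figures stay within about 6% of each other across the three string lengths in each browser, the cost in this data grows roughly linearly with string length.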

I wouldn't read too much into these results, since the differences between patterns are not significant; all we can really tell is that Edge is slower :o. Also, I was super bored.

Anyway, you can run the benchmark for yourself.

Jsperf Benchmark here
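For a rough local repeat of the comparison, here is a micro-benchmark sketch, assuming a runtime with a global `performance.now()` (modern browsers, Node.js 16+). The `randomString` and `opsPerSec` helpers, the printable-ASCII character set, and the iteration count are all hypothetical choices, not the jsPerf setup used above, and a plain timing loop like this is noisier and less rigorous than a proper benchmarking harness.

```javascript
// Hypothetical helper: a string of random printable ASCII characters (codes 32..126).
function randomString(length) {
  let s = "";
  for (let i = 0; i < length; i++) {
    s += String.fromCharCode(32 + Math.floor(Math.random() * 95));
  }
  return s;
}

// Hypothetical helper: call fn repeatedly and estimate operations per second.
function opsPerSec(fn, iterations = 2000) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = performance.now() - start;
  return iterations / (elapsedMs / 1000);
}

const patterns = [/[\W_]+/g, /[^a-z0-9]+/gi, /[^a-zA-Z0-9]+/g];
const results = {};

for (const length of [5000, 1000, 200]) {
  const str = randomString(length);
  for (const re of patterns) {
    const key = `${length} ${re}`;
    results[key] = opsPerSec(() => str.replace(re, " "));
    console.log(key, Math.round(results[key]), "ops/sec");
  }
}
```

Expect the absolute numbers to differ a lot from the tables above, since engines have changed since then.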

Solution 6 - Javascript

I saw a different post whose pattern also handles diacritical marks, which is great:

s.replace(/[^a-zA-Z0-9À-ž\s]/g, "")
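A sketch of what that pattern does on a made-up string. Note that it deletes unwanted characters (replacement `""`) and leaves whitespace as-is rather than collapsing it, and that the `À-ž` range (U+00C0 to U+017E) also includes the symbols `×` and `÷`. The second approach swaps in Unicode property escapes (`\p{L}` for letters, `\p{N}` for numbers), which need the `u` flag and an ES2018+ runtime; it is not part of the original answer.

```javascript
const text = "Cr\u00e8me br\u00fbl\u00e9e, na\u00efve!"; // "Crème brûlée, naïve!"

// Pattern from the answer: delete anything outside ASCII alphanumerics, À-ž, and whitespace.
const stripped = text.replace(/[^a-zA-Z0-9À-ž\s]/g, "");

// Caveat: the À-ž range also contains × (U+00D7) and ÷ (U+00F7), so they survive.
const times = "3\u00d74".replace(/[^a-zA-Z0-9À-ž\s]/g, "");

// Alternative, not from the answer: keep any Unicode letter or number,
// turn every other run into one space, then trim the ends.
const unicode = text.replace(/[^\p{L}\p{N}]+/gu, " ").trim();

console.log(stripped); // Crème brûlée naïve
console.log(times);    // 3×4
console.log(unicode);  // Crème brûlée naïve
```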

Solution 7 - Javascript

To replace underscores and dashes explicitly, along with other non-word characters, do the following:

text.replace(/[\W_-]/g,' ');
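A sketch of what that line does on a made-up string: `\W` already matches `-`, so listing the hyphen is redundant, and without a `+` quantifier each character is replaced individually, so runs are not collapsed. If the goal is instead to join the words with dashes, a variant using `[\W_]+` and `"-"` as the replacement does that; this is an assumption about the intent, not from the original answer.

```javascript
const text = "my_file -- name (v2)";

// Line from the answer: each non-word char, underscore, or hyphen becomes one space (no collapsing).
const spaced = text.replace(/[\W_-]/g, " ");

// The hyphen inside the class is redundant: \W already matches "-".
const same = text.replace(/[\W_]/g, " ");

// Assumed alternative: collapse each run into a single dash, then trim dashes at the ends.
const dashed = text.replace(/[\W_]+/g, "-").replace(/^-+|-+$/g, "");

console.log(JSON.stringify(spaced)); // "my file    name  v2 "
console.log(spaced === same);        // true
console.log(dashed);                 // my-file-name-v2
```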

Solution 8 - Javascript

For anyone still struggling (like me...) after the more expert replies above, this works in C# (Visual Studio 2019):

outputString = Regex.Replace(inputString, @"\W", "_");

Remember to add

using System.Text.RegularExpressions;
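For comparison, a JavaScript equivalent of that C# line (replacing each single non-word character with `_`) is sketched below on a made-up string. One difference worth knowing: in .NET, `\w` is Unicode-aware by default, whereas JavaScript's `\w` only covers ASCII `[A-Za-z0-9_]`, so accented letters are handled differently.

```javascript
const input = "na\u00efve file-name.txt"; // "naïve file-name.txt"

// Each single non-word character becomes "_" (no + quantifier, so runs are not collapsed).
// JavaScript's \w is ASCII-only, so the ï is replaced too; the .NET call would keep it.
const output = input.replace(/\W/g, "_");

console.log(output); // na_ve_file_name_txt
```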

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type              Original Author          Original Content on Stackoverflow
Question                  TheGeneral               View Question on Stackoverflow
Solution 1 - Javascript   Jonny 5                  View Answer on Stackoverflow
Solution 2 - Javascript   T-CatSan                 View Answer on Stackoverflow
Solution 3 - Javascript   Casimir et Hippolyte     View Answer on Stackoverflow
Solution 4 - Javascript   Pointy                   View Answer on Stackoverflow
Solution 5 - Javascript   TheGeneral               View Answer on Stackoverflow
Solution 6 - Javascript   Dmitri R117              View Answer on Stackoverflow
Solution 7 - Javascript   Gregory R.               View Answer on Stackoverflow
Solution 8 - Javascript   egginstone               View Answer on Stackoverflow