How to remove non-alphanumeric characters?

Php, Regex, String

Php Problem Overview


I need to remove every character from a string that is not in the set a-z, A-Z, 0-9 and is not a space.

Does anyone have a function to do this?

Php Solutions


Solution 1 - Php

It sounds like you almost knew what you wanted to do already; you basically defined it as a regex.

preg_replace("/[^A-Za-z0-9 ]/", '', $string);
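
As a minimal sketch of how this pattern might be wrapped for reuse: the function name strip_non_alphanumeric is hypothetical and not part of the original answer, and the null fallback is one possible design choice, since preg_replace() returns null on error.

```php
<?php
// Hypothetical helper name; wraps the pattern from the answer above.
function strip_non_alphanumeric(string $input): string
{
    // preg_replace() returns null on error, so fall back to an empty string.
    return preg_replace('/[^A-Za-z0-9 ]/', '', $input) ?? '';
}

$result = strip_non_alphanumeric('Hello, World! 123_abc');
echo $result, PHP_EOL; // Hello World 123abc
```

Note that the return value has to be captured; preg_replace() does not modify its argument in place.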

Solution 2 - Php

For Unicode characters, use:

preg_replace("/[^[:alnum:][:space:]]/u", '', $string);
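
One detail worth checking with this pattern: [:space:] covers every whitespace character, not only the space itself, so tabs and newlines are kept too. The sketch below uses a made-up ASCII sample string to show that behaviour.

```php
<?php
// Sample input is an assumption for illustration only.
$input = "Tab\there, new\nline; done!";

// Remove everything that is neither alphanumeric nor whitespace.
$clean = preg_replace('/[^[:alnum:][:space:]]/u', '', $input);

// Punctuation is gone; the tab and the newline survive because
// [:space:] matches them as well as the plain space.
var_dump($clean === "Tab\there new\nline done");
```

If only the plain space should be kept, replacing [:space:] with a literal space in the class would be one way to narrow it.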

Solution 3 - Php

A regular expression is your answer.

$str = preg_replace('/[^a-z\d ]/i', '', $str);
  • The i stands for case insensitive.

  • ^ at the start of a character class negates it: the class matches any character that is not listed.

  • \d matches any digit.

  • a-z matches all characters between a and z. Because of the i modifier you don't have to specify both a-z and A-Z.

  • After \d there is a space, so spaces are allowed in this regex.
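
The caret has two different meanings depending on where it appears: outside square brackets it anchors the match to the start of the string, while as the first character inside brackets it negates the class. A small sketch contrasting the two, with sample strings chosen only for illustration:

```php
<?php
// Outside brackets, ^ anchors the match to the start of the string.
$startsWithA = preg_match('/^a/', 'abc'); // 1: the string starts with "a"
$noLeadingA  = preg_match('/^a/', 'cba'); // 0: it does not

// Inside brackets, a leading ^ negates the class: "any character except a".
$onlyA = preg_replace('/[^a]/', '', 'banana');

echo $startsWithA, ' ', $noLeadingA, ' ', $onlyA, PHP_EOL; // 1 0 aaa
```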

Solution 4 - Php

If you need to support other languages, instead of the typical A-Z, you can use the following:

preg_replace('/[^\p{L}\p{N} ]+/u', '', $string);
  • [^\p{L}\p{N} ] defines a negated character class (it matches any character not listed in it) of:
    • \p{L}: a letter from any language.
    • \p{N}: a numeric character in any script.
    • ' ': a space character.
  • + greedily matches the character class between 1 and unlimited times.
  • The u modifier makes PCRE treat the pattern and the subject as UTF-8, which is needed for multibyte characters to be matched as whole characters.

This will preserve letters and numbers from other languages and scripts as well as A-Z:

preg_replace('/[^\p{L}\p{N} ]+/u', '', 'hello-world'); // helloworld
preg_replace('/[^\p{L}\p{N} ]+/u', '', 'abc@~#123-+=öäå'); // abc123öäå
preg_replace('/[^\p{L}\p{N} ]+/u', '', '你好世界!@£$%^&*()'); // 你好世界
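
When the u modifier is used, preg_replace() returns null if the subject is not valid UTF-8, so it can be worth checking the result. The following is a sketch under the assumption that input is expected to be UTF-8; the helper name keep_letters_numbers_spaces is hypothetical.

```php
<?php
// Hypothetical helper: keeps letters, digits and spaces in any script.
function keep_letters_numbers_spaces(string $input): ?string
{
    // The u modifier makes PCRE read the pattern and subject as UTF-8.
    // preg_replace() returns null when the subject is not valid UTF-8.
    return preg_replace('/[^\p{L}\p{N} ]+/u', '', $input);
}

$ok  = keep_letters_numbers_spaces('abc@~#123-+=öäå'); // abc123öäå
$bad = keep_letters_numbers_spaces("bad \xFF byte");   // null: 0xFF is never valid UTF-8

echo $ok, PHP_EOL;
var_dump($bad === null);
```

Returning null to the caller, rather than silently substituting an empty string, leaves the decision about malformed input to the calling code.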

Note: This is a very old, but still relevant question. I am answering purely to provide supplementary information that may be useful to future visitors.

Solution 5 - Php

Here's a really simple regex for that:

\W|_

Use it as needed, with forward slashes as delimiters:

preg_replace("/\W|_/", '', $string);

Test it here with this great tool that explains what the regex is doing:

http://www.regexr.com/
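
Since \W matches any non-word character, this pattern also removes spaces, which the question wanted to keep. The sketch below shows that effect and one possible variant that preserves spaces; the variant and the sample string are illustrative assumptions, not part of the original answer.

```php
<?php
$input = 'snake_case and spaces!';

// \W|_ removes every non-word character and the underscore, spaces included.
$noSpaces = preg_replace('/\W|_/', '', $input);

// Variant (not from the answer): a negated class that lists \w and the space
// leaves spaces alone, while the second alternative still removes underscores.
$withSpaces = preg_replace('/[^\w ]|_/', '', $input);

echo $noSpaces, PHP_EOL;   // snakecaseandspaces
echo $withSpaces, PHP_EOL; // snakecase and spaces
```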

Solution 6 - Php

[\W_]+

$string = preg_replace("/[\W_]+/u", '', $string);

It selects everything that is not A-Z, a-z, or 0-9 and deletes it.

See example here: https://regexr.com/3h1rj

Solution 7 - Php

$string = preg_replace("/\W+/", '', $string);

You can test it here: http://regexr.com/
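
One difference from the previous patterns is that the underscore counts as a word character, so \W+ on its own leaves underscores in place. A short comparison on a made-up sample string:

```php
<?php
$input = 'user_name: Jane Doe!';

// \W+ removes runs of non-word characters but keeps the underscore.
$keepsUnderscore = preg_replace('/\W+/', '', $input);

// Adding _ to the class removes underscores as well.
$dropsUnderscore = preg_replace('/[\W_]+/', '', $input);

echo $keepsUnderscore, PHP_EOL; // user_nameJaneDoe
echo $dropsUnderscore, PHP_EOL; // usernameJaneDoe
```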

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | zuk1 | View Question on Stack Overflow
Solution 1 - Php | Chad Birch | View Answer on Stack Overflow
Solution 2 - Php | voondo | View Answer on Stack Overflow
Solution 3 - Php | raspi | View Answer on Stack Overflow
Solution 4 - Php | Jonathon | View Answer on Stack Overflow
Solution 5 - Php | Alex Stephens | View Answer on Stack Overflow
Solution 6 - Php | Intacto | View Answer on Stack Overflow
Solution 7 - Php | PASTAGA | View Answer on Stack Overflow