How to remove all non-alpha numeric characters from a string in MySQL?

MysqlRegexStringAlphanumeric

Mysql Problem Overview


I'm working on a routine that compares strings, but for better efficiency I need to remove all characters that are not letters or numbers.

I'm using multiple REPLACE functions now, but maybe there is a faster and nicer solution ?

Mysql Solutions


Solution 1 - Mysql

Using MySQL 8.0 or higher

Courtesy of michal.jakubeczy's answer below, replacing by Regex is now supported by MySQL:

UPDATE {table} SET {column} = REGEXP_REPLACE({column}, '[^0-9a-zA-Z ]', '')

Using MySQL 5.7 or lower

Regex isn't supported here. I had to create my own function called alphanum which stripped the chars for me:

DROP FUNCTION IF EXISTS alphanum; 
DELIMITER | 
CREATE FUNCTION alphanum( str CHAR(255) ) RETURNS CHAR(255) DETERMINISTIC
BEGIN 
  DECLARE i, len SMALLINT DEFAULT 1; 
  DECLARE ret CHAR(255) DEFAULT ''; 
  DECLARE c CHAR(1);
  IF str IS NOT NULL THEN 
    SET len = CHAR_LENGTH( str ); 
    REPEAT 
      BEGIN 
        SET c = MID( str, i, 1 ); 
        IF c REGEXP '[[:alnum:]]' THEN 
          SET ret=CONCAT(ret,c); 
        END IF; 
        SET i = i + 1; 
      END; 
    UNTIL i > len END REPEAT; 
  ELSE
    SET ret='';
  END IF;
  RETURN ret; 
END | 
DELIMITER ; 

Now I can do:

select 'This works finally!', alphanum('This works finally!');

and I get:

+---------------------+---------------------------------+
| This works finally! | alphanum('This works finally!') |
+---------------------+---------------------------------+
| This works finally! | Thisworksfinally                |
+---------------------+---------------------------------+
1 row in set (0.00 sec)

Hurray!

Solution 2 - Mysql

From a performance point of view, (and on the assumption that you read more than you write)

I think the best way would be to pre calculate and store a stripped version of the column, This way you do the transform less.

You can then put an index on the new column and get the database to do the work for you.

Solution 3 - Mysql

Since MySQL 8.0 you can use regular expression to remove non alphanumeric characters from a string. There is method REGEXP_REPLACE

Here is the code to remove non-alphanumeric characters:

UPDATE {table} SET {column} = REGEXP_REPLACE({column}, '[^0-9a-zA-Z ]', '')

Solution 4 - Mysql

SELECT teststring REGEXP '[[:alnum:]]+';

SELECT * FROM testtable WHERE test REGEXP '[[:alnum:]]+'; 

See: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Scroll down to the section that says: [:character_class:]

If you want to manipulate strings the fastest way will be to use a str_udf, see:
https://github.com/hholzgra/mysql-udf-regexp

Solution 5 - Mysql

Straight and battletested solution for latin and cyrillic characters:

DELIMITER //

CREATE FUNCTION `remove_non_numeric_and_letters`(input TEXT)
  RETURNS TEXT
  BEGIN
    DECLARE output TEXT DEFAULT '';
    DECLARE iterator INT DEFAULT 1;
    WHILE iterator < (LENGTH(input) + 1) DO
      IF SUBSTRING(input, iterator, 1) IN
         ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я')
      THEN
        SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
      END IF;
      SET iterator = iterator + 1;
    END WHILE;
    RETURN output;
  END //

DELIMITER ;

Usage:

-- outputs "hello12356"
SELECT remove_non_numeric_and_letters('hello - 12356-привет ""]')

Solution 6 - Mysql

The fastest way I was able to find (and using ) is with convert().

from Doc. CONVERT() with USING is used to convert data between different character sets.

Example:

convert(string USING ascii)

In your case the right character set will be self defined

NOTE from Doc. The USING form of CONVERT() is available as of 4.1.0.

Solution 7 - Mysql

Based on the answer by Ryan Shillington, modified to work with strings longer than 255 characters and preserving spaces from the original string.

FYI there is lower(str) in the end.

I used this to compare strings:

DROP FUNCTION IF EXISTS spacealphanum;
DELIMITER $$
CREATE FUNCTION `spacealphanum`( str TEXT ) RETURNS TEXT CHARSET utf8
BEGIN 
  DECLARE i, len SMALLINT DEFAULT 1; 
  DECLARE ret TEXT DEFAULT ''; 
  DECLARE c CHAR(1); 
  SET len = CHAR_LENGTH( str ); 
  REPEAT 
    BEGIN 
      SET c = MID( str, i, 1 ); 
      IF c REGEXP '[[:alnum:]]' THEN 
        SET ret=CONCAT(ret,c); 
      ELSEIF  c = ' ' THEN
          SET ret=CONCAT(ret," ");
      END IF; 
      SET i = i + 1; 
    END; 
  UNTIL i > len END REPEAT; 
  SET ret = lower(ret);
  RETURN ret; 
  END $$
  DELIMITER ;

Solution 8 - Mysql

I have written this UDF. However, it only trims special characters at the beginning of the string. It also converts the string to lower case. You can update this function if desired.

DELIMITER //

DROP FUNCTION IF EXISTS DELETE_DOUBLE_SPACES//

CREATE FUNCTION DELETE_DOUBLE_SPACES ( title VARCHAR(250) )
RETURNS VARCHAR(250) DETERMINISTIC
BEGIN
    DECLARE result VARCHAR(250);
    SET result = REPLACE( title, '  ', ' ' );
    WHILE (result <> title) DO 
    	SET title = result;
    	SET result = REPLACE( title, '  ', ' ' );
    END WHILE;
    RETURN result;
END//

DROP FUNCTION IF EXISTS LFILTER//

CREATE FUNCTION LFILTER ( title VARCHAR(250) )
RETURNS VARCHAR(250) DETERMINISTIC
BEGIN
	WHILE (1=1) DO
	    IF(  ASCII(title) BETWEEN ASCII('a') AND ASCII('z')
	        OR ASCII(title) BETWEEN ASCII('A') AND ASCII('Z')
	        OR ASCII(title) BETWEEN ASCII('0') AND ASCII('9')
	    ) THEN
	        SET title = LOWER( title );
	        SET title = REPLACE(
	            REPLACE(
	                REPLACE(
	                    title,
	                    CHAR(10), ' '
	                ),
	                CHAR(13), ' '
	            ) ,
	            CHAR(9), ' '
	        );
	        SET title = DELETE_DOUBLE_SPACES( title );
	        RETURN title;
	    ELSE
	        SET title = SUBSTRING( title, 2 );	        
	    END IF;
    END WHILE;
END//
DELIMITER ;

SELECT LFILTER(' !@#$%^&*()_+1a    b');

Also, you could use regular expressions but this requires installing a MySql extension.

Solution 9 - Mysql

Be careful, characters like ’ or » are considered as alpha by MySQL. It better to use something like :

> IF c BETWEEN 'a' AND 'z' OR c BETWEEN 'A' AND 'Z' OR c BETWEEN '0' AND > '9' OR c = '-' THEN

Solution 10 - Mysql

This can be done with a regular expression replacer function I posted in another answer and have blogged about here. It may not be the most efficient solution possible and might look overkill for the job in hand - but like a Swiss army knife, it may come in useful for other reasons.

It can be seen in action removing all non-alphanumeric characters in this Rextester online demo.

SQL (excluding the function code for brevity):

SELECT txt,
       reg_replace(txt,
                   '[^a-zA-Z0-9]+',
                   '',
                   TRUE,
                   0,
                   0
                   ) AS `reg_replaced`
FROM test;

Solution 11 - Mysql

So far, the only alternative approach less complicated than the other answers here is to determine the full set of special characters of the column, i.e. all the special characters that are in use in that column at the moment, and then do a sequential replace of all those characters, e.g.

update pages set slug = lower(replace(replace(replace(replace(name, ' ', ''), '-', ''), '.', ''), '&', '')); # replacing just space, -, ., & only

.

> This is only advisable on a known set of data, otherwise it's > trivial for some special characters to slip past with a > blacklist approach instead of a whitelist approach.

Obviously, the simplest way is to pre-validate the data outside of sql due to the lack of robust built-in whitelisting (e.g. via a regex replace).

Solution 12 - Mysql

I had a similar problem with trying to match last names in our database that were slightly different. For example, sometimes people entered the same person's name as "McDonald" and also as "Mc Donald", or "St John" and "St. John".

Instead of trying to convert the Mysql data, I solved the problem by creating a function (in PHP) that would take a string and create an alpha-only regular expression:

function alpha_only_regex($str) {
    $alpha_only = str_split(preg_replace('/[^A-Z]/i', '', $str));
    return '^[^a-zA-Z]*'.implode('[^a-zA-Z]*', $alpha_only).'[^a-zA-Z]*$';
}

Now I can search the database with a query like this:

$lastname_regex = alpha_only_regex($lastname);
$query = "SELECT * FROM my_table WHERE lastname REGEXP '$lastname_regex';

Solution 13 - Mysql

I needed to get only alphabetic characters of a string in a procedure, and did:

SET @source = "whatever you want";
SET @target = '';
SET @i = 1;
SET @len = LENGTH(@source);
WHILE @i <= @len DO
	SET @char = SUBSTRING(@source, @i, 1);
	IF ((ORD(@char) >= 65 && ORD(@char) <= 90) || (ORD(@char) >= 97 && ORD(@char) <= 122)) THEN
		SET @target = CONCAT(@target, @char);
	END IF;
	SET @i = @i + 1;
END WHILE;

Solution 14 - Mysql

Needed to replace non-alphanumeric characters rather than remove non-alphanumeric characters so I have created this based on Ryan Shillington's alphanum. Works for strings up to 255 characters in length

DROP FUNCTION IF EXISTS alphanumreplace; 
DELIMITER | 
CREATE FUNCTION alphanumreplace( str CHAR(255), d CHAR(32) ) RETURNS CHAR(255) 
BEGIN 
  DECLARE i, len SMALLINT DEFAULT 1; 
  DECLARE ret CHAR(32) DEFAULT ''; 
  DECLARE c CHAR(1); 
  SET len = CHAR_LENGTH( str ); 
  REPEAT 
    BEGIN 
      SET c = MID( str, i, 1 ); 
      IF c REGEXP '[[:alnum:]]' THEN SET ret=CONCAT(ret,c); 
      ELSE SET ret=CONCAT(ret,d);
      END IF; 
      SET i = i + 1; 
    END; 
  UNTIL i > len END REPEAT; 
  RETURN ret; 
END | 
DELIMITER ; 

Example:

select 'hello world!',alphanum('hello world!'),alphanumreplace('hello world!','-');
+--------------+--------------------------+-------------------------------------+
| hello world! | alphanum('hello world!') | alphanumreplace('hello world!','-') |
+--------------+--------------------------+-------------------------------------+
| hello world! | helloworld               | hello-world-                        |
+--------------+--------------------------+-------------------------------------+

You'll need to add the alphanum function seperately if you want that, I just have it here for the example.

Solution 15 - Mysql

I tried a few solutions but at the end used replace. My data set is part numbers and I fairly know what to expect. But just for sanity, I used PHP to build the long query:

$dirty = array(' ', '-', '.', ',', ':', '?', '/', '!', '&', '@');
$query = 'part_no';
foreach ($dirty as $dirt) {
    $query = "replace($query,'$dirt','')";
}
echo $query;

This outputs something I used to get a headache from:

replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(part_no,' ',''),'-',''),'.',''),',',''),':',''),'?',''),'/',''),'!',''),'&',''),'@','')

Solution 16 - Mysql

if you are using php then....

try{
$con = new PDO ("mysql:host=localhost;dbname=dbasename","root","");
}
catch(PDOException $e){
echo "error".$e-getMessage();	
}

$select = $con->prepare("SELECT * FROM table");
$select->setFetchMode(PDO::FETCH_ASSOC);
$select->execute();

while($data=$select->fetch()){ 

$id = $data['id'];
$column = $data['column'];
$column = preg_replace("/[^a-zA-Z0-9]+/", " ", $column); //remove all special characters

$update = $con->prepare("UPDATE table SET column=:column WHERE id='$id'");
$update->bindParam(':column', $column );
$update->execute();

// echo $column."<br>";
} 

Solution 17 - Mysql

the alphanum function (self answered) have a bug, but I don't know why. For text "cas synt ls 75W140 1L" return "cassyntls75W1401", "L" from the end is missing some how.

Now I use

delimiter //
DROP FUNCTION IF EXISTS alphanum //
CREATE FUNCTION alphanum(prm_strInput varchar(255))
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE v_char VARCHAR(1);
  DECLARE v_parseStr VARCHAR(255) DEFAULT ' ';
WHILE (i <= LENGTH(prm_strInput) )  DO
  SET v_char = SUBSTR(prm_strInput,i,1);
  IF v_char REGEXP  '^[A-Za-z0-9]+$' THEN 
        SET v_parseStr = CONCAT(v_parseStr,v_char);  
  END IF;
  SET i = i + 1;
END WHILE;
RETURN trim(v_parseStr);
END
//

(found on google)

Solution 18 - Mysql

Probably a silly suggestion compared to others:

if(!preg_match("/^[a-zA-Z0-9]$/",$string)){
    $sortedString=preg_replace("/^[a-zA-Z0-9]+$/","",$string);
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDylanView Question on Stackoverflow
Solution 1 - MysqlRyan ShillingtonView Answer on Stackoverflow
Solution 2 - MysqlKevin BurtonView Answer on Stackoverflow
Solution 3 - Mysqlmichal.jakubeczyView Answer on Stackoverflow
Solution 4 - MysqlJohanView Answer on Stackoverflow
Solution 5 - MysqluserlondView Answer on Stackoverflow
Solution 6 - MysqlvddView Answer on Stackoverflow
Solution 7 - MysqlAlon AsulinView Answer on Stackoverflow
Solution 8 - MysqlАртур КурицынView Answer on Stackoverflow
Solution 9 - MysqlAbdelView Answer on Stackoverflow
Solution 10 - MysqlSteve ChambersView Answer on Stackoverflow
Solution 11 - MysqlKzqaiView Answer on Stackoverflow
Solution 12 - MysqllfjeffView Answer on Stackoverflow
Solution 13 - MysqlAlain TiembloView Answer on Stackoverflow
Solution 14 - MysqlLuke RehmannView Answer on Stackoverflow
Solution 15 - MysqlJosef HabrView Answer on Stackoverflow
Solution 16 - Mysqluser7796548View Answer on Stackoverflow
Solution 17 - MysqlShadowElfView Answer on Stackoverflow
Solution 18 - MysqlbashleighView Answer on Stackoverflow