How can I find non-ASCII characters in MySQL?

Mysql Problem Overview


I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?

Mysql Solutions


Solution 1 - Mysql

MySQL provides comprehensive character set management that can help with this kind of problem.

SELECT whatever
  FROM tableName 
 WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)

The CONVERT(col USING charset) function turns unconvertible characters into replacement characters (question marks, in the case of ASCII). The converted and unconverted text will then be unequal wherever such a character occurs.

See this for more discussion. https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html

You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian), use CONVERT(columnToCheck USING cp1257).
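To see why the comparison works, the conversion can be imitated outside the database. The Python sketch below is only an analogy, not MySQL itself: it assumes each unrepresentable character is replaced by a question mark, and the helper names and sample strings are made up for illustration.

```python
def to_ascii_lossy(text: str) -> str:
    """Imitate CONVERT(text USING ASCII): unrepresentable characters become '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def has_non_ascii(text: str) -> bool:
    """True when the lossy conversion changed the text, as in the WHERE clause."""
    return text != to_ascii_lossy(text)


# Hypothetical sample values standing in for rows of columnToCheck.
samples = ["plain text", "em dash \u2014 here", "line\nbreak"]
flagged = [s for s in samples if has_non_ascii(s)]
print(flagged)
```

Only the value with the em dash is flagged. The line feed is itself an ASCII character, so a comparison of this kind does not catch hidden carriage returns or line feeds.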

Solution 2 - Mysql

You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find rows whose column contains non-ASCII characters using the following query:

SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';

This was the most comprehensive query I could come up with.
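The hex pattern can be checked without a server. This is a rough Python sketch that assumes the column is stored as UTF-8 and that HEX() yields two uppercase hex digits per byte; `mysql_hex` and `is_non_ascii` are hypothetical helper names, not MySQL functions.

```python
import re

# Same pattern as in the query: every byte, written as two hex digits,
# must start with 0-7, which means its value lies in 0x00-0x7F.
ASCII_HEX = re.compile(r"^([0-7][0-9A-F])*$")


def mysql_hex(text: str, encoding: str = "utf-8") -> str:
    """Approximate HEX(col) for a column stored in the given encoding."""
    return text.encode(encoding).hex().upper()


def is_non_ascii(text: str) -> bool:
    """Mirror of NOT HEX(col) REGEXP '^([0-7][0-9A-F])*$'."""
    return ASCII_HEX.match(mysql_hex(text)) is None


print(mysql_hex("A\u2014"))      # 41E28094
print(is_non_ascii("abc\r\n"))   # False: CR and LF are ASCII
print(is_non_ascii("a\u2014b"))  # True: the em dash encodes as E2 80 94
```

Because the pattern consumes the hex string in pairs from the start, a high byte cannot hide behind a misaligned match, and rows mixing ASCII and non-ASCII text are still reported.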

Solution 3 - Mysql

It depends exactly what you're defining as "ASCII", but I would suggest trying a variant of a query like this:

SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';

That query will return all rows where columnToCheck contains any non-alphanumeric character. The negation belongs inside the character class: writing NOT REGEXP '[A-Za-z0-9]' instead would return only rows that contain no alphanumeric characters at all. If you have other characters that are acceptable, add them to the character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:

SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';

The most relevant page of the MySQL documentation is probably 12.5.2 Regular Expressions.
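Whether the negation is applied to the whole match or sits inside the character class changes which rows come back. A small Python sketch with made-up sample values shows the difference; Python's `re` module stands in for MySQL's REGEXP here, which is an approximation.

```python
import re

# Hypothetical column values.
rows = ["abc123", "abc-123", "---", ""]

# Negating the match: keeps rows with no alphanumeric character at all.
no_alnum_at_all = [r for r in rows if not re.search(r"[A-Za-z0-9]", r)]

# Negating the class: keeps rows with at least one non-alphanumeric character.
has_other_char = [r for r in rows if re.search(r"[^A-Za-z0-9]", r)]

print(no_alnum_at_all)  # ['---', '']
print(has_other_char)   # ['abc-123', '---']
```

Only the second form reports "abc-123", the mixed value that a search for stray characters is usually after.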

Solution 4 - Mysql

This is probably what you're looking for:

select * from TABLE where COLUMN regexp '[^ -~]';

It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).
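The character class `[^ -~]` means "anything outside the range from space (0x20) to tilde (0x7E)". As a minimal sketch, the same class in Python can also list which characters triggered the match; `offending_chars` is a hypothetical helper, and Python's regex engine is only an approximation of MySQL's.

```python
import re

# Anything outside the printable ASCII range, space (0x20) through tilde (0x7E).
NOT_PRINTABLE_ASCII = re.compile(r"[^ -~]")


def offending_chars(text: str) -> list[str]:
    """Return the code points of all characters the class matches."""
    return [f"U+{ord(ch):04X}" for ch in NOT_PRINTABLE_ASCII.findall(text)]


print(offending_chars("ok"))              # []
print(offending_chars("a\u2014b\r\n"))    # ['U+2014', 'U+000D', 'U+000A']
```

The em dash, the carriage return, and the line feed are all reported, which covers both kinds of problem characters described in the question.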

Solution 5 - Mysql

One character missing from everyone's examples above is the NUL character (\0). It is invisible in the MySQL console output and is not discoverable by any of the queries mentioned so far. The query to find it is simply:

select * from TABLE where COLUMN like '%\0%';

Solution 6 - Mysql

Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:

SELECT * FROM `table` WHERE NOT `field` REGEXP  "[\\x00-\\xFF]|^$";

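It searches a column for values outside the given range, and because it uses hexadecimal notation for code points it lets you cover control characters too. Note that the pattern matches anywhere in the value and the result is then negated, so the query only returns non-empty rows in which no character falls inside the range; a row that mixes ASCII and non-ASCII text is not returned. The ASCII range proper ends at 0x7F, so use \\x7F as the upper bound if characters between 0x80 and 0xFF should count as non-ASCII. Since there is no comparison or conversion (unlike @Ollie's answer), this should be faster, too, especially if MySQL terminates the regex match early, which it should.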

It also avoids returning fields that are zero-length. If you want a slightly-longer version that might perform better, you can use this instead:

SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP  "[\\x00-\\xFF]";

It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.

Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!

Solution 7 - Mysql

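Try this query to find records that contain special characters. The hyphen is placed last in the character class so that it is treated as a literal rather than as a range operator:

SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9@:. \'`,&-]';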

Solution 8 - Mysql

@zende's answer was the only one that covered columns with a mix of ASCII and non-ASCII characters, but it relies on that awkward hex conversion. I used this:

SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''

Solution 9 - Mysql

In Oracle, you can use the following:

SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;

Solution 10 - Mysql

For this question, you can also use a LIKE pattern with wildcards standing in for the non-ASCII character.

Question from SQLZoo (Non-ASCII characters):
Find all details of the prize won by PETER GRÜNBERG.

Answer: SELECT * FROM nobel WHERE winner LIKE 'P% GR%_%berg';

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Ed Mays | View Question on Stack Overflow
Solution 1 - Mysql | O. Jones | View Answer on Stack Overflow
Solution 2 - Mysql | zende | View Answer on Stack Overflow
Solution 3 - Mysql | Chad Birch | View Answer on Stack Overflow
Solution 4 - Mysql | David Minor | View Answer on Stack Overflow
Solution 5 - Mysql | Rob Bailey | View Answer on Stack Overflow
Solution 6 - Mysql | Mahmoud Al-Qudsi | View Answer on Stack Overflow
Solution 7 - Mysql | Sachin | View Answer on Stack Overflow
Solution 8 - Mysql | chiliNUT | View Answer on Stack Overflow
Solution 9 - Mysql | Malaka Gunawardhana | View Answer on Stack Overflow
Solution 10 - Mysql | Hemen_boro | View Answer on Stack Overflow