How to fix double-encoded UTF-8 characters (in a UTF-8 table)

MySQL, String, UTF-8, Character Encoding

Mysql Problem Overview


A previous LOAD DATA INFILE was run under the assumption that the CSV file was latin1-encoded, although it was actually UTF-8. During this import each multibyte character was interpreted as a sequence of single-byte characters, and each of those was then encoded using UTF-8 (again).

This double-encoding created anomalies like Ã± instead of ñ.
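
To make the mechanism concrete, here is a minimal Java sketch that reproduces the anomaly outside the database. The class and method names are made up for illustration, and ISO-8859-1 stands in for MySQL's latin1, which is only an approximation.

```java
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {
    // Simulates the faulty import: the UTF-8 bytes of the text are misread
    // as latin1, so every byte turns into a character of its own.
    static String misreadAsLatin1(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00F1";                 // n with tilde, one character
        String garbled = misreadAsLatin1(original); // U+00C3 U+00B1, two characters
        // Storing the garbled text in a UTF-8 column encodes each of the
        // two characters again, which doubles the byte count.
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Running it prints 2 and then 4: the single character ñ (bytes C3 B1) ends up stored as the two characters Ã and ± (bytes C3 83 C2 B1).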

How to correct these strings?

Mysql Solutions


Solution 1 - Mysql

The following MySQL function will return the correct utf8 string after double-encoding:

CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8)

It can be used with an UPDATE statement to correct the fields:

UPDATE tablename SET
    field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);

Solution 2 - Mysql

The above answer worked for some of my data, but left a lot of NULL values after running. My guess is that the conversion returns NULL when it is not successful, for example when the value was never double-encoded. To avoid that, I added a small check that keeps the original value in that case.

UPDATE
	tbl

SET
	col =
	CASE
		WHEN CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8) IS NULL THEN col
		ELSE CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8)
	END
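
The same "repair only if the result is valid" idea can be sketched in Java for data handled outside the database. This is an illustration under assumptions, not MySQL's exact behavior: the class and method names are hypothetical, and ISO-8859-1 is used as a stand-in for MySQL's latin1.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class SafeRepair {
    // Returns the repaired string, or the input unchanged when the
    // latin1 -> bytes -> UTF-8 round trip is not valid for this value.
    static String repairOrKeep(String value) {
        try {
            // Fails if the value contains characters outside ISO-8859-1.
            ByteBuffer bytes = StandardCharsets.ISO_8859_1.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(value));
            // Fails if the resulting bytes are not well-formed UTF-8.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes)
                    .toString();
        } catch (CharacterCodingException e) {
            return value; // same role as the "THEN col" branch above
        }
    }
}
```

With this sketch a double-encoded "Ã±" comes back as "ñ", while an already correct "ñ" is returned untouched, because its single latin1 byte is not valid UTF-8 on its own.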

Solution 3 - Mysql

I ran into this issue too. Here is a solution for Oracle:

update tablename t set t.colname = convert(t.colname, 'WE8ISO8859P1', 'UTF8') where t.colname like '%Ã%'

And another one for Java:

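import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public static String fixDoubleEncoded(String text) {
	// "Ã" followed by a character that is not alphanumeric, a space or a tab
	// is a typical trace of double encoding. find() also works on multi-line text.
	final Pattern pattern = Pattern.compile("Ã[^0-9a-zA-Z \t]");
	// Loop in case the text was mis-encoded more than once.
	while (pattern.matcher(text).find())
		text = new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
	return text;
}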
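
One caveat with the ISO-8859-1 round trip: MySQL's latin1 is actually cp1252, so characters whose UTF-8 bytes fall in the 0x80-0x9F range (the euro sign, for example) come out of MySQL as characters that ISO-8859-1 cannot encode. A possible variant, assuming the JVM provides the windows-1252 charset (it is not one of the guaranteed charsets) and using a hypothetical class name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252Repair {
    // Assumption: this JVM ships windows-1252; forName throws otherwise.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Undoes one level of double encoding using cp1252 instead of ISO-8859-1.
    static String fixOnce(String text) {
        return new String(text.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A double-encoded euro sign: bytes E2 82 AC read as cp1252.
        String garbled = "\u00E2\u201A\u00AC";
        String viaCp1252 = fixOnce(garbled);
        String viaLatin1 = new String(
                garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        System.out.println(viaCp1252.equals("\u20AC")); // cp1252 round trip
        System.out.println(viaLatin1.equals("\u20AC")); // ISO-8859-1 round trip
    }
}
```

The first line printed is true and the second is false, since ISO-8859-1 has no byte for U+201A and substitutes a question mark. Java's windows-1252 leaves five byte values undefined, so this is still not a perfect match for MySQL's latin1.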

Solution 4 - Mysql

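It is very important to use utf8mb4 instead of utf8. MySQL's utf8 stores at most three bytes per character, so when the repaired text contains a four-byte character (an emoji, for example), MySQL will strip out all the data after that unrecognized character. So the safer method is: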

UPDATE tablename SET
field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8mb4);

Be careful about this.
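
To check in advance whether a value would be affected, the following Java sketch tests for characters that need four bytes in UTF-8, which is exactly what MySQL's three-byte utf8 cannot store. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf8Width {
    // True if any code point lies outside the Basic Multilingual Plane,
    // i.e. takes four bytes in UTF-8 and therefore needs utf8mb4 in MySQL.
    static boolean needsUtf8mb4(String text) {
        return text.codePoints().anyMatch(cp -> cp > 0xFFFF);
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600, a surrogate pair in Java
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(needsUtf8mb4(emoji));
        System.out.println(needsUtf8mb4("\u00F1")); // n with tilde, two bytes
    }
}
```

This prints 4, true and false. Values for which the check returns true are the ones at risk when converting with utf8 instead of utf8mb4.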

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | vbence | View Question on Stackoverflow
Solution 1 - Mysql | vbence | View Answer on Stackoverflow
Solution 2 - Mysql | Eric | View Answer on Stackoverflow
Solution 3 - Mysql | Stéphane Millien | View Answer on Stackoverflow
Solution 4 - Mysql | burkul | View Answer on Stackoverflow