How much UTF-8 text fits in a MySQL "Text" field?

MySQL, UTF-8

Mysql Problem Overview


According to MySQL, a text column holds 65,535 bytes.

So if this is a legitimate boundary, will it really fit only about 32k UTF-8 characters? Or is this one of those "fuzzy" boundaries where the people who wrote the docs can't tell characters from bytes, and the column will actually allow ~64k UTF-8 characters if it is set to something like utf8_general_ci?

Mysql Solutions


Solution 1 - Mysql

A text column can be up to 65,535 bytes.

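A character in MySQL's utf8 character set can be up to 3 bytes.

So if every character needs 3 bytes, your actual limit is about 21,845 characters (65,535 / 3); the manual's figure for a utf8 VARCHAR is 21,844.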

See the manual for more info: http://dev.mysql.com/doc/refman/5.1/en/string-type-overview.html

> A variable-length string. M represents the maximum column length in characters. The range of M is 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. For example, utf8 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8 character set can be declared to be a maximum of 21,844 characters.
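The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the names are hypothetical, and the 2-byte length prefix subtracted for VARCHAR is an assumption based on MySQL's documented storage requirements, not something stated in this answer. Note that dividing the plain 65,535-byte limit by 3 gives 21,845, one more than the VARCHAR figure in the manual.

```python
MAX_BYTES = 65_535  # byte limit of a TEXT value, and also the maximum row size


def worst_case_chars(byte_budget: int, bytes_per_char: int) -> int:
    """Characters guaranteed to fit if every character has the maximum width."""
    return byte_budget // bytes_per_char


# TEXT: the whole 65,535 bytes are available for character data.
text_utf8 = worst_case_chars(MAX_BYTES, 3)

# VARCHAR: assumed here to lose 2 bytes of the row to its length prefix.
varchar_utf8 = worst_case_chars(MAX_BYTES - 2, 3)

print(text_utf8, varchar_utf8)
```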

Solution 2 - Mysql

UTF-8 characters can take up to 4 bytes each, not 2 as you are supposing. UTF-8 (http://en.wikipedia.org/wiki/Utf-8) is a variable-width encoding: how many bytes a character takes depends on the number of significant bits in its Unicode code point:

  • 7 bits and under in the Unicode code point: 1 byte in UTF-8
  • 8 to 11 bits: 2 bytes in UTF-8
  • 12 to 16 bits: 3 bytes
  • 17 to 21 bits: 4 bytes
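The table above translates almost directly into code. The following Python sketch (the function name `utf8_length` is made up for this example) maps a code point's bit length to its UTF-8 byte count and cross-checks the result against Python's own encoder for a few sample characters.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for a single code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    bits = code_point.bit_length()  # number of significant bits
    if bits <= 7:
        return 1
    if bits <= 11:
        return 2
    if bits <= 16:
        return 3
    return 4


for ch in ("A", "é", "€", "😀"):
    # Cross-check the table against Python's built-in UTF-8 encoder.
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
```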

The original UTF-8 spec (https://www.ietf.org/rfc/rfc2279.txt) allows encoding up to 31-bit Unicode values, taking as many as 6 bytes to encode in UTF-8 form. After UTF-8 became popular, the Unicode Consortium declared that they will never use code points beyond 2^21 - 1 (the actual ceiling is U+10FFFF). This is now standardized as RFC 3629 (https://www.rfc-editor.org/rfc/rfc3629).

MySQL's utf8 character set currently (i.e. version 5.6; see http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-utf8.html) only supports the Unicode Basic Multilingual Plane (https://secure.wikimedia.org/wikipedia/en/wiki/Basic_Multilingual_Plane) characters, for which UTF-8 needs up to 3 bytes per character. That means the current answer to your question is that your TEXT field can hold at least 21,844 characters.

Depending on how you look at it, the actual limits are higher or lower than that:

  • If you assume, as I do, that the BMP limitation will eventually be lifted in MySQL or one of its forks (such as Drizzle, http://www.drizzle.org/, or MariaDB, http://kb.askmonty.org/en/mariadb-faq/), you shouldn't count on being able to store more than 16,383 characters in that field if your MySQL client allows arbitrary Unicode text input (65,535 / 4, since characters outside the BMP need 4 bytes each).

  • On the other hand, you may be able to exploit the fact that UTF-8 is a variable width encoding. If you know your text is mostly plain English with just the occasional non-ASCII character, your effective in-practice limit could approach the maximum 64 KB - 1 character limit.
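Both ends of that range can be checked by measuring the encoded length of a string rather than its character count. The Python sketch below is a client-side approximation, not MySQL's own check; `fits_in_text` and `TEXT_MAX_BYTES` are hypothetical names, and the emoji sample assumes a column whose character set can store 4-byte characters at all.

```python
TEXT_MAX_BYTES = 65_535


def fits_in_text(value: str, encoding: str = "utf-8") -> bool:
    """True if the encoded value stays within the TEXT byte limit."""
    return len(value.encode(encoding)) <= TEXT_MAX_BYTES


ascii_text = "a" * 65_535    # 1 byte per character
bmp_text = "€" * 21_845      # 3 bytes per character
emoji_text = "😀" * 16_383    # 4 bytes per character

for sample in (ascii_text, bmp_text, emoji_text):
    print(len(sample), "characters fit:", fits_in_text(sample))
```

Adding one more character to any of the three samples pushes it over the byte limit, which is why the safe character count depends entirely on what kind of text you expect.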

Solution 3 - Mysql

However, when such a column is used as a primary key, MySQL assumes that each character of the column's declared length adds 3 bytes to the key.

mysql> alter table test2 modify code varchar(333) character set utf8;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table test2 modify code varchar(334) character set utf8;
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
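The boundary between 333 and 334 follows from simple multiplication, as this small Python sketch shows. The constant and function names are made up for illustration; the 1000-byte limit is taken from the error message above, and the 3 bytes per character is the worst case for the utf8 character set.

```python
MAX_KEY_BYTES = 1000  # limit reported in the error message
BYTES_PER_CHAR = 3    # worst case reserved per utf8 character


def key_bytes(declared_length: int) -> int:
    """Bytes reserved in the key for a utf8 column of the given declared length."""
    return declared_length * BYTES_PER_CHAR


for n in (333, 334):
    verdict = "ok" if key_bytes(n) <= MAX_KEY_BYTES else "too long"
    print(f"varchar({n}): {key_bytes(n)} bytes -> {verdict}")
```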

Well, using long string columns as a primary key is generally a bad practice; however, I came across this problem when working with the database of a commercial (!) product.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Xeoncross | View Question on Stackoverflow
Solution 1 - Mysql | Wolph | View Answer on Stackoverflow
Solution 2 - Mysql | Warren Young | View Answer on Stackoverflow
Solution 3 - Mysql | Danubian Sailor | View Answer on Stackoverflow