When to use NULL in MySQL tables

MysqlNull

Mysql Problem Overview


I appreciate the semantic meaning of a NULL value in a database table, different from both false and the empty string ''. However, I have often read about performance problems when fields are nullable and been advised to use an empty string in cases where NULL is actually semantically correct.

What circumstances are appropriate to use nullable fields and NULL values? What are the trade-offs? Is it sensible to simply avoid using NULLs altogether and simply use empty strings, false or 0 to indicate the absence of a value?

UPDATE

OK - I understand the semantic difference between '' and NULL as well as the (performance-agnostic) circumstances in which NULL is the appropriate field value. However, let me expand on the hinted performance issue. This is from the excellent "High Performance MySQL" by Schwartz, Zeitsev et al http://www.borders.co.uk/book/high-performance-mysql-optimization-backups-replication-and-more/857673/:

> It's harder for MySQL to optimize > queries that refer to nullable coumns, > because they make indexes, index > statistics, and value comparisons more > complicated. A nullable column uses > more storage space and requires > special processing inside MySQL. When > a nullable column is indexed, it > requires an extra byte per entry and > can even cause a fixed-size inded > (such as an index on a single integer > column) to be converted to a > variable-sized one in MyISAM.

More here: Google books preview

This is quite possibly the definitive answer - I was just looking for second opinions and experience from the front-line.

Mysql Solutions


Solution 1 - Mysql

> However, I have often read about > performance problems when fields are > nullable and been advised to use an > empty string in cases where NULL is > actually semantically correct.

I'm going to be nit-picky about word choice for a moment:

  • Even if it were a significant performance factor, that doesn't make it semantically correct to use a value instead of NULL. In SQL, NULL has a semantic role, to denote a missing or inapplicable value. The performance characteristics of NULL in a given RDBMS implementation are independent of this. The performance may vary from brand to brand or from version to version, but the purpose of NULL in the language is consistent.

In any case, I have not heard of any evidence that NULL performs poorly. I'd be interested in any references to performance measurements that show nullable columns perform worse than non-nullable columns.

I'm not saying I'm not wrong or that it can't be true in some cases -- just that it's not meaningful to make idle suppositions. Science is not made up of conjecture; one has to show evidence with repeatable measurements.

Metrics also tell you by how much the performance differs, so you can make a judgment about whether it's something to worth worrying about. That is, the impact could be measurable and nonzero, but still insignificant compared to greater performance factors, such as properly indexing tables or sizing your database cache.

In MySQL, searches for NULL can benefit from an index:

mysql> CREATE TABLE foo (
  i INT NOT NULL,
  j INT DEFAULT NULL,
  PRIMARY KEY (i),
  UNIQUE KEY j_index (j)
);

mysql> INSERT INTO foo (i, j) VALUES 
  (1, 1), (2, 2), (3, NULL), (4, NULL), (5, 5);

mysql> EXPLAIN SELECT * FROM foo WHERE i = 3;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | foo   | const | PRIMARY       | PRIMARY | 4       | const |    1 |       | 
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+

mysql> EXPLAIN SELECT * FROM foo WHERE j IS NULL;
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | foo   | ref  | j_index       | j_index | 5       | const |    2 | Using where | 
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+

Note that's still not a measurement of performance. I've only shown that you can use an index while searching for NULL. I'm going to assert (admittedly without having measured, but hey this is just StackOverflow) that the benefit of an index overshadows any possible penalty when searching for NULL versus a blank string.

It's not a correct design decision to choose zero or blank or any other value to substitute for NULL. You may need to use those values as significant in the column. That's why NULL exists, as a value that is by definition outside the domain of values of any data type, so you can use the full range of values of integers or strings or whatever and still have something to signify "none of the above values."

Solution 2 - Mysql

The MySQL manual actually has a nice article about the problems with NULL.

Hope it helps.

Also found this other SO post about NULL and Performance

Solution 3 - Mysql

We don't allow NULL values in our databases unless it's for numeric values, or for dates. The reason why we do this is because numeric values sometimes should not be defaulted to zero as this is very, very bad. I'm a developer for a stock brokers and there's a big, big difference between NULL and 0. The use of COALESCE comes in handy if we do want to default values back to zero even though we don't store them as such.

MyVal = COALESCE(TheData, 0)

As we do bulk inserts of data from flat files we use format files to determine the entry of the data which automagically converts empty values into blank strings anyway.

Dates default to whatever value may appear dependant on the collation I believe, but ours default to something like 1900, and again, dates are extremely important. Other plain text values aren't so important, and if left blank typically qualify as okay.

Solution 4 - Mysql

Generally, if an attribute is required, it is defined as Not NULL and if it may be omitted it is defined as nullable.

Solution 5 - Mysql

The empty string should not be used in place of NULL. NULL represents nothing where as the empty string is something, with nothing inside. NULL will always be false when compared to another value (even NULL) and NULL will not be summed in the COUNT function.

If you need to represent unknown information there is no substitute to NULL.

Solution 6 - Mysql

As @ForYourOwnGood said - Null should be used for "unknown" information. For example: If you have a lot of fields that customer should fill-in on registration and some of them are optional. For some reason you might want to reserve an ID for that particular customer and since you don't know if the optional fields are a real choice by the customer to be left empty you should set them as NULL i.e. "unknown" when you first save the row. If the customer submits the form, passes all your validation and then you save the information, then you know for fact that the optional field is left empty by intention.

This is just a good case of using NULL.

Solution 7 - Mysql

The meaning of a NULL column is more or less "doesn't apply in this context". I generally use NULL columns in two cases:

  • If the field doesn't apply (let's say you have a boolean column is_thirsty and you add two datasets. One human and a stone. In case of the human, you set is_thirsty to either true or false, whereas in the case of the stone, you'd probably set it to NULL.
  • If I need to flag something and store some data with the value. Like an inventory close date, which you'd use to a) specify that the inventory cannot be changed any more and b) to specify when the inventory was closed. Instead of having two columns (closed_at and is_closed), I just create the closed_at column and set it to NULL if the inventory set can still be changed, but set the date once it's closed.

Basically it boils down to the fact that I use NULL when the emptyness of a field has a different unique semantic than just an empty field. The absence of a middle initial is just that. The absence of a closing date has the meaning of the inventory set still being open to changes.

NULL values can have nasty side effects and they will make life harder for you to add data to the table and more often than not, you can end up with a mish-mash of NULL values and empty strings for example.

Also, NULL is not equal to anything, which will screw queries all over the place if you are not very careful.

Personally, I use NULL columns only when one of the above two cases applies. I never use it to signify empty fields when the emptyness has no meaning other than the absence of a value.

Solution 8 - Mysql

I understand that there are times when MySQL's NULL semantics are entirely appropriate.

That said, they do seriously get in the way, particularly with text fields.

Here's a real world example.

We wish to copy data from a FileMaker database into a mysql table.

if we do "SELECT * from table where textfield <> 'test'", rows that have textfield of NULL will NOT be returned. This is most likely not what you expected or desired.

if a field that is nullable is used in a where query, either alone or as part of an AND, entries that are NULL will NEVER be returned unless the IS NULL test used. We must do something like "where ((textfield<>"test") OR (textfield IS NOT NULL))" which is ugly at best.

So in this case, we probably do not want the field nullable.

The problem here is that you CANNOT insert an empty string into MySQL using Filemaker. It gets converted to NULL, which errors out if you made the column not null! If you do allow NULL, then the transfer to mysql works, but then your not queries fail to act the way you want!

the workaroud is to alter table twice, to convert the existing nulls after the import to empty string, then alter the table back to once again allow null. youch!

damn filemaker.

Solution 9 - Mysql

The main benefit, of course, is the semantic meaning of NULL, which you mentioned.

In addition to that -- and it may depend on your storage engine, as always, check the documentation -- but in at least some databases, NULLs take up a lot less room than a regular value. For example, if you have a "varchar" column declared to be 20 characters, and it's rarely filled in, you can save a lot of disk space by making it NULL instead of an empty string.

I have never heard of any performance issues with using NULLs, one the opposite. I've heard of people mucking up their counts because they counted NULLs wrong, but never performance. If that's a real thing, I would love to hear about it!

Solution 10 - Mysql

Any self-respecting database engine these days should offer no penalty for properly using NULLs, unless your query is not designed correctly (which is usually not a problem you'll have very often with regard to NULLs).

You should pay first attention to using the database (including NULLs) as intended; then worry about the optimizatin consequences when and if they occur.

The cumulative effect of improperly NULLed column values in both SQL complexity and accuracy will almost surely outweigh the benefits of fooling with Mother DBMS. Besides, it will mess up your head, as well as that of anyone later who tries to figure out what you were trying to do.

Solution 11 - Mysql

On some databases like Oracle, may be somethinkg on MySQL is true:

  • Nulls are not indexed, then if looking for null values can be bottleneck.
  • Trailing nulls on rows save space.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDavidWinterbottomView Question on Stackoverflow
Solution 1 - MysqlBill KarwinView Answer on Stackoverflow
Solution 2 - MysqlÓlafur WaageView Answer on Stackoverflow
Solution 3 - MysqlKezzerView Answer on Stackoverflow
Solution 4 - MysqlJim AndersonView Answer on Stackoverflow
Solution 5 - MysqlForYourOwnGoodView Answer on Stackoverflow
Solution 6 - Mysqluser1105491View Answer on Stackoverflow
Solution 7 - MysqlpilifView Answer on Stackoverflow
Solution 8 - MysqlChuck CochemsView Answer on Stackoverflow
Solution 9 - MysqlSquareCogView Answer on Stackoverflow
Solution 10 - MysqldkretzView Answer on Stackoverflow
Solution 11 - MysqlFerranBView Answer on Stackoverflow