Does Order of Fields of Multi-Column Index in MySQL Matter

MysqlSqlPerformanceIndexing

Mysql Problem Overview


I know the importance of indexes and how order of joins can change performance. I've done a bunch of reading related to multi-column indexes and haven't found the answer to my question.

I'm curious if I do a multi-column index, if the order that they are specified matters at all. My guess is that it would not, and that the engine would treat them as a group, where ordering doesn't matter. But I wish to verify.

For example, from mysql's website (http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html)

CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name,first_name)
);

Would there be any benifit in any cases where the following would be better, or is it equivalent?

CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (first_name,last_name)
);

Specificially:

INDEX name (last_name,first_name)

vs

INDEX name (first_name,last_name)

Mysql Solutions


Solution 1 - Mysql

When discussing multi-column indexes, I use an analogy to a telephone book. A telephone book is basically an index on last name, then first name. So the sort order is determined by which "column" is first. Searches fall into a few categories:

  1. If you look up people whose last name is Smith, you can find them easily because the book is sorted by last name.

  2. If you look up people whose first name is John, the telephone book doesn't help because the Johns are scattered throughout the book. You have to scan the whole telephone book to find them all.

  3. If you look up people with a specific last name Smith and a specific first name John, the book helps because you find the Smiths sorted together, and within that group of Smiths, the Johns are also found in sorted order.

If you had a telephone book sorted by first name then by last name, the sorting of the book would assist you in the above cases #2 and #3, but not case #1.

That explains cases for looking up exact values, but what if you're looking up by ranges of values? Say you wanted to find all people whose first name is John and whose last name begins with 'S' (Smith, Saunders, Staunton, Sherman, etc.). The Johns are sorted under 'J' within each last name, but if you want all Johns for all last names starting with 'S', the Johns are not grouped together. They're scattered again, so you end up having to scan through all the names with last name starting with 'S'. Whereas if the telephone book were organized by first name then by last name, you'd find all the Johns together, then within the Johns, all the 'S' last names would be grouped together.

So the order of columns in a multi-column index definitely matters. One type of query may need a certain column order for the index. If you have several types of queries, you might need several indexes to help them, with columns in different orders.

You can read my presentation How to Design Indexes, Really for more information.

Solution 2 - Mysql

The two indexes are different. This is true in MySQL and in other databases. MySQL does a pretty good job of explaining the different in the documentation.

Consider the two indexes:

create index idx_lf on name(last_name, first_name);
create index idx_fl on name(first_name, last_name);

Both of these should work equally well on:

where last_name = XXX and first_name = YYY

idx_lf will be optimal for the following conditions:

where last_name = XXX
where last_name like 'X%'
where last_name = XXX and first_name like 'Y%'
where last_name = XXX order by first_name

idx_fl will be optimal for the following:

where first_name = YYY
where first_name like 'Y%'
where first_name = YYY and last_name like 'X%'
where first_name = XXX order by last_name

For many of these cases, both indexes could possibly be used, but one is optimal. For instance, consider idx_lf with the query:

where first_name = XXX order by last_name

MySQL could read the entire table using idx_lf and then do the filtering after the order by. I don't think this is an optimization option in practice (for MySQL), but that can happen in other databases.

Solution 3 - Mysql

The general rule is that in a multi-column index, you want to put the most selective -- that is, the one that will give you fewest results -- first. So if you are creating a multiple-column index on a table with a status column of say 10 possible values, and also a dateAdded column, and you're typically writing queries like

SELECT * FROM myTable WHERE status='active' and dateAdded='2010-10-01'

...then you'd want dateAdded first, because that would limit the scan to just a few rows rather than 10% (or whatever proportion are 'active') of your rows.

This takes a fair bit of thought and tuning; you should check out the Lahdenmaki and Leach book.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJames OravecView Question on Stackoverflow
Solution 1 - MysqlBill KarwinView Answer on Stackoverflow
Solution 2 - MysqlGordon LinoffView Answer on Stackoverflow
Solution 3 - MysqlGraham CharlesView Answer on Stackoverflow