Why does a higher LIMIT offset slow down a MySQL query?

Tags: Mysql, Performance, Sql Order-By, Limit

Mysql Problem Overview


Scenario in short: a table with more than 16 million records (2 GB in size). The higher the LIMIT offset in a SELECT, the slower the query becomes when using ORDER BY primary_key.

So

SELECT * FROM large ORDER BY `id`  LIMIT 0, 30 

takes far less time than

SELECT * FROM large ORDER BY `id` LIMIT 10000, 30 

That only orders 30 records either way, so it's not the overhead of ORDER BY.
Now, when fetching the latest 30 rows, it takes around 180 seconds. How can I optimize this simple query?

Mysql Solutions


Solution 1 - Mysql

I had exactly the same problem. Given that you want to collect a large amount of this data, and not one specific set of 30, you'll probably be running a loop and incrementing the offset by 30.

So what you can do instead is:

  1. Remember the last id of each batch of 30 rows (e.g. lastId = 530).
  2. Fetch the next batch with WHERE id > lastId ORDER BY id LIMIT 0, 30.

This way the offset is always zero. You will be amazed by the performance improvement.
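The loop described above can be sketched in Python, assuming the built-in sqlite3 module as a stand-in for MySQL (the SQL has the same shape). The `payload` column and the helper names `make_db`, `fetch_all_with_offset` and `fetch_all_with_last_id` are hypothetical; the sketch only shows that both loops return the same rows, it does not measure speed.

```python
import sqlite3


def make_db(n_rows):
    # Hypothetical stand-in for the `large` table; SQLite is used instead of MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE large (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO large (id, payload) VALUES (?, ?)",
        ((i, f"row {i}") for i in range(1, n_rows + 1)),
    )
    return conn


def fetch_all_with_offset(conn, batch=30):
    # Slow pattern: the offset grows by `batch` each time, so every query
    # has to step over all previously fetched rows again.
    rows, offset = [], 0
    while True:
        page = conn.execute(
            "SELECT id, payload FROM large ORDER BY id LIMIT ?, ?", (offset, batch)
        ).fetchall()
        if not page:
            return rows
        rows.extend(page)
        offset += batch


def fetch_all_with_last_id(conn, batch=30):
    # Pattern from this answer: remember the last id and keep the offset at zero.
    rows, last_id = [], 0  # assumes all ids are positive integers
    while True:
        page = conn.execute(
            "SELECT id, payload FROM large WHERE id > ? ORDER BY id LIMIT 0, ?",
            (last_id, batch),
        ).fetchall()
        if not page:
            return rows
        rows.extend(page)
        last_id = page[-1][0]  # id of the last row in this batch


conn = make_db(1000)
print(fetch_all_with_offset(conn) == fetch_all_with_last_id(conn))  # True
```

The `ORDER BY id` in the second query matters: without it, the database may return any 30 rows that satisfy `id > ?`, and the remembered last id would no longer mark a position in the sequence.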

Solution 2 - Mysql

It's normal for higher offsets to slow the query down, since the query needs to count off the first OFFSET + LIMIT records (and keep only the last LIMIT of them). The higher this value, the longer the query runs.

The query cannot go straight to OFFSET because, first, the records can be of different lengths and, second, there can be gaps from deleted records. It needs to check and count each record on its way.

Assuming that id is the primary key of a MyISAM table, or a unique non-primary key field on an InnoDB table, you can speed it up by using this trick:

SELECT  t.* 
FROM    (
        SELECT  id
        FROM    mytable
        ORDER BY
                id
        LIMIT 10000, 30
        ) q
JOIN    mytable t
ON      t.id = q.id

See this article:
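A sketch of the deferred-join query, again using Python's sqlite3 module as a stand-in for MySQL. To mirror the answer's InnoDB case, `id` is declared as a unique non-primary-key column (SQLite then builds a separate index for it automatically); the wide `body` column is a hypothetical addition. The sketch checks only that both queries return the same rows; it does not measure speed. An outer `ORDER BY t.id` is added because a join on its own does not guarantee output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `id` is UNIQUE but not the PRIMARY KEY, so SQLite keeps a separate index on it.
conn.execute("CREATE TABLE mytable (id INTEGER NOT NULL UNIQUE, body TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, body) VALUES (?, ?)",
    ((i, "x" * 500) for i in range(1, 20001)),
)

# Plain offset: the engine may read full rows for the skipped positions too.
plain = conn.execute("SELECT * FROM mytable ORDER BY id LIMIT 10000, 30").fetchall()

# Deferred join: the inner query can be answered from the index on id alone,
# then the outer join fetches only the 30 full rows it needs.
deferred = conn.execute(
    """
    SELECT t.*
    FROM (SELECT id FROM mytable ORDER BY id LIMIT 10000, 30) AS q
    JOIN mytable t ON t.id = q.id
    ORDER BY t.id
    """
).fetchall()

print(plain == deferred)  # True
print(deferred[0][0], deferred[-1][0])  # 10001 10030
```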

Solution 3 - Mysql

MySQL cannot go directly to the 10000th record (or the 80000th byte, as you're suggesting) because it cannot assume that the data is packed/ordered like that (or that it has contiguous values from 1 to 10000). Although it might be that way in actuality, MySQL cannot assume that there are no holes/gaps/deleted ids.

So, as bobs noted, MySQL will have to fetch 10000 rows (or traverse the first 10000 entries of the index on id) before finding the 30 to return.

EDIT: To illustrate my point:

Note that although

SELECT * FROM large ORDER BY id LIMIT 10000, 30 

would be slow(er),

SELECT * FROM large WHERE id >  10000 ORDER BY id LIMIT 30 

would be fast(er), and would return the same results provided that the ids start at 1 and there are no missing ids (i.e. gaps).
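The effect of a gap can be checked with a small sketch, using Python's sqlite3 module as a stand-in for MySQL; the data and the deleted id range 500 to 599 are hypothetical. Before the delete both queries return ids 10001 to 10030; afterwards the offset query lands 100 ids further on, because 100 fewer rows precede it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large (id INTEGER PRIMARY KEY)")
# Hypothetical data: ids 1..20000 with no gaps.
conn.executemany("INSERT INTO large (id) VALUES (?)", ((i,) for i in range(1, 20001)))


def both_queries():
    # The offset form and the WHERE form from this answer.
    by_offset = conn.execute("SELECT id FROM large ORDER BY id LIMIT 10000, 30").fetchall()
    by_id = conn.execute(
        "SELECT id FROM large WHERE id > 10000 ORDER BY id LIMIT 30"
    ).fetchall()
    return by_offset, by_id


off_before, id_before = both_queries()
print(off_before == id_before)  # True

# Punch a hole: delete 100 ids that come before position 10000.
conn.execute("DELETE FROM large WHERE id BETWEEN 500 AND 599")
off_after, id_after = both_queries()
print(off_after == id_after)  # False
print(off_after[0][0], id_after[0][0])  # 10101 10001
```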

Solution 4 - Mysql

I found an interesting way to optimize SELECT queries with ORDER BY id LIMIT X,Y. I have 35 million rows, so it took about 2 minutes to find a range of rows.

Here is the trick :

select id, name, address, phone
FROM customers
WHERE id > 990
ORDER BY id LIMIT 1000;

Just adding a WHERE on the last id you got increases performance a lot. For me it went from 2 minutes to 1 second :)

Other interesting tricks here : http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/

It also works with strings.
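When paging by a string column, a sketch like the following may help. It assumes a hypothetical `customers` table in SQLite (via Python's sqlite3 module, standing in for MySQL). If the string column is unique, `WHERE name > ?` is enough; if it is not, this sketch adds `id` as a tie-breaker so that rows sharing a name are not skipped at a page boundary. The last query shows what can go wrong without the tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Composite index so rows can be read in (name, id) order.
conn.execute("CREATE INDEX customers_name_id ON customers (name, id)")
names = ["alice", "bob", "bob", "carol", "dave", "dave", "dave", "erin"]
conn.executemany("INSERT INTO customers (name) VALUES (?)", ((n,) for n in names))


def next_page(last_name, last_id, size):
    # (name, id) is unique even when name is not, so each row is seen exactly once.
    return conn.execute(
        """
        SELECT id, name FROM customers
        WHERE name > ? OR (name = ? AND id > ?)
        ORDER BY name, id
        LIMIT ?
        """,
        (last_name, last_name, last_id, size),
    ).fetchall()


pages = []
last_name, last_id = "", 0  # start before every row (assumes positive ids)
while True:
    page = next_page(last_name, last_id, 3)
    if not page:
        break
    pages.append(page)
    last_id, last_name = page[-1]

flat = [row for page in pages for row in page]
print(len(flat))  # 8

# Without the tie-breaker: resuming after (6, "dave") with name > 'dave' skips (7, "dave").
naive = conn.execute(
    "SELECT id, name FROM customers WHERE name > ? ORDER BY name, id LIMIT 3", ("dave",)
).fetchall()
print(naive)  # [(8, 'erin')]
```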

Solution 5 - Mysql

The time-consuming part of the two queries is retrieving the rows from the table. Logically speaking, in the LIMIT 0, 30 version, only 30 rows need to be retrieved. In the LIMIT 10000, 30 version, the first 10000 rows are evaluated and skipped, and the next 30 rows are returned. Some optimization may be possible in the data-reading process, but consider the following:

What if you had a WHERE clause in the queries? The engine must find all rows that qualify, sort them, and only then pick the 30 rows.

Also consider the case where rows are not processed in the ORDER BY sequence. All qualifying rows must be sorted to determine which rows to return.
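One way to see the sort described above is SQLite's EXPLAIN QUERY PLAN, used here through Python's sqlite3 module as a stand-in for MySQL (MySQL reports the same situation as `Using filesort` in the Extra column of EXPLAIN). The `posts` table and its `author`/`score` columns are hypothetical. Without a suitable index the plan includes a separate sort step; with an index on (author, score), qualifying rows can be read already in ORDER BY sequence, although the engine still has to step over the skipped entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author INTEGER, score INTEGER)")


def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable `detail` string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM posts WHERE author = 7 ORDER BY score LIMIT 10000, 30"

before = plan(query)
print(before)  # includes a "USE TEMP B-TREE FOR ORDER BY" step: all matches get sorted

conn.execute("CREATE INDEX posts_author_score ON posts (author, score)")
after = plan(query)
print(after)  # searches posts_author_score; no separate sort step
```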

Solution 6 - Mysql

For those who are interested in a comparison and figures :)

Experiment 1: The dataset contains about 100 million rows. Each row contains several BIGINT and TINYINT columns, as well as two TEXT columns (deliberately) containing about 1k chars.

  • Blue := SELECT * FROM post ORDER BY id LIMIT {offset}, 5
  • Orange := @Quassnoi's method. SELECT t.* FROM (SELECT id FROM post ORDER BY id LIMIT {offset}, 5) AS q JOIN post t ON t.id = q.id
  • Of course, the third method, ... WHERE id > xxx LIMIT 0,5, does not appear here, since its time should stay roughly constant regardless of the offset.

Experiment 2: Same setup, except that each row has only 3 BIGINTs.

  • Green := the Blue query above
  • Red := the Orange query above
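For readers who want to reproduce the shape of this comparison, here is a scaled-down sketch using Python's sqlite3 module in place of MySQL. The row count, the column names `n`, `a`, `b`, and the text sizes are assumptions, so absolute timings will not resemble the figures above. `id` is declared UNIQUE (not PRIMARY KEY) so that the inner query of the second method can use a separate index. The third method assumes ids are contiguous from 1, so that it returns the same rows as the other two.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Scaled-down, hypothetical version of the `post` table: an id, a BIGINT, two TEXT columns.
conn.execute("CREATE TABLE post (id INTEGER NOT NULL UNIQUE, n BIGINT, a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?, ?)",
    ((i, i * 7, "a" * 500, "b" * 500) for i in range(1, 20001)),
)


def blue(offset):
    # Plain LIMIT offset, 5.
    return conn.execute("SELECT * FROM post ORDER BY id LIMIT ?, 5", (offset,)).fetchall()


def orange(offset):
    # Deferred join: page over ids first, then fetch the full rows.
    return conn.execute(
        """
        SELECT t.* FROM (SELECT id FROM post ORDER BY id LIMIT ?, 5) AS q
        JOIN post t ON t.id = q.id
        ORDER BY t.id
        """,
        (offset,),
    ).fetchall()


def seek(offset):
    # Third method: WHERE id > x with a zero offset (assumes ids 1..N, no gaps).
    return conn.execute(
        "SELECT * FROM post WHERE id > ? ORDER BY id LIMIT 0, 5", (offset,)
    ).fetchall()


def timed(fn, offset, repeat=5):
    # Average wall-clock time of `repeat` calls, plus the rows from the last call.
    start = time.perf_counter()
    for _ in range(repeat):
        rows = fn(offset)
    return rows, (time.perf_counter() - start) / repeat


for offset in (0, 5_000, 10_000, 19_000):
    results = [timed(fn, offset) for fn in (blue, orange, seek)]
    # All three methods must agree before their timings are worth comparing.
    assert results[0][0] == results[1][0] == results[2][0]
    print(
        f"offset={offset:>6}  "
        + "  ".join(
            f"{fn.__name__}={secs * 1000:.3f} ms"
            for fn, (_, secs) in zip((blue, orange, seek), results)
        )
    )
```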

[Chart comparing query time across offsets for the methods above; image not available]

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Rahman | View Question on Stackoverflow
Solution 1 - Mysql | Nikos Kyr | View Answer on Stackoverflow
Solution 2 - Mysql | Quassnoi | View Answer on Stackoverflow
Solution 3 - Mysql | Riedsio | View Answer on Stackoverflow
Solution 4 - Mysql | sym | View Answer on Stackoverflow
Solution 5 - Mysql | bobs | View Answer on Stackoverflow
Solution 6 - Mysql | ch271828n | View Answer on Stackoverflow