MySQL ORDER BY With LIMIT and ANALYZE-wenzk-ChinaUnix博客

ELM's&nbsp;Blogwenzk.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

wenzk

博客访问： 7808244
博文数量： 637
博客积分： 10265
博客等级：上将
技术积分： 6165
用户组：普通用户
注册时间： 2004-12-12 22:00

文章分类

全部博文（637）

程序设计（9）
Solaris系统（22）
数码相机（8）
IT新技术（11）
DNS相关（16）
Office相关（13）
OpenVPN（2）
供电系统（9）
工作相关（79）
BSD杂谈（46）
数据库相关（35）
乱七八糟（31）
Linux系统相关（224）
自娱自乐（19）
邮件系统相关（39）
计算机网络（73）
未分配的博文（1）

文章存档

2011年（1）

2010年（1）

2009年（3）

2008年（12）

2007年（44）

2006年（156）

2005年（419）

2004年（1）

我的朋友

ORDER BY With LIMIT

Until now index usage by MySQL has been straightforward common sense. The following example comes from the first easily fixed slow query:

# Query_time: 39 Lock_time: 0 Rows_sent: 2 Rows_examined: 184753
SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;

Without looking we already know that there's no index on fp_topicid, but I'll show it for completeness:

      mysql> SHOW INDEXES FROM ldu_forum_posts;
      +-----------------+------------+------------+--------------+-------------+-----------+-------------+
      | Table           | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality |
      +-----------------+------------+------------+--------------+-------------+-----------+-------------+
      | ldu_forum_posts |          0 | PRIMARY    |            1 | fp_id       | A         |      187356 |
      | ldu_forum_posts |          1 | fp_updated |            1 | fp_creation | A         |      187356 |
      +-----------------+------------+------------+--------------+-------------+-----------+-------------+
      
      mysql> EXPLAIN SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;
      +-----------------+-------+---------------+---------+---------+------+--------+-------------+
      | table           | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
      +-----------------+-------+---------------+---------+---------+------+--------+-------------+
      | ldu_forum_posts | index | NULL          | PRIMARY |       3 | NULL | 187360 | Using where |
      +-----------------+-------+---------------+---------+---------+------+--------+-------------+

Although we know the solution is index fp_topicid, fp_id, I want to consider why MySQL says no possible keys yet it uses the PRIMARY key and does an index scan instead of a table scan. If the query matches rows on fp_topicid, but this column is not in the PRIMARY key, why is MySQL scanning the PRIMARY key? MySQL is obviously not going to find any fp_topicid = 24701 in the PRIMARY key. The answer is vaguely stated in the MySQL manual section on LIMIT optimization: "If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count lines rather than sorting the whole table." In short: Using ORDER BY with LIMIT causes this type of optimization. To understand why consider how table data is stored:

      On HDD:                                PRIMARY key:
         fp_id   fp_topicid   HDD location      fp_id   fp_topicid   HDD location
         1       7            a                 1       7            a
         4       7            b                 2       7            d
         3       12           c                 3       12           c
         2       7            d                 4       7            b
         5       15           e                 5       15           e

Notice on the hard drive (HDD) data is pretty much in random order; MySQL expects this. "HDD location" is a laughable simplification to show the physical ordering of data on the hard drive. In the PRIMARY key data is sorted by fp_id, which translates to respective HDD locations. This is, of course, the nature of indexes: If you want fp_id = 5 MySQL simply jumps to HDD location 'e'; witout the index MySQL would have to scan HDD locations 'a' through 'd' until arriving at 'e'. Consider our query with just the ORDER BY condition: Either way it goes this causes MySQL to do a full table scan. MySQL could index scan and get every row for every index record and by doing this would at least avoid a filesort because the index is already in sorted ORDER BY fp_id. In effect it's still a table scan though. As proof that adding the LIMIT condition enables the optimization lets table scan and filesort our test data manually:

Query: SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 7 ORDER BY fp_id ASC;
Table scan: a (match 1), b (match 4), c (no), d (match 2), e (no)
Filesort ASC: 1, 4, 2 (a, b, d) => 1, 2, 4 (a, d, b)
Retrieve rows: a, d, b (1, 2, 4)

Wihout LIMIT we must scan all the way to the end of the table, but if we only need the first 2 matches then it becomes efficient to index scan and get the respective rows, and in so doing avoid a filesort:

Query: SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 7 ORDER BY fp_id ASC LIMIT 2;
PRIMARY index scan to table scan: a (match 1), d (match 2)
Retrieve rows: a, d (1, 2)

Same end result: Rows 'a' and 'd' are retrieved first and second. ORDER BY fp_id gives us the disposition to use the PRIMARY key, LIMIT makes it practical to do so.

With that explained, we fix the query by (unique) indexing fp_topicid, fp_id and the result is expected:

      mysql> EXPLAIN SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;
      +-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
      | table           | type | possible_keys | key           | key_len | ref   | rows | Extra                    |
      +-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
      | ldu_forum_posts | ref  | fp_topicid_id | fp_topicid_id |       3 | const |   31 | Using where; Using index |
      +-----------------+------+---------------+---------------+---------+-------+------+--------------------------+

Index ANALYZE Effect

The next slow query and the previous turn out to have someting in common:

# Query_time: 18 Lock_time: 7 Rows_sent: 8 Rows_examined: 575139
SELECT
   p.fp_id,
   t.ft_title, t.ft_id, t.ft_updated, t.ft_postcount, t.ft_lastposterid, t.ft_lastpostername,
   s.fs_id, s.fs_title
FROM
   ldu_forum_posts AS p,
   ldu_forum_topics AS t,
   ldu_forum_sections AS s
WHERE s.fs_minlevel <= 1
AND   p.fp_topicid = t.ft_id
AND   p.fp_sectionid = s.fs_id
GROUP BY t.ft_id
ORDER BY ft_updated DESC
LIMIT 8;

Notice p.fp_topicid = t.ft_id could use the index we just created; MySQL notices this too:

      mysql> EXPLAIN (the above query)
      +-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
      | table | type   | possible_keys | key     | key_len | ref            | rows   | Extra                           |
      +-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
      | p     | ALL    | fp_topicid_id | NULL    |    NULL | NULL           | 188047 | Using temporary; Using filesort |
      | t     | eq_ref | PRIMARY       | PRIMARY |       3 | p.fp_topicid   |      1 | Using where                     |
      | s     | eq_ref | PRIMARY       | PRIMARY |       2 | p.fp_sectionid |      1 | Using where                     |
      +-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+

The two equality references (eq_ref) are great, but the table scan for 'p' is not, and "Using filesort" just makes it worse. I couldn't figure out why MySQL wasn't using this index (and it was still logging this query as slow). Then I remembered what the MySQL manual says about ANALYZE: "If you have a problem with incorrect index usage, you should run ANALYZE TABLE to update table statistics such as cardinality of keys, which can affect the choices the optimizer makes." I ran ANALYZE on all three tables and to our delight MySQL changed:

      mysql> EXPLAIN (the above query)
      +-------+--------+---------------+---------------+---------+----------------+-------+-----------------+
      | table | type   | possible_keys | key           | key_len | ref            | rows  | Extra           |
      +-------+--------+---------------+---------------+---------+----------------+-------+-----------------+
      | t     | index  | PRIMARY       | ft_updated    |       4 | NULL           | 12152 | Using temporary |
      | p     | ref    | fp_topicid_id | fp_topicid_id |       3 | t.ft_id        |    16 | Using where     |
      | s     | eq_ref | PRIMARY       | PRIMARY       |       2 | p.fp_sectionid |     1 | Using where     |
      +-------+--------+---------------+---------------+---------+----------------+-------+-----------------+

A pretty big change too: Everything except for table 's' is different. The obvious benefits of these changes are: No filesort or table scan. There is another very important benefit: 175,277 less rows read. The total number of rows a join will produce (not to be confused with how many the query will actually match and return) is the "product of the number of rows in each table." In our case, before ANALYZE this amounts to 188,047 rows, after ANAZLYE 194,432 rows. Even though the query will now produce about 10,000 more rows it will read 175,000 less rows because of the way MySQL resolves joins. An explanation for this is beyond scope but I wrote about it in . Overall the effect is clear: ANALYZE turned an okay join into a good join (although the MySQL manual will disagree with me, calling the JOIN before ANAZLYE "perfect" and I disagree with it because our post-ANALYZE join is consistently 62% faster when tested on the production server).

Evidence of Improvement

Before I changed anything on this server I recorded a 24 hour snapshot of MySQL performance using . After the above fixes I restarted MySQL, waited 24 hours, and took another snapshot. The results speak for themselves (okay maybe not because I've annotated them; all are percentages):

Slow queries: -99.72
Questions/second: +100 (22 to 44/second average or 1.9M to 3.8M/day)
SELECT full scan: +78
SELECT full join: +136
Table locks waited: -39
Slow launch threads: -100
Connections used: -18 (although more users are connecting)

The server's load was also cut by three-quarters. Ironically, the day the queries were fixed (around 02:00 hours in the morning) was a day the website's primary game was down. On such days the website fills with users. That day set the record for unique visitors and page views.

Final Thoughts

I think this is what makes slow query optimization fun: Subtle optimizations like ORDER BY with LIMIT, sweeping JOIN plan changes, and helping what is fun and entertaining for other people.

阅读(5423) | 评论(0) | 转发(0) |

上一篇：Table Design and MySQL Index Details

下一篇：How To Index For Joins With MySQL

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6