分类: Mysql/postgreSQL
2007-03-11 17:51:39
mysql> SHOW INDEXES FROM ldu_forum_posts;Although we know the solution is index fp_topicid, fp_id, I want to consider why MySQL says no possible keys yet it uses the PRIMARY key and does an index scan instead of a table scan. If the query matches rows on fp_topicid, but this column is not in the PRIMARY key, why is MySQL scanning the PRIMARY key? MySQL is obviously not going to find any fp_topicid = 24701 in the PRIMARY key. The answer is vaguely stated in the MySQL manual section on LIMIT optimization: "If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count lines rather than sorting the whole table." In short: Using ORDER BY with LIMIT causes this type of optimization. To understand why consider how table data is stored:
+-----------------+------------+------------+--------------+-------------+-----------+-------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality |
+-----------------+------------+------------+--------------+-------------+-----------+-------------+
| ldu_forum_posts | 0 | PRIMARY | 1 | fp_id | A | 187356 |
| ldu_forum_posts | 1 | fp_updated | 1 | fp_creation | A | 187356 |
+-----------------+------------+------------+--------------+-------------+-----------+-------------+
mysql> EXPLAIN SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;
+-----------------+-------+---------------+---------+---------+------+--------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-----------------+-------+---------------+---------+---------+------+--------+-------------+
| ldu_forum_posts | index | NULL | PRIMARY | 3 | NULL | 187360 | Using where |
+-----------------+-------+---------------+---------+---------+------+--------+-------------+
On HDD: PRIMARY key:Notice on the hard drive (HDD) data is pretty much in random order; MySQL expects this. "HDD location" is a laughable simplification to show the physical ordering of data on the hard drive. In the PRIMARY key data is sorted by fp_id, which translates to respective HDD locations. This is, of course, the nature of indexes: If you want fp_id = 5 MySQL simply jumps to HDD location 'e'; witout the index MySQL would have to scan HDD locations 'a' through 'd' until arriving at 'e'. Consider our query with just the ORDER BY condition: Either way it goes this causes MySQL to do a full table scan. MySQL could index scan and get every row for every index record and by doing this would at least avoid a filesort because the index is already in sorted ORDER BY fp_id. In effect it's still a table scan though. As proof that adding the LIMIT condition enables the optimization lets table scan and filesort our test data manually:
fp_id fp_topicid HDD location fp_id fp_topicid HDD location
1 7 a 1 7 a
4 7 b 2 7 d
3 12 c 3 12 c
2 7 d 4 7 b
5 15 e 5 15 e
mysql> EXPLAIN SELECT fp_id FROM ldu_forum_posts WHERE fp_topicid = 24701 ORDER BY fp_id ASC LIMIT 2;
+-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
| ldu_forum_posts | ref | fp_topicid_id | fp_topicid_id | 3 | const | 31 | Using where; Using index |
+-----------------+------+---------------+---------------+---------+-------+------+--------------------------+
mysql> EXPLAIN (the above query)The two equality references (eq_ref) are great, but the table scan for 'p' is not, and "Using filesort" just makes it worse. I couldn't figure out why MySQL wasn't using this index (and it was still logging this query as slow). Then I remembered what the MySQL manual says about ANALYZE: "If you have a problem with incorrect index usage, you should run ANALYZE TABLE to update table statistics such as cardinality of keys, which can affect the choices the optimizer makes." I ran ANALYZE on all three tables and to our delight MySQL changed:
+-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
| p | ALL | fp_topicid_id | NULL | NULL | NULL | 188047 | Using temporary; Using filesort |
| t | eq_ref | PRIMARY | PRIMARY | 3 | p.fp_topicid | 1 | Using where |
| s | eq_ref | PRIMARY | PRIMARY | 2 | p.fp_sectionid | 1 | Using where |
+-------+--------+---------------+---------+---------+----------------+--------+---------------------------------+
mysql> EXPLAIN (the above query)A pretty big change too: Everything except for table 's' is different. The obvious benefits of these changes are: No filesort or table scan. There is another very important benefit: 175,277 less rows read. The total number of rows a join will produce (not to be confused with how many the query will actually match and return) is the "product of the number of rows in each table." In our case, before ANALYZE this amounts to 188,047 rows, after ANAZLYE 194,432 rows. Even though the query will now produce about 10,000 more rows it will read 175,000 less rows because of the way MySQL resolves joins. An explanation for this is beyond scope but I wrote about it in . Overall the effect is clear: ANALYZE turned an okay join into a good join (although the MySQL manual will disagree with me, calling the JOIN before ANAZLYE "perfect" and I disagree with it because our post-ANALYZE join is consistently 62% faster when tested on the production server).
+-------+--------+---------------+---------------+---------+----------------+-------+-----------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+---------------+---------------+---------+----------------+-------+-----------------+
| t | index | PRIMARY | ft_updated | 4 | NULL | 12152 | Using temporary |
| p | ref | fp_topicid_id | fp_topicid_id | 3 | t.ft_id | 16 | Using where |
| s | eq_ref | PRIMARY | PRIMARY | 2 | p.fp_sectionid | 1 | Using where |
+-------+--------+---------------+---------------+---------+----------------+-------+-----------------+