分类: Mysql/postgreSQL
2007-03-11 17:53:47
SELECTAnd EXPLAIN for the query:
*
FROM
tblA,
tblB,
tblC
WHERE
tblA.col1 = tblB.col1
AND tblA.col2 = tblC.col1;
+-------+------+---------------+------+---------+------+------+-------------+And finally, from the MySQL manual (7.2.1):
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+------+---------+------+------+-------------+
| tblA | ALL | NULL | NULL | NULL | NULL | 1000 | |
| tblB | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
| tblC | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
+-------+------+---------------+------+---------+------+------+-------------+
+-------+------+---------------+----------+---------+-----------+------+-------------+As we see, MySQL now uses the ndx_col1 key to join tblB to tblA. That is, when MySQL goes looking for rows in tblB, instead of table scanning like it did before, it uses the value of tbbA.col1 with the ndx_col1 key to directly fetch matching rows. This is why the ref column for tblB says "tblA.col1". tblC is still table scanned, but that can be fixed the same way we fixed the join of tblA and tblB, by looking at what MySQL needs: From the part of the query that joins these tables, "tblA.col2 = tblC.col1," we see it needs tblC.col1 because we already have tblA.col2. Indexing this column, no surprise EXPLAIN now says:
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+----------+---------+-----------+------+-------------+
| tblA | ALL | NULL | NULL | NULL | NULL | 1000 | |
| tblB | ref | ndx_col1 | ndx_col1 | 5 | tblA.col1 | 1 | Using where |
| tblC | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
+-------+------+---------------+----------+---------+-----------+------+-------------+
+-------+------+---------------+----------+---------+-----------+------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+----------+---------+-----------+------+-------------+
| tblA | ALL | NULL | NULL | NULL | NULL | 1000 | |
| tblB | ref | ndx_col1 | ndx_col1 | 5 | tblA.col1 | 1 | Using where |
| tblC | ref | ndx_col1 | ndx_col1 | 5 | tblA.col2 | 1 | Using where |
+-------+------+---------------+----------+---------+-----------+------+-------------+
SELECTAt first that seems like a daunting query: 4 tables, an aggregate function, 9 WHERE conditions, and a GROUP BY. The great thing about EXPLAIN is that we can ignore all this for now, and simply approach it two tables at a time as we did before, determining at each step what MySQL needs. This is a real query actually, all tables and columns renamed to protect the identity of its origin. Before I started working it, EXPLAIN said:
COUNT(tblB.a_id) as correct,
tblA.type,
tblA.se_type
FROM
tblA,
tblB,
tblC,
tblD
WHERE
tblA.ex_id = tblC.ex_id
AND tblC.st_ex_id = tblB.st_ex_id
AND tblB.q_num = tblA.q_num
AND tblB.se_num = tblA.se_num
AND tblD.ex_id = tblA.ex_id
AND tblD.exp <> tblB.se_num
AND tblB.ans = tblA.ans
AND tblA.ex_id = 1001
AND tblC.r_id = 542
GROUP BY
tblA.type,
tblA.se_type;
+-------+--------+---------------+---------+---------+---------------+-------+----------------------------------------------+First a word on determining the impact of a join: Result sets. A result set is, obviously, the set of results from a query. For joins, an estimated way to figure this is to multiple the number of rows MySQL predicts it will need to read for each table. As an estimate, this is more towards the worst case end of the scale, since other WHERE conditions will almost always dramatically reduce the real number of rows the query produces. But for this query, the result set is 94 million rows. This is why indexless joins are so dangerous; a few thousand rows times a few thousand more and you're up in millions already.
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+---------------+---------+---------+---------------+-------+----------------------------------------------+
| tblA | ALL | NULL | NULL | NULL | NULL | 1080 | Using where; Using temporary; Using filesort |
| tblB | ALL | NULL | NULL | NULL | NULL | 87189 | Using where |
| tblC | eq_ref | PRIMARY | PRIMARY | 4 | tblB.st_ex_id | 1 | Using where |
| tblD | eq_ref | PRIMARY | PRIMARY | 4 | tblA.ex_id | 1 | Using where |
+-------+--------+---------------+---------+---------+---------------+-------+----------------------------------------------+
+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+The result set of the query has fallen 99.3% to 692,280 rows. But why stop there? We can easily get rid of tblA's table scan. Since it's the first table, we're not really indexing for a join because we just did that for the join of tblB to tblA. More often, indexing for the first table in a join is just like indexing for that table if it were all alone. In which case, you look at the query to see if the table is being restricted in any way. In this case we get lucky and tblA is: "AND tblA.ex_id = 1001". As we learned way back in Optimizing Queries , all we have to do is index column ex_id:
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+
| tblA | ALL | NULL | NULL | NULL | NULL | 1080 | Using where; Using temporary; Using filesort |
| tblB | ref | ndx_secn_qn | ndx_secn_qn | 2 | tblA.se_num,tblA.q_num | 641 | Using where |
| tblC | eq_ref | PRIMARY | PRIMARY | 4 | tblB.st_ex_id | 1 | Using where |
| tblD | eq_ref | PRIMARY | PRIMARY | 4 | tblA.ex_id | 1 | Using where |
+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+
+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+Now the query's result set is 641 rows! Down from 94 million, you can practically call that a 100% decrease. Studying the query further, we may be able to get rid of the temp table and filesort, but overall the query is profoundly faster, and has served the purpose of demonstrating how to index for joins well. Despite initially looking challenging, we saw that if you take it two tables at a time, and isolate and index what MySQL needs, indexing for joins isn't so difficult.
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+
| tblA | ref | ndx_ex_id | ndx_ex_id | 4 | const | 1 | Using where; Using temporary; Using filesort |
| tblB | ref | ndx_secn_qn | ndx_secn_qn | 2 | tblA.se_num,tblA.q_num | 641 | Using where |
| tblC | eq_ref | PRIMARY | PRIMARY | 4 | tblB.st_ex_id | 1 | Using where |
| tblD | eq_ref | PRIMARY | PRIMARY | 4 | tblA.ex_id | 1 | Using where |
+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+