分类: Mysql/postgreSQL
2007-03-11 17:48:38
mysql> EXPLAINIt's a thing of beauty to see inside the mind of the machine, but in this case perhaps not. Why is MySQL telling us there's no possible keys but it's using key attach_id_post_id? And if it's using a key then why does it suspect it will have to examine 61,834 rows (by performing a full index scan, denoted by "type: index")? It seems there's a problem with the keys so we must now understand them:
-> SELECT attach_id as total_attachment FROM phpbb_attachments
-> WHERE post_id IN (163246, 164224, 164894, 165146, 167931);
+-------------------+-------+---------------+-------------------+---------+------+-------+--------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------------------+-------+---------------+-------------------+---------+------+-------+--------------------------+
| phpbb_attachments | index | NULL | attach_id_post_id | 6 | NULL | 61834 | Using where; Using index |
+-------------------+-------+---------------+-------------------+---------+------+-------+--------------------------+
mysql> DESCRIBE phpbb_attachments;Understanding indexes (or keys if you prefer) is two part: Understanding the structure of the table then understanding the indexes. You can't just slap an index on a table and think everything will be wonderful. In this example it looks like everything should be wonderful with key attach_id_post_id. Given that the SELECT statement is selecting attach_id and post_id and that's just what this key indexes, so why isn't it working? It is working, just not how we're intending; it's working for MySQL which is why in EXPLAIN it says "Using index." When MySQL says this in "Extra" is means "The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row." In other words: It finds and returns matching columns from the index in memory not the table on disk, which is a good thing, unless it's doing this 12 million times for 1 matching column.
+-------------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+-------+
| attach_id | mediumint(8) unsigned | | MUL | 0 | |
| post_id | mediumint(8) unsigned | | | 0 | |
| privmsgs_id | mediumint(8) unsigned | | | 0 | |
| user_id_1 | mediumint(8) | | | 0 | |
| user_id_2 | mediumint(8) | | | 0 | |
+-------------+-----------------------+------+-----+---------+-------+
mysql> SHOW INDEX FROM phpbb_attachments;
+-------------------+------------+-----------------------+--------------+-------------+-----------+-------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality |
+-------------------+------------+-----------------------+--------------+-------------+-----------+-------------+
| phpbb_attachments | 1 | attach_id_post_id | 1 | attach_id | A | 61834 |
| phpbb_attachments | 1 | attach_id_post_id | 2 | post_id | A | 61834 |
| phpbb_attachments | 1 | attach_id_privmsgs_id | 1 | attach_id | A | 61834 |
| phpbb_attachments | 1 | attach_id_privmsgs_id | 2 | privmsgs_id | A | 61834 |
+-------------------+------------+-----------------------+--------------+-------------+-----------+-------------+
mysql> EXPLAINThat output speaks volumes: 5 rows to examine. This is the index that saved the server. I have not seen the server's load go above 4 and the website is running faster with more users at once. The difference was night and day. But why stop there? The query is no longer slow but it could be better. Notice how MySQL is not "Using index" anymore. This is because attach_id is not in the index its using. Whereas it's in the attach_id_post_id index MySQL has wisely chosen to examine fewer rows at the cost of doing a few disk seeks. The solution is a multiple column index on both post_id and attach_id, with post_id first.
-> SELECT attach_id as total_attachment FROM phpbb_attachments
-> WHERE post_id IN (163246, 164224, 164894, 165146, 167931);
+-------------------+-------+---------------+------------+---------+------+------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------------------+-------+---------------+------------+---------+------+------+-------------+
| phpbb_attachments | range | test_index | test_index | 3 | NULL | 5 | Using where |
+-------------------+-------+---------------+------------+---------+------+------+-------------+
mysql> CREATE INDEX post_id_attach_id ON phpbb_attachments (post_id, attach_id);As we can see MySQL still considers the test_index key but chooses post_id_attach_id because doing so will allow it to get matching attach_id from the index instead of the disk. A simple swap of column orders in the index made all the difference. As the saying goes, it takes one tree to make a thousand matches and one match to burn a thousand trees down.
Query OK, 61834 rows affected (0.53 sec)
Records: 61834 Duplicates: 0 Warnings: 0
mysql> EXPLAIN
-> SELECT attach_id as total_attachment FROM phpbb_attachments
-> WHERE post_id IN (163246, 164224, 164894, 165146, 167931);
+-------------------+-------+------------------------------+-------------------+---------+------+------+--------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------------------+-------+------------------------------+-------------------+---------+------+------+--------------------------+
| phpbb_attachments | range | test_index,post_id_attach_id | post_id_attach_id | 3 | NULL | 5 | Using where; Using index |
+-------------------+-------+------------------------------+-------------------+---------+------+------+--------------------------+
// Display Total Attachments MOD StartThis query runs every time a vistor views the forums index or views the topics in a forum. If, before we fixed the slow query, 200 hundred vistors navigated from the index to a topic MySQL would have examined about 24,417,200 rows. Now it may examine a few thousand at worse. What's more interesting is the purpose this query serves: It counts the number of attachments for a given forum or topic. Instead of using the SQL COUNT function to return one row with the sum it litterally gets all matching rows and then counts them afterwards. In some cases this means it matches and counts hundreds of rows when MySQL could do this internally. Ultimately the PHP function mysql_num_rows is called to count the rows returned so perhaps this function is optimized to know ahead of time how many rows were returned without actually counting them. Still it seems more logical just to use the SQL COUNT function.
$sql = "SELECT post_id
FROM " . POSTS_TABLE . "
WHERE topic_id = $topic_id
GROUP BY post_id";
if ( !($result = $db->sql_query($sql)) )
{
message_die(GENERAL_ERROR, 'Could not query post information', '', __LINE__, __FILE__, $sql);
}
$post_ids = '';
while ( $row = $db->sql_fetchrow($result) )
{
$post_ids .= ($post_ids == '') ? $row['post_id'] : ', ' . $row['post_id'];
}
if ( $post_ids != '' )
{
$sql = "SELECT attach_id as total_attachment
FROM " . ATTACHMENTS_TABLE . "
WHERE post_id IN (" . $post_ids . ")";
if ( !($result2 = $db->sql_query($sql)) )
{
message_die(GENERAL_ERROR, 'Could not query attachment information', '', __LINE__, __FILE__, $sql);
}
$attach_num = $db->sql_numrows($result2);
}
else
{
$attach_num = 0;
}
// Display Total Attachments MOD End
mysql> EXPLAIN SELECT post_id FROM phpbb_posts WHERE topic_id=30 GROUP BY post_id;Type "ref," using a key, returns 1 row, that's all great but what the hell: "Using temporary; Using filesort"? "Using temporary" means MySQL creates a temporary table (hopefully in memory) to hold the results. According to the manual "This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently." A filesort is not desirable either because it means "MySQL will need to do an extra pass to find out how to retrieve the rows in sorted order." Seems overkill when only 1 row is expected to be examined. To optimize this query we need to know a little about the structure of the table and the current indexes:
+-------------+------+---------------+----------+---------+-------+------+----------------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------------+------+---------------+----------+---------+-------+------+----------------------------------------------+
| phpbb_posts | ref | topic_id | topic_id | 3 | const | 3 | Using where; Using temporary; Using filesort |
+-------------+------+---------------+----------+---------+-------+------+----------------------------------------------+
mysql> DESCRIBE phpbb_posts;There's a single column index on each column basically. MySQL uses the topic_id index for the WHERE condition but then it must disk seek for post_id. Since rows on the disk can be in any order and we're grouping by a column we have to read from rows on the disk (post_id) MySQL creates a temporary table and filesorts. First MySQL has to scan the matching rows to determine how to retrieve them in sorted order, this is the extra filesort pass. Then it retrieves the rows in sorted order putting them into a temporary table because where else can it put them? The rows aren't sorted on the disk and there's no sorted index. If we had a sorted index then none of this would be necessary because all the values (for topic_id and post_id) would already exist in memory in sorted order. All we have to do is create an appropriate index:
+-----------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-----------------------+------+-----+---------+----------------+
| post_id | mediumint(8) unsigned | | PRI | NULL | auto_increment |
| topic_id | mediumint(8) unsigned | | MUL | 0 | |
| forum_id | smallint(5) unsigned | | MUL | 0 | |
... (We don't need the remaining columns)
mysql> SHOW INDEX FROM phpbb_posts;
+-------------+------------+-----------+--------------+-------------+-----------+-------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality |
+-------------+------------+-----------+--------------+-------------+-----------+-------------+
| phpbb_posts | 0 | PRIMARY | 1 | post_id | A | 151962 |
| phpbb_posts | 1 | forum_id | 1 | forum_id | A | 23 |
| phpbb_posts | 1 | topic_id | 1 | topic_id | A | 16884 |
... (We don't need the remaining indexes)
mysql> CREATE INDEX topic_id_post_id ON phpbb_posts (topic_id, post_id);Much better: No temporary table or file sort and getting matching post_id from the index instead of the disk.
Query OK, 151999 rows affected (3.74 sec)
Records: 151999 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT post_id FROM phpbb_posts WHERE topic_id=30 GROUP BY post_id;
+-------------+------+---------------------------+------------------+---------+-------+------+--------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------------+------+---------------------------+------------------+---------+-------+------+--------------------------+
| phpbb_posts | ref | topic_id,topic_id_post_id | topic_id_post_id | 3 | const | 2 | Using where; Using index |
+-------------+------+---------------------------+------------------+---------+-------+------+--------------------------+