分类: Mysql/postgreSQL
2007-03-11 17:50:24
mysql> DESCRIBE hints;Unless there's some special condition I was not made aware of, the date column should not be type TEXT. What complicates the issue more is that the data in the date column is not normalized. That is, there are different representations of data (dates) in the column: Some dates are typical YYYY-MM-DD format and others are Unix timestamps. The first issue this creates is increased code complexity: Having to account for data that may be in different formats. Secondly, it's space inefficient. A date as TEXT will require 10 + 2 bytes or, times 45,000 rows, 540k. As a DATE column type, 3 bytes or 135k. The third issue involves fixinig the query: How to index a date column as TEXT. Indexes on TEXT columns require a prefix length; that is, since TEXT columns are truly variable you have to tell MySQL how much of it you want to index. In this case, since the data is not normalized and everything counts down to the last second for dates and times, we would have to index 10 bytes, another 450k for the index (minus compression). Finally, it's error prone: A DATE column type will reliaibly have a date. A TEXT column may have a DATE and a recipe for biscuits. Perhaps I'm just being overly zealous and picky; afterall, what's another 450k here or there, or a little more code to determine if the date is DATE or a Unix timestamp? Nothing, actually, but the point is: Why add these complications when they're completely unnecessary? Just use a DATE colum type. In short: .
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| hints_id | int(10) unsigned | | PRI | NULL | auto_increment |
| game_id | int(11) | | | 0 | |
| hint_title | text | | | | |
| hint | text | | | | |
| status | text | | | | |
| date | text | | | | |
+------------+------------------+------+-----+---------+----------------+
mysql> SHOW INDEXES FROM hints;MySQL executes the query by doing a full table scan (type: ALL) plus a filesort (caused by ORDER BY). Not surprising since there are no indexes MySQL can use (possible_keys: NULL). We at least want to prevent a table scan and reduce the number of rows examined. To do this we can create an index, but the result isn't going to be as spectacular as in was in :
+-------+------------+----------+--------------+-------------+-------------+----------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Cardinality | Sub_part |
+-------+------------+----------+--------------+-------------+-------------+----------+
| hints | 0 | PRIMARY | 1 | hints_id | 45021 | NULL |
+-------+------------+----------+--------------+-------------+-------------+----------+
mysql> EXPLAIN SELECT * FROM hints WHERE game_id = 374 ORDER BY date DESC;
+-------+------+---------------+------+---------+------+-------+-----------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+------+---------+------+-------+-----------------------------+
| hints | ALL | NULL | NULL | NULL | NULL | 45021 | Using where; Using filesort |
+-------+------+---------------+------+---------+------+-------+-----------------------------+
mysql> CREATE INDEX game_id_date ON hints (game_id, date(10));Success: No table scan and only one row examined. However, Extra doesn't say "Using index" even though our multi-column key includes both game_id and date. MySQL won't retrieve the values from the index because "SELECT * " requires more values: It requires all 6 columns for every matching row but the index only has 2 columns. In Case 1 only columns that were part of the key were included in the SELECT criteria, which is why things worked out better there than here. If we alter the table we can at least get rid of the filesort. The filesort is caused by date being TEXT. If we drop the index, normalize the data, change the column type to DATE, and re-add the index (these commands left out for brevity):
Query OK, 1001 rows affected (0.04 sec)
Records: 1001 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT * FROM hints WHERE game_id = 374 ORDER BY date DESC;
+-------+------+---------------+--------------+---------+-------+------+-----------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+--------------+---------+-------+------+-----------------------------+
| hints | ref | game_id_date | game_id_date | 5 | const | 1 | Using where; Using filesort |
+-------+------+---------------+--------------+---------+-------+------+-----------------------------+
mysql> EXPLAIN SELECT * FROM hints WHERE game_id = 374 ORDER BY date DESC;Not a slow query anymore. And it goes to show that using efficient column types is important, unless you like filesorts.
+-------+------+---------------+--------------+---------+-------+------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+--------------+---------+-------+------+-------------+
| hints | ref | game_id_date | game_id_date | 5 | const | 1 | Using where |
+-------+------+---------------+--------------+---------+-------+------+-------------+
mysql> SHOW INDEXES FROM links;Of course EXPLAIN predicts a full table scan (type: ALL) because there's no suitable key to do otherwise (key: NULL). Something has to be done about this query but simply adding an index like we usually do isn't so easy anymore because there's multiple conditions. In this case we get lucky:
+-------+------------+----------+--------------+-------------+-------------+----------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Cardinality | Sub_part |
+-------+------------+----------+--------------+-------------+-------------+----------+
| links | 0 | PRIMARY | 1 | links_id | 307551 | NULL |
| links | 1 | other_id | 1 | other_id | NULL | NULL |
| links | 1 | game_id | 1 | game_id | NULL | NULL |
+-------+------------+----------+--------------+-------------+-------------+----------+
mysql> EXPLAIN SELECT * FROM links WHERE
-> game_id = 5 OR (other_id = 200 AND class = 'articles') AND
-> has_link = 'true' LIMIT 0,1;
+-------+------+------------------+------+---------+------+--------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+------------------+------+---------+------+--------+-------------+
| links | ALL | other_id,game_id | NULL | NULL | NULL | 307551 | Using where |
+-------+------+------------------+------+---------+------+--------+-------------+
mysql> SELECT DISTINCT(has_link) FROM links;Ah ha! If every has_link = 'true' then we probably don't need that AND condition. (Not to mention, referring to effecient column types again, has_link as TEXT 1.8 Megs, as BOOL (TINYINT) 308k.) Now the query becomes:
+----------+
| has_link |
+----------+
| true |
+----------+
mysql> EXPLAIN SELECT * FROM links WHERE game_id = 5 LIMIT 0,1;This is a much better approach: These two queries can execute in a fraction of the time than the original. And since either or will match what we're looking for we can try the first query, if it matches we're done, if not try the second.
+-------+------+---------------+------+---------+-------+------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+------+---------+-------+------+-------------+
| links | ref | gid | gid | 5 | const | 36 | Using where |
+-------+------+---------------+------+---------+-------+------+-------------+
mysql> EXPLAIN SELECT * FROM links WHERE other_id = 200 AND class = 'articles' LIMIT 0,1;
+-------+------+---------------+------+---------+-------+------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+------+---------+-------+------+-------------+
| links | ref | oid | oid | 5 | const | 1 | Using where |
+-------+------+---------------+------+---------+-------+------+-------------+
mysql> EXPLAIN SELECT * FROM links WHERE game_id = 5 OR (other_id = 200 AND class = 'articles') LIMIT 0,1;Notice in the previous two queries the first matched 36 rows, the second 1 for 37 total and this index merge matches 37 rows—fun stuff huh? Makes you feel better that "it all adds up."
+----+-------------+-------+-------------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | links | index_merge | gid,oid | gid,oid | 5,5 | NULL | 37 | Using where |
+----+-------------+-------+-------------+---------------+---------+---------+------+------+-------------+