结合explain extended浅析使用mysql in 的效率-eatmyshort-ChinaUnix博客

bart&nbsp;的资料

首页　| 　博文目录　| 　关于我

eatmyshort

博客访问： 136190
博文数量： 37
博客积分： 0
博客等级：民兵
技术积分： 0
用户组：普通用户
注册时间： 2016-07-02 01:04

个人简介

文章分类

全部博文（37）

SQL（0）
操作系统（1）
mysql（7）
shell（16）
linux（2）
python（6）
sybase（5）
未分配的博文（0）

文章存档

2016年（33）

2015年（4）

我的朋友

相关博文

结合explain extended浅析使用mysql in 的效率

分类： Mysql/postgreSQL

2016-01-16 00:40:26

原文地址：结合explain extended浅析使用mysql in 的效率作者：wjlcn

用explain extended查看执行计划会比explain多一列 filtered。
filtered列给出了一个百分比的值，这个百分比值和rows列的值一起使用，可以估计出那些将要和explain中的前一个表进行连接的行的数目。
前一个表就是指explain 的 id列的值比当前表的id小的表。

1. mysql sql查询中，in是会走索引的：

点击(此处)折叠或打开

mysql> explain extended select *,sleep(0.2) from testinfo where id in (1232,232,324,2342,23);
+----+-------------+----------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testinfo | range | PRIMARY | PRIMARY | 4 | NULL | 5 | 100.00 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select `test`.`testinfo`.`id` AS `id`,`test`.`testinfo`.`idtest` AS `idtest`,`test`.`testinfo`.`nametest` AS `nametest`,`test`.`testinfo`.`author` AS `author`,`test`.`testinfo`.`typetest` AS `typetest`,sleep(0.2) AS `sleep(0.2)` from `test`.`testinfo` where (`test`.`testinfo`.`id` in (1232,232,324,2342,23))
1 row in set (0.00 sec)
mysql> select *,sleep(0.2) from testinfo where id in (1232,232,324,2342,23);
5 rows in set (1.02 sec)
# Time: 130725 11:47:51
# User@Host: root[root] @ localhost []
# Query_time: 1.017450 Lock_time: 0.000219 Rows_sent: 5 Rows_examined: 5
SET timestamp=1374724071;
select *,sleep(0.2) from testinfo where id in (1232,232,324,2342,23);

可见，id in (1232,232,324,2342,23) 是走了主键索引，而且效果很好，扫描5行就出结果了。

2.看看sql为：select count(*) from testinfo where id not in (select id from testinfo group by idtest);的效率

点击(此处)折叠或打开

mysql> explain extended select count(*) from testinfo where id not in (select id from testinfo group by idtest);
+----+--------------------+----------+-------+---------------+------------+---------+------+------+-----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+----------+-------+---------------+------------+---------+------+------+-----------+--------------------------+
| 1 | PRIMARY | testinfo | index | NULL | key_idtest | 62 | NULL | 8761 | 100.00 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | testinfo | index | NULL | key_idtest | 62 | NULL | 1 | 876100.00 | Using index |
+----+--------------------+----------+-------+---------------+------------+---------+------+------+-----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

表面上看都走了索引，但仔细发现filtered列为876100，这个值很大，直接影响到执行sql时扫描的行数。

mysql> show warnings \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select count(0) AS `count(*)` from `test`.`testinfo` where (not(<in_optimizer>(`test`.`testinfo`.`id`,<exists>(select `test`.`testinfo`.`id` from `test`.`testinfo` group by `test`.`testinfo`.`idtest` having (<cache>(`test`.`testinfo`.`id`) = <ref_null_helper>(`test`.`testinfo`.`id`))))))
1 row in set (0.00 sec)
可见，经过mysql优化器后，in 给转换成exists的方式，下面实际执行一次sql花了36秒
mysql> select count(*) from testinfo where id not in (select id from testinfo group by idtest);
+----------+
| count(*) |
+----------+
| 1059 |
+----------+
1 row in set (36.79 sec)
根据上面的执行计划，估算大概的扫描的行数为：76755121
mysql> select 8761*((876100*1)/100)
-> ;
+-----------------------+
| 8761*((876100*1)/100) |
+-----------------------+
| 76755121.0000 |
+-----------------------+
1 row in set (0.00 sec)
而实际执行扫描的行数为：50910026
# User@Host: root[root] @ localhost []
# Query_time: 36.793302 Lock_time: 0.000227 Rows_sent: 1 Rows_examined: 50910026
SET timestamp=1374723426;
select count(*) from testinfo where id not in (select id from testinfo group by idtest);

从上面测试可知，in里面的子查询并非是先查出结果后再执行外层的查询。当in中子查询含有group by时，需注意是否会产生扫描的行数很大，sql执行效率很低。

3.将上面的sql变换一下：
先创建一个临时表：
create table wjlcn_temp(id int auto_increment primary key);
再将中间结果insert到临时表中：
insert into wjlcn_temp select id from testinfo group by idtest;
再来查询结果：select count(*) from testinfo where id not in (select id from wjlcn_temp);

点击(此处)折叠或打开

mysql> explain extended select count(*) from testinfo where id not in (select id from wjlcn_temp);
+----+--------------------+------------+-----------------+---------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+------------+-----------------+---------------+------------+---------+------+------+----------+--------------------------+
| 1 | PRIMARY | testinfo | index | NULL | key_idtest | 62 | NULL | 8761 | 100.00 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | wjlcn_temp | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | 100.00 | Using index |
+----+--------------------+------------+-----------------+---------------+------------+---------+------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select count(0) AS `count(*)` from `test`.`testinfo` where (not(<in_optimizer>(`test`.`testinfo`.`id`,<exists>(<primary_index_lookup>(<cache>(`test`.`testinfo`.`id`) in wjlcn_temp on PRIMARY)))))
1 row in set (0.00 sec)
mysql> select count(*),sleep(1) from testinfo where id not in (select id from wjlcn_temp);
+----------+----------+
| count(*) | sleep(1) |
+----------+----------+
| 1059 | 0 |
+----------+----------+
1 row in set (1.02 sec)
# Time: 130725 11:41:04
# User@Host: root[root] @ localhost []
# Query_time: 1.026054 Lock_time: 0.000231 Rows_sent: 1 Rows_examined: 9999
SET timestamp=1374723664;
select count(*),sleep(1) from testinfo where id not in (select id from wjlcn_temp);

从上面可以看到执行计划的filtered列为100，跟上面的sql有很大的区别。
其次，在explain中出现了unique_subquery

文档中解释：
unique_subquery
This type replaces ref for some IN subqueries of the following form:

value IN (SELECT primary_key FROM single_table WHERE some_expr)
unique_subquery is just an index lookup function that replaces the subquery completely for better efficiency.

当sql中出现unique_subquery时，sql会自动替换in 后面的子查询。从上面的执行计划中可以看到sql实际执行的是：
select count(0) AS `count(*)` from `test`.`testinfo` where (not(<in_optimizer>(`test`.`testinfo`.`id`,<exists>(<primary_index_lookup>(<cache>(`test`.`testinfo`.`id`) in wjlcn_temp on PRIMARY)))))
当sql中用到 primary_index_lookup时，sql的执行效率也比较好。
从慢查询中看到扫描的行数为：9999，而实际的执行时间为：Query_time: 1.026054 - 1 =0.026054秒。
比起前面的36秒好了很多，如果表的记录数更大时，执行时间相差更加明显。

所以，在sql中用in子查询时，最后看看执行计划。若在线上大表频繁执行 select count(*) from testinfo where id not in (select id from testinfo group by idtest);类的sql，可能会导致服务器的性能问题。

阅读(1358) | 评论(0) | 转发(0) |

上一篇：linux sort 命令详解

下一篇：Mysql Join语法解析与性能分析

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6