ORACLE vs MySQL 对组合索引包含IN LIST执行计划研究(MySQL部分)

博客访问： 1465070
博文数量： 180
博客积分： 0
博客等级：民兵
技术积分： 3926
用户组：普通用户
注册时间： 2011-03-30 13:00

个人简介

About me:Oracle ACE pro,optimistic,passionate and harmonious. Focus on ORACLE,MySQL and other database programming,peformance tuning,db design, j2ee,Linux/AIX,Architecture tech,etc

文章分类

全部博文（180）

CBO（57）
Tuning&Performan（35）
Concepts&Archite（4）
TroubleShooting（5）
New Feature（7）
SQL（52）
PL/SQL（8）
Miscellaneous（3）
MySQL（9）
未分配的博文（0）

文章存档

2025年（8）

2024年（27）

2023年（28）

2022年（43）

2020年（62）

2014年（3）

2013年（9）

我的朋友

2.MySQL里组合索引有IN条件的情况

2.1）MySQL单表组合索引IN LIST情况：
MySQL貌似没有inlist iterator算子，能否实现INLIST iterator效果：

建表语句如下：

点击(此处)折叠或打开

drop table t1;
drop table t2;
create table t1 as select * from employees;
alter table t1 add status int default 0;
update t1 set status=ceil(rand()*10);
create table t2 as select * from t1;

t1,t2表结构：

点击(此处)折叠或打开

t1,t2表结构：
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no | int | NO | | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
| status | int | YES | | 0 | |
+------------+---------------+------+-----+---------+-------+

创建索引，将选择性好的放前面：
create index idx1_t1 on t1(first_name,status);

###对于MySQL，看组合索引用了几个列参与索引access,可以通过索引列key_len查看，
也可以通过树形执行计划sing idx1_t1 over 有几个列组合查看。
或者可以查看status变量handler确定索引扫描区间数：
Handler_read_key和Handler_read_next，
Handler_read_next是按照索引顺序扫描下一行的次数，也可以代表回表次数(比如有icp可以减少这
个值），
所以Handler_read_next也不能判断索引实际扫描的行数，而是索引扫描结果行数。

看下执行计划：有Using index condition，说明第二个列status in (1,2,8)还是ICP条件,
索引access用到几个列，可以通过key_len看：
first_name的key_len=14*3+2 (变长的+2,utf8mb3是3个字节）=44,
status是int型，对应的key_len=4+1(允许null+1)=5
可以看到执行计划的key_len=49,说明两个条件都用到。

explain select * from t1
where t1.first_name='Saniya' and t1.status in (1,2,8);

点击(此处)折叠或打开

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | t1 | NULL | range | idx1_t1 | idx1_t1 | 49 | NULL | 78 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+

从树形计划上看：using idx1_t1 over 转为两个or分支，也就是2个区间，说明两个条件都用到，
对应的还有icp，不过这个icp貌似作用不大：

点击(此处)折叠或打开

explain format=tree
select * from t1
where t1.first_name='Saniya' and t1.status in (1,2,8)\G
*************************** 1. row ***************************
EXPLAIN: -> Index range scan on t1 using idx1_t1 over
(first_name = 'Saniya' AND status = 1) OR (first_name = 'Saniya' AND status = 2) OR (first_name = 'Saniya' AND status = 8),
with index condition: ((t1.first_name = 'Saniya') and (t1.`status` in (1,2,8))) (cost=42.35 rows=78)

可以查看实际执行信息：

点击(此处)折叠或打开

flush status;
explain analyze
select * from t1
where t1.first_name='Saniya' and t1.status in (1,2,8)\G
*************************** 1. row ***************************
EXPLAIN: -> Index range scan on t1 using idx1_t1 over (first_name = 'Saniya' AND status = 1)
OR (first_name = 'Saniya' AND status = 2) OR (first_name = 'Saniya' AND status = 8),
with index condition: ((t1.first_name = 'Saniya') and (t1.`status` in (1,2,8)))
(cost=42.35 rows=78) (actual time=0.038..0.668 rows=78 loops=1)

通过show status查看handler信息，可以看到Handler_read_key=3，对应三个索引区间。
说明first_name和status都参与索引扫描，执行计划里显示的icp基本用不着，这时候
Handler_read_next=扫描的行数。

点击(此处)折叠或打开

show status like '%handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 2 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 3 |
| Handler_read_last | 0 |
| Handler_read_next | 78 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
select ps_thread_id(connection_id());
+-------------------------------+
| ps_thread_id(connection_id()) |
+-------------------------------+
| 50 |
+-------------------------------+
通过检查examined_rows=78行，也符合索引扫描行数，因为使用explain analyze,rows_sent=1：
+-----------+---------------+-----------+---------------+
| THREAD_ID | ROWS_AFFECTED | ROWS_SENT | ROWS_EXAMINED |
+-----------+---------------+-----------+---------------+
| 50 | 0 | 1 | 78 |
+-----------+---------------+-----------+---------------+

下面将索引顺序换一下,貌似都和in非前导列的一样，都有ICP，都是改成or条件（实际上这个icp没
有啥用filtered=100）：
将选择性不好的列放前面：
create index idx2_t1 on t1(status,first_name);

通过key_len和树形计划的using idx2_t1 over，可以看到两个列条件都用到：
现在让两个列都是in,看是否能转为2*3=6个索引区间：

点击(此处)折叠或打开

explain select/*+index(t1 idx2_t1)*/ * from t1
where t1.first_name in('Saniya','Aamer') and t1.status in (1,2,8);
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | t1 | NULL | range | idx2_t1 | idx2_t1 | 49 | NULL | 144 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
select count(*) from t1 where status in (1,2,8);
+----------+
| count(*) |
+----------+
| 89751 |
+----------+
select count(*) from t1
where t1.first_name in('Saniya','Aamer')
and t1.status in (1,2,8);
+----------+
| count(*) |
+----------+
| 144 |
+----------+

实际执行看：

点击(此处)折叠或打开

flush status;
explain analyze
select/*+index(t1 idx2_t1)*/ * from t1
where t1.first_name in('Saniya','Aamer') and t1.status in (1,2,8)\G
*************************** 1. row ***************************
EXPLAIN: -> Index range scan on t1 using idx2_t1 over
(status = 1 AND first_name = 'Aamer') OR (status = 1 AND first_name = 'Saniya')
OR (4 more), with index condition: ((t1.first_name in ('Saniya','Aamer')) and (t1.`status` in (1,2,8)))
(cost=78.34 rows=144) (actual time=0.055..2.067 rows=144 loops=1)

执行计划里转为的or超过3个后面变为or (n more)

可以看到转为6个扫描区间：Handler_read_key=6，
icp的实际用不上，所以两个列都是index access,对应Handler_read_next和ROWS_EXAMINED都是
扫描的行数144：

点击(此处)折叠或打开

show status like '%handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 2 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 6 |
| Handler_read_last | 0 |
| Handler_read_next | 144 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
select THREAD_ID,ROWS_AFFECTED,ROWS_SENT,ROWS_EXAMINED from performance_schema.events_statements_current where thread_id=50;
+-----------+---------------+-----------+---------------+
| THREAD_ID | ROWS_AFFECTED | ROWS_SENT | ROWS_EXAMINED |
+-----------+---------------+-----------+---------------+
| 50 | 0 | 1 | 144 |
+-----------+---------------+-----------+---------------+

总结：MySQL单表访问，组合索引in不管是否是前导列，都能按照leftmost prefix规则，转为or,然后
转为多个索引区间(interval),
能够充分利用组合索引条件进行index access,执行计划里虽然有icp,基本用不着。
MySQL没有inlist iterator算子，可以将in list转为多个区间从而实现单表访问inlist iterator效果。

2.2）多表join,inlist表作为被驱动表
看能否实现和单表一样的效果

先给t2的last_name建个索引，让t2做驱动表，t1被驱动：
create index idx_t2 on t2(last_name);

MySQL组合索引有IN的，如果IN里只有一个值，和ORACLE一样，也会转为等值，则组合索引都是
等值的，可以直接access

MySQL和ORACLE对于组合索引中条件in有多个值的的处理貌似不一样：

###MySQL没有index inlist iterator算子，如果组合索引前导列是in并且有多个值，则不能转为多个
等值的组合条件，
这样只能in的列作为access,后续列作为icp，如果关联查询且被驱动走NL，则in的列作为单独条件，
作为索引access效率则可能很低，
所以在MySQL里，关联列作为索引前导列较好，in的作为非前导列（用于icp条件，可能有bug将in
条件放到回表FILTER，则可能非常影响效率）。

1)不管in是否多个值，列是否是前导列，则都有icp
2)如果in的是非前导列，则in作为icp条件，这个和ORACLE一样
3)如果in是前导列，则in条件列对应的后续列作为icp条件，这个和ORACLE不一样，ORACLE有inlist
iterator算子,
将in转为or,变为多个索引等值条件，然后避免icp，MySQL没有inlist interator算子，in作为前导列
在关联查询里效率可能很低，将关联列作为前导列较好。

###在MySQL里in条件避免不了icp,所以关联查询的被驱动表，将关联列作为索引前导列，这样
可以access条件，in条件列作为非前导列使用ICP，
如果将in条件作为前导列，则关联列只能用于ICP，可能效率很低。

驱动表730行，被驱动表循环730次：

点击(此处)折叠或打开

select count(*) from t2 where last_name like 'Ar%';
+----------+
| count(*) |
+----------+
| 730 |
+----------+

###1.先测试关联列是前导列，然后+in list组合索引，走nested loops,t1被驱动：

点击(此处)折叠或打开

explain select * from t1,t2
where t1.first_name=t2.first_name and t1.status in (1,3,8)
and t2.last_name like 'Ar%';

通过执行计划看，多表JOIN和单表不一样，t1被驱动，走idx1_t1(first_name,status),对应的
key_len是44，不是49（两个列都用上是49），
可以看到filtered=59.26,也就是status in (1,3,8)起类似过滤效果是真正ICP，过滤了接近40%的
数据，这个和ORACLE的多表JOIN不一样，
在MySQL里貌似多表JOIN的索引有in list,不能让多列都走index access:

点击(此处)折叠或打开

+----+-------------+-------+------------+-------+-----------------+---------+---------+-------------------------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------+---------+---------+-------------------------+------+----------+-----------------------+
| 1 | SIMPLE | t2 | NULL | range | idx_t2 | idx_t2 | 50 | NULL | 730 | 100.00 | Using index condition |
| 1 | SIMPLE | t1 | NULL | ref | idx2_t1,idx1_t1 | idx1_t1 | 44 | employees.t2.first_name | 234 | 59.26 | Using index condition |
+----+-------------+-------+------------+-------+-----------------+---------+---------+-------------------------+------+----------+-----------------------+
2 rows in set, 1 warning (0.00 sec)
flush status;
explain analyze
select * from t1,t2
where t1.first_name=t2.first_name and t1.status in (1,3,8)
and t2.last_name like 'Ar%'\G

耗时0.22s,索引访问只能用到first_name,这里的first_name选择性比较好，这样status条件走不了
索引其实差别不大
改成union all效果不好，所以将关联列作为前导列较好，这样可以用到前导列索引：

show status like '%handler%';

Handler_read_key=驱动表索引区间1+被驱动表730次（每次一个区间）=731，
Handler_read_next这时候就不是索引真正扫描的次数了，因为被驱动表t1只走first_name列索引
扫描，还有icp:status in (1,3,8),
这时候只是结果行数，相应的examined_rows也一样，这时候都是结果行数，结果行数包括驱动
表t2和t1的索引结果行数。

点击(此处)折叠或打开

(root@localhost) [employees_15:21:17]> show status like '%handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 4 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 731 |
| Handler_read_last | 0 |
| Handler_read_next | 52626 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
select THREAD_ID,ROWS_AFFECTED,ROWS_SENT,ROWS_EXAMINED from performance_schema.events_statements_current where thread_id=50;
+-----------+---------------+-----------+---------------+
| THREAD_ID | ROWS_AFFECTED | ROWS_SENT | ROWS_EXAMINED |
+-----------+---------------+-----------+---------------+
| 50 | 0 | 1 | 52626 |
+-----------+---------------+-----------+---------------+
1 row in set (0.01 sec)
flush status;
explain analyze
select * from t1,t2
where t1.first_name=t2.first_name and t1.status in (1)
and t2.last_name like 'Ar%'\G

如果in只有一个值，转为=，则访问条件是using idx1_t1 (first_name=t2.first_name, status=1)，
没有问题：

点击(此处)折叠或打开

*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=7653.14 rows=17026) (actual time=0.078..67.316 rows=17234 loops=1)
-> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'),
with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.049..6.793 rows=730 loops=1)
-> Index lookup on t1 using idx1_t1 (first_name=t2.first_name, status=1)
(cost=7.70 rows=23) (actual time=0.009..0.081 rows=24 loops=730)
普通执行计划没有icp,key_len=49,两个列都用上：
+----+-------------+-------+------------+-------+-------------------------+---------+---------+-------------------------------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-------------------------+---------+---------+-------------------------------+------+----------+-----------------------+
| 1 | SIMPLE | t2 | NULL | range | idx_t2 | idx_t2 | 50 | NULL | 730 | 100.00 | Using index condition |
| 1 | SIMPLE | t1 | NULL | ref | idx2_t1,idx1_t1,idx3_t1 | idx1_t1 | 49 | employees.t2.first_name,const | 23 | 100.00 | NULL |
+----+-------------+-------+------------+-------+-------------------------+---------+---------+-------------------------------+------+----------+-----------------------+
扫描的行数没有问题，被驱动t1的真实rows=24,这个是平均值。 730*24=17520略大于真实行数，然后+驱动表700+行，实际扫描17964行。
show status like 'handle%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 4 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 731 |
| Handler_read_last | 0 |
| Handler_read_next | 17964 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
select THREAD_ID,ROWS_AFFECTED,ROWS_SENT,ROWS_EXAMINED from performance_schema.events_statements_current where thread_id=50;
+-----------+---------------+-----------+---------------+
| THREAD_ID | ROWS_AFFECTED | ROWS_SENT | ROWS_EXAMINED |
+-----------+---------------+-----------+---------------+
| 50 | 0 | 1 | 17964 |
+-----------+---------------+-----------+---------------+

###2：测试in list列是前导列，关联条件列非前导列，被驱动走索引
先删除first_name前导列索引：
drop index idx1_t1 on t1;

单个值同样没有问题：using idx2_t1 (status=1, first_name=t2.first_name).

点击(此处)折叠或打开

flush status;
explain analyze
select * from t1,t2
where t1.first_name=t2.first_name and t1.status in (1)
and t2.last_name like 'Ar%'\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=7741.93 rows=17233) (actual time=0.097..77.225 rows=17234 loops=1)
-> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.053..12.567 rows=730 loops=1)
-> Index lookup on t1 using idx2_t1 (status=1, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.011..0.086 rows=24 loops=730)
前导列是in有多个值，关联列是非前导列，走了hash join，0.22s:
explain analyze
select * from t1,t2
where t1.first_name=t2.first_name and t1.status in (1,3,8)
and t2.last_name like 'Ar%'\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (t1.first_name = t2.first_name) (cost=1306095.83 rows=763905) (actual time=6.099..329.593 rows=51896 loops=1)
-> Filter: (t1.`status` in (1,3,8)) (cost=25.42 rows=17657) (actual time=0.029..281.806 rows=89911 loops=1)
-> Table scan on t1 (cost=25.42 rows=297939) (actual time=0.019..258.542 rows=300024 loops=1)
-> Hash
-> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'),
with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.035..5.303 rows=730 loops=1)
explain analyze树形计划看的cost=1306095,和json,实际估算的costs不一样：12906876，json的和普通计划的一样
show status like '%last_query_cost%';
+-----------------+-----------------+
| Variable_name | Value |
+-----------------+-----------------+
| Last_query_cost | 12906876.294155 |
+-----------------+-----------------+
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "12906876.29"
},

禁用hash join，竟然t1被驱动走全表，没有选择组合索引，说明走索引效率低cost大,
因为删除了idx1_t1(first_name,status)索引，这里的关联列没有索引，而是在status列上有前导列
索引，
MySQL基本不选择t1走索引，可以通过possible_keys看出，很显然，这里的cost计算不准才选择
这种不走索引的计划：
"query_cost": "22313643.61" < 走idx2_t1索引的68339620.40

点击(此处)折叠或打开

explain
select/*+no_bnl(t1,t2) */ * from t1,t2
where t1.first_name=t2.first_name and t1.status in (1,3,8)
and t2.last_name like 'Ar%';
+----+-------------+-------+------------+-------+---------------+--------+---------+------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+--------+----------+-----------------------+
| 1 | SIMPLE | t2 | NULL | range | idx_t2 | idx_t2 | 50 | NULL | 730 | 100.00 | Using index condition |
| 1 | SIMPLE | t1 | NULL | ALL | idx2_t1 | NULL | NULL | NULL | 297939 | 5.93 | Using where |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+--------+----------+-----------------------+

通过optimizer_trace查看：
先计算单表走索引按照status查询的cost:

点击(此处)折叠或打开

"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "idx2_t1",
"ranges": [
"status = 1",
"status = 3",
"status = 8"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"in_memory": 0.111617,
"rows": 176572,
"cost": 75958.3,
"chosen": false,
"cause": "cost"
}

MySQL没有考虑t1走idx2_t1索引，直接按照全表扫描来，没有考虑走nested loops情况下，可以
走组合索引选择率，比前面计算的只走status要小很多的情况，
所以直接走全表扫描，也没有考虑hash join:

点击(此处)折叠或打开

"cost_for_plan": 328.76,
"rest_of_plan": [
{
"plan_prefix": [
"`t2`"
],
"table": "`t1`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 297939,
"filtering_effect": [
],
"final_filtering_effect": 0.592645,
"access_type": "scan",
"resulting_rows": 176572,
"cost": 2.23133e+07,
"chosen": true
}
]
},

所以t2作为被驱动表，如果关联列不是前导列，则必须要有前导列且等值条件（in一个值也转为
等值），才可以让t2走nested loops,走索引。

强制走索引，可以看到rows=176572，filtered=10，key_len=5,索引效率很低，只走了status列
索引，first_name是ICP：
json显示的执行 "query_cost": "68339620.40",走索引的cost大，这个计算不准，所以默认走了
全表

点击(此处)折叠或打开

explain
select/*+no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2
where t1.first_name=t2.first_name and t1.status in (1,3,8)
and t2.last_name like 'Ar%';
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-----------------------+
| 1 | SIMPLE | t2 | NULL | range | idx_t2 | idx_t2 | 50 | NULL | 730 | 100.00 | Using index condition |
| 1 | SIMPLE | t1 | NULL | range | idx2_t1 | idx2_t1 | 5 | NULL | 176572 | 10.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-----------------------+

explain analyze结果：执行耗时16s

点击(此处)折叠或打开

explain analyze select/*+no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2
where t1.first_name=t2.first_name and t1.status in (1,3,8)
and t2.last_name like 'Ar%'\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=16180534643.87 rows=128897560) (actual time=8.275..16590.253 rows=51896 loops=1)
-> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.083..8.235 rows=730 loops=1)
-> Index range scan on t1 using idx2_t1 over (status = 1) OR (status = 3) OR (status = 8), with index condition: ((t1.first_name = t2.first_name) and (t1.`status` in (1,3,8))) (cost=75958.29 rows=176572) (actual time=3.688..22.707 rows=71 loops=730)
1 row in set (16.62 sec)

可以看到，不走关联列索引，走status in条件索引，效率比前面走first_name列索引差很多，因为
status列索引选择率不好。
可以看到，前导列in list+非前导列关联的，也不能全部走index access,和单表不一样。

这种前导列IN LIST且是非关联列效率很低的，既然in list走不了多列索引扫描，那么改成等值就
可以了，
改为UNION ALL提升很明显，这样可以利用组合索引进行扫描：执行时间从16s到0.29s:

点击(此处)折叠或打开

explain analyze
select/*+index(t1 idx2_t1)*/ * from t1,t2
where t1.first_name=t2.first_name and t1.status in (1)
and t2.last_name like 'Ar%'
union all
select/*+index(t1 idx2_t1)*/ * from t1,t2
where t1.first_name=t2.first_name and t1.status in (3)
and t2.last_name like 'Ar%'
union all
select/*+index(t1 idx2_t1)*/ * from t1,t2
where t1.first_name=t2.first_name and t1.status in (8)
and t2.last_name like 'Ar%'\G
*************************** 1. row ***************************
EXPLAIN: -> Append (actual time=0.064..261.577 rows=51896 loops=1)
-> Stream results (cost=7741.93 rows=17233) (actual time=0.063..89.491 rows=17234 loops=1)
-> Nested loop inner join (cost=7741.93 rows=17233) (actual time=0.056..64.456 rows=17234 loops=1)
-> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.035..3.081 rows=730 loops=1)
-> Index lookup on t1 using idx2_t1 (status=1, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.082 rows=24 loops=730)
-> Stream results (cost=7741.93 rows=17233) (actual time=0.048..83.117 rows=17265 loops=1)
-> Nested loop inner join (cost=7741.93 rows=17233) (actual time=0.044..59.954 rows=17265 loops=1)
-> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.031..2.674 rows=730 loops=1)
-> Index lookup on t1 using idx2_t1 (status=3, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.077 rows=24 loops=730)
-> Stream results (cost=7741.93 rows=17233) (actual time=0.043..85.437 rows=17397 loops=1)
-> Nested loop inner join (cost=7741.93 rows=17233) (actual time=0.039..61.487 rows=17397 loops=1)
-> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.028..2.909 rows=730 loops=1)
-> Index lookup on t1 using idx2_t1 (status=8, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.078 rows=24 loops=730)
1 row in set (0.29 sec)

###将索引列in list里值用with构造中间表，然后关联，这样先与驱动表做无条件join,然后驱动
in list表（实际上扩大了驱动表行数*inlist数目），
将in list转为等值条件，这样充分利用组合索引的多列做index access提高效率（主要是索引前导
列效率不好的时候使用）。

想一种方法怎么避免UNION ALL，很显然IN LIST里三个数字(1,3,8)可以用with构造个临时表，
然后JOIN，效率变成0.19s:

点击(此处)折叠或打开

explain analyze
with tmp as
(select 1 status union all select 3 status union all select 8 status)
select/*+no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2,tmp
where t1.first_name=t2.first_name and t1.status = tmp.status
and t2.last_name like 'Ar%'\G
EXPLAIN: -> Nested loop inner join (cost=22790.57 rows=51698) (actual time=3.096..171.256 rows=51896 loops=1)
-> Inner hash join (no condition) (cost=551.06 rows=2190) (actual time=3.049..3.937 rows=2190 loops=1)
-> Table scan on tmp (cost=1.15..2.84 rows=3) (actual time=0.025..0.031 rows=3 loops=1)
-> Materialize union CTE tmp (cost=0.30..0.30 rows=3) (actual time=0.022..0.022 rows=3 loops=1)
-> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Hash
-> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.046..2.756 rows=730 loops=1)
-> Index lookup on t1 using idx2_t1 (status=tmp.`status`, first_name=t2.first_name), with index condition: (t1.`status` = tmp.`status`) (cost=7.80 rows=24) (actual time=0.008..0.075 rows=24 loops=2190)
1 row in set (0.19 sec)

同样的原来走idx1_t1的以first_name为前导列的也可以改成两个列都走索引，只不过因为
first_name选择性本来就够好，所以效率差别不大：基本都是0.2s左右：

点击(此处)折叠或打开

explain analyze
with tmp as
(select 1 status union all select 3 status union all select 8 status)
select/*+no_bnl(t1,t2) index(t1 idx1_t1) */ * from t1,t2,tmp
where t1.first_name=t2.first_name and t1.status = tmp.status
and t2.last_name like 'Ar%'\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=22524.20 rows=51079) (actual time=2.879..171.640 rows=51896 loops=1)
-> Inner hash join (no condition) (cost=551.06 rows=2190) (actual time=2.838..3.779 rows=2190 loops=1)
-> Table scan on tmp (cost=1.15..2.84 rows=3) (actual time=0.049..0.055 rows=3 loops=1)
-> Materialize union CTE tmp (cost=0.30..0.30 rows=3) (actual time=0.046..0.046 rows=3 loops=1)
-> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Hash
-> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.045..2.543 rows=730 loops=1)
-> Index lookup on t1 using idx1_t1 (first_name=t2.first_name, status=tmp.`status`), with index condition: (t1.`status` = tmp.`status`) (cost=7.70 rows=23) (actual time=0.008..0.075 rows=24 loops=2190)
1 row in set (0.20 sec)

总结：
MySQL里组合索引，如果有in多个值的，对于单表，可以转为or多个索引区间进行索引扫描，索引
扫描可以用上inlist列条件。

但是对于多表JOIN的，如果被驱动表组合索引有in条件：
1）如果关联列是前导列且等值的，index access则是使用关联列，in非前导列多个值的，只能是icp
，这种关联列选择性好，一般问题不大
2) 如果关联列是非前导列，前导列是in条件，如果前导列in list选择性不好，一般走hash join,
如果强制走索引，则走前导列in的，关联列则走icp,效率一般比关联列是前导列的差很多。
针对这种，要么重新选择或建立以关联列（前提关联列选择性好）的索引，如果关联列选择
性不好，组合索引选择性好，可以引入中间表，将in list里面的值用with构造为临时表，然后
与原表关联，这样执行计划可以将构造的中间表先与驱动表关联（无条件JOIN,实际上扩大
了驱动表行数*inlist数目），然后驱动in list表，这样in条件转为等值的（循环执行），则组
合索引列都能用于index access。

所以对于多表JOIN，子查询等，包含in条件的，in条件不要做前导列，而将关联列作为前导列（
前提选择性得好），这样可以利用关联列做NESTED LOOPS扫描，in条件做icp，如果关联条件
选择性不好，可以使用with构造将in构造为临时表然后关联。
（效果没有ORACLE的inlist iterator算子好）。

另外注意一下，对于索引访问有icp的，examined_rows和handler_read_next并不能反应索引扫描
的行数，这两个变量如果是全表扫描或完全索引索引（无ICP），而是实际扫描的行数，如果有
ICP，则是索引访问返回的行数。

实际上examined_rows MySQL设计的是server层扫描的行数，所以有ICP的也是包含ICP的结果行数
，非索引真正在stroage层扫描行数，其实大部分时候关心的是扫描索引的真实行数，MySQL这么
设计，那么examined_rows的作用就小很多，因为大部分情况下，隐藏了真实索引扫描行数。

阅读(5128) | 评论(0) | 转发(0) |

上一篇：ORACLE vs MySQL 对组合索引包含IN LIST执行计划研究(ORACLE部分)_PART1

下一篇：ORACLE CBO 默认选择率及其问题解决方案_PART1

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6