Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1364772
  • 博文数量: 172
  • 博客积分: 0
  • 博客等级: 民兵
  • 技术积分: 3831
  • 用 户 组: 普通用户
  • 注册时间: 2011-03-30 13:00
个人简介

About me:Oracle ACE pro,optimistic,passionate and harmonious. Focus on ORACLE,MySQL and other database programming,peformance tuning,db design, j2ee,Linux/AIX,Architecture tech,etc

文章分类

全部博文(172)

文章存档

2024年(27)

2023年(28)

2022年(43)

2020年(62)

2014年(3)

2013年(9)

分类: Mysql/postgreSQL

2024-05-13 17:31:18

上一篇:ORACLE vs MySQL 对组合索引包含IN LIST执行计划研究(ORACLE部分)_PART1
本文主要研究下组合索引包含in条件(多个值),在单表查询,关联查询这两种SQL查询结果在ORACLE和MySQL里的区别。
ORACLE具有强大的优化器,一般来说,组合索引在ORACLE里不管是单表还是关联查询,只要符合leftmost prefix规则,都可以用于index access,只要统计信息等是准确的。
MySQL的优化器相对来说,要弱不少,很多功能不够健全,单表对于组合索引包含IN的处理貌似没有什么问题,但是JOIN情况下,
包含IN LIST组合索引的表是被驱动表,则问题比较严重,卖个关子,详细见本文MySQL部分讲述。

下面看下MySQL部分:

2.MySQL里组合索引有IN条件的情况


2.1)MySQL单表组合索引IN LIST情况:
MySQL貌似没有inlist iterator算子,能否实现INLIST iterator效果:
建表语句如下:

点击(此处)折叠或打开

  1. drop table t1;
  2. drop table t2;
  3. create table t1 as select * from employees;
  4. alter table t1 add status int default 0;
  5. update t1 set status=ceil(rand()*10);
  6. create table t2 as select * from t1;
t1,t2表结构:

点击(此处)折叠或打开

  1. t1,t2表结构:
  2. +------------+---------------+------+-----+---------+-------+
  3. | Field | Type | Null | Key | Default | Extra |
  4. +------------+---------------+------+-----+---------+-------+
  5. | emp_no | int | NO | | NULL | |
  6. | birth_date | date | NO | | NULL | |
  7. | first_name | varchar(14) | NO | | NULL | |
  8. | last_name | varchar(16) | NO | | NULL | |
  9. | gender | enum('M','F') | NO | | NULL | |
  10. | hire_date | date | NO | | NULL | |
  11. | status | int | YES | | 0 | |
  12. +------------+---------------+------+-----+---------+-------+
创建索引,将选择性好的放前面:
create index idx1_t1 on t1(first_name,status);



###对于MySQL,看组合索引用了几个列参与索引access,可以通过索引列key_len查看,
也可以通过树形执行计划sing idx1_t1 over 有几个列组合查看。
或者可以查看status变量handler确定索引扫描区间数:
Handler_read_key和Handler_read_next,
Handler_read_next是按照索引顺序扫描下一行的次数,也可以代表回表次数(比如有icp可以减少这
个值),
所以Handler_read_next也不能判断索引实际扫描的行数,而是索引扫描结果行数。

看下执行计划:有Using index condition,说明第二个列status in (1,2,8)还是ICP条件,
索引access用到几个列,可以通过key_len看:
first_name的key_len=14*3+2 (变长的+2,utf8mb3是3个字节)=44,
status是int型,对应的key_len=4+1(允许null+1)=5
可以看到执行计划的key_len=49,说明两个条件都用到。


explain select * from t1 
where t1.first_name='Saniya' and  t1.status in (1,2,8);

点击(此处)折叠或打开

  1. +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
  2. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  3. +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
  4. | 1 | SIMPLE | t1 | NULL | range | idx1_t1 | idx1_t1 | 49 | NULL | 78 | 100.00 | Using index condition |
  5. +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+

从树形计划上看:using idx1_t1 over 转为两个or分支,也就是2个区间,说明两个条件都用到,
对应的还有icp,不过这个icp貌似作用不大:

点击(此处)折叠或打开

  1. explain format=tree
  2. select * from t1
  3. where t1.first_name='Saniya' and t1.status in (1,2,8)\G
  4. *************************** 1. row ***************************
  5. EXPLAIN: -> Index range scan on t1 using idx1_t1 over
  6. (first_name = 'Saniya' AND status = 1) OR (first_name = 'Saniya' AND status = 2) OR (first_name = 'Saniya' AND status = 8),
  7. with index condition: ((t1.first_name = 'Saniya') and (t1.`status` in (1,2,8))) (cost=42.35 rows=78)


可以查看实际执行信息:

点击(此处)折叠或打开

  1. flush status;
  2. explain analyze
  3. select * from t1
  4. where t1.first_name='Saniya' and t1.status in (1,2,8)\G
  5. *************************** 1. row ***************************
  6. EXPLAIN: -> Index range scan on t1 using idx1_t1 over (first_name = 'Saniya' AND status = 1)
  7. OR (first_name = 'Saniya' AND status = 2) OR (first_name = 'Saniya' AND status = 8),
  8. with index condition: ((t1.first_name = 'Saniya') and (t1.`status` in (1,2,8)))
  9. (cost=42.35 rows=78) (actual time=0.038..0.668 rows=78 loops=1)

通过show status查看handler信息,可以看到Handler_read_key=3,对应三个索引区间。
说明first_name和status都参与索引扫描,执行计划里显示的icp基本用不着,这时候
Handler_read_next=扫描的行数。

点击(此处)折叠或打开

  1. show status like '%handler%';
  2. +----------------------------+-------+
  3. | Variable_name | Value |
  4. +----------------------------+-------+
  5. | Handler_commit | 1 |
  6. | Handler_delete | 0 |
  7. | Handler_discover | 0 |
  8. | Handler_external_lock | 2 |
  9. | Handler_mrr_init | 0 |
  10. | Handler_prepare | 0 |
  11. | Handler_read_first | 0 |
  12. | Handler_read_key | 3 |
  13. | Handler_read_last | 0 |
  14. | Handler_read_next | 78 |
  15. | Handler_read_prev | 0 |
  16. | Handler_read_rnd | 0 |
  17. | Handler_read_rnd_next | 0 |
  18. | Handler_rollback | 0 |
  19. | Handler_savepoint | 0 |
  20. | Handler_savepoint_rollback | 0 |
  21. | Handler_update | 0 |
  22. | Handler_write | 0 |
  23. +----------------------------+-------+

  24. select ps_thread_id(connection_id());
  25. +-------------------------------+
  26. | ps_thread_id(connection_id()) |
  27. +-------------------------------+
  28. | 50 |
  29. +-------------------------------+

  30. 通过检查examined_rows=78行,也符合索引扫描行数,因为使用explain analyze,rows_sent=1:
  31. +-----------+---------------+-----------+---------------+
  32. | THREAD_ID | ROWS_AFFECTED | ROWS_SENT | ROWS_EXAMINED |
  33. +-----------+---------------+-----------+---------------+
  34. | 50 | 0 | 1 | 78 |
  35. +-----------+---------------+-----------+---------------+

下面将索引顺序换一下,貌似都和in非前导列的一样,都有ICP,都是改成or条件(实际上这个icp没
有啥用filtered=100):
将选择性不好的列放前面:
create index idx2_t1 on t1(status,first_name);


通过key_len和树形计划的using idx2_t1 over,可以看到两个列条件都用到:
现在让两个列都是in,看是否能转为2*3=6个索引区间:

点击(此处)折叠或打开

  1. explain select/*+index(t1 idx2_t1)*/ * from t1
  2. where t1.first_name in('Saniya','Aamer') and t1.status in (1,2,8);
  3. +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
  4. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  5. +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
  6. | 1 | SIMPLE | t1 | NULL | range | idx2_t1 | idx2_t1 | 49 | NULL | 144 | 100.00 | Using index condition |
  7. +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+

  8. select count(*) from t1 where status in (1,2,8);
  9. +----------+
  10. | count(*) |
  11. +----------+
  12. | 89751 |
  13. +----------+

  14. select count(*) from t1
  15. where t1.first_name in('Saniya','Aamer')
  16. and t1.status in (1,2,8);
  17. +----------+
  18. | count(*) |
  19. +----------+
  20. | 144 |
  21. +----------+

实际执行看:

点击(此处)折叠或打开

  1. flush status;
  2. explain analyze
  3. select/*+index(t1 idx2_t1)*/ * from t1
  4. where t1.first_name in('Saniya','Aamer') and t1.status in (1,2,8)\G
  5. *************************** 1. row ***************************
  6. EXPLAIN: -> Index range scan on t1 using idx2_t1 over
  7. (status = 1 AND first_name = 'Aamer') OR (status = 1 AND first_name = 'Saniya')
  8. OR (4 more), with index condition: ((t1.first_name in ('Saniya','Aamer')) and (t1.`status` in (1,2,8)))
  9. (cost=78.34 rows=144) (actual time=0.055..2.067 rows=144 loops=1)
执行计划里转为的or超过3个后面变为or (n more)


可以看到转为6个扫描区间:Handler_read_key=6,
icp的实际用不上,所以两个列都是index access,对应Handler_read_next和ROWS_EXAMINED都是
扫描的行数144:

点击(此处)折叠或打开

  1. show status like '%handler%';
  2. +----------------------------+-------+
  3. | Variable_name | Value |
  4. +----------------------------+-------+
  5. | Handler_commit | 1 |
  6. | Handler_delete | 0 |
  7. | Handler_discover | 0 |
  8. | Handler_external_lock | 2 |
  9. | Handler_mrr_init | 0 |
  10. | Handler_prepare | 0 |
  11. | Handler_read_first | 0 |
  12. | Handler_read_key | 6 |
  13. | Handler_read_last | 0 |
  14. | Handler_read_next | 144 |
  15. | Handler_read_prev | 0 |
  16. | Handler_read_rnd | 0 |
  17. | Handler_read_rnd_next | 0 |
  18. | Handler_rollback | 0 |
  19. | Handler_savepoint | 0 |
  20. | Handler_savepoint_rollback | 0 |
  21. | Handler_update | 0 |
  22. | Handler_write | 0 |
  23. +----------------------------+-------+

  24. select THREAD_ID,ROWS_AFFECTED,ROWS_SENT,ROWS_EXAMINED from performance_schema.events_statements_current where thread_id=50;
  25. +-----------+---------------+-----------+---------------+
  26. | THREAD_ID | ROWS_AFFECTED | ROWS_SENT | ROWS_EXAMINED |
  27. +-----------+---------------+-----------+---------------+
  28. | 50 | 0 | 1 | 144 |
  29. +-----------+---------------+-----------+---------------+



总结:MySQL单表访问,组合索引in不管是否是前导列,都能按照leftmost prefix规则,转为or,然后
转为多个索引区间(interval),
能够充分利用组合索引条件进行index access,执行计划里虽然有icp,基本用不着。
MySQL没有inlist iterator算子,可以将in list转为多个区间从而实现单表访问inlist iterator效果。




2.2)多表join,inlist表作为被驱动表
看能否实现和单表一样的效果


先给t2的last_name建个索引,让t2做驱动表,t1被驱动:
create index idx_t2 on t2(last_name);




MySQL组合索引有IN的,如果IN里只有一个值,和ORACLE一样,也会转为等值,则组合索引都是
等值的,可以直接access


MySQL和ORACLE对于组合索引中条件in有多个值的的处理貌似不一样:


###MySQL没有index inlist iterator算子,如果组合索引前导列是in并且有多个值,则不能转为多个
等值的组合条件,
这样只能in的列作为access,后续列作为icp,如果关联查询且被驱动走NL,则in的列作为单独条件,
作为索引access效率则可能很低,
所以在MySQL里,关联列作为索引前导列较好,in的作为非前导列(用于icp条件,可能有bug将in
条件放到回表FILTER,则可能非常影响效率)。


1)不管in是否多个值,列是否是前导列,则都有icp
2)如果in的是非前导列,则in作为icp条件,这个和ORACLE一样
3)如果in是前导列,则in条件列对应的后续列作为icp条件,这个和ORACLE不一样,ORACLE有inlist
iterator算子,
  将in转为or,变为多个索引等值条件,然后避免icp,MySQL没有inlist interator算子,in作为前导列
在关联查询里效率可能很低,将关联列作为前导列较好。


###在MySQL里in条件避免不了icp,所以关联查询的被驱动表,将关联列作为索引前导列,这样
可以access条件,in条件列作为非前导列使用ICP,
如果将in条件作为前导列,则关联列只能用于ICP,可能效率很低。

驱动表730行,被驱动表循环730次:

点击(此处)折叠或打开

  1. select count(*) from t2 where last_name like 'Ar%';
  2. +----------+
  3. | count(*) |
  4. +----------+
  5. | 730 |
  6. +----------+


###1.先测试关联列是前导列,然后+in list组合索引,走nested loops,t1被驱动:

点击(此处)折叠或打开

  1. explain select * from t1,t2
  2. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  3. and t2.last_name like 'Ar%';
通过执行计划看,多表JOIN和单表不一样,t1被驱动,走idx1_t1(first_name,status),对应的
key_len是44,不是49(两个列都用上是49),
可以看到filtered=59.26,也就是status in (1,3,8)起类似过滤效果是真正ICP,过滤了接近40%的
数据,这个和ORACLE的多表JOIN不一样,
在MySQL里貌似多表JOIN的索引有in list,不能让多列都走index access:


点击(此处)折叠或打开

  1. +----+-------------+-------+------------+-------+-----------------+---------+---------+-------------------------+------+----------+-----------------------+
  2. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  3. +----+-------------+-------+------------+-------+-----------------+---------+---------+-------------------------+------+----------+-----------------------+
  4. | 1 | SIMPLE | t2 | NULL | range | idx_t2 | idx_t2 | 50 | NULL | 730 | 100.00 | Using index condition |
  5. | 1 | SIMPLE | t1 | NULL | ref | idx2_t1,idx1_t1 | idx1_t1 | 44 | employees.t2.first_name | 234 | 59.26 | Using index condition |
  6. +----+-------------+-------+------------+-------+-----------------+---------+---------+-------------------------+------+----------+-----------------------+
  7. 2 rows in set, 1 warning (0.00 sec)

  8. flush status;
  9. explain analyze
  10. select * from t1,t2
  11. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  12. and t2.last_name like 'Ar%'\G
耗时0.22s,索引访问只能用到first_name,这里的first_name选择性比较好,这样status条件走不了
索引其实差别不大
改成union all效果不好,所以将关联列作为前导列较好,这样可以用到前导列索引

show status like '%handler%';

Handler_read_key=驱动表索引区间1+被驱动表730次(每次一个区间)=731,
Handler_read_next这时候就不是索引真正扫描的次数了,因为被驱动表t1只走first_name列索引
扫描,还有icp:status in (1,3,8),
这时候只是结果行数,相应的examined_rows也一样,这时候都是结果行数,结果行数包括驱动
表t2和t1的索引结果行数。

点击(此处)折叠或打开

  1. (root@localhost) [employees_15:21:17]> show status like '%handler%';
  2. +----------------------------+-------+
  3. | Variable_name | Value |
  4. +----------------------------+-------+
  5. | Handler_commit | 1 |
  6. | Handler_delete | 0 |
  7. | Handler_discover | 0 |
  8. | Handler_external_lock | 4 |
  9. | Handler_mrr_init | 0 |
  10. | Handler_prepare | 0 |
  11. | Handler_read_first | 0 |
  12. | Handler_read_key | 731 |
  13. | Handler_read_last | 0 |
  14. | Handler_read_next | 52626 |
  15. | Handler_read_prev | 0 |
  16. | Handler_read_rnd | 0 |
  17. | Handler_read_rnd_next | 0 |
  18. | Handler_rollback | 0 |
  19. | Handler_savepoint | 0 |
  20. | Handler_savepoint_rollback | 0 |
  21. | Handler_update | 0 |
  22. | Handler_write | 0 |
  23. +----------------------------+-------+

  24. select THREAD_ID,ROWS_AFFECTED,ROWS_SENT,ROWS_EXAMINED from performance_schema.events_statements_current where thread_id=50;
  25. +-----------+---------------+-----------+---------------+
  26. | THREAD_ID | ROWS_AFFECTED | ROWS_SENT | ROWS_EXAMINED |
  27. +-----------+---------------+-----------+---------------+
  28. | 50 | 0 | 1 | 52626 |
  29. +-----------+---------------+-----------+---------------+
  30. 1 row in set (0.01 sec)

  31. flush status;
  32. explain analyze
  33. select * from t1,t2
  34. where t1.first_name=t2.first_name and t1.status in (1)
  35. and t2.last_name like 'Ar%'\G

如果in只有一个值,转为=,则访问条件是using idx1_t1 (first_name=t2.first_name, status=1),
没有问题:

点击(此处)折叠或打开

  1. *************************** 1. row ***************************
  2. EXPLAIN: -> Nested loop inner join (cost=7653.14 rows=17026) (actual time=0.078..67.316 rows=17234 loops=1)
  3.     -> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'),
  4.      with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.049..6.793 rows=730 loops=1)
  5.     -> Index lookup on t1 using idx1_t1 (first_name=t2.first_name, status=1)
  6.     (cost=7.70 rows=23) (actual time=0.009..0.081 rows=24 loops=730)

  7. 普通执行计划没有icp,key_len=49,两个列都用上:
  8. +----+-------------+-------+------------+-------+-------------------------+---------+---------+-------------------------------+------+----------+-----------------------+
  9. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  10. +----+-------------+-------+------------+-------+-------------------------+---------+---------+-------------------------------+------+----------+-----------------------+
  11. | 1 | SIMPLE | t2 | NULL | range | idx_t2 | idx_t2 | 50 | NULL | 730 | 100.00 | Using index condition |
  12. | 1 | SIMPLE | t1 | NULL | ref | idx2_t1,idx1_t1,idx3_t1 | idx1_t1 | 49 | employees.t2.first_name,const | 23 | 100.00 | NULL |
  13. +----+-------------+-------+------------+-------+-------------------------+---------+---------+-------------------------------+------+----------+-----------------------+

  14. 扫描的行数没有问题,被驱动t1的真实rows=24,这个是平均值。 730*24=17520略大于真实行数,然后+驱动表700+行,实际扫描17964行。
  15. show status like 'handle%';
  16. +----------------------------+-------+
  17. | Variable_name | Value |
  18. +----------------------------+-------+
  19. | Handler_commit | 1 |
  20. | Handler_delete | 0 |
  21. | Handler_discover | 0 |
  22. | Handler_external_lock | 4 |
  23. | Handler_mrr_init | 0 |
  24. | Handler_prepare | 0 |
  25. | Handler_read_first | 0 |
  26. | Handler_read_key | 731 |
  27. | Handler_read_last | 0 |
  28. | Handler_read_next | 17964 |
  29. | Handler_read_prev | 0 |
  30. | Handler_read_rnd | 0 |
  31. | Handler_read_rnd_next | 0 |
  32. | Handler_rollback | 0 |
  33. | Handler_savepoint | 0 |
  34. | Handler_savepoint_rollback | 0 |
  35. | Handler_update | 0 |
  36. | Handler_write | 0 |
  37. +----------------------------+-------+

  38. select THREAD_ID,ROWS_AFFECTED,ROWS_SENT,ROWS_EXAMINED from performance_schema.events_statements_current where thread_id=50;
  39. +-----------+---------------+-----------+---------------+
  40. | THREAD_ID | ROWS_AFFECTED | ROWS_SENT | ROWS_EXAMINED |
  41. +-----------+---------------+-----------+---------------+
  42. | 50 | 0 | 1 | 17964 |
  43. +-----------+---------------+-----------+---------------+

###2:测试in list列是前导列,关联条件列非前导列,被驱动走索引
先删除first_name前导列索引:
drop index idx1_t1 on t1;


单个值同样没有问题:using idx2_t1 (status=1, first_name=t2.first_name).

点击(此处)折叠或打开

  1. flush status;
  2. explain analyze
  3. select * from t1,t2
  4. where t1.first_name=t2.first_name and t1.status in (1)
  5. and t2.last_name like 'Ar%'\G

  6. *************************** 1. row ***************************
  7. EXPLAIN: -> Nested loop inner join (cost=7741.93 rows=17233) (actual time=0.097..77.225 rows=17234 loops=1)
  8.     -> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.053..12.567 rows=730 loops=1)
  9.     -> Index lookup on t1 using idx2_t1 (status=1, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.011..0.086 rows=24 loops=730)


  10. 前导列是in有多个值,关联列是非前导列,走了hash join,0.22s:
  11. explain analyze
  12. select * from t1,t2
  13. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  14. and t2.last_name like 'Ar%'\G

  15. *************************** 1. row ***************************
  16. EXPLAIN: -> Inner hash join (t1.first_name = t2.first_name) (cost=1306095.83 rows=763905) (actual time=6.099..329.593 rows=51896 loops=1)
  17.     -> Filter: (t1.`status` in (1,3,8)) (cost=25.42 rows=17657) (actual time=0.029..281.806 rows=89911 loops=1)
  18.         -> Table scan on t1 (cost=25.42 rows=297939) (actual time=0.019..258.542 rows=300024 loops=1)
  19.     -> Hash
  20.         -> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'),
  21.         with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.035..5.303 rows=730 loops=1)

  22. explain analyze树形计划看的cost=1306095,和json,实际估算的costs不一样:12906876,json的和普通计划的一样

  23. show status like '%last_query_cost%';
  24. +-----------------+-----------------+
  25. | Variable_name | Value |
  26. +-----------------+-----------------+
  27. | Last_query_cost | 12906876.294155 |
  28. +-----------------+-----------------+
  29. EXPLAIN: {
  30.   "query_block": {
  31.     "select_id": 1,
  32.     "cost_info": {
  33.       "query_cost": "12906876.29"
  34.     },

禁用hash join,竟然t1被驱动走全表,没有选择组合索引,说明走索引效率低cost大,
因为删除了idx1_t1(first_name,status)索引,这里的关联列没有索引,而是在status列上有前导列
索引,
MySQL基本不选择t1走索引,可以通过possible_keys看出,很显然,这里的cost计算不准才选择
这种不走索引的计划:   
"query_cost": "22313643.61"     < 走idx2_t1索引的68339620.40

点击(此处)折叠或打开

  1. explain
  2. select/*+no_bnl(t1,t2) */ * from t1,t2
  3. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  4. and t2.last_name like 'Ar%';

  5. +----+-------------+-------+------------+-------+---------------+--------+---------+------+--------+----------+-----------------------+
  6. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  7. +----+-------------+-------+------------+-------+---------------+--------+---------+------+--------+----------+-----------------------+
  8. | 1 | SIMPLE | t2 | NULL | range | idx_t2 | idx_t2 | 50 | NULL | 730 | 100.00 | Using index condition |
  9. | 1 | SIMPLE | t1 | NULL | ALL | idx2_t1 | NULL | NULL | NULL | 297939 | 5.93 | Using where |
  10. +----+-------------+-------+------------+-------+---------------+--------+---------+------+--------+----------+-----------------------+

通过optimizer_trace查看:
先计算单表走索引按照status查询的cost:

点击(此处)折叠或打开

  1. "analyzing_range_alternatives": {
  2.                     "range_scan_alternatives": [
  3.                       {
  4.                         "index": "idx2_t1",
  5.                         "ranges": [
  6.                           "status = 1",
  7.                           "status = 3",
  8.                           "status = 8"
  9.                         ],
  10.                         "index_dives_for_eq_ranges": true,
  11.                         "rowid_ordered": false,
  12.                         "using_mrr": false,
  13.                         "index_only": false,
  14.                         "in_memory": 0.111617,
  15.                         "rows": 176572,
  16.                         "cost": 75958.3,
  17.                         "chosen": false,
  18.                         "cause": "cost"
  19.                       }

MySQL没有考虑t1走idx2_t1索引,直接按照全表扫描来,没有考虑走nested loops情况下,可以
走组合索引选择率,比前面计算的只走status要小很多的情况,
所以直接走全表扫描,也没有考虑hash join:  

点击(此处)折叠或打开

  1. "cost_for_plan": 328.76,
  2.                 "rest_of_plan": [
  3.                   {
  4.                     "plan_prefix": [
  5.                       "`t2`"
  6.                     ],
  7.                     "table": "`t1`",
  8.                     "best_access_path": {
  9.                       "considered_access_paths": [
  10.                         {
  11.                           "rows_to_scan": 297939,
  12.                           "filtering_effect": [
  13.                           ],
  14.                           "final_filtering_effect": 0.592645,
  15.                           "access_type": "scan",
  16.                           "resulting_rows": 176572,
  17.                           "cost": 2.23133e+07,
  18.                           "chosen": true
  19.                         }
  20.                       ]
  21.                     },

所以t2作为被驱动表,如果关联列不是前导列,则必须要有前导列且等值条件(in一个值也转为
等值),才可以让t2走nested loops,走索引。


强制走索引,可以看到rows=176572,filtered=10,key_len=5,索引效率很低,只走了status列
索引,first_name是ICP:
json显示的执行 "query_cost": "68339620.40",走索引的cost大,这个计算不准,所以默认走了
全表

点击(此处)折叠或打开

  1. explain
  2. select/*+no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2
  3. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  4. and t2.last_name like 'Ar%';
  5. +----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-----------------------+
  6. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  7. +----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-----------------------+
  8. | 1 | SIMPLE | t2 | NULL | range | idx_t2 | idx_t2 | 50 | NULL | 730 | 100.00 | Using index condition |
  9. | 1 | SIMPLE | t1 | NULL | range | idx2_t1 | idx2_t1 | 5 | NULL | 176572 | 10.00 | Using index condition |
  10. +----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-----------------------+

explain analyze结果:执行耗时16s

点击(此处)折叠或打开

  1. explain analyze select/*+no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2
  2. where t1.first_name=t2.first_name and t1.status in (1,3,8)
  3. and t2.last_name like 'Ar%'\G
  4. *************************** 1. row ***************************
  5. EXPLAIN: -> Nested loop inner join (cost=16180534643.87 rows=128897560) (actual time=8.275..16590.253 rows=51896 loops=1)
  6.     -> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.083..8.235 rows=730 loops=1)
  7.     -> Index range scan on t1 using idx2_t1 over (status = 1) OR (status = 3) OR (status = 8), with index condition: ((t1.first_name = t2.first_name) and (t1.`status` in (1,3,8))) (cost=75958.29 rows=176572) (actual time=3.688..22.707 rows=71 loops=730)
  8.     

  9. 1 row in set (16.62 sec)


可以看到,不走关联列索引,走status in条件索引,效率比前面走first_name列索引差很多,因为
status列索引选择率不好。
可以看到,前导列in list+非前导列关联的,也不能全部走index access,和单表不一样。


这种前导列IN LIST且是非关联列效率很低的,既然in list走不了多列索引扫描,那么改成等值就
可以了,
改为UNION ALL提升很明显,这样可以利用组合索引进行扫描:执行时间从16s到0.29s:

点击(此处)折叠或打开

  1. explain analyze
  2. select/*+index(t1 idx2_t1)*/ * from t1,t2
  3. where t1.first_name=t2.first_name and t1.status in (1)
  4. and t2.last_name like 'Ar%'
  5. union all
  6. select/*+index(t1 idx2_t1)*/ * from t1,t2
  7. where t1.first_name=t2.first_name and t1.status in (3)
  8. and t2.last_name like 'Ar%'
  9. union all
  10. select/*+index(t1 idx2_t1)*/ * from t1,t2
  11. where t1.first_name=t2.first_name and t1.status in (8)
  12. and t2.last_name like 'Ar%'\G
  13. *************************** 1. row ***************************
  14. EXPLAIN: -> Append (actual time=0.064..261.577 rows=51896 loops=1)
  15.     -> Stream results (cost=7741.93 rows=17233) (actual time=0.063..89.491 rows=17234 loops=1)
  16.         -> Nested loop inner join (cost=7741.93 rows=17233) (actual time=0.056..64.456 rows=17234 loops=1)
  17.             -> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.035..3.081 rows=730 loops=1)
  18.             -> Index lookup on t1 using idx2_t1 (status=1, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.082 rows=24 loops=730)
  19.     -> Stream results (cost=7741.93 rows=17233) (actual time=0.048..83.117 rows=17265 loops=1)
  20.         -> Nested loop inner join (cost=7741.93 rows=17233) (actual time=0.044..59.954 rows=17265 loops=1)
  21.             -> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.031..2.674 rows=730 loops=1)
  22.             -> Index lookup on t1 using idx2_t1 (status=3, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.077 rows=24 loops=730)
  23.     -> Stream results (cost=7741.93 rows=17233) (actual time=0.043..85.437 rows=17397 loops=1)
  24.         -> Nested loop inner join (cost=7741.93 rows=17233) (actual time=0.039..61.487 rows=17397 loops=1)
  25.             -> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.028..2.909 rows=730 loops=1)
  26.             -> Index lookup on t1 using idx2_t1 (status=8, first_name=t2.first_name) (cost=7.80 rows=24) (actual time=0.006..0.078 rows=24 loops=730)

  27. 1 row in set (0.29 sec)
###将索引列in list里值用with构造中间表,然后关联,这样先与驱动表做无条件join,然后驱动
in list表(实际上扩大了驱动表行数*inlist数目),
将in list转为等值条件,这样充分利用组合索引的多列做index access提高效率(主要是索引前导
列效率不好的时候使用)。


想一种方法怎么避免UNION ALL,很显然IN LIST里三个数字(1,3,8)可以用with构造个临时表,
然后JOIN,效率变成0.19s:


点击(此处)折叠或打开

  1. explain analyze
  2. with tmp as
  3. (select 1 status union all select 3 status union all select 8 status)
  4. select/*+no_bnl(t1,t2) index(t1 idx2_t1) */ * from t1,t2,tmp
  5. where t1.first_name=t2.first_name and t1.status = tmp.status
  6. and t2.last_name like 'Ar%'\G

  7. EXPLAIN: -> Nested loop inner join (cost=22790.57 rows=51698) (actual time=3.096..171.256 rows=51896 loops=1)
  8.     -> Inner hash join (no condition) (cost=551.06 rows=2190) (actual time=3.049..3.937 rows=2190 loops=1)
  9.         -> Table scan on tmp (cost=1.15..2.84 rows=3) (actual time=0.025..0.031 rows=3 loops=1)
  10.             -> Materialize union CTE tmp (cost=0.30..0.30 rows=3) (actual time=0.022..0.022 rows=3 loops=1)
  11.                 -> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  12.                 -> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  13.                 -> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  14.         -> Hash
  15.             -> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.046..2.756 rows=730 loops=1)
  16.     -> Index lookup on t1 using idx2_t1 (status=tmp.`status`, first_name=t2.first_name), with index condition: (t1.`status` = tmp.`status`) (cost=7.80 rows=24) (actual time=0.008..0.075 rows=24 loops=2190)
  17. 1 row in set (0.19 sec)

同样的原来走idx1_t1的以first_name为前导列的也可以改成两个列都走索引,只不过因为
first_name选择性本来就够好,所以效率差别不大:基本都是0.2s左右:


点击(此处)折叠或打开

  1. explain analyze
  2. with tmp as
  3. (select 1 status union all select 3 status union all select 8 status)
  4. select/*+no_bnl(t1,t2) index(t1 idx1_t1) */ * from t1,t2,tmp
  5. where t1.first_name=t2.first_name and t1.status = tmp.status
  6. and t2.last_name like 'Ar%'\G

  7. *************************** 1. row ***************************
  8. EXPLAIN: -> Nested loop inner join (cost=22524.20 rows=51079) (actual time=2.879..171.640 rows=51896 loops=1)
  9.     -> Inner hash join (no condition) (cost=551.06 rows=2190) (actual time=2.838..3.779 rows=2190 loops=1)
  10.         -> Table scan on tmp (cost=1.15..2.84 rows=3) (actual time=0.049..0.055 rows=3 loops=1)
  11.             -> Materialize union CTE tmp (cost=0.30..0.30 rows=3) (actual time=0.046..0.046 rows=3 loops=1)
  12.                 -> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  13.                 -> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  14.                 -> Rows fetched before execution (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
  15.         -> Hash
  16.             -> Index range scan on t2 using idx_t2 over ('Ar\0\0\0\0\0\0\0\0\0\0\0\0\0\0' <= last_name <= 'Ar??????????????'), with index condition: (t2.last_name like 'Ar%') (cost=328.76 rows=730) (actual time=0.045..2.543 rows=730 loops=1)
  17.     -> Index lookup on t1 using idx1_t1 (first_name=t2.first_name, status=tmp.`status`), with index condition: (t1.`status` = tmp.`status`) (cost=7.70 rows=23) (actual time=0.008..0.075 rows=24 loops=2190)

  18. 1 row in set (0.20 sec)

总结:
MySQL里组合索引,如果有in多个值的,对于单表,可以转为or多个索引区间进行索引扫描,索引
扫描可以用上inlist列条件。

但是对于多表JOIN的,如果被驱动表组合索引有in条件:
1)如果关联列是前导列且等值的,index access则是使用关联列,in非前导列多个值的,只能是icp
     ,这种关联列选择性好,一般问题不大
2) 如果关联列是非前导列,前导列是in条件,如果前导列in list选择性不好,一般走hash join,
    如果强制走索引,则走前导列in的,关联列则走icp,效率一般比关联列是前导列的差很多。
    针对这种,要么重新选择或建立以关联列(前提关联列选择性好)的索引,如果关联列选择
    性不好,组合索引选择性好,可以引入中间表,将in list里面的值用with构造为临时表,然后
    与原表关联,这样执行计划可以将构造的中间表先与驱动表关联(无条件JOIN,实际上扩大
    了驱动表行数*inlist数目), 然后驱动in list表,这样in条件转为等值的(循环执行),则组
    合索引列都能用于index access。
         
所以对于多表JOIN,子查询等,包含in条件的,in条件不要做前导列,而将关联列作为前导列(
前提选择性得好),这样可以利用关联列做NESTED LOOPS扫描,in条件做icp,如果关联条件
选择性不好,可以使用with构造将in构造为临时表然后关联。
(效果没有ORACLE的inlist iterator算子好)。
      
另外注意一下,对于索引访问有icp的,examined_rows和handler_read_next并不能反应索引扫描
的行数,这两个变量如果是全表扫描或完全索引索引(无ICP),而是实际扫描的行数,如果有
ICP,则是索引访问返回的行数。

实际上examined_rows MySQL设计的是server层扫描的行数,所以有ICP的也是包含ICP的结果行数
,非索引真正在stroage层扫描行数,其实大部分时候关心的是扫描索引的真实行数,MySQL这么
设计,那么examined_rows的作用就小很多,因为大部分情况下,隐藏了真实索引扫描行数

        
阅读(3906) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~