接:
MySQL Subquery Cache vs Oracle Subquery Cache子查询缓存_MySQL部分_PART2
下面研究下ORACLE里的subquery cache.
6.ORACLE具有全面的子查询cache优化功能
ORACLE同样的查询,不管是in还是exists,对于子查询不能unnest(这里使用no_unnest hints),都有subquery cache:
ORACLE里常见的unnest subquery的是标量子查询,UPDATE关联子查询等走FILTER操作的子查询,对应执行计划子查询部分
有系统绑定变量,比如:B1,有缓存优化,如果外层关联key基数小,则子查询执行次数少。
Oracle建表语句:
-
create table subq_t1(a int,b int);
-
insert into subq_t1(a,b)
-
select 1,1
-
from dual
-
connect by level<=100000;
-
-
insert into subq_t1(a,b)
-
select 2,1+level
-
from dual
-
connect by level<=100000;
-
-
insert into subq_t1(a,b)
-
select 3,1+level
-
from dual
-
connect by level<=100000;
-
-
-
create table subq_t2(a int,b int);
-
insert into subq_t2(a,b)
-
select level,level
-
from dual
-
connect by level<=100000;
-
-
create index idx_subq_t2 on subq_t2(a);
-
-
subq_t1表的a只有3个值:
-
select count(*),count(distinct a) from subq_t1;
-
-
COUNT(*) COUNT(DISTINCTA)
-
---------- ----------------
-
300000 3
-
-
subq_t2的a都是不重复的:
-
select count(*),count(distinct a) from subq_t2;
-
-
COUNT(*) COUNT(DISTINCTA)
-
---------- ----------------
-
100000 100000
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
select/*+gather_plan_statistics*/ *
-
from subq_t1 t1
-
where t1.a in
-
(select/*+no_unnest*/ t2.a from subq_t2 t2);
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('0hrqfk46ss2ns',null,'allstats last'));
可以看到走FILTER,虽然subq_t1有30w行,但是子查询只执行3次(实际执行次数依赖subq_t1.a的相同值连续情况)
执行计划ID=3对于谓词有系统绑定变量: 3 - access("T2"."A"=:B1)
-
Plan hash value: 2995694926
-
-
--------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-
--------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.17 | 20535 |
-
|* 1 | FILTER | | 1 | | 300K|00:00:00.17 | 20535 |
-
| 2 | TABLE ACCESS FULL| SUBQ_T1 | 1 | 327K| 300K|00:00:00.11 | 20529 |
-
|* 3 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 1098 | 3 |00:00:00.01 | 6 |
-
--------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter( IS NOT NULL)
-
3 - access("T2"."A"=:B1)
改为exists和in一样,因为在ORACLE里in/exists基本都可以相互转换:
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
select/*+gather_plan_statistics*/ *
-
from subq_t1 t1
-
where exists
-
(select/*+no_unnest*/ 1 from subq_t2 t2 where t1.a = t2.a);
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('f8uyx3hynta6x',null,'allstats last'));
-
-
PLAN_TABLE_OUTPUT
-
----------------------------------------------------------------------------------------------------
-
SQL_ID f8uyx3hynta6x, child number 0
-
-------------------------------------
-
select/*+gather_plan_statistics*/ * from subq_t1 t1 where exists
-
(select/*+no_unnest*/ 1 from subq_t2 t2 where t1.a = t2.a)
-
-
Plan hash value: 2995694926
-
-
--------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-
--------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.16 | 20535 |
-
|* 1 | FILTER | | 1 | | 300K|00:00:00.16 | 20535 |
-
| 2 | TABLE ACCESS FULL| SUBQ_T1 | 1 | 327K| 300K|00:00:00.09 | 20529 |
-
|* 3 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 1098 | 3 |00:00:00.01 | 6 |
-
--------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter( IS NOT NULL)
-
3 - access("T2"."A"=:B1)
oracle标量子查询,也有subquery cache优化:
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
select/*+gather_plan_statistics*/ t1.a,(select t2.b from subq_t2 t2 where t1.a = t2.a) b
-
from subq_t1 t1;
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('cwvtbwcaytnz5',null,'allstats last'));
可以看到ID=1和ID=2只执行3次:
-
Plan hash value: 438047693
-
-
-------------------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-
-------------------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.09 | 20529 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SUBQ_T2 | 3 | 1098 | 3 |00:00:00.01 | 9 |
-
|* 2 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 439 | 3 |00:00:00.01 | 6 |
-
| 3 | TABLE ACCESS FULL | SUBQ_T1 | 1 | 327K| 300K|00:00:00.09 | 20529 |
-
-------------------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("T2"."A"=:B1)
甚至在ORACLE里semi join/anti join走nested loops的(普通nested loops没有缓存优化),都有缓存优化,可以减少被驱动表回表次数,比如:
1.semi join走nested loops有缓存优化
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
-
--去掉no_unnest
-
select/*+gather_plan_statistics leading(t1) use_nl(t2@subq) */ *
-
from subq_t1 t1
-
where t1.a in
-
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2);
NESTED LOOPS SEMI,子查询只执行3次:
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('cacaags9gcy2a',null,'allstats last'));
-
Plan hash value: 3871264342
-
-
--------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-
--------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.18 | 20535 |
-
| 1 | NESTED LOOPS SEMI | | 1 | 327K| 300K|00:00:00.18 | 20535 |
-
| 2 | TABLE ACCESS FULL| SUBQ_T1 | 1 | 327K| 300K|00:00:00.13 | 20529 |
-
|* 3 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 109K| 3 |00:00:00.01 | 6 |
-
--------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
3 - access("T1"."A"="T2"."A")
2.anti join同样有subquery cache
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
-
--去掉no_unnest
-
select/*+gather_plan_statistics leading(t1) use_nl(t2@subq) */ *
-
from subq_t1 t1
-
where t1.a not in
-
(select/*+qb_name(subq)*/ t2.a from subq_t2 t2 where t2.a is not null) and t1.a is not null;
-
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('fs848pak6turs',null,'allstats last'));
因为有缓存,not in的执行效率非常高,没有找到数据:
-
Plan hash value: 3776200174
-
-
--------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-
--------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.05 | 558 |
-
| 1 | NESTED LOOPS ANTI | | 1 | 327K| 0 |00:00:00.05 | 558 |
-
|* 2 | TABLE ACCESS FULL| SUBQ_T1 | 1 | 327K| 300K|00:00:00.06 | 553 |
-
|* 3 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 1 | 3 |00:00:00.01 | 5 |
-
--------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - filter("T1"."A" IS NOT NULL)
-
3 - access("T1"."A"="T2"."A")
-
filter("T2"."A" IS NOT NULL)
3.普通的nested loops join没有缓存优化,但是有批处理
-
set serveroutput off
-
set autotrace traceonly --不输出结果,太多
-
select/*+gather_plan_statistics leading(t1) use_nl(t2)*/ *
-
from subq_t1 t1,subq_t2 t2
-
where t1.a = t2.a;
-
set autotrace off
-
select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
-
select * from table(dbms_xplan.display_cursor('2qa1g6az4hmk8',null,'allstats last'));
两个nested loops就是bulk获取rowid的优化:
-
Plan hash value: 2089944506
-
-
------------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-
------------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.70 | 80537 |
-
| 1 | NESTED LOOPS | | 1 | 327K| 300K|00:00:00.70 | 80537 |
-
| 2 | NESTED LOOPS | | 1 | 327K| 300K|00:00:00.47 | 60536 |
-
| 3 | TABLE ACCESS FULL | SUBQ_T1 | 1 | 327K| 300K|00:00:00.06 | 20529 |
-
|* 4 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 300K| 1 | 300K|00:00:00.25 | 40007 |
-
| 5 | TABLE ACCESS BY INDEX ROWID| SUBQ_T2 | 300K| 1 | 300K|00:00:00.14 | 20001 |
-
------------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
4 - access("T1"."A"="T2"."A")
在Oracle里,subquery cache优化比较全面,不能unnest走FILTER的有cache优化,可以减少执行次数,semi join/anti join同样有cache优化,普通nested loops join没有cache优化但是有批量获取rowid回表的优化。
总结:
1.MySQL的子查询subquery cache,仅限于无法unnest subquery,走select_type:SUBQUERY,也就是物化方式,如果外部表与子查询关联的key重复值特别多,
则可以有效减少子查询执行次数,具体减少多少,要看这个key存储的连续程度。
至于DEPENDENT SUBQUERY是MySQL子查询兜底招数,虽然改写和执行计划里对外部传递的列有,但是不能减少子查询执行次数,效果比较差。
另外MySQL的标量子查询只能走DEPENDENT SUBQUERY,无subquery cache优化,效率不好。
MariaDB是做了subquery cache优化的:见
MySQL对于不能unnest的,如果外层表有过滤条件,如果不能走索引,则条件写在子查询前还是后对驱动子查询次数有影响,
能走索引先走索引过滤无影响。
不走索引的,写在前先过滤后然后驱动子查询,写在后则是先驱动子查询后过滤外层条件,可能执行次数会多很多。
2.Oracle对无法unnest的子查询、标量子查询、update关联子查询是有缓存结果优化的,一般most缓存数目在1024个key左右,
如果outer table关联key的基数(distinct数目)很小,则可以大幅度减少子查询执行次数,具体也要看数据物理分布连续性是否强,
连续性强,减少的子查询次数越少,如果很离散存储,则子查询次数会增多。详细见:
FILTER Operation中子查询的执行次数问题分析(http://blog.chinaunix.net/uid-7655508-id-5875810.html)。
另外oracle对能够unnest的semi/anti join,走nested loops的也有缓存优化,普通join走nested loops没有缓存优化,但是有批处理优化。
ORACLE这方面优化做的比较全面。
3.MySQL子查询优化还是不够强大,所以在MySQL里让子查询能够subquery unnest较好,这样可以转为join,semi join,anti join优化,避免unnest subquery只有2种方法,
完全依赖于外层结果驱动子查询,缓存优化也只有materialization才有,标量子查询在MySQL里要慎用,和O不同的是,它没有缓存优化。
在MySQL里经常会使用materialization来优化子查询,因为它会按照关联key剔除重复行并且创建索引,一定程度上能够弥补不能缓存的差距。
本系列结束.
阅读(21013) | 评论(0) | 转发(0) |