Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1339439
  • 博文数量: 169
  • 博客积分: 0
  • 博客等级: 民兵
  • 技术积分: 3800
  • 用 户 组: 普通用户
  • 注册时间: 2011-03-30 13:00
个人简介

About me:Oracle ACE pro,optimistic,passionate and harmonious. Focus on ORACLE,MySQL and other database programming,peformance tuning,db design, j2ee,Linux/AIX,Architecture tech,etc

文章分类

全部博文(169)

文章存档

2024年(24)

2023年(28)

2022年(43)

2020年(62)

2014年(3)

2013年(9)

分类: Oracle

2024-05-23 16:29:04

接:MySQL Subquery Cache vs Oracle Subquery Cache子查询缓存_MySQL部分_PART2

下面研究下ORACLE里的subquery cache.

6.ORACLE具有全面的子查询cache优化功能


ORACLE同样的查询,不管是in还是exists,对于子查询不能unnest(这里使用no_unnest hints),都有subquery cache:


ORACLE里常见的unnest subquery的是标量子查询,UPDATE关联子查询等走FILTER操作的子查询,对应执行计划子查询部分
有系统绑定变量,比如:B1,有缓存优化,如果外层关联key基数小,则子查询执行次数少。

Oracle建表语句:

点击(此处)折叠或打开

  1. create table subq_t1(a int,b int);
  2. insert into subq_t1(a,b)
  3. select 1,1
  4. from dual
  5. connect by level<=100000;

  6. insert into subq_t1(a,b)
  7. select 2,1+level
  8. from dual
  9. connect by level<=100000;

  10. insert into subq_t1(a,b)
  11. select 3,1+level
  12. from dual
  13. connect by level<=100000;


  14. create table subq_t2(a int,b int);
  15. insert into subq_t2(a,b)
  16. select level,level
  17. from dual
  18. connect by level<=100000;

  19. create index idx_subq_t2 on subq_t2(a);

  20. subq_t1表的a只有3个值:
  21. select count(*),count(distinct a) from subq_t1;

  22.   COUNT(*) COUNT(DISTINCTA)
  23. ---------- ----------------
  24.     300000 3

  25. subq_t2的a都是不重复的:
  26. select count(*),count(distinct a) from subq_t2;

  27.   COUNT(*) COUNT(DISTINCTA)
  28. ---------- ----------------
  29.     100000 100000

点击(此处)折叠或打开

  1. set serveroutput off
  2. set autotrace traceonly --不输出结果,太多
  3. select/*+gather_plan_statistics*/ *
  4. from subq_t1 t1
  5. where t1.a in
  6. (select/*+no_unnest*/ t2.a from subq_t2 t2);
  7. set autotrace off
  8. select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
  9. select * from table(dbms_xplan.display_cursor('0hrqfk46ss2ns',null,'allstats last'));

可以看到走FILTER,虽然subq_t1有30w行,但是子查询只执行3次(实际执行次数依赖subq_t1.a的相同值连续情况)

执行计划ID=3对于谓词有系统绑定变量: 3 - access("T2"."A"=:B1)


点击(此处)折叠或打开

  1. Plan hash value: 2995694926

  2. --------------------------------------------------------------------------------------------
  3. | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
  4. --------------------------------------------------------------------------------------------
  5. | 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.17 | 20535 |
  6. |* 1 | FILTER | | 1 | | 300K|00:00:00.17 | 20535 |
  7. | 2 | TABLE ACCESS FULL| SUBQ_T1 | 1 | 327K| 300K|00:00:00.11 | 20529 |
  8. |* 3 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 1098 | 3 |00:00:00.01 | 6 |
  9. --------------------------------------------------------------------------------------------

  10. Predicate Information (identified by operation id):
  11. ---------------------------------------------------

  12.    1 - filter( IS NOT NULL)
  13.    3 - access("T2"."A"=:B1)
改为exists和in一样,因为在ORACLE里in/exists基本都可以相互转换:  


点击(此处)折叠或打开

  1. set serveroutput off
  2. set autotrace traceonly --不输出结果,太多
  3. select/*+gather_plan_statistics*/ *
  4. from subq_t1 t1
  5. where exists
  6. (select/*+no_unnest*/ 1 from subq_t2 t2 where t1.a = t2.a);
  7. set autotrace off
  8. select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
  9. select * from table(dbms_xplan.display_cursor('f8uyx3hynta6x',null,'allstats last'));

  10. PLAN_TABLE_OUTPUT
  11. ----------------------------------------------------------------------------------------------------
  12. SQL_ID f8uyx3hynta6x, child number 0
  13. -------------------------------------
  14. select/*+gather_plan_statistics*/ * from subq_t1 t1 where exists
  15. (select/*+no_unnest*/ 1 from subq_t2 t2 where t1.a = t2.a)

  16. Plan hash value: 2995694926

  17. --------------------------------------------------------------------------------------------
  18. | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
  19. --------------------------------------------------------------------------------------------
  20. | 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.16 | 20535 |
  21. |* 1 | FILTER | | 1 | | 300K|00:00:00.16 | 20535 |
  22. | 2 | TABLE ACCESS FULL| SUBQ_T1 | 1 | 327K| 300K|00:00:00.09 | 20529 |
  23. |* 3 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 1098 | 3 |00:00:00.01 | 6 |
  24. --------------------------------------------------------------------------------------------

  25. Predicate Information (identified by operation id):
  26. ---------------------------------------------------

  27.    1 - filter( IS NOT NULL)
  28.    3 - access("T2"."A"=:B1)
oracle标量子查询,也有subquery cache优化:


点击(此处)折叠或打开

  1. set serveroutput off
  2. set autotrace traceonly --不输出结果,太多
  3. select/*+gather_plan_statistics*/ t1.a,(select t2.b from subq_t2 t2 where t1.a = t2.a) b
  4. from subq_t1 t1;
  5. set autotrace off
  6. select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
  7. select * from table(dbms_xplan.display_cursor('cwvtbwcaytnz5',null,'allstats last'));
可以看到ID=1和ID=2只执行3次:


点击(此处)折叠或打开

  1. Plan hash value: 438047693

  2. -------------------------------------------------------------------------------------------------------------
  3. | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
  4. -------------------------------------------------------------------------------------------------------------
  5. | 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.09 | 20529 |
  6. | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SUBQ_T2 | 3 | 1098 | 3 |00:00:00.01 | 9 |
  7. |* 2 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 439 | 3 |00:00:00.01 | 6 |
  8. | 3 | TABLE ACCESS FULL | SUBQ_T1 | 1 | 327K| 300K|00:00:00.09 | 20529 |
  9. -------------------------------------------------------------------------------------------------------------

  10. Predicate Information (identified by operation id):
  11. ---------------------------------------------------

  12.    2 - access("T2"."A"=:B1)
甚至在ORACLE里semi join/anti join走nested loops的(普通nested loops没有缓存优化),都有缓存优化,可以减少被驱动表回表次数,比如:

1.semi join走nested loops有缓存优化


点击(此处)折叠或打开

  1. set serveroutput off
  2. set autotrace traceonly --不输出结果,太多

  3. --去掉no_unnest
  4. select/*+gather_plan_statistics leading(t1) use_nl(t2@subq) */ *
  5. from subq_t1 t1
  6. where t1.a in
  7. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2);

NESTED LOOPS SEMI,子查询只执行3次:

点击(此处)折叠或打开

  1. set autotrace off
  2. select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
  3. select * from table(dbms_xplan.display_cursor('cacaags9gcy2a',null,'allstats last'));
  4. Plan hash value: 3871264342

  5. --------------------------------------------------------------------------------------------
  6. | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
  7. --------------------------------------------------------------------------------------------
  8. | 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.18 | 20535 |
  9. | 1 | NESTED LOOPS SEMI | | 1 | 327K| 300K|00:00:00.18 | 20535 |
  10. | 2 | TABLE ACCESS FULL| SUBQ_T1 | 1 | 327K| 300K|00:00:00.13 | 20529 |
  11. |* 3 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 109K| 3 |00:00:00.01 | 6 |
  12. --------------------------------------------------------------------------------------------

  13. Predicate Information (identified by operation id):
  14. ---------------------------------------------------

  15.    3 - access("T1"."A"="T2"."A")

2.anti join同样有subquery cache


点击(此处)折叠或打开

  1. set serveroutput off
  2. set autotrace traceonly --不输出结果,太多

  3. --去掉no_unnest
  4. select/*+gather_plan_statistics leading(t1) use_nl(t2@subq) */ *
  5. from subq_t1 t1
  6. where t1.a not in
  7. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2 where t2.a is not null) and t1.a is not null;

  8. set autotrace off
  9. select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
  10. select * from table(dbms_xplan.display_cursor('fs848pak6turs',null,'allstats last'));

因为有缓存,not in的执行效率非常高,没有找到数据:


点击(此处)折叠或打开

  1. Plan hash value: 3776200174

  2. --------------------------------------------------------------------------------------------
  3. | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
  4. --------------------------------------------------------------------------------------------
  5. | 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.05 | 558 |
  6. | 1 | NESTED LOOPS ANTI | | 1 | 327K| 0 |00:00:00.05 | 558 |
  7. |* 2 | TABLE ACCESS FULL| SUBQ_T1 | 1 | 327K| 300K|00:00:00.06 | 553 |
  8. |* 3 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 3 | 1 | 3 |00:00:00.01 | 5 |
  9. --------------------------------------------------------------------------------------------

  10. Predicate Information (identified by operation id):
  11. ---------------------------------------------------

  12.    2 - filter("T1"."A" IS NOT NULL)
  13.    3 - access("T1"."A"="T2"."A")
  14.        filter("T2"."A" IS NOT NULL)

3.普通的nested loops join没有缓存优化,但是有批处理


点击(此处)折叠或打开

  1. set serveroutput off
  2. set autotrace traceonly --不输出结果,太多
  3. select/*+gather_plan_statistics leading(t1) use_nl(t2)*/ *
  4. from subq_t1 t1,subq_t2 t2
  5. where t1.a = t2.a;
  6. set autotrace off
  7. select sql_id,sql_text from v$sql where sql_text like '%gather_plan_statistics%subq_t2%';
  8. select * from table(dbms_xplan.display_cursor('2qa1g6az4hmk8',null,'allstats last'));
两个nested loops就是bulk获取rowid的优化:

点击(此处)折叠或打开

  1. Plan hash value: 2089944506

  2. ------------------------------------------------------------------------------------------------------
  3. | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
  4. ------------------------------------------------------------------------------------------------------
  5. | 0 | SELECT STATEMENT | | 1 | | 300K|00:00:00.70 | 80537 |
  6. | 1 | NESTED LOOPS | | 1 | 327K| 300K|00:00:00.70 | 80537 |
  7. | 2 | NESTED LOOPS | | 1 | 327K| 300K|00:00:00.47 | 60536 |
  8. | 3 | TABLE ACCESS FULL | SUBQ_T1 | 1 | 327K| 300K|00:00:00.06 | 20529 |
  9. |* 4 | INDEX RANGE SCAN | IDX_SUBQ_T2 | 300K| 1 | 300K|00:00:00.25 | 40007 |
  10. | 5 | TABLE ACCESS BY INDEX ROWID| SUBQ_T2 | 300K| 1 | 300K|00:00:00.14 | 20001 |
  11. ------------------------------------------------------------------------------------------------------

  12. Predicate Information (identified by operation id):
  13. ---------------------------------------------------

  14.    4 - access("T1"."A"="T2"."A")

在Oracle里,subquery cache优化比较全面,不能unnest走FILTER的有cache优化,可以减少执行次数,semi join/anti join同样有cache优化,普通nested loops join没有cache优化但是有批量获取rowid回表的优化。

总结:
1.MySQL的子查询subquery cache,仅限于无法unnest subquery,走select_type:SUBQUERY,也就是物化方式,如果外部表与子查询关联的key重复值特别多,
则可以有效减少子查询执行次数,具体减少多少,要看这个key存储的连续程度。


至于DEPENDENT SUBQUERY是MySQL子查询兜底招数,虽然改写和执行计划里对外部传递的列有,但是不能减少子查询执行次数,效果比较差。


另外MySQL的标量子查询只能走DEPENDENT SUBQUERY,无subquery cache优化,效率不好。


MariaDB是做了subquery cache优化的:见


MySQL对于不能unnest的,如果外层表有过滤条件,如果不能走索引,则条件写在子查询前还是后对驱动子查询次数有影响,
能走索引先走索引过滤无影响。


不走索引的,写在前先过滤后然后驱动子查询,写在后则是先驱动子查询后过滤外层条件,可能执行次数会多很多。




2.Oracle对无法unnest的子查询、标量子查询、update关联子查询是有缓存结果优化的,一般most缓存数目在1024个key左右,
如果outer table关联key的基数(distinct数目)很小,则可以大幅度减少子查询执行次数,具体也要看数据物理分布连续性是否强,
连续性强,减少的子查询次数越少,如果很离散存储,则子查询次数会增多。详细见:
FILTER Operation中子查询的执行次数问题分析(http://blog.chinaunix.net/uid-7655508-id-5875810.html)。


另外oracle对能够unnest的semi/anti join,走nested loops的也有缓存优化,普通join走nested loops没有缓存优化,但是有批处理优化。


ORACLE这方面优化做的比较全面。


3.MySQL子查询优化还是不够强大,所以在MySQL里让子查询能够subquery unnest较好,这样可以转为join,semi join,anti join优化,避免unnest subquery只有2种方法,
完全依赖于外层结果驱动子查询,缓存优化也只有materialization才有,标量子查询在MySQL里要慎用,和O不同的是,它没有缓存优化。
在MySQL里经常会使用materialization来优化子查询,因为它会按照关联key剔除重复行并且创建索引,一定程度上能够弥补不能缓存的差距。

本系列结束.

阅读(17908) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~