Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1300784
  • 博文数量: 167
  • 博客积分: 0
  • 博客等级: 民兵
  • 技术积分: 3775
  • 用 户 组: 普通用户
  • 注册时间: 2011-03-30 13:00
个人简介

About me:Oracle ACE pro,optimistic,passionate and harmonious. Focus on ORACLE,MySQL and other database programming,peformance tuning,db design, j2ee,Linux/AIX,Architecture tech,etc

文章分类

全部博文(167)

文章存档

2024年(22)

2023年(28)

2022年(43)

2020年(62)

2014年(3)

2013年(9)

分类: Mysql/postgreSQL

2024-05-23 16:11:26

接:MySQL Subquery Cache vs Oracle Subquery Cache子查询缓存_MySQL部分_PART1

3.MySQL标量子查询没有subquery cache优化,走 DEPENDENT SUBQUERY ,这个很差劲


从下列执行计划看,标量子查询只能走DEPENDENT SUBQUERY ,没有缓存优化。

点击(此处)折叠或打开

  1. explain
  2. select t1.a,(select t2.b from subq_t2 t2 where t1.a = t2.a) b
  3. from subq_t1 t1;
  4. +----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
  5. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  6. +----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
  7. | 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 299886 | 100.00 | NULL |
  8. | 2 | DEPENDENT SUBQUERY | t2 | NULL | ref | idx_subq_t2 | idx_subq_t2 | 5 | employees.t1.a | 1 | 100.00 | NULL |
  9. +----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
  10. 2 rows in set, 2 warnings (0.00 sec)

  11. show warnings\G
  12. *************************** 1. row ***************************
  13.   Level: Note
  14.    Code: 1276
  15. Message: Field or reference 'employees.t1.a' of SELECT #2 was resolved in SELECT #1
  16. *************************** 2. row ***************************
  17.   Level: Note
  18.    Code: 1003
  19. Message: /* select#1 */ select `employees`.`t1`.`a` AS `a`,(/* select#2 */ select `employees`.`t2`.`b` from `employees`.`subq_t2` `t2` where (`employees`.`t1`.`a` = `employees`.`t2`.`a`)) AS `b` from `employees`.`subq_t1` `t1`

使用hints也走DEPENDENT SUBQUERY:


点击(此处)折叠或打开

  1. explain
  2. select/*+subquery(@subq materialization)*/ t1.a,(select/*+qb_name(subq)*/ t2.b from subq_t2 t2 where t1.a = t2.a) b
  3. from subq_t1 t1;

  4. +----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
  5. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
  6. +----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+
  7. | 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 299886 | 100.00 | NULL |
  8. | 2 | DEPENDENT SUBQUERY | t2 | NULL | ref | idx_subq_t2 | idx_subq_t2 | 5 | employees.t1.a | 1 | 100.00 | NULL |
  9. +----+--------------------+-------+------------+------+---------------+-------------+---------+----------------+--------+----------+-------+

树形计划,子查询走索引被驱动30w次,和预期的3,5次,相差甚远,没有缓存优化。

点击(此处)折叠或打开

  1. *************************** 1. row ***************************
  2. EXPLAIN: -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.034..232.504 rows=300000 loops=1)
  3. -> Select #2 (subquery in projection; dependent)
  4.     -> Index lookup on t2 using idx_subq_t2 (a=t1.a) (cost=0.35 rows=1) (actual time=0.003..0.004 rows=1 loops=300000)

总结:MySQL标量子查询只能走DEPENDENT SUBQUERY,没有缓存优化,所以在MySQL里,要少用标量子查询。


4.MySQL对于不能unnest的,如果外层表有过滤条件,如果条件不能走索引,则条件写在子查询前还是后对驱动子查询次数有影响,能走索引则没有影响。


写在前先过滤后然后驱动子查询,写在后则是先驱动子查询后过滤外层条件,可能执行次数会
多很多。

如果能走索引,那么一般按照先走索引过滤外部表,顺序没有影响。

1)先过滤,虽然这里还是对t1全表扫描,返回30w行,但是FILTER条件t1.b>300000先执行,
则返回0行,然后子查询不用执行。


点击(此处)折叠或打开

  1. explain analyze
  2. select/*+subquery(@subq intoexists)*/ *
  3. from subq_t1 t1
  4. where t1.b>300000 and t1.a in
  5. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G

注意Filter: ((t1.b > 300000) and (t1.a,(select #2)))条件,在前的先执行

实际上这个是树形执行计划显示的问题,应该Table scan on t1直接按照(t1.b > 300000)扫描,返回0行,通过执行时间可以看出是193ms,比全表扫描返回所有行小很多


点击(此处)折叠或打开

  1. *************************** 1. row ***************************
  2. EXPLAIN: -> Filter: ((t1.b > 300000) and <in_optimizer>(t1.a,<exists>(select #2))) (cost=30156.85 rows=99952) (actual time=218.451..218.451 rows=0 loops=1)
  3.     -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.030..193.624 rows=300000 loops=1)
  4.     -> Select #2 (subquery in condition; dependent)
  5.         -> Limit: 1 row(s) (never executed)
  6.             -> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (never executed)

2)先驱动子查询后过滤,t1返回30w行,子查询执行30w次,很拉跨

点击(此处)折叠或打开

  1. explain analyze
  2. select/*+subquery(@subq intoexists)*/ *
  3. from subq_t1 t1
  4. where t1.a in
  5. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
  6. and t1.b>300000\G

条件:Filter: ((t1.a,(select #2)) and (t1.b > 300000)) 
这里的Table scan on t1执行时间是250ms,比前面的先过滤的193ms大。这里是真的返回30w行。

点击(此处)折叠或打开

  1. *************************** 1. row ***************************
  2. EXPLAIN: -> Filter: (<in_optimizer>(t1.a,<exists>(select #2)) and (t1.b > 300000)) (cost=30156.85 rows=99952) (actual time=1046.440..1046.440 rows=0 loops=1)
  3.     -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.030..250.191 rows=300000 loops=1)
  4.     -> Select #2 (subquery in condition; dependent)
  5.         -> Limit: 1 row(s) (actual time=0.002..0.002 rows=1 loops=300000)
  6.             -> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (actual time=0.002..0.002 rows=1 loops=300000)

3)同样的不能unnest的materialization和intoexists一样,只不过materialization有缓存优化

点击(此处)折叠或打开

  1. explain analyze
  2. select/*+subquery(@subq materialization)*/ *
  3. from subq_t1 t1
  4. where t1.b>300000 and t1.a in
  5. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G

t1.b>300000写在子查询前,子查询执行0次:


点击(此处)折叠或打开

  1. *************************** 1. row ***************************
  2. EXPLAIN: -> Filter: ((t1.b > 300000) and <in_optimizer>(t1.a,t1.a in (select #2))) (cost=30156.85 rows=99952) (actual time=204.768..204.768 rows=0 loops=1)
  3.     -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.026..182.554 rows=300000 loops=1)
  4.     -> Select #2 (subquery in condition; run only once)
  5.         -> Filter: ((t1.a = `<materialized_subquery>`.a)) (cost=20078.95..20078.95 rows=1) (never executed)
  6.             -> Limit: 1 row(s) (cost=20078.85..20078.85 rows=1) (never executed)
  7.                 -> Index lookup on <materialized_subquery> using <auto_distinct_key> (a=t1.a) (never executed)
  8.                     -> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (never executed)
  9.                         -> Index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (never executed)

t1.b>300000写在子查询后,因为有缓存,子查询执行4次驱动物化表:

SQL:

点击(此处)折叠或打开

  1. explain analyze
  2. select/*+subquery(@subq materialization)*/ *
  3. from subq_t1 t1
  4. where t1.a in
  5. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
  6. and t1.b>300000\G

子查询执行4次:

点击(此处)折叠或打开

  1. *************************** 1. row ***************************
  2. EXPLAIN: -> Filter: (<in_optimizer>(t1.a,t1.a in (select #2)) and (t1.b > 300000)) (cost=30156.85 rows=99952) (actual time=371.866..371.866 rows=0 loops=1)
  3.     -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..199.214 rows=300000 loops=1)
  4.     -> Select #2 (subquery in condition; run only once)
  5.         -> Filter: ((t1.a = `<materialized_subquery>`.a)) (cost=20078.95..20078.95 rows=1) (actual time=27.640..27.640 rows=1 loops=4)
  6.             -> Limit: 1 row(s) (cost=20078.85..20078.85 rows=1) (actual time=27.639..27.639 rows=1 loops=4)
  7.                 -> Index lookup on <materialized_subquery> using <auto_distinct_key> (a=t1.a) (actual time=27.638..27.638 rows=1 loops=4)
  8.                     -> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (actual time=110.525..110.525 rows=100000 loops=1)
  9.                         -> Covering index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (actual time=0.007..61.938 rows=100000 loops=1)

4)能够unnest的,过滤条件写在前后无影响

t1.b>300000写在子查询后,也是先过滤,子查询执行0次:


点击(此处)折叠或打开

  1. explain analyze
  2. select *
  3. from subq_t1 t1
  4. where t1.a in
  5. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
  6. and t1.b>300000\G
  7. *************************** 1. row ***************************
  8. EXPLAIN: -> Nested loop inner join (cost=1000689631.09 rows=10006494790) (actual time=220.631..220.631 rows=0 loops=1)
  9.     -> Filter: ((t1.b > 300000) and (t1.a is not null)) (cost=30156.85 rows=99952) (actual time=220.630..220.630 rows=0 loops=1)
  10.         -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..198.257 rows=300000 loops=1)
  11.     -> Single-row index lookup on <subquery2> using <auto_distinct_key> (a=t1.a) (never executed)
  12.         -> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (never executed)
  13.             -> Filter: (t2.a is not null) (cost=10067.55 rows=100113) (never executed)
  14.                 -> Index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (never executed)

5)如果外部表条件有索引,则先走索引过滤,没有影响

建立索引:

点击(此处)折叠或打开

  1. create index idx_sub1_t1 on subq_t1(b);

现在完全没有区别,外部表都是先Index range scan on t1 using idx_sub1_t1,返回0行,子查询不需要执行。

t1.b>300000在子查询前,走索引,也是先过滤,子查询执行0次:

点击(此处)折叠或打开

  1. explain analyze
  2. select/*+subquery(@subq intoexists)*/ *
  3. from subq_t1 t1
  4. where t1.b>300000 and t1.a in
  5. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G
  6. *************************** 1. row ***************************
  7. EXPLAIN: -> Filter: <in_optimizer>(t1.a,<exists>(select #2)) (cost=0.71 rows=1) (actual time=0.057..0.057 rows=0 loops=1)
  8.     -> Index range scan on t1 using idx_sub1_t1 over (300000 < b), with index condition: (t1.b > 300000) (cost=0.71 rows=1) (actual time=0.029..0.029 rows=0 loops=1)
  9.     -> Select #2 (subquery in condition; dependent)
  10.         -> Limit: 1 row(s) (never executed)
  11.             -> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (never executed)

  12. 1 row in set (0.00 sec)

t1.b>300000在子查询后,走索引,也是先过滤,子查询也是执行0次:

点击(此处)折叠或打开

  1. explain analyze
  2. select/*+subquery(@subq intoexists)*/ *
  3. from subq_t1 t1
  4. where t1.a in
  5. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)
  6. and t1.b>300000\G
  7. *************************** 1. row ***************************
  8. EXPLAIN: -> Filter: <in_optimizer>(t1.a,<exists>(select #2)) (cost=0.71 rows=1) (actual time=0.022..0.022 rows=0 loops=1)
  9.     -> Index range scan on t1 using idx_sub1_t1 over (300000 < b), with index condition: (t1.b > 300000) (cost=0.71 rows=1) (actual time=0.021..0.021 rows=0 loops=1)
  10.     -> Select #2 (subquery in condition; dependent)
  11.         -> Limit: 1 row(s) (never executed)
  12.             -> Covering index lookup on t2 using idx_subq_t2 (a=<cache>(t1.a)) (never executed)

  13. 1 row in set (0.01 sec)

5.能够unnest的是利用join/semi join/anti join,能够有5种方式优化子查询,没有缓存优化


5种子查询策略:pullout,firstmatch,loosescan,materialization,dupsweedout

点击(此处)折叠或打开

  1. explain analyze
  2. select *
  3. from subq_t1 t1
  4. where t1.a in
  5. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G

执行计划可以看到,驱动物化表30w次。因为外层重复结果多,这时候效率还不如unnest subquery+materialization

点击(此处)折叠或打开

  1. *************************** 1. row ***************************
  2. EXPLAIN: -> Nested loop inner join (cost=3002308857.25 rows=30022487118) (actual time=133.304..555.223 rows=300000 loops=1)
  3.     -> Filter: (t1.a is not null) (cost=30156.85 rows=299886) (actual time=0.028..282.177 rows=300000 loops=1)
  4.         -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..258.794 rows=300000 loops=1)
  5.     -> Single-row index lookup on <subquery2> using <auto_distinct_key> (a=t1.a) (actual time=0.001..0.001 rows=1 loops=300000)
  6.         -> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (actual time=133.266..133.266 rows=100000 loops=1)
  7.             -> Filter: (t2.a is not null) (cost=10067.55 rows=100113) (actual time=0.007..77.712 rows=100000 loops=1)
  8.                 -> Covering index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (actual time=0.007..69.800 rows=100000 loops=1)

no unnest subquery+materialization,因为子查询只需要执行4次,执行时间346ms,比前面的555ms高。


点击(此处)折叠或打开

  1. explain analyze
  2. select/*+subquery(@subq materialization)*/ *
  3. from subq_t1 t1
  4. where t1.a in
  5. (select/*+qb_name(subq)*/ t2.a from subq_t2 t2)\G
  6. *************************** 1. row ***************************
  7. EXPLAIN: -> Filter: <in_optimizer>(t1.a,t1.a in (select #2)) (cost=30156.85 rows=299886) (actual time=111.111..346.171 rows=300000 loops=1)
  8.     -> Table scan on t1 (cost=30156.85 rows=299886) (actual time=0.027..189.058 rows=300000 loops=1)
  9.     -> Select #2 (subquery in condition; run only once)
  10.         -> Filter: ((t1.a = `<materialized_subquery>`.a)) (cost=20078.95..20078.95 rows=1) (actual time=27.775..27.775 rows=1 loops=4)
  11.             -> Limit: 1 row(s) (cost=20078.85..20078.85 rows=1) (actual time=27.773..27.773 rows=1 loops=4)
  12.                 -> Index lookup on <materialized_subquery> using <auto_distinct_key> (a=t1.a) (actual time=27.773..27.773 rows=1 loops=4)
  13.                     -> Materialize with deduplication (cost=20078.85..20078.85 rows=100113) (actual time=111.064..111.064 rows=100000 loops=1)
  14.                         -> Covering index scan on t2 using idx_subq_t2 (cost=10067.55 rows=100113) (actual time=0.007..62.216 rows=100000 loops=1)

在ORACLE里semi join,anti join有缓存优化。


阅读(738) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~