Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1339490
  • 博文数量: 169
  • 博客积分: 0
  • 博客等级: 民兵
  • 技术积分: 3800
  • 用 户 组: 普通用户
  • 注册时间: 2011-03-30 13:00
个人简介

About me:Oracle ACE pro,optimistic,passionate and harmonious. Focus on ORACLE,MySQL and other database programming,peformance tuning,db design, j2ee,Linux/AIX,Architecture tech,etc

文章分类

全部博文(169)

文章存档

2024年(24)

2023年(28)

2022年(43)

2020年(62)

2014年(3)

2013年(9)

分类: Oracle

2024-05-17 17:25:45

本部分主要研究默认选择率NULL数目影响、案例以及默认选择率系列总结。

4.默认选择率还要考虑NULL数目影响

对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows


其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters


对于列有统计信息的,选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。


对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls
很多的列,更加不准确。

也即有统计信息的默认选择率如果考虑null影响,要*adjust_sel.
比如subobject_name null较多:

点击(此处)折叠或打开

  1. select num_rows,sample_size from dba_tables where table_name='SEL_TEST';

  2.   NUM_ROWS SAMPLE_SIZE
  3. ---------- -----------
  4.      72800 72800


  5. select column_name,num_distinct,density,num_nulls,histogram
  6. from dba_tab_col_statistics
  7. where table_name='SEL_TEST'
  8. and column_name='SUBOBJECT_NAME';
  9. COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS HISTOGRAM
  10. -------------------- ------------ ---------- ---------- ------------------------------
  11. SUBOBJECT_NAME 564 .00177305 71801 NONE

则adjust_sel=(num_rows-num_nulls)/num_rows
=(72800-71801)/72800
=0.013722527

建立索引如下:

点击(此处)折叠或打开

  1. create index idx2_sel_test on sel_test(subobject_name);
like范围有索引走索引默认ix_sel=0.009,ix_sel_with_filters=0.05


点击(此处)折叠或打开

  1. explain plan for
  2. select * from sel_test
  3. where subobject_name like :var;

  4. select * from table(dbms_xplan.display);
  5. PLAN_TABLE_OUTPUT
  6. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  7. Plan hash value: 2441733028

  8. -----------------------------------------------------------------------------------------------------
  9. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  10. -----------------------------------------------------------------------------------------------------
  11. | 0 | SELECT STATEMENT | | 50 | 7500 | 5 (0)| 00:00:01 |
  12. | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 50 | 7500 | 5 (0)| 00:00:01 |
  13. |* 2 | INDEX RANGE SCAN | IDX2_SEL_TEST | 9 | | 2 (0)| 00:00:01 |
  14. -----------------------------------------------------------------------------------------------------

  15. Predicate Information (identified by operation id):
  16. ---------------------------------------------------

  17.    2 - access("SUBOBJECT_NAME" LIKE :VAR)
  18.        filter("SUBOBJECT_NAME" LIKE :VAR)

因为sub_object_name的null较多,还要考虑null影响选择率,所以ix_sel=
0.009*0.013722527,
所以索引扫描行数=round(72800*0.009*0.013722527)=9,
回表行数=round(72800*0.05*0.013722527)=50

5.示例分析


下面用JOIN例子说简单分析:
删除扩展统计信息:

点击(此处)折叠或打开

  1. exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(SUBSTR("OBJECT_NAME",1,1))');
  2. exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(SUBSTR("OWNER",1,3))');
  3. exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(SUBSTR("OBJECT_NAME",1,1),SUBSTR("OWNER",1,3))');

点击(此处)折叠或打开

  1. explain plan for
  2. select * from sel_test a,sel2 b
  3. where substr(a.object_name,1,1) = 'D' and SUBSTR(a.OWNER,1,3)='SYS'
  4. and a.object_id=b.object_id;
  5. select * from table(dbms_xplan.display(null,null,'adaptive'));

执行计划如下,因为是19c,这里是adaptive plan,很显然使用adaptive plan也是不准的,无法收集函数条件统计信息,走NESTED LOOPS,
没有走HASH JOIN:

点击(此处)折叠或打开

  1. PLAN_TABLE_OUTPUT
  2. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3. Plan hash value: 2436822295

  4. --------------------------------------------------------------------------------------------
  5. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  6. --------------------------------------------------------------------------------------------
  7. | 0 | SELECT STATEMENT | | 7 | 1974 | 410 (1)| 00:00:01 |
  8. |- * 1 | HASH JOIN | | 7 | 1974 | 410 (1)| 00:00:01 |
  9. | 2 | NESTED LOOPS | | 7 | 1974 | 410 (1)| 00:00:01 |
  10. | 3 | NESTED LOOPS | | 7 | 1974 | 410 (1)| 00:00:01 |
  11. |- 4 | STATISTICS COLLECTOR | | | | | |
  12. | * 5 | TABLE ACCESS FULL | SEL_TEST | 7 | 1050 | 396 (1)| 00:00:01 |
  13. | * 6 | INDEX RANGE SCAN | IDX_SEL2 | 1 | | 1 (0)| 00:00:01 |
  14. | 7 | TABLE ACCESS BY INDEX ROWID| SEL2 | 1 | 132 | 2 (0)| 00:00:01 |
  15. |- 8 | TABLE ACCESS FULL | SEL2 | 1 | 132 | 2 (0)| 00:00:01 |
  16. --------------------------------------------------------------------------------------------

  17. Predicate Information (identified by operation id):
  18. ---------------------------------------------------

  19.    1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
  20.    5 - filter(SUBSTR("A"."OBJECT_NAME",1,1)='D' AND SUBSTR("A"."OWNER",1,3)='SYS')
  21.    6 - access("A"."OBJECT_ID"="B"."OBJECT_ID")

  22. Note
  23. -----
  24.    - this is an adaptive plan (rows marked '-' are inactive)

如果使用dynamic_sampling(5),则执行计划走HASH JOIN:


点击(此处)折叠或打开

  1. explain plan for
  2. select/*+dynamic_sampling(a 5)*/ * from sel_test a,sel2 b
  3. where substr(a.object_name,1,1) = 'D' and SUBSTR(a.OWNER,1,3)='SYS'
  4. and a.object_id=b.object_id;
  5. select * from table(dbms_xplan.display(null,null,'adaptive'));
  6. PLAN_TABLE_OUTPUT
  7. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  8. Plan hash value: 2990091768

  9. --------------------------------------------------------------------------------------------
  10. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  11. --------------------------------------------------------------------------------------------
  12. | 0 | SELECT STATEMENT | | 2474 | 681K| 794 (2)| 00:00:01 |
  13. | * 1 | HASH JOIN | | 2474 | 681K| 794 (2)| 00:00:01 |
  14. |- 2 | NESTED LOOPS | | 2474 | 681K| 794 (2)| 00:00:01 |
  15. |- 3 | NESTED LOOPS | | | | | |
  16. |- 4 | STATISTICS COLLECTOR | | | | | |
  17. | * 5 | TABLE ACCESS FULL | SEL_TEST | 2474 | 362K| 396 (1)| 00:00:01 |
  18. |- * 6 | INDEX RANGE SCAN | IDX_SEL2 | | | | |
  19. |- 7 | TABLE ACCESS BY INDEX ROWID| SEL2 | 1 | 132 | 397 (2)| 00:00:01 |
  20. | 8 | TABLE ACCESS FULL | SEL2 | 72802 | 9384K| 397 (2)| 00:00:01 |
  21. --------------------------------------------------------------------------------------------

  22. Predicate Information (identified by operation id):
  23. ---------------------------------------------------

  24.    1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
  25.    5 - filter(SUBSTR("A"."OBJECT_NAME",1,1)='D' AND SUBSTR("A"."OWNER",1,3)='SYS')
  26.    6 - access("A"."OBJECT_ID"="B"."OBJECT_ID")

  27. Note
  28. -----
  29.    - dynamic statistics used: dynamic sampling (level=2)
  30.    - this is an adaptive plan (rows marked '-' are inactive)

这里的hints使用:dynamic_sampling(a 5),只对a表动态采样5级别,但是Note:dynamic statistics used: dynamic sampling (level=2)
将SUBSTR("A"."OBJECT_NAME",1,1)='D' AND SUBSTR("A"."OWNER",1,3)='SYS'估算为2474行。
使用dynamic_sampling(a 5),指定不准的表动态采样即可。


如果改成dynamic_sampling(5)则Note:dynamic statistics used: dynamic sampling (level=5),这种是全部表采样,不好:

点击(此处)折叠或打开

  1. explain plan for
  2. select/*+dynamic_sampling(5)*/ * from sel_test a,sel2 b
  3. where substr(a.object_name,1,1) = 'D' and SUBSTR(a.OWNER,1,3)='SYS'
  4. and a.object_id=b.object_id;
  5. select * from table(dbms_xplan.display(null,null,'adaptive'));
  6. PLAN_TABLE_OUTPUT
  7. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  8. Plan hash value: 2990091768

  9. --------------------------------------------------------------------------------------------
  10. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  11. --------------------------------------------------------------------------------------------
  12. | 0 | SELECT STATEMENT | | 3005 | 827K| 794 (2)| 00:00:01 |
  13. | * 1 | HASH JOIN | | 3005 | 827K| 794 (2)| 00:00:01 |
  14. |- 2 | NESTED LOOPS | | 3005 | 827K| 794 (2)| 00:00:01 |
  15. |- 3 | NESTED LOOPS | | | | | |
  16. |- 4 | STATISTICS COLLECTOR | | | | | |
  17. | * 5 | TABLE ACCESS FULL | SEL_TEST | 3005 | 440K| 396 (1)| 00:00:01 |
  18. |- * 6 | INDEX RANGE SCAN | IDX_SEL2 | | | | |
  19. |- 7 | TABLE ACCESS BY INDEX ROWID| SEL2 | 1 | 132 | 397 (2)| 00:00:01 |
  20. | 8 | TABLE ACCESS FULL | SEL2 | 72802 | 9384K| 397 (2)| 00:00:01 |
  21. --------------------------------------------------------------------------------------------

  22. Predicate Information (identified by operation id):
  23. ---------------------------------------------------

  24.    1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
  25.    5 - filter(SUBSTR("A"."OBJECT_NAME",1,1)='D' AND SUBSTR("A"."OWNER",1,3)='SYS')
  26.    6 - access("A"."OBJECT_ID"="B"."OBJECT_ID")

  27. Note
  28. -----
  29.    - dynamic statistics used: dynamic sampling (level=5)
  30.    - this is an adaptive plan (rows marked '-' are inactive)

可以看到Note:dynamic statistics used: dynamic sampling (level=5)
SEL_TEST估算为3005行。


下面使用扩展统计信息:
收集扩展统计信息如下:

点击(此处)折叠或打开

  1. --收集单列扩展
  2. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',-
  3. method_opt=>'for columns size auto (substr(object_name,1,1)),(SUBSTR(OWNER,1,3)),',no_invalidate=>false);
  4. --收集组合扩展
  5. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',-
  6. method_opt=>'for columns size auto (SUBSTR("OBJECT_NAME",1,1),SUBSTR("OWNER",1,3))',no_invalidate=>false);
  7. 先创建扩展列,然后在收集全表的,更准:
  8. exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',no_invalidate=>false);


点击(此处)折叠或打开

  1. explain plan for
  2. select * from sel_test a,sel2 b
  3. where substr(a.object_name,1,1) = 'D' and SUBSTR(a.OWNER,1,3)='SYS'
  4. and a.object_id=b.object_id;
  5. select * from table(dbms_xplan.display(null,null,'adaptive'));
可以看到执行计划从NESTED LOOPS变为走HASH JOIN,SEL_TEST对函数组合条件估算为2631行,准确。


点击(此处)折叠或打开

  1. PLAN_TABLE_OUTPUT
  2. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  3. Plan hash value: 2990091768

  4. --------------------------------------------------------------------------------------------
  5. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  6. --------------------------------------------------------------------------------------------
  7. | 0 | SELECT STATEMENT | | 2631 | 724K| 794 (2)| 00:00:01 |
  8. | * 1 | HASH JOIN | | 2631 | 724K| 794 (2)| 00:00:01 |
  9. |- 2 | NESTED LOOPS | | 2631 | 724K| 794 (2)| 00:00:01 |
  10. |- 3 | NESTED LOOPS | | | | | |
  11. |- 4 | STATISTICS COLLECTOR | | | | | |
  12. | * 5 | TABLE ACCESS FULL | SEL_TEST | 2631 | 385K| 396 (1)| 00:00:01 |
  13. |- * 6 | INDEX RANGE SCAN | IDX_SEL2 | | | | |
  14. |- 7 | TABLE ACCESS BY INDEX ROWID| SEL2 | 1 | 132 | 397 (2)| 00:00:01 |
  15. | 8 | TABLE ACCESS FULL | SEL2 | 72802 | 9384K| 397 (2)| 00:00:01 |
  16. --------------------------------------------------------------------------------------------

  17. Predicate Information (identified by operation id):
  18. ---------------------------------------------------

  19.    1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
  20.    5 - filter(SUBSTR("OBJECT_NAME",1,1)='D' AND SUBSTR("OWNER",1,3)='SYS')
  21.    6 - access("A"."OBJECT_ID"="B"."OBJECT_ID")

  22. Note
  23. -----
  24.    - this is an adaptive plan (rows marked '-' are inactive)

6.ORACLE CBO默认选择率总结


在ORACLE里,如果SQL里的表无法获得需要的信息来计算选择率,可能就会使用默认选择率,常发生在以下几种情况下:
1)表有统计信息,列未收集统计信息,无法使用动态采样
2)表有统计信息,但是使用绑定变量,绑定变量无peeking
3) 表有统计信息,但是对列使用了函数等复杂条件,无法准确估算对应条件的选择率


默认选择率可能很多时候是不准的,会导致执行计划非optimal,这样需要针对不准情况对应修复:


针对1)情况,就是更新统计信息,补全没有收集或不准的统计信息。
针对第二种情况,为了避免修改_optim_peek_user_binds参数造成的全局影响,可以针对单个SQL开启_optim_peek_user_binds,
使用SQL PROFILE绑定。
针对第三种情况,因为复杂条件无法计算准确选择率,利用使用扩展列统计信息、高级别动态采样、cardinality/opt_esimate hints、
SQL PROFILE等绑定执行计划。


另外默认选择率总结如下:
对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows


其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters


对于列有统计信息的,FINAL选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。


对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls很多的列,更加不准确。


也即有统计信息的默认选择率如果考虑null影响,考虑adjust_sel,其final选择率=default_sel * adjust_sel


有统计信息默认选择率如下:
1)列等值条件:sel = max(1/num_distinct,density)*adjust_sel,不管是否走索引ix_sel和ix_sel_with_filters一样.

2) 列<>条件: sel = 1-sel(等值)= 1 - max(1/num_distinct,density)*adjust_sel ,其中ix_sel,ix_sel_with_filters一样

3) 索引列范围条件:(包括like) ix_sel = 0.009,其中ix_sel_with_filters=0.05,这样可能导致回表后的cardinality估算行数大于索引扫描行数

4) 索引列绑定变量范围条件: sel = 0.009 (无peeking),其中ix_sel = 0.009,ix_sel_with_filters=0.05

5)索引列多个范围and条件 ix_sel = 0.009/2 = 0.0045,其中ix_sel_with_filters还是按照(sel_1)*(sel_2)=0.05*0.05

6) 索引列or条件,如果or扩展走索引,按照单个条件计算 范围是ix_sel=0.009,等值是1/num_distinct.
 回表ix_sel_with_filters=(sel_1)*(sel_2)-(sel_1*sel_2)=0.05*0.05-0.05*0.05,按照OR数目计算。

7)函数条件等值 sel = 0.01

8) 函数条件范围 sel = 0.05

9) 非索引列范围 sel = 0.05

10) 回表选择率 sel = ix_sel_with_filters * condition_filters,如果回表没有额外条件=ix_sel_with_filters
回表除了等值的,非函数等值=max(1/num_distinct,density)*adjust_sel,函数等值=0.01
,其他均为0.05




无统计信息默认选择率如下:
无统计信息的不用考虑adjest_sel,如果NULL数目较多,对CBO来说也不可知,所以NULL数目多的时候,可能影响大。

1)等值列条件无统计信息选择率,列有索引等值条件 ix_sel=0.004,ix_sel_with_filters=0.01

2)等值列条件无统计信息,列无索引列默认选择率sel=0.01,包括绑定变量无具体值,函数条件的等值查询,不管有没有统计信息都是1%。

3) 对于非索引列列无统计信息的范围查询,和有统计信息无具体绑定变量值一样,非索引列(包括函数)是5%

4)列无统计信息,范围条件走索引的选择率是ix_sel=0.009,ix_sel_with_filters=0.05,和索引列有统计信息,使用绑定变量无具体值的范围查询一样。

5) 列无统计信息,<>条件选择率是5%,和列有统计信息按照1-sel(等值)的不一样。


ORACLE的默认选择率一般很小:
比如索引列density,0.009,0.004几种,有索引的偏向于走索引。
对于不走索引或回表的其他选择率:等值的0.01,其他0.05。


MySQL对于列无索引,无直方图的,也会采用默认选择率,大部分使用
Access Path Selection in a Relational Database Management System论文里的选择率,
一般比较大,
比如:
等值是10%,
>,<等是33%,
like,between是11%,
<>是0.9
比ORACLE默认选择率大很多,MySQL默认选择率问题较多,
可以建索引或收集直方图解决,MySQL没有动态采样,没有扩展统计信息,
没有cardinality hints



本系列结束。








阅读(930) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~