本部分主要研究默认选择率NULL数目影响、案例以及默认选择率系列总结。
4.默认选择率还要考虑NULL数目影响
对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows
其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters
对于列有统计信息的,选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。
对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls
很多的列,更加不准确。
也即有统计信息的默认选择率如果考虑null影响,要*adjust_sel.
比如subobject_name null较多:
-
select num_rows,sample_size from dba_tables where table_name='SEL_TEST';
-
-
NUM_ROWS SAMPLE_SIZE
-
---------- -----------
-
72800 72800
-
-
-
select column_name,num_distinct,density,num_nulls,histogram
-
from dba_tab_col_statistics
-
where table_name='SEL_TEST'
-
and column_name='SUBOBJECT_NAME';
-
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS HISTOGRAM
-
-------------------- ------------ ---------- ---------- ------------------------------
-
SUBOBJECT_NAME 564 .00177305 71801 NONE
则adjust_sel=(num_rows-num_nulls)/num_rows
=(72800-71801)/72800
=0.013722527
建立索引如下:
-
create index idx2_sel_test on sel_test(subobject_name);
like范围有索引走索引默认ix_sel=0.009,ix_sel_with_filters=0.05
-
explain plan for
-
select * from sel_test
-
where subobject_name like :var;
-
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 2441733028
-
-
-----------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
-----------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 50 | 7500 | 5 (0)| 00:00:01 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 50 | 7500 | 5 (0)| 00:00:01 |
-
|* 2 | INDEX RANGE SCAN | IDX2_SEL_TEST | 9 | | 2 (0)| 00:00:01 |
-
-----------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("SUBOBJECT_NAME" LIKE :VAR)
-
filter("SUBOBJECT_NAME" LIKE :VAR)
因为sub_object_name的null较多,还要考虑null影响选择率,所以ix_sel=
0.009*0.013722527,
所以索引扫描行数=round(72800*0.009*0.013722527)=9,
回表行数=round(72800*0.05*0.013722527)=50
5.示例分析
下面用JOIN例子说简单分析:
删除扩展统计信息:
-
exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(SUBSTR("OBJECT_NAME",1,1))');
-
exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(SUBSTR("OWNER",1,3))');
-
exec dbms_stats.drop_extended_stats(ownname=>user,tabname=>'sel_test',extension=>'(SUBSTR("OBJECT_NAME",1,1),SUBSTR("OWNER",1,3))');
-
explain plan for
-
select * from sel_test a,sel2 b
-
where substr(a.object_name,1,1) = 'D' and SUBSTR(a.OWNER,1,3)='SYS'
-
and a.object_id=b.object_id;
-
select * from table(dbms_xplan.display(null,null,'adaptive'));
执行计划如下,因为是19c,这里是adaptive plan,很显然使用adaptive plan也是不准的,无法收集函数条件统计信息,走NESTED LOOPS,
没有走HASH JOIN:
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 2436822295
-
-
--------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
--------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 7 | 1974 | 410 (1)| 00:00:01 |
-
|- * 1 | HASH JOIN | | 7 | 1974 | 410 (1)| 00:00:01 |
-
| 2 | NESTED LOOPS | | 7 | 1974 | 410 (1)| 00:00:01 |
-
| 3 | NESTED LOOPS | | 7 | 1974 | 410 (1)| 00:00:01 |
-
|- 4 | STATISTICS COLLECTOR | | | | | |
-
| * 5 | TABLE ACCESS FULL | SEL_TEST | 7 | 1050 | 396 (1)| 00:00:01 |
-
| * 6 | INDEX RANGE SCAN | IDX_SEL2 | 1 | | 1 (0)| 00:00:01 |
-
| 7 | TABLE ACCESS BY INDEX ROWID| SEL2 | 1 | 132 | 2 (0)| 00:00:01 |
-
|- 8 | TABLE ACCESS FULL | SEL2 | 1 | 132 | 2 (0)| 00:00:01 |
-
--------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
-
5 - filter(SUBSTR("A"."OBJECT_NAME",1,1)='D' AND SUBSTR("A"."OWNER",1,3)='SYS')
-
6 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
-
-
Note
-
-----
-
- this is an adaptive plan (rows marked '-' are inactive)
如果使用dynamic_sampling(5),则执行计划走HASH JOIN:
-
explain plan for
-
select/*+dynamic_sampling(a 5)*/ * from sel_test a,sel2 b
-
where substr(a.object_name,1,1) = 'D' and SUBSTR(a.OWNER,1,3)='SYS'
-
and a.object_id=b.object_id;
-
select * from table(dbms_xplan.display(null,null,'adaptive'));
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 2990091768
-
-
--------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
--------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 2474 | 681K| 794 (2)| 00:00:01 |
-
| * 1 | HASH JOIN | | 2474 | 681K| 794 (2)| 00:00:01 |
-
|- 2 | NESTED LOOPS | | 2474 | 681K| 794 (2)| 00:00:01 |
-
|- 3 | NESTED LOOPS | | | | | |
-
|- 4 | STATISTICS COLLECTOR | | | | | |
-
| * 5 | TABLE ACCESS FULL | SEL_TEST | 2474 | 362K| 396 (1)| 00:00:01 |
-
|- * 6 | INDEX RANGE SCAN | IDX_SEL2 | | | | |
-
|- 7 | TABLE ACCESS BY INDEX ROWID| SEL2 | 1 | 132 | 397 (2)| 00:00:01 |
-
| 8 | TABLE ACCESS FULL | SEL2 | 72802 | 9384K| 397 (2)| 00:00:01 |
-
--------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
-
5 - filter(SUBSTR("A"."OBJECT_NAME",1,1)='D' AND SUBSTR("A"."OWNER",1,3)='SYS')
-
6 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
-
-
Note
-
-----
-
- dynamic statistics used: dynamic sampling (level=2)
-
- this is an adaptive plan (rows marked '-' are inactive)
这里的hints使用:dynamic_sampling(a 5),只对a表动态采样5级别,但是Note:dynamic statistics used: dynamic sampling (level=2)
将SUBSTR("A"."OBJECT_NAME",1,1)='D' AND SUBSTR("A"."OWNER",1,3)='SYS'估算为2474行。
使用dynamic_sampling(a 5),指定不准的表动态采样即可。
如果改成dynamic_sampling(5)则Note:dynamic statistics used: dynamic sampling (level=5),这种是全部表采样,不好:
-
explain plan for
-
select/*+dynamic_sampling(5)*/ * from sel_test a,sel2 b
-
where substr(a.object_name,1,1) = 'D' and SUBSTR(a.OWNER,1,3)='SYS'
-
and a.object_id=b.object_id;
-
select * from table(dbms_xplan.display(null,null,'adaptive'));
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 2990091768
-
-
--------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
--------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 3005 | 827K| 794 (2)| 00:00:01 |
-
| * 1 | HASH JOIN | | 3005 | 827K| 794 (2)| 00:00:01 |
-
|- 2 | NESTED LOOPS | | 3005 | 827K| 794 (2)| 00:00:01 |
-
|- 3 | NESTED LOOPS | | | | | |
-
|- 4 | STATISTICS COLLECTOR | | | | | |
-
| * 5 | TABLE ACCESS FULL | SEL_TEST | 3005 | 440K| 396 (1)| 00:00:01 |
-
|- * 6 | INDEX RANGE SCAN | IDX_SEL2 | | | | |
-
|- 7 | TABLE ACCESS BY INDEX ROWID| SEL2 | 1 | 132 | 397 (2)| 00:00:01 |
-
| 8 | TABLE ACCESS FULL | SEL2 | 72802 | 9384K| 397 (2)| 00:00:01 |
-
--------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
-
5 - filter(SUBSTR("A"."OBJECT_NAME",1,1)='D' AND SUBSTR("A"."OWNER",1,3)='SYS')
-
6 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
-
-
Note
-
-----
-
- dynamic statistics used: dynamic sampling (level=5)
-
- this is an adaptive plan (rows marked '-' are inactive)
可以看到Note:dynamic statistics used: dynamic sampling (level=5)
SEL_TEST估算为3005行。
下面使用扩展统计信息:
收集扩展统计信息如下:
-
--收集单列扩展
-
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',-
-
method_opt=>'for columns size auto (substr(object_name,1,1)),(SUBSTR(OWNER,1,3)),',no_invalidate=>false);
-
--收集组合扩展
-
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',-
-
method_opt=>'for columns size auto (SUBSTR("OBJECT_NAME",1,1),SUBSTR("OWNER",1,3))',no_invalidate=>false);
-
先创建扩展列,然后在收集全表的,更准:
-
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',no_invalidate=>false);
-
explain plan for
-
select * from sel_test a,sel2 b
-
where substr(a.object_name,1,1) = 'D' and SUBSTR(a.OWNER,1,3)='SYS'
-
and a.object_id=b.object_id;
-
select * from table(dbms_xplan.display(null,null,'adaptive'));
可以看到执行计划从NESTED LOOPS变为走HASH JOIN,SEL_TEST对函数组合条件估算为2631行,准确。
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 2990091768
-
-
--------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
--------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 2631 | 724K| 794 (2)| 00:00:01 |
-
| * 1 | HASH JOIN | | 2631 | 724K| 794 (2)| 00:00:01 |
-
|- 2 | NESTED LOOPS | | 2631 | 724K| 794 (2)| 00:00:01 |
-
|- 3 | NESTED LOOPS | | | | | |
-
|- 4 | STATISTICS COLLECTOR | | | | | |
-
| * 5 | TABLE ACCESS FULL | SEL_TEST | 2631 | 385K| 396 (1)| 00:00:01 |
-
|- * 6 | INDEX RANGE SCAN | IDX_SEL2 | | | | |
-
|- 7 | TABLE ACCESS BY INDEX ROWID| SEL2 | 1 | 132 | 397 (2)| 00:00:01 |
-
| 8 | TABLE ACCESS FULL | SEL2 | 72802 | 9384K| 397 (2)| 00:00:01 |
-
--------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
-
5 - filter(SUBSTR("OBJECT_NAME",1,1)='D' AND SUBSTR("OWNER",1,3)='SYS')
-
6 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
-
-
Note
-
-----
-
- this is an adaptive plan (rows marked '-' are inactive)
6.ORACLE CBO默认选择率总结
在ORACLE里,如果SQL里的表无法获得需要的信息来计算选择率,可能就会使用默认选择率,常发生在以下几种情况下:
1)表有统计信息,列未收集统计信息,无法使用动态采样
2)表有统计信息,但是使用绑定变量,绑定变量无peeking
3) 表有统计信息,但是对列使用了函数等复杂条件,无法准确估算对应条件的选择率
默认选择率可能很多时候是不准的,会导致执行计划非optimal,这样需要针对不准情况对应修复:
针对1)情况,就是更新统计信息,补全没有收集或不准的统计信息。
针对第二种情况,为了避免修改_optim_peek_user_binds参数造成的全局影响,可以针对单个SQL开启_optim_peek_user_binds,
使用SQL PROFILE绑定。
针对第三种情况,因为复杂条件无法计算准确选择率,利用使用扩展列统计信息、高级别动态采样、cardinality/opt_esimate hints、
SQL PROFILE等绑定执行计划。
另外默认选择率总结如下:
对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows
其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters
对于列有统计信息的,FINAL选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。
对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls很多的列,更加不准确。
也即有统计信息的默认选择率如果考虑null影响,考虑adjust_sel,其final选择率=default_sel * adjust_sel
有统计信息默认选择率如下:
1)列等值条件:sel = max(1/num_distinct,density)*adjust_sel,不管是否走索引ix_sel和ix_sel_with_filters一样.
2) 列<>条件: sel = 1-sel(等值)= 1 - max(1/num_distinct,density)*adjust_sel ,其中ix_sel,ix_sel_with_filters一样
3) 索引列范围条件:(包括like) ix_sel = 0.009,其中ix_sel_with_filters=0.05,这样可能导致回表后的cardinality估算行数大于索引扫描行数
4) 索引列绑定变量范围条件: sel = 0.009 (无peeking),其中ix_sel = 0.009,ix_sel_with_filters=0.05
5)索引列多个范围and条件 ix_sel = 0.009/2 = 0.0045,其中ix_sel_with_filters还是按照(sel_1)*(sel_2)=0.05*0.05
6) 索引列or条件,如果or扩展走索引,按照单个条件计算 范围是ix_sel=0.009,等值是1/num_distinct.
回表ix_sel_with_filters=(sel_1)*(sel_2)-(sel_1*sel_2)=0.05*0.05-0.05*0.05,按照OR数目计算。
7)函数条件等值 sel = 0.01
8) 函数条件范围 sel = 0.05
9) 非索引列范围 sel = 0.05
10) 回表选择率 sel = ix_sel_with_filters * condition_filters,如果回表没有额外条件=ix_sel_with_filters
回表除了等值的,非函数等值=max(1/num_distinct,density)*adjust_sel,函数等值=0.01
,其他均为0.05
无统计信息默认选择率如下:
无统计信息的不用考虑adjest_sel,如果NULL数目较多,对CBO来说也不可知,所以NULL数目多的时候,可能影响大。
1)等值列条件无统计信息选择率,列有索引等值条件 ix_sel=0.004,ix_sel_with_filters=0.01
2)等值列条件无统计信息,列无索引列默认选择率sel=0.01,包括绑定变量无具体值,函数条件的等值查询,不管有没有统计信息都是1%。
3) 对于非索引列列无统计信息的范围查询,和有统计信息无具体绑定变量值一样,非索引列(包括函数)是5%
4)列无统计信息,范围条件走索引的选择率是ix_sel=0.009,ix_sel_with_filters=0.05,和索引列有统计信息,使用绑定变量无具体值的范围查询一样。
5) 列无统计信息,<>条件选择率是5%,和列有统计信息按照1-sel(等值)的不一样。
ORACLE的默认选择率一般很小:
比如索引列density,0.009,0.004几种,有索引的偏向于走索引。
对于不走索引或回表的其他选择率:等值的0.01,其他0.05。
MySQL对于列无索引,无直方图的,也会采用默认选择率,大部分使用
Access Path Selection in a Relational Database Management System论文里的选择率,
一般比较大,
比如:
等值是10%,
>,<等是33%,
like,between是11%,
<>是0.9
比ORACLE默认选择率大很多,
MySQL默认选择率问题较多,
可以建索引或收集直方图解决,MySQL没有动态采样,没有扩展统计信息,
没有cardinality hints。
本系列结束。
阅读(1130) | 评论(0) | 转发(0) |