本文主要研究默认选择率,为什么要有默认选择率,因为没有存储计算对应条件选择率需要的数据,所以只能用默认的。
在ORACLE里,如果SQL里的表无法获得需要的信息来计算选择率,可能就会使用默认选择率,常发生在以下几种情况下:
1)表有统计信息,列未收集统计信息,无法使用动态采样
2)表有统计信息,但是使用绑定变量,绑定变量无peeking
3) 表有统计信息,但是对列使用了函数等复杂条件,无法准确估算对应条件的选择率
针对1),就是更新统计信息,补全没有收集或不准的统计信息。
针对第二种情况,为了避免修改_optim_peek_user_binds参数造成的全局影响,可以针对单个SQL开启_optim_peek_user_binds,使用SQL PROFILE绑定。
针对第三种情况,因为复杂条件无法计算准确选择率,利用使用扩展列统计信息、高级别动态采样、cardinality/opt_esimate hints、
SQL PROFILE等绑定执行计划。
1.ORACLE CBO 默认选择率
本节主要分析什么时候CBO使用默认选择率,各种场景下的默认选择率。
建表语句如下:
-
drop table sel_test;
-
create table sel_test as select * from dba_objects;
-
create index idx_sel_test on sel_test(object_name);
因为ctas会自动收集统计信息,这里不需要再收集(需要直方图的要收集)。
统计信息如下:
-
select num_rows,sample_size,last_analyzed
-
from dba_tables where table_name='SEL_TEST';
-
NUM_ROWS SAMPLE_SIZE LAST_ANALYZE
-
---------- ----------- ------------
-
72800 72800 15-MAY-24
-
-
select column_name,num_distinct,num_nulls,density,histogram
-
from dba_tab_col_statistics
-
where table_name='SEL_TEST';
-
COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM
-
-------------------- ------------ ---------- ---------- ------------------------------
-
TEMPORARY 2 0 .5 NONE
-
GENERATED 2 0 .5 NONE
-
SECONDARY 1 0 1 NONE
-
NAMESPACE 21 2 .047619048 NONE
-
EDITION_NAME 0 72800 0 NONE
-
SHARING 4 0 .25 NONE
-
EDITIONABLE 2 47685 .5 NONE
-
ORACLE_MAINTAINED 2 0 .5 NONE
-
APPLICATION 1 0 1 NONE
-
DEFAULT_COLLATION 1 58503 1 NONE
-
DUPLICATED 1 0 1 NONE
-
SHARDED 1 0 1 NONE
-
CREATED_APPID 0 72800 0 NONE
-
CREATED_VSNID 0 72800 0 NONE
-
MODIFIED_APPID 0 72800 0 NONE
-
MODIFIED_VSNID 0 72800 0 NONE
-
OWNER 27 0 .037037037 NONE
-
OBJECT_NAME 60896 0 .000016421 NONE
-
SUBOBJECT_NAME 564 71801 .00177305 NONE
-
OBJECT_ID 72798 2 .000013737 NONE
-
DATA_OBJECT_ID 6252 66511 .000159949 NONE
-
OBJECT_TYPE 45 0 .022222222 NONE
-
CREATED 1479 0 .000676133 NONE
-
LAST_DDL_TIME 1634 2 .000611995 NONE
-
TIMESTAMP 1640 2 .000609756 NONE
-
STATUS 1 0 1 NONE
注意点:
对于计算null数目影响的:
adjust_sel=(num_rows-num_nulls)/num_rows
其中走索引选择率ix_sel,索引过滤选择率ix_sel_with_filters,
回表选择率如果没有额外过滤条件就是ix_sel_with_filters,有额外条件还要加上额外条件选择率*condition_filters
对于列有统计信息的,选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
如果绑定变量等无具体值的、使用函数等,可能使用默认选择率。
对于列无统计信息的,这个adjust_sel就没有了,所以就是默认选择率,如果对于num_nulls很多的列,更加不准确。
注意:以下示例使用OBJECT_NAME的因为NULL行数较少,忽略其影响,
使用SUBOBJECT_NAME NULL行数多,考虑adjust_sel,选择率=default_sel * adjust_sel
1.列有统计信息情况下的选择率计算
可以用explain测试,explain不会探测绑定变量值,用的是默认选择率
注意NULL数目多的,还要考虑adjuest_sel。
选择率=默认选择率*adjust_sel, adjust_sel=(num_rows-num_nulls)/num_rows,
1)列等值条件选择率计算(索引列,非索引列都一样,前提是有统计信息)
如果列有统计信息,这个不管比较条件是否有具体值,其实都不是默认选择率。
如果列有统计信息情况下:
-
explain plan for
-
select * from sel_test
-
where object_name = :var;
-
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
----------------------------------------------------------------------------------------------------
-
Plan hash value: 310337252
-
----------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
----------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | 132 | 4 (0)| 00:00:01 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 1 | 132 | 4 (0)| 00:00:01 |
-
|* 2 | INDEX RANGE SCAN | IDX_SEL_TEST | 1 | | 3 (0)| 00:00:01 |
-
----------------------------------------------------------------------------------------------------
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
2 - access("OBJECT_NAME"=:VAR)
列等值选择率,这个是正常选择率,直接按照max(1/num_distinct,density)计算,
无直方图情况下1/num_distinct和density一样(这里没有考虑null,直方图等),
object_name的num_nulls=0,
对应cardinality=round((1/60896)*72800)=1
不管这个列是否走索引,等值的选择率都是sel=1/num_distinct=density,如果有直方图要复杂点,
如果考虑null,sel=(1/num_distinct)*(num_rows-num_nulls)/num_rows
非索引列object_id等值也是(1/num_distinct)*(num_rows-num_nulls)/num_rows:
-
explain plan for
-
select * from sel_test
-
where object_id = :var;
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 1 | 132 | 396 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 1 | 132 | 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("OBJECT_ID"=TO_NUMBER(:VAR))
2)列有统计信息,<>的选择率计算不是默认选择率,而是1-sel(等值条件)
-
explain plan for
-
select * from sel_test
-
where object_name <> :var;
-
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 72799 | 9384K| 398 (2)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 72799 | 9384K| 398 (2)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("OBJECT_NAME"<>:VAR)
<>选择率=1-sel(object_name=)=1-1/60896,结果行=round(72800-1)=82799
3)索引列like 绑定变量 走索引默认选择率计算(绑定变量无peeking) ix_sel=0.009,回表ix_sel_with_filters=0.05
like实际上是范围查询,所以对于使用绑定变量的范围查询选择率是一样的,比如
like :var, > :var,< :var都是一样的选择率,当然如果>,<是有多个组合的,默认选择率有所不同,见3)
如果有具体值,会按照具体值计算选择率,不是默认选择率。
-
explain plan for
-
select * from sel_test
-
where object_name like :var;
-
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
----------------------------------------------------------------------------------------------------
-
Plan hash value: 310337252
-
-
----------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
----------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 3640 | 469K| 335 (0)| 00:00:01 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 3640 | 469K| 335 (0)| 00:00:01 |
-
|* 2 | INDEX RANGE SCAN | IDX_SEL_TEST | 655 | | 7 (0)| 00:00:01 |
-
----------------------------------------------------------------------------------------------------
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
2 - access("OBJECT_NAME" LIKE :VAR)
-
filter("OBJECT_NAME" LIKE :VAR)
这里的rows=655,是怎么计算的?因为没有peeking,很显然是不准的,在实际应用中,
如果peeking关闭,则就是按照默认选择率,
这里的like :var,无peeking,不知道是前通配还是后通配,也就是走INDEX RANGE SCAN是
不准确的,如果采用这样的默认选择率,
可能传入的:var = '%ab%'也会走INDEX RANGE SCAN,这很有问题,所以生产上还是
建议打开peeking。
回头在看655是怎么产生的,因为这时候虽然object_name列有统计信息,但是因为like是
范围查询,范围查询需要知道具体的范围,
计算范围查询选择率sel=required_range/(high_value-low_value),
现在绑定变量没有具体值(和peeking关闭一样),所以不知道
required_range,因此采用默认选择率。
猜测下是多少?
sel= 655/num_rows=655/72800=.008997253,一般默认选择率到小数部分非0整数位结束,
也就是0.009
这是猜测,其实也可以通过10053查看:ix_sel: 0.009000
-
Access Path: index (RangeScan)
-
Index: IDX_SEL_TEST
-
resc_io: 335.000000 resc_cpu: 2943282
-
ix_sel: 0.009000 ix_sel_with_filters: 0.009000
-
Cost: 335.220618 Resp: 335.220618 Degree: 1
如果like绑定变量列没有索引的话就是0.05,也就是5%的选择率。可以通过10053查看:
sel = 0.050000 ,如下:
-
Cost of predicates:
-
io = NOCOST, cpu = 100.000000, sel = 0.050000 flag = 2048 ("SEL_TEST"."OBJECT_NAME" LIKE :B1)
也可以推导出来:sel=回表返回行/总行数=3640/72800=0.05,这就是为什么执行计划里走索引
估算行数是655,而回表行数=3640,
正常情况下索引行数是655,回表结果行数应该小于等于655,不会大这么多,主要原因就是
索引和回表的选择率不一样,索引是0.009,回表是0.05。
下面看下有索引的单个范围查询选择率:
-
explain plan for
-
select * from sel_test
-
where object_name > :var;
-
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 310337252
-
-
----------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
----------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 3640 | 469K| 335 (0)| 00:00:01 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 3640 | 469K| 335 (0)| 00:00:01 |
-
|* 2 | INDEX RANGE SCAN | IDX_SEL_TEST | 655 | | 7 (0)| 00:00:01 |
-
----------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("OBJECT_NAME">:VAR)
通过范围的基数看,还是655/72800=.008997253 = 0.009
4)索引列带绑定变量至于索引范围扫描,多个范围查询的选择率 and条件ix_sel选择率
=0.0045,相当于单个范围的的0.009/2,回表ix_sel_with_filters= 0.05*0.05
多个范围查询:
-
explain plan for
-
select * from sel_test
-
where object_name > :var and object_name < :var1;
-
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 2443607680
-
-
-----------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
-----------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 182 | 24024 | 169 (0)| 00:00:01 |
-
|* 1 | FILTER | | | | | |
-
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 182 | 24024 | 169 (0)| 00:00:01 |
-
|* 3 | INDEX RANGE SCAN | IDX_SEL_TEST | 328 | | 5 (0)| 00:00:01 |
-
-----------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(:VAR1>:VAR)
-
3 - access("OBJECT_NAME">:VAR AND "OBJECT_NAME"<:VAR1)
这个和一般的and选择率计算不一样,如果按照一般的选择率计算:
sel=0.009*0.009,结果行数=72800*0.009*0.009=6,很显然和328相差巨大,328对应单个
选择率计算的655的一般,
猜测选择率=0.0045,也可以通过10053看出 ix_sel: 0.004500 :
-
Access Path: index (RangeScan)
-
Index: IDX_SEL_TEST
-
resc_io: 169.000000 resc_cpu: 1482323
-
ix_sel: 0.004500 ix_sel_with_filters: 0.004500
-
Cost: 169.111110 Resp: 169.111110 Degree: 1
这里的回表182行,还是按照回表选择率0.05*0.05计算,对应返回行数=
round(72800*0.05*0.5)=182.
5)索引列带绑定变量走索引范围扫描,OR范围查询的选择率,单个分支ix_sel=0.009,
ix_sel_with_filters=0.05,
等值的单个分支ix_sel=1/num_distinct。
如果是走索引OR,走索引单个条件还是按照0.009,回表按照0.05,计算返回行选择率
按照0.05计算,
sel(or)=sel(a)+sel(b)-sel(a)*sel(b),回表选择绿按照0.05计算
-
explain plan for
-
select/*+or_expand*/ * from sel_test
-
where object_name > :var or object_name <= :var1;
-
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 2487264663
-
-
---------------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
---------------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 7098 | 3334K| 670 (0)| 00:00:01 |
-
| 1 | VIEW | VW_ORE_67CAE57E | 7098 | 3334K| 670 (0)| 00:00:01 |
-
| 2 | UNION-ALL | | | | | |
-
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 3640 | 469K| 335 (0)| 00:00:01 |
-
|* 4 | INDEX RANGE SCAN | IDX_SEL_TEST | 655 | | 7 (0)| 00:00:01 |
-
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 3458 | 445K| 335 (0)| 00:00:01 |
-
|* 6 | INDEX RANGE SCAN | IDX_SEL_TEST | 655 | | 7 (0)| 00:00:01 |
-
---------------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
4 - access("OBJECT_NAME">:VAR)
-
6 - access("OBJECT_NAME"<=:VAR1)
-
filter(LNNVL("OBJECT_NAME">:VAR))
走索引返回655行,是按照0.009计算,回表按照0.05计算:
sel=sel(a)+sel(b)-sel(a)*sel(b)=0.05+0.05-0.05*0.05,
返回行数=72800*(0.05+0.05-0.05*0.05)=7098
像第二个分支的返回行数=3458,因为要排除前面分支的条件,对应的选择率=0.05-0.05*0.05,
所以返回行数=72800*(0.05-0.05*0.05)=3458
如果不加or_expand hints:
根据不走索引的默认选择率0.05计算:
sel=sel(a)+sel(b)-sel(a)*sel(b)=0.05+0.05-0.05*0.05,
返回行数=72800*(0.05+0.05-0.05*0.05)=7098
-
explain plan for
-
select * from sel_test
-
where object_name > :var or object_name <= :var1;
-
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 7098 | 914K| 395 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 7098 | 914K| 395 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("OBJECT_NAME">:VAR OR "OBJECT_NAME"<=:VAR1)
6)非索引列范围默认选择率0.05
通过2)可以得知,不走索引的like绑定变量默认选择率=0.05,和走索引回表的选择率一样,
比如:
-
explain plan for
-
select/*+test*/ * from sel_test
-
where SUBOBJECT_NAME like :var;
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
----------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 50 | 6600 | 395 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 50 | 6600 | 395 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
1 - filter("SUBOBJECT_NAME" LIKE :VAR)
这里的rows=50,不是3640,为什么呢,因为num_nulls:71801
所以选择率=0.05*adjust_sel=0.05*(72800-71801)/72800=.000686126
所以返回行=round(72800*.000686126)=50
7)函数范围查询默认选择率0.05(函数范围,无具体值,有具体值都是5%)
-
create index idx1_sel_test on sel_test(object_id);
-
explain plan for
-
select * from sel_test
-
where object_id like :var;
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
----------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 3640 | 469K| 396 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 3640 | 469K| 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(TO_CHAR("OBJECT_ID") LIKE :VAR)
like也是范围查询,object_id因为是number类型,隐式类型转换使用TO_CHAR,
对应选择率=3640/72800=0.05,object_id的num_nulls=2,可以忽略。
传入具体值也是5%:
-
explain plan for
-
select * from sel_test
-
where substr(object_id,1,2) > '123';
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 3640 | 469K| 396 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 3640 | 469K| 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(SUBSTR(TO_CHAR("OBJECT_ID"),1,2)>'123')
8)函数等值查询,选择率是0.01(函数等值,无具体值,有具体值都是1%)
-
explain plan for
-
select * from sel_test
-
where substr(object_id,1,2) = :var;
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 728 | 96096 | 396 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 728 | 96096 | 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(SUBSTR(TO_CHAR("OBJECT_ID"),1,2)=:VAR)
函数等值查询对应选择率0.01,所以返回行=72800*0.01=728
传入实际值也一样:
-
explain plan for
-
select * from sel_test
-
where SUBSTR(object_id,1,2) ='TA';
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 728 | 96096 | 396 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 728 | 96096 | 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(SUBSTR(TO_CHAR("OBJECT_ID"),1,2)='TA')
下一篇:
ORACLE CBO 默认选择率及其问题解决方案_PART2
阅读(1012) | 评论(0) | 转发(0) |