接:
ORACLE CBO 默认选择率及其问题解决方案_PART1
本节研究下列无统计信息情况下的默认选择率。
2.列无统计信息情况下的选择率计算
列无统计信息,常发生在统计信息自动采集关闭,手动或定时收集统计信息时,采用:
-
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',estimate_percent=>10,-
-
method_opt=>'for all indexed columns size skewonly',no_invalidate=>false);
也即只对索引列收集统计信息其它列未收集统计信息,这个主要是DBA对统计信息了解的不够,
认为只要收集索引列统计信息即可,
其实非索引列统计信息也很重要,它的选择性影响过滤条件计算返回的cardinality,从而影响join method和join order的选择。
所以收集统计信息还是要把索引列,非索引列统计信息都要收集,
从12c开始,建议estimate_percent用默认的,才可能收集
hybirid histogram和top n histogram。
如果一个表已经生产使用过,method_opt也可以考虑使用默认值,会根据column_usage$自动决定是否收集直方图,使用默认值的可以去掉estimate_percent和method_opt也可以手动写上。
当然如果表很大,需要控制收集比例,也可以写estimate_percent,但是不会收集hybirid histogram和top n histogram。
建议收集语句如下:
-
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',estimate_percent=>dbms_stats.auto_sample_size,-
-
method_opt=>'for all columns size auto',no_invalidate=>false);
如果分区表,建议增加partname,避免无数据的分区num_rows=0.
另外method_opt支持多个条件混合收集,如下:
-
FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
-
FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...]
比如:
-
exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'xxx',tabname=>'T',estimate_percent=>10,-
-
method_opt=>'FOR ALL INDEXED COLUMNS SIZE SKEWONLY FOR COLUMNS SIZE 1 OBJECT_NAME,OBJECT_TYPE,CREATED,-
-
LAST_DDL_TIME SIZE 254',no_invalidate=>false,cascade=>true,degree => 10);
for all indexed columns size skewonly for all columns size 1这种不行,只会收集索引列。
如果所有列有倾斜的都要收集直方图,不考虑col_usage$,可以用
method_opt=>'for all columns size skewonly'
列无统计信息,就不考虑1/num_distinct,density,num_nulls等,
比较的值不管是否是具体值,都直接按照对应的默认选择率计算,因为列无统计信息,
无法按照选择率计算公式计算选择率,
其实和有统计信息,绑定变量无具体值类似(除了等值有索引是0.004)。
1)列无统计信息,有索引且条件是等值,走索引范围扫描 ix_sel=0.004,ix_sel_with_filters=0.01
先删除object_name列统计信息
-
exec dbms_stats.delete_column_stats(ownname=>user,tabname=>'sel_test',colname=>'object_name',no_invalidate=>false);
-
-
select column_name,num_distinct,num_nulls,density,histogram
-
from dba_tab_col_statistics
-
where table_name='SEL_TEST'
-
and column_name='OBJECT_NAME';
-
no rows selected
-
-
Elapsed: 00:00:00.25
现在object_name列无统计信息,不知道列的num_distinct和density,num_nulls,按照前面的公式就不行了,这时候就要采用
默认选择率。
列有索引且是等值,因为列无统计信息,
则不能按照1/num_distinct等计算,默认选择率0.004,回表也不是按照0.05计算,而是按照0.01
计算。
有没有具体值都一样。
这个默认选择率还是很小的,偏向于走索引。
-
explain plan for
-
select * from sel_test
-
where object_name = :var;
-
-
select * from table(dbms_xplan.display);
-
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 310337252
-
-
----------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
----------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 728 | 96096 | 149 (0)| 00:00:01 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 728 | 96096 | 149 (0)| 00:00:01 |
-
|* 2 | INDEX RANGE SCAN | IDX_SEL_TEST | 291 | | 3 (0)| 00:00:01 |
-
----------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("OBJECT_NAME"=:VAR)
对应ix_sel=291/72800=.003997253=0.004
对应回表选择率sel=728/72800=0.01
传入具体值一样:
-
explain plan for
-
select * from sel_test
-
where object_name = 'TA';
-
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 310337252
-
-
----------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
----------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 728 | 96096 | 149 (0)| 00:00:01 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 728 | 96096 | 149 (0)| 00:00:01 |
-
|* 2 | INDEX RANGE SCAN | IDX_SEL_TEST | 291 | | 3 (0)| 00:00:01 |
-
----------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("OBJECT_NAME"='TA')
2)等值无统计信息,列无索引列默认选择率sel=0.01,包括绑定变量无具体值,函数条件的等值查询,不管有没有统计信息都是1%。
通过1)计算的回表选择率是0.01也可以看出,因为回表利用的就是无索引选择率,删除object_name索引测试:
有无具体值,选择率一样。
-
drop index idx_sel_test;
-
-
explain plan for
-
select * from sel_test
-
where object_name = :var;
-
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 728 | 96096 | 395 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 728 | 96096 | 395 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("OBJECT_NAME"=:VAR)
很显然列无统计信息,且无索引默认选择率是1%= 728/72800=0.01.
传入具体值一样:
-
explain plan for
-
select * from sel_test
-
where object_name = 'TA';
-
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 728 | 96096 | 395 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 728 | 96096 | 395 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("OBJECT_NAME"='TA')
函数等值和前面有统计信息的一样,都是1%。
-
explain plan for
-
select * from sel_test
-
where SUBSTR(object_name,1,1) = 'TA';
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 728 | 96096 | 396 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 728 | 96096 | 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(SUBSTR("OBJECT_NAME",1,1)='TA')
有没有注意到SQL条件:SUBSTR(object_name,1,1) = 'TA',因为substr条件只匹配一个字符,而值是2个字符,
实际上,这个条件是impossible的,但是这里CBO却没有考虑这些,这就是既定规则的默认选择率,所以,默认选择率很多时候是不准确的。
3)列无统计信息,对于非索引列范围查询,和有统计信息无具体绑定变量值一样,非索引列
(包括函数)是5%
object_name列索引已经删除:
-
explain plan for
-
select * from sel_test
-
where object_name like :var;
-
-
select * from table(dbms_xplan.display);
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 3640 | 469K| 395 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 3640 | 469K| 395 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("OBJECT_NAME" LIKE :VAR)
函数范围也是5%,不管是否有具体值:
-
explain plan for
-
select * from sel_test
-
where substr(object_name,1,1) like :var;
-
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 3640 | 469K| 396 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 3640 | 469K| 396 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter(SUBSTR("OBJECT_NAME",1,1) LIKE :VAR)
4)列无统计信息,范围条件走索引的选择率是0.009,回表是0.05,和索引列有统计信息,使用绑定变量无具体值的范围查询一样。
将object_name索引加上:
-
create index idx_sel_test on sel_test(object_name);
SQL如下:
-
explain plan for
-
select * from sel_test
-
where object_name like :var;
-
-
select * from table(dbms_xplan.display);
有索引列的范围查询ix_sel=655/72800=0.009,回表选择率=3640/72800=0.05:
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 310337252
-
-
----------------------------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
----------------------------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 3640 | 469K| 335 (0)| 00:00:01 |
-
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SEL_TEST | 3640 | 469K| 335 (0)| 00:00:01 |
-
|* 2 | INDEX RANGE SCAN | IDX_SEL_TEST | 655 | | 7 (0)| 00:00:01 |
-
----------------------------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
2 - access("OBJECT_NAME" LIKE :VAR)
-
filter("OBJECT_NAME" LIKE :VAR)
5)列无统计信息,<>条件选择率是5%,和列有统计信息按照1-sel(等值)的不一样。
注意这里<>的选择率不是1-sel(=)的,而是默认5%,和列有统计信息按照1-sel(等值)的不一样。
-
explain plan for
-
select * from sel_test
-
where object_name <> 'A';
-
select * from table(dbms_xplan.display);
-
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 3640 | 469K| 395 (1)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 3640 | 469K| 395 (1)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("OBJECT_NAME"<>'A')
列OBJECT_NAME无统计信息,<>的选择率=5%
如果OBJECT_NAME有统计信息,则<>的选择率=1-sel(object_name=),返回行=round(72800*(1-1/60896))=72799.
收集统计信息:
-
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'sel_test',estimate_percent=>dbms_stats.auto_sample_size,-
-
method_opt=>'for all columns size auto',no_invalidate=>false);
列有统计信息则按照1-sel(等值)计算:
-
PLAN_TABLE_OUTPUT
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
Plan hash value: 747827318
-
-
------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-
------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 72799 | 9384K| 398 (2)| 00:00:01 |
-
|* 1 | TABLE ACCESS FULL| SEL_TEST | 72799 | 9384K| 398 (2)| 00:00:01 |
-
------------------------------------------------------------------------------
-
-
Predicate Information (identified by operation id):
-
---------------------------------------------------
-
-
1 - filter("OBJECT_NAME"<>'A')
下一篇:
ORACLE CBO 默认选择率及其问题解决方案_PART3
阅读(618) | 评论(0) | 转发(0) |