Category: Oracle
2008-03-02 09:31:31
Excerpted from http://zhouwf0726.itpub.net/post/9689/288301
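Since the dbms_stats package was introduced, Oracle professionals have had a simple way to collect statistics for the CBO. Generating CBO statistics with the old ANALYZE TABLE and dbms_utility methods is no longer recommended. Those older methods can even hurt SQL performance, because they do not always capture high-quality information about tables and indexes. The CBO uses object statistics to choose the best execution plan for every SQL statement.
dbms_stats estimates statistics well (especially for large partitioned tables) and produces better statistics. Better statistics lead to faster SQL execution plans.
Below is a sample execution of dbms_stats that uses the options clause.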
exec dbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
options => 'GATHER AUTO', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size repeat', -
degree => 15 -
)
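To get the full benefit of dbms_stats, you need to understand each of its major directives (parameters). Let's examine each directive in turn and see how to use it to collect the highest-quality statistics for the cost-based SQL optimizer.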
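The options parameter accepts one of four preset methods, which control how Oracle refreshes statistics:
gather - Reanalyzes the whole schema.
gather empty - Analyzes only tables that have no existing statistics.
gather stale - Reanalyzes only tables with more than 10 percent modifications (inserts, updates, deletes).
gather auto - Reanalyzes objects that currently have no statistics and objects with stale statistics. This is like combining gather stale and gather empty.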
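Note that both gather stale and gather auto require monitoring. If you issue an alter table xxx monitoring command, Oracle tracks changed tables in the dba_tab_modifications view. You then know exactly how many inserts, updates and deletes have occurred since statistics were last analyzed.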
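The monitoring workflow described above can be sketched as follows, assuming the SCOTT demo schema and its EMP table. On 9i the MONITORING clause turns tracking on per table. From 10g onward, tracking is automatic when STATISTICS_LEVEL is TYPICAL or ALL, and the clause is accepted but has no effect. Modification counts are buffered in memory, so the sketch calls dbms_stats.flush_database_monitoring_info before querying dba_tab_modifications. The final block uses options => 'LIST STALE' to report which objects gather stale would process, without gathering anything.

```sql
-- Assumes SCOTT.EMP exists; on 10g and later this clause is a no-op.
alter table scott.emp monitoring;

-- Write the in-memory DML counters out to the data dictionary.
exec dbms_stats.flush_database_monitoring_info

-- Inserts, updates and deletes recorded since statistics were last gathered.
select table_owner, table_name, inserts, updates, deletes, timestamp
  from dba_tab_modifications
 where table_owner = 'SCOTT'
 order by table_name;

-- List (but do not analyze) the objects that GATHER STALE would process.
set serveroutput on
declare
  stale_objs dbms_stats.objecttab;
begin
  dbms_stats.gather_schema_stats(
    ownname => 'SCOTT',
    options => 'LIST STALE',
    objlist => stale_objs);
  -- Guard against an unassigned (null) OUT collection.
  if stale_objs is not null then
    for i in 1 .. stale_objs.count loop
      dbms_output.put_line(stale_objs(i).objtype || ' ' || stale_objs(i).objname);
    end loop;
  end if;
end;
/
```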
The estimate_percent parameter is a relatively new feature. It lets dbms_stats automatically estimate the best percentage of a segment to sample when gathering statistics:
estimate_percent => dbms_stats.auto_sample_size
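To verify the accuracy of automatic sampling, check the sample_size column of dba_tables. Interestingly, with automatic sampling Oracle chooses a sample size between 5 and 20 percent. Remember: the better the quality of the statistics, the better the decisions the CBO makes.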
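A query along these lines compares the rows sampled with each table's total row count. It assumes the SCOTT schema. nullif guards against division by zero for empty tables.

```sql
-- Percentage of each SCOTT table that dbms_stats actually sampled.
select table_name,
       num_rows,
       sample_size,
       round(100 * sample_size / nullif(num_rows, 0), 1) as pct_sampled,
       last_analyzed
  from dba_tables
 where owner = 'SCOTT'
 order by table_name;
```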
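The method_opt parameter of dbms_stats is especially useful for refreshing statistics when table and index data change. It is also used to determine which columns need histograms.
In some cases, the distribution of values within an index affects the CBO's decision whether to use the index or perform a full-table scan. For example, suppose the value specified in the where clause matches a disproportionately large number of rows. In that case, a full-table scan can be cheaper than index access.
If you have a highly skewed index (some values have a disproportionately large number of rows), you can create Oracle histogram statistics. In the real world, however, this situation is fairly rare. One of the most common mistakes with the CBO is introducing histograms into CBO statistics when they are not needed. As a rule of thumb, use histograms only when a column's value should change the execution plan.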
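Before relying on a histogram, it can help to look at the value distribution directly. The sketch below uses SCOTT.EMP.DEPTNO purely as a placeholder; substitute the indexed column you suspect is skewed. ratio_to_report gives each value's share of the rows. A single value holding a large fraction of the table is the case where a full-table scan may beat the index.

```sql
-- Placeholder table and column: replace scott.emp / deptno with your own.
select deptno,
       count(*) as row_count,
       round(100 * ratio_to_report(count(*)) over (), 1) as pct_of_rows
  from scott.emp
 group by deptno
 order by row_count desc;
```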
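To generate histograms intelligently, Oracle provides the method_opt parameter for dbms_stats. There are also some important new options within the method_opt clause, namely skewonly, repeat and auto:
method_opt=>'for all columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for all columns size auto'
The skewonly option is very time-consuming, because it examines the distribution of values for every column within every index.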
If dbms_stats finds an index whose columns are unevenly distributed, it creates histograms for that index. These help the cost-based SQL optimizer decide between index access and full-table scan access. For example, suppose one column value in an index appears in 50% of the rows. A full-table scan then retrieves those rows faster than an index scan. Listing B runs this detailed skewonly analysis:
--*************************************************************
-- SKEWONLY option - Detailed analysis
--
-- Use this method for a first-time analysis for skewed indexes
-- This runs a long time because all indexes are examined
--*************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size skewonly',
degree => 7
);
end;
/
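After a skewonly run, one way to see which columns actually received histograms is to query dba_tab_col_statistics, assuming the SCOTT schema. The histogram column used here exists from 10g onward. On 9i, num_buckets > 1 is the usual indicator instead.

```sql
-- Columns in SCOTT that now carry a histogram (10g and later).
select table_name, column_name, num_distinct, num_buckets, histogram
  from dba_tab_col_statistics
 where owner = 'SCOTT'
   and histogram <> 'NONE'
 order by table_name, column_name;
```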
When you re-analyze statistics, the repeat option makes the task less resource-intensive. With the repeat option (Listing C), only indexes with existing histograms are re-analyzed, and dbms_stats does not search for other histogram opportunities. Use this approach when re-analyzing statistics on a regular schedule.
--**************************************************************
-- REPEAT OPTION - Only reanalyze histograms for indexes
-- that have histograms
--
-- Following the initial analysis, the weekly analysis
-- job will use the "repeat" option. The repeat option
-- tells dbms_stats that no indexes have changed, and
-- it will only reanalyze histograms for
-- indexes that have histograms.
--**************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size repeat',
degree => 7
);
end;
/
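When Oracle table monitoring is implemented with the alter table xxx monitoring; command, use the auto option of dbms_stats. As shown in Listing D, the auto option creates histograms based on two factors. The first is the data distribution. The second is the way the application accesses the column, for example the workload on a column as determined by monitoring. Using method_opt=>'for all columns size auto' is similar to using gather auto in the options parameter of dbms_stats.
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
degree => 7
);
end;
/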
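Oracle can gather CBO statistics in parallel, which significantly speeds up statistics collection. To gather statistics in parallel, however, you need an SMP server with multiple CPUs.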
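The useful degree of parallelism is bounded by the CPUs available, so a quick check of cpu_count before choosing degree can help. The degree of 4 below is only an illustrative value, not a recommendation, and the SCOTT schema is an assumption.

```sql
-- Number of CPUs the instance sees.
select value as cpu_count
  from v$parameter
 where name = 'cpu_count';

-- Illustrative call: degree 4 is an arbitrary example value.
exec dbms_stats.gather_schema_stats(ownname => 'SCOTT', degree => 4)
```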
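dbms_stats is an excellent mechanism for improving SQL execution speed. When dbms_stats collects the highest-quality statistics, the CBO can correctly determine the fastest path for executing any SQL query. dbms_stats is still being improved. Some of its exciting new features, such as automatic sample size and automatic histogram generation, have already greatly simplified the work of Oracle professionals.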
/*******************************************************/
One exciting feature of dbms_stats is the ability to automatically look for columns that should have histograms, and create the histograms. Multi-bucket histograms add a huge parsing overhead to SQL statements, and histograms should ONLY be used when the SQL will choose a different execution plan based upon the column value.
To aid in intelligent histogram generation, Oracle uses the method_opt parameter of dbms_stats. There are also important new options within the method_opt clause, namely skewonly, repeat and auto.
method_opt=>'for all columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for all columns size auto'
In practice, there is a specific order in which to use the different options of dbms_stats. Let's take a close look at each method option.
The method_opt='SKEWONLY' dbms_stats Option
The first is the "skewonly" option, which is very time-intensive because it examines the distribution of values for every column within every index. If dbms_stats discovers an index whose columns are unevenly distributed, it will create histograms for that index. These help the cost-based SQL optimizer decide between index access and full-table scan access. For example, if one column value in an index appears in 50% of the rows, a full-table scan is faster than an index scan for retrieving these rows.
Histograms are also used with SQL that has bind variables and with SQL run under cursor_sharing. In these cases, the CBO determines whether the column value could affect the execution plan. If so, it replaces the bind variable with a literal and performs a hard parse.
--*************************************************************
-- SKEWONLY option – Detailed analysis
--
-- Use this method for a first-time analysis for skewed indexes
-- This runs a long time because all indexes are examined
--*************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size skewonly',
degree => 7
);
end;
/
The method_opt='REPEAT' dbms_stats Option
Following the one-time detailed analysis, the re-analyze task will be less resource-intensive with the REPEAT option. The repeat option re-analyzes only indexes with existing histograms and does not search for other histogram opportunities. This is how you will re-analyze your statistics on a regular basis.
--**************************************************************
-- REPEAT OPTION - Only re-analyze histograms for indexes
-- that have histograms
--
-- Following the initial analysis, the weekly analysis
-- job will use the “repeat” option. The repeat option
-- tells dbms_stats that no indexes have changed, and
-- it will only re-analyze histograms for
-- indexes that have histograms.
--**************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size repeat',
degree => 7
);
end;
/
The method_opt='AUTO' dbms_stats Option
The auto option is used when monitoring is implemented. It creates histograms based upon data distribution and the manner in which the application accesses the column. An example is the workload on the column as determined by monitoring, especially foreign keys used to determine the cardinality of table join result sets. Using method_opt=>'auto' is similar to using gather auto in the options parameter of dbms_stats.
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
degree => 7
);
end;
/
Remember, analyzing for histograms is time-consuming, and histograms are used under two conditions:
Table join order – The CBO must know the size of the intermediate result sets (cardinality) to determine the correct join order for multi-table joins.
Table access method – The CBO needs to know about columns in SQL where clauses whose values are skewed such that a full-table scan might be faster than an index range scan. Oracle uses this skew information in conjunction with the clustering_factor column of the dba_indexes view.
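The clustering_factor mentioned above can be compared with the table's block and row counts, assuming the SCOTT schema. As a rough guide, a value close to the number of table blocks indicates well-ordered rows. A value close to num_rows indicates rows scattered across blocks, which makes index range scans more expensive.

```sql
-- Compare each index's clustering factor with its table's size.
select i.index_name,
       i.clustering_factor,
       t.blocks   as table_blocks,
       t.num_rows as table_rows
  from dba_indexes i
  join dba_tables  t
    on t.owner = i.table_owner
   and t.table_name = i.table_name
 where i.owner = 'SCOTT'
 order by i.index_name;
```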
Hence, this is the proper order for using the dbms_stats package to locate proper columns for histograms:
1. Skewonly option - You want to use skewonly to do histograms for skewed columns, for cases where the value will make a difference between a full-table scan and an index scan.
2. Monitor - Next, turn on monitoring. Issue an "alter table xx monitoring" and an "alter index yyy monitoring usage" command for all segments in your schema. This will monitor the workload against those segments.
3. Auto option - Once monitoring is in-place, you need to re-analyze with the "auto" option to create histograms for join columns within tables. This is critical for the CBO to determine the proper join order for finding the driving table in multi-table joins.
4. Repeat option - Finally, use the "repeat" option to re-analyze only the existing histograms.
Periodically you will want to re-run the skewonly and auto option to identify any new columns that require histograms. Once located, the repeat option will ensure that they are refreshed with current values.
--------------------------------------------------------------------------------
If you like DBA internal tricks, check out my book Creating a Self-tuning Oracle Database by Rampant TechPress.
DBMS_STATS AND LEAF_BLOCKS
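dbms_stats is the package Oracle provides to replace the older analyze functionality. Compared with analyze, dbms_stats has many advantages, such as parallel execution and partition-level statistics. Its handling of indexes, however, is not ideal. When dbms_stats analyzes an index, leaf_blocks counts only the leaf blocks that currently contain data. analyze, by contrast, counts every leaf block that has ever been used. As a result, the dbms_stats figure can severely underestimate the cost of an index fast full scan. In some cases the optimizer then wrongly chooses an index fast full scan as the access path. Let's look at an example: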
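The transcript assumes a table T1 already exists in the TEST schema, but its definition is not shown. A hypothetical definition consistent with the INSERT used below is:

```sql
-- Hypothetical DDL: column types are inferred from the INSERT, not from the original post.
create table t1 (
  id      number,
  val     number,
  padding varchar2(100)
);
```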
First, truncate the existing table:
SQL 10G>truncate table t1;
Table truncated.
Insert the data:
SQL 10G>insert into t1 select
rownum id,
trunc(100 * dbms_random.normal) val,
rpad('x',100) padding
from
all_objects
where
rownum <= 10000
;
10000 rows created.
SQL 10G>commit;
Commit complete.
Create the index with pctfree 99 to simulate a large index:
SQL 10G>create index ind_t1 on t1(id) pctfree 99;
Index created.
Analyze the index with analyze:
SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
Index analyzed.
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name='T1';
INDEX_NAME BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
IND_T1 2 10000
Then analyze it with dbms_stats. At this point the basic statistics are essentially the same:
SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS('TEST','IND_T1');
PL/SQL procedure successfully completed.
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name='T1';
INDEX_NAME BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
IND_T1 2 10010
Delete rows so that only one row remains in the table:
SQL 10G>delete from t1 where rownum<10000;
9999 rows deleted.
SQL 10G>commit;
Commit complete.
Analyze the index again with analyze. leaf_blocks is still 10000:
SQL 10G>ANALYZE INDEX IND_T1 COMPUTE STATISTICS;
Index analyzed.
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name='T1';
INDEX_NAME BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
IND_T1 2 10000
The Cost column shows an index fast full scan cost of 2679, which is correct:
SQL 10G>set autotrace trace exp;
SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;
Execution Plan
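-------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |  2679  (19)|
|   1 |  SORT AGGREGATE       |        |     1 |            |
|   2 |   INDEX FAST FULL SCAN| IND_T1 |     1 |  2679  (19)|
-------------------------------------------------------------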
Analyze the index with dbms_stats. leaf_blocks is now recorded as 1, because only the leaf block currently in use is counted:
SQL 10G>EXEC DBMS_STATS.GATHER_INDEX_STATS('TEST','IND_T1');
PL/SQL procedure successfully completed.
SQL 10G>set autotrace off;
SQL 10G>SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS FROM user_indexes where table_name='T1';
INDEX_NAME BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
IND_T1 2 1
The Cost column now shows an index fast full scan cost of 1, which is clearly wrong:
SQL 10G>set autotrace trace;
SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;
Execution Plan
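-------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |     1   (0)|
|   1 |  SORT AGGREGATE       |        |     1 |            |
|   2 |   INDEX FAST FULL SCAN| IND_T1 |     1 |     1   (0)|
-------------------------------------------------------------
Now let's see how many blocks the scan actually has to read, and whether cost=1 is really enough:
SQL 10G> ALTER SESSION SET EVENTS 'immediate trace name flush_cache';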
Session altered.
SQL 10G>select/*+ index_ffs(t1,ind_t1)*/ count(*) from t1;
Execution Plan
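----------------------------------------------------------
-------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |     1   (0)|
|   1 |  SORT AGGREGATE       |        |     1 |            |
|   2 |   INDEX FAST FULL SCAN| IND_T1 |     1 |     1   (0)|
-------------------------------------------------------------
Statistics
----------------------------------------------------------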
0 recursive calls
0 db block gets
10035 consistent gets
10016 physical reads
0 redo size
411 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Clearly, 10016 physical reads occurred here, so cost=1 is far too low. It remains to be seen whether Oracle will address this problem in a future release.
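One way to see the blocks that dbms_stats leaves out is to validate the index structure and compare the result with the segment size. The sketch below is meant to run in the same TEST schema after the delete. ANALYZE ... VALIDATE STRUCTURE locks the table against DML while it runs. INDEX_STATS holds only the result of the last validation in the current session.

```sql
-- Populate INDEX_STATS for IND_T1 (locks T1 against DML while running).
analyze index ind_t1 validate structure;

-- Leaf blocks, live rows and deleted-but-not-reclaimed rows in the index structure.
select name, height, lf_blks, lf_rows, del_lf_rows
  from index_stats;

-- Blocks allocated to the index segment.
select segment_name, blocks
  from user_segments
 where segment_name = 'IND_T1';
```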