全部博文(389)
分类: Oracle
2013-12-12 10:47:24
ORACLE的GLOBAL STATS收集
GLOBAL STATS是从整个对像级别来描述对像的状态,比如num_rows.相对应的可能还有PARTITION和SUB PARTITION级别
的统计(非分区表只有GLOBAL STATS),分别描述了各种级别的统计信息。对于 GLOBAL_STATS的收集可以通过从低层的
PARTITION和SUB PARTITION级别汇聚得到,也可以通过自己本身收集来得到。比如num_rows,每个PARTITION的num_rows相加就是整个表的num_rows.
有些统计则不行,比如非PARTITION KEY的栏位distinct value.来看看这个例子
SQL> create table t3 (a int,b varchar2(100))
2 partition by range(a)
3 (partition p1 values less than (5),
4 partition p2 values less than (15),
5 partition p3 values less than (20));
Table created.
SQL> select b,count(*) from t3 group by b;
B COUNT(*)
------------------------------ ----------
mnb 1
abcd 100
df 900
efg 100
先看看不收集全局状态,通过dbms_stats在分区级别来收集,全局级别则由汇聚得到(没有直方图).
SQL> exec dbms_stats.gather_table_stats(tabname=>'T3',ownname=>'FRANK',granularity=>'PARTITION');
PL/SQL procedure successfully completed.
SQL> select table_name,global_stats,num_rows
2 from dba_tables where owner='FRANK' and table_name='T3'
3 ;
TABLE_NAME GLOBAL_STATS NUM_ROWS
------------------------------ ------------ ----------
T3 NO 1101
GLOBAL_STATS字段为NO表示当前没有收集GLOBAL_STATS,只是通过分区级别的汇聚得到。但是通过这种汇聚得到
的统计会对谓词评估产生不准的结果,如
SQL> explain plan for select * from t3 where b='df';
Explained.
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 367 | 2202 | 6 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ALL| | 367 | 2202 | 6 (0)| 00:00:01 | 1 | 3 |
|* 2 | TABLE ACCESS FULL | T3 | 367 | 2202 | 6 (0)| 00:00:01 | 1 | 3 |
--------------------------------------------------------------------------------------------
评估的结果是367行.
SQL> select table_name,owner,num_distinct
2 from dba_tab_col_statistics
3 where owner='FRANK' and table_name='T3' and column_name='B'
4 ;
TABLE_NAME OWNER NUM_DISTINCT
------------------------------ ------------------------------ ------------
T3 FRANK 3
1101/3=367 ,可以看出num_distinct的值和真实的num_distinct不相等,所以造成了不准确的谓词评估。
而根据我们的了解,正常评估的结果应该是1101/4=275.25。来看看从GLOBAL来收集后的的情形
SQL> exec dbms_stats.delete_table_stats(tabname=>'T3',ownname=>'FRANK'); --delete stats first
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(tabname=>'T3',ownname=>'FRANK',method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed. --prevent gathering histogram
SQL> explain plan for select * from t3 where b='df';
Explained.
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 275 | 1650 | 6 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ALL| | 275 | 1650 | 6 (0)| 00:00:01 | 1 | 3 |
|* 2 | TABLE ACCESS FULL | T3 | 275 | 1650 | 6 (0)| 00:00:01 | 1 | 3 |
--------------------------------------------------------------------------------------------
和我们之前所预计的结果一样
对于使用汇聚的方式来得到结果的好处在于避免了在整个表级别再次进行全表扫描,在大表收集
信息时大大节约了资源,但是这种好处也有代价的。后面引入了一种叫作synopsis的算法,对这种不精
确的情况有所缓解.