分类: Oracle
2008-05-13 16:24:03
列的统计信息可以被存储在柱状图中。柱状图提供了正确的列的动态分布信息。为SQL的查询优化提供
正确的优化的执行路径。某一列值的分布情况将影响优化器决定是使用索引还是执行全表扫描。
即使某一列上没有索引而且这一列也不是连接键的一部分,也能在该列上创建柱状图信息
柱状图分为高度柱状图和频率柱状图。存储在user_TAB_COL_STATISTICS和dba_TAB_COL_STATISTICS中
1、柱状图概念和分类
列的统计信息可以被存储在柱状图中。柱状图提供了正确的列的动态分布信息。为SQL的查询优化提供
正确的优化的执行路径。某一列值的分布情况将影响优化器决定是使用索引还是执行全表扫描。
即使某一列上没有索引而且这一列也不是连接键的一部分,也能在该列上创建柱状图信息
柱状图分为高度柱状图和频率柱状图。存储在user_TAB_COL_STATISTICS和dba_TAB_COL_STATISTICS中
distinct histogram from user_TAB_COL_STATISTICS ;
HISTOGRAM
---------------
FREQUENCY
HEIGHT BALANCED
NONE
2、何时创建柱状图
由于创建柱状图给SQL语句的分析阶段增加了额外的开销,你应该避免使用它们,
除非它们对于更快的优化器执行计划是必须的
创建柱状图推荐:
当该列在某个查询中被参考时 - 假如没有任何查询参考这一列的话,就没有必要给该列创建柱状图。
当该列的值的分布非常倾斜时 - 该列值的倾斜应该足够大,以至于where子句中的值将使得优化器选择其他的执行计划。
当该列的值导致某个不正确的假定时 - 如果优化器对某个中间结果集的大小作出了错误的猜测,那么它可能选择一个次优化的表连接方法。
给该列添加一个柱状图通常情况下会向优化器提供选择最好的表连接方法的所需信息
3、如何创建柱状图
由系统过程DBMS_STATS.GATHER_TABLE_STATS
ownname: Schema of table to analyze
tabname: Name of table
partname: Name of partition
method_opt:
Accepts:
FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
FOR COLUMNS [size clause] column|attribute [size_clause] [,column|attribute [size_clause]...]
size_clause is defined as size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}
- integer : Number of histogram buckets. Must be in the range [1,254].
- REPEAT : Collects histograms only on the columns that already have histograms.
- AUTO : Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.
- SKEWONLY : Oracle determines the columns to collect histograms based on the data distribution of the columns.
The default is FOR ALL COLUMNS SIZE AUTO.The default value can be changed using the SET_PARAM Procedure.
4、通过事例说明柱状图
table test_hi tablespace hwm as select * from dba_objects;
表已创建。
index indx_obid on test_hi(object_id);
索引已创建。
owner,count(*) from test_hi group by owner;
OWNER COUNT(*)
------------------------------ ----------
MDSYS 885
TSMSYS 3
DMSYS 189
PUBLIC 19989
OUTLN 8
CTXSYS 339
OLAPSYS 720
HR 34
SYSTEM 462
EXFSYS 281
SCOTT 23
SH 306
OE 127
DBSNMP 46
ORDSYS 1669
ORDPLUGINS 10
SYSMAN 1321
PM 26
XDB 682
IX 53
BI 8
SYS 23290
WMSYS 242
SI_INFORMTN_SCHEMA 8
已选择24行。
index indx_own on test_hi(owner);
索引已创建。
--收集列统计信息
DBMS_STATS.GATHER_table_STATS (OWNNAME => 'SYS', TABNAME => 'TEST_
HI',METHOD_OPT => 'FOR COLUMNS SIZE 10 OBJECT_ID');
PL/SQL 过程已成功完成。
--查看一下,发现NUM_DISTINCT >> NUM_BUCKETS 是 HEIGHT 柱状图
column_name, num_distinct, num_buckets, histogram
2 FROM USER_TAB_COL_STATISTICS
3 WHERE table_name = 'TEST_HI' AND column_name = 'OBJECT_ID';
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
OBJECT_ID 50721 10 HEIGHT BALANCED
endpoint_number, endpoint_value
2 FROM(
3 SELECT endpoint_number, endpoint_value
4 FROM USER_HISTOGRAMS
5 WHERE table_name = 'TEST_HI' and column_name = 'OBJECT_ID'
6 ORDER BY endpoint_number)
7 WHERE rownum <=20;
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
0 2
1 5162
2 10590
3 15662
4 20734
5 25806
6 30878
7 35950
8 41101
9 46835
10 56705
已选择11行。
DBMS_STATS.GATHER_table_STATS (OWNNAME => 'SYS', TABNAME => 'TEST_
HI',METHOD_OPT => 'FOR COLUMNS SIZE 10 OWNER');
PL/SQL 过程已成功完成。
column_name, num_distinct, num_buckets, histogram
2 FROM USER_TAB_COL_STATISTICS
3 WHERE table_name = 'TEST_HI' AND column_name = 'OWNER';
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
OWNER 21 10 HEIGHT BALANCED
DBMS_STATS.GATHER_table_STATS (OWNNAME => 'SYS', TABNAME => 'TEST_
HI',METHOD_OPT => 'FOR COLUMNS SIZE 20 OWNER');
PL/SQL 过程已成功完成。
column_name, num_distinct, num_buckets, histogram
2 FROM USER_TAB_COL_STATISTICS
3 WHERE table_name = 'TEST_HI' AND column_name = 'OWNER';
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
OWNER 20 20 FREQUENCY
DBMS_STATS.GATHER_table_STATS (OWNNAME => 'SYS', TABNAME => 'TEST_
HI',METHOD_OPT => 'FOR COLUMNS SIZE 30 OWNER');
PL/SQL 过程已成功完成。
column_name, num_distinct, num_buckets, histogram
2 FROM USER_TAB_COL_STATISTICS
3 WHERE table_name = 'TEST_HI' AND column_name = 'OWNER';
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
OWNER 23 23 FREQUENCY
--看来设置过多的bucket则 NUM_DISTINCT=NUM_BUCKETS 就是FREQUENCY 柱状图
DBMS_STATS.GATHER_table_STATS (OWNNAME => 'SYS', TABNAME => 'TEST_
HI',METHOD_OPT => 'FOR COLUMNS SIZE AUTO OWNER');
PL/SQL 过程已成功完成。
column_name, num_distinct, num_buckets, histogram
2 FROM USER_TAB_COL_STATISTICS
3 WHERE table_name = 'TEST_HI' AND column_name = 'OWNER';
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
OWNER 21 1 NONE
endpoint_number, endpoint_value
2 FROM(
3 SELECT endpoint_number, endpoint_value
4 FROM USER_HISTOGRAMS
5 WHERE table_name = 'TEST_HI' and column_name = 'OWNER'
6 ORDER BY endpoint_number)
7 WHERE rownum <=20;
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
0 3.4417E+35
1 4.5831E+35
--我的疑问,以后慢慢找答案
--当我把FOR COLUMNS SIZE 值增大时,为什么大于20后NUM_DISTINCT总是和NUM_BUCKETS相等?
当我用FOR COLUMNS SIZE AUTO 选项时 竟然没有任何柱状图
AUTO应该是系统自动计算出来的,难道oracle建议不要使用柱状图?
5、柱状图对优化器的影响
table test_hi compute statistics for all indexed columns;
表已分析。
DBMS_STATS.GATHER_table_STATS (OWNNAME => 'SYS', TABNAME => 'TEST_HI',METHOD_OPT => 'FOR ALL INDEXED COLUMNS ');
表已分析。
endpoint_number, endpoint_value,ENDPOINT_ACTUAL_VALUE
2 FROM(
3 SELECT endpoint_number, endpoint_value,ENDPOINT_ACTUAL_VALUE
4 FROM USER_HISTOGRAMS
5 WHERE table_name = 'TEST_HI' and column_name = 'OWNER'
6 ORDER BY endpoint_number)
7 WHERE rownum <=20
8 /
ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
--------------- -------------- --------------------
8 3.4417E+35 BI
347 3.4959E+35 CTXSYS
393 3.5442E+35 DBSNMP
582 3.5464E+35 DMSYS
863 3.6006E+35 EXFSYS
897 3.7551E+35 HR
950 3.8082E+35 IX
1835 4.0119E+35 MDSYS
1962 4.1159E+35 OE
2682 4.1174E+35 OLAPSYS
2692 4.1186E+35 ORDPLUGINS
4361 4.1186E+35 ORDSYS
4369 4.1192E+35 OUTLN
4395 4.1695E+35 PM
24384 4.1711E+35 PUBLIC
24407 4.3233E+35 SCOTT
24713 4.3242E+35 SH
24721 4.3245E+35 SI_INFORMTN_SCHEMA
48011 4.3277E+35 SYS
49332 4.3277E+35 SYSMAN
已选择20行。
system flush shared_pool;
autotrace traceonly
系统已更改。
count(*) from test_hi where owner = 'BI';
执行计划
----------------------------------------------------------
Plan hash value: 2278301351
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 1 (0)| 00:00:01
| 1 | SORT AGGREGATE | | 1 | 5 | |
|* 2 | INDEX RANGE SCAN| INDX_OWN | 8 | 40 | 1 (0)| 00:00:01
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='BI')
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
0 bytes sent via SQL*Net to client
0 bytes received via SQL*Net from client
0 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
count(*) from test_hi where owner = 'SYS';
执行计划
----------------------------------------------------------
Plan hash value: 607366465
----------------------------------------------------------------------------
--
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|
----------------------------------------------------------------------------
--
| 0 | SELECT STATEMENT | | 1 | 5 | 29 (7)| 00:0
|
| 1 | SORT AGGREGATE | | 1 | 5 | |
|
|* 2 | INDEX FAST FULL SCAN| INDX_OWN | 23290 | 113K| 29 (7)| 00:0
|
----------------------------------------------------------------------------
--
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OWNER"='SYS')
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
0 bytes sent via SQL*Net to client
0 bytes received via SQL*Net from client
0 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
>