分类: Oracle
2008-01-29 11:22:13
analyze table compute statistics = analyze table compute statistics for table for all indexes for all columns;
比analyze table compute statistics for table for all indexes for all indexed columns 分析了更多的信息
但这里并不是鼓励使用 analyze table 的方法进行分析。
[2005/07/16增加]http://www.itpub.net/showthread.php?s=&threadid=390929 讨论了不同方式分析,oracle默认产生的HISTOGRAMS不同。作为本文章的补充。
观察analyze table compute statistics 都对什么对象统计了信息
analyze table compute statistics = analyze table compute statistics for table for all indexes for all columns;
比analyze table compute statistics for table for all indexes for all indexed columns 分析了更多的信息
但这里并不是鼓励使用 analyze table 的方法进行分析。
SQL> create table t as select * from all_objects;
Table created.
SQL> create index t_idx on t(object_id);
Index created.
SQL> analyze table t compute statistics for table
2 for all indexes for all indexed columns;
Table analyzed.
SQL> select t.num_rows, i.num_rows, c.cnt
2 from (select num_rows from user_tables where table_name = 'T') t,
3 (select num_rows from user_indexes where table_name = 'T' ) i,
4 (select count(distinct column_name) cnt from user_tab_histograms where tab
le_name = 'T' ) c
5 /
NUM_ROWS NUM_ROWS CNT
---------- ---------- ----------
31213 31213 1 《------在这里 因为只有1列建立有索引,因此user_tab_histograms 只统计了object_id列的数据分布,这里object_id因为唯一,所以是均匀分布的
SQL> analyze table t delete statistics;
Table analyzed.
SQL> select t.num_rows, i.num_rows, c.cnt
2 from (select num_rows from user_tables where table_name = 'T') t,
3 (select num_rows from user_indexes where table_name = 'T' ) i,
4 (select count(distinct column_name) cnt from user_tab_histograms where ta
ble_name = 'T' ) c;
NUM_ROWS NUM_ROWS CNT
---------- ---------- ----------
0
SQL> analyze table t compute statistics;
Table analyzed.
SQL> select t.num_rows, i.num_rows, c.cnt
2 from (select num_rows from user_tables where table_name = 'T') t,
3 (select num_rows from user_indexes where table_name = 'T' ) i,
4 (select count(distinct column_name) cnt from user_tab_histograms where ta
ble_name = 'T' ) c;
NUM_ROWS NUM_ROWS CNT
---------- ---------- ----------
31213 31213 13 〈------统计了所有的列,但这些列并不是都用在 where col='X' 上的,因此很多信息其实都没有实际意义。
DBMS_STATS 和TABLE的MONITOR结合,可以当表数据量发生10%的变化的时候,自动重新收集统计信息。
我平常更喜欢使用SIZE SKEWONLY 分析histograms
以下引用 ASKTOM中对“what kind of analyze is best”的回复
DBMS_STATS in general always gets table statistics
(gather_table,gather_schema,gather_database all do).
If you use CASCADE=>TRUE, it'll get indexes.
METHOD_OPT dictates how histograms are done.
Take the analyze command for example:
analyze table T compute statistics
for table <<=== dbms_stats does this when you gather
table,schema,database stats
for all indexes <<=== when you do table/schema/database AND cascade=>
TRUE or you use gather index stats directly
for all columns size 1 <<== when you default method_opt, can be overriden
Now, we have method_opts of:
o for all columns - gather histograms on all columns -- hugely expensive and
rarely should be used
o for all indexed columns - gather histograms on all columns that are currently
indexed. I find this typically to be the "most useful" although you can
arguably skip nearly unique columns such as primary keys
o for all hidden columns - gather histograms on columns you cannot see, useful
if using the object relational features which incorporate tons of hidden columns
in your objects.
o for columns - gather histograms on these specific columns, could be
useful to skip primary key indexes or just to get the 2 or 3 columns of interest
Each of the above can be modified with SIZE clauses where SIZE can be:
o SIZE
buckets. See note at
[url][/url]
to understand what this means and what happens if for example you use N = 10 and
there are 5 distinct column values or you use N = 10 and there are 15 distinct
values...
o SIZE REPEAT -- says to do the histograms just like you did them last time. It
reads the data dictionary to figure out what to do
o SIZE AUTO -- Oracle looks at the data and using a magical, undocumented and
changing algorithm, figures out all by itself what columns to gather stats on
and how many buckets and all. It'll collect histograms in memory only for those
columns which are used by your applications (those columns appearing in a
predicate involving an equality, range, or like operators). we know that a
particular column was used by an application because at parse time, we'll store
workload information in SGA. Then we'll store histograms in the data dictionary
only if it has skewed data (and it worthy of a histogram)
o SIZE SKEWONLY when you collect histograms with the SIZE option set to
SKEWONLY, we collect histogram data in memory for all specified columns (if you
do not specify any, all columns are used). Once an "in-memory" histogram is
computed for a column, it is stored inside the data dictionary only if it has
"popular" values (multiple end-points with the same value which is what we
define by "there is skew in the data").