关于Bew Density, 我的脚本最早没有用newdensity,发现计算出的行数与实际偏差很大,也与10053 trace看到的不一样。
研究发现Oracle最早在10.2.0.4里引入了newdensity,但是缺省没有开启,可以通过_optimizer_enable_density_improvements=true来开启
从11.1.0.6开始缺省开启了,
这个我认为算比较大的更新,因为newdensity的值跟真实值非常接近,而旧的偏差非常大。
New Density针对的是high balanced histogram中的none popular value的估算,它没有存在数据字典中,运行的时候动态计算,
所以我的脚本里也只有动态计算了
-
rem
-
rem Show a histogram of data distribution in a column
-
rem
-
rem Kerlion He, Apr 2013 :kerlion.blog.chinaunix.net
-
rem
-
rem Modifid from the script of Guy Harrison
-
rem Now:
-
rem 1. It supports both FREQUENCY and Height Balanced Histogram
-
rem 2. It detect datatype automatically
-
rem 3. The output is more clear
-
rem Tested only in 11gR2
-
-
set pagesize 200
-
set lines 80
-
set verify off
-
-
CREATE OR REPLACE FUNCTION epv2str (p_number IN NUMBER)
-
RETURN VARCHAR2
-
AS
-
--more accurate than Tom\'s hexstr from asktom.oracle.com
-
v_str varchar2(1000);
-
ch varchar2(1000);
-
i binary_integer;
-
j binary_integer;
-
v_len binary_integer;
-
BEGIN
-
v_str := TO_CHAR (p_number, RPAD (\'fm\', 50, \'X\'));
-
v_str := substr(v_str,1,14);
-
v_len := length(v_str);
-
-
i := 1; ch := \'xx\';
-
while (i+2<=v_len) loop
-
ch := substr(v_str,i,2);
-
exit when (ch = \'FF\' or ch = \'00\' );
-
i := i+2;
-
end loop;
-
v_len := i;
-
ch := substr(v_str,v_len,2);
-
i := to_number(ch,\'XX\');
-
ch := substr(v_str,v_len-2,2);
-
v_str := substr(v_str,1,v_len-3);
-
-
j := to_number(ch,\'XX\');
-
j := j + round(i/255);
-
v_str := v_str||to_char(j,\'fmXX\');
-
RETURN (utl_raw.CAST_TO_VARCHAR2(hextoraw(v_str)));
-
END;
-
/
-
-
-
col endpoint_value for a30 jus right
-
COL EP_PCT for 999.99
-
col endpoint_rows for 999,999,999.99
-
WITH hist_data AS (
-
SELECT
-
case
-
when data_type like \'%CHAR%\' then
-
epv2str(endpoint_value)
-
else to_char(endpoint_value)
-
end epv,
-
endpoint_actual_value epva,
-
Density,
-
NUM_DISTINCT as NDV,
-
NUM_ROWS - NUM_NULLS nn_rows,
-
NUM_BUCKETS,
-
HISTOGRAM,
-
endpoint_number epn,
-
Least(endpoint_number, NUM_BUCKETS- 0.5) -
-
LAG(endpoint_number,1,0) OVER
-
(partition by column_name ORDER BY endpoint_number)
-
buckets,
-
endpoint_number -
-
LAG(endpoint_number,1,0) OVER
-
(partition by column_name ORDER BY endpoint_value)
-
ep_rows
-
FROM dba_tab_histograms
-
JOIN DBA_TAB_COLUMNS USING (owner, table_name,column_name)
-
JOIN dba_tables USING (owner, table_name)
-
WHERE owner = \'&owner\'
-
AND table_name = \'&table_name\'
-
AND column_name = \'&colmn_name\'),
-
v_nd as (
-
SELECT
-
(MAX(NUM_BUCKETS)-sum(bkt))/
-
MAX(NUM_BUCKETS)/
-
(MAX(ndv)-sum(val)) NewDensity
-
FROM (select
-
ndv,
-
NUM_BUCKETS,
-
case when buckets>1 then buckets
-
else 0
-
end bkt,
-
case when buckets>1 then 1
-
else 0
-
end val
-
from hist_data))
-
SELECT
-
nvl(epva,epv) endpoint_value ,
-
decode(HISTOGRAM,\'FREQUENCY\',ep_rows,
-
case when buckets>1 then
-
round(nn_rows*buckets/NUM_BUCKETS,2)
-
else round(nn_rows*NewDENSITY,2)
-
end) endpoint_rows,
-
decode(HISTOGRAM,\'FREQUENCY\', round(ep_rows*100/nn_rows,2),
-
case when buckets>1 then
-
round(buckets*100/NUM_BUCKETS,2)
-
else round(100*NewDENSITY,2)
-
end) EP_PCT
-
FROM hist_data,v_nd;
执行的效果是这样的:
按20 buckets收集,是High Balance Histogram
ID
ENDPOINT_VALUE ENDPOINT_ROWS EP_PCT
------------------------------ --------------- -------
1 2.06 .21
3 200.00 20.00
6 100.00 10.00
10 500.00 50.00
25 2.06 .21
50 2.06 .21
75 2.06 .21
100 2.06 .21
Name
ENDPOINT_VALUE ENDPOINT_ROWS EP_PCT
------------------------------ --------------- -------
1 3.06 .31
10 500.00 50.00
23 3.06 .31
3 200.00 20.00
39 3.06 .31
53 3.06 .31
69 3.06 .31
84 3.06 .31
99 3.06 .31
阅读(3112) | 评论(0) | 转发(0) |