Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1284254
  • 博文数量: 127
  • 博客积分: 2286
  • 博客等级: 大尉
  • 技术积分: 1943
  • 用 户 组: 普通用户
  • 注册时间: 2010-06-10 10:37
文章分类

全部博文(127)

文章存档

2018年(1)

2015年(2)

2014年(1)

2013年(30)

2012年(88)

2011年(5)

分类: Oracle

2013-05-09 13:35:10

关于Bew Density, 我的脚本最早没有用newdensity,发现计算出的行数与实际偏差很大,也与10053 trace看到的不一样。
研究发现Oracle最早在10.2.0.4里引入了newdensity,但是缺省没有开启,可以通过_optimizer_enable_density_improvements=true来开启
从11.1.0.6开始缺省开启了,这个认为比较更新,因为newdensity真实非常接近,偏差非常大
New Density针对的是high balanced histogram中的none popular value的估算,它没有存在数据字典中,运行的时候动态计算,
所以我的脚本里也只有动态计算了

点击(此处)折叠或打开

  1. rem
  2. rem Show a histogram of data distribution in a column
  3. rem
  4. rem Kerlion He, Apr 2013 :kerlion.blog.chinaunix.net
  5. rem
  6. rem Modifid from the script of Guy Harrison
  7. rem Now:
  8. rem 1. It supports both FREQUENCY and Height Balanced Histogram
  9. rem 2. It detect datatype automatically
  10. rem 3. The output is more clear
  11. rem Tested only in 11gR2

  12. set pagesize 200
  13. set lines 80
  14. set verify off

  15. CREATE OR REPLACE FUNCTION epv2str (p_number IN NUMBER)
  16.     RETURN VARCHAR2
  17. AS
  18. --more accurate than Tom\'s hexstr from asktom.oracle.com
  19.     v_str varchar2(1000);
  20.     ch     varchar2(1000);
  21.     i binary_integer;
  22.     j binary_integer;
  23.     v_len binary_integer;
  24. BEGIN
  25.     v_str := TO_CHAR (p_number, RPAD (\'fm\', 50, \'X\'));
  26.     v_str := substr(v_str,1,14);
  27.     v_len := length(v_str);

  28.     i := 1; ch := \'xx\';
  29.     while (i+2<=v_len) loop
  30.       ch := substr(v_str,i,2);
  31.       exit when (ch = \'FF\' or ch = \'00\' );
  32.       i := i+2;
  33.     end loop;
  34.     v_len := i;
  35.     ch := substr(v_str,v_len,2);
  36.     i := to_number(ch,\'XX\');
  37.     ch := substr(v_str,v_len-2,2);
  38.     v_str := substr(v_str,1,v_len-3);

  39.     j := to_number(ch,\'XX\');
  40.     j := j + round(i/255);
  41.     v_str := v_str||to_char(j,\'fmXX\');
  42.     RETURN (utl_raw.CAST_TO_VARCHAR2(hextoraw(v_str)));
  43. END;
  44. /


  45. col endpoint_value for a30 jus right
  46. COL EP_PCT for 999.99
  47. col endpoint_rows for 999,999,999.99
  48. WITH hist_data AS (
  49. SELECT
  50.    case
  51.       when data_type like \'%CHAR%\' then
  52.          epv2str(endpoint_value)
  53.          else to_char(endpoint_value)
  54.    end epv,
  55.    endpoint_actual_value epva,
  56.    Density,
  57.    NUM_DISTINCT as NDV,
  58.    NUM_ROWS - NUM_NULLS nn_rows,
  59.    NUM_BUCKETS,
  60.    HISTOGRAM,
  61.    endpoint_number epn,
  62.    Least(endpoint_number, NUM_BUCKETS- 0.5) -
  63.    LAG(endpoint_number,1,0) OVER
  64.       (partition by column_name ORDER BY endpoint_number)
  65.    buckets,
  66.    endpoint_number -
  67.    LAG(endpoint_number,1,0) OVER
  68.       (partition by column_name ORDER BY endpoint_value)
  69.    ep_rows
  70. FROM dba_tab_histograms
  71. JOIN DBA_TAB_COLUMNS USING (owner, table_name,column_name)
  72. JOIN dba_tables USING (owner, table_name)
  73. WHERE owner = \'&owner\'
  74.       AND table_name = \'&table_name\'
  75.       AND column_name = \'&colmn_name\'),
  76. v_nd as (
  77. SELECT
  78.    (MAX(NUM_BUCKETS)-sum(bkt))/
  79.       MAX(NUM_BUCKETS)/
  80.       (MAX(ndv)-sum(val)) NewDensity
  81. FROM (select
  82.         ndv,
  83.         NUM_BUCKETS,
  84.            case when buckets>1 then buckets
  85.            else 0
  86.         end bkt,
  87.            case when buckets>1 then 1
  88.            else 0
  89.         end val
  90.       from hist_data))
  91. SELECT
  92.    nvl(epva,epv) endpoint_value ,
  93.    decode(HISTOGRAM,\'FREQUENCY\',ep_rows,
  94.       case when buckets>1 then
  95.                 round(nn_rows*buckets/NUM_BUCKETS,2)
  96.            else round(nn_rows*NewDENSITY,2)
  97.       end) endpoint_rows,
  98.    decode(HISTOGRAM,\'FREQUENCY\', round(ep_rows*100/nn_rows,2),
  99.       case when buckets>1 then
  100.                 round(buckets*100/NUM_BUCKETS,2)
  101.            else round(100*NewDENSITY,2)
  102.       end) EP_PCT
  103. FROM hist_data,v_nd;

执行的效果是这样的:
  
按20 buckets收集,是High Balance Histogram
ID
                ENDPOINT_VALUE   ENDPOINT_ROWS  EP_PCT
------------------------------ --------------- -------
1                                         2.06     .21
3                                       200.00   20.00
6                                       100.00   10.00
10                                      500.00   50.00
25                                        2.06     .21
50                                        2.06     .21
75                                        2.06     .21
100                                       2.06     .21
Name
                ENDPOINT_VALUE   ENDPOINT_ROWS  EP_PCT
------------------------------ --------------- -------
1                                         3.06     .31
10                                      500.00   50.00
23                                        3.06     .31
3                                       200.00   20.00
39                                        3.06     .31
53                                        3.06     .31
69                                        3.06     .31
84                                        3.06     .31
99                                        3.06     .31

阅读(3012) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~