创建一张表,表中ID字段数据倾斜:
create table skew
(
id number(20) not null ,
begin varchar2(20),
end varchar2(20),
a1 varchar2(20),
a2 varchar2(20),
year varchar2(4),
mixcode char(1)
);
表中数据:
id distinct 700
year distinct 2
id:
501 30%
414 19%
2 15%
123 7%
其他 29%
year约为1半1半
(id,year) 750
declare
begin_rdm number(5);
end_rdm number(5);
b_val number(20);
e_val number(20);
id number(20);
a1 varchar2(20);
a2 varchar2(20);
year varchar2(4);
mixcode char(1);
v_sql varchar2(2000);
begin
b_val:=0;
for i in 1..1000000 loop
select trunc(dbms_random.value(0,500)) into begin_rdm from dual;
if begin_rdm !=14 then
b_val:=b_val+1;
end if;
select trunc(dbms_random.value(0,6)) into end_rdm from dual;
e_val:=b_val+end_rdm;
select dbms_random.string('U', 1) into mixcode from dual;
select dbms_random.string('u', 1)||lpad(to_char(trunc(dbms_random.value(1, 1000))),5,0)||dbms_random.string('l', 3) into a1 from dual;
select dbms_random.string('x', 10) into a2 from dual;
select trunc(dbms_random.value(1,101)) into id from dual;
if id<31 then
id:=501;
elsif id>30 and id<50 then
id:=414;
elsif id>49 and id<65 then
id:=2;
elsif id>64 and id<72 then
id:=123;
else
select trunc(dbms_random.value(1,701)) into id from dual;
end if;
if mod(id,2)=0 then
year:=2002;
elsif mod(id,7)=0 then
select trunc(dbms_random.value(1,3)) into year from dual;
year:=year+2000;
else
year:=2001;
end if;
v_sql:='insert into skew values('||id||',lpad(to_char('||b_val||'),9,0),lpad(to_char('||e_val||'),9,0),'''||a1||''','''||a2||''','||year||','''||mixcode||''')';
execute immediate v_sql;
if mod(i,10000)=0 then
commit;
end if;
end loop;
end;
/
在id字段上建立一个普通索引:
create index ldy.IDX_skew_id on ldy.skew(id);
对ID字段采用不同bucket数量进行分析,1、20、254:
exec dbms_stats.GATHER_TABLE_STATS(ownname=>'LDY',tabname=>'SKEW',estimate_percent=>100,cascade=>true,method_opt=>'FOR COLUMNS id SIZE 1');
exec dbms_stats.GATHER_TABLE_STATS(ownname=>'LDY',tabname=>'SKEW',estimate_percent=>100,cascade=>true,method_opt=>'FOR COLUMNS id SIZE 20');
exec dbms_stats.GATHER_TABLE_STATS(ownname=>'LDY',tabname=>'SKEW',estimate_percent=>100,cascade=>true,method_opt=>'FOR COLUMNS id SIZE 254');
每次进行新的分析之前需要删除原来的分析信息,并刷新共享池
exec dbms_stats.DELETE_TABLE_STATS(ownname=>'LDY',tabname=>'SKEW',cascade_columns=>true,cascade_indexes=>true);
alter system flush shared_pool;
分别分析以下2个条件的sql语句:
select * from skew where id=501;
select * from skew where id=601;
bucket为1的柱状图分析:
ID=501时,选择了299676行,大约占总数的30%,该执行计划走了索引范围扫描,一致读消耗了47901。
执行计划
----------------------------------------------------------
Plan hash value: 1401535606
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1429 | 72879 | 452 (0)| 00:00:06 |
| 1 | TABLE ACCESS BY INDEX ROWID| SKEW | 1429 | 72879 | 452 (0)| 00:00:06 |
|* 2 | INDEX RANGE SCAN | IDX_SKEW_ID | 1429 | | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
ID=601时,选择了431行,大约占总数的0.04%,该执行计划走了索引范围扫描,一致读消耗了456。
执行计划
----------------------------------------------------------
Plan hash value: 1401535606
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1429 | 72879 | 452 (0)| 00:00:06 |
| 1 | TABLE ACCESS BY INDEX ROWID| SKEW | 1429 | 72879 | 452 (0)| 00:00:06 |
|* 2 | INDEX RANGE SCAN | IDX_SKEW_ID | 1429 | | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
bucket为20的柱状图分析:
ID=501时,选择了299676行,大约占总数的30%,该执行计划走了全表扫描,一致读消耗了28895,比之前走索引消耗的要少得多。
执行计划
----------------------------------------------------------
Plan hash value: 246648590
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 300K| 14M| 1999 (1)| 00:00:24 |
|* 1 | TABLE ACCESS FULL| SKEW | 300K| 14M| 1999 (1)| 00:00:24 |
--------------------------------------------------------------------------
ID=601时,选择了431行,大约占总数的0.04%,该执行计划也走了全表扫描,一致读消耗了9116,比之前走索引的消耗要大。
执行计划
----------------------------------------------------------
Plan hash value: 246648590
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14312 | 712K| 1997 (1)| 00:00:24 |
|* 1 | TABLE ACCESS FULL| SKEW | 14312 | 712K| 1997 (1)| 00:00:24 |
--------------------------------------------------------------------------
bucket为254的柱状图分析:
ID=501时,选择了299676行,大约占总数的30%,该执行计划走了全表扫描,一致读消耗了28895。
执行计划
----------------------------------------------------------
Plan hash value: 246648590
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 303K| 14M| 1999 (1)| 00:00:24 |
|* 1 | TABLE ACCESS FULL| SKEW | 303K| 14M| 1999 (1)| 00:00:24 |
--------------------------------------------------------------------------
ID=601时,选择了431行,大约占总数的0.04%,该执行计划走了索引范围扫描,一致读消耗了456。
执行计划
----------------------------------------------------------
Plan hash value: 1401535606
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 416 | 21216 | 133 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| SKEW | 416 | 21216 | 133 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IDX_SKEW_ID | 416 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
结论:
bucket为1时,相当于一个普通分析,没有柱状图信息,执行计划按绝大多数ID字段内容来选择走索引。本例中ID=501,明显不适合走索引,额外开销了大量的一致读。
bucket为20时,分为了20个柱状,不够精细,每个范围内优化器判断变得都不适合走索引,所以当ID=602,走了全表扫描,额外开销了不必要的一致读。
bucket为254时,柱状图已经相当精细了,所以可以正确地判断ID为不同值时需要选择不同的执行计划,ID=501,全表扫描,ID=601,走索引扫描。
从上面的对比可以看出,bucket为254分析之后,选择的执行计划最为正确,开销最小。
由此可见,一个倾斜的列的柱状图分析对执行计划有较明显的影响。
对于不同值较多的倾斜字段,需要尽可能地精确分析,bucket的范围是1-254。
对于不同值NDV(the number of distinct values)的数量大于254的,bucket设置为254,
小于254的,按NDV的数量设置。