批量加载数据索引处理方式
在大批量导数入数据时候,如果字段上有索引,那么每次insert数据都要进行索引维护操作,这种方式
使用的是随机IO的方式.如果对像的索引在数据导入后再增加,那么重建索引索引的时候使用的是批量处理
方式,相比之前的方式,后一种的效率会非常高.
我们使用oracle 11G来验证一下.
DONGDONGTANG@tbk >create table t1 (a int, b varchar2(100));
Table created.
DONGDONGTANG@tbk > create index ix_a on t1(a);
Index created.
DONGDONGTANG@tbk > create index ix_b on t1(b);
Index created.
打开另一个会话,使用Tanel的会话活动跟踪器进行跟踪
SYS@tbk >@snapper.sql stat 120 1 user=DONGDONGTANG;
在原来的会话中执行insert 100万行数据
DONGDONGTANG@tbk >insert into t1
select level, substr('abcdefghbiklkmtnok',dbms_random.value(1,20),dbms_random.value(1,20)) ||trunc(dbms_random.value(1,1000))
from dual
connect by level<=1000000;
1000000 rows created.
DONGDONGTANG@tbk >commit;
Commit complete.
在会话活动跟踪器的窗口输出,我们观察 consistent gets指标
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
509, DONGDONGTANG , STAT, consistent gets , 26329, 219.02, , , , , 88.65 per execution
当使用先创建表的时候,导数据,再创建索引的方式,我们来看看结果如何
DONGDONGTANG@tbk >create table t1 (a int, b varchar2(100));
Table created.
DONGDONGTANG@tbk >insert into t1
select level, substr('abcdefghbiklkmtnok',dbms_random.value(1,20),dbms_random.value(1,20)) ||trunc(dbms_random.value(1,1000))
from dual
connect by level<=1000000; 2 3 4
1000000 rows created.
DONGDONGTANG@tbk >create index ix_a on t1(a);
Index created.
DONGDONGTANG@tbk > create index ix_b on t1(b);
Index created.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
509, DONGDONGTANG , STAT, consistent gets , 12360, 102.8, , , , , 21.16 per execution
使用后一种方式,我们看到只进行了12360次consistent gets,相比前一种方式,减少将近一半左右. 所以我们在批量加载或导入数据的方式
尽量使用后一种方式来处理.这样会大大加快数据处理时间,正所谓磨刀不误砍柴功.