分类: Oracle
2011-03-28 18:04:10
本文讨论内部LOB段(CLOB,BLOB,NCLOB)如何选择正确的存储参数。
将会讨论如下几点
CLOB,NCLOB用来存储超过4000Byte的文本大字段,如字符文本;BLOB用来存储二进制字段,如图片,Word文档等。
在存储方面,LOB字段可以分为2部分,Lob Segment和Lob Index Segment。 在表的数据段中只存储一个LOB Locator(当Disable storage in Row或者Enable storage in row的时候lob字段大于4000byte),LOB Locator指向Lob Index 中的Lob Segment 的Chunks,再通过Lob Index Entry访问到具体的Lob Segment Chunk.
SQL> create table t ( id int primary key,txt clob);
Table created.
SQL> col SEGMENT_NAME format a40
SQL> select segment_name,segment_type from user_segments;
SEGMENT_NAME SEGMENT_TYPE
---------------------------------------- ------------------
T TABLE
SYS_IL0000007665C00002$$ LOBINDEX
SYS_C002647 INDEX
SYS_LOB0000007665C00002$$ LOBSEGMENT
创建带有clob字段的表,段SYS_C002647 为维护主键的索引段; SYS_LOB0000007665C00002$$ 为系统生成的Lob数据段,SYS_IL0000007665C00002$$为系统生成的Lob索引段。
LOB数据段和LOB索引段具有相同的存储参数。
SQL> set pagesize 999 long 2000
SQL> select dbms_metadata.get_ddl('TABLE','T') from dual;
DBMS_METADATA.GET_DDL('TABLE','T')
---------------------------------------------------------------
CREATE TABLE "ORACLE"."T"
( "ID" NUMBER(*,0),
"TXT" CLOB,
PRIMARY KEY ("ID")
USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 131072 NEXT 131072 MINEXTENTS 1 MAXEXTENTS 2147483645 CTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) ABLESPACE "TOOLS" ENABLE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 131072 NEXT 131072 MINEXTENTS 1 MAXEXTENTS 2147483645 CTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)TABLESPACE "TOOLS"
LOB ("TXT") STORE AS (TABLESPACE "TOOLS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10 NOCACHE STORAGE(INITIAL 131072 NEXT 131072 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT))
如上为LOB ("TXT") 指定的存储参数同样适用于Lob Index Segment。
TABLESPACE "TOOLS" 为Lob data segment 和lob index segment指定了存储表空间,因此可以为LOB段指定与表段不同的表空间。但是Lob data segment和lob index segment必须存储在相同的表空间里。
通常为了管理的方便性和性能原因,通常将LOB段放在不同于表段的表空间里面。
IN ROW Clause
默认是" ENABLE STORAGE IN ROW“,表示当lob 数据小余4000byte的时候(actual maximum in-line LOB is 3964 bytes.)将lob数据和其他字段,一起存储在表段里面;当lob数据大于4000byte的时候,将lob数据存储在lob data segment中,仅仅在表段里面存储指向Lob Index的Lob Locator.
当指定DISABLE STOREGE IN ROW的时候,不管LOB 数据大小,都将lob数据存储在lob data segment中,仅仅在表段里面存储指向Lob Index的Lob Locator.
当LOB数据不是很大,例如Descript CLOB,通常大部分Descript都较小,仅仅有少数Descript数据比较大,且该字段访问相对较频繁,因此可以将该字段设置为" ENABLE STORAGE IN ROW,存储在表段里,可以跟data block一起cache在db buffer cache中。
对于大的LOB数据和设置为DISABLE STOREGE IN ROW的LOB数据,都存在lob data segment中。对存储在lob data segment中lob 数据的访问,都需要先根据lob locator访问lob index,再访问lob data segment 因此,引起额外的消耗在lob index上的逻辑IO.且默认情况下,lob data segment为NOCACHE选项,表示lob data segment将不会被缓存在db buffer cache中,每次访问lob data segment都将是物理I/O,使用direct patch read。
lob index segment同in row lob一样,会被缓存到buffer cache中。Cache/NOCache对lob index无效。Cache/NOCache一会再讨论。
如下,比较enable/disable storage in row的性能。
SQL> create table t(id int primary key,in_row clob,out_row clob)
2 lob(in_row) store as (enable storage in row)
3 lob(out_row) store as ( disable storage in row);
Table created.
SQL> insert into t select rownum,owner||' ' || object_name||' ' ||object_type ||' '|| status ,owner||' ' || object_name||' ' ||object_type ||' '|| status from dba_objects where rownum<100;
99 rows created.
SQL> commit;
Commit complete.
SQL> alter session set events '10046 trace name context forever,level 8';
Session altered.
SQL> declare
l_cnt number;
l_data varchar2(32765);
begin
select count(*) into l_cnt from t;
for i in 1..l_cnt
loop
select in_row into l_data from t where id=i;
select out_row into l_data from t where id=i;
end loop;
end;
/
PL/SQL procedure successfully completed.
************************************
SELECT IN_ROW FROM T WHERE ID=:B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- -
Parse 1 0.00 0.00 0 0 0 0
Execute 99 0.00 0.00 0 0 0 0
Fetch 99 0.02 0.02 0 396 0 99
------- ------ -------- ---------- ---------- -------
total 199 0.02 0.03 0 396 0 99
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 31 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
99 TABLE ACCESS BY INDEX ROWID T
99 INDEX UNIQUE SCAN SYS_C002648 (object id 7674
**************************************
SELECT OUT_ROW FROM T WHERE ID=:B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 99 0.01 0.01 0 0 0 0
Fetch 99 0.05 0.03 99 495 0 99
------- ------ -------- ---------- ---------- ---------- --
total 199 0.06 0.04 99 495 0 99
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 31 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
99 TABLE ACCESS BY INDEX ROWID T
99 INDEX UNIQUE SCAN SYS_C002648 (object id 7674)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ----------
direct path read (lob) 99 0.00 0.00
************************************
如上trace结果, enable in row比disable in row使用更少的逻辑I/O和CPU时间,而且disable in row因为默认为NOCACHE选项(lob data不会被缓存),因此每次访问lob数据都要等待direct path read (lob) 事件,增加了物理I/O。同理,在insert 和update lob数据的时候,disable in row性能都要差一些,且默认情况(NOCACHE)都会导致物理I/O,等待 direct path read (lob) 事件或者direct path write(lob)事件。
因此,当lob data相对小与4000byte的时候,使用Enable storage in row,从性能上来说(I/O),是相对较好的选择。
Chunk Size
当disable storage in row或者enable storage in row的时候,lob数据存储在LOB data segment的Chunk里面(CHUNK does NOT affect in-line LOBS); CHUNK大小必须是数据库block 得整数倍,如8k,16k,32k,Etc.Oracle为每条非空lob 记录分配至少一个Chunk,且该Chunk为该条lob 记录专用,不能被其他的LOB记录共享使用,lob data扩展的时候,以Chunk Size为单位分配空间。
表中记录lob index locator指向lob index entries(每个entry的结构大致为Lob index locator, Chunk ID,Chunk Location),每个entry指向一个Lob Chunk. 也就是说,对于一个LOB 记录,它记录在Lob Index里面的Entry数目等于它的Chunk数目。
必须根据实际情况,选择合适的Chunk Size,太大了会浪费空间。如果设置Chunk Size 为32K,而大部分lob纪录大小为7K,则每条lob记录将会被分配32K的空间,且剩余的32-7=25K的空间将会被浪费,不能被其他lob 记录使用。如果设置Chunk Size 为8K,则每个Lob仅剩余1K的空间,大大降低了空间浪费。
如果太小,则会导致每条Lob纪录有很多Chunks,如果一个lob 4M,Chunk Size 8k,则一个lob分配512个Chunk,则在Lob Index中就有512个Entry,当有很多4M这样的LOB时候,Lob Index 就会有很多Entry,当通过Lob Index访问Lob data Segment的时候就会多出额外的逻辑I/O。
The ultimate goal is to use a CHUNK size that minimizes your ”waste”,but also efficiently stores your data.
PCTVERSION
这个参数关系到lob数据的一致读。
LOB Index Segment产生UNDO形式和其他段类似。对于In Row存储的lob数据,它产生UNDO的形式,也和Data block产生UNDO形式一样。
但对于Out Row存储的LOB数据,其产生UNDO的形式则有变化。Out Row存储的LOB数据产生的UNDO将不会被存储在Undo segment中,而是存储在该LOB段中。当LOB段被更新的时候,Oracle会分配额外的新Chunk保留更新的数据,而不是在old image chunk上直接修改,回滚和一致读的时候就可以使用到old image chunk。
观察对LOB数据的一致读
SQL> set serveroutput on
SQL> declare
2 l_clob clob;
3 cursor c is select out_row from t where id=1;
4 begin
5 open c;
6 update t set out_row='hello,world' where id=1;
7 commit;
8 fetch c into l_clob;
9 dbms_output.put_line('out_row clob='||l_clob);
10 close c;
11 end;
12 /
out_row clob=SYS ACCESS$ TABLE VALID
PL/SQL procedure successfully completed.
SQL> select * from t where id=1;
ID OUT_ROW
1 hello,world
既然,Out Row的LOB数据的UNDO信息放在LOB段中,那什么决定了UNDO信息的保留时间哪?便有PCTVERSION决定,Oracle保留大概LOB段的百分之PCTVERSION的空间来保留LOB UNDO信息,超过这个设置的LOB UNDO信息将会被接下来的LOB Data覆盖。
1. Before-images of Lobdata Segment, related to the lobdata segment and required to rollback a transaction, are created in the segment itself ;if there are nospace limitations (MAXEXTENTS, no more space in tablespace).
2. Before images, which are no longer necessary, are gradually overwritten. However, Oracle keeps PCTVERSION percent of the entire storage available for older before images.
如下metalink上的解释:
"STORE AS ( PCTVERSION n )"
PCTVERSION can be changed after creation using:
ALTER TABLE tabname MODIFY LOB (lobname) ( PCTVERSION n );
PCTVERSION affects the reclamation of old copies of LOB data. This affects the ability to perform consistent read. If a session is attempting to use an OLD version of a LOB and that version gets overwritten (because PCTVERSION is too small) then the user will typically see the errors: ORA-01555: snapshot too old: rollback segment number with name "" too small
PCTVERSION can prevent OLD pages being used and force the segment to extend instead. Do not expect PCTVERSION to be an exact percentage of space as there is an internal fudge factor applied.
Oracle9i也引入了新的参数RETENTION来保证LOB一致读,它的大小和意义由数据库初始化参数UNDO_RETENTION设定,因此在使用回滚段自动管理的时候,可以使用该参数。UNDO信息仍然保留在LOB段中,只不过增加了类似回滚段自动管理的Chunk 时间戳,设置什么时候可以覆盖该Chunk.
设置合适的PCTVERSION和RETENTION来保证LOB段频繁更新时候,读取LOB不会出现ORA-1555 Snapshot too old 错误。
Cache/NoCache
该选项只针对Out Row Lob 段有影响。IN ROW LOB会随着Data Block而被缓存到DB Buffer Cache中。lob index segment同in row lob一样,会被缓存到buffer cache中。
In-line LOBS are not affected by the CACHE option as they reside in the actual table block (which is typically accessed via the buffer cache any way).
Cache的选项有CACHE / CACHE READS / NOCACHE
ALTER TABLE tabname MODIFY LOB (lobname) ( CACHE );
ALTER TABLE tabname MODIFY LOB (lobname) ( CACHE READS);
ALTER TABLE tabname MODIFY LOB (lobname) ( NOCACHE );
默认为NOCACHE ,表示对读写LOB SEGMENT的时候采用 direct reads 和writes. LOB BLOCK不会被缓存到buffer cache 中并且每次读写都会发生物理I/O,进程就会等待"direct path read" "direct path write" 事件,并且每次I/O都会读写很多BLOCK.( blocks can be read/written at a time (provided the caller is using a large enough buffer size))
设置为CACHE,表示会将LOB SEGMENT 缓存在buffer cache中。从磁盘上读取将会等待db file sequential read" 事件,并且LOB BLOCK会放在most-recently-used end of the LRU chain. 这和表的CACHE选项不同。
The CACHE options for LOB columns is different to the CACHE option for tables as CACHE_SIZE_THRESHOLD does not limit the size of LOB read into the buffer cache. This means that extreme caution is required otherwise the read of a long LOB can effectively flush the cache.
Cache选项也会影响OUT ROW LOB段产生的REDO量。NOCACHE blocks 直接从磁盘上读写,因此全部的BLOCK都会作为redo写到redo log buffer. 如果设置了CACHE,则只有被修改的部分才会被写到redo log buffer 中。
Eg: In the extreme case 'DISABLE STORAGE IN ROW NOCACHE CHUNK 32K' would write redo for the whole 32K even if the LOB was only 5 characters long. CACHE would write a redo record describing the 5 byte change (taking about 100-200 bytes).
通常如果OUT ROW LOB不是很大,或者经常访问,可以缓存到Buffer Cache中,但由于其缓存机制可能会影响到其他BLOCK的正常访问,因此可以结合BUFFER POOL,使用Keep Pool或者Recycle Pool缓存OUT ROW LOB,避免其他经常被访问的数据块因为Buffer Cache缓存了很多大的OUT ROW LOB而被刷新到磁盘(Buffer Free Wait).
且OUT ROW LOB被缓存后,将会有DBWR在后台执行写操作,避免了前台进程写LOB导致的磁盘I/O等待。
LOGGNG/NOLOGGING
该参数只有在NOCACHE环境下且对OUT ROW LOB才有效(updates to in-line LOBS are still logged as normal).
"STORE AS ( NOCACHE LOGGING )" or
"STORE AS ( NOCACHE NOLOGGING )"
This option can be changed after creation but the LOGGING / NOLOGGING attribute must be prefixed by the NOCACHE option. The CACHE option implicitly enables LOGGING. The default for this option is LOGGING.
如果OUT ROW LOB设置为NOCACHE NOLOGGING ,则对LOB DATA SEGMENT做dml操作将不产生redo logs.
当对LOB进行批量操作的时候,如用SQL*LOADER批量装载LOB则可以使用NOLOGGING加快操作。
NOLOGGING of the LOB segment means that if you have to recover the database then sections of the LOB segment will be marked as corrupt during recovery.
If your application makes frequent small changes to NOLOGGING LOBs, then it may well be that the controlfile transactions required to update the unrecoverable SCN are actually taking a lot longer than it would take to log the redo for the LOB changes if the LOBs were changed to LOGGING. However, there is a better alternative -- namely, setting event 10359.
Event 10359 disables all updates of unrecoverable SCNs. By setting this event you can retain the performance benefit of not logging LOB changes without sustaining the performance penalty of repeated foreground controlfile transactions. The only disadvantage is that RMAN will no longer be able to report which datafiles have recently been affected by NOLOGGING operations, and so you will have to adopt a backup strategy that compensates for that.
其他Storage选项
LOB ("TXT") STORE AS (
TABLESPACE "TOOLS" ENABLE STORAGE IN ROW CHUNK 8192 PCTVERSION 10 NOCACHE
STORAGE(INITIAL 131072 NEXT 131072 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT))
语法中也包括了STORAGE,表示这些存储参数都会对LOB Data Segment和LOB Index Segment产生影响,其作用和其他类型段一样。
如BUFFER_POOL,可以结合BUFFER_POOL和CACHE选项来合适的缓存LOB数据。
如FREELISTS,FREELIST GROUPS 如果并行用户更新频繁,则可以设置多个FREELISTS减少竞争。
注意LOB Data Segment和LOB Index Segment的存储参数是一样的。