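Though the road is short, you will not arrive unless you walk it; though the task is small, it will not be done unless you do it.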
Category: Oracle
2007-09-06 15:16:26
After the header block, all controlfile blocks occur in pairs. Each logical block is represented by two physical blocks. This is necessary for the controlfile transaction mechanism.
It is theoretically possible that a hot backup of a controlfile could contain a split block. Therefore all controlfile blocks other than the file header have a cache header and tail that can be compared when mounting a database and whenever a controlfile block is read. The block type is 0 for virgin controlfile blocks and 21 otherwise. The physical controlfile block number is used in place of an RDBA in the cache header, and the controlfile sequence number is used in place of an SCN to record when the block was last changed. An ORA-00227 error is returned if the header and tail do not match, or if the block checksum does not match the checksum recorded in the cache header (if any).
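The checks described in the paragraph above can be sketched as a small Python function. This is an illustration only, not Oracle's code: the CfBlock record and check_cf_block function are hypothetical, and modelling the tail as holding the low-order 16 bits of the controlfile sequence number is an assumption made by analogy with data block tails, not something this text states. The checksum test is left out for brevity.

```python
from dataclasses import dataclass

VIRGIN_CF_BLOCK = 0   # block type of a virgin controlfile block
CF_BLOCK = 21         # block type of every other controlfile block


@dataclass
class CfBlock:
    """Hypothetical decoded view of a controlfile block's cache header and tail."""
    block_type: int
    block_no: int   # physical controlfile block number (used in place of an RDBA)
    cf_seq: int     # controlfile sequence number (used in place of an SCN)
    tail_seq: int   # ASSUMPTION: the tail keeps the low-order 16 bits of cf_seq


def check_cf_block(blk: CfBlock, physical_block_no: int) -> list[str]:
    """Return the problems found; an empty list means the block passes."""
    problems = []
    if blk.block_type not in (VIRGIN_CF_BLOCK, CF_BLOCK):
        problems.append(f"unexpected block type {blk.block_type}")
    if blk.block_no != physical_block_no:
        problems.append(f"header says block {blk.block_no}, read from block {physical_block_no}")
    if blk.tail_seq != (blk.cf_seq & 0xFFFF):
        problems.append("header/tail mismatch (possible split block)")
    return problems


if __name__ == "__main__":
    print(check_cf_block(CfBlock(21, 5, 0x1234, 0x1234), 5))  # []
    # Header from a newer write, tail from an older one: a split block.
    print(check_cf_block(CfBlock(21, 5, 0x1235, 0x1234), 5))
```

In this model, a non-empty result corresponds to the situations in which Oracle would report ORA-00227.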
The controlfile contains several different types of records, each in its own record section of one or more logical blocks. Records may span block boundaries within their section. The fixed view V$CONTROLFILE_RECORD_SECTION lists the types of records stored in each record section, together with the size of the record type, and the number of record slots available and in use within that section. The underlying X$KCCRS structure includes the starting logical block number (RSLBN) for each section.
Changes to the controlfile must always be recoverable. For the first record section of the controlfile, the database information entry section, this requirement is trivial to meet: the database information entry takes only about 210 bytes, so it is guaranteed to fit in a single controlfile block that can be written atomically. Changes to the database information entry can therefore be committed implicitly as they are written, without any recoverability concerns.
Recoverability for changes to the other controlfile record sections is provided by maintaining all the information in duplicate. Each logical block is represented by two physical blocks. One contains the current information, and the other contains either an old copy of the information, or a pending version that is yet to be committed. To keep track of which physical copy of each logical block contains the current information, Oracle maintains a block version bitmap with the database information entry in the first record section of the controlfile.
To read information from the controlfile, a session must first read the block version bitmap to determine which physical block to read. Then if a change must be made to the logical block, the change is first written to the alternate physical block for that logical block, and then committed by atomically rewriting the block containing the block version bitmap with the bit representing that logical block flipped. When changes need to be made to multiple records in the same controlfile block, such as when updating the checkpoint SCN in all online datafiles, those changes are buffered and then written together. Note that each controlfile transaction requires at least 4 serial I/O operations against the controlfile, and possibly more if multiple blocks are affected, or if the controlfile is multiplexed and asynchronous I/O is not available. So controlfile transactions are potentially expensive in terms of I/O latency.
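The commit protocol described above can be modelled in a few lines of Python. This is a toy sketch, not Oracle's implementation: the ShadowControlfile class and its methods are hypothetical, the version bitmap and sequence number are assumed to be rewritten together in one atomic block write, and the I/O counter tallies one plausible breakdown of the serial I/Os for a change (read the bitmap, read the current copy, write the alternate copy, rewrite the bitmap).

```python
from typing import Callable


class ShadowControlfile:
    """Toy model of controlfile transactions (hypothetical, not Oracle code).

    Each logical block has two physical copies; bit i of the version bitmap
    says which copy of logical block i is current. The bitmap and sequence
    number are assumed to share one block, so committing is one atomic write.
    """

    def __init__(self, n_logical: int) -> None:
        self.copies = [[b"", b""] for _ in range(n_logical)]
        self.bitmap = 0   # block version bitmap
        self.cf_seq = 0   # controlfile sequence number
        self.ios = 0      # simulated serial I/O count

    def _current(self, lbn: int) -> int:
        return (self.bitmap >> lbn) & 1

    def read(self, lbn: int) -> bytes:
        self.ios += 2     # read the bitmap block, then the current copy
        return self.copies[lbn][self._current(lbn)]

    def transaction(self, changes: dict[int, Callable[[bytes], bytes]],
                    crash_before_commit: bool = False) -> None:
        self.ios += 1                                  # read the version bitmap
        for lbn, change in changes.items():
            cur = self._current(lbn)
            old = self.copies[lbn][cur]                # read the current copy
            self.copies[lbn][1 - cur] = change(old)    # write the pending version
            self.ios += 2
        if crash_before_commit:
            return                                     # bitmap untouched: old copies stay current
        for lbn in changes:                            # commit: flip every changed bit and
            self.bitmap ^= 1 << lbn                    # bump the sequence in one block write
        self.cf_seq += 1
        self.ios += 1


if __name__ == "__main__":
    cf = ShadowControlfile(4)
    cf.transaction({2: lambda old: b"ckpt=100"})
    print(cf.ios, cf.read(2))       # 4 b'ckpt=100'
    cf.transaction({2: lambda old: b"ckpt=200"}, crash_before_commit=True)
    print(cf.read(2), cf.cf_seq)    # b'ckpt=100' 1
```

The second transaction writes its pending version but never flips the bit, so readers keep seeing the previously committed copy; this is why the bitmap rewrite acts as the commit point.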
Whenever a controlfile transaction is committed, the controlfile sequence number is incremented. This number is recorded with the block version bitmap and database information entry in the first record section of the controlfile. It is used in the cache header of each controlfile block in place of an SCN to detect possible split blocks from hot backups. It is also used in queries that perform multiple controlfile reads to ensure that a consistent snapshot of the controlfile has been seen. If not, an ORA-00235 error is returned.
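A sketch of the snapshot check, in Python. The read_seq and read_block callables and the SnapshotError exception are hypothetical stand-ins (the exception plays the role of ORA-00235); the point is only that comparing the controlfile sequence number before and after a multi-block read reveals whether any controlfile transaction committed in between.

```python
class SnapshotError(Exception):
    """Stands in for ORA-00235 in this sketch."""


def consistent_read(read_seq, read_block, lbns):
    """Read several controlfile blocks and fail if a commit happened meanwhile."""
    before = read_seq()                          # sequence number at the start
    blocks = [read_block(lbn) for lbn in lbns]   # the multi-block read
    after = read_seq()
    if after != before:                          # a transaction committed in between
        raise SnapshotError(f"controlfile sequence changed from {before} to {after}")
    return blocks


if __name__ == "__main__":
    state = {"seq": 7, "blocks": {0: b"db", 1: b"files"}}
    print(consistent_read(lambda: state["seq"], lambda n: state["blocks"][n], [0, 1]))

    def racing_read(n):
        if n == 1:
            state["seq"] += 1   # simulate another session committing mid-read
        return state["blocks"][n]

    try:
        consistent_read(lambda: state["seq"], racing_read, [0, 1])
    except SnapshotError as exc:
        print("inconsistent:", exc)
```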
The controlfile transaction mechanism is not used for updates to the checkpoint heartbeat. Instead the size of the checkpoint progress record is overstated as half of the available space in a controlfile block, so that one physical block is allocated to the checkpoint progress record section per thread. Then, instead of using pairs of physical blocks to represent each logical block, each checkpoint progress record is maintained in its own physical block so that checkpoint heartbeat writes can be performed and committed atomically without affecting any other data.
All datafile blocks are written and read by the cache layer of the Oracle kernel (KCB), generally through the database buffer cache. The cache layer reads and maintains a 20-byte header and a 4-byte tail on each data block, called the cache header and tail. In V$TYPE_SIZE and elsewhere, the cache header is called the common block header. Controlfile blocks also have a cache header and tail, although not all of their fields are used.
This is what the cache header and tail look like in a datablock dump. (This is taken from a blockdump of the segment header block of the SYSTEM rollback segment.)
buffer tsn: 0 rdba: 0x00400002 (1/2)
scn: 0x0000.00e9ffb4 seq: 0x01 flg: 0x04 tail: 0xffb40e01
frmt: 0x02 chkval: 0xb31e type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS
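The rdba, scn and flg values in the dump above can be decoded with a few lines of Python. This sketch assumes the usual Oracle8-and-later RDBA layout of a 10-bit relative file number in the high-order bits and a 22-bit block number in the low-order bits; that split is not stated in this text, but it is consistent with the (1/2) printed for 0x00400002. The flag names follow the flag values listed with the header fields below, and all function names are hypothetical.

```python
FLAG_NAMES = {1: "virgin block", 2: "cleanout", 4: "checksum set", 8: "temporary data"}


def decode_rdba(rdba: int) -> tuple[int, int]:
    """Split an RDBA into (relative file number, block number); 10/22-bit split assumed."""
    return rdba >> 22, rdba & 0x3FFFFF


def scn_value(wrap: int, base: int) -> int:
    """Combine the 2-byte SCN wrap and 4-byte SCN base into one 48-bit number."""
    return (wrap << 32) | base


def format_scn(wrap: int, base: int) -> str:
    """Render an SCN the way the block dump prints it (wrap.base in hex)."""
    return f"0x{wrap:04x}.{base:08x}"


def decode_flags(flg: int) -> list[str]:
    """List the names of the 1-bit flags set in the cache header flag byte."""
    return [name for bit, name in FLAG_NAMES.items() if flg & bit]


if __name__ == "__main__":
    print(decode_rdba(0x00400002))        # (1, 2)
    print(format_scn(0x0000, 0x00E9FFB4)) # 0x0000.00e9ffb4
    print(decode_flags(0x04))             # ['checksum set']
```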
The fields of the cache header are as follows.
database block address (4 bytes): The tablespace relative database block address (RDBA). This is constructed from the tablespace relative file number and the block number of the data block within that file.
SCN (6 bytes): The SCN at which the block was last changed. The low-order 4 bytes are called the SCN base, and the high-order 2 bytes are called the SCN wrap.
sequence (1 byte): A sequence number incremented for each change to a block at the same SCN. If the sequence number wraps, a new SCN must be allocated. The value 0xff is reserved; when present, it indicates that the block has been marked as corrupt by Oracle.
flag (1 byte): A combination of 1-bit flag values:
1 = virgin block
2 = last change to the block was for a cleanout operation
4 = checksum value is set
8 = temporary data
format (1 byte): The format of the cache header was changed for Oracle8. Under Oracle8 and 9, the value is always 2; previously, it was 1.
checksum (2 bytes): An optional checksum of the block contents. When a block is written, the checksum is either cleared or set depending on the setting of the DB_BLOCK_CHECKSUM parameter. When a block is read, the checksum is verified if present and if DB_BLOCK_CHECKSUM is set to TRUE. Checksums are always calculated and checked for blocks in the SYSTEM tablespace. The checksum is the XOR of all the other 2-byte words in the block, so when a block with a checksum is checked, the XOR of all the 2-byte words in the block, including the checksum itself, should be 0.
block type (1 byte): The most common block type is 6, which is used for all table, index and cluster data blocks.
unused (4 bytes): Unused space, possibly for backward or forward compatibility.
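The checksum rule just described (the checksum word is the XOR of all the other 2-byte words, so the XOR over the whole block comes to 0) can be sketched in Python. The function names are hypothetical, the offset 16 is taken from the chkval_kcbh line of the BBED output in this section, and big-endian word order is assumed; because XOR works bit by bit, the zero test gives the same answer whichever byte order is used. This models the rule as stated here and is not a verified reimplementation of Oracle's checksum code.

```python
import struct

CHKVAL_OFFSET = 16  # offset of the 2-byte checksum within the cache header


def xor_words(block: bytes) -> int:
    """XOR together every 2-byte word of the block (block length must be even)."""
    acc = 0
    for (word,) in struct.iter_unpack(">H", block):
        acc ^= word
    return acc


def set_checksum(block: bytearray) -> None:
    """Store the XOR of all the other words in the checksum field."""
    block[CHKVAL_OFFSET:CHKVAL_OFFSET + 2] = b"\x00\x00"   # exclude the old value
    struct.pack_into(">H", block, CHKVAL_OFFSET, xor_words(block))


def checksum_ok(block: bytes) -> bool:
    """A block with a valid checksum XORs to zero over all of its words."""
    return xor_words(block) == 0


if __name__ == "__main__":
    blk = bytearray(range(256)) * 8      # a synthetic 2048-byte block
    set_checksum(blk)
    print(checksum_ok(blk))              # True
    blk[100] ^= 0x01                     # flip one bit to simulate corruption
    print(checksum_ok(blk))              # False
```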
The tail consists of the low-order two bytes of the SCN base, followed by the block type and the sequence number. The consistency of the header and tail is checked whenever a block is read. This detects most block corruptions, in particular split blocks from hot backups.
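Using the values from the block dump earlier in this section (SCN base 0x00e9ffb4, block type 0x0e, sequence 0x01, tail 0xffb40e01), the tail construction and the header/tail comparison can be sketched in Python. The function names are hypothetical. The last check simulates a split block, in which the header half comes from a newer version of the block (an invented SCN base of 0x00e9ffc0) while the tail half is still the old one.

```python
def make_tail(scn_base: int, block_type: int, seq: int) -> int:
    """Low-order 2 bytes of the SCN base, then the block type, then the sequence."""
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)


def header_matches_tail(scn_base: int, block_type: int, seq: int, tail: int) -> bool:
    """The consistency check made whenever a block is read."""
    return make_tail(scn_base, block_type, seq) == tail


if __name__ == "__main__":
    print(hex(make_tail(0x00E9FFB4, 0x0E, 0x01)))                     # 0xffb40e01
    print(header_matches_tail(0x00E9FFB4, 0x0E, 0x01, 0xFFB40E01))    # True
    # Split block: new header, old tail.
    print(header_matches_tail(0x00E9FFC0, 0x0E, 0x01, 0xFFB40E01))    # False
```

Because only the low-order 16 bits of the SCN base are repeated in the tail, two versions whose SCNs differ only in higher-order bits (with the same type and sequence) would not be told apart, which is consistent with the text's claim that the check detects most, rather than all, corruptions.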
The physical order of the header fields is: block type, format, unused (2 bytes), RDBA, SCN, sequence, flag, checksum, unused (2 bytes). The following output from BBED (a low level block browser / editor utility) corresponds to the above extract from a blockdump of the segment header block of the SYSTEM rollback segment.
BBED> print kcbh
struct kcbh, 20 bytes @0
ub1 type_kcbh @0 0x0e
ub1 frmt_kcbh @1 0x02
ub1 spare1_kcbh @2 0x00
ub1 spare2_kcbh @3 0x00
ub4 rdba_kcbh @4 0x00400002
ub4 bas_kcbh @8 0x00e9ffb4
ub2 wrp_kcbh @12 0x0000
ub1 seq_kcbh @14 0x01
ub1 flg_kcbh @15 0x04 (KCBHFCKV)
ub2 chkval_kcbh @16 0xb31e
ub2 spare3_kcbh @18 0x0000
BBED> print tailchk
ub4 tailchk @2044 0xffb40e01
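The BBED offsets above can be reproduced with Python's struct module. This sketch assumes a big-endian platform (Oracle stores multi-byte fields in the platform's native byte order, so on a little-endian platform each ub2/ub4 field would appear byte-reversed on disk) and a 2048-byte block, matching the tail offset of 2044 shown by BBED. The names KCBH, Kcbh and parse_block are hypothetical. The sample block is synthetic: only the header and tail are filled in, so its checksum field does not actually validate.

```python
import struct
from collections import namedtuple

# type, frmt, spare1, spare2, rdba, bas, wrp, seq, flg, chkval, spare3 (big-endian assumed)
KCBH = struct.Struct(">BBBBIIHBBHH")
Kcbh = namedtuple("Kcbh", "type frmt spare1 spare2 rdba bas wrp seq flg chkval spare3")


def parse_block(block: bytes):
    """Decode the 20-byte cache header and the 4-byte tail at the end of the block."""
    hdr = Kcbh._make(KCBH.unpack_from(block, 0))
    (tail,) = struct.unpack_from(">I", block, len(block) - 4)
    return hdr, tail


if __name__ == "__main__":
    block = bytearray(2048)
    KCBH.pack_into(block, 0, 0x0E, 0x02, 0, 0, 0x00400002, 0x00E9FFB4,
                   0x0000, 0x01, 0x04, 0xB31E, 0x0000)
    struct.pack_into(">I", block, 2044, 0xFFB40E01)
    hdr, tail = parse_block(bytes(block))
    print(KCBH.size)                                  # 20
    print(hex(hdr.rdba), hex(hdr.bas), hex(tail))     # 0x400002 0xe9ffb4 0xffb40e01
```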