
Thoughts Prompted by the SCN

Oracle's official documentation defines the SCN as follows:

system change number (SCN)

A stamp that defines a committed version of a database at a point in time. Oracle assigns every committed transaction a unique SCN

The question we need to understand is: why was the SCN introduced, and what is it for?

The answer: it exists to maintain database consistency, and that is exactly why it is used in data recovery.

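That raises further questions:

- How is the SCN obtained, and how large can it get?

- When is it used in the process of keeping the database consistent?

- What role does it play in data recovery?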

I. How is the SCN obtained, and how large can it get?

Several ways to get the current SCN:

1. In Oracle9i, you can use dbms_flashback.get_system_change_number.

For example:

SQL> select dbms_flashback.get_system_change_number from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
2982184

2. Before Oracle9i, you can query x$ktuxe:

X$KTUXE: [K]ernel [T]ransaction [U]ndo Transa[x]tion [E]ntry (table)

SQL> select max(ktuxescnw*power(2,32)+ktuxescnb) from x$ktuxe;

MAX(KTUXESCNW*POWER(2,32)+KTUXESCNB)
------------------------------------
2980613
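The query above combines two columns into one number: ktuxescnw is treated as the high-order "wrap" part and ktuxescnb as the low-order 32-bit "base". A minimal Python sketch of the same arithmetic follows. The 16-bit width of the wrap part is an assumption (the usual description of a 48-bit SCN), and compose_scn, split_scn and format_scn are hypothetical helper names.

```python
WRAP_BITS = 16  # assumed width of the wrap part (48-bit SCN overall)
BASE_BITS = 32  # width of the base part, matching power(2,32) in the query


def compose_scn(wrap, base):
    """Same arithmetic as ktuxescnw*power(2,32)+ktuxescnb."""
    if not (0 <= wrap < 2 ** WRAP_BITS and 0 <= base < 2 ** BASE_BITS):
        raise ValueError("wrap or base out of range")
    return wrap * 2 ** BASE_BITS + base


def split_scn(scn):
    """Inverse of compose_scn: returns (wrap, base)."""
    return divmod(scn, 2 ** BASE_BITS)


def format_scn(scn):
    """Render as wrap.base in hex, e.g. 0x0000.002d7b05."""
    wrap, base = split_scn(scn)
    return f"{wrap:#06x}.{base:08x}"


print(compose_scn(0, 2980613))  # 2980613
print(format_scn(2980613))      # 0x0000.002d7b05
```

The wrap part only becomes non-zero once the base passes 2^32; the SCN 8908390522972 quoted later in this post, for instance, splits into a wrap of 2074.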

How large can it get?

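1. To prevent abnormal SCN growth, Oracle allows at most 256*256/4 (16,384) SCNs to be generated per second.

2. Internally, Oracle uses a value with a 4G (32-bit) range to represent the time from 01/01/1988 00:00:00 to 08/18/2121 06:28:15. The algorithm is simple: every month is counted as 31 days, and the value increases by 1 for every second. This value can be observed in redo log file dumps, control file dumps, and datafile header dumps.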
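The 31-day-month encoding is easy to check. In the Python sketch below, encode and decode are hypothetical names, and the counter is assumed to be an unsigned 32-bit value that is 0 at 01/01/1988 00:00:00. It shows that the largest 32-bit value decodes to exactly 08/18/2121 06:28:15.

```python
from datetime import datetime

EPOCH_YEAR = 1988
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 31 * SECONDS_PER_DAY  # every month counted as 31 days
SECONDS_PER_YEAR = 12 * SECONDS_PER_MONTH


def encode(dt):
    """Seconds since 01/01/1988 00:00:00 under the 31-day-month rule."""
    months = (dt.year - EPOCH_YEAR) * 12 + (dt.month - 1)
    days = months * 31 + (dt.day - 1)
    return ((days * 24 + dt.hour) * 60 + dt.minute) * 60 + dt.second


def decode(value):
    """Inverse of encode; returns a tuple because e.g. 'Feb 31' is encodable."""
    years, rest = divmod(value, SECONDS_PER_YEAR)
    month, rest = divmod(rest, SECONDS_PER_MONTH)
    day, rest = divmod(rest, SECONDS_PER_DAY)
    hour, rest = divmod(rest, 3600)
    minute, second = divmod(rest, 60)
    return (EPOCH_YEAR + years, month + 1, day + 1, hour, minute, second)


LIMIT = 2 ** 32 - 1  # largest value of a 32-bit ("4G") counter
print(decode(LIMIT))                                      # (2121, 8, 18, 6, 28, 15)
print(encode(datetime(2121, 8, 18, 6, 28, 15)) == LIMIT)  # True
```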

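The largest SCN the system could currently have is the product of these two values: the seconds elapsed so far multiplied by 16,384. The following script returns the current maximum possible SCN (in hex):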

select to_char(
         ((((((to_char(sysdate,'YYYY') - 1988) * 12
              + to_char(sysdate,'mm') - 1) * 31
              + to_char(sysdate,'dd') - 1) * 24
              + to_char(sysdate,'hh24')) * 60
              + to_char(sysdate,'mi')) * 60
              + to_char(sysdate,'ss'))
         * to_number('10000','XXXXXXXX') / 4,  -- 0x10000 = 65536 = 256*256
         'XXXXXXXXXXXXXXXX') scn_hex
from dual;

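In other words, the ultimate ceiling is the number of seconds up to 08/18/2121 06:28:15 multiplied by 256*256/4. Someone asked: so the SCN can keep growing until August 18, 2121, but what happens on the 19th? That is more than 100 years away. By then computers may have N-bit CPUs, or computers may not exist at all; who can say? In short, there is nothing to worry about.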
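The same calculation as the SQL script above, written as a Python sketch for any given moment. The function names are hypothetical; the 16,384-per-second ceiling and the 31-day-month rule are taken from the text above.

```python
from datetime import datetime

SCN_PER_SECOND = 256 * 256 // 4  # 16384, the per-second ceiling


def seconds_since_1988(dt):
    # same nesting as the SQL script: years, months, days, hours, minutes, seconds
    months = (dt.year - 1988) * 12 + (dt.month - 1)
    days = months * 31 + (dt.day - 1)
    return ((days * 24 + dt.hour) * 60 + dt.minute) * 60 + dt.second


def max_possible_scn(dt):
    """Upper bound on the SCN at moment dt."""
    return seconds_since_1988(dt) * SCN_PER_SECOND


def max_possible_scn_hex(dt):
    return format(max_possible_scn(dt), "X")  # hex, like the 'XXXX...' mask


end = datetime(2121, 8, 18, 6, 28, 15)
print(max_possible_scn(end))      # 70368744161280
print(max_possible_scn_hex(end))  # 3FFFFFFFC000
```

The final bound, (2^32 - 1) × 16,384, is just under 2^46.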

The following is excerpted from an article on eygle.com:

As an enhancement to flashback, Oracle10g provides functions that convert between SCNs and timestamps.

First, dbms_flashback.get_system_change_number returns the current SCN:

SQL> col scn for 9999999999999
SQL> select dbms_flashback.get_system_change_number scn from dual;

SCN
--------------
8908390522972

The scn_to_timestamp function converts an SCN to a timestamp:

SQL> select scn_to_timestamp(8908390522972) scn from dual;

SCN
---------------------------------------------------------------------------
05-JAN-07 10.56.30.000000000 AM

timestamp_to_scn converts a timestamp back to an SCN:

SQL> select timestamp_to_scn(scn_to_timestamp(8908390522972)) scn from dual;

SCN
--------------
8908390522972

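With these two functions, Oracle finally ties SCNs to time. Before Oracle10g there was no function that returned the mapping between an SCN and a time; it was usually obtained by analyzing the logs with LogMiner (logmnr).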

However, the conversion depends on records kept inside the database, so old SCNs cannot be converted. For example:

SQL> select min(FIRST_CHANGE#) scn,max(FIRST_CHANGE#) scn from v$archived_log;

SCN                SCN
------------------ ------------------
8907349093953      8908393582271

SQL> select scn_to_timestamp(8907349093953) scn from dual;
select scn_to_timestamp(8907349093953) scn from dual
ERROR at line 1:
ORA-08181: specified number is not a valid system change number
ORA-06512: at "SYS.SCN_TO_TIMESTAMP", line 1
ORA-06512: at line 1

SQL> select scn_to_timestamp(8908393582271) scn from dual;

SCN
---------------------------------------------------------------------------
05-JAN-07 11.45.50.000000000 AM
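The ORA-08181 above follows from the conversion relying on a finite set of SCN/time records. A rough Python model of that behavior, assuming the retained history is a sorted list of (SCN, time) samples and a lookup returns the nearest preceding sample; the class name and sample values are hypothetical, and Oracle's actual storage and precision are not modeled.

```python
import bisect
from datetime import datetime


class ScnTimeMap:
    """Toy model: a finite, sorted history of (scn, time) samples."""

    def __init__(self, samples):
        self.samples = sorted(samples)
        self.scns = [scn for scn, _ in self.samples]
        self.times = [t for _, t in self.samples]

    def scn_to_timestamp(self, scn):
        i = bisect.bisect_right(self.scns, scn) - 1
        if i < 0:
            # older than anything retained: comparable to ORA-08181
            raise ValueError("specified number is not a valid system change number")
        return self.times[i]  # time of the latest sample at or before scn

    def timestamp_to_scn(self, ts):
        i = bisect.bisect_right(self.times, ts) - 1
        if i < 0:
            raise ValueError("timestamp is older than the retained history")
        return self.scns[i]


history = ScnTimeMap([
    (1000, datetime(2007, 1, 5, 10, 50)),
    (2000, datetime(2007, 1, 5, 10, 55)),
])
print(history.scn_to_timestamp(1500))  # 2007-01-05 10:50:00
```

In this model the round trip timestamp_to_scn(scn_to_timestamp(1500)) yields 1000 rather than 1500; how precise Oracle's own mapping is depends on how densely it records samples, which this sketch does not attempt to capture.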

II. To answer this question, we first need to understand what happens inside the database during a DML operation.

Take an UPDATE as an example.

When an UPDATE runs, both redo and undo are generated. The undo is enough to make the UPDATE "never have happened", and the redo is enough to make it "happen again".
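A toy Python model of that pairing: each change is stored once with its before image (what undo needs) and its after image (what redo needs). This is a simplification assumed for illustration; Oracle's real undo and redo records are far richer (among other things, changes to undo blocks are themselves protected by redo), and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    block: str
    before: int  # undo: enough to make the change "not have happened"
    after: int   # redo: enough to make the change "happen again"


def update(blocks, block, new_value, undo, redo):
    change = Change(block, blocks[block], new_value)
    undo.append(change)
    redo.append(change)
    blocks[block] = new_value


def roll_back(blocks, undo):
    for change in reversed(undo):      # newest change first
        blocks[change.block] = change.before


def roll_forward(blocks, redo):
    for change in redo:                # oldest change first
        blocks[change.block] = change.after


blocks, undo, redo = {"b1": 10}, [], []
update(blocks, "b1", 20, undo, redo)
roll_back(blocks, undo)
print(blocks)        # {'b1': 10}
roll_forward(blocks, redo)
print(blocks)        # {'b1': 20}
```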

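Before commit:

- Undo blocks have been generated in the SGA

- Modified data blocks (dirty blocks) have been generated in the SGA

- Buffered redo for the two items above has been generated in the SGA

- Some of the data from the first three items may already have been flushed to disk [Note 1]

- [Note 2] (what if the system crashes at this point?)

- All the required locks have been acquired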

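After commit:

- An SCN is generated for the transaction

- LGWR writes the contents of the redo log buffer to disk and records the SCN in the online redo log file. Once this step completes, the "commit succeeded" message can be returned. The transaction entry is removed from V$TRANSACTION, which shows that we have committed. [Note 3]

- The locks held by our session in V$LOCK are released, and the sessions waiting on those locks are woken up.

- If some of the transaction's dirty blocks are still in the buffer cache, a block cleanout is performed on them: the lock-related information in the block header is cleared.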
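The commit steps above can be sketched as a small Python model. Everything here (the Instance class, lists standing in for the redo log buffer and the online redo log, a set and a dict standing in for V$TRANSACTION and V$LOCK) is a hypothetical simplification; the point is the order of the steps and the fact that the datafile is not touched at commit time.

```python
import itertools


class Instance:
    """Toy model of the commit path described above; not Oracle internals."""

    def __init__(self):
        self._next_scn = itertools.count(1)
        self.redo_buffer = []       # redo log buffer in the SGA
        self.online_redo = []       # online redo log file on disk
        self.buffer_cache = {}      # data blocks in the SGA (dirty until written)
        self.datafile = {}          # data blocks on disk
        self.locks = {}             # block -> session holding the lock
        self.transactions = set()   # stand-in for V$TRANSACTION

    def update(self, session, block, value):
        self.transactions.add(session)
        self.locks[block] = session
        self.buffer_cache[block] = value
        self.redo_buffer.append(("change", session, block, value))

    def commit(self, session):
        scn = next(self._next_scn)                 # 1. SCN for the transaction
        self.redo_buffer.append(("commit", session, scn))
        self.online_redo.extend(self.redo_buffer)  # 2. LGWR write, synchronous
        self.redo_buffer.clear()
        self.transactions.discard(session)         #    entry leaves "V$TRANSACTION"
        for block in [b for b, s in self.locks.items() if s == session]:
            del self.locks[block]                  # 3. locks released
        # 4. block cleanout would go here; the datafile is deliberately not written
        return scn


inst = Instance()
inst.update("s1", "b1", 42)
print(inst.commit("s1"))      # 1
print("b1" in inst.datafile)  # False: the change lives only in redo and the cache
```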

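[Note 1] This is because the contents of the redo log buffer are flushed to disk (written to the redo log files) according to fixed rules. A flush happens whenever any of the following holds:

- every 3 seconds (heartbeat)

- when the redo log buffer is 1/3 full or holds 1 MB of buffered redo

- on commit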

LGWR writes one contiguous portion of the buffer to disk. LGWR writes:

A commit record when a user process commits a transaction
Redo log buffers
- Every three seconds
- When the redo log buffer is one-third full
- When a DBWn process writes modified buffers to disk, if necessary

Note:

Before DBWn can write a modified buffer, all redo records associated with the changes to the buffer must be written to disk (the write-ahead protocol). If DBWn finds that some redo records have not been written, it signals LGWR to write the redo records to disk and waits for LGWR to complete writing the redo log buffer before it can write out the data buffers.
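The flush conditions and the write-ahead rule can be summarized as two predicates. This Python sketch only restates the rules listed above; the RedoBuffer fields, the use of plain integers as redo positions, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass

ONE_MB = 1024 * 1024


@dataclass
class RedoBuffer:
    capacity: int              # size of the redo log buffer in bytes
    used: int = 0              # bytes of redo not yet written
    last_write: float = 0.0    # time (seconds) of the last LGWR write


def lgwr_should_write(buf, now, commit=False, dbwn_waiting=False):
    """True if any of the LGWR write conditions listed above holds."""
    return (
        commit                            # a commit record must reach disk
        or now - buf.last_write >= 3.0    # three-second timeout
        or 3 * buf.used >= buf.capacity   # one-third full
        or buf.used >= ONE_MB             # 1 MB of buffered redo
        or dbwn_waiting                   # DBWn needs the redo written first
    )


def dbwn_may_write(block_last_redo_pos, redo_on_disk_pos):
    """Write-ahead rule: a dirty block's redo must already be on disk."""
    return block_last_redo_pos <= redo_on_disk_pos
```

If dbwn_may_write is false, DBWn signals LGWR (dbwn_waiting=True) and waits, exactly as the note above describes.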

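What about the buffers in the data block buffer cache? Can they also be written to disk (to the datafiles) before commit?

They can, depending on the size of the transaction and how long it runs. So what causes these data blocks to be written?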

The DBWn process writes dirty buffers to disk under the following conditions:

· When a server process cannot find a clean reusable buffer after scanning a threshold number of buffers, it signals DBWn to write. DBWn writes dirty buffers to disk asynchronously while performing other processing.

· DBWn periodically writes buffers to advance the checkpoint, which is the position in the redo thread (log) from which instance recovery begins. This log position is determined by the oldest dirty buffer in the buffer cache.

First, when there is no free buffer in the data block buffer cache.

This is how a user process looks for a free buffer:

Before reading a data block into the cache, the process must first find a free buffer. The process searches the LRU list, starting at the least recently used end of the list. The process searches either until it finds a free buffer or until it has searched the threshold limit of buffers.

If the user process finds a dirty buffer as it searches the LRU list, it moves that buffer to the write list and continues to search. When the process finds a free buffer, it reads the data block from disk into the buffer and moves the buffer to the MRU end of the LRU list.

If an Oracle user process searches the threshold limit of buffers without finding a free buffer, the process stops searching the LRU list and signals the DBW0 background process to write some of the dirty buffers to disk.
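The free-buffer search quoted above can be sketched in Python. The list ordering (index 0 as the LRU end), treating "free" as neither dirty nor pinned, and the names Buffer and find_free_buffer are assumptions of this sketch; Oracle's actual buffer cache lists are more elaborate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Buffer:
    block: Optional[str]
    dirty: bool = False
    pinned: bool = False


def find_free_buffer(lru, write_list, threshold):
    """Scan from the LRU end (index 0); return a free buffer or None.

    Dirty buffers met on the way are moved to write_list. None means the
    threshold was reached and the caller should signal DBW0.
    """
    scanned = 0
    i = 0
    while i < len(lru) and scanned < threshold:
        buf = lru[i]
        scanned += 1
        if buf.dirty:
            write_list.append(lru.pop(i))  # move to the write list, keep searching
            continue
        if not buf.pinned:
            lru.append(lru.pop(i))         # reused for the new block: move to MRU end
            return buf
        i += 1                             # pinned: skip it
    return None
```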

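Second, DBWn writes when the system is idle, and it is also woken up by checkpoints.

When a checkpoint occurs, DBWR is asked to write certain dirty blocks to the datafiles (exactly which ones is explained in the separate post "checkpoint introduction").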

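[Note 2] What happens if the system crashes at this point, for example because of a power failure?

When the database restarts, it goes back to the point of failure (more precisely, to the last checkpoint) and reads the online redo log files: it first reapplies the recorded changes (roll forward) and then rolls back the transactions that had not committed, bringing the datafiles to the state of the last successful commit.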
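A minimal Python sketch of those two phases, assuming the datafile is a dict as of the last checkpoint and that each redo record carries the before image needed for rollback (in Oracle that role is played by undo, which is itself recovered from redo). The record format and the function name are hypothetical.

```python
def crash_recover(datafile, redo):
    """Roll forward every change after the checkpoint, then roll back uncommitted ones.

    redo records (hypothetical format):
      ("change", txn, block, before, after) or ("commit", txn)
    """
    committed = {rec[1] for rec in redo if rec[0] == "commit"}
    # roll forward: reapply all changes, committed or not
    for rec in redo:
        if rec[0] == "change":
            _, _txn, block, _before, after = rec
            datafile[block] = after
    # roll back: restore before images of uncommitted transactions, newest first
    for rec in reversed(redo):
        if rec[0] == "change" and rec[1] not in committed:
            _, _txn, block, before, _after = rec
            datafile[block] = before
    return datafile


redo_since_checkpoint = [
    ("change", "t1", "b1", 1, 10),
    ("change", "t2", "b2", 2, 20),
    ("commit", "t1"),
]
print(crash_recover({"b1": 1, "b2": 2}, redo_since_checkpoint))  # {'b1': 10, 'b2': 2}
```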

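[Note 3] Why? Only the log has been written; the data has not reached the datafiles yet.

This is because most relational databases do not force the modified data blocks to disk at commit; instead, at commit they guarantee that the change records (in the form of redo) are written to the log file, which gives them a performance advantage. In other words, when a user commits, writing the datafiles is asynchronous while writing the log is synchronous. Because the log is always written before the data, no committed data is lost, and writing the log is much faster than writing datafiles: log writes are sequential, while datafile writes are scattered. Eventually DBWR writes the dirty blocks to disk in batches at some later moment, with the batch size set by the parameter _db_block_write_batch.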

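III. The role of the SCN in data recovery

First we need to understand what a checkpoint is; see the separate post "checkpoint introduction" for details.

Here we only need to know what a checkpoint does:

- It asks DBWR to write to the datafiles all the dirty blocks covered by the current redo log entries

- It generates an SCN and writes it to the control file and to the header of every datafile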

When a checkpoint occurs, Oracle must update the headers of all datafiles to record the details of the checkpoint. This is done by the CKPT process. The CKPT process does not write blocks to disk; DBWn always performs that work.
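A rough Python model of the two checkpoint duties just described, with DBWn writing blocks and CKPT only updating SCNs. Skipping read-only datafiles anticipates the note further below about the SYSTEM CHECKPOINT SCN and DATAFILE CHECKPOINT SCN differing. All class, field and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    scn: int = 0
    buffer_cache: dict = field(default_factory=dict)  # dirty block -> value
    block_file: dict = field(default_factory=dict)    # block -> datafile name
    datafiles: dict = field(default_factory=dict)     # datafile name -> {block: value}
    read_only: set = field(default_factory=set)       # datafiles of read-only tablespaces
    controlfile_scn: int = 0                          # system checkpoint SCN
    header_scn: dict = field(default_factory=dict)    # datafile name -> header SCN


def checkpoint(db):
    # DBWn's part: write every dirty block to its datafile
    for block, value in db.buffer_cache.items():
        db.datafiles[db.block_file[block]][block] = value
    db.buffer_cache.clear()
    # CKPT's part: record the checkpoint SCN in the control file and file headers
    db.scn += 1
    db.controlfile_scn = db.scn
    for name in db.datafiles:
        if name not in db.read_only:  # read-only files keep their old SCN
            db.header_scn[name] = db.scn
    return db.scn
```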

The control file records:

System checkpoint SCN (SYSTEM CHECKPOINT SCN in control file)

SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
--------------------

For each datafile, the control file records:

Datafile checkpoint SCN (DATAFILE CHECKPOINT SCN in control file)

SQL> select name, checkpoint_change# start_scn, last_change# stop_scn from v$datafile;

NAME              START_SCN          STOP_SCN
----------------- ------------------ --------------------

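The same information can also be obtained with SQL> alter session set events 'immediate trace name CONTROLF level 10'; which writes a .trc file to the user dump directory (udump). The SCNs can be read from that file.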

* To dump the control file:

alter session set events 'immediate trace name CONTROLF level 10';

* To dump the file headers:

alter session set events 'immediate trace name FILE_HDRS level 10';

* To dump redo log headers:

alter session set events 'immediate trace name REDOHDR level 10';

* To dump the system state:

alter session set events 'immediate trace name SYSTEMSTATE level 10';

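[Important] While the database is OPEN, the SYSTEM CHECKPOINT SCN and a DATAFILE CHECKPOINT SCN can differ, for example when a tablespace is read-only at that moment, because its SCN does not change. The stop_SCN stays null while the database is open; only a normal shutdown triggers a final checkpoint that records the last SCN as the stop_SCN. During startup, when the database goes from MOUNT to OPEN, we call the SCN in the control file sys_SCN and the per-datafile SCNs start_SCN and stop_SCN; the database opens normally only if sys_SCN = start_SCN = stop_SCN.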

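(1) If stop_SCN is null, the database was not shut down normally: it never got the chance to take the final checkpoint.

In this case crash recovery is needed, and the system normally performs it automatically. Starting from the point of failure (more precisely, the last checkpoint), it reads the online redo generated after that checkpoint, reapplies the changes, and rolls back the uncommitted transactions, bringing the datafiles to the state of the last successful commit.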

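(2) If sys_SCN > start_SCN, for example because an old datafile was copied into place by hand:

Media recovery is needed. Starting from start_SCN, Oracle reads the redo generated after start_SCN from the log files (online and archived) and reapplies those transactions until start_SCN catches up with sys_SCN. The command is RECOVER DATAFILE n (by file number) or RECOVER DATAFILE 'filename'.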

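(3) If sys_SCN < start_SCN, for example because an old control file was copied into place by hand:

Recovery in this case requires a backup control file, which is why configuring automatic control file backups matters. Run RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE; and then ALTER DATABASE OPEN RESETLOGS;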
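The three cases amount to a decision based on the SCNs compared at open time. A Python sketch of that decision as described in this post; the function name, the return strings, and the order of the checks are hypothetical, and the checks Oracle actually performs are more involved.

```python
def open_decision(sys_scn, start_scn, stop_scn):
    """Classify a datafile at open time using the three cases above.

    stop_scn is None while the file is still marked open (no clean shutdown).
    The order of the checks is an assumption of this sketch.
    """
    if stop_scn is not None and sys_scn == start_scn == stop_scn:
        return "open"
    if stop_scn is None:
        return "crash recovery"                      # case (1)
    if sys_scn > start_scn:
        return "media recovery (RECOVER DATAFILE)"   # case (2)
    if sys_scn < start_scn:
        return "recover using backup control file"   # case (3)
    return "inconsistent: investigate"


print(open_decision(100, 100, 100))   # open
print(open_decision(100, 100, None))  # crash recovery
```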
