分类:
2012-03-27 10:06:12
原文地址:利用STANDBY将单实例数据库升级为RAC环境(四) 作者:TOMSYAN
作者:yangtingkun
利用Oracle的STANDBY技术,可以将单实例数据库升级到RAC数据库。这种方式可以有效的降低单实例迁移到RAC环境的停机时间。
这篇文章描述整个操作过程中碰到的错误。
利用STANDBY将单实例数据库升级为RAC环境(一):http://yangtingkun.itpub.net/post/468/494736
利用STANDBY将单实例数据库升级为RAC环境(二):http://yangtingkun.itpub.net/post/468/494766
利用STANDBY将单实例数据库升级为RAC环境(三):http://yangtingkun.itpub.net/post/468/494782
最开始碰了几个初始化参数设置的小错误,主要问题是FLASH_RECOVERY_AREA设置到ASM实例上导致了问题:
SQL> startup nomount pfile=/export/home/oracle/inittest11gr1.ora
ORA-01261: Parameter db_recovery_file_dest destination string cannot be translated
将DB_RECOVERY_FILE_DEST设置到本地磁盘后,问题解决。
由于SWITCHOVER过程中,设置DB_RECOVERY_FILE_DEST可能会引发bug,因此在主库和STANDBY数据库,都先将DB_RECOVERY_FILE_DEST关闭。
随后检查主库到STANDBY数据库的远端归档时发现,查询V$ARCHIVE_DEST视图,对应远端归档的目的地出现ORA-1031错误。
SQL> select dest_name, status, target, destination, error
2 from v$archive_dest
3 where dest_id in (1, 2);
DEST_NAME STATUS TARGET DESTINATION ERROR
-------------------- ------ ------- -------------------------------- -------------------
LOG_ARCHIVE_DEST_1 VALID PRIMARY /data/oradata/test11g/archivelog
LOG_ARCHIVE_DEST_2 ERROR STANDBY TEST11GR ORA-01031: 权限不足
这个没有权限的错误是由于没有拷贝密码文件造成的。将远端的密码文件拷贝到本地的$ORACLE_HOME/dbs目录下,并进行重命名,修改为orapwtest11gr1。这时必须重启本地的实例,才能加载密码文件。成功加载密码文件后,查询V$ARCHIVE_DEST视图,结果恢复正常:
SQL> select dest_name, status, target, destination, error
2 from v$archive_dest
3 where dest_id in (1, 2);
DEST_NAME STATUS TARGET DESTINATION ERROR
-------------------- ------ ------- -------------------------------- -----------------------
LOG_ARCHIVE_DEST_1 VALID PRIMARY /data/oradata/test11g/archivelog
LOG_ARCHIVE_DEST_2 VALID STANDBY TEST11GR
随后在启动standby数据库的恢复过程后,检查不到对应的进程在工作,检查后台alert文件,发现出现了很多错误:
Mon Jul 13 16:55:17 2009
alter database recover managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (test11gr1)
Mon Jul 13 16:55:17 2009
MRP0 started with pid=15, OS id=8356
MRP0: Background Managed Standby Recovery process started (test11gr1)
Fast Parallel Media Recovery enabled
Mon Jul 13 16:55:22 2009
Managed Standby Recovery not using Real Time Apply
MRP0: Background Media Recovery terminated with error 1110
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_8356.trc:
ORA-01110: data file 1: '+DATA/test11g/system01.dbf'
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '+DATA/test11g/system01.dbf'
ORA-01251: Unknown File Header Version read for file number 1
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_8356.trc:
ORA-01110: data file 1: '+DATA/test11g/system01.dbf'
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '+DATA/test11g/system01.dbf'
ORA-01251: Unknown File Header Version read for file number 1
MRP0: Background Media Recovery process shutdown (test11gr1)
Completed: alter database recover managed standby database disconnect from session
Mon Jul 13 17:07:05 2009
db_recovery_file_dest_size of 4096 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Mon Jul 13 17:18:40 2009
Using STANDBY_ARCHIVE_DEST parameter default value as +DATA/test11g/archivelog
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[1]: Assigned to RFS process 23554
RFS[1]: Identified database type as 'physical standby'
RFS LogMiner: Client disabled from further notification
Mon Jul 13 17:18:41 2009
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[2]: Assigned to RFS process 23564
RFS[2]: Identified database type as 'physical standby'
Mon Jul 13 17:18:41 2009
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[3]: Assigned to RFS process 23578
RFS[3]: Identified database type as 'physical standby'
Mon Jul 13 17:18:53 2009
RFS[2]: Archived Log: '+DATA/test11g/archivelog/1_119_683602501.dbf'
Mon Jul 13 17:18:53 2009
RFS[1]: Archived Log: '+DATA/test11g/archivelog/1_121_683602501.dbf'
Mon Jul 13 17:18:53 2009
RFS[3]: Archived Log: '+DATA/test11g/archivelog/1_120_683602501.dbf'
Mon Jul 13 17:18:55 2009
Shutting down instance: further logons disabled
Stopping background process MMNL
Stopping background process MMON
Shutting down instance (immediate)
License high water mark = 4
Mon Jul 13 17:18:57 2009
ORA-1089 : opidrv aborting process O000 ospid (23558_1)
Mon Jul 13 17:18:57 2009
ORA-1089 : opidrv aborting process O001 ospid (23726_1)
ORA-1089 : opidrv aborting process unknown ospid (23564_1)
ORA-1089 : opidrv aborting process unknown ospid (23554_1)
ORA-1089 : opidrv aborting process unknown ospid (23578_1)
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_rfs_23564.trc (incident=465):
ORA-00600: internal error code, arguments: [kfmdSlvLeaveWrt1], [12], [], [], [], [], [], []
ORA-01089: immediate shutdown in progress - no operations are permitted
ORA-12577: Message 12577 not found; product=RDBMS; facility=ORA
Incident details in: /data/oracle/diag/rdbms/test11gr/test11gr1/incident/incdir_465/test11gr1_rfs_23564_i465.trc
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_rfs_23554.trc (incident=121):
ORA-00600: internal error code, arguments: [kfmdSlvLeaveWrt1], [16], [], [], [], [], [], []
ORA-01089: immediate shutdown in progress - no operations are permitted
ORA-12577: Message 12577 not found; product=RDBMS; facility=ORA
Incident details in: /data/oracle/diag/rdbms/test11gr/test11gr1/incident/incdir_121/test11gr1_rfs_23554_i121.trc
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_rfs_23578.trc (incident=481):
ORA-00600: internal error code, arguments: [kfmdSlvLeaveWrt1], [12], [], [], [], [], [], []
ORA-01089: immediate shutdown in progress - no operations are permitted
ORA-12577: Message 12577 not found; product=RDBMS; facility=ORA
Incident details in: /data/oracle/diag/rdbms/test11gr/test11gr1/incident/incdir_481/test11gr1_rfs_23578_i481.trc
不但有ORA-1110、ORA-1122和ORA-1151错误,后面还有ORA-600(kfmdSlvLeaveWrt1)错误。仔细检查了一下,发现后面的600错误倒是没有多大关系,是由于SHUTDOWN操作强行关闭了RFS进程引发的。这并不是什么大问题。
关键是前面的ORA-1110、ORA-1122和ORA-1151错误。是什么导致了数据文件验证出现了问题,查询metalink很久,也没有找到对应的描述。
回想所有的操作,只有一个地方有疑点,就是数据文件是通过ASMCMD工具的CP命令放到ASM磁盘组中的,会不会是这个操作导致的问题呢,尝试利用其他的方法来传输数据文件。为了有所对比,利用ftp的方式,先把system01.dbf数据文件从源站点再传送一次,然后再次启动ALTER DATABASE RECOVER MANAGED STANDBY DATABASE命令:
SQL> alter database recover managed standby database disconnect from session;
数据库已更改。
再次检查alert文件,发现错误信息改变了:
Wed Jul 15 15:08:34 2009
alter database recover managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (test11gr1)
Wed Jul 15 15:08:34 2009
MRP0 started with pid=64, OS id=8453
MRP0: Background Managed Standby Recovery process started (test11gr1)
Fast Parallel Media Recovery enabled
Wed Jul 15 15:08:39 2009
Managed Standby Recovery not using Real Time Apply
MRP0: Background Media Recovery terminated with error 1110
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_8453.trc:
ORA-01110: data file 2: '+DATA/test11g/sysaux01.dbf'
ORA-01122: database file 2 failed verification check
ORA-01110: data file 2: '+DATA/test11g/sysaux01.dbf'
ORA-01251: Unknown File Header Version read for file number 2
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_8453.trc:
ORA-01110: data file 2: '+DATA/test11g/sysaux01.dbf'
ORA-01122: database file 2 failed verification check
ORA-01110: data file 2: '+DATA/test11g/sysaux01.dbf'
ORA-01251: Unknown File Header Version read for file number 2
MRP0: Background Media Recovery process shutdown (test11gr1)
Completed: alter database recover managed standby database disconnect from session
刚才传送的system数据文件已经没有问题了,看来问题就是asmcmd造成的,通过ftp的方式将所有的数据文件重新传送,问题解决。
由于开始的时候没有拷贝REDO.LOG文件,因此alert文件中出现下面的错误信息:
Wed Jul 15 16:30:53 2009
Managed Standby Recovery not using Real Time Apply
parallel recovery started with 7 processes
Waiting for all non-current ORLs to be archived...
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '+DATA/test11g/redo01.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test11g/redo01.log
ORA-15173: entry 'redo01.log' does not exist in directory 'test11g'
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '+DATA/test11g/redo01.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test11g/redo01.log
ORA-15173: entry 'redo01.log' does not exist in directory 'test11g'
Clearing online redo logfile 1 +DATA/test11g/redo01.log
Clearing online log 1 of thread 1 sequence number 151
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '+DATA/test11g/redo01.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test11g/redo01.log
ORA-15173: entry 'redo01.log' does not exist in directory 'test11g'
Clearing online redo logfile 1 complete
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '+DATA/test11g/redo02.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test11g/redo02.log
ORA-15173: entry 'redo02.log' does not exist in directory 'test11g'
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '+DATA/test11g/redo02.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test11g/redo02.log
ORA-15173: entry 'redo02.log' does not exist in directory 'test11g'
Clearing online redo logfile 2 +DATA/test11g/redo02.log
Clearing online log 2 of thread 1 sequence number 152
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '+DATA/test11g/redo02.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test11g/redo02.log
ORA-15173: entry 'redo02.log' does not exist in directory 'test11g'
Clearing online redo logfile 2 complete
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '+DATA/test11g/redo03.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test11g/redo03.log
ORA-15173: entry 'redo03.log' does not exist in directory 'test11g'
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '+DATA/test11g/redo03.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test11g/redo03.log
ORA-15173: entry 'redo03.log' does not exist in directory 'test11g'
Clearing online redo logfile 3 +DATA/test11g/redo03.log
Clearing online log 3 of thread 1 sequence number 150
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '+DATA/test11g/redo03.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test11g/redo03.log
ORA-15173: entry 'redo03.log' does not exist in directory 'test11g'
Completed: alter database recover managed standby database disconnect from session
Clearing online redo logfile 3 complete
将所有的REDO文件拷贝到ASM磁盘组中,问题消失。
下面这个问题是由于创建STANDBY CONTROLFILE过早造成的:
Media Recovery Log +DATA/test11g/archivelog/1_119_683602501.dbf
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-01119: error in creating database file '+DATA/test11g/undotbs2_01.dbf'
ORA-17502: ksfdcre:4 Failed to create file +DATA/test11g/undotbs2_01.dbf
ORA-15005: name "test11g/undotbs2_01.dbf" is already used by an existing alias
File #7 added to control file as 'UNNAMED00007'.
Originally created as:
'/data/oracle/oradata/test11g/undotbs2_01.dbf'
Recovery was unable to create the file as:
'+DATA/test11g/undotbs2_01.dbf'
MRP0: Background Media Recovery terminated with error 1274
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-01274: cannot add datafile '/data/oracle/oradata/test11g/undotbs2_01.dbf' - file could not be created
Shutting down recovery slaves due to error 1274
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Errors in file /data/oracle/diag/rdbms/test11gr/test11gr1/trace/test11gr1_mrp0_3616.trc:
ORA-01274: cannot add datafile '/data/oracle/oradata/test11g/undotbs2_01.dbf' - file could not be created
MRP0: Background Media Recovery process shutdown (test11gr1)
由于RAC环境需要创建THREAD2对应的UNDOTBS2表空间,而这个操作发生在创建STANDBY CONTROLFILE之后,所以导致了上面的错误信息。
通过重建STANDBY CONTROLFILE的方面可以方便的解决这个错误。