WINDOWS下的程序员出身,偶尔也写一些linux平台下小程序, 后转行数据库行业,专注于ORACLE和DB2的运维和优化。 同时也是ios移动开发者。欢迎志同道合的朋友一起研究技术。 数据库技术交流群:58308065,23618606
全部博文(599)
分类: Oracle
2009-12-25 15:56:11
这篇文章描述为RAC环境创建STANDBY数据库。
由于篇幅限制,加上碰到了很多的bug,只能将文章拆分成多篇。
由于错误太多,导致一篇文章无法完全记录下来,这一篇继续记录STANDBY数据库创建过程中碰到的问题。
RAC数据库建立STANDBY(一):http://yangtingkun.itpub.net/post/468/484988
RAC数据库建立STANDBY(二):http://yangtingkun.itpub.net/post/468/485013
RAC数据库建立STANDBY(三):http://yangtingkun.itpub.net/post/468/485054
刚刚在进行RAC环境的DUPLICATE DATABASE的时候,就碰到了很多问题,由于二者命令比较相似,本来认为这次不会碰到太多的问题,没有想到的是,这次碰到的问题居然比DUPLICATE碰到的问题多出一倍。而且基本上所有碰到的问题都是DUPLICATE操作时不曾遇到的。
执行DUPLICATE DATABASE FOR STANDBY时出现下面的错误:
bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s
Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:28:54 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: RAC11G (DBID=1712482917)
connected to auxiliary database: RAC11G (not mounted)
RMAN> run
2> {
3> allocate channel c1 device type disk connect sys/test@rac11g1;
4> allocate channel c2 device type disk connect sys/test@rac11g2;
5> allocate auxiliary channel ac1 device type disk;
6> allocate auxiliary channel ac2 device type disk;
7> duplicate target database for standby
8> db_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
9> dorecover
10> from active database
11> spfile
12> parameter_value_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
13> set log_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
14> set fal_client='RAC11G_S'
15> set fal_server='RAC11G'
16> set log_archive_dest_1='LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'
17> set log_archive_dest_2='SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'
18> set standby_archive_dest='+DATA/RAC11G'
19> set db_unique_name='rac11g_s';
20> }
using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=619 instance=rac11g1 device type=DISK
allocated channel: c2
channel c2: SID=119 instance=rac11g2 device type=DISK
allocated channel: ac1
channel ac1: SID=112 device type=DISK
allocated channel: ac2
channel ac2: SID=39 device type=DISK
Starting Duplicate Db at 09-SEP-08
contents of Memory Script:
{
backup as copy reuse
file '/data/oracle/product/11.1/database/dbs/orapwrac11g2' auxiliary format
'/data/oracle/product/11.1/database/dbs/orapwrac11g1' file
'/dev/vx/rdsk/datavg/rac11g_spfile' auxiliary format
'+DATA/rac11g/spfilerac11g.ora' ;
sql clone "alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''";
}
executing Memory Script
Starting backup at 09-SEP-08
RMAN-03009: failure of backup command on c1 channel at 09/09/2008 16:29:06
ORA-19505: failed to identify file "/data/oracle/product/11.1/database/dbs/orapwrac11g2"
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
continuing other job steps, job failed will not be re-run
released channel: c1
released channel: c2
released channel: ac1
released channel: ac2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/09/2008 16:29:08
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of backup command on c1 channel at 09/09/2008 16:29:06
ORA-19505: failed to identify file "/data/oracle/product/11.1/database/dbs/orapwrac11g2"
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
RMAN> exit
Recovery Manager complete.
这个错误是由于连接源数据库的服务名采用了RAC的总体配置,因此很可能连接到实例2上进行恢复,有两种方法解决这个问题,一个是在任意节点上保证密码文件orapwrac11g1和orapwrac11g2都存在,另一种方法更加简单,可以连接源数据库的时候指定唯一一个实例进行连接,比如这里将RAC11G替换为RAC11G1。
bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s
Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:31:00 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: RAC11G (DBID=1712482917)
connected to auxiliary database: RAC11G (not mounted)
RMAN> run
2> {
3> allocate channel c1 device type disk connect sys/test@rac11g1;
4> allocate channel c2 device type disk connect sys/test@rac11g2;
5> allocate auxiliary channel ac1 device type disk;
6> allocate auxiliary channel ac2 device type disk;
7> duplicate target database for standby
8> db_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
9> dorecover
10> from active database
11> spfile
12> parameter_value_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
13> set log_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'
14> set fal_client='RAC11G_S'
15> set fal_server='RAC11G'
16> set log_archive_dest_1='LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'
17> set log_archive_dest_2='SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'
18> set standby_archive_dest='+DATA/RAC11G'
19> set db_unique_name='rac11g_s';
20> }
using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=619 instance=rac11g1 device type=DISK
allocated channel: c2
channel c2: SID=119 instance=rac11g2 device type=DISK
allocated channel: ac1
channel ac1: SID=113 device type=DISK
allocated channel: ac2
channel ac2: SID=39 device type=DISK
Starting Duplicate Db at 09-SEP-08
contents of Memory Script:
{
backup as copy reuse
file '/data/oracle/product/11.1/database/dbs/orapwrac11g2' auxiliary format
'/data/oracle/product/11.1/database/dbs/orapwrac11g1' file
'/dev/vx/rdsk/datavg/rac11g_spfile' auxiliary format
'+DATA/rac11g/spfilerac11g.ora' ;
sql clone "alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''";
}
executing Memory Script
Starting backup at 09-SEP-08
Finished backup at 09-SEP-08
sql statement: alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''
contents of Memory Script:
{
sql clone "alter system set log_file_name_convert =
''/dev/vx/rdsk/datavg'', ''+DATA/RAC11G'' comment=
'''' scope=spfile";
sql clone "alter system set fal_client =
''RAC11G_S'' comment=
'''' scope=spfile";
sql clone "alter system set fal_server =
''RAC11G'' comment=
'''' scope=spfile";
sql clone "alter system set log_archive_dest_1 =
''LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'' comment=
'''' scope=spfile";
sql clone "alter system set log_archive_dest_2 =
''SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'' comment=
'''' scope=spfile";
sql clone "alter system set standby_archive_dest =
''+DATA/RAC11G'' comment=
'''' scope=spfile";
sql clone "alter system set db_unique_name =
''rac11g_s'' comment=
'''' scope=spfile";
shutdown clone immediate;
startup clone nomount ;
}
executing Memory Script
sql statement: alter system set log_file_name_convert = ''/dev/vx/rdsk/datavg'', ''+DATA/RAC11G'' comment= '''' scope=spfile
released channel: c1
released channel: c2
released channel: ac1
released channel: ac2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/09/2008 16:31:11
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of sql command on clone_default channel at 09/09/2008 16:31:11
RMAN-11003: failure during parse/execution of SQL statement: alter system set log_file_name_convert = '/dev/vx/rdsk/datavg', '+DATA/RAC11G' comment= '' scope=spfile
ORA-17510: Attempt to do i/o beyond file size
ORA-17510: Attempt to do i/o beyond file size
RMAN> exit
Recovery Manager complete.
检查了一下,这个问题应该是ASM的bug造成的。关于ORA-17510的错误有一些,但是没有和当前情况相似的。
由于找不到解决这个问题的办法,只好想办法绕过这个bug。于是从这里以后,采用了先在本地创建一个SPFILE的方式,将所有需要修改的参数都提前修改完成,避免在DUPLICATE命令的时候设置SPFILE参数。这样Oracle就不会执行ALTER SYSTEM语句,也就不会导致SPFILE的动态扩展。
随后利用SQLPLUS创建了本地的SPFILE,然后启动到NOMOUNT状态,通过RMAN连接却报错:
bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s
Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:44:09 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: RAC11G (DBID=1712482917)
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04006: error from auxiliary database: ORA-01031: insufficient privileges
这是因为RMAN刚才的恢复操作覆盖了密码文件,所以这里必须重建。从源数据库再次拷贝密码文件到本地,再次执行:
bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s
Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 17:25:24 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: RAC11G (DBID=1712482917)
connected to auxiliary database: RAC11GS (not mounted)
RMAN> run
2> {
3> allocate channel c1 device type disk connect sys/test@rac11g1;
4> allocate channel c2 device type disk connect sys/test@rac11g2;
5> allocate auxiliary channel ac1 device type disk;
6> allocate auxiliary channel ac2 device type disk;
7> duplicate target database for standby
8> dorecover
9> from active database;
10> }
using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=621 instance=rac11g1 device type=DISK
allocated channel: c2
channel c2: SID=628 instance=rac11g2 device type=DISK
allocated channel: ac1
channel ac1: SID=333 instance=rac11g1 device type=DISK
allocated channel: ac2
channel ac2: SID=306 instance=rac11g1 device type=DISK
Starting Duplicate Db at 09-SEP-08
contents of Memory Script:
{
backup as copy reuse
file '/data/oracle/product/11.1/database/dbs/orapwrac11g1' auxiliary format
'/data/oracle/product/11.1/database/dbs/orapwrac11g1' ;
}
executing Memory Script
Starting backup at 09-SEP-08
Finished backup at 09-SEP-08
contents of Memory Script:
{
backup as copy current controlfile for standby auxiliary format '+DATA/rac11g/rac11g_control_1';
restore clone controlfile to '+DATA/rac11g/rac11g_control_2' from
'+DATA/rac11g/rac11g_control_1';
restore clone controlfile to '+DATA/rac11g/rac11g_control_3' from
'+DATA/rac11g/rac11g_control_1';
sql clone 'alter database mount standby database';
}
executing Memory Script
Starting backup at 09-SEP-08
channel c1: starting datafile copy
copying standby control file
released channel: c1
released channel: c2
released channel: ac1
released channel: ac2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/09/2008 17:25:35
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of backup command on c1 channel at 09/09/2008 17:25:35
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-01031: insufficient privileges
ORA-17629: Cannot connect to the remote database server
RMAN> exit
Recovery Manager complete.
这个问题导致的原因比较复杂。在本地测试连接,没有发现任何问题。在远端用sys测试连接辅助实例,也没有发现任何异常的情况。
但是只要执行DUPLICATE命令就会报这个错误。
由于没有其他可供参考的信息,只能根据现有的错误信息进行推测。首先报错发生在CHANNEL c1上,而且是连接远端数据库服务器。说明问题是在执行过程中,源数据库尝试连接辅助实例时出现的错误,怀疑问题可能与密码文件有关。
可是通过命令行测试没有发现任何的问题。而且前面采用SPFILE方式的时候,也经历了类似的步骤,为什么没有报错。
通过观察Oracle执行的脚本发现了问题,Oracle执行了下面的操作:
contents of Memory Script:
{
backup as copy reuse
file '/data/oracle/product/11.1/database/dbs/orapwrac11g1' auxiliary format
'/data/oracle/product/11.1/database/dbs/orapwrac11g1' ;
}
executing Memory Script
这个操作时前面通过SPFILE方式所没有的,这也是为什么原来没有报错,而这里报错的原因。
不过问题又出现了,Oracle这个操作的目录是将远端的密码文件同步到本地服务器上,使得后续的SYS连接可以正常登陆。这个操作本来是没有任何问题的,而且实际上前面我就是采用了操作系统上面的类似操作,将源数据库的密码文件同步到本地的。那么为什么Oracle的rman执行了这个操作后,源数据库再连接辅助实例的时候反而报错了呢。
检查了源数据库的配置后,发现了问题的所在,源数据库采用了裸设备的方式,而本地的$ORACLE_HOME/dbs目录下的密码文件只是一个链接而已。
$ ls -l
total 38404
-rw-rw---- 1 oracle oinstall 1552 Sep 5 16:09 hc_rac11g1.dat
-rw-r--r-- 1 oracle oinstall 8385 Sep 11 1998 init.ora
-rw-r--r-- 1 oracle oinstall 12920 May 3 2001 initdw.ora
-rw-r----- 1 oracle oinstall 43 Jul 16 15:48 initrac11g1.ora
lrwxrwxrwx 1 oracle oinstall 34 Jul 16 14:39 orapwrac11g1 -> /dev/vx/rdsk/datavg/rac11g_pwdfile
-rw-r----- 1 oracle oinstall 19611648 Sep 8 18:27 snapcf_rac11g1.f
很可能Oracle在这里执行备份的时候出现了错误,没有采用裸设备备份而是当作操作系统文件进行了拷贝,导致恢复到辅助实例后,密码文件不可用。
于是先去掉了连接,直接将裸设备中的密码文件拷贝到本地目录,使得RMAN可以正常的拷贝这个文件:
$ dd if=/dev/vx/rdsk/datavg/rac11g_pwdfile of=/data/rac11g_pwfile
204800+0 records in
204800+0 records out
bash-3.00$ rm orapwrac11g1
bash-3.00$ rm orapwrac11g2
bash-3.00$ cp /data/rac11g_pwfile orapwrac11g1
再次执行DUPLICATE,则权限不足的问题被解决。