2014年(88)
分类: Oracle
2014-06-17 14:31:10
今天在一台机器上模拟了dataguard,主备两个实例从物理上不共享任何归档文件路径。
主要有以下内容:
dataguard Physical standby的创建
protection mode的切换
switch over
模拟了两台机器oel1,oel2 主库的归档放在oel1里面,备库的放在oel2里面
创建的路径如下
./oel1:
orcl_pri_arch orcl_stdby_arch
./oel2:
standby_pri_arch standby_stdby_arch
--强制logging
SQL> alter database force logging;
alter database force logging
*
ERROR at line 1:
ORA-12920: database is already in force logging mode
SQL> select force_logging from v$database;
FOR
---
YES
--查看归档状态
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 9
Next log sequence to archive 11
Current log sequence 11
--创建密码文件
[oracle@oel1 dbs]$ orapwd file=orapwstandby password=oracle entries=10
[oracle@oel1 dbs]$
listener.ora
LISTENER1 =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = tcp)(HOST = oel1.oracle.com)(PORT = 1521))
)
SID_LIST_LISTENER1 =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = orcl.oracle.com)
(SID_NAME = orcl)
(ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
)
(SID_DESC =
(GLOBAL_DBNAME = standby.oracle.com)
(ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
(SID_NAME = standby)
)
)
tnsnames.ora
ORCL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = tcp)(HOST = oel1.oracle.com)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = orcl.oracle.com)
)
)
standby =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = tcp)(HOST = oel1.oracle.com)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = stdby.oracle.com)
)
)
--创建相应的备库目录
[oracle@oel1 standby]$ mkdir adump bdump cdump udump
[oracle@oel1 standby]$ ll
total 16
drwxr-xr-x 2 oracle dba 4096 Oct 21 15:27 adump
drwxr-xr-x 2 oracle dba 4096 Oct 21 15:27 bdump
drwxr-xr-x 2 oracle dba 4096 Oct 21 15:27 cdump
drwxr-xr-x 2 oracle dba 4096 Oct 21 15:27 udump
参数文件
主库参数文件(ORCL)
db_unique_name='orcl'
log_archive_config='dg_config=(orcl,standby)'
log_archive_dest_1='location=/u01/app/oracle/dg/oel1/orcl_pri_arch valid_for=(ONLINE_LOGFILE,PRIMARY_ROLE) db_unique_name=orcl'
log_archive_dest_2='service=standby valid_for=(ONLINE_LOGFILE,PRIMARY_ROLE) db_unique_name=standby'
log_archive_dest_3='location=/u01/app/oracle/dg/oel1/orcl_stdby_arch valid_for=(STANDBY_LOGFILE,STANDBY_ROLE) db_unique_name=orcl'
db_file_name_convert='/u01/app/oracle/oradata/orcl','/u01/app/oracle/oradata/standby'
log_file_name_convert='/u01/app/oracle/oradata/orcl','/u01/app/oracle/oradata/standby'
standby_file_management=auto
备库参数文件(standby)
db_unique_name='standby'
log_archive_config='dg_config=(orcl,standby)'
log_archive_dest_1='location=/u01/app/oracle/dg/oel2/standby_pri_arch valid_for=(ONLINE_LOGFILE,PRIMARY_ROLE) db_unique_name=standby'
log_archive_dest_2='service=orcl valid_for=(ONLINE_LOGFILE,PRIMARY_ROLE) db_unique_name=orcl'
log_archive_dest_3='location=/u01/app/oracle/dg/oel2/standby_stdby_arch valid_for=(STANDBY_LOGFILE,STANDBY_ROLE) db_unique_name=standby'
db_file_name_convert='/u01/app/oracle/oradata/orcl','/u01/app/oracle/oradata/standby'
log_file_name_convert='/u01/app/oracle/oradata/orcl','/u01/app/oracle/oradata/standby'
standby_file_management=auto
主库运行
SQL> alter database create standby controlfile as '/u01/app/oracle/dg/standby.ctl';
Database altered.
SQL>
--错误的操作来生成控制文件
[oracle@oel1 standby]$ cp /u01/app/oracle/dg/standby.ctl control01.ctl
[oracle@oel1 standby]$ ll
total 21588
-rw-r----- 1 oracle dba 7356416 Oct 21 16:22 control01.ctl
-rw-r----- 1 oracle dba 7356416 Oct 21 16:23 control02.ctl
-rw-r----- 1 oracle dba 7356416 Oct 21 16:22 control03.ctl
--正确的方法
[oracle@oel1 standby]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on Sun Oct 21 16:26:01 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 524288000 bytes
Fixed Size 1220336 bytes
Variable Size 142606608 bytes
Database Buffers 373293056 bytes
Redo Buffers 7168000 bytes
SQL>
[oracle@oel1 standby]$ rman target /
Recovery Manager: Release 10.2.0.1.0 - Production on Sun Oct 21 16:27:44 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database: orcl (not mounted)
RMAN> restore controlfile from '/u01/app/oracle/dg/standby.ctl';
Starting restore at 21-OCT-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=39 devtype=DISK
channel ORA_DISK_1: copied control file copy
output filename=/u01/app/oracle/oradata/standby/control01.ctl
output filename=/u01/app/oracle/oradata/standby/control02.ctl
output filename=/u01/app/oracle/oradata/standby/control03.ctl
Finished restore at 21-OCT-12
RMAN>
[oracle@oel1 standby]$ ll
total 21588
-rw-r----- 1 oracle dba 7356416 Oct 21 16:28 control01.ctl
-rw-r----- 1 oracle dba 7356416 Oct 21 16:28 control02.ctl
-rw-r----- 1 oracle dba 7356416 Oct 21 16:28 control03.ctl
[oracle@oel1 standby]$
将备库启动到Mount状态
--到此为止要保证网络是互通的,可以检查密码文件,listener等的配置是否合适
[oracle@oel1 admin]$ sqlplus as sysdba
[oracle@oel1 admin]$ sqlplus as sysdba
-standby库 使用下面命令来apply
SQL> alter database recover managed standby database disconnect from session;
Database altered.
--当然也可以取消apply
SQL> alter database recover managed standby database cancel;
Database altered.
--在取消apply的时候,添加standby logfile,强烈建议在apply之前能够创建几组standby logfile
SQL> alter database add standby logfile;
Database altered.
SQL> select member from v$logfile;
/u01/app/oracle/STANDBY/onlinelog/o1_mf_8_88807p3x_.log
--在主库切换日志
SQL> alter system switch logfile;
System altered.
SQL>
如果没有看到归档到指定的目录,查看primary database的log_archive_dest参数,很可能是修改了Pfile 之后没有同步到spfile.
Error 16047 attaching to destination LOG_ARCHIVE_DEST_2 standby host 'standby'
ORA-16047: DGID mismatch between destination setting and standby
*** 2012-10-21 17:09:46.533 60679 kcrr.c
PING[ARC0]: Heartbeat failed to connect to standby 'standby'. Error is 16047.
*** 2012-10-21 17:09:46.533 58941 kcrr.c
kcrrfail: dest:2 err:16047 force:0 blast:1
查找了半天,原来是在参数文件中
db_unique_name='standb'
log_archive_config='dg_config=(orcl,standby)'
在主库中查看到如下的错误
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_arc0_16479.trc:
ORA-16047: DGID mismatch between destination setting and standby
Sun Oct 21 17:14:46 2012
PING[ARC0]: Heartbeat failed to connect to standby 'standby'. Error is 16047.
Sun Oct 21 17:18:50 2012
Thread 1 advanced to log sequence 15
Current log# 2 seq# 15 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
Sun Oct 21 17:18:50 2012
ARC1: Archivelog destination LOG_ARCHIVE_DEST_2 disabled: Data Guard configuration identifier mismatch
可以查看
SQL> select dest_id,error from v$archive_dest;
SQL> select dest_id,error from v$archive_dest;
DEST_ID ERROR
---------- -----------------------------------------------------------------
1
2 ORA-16047: DGID mismatch between destination setting and standby
3 ORA-16053: DB_UNIQUE_NAME is not in the Data Guard Configuration
4
5
6
7
8
9
。。
就能查看到错误的原因了。
ORA-16053: DB_UNIQUE_NAME is not in the Data Guard Configuration
原因是在主库的参数配置中,db_unique_name
*.log_archive_dest_3='location=/u01/app/oracle/dg/oel1/orcl_stdby_arch valid_for=(STANDBY_LOGFILE,STANDBY_ROLE) db_unique_name=standby'
dataguard 有一个很好的优点就是如果之前的配置错误,导致文件不能正确apply,会向primary database 索取apply点之前的日志文件
如下:
[oracle@oel1 orcl_pri_arch]$ ll
total 2164
-rw-r----- 1 oracle dba 248832 Oct 21 16:46 1_13_796831768.dbf
-rw-r----- 1 oracle dba 1257984 Oct 21 17:18 1_14_796831768.dbf
-rw-r----- 1 oracle dba 690176 Oct 21 17:31 1_15_796831768.dbf
[oracle@oel1 orcl_pri_arch]$
--在主库的归档目录下只有三个文件
--但是在正确配置了参数之后,重启两个实例后,手动切换日志,
--主库日志如下:
Sun Oct 21 17:31:44 2012
Thread 1 advanced to log sequence 16
Current log# 3 seq# 16 mem# 0: /u01/app/oracle/oradata/orcl/redo03.log
Sun Oct 21 17:32:10 2012
ARC0: Standby redo logfile selected for thread 1 sequence 15 for destination LOG_ARCHIVE_DEST_2
--在备库,中会发现gap.然后索取之前的日志
Completed: alter database recover managed standby database disconnect from session
Sun Oct 21 17:31:36 2012
Using STANDBY_ARCHIVE_DEST parameter default value as /u01/app/oracle/dg/oel2/standby_stdby_arch
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[1]: Assigned to RFS process 19230
RFS[1]: Identified database type as 'physical standby'
Sun Oct 21 17:31:36 2012
RFS LogMiner: Client disabled from further notification
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_1_796831768.dbf'
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_2_796831768.dbf'
Sun Oct 21 17:31:37 2012
Fetching gap sequence in thread 1, gap sequence 11-13
FAL[client]: Error fetching gap sequence, no FAL server specified
Sun Oct 21 17:31:37 2012
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_3_796831768.dbf'
Sun Oct 21 17:31:47 2012
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_4_796831768.dbf'
Sun Oct 21 17:31:58 2012
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_5_796831768.dbf'
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_6_796831768.dbf'
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_7_796831768.dbf'
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_8_796831768.dbf'
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_9_796831768.dbf'
Sun Oct 21 17:32:07 2012
Fetching gap sequence in thread 1, gap sequence 11-14
FAL[client]: Error fetching gap sequence, no FAL server specified
Sun Oct 21 17:32:07 2012
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_10_796831768.dbf'
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_11_796831768.dbf'
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_12_796831768.dbf'
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_13_796831768.dbf'
RFS[1]: Archived Log: '/u01/app/oracle/dg/oel2/standby_stdby_arch/1_14_796831768.dbf'
RFS[1]: Successfully opened standby log 4: '/u01/app/oracle/flash_recovery_area/ORCL/onlinelog/o1_mf_4_85x8vjt5_.log'
Sun Oct 21 17:32:37 2012
Media Recovery Log /u01/app/oracle/dg/oel2/standby_stdby_arch/1_11_796831768.dbf
Media Recovery Log /u01/app/oracle/dg/oel2/standby_stdby_arch/1_12_796831768.dbf
Media Recovery Log /u01/app/oracle/dg/oel2/standby_stdby_arch/1_13_796831768.dbf
Media Recovery Log /u01/app/oracle/dg/oel2/standby_stdby_arch/1_14_796831768.dbf
Media Recovery Log /u01/app/oracle/dg/oel2/standby_stdby_arch/1_15_796831768.dbf
Media Recovery Waiting for thread 1 sequence 16
--在备库的归档路径下
[oracle@oel1 standby_stdby_arch]$ ll
total 242948
-rw-r----- 1 oracle dba 4096 Oct 21 17:32 1_10_796831768.dbf
-rw-r----- 1 oracle dba 1609728 Oct 21 17:32 1_11_796831768.dbf
-rw-r----- 1 oracle dba 2048 Oct 21 17:32 1_12_796831768.dbf
-rw-r----- 1 oracle dba 248832 Oct 21 17:32 1_13_796831768.dbf
-rw-r----- 1 oracle dba 1257984 Oct 21 17:32 1_14_796831768.dbf
-rw-r----- 1 oracle dba 690176 Oct 21 17:32 1_15_796831768.dbf
-rw-r----- 1 oracle dba 60416 Oct 21 17:31 1_1_796831768.dbf
-rw-r----- 1 oracle dba 8704 Oct 21 17:31 1_2_796831768.dbf
-rw-r----- 1 oracle dba 5632 Oct 21 17:31 1_3_796831768.dbf
-rw-r----- 1 oracle dba 100657664 Oct 21 17:31 1_4_796831768.dbf
-rw-r----- 1 oracle dba 100661248 Oct 21 17:31 1_5_796831768.dbf
-rw-r----- 1 oracle dba 16238592 Oct 21 17:32 1_6_796831768.dbf
-rw-r----- 1 oracle dba 1735680 Oct 21 17:32 1_7_796831768.dbf
-rw-r----- 1 oracle dba 450560 Oct 21 17:32 1_8_796831768.dbf
再次手动切换日志,一切正常
--从备库中查看
Sun Oct 21 17:38:38 2012
RFS[1]: Successfully opened standby log 4: '/u01/app/oracle/flash_recovery_area/ORCL/onlinelog/o1_mf_4_85x8vjt5_.log'
Sun Oct 21 17:38:41 2012
Media Recovery Log /u01/app/oracle/dg/oel2/standby_stdby_arch/1_16_796831768.dbf
Media Recovery Waiting for thread 1 sequence 17
--change protection mode
orcl>select database_role,protection_mode,protection_level from v$database;
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
---------------- -------------------- --------------------
PRIMARY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE
SQL> alter database set standby to maximize protection;
alter database set standby to maximize protection
*
ERROR at line 1:
ORA-01126: database must be mounted in this instance and not open in any
instance
orcl>shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
orcl>startup mount;
ORACLE instance started.
Total System Global Area 524288000 bytes
Fixed Size 1220336 bytes
Variable Size 213909776 bytes
Database Buffers 301989888 bytes
Redo Buffers 7168000 bytes
Database mounted.
orcl>alter database set standby database to maximize protection;
Database altered.
orcl>alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
--主库就突然down了
--查看备库,模式也没有任何改变
SQL> select database_role,protection_mode,protection_level from v$database;
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
---------------- -------------------- --------------------
PHYSICAL STANDBY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE
--日志 from primary database orcl
LGWR: Primary database is in MAXIMUM PROTECTION mode
LGWR: Destination LOG_ARCHIVE_DEST_2 is not serviced by LGWR
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
LGWR: Minimum of 1 LGWR standby database required
Sun Oct 21 17:53:11 2012
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_lgwr_21162.trc:
ORA-16072: a minimum of one standby database destination is required
Sun Oct 21 17:53:11 2012
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_lgwr_21162.trc:
ORA-16072: a minimum of one standby database destination is required
LGWR: terminating instance due to error 16072
Instance terminated by LGWR, pid = 21162
--先修复主库的问题
SQL> alter database set standby database to maximize performance;
Database altered.
SQL> alter database open;
Database altered.
SQL> alter system set log_archive_dest_2='service=standby LGWR SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILE,PRIMARY_ROLE) db_unique_name=standby';
System altered.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 524288000 bytes
Fixed Size 1220336 bytes
Variable Size 213909776 bytes
Database Buffers 301989888 bytes
Redo Buffers 7168000 bytes
Database mounted.
SQL> show parameter log_archive_dest_2
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2 string service=standby LGWR SYNC AFFI
RM VALID_FOR=(ONLINE_LOGFILE,P
RIMARY_ROLE) db_unique_name=st
andby
SQL> alter database set standby database to maximize protection;
Database altered.
SQL> alter database open;
Database altered.
SQL>
--在open时,有的人遇到了一些问题。
可能是网络不好,需要设置timeout参数,改为
*.log_archive_dest_2='SERVICE=standby LGWR SYNC AFFIRM NET_TIMEOUT=120 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby'
就可以了。
--再次查看备库的状态
SQL> select database_role,protection_mode,protection_level from v$database;--最开始的
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
---------------- -------------------- --------------------
PHYSICAL STANDBY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE
SQL> /
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL --修改后的
---------------- -------------------- --------------------
PHYSICAL STANDBY MAXIMUM PROTECTION MAXIMUM PROTECTION
SQL>
--在主库继续切换模式
SQL> alter database set standby database to maximize performance;
Database altered.
--备库也会相应的变化
SQL> /
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
---------------- -------------------- --------------------
PHYSICAL STANDBY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE
SQL> alter database set standby database to maximize availability;
Database altered.
--在备库查看模式
SQL> /
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
---------------- -------------------- --------------------
PHYSICAL STANDBY MAXIMUM AVAILABILITY RESYNCHRONIZATION
--在主库查看归档情况
SQL> select dest_id,status,error from v$archive_dest ;
DEST_ID STATUS
---------- ---------
ERROR
-----------------------------------------------------------------
1 VALID
2 ERROR
ORA-16086: standby database does not contain available standby
log files
3 BAD PARAM
ORA-16053: DB_UNIQUE_NAME is not in the Data Guard Configuration
在备库的日志中查看,原来是超时导致的。
ORA-16086: standby database does not contain available standby log files
*** 2012-10-21 22:13:23.151 60679 kcrr.c
LGWR: Error 16086 creating archivelog file 'standby'
*** 2012-10-21 22:13:23.151 58941 kcrr.c
kcrrfail: dest:2 err:16086 force:0 blast:1
Receiving message from LNSb
*** 2012-10-21 22:13:23.176 55444 kcrr.c
Making upidhs request to LNSb (ocis 0x0xb70a58b8). Begin time is <10/21/2012 22:13:23> and NET_TIMEOUT <180> seconds
NetServer pid:27564
*** 2012-10-21 22:13:27.195 55616 kcrr.c
upidhs done status 0
在主库尝试加入timeout
SQL> alter system set log_archive_dest_2='service=standby LGWR SYNC AFFIRM timeout=200 valid_for=(online_logfile,primary_role) db_unique_name=standby'
2 ;
alter system set log_archive_dest_2='service=standby LGWR SYNC AFFIRM timeout=200 valid_for=(online_logfile,primary_role) db_unique_name=standby'
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-16025: parameter LOG_ARCHIVE_DEST_2 contains repeated or conflicting
attributes
--是设置timeout时参数名错误导致的
SQL> alter system set log_archive_dest_2='service=standby LGWR SYNC AFFIRM NET_TIMEOUT=200 valid_for=(online_logfile,primary_role) db_unique_name=standby'
2 ;
System altered.
--switch over and failover
-primary database orcl
SQL> select database_role,switchover_status from v$database;
DATABASE_ROLE SWITCHOVER_STATUS
---------------- --------------------
PRIMARY SESSIONS ACTIVE
--standby database standby
SQL> select database_role,switchover_status from v$database;
DATABASE_ROLE SWITCHOVER_STATUS
---------------- --------------------
PHYSICAL STANDBY NOT ALLOWED
ORA-16416: Switchover target is not synchronized with the primary
-未完待续