OS: LINUX AS4 DB Version: oracle10.2.02
主机PRIMARY和备机STANDY配置一样。
在logical standby dataguard维护过程中发现以下现象:
1、主机的归档日志已经完全同步到备机上,并且注册到备机数据库上;
SQL> select sequence#, first_time, next_time, dict_begin, dict_end from dba_logstdby_log order by sequence#;
SEQUENCE# FIRST_TIME NEXT_TIME DIC DIC
---------- ------------ ------------ --- ---
12675 27-DEC-07 27-DEC-07 NO NO
12676 27-DEC-07 27-DEC-07 NO NO
…………………………
12711 28-DEC-07 28-DEC-07 NO NO
12712 28-DEC-07 28-DEC-07 NO NO
38 rows selected.
2、备机上没有需要手动补偿的事务;
SQL> select event,xidusn,xidslt,xidsqn from dba_logstdby_events where event_time=(select max(event_time) from dba_logstdby_events);
EVENT
--------------------------------------------------------------------------------
XIDUSN XIDSLT XIDSQN
---------- ---------- ----------
3、备机数据同步执行非常缓慢,貌似数据库hung住;
alter session set nls_date_format = 'DD-Mon-YYYY hh24:mi:ss';
select * from dba_logstdby_progress;
检查发现,数据库只同步到昨天晚上9点。
SQL>SELECT TYPE, HIGH_SCN, STATUS FROM V$LOGSTDBY;
TYPE HIGH_SCN STATUS
COORDINATOR 9,371,034,866,807 ORA-16116: no work available
READER 9,371,034,866,812 ORA-16127: stalled waiting for additional transactions to be applied
BUILDER 9,371,034,866,807 ORA-16127: stalled waiting for additional transactions to be applied
PREPARER 9,371,034,866,806 ORA-16127: stalled waiting for additional transactions to be applied
ANALYZER 9,371,034,866,807 ORA-16116: no work available
APPLIER 9,371,034,866,805 ORA-16124: transaction 20 17 74407 is waiting on another transaction
APPLIER ORA-16116: no work available
APPLIER ORA-16116: no work available
APPLIER ORA-16116: no work available
APPLIER ORA-16116: no work available
查看ORACLE METALINK发现:
If there are a number of appliers that consistently have a status code of 16124, then there is a possibility of SQL Apply being delayed by an excessive number of Eager Transactions.
applier状态出现16124说明sql apply正处理一个非常大的事务,造成运行缓慢。(后来询问了开发的同事,果然是在主库中有个job,需要在一个事务中更新几万行记录)
An Eager Transaction is a transaction that updates many rows. From Data Guard 10g Release 1, an Eager Transaction is defined as updating more than 201 rows. The purpose of an Eager Transaction is to alleviate an apparent "hang" by trickling records into the standby database as they become available rather than waiting for the entire transaction to be built before applying it to the standby database.
This works well for most cases. However the performance of SQL Apply will deteriorate if an application creates multiple concurrent eager transactions on the primary database (multiple sessions connected to the primary database, all modifying more than 201 rows each).
An Eager Transaction指需要更新多行的大事务,从DATAGUAD 10G的第一个版本开始,an Eager Transaction 定义为可以更新多于201行数据。。。。。。。。。
If the slowed performance of SQL Apply impacts the standby database service levels, it is possible to reduce the interference by increasing the definition of an Eager Transaction.
如果sql applay性能缓慢,导致standy服务受到影响,可以通过增加an Eager Transaction的限制参数来降低影响。
Use the following query to determine if this is the cause of a hung or slowly progressing SQL Apply:
col event format a50 trunc
select ls.status_code, s.event, count(1) No_Of_Appliers
from v$logstdby ls
, v$streams_apply_server sas
, v$session s
where ls.type = 'APPLIER'
and ls.status_code in ( 16124, 16123 )
and ls.logstdby_id = sas.server_id
and s.sid = sas.sid
group by ls.status_code, s.event;
STATUS_CODE EVENT No_Of_Appliers
----------- -------------------------------------------------- --------------
16123 rdbms ipc message 12
16124 rdbms ipc message 97
16124 db file sequential read 1
In the example above, the large number of appliers waiting on the "rdbms ipc message" event suggests that a number of eager transactions are being applied, and that performance would benefit from increasing the number of rows that define an eager transaction.
To determine an appropriate value for Eager Size, analyze the application (if possible) that is creating the slow down to occur, determine the number of rows that make up the transaction, and set Eager Size equal to this value.
To increase the Eager Transaction parameter, SQL Apply must be stopped briefly.
步骤:
SQL> alter database stop logical standby apply;
SQL> execute dbms_logstdby.apply_set('_EAGER_SIZE',);
SQL> alter database start logical standby apply [immediate];
The value to which the _EAGER_SIZE should be increased should not exceed 1000. Since a higher value will result in fewer transactions being defined as Eager, this may result in the LCR Cache (controlled by the MAX_SGA parameter) becoming full.
After changing the _EAGER_SIZE parameter, double-check to ensure paging is not occurring per b above. If paging is occurring, then the MAX_SGA parameter should be increased (if possible) or else the _EAGER_SIZE parameter should be reduced to a level where paging does not occur.
至此,故障得到解决。在部署dataguard的过程中,除了安装官方文档进行安装外,还要在部署前检查在primary 机器上面有没大事务需要处理的,尽量将事务分解成若干个小事务,或者调大AN EAGER SIZE的限定,以免影响dataguard的同步性能。