Chinaunix首页 | 论坛 | 博客
  • 博客访问: 283391
  • 博文数量: 27
  • 博客积分: 368
  • 博客等级: 一等列兵
  • 技术积分: 491
  • 用 户 组: 普通用户
  • 注册时间: 2012-05-09 21:35
个人简介

再出发..

文章分类

全部博文(27)

文章存档

2018年(1)

2014年(6)

2013年(5)

2012年(15)

我的朋友

分类: Oracle

2012-12-12 20:54:01

    早上巡检时发现OGG目标端一个事务处理时间非常长,直接导致REPLICAT进程严重延迟:


  1. GGSCI (sv890n01) 13> info rads_1
  2. REPLICAT RADS_1 Last Started 2012-12-10 09:46 Status RUNNING
  3. Checkpoint Lag 01:33:16 (updated 11:35:14 ago)
  4. Log Read Checkpoint File ./dirdat/pa000021
  5. 2012-12-11 01:55:10.313205 RBA 1753940084
  6. SQL> select table_name,num_rows from dba_tables where table_name='GT_MXYE_NEW';
 其中 "updated 11:35:14 ago"  记录了上一次检查点发生时间(即目标库上一个事务commit)和当前系统时间的时间差,而 "Checkpoint Lag 01:33:16" 指的是上一个检查点发生时间和上一个事务最后一条record的时间戳之间的时间差。也就是说当前这个事务已经执行了11.5个小时....以下是处理过程:

通过 v$session , v$sql 定位到当前sql为

  1. DELETE FROM "ADS"."GT_MXYE_NEW" WHERE "ZQDM" = :b0 AND "GTTGZH" = :b1 AND "JYSDM" = :b2 AND "YE" = :b3 AND "DQDM" = :b4 AND "SXDM" = :b5 AND ROWNUM = 1

通过 v$transaction.start_time 发现事务的开始时间与前面的延时是一致的,并且这条sql已经执行了50万次,在源库查询发现晚上"ADS"."GT_MXYE_NEW"这个表删除了500万条记录,sql为

  1. DELETE FROM GT_MXYE_NEW A WHERE EXISTS ( SELECT 1 FROM ADB_ZW_GTTGYEBDMX B WHERE 1=1 AND A.ZQDM=B.ZQDM AND A.GTTGZH=B.GTTGZH AND A.JYSDM=B.JYSDM AND B.KSRQ=:B1 )
该表没有主键和唯一约束,目标端把源端的sql拆分成了500万条再执行,查看执行计划

  1. SQL> explain plan for DELETE FROM "ADS"."GT_MXYE_NEW" WHERE "ZQDM" = :b0 AND "GTTGZH" = :b1 AND "JYSDM" = :b2 AND "YE" = :b3 AND "DQDM" = :b4 AND "SXDM" = :b5 AND ROWNUM = 1;
  2. Explained.
  3. SQL> @?/rdbms/admin/utlxpls.sql
  4. PLAN_TABLE_OUTPUT
  5. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  6. Plan hash value: 3603936140
  7. -------------------------------------------------------------------------------------------------
  8. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  9. -------------------------------------------------------------------------------------------------
  10. | 0 | DELETE STATEMENT | | 1 | 41 | 629 (1)| 00:00:08 |
  11. | 1 | DELETE | GT_MXYE_NEW | | | | |
  12. |* 2 | COUNT STOPKEY | | | | | |
  13. |* 3 | TABLE ACCESS BY INDEX ROWID| GT_MXYE_NEW | 1 | 41 | 629 (1)| 00:00:08 |
  14. |* 4 | INDEX RANGE SCAN | IDX_GT_MXYE_NEW | 13477 | | 87 (2)| 00:00:02 |
  15. -------------------------------------------------------------------------------------------------
  16. Predicate Information (identified by operation id):
  17. ---------------------------------------------------
  18. 2 - filter(ROWNUM=1)
  19. 3 - filter("GTTGZH"=:B1 AND "JYSDM"=:B2 AND "DQDM"=:B4 AND "SXDM"=:B5 AND
  20. "YE"=TO_NUMBER(:B3))
  21. 4 - access("ZQDM"=:B0)
  22. 19 rows selected.
kill replicat 后在目标端新建复合索引(官方文档:KILL REPLICAT 后当前事务回滚并记录检查点,下次重启进程不会丢失数据):
  1. GGSCI (sv890n01) 15> KILL REPLICAT RADS_1
  2. Sending KILL request to MANAGER ...
  3. Killed process (26318) for REPLICAT RADS_1
  4. GGSCI (sv890n01) 16> info all
  5. Program Status Group Lag at Chkpt Time Since Chkpt
  6. MANAGER RUNNING
  7. REPLICAT ABENDED RADS_1 01:33:16 11:40:01
创建复合索引再看执行计划


  1. SQL> create index ads.ind_gt_mxye_ogg on ads.GT_MXYE_NEW(ZQDM,GTTGZH,JYSDM,YE,DQDM,SXDM) TABLESPACE TS_ADC_INDEX nologging;
  2. Index created.
  3. SQL> explain plan for DELETE FROM "ADS"."GT_MXYE_NEW" WHERE "ZQDM" = :b0 AND "GTTGZH" = :b1 AND "JYSDM" = :b2 AND "YE" = :b3 AND "DQDM" = :b4 AND "SXDM" = :b5 AND ROWNUM = 1;
  4. Explained.
  5. SQL> @?/rdbms/admin/utlxpls.sql
  6. PLAN_TABLE_OUTPUT
  7. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  8. Plan hash value: 415052804
  9. --------------------------------------------------------------------------------------
  10. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  11. --------------------------------------------------------------------------------------
  12. | 0 | DELETE STATEMENT | | 1 | 41 | 3 (0)| 00:00:01 |
  13. | 1 | DELETE | GT_MXYE_NEW | | | | |
  14. |* 2 | COUNT STOPKEY | | | | | |
  15. |* 3 | INDEX RANGE SCAN| IND_GT_MXYE_OGG | 1 | 41 | 3 (0)| 00:00:01 |
  16. --------------------------------------------------------------------------------------
  17. Predicate Information (identified by operation id):
  18. ---------------------------------------------------
  19. 2 - filter(ROWNUM=1)
  20. 3 - access("ZQDM"=:B0 AND "GTTGZH"=:B1 AND "JYSDM"=:B2 AND
  21. "YE"=TO_NUMBER(:B3) AND "DQDM"=:B4 AND "SXDM"=:B5)
  22. 17 rows selected.
重新启动REPLICAT进程,运行了11个小时还没运行完的事务15分钟内执行完毕,再看延时情况


  1. GGSCI (sv890n01) 29> info all
  2. Program Status Group Lag at Chkpt Time Since Chkpt
  3. MANAGER RUNNING
  4. REPLICAT RUNNING RADS_1 09:33:16 00:00:07
由于当前事务已经提交并且写了检查点,相应的延时都发生了变化。


总结:源端的sql和目标端的sql不一定是一致的,原因有可能是OGG的内部机制,也有可能是该表没有外键或者唯一约束,也有可能是REPLICAT进程参数设置不当造成的,需要进一步学习。
阅读(11678) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~