Category: Oracle

2013-10-25 20:56:20

                                           RAC Hang Manager (HM)
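      Hang Manager (HM) is a mechanism for detecting hangs, implemented mainly by the DIA0 background process. It exists only
in RAC versions of Oracle; non-RAC versions do not have it. HM was already present in earlier releases and gained features
with each release, but only from 11.2.0.2.0 onward can it actually resolve a hang by terminating a process or an instance.
This is controlled mainly by the hidden parameter _hang_resolution_scope. The default is process, meaning that HM may
terminate the foreground process causing the hang but not background processes, because terminating a background process
would crash the whole instance. It can also be set to instance, meaning that HM may terminate the whole instance.
     HM is enabled by the parameter _hang_detection_enabled, which defaults to TRUE. Hang Manager cannot do anything about
user-level locks such as TX and UL.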
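     HM works in five phases: DETECT, HA, ANALYZE, VERIFY and VICTIM. The DIA0 processes on all nodes cooperate; the DIA0
process on the node with the lowest node number is the master DIA0 process of the cluster and is responsible for cluster-wide hang analysis.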
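
The master selection rule is simple enough to write down directly. The following is only a sketch of the rule as stated above (lowest node number wins); the function name and the idea of passing node numbers as a plain list are assumptions made for illustration, not anything Oracle exposes.

```python
# The five phase names listed in the text.
PHASES = ("DETECT", "HA", "ANALYZE", "VERIFY", "VICTIM")


def pick_master(node_numbers):
    """Return the node whose DIA0 process acts as master.

    Hypothetical helper: models only the rule that the node with the
    lowest node number hosts the master DIA0 process.
    """
    if not node_numbers:
        raise ValueError("cluster has no nodes")
    return min(node_numbers)


# Example: in a three-node cluster, node 1 hosts the master DIA0.
master = pick_master([3, 1, 2])
```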

     The HM phases are fairly hard to understand. The material I collected is as follows:
    
   1,DETECT. This phase scans all local sessions on each instance to locate any possibly hung sessions. Each scan is called a snap.
Three snaps are kept, with a default detection interval of 32 seconds between snaps.
 The interval is controlled by the hidden initialization parameter _HANG_DETECTION_INTERVAL.
Once one or more sessions have appeared in all 3 snaps in the same INVOLUNTARY wait or 'not in a wait', those sessions are considered hung.
 At this point, the HA phase is requested by sending a REQHM message, which includes all hung sessions, to the master DIA0 process.
 DETECT is the normal phase. If no hung sessions are ever detected, HM remains in this phase except when it periodically enters the HAONLY phase.
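
The three-snap rule can be illustrated with a small model. This is a sketch under assumptions, not HM's implementation: each snap is represented as a dict from a session id to a wait label, containing only sessions that are in an INVOLUNTARY wait or 'not in a wait'. The class name, the dict layout and the labels `wait_A`/`wait_B` are all hypothetical.

```python
from collections import deque

SNAPS_KEPT = 3  # the text says three snaps are kept


class Detector:
    """Toy model of the DETECT phase for a single instance."""

    def __init__(self):
        # Oldest snap is dropped automatically once three are stored.
        self.snaps = deque(maxlen=SNAPS_KEPT)

    def add_snap(self, snap):
        """Store one scan: {session_id: wait_label} (assumed layout)."""
        self.snaps.append(dict(snap))

    def hung_sessions(self):
        """Sessions seen in all kept snaps with the same wait label."""
        if len(self.snaps) < SNAPS_KEPT:
            return set()
        first, *rest = self.snaps
        return {
            sid
            for sid, wait in first.items()
            if all(later.get(sid) == wait for later in rest)
        }


d = Detector()
d.add_snap({10: "wait_A", 11: "not in a wait", 12: "wait_A"})
d.add_snap({10: "wait_A", 11: "not in a wait", 12: "wait_B"})
d.add_snap({10: "wait_A", 11: "not in a wait"})
suspects = d.hung_sessions()  # sessions 10 and 11 qualify, 12 does not
```

In this model the non-empty `suspects` set is what would be packed into the REQHM message for the master DIA0 process.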

   2,HA. This phase is entered from the DETECT phase upon receipt of a REQHM message from any node.
The requesting DIA0 process sends all of its interesting sessions to the master.
Interesting sessions are sessions that the node suspects might be hung and that are in waits defined as INVOLUNTARY waits or are 'not in a wait'.
 A global Hang Analysis is started to create Wait For Graphs (WFGs), which may span nodes.
 The HA callbacks are used to find the blockers of any hung session. The callbacks may be called twice.
The first call is during phase 1 of building the WFGs, in which all local blockers are identified.
If phase 1 determines that there are remote blockers, the callbacks are called again to find all of the remote blockers.
 At the end of this phase, the WFGs have been built and returned to the master DIA0 process.
After completing this phase, HM moves directly to the ANALYZE phase. In short, this phase mainly builds the WFGs.
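
To make the idea of a Wait For Graph concrete, here is a minimal sketch that follows blocker edges from a waiter to the root of its chain. It assumes each session waits on at most one blocker and identifies a session as a `(node, sid)` tuple; both choices, and the function name, are illustrative assumptions. The sketch does not model the two callback passes: local and remote blocker edges are assumed to be already merged into one dict.

```python
def wait_chain(session, blocker_of):
    """Follow blocker edges starting at `session`.

    blocker_of maps a waiting session to the session blocking it.
    Returns (chain, is_cycle): the chain runs from the waiter toward
    the root, and is_cycle is True when the edges loop back on a
    session already in the chain (a deadlock rather than a plain hang).
    """
    chain = [session]
    seen = {session}
    while chain[-1] in blocker_of:
        nxt = blocker_of[chain[-1]]
        if nxt in seen:
            return chain, True
        chain.append(nxt)
        seen.add(nxt)
    return chain, False


# Session 10 on node 1 waits on session 20 on node 1, which in turn
# waits on session 5 on node 2, so the graph spans two nodes.
edges = {(1, 10): (1, 20), (1, 20): (2, 5)}
chain, is_cycle = wait_chain((1, 10), edges)
root = chain[-1]
```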

   3,ANALYZE. This phase is entered from the HA phase. HM matches the suspected hung sessions sent to it by the DIA0 processes on all nodes against the sessions in the WFGs created in the HA phase.
 Any sessions that do not appear in both the WFGs and the suspected hung sessions sent separately by the non-master DIA0 processes in the cluster are ignored.
The root and the immediate waiter session information is matched against the Hang Signature Cache (HSC).
If a match is found, a new hang is not created. Instead, the matching HSC entry is updated with the last occurrence time and the number of occurrences seen.
 This reduces the number of hangs created due to transient identical hangs. If a new hang is to be created, hang heuristics are applied to the hang for the first time to determine various attributes
of the hang, including whether it is a GLOBAL or LOCAL hang, whether it has LOW, MEDIUM or HIGH confidence, etc. At this point, the new hang is considered a Suspected Hang. After completing this phase, HM returns to the DETECT phase.
 The VERIFY phase is controlled by a separate parameter and is completely orthogonal to the detection interval. Hence, it is necessary to return to the DETECT phase.
  In short, this phase analyzes the information in the WFGs and lists the sessions that may be hung.
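
The Hang Signature Cache behaviour described above (match, then update instead of creating a new hang) can be sketched as a small keyed cache. Everything here is an illustrative assumption: the class and method names, the use of opaque strings as signatures, and the choice to add a cache entry at the moment a new hang is created, which the source does not state.

```python
class HangSignatureCache:
    """Toy HSC keyed by (root signature, immediate waiter signature)."""

    def __init__(self):
        self.entries = {}

    def record(self, root_sig, waiter_sig, now):
        """Return True if a new hang should be created, False on a match.

        On a match only the last occurrence time and the occurrence
        count are updated, as the text describes.
        """
        key = (root_sig, waiter_sig)
        entry = self.entries.get(key)
        if entry is not None:
            entry["last_seen"] = now
            entry["occurrences"] += 1
            return False
        # Assumption: the entry is created together with the new hang.
        self.entries[key] = {"last_seen": now, "occurrences": 1}
        return True


hsc = HangSignatureCache()
first = hsc.record("root-sig", "waiter-sig", now=100)   # new hang
second = hsc.record("root-sig", "waiter-sig", now=132)  # same signature again
```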

  4,VERIFY. This phase is entered from the DETECT phase once the verify interval or the ignored hang interval expires for a Suspected Hang or an Ignored Hang. The hang is verified by sending messages to all nodes on which sessions involved in the hang reside.
 The nodes verify that these sessions are still hung and return that information to the master DIA0 process. If all sessions involved in a hang are still hung, the hang is considered a Verified Hang. However, if one or more sessions are no longer hung,
 the hang is either dissolved or rebuilt: it is dissolved if the recently freed session is the ultimate waiter (the last waiter in the chain), and rebuilt if it is the root or one of the other sessions in the chain. At the end of the VERIFY phase, hang confidence heuristics are applied to all Verified Hangs to re-evaluate the confidence of each hang.
 The X$ table information is updated but it is not broadcast.
After completing this phase, HM either proceeds to the VICTIM phase or goes back to the DETECT phase: if there are Verified Hangs, it proceeds to the VICTIM phase; otherwise, it returns to the DETECT phase.
  In short, this phase confirms, based on the WFGs over a period of time, whether the hang is real.
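
The VERIFY outcomes can be summarized in a few lines. This sketch assumes a hang is a list of sessions ordered from the root to the ultimate waiter, and it handles at most one freed session per verification; the source does not say what happens when several sessions free up at once. The function names and outcome labels are hypothetical.

```python
def verify_outcome(chain, freed=None):
    """Classify one hang after the nodes report back.

    chain: sessions ordered root first, ultimate waiter last (assumed).
    freed: the session found to be no longer hung, or None.
    """
    if freed is None:
        return "VERIFIED"
    if freed not in chain:
        raise ValueError("freed session is not part of this hang")
    # Ultimate waiter freed: dissolve. Root or any other member: rebuild.
    return "DISSOLVED" if freed == chain[-1] else "REBUILT"


def next_phase(outcomes):
    """VICTIM if at least one hang was verified, otherwise back to DETECT."""
    return "VICTIM" if "VERIFIED" in outcomes else "DETECT"


hang = ["root", "middle", "last_waiter"]
results = [
    verify_outcome(hang),
    verify_outcome(hang, freed="last_waiter"),
    verify_outcome(hang, freed="root"),
]
```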

   5,VICTIM. This phase is entered from the VERIFY phase.
A victim is chosen: the root of the hang for a hang of type HANG, or a non-fatal process for a hang of type DEADLOCK.
Hang resolution heuristics are now applied to the hang. Based on the heuristics,
 the hang will be either resolved or ignored. One of the hang resolution heuristics is a check to determine whether hang resolution is enabled.
 This is controlled by the _HANG_RESOLUTION_SCOPE initialization parameter.
 If this parameter is set to at least PROCESS and the hang resolution heuristics have determined that the hang should be resolved,
 an attempt is made to terminate the victim session. If the session does not terminate, resolution is escalated to process termination.
 If _HANG_RESOLUTION_SCOPE is set to INSTANCE and the victim is a fatal background process, the instance will be killed.
 If there are multiple hangs which involve instance termination, an attempt will be made to resolve only one hang.
   In short, for the session that ultimately causes the hang, HM checks whether the hang can be resolved and, depending on the parameter settings, terminates the process, terminates the instance, or does nothing.
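
The decision logic of the VICTIM phase can be laid out as a small function. This is a sketch of the rules as described, not Oracle's code. The function name, the action labels, and the scope level `NONE` (a placeholder for any setting below PROCESS) are assumptions; treating a fatal background victim under PROCESS scope as ignored follows the earlier statement that PROCESS scope does not cover background processes.

```python
# Ordering implied by "set to at least PROCESS".
# "NONE" is a hypothetical placeholder, not a documented value.
SCOPE_LEVEL = {"NONE": 0, "PROCESS": 1, "INSTANCE": 2}


def resolution_actions(scope, should_resolve, victim_is_fatal_background,
                       session_terminates=True):
    """Return the ordered list of actions taken for one verified hang."""
    level = SCOPE_LEVEL[scope]
    if level < SCOPE_LEVEL["PROCESS"] or not should_resolve:
        return ["IGNORE"]
    if victim_is_fatal_background:
        # Only INSTANCE scope allows killing a fatal background,
        # which takes the whole instance down.
        if level >= SCOPE_LEVEL["INSTANCE"]:
            return ["KILL_INSTANCE"]
        return ["IGNORE"]
    actions = ["KILL_SESSION"]
    if not session_terminates:
        # Escalate when the session kill does not succeed.
        actions.append("KILL_PROCESS")
    return actions


default_case = resolution_actions("PROCESS", True, False)
escalated = resolution_actions("PROCESS", True, False, session_terminates=False)
```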

  The main related views and tables include v$hang_info, x$kjznhangs and x$kjznhangses, along with the DIA0 trace files.
