Oracle ocp Total dlm rcfg time (inc 6): 0.100 secs (1278500359581, 1278500359681)
Begin step .........: 0.001 secs (1278500359581, 1278500359582)
Freeze step ........: 0.020 secs (1278500359582, 1278500359602)
Remap step .........: 0.002 secs (1278500359602, 1278500359604)
Comm step ..........: 0.005 secs (1278500359604, 1278500359609)
Sync 1 step ........: 0.000 secs (0, 0)
Exchange step ......: 0.000 secs (1278500359609, 1278500359609)
Sync 2 step ........: 0.000 secs (0, 0)
Enqueue cleanup step: 0.011 secs (1278500359609, 1278500359620)
Sync pcm1 step .....: 0.000 secs (0, 0)
Cleanup step .......: 0.013 secs (1278500359620, 1278500359633)
Timerq step ........: 0.000 secs (1278500359633, 1278500359633)
Ddq step ...........: 0.000 secs (1278500359633, 1278500359633)
Set master step ....: 0.006 secs (1278500359633, 1278500359639)
Sync 3 step ........: 0.000 secs (0, 0)
Enqueue replay step : 0.004 secs (1278500359639, 1278500359643)
Sync 4 step ........: 0.000 secs (0, 0)
Enqueue dubious step: 0.001 secs (1278500359643, 1278500359644)
Sync 5 step ........: 0.000 secs (0, 0)
Enqueue grant step .: 0.001 secs (1278500359644, 1278500359645)
Sync 6 step ........: 0.000 secs (0, 0)
PCM replay step ....: 0.030 secs (1278500359645, 1278500359675)
Sync 7 step ........: 0.000 secs (0, 0)
Fixwrt replay step .: 0.003 secs (1278500359675, 1278500359678)
Sync 8 step ........: 0.000 secs (0, 0)
End step ...........: 0.001 secs (1278500359680, 1278500359681)
Number of replayed enqueues sent / received .......: 0 / 0
Number of replayed fusion locks sent / received ...: 0 / 0
Number of enqueues mastered before / after rcfg ...: 2217 / 2941
Number of fusion locks mastered before / after rcfg: 3120 / 5747
**************** END DLM RCFG HA STATS *****************
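The per-step timings in the statistics block above can be tallied with a short script to see where the reconfiguration time went. This is a minimal sketch, not an Oracle tool: the function name `parse_rcfg_steps` and the regular expression are illustrative, and it assumes the two numbers in parentheses are start and end timestamps in milliseconds (which is consistent with the `secs` column in this trace). Sync steps reported as `(0, 0)` come out as 0 ms.

```python
import re

# Matches lines such as:
#   "Freeze step ........: 0.020 secs (1278500359582, 1278500359602)"
STEP_RE = re.compile(r"^(.*?)\s*:\s*([\d.]+) secs \((\d+), (\d+)\)\s*$")

def parse_rcfg_steps(trace_text):
    """Return (step_name, elapsed_ms) pairs in trace order.

    Elapsed time is derived from the two timestamps, which are
    assumed to be milliseconds.
    """
    steps = []
    for line in trace_text.splitlines():
        m = STEP_RE.match(line.strip())
        if not m or "Total" in m.group(1):
            continue  # skip non-step lines and the overall total line
        name = m.group(1).rstrip(" .")  # drop the dotted padding
        start, end = int(m.group(3)), int(m.group(4))
        steps.append((name, end - start))
    return steps

# A few lines copied from the trace above.
SAMPLE = """\
Freeze step ........: 0.020 secs (1278500359582, 1278500359602)
Sync 1 step ........: 0.000 secs (0, 0)
Enqueue replay step : 0.004 secs (1278500359639, 1278500359643)
PCM replay step ....: 0.030 secs (1278500359645, 1278500359675)
"""

steps = parse_rcfg_steps(SAMPLE)
slowest = max(steps, key=lambda s: s[1])
print(slowest)
```

In this trace the PCM replay and Freeze steps account for half of the 0.100 secs total.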
*** 2011-06-27 22:19:36.589
kjxgfipccb: msg 0x0x7ff526139320, mbo 0x0x7ff526139310, type 19, ack 0, ref 0, stat 34
=====================================================================
============================lms trace begin==========================
*** 2011-06-27 22:38:54.663
2011-06-27 22:38:54.663764 : 0 GCS shadows cancelled, 0 closed, 0 Xw survived
2011-06-27 22:38:54.673539 : 5230 GCS resources traversed, 0 cancelled
2011-06-27 22:38:54.707671 : 9322 GCS shadows traversed, 0 replayed, 0 duplicates,
5183 not replayed, dissolve 0 timeout 0 RCFG(10) lms 0 finished replaying gcs resources
2011-06-27 22:38:54.709132 : 0 write requests issued in 384 GCS resources --check past image
0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.709520 : 0 write requests issued in 273 GCS resources
1 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.709842 : 0 write requests issued in 281 GCS resources
0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.710159 : 0 write requests issued in 233 GCS resources
0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.710531 : 0 write requests issued in 350 GCS resources
lms 0 finished fixing gcs write protocol
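The counters in the LMS trace above can be added up the same way. The sketch below only sums the numbers that appear on the "write requests issued" and "PIs marked suspect" lines; the helper name `summarize_fixwrite` is hypothetical, and the script makes no claim about what each individual line represents internally.

```python
import re

WRITE_RE = re.compile(r"(\d+) write requests issued in (\d+) GCS resources")
SUSPECT_RE = re.compile(r"(\d+) PIs marked suspect")

def summarize_fixwrite(trace_text):
    """Sum the write-request, GCS-resource and suspect-PI counters in an LMS trace."""
    totals = {"write_requests": 0, "gcs_resources": 0, "suspect_pis": 0}
    for line in trace_text.splitlines():
        m = WRITE_RE.search(line)
        if m:
            totals["write_requests"] += int(m.group(1))
            totals["gcs_resources"] += int(m.group(2))
        m = SUSPECT_RE.search(line)
        if m:
            totals["suspect_pis"] += int(m.group(1))
    return totals

# Four lines copied from the LMS trace above.
SAMPLE = """\
2011-06-27 22:38:54.709132 : 0 write requests issued in 384 GCS resources
0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.709520 : 0 write requests issued in 273 GCS resources
1 PIs marked suspect, 0 flush PI msgs
"""

summary = summarize_fixwrite(SAMPLE)
print(summary)
```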
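The biggest difference between Instance Recovery and an ordinary Crash Recovery is that instance recovery also involves freezing the GRD and remastering GES/GCS resources. This work is done mainly by the LMON process. The trace above shows a number of reconfiguration steps of the form KJGA_RCFG_*; their meanings are: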
Reconfiguration Steps:
1. KJGA_RCFG_BEGIN
LMON continuously polls for reconfiguration events. Once CGS reports a change in cluster membership,
LMON starts the reconfiguration
2. KJGA_RCFG_FREEZE
All processes must acknowledge the reconfiguration freeze before LMON continues
3. KJGA_RCFG_REMAP
Updates the new instance map (kjfchsu) and redistributes resource mastership. Invalidates recovery domains
if the reconfiguration is caused by instance death.
4. KJGA_RCFG_COMM
Reinitialize communication channel
5. KJGA_RCFG_EXCHANGE
Exchange of master information of gcs, ges and file affinity master
6. KJGA_RCFG_ENQCLEANUP
Delete remote dead gcs/ges locks. Cancel converting gcs requests.
7. KJGA_RCFG_CLEANUP
Cleanup/remove ges resources
8. KJGA_RCFG_TIMERQ
Restore relative timeout for enqueue locks on timeout queue
9. KJGA_RCFG_DDQ
Clean out enqueue locks on deadlock queue
10. KJGA_RCFG_SETMASTER
Update master info for each enqueue resource that needs to be remastered.
11. KJGA_RCFG_REPLAY
Replay enqueue locks
12. KJGA_RCFG_ENQDUBIOUS
Invalidates ges resources without established value
13. KJGA_RCFG_ENQGRANT
Grants all grantable ges lock requests
14. KJGA_RCFG_REPLAY2
Enqueue reconfiguration complete. Post SMON to start instance recovery. Starts replaying gcs resources.
15. KJGA_RCFG_FIXWRITES2
Fix write state of gcs resources
16. KJGA_RCFG_END
Unfreeze lock database
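
The ordering of the steps listed above can be captured in a small enumeration, which is handy when checking that the steps found in an LMON trace appear in the expected sequence. This is only a sketch: the numeric values are the positions in the list above, not Oracle's internal constants, and `RcfgStep` and `is_in_order` are illustrative names.

```python
from enum import IntEnum

# Step names as listed above; values are list positions (1..16),
# not Oracle's internal constants.
RcfgStep = IntEnum(
    "RcfgStep",
    [
        "KJGA_RCFG_BEGIN", "KJGA_RCFG_FREEZE", "KJGA_RCFG_REMAP",
        "KJGA_RCFG_COMM", "KJGA_RCFG_EXCHANGE", "KJGA_RCFG_ENQCLEANUP",
        "KJGA_RCFG_CLEANUP", "KJGA_RCFG_TIMERQ", "KJGA_RCFG_DDQ",
        "KJGA_RCFG_SETMASTER", "KJGA_RCFG_REPLAY", "KJGA_RCFG_ENQDUBIOUS",
        "KJGA_RCFG_ENQGRANT", "KJGA_RCFG_REPLAY2", "KJGA_RCFG_FIXWRITES2",
        "KJGA_RCFG_END",
    ],
    start=1,
)

def is_in_order(observed):
    """True if the observed step names appear in strictly increasing step order.

    Raises KeyError for a name that is not in the list above.
    """
    numbers = [RcfgStep[name] for name in observed]
    return all(a < b for a, b in zip(numbers, numbers[1:]))

print(is_in_order(["KJGA_RCFG_BEGIN", "KJGA_RCFG_FREEZE", "KJGA_RCFG_END"]))
```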