Category: Oracle

2011-02-10 14:36:28

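    During the Spring Festival holiday I got a call from a customer: after the hosts were rebooted, not a single RAC instance would come up (it is a 4-node RAC: two fully configured 570s plus two half-configured 570s). One host was booting extremely slowly, another was throwing errors, and two of the four nodes were actually reporting hardware errors! Luckily I was spending the holiday in Shanghai this year, so I got a quick picture of the situation and rushed to the site. On the way I tried to reach a hardware engineer. Damn it, there was only one engineer in Shanghai, and he had moved over to sales and probably wouldn't come. I phoned the company to get a hardware engineer assigned and nobody picked up; I called the company's 800 number and, damn it, still nobody picked up. So much for 7×24 and the 800 hotline; it is all hot air. They promise the moon before they win the contract, and when something goes wrong you can't find anyone, or when you do they send a rookie, in which case an amateur like me might as well handle it myself....... If I didn't know the customer so well, they would have blown up long ago..... Anyway, rant over, on to the problem.....
    I don't know the hardware well, so I first checked why RAC wouldn't start, beginning with the crsd log: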
2011-02-07 15:03:03.869: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2011-02-07 15:03:05.254: [ COMMCRS][351]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))

2011-02-07 15:03:05.254: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

2011-02-07 15:03:05.256: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2011-02-07 15:03:06.590: [ COMMCRS][353]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))

2011-02-07 15:03:06.590: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

2011-02-07 15:03:06.590: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2011-02-07 15:03:07.973: [ COMMCRS][355]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))
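
The repeating pattern in the crsd log above (CRSRTI waiting on CSS, COMMCRS finding no listener on the OCSSD IPC key, CSSCLNT failing to connect) can be tallied with a small script rather than read by eye. The following is only a sketch: the line layout is inferred from the excerpt above, and the function name `summarize_crsd_log` is made up for illustration, not part of any Oracle tool.

```python
import re
from collections import Counter

# Line layout inferred from the excerpt above, e.g.
# 2011-02-07 15:03:05.254: [ COMMCRS][351]clsc_connect: ...
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(\w+)\]\[(\d+)\](.*)$"
)
KEY_RE = re.compile(r"\(KEY=([^)]+)\)")


def summarize_crsd_log(text):
    """Count messages per component and collect IPC keys that had no listener."""
    components = Counter()
    missing_keys = set()
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # blank lines or lines in another format
        _timestamp, component, _thread, message = m.groups()
        components[component] += 1
        if "no listener at" in message:
            key = KEY_RE.search(message)
            if key:
                missing_keys.add(key.group(1))
    return components, missing_keys


# Three lines copied from the log excerpt above.
SAMPLE = """\
2011-02-07 15:03:03.869: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2011-02-07 15:03:05.254: [ COMMCRS][351]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))
2011-02-07 15:03:05.254: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
"""

if __name__ == "__main__":
    components, missing_keys = summarize_crsd_log(SAMPLE)
    print(dict(components))
    print(sorted(missing_keys))
```

If the only missing key is the OCSSD one, as here, crsd itself is just waiting, and the place to look next is the CSS daemon.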

This showed that cssd had not started, so I went on to the cssd log and found the following:
 [    CSSD]2011-02-07 15:13:08.415 >node3:    Copyright 2011, Oracle version 10.2.0.4.0
[    CSSD]2011-02-07 15:13:08.415 >node3:    CSS daemon log for node node1, number 1, in cluster crs
[    CSSD]2011-02-07 15:13:08.421 [1] >TRACE:   clssscmain: local-only set to false
[    CSSD]2011-02-07 15:13:08.427 [1] >TRACE:   clssnmReadNodeInfo: added node 1 (node1) to cluster
[    CSSD]2011-02-07 15:13:08.431 [1] >TRACE:   clssnmReadNodeInfo: added node 2 (node2) to cluster
[    CSSD]2011-02-07 15:13:08.436 [1] >TRACE:   clssnmReadNodeInfo: added node 3 (node3) to cluster
[  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=node1DBG_CSSD))
[    CSSD]2011-02-07 15:13:08.441 [1] >TRACE:   clssnmReadNodeInfo: added node 4 (node4) to cluster
[    CSSD]2011-02-07 15:13:08.444 [1] >TRACE:   clssgmInitCMInfo: Wait for remote node termination set to 805306368 seconds
[    CSSD]2011-02-07 15:13:08.446 [1029] >TRACE:   clssnm_skgxninit: Compatible vendor clusterware not in use
[    CSSD]2011-02-07 15:13:08.446 [1029] >TRACE:   clssnm_skgxnmon: skgxn init failed
[    CSSD]2011-02-07 15:13:08.447 [1] >TRACE:   clssnmNMInitialize: misscount set to (30)
[    CSSD]2011-02-07 15:13:08.448 [1] >TRACE:   clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 15000 ms, reconfig start (misscount) 30000 ms
[    CSSD]2011-02-07 15:13:08.451 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (0//dev/voting1)
[    CSSD]2011-02-07 15:13:08.452 [1030] >TRACE:   clssnmvDPT: spawned for disk 0 (/dev/voting1)
[    CSSD]2011-02-07 15:13:08.453 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (1//dev/voting2)
[    CSSD]2011-02-07 15:13:08.453 [1287] >TRACE:   clssnmvDPT: spawned for disk 1 (/dev/voting2)
[    CSSD]2011-02-07 15:13:08.455 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (2//dev/voting3)
[    CSSD]2011-02-07 15:13:08.455 [1544] >TRACE:   clssnmvDPT: spawned for disk 2 (/dev/voting3)
[    CSSD]2011-02-07 15:13:10.464 [1030] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (0//dev/voting1)
[    CSSD]2011-02-07 15:13:10.464 [1801] >TRACE:   clssnmvKillBlockThread: spawned for disk 0 (/dev/voting1) initial sleep interval (1000)ms
[    CSSD]2011-02-07 15:13:10.464 [1030] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(13) wrtcnt(604) LATS(4844712) Disk lastSeqNo(604)
[    CSSD]2011-02-07 15:13:10.464 [1030] >TRACE:   clssnmReadDskHeartbeat: node(3) is down. rcfg(11) wrtcnt(604) LATS(4844712) Disk lastSeqNo(604)
[    CSSD]2011-02-07 15:13:10.464 [1030] >TRACE:   clssnmReadDskHeartbeat: node(4) is down. rcfg(14) wrtcnt(3085) LATS(4844712) Disk lastSeqNo(3085)
[    CSSD]2011-02-07 15:13:10.481 [1544] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (2//dev/voting3)
[    CSSD]2011-02-07 15:13:10.481 [2058] >TRACE:   clssnmvKillBlockThread: spawned for disk 2 (/dev/voting3) initial sleep interval (1000)ms
[    CSSD]2011-02-07 15:13:10.481 [1544] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(13) wrtcnt(604) LATS(4844729) Disk lastSeqNo(604)
[    CSSD]2011-02-07 15:13:10.481 [1544] >TRACE:   clssnmReadDskHeartbeat: node(3) is down. rcfg(11) wrtcnt(605) LATS(4844729) Disk lastSeqNo(605)
[    CSSD]2011-02-07 15:13:10.487 [1287] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (1//dev/voting2)
[    CSSD]2011-02-07 15:13:10.487 [2315] >TRACE:   clssnmvKillBlockThread: spawned for disk 1 (/dev/voting2) initial sleep interval (1000)ms
[    CSSD]2011-02-07 15:13:10.488 [1] >TRACE:   clssnmFatalInit: fatal mode enabled
[    CSSD]2011-02-07 15:13:10.500 [2829] >TRACE:   clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=node1-priv)(PORT=49895))

[    CSSD]2011-02-07 15:13:10.500 [2829] >TRACE:   clssnmClusterListener: Probing node node2 (2), probcon(1113fa5d0)
[    CSSD]2011-02-07 15:13:10.500 [2829] >TRACE:   clssnmClusterListener: Probing node node3 (3), probcon(11156db50)
[    CSSD]2011-02-07 15:13:10.501 [2829] >TRACE:   clssnmClusterListener: Probing node node4 (4), probcon(111570730)
[    CSSD]2011-02-07 15:13:10.501 [2829] >TRACE:   clssnmDiscHelper: node2, node(2) connection failed, con (1113fa5d0), probe(1113fa5d0)

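The only thing that looked like an error was "clssnm_skgxnmon: skgxn init failed". I searched Metalink for it and found nothing useful to go on. In fact I had overlooked an important piece of information in this log: [    CSSD]2011-02-07 17:29:53.412 [1] >TRACE:   clssgmInitCMInfo: Wait for remote node termination set to 805306368 seconds. Because of that I spent a lot of time checking logs and rebooting hosts, and when I ran ps -ef|grep d.bin I also overlooked the parameter values of the oprocd process.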
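
A value like 805306368 seconds is easy to skim past in a wall of TRACE lines. As a hedged sketch, a script can pull that setting out of the cssd log and print it in more readable units. The one-day threshold `SUSPICIOUS_SECONDS` is an arbitrary assumption made for this example, not a limit documented by Oracle, and the script only surfaces the number; it does not say why the value matters.

```python
import re

# Message text taken from the cssd log line quoted above.
WAIT_RE = re.compile(
    r"clssgmInitCMInfo: Wait for remote node termination set to (\d+) seconds"
)

# Hypothetical sanity limit for this sketch only (one day); not an Oracle value.
SUSPICIOUS_SECONDS = 24 * 60 * 60


def find_wait_settings(text):
    """Return every 'wait for remote node termination' value found, in seconds."""
    return [int(m.group(1)) for m in WAIT_RE.finditer(text)]


def describe(seconds):
    """Render the value in days and hex, and flag it if it exceeds the limit."""
    days = seconds / 86400
    flag = "SUSPICIOUS" if seconds > SUSPICIOUS_SECONDS else "ok"
    return f"{seconds} s = {days:.1f} days, hex {seconds:#x} [{flag}]"


# Line copied from the cssd log in this post.
SAMPLE = (
    "[    CSSD]2011-02-07 15:13:08.444 [1] >TRACE:   clssgmInitCMInfo: "
    "Wait for remote node termination set to 805306368 seconds\n"
)

if __name__ == "__main__":
    for value in find_wait_settings(SAMPLE):
        print(describe(value))
```

Seen this way the setting stands out: it works out to more than 25 years, and in hex it is the round number 0x30000000.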