2008-06-05 17:18:34
While testing a new build today, I found after configuring and rebooting that VCS had brought up only 5 GAB ports.
root@lxsfrac04 # gabconfig -a
===============================================================
Port a gen 70a501 membership 01
Port b gen 70a507 membership 01
Port d gen 70a503 membership 01
Port h gen 70a506 membership 01
Port o gen 70a509 membership 01
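The cause: one step of the test case (TC) modifies the VCS configuration files, and in doing so the system ran haconf -makerw. I have hit this before; in general, when ports f, v, and w fail to start, the problem is related to VCS.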
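A quick way to spot the missing ports is to compare the gabconfig -a listing against the set you expect. The following is only a sketch: the function name missing_gab_ports and the EXPECTED_PORTS list are my own assumptions (the list is simply the healthy listing shown later in this post), and the function reads the gabconfig output on stdin.

```shell
#!/bin/sh
# List the expected GAB ports that are absent from `gabconfig -a` output.
# EXPECTED_PORTS is an assumption taken from the healthy listing in this
# post; adjust it for your own stack.
EXPECTED_PORTS="a b d f h o v w"

missing_gab_ports() {
    # Collect the port letters from lines of the form "Port <x> gen ...".
    seen=$(awk '$1 == "Port" && $3 == "gen" { printf "%s ", $2 }')
    missing=""
    for p in $EXPECTED_PORTS; do
        case " $seen" in
            *" $p "*) ;;                              # port is present
            *) missing="${missing:+$missing }$p" ;;   # port is absent
        esac
    done
    printf '%s\n' "$missing"
}
```

Usage would be something like `gabconfig -a | missing_gab_ports`; an empty line means every expected port is present.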
First, check the VCS log.
lxsfrac04# tail -f /var/VRTSvcs/log/engine_A.log
…………………………………………………………………………………………….
2008/06/02 10:29:43 VCS NOTICE V-16-1-10114 Opening GAB library
2008/06/02 10:29:43 VCS NOTICE V-16-1-10619 'HAD' starting on: lxsfrac04
2008/06/02 10:29:43 VCS ERROR V-16-1-10624 Local cluster configuration stale
2008/06/02 10:29:43 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2008/06/02 10:29:47 VCS INFO V-16-1-10077 Received new cluster membership
2008/06/02 10:29:47 VCS NOTICE V-16-1-10080 System (lxsfrac04) - Membership: 0x3, Jeopardy: 0x0
2008/06/02 10:29:47 VCS NOTICE V-16-1-10322 System (Node '1') changed state from UNKNOWN to INITING
2008/06/02 10:29:47 VCS NOTICE V-16-1-10086 System lxsfrac04 (Node '0') is in Regular Membership - Membership: 0x3
2008/06/02 10:29:47 VCS NOTICE V-16-1-10086 System (Node '1') is in Regular Membership - Membership: 0x3
2008/06/02 10:29:47 VCS NOTICE V-16-1-10453 Node: 1 changed name from: '' to: 'lxsfrac03'
2008/06/02 10:29:47 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from INITING to STALE_ADMIN_WAIT
2008/06/02 10:29:47 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from STALE_DISCOVER_WAIT to STALE_ADMIN_WAIT
2008/06/02 10:37:01 VCS NOTICE V-16-1-11022 VCS engine (had) started
2008/06/02 10:37:01 VCS NOTICE V-16-1-11050 VCS engine version=4.1
2008/06/02 10:37:01 VCS NOTICE V-16-1-11051 VCS engine join version=4.1001
2008/06/02 10:37:01 VCS NOTICE V-16-1-11052 VCS engine pstamp=4.1 03/15/06-20:13:00
2008/06/02 10:37:01 VCS NOTICE V-16-1-10114 Opening GAB library
2008/06/02 10:37:04 VCS NOTICE V-16-1-10619 'HAD' starting on: lxsfrac04
2008/06/02 10:37:06 VCS ERROR V-16-1-10624 Local cluster configuration stale
2008/06/02 10:37:06 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2008/06/02 10:37:10 VCS INFO V-16-1-10077 Received new cluster membership
2008/06/02 10:37:10 VCS NOTICE V-16-1-10080 System (lxsfrac04) - Membership: 0x1, Jeopardy: 0x2
2008/06/02 10:37:10 VCS NOTICE V-16-1-10086 System lxsfrac04 (Node '0') is in Regular Membership - Membership: 0x1
2008/06/02 10:37:10 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from STALE_DISCOVER_WAIT to STALE_ADMIN_WAIT
2008/06/02 10:37:20 VCS INFO V-16-1-10077 Received new cluster membership
2008/06/02 10:37:20 VCS NOTICE V-16-1-10080 System (lxsfrac04) - Membership: 0x3, Jeopardy: 0x0
2008/06/02 10:37:20 VCS NOTICE V-16-1-10322 System (Node '1') changed state from UNKNOWN to INITING
2008/06/02 10:37:20 VCS NOTICE V-16-1-10086 System (Node '1') is in Regular Membership - Membership: 0x3
2008/06/02 10:37:20 VCS NOTICE V-16-1-10453 Node: 1 changed name from: '' to: 'lxsfrac03'
2008/06/02 10:37:20 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from INITING to STALE_DISCOVER_WAIT
2008/06/02 10:37:20 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from STALE_DISCOVER_WAIT to STALE_ADMIN_WAIT
2008/06/02 10:53:38 VCS ERROR V-16-1-10069 All systems have configuration files marked STALE. Unable to form cluster.
2008/06/02 10:53:38 VCS INFO V-16-1-50135 User root fired command: MSG_CLUSTER_STOP_SYS from localhost
2008/06/02 10:53:38 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from STALE_ADMIN_WAIT to EXITED
2008/06/02 10:54:49 VCS NOTICE V-16-1-11022 VCS engine (had) started
2008/06/02 10:54:49 VCS NOTICE V-16-1-11050 VCS engine version=4.1
2008/06/02 10:54:49 VCS NOTICE V-16-1-11051 VCS engine join version=4.1001
2008/06/02 10:54:49 VCS NOTICE V-16-1-11052 VCS engine pstamp=4.1 03/15/06-20:13:00
2008/06/02 10:54:49 VCS NOTICE V-16-1-10114 Opening GAB library
2008/06/02 10:54:49 VCS NOTICE V-16-1-10619 'HAD' starting on: lxsfrac04
2008/06/02 10:54:49 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2008/06/02 10:54:54 VCS INFO V-16-1-10077 Received new cluster membership
2008/06/02 10:54:54 VCS NOTICE V-16-1-10080 System (lxsfrac04) - Membership: 0x3, Jeopardy: 0x0
2008/06/02 10:54:54 VCS NOTICE V-16-1-10322 System (Node '1') changed state from UNKNOWN to INITING
2008/06/02 10:54:54 VCS NOTICE V-16-1-10086 System lxsfrac04 (Node '0') is in Regular Membership - Membership: 0x3
2008/06/02 10:54:54 VCS NOTICE V-16-1-10086 System (Node '1') is in Regular Membership - Membership: 0x3
2008/06/02 10:54:54 VCS NOTICE V-16-1-10453 Node: 1 changed name from: '' to: 'lxsfrac03'
2008/06/02 10:54:54 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from INITING to STALE_ADMIN_WAIT
2008/06/02 10:54:54 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from CURRENT_DISCOVER_WAIT to LOCAL_BUILD
2008/06/02 10:54:54 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from STALE_ADMIN_WAIT to STALE_PEER_WAIT
2008/06/02 10:54:55 VCS WARNING V-16-1-10030 UseFence=NONE. Hence do not need fencing
2008/06/02 10:54:55 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from LOCAL_BUILD to RUNNING
2008/06/02 10:54:55 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from STALE_PEER_WAIT to REMOTE_BUILD
2008/06/02 10:54:55 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/CFSfsckd/CFSfsckdAgent for resource type CFSfsckd successfully started at Mon Jun 2 10:54:55 2008
2008/06/02 10:54:55 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/CVMCluster/CVMClusterAgent for resource type CVMCluster successfully started at Mon Jun 2 10:54:55 2008
2008/06/02 10:54:55 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/CVMVxconfigd/CVMVxconfigdAgent for resource type CVMVxconfigd successfully started at Mon Jun 2 10:54:55 2008
2008/06/02 10:54:55 VCS INFO V-16-1-10463 Sending snapshot to node: 1
2008/06/02 10:54:55 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from REMOTE_BUILD to RUNNING
2008/06/02 10:54:55 VCS ERROR V-16-10001-1005 (lxsfrac04) CVMCluster:???:monitor:node - state: out of cluster
2008/06/02 10:54:56 VCS INFO V-16-1-10304 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on lxsfrac04 (First probe)
2008/06/02 10:54:56 VCS INFO V-16-1-10304 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on lxsfrac04 (First probe)
2008/06/02 10:54:56 VCS INFO V-16-1-10297 Resource cvm_vxconfigd (Owner: unknown, Group: cvm) is online on lxsfrac04 (First probe)
2008/06/02 10:54:56 VCS NOTICE V-16-1-10438 Group cvm has been probed on system lxsfrac04
2008/06/02 10:54:56 VCS NOTICE V-16-1-10442 Initiating auto-start online of group cvm on system lxsfrac04
2008/06/02 10:54:56 VCS NOTICE V-16-1-10301 Initiating Online of Resource cvm_clus (Owner: unknown, Group: cvm) on System lxsfrac04
2008/06/02 10:54:56 VCS ERROR V-16-10001-1005 (lxsfrac03) CVMCluster:???:monitor:node - state: out of cluster
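The engine log is noisy, so it helps to pull out only the lines that concern the stale condition. A minimal sketch, assuming the two message IDs seen in the log above (V-16-1-10624 and V-16-1-10069) plus any STALE_* state name are what you care about; the function name stale_log_events is hypothetical.

```shell
#!/bin/sh
# Print only the engine-log lines that mention a stale configuration.
# $1: path to the log (defaults to the path used in this post).
stale_log_events() {
    log=${1:-/var/VRTSvcs/log/engine_A.log}
    # Match the two stale-related message IDs from this post, or any
    # state transition that names a STALE_* state.
    grep -E 'V-16-1-(10624|10069) |STALE_[A-Z_]+' "$log"
}
```

For example, `stale_log_events | tail -20` would show the most recent stale-related entries.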
Check the VCS status:
root@lxsfrac04 # hastatus -sum
--
-- System State Frozen
A lxsfrac03 STALE_ADMIN_WAIT 0
A lxsfrac04 STALE_ADMIN_WAIT 0
root@lxsfrac04 # hastatus
attempting to connect....connected
group resource system message
------- --------------- ------------ ----------------------------------------
lxsfrac04 STALE ADMIN WAIT: all systems stale
lxsfrac03 STALE ADMIN WAIT: all systems stale
^C
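To check this from a script rather than by eye, the system lines of the summary output can be filtered. This is only a sketch: the function name stale_systems is made up, and it assumes the system lines keep the layout shown above (an "A" marker, then system name, state, and frozen flag).

```shell
#!/bin/sh
# Print the names of systems whose state starts with STALE, given
# `hastatus -sum` output on stdin.
stale_systems() {
    # System lines look like: "A  <system>  <state>  <frozen>".
    awk '$1 == "A" && $3 ~ /^STALE/ { print $2 }'
}
```

Usage would be `hastatus -sum | stale_systems`; no output means no system is reporting a stale state.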
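The state at this point is stale. I quickly went back over the VCS material on stale configurations. I did not follow all of it, but the gist is this: while running, VCS keeps its working copy of the configuration in memory. When the configuration is opened for writing (haconf -makerw), a .stale file is created alongside main.cf, and it is removed only when the configuration is saved and closed again (haconf -dump -makero). If VCS starts while .stale is still there, it cannot be sure that main.cf matches what was in memory, so the node goes into a stale state.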
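The marker can be checked directly on disk. A minimal sketch, assuming the configuration lives in /etc/VRTSvcs/conf/config (the usual default, but verify on your own system); the function name config_marked_stale and the CONF_DIR variable are hypothetical.

```shell
#!/bin/sh
# Default configuration directory; an assumption, override as needed.
CONF_DIR=${CONF_DIR:-/etc/VRTSvcs/conf/config}

# Succeeds (exit status 0) when a .stale marker sits next to main.cf.
# $1: optional directory to check instead of $CONF_DIR.
config_marked_stale() {
    [ -e "${1:-$CONF_DIR}/.stale" ]
}
```

For example: `config_marked_stale && echo "configuration is marked stale"`.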
First I tried to switch the configuration back to read-only, which failed:
root@lxsfrac04 # haconf -dump -makero
VCS WARNING V-16-1-50129 Operation 'haconf -dump -makero' rejected as the node is in STALE_ADMIN_WAIT state
Stop VCS:
root@lxsfrac04 # hastop -all
Delete the .stale file from the VCS configuration directory:
root@lxsfrac04 # ls -alrt
total 240
………………………………………………………………………………………
-rw------- 2 root root 495 Jun 1 23:19 CFSTypes.cf
-rw------- 1 root root 941 Jun 1 23:19 main.cf
-rw------- 1 root root 0 Jun 2 09:53 .stale
-rw------- 1 root root 373 Jun 2 10:03 MultiPrivNIC.cf
-r--r--r-- 1 root sys 366 Jun 2 10:03 PrivNIC.cf_new
-rw------- 1 root root 395 Jun 2 10:04 PrivNIC.cf
-rw------- 1 root root 1013 Jun 2 10:28 main.cf_for_privNIC
-rw------- 1 root root 71618 Jun 2 10:29 main.cmd
drwxr-xr-x 2 root other 1024 Jun 2 10:37 .
………………………………………………………………………………………………
root@lxsfrac04 # rm -rf .stale
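Removing .stale tells VCS to trust the main.cf on disk, so it seems prudent to keep a copy of that file first. The sketch below is a slightly more cautious variant of the step above, not what I actually ran: the function name clear_stale_marker and the backup file name main.cf.before_stale_fix are made up, and it assumes VCS is already stopped and that this node's main.cf is the one you want the cluster to build from.

```shell
#!/bin/sh
# Back up main.cf, then remove the .stale marker.
# $1: configuration directory (default is an assumption; verify locally).
clear_stale_marker() {
    dir=${1:-/etc/VRTSvcs/conf/config}
    # Refuse to continue when there is no main.cf to fall back on.
    [ -f "$dir/main.cf" ] || { echo "no main.cf in $dir" >&2; return 1; }
    # Keep a copy of the file that is about to become authoritative.
    cp -p "$dir/main.cf" "$dir/main.cf.before_stale_fix" || return 1
    rm -f "$dir/.stale"
}
```

If the hacf utility is available, checking the syntax with `hacf -verify` on the same directory before restarting is a reasonable extra step.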
Restart VCS on each node:
root@lxsfrac04 # hastart
root@lxsfrac03 # hastart
root@lxsfrac04 # gabconfig -a
===============================================================
Port a gen 70a501 membership 01
Port b gen 70a507 membership 01
Port d gen 70a503 membership 01
Port f gen 70a512 membership 01
Port h gen 70a508 membership 01
Port o gen 70a509 membership 01
Port v gen 70a50e membership 01
Port w gen 70a510 membership 01
Check the VCS log again:
lxsfrac04# tail -f /var/VRTSvcs/log/engine_A.log
2008/06/02 10:54:57 VCS INFO V-16-1-10297 Resource cvm_vxconfigd (Owner: unknown, Group: cvm) is online on lxsfrac03 (First probe)
2008/06/02 10:54:57 VCS INFO V-16-1-10304 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on lxsfrac03 (First probe)
2008/06/02 10:54:57 VCS INFO V-16-1-10304 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on lxsfrac03 (First probe)
2008/06/02 10:54:57 VCS NOTICE V-16-1-10438 Group cvm has been probed on system lxsfrac03
2008/06/02 10:54:57 VCS NOTICE V-16-1-10442 Initiating auto-start online of group cvm on system lxsfrac03
2008/06/02 10:54:57 VCS NOTICE V-16-1-10301 Initiating Online of Resource cvm_clus (Owner: unknown, Group: cvm) on System lxsfrac03
2008/06/02 10:55:15 VCS INFO V-16-10001-1003 (lxsfrac03) CVMCluster:cvm_clus:online:CVMCluster role is - mode: enabled: cluster active - MASTER
master: lxsfrac03
2008/06/02 10:55:17 VCS INFO V-16-1-10298 Resource cvm_clus (Owner: unknown, Group: cvm) is online on lxsfrac03 (VCS initiated)
2008/06/02 10:55:17 VCS NOTICE V-16-1-10301 Initiating Online of Resource vxfsckd (Owner: unknown, Group: cvm) on System lxsfrac03
2008/06/02 10:55:19 VCS INFO V-16-1-10298 Resource vxfsckd (Owner: unknown, Group: cvm) is online on lxsfrac03 (VCS initiated)
2008/06/02 10:55:19 VCS NOTICE V-16-1-10447 Group cvm is online on system lxsfrac03
2008/06/02 10:55:19 VCS INFO V-16-10001-15051 (lxsfrac03) triggers:???:nfs_restart:Trigger does not do anything as there is no NFS/NFSLock/Share resource in the group
2008/06/02 10:55:19 VCS INFO V-16-6-15002 (lxsfrac03) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_restart cvm successfully
2008/06/02 10:55:19 VCS INFO V-16-6-15004 (lxsfrac03) hatrigger:Failed to send trigger for postonline; script doesn't exist
2008/06/02 10:55:35 VCS INFO V-16-10001-1003 (lxsfrac04) CVMCluster:cvm_clus:online:CVMCluster role is - mode: enabled: cluster active - SLAVE master: lxsfrac03
2008/06/02 10:55:37 VCS INFO V-16-1-10298 Resource cvm_clus (Owner: unknown, Group: cvm) is online on lxsfrac04 (VCS initiated)
2008/06/02 10:55:37 VCS NOTICE V-16-1-10301 Initiating Online of Resource vxfsckd (Owner: unknown, Group: cvm) on System lxsfrac04
2008/06/02 10:55:39 VCS INFO V-16-1-10298 Resource vxfsckd (Owner: unknown, Group: cvm) is online on lxsfrac04 (VCS initiated)
2008/06/02 10:55:39 VCS NOTICE V-16-1-10447 Group cvm is online on system lxsfrac04
2008/06/02 10:55:39 VCS INFO V-16-10001-15051 (lxsfrac04) triggers:???:nfs_restart:Trigger does not do anything as there is no NFS/NFSLock/Share resource in the group
2008/06/02 10:55:39 VCS INFO V-16-6-15002 (lxsfrac04) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_restart cvm successfully
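Summary:
This case taught me how the stale state works and how to resolve it. It was also a reminder to use hastatus more often to check the state of VCS.
A lot of these things work on the same principle. In DNS, each zone has its own text file (the DNS "database file"), and those are the files we edit when we change the configuration. But what actually takes effect and answers user queries is not the text file itself; it is the content loaded from that file into memory.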