2008-06-05 17:18:34

While testing a new build today, I found that after configuration and a reboot, only 5 GAB ports had come up for VCS.

root@lxsfrac04 # gabconfig -a

GAB Port Memberships

===============================================================

Port a gen   70a501 membership 01                             

Port b gen   70a507 membership 01                             

Port d gen   70a503 membership 01                             

Port h gen   70a506 membership 01                             

Port o gen   70a509 membership 01 

 

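This was caused by a step in the test case (tc) that modifies the VCS configuration file: the system ran haconf -makerw while doing so. I have hit this problem before; generally, when ports f, v, and w fail to start, the cause is related to VCS.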
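A minimal sketch of automating the port check, assuming the full port set is the one shown in the healthy `gabconfig -a` output at the end of this post (a b d f h o v w). The function name `missing_gab_ports` is hypothetical; it only parses text on stdin and does not call any VCS command itself.

```shell
# Report which expected GAB ports are absent from `gabconfig -a` output.
# Usage (on a cluster node): gabconfig -a | missing_gab_ports "a b d f h o v w"
# The expected port list is an assumption taken from the healthy output in this post.
missing_gab_ports() {
    expected="$1"
    # Collect the port letters present on stdin ("Port x gen ..." lines).
    present=$(awk '$1 == "Port" { print $2 }')
    missing=""
    for p in $expected; do
        case " $(echo $present) " in
            *" $p "*) ;;                      # port is up
            *) missing="$missing $p" ;;       # port not listed
        esac
    done
    # Trim the leading space before printing.
    echo "${missing# }"
}
```

Fed the first `gabconfig -a` output above, this would list f, v, and w, the ports the post associates with VCS.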

First, check the VCS log.

lxsfrac04# tail -f /var/VRTSvcs/log/engine_A.log

 

…………………………………………………………………………………………….

2008/06/02 10:29:43 VCS NOTICE V-16-1-10114 Opening GAB library

2008/06/02 10:29:43 VCS NOTICE V-16-1-10619 'HAD' starting on: lxsfrac04

2008/06/02 10:29:43 VCS ERROR V-16-1-10624 Local cluster configuration stale

2008/06/02 10:29:43 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms

2008/06/02 10:29:47 VCS INFO V-16-1-10077 Received new cluster membership

2008/06/02 10:29:47 VCS NOTICE V-16-1-10080 System (lxsfrac04) - Membership: 0x3, Jeopardy: 0x0

2008/06/02 10:29:47 VCS NOTICE V-16-1-10322 System  (Node '1') changed state from UNKNOWN to INITING

2008/06/02 10:29:47 VCS NOTICE V-16-1-10086 System lxsfrac04 (Node '0') is in Regular Membership - Membership: 0x3

2008/06/02 10:29:47 VCS NOTICE V-16-1-10086 System  (Node '1') is in Regular Membership - Membership: 0x3

2008/06/02 10:29:47 VCS NOTICE V-16-1-10453 Node: 1 changed name from: '' to: 'lxsfrac03'

2008/06/02 10:29:47 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from INITING to STALE_ADMIN_WAIT

2008/06/02 10:29:47 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from STALE_DISCOVER_WAIT to STALE_ADMIN_WAIT

2008/06/02 10:37:01 VCS NOTICE V-16-1-11022 VCS engine (had) started

2008/06/02 10:37:01 VCS NOTICE V-16-1-11050 VCS engine version=4.1

2008/06/02 10:37:01 VCS NOTICE V-16-1-11051 VCS engine join version=4.1001

2008/06/02 10:37:01 VCS NOTICE V-16-1-11052 VCS engine pstamp=4.1 03/15/06-20:13:00

2008/06/02 10:37:01 VCS NOTICE V-16-1-10114 Opening GAB library

2008/06/02 10:37:04 VCS NOTICE V-16-1-10619 'HAD' starting on: lxsfrac04

2008/06/02 10:37:06 VCS ERROR V-16-1-10624 Local cluster configuration stale

2008/06/02 10:37:06 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms

2008/06/02 10:37:10 VCS INFO V-16-1-10077 Received new cluster membership

2008/06/02 10:37:10 VCS NOTICE V-16-1-10080 System (lxsfrac04) - Membership: 0x1, Jeopardy: 0x2

2008/06/02 10:37:10 VCS NOTICE V-16-1-10086 System lxsfrac04 (Node '0') is in Regular Membership - Membership: 0x1

2008/06/02 10:37:10 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from STALE_DISCOVER_WAIT to STALE_ADMIN_WAIT

2008/06/02 10:37:20 VCS INFO V-16-1-10077 Received new cluster membership

2008/06/02 10:37:20 VCS NOTICE V-16-1-10080 System (lxsfrac04) - Membership: 0x3, Jeopardy: 0x0

2008/06/02 10:37:20 VCS NOTICE V-16-1-10322 System  (Node '1') changed state from UNKNOWN to INITING

2008/06/02 10:37:20 VCS NOTICE V-16-1-10086 System  (Node '1') is in Regular Membership - Membership: 0x3

2008/06/02 10:37:20 VCS NOTICE V-16-1-10453 Node: 1 changed name from: '' to: 'lxsfrac03'

2008/06/02 10:37:20 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from INITING to STALE_DISCOVER_WAIT

2008/06/02 10:37:20 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from STALE_DISCOVER_WAIT to STALE_ADMIN_WAIT

2008/06/02 10:53:38 VCS ERROR V-16-1-10069 All systems have configuration files marked STALE.  Unable to form cluster.

2008/06/02 10:53:38 VCS INFO V-16-1-50135 User root fired command: MSG_CLUSTER_STOP_SYS from localhost

2008/06/02 10:53:38 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from STALE_ADMIN_WAIT to EXITED

2008/06/02 10:54:49 VCS NOTICE V-16-1-11022 VCS engine (had) started

2008/06/02 10:54:49 VCS NOTICE V-16-1-11050 VCS engine version=4.1

2008/06/02 10:54:49 VCS NOTICE V-16-1-11051 VCS engine join version=4.1001

2008/06/02 10:54:49 VCS NOTICE V-16-1-11052 VCS engine pstamp=4.1 03/15/06-20:13:00

2008/06/02 10:54:49 VCS NOTICE V-16-1-10114 Opening GAB library

2008/06/02 10:54:49 VCS NOTICE V-16-1-10619 'HAD' starting on: lxsfrac04

2008/06/02 10:54:49 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms

2008/06/02 10:54:54 VCS INFO V-16-1-10077 Received new cluster membership

2008/06/02 10:54:54 VCS NOTICE V-16-1-10080 System (lxsfrac04) - Membership: 0x3, Jeopardy: 0x0

2008/06/02 10:54:54 VCS NOTICE V-16-1-10322 System  (Node '1') changed state from UNKNOWN to INITING

2008/06/02 10:54:54 VCS NOTICE V-16-1-10086 System lxsfrac04 (Node '0') is in Regular Membership - Membership: 0x3

2008/06/02 10:54:54 VCS NOTICE V-16-1-10086 System  (Node '1') is in Regular Membership - Membership: 0x3

2008/06/02 10:54:54 VCS NOTICE V-16-1-10453 Node: 1 changed name from: '' to: 'lxsfrac03'

2008/06/02 10:54:54 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from INITING to STALE_ADMIN_WAIT

2008/06/02 10:54:54 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from CURRENT_DISCOVER_WAIT to LOCAL_BUILD

2008/06/02 10:54:54 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from STALE_ADMIN_WAIT to STALE_PEER_WAIT

2008/06/02 10:54:55 VCS WARNING V-16-1-10030 UseFence=NONE. Hence do not need fencing

2008/06/02 10:54:55 VCS NOTICE V-16-1-10322 System lxsfrac04 (Node '0') changed state from LOCAL_BUILD to RUNNING

2008/06/02 10:54:55 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from STALE_PEER_WAIT to REMOTE_BUILD

2008/06/02 10:54:55 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/CFSfsckd/CFSfsckdAgent for resource type CFSfsckd successfully started at Mon Jun  2 10:54:55 2008

2008/06/02 10:54:55 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/CVMCluster/CVMClusterAgent for resource type CVMCluster successfully started at Mon Jun  2 10:54:55 2008

2008/06/02 10:54:55 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/CVMVxconfigd/CVMVxconfigdAgent for resource type CVMVxconfigd successfully started at Mon Jun  2 10:54:55 2008

2008/06/02 10:54:55 VCS INFO V-16-1-10463 Sending snapshot to node: 1

2008/06/02 10:54:55 VCS NOTICE V-16-1-10322 System lxsfrac03 (Node '1') changed state from REMOTE_BUILD to RUNNING

2008/06/02 10:54:55 VCS ERROR V-16-10001-1005 (lxsfrac04) CVMCluster:???:monitor:node - state: out of cluster

2008/06/02 10:54:56 VCS INFO V-16-1-10304 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on lxsfrac04 (First probe)

2008/06/02 10:54:56 VCS INFO V-16-1-10304 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on lxsfrac04 (First probe)

2008/06/02 10:54:56 VCS INFO V-16-1-10297 Resource cvm_vxconfigd (Owner: unknown, Group: cvm) is online on lxsfrac04 (First probe)

2008/06/02 10:54:56 VCS NOTICE V-16-1-10438 Group cvm has been probed on system lxsfrac04

2008/06/02 10:54:56 VCS NOTICE V-16-1-10442 Initiating auto-start online of group cvm on system lxsfrac04

2008/06/02 10:54:56 VCS NOTICE V-16-1-10301 Initiating Online of Resource cvm_clus (Owner: unknown, Group: cvm) on System lxsfrac04

2008/06/02 10:54:56 VCS ERROR V-16-10001-1005 (lxsfrac03) CVMCluster:???:monitor:node - state: out of cluster
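The state changes are easy to lose in a log this long. As a rough sketch, the V-16-1-10322 lines can be reduced to one "time old -> new" line per transition for a single system. The function name `state_transitions` is hypothetical, and the field positions are an assumption based on the line layout shown in this excerpt.

```shell
# Summarize state transitions for one system from engine_A.log text on stdin.
# Usage: state_transitions lxsfrac04 < /var/VRTSvcs/log/engine_A.log
state_transitions() {
    sys="$1"
    awk -v sys="$sys" '
        # Layout assumed: date time VCS NOTICE V-16-1-10322 System NAME (Node N) changed state from A to B
        /V-16-1-10322/ && $7 == sys {
            print $2, $(NF-2), "->", $NF
        }'
}
```

Lines where the system name is still empty (the node has not yet announced its name) are skipped, because the seventh field is then not the requested name.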

Check the VCS status:

root@lxsfrac04 # hastatus -sum

 

-- SYSTEM STATE

-- System               State                Frozen             

 

A  lxsfrac03            STALE_ADMIN_WAIT     0                   

A  lxsfrac04            STALE_ADMIN_WAIT     0    

               

root@lxsfrac04 # hastatus     

attempting to connect....connected

 

group       resource        system                message            

------- --------------- ------------ ----------------------------------------

                          lxsfrac04    STALE ADMIN WAIT: all systems stale

                          lxsfrac03    STALE ADMIN WAIT: all systems stale

^C
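For a quick check, the system rows of `hastatus -sum` can be filtered down to the nodes that are not RUNNING. This is only a sketch: the function name `non_running_systems` is hypothetical, and it assumes system-state rows begin with "A" followed by the system name and state, as in the output above.

```shell
# List systems that are not RUNNING, reading `hastatus -sum` output on stdin.
# Usage (on a cluster node): hastatus -sum | non_running_systems
non_running_systems() {
    # System-state rows look like: A <system> <state> <frozen>
    awk '$1 == "A" && $3 != "RUNNING" { print $2 ": " $3 }'
}
```

An empty result would mean every listed system reports RUNNING.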

 

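The state at this point is stale, so I quickly reviewed the VCS documentation on stale configurations. I did not fully follow it, but the gist is this: while running, VCS keeps a copy of the configuration in shared memory; if the current main.cf does not match the in-memory configuration, the stale state occurs and a .stale file is created. That fits the cause above: haconf -makerw opens the configuration for writing and leaves the .stale marker in place until the configuration is saved and closed with haconf -dump -makero.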
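Checking for the marker file can be scripted. The sketch below assumes the configuration lives in /etc/VRTSvcs/conf/config, which is the usual VCS location but is not named in this post; pass a different directory if yours differs. The function name `config_stale_status` is hypothetical.

```shell
# Print "stale" if a .stale marker exists in the configuration directory, else "clean".
# The default path is an assumption; the post does not name the directory.
config_stale_status() {
    dir="${1:-/etc/VRTSvcs/conf/config}"
    if [ -e "$dir/.stale" ]; then
        echo "stale"
    else
        echo "clean"
    fi
}
```

Running this before a reboot or before stopping HAD would show whether a configuration was left open for writing.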

First I tried switching the configuration to read-only, which failed:

root@lxsfrac04 # haconf -dump -makero

VCS WARNING V-16-1-50129 Operation 'haconf -dump -makero' rejected as the node is in STALE_ADMIN_WAIT state

Stop VCS:

root@lxsfrac04 # hastop -all

Delete the .stale file:

root@lxsfrac04 # ls -alrt

total 240

………………………………………………………………………………………

-rw-------   2 root     root         495 Jun  1 23:19 CFSTypes.cf

-rw-------   1 root     root         941 Jun  1 23:19 main.cf

-rw-------   1 root     root           0 Jun  2 09:53 .stale

-rw-------   1 root     root         373 Jun  2 10:03 MultiPrivNIC.cf

-r--r--r--   1 root     sys          366 Jun  2 10:03 PrivNIC.cf_new

-rw-------   1 root     root         395 Jun  2 10:04 PrivNIC.cf

-rw-------   1 root     root        1013 Jun  2 10:28 main.cf_for_privNIC

-rw-------   1 root     root       71618 Jun  2 10:29 main.cmd

drwxr-xr-x   2 root     other       1024 Jun  2 10:37 .

………………………………………………………………………………………………

root@lxsfrac04 # rm -rf .stale

Restart VCS on each node:

root@lxsfrac04 # hastart

root@lxsfrac03 # hastart

root@lxsfrac04 # gabconfig -a

GAB Port Memberships

===============================================================

Port a gen   70a501 membership 01                             

Port b gen   70a507 membership 01                             

Port d gen   70a503 membership 01                              

Port f gen   70a512 membership 01                             

Port h gen   70a508 membership 01                             

Port o gen   70a509 membership 01                             

Port v gen   70a50e membership 01                              

Port w gen   70a510 membership 01 

 

Check the VCS log again:

lxsfrac04# tail -f /var/VRTSvcs/log/engine_A.log

 

2008/06/02 10:54:57 VCS INFO V-16-1-10297 Resource cvm_vxconfigd (Owner: unknown, Group: cvm) is online on lxsfrac03 (First probe)

2008/06/02 10:54:57 VCS INFO V-16-1-10304 Resource vxfsckd (Owner: unknown, Group: cvm) is offline on lxsfrac03 (First probe)

2008/06/02 10:54:57 VCS INFO V-16-1-10304 Resource cvm_clus (Owner: unknown, Group: cvm) is offline on lxsfrac03 (First probe)

2008/06/02 10:54:57 VCS NOTICE V-16-1-10438 Group cvm has been probed on system lxsfrac03

2008/06/02 10:54:57 VCS NOTICE V-16-1-10442 Initiating auto-start online of group cvm on system lxsfrac03

2008/06/02 10:54:57 VCS NOTICE V-16-1-10301 Initiating Online of Resource cvm_clus (Owner: unknown, Group: cvm) on  System lxsfrac03

2008/06/02 10:55:15 VCS INFO V-16-10001-1003 (lxsfrac03) CVMCluster:cvm_clus:online:CVMCluster role is - mode: enabled: cluster active - MASTER

master: lxsfrac03

2008/06/02 10:55:17 VCS INFO V-16-1-10298 Resource cvm_clus (Owner: unknown, Group: cvm) is online on lxsfrac03 (VCS initiated)

2008/06/02 10:55:17 VCS NOTICE V-16-1-10301 Initiating Online of Resource vxfsckd (Owner: unknown, Group: cvm) on System lxsfrac03

2008/06/02 10:55:19 VCS INFO V-16-1-10298 Resource vxfsckd (Owner: unknown, Group: cvm) is online on lxsfrac03 (VCS initiated)

2008/06/02 10:55:19 VCS NOTICE V-16-1-10447 Group cvm is online on system lxsfrac03

2008/06/02 10:55:19 VCS INFO V-16-10001-15051 (lxsfrac03) triggers:???:nfs_restart:Trigger does not do anything as there is no NFS/NFSLock/Share resource in the group

2008/06/02 10:55:19 VCS INFO V-16-6-15002 (lxsfrac03) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_restart cvm    successfully

2008/06/02 10:55:19 VCS INFO V-16-6-15004 (lxsfrac03) hatrigger:Failed to send trigger for postonline; script doesn't exist

2008/06/02 10:55:35 VCS INFO V-16-10001-1003 (lxsfrac04) CVMCluster:cvm_clus:online:CVMCluster role is - mode: enabled: cluster active - SLAVE master: lxsfrac03

2008/06/02 10:55:37 VCS INFO V-16-1-10298 Resource cvm_clus (Owner: unknown, Group: cvm) is online on lxsfrac04 (VCS initiated)

2008/06/02 10:55:37 VCS NOTICE V-16-1-10301 Initiating Online of Resource vxfsckd (Owner: unknown, Group: cvm) on System lxsfrac04

2008/06/02 10:55:39 VCS INFO V-16-1-10298 Resource vxfsckd (Owner: unknown, Group: cvm) is online on lxsfrac04 (VCS initiated)

2008/06/02 10:55:39 VCS NOTICE V-16-1-10447 Group cvm is online on system lxsfrac04

2008/06/02 10:55:39 VCS INFO V-16-10001-15051 (lxsfrac04) triggers:???:nfs_restart:Trigger does not do anything as  there is no NFS/NFSLock/Share resource in the group

2008/06/02 10:55:39 VCS INFO V-16-6-15002 (lxsfrac04) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_restart cvm    successfully

 

 

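Summary:

This case taught me how the stale state works and how to resolve it. It is also a reminder to use hastatus more often to check the VCS state.

Many systems work on the same principle. DNS has a text file for each zone (the DNS "database files"), and those files are what we edit when changing the configuration. What actually answers user queries, however, is not the text file but the content loaded from it into memory.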

 


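chinaunix user, 2009-06-30 11:04:21

Simply put, if the main.cf in system memory and the main.cf on disk do not match, VCS enters the stale state (for example, if you edit main.cf directly without stopping VCS). -- wangyl1977

wangdonsy, 2009-05-14 14:08:31

Where can I find material explaining stale?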