ChinaUnix blog
Category: System administration

2008-09-01 16:09:20

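This morning I logged in to thoribm78 and thoribm79 and found that none of the 8 ports had come up:

thoribm78# gabconfig -a
GAB Port Memberships

thoribm78# lltconfig -a list

lltconfig printed nothing either, which means LLT had not started. I half suspected the new version, since not a single port came up after the reboot.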

thoribm79# gabconfig -a
GAB Port Memberships
===============================================================

thoribm78# date
Sat Aug 23 18:29:05 PDT 2008

Checking the VCS engine log:

thoribm78# tail -f engine_A.log
2008/08/23 08:11:36 VCS NOTICE V-16-1-11027 VCS engine startup arguments=-restart
2008/08/23 08:11:36 VCS NOTICE V-16-1-11050 VCS engine version=5.0
2008/08/23 08:11:36 VCS NOTICE V-16-1-11051 VCS engine join version=5.0.30.0
2008/08/23 08:11:36 VCS NOTICE V-16-1-11052 VCS engine pstamp=Veritas-5.0MP3-07/20/08-09:29:00
2008/08/23 08:11:36 VCS NOTICE V-16-1-10114 Opening GAB library
2008/08/23 08:11:36 VCS NOTICE V-16-1-10619 'HAD' starting on: thoribm78
2008/08/23 08:11:36 VCS INFO V-16-1-10196 Cluster logger started
2008/08/23 08:11:36 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2008/08/23 08:11:36 VCS ERROR V-16-1-10116 GabHandle::open failed errno = 161
2008/08/23 08:11:36 VCS ERROR V-16-1-11033 GAB open failed. Exiting

thoribm79# tail -f engine_A.log
2008/08/23 08:01:54 VCS NOTICE V-16-1-10322 System thoribm79 (Node '1') changed state from EXITING to EXITED
2008/08/23 08:09:07 VCS INFO V-16-1-10196 Cluster logger started
2008/08/23 08:09:07 VCS NOTICE V-16-1-11022 VCS engine (had) started
2008/08/23 08:09:07 VCS NOTICE V-16-1-11050 VCS engine version=5.0
2008/08/23 08:09:07 VCS NOTICE V-16-1-11051 VCS engine join version=5.0.30.0
2008/08/23 08:09:07 VCS NOTICE V-16-1-11052 VCS engine pstamp=Veritas-5.0MP3-07/20/08-09:29:00
2008/08/23 08:09:07 VCS NOTICE V-16-1-10114 Opening GAB library
2008/08/23 08:09:11 VCS NOTICE V-16-1-10619 'HAD' starting on: thoribm79
2008/08/23 08:09:14 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2008/08/23 08:09:28 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding
V-16-1-11033 GAB open failed. Exiting

Presumably VCS (HAD) tried to open GAB, waited until the timeout had passed without GAB coming up, and had no choice but to exit.
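The two logs above show two different symptoms: an immediate GAB open failure on thoribm78, and a cluster-membership (seeding) timeout on thoribm79. A minimal sketch for pulling just those lines out of an engine log is below. gab_failures is a hypothetical helper, not a VCS tool; it assumes the timestamped line layout shown above and only looks for the three message IDs that appear in these excerpts. Lines without a timestamp, like the last line of the thoribm79 excerpt, are skipped.

```python
import re

# Hypothetical helper (not part of VCS). Message IDs seen in this post's logs:
#   V-16-1-10116  GabHandle::open failed
#   V-16-1-11033  GAB open failed. Exiting
#   V-16-1-11306  Did not receive cluster membership (manual seeding may be needed)
GAB_FAILURE_IDS = {"V-16-1-10116", "V-16-1-11033", "V-16-1-11306"}

# Layout of the timestamped lines above, e.g.
# "2008/08/23 08:11:36 VCS ERROR V-16-1-11033 GAB open failed. Exiting"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS (?P<level>[A-Z]+) "
    r"(?P<msgid>V-\d+-\d+-\d+) (?P<text>.*)$"
)


def gab_failures(lines):
    """Return (timestamp, level, message_id, text) for each GAB-related failure line."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("msgid") in GAB_FAILURE_IDS:
            hits.append((m.group("ts"), m.group("level"), m.group("msgid"), m.group("text")))
    return hits


if __name__ == "__main__":
    # Three lines copied from the thoribm78 log above.
    sample = [
        "2008/08/23 08:11:36 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms",
        "2008/08/23 08:11:36 VCS ERROR V-16-1-10116 GabHandle::open failed errno = 161",
        "2008/08/23 08:11:36 VCS ERROR V-16-1-11033 GAB open failed. Exiting",
    ]
    for ts, level, msgid, text in gab_failures(sample):
        print(ts, level, msgid, text)
```

On a live node the function can be given an open file instead of a list; /var/VRTSvcs/log is the usual location of engine_A.log, but confirm it on your installation.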

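Searching for the keyword "V-16-1-11033 GAB open failed. Exiting", I found an article describing a case where the LLT and GAB startup scripts had gone missing, so LLT and GAB could not start: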

GAB ERROR V-15-1-20109: Port h registration failed, GAB not configured

Details:
On reboot, LLT and GAB don't get started.

We see the following errors in the messages file at boot:
May 8 10:37:25 emarssu2 Had[2611]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11034 Registering for cluster membership
May 8 10:37:25 emarssu2 gab: [ID 243033 kern.notice] GAB ERROR V-15-1-20109 Port h registration failed, GAB not configured
May 8 10:37:25 emarssu2 Had[2611]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10116 GabHandle::open failed errno = 261
May 8 10:37:25 emarssu2 Had[2611]: [ID 702911 daemon.notice] VCS ERROR V-16-1-11033 GAB open failed. Exiting
May 8 10:37:29 emarssu2 llt: [ID 296307 kern.notice] LLT Protocol unavailable
May 8 10:37:29 emarssu2 gab: [ID 983455 kern.notice] GAB INFO V-15-1-20022 GAB unavailable

On Solaris 10, there should be two files each for LLT and GAB in /etc/rc2.d:
# ls -l *llt
-rwxr--r-- 2 root sys 1539 Sep 29 2005 S70llt
-rwxr--r-- 1 root sys 2073 Apr 5 2004 X70llt
# ls -l *gab
-rwxr--r-- 3 root sys 1979 Nov 10 2005 S92gab
-rwxr--r-- 1 root sys 2346 Dec 10 2003 X92gab

If they are missing, LLT and GAB may not start up.
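The technote's check can be automated against a directory listing. A minimal sketch, assuming the Solaris 10 script names from the technote (S70llt, S92gab); missing_start_scripts is a hypothetical helper. Note that the rc framework only runs scripts whose names start with S or K, so an X70llt or X92gab copy does not count. The hosts in this post appear to run AIX, where the directory is different, which is why the listing is passed in rather than hard-coded.

```python
import os

# Start scripts listed in the quoted technote for Solaris 10 (/etc/rc2.d).
# Only names beginning with S (start) or K (kill) are run at boot, so a copy
# named X70llt or X92gab does not start anything.
REQUIRED_START_SCRIPTS = ("S70llt", "S92gab")


def missing_start_scripts(entries, required=REQUIRED_START_SCRIPTS):
    """Hypothetical helper: return required start scripts absent from a listing."""
    present = set(entries)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    rc_dir = "/etc/rc2.d"  # Solaris 10 path from the technote; other platforms differ
    if os.path.isdir(rc_dir):
        missing = missing_start_scripts(os.listdir(rc_dir))
        print("missing start scripts:", ", ".join(missing) if missing else "none")
    else:
        print(rc_dir, "does not exist on this host")
```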


For some unknown reason, my VCS engine startup argument is no longer -onenode. I can set it manually with the command "hastart -onenode", but I want to know where to change the argument to -onenode so that the VCS engine starts up with it automatically. My platform is Solaris 9.

Many thanks.
2006/07/28 23:38:31 VCS NOTICE V-16-1-11050 VCS engine version=4.0
2006/07/28 23:38:31 VCS NOTICE V-16-1-11051 VCS engine join version=4.1
2006/07/28 23:38:31 VCS NOTICE V-16-1-11052 VCS engine pstamp=4.0 07/23/04-15:56:00
2006/07/28 23:38:31 VCS NOTICE V-16-1-10114 Opening GAB library
2006/07/28 23:38:31 VCS NOTICE V-16-1-10619 'HAD' starting on: ztenms01
2006/07/28 23:38:32 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2006/07/28 23:38:32 VCS ERROR V-16-1-10116 GabHandle::open failed errno = 261
2006/07/28 23:38:32 VCS ERROR V-16-1-11033 GAB open failed. Exiting
2006/07/29 01:55:33 VCS NOTICE V-16-1-11022 VCS engine (had) started
2006/07/29 01:55:33 VCS NOTICE V-16-1-11027 VCS engine startup arguments=-onenode

2006/07/29 01:56:10 VCS NOTICE V-16-1-11050 VCS engine version=4.0
2006/07/29 01:56:10 VCS NOTICE V-16-1-11051 VCS engine join version=4.1
2006/07/29 01:56:10 VCS NOTICE V-16-1-11052 VCS engine pstamp=4.0 07/23/04-15:56:00
2006/07/29 01:56:22 VCS NOTICE V-16-1-10115 Using GABSIM

Fix GAB to start

If there is no GAB, VCS cannot communicate with any other node and can therefore only start as a single node.

OK, so first fix /etc/llttab and /etc/llthosts (both machines must have the same links and the same cluster ID, and the hosts file must know about the other node), then set /etc/gabtab to something like:
/sbin/gabconfig -c -n2

This makes sure that GAB only seeds once there are 2 nodes in the cluster.

To check whether LLT has started properly: lltstat -vnv
To check whether GAB has started: gabconfig -a
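The reply's rules (same links and cluster ID on both nodes, llthosts listing both nodes) plus the gabtab seed count can be checked mechanically. A minimal sketch under stated assumptions: two nodes, set-node given as a host name (as in the llttab later in this post), and "same links" read as "same link tags". check_two_nodes and its parsers are hypothetical, not VCS tools.

```python
import re

# Hypothetical checker (not a VCS tool) for the reply's rules of thumb.
# Assumes set-node uses a host name, as in the llttab shown in this post.


def parse_llttab(text):
    """Return (node, cluster_id, link_tags) parsed from /etc/llttab text."""
    node = cluster = None
    tags = set()
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "set-node" and len(fields) > 1:
            node = fields[1]
        elif fields[0] == "set-cluster" and len(fields) > 1:
            cluster = fields[1]
        elif fields[0] == "link" and len(fields) > 1:
            tags.add(fields[1])
    return node, cluster, tags


def parse_llthosts(text):
    """Return {node_id: hostname} parsed from /etc/llthosts text."""
    hosts = {}
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) >= 2 and fields[0].isdigit():
            hosts[int(fields[0])] = fields[1]
    return hosts


def check_two_nodes(llttab_a, llttab_b, llthosts, gabtab):
    """Return a list of problems found across two nodes' LLT/GAB files."""
    problems = []
    node_a, cluster_a, tags_a = parse_llttab(llttab_a)
    node_b, cluster_b, tags_b = parse_llttab(llttab_b)
    names = set(parse_llthosts(llthosts).values())
    if cluster_a != cluster_b:
        problems.append(f"cluster IDs differ: {cluster_a} vs {cluster_b}")
    if tags_a != tags_b:
        problems.append(f"link tags differ: {sorted(tags_a)} vs {sorted(tags_b)}")
    if node_a == node_b:
        problems.append(f"both nodes claim set-node {node_a}")
    for node in dict.fromkeys((node_a, node_b)):  # de-duplicated, order kept
        if node not in names:
            problems.append(f"{node} is not listed in llthosts")
    seed = re.search(r"-n\s*(\d+)", gabtab)  # e.g. "/sbin/gabconfig -c -n2"
    if seed is None:
        problems.append("gabtab has no -n seed count")
    elif int(seed.group(1)) != len(names):
        problems.append(f"gabtab seeds at {seed.group(1)} nodes but llthosts lists {len(names)}")
    return problems
```

Given the same llttab for both nodes (an unedited copy), it reports `both nodes claim set-node thoribm79` for the thoribm79 file.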




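I immediately checked the startup scripts on both machines; they were all there. Then it occurred to me to check the LLT and GAB configuration files.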

thoribm78# ls -rlt llt*
-rw-r--r--    1 root     system          110 Mar 24 09:16 llttab.bak
-rw-rw-r--    1 root     system          102 Aug 21 00:35 llt.conf

thoribm78# ls -rlt *tab
-rw-r--r--    1 root     system         2447 Dec 11 2007  bootptab
-rw-r--r--    1 root     system         4069 Aug 21 00:45 hasnaptab
-rw-r--r--    1 root     system          209 Aug 21 00:48 vrtswebtab
-rw-r--r--    1 root     system           23 Aug 21 02:31 vcsmmtab
-rw-r--r--    1 root     system          108 Aug 21 02:31 org.llttab
-rw-r--r--    1 root     system           23 Aug 21 02:31 gabtab
-rw-rw-r--    1 oracle   oinstall        697 Aug 22 01:36 oratab
-rw-r--r--    1 root     system         3674 Aug 23 08:04 inittab
-rw-r--r--    1 root     system           14 Aug 23 08:05 rmtab
-rw-r--r--    1 root     system           37 Aug 23 08:06 xtab

thoribm78# ls -rlt gab*
-rw-r--r--    1 root     system           23 Aug 21 02:31 gabtab

thoribm79# cd /etc/
thoribm79# ls -rlt llt*
-rw-rw-r--    1 root     system          102 Aug 21 00:35 llt.conf
-rw-r--r--    1 root     system           25 Aug 21 02:31 llthosts
-rw-r--r--    1 root     system          108 Aug 21 02:31 llttab

thoribm79# more llthosts
0 thoribm78
1 thoribm79

thoribm79# ls -rlt *tab
-rw-r--r--    1 root     system         2447 Dec 11 2007  bootptab
-rw-r--r--    1 root     system         4069 Aug 21 00:45 hasnaptab
-rw-r--r--    1 root     system          209 Aug 21 00:48 vrtswebtab
-rw-r--r--    1 root     system           23 Aug 21 02:31 vcsmmtab
-rw-r--r--    1 root     system           23 Aug 21 02:31 gabtab
-rw-r--r--    1 root     system          108 Aug 21 02:31 llttab
-rw-rw-r--    1 oracle   oinstall        697 Aug 22 01:36 oratab
-rw-r--r--    1 root     system         3853 Aug 23 08:06 inittab
thoribm79#

thoribm79# ls -rlt gab*
-rw-r--r--    1 root     system           23 Aug 21 02:31 gabtab

thoribm79# more llttab
set-node thoribm79
set-cluster 666
link en2 /dev/dlpi/en:2 - ether - -
link en3 /dev/dlpi/en:3 - ether - -

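That explained it: on thoribm78 the LLT configuration files llthosts and llttab were both gone, so LLT could not start. Since LLT and GAB are what VCS runs on, losing these files meant LLT and GAB could not start anywhere in the cluster.

With llthosts missing, the system could not tell which hosts make up the cluster, which is why lltconfig -a list on thoribm78 produced no output at all.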
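The comparison done by eye above (which LLT/GAB files exist on one node but not the other) can be sketched as a set difference over the two /etc listings. A minimal sketch, assuming the files of interest are the ones whose names begin with llt or gab; llt_gab_names and missing_on_node are hypothetical helpers, and how you collect the peer node's listing is left open. LLT reads /etc/llttab, so copies such as llttab.bak or org.llttab do not help.

```python
def llt_gab_names(names):
    """Hypothetical helper: keep /etc entries whose names start with llt or gab."""
    return {n for n in names if n.startswith(("llt", "gab"))}


def missing_on_node(node_names, reference_names):
    """Return LLT/GAB files present on the reference node but absent on this node."""
    return sorted(llt_gab_names(reference_names) - llt_gab_names(node_names))


if __name__ == "__main__":
    # /etc entries taken from the ls listings above.
    thoribm78 = ["llttab.bak", "llt.conf", "bootptab", "vcsmmtab", "org.llttab", "gabtab"]
    thoribm79 = ["llt.conf", "llthosts", "llttab", "bootptab", "vcsmmtab", "gabtab"]
    print(missing_on_node(thoribm78, thoribm79))
```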


thoribm79# rcp llttab llthosts thoribm78:/etc/

Then reboot both nodes.

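After the reboot the 8 ports still had not started. lltconfig -a list now produced output, but both private NICs on thoribm78 showed a state of down; lltstat -vnv made this even clearer. At first I thought the NICs on thoribm78 were faulty, but that seemed unlikely.

I checked the LLT and GAB configuration files again: the llttab copied straight from thoribm79 needed editing, since it still contained set-node thoribm79. After correcting it and rebooting, everything came up normally.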
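The post does not say exactly which line was changed, but since the copied file still names thoribm79 in set-node, that is the line that has to point at the local host. A minimal sketch of that edit, assuming the NIC names (en2, en3) are the same on both nodes as in this cluster; if they differ, the link lines need editing too. retarget_llttab is a hypothetical helper.

```python
def retarget_llttab(text, new_node):
    """Hypothetical helper: return llttab text whose set-node names new_node.

    Every other line (set-cluster, link, comments) is kept unchanged, because
    the cluster ID and links are meant to be the same on both nodes.
    """
    out = []
    replaced = False
    for line in text.splitlines(keepends=True):
        fields = line.split()
        if fields and fields[0] == "set-node":
            ending = "\n" if line.endswith("\n") else ""
            out.append(f"set-node {new_node}{ending}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("llttab has no set-node line")
    return "".join(out)


if __name__ == "__main__":
    copied = (  # /etc/llttab as copied from thoribm79 (shown above)
        "set-node thoribm79\n"
        "set-cluster 666\n"
        "link en2 /dev/dlpi/en:2 - ether - -\n"
        "link en3 /dev/dlpi/en:3 - ether - -\n"
    )
    print(retarget_llttab(copied, "thoribm78"), end="")
```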


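Postscript: LLT and GAB are the foundation of VCS and are the first to start. If the log shows that even GAB has not started, the other 6 ports certainly cannot start either. If neither GAB nor LLT can start, first check that the LLT and GAB startup scripts and configuration files exist and are correct.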
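The order of checks in the postscript (LLT first, then GAB's own port a, then the remaining ports) can be written as a small decision function. A sketch under assumptions: gab_ports and triage are hypothetical; the parser assumes the usual gabconfig -a line layout "Port <letter> gen <generation> membership <node ids>" (this post only shows the header, because no ports were up); and the caller decides whether LLT is configured, for example from whether lltconfig -a list prints anything, as the author did.

```python
import re

# Hypothetical triage helper (not a VCS tool). Assumes the usual `gabconfig -a`
# line layout: "Port <letter> gen <generation> membership <node ids>".
PORT_LINE = re.compile(r"^Port\s+(\w)\s+gen\s+\w+\s+membership\s+(\S+)")


def gab_ports(output):
    """Return {port_letter: membership} for every port listed with a membership."""
    ports = {}
    for line in output.splitlines():
        m = PORT_LINE.match(line.strip())
        if m:
            ports[m.group(1)] = m.group(2)
    return ports


def triage(gab_output, llt_configured):
    """Suggest what to check first, following LLT -> GAB (port a) -> other ports."""
    if not llt_configured:
        return "LLT is not configured: check the LLT start script, /etc/llttab and /etc/llthosts"
    ports = gab_ports(gab_output)
    if "a" not in ports:  # port a is GAB's own membership
        return "GAB port a has no membership: check the GAB start script and /etc/gabtab"
    return "GAB is up; ports with membership: " + " ".join(sorted(ports))
```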
