2010-06-12 17:27:15

Using gabconfig -a output to determine problems

gabconfig -a # Shows the state of the VCS resources required to implement clustering.

Each letter returned by gabconfig -a indicates that the corresponding resource is available on a particular node:

   a    gab driver
   b    I/O fencing (designed to guarantee data integrity)
   d    ODM (Oracle Disk Manager)
   f    CFS (Cluster File System)
   h    VCS (VERITAS Cluster Server: high availability daemon)
   o    VCSMM driver (kernel module needed for Oracle and VCS interface)
   q    QuickLog daemon
   v    CVM (Cluster Volume Manager)
   w    vxconfigd (module for cvm)
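As a quick sanity check, the port letters can be pulled out of saved gabconfig -a output with a one-liner. The sample output below is an assumed/illustrative format for a two-node cluster, not captured from a real system:

```shell
# List which GAB ports are currently up, from saved "gabconfig -a" output.
# The sample data below is illustrative, not from a real cluster.
cat > /tmp/gab.out <<'EOF'
GAB Port Memberships
===============================================================
Port a gen   a36e0003 membership 01
Port b gen   a36e0006 membership 01
Port h gen   fd570002 membership 01
EOF

# Print the port letter of every membership line
awk '/^Port/ { print $2 }' /tmp/gab.out
```

Comparing the printed letters against the table above shows at a glance which services are missing on that node.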


With regard to the GAB driver (Port a)

The /etc/gabtab file contains the number of nodes defined in the cluster. During an initial build, the cluster will not fully start until all nodes are seen. /etc/gabtab has the following format:

 /sbin/gabconfig -c -n2

Where -n2 specifies that 2 nodes are required to "seed" the cluster. That number should reflect the actual number of nodes in the cluster. Once that many nodes are seen, the "Port a" membership is established. Running gabconfig -a | grep "Port a" will show the current membership ID and member count for Port a. This check is in place to prevent split-brain conditions and the data corruption that would result if the cluster started as two or more mini-clusters, each bringing up the same resources.
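A rough way to compare the seed threshold in /etc/gabtab against the current Port a membership is sketched below. Both input strings are canned sample data, and counting digits in the membership field is an assumption that only holds for clusters of fewer than 10 nodes:

```shell
# Extract the seed threshold (-nN) from a gabtab line (sample data)
gabtab='/sbin/gabconfig -c -n2'
needed=$(expr "$gabtab" : '.*-n\([0-9]*\)')

# Canned "Port a" line; the membership field lists one digit per member
# node (nodes 0 and 1 here), so its length is the current member count
porta='Port a gen   a36e0003 membership 01'
seen=$(echo "$porta" | awk '{ print length($NF) }')

echo "need=$needed seen=$seen"
```

When seen equals needed, the cluster has seeded; a smaller seen value means GAB is still waiting for nodes.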

If you are certain that no split-brain condition is happening, gabconfig -cx can be used to manually bypass the protection from pre-existing partitions.

IOFencing driver (Port b)

Port b/IOFencing is started as a result of the /etc/rc2.d/S97vxfen start script. It performs the following actions:

  • reads /etc/vxfendg to determine name of the diskgroup (DG) that contains the coordinator disks
  • parses "vxdisk -o alldgs list" output for list of disks in that DG
  • performs a "vxdisk list diskname" for each to determine all available paths to each coordinator disk
  • uses all paths to each disk in the DG to build a current /etc/vxfentab

The purpose of all this is that the IOFencing driver is simply trying to find the same shared disk on all nodes to use for the coordinator disk.
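The steps above can be sketched as a small script working from canned data. The disk names, the /tmp stand-in paths, and the vxdisk output format below are assumptions for illustration, not output from a real cluster:

```shell
# Sketch of the S97vxfen logic: find the coordinator DG, list its disks,
# and build a vxfentab-style file. All data here is canned sample input.
echo vxfencoorddg > /tmp/vxfendg              # stand-in for /etc/vxfendg
dg=$(cat /tmp/vxfendg)

# stand-in for "vxdisk -o alldgs list" output
cat > /tmp/alldgs.out <<'EOF'
c1t0d0s2  auto:cdsdisk  -  (vxfencoorddg)  online
c1t1d0s2  auto:cdsdisk  -  (vxfencoorddg)  online
c2t0d0s2  auto:cdsdisk  -  (oradg)         online
EOF

# keep only disks whose DG column matches the coordinator DG
awk -v dg="($dg)" '$4 == dg { print "/dev/rdsk/" $1 }' /tmp/alldgs.out \
    > /tmp/vxfentab
cat /tmp/vxfentab
```

The real start script additionally expands each disk into all of its paths via vxdisk list before writing /etc/vxfentab.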

Oracle Disk Manager/ODM (Port d)

This port is started by the commands in /etc/rc2.d/S92odm.

Cluster File System/CFS (Port f)

CFS can be reloaded if required, but much of VxFS must be unloaded to do so, and it is rarely necessary.

Veritas Cluster Server/VCS (Port h)

This is the cluster daemon itself.

CVM (ports v and w)

Cluster Volume Manager allows multiple disks to be mounted and shared on the Veritas cluster. You must have the IOFencing driver running before you can start CVM. You can check CVM status with the following commands:

  • gabconfig -a | egrep "Port v|Port w"
  • vxdctl -c mode
  • vxclustadm -v nodestate
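Before trusting the CVM state commands, it is worth confirming that both ports are up. A minimal check against canned gabconfig output (sample data, not a real cluster) might look like:

```shell
# Verify both CVM ports (v and w) appear in gabconfig output.
# gab_out is canned sample data standing in for "gabconfig -a".
gab_out='Port v gen   a36e0005 membership 01
Port w gen   a36e0007 membership 01'

for p in v w; do
    if echo "$gab_out" | grep -q "^Port $p "; then
        echo "Port $p up"
    else
        echo "Port $p DOWN"
    fi
done
```

If Port w is down while Port v is up, vxconfigd is the likely culprit rather than CVM itself.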

For debugging purposes, you can start CVM manually with the following command on each node:

  vxclustadm -m vcs -t gab startnode
  vxclustadm: initialization completed

All disk groups whose disks are marked with the shared flag should now be imported shared automatically. You can check their status with:

vxdg list

and look for "enabled,shared" in the result for each shared disk group.

To see if a disk has the shared flag, run:

 vxdisk -o alldgs list | grep shared

and

 vxdisk list DISKNAME 
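The "enabled,shared" check can be scripted against saved vxdg list output. The group names and column layout below are illustrative assumptions:

```shell
# Pick out shared disk groups from canned "vxdg list" output.
cat > /tmp/vxdg.out <<'EOF'
NAME         STATE                ID
rootdg       enabled              1023.1025.node1
oradg        enabled,shared,cds   1024.1026.node1
EOF

# a shared DG shows "shared" among its STATE flags
awk '$2 ~ /shared/ { print $1 }' /tmp/vxdg.out
```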

QuickLog daemon (Port q)

To reload the QuickLog daemon:

 # ps -ef| grep qlog
     root  2099     1  0 13:04:44 ?        0:00 /opt/VRTSvxfs/sbin/qlogckd
 # kill -9 2099
 # modinfo | grep qlog
 195 7821e000  17fc7 208   1  qlog (VxQLOG 3.5_REV-MP1f QuickLog dr)
 # modunload -i 195
 # /opt/VRTSvxfs/sbin/qlogckd

VCSMM (Port o)

VCSMM is required for RAC communications. It is loaded by /etc/rc2.d/S98vcsmm.

Changing cluster status

hagrp -online SERVICE_GROUP -sys SYSTEM # Bring a service group online on a particular system

hagrp -switch SERVICE_GROUP -to SYSTEM # Move a service group to a particular system

hagrp -autoenable SERVICE_GROUP # Enable a group that has been autodisabled.
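Before switching a group, it helps to know where it is currently online; hastatus -sum shows this. A sketch parsing canned output (the format is approximated and the group/node names are made up):

```shell
# Find where a service group is online, from canned "hastatus -sum" output.
cat > /tmp/hastatus.out <<'EOF'
-- GROUP STATE
-- Group         System   Probed  AutoDisabled  State
B  oragrp        node1    Y       N             ONLINE
B  oragrp        node2    Y       N             OFFLINE
EOF

# "B" rows are group-state lines; column 6 is the group state
awk '$1 == "B" && $6 == "ONLINE" { print $2 " is online on " $3 }' /tmp/hastatus.out
```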

Editing cluster configuration

/etc/VRTSvcs/conf/config/main.cf # The main configuration file for VCS.

  I usually copy the config directory elsewhere, then run hacf -verify .
  in that copy, then hacf -cftocmd . and hacf -cmdtocf . to rebuild the
  dependency mapping in main.cf. When it looks good, I put the new main.cf
  in place and activate it.

  Running only hacf -verify misses some problems in main.cf and does not
  rebuild the dependency tree diagram in the file.
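That workflow, written out as a command sequence (shown as a transcript with assumed paths; the hacf commands only exist on a VCS host, so this is not runnable elsewhere):

```shell
#   cp -r /etc/VRTSvcs/conf/config /var/tmp/config.work
#   cd /var/tmp/config.work
#   hacf -verify .        # syntax check of main.cf
#   hacf -cftocmd .       # main.cf -> command file
#   hacf -cmdtocf .       # command file -> main.cf with rebuilt dependency tree
#   # inspect the regenerated main.cf, then copy it back into
#   # /etc/VRTSvcs/conf/config and activate it
```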

tail /var/VRTSvcs/log/engine_A.log # The logging file

vxdctl -c mode # Determine current node status when using CVM

lltstat # will print output similar to the following to diagnose the low latency transport:

LLT statistics:
    15903      Snd data packets
    469        Snd retransmit data
    4384       Snd connect packets
    2999       Snd independent ACKs
    10355      Snd piggyback ACKs
    0          Snd independent NACKs
    0          Snd piggyback NACKs
    4138       Snd loopback packets
    15749      Rcv data packets
    586        Rcv out of window
    0          Rcv duplicates
    0          Rcv datagrams dropped
    0          Rcv multiblock data
    0          Rcv misaligned data
LLT errors:
    0          Rcv not connected
    0          Rcv unconfigured
    0          Rcv bad dest address
    0          Rcv bad source address
    0          Rcv bad generation
    0          Rcv no buffer
    0          Rcv malformed packet
    0          Rcv bad dest SAP
    0          Rcv bad STREAM primitive
    0          Rcv bad DLPI primitive
    0          Rcv DLPI error
    26         Snd not connected
    0          Snd no buffer
    0          Snd stream flow drops
    26         Snd no links up
    0          Rcv bad checksum

If you run an lltstat -nvv it will show a verbose status of each Low Latency Transport (LLT) interface. This can be used to check that each interface is plugged into the right destination. It shows what the node thinks its interface name is and what it thinks the remote interface names are. Running the command on all nodes will give a map of the overall LLT network.

Files

/etc/gabtab

/etc/llttab
