A Brief Account of Handling an MC Fault on HP-UX - penguinstorm - ChinaUnix Blog



2009-09-07 20:14:05

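Today, while configuring an MC (MC/ServiceGuard) two-node cluster on HP-UX, I ran into a fault I had never seen before. It failed at the very first step, generating the ASCII cluster configuration template cluster.ascii (the binary configuration is only created later, by cmapplyconf): the cmquerycl command could not collect information from the nodes and reported the following errors: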

[STORM1@/etc/cmcluster]#cmquerycl -n STORM1 -n STORM2 -v -C /etc/cmcluster/cluster.ascii
Warning: Unable to determine local domain name for STORM1
Looking for other clusters ... Done
Gathering storage information
Error 231 (Software caused connection abort) performing security validation. Please verify that identd is running properly.
Unable to connect to node STORM2: Software caused connection abort
Found 27 devices on node STORM1
Analysis of 27 devices should take approximately 6 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Found 4 volume groups on node STORM1
Analysis of 4 volume groups should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Note: Disks were discovered which are not in use by either LVM or VxVM.
Use pvcreate(1M) to initialize a disk for LVM or,
use vxdiskadm(1M) to initialize a disk for VxVM.
Gathering network information
Beginning network probing
Not probing node STORM2 as it is currently unreachable.
This may cause network partitions to be reported.
Completed network probing
Failed to gather configuration information.
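In a long `cmquerycl -v` run the decisive lines are easy to miss among the storage and network probing messages. As a hedged sketch, assuming the message wording matches the output above (other Serviceguard releases may phrase it differently), a small POSIX shell filter can pull out the nodes cmquerycl could not reach. The function name `unreachable_nodes` is hypothetical, not part of Serviceguard:

```sh
#!/bin/sh
# Hypothetical helper: list the nodes that cmquerycl reports as unreachable.
# Reads cmquerycl -v output on stdin. The two message patterns are copied
# from the failed run above; other releases may word them differently.
unreachable_nodes() {
    sed -n \
        -e 's/^Unable to connect to node \([^:]*\):.*/\1/p' \
        -e 's/^Not probing node \([^ ]*\) as it is currently unreachable.*/\1/p' |
    sort -u    # a node usually appears in both messages; print it once
}

# Example: feed the relevant lines of the failed run (prints STORM2 once)
printf '%s\n' \
    'Unable to connect to node STORM2: Software caused connection abort' \
    'Not probing node STORM2 as it is currently unreachable.' |
unreachable_nodes
```

On a live system you would pipe the real run through it, e.g. `cmquerycl -n STORM1 -n STORM2 -v -C /etc/cmcluster/cluster.ascii 2>&1 | unreachable_nodes`; the `2>&1` is there because it is not assumed which stream carries the error messages.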

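After this error appeared, I checked the /etc/hosts file, the trust relationship between the two nodes, and whether both nodes were running the same MC version; none of these was the problem. In the end the cause turned out to be a kernel module. The kcmodule command can be used to query the current state of each kernel module, among other information.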

Here is an excerpt from HP's official description of the kcmodule command:

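The kcmodule command queries and changes the states of kernel modules in the currently running configuration or in a saved configuration. The HP-UX kernel is built from many modules, each of which is a device driver, a kernel subsystem, or some other body of kernel code. A typical kernel contains 200-300 modules.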

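When run without any options, kcmodule displays the modules on the system, their current states, and the states they will have at the next boot. On a typical system, many modules are static; some are unused (usually device drivers for hardware not installed on the system); and a few are loaded.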

The error message "Unable to connect to node STORM2: Software caused connection abort" shows that node STORM2 could not be reached. The fault was repaired with kcmodule as follows.

1. On node STORM2, run kcmodule to view the state of each kernel module.

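2. A careful look at the module states showed that the rng module was abnormal: its state was unused and its cause was error (see the "before" row in the output of step 3).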

3. Repair the rng kernel module by setting it to loaded:

[STORM2@/]#kcmodule rng=loaded
WARNING: The automatic 'backup' configuration currently contains the
         configuration that was in use before the last reboot of this
         system.
     ==> Do you wish to update it to contain the current configuration
         before making the requested change? y
       * The automatic 'backup' configuration has been updated.
       * The requested changes have been applied to the currently
         running system.
Module            State   Cause     Notes
rng     (before) unused error     loadable, unloadable
        (now)     loaded explicit

4. After the repair, run "cmquerycl -n STORM1 -n STORM2 -v -C /etc/cmcluster/cluster.ascii" on the primary node STORM1. This time the command completed without any errors.
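Step 2 above was done by eye. A minimal sketch that automates the scan is shown here, assuming the default kcmodule listing prints a header line followed by whitespace-separated Module, State, Cause and Notes columns (the layout of the "before" row in step 3 suggests this, but verify it on your own release). It prints every module whose Cause is error. The function name `error_modules` is hypothetical, and the sample module `mod_a` is invented for contrast:

```sh
#!/bin/sh
# Hypothetical helper: print kernel modules whose Cause column reads "error".
# Assumes kcmodule's default listing: one header line, then
# "Module State Cause Notes" columns separated by whitespace.
error_modules() {
    awk 'NR > 1 && $3 == "error" { print $1 }'
}

# Sample input: the rng line mirrors the "before" row in the post;
# mod_a is an invented module used only for contrast. Prints rng.
printf '%s\n' \
    'Module    State    Cause     Notes' \
    'mod_a     static   explicit' \
    'rng       unused   error     loadable, unloadable' |
error_modules
```

On STORM2 this would be used as `kcmodule | error_modules`, after which each flagged module can be checked and fixed as in step 3 (for example `kcmodule rng=loaded`). The script only reports and deliberately leaves the change to the administrator, since loading a module alters the running kernel.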

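I did not dig into what the rng kernel module actually does; the point of this post is simply to show how to approach a similar fault if you run into one.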
