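Today, while configuring a two-node MC/ServiceGuard cluster on HP-UX, I hit a fault I had never seen before. It failed at the very first step, generating the cluster configuration template cluster.ascii: the cmquerycl command could not collect node information and reported the following: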
[STORM1@/etc/cmcluster]#cmquerycl -n STORM1 -n STORM2 -v -C /etc/cmcluster/cluster.ascii
Warning: Unable to determine local domain name for STORM1
Looking for other clusters ... Done
Gathering storage information
Error 231 (Software caused connection abort) performing security validation. Please verify that identd is running properly.
Unable to connect to node STORM2: Software caused connection abort
Found 27 devices on node STORM1
Analysis of 27 devices should take approximately 6 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Found 4 volume groups on node STORM1
Analysis of 4 volume groups should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Note: Disks were discovered which are not in use by either LVM or VxVM.
Use pvcreate(1M) to initialize a disk for LVM or,
use vxdiskadm(1M) to initialize a disk for VxVM.
Gathering network information
Beginning network probing
Not probing node STORM2 as it is currently unreachable.
This may cause network partitions to be reported.
Completed network probing
Failed to gather configuration information.
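With -v the cmquerycl log gets long, and the one line that matters is easy to miss. A minimal sketch for pulling it out of saved output; the helper name unreachable_nodes is hypothetical (not part of ServiceGuard), and the pattern simply matches the "Unable to connect to node NAME:" wording shown in the log above, which other releases may phrase differently.

```shell
#!/bin/sh
# unreachable_nodes (hypothetical helper): read cmquerycl -v output on stdin
# and print, once each, every node named in a line of the form
#   Unable to connect to node NAME: <reason>
# ASSUMPTION: message wording as in the log above; adjust the pattern if yours differs.
unreachable_nodes() {
    sed -n 's/^Unable to connect to node \([^:]*\):.*/\1/p' | sort -u
}
```

For example, piping `cmquerycl -n STORM1 -n STORM2 -v -C /etc/cmcluster/cluster.ascii 2>&1` into unreachable_nodes would print STORM2 for the run above, which tells you which node to inspect.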
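After this error appeared, I checked /etc/hosts, the trust configuration between the two nodes, and whether the MC/ServiceGuard versions on both nodes matched; all of them were fine. In the end the cause turned out to be a problem with one kernel module. The kcmodule command can be used to query the current state of each kernel module, among other information.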
HP's official description of the kcmodule command reads as follows:
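The kcmodule command queries and changes the states of kernel modules in the currently running configuration or in a saved configuration. The HP-UX kernel is built from a number of modules; each module is a device driver, kernel subsystem, or other body of kernel code. A typical kernel has 200 to 300 modules.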
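When run without options, kcmodule displays the modules on the system, their current states, and their states at next boot. On a typical system, many modules are static; some are unused (usually device drivers for hardware not installed on the system); and a few are loaded.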
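Since it was the Cause column that gave the faulty module away, a quick filter saves reading through a couple of hundred lines. A minimal sketch, assuming whitespace-separated Module/State/Cause/Notes columns like the report printed by kcmodule rng=loaded further down; the helper name error_modules is hypothetical, and the exact layout of the plain kcmodule listing may differ on your release.

```shell
#!/bin/sh
# error_modules (hypothetical helper): read kcmodule output on stdin and
# print the first field (the module name) of every line after the header
# that contains a field exactly equal to "error".
# ASSUMPTION: columns are Module State Cause Notes separated by whitespace.
error_modules() {
    awk 'NR > 1 {
        for (i = 2; i <= NF; i++)
            if ($i == "error") { print $1; break }
    }'
}
```

On the suspect node (STORM2 here), kcmodule | error_modules would list candidates such as rng; treat an empty result as "check the listing by eye", not as proof that every module is healthy.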
The error message "Unable to connect to node STORM2: Software caused connection abort" shows that node STORM2 could not be reached. The fix using kcmodule went as follows:
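1. On node STORM2, run kcmodule to check the state of each kernel module.
2. A careful look at the module states showed that the rng module was in an abnormal state (unused, with cause "error").
3. Repair the rng kernel module by loading it: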
[STORM2@/]#kcmodule rng=loaded
WARNING: The automatic 'backup' configuration currently contains the
configuration that was in use before the last reboot of this
system.
==> Do you wish to update it to contain the current configuration
before making the requested change? y
* The automatic 'backup' configuration has been updated.
* The requested changes have been applied to the currently
running system.
Module State Cause Notes
rng (before) unused error loadable, unloadable
(now) loaded explicit
4. After the fix, run "cmquerycl -n STORM1 -n STORM2 -v -C /etc/cmcluster/cluster.ascii" again on the primary node STORM1. This time it completed without any errors.
I did not dig into what the rng module actually does; the aim of this post is simply to show how to approach similar faults when you run into them.