Fault description:
Today's check found the P595 HACMP primary node reporting the following errors:
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
F6FDF227 1106142808 T S tty0 FAILURE CAUSED AUTOMATIC RESET
F6FDF227 1106103708 T S tty0 FAILURE CAUSED AUTOMATIC RESET
The standby node reported:
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
864D2CE3 1106142708 P S topsvcs NIM thread blocked
864D2CE3 1106103608 P S topsvcs NIM thread blocked
See the attachment for details.
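The TIMESTAMP column of the errpt summary uses the format MMDDhhmmYY, so the entries above can be lined up in time. A minimal sketch of the decoding, assuming a POSIX shell; the helper name decode_errpt_ts is hypothetical and the century "20" is an assumption:

```shell
# Decode an errpt summary timestamp (MMDDhhmmYY) into "YYYY-MM-DD hh:mm".
# decode_errpt_ts is a hypothetical helper; the century "20" is an assumption.
decode_errpt_ts() {
    ts=$1
    # Reject anything that is not exactly ten digits.
    case $ts in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "invalid timestamp: $ts" >&2; return 1 ;;
    esac
    mm=$(echo "$ts" | cut -c1-2)
    dd=$(echo "$ts" | cut -c3-4)
    hh=$(echo "$ts" | cut -c5-6)
    mi=$(echo "$ts" | cut -c7-8)
    yy=$(echo "$ts" | cut -c9-10)
    echo "20${yy}-${mm}-${dd} ${hh}:${mi}"
}

decode_errpt_ts 1106142808   # primary node, tty0
decode_errpt_ts 1106142708   # standby node, topsvcs
```

Decoded this way, both topsvcs entries on the standby node (10:36 and 14:27 on 6 November 2008) were logged one minute before the corresponding tty0 resets on the primary node (10:37 and 14:28), provided the clocks of the two nodes agree.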
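Cause:
Solution: (provided by the IBM engineer; not implemented)
AIX second-level support analyzed the data and reached the following conclusions:
The NIM error is explained as follows; it is mainly related to system performance: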
An explanation of the TS_NIM_ERROR_STUCK is as follows:
Explanation: One of the threads in a NIM process was blocked.
Details: This entry indicates that a thread in one of the NIM
processes did not make progress and was possibly blocked for a
period of time. Depending on which of the threads was blocked and for
how long, the adapter corresponding to the NIM process may be
erroneously considered down.
The standard fields indicate that the NIM was blocked and present
possible causes and actions to prevent the problem from reoccurring.
The problem may have been caused by resource starvation at the
node, or possibly excessive I/O activity. The detailed fields show the
name of the thread which was blocked, the interval in seconds during
which the thread was blocked, and the interface name which is associated
with this instance of the NIM.
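Because the error has occurred once and the system is currently in a normal state, second-level support recommends observing the system first; no change is required for now.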
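One way to carry out that observation is to count how often the identifier recurs in the error log. The sketch below assumes a POSIX shell with awk and reads errpt summary output on stdin; count_errpt_id is a hypothetical helper name, not an AIX command. On the node itself, `errpt -a -j 864D2CE3` prints the detailed fields (thread name, blocked interval, interface name) mentioned in the explanation.

```shell
# Count errpt summary lines for one error identifier.
# Reads errpt summary output on stdin; count_errpt_id is a hypothetical helper.
count_errpt_id() {
    awk -v id="$1" '$1 == id { n++ } END { print n + 0 }'
}

# Assumed usage on the AIX node:  errpt | count_errpt_id 864D2CE3
# Demonstration with the standby node's entries quoted earlier:
count_errpt_id 864D2CE3 <<'EOF'
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
864D2CE3 1106142708 P S topsvcs NIM thread blocked
864D2CE3 1106103608 P S topsvcs NIM thread blocked
EOF
```

Running the same count periodically and comparing the result would show whether the error keeps recurring during the observation period.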
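In addition, the following workarounds exist for this problem (upgrading the fileset may force an upgrade of the whole operating system level):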
Usually one of the following will stop the errors from being logged.
1.) Upgrade bos.rte.libpthreads to the latest level
2.) Check and change the NIM's failure detection rate to "slow":
    smitty hacmp
      cluster config
        cluster topology
          Configure Network Modules
            Change a Network Module using Predefined Values
    Select the network(s) corresponding to the TS_NIM message in errpt
    Choose "slow" for "failure detection rate"
3.) Check syncd frequency
    smitty hacmp / cluster Config. / Advanced Performance Tuning /
    Change/Show syncd frequency
    max. 60 -> the recommended value for HACMP clusters is 10
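Before changing anything, the current values behind steps 1 and 3 can be read without modifying the system. The sketch below is an assumption-laden illustration, not part of the IBM recommendation: syncd_interval and show_libpthreads_level are hypothetical helper names, and /sbin/rc.boot as the place where syncd is started is an assumption about the AIX release in use.

```shell
# Print the syncd interval (seconds) found in an rc.boot-style file.
# syncd_interval is a hypothetical helper; the default path /sbin/rc.boot
# is an assumption about the AIX release in use.
syncd_interval() {
    file=${1:-/sbin/rc.boot}
    awk '
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            # Take the first numeric argument after a word ending in "syncd".
            for (i = 1; i < NF; i++)
                if ($i ~ /syncd$/ && $(i + 1) ~ /^[0-9]+$/) {
                    print $(i + 1)
                    exit
                }
        }
    ' "$file"
}

# Report the installed level of the fileset named in step 1 (AIX only).
show_libpthreads_level() {
    if command -v lslpp >/dev/null 2>&1; then
        lslpp -l bos.rte.libpthreads
    else
        echo "lslpp not found; run this on the AIX node" >&2
        return 1
    fi
}

# Assumed usage on the AIX node:
#   syncd_interval            # current syncd interval in seconds
#   show_libpthreads_level    # installed bos.rte.libpthreads level
```

Both helpers only read state, which fits the recommendation to observe the system before making changes.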