2013-11-01 08:06:33
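The purpose of this document is to summarize the top 5 issues that can prevent Grid Infrastructure (GI) from starting successfully.
This document applies only to 11gR2 Grid Infrastructure.
To determine the status of GI, run the status commands used in the symptom lists below: “$GRID_HOME/bin/crsctl check crs”, “crsctl stat res -t -init”, “ps -ef | grep init”, and “ps -ef | grep d.bin”.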
Symptoms:
1. The command “$GRID_HOME/bin/crsctl check crs” returns the error:
CRS-4639: Could not contact Oracle High Availability Services
2. The command “ps -ef | grep init” does not show a line similar to:
root 4878 1 0 Sep12 ? 00:00:02 /bin/sh /etc/init.d/init.ohasd run
3. The command “ps -ef | grep d.bin” does not show a line similar to:
root 21350 1 6 22:24 ? 00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
Or it shows only the "ohasd.bin reboot" process and no other processes.
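The process checks in symptoms 2 and 3 can be sketched as a small Python helper that works on captured `ps -ef` output. This is only an illustration: the function names `ohasd_process_status` and `describe` are hypothetical, and the matching is plain substring search on the process names quoted above, not an Oracle-supplied check.

```python
def ohasd_process_status(ps_output):
    """Report which processes from the symptom list appear in `ps -ef` output."""
    found = {"init.ohasd": False, "ohasd.bin": False, "other_d_bin": False}
    for line in ps_output.splitlines():
        if "grep" in line:
            # Skip the grep command itself, which also matches the pattern.
            continue
        if "init.ohasd" in line:
            found["init.ohasd"] = True
        elif "ohasd.bin" in line:
            found["ohasd.bin"] = True
        elif "d.bin" in line:
            # Any other daemon binary, for example ocssd.bin or crsd.bin.
            found["other_d_bin"] = True
    return found


def describe(found):
    """Map the flags to the three situations described in the symptoms."""
    if not found["init.ohasd"]:
        return "init.ohasd is not running"
    if not found["ohasd.bin"]:
        return "ohasd.bin is not running"
    if not found["other_d_bin"]:
        return "only ohasd.bin is running"
    return "ohasd.bin and at least one other daemon are running"
```

To use it, capture the output of `ps -ef` to a file or string and pass the text to `ohasd_process_status`, then pass the result to `describe`.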
Possible causes:
Solutions:
Symptoms:
1. The command “$GRID_HOME/bin/crsctl check crs” returns the errors:
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
2. The command “ps -ef | grep d.bin” does not show a line similar to:
oragrid 21543 1 1 22:24 ? 00:00:01 /u01/app/11.2.0/grid/bin/ocssd.bin
3. ocssd.bin is running, but aborts after the message “CLSGPNP_CALL_AGAIN” appears in ocssd.log
4. ocssd.log shows the following:
2012-01-27 13:42:58.796: [ CSSD][19]clssnmvDHBValidateNCopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 223132864, wrtcnt, 1112, LATS 783238209, lastSeqNo 1111, uniqueness 1327692232, timestamp 1327693378/787089065
5. With 3 or more nodes, a cluster formed by 2 nodes works fine, but a failure occurs when the 3rd node joins, and ocssd.log shows the following:
2012-02-09 11:33:53.048: [ CSSD][1120926016](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 2 nodes with leader 2, racnode2, is smaller than cohort of 2 nodes led by node 1, racnode1, based on map type 2
2012-02-09 11:33:53.048: [ CSSD][1120926016]###################################
2012-02-09 11:33:53.048: [ CSSD][1120926016]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
6. ocssd.bin startup times out after 10 minutes:
2012-04-08 12:04:33.153: [ CSSD][1]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1333911873
......
2012-04-08 12:14:31.994: [ CSSD][5]clssgmShutDown: Received abortive shutdown request from client.
2012-04-08 12:14:31.994: [ CSSD][5]###################################
2012-04-08 12:14:31.994: [ CSSD][5]clssscExit: CSSD aborting from thread GMClientListener
2012-04-08 12:14:31.994: [ CSSD][5]###################################
2012-04-08 12:14:31.994: [ CSSD][5](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
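As a rough aid for spotting these patterns, the sketch below scans ocssd.log text for the message fragments quoted in symptoms 3 to 6 and measures the time between the CSS daemon start and its abort. It assumes each log entry sits on one line and begins with a 23-character timestamp, as in the excerpts above; the names `OCSSD_SIGNATURES`, `scan_ocssd_log`, and `seconds_from_start_to_abort` are hypothetical.

```python
from datetime import datetime

# Message fragments taken from the ocssd.log excerpts in the symptoms.
OCSSD_SIGNATURES = {
    "gpnp_call_again": "CLSGPNP_CALL_AGAIN",
    "disk_hb_no_network_hb": "has a disk HB, but no network HB",
    "splitbrain_abort": "Aborting local node to avoid splitbrain",
    "abortive_shutdown": "Received abortive shutdown request from client",
    "fatal_exit": "(:CSSSC00012:)",
}


def scan_ocssd_log(text):
    """Return the first matching log line for each known signature."""
    hits = {}
    for line in text.splitlines():
        for name, needle in OCSSD_SIGNATURES.items():
            if needle in line and name not in hits:
                hits[name] = line.strip()
    return hits


def _timestamp(line):
    # Assumes the entry starts with e.g. "2012-04-08 12:04:33.153".
    return datetime.strptime(line.strip()[:23], "%Y-%m-%d %H:%M:%S.%f")


def seconds_from_start_to_abort(text):
    """Seconds between 'Starting CSS daemon' and the next CSSD abort, or None."""
    start = None
    for line in text.splitlines():
        if start is None and "Starting CSS daemon" in line:
            start = _timestamp(line)
        elif start is not None and "CSSD aborting from thread" in line:
            return (_timestamp(line) - start).total_seconds()
    return None
```

A value close to 600 seconds from `seconds_from_start_to_abort` would match the 10-minute startup timeout in symptom 6.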
Possible causes:
Solutions:
Symptoms:
1. The command “$GRID_HOME/bin/crsctl check crs” returns the errors:
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
2. The command “ps -ef | grep d.bin” does not show a line similar to:
root 23017 1 1 22:34 ? 00:00:00 /u01/app/11.2.0/grid/bin/crsd.bin reboot
3. Even though the crsd.bin process exists, the command “crsctl stat res -t -init” still shows:
ora.crsd
1 ONLINE INTERMEDIATE
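The three “crsctl check crs” outputs quoted in this document differ only in which CRS message codes appear, so they can be told apart mechanically. The sketch below maps the codes seen in this document to a daemon and an online flag, then reports the first daemon that is down. The names `CRS_CODES`, `CHECK_ORDER`, `parse_check_crs`, and `first_daemon_down` are hypothetical, and the check order (ohasd, cssd, crsd, evmd) simply follows the order in which this document presents the issues, with the Event Manager placed last as an assumption.

```python
import re

# Only the message codes quoted in this document are mapped here.
CRS_CODES = {
    "CRS-4639": ("ohasd", False),  # Could not contact Oracle High Availability Services
    "CRS-4638": ("ohasd", True),   # Oracle High Availability Services is online
    "CRS-4535": ("crsd", False),   # Cannot communicate with Cluster Ready Services
    "CRS-4530": ("cssd", False),   # Communications failure contacting CSS daemon
    "CRS-4529": ("cssd", True),    # Cluster Synchronization Services is online
    "CRS-4534": ("evmd", False),   # Cannot communicate with Event Manager
}

# Assumed order in which to look for the first failing daemon.
CHECK_ORDER = ["ohasd", "cssd", "crsd", "evmd"]


def parse_check_crs(output):
    """Turn `crsctl check crs` output into {daemon: online?}."""
    status = {}
    for code in re.findall(r"CRS-\d{4}", output):
        if code in CRS_CODES:
            daemon, online = CRS_CODES[code]
            status[daemon] = online
    return status


def first_daemon_down(status):
    """Return the first daemon reported as not online, or None."""
    for daemon in CHECK_ORDER:
        if status.get(daemon) is False:
            return daemon
    return None
```

For the output listed in symptom 1 above, `first_daemon_down` returns `crsd`, because CSS is reported online while Cluster Ready Services cannot be contacted.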
Possible causes:
Solutions:
Symptoms:
1. orarootagent is not running. ohasd.log shows:
2012-12-21 02:14:05.071: [ AGFW][24] {0:0:2} Created alert : (:CRSAGF00123:) : Failed to start the agent process: /grid/11.2.0/grid_2/bin/orarootagent Category: -1 Operation: fail Loc: canexec2 OS error: 0 Other : no exe permission, file [/grid/11.2.0/grid_2/bin/orarootagent]
2. mdnsd.bin, gpnpd.bin, or gipcd.bin is not running. The following is an example from the mdnsd log:
2012-12-31 21:37:27.601: [ clsdmt][1088776512]Creating PID [4526] file for home /u01/app/11.2.0/grid host lc1n1 bin mdns to /u01/app/11.2.0/grid/mdns/init/
2012-12-31 21:37:27.602: [ clsdmt][1088776512]Error3 -2 writing PID [4526] to the file []
2012-12-31 21:37:27.602: [ clsdmt][1088776512]Failed to record pid for MDNSD
or
2012-12-31 21:39:52.656: [ clsdmt][1099217216]Creating PID [4645] file for home /u01/app/11.2.0/grid host lc1n1 bin mdns to /u01/app/11.2.0/grid/mdns/init/
2012-12-31 21:39:52.656: [ clsdmt][1099217216]Writing PID [4645] to the file [/u01/app/11.2.0/grid/mdns/init/lc1n1.pid]
2012-12-31 21:39:52.656: [ clsdmt][1099217216]Failed to record pid for MDNSD
3. oraagent or appagent is not running. crsd.log shows:
2012-12-01 00:06:24.462: [ AGFW][1164069184] {0:2:27} Created alert : (:CRSAGF00130:) : Failed to start the agent /u01/app/grid/11.2.0/bin/appagent_oracle
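The agent start failures above carry an alert ID and, in the first case, the file that lacks execute permission. A minimal sketch for pulling those two pieces out of a log line and checking the file from the current OS user follows. The regular expressions are based only on the two alert lines quoted in the symptoms, and `parse_agent_alert` and `is_executable` are hypothetical helper names.

```python
import os
import re

# Patterns derived from the alert lines shown in the symptoms above.
ALERT_RE = re.compile(r"\(:(CRSAGF\d+):\)")
NO_EXEC_RE = re.compile(r"no exe permission, file \[([^\]]+)\]")


def parse_agent_alert(line):
    """Extract the alert ID and, if present, the non-executable file path."""
    alert = ALERT_RE.search(line)
    path = NO_EXEC_RE.search(line)
    return {
        "alert_id": alert.group(1) if alert else None,
        "non_executable_file": path.group(1) if path else None,
    }


def is_executable(path):
    """True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

Note that `is_executable` reflects the permissions of the user running the script, so it would need to run as the same OS user that starts the agent to be meaningful.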
Possible causes:
Solutions:
Symptoms:
1. The command “ps -ef | grep asm” does not show any ASM processes
2. The command “crsctl stat res -t -init” shows:
ora.asm
1 ONLINE OFFLINE
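The “crsctl stat res -t -init” check used for ora.asm here, and for ora.crsd earlier, can be sketched as a small parser. It assumes the tabular layout in which a resource name line (starting with `ora.`) is followed by a line holding the instance number, the target, and the state. That layout is an assumption about the command output, and `parse_stat_res` and `not_fully_online` are hypothetical names.

```python
def parse_stat_res(output):
    """Parse `crsctl stat res -t -init` text into {resource: (target, state)}."""
    resources = {}
    current = None
    for raw in output.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0].startswith("ora."):
            # Resource name line, e.g. "ora.asm".
            current = tokens[0]
        elif current and tokens[0].isdigit() and len(tokens) >= 3:
            # Instance line, e.g. "1 ONLINE OFFLINE".
            resources[current] = (tokens[1], tokens[2])
    return resources


def not_fully_online(resources):
    """Resources whose target is ONLINE but whose state is something else."""
    return sorted(
        name
        for name, (target, state) in resources.items()
        if target == "ONLINE" and state != "ONLINE"
    )
```

With output where ora.asm shows target ONLINE and state OFFLINE, `not_fully_online` lists `ora.asm`; the same applies to ora.crsd in state INTERMEDIATE.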
Possible causes:
Solutions:
To debug GI startup issues further, refer to Document 1050908.1 Troubleshoot Grid Infrastructure Startup Issues.