# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
Check the CRS processes:
# ps -ef | grep css
root 6929 1 0 19:56 ? 00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root 6960 6928 0 19:56 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root 6963 6929 0 19:56 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root 7064 6935 0 19:56 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
The output shows that init.cssd is stuck in startcheck and the ocssd.bin daemon was never started.
crsd.log and ocssd.log contained nothing useful, but the logs under $ORA_CRS_HOME/log/rac1/client showed:
[root@rac1 client]# more css339.log
Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 . All rights reserved.
2009-06-09 20:00:28.799: [ CSSCLNT][2541220896]clsssInitNative: connect failed, rc 9
[root@rac1 client]# more clsc790.log
Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 . All rights reserved.
2009-06-09 20:39:27.940: [ COMMCRS][2541220896]clsc_connect: (0x66c3d0) no listener at (ADDRESS=(PROTOCOL=IPC)(KEY=CRSD_UI_SOCKET))
2009-06-09 20:39:27.941: [ COMMCRS][2541220896]clsc_connect: (0x6116e0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2009-06-09 20:39:27.941: [ default][2541220896]Terminating clsd session
The system log contained the following entries within the time window in which the customer reported the failure:
Jun 9 17:19:39 rac1 logger: Cluster Ready Services starting up automatically.
Jun 9 17:19:39 rac1 init.crs: Startup will be queued to init within 90 seconds.
Jun 9 17:19:39 rac1 rc: Starting init.crs: succeeded
Jun 9 17:19:39 rac1 readahead: Starting background readahead:
Jun 9 17:19:39 rac1 rc: Starting readahead: succeeded
Jun 9 17:19:41 rac1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.4528.
Jun 9 17:19:41 rac1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.4531.
Jun 9 17:19:41 rac1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.4603.
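I went to /tmp to look at the crsctl.* files, but there were none. Strange! So where do these messages come from? A look at the /etc/init.d/init.cssd script shows that the message is generated here: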
# Wait for additional filesystems and objects to become available
# crsctl should print out a message indicating cause of failure.
$SU $ORACLE_USER -c "$CRSCTL check boot > $CRSCTLOUT"
RC=$?
while [ "$RC" != "0" ]
do
  $LOGMSG Cluster Ready Services waiting on dependencies. Diagnostics in $CRSCTLOUT.
  $SLEEP $DEP_CHECK_WAIT
  $SU $ORACLE_USER -c "$CRSCTL check boot > $CRSCTLOUT"
  RC=$?
done
This part of the script is exactly what runs under the startcheck option. Now look at how the variable $CRSCTLOUT is defined:
# Temp file for crsctl output
CRSCTLOUT=/tmp/crsctl.$$
It points into /tmp. Could something be wrong with the /tmp directory? Running the check by hand:
[root@rac1 /]# /etc/init.d/init.cssd startcheck
-bash: /tmp/crsctl.31969: No such file or directory
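Why startcheck never returns can be illustrated in isolation: when the shell cannot open a redirection target, the command line as a whole fails, whatever the command itself would have returned. The following is a minimal sketch, not part of init.cssd; the helper name redirect_status and the demo path are made up for illustration, and `true` merely stands in for `$CRSCTL check boot`.

```shell
#!/bin/sh
# Hypothetical helper (not from init.cssd): run a command that always
# succeeds, redirect its output to the file named in $1, and print the
# exit status that a caller such as the startcheck loop would see.
redirect_status() {
    # "true" stands in for "$CRSCTL check boot"; only the redirection matters.
    sh -c "true > \"$1\"" 2>/dev/null
    echo $?
}

# Redirection target can be opened: the status is that of the command itself.
redirect_status /dev/null

# Redirection target cannot be created (assumed non-existent directory):
# the status is non-zero even though "true" would have succeeded.
redirect_status /nonexistent_dir_for_demo/crsctl.out
```

The first call prints 0 and the second a non-zero status, so a loop of the form `while [ "$RC" != "0" ]` would presumably keep spinning for as long as the output file cannot be created.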
Sure enough, /tmp is the problem. Checking the permissions on /tmp:
drwxr-xr-x 7 1003 dba 4096 Jun 10 04:02 tmp
The ownership and permissions had been changed. The correct settings are:
drwxrwxrwt 15 root root 4096 Jun 10 09:51 tmp
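The comparison between the broken and the expected listing can also be scripted. This is a rough sketch under two assumptions: GNU coreutils `stat` is available (as on RHEL), and the helper name check_dir_perms is hypothetical, not something shipped with CRS.

```shell
#!/bin/sh
# Hypothetical helper: compare a directory's octal mode and owner:group
# against an expected string such as "1777 root:root".
# Assumes GNU coreutils stat (-c with %a, %U, %G).
check_dir_perms() {
    dir=$1
    expected=$2
    # %a = octal mode including the sticky bit, %U:%G = owner and group names
    actual=$(stat -c '%a %U:%G' "$dir") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "$dir OK ($actual)"
        return 0
    fi
    echo "$dir unexpected: $actual (expected $expected)"
    return 1
}

# drwxrwxrwt owned by root:root corresponds to "1777 root:root".
if ! check_dir_perms /tmp "1777 root:root"; then
    echo "/tmp needs attention before CRS is restarted"
fi
```

Run against the broken state shown above, a check like this would report mode 755 and the unexpected owner instead of `1777 root:root`.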
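So fix the ownership and permissions on /tmp (only the directory itself needs mode 1777; with -R the same owner and mode are also applied to everything beneath it):
# chown -R root:root /tmp
# chmod -R 1777 /tmp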
Then run:
[root@rac1 tmp]# /etc/init.d/init.crs stop
Shutting down Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
[root@rac1 tmp]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.
After a while, check the CRS processes again: CRS is running normally, and the problem is solved.
Customer environment: RHEL AS 4 + 10g RAC
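Instead of checking by hand after the restart, the wait can be scripted with a bounded retry, in contrast to the unbounded loop in init.cssd. This is only a sketch: wait_until and ocssd_running are hypothetical helper names, and the process check simply reuses the `ps -ef | grep` approach from the beginning of this note.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if all attempts fail.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt follows.
        [ "$i" -le "$tries" ] && sleep "$delay"
    done
    return 1
}

# Hypothetical check: succeeds once an ocssd.bin process shows up in ps.
# The [o] keeps grep from matching its own command line.
ocssd_running() {
    ps -ef | grep -q '[o]cssd\.bin'
}

# Example (not executed here): poll every 10 seconds, at most 30 times.
# wait_until 30 10 ocssd_running && echo "ocssd.bin is up"
```

Bounding the number of attempts means a misconfigured node produces a clear failure rather than a check that waits forever.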