Today I ran into a very strange problem.
I wanted to log in to EM to look up some SQL-related information, but found that EM was down, so I started it:
[oracle@rac1 bin]$ emctl start dbconsole
Oracle Enterprise Manager 11g Database Control Release 11.2.0.4.0
Copyright (c) 1996, 2013 Oracle Corporation. All rights reserved.
https://rac1:1158/em/console/aboutApplication
Starting Oracle Enterprise Manager 11g Database Control ..... started.
------------------------------------------------------------------
Logs are generated in directory /u01/app/oracle/product/11.2.0/db_1/rac1_rac112/sysman/log
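But after logging in to EM, I found that the db on node 2 was down. That was odd, because only a few days earlier I had fixed an automatic crash of the node 2 db; see http://blog.chinaunix.net/uid/69978508.html
So I hurried to check the log. This time the log looked the same as last time: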
Wed Jul 22 16:20:11 2020
MARK started with pid=42, OS id=33461
NOTE: MARK has subscribed
lmon registered with NM - instance number 2 (internal mem no 1)
Reconfiguration started (old inc 0, new inc 8)
List of instances:
1 2 (myinst: 2)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_asmb_33451.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
ASMB (ospid: 33451): terminating the instance due to error 27157
Instance terminated by ASMB, pid = 33451
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_asmb_33451.trc:
ORA-27300: OS system dependent operation:semctl failed with status: 22
ORA-27301: OS failure message: Invalid argument
ORA-27302: failure occurred at: sskgpwrm1
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Wed Jul 22 16:31:02 2020
Starting ORACLE instance (normal)
…………
…………
…………
Cluster communication is configured to use the following interface(s) for this instance
169.254.191.155
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Wed Jul 22 16:31:13 2020
PMON started with pid=2, OS id=38094
Error occured while spawning process PMON; error = 27153
USER (ospid: 38021): terminating the instance due to error 27153
Instance terminated by USER, pid = 38021
Wed Jul 22 16:48:18 2020
Starting ORACLE instance (normal)
Judging from this log, the db went down just now. Could it be related to my starting dbconsole?
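The status codes in the ORA-27300 lines are OS errno values, so it can help to pull them out of the log and name them. Below is a minimal sketch, assuming Linux errno numbering (43 is EIDRM, 22 is EINVAL); the function name `decode_ora_27300` and the lookup table are illustrative and not part of any Oracle tool.

```python
import re

# Assumption: Linux errno numbering. Only the two codes seen in this log are listed.
LINUX_ERRNO = {
    22: "EINVAL (Invalid argument)",
    43: "EIDRM (Identifier removed)",
}

# Matches lines such as:
# ORA-27300: OS system dependent operation:semop failed with status: 43
ORA_27300 = re.compile(
    r"ORA-27300: OS system dependent operation:(\w+) failed with status: (\d+)"
)


def decode_ora_27300(log_text):
    """Return (syscall, status, errno_name) for every ORA-27300 line in log_text."""
    results = []
    for match in ORA_27300.finditer(log_text):
        syscall = match.group(1)
        status = int(match.group(2))
        results.append((syscall, status, LINUX_ERRNO.get(status, "unknown")))
    return results


# A few lines copied from the alert log above.
sample = """\
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27300: OS system dependent operation:semctl failed with status: 22
ORA-27301: OS failure message: Invalid argument
"""

for syscall, status, name in decode_ora_27300(sample):
    print(f"{syscall}: status {status} -> {name}")
```

Read this way, semop failing with EIDRM says that the semaphore set the instance was waiting on was removed while it was still in use, which matches the "OS post/wait facility removed" message.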
So I tried restarting the db, and unexpectedly it reported an error:
[oracle@rac2 trace]$ srvctl status database -d rac112
Instance rac1121 is running on node rac1
Instance rac1122 is not running on node rac2
[oracle@rac2 trace]$ srvctl start database -d rac112
PRCC-1014 : rac112 was already running
PRCR-1004 : Resource ora.rac112.db is already running
PRCR-1079 : Failed to start resource ora.rac112.db
CRS-5017: The resource action "ora.rac112.db start" encountered the following error:
ORA-27153: wait operation failed
ORA-27300: OS system dependent operation:semop failed with status: 22
ORA-27301: OS failure message: Invalid argument
ORA-27302: failure occurred at: sskgpwwait3
ORA-27303: additional information: ctx(0xc0b6780); wid(0x6fc3536450); flags(0)
semid(0x15800d); sem_num(35); oldval(-1)
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/rac2/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.rac112.db' on 'rac2' failed
CRS-2528: Unable to place an instance of 'ora.rac112.db' as all possible servers are occupied by the resource
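The last time this happened was right after I had changed kernel parameters, and after a reboot the problem did not recur. So what is going on now?
Since this is already a production environment, I can't just reboot at will anymore, and there is no longer any justification for a reboot.
Could starting dbconsole have caused it?
Either way, I decided to test it. I stopped dbconsole and then started the db, and sure enough it succeeded.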
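The ORA-27303 message in the failed start names a semaphore set, semid(0x15800d). One way to check whether that set still exists is to convert the hex id to decimal and look for it in the output of `ipcs -s`. The following is only a sketch: the helper names are made up for illustration, and the `ipcs` rows in `SAMPLE_IPCS` are hypothetical, not captured from this system.

```python
def parse_ipcs_semids(ipcs_output):
    """Map semid (decimal int) -> owner for each data row of `ipcs -s` output."""
    semids = {}
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows look like: key(0x...) semid owner perms nsems
        if len(fields) >= 3 and fields[0].startswith("0x") and fields[1].isdigit():
            semids[int(fields[1])] = fields[2]
    return semids


def semid_present(ipcs_output, semid_hex):
    """True if the semid given in hex (as printed by ORA-27303) is listed."""
    return int(semid_hex, 16) in parse_ipcs_semids(ipcs_output)


# Hypothetical `ipcs -s` output, for illustration only.
SAMPLE_IPCS = """\
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x5a3c1e40 1441806    oracle     640        154
0x00000000 1474575    grid       600        1
"""

semid_from_error = "0x15800d"  # taken from the ORA-27303 line
print(int(semid_from_error, 16))                     # decimal form used by ipcs
print(semid_present(SAMPLE_IPCS, semid_from_error))  # is the set still listed?
```

On a real host the text would come from something like `subprocess.run(["ipcs", "-s"], capture_output=True, text=True).stdout`. If the id from the error is not listed, that is consistent with the set having been removed underneath the instance.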
After some searching, this appears to be the solution: http://blog.itpub.net/29371470/viewspace-2125673/
I took a look at my logind.conf:
[Login]
#NAutoVTs=6
#ReserveVT=6
#KillUserProcesses=no
#KillOnlyUsers=
#KillExcludeUsers=root
#InhibitDelayMaxSec=5
#HandlePowerKey=poweroff
#HandleSuspendKey=suspend
#HandleHibernateKey=hibernate
#HandleLidSwitch=suspend
#HandleLidSwitchDocked=ignore
#PowerKeyIgnoreInhibited=no
#SuspendKeyIgnoreInhibited=no
#HibernateKeyIgnoreInhibited=no
#LidSwitchIgnoreInhibited=yes
#IdleAction=ignore
#IdleActionSec=30min
#RuntimeDirectorySize=10%
#RemoveIPC=yes
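I'm not making any changes for now. I'll apply the change only after verifying on other machines that it causes no problems.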
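Before changing anything, it may be worth confirming whether a setting in the [Login] listing above is explicitly set or only present as a commented-out line, in which case the built-in default applies. The sketch below assumes the setting of interest is RemoveIPC, as the last line of the listing suggests; the function name `explicit_setting` is hypothetical, and the sample text is trimmed from the listing.

```python
def explicit_setting(conf_text, key):
    """Return the uncommented value of `key`, or None if it is absent or commented out."""
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments; section headers have no "=" and fall through.
        if not line or line.startswith(("#", ";")):
            continue
        name, sep, val = line.partition("=")
        if sep and name.strip() == key:
            value = val.strip()  # the last occurrence in the file is returned
    return value


# Trimmed copy of the listing above: every entry is commented out.
current_conf = """\
[Login]
#KillUserProcesses=no
#RemoveIPC=yes
"""

# What the file might look like after an explicit change (assumption, not applied here).
changed_conf = """\
[Login]
#KillUserProcesses=no
RemoveIPC=no
"""

print(explicit_setting(current_conf, "RemoveIPC"))  # nothing explicit is set
print(explicit_setting(changed_conf, "RemoveIPC"))  # explicit override
```

On a real host the text would be read from the configuration file itself, commonly /etc/systemd/logind.conf (an assumed path here). A change to that file normally takes effect only after systemd-logind is restarted or the machine is rebooted, which is one more reason to try it on a test machine first.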