Category: Oracle

2020-07-22 17:43:16

Today I ran into a very strange problem.
I wanted to log in to EM to look up some SQL-related information, but found that EM was down, so I started it:

    [oracle@rac1 bin]$ emctl start dbconsole
    Oracle Enterprise Manager 11g Database Control Release 11.2.0.4.0
    Copyright (c) 1996, 2013 Oracle Corporation. All rights reserved.
    https://rac1:1158/em/console/aboutApplication
    Starting Oracle Enterprise Manager 11g Database Control ..... started.
    ------------------------------------------------------------------
    Logs are generated in directory /u01/app/oracle/product/11.2.0/db_1/rac1_rac112/sysman/log
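After logging in to EM, however, I found that the database on node 2 was down. That was odd, because only a few days earlier I had fixed a problem where the node 2 database crashed on its own; see http://blog.chinaunix.net/uid/69978508.html
I immediately checked the log, and it looked the same as last time: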

    Wed Jul 22 16:20:11 2020
    MARK started with pid=42, OS id=33461
    NOTE: MARK has subscribed
    lmon registered with NM - instance number 2 (internal mem no 1)
    Reconfiguration started (old inc 0, new inc 8)
    List of instances:
     1 2 (myinst: 2)
     Global Resource Directory frozen
    * allocate domain 0, invalid = TRUE
    Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_asmb_33451.trc:
    ORA-27157: OS post/wait facility removed
    ORA-27300: OS system dependent operation:semop failed with status: 43
    ORA-27301: OS failure message: Identifier removed
    ORA-27302: failure occurred at: sskgpwwait1
    ASMB (ospid: 33451): terminating the instance due to error 27157
    Instance terminated by ASMB, pid = 33451
    Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_asmb_33451.trc:
    ORA-27300: OS system dependent operation:semctl failed with status: 22
    ORA-27301: OS failure message: Invalid argument
    ORA-27302: failure occurred at: sskgpwrm1
    ORA-27157: OS post/wait facility removed
    ORA-27300: OS system dependent operation:semop failed with status: 43
    ORA-27301: OS failure message: Identifier removed
    ORA-27302: failure occurred at: sskgpwwait1
    Wed Jul 22 16:31:02 2020
    Starting ORACLE instance (normal)
    …………
    …………
    …………
    Cluster communication is configured to use the following interface(s) for this instance
      169.254.191.155
    cluster interconnect IPC version:Oracle UDP/IP (generic)
    IPC Vendor 1 proto 2
    Wed Jul 22 16:31:13 2020
    PMON started with pid=2, OS id=38094
    Error occured while spawning process PMON; error = 27153
    USER (ospid: 38021): terminating the instance due to error 27153
    Instance terminated by USER, pid = 38021
    Wed Jul 22 16:48:18 2020
    Starting ORACLE instance (normal)
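
The ORA-27300 lines in the log above carry the failing system call and its errno value. On Linux, status 43 is EIDRM ("Identifier removed") and status 22 is EINVAL ("Invalid argument"), which matches the ORA-27301 messages next to them. A minimal sketch for tallying these lines from an alert log follows. The function name summarize_ora27300 and the alert log file name in the usage comment are assumptions for illustration, not something taken from this system.

```shell
#!/bin/sh
# Summarize ORA-27300 lines read from stdin.
# Prints one line per distinct pair: "<syscall> <status> <count>".
# summarize_ora27300 is a hypothetical helper name.
summarize_ora27300() {
    sed -n 's/.*ORA-27300: OS system dependent operation:\([a-z]*\) failed with status: \([0-9]*\).*/\1 \2/p' |
        sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Usage (the alert log file name is an assumption):
#   summarize_ora27300 < /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/alert_rac1122.log
```

Seeing semop fail with 43 means the semaphore set was removed while the instance was still waiting on it, so the removal came from outside the instance.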
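According to this log, the database went down right around this time. Could that be related to my starting dbconsole?
I tried restarting the database, and to my surprise it failed with an error: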

    [oracle@rac2 trace]$ srvctl status database -d rac112
    Instance rac1121 is running on node rac1
    Instance rac1122 is not running on node rac2
    [oracle@rac2 trace]$ srvctl start database -d rac112
    PRCC-1014 : rac112 was already running
    PRCR-1004 : Resource ora.rac112.db is already running
    PRCR-1079 : Failed to start resource ora.rac112.db
    CRS-5017: The resource action "ora.rac112.db start" encountered the following error:
    ORA-27153: wait operation failed
    ORA-27300: OS system dependent operation:semop failed with status: 22
    ORA-27301: OS failure message: Invalid argument
    ORA-27302: failure occurred at: sskgpwwait3
    ORA-27303: additional information: ctx(0xc0b6780); wid(0x6fc3536450); flags(0)
    semid(0x15800d); sem_num(35); oldval(-1)
    . For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/rac2/agent/crsd/oraagent_oracle/oraagent_oracle.log".
    CRS-2674: Start of 'ora.rac112.db' on 'rac2' failed
    CRS-2528: Unable to place an instance of 'ora.rac112.db' as all possible servers are occupied by the resource
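
The semid in the ORA-27303 line is printed in hexadecimal, while ipcs lists semaphore set ids in decimal. A small sketch for checking whether that set still exists, assuming a Linux host with the util-linux ipcs; semid_to_decimal is a hypothetical helper name:

```shell
#!/bin/sh
# Convert a hexadecimal semid (as printed in ORA-27303) to the decimal
# id shown by ipcs. semid_to_decimal is a hypothetical helper name.
semid_to_decimal() {
    printf '%d\n' "$1"
}

# Usage, run on the affected node:
#   ipcs -s                                        # list all semaphore sets
#   ipcs -s -i "$(semid_to_decimal 0x15800d)"      # details for one set
```

If ipcs reports that the id does not exist, that is consistent with the semop "Invalid argument" failure shown above.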
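The last time this happened was right after I had changed some kernel parameters, and the problem did not recur after a reboot. So what is going on this time?
This is a production environment now, so I cannot reboot at will, and I have no justification for a reboot either.

Could starting dbconsole have caused it?
Either way, I decided to test that first. I stopped dbconsole and started the database again, and sure enough it came up.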

After some searching, this looks like the solution: http://blog.itpub.net/29371470/viewspace-2125673/
I took a look at my logind.conf:

    [Login]
    #NAutoVTs=6
    #ReserveVT=6
    #KillUserProcesses=no
    #KillOnlyUsers=
    #KillExcludeUsers=root
    #InhibitDelayMaxSec=5
    #HandlePowerKey=poweroff
    #HandleSuspendKey=suspend
    #HandleHibernateKey=hibernate
    #HandleLidSwitch=suspend
    #HandleLidSwitchDocked=ignore
    #PowerKeyIgnoreInhibited=no
    #SuspendKeyIgnoreInhibited=no
    #HibernateKeyIgnoreInhibited=no
    #LidSwitchIgnoreInhibited=yes
    #IdleAction=ignore
    #IdleActionSec=30min
    #RuntimeDirectorySize=10%
    #RemoveIPC=yes
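I will not change anything for now. I will apply the change only after verifying on other machines that it causes no problems.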
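
For that later verification, here is a hedged sketch. It assumes the fix in the linked article is the commonly cited one of setting RemoveIPC=no in /etc/systemd/logind.conf, so that systemd-logind stops removing a user's IPC objects when that user's last session ends. The helper name removeipc_setting is hypothetical, and the commands in the comments are a suggestion to try on a test machine first, not a confirmed procedure for this cluster.

```shell
#!/bin/sh
# Print the RemoveIPC value explicitly set in a logind.conf-style file,
# or "unset" when the line is missing or commented out (in which case
# the built-in default of the installed systemd version applies).
# removeipc_setting is a hypothetical helper name.
removeipc_setting() {
    value=$(sed -n 's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*\(.*\)$/\1/p' "$1" | tail -n 1)
    echo "${value:-unset}"
}

# Usage:
#   removeipc_setting /etc/systemd/logind.conf
#
# Assumed change, to be verified on a non-production host first (as root):
#   1. In /etc/systemd/logind.conf, under [Login], set:  RemoveIPC=no
#   2. systemctl restart systemd-logind
```

With the file shown above, the helper prints "unset", because the RemoveIPC line is commented out.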
