Environment: 11.2.0.3 RAC primary + RAC standby
Problem: CRS shut down on its own on node1 of the production RAC standby database
-- EM alert
Message=Clusterware has problems on the master agent host
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
Logging in to node1 of this database shows that the CRS daemon is already down.
[grid@node1 ~]$ crsctl check cluster -all
**************************************************************
node1:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[grid@node1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services <----------
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
$ ps -ef | grep crs    -- likewise, no crsd process is running
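Since OHAS itself is still up, the init-level resources it manages can also be listed to see exactly which daemons are down. This check is not part of the original session, just an illustrative step; the expected picture here would be ora.crsd OFFLINE while ora.cssd and ora.evmd stay ONLINE:
[grid@node1 ~]$ crsctl stat res -t -init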
Try to start CRS, but it fails to start:
[root@node1 ~]# /app/grid/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
[root@node1 ~]# /app/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services <<---- CRS is still not up
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
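The next step is to look at the logs. Besides the crsd log examined below, the clusterware alert log usually records why crsd keeps exiting; a hedged place to look, assuming the Grid home /app/grid seen in the commands above:
$ tail -100 /app/grid/log/node1/alertnode1.log    -- i.e. $GRID_HOME/log/<hostname>/alert<hostname>.log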
Check the crsd log as the grid user:
$ cd $ORACLE_HOME/log/node1/crsd
$ vim crsd.log
-------------------
2013-12-10 15:47:19.902: [ OCRASM][33715952]ASM Error Stack :
2013-12-10 15:47:19.934: [ OCRASM][33715952]proprasmo: kgfoCheckMount returned [6]
2013-12-10 15:47:19.934: [ OCRASM][33715952]proprasmo: The ASM disk group crs is not found or not mounted <-------
2013-12-10 15:47:19.934: [ OCRRAW][33715952]proprioo: Failed to open [+crs]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2013-12-10 15:47:19.934: [ OCRRAW][33715952]proprioo: No OCR/OLR devices are usable
2013-12-10 15:47:19.934: [ OCRASM][33715952]proprasmcl: asmhandle is NULL
2013-12-10 15:47:19.935: [ GIPC][33715952] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5343]
2013-12-10 15:47:19.935: [ default][33715952]clsvactversion:4: Retrieving Active Version from local storage.
2013-12-10 15:47:19.937: [ OCRRAW][33715952]proprrepauto: The local OCR configuration matches with the configuration published by OCR Cache Writer. No repair required.
2013-12-10 15:47:19.938: [ OCRRAW][33715952]proprinit: Could not open raw device
2013-12-10 15:47:19.938: [ OCRASM][33715952]proprasmcl: asmhandle is NULL
2013-12-10 15:47:19.939: [ OCRAPI][33715952]a_init:16!: Backend init unsuccessful : [26]
2013-12-10 15:47:19.939: [ CRSOCR][33715952] OCR context init failure. Error: PROC-26: Error while accessing the physical storage <-------
2013-12-10 15:47:19.939: [ CRSD][33715952] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
2013-12-10 15:47:19.939: [ CRSD][33715952][PANIC] CRSD exiting: Could not init OCR, code: 26
2013-12-10 15:47:19.939: [ CRSD][33715952] Done.
---------------------
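The PROC-26 / "ASM disk group crs is not found or not mounted" errors mean crsd cannot open the OCR, which lives in the +crs disk group. That location can be double-checked (an extra step, not in the original session) from the OCR location file, or with ocrcheck, which is expected to fail while the group is dismounted:
# cat /etc/oracle/ocr.loc        -- should show something like ocrconfig_loc=+crs
# /app/grid/bin/ocrcheck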
The log makes it clear that the problem is with the CRS disk group, and indeed the group is not mounted:
su - grid
$ sqlplus / as sysasm
SQL> set linesize 200
SQL> select GROUP_NUMBER,NAME,TYPE,ALLOCATION_UNIT_SIZE,STATE from v$asm_diskgroup;
GROUP_NUMBER NAME                           TYPE   ALLOCATION_UNIT_SIZE STATE
------------ ------------------------------ ------ -------------------- -----------
           0 CRS                                                      0 DISMOUNTED  <--------
           2 DATA1                          EXTERN              4194304 MOUNTED
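Before remounting, it is worth confirming that ASM can still see the member disks of the group. A hedged check against v$asm_disk (the column list is standard, the values are only illustrative):
SQL> col path format a40
SQL> select group_number, path, header_status, mount_status, mode_status, state from v$asm_disk;
-- member disks of the dismounted group should still report header_status = MEMBER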
Checking the ASM instance alert log shows that the CRS disk group was force-dismounted:
SQL> show parameter dump
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /app/gridbase/diag/asm/+asm/+ASM1/trace
$ cd /app/gridbase/diag/asm/+asm/+ASM1/trace
$ vim alert_+ASM1.log
-------------------------------------------
Tue Dec 10 11:13:57 2013
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.
Tue Dec 10 11:13:57 2013
NOTE: process _b000_+asm1 (15822) initiating offline of disk 0.3916226472 (CRS_0000) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (15822) initiating offline of disk 1.3916226471 (CRS_0001) with mask 0x7e in group 1
NOTE: process _b000_+asm1 (15822) initiating offline of disk 2.3916226470 (CRS_0002) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 12 for pid 37, osid 15822
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 1 done.
NOTE: initiating PST update: grp = 1, dsk = 0/0xe96cdfa8, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 1/0xe96cdfa7, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 2/0xe96cdfa6, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 13 for pid 37, osid 15822
ERROR: no read quorum in group: required 2, found 0 disks
Tue Dec 10 11:13:57 2013
NOTE: cache dismounting (not clean) group 1/0x165C2F6D (CRS)
WARNING: Offline for disk CRS_0000 in mode 0x7f failed.
WARNING: Offline for disk CRS_0001 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 15824, image: oracle@node1 (B001)
WARNING: Offline for disk CRS_0002 in mode 0x7f failed.
Tue Dec 10 11:13:57 2013
NOTE: halting all I/Os to diskgroup 1 (CRS)
Tue Dec 10 11:13:57 2013
NOTE: LGWR doing non-clean dismount of group 1 (CRS)
NOTE: LGWR sync ABA=3.42 last written ABA 3.42
Tue Dec 10 11:13:57 2013
kjbdomdet send to inst 2
detach from dom 1, sending detach message to inst 2
Tue Dec 10 11:13:57 2013
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
Tue Dec 10 11:13:57 2013
NOTE: No asm libraries found in the system
520 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Tue Dec 10 11:13:57 2013
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0x165C2F6D (CRS)
SQL> alter diskgroup CRS dismount force /* ASM SERVER:375140205 */ <------------ the CRS disk group is force-dismounted here ---
Tue Dec 10 11:13:57 2013
NOTE: cache deleting context for group CRS 1/0x165c2f6d
GMON dismounting group 1 at 14 for pid 41, osid 15824
NOTE: Disk CRS_0000 in mode 0x7f marked for de-assignment
NOTE: Disk CRS_0001 in mode 0x7f marked for de-assignment
NOTE: Disk CRS_0002 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 1
Tue Dec 10 11:14:27 2013
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 1
Tue Dec 10 11:14:29 2013
ASM Health Checker found 1 new failures
Tue Dec 10 11:14:57 2013
SUCCESS: diskgroup CRS was dismounted
SUCCESS: alter diskgroup CRS dismount force /* ASM SERVER:375140205 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group CRS
--------------------------------------
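The PST warnings above show that writes to all three CRS disks stalled for about 15 seconds, which is what led ASM to offline the disks and force the dismount. Before remounting, the disk headers can be read back as a quick sanity check with kfed; the device path below is only a placeholder for one of this system's CRS disks:
$ /app/grid/bin/kfed read /dev/mapper/crs_disk1 | grep -E 'kfbh.type|kfdhdb.grpname|kfdhdb.hdrsts'
-- a healthy member disk reports kfbh.type KFBTYP_DISKHEAD and grpname CRS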
Mount the CRS disk group:
su - grid
$ sqlplus / as sysasm    --!!! must be sysasm (mounting a disk group requires the SYSASM privilege)
SQL> alter diskgroup crs mount;
SQL> select GROUP_NUMBER,NAME,TYPE,ALLOCATION_UNIT_SIZE,STATE from v$asm_diskgroup;
GROUP_NUMBER NAME                           TYPE   ALLOCATION_UNIT_SIZE STATE
------------ ------------------------------ ------ -------------------- -----------
           1 CRS                            NORMAL              4194304 MOUNTED
           2 DATA1                          EXTERN              4194304 MOUNTED
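With the disk group mounted again, the OCR and the voting files stored in it should be reachable; a quick verification (output omitted, commands run as root and grid respectively):
# /app/grid/bin/ocrcheck
$ crsctl query css votedisk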
Start CRS.
However, the usual 'crsctl start crs' command still does not work:
# /app/grid/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
The following command brings crsd up successfully:
[root@node1 ~]# /app/grid/bin/crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'node1'
CRS-2676: Start of 'ora.crsd' on 'node1' succeeded
# /app/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online <------------------
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
$ crsctl check cluster -all
**************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
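Once the stack is up, the user-level resources on node1 (ASM, listener, VIP, the standby database instance) should also be confirmed; a hedged example, with the database name left as a placeholder:
$ crsctl stat res -t
$ srvctl status database -d <standby_db_name>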
Resolution roadmap:
crsd.log --> ASM instance alert log --> mount CRS disk group --> start crsd
Although CRS on node1 is back to normal, the reason the CRS disk group was force-dismounted has not been found yet; it will be posted here once identified.
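In the meantime, the evidence most worth collecting for that root-cause analysis is whatever the OS recorded around 11:13-11:14 on Dec 10, since the alert log points to 15-second write stalls on the CRS disks. A rough sketch, assuming Linux with multipathed storage:
# grep 'Dec 10 11:1' /var/log/messages | grep -iE 'error|fail|multipath|scsi'
# multipath -ll        -- check for failed or degraded paths to the CRS LUNs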