Category: Oracle
2014-02-15 01:23:46
Environment:
OS: Red Hat Linux AS 5
DB: 10.2.0.5
After the RAC deployment was finished, I tried to export the OCR, but the export failed with the following error.
[root@node1 ~]# /u01/app/oracle/product/10.2.0/crs_1/bin/ocrconfig -export /u01/app/oracle/ocr_export140210_8.bak
PROT-4: Failed to retrieve data from the cluster registry
node1-> ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 1043916
Used space (kbytes) : 5408
Available space (kbytes) : 1038508
ID : 1855713603
Device/File Name : /dev/raw/raw1
Device/File integrity check succeeded
Device/File Name : /dev/raw/raw3
Device/File integrity check succeeded
Cluster registry integrity check succeeded
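A small helper can condense the ocrcheck output above into one line for quick comparison between nodes. This is only a sketch: the function name `ocr_summary` is hypothetical, and it assumes the 10.2 output layout shown above. As this case shows, a passing integrity check does not guarantee that `ocrconfig -export` will work.

```shell
#!/bin/sh
# Summarize `ocrcheck` output read from stdin.
# Prints: used_kb=<n> total_kb=<n> integrity=<ok|FAILED>
# Assumption: the "Total space", "Used space" and
# "Cluster registry integrity check succeeded" lines look like the
# 10.2 output quoted in this post.
ocr_summary() {
    awk -F: '
        /Total space/ { gsub(/[^0-9]/, "", $2); total = $2 + 0 }
        /Used space/  { gsub(/[^0-9]/, "", $2); used = $2 + 0 }
        /Cluster registry integrity check succeeded/ { ok = 1 }
        END {
            printf "used_kb=%d total_kb=%d integrity=%s\n",
                   used, total, (ok ? "ok" : "FAILED")
        }'
}
```

It could be used as `ocrcheck | ocr_summary`.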
node1-> cluvfy comp ocr -n all
Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.
Uniqueness check for OCR device passed.
Checking the version of OCR...
OCR of correct Version "2" exists.
Checking data integrity of OCR...
Data integrity check for OCR passed.
OCR integrity check passed.
Verification of OCR integrity was successful.
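The OCR checks show no problems, and the crsd log contained nothing useful either, so I decided to rebuild the OCR. The steps were roughly as follows:
1. Stop CRS on both nodes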
[root@node1 ~]# /u01/app/oracle/product/10.2.0/crs_1/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
[root@node2 ~]# /u01/app/oracle/product/10.2.0/crs_1/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
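Because node 1 reported "CRSD is down" rather than a clean stop, it may be worth confirming that no clusterware daemons are left before continuing. The sketch below is one way to do that. The function name `crs_stack_is_down` is hypothetical, and the process names crsd.bin, ocssd.bin and evmd.bin are an assumption about how the daemons appear in the process list.

```shell
#!/bin/sh
# Read a process list on stdin (one command name per line).
# Prints any clusterware daemons still present and returns 1 if any
# are found; returns 0 when the stack appears to be down.
# Assumption: the daemons show up as crsd.bin, ocssd.bin and evmd.bin.
crs_stack_is_down() {
    if grep -E '(crsd|ocssd|evmd)\.bin'; then
        return 1
    fi
    return 0
}
```

For example: `ps -e -o comm | crs_stack_is_down && echo "CRS stack is down"`.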
2. Run the following script on each node (as root)
[root@node1 10.2.0]# /u01/app/oracle/product/10.2.0/crs_1/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Feb 13 04:41:13.568 | INF | daemon shutting down
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
Cleaning up Network socket directories
[root@node1 10.2.0]#
[root@node2 ~]# /u01/app/oracle/product/10.2.0/crs_1/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
Cleaning up Network socket directories
3. Run rootdeinstall.sh on the primary node
The primary node here is the node from which the CRS installation was originally run; in my case that was node 1.
[root@node1 10.2.0]# /u01/app/oracle/product/10.2.0/crs_1/install/rootdeinstall.sh
Removing contents from OCR mirror device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 1.46619 seconds, 7.2 MB/s
Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 2.48259 seconds, 4.2 MB/s
[root@node1 10.2.0]#
4. Run root.sh on the primary node, that is, the same node where step 3 was run.
[root@node1 crs_1]# /u01/app/oracle/product/10.2.0/crs_1/root.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
No value set for the CRS parameter CRS_OCR_LOCATIONS. Using Values in paramfile.crs
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node
node 1: node1 node1-priv node1
node 2: node2 node2-priv node2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw2
Format of 1 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
Failure at final check of Oracle CRS stack.
10
CRS on this node fails to start. Ignore that for now and continue with the next steps.
5. Run root.sh on the other node
[root@node2 ~]# /u01/app/oracle/product/10.2.0/crs_1/root.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
No value set for the CRS parameter CRS_OCR_LOCATIONS. Using Values in paramfile.crs
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node
node 1: node1 node1-priv node1
node 2: node2 node2-priv node2
clscfg: Arguments check out successfully.
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
node1
node2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
Invalid interface "255.255.255.0/eth0" entered in an input argument.
CRS on node 2 turned out to have problems as well. The crsd error log shows:
2014-02-14 03:28:28.510: [ CSSCLNT][1176720]clsssInitNative: connect failed, rc 9
2014-02-14 03:28:28.510: [ CRSRTI][1176720]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2014-02-14 03:28:29.603: [ COMMCRS][40778640]clsc_connect: (0x88a7d00) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_crs))
2014-02-14 03:28:29.603: [ CSSCLNT][1176720]clsssInitNative: connect failed, rc 9
2014-02-14 03:28:29.603: [ CRSRTI][1176720]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2014-02-14 03:28:30.707: [ COMMCRS][40778640]clsc_connect: (0x88a7d00) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_crs))
2014-02-14 03:28:30.707: [ CSSCLNT][1176720]clsssInitNative: connect failed, rc 9
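When the log repeats like this, a quick count of the CSS connection failures and their return codes shows whether crsd ever got past this stage. The following is a minimal sketch; the function name `css_failure_summary` is hypothetical, and it assumes the failure lines end with "rc <code>" as in the excerpt above.

```shell
#!/bin/sh
# Count "clsssInitNative: connect failed" lines in a crsd log read
# from stdin and tally them by return code (the last field on the line).
# Prints e.g.: failures=3 rc9=3
css_failure_summary() {
    awk '
        /clsssInitNative: connect failed/ { n++; rc[$NF]++ }
        END {
            printf "failures=%d", n
            for (c in rc) printf " rc%s=%d", c, rc[c]
            print ""
        }'
}
```

The crsd log file would be fed in on stdin, for example `css_failure_summary < crsd.log`, with the path adjusted to wherever the log lives on the node.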
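Most reports of this error online attribute it to inter-node communication problems, but I verified that the two nodes could communicate normally. Repeating the steps above gave the same result, so in the end I decided to remove CRS completely and reinstall the cluster software.
For the steps to remove CRS completely, see: http://blog.chinaunix.net/uid-77311-id-3298250.html
6. Reinstall the clusterware
The clusterware installation media is 10.2.0.1, while the previous clusterware had been upgraded to 10.2.0.5. So install the 10.2.0.1 clusterware first without running vipca, then upgrade to 10.2.0.5, and only then run vipca.
7. Configure ONS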
[root@node1 ~]# /u01/app/oracle/product/10.2.0/crs_1/bin/racgons add_config node1:6200 node2:6200
WARNING: node1:6200 already configured.
WARNING: node2:6200 already configured.
[root@node1 ~]# /u01/app/oracle/product/10.2.0/crs_1/bin/onsctl ping
Number of configuration nodes retrieved: 2
0: {node = node1, port = 6200}
Adding remote host node1:6200
1: {node = node2, port = 6200}
Adding remote host node2:6200
ons is not running ...
8. Configure the cluster network interfaces
Configure them on node 1
node1-> $ORA_CRS_HOME/bin/oifcfg iflist
eth0 192.168.1.0 -- public interface
eth1 10.10.10.0 -- private interconnect interface
node1-> $ORA_CRS_HOME/bin/oifcfg setif -global eth0/192.168.1.0:public
node1-> $ORA_CRS_HOME/bin/oifcfg setif -global eth1/10.10.10.0:cluster_interconnect
node1-> $ORA_CRS_HOME/bin/oifcfg getif
eth0 192.168.1.0 global public
eth1 10.10.10.0 global cluster_interconnect
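The getif output above can also be checked mechanically, so that a missing public or interconnect entry is caught before moving on. A minimal sketch, with a hypothetical function name `interfaces_configured`, assuming the interface role is the last column as shown above:

```shell
#!/bin/sh
# Read `oifcfg getif` output on stdin.
# Returns 0 only if both a "public" and a "cluster_interconnect"
# interface are listed (role is taken from the last column).
interfaces_configured() {
    awk '
        $NF == "public"               { pub = 1 }
        $NF == "cluster_interconnect" { priv = 1 }
        END { exit !(pub && priv) }'
}
```

For example: `$ORA_CRS_HOME/bin/oifcfg getif | interfaces_configured || echo "interface configuration incomplete"`.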
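9. Configure the listeners with netca
On node 1 and node 2 respectively, move the previous listener file to a temporary directory
node1-> mv $ORACLE_HOME/network/admin/listener.ora /tmp/listener.ora.original_node1
node2-> mv $ORACLE_HOME/network/admin/listener.ora /tmp/listener.ora.original_node2
Use netca on one of the nodes to add the listeners. Once that is done, the listener resources can be seen registered in the OCR.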
node1-> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
10. Add the resources to the OCR
Add the ASM instances (names are case-sensitive); run this on one node only.
node1-> $ORA_CRS_HOME/bin/srvctl add asm -i +ASM1 -n node1 -o /u01/app/oracle/product/10.2.0/db_1
node1-> $ORA_CRS_HOME/bin/srvctl add asm -i +ASM2 -n node2 -o /u01/app/oracle/product/10.2.0/db_1
Add the database
node1-> $ORA_CRS_HOME/bin/srvctl add database -d racdb -o /u01/app/oracle/product/10.2.0/db_1
Add the instances
node1-> $ORA_CRS_HOME/bin/srvctl add instance -d racdb -i racdb1 -n node1
node1-> $ORA_CRS_HOME/bin/srvctl add instance -d racdb -i racdb2 -n node2
Add the services the database had before
node1-> $ORA_CRS_HOME/bin/srvctl add service -d racdb -s s1 -r racdb1 -a racdb2 -P BASIC
node1-> $ORA_CRS_HOME/bin/srvctl add service -d racdb -s s2 -r racdb2 -a racdb1 -P BASIC
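For clusters with more nodes, typing the registration commands by hand gets error-prone. The sketch below only prints the ASM, database and instance commands in the same form as above so they can be reviewed before being run. The function name `gen_srvctl_adds` is hypothetical, and it assumes instances are numbered in node order (+ASM1/racdb1 on the first node, and so on), as in this setup. Services are left out because their preferred and available instances differ per service.

```shell
#!/bin/sh
# Print (do not execute) srvctl registration commands.
# Usage: gen_srvctl_adds <db_name> <oracle_home> <node1> [node2 ...]
# Assumption: ASM instances are named +ASM<n> and database instances
# <db_name><n>, numbered in the order the nodes are given.
gen_srvctl_adds() {
    db=$1
    home=$2
    shift 2

    i=1
    for node in "$@"; do
        echo "srvctl add asm -i +ASM$i -n $node -o $home"
        i=$((i + 1))
    done

    echo "srvctl add database -d $db -o $home"

    i=1
    for node in "$@"; do
        echo "srvctl add instance -d $db -i $db$i -n $node"
        i=$((i + 1))
    done
}
```

For this cluster the call would be `gen_srvctl_adds racdb /u01/app/oracle/product/10.2.0/db_1 node1 node2`.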
After adding them, check the resource status
node1-> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application OFFLINE OFFLINE
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....SM2.asm application OFFLINE OFFLINE
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
ora.racdb.db application OFFLINE OFFLINE
ora....b1.inst application OFFLINE OFFLINE
ora....b2.inst application OFFLINE OFFLINE
ora....b.s1.cs application OFFLINE OFFLINE
ora....db1.srv application OFFLINE OFFLINE
ora....b.s2.cs application OFFLINE OFFLINE
ora....db2.srv application OFFLINE OFFLINE
node1-> srvctl start asm -n node1
node1-> srvctl start asm -n node2
node1-> srvctl start database -d racdb
node1-> srvctl start service -d racdb
Now check the resource status again
node1-> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....SM2.asm application ONLINE ONLINE node2
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
ora.racdb.db application ONLINE ONLINE node1
ora....b1.inst application ONLINE ONLINE node1
ora....b2.inst application ONLINE ONLINE node2
ora....b.s1.cs application ONLINE ONLINE node1
ora....db1.srv application ONLINE ONLINE node1
ora....b.s2.cs application ONLINE ONLINE node2
ora....db2.srv application ONLINE ONLINE node2
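With many resources it is easy to miss one that stayed OFFLINE. The following sketch filters the tabular crs_stat output for anything whose State column is not ONLINE. The function name `offline_resources` is hypothetical, and the column positions are assumed to match the output above. The names it prints are the abbreviated ones that `crs_stat -t` displays.

```shell
#!/bin/sh
# Read `crs_stat -t` output on stdin and print the (abbreviated) names
# of resources whose State column (4th field) is not ONLINE.
# Header and separator lines are skipped because they do not start
# with "ora.".
offline_resources() {
    awk '$1 ~ /^ora\./ && $4 != "ONLINE" { print $1 }'
}
```

For example: `crs_stat -t | offline_resources`. No output means every registered resource is ONLINE.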
node1-> cluvfy stage -post crsinst -n node1,node2
Performing post-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "node1".
Checking user equivalence...
User equivalence check passed for user "oracle".
Checking Cluster manager integrity...
Checking CSS daemon...
Daemon status check passed for "CSS daemon".
Cluster manager integrity check passed.
Checking cluster integrity...
Cluster integrity check passed
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.
Uniqueness check for OCR device passed.
Checking the version of OCR...
OCR of correct Version "2" exists.
Checking data integrity of OCR...
Data integrity check for OCR passed.
OCR integrity check passed.
Checking CRS integrity...
Checking daemon liveness...
Liveness check passed for "CRS daemon".
Checking daemon liveness...
Liveness check passed for "CSS daemon".
Checking daemon liveness...
Liveness check passed for "EVM daemon".
Checking CRS health...
CRS health check passed.
CRS integrity check passed.
Checking node application existence...
Checking existence of VIP node application (required)
Check passed.
Checking existence of ONS node application (optional)
Check passed.
Checking existence of GSD node application (optional)
Check passed.
Post-check for cluster services setup was successful.
At this point the OCR rebuild is complete, and rerunning the earlier export succeeds.
[root@node1 logs]# /u01/app/oracle/product/10.2.0/crs_1/bin/ocrconfig -export /u01/app/oracle/ocr_export140210_8.bak
[root@node1 logs]#
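Now that the export works, it makes sense to take one regularly under a dated file name. The sketch below only builds the command line so it can be inspected or handed to cron. The function name `ocr_export_cmd` and the `ocr_export_<stamp>.bak` naming are illustrative choices, not anything prescribed by Oracle. As above, the export itself has to be run as root.

```shell
#!/bin/sh
# Print an `ocrconfig -export` command with a dated target file.
# Usage: ocr_export_cmd <crs_home> <backup_dir> [stamp]
# If no stamp is given, the current date and time are used.
ocr_export_cmd() {
    crs_home=$1
    dir=$2
    stamp=${3:-$(date +%Y%m%d_%H%M%S)}
    echo "$crs_home/bin/ocrconfig -export $dir/ocr_export_$stamp.bak"
}
```

For example: `ocr_export_cmd /u01/app/oracle/product/10.2.0/crs_1 /u01/app/oracle` prints a command that can then be run as root.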
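Notes:
I had long been under the mistaken impression that the ASM instance's parameters are stored in the OCR and that a rebuild would wipe them out. In fact, in this 10g environment the ASM instance's parameter file is stored at /u01/app/oracle/admin/+ASM/pfile/init.ora. Once the ASM instance resource is registered, this file is read automatically when the instance starts (so do not delete it when removing CRS completely).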
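Since the ASM parameter file is what lets the instance come back after re-registration, copying it aside before removing CRS is a cheap precaution. A minimal sketch, with a hypothetical function name `backup_asm_pfile`; the destination directory is whatever safe location is available on the node:

```shell
#!/bin/sh
# Copy a parameter file to <dest_dir>/<name>.bak, preserving mode and
# timestamps, and print the path of the copy.
# Usage: backup_asm_pfile <pfile> <dest_dir>
# Returns 1 (with a message on stderr) if the source file is missing.
backup_asm_pfile() {
    src=$1
    dest_dir=$2
    if [ ! -f "$src" ]; then
        echo "missing: $src" >&2
        return 1
    fi
    target="$dest_dir/$(basename "$src").bak"
    cp -p "$src" "$target" && echo "$target"
}
```

For example: `backup_asm_pfile /u01/app/oracle/admin/+ASM/pfile/init.ora /tmp`.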