While checking a freshly installed RAC, I found a problem with the resource status:
[root@rac10g2 ~]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.ora10g.db  application    ONLINE    ONLINE    rac10g2
ora....g1.inst application    ONLINE    ONLINE    rac10g1
ora....g2.inst application    ONLINE    ONLINE    rac10g2
ora....G1.lsnr application    ONLINE    ONLINE    rac10g1
ora....0g1.gsd application    ONLINE    ONLINE    rac10g1
ora....0g1.ons application    ONLINE    ONLINE    rac10g1
ora....0g1.vip application    ONLINE    ONLINE    rac10g1
ora....G2.lsnr application    ONLINE    OFFLINE
ora....0g2.gsd application    ONLINE    ONLINE    rac10g2
ora....0g2.ons application    ONLINE    ONLINE    rac10g2
ora....0g2.vip application    ONLINE    ONLINE    rac10g1
The listener on node 2 is not running, and node 2's VIP has come up on node 1.
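Spotting this kind of mismatch by eye gets tedious on a larger cluster. The following is a minimal sketch, not part of the original troubleshooting, that filters `crs_stat -t` style output for resources whose State differs from their Target. The function name `crs_mismatch` is hypothetical. Because `crs_stat -t` truncates resource names, the sketch does not try to detect a VIP running on the wrong host; that still has to be read from the Host column.

```shell
# Print "name target state" for every resource whose State differs from its Target.
# Reads `crs_stat -t` style output on stdin (columns: Name Type Target State Host).
# Header and separator lines are skipped because they do not start with "ora.".
crs_mismatch() {
    awk '$1 ~ /^ora\./ && NF >= 4 && $3 != $4 { print $1, $3, $4 }'
}
```

It would typically be used as `crs_stat -t | crs_mismatch`; an empty result means every listed resource is in its target state.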
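The listener had previously been created from node 1 as a cluster listener.
I tried stopping node 2's VIP and starting it again, but it would only start on node 1.
The log /u01/oracle/crs/log/rac10g2/crsd/crsd.log reported these errors: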
2009-12-18 13:41:33.386: [ CRSRES][2659736480]0Attempting to start `ora.rac10g2.vip` on member `rac10g2`
2009-12-18 13:41:46.720: [ CRSAPP][2659736480]0StartResource error for ora.rac10g2.vip error code = 1
2009-12-18 13:41:51.405: [ CRSRES][2659736480]0Start of `ora.rac10g2.vip` on member `rac10g2` failed.
2009-12-18 13:41:51.532: [ CRSRES][2659736480]0Attempting to start `ora.rac10g2.vip` on member `rac10g1`
2009-12-18 13:41:55.388: [ CRSRES][2659736480]0Start of `ora.rac10g2.vip` on member `rac10g1` succeeded.
Stopping all services and then restarting them one by one fixed it.
[root@rac10g2 crsd]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.ora10g.db  application    ONLINE    ONLINE    rac10g2
ora....g1.inst application    ONLINE    ONLINE    rac10g1
ora....g2.inst application    ONLINE    ONLINE    rac10g2
ora....G1.lsnr application    ONLINE    ONLINE    rac10g1
ora....0g1.gsd application    ONLINE    ONLINE    rac10g1
ora....0g1.ons application    ONLINE    ONLINE    rac10g1
ora....0g1.vip application    ONLINE    ONLINE    rac10g1
ora....G2.lsnr application    ONLINE    ONLINE    rac10g2
ora....0g2.gsd application    ONLINE    ONLINE    rac10g2
ora....0g2.ons application    ONLINE    ONLINE    rac10g2
ora....0g2.vip application    ONLINE    ONLINE    rac10g2
After a while, however, node 2's VIP drifted back to node 1, and node 2's listener went OFFLINE again.
The log ora.rac10g2.vip.log under /u01/oracle/crs/log/rac10g2/racg contained the following errors:
2009-12-18 15:26:01.656: [ RACG][3086924000] [26596][3086924000][ora.rac10g2.vip]: ping to 192.168.94.1 via eth0 failed, rc = 1 (host=rac10g2)
ping to 192.168.94.1 via eth0 failed, rc = 1 (host=rac10g2)
2009-12-18 15:26:38.630: [ RACG][3086924000] [26861][3086924000][ora.rac10g2.vip]: ping to 192.168.94.1 via eth0 failed, rc = 1 (host=rac10g2)
ping to 192.168.94.1 via eth0 failed, rc = 1 (host=rac10g2)
Interface eth0 checked failed (host=rac10g2)
Invalid parameters, or failed to bring up VIP (host=rac10g2)
2009-12-18 15:26:38.630: [ RACG][3086924000] [26861][3086924000][ora.rac10g2.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
2009-12-18 15:26:38.630: [ RACG][3086924000] [26861][3086924000][ora.rac10g2.vip]: clsrcexecut: cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip check rac10g2
2009-12-18 15:26:38.630: [ RACG][3086924000] [26861][3086924000][ora.rac10g2.vip]: clsrcexecut: rc = 1, time = 6.640s
2009-12-18 15:26:38.630: [ RACG][3086924000] [26861][3086924000][ora.rac10g2.vip]: end for resource = ora.rac10g2.vip, action = check, status = 1, time = 6.690s
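The gateway being pinged through eth0 is wrong: it should be 192.168.1.1, not 192.168.94.1. 192.168.94.1 is the gateway that had been configured for the private IP, a leftover from an earlier setup.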
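After I removed the gateway setting on eth1, the same error still appeared, so I suspected that the default gateway used by the VIP was misconfigured.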
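Before digging into the VIP itself, it can help to confirm which default routes the kernel currently holds and which interface each one is bound to. The sketch below is an illustration under assumptions, not a step from the original session: it reads `route -n` (or `netstat -rn`) output and prints each default route as a "gateway interface" pair. The function name `default_routes` is hypothetical, and the post does not show the node's actual routing table.

```shell
# List default routes as "gateway interface" pairs.
# Reads `route -n` or `netstat -rn` output on stdin; in both formats the
# default route has destination 0.0.0.0, the gateway in column 2, and the
# interface name in the last column.
default_routes() {
    awk '$1 == "0.0.0.0" { print $2, $NF }'
}
```

Used as `route -n | default_routes`, a line such as `192.168.94.1 eth1` would indicate that the default route still points at the private network's gateway.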
Stop all services.
Enable debugging for node 2's VIP. The trailing 5 is the _USR_ORA_DEBUG flag, which sets the debug level.
[root@rac10g2 ~]# crsctl debug log res "ora.rac10g2.vip:5"
Set Resource Debug Module: ora.rac10g2.vip Level: 5
Start ora.rac10g2.vip.
[root@rac10g2 ~]# crs_start ora.rac10g2.vip
Check the trace log. The trace logs are in $ORA_CRS_HOME/log//racg/
Sure enough, the default gateway is wrong: it is set to 192.168.94.1 rather than the expected 192.168.1.1.
2009-12-18 15:53:15.830: [ RACG][3086924000] [2200][3086924000][ora.rac10g2.vip]: Fri Dec 18 15:53:09 CST 2009 [ 2204 ] /sbin/mii-tool eth0 error
Fri Dec 18 15:53:09 CST 2009 [ 2204 ] defaultgw: started
Fri Dec 18 15:53:09 CST 2009 [ 2204 ] defaultgw: completed with 192.168.94.1
2009-12-18 15:53:15.830: [ RACG][3086924000] [2200][3086924000][ora.rac10g2.vip]: ping to 192.168.94.1 via eth0 failed, rc = 1 (host=rac10g2)
ping to 192.168.94.1 via eth0 failed, rc = 1 (host=rac10g2)
Fri Dec 18 15:53:15 CST 2009 [ 2204 ] checkIf: ping and RX packets checked if=eth0 failed
Interface eth0 checked failed (host=rac10g2)
2009-12-18 15:53:15.830: [ RACG][3086924000] [2200][3086924000][ora.rac10g2.vip]: Fri Dec 18 15:53:15 CST 2009 [ 2204 ] checkIf: end for if=eth0
Invalid parameters, or failed to bring up VIP (host=rac10g2)
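With debug level 5 the trace is verbose, so the gateway line is easy to miss. As a rough sketch (the function name `detected_gateways` is hypothetical), the following pulls the gateway address out of the `defaultgw: completed with ...` lines shown above and prints each distinct value once. It assumes the line format matches the excerpt in this post.

```shell
# Print each distinct gateway reported by "defaultgw: completed with <addr>"
# lines in a VIP trace log read from stdin.
detected_gateways() {
    sed -n 's/.*defaultgw: completed with \([0-9.]*\).*/\1/p' | sort -u
}
```

For example, `detected_gateways < ora.rac10g2.vip.log` would print `192.168.94.1` for the trace above.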
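Change the default gateway:
Edit the file $ORA_CRS_HOME/bin/racgvip and set the following entry to the gateway you want to assign.
DEFAULTGW originally had no value; it is now set to 192.168.1.1.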
# hard code default gateway here if needed
DEFAULTGW=192.168.1.1
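The edit was made by hand, but since racgvip is an Oracle-supplied script it seems prudent to keep a backup. The following is a minimal sketch of doing the same change from the shell. The function name `set_defaultgw` and the `.bak` suffix are hypothetical, and it assumes the `DEFAULTGW=` assignment starts at the beginning of a line, as in the excerpt above.

```shell
# Back up a racgvip-style script, then rewrite its DEFAULTGW= line.
# $1 = path to the script, $2 = gateway address to hard-code.
# The result is written back through a redirect so the original file's
# permissions are preserved; the backup is kept as "$1.bak".
set_defaultgw() {
    file=$1
    gw=$2
    cp -p "$file" "$file.bak" || return 1
    sed "s/^DEFAULTGW=.*/DEFAULTGW=$gw/" "$file.bak" > "$file"
}
```

It would be called as `set_defaultgw "$ORA_CRS_HOME/bin/racgvip" 192.168.1.1`, after which the VIP resource needs to be restarted for the check to pick up the new value.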
After the change, the trace shows that this check now passes.
2009-12-18 16:11:40.984: [ RACG][3086924000] [7808][3086924000][ora.rac10g2.vip]: Fri Dec 18 16:11:40 CST 2009 [ 7812 ] /sbin/mii-tool eth0 error
Fri Dec 18 16:11:40 CST 2009 [ 7812 ] checkIf: ping checked if=eth0 ok
Fri Dec 18 16:11:40 CST 2009 [ 7812 ] checkIf: end for if=eth0
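This change seems to be needed only on node 2, not on node 1. After observing the cluster for a long time, the earlier problem did not reappear.
The root cause was that a default gateway had been configured on eth1; configuring a gateway on the NIC used for the private network is not recommended. So in this situation, unless there is a specific need for it, removing the gateway from the private NIC, deleting the default route through that gateway, and restarting the VIP should resolve the problem.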