2006-10-20 13:46:55
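The shared storage is a Fujitsu array. In our failover tests, switching with takeover always worked normally. On this day, however, we ran into trouble: the primary node went down unexpectedly, and when HACMP on the standby node tried to acquire the shared storage, it could not clear the reservation that the primary node had previously placed on the disk. As a result the shared disk was inaccessible, activation of the shared volume group failed, and in the end the standby node could not take over the primary node's application. HACMP on the other LPARs of the same machine uses IBM ESS storage and has not shown this problem; there, takeover, or manually moving the RG to the backup node, works without difficulty. Could there be a compatibility problem between Fujitsu storage and IBM HACMP? I would appreciate your views, opinions, and suggestions.
Below is the log that HACMP produced during the takeover (the failed operations are explicitly marked):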
hacmp.out.1:
……
1067:
NODEA_res:cl_disk_available[22] disktype=UNKNOWN
... ...
1097:
NODEA_res:cl_disk_available[10] scdiskutil -t /dev/hdisk3
scdiskutil[1993]disk= hdisk3
scdiskutil[1994] location =
scdiskutil[1995] uniquetype = disk/fcp/e6000
scdiskutil[2043] scsi_id = 0x11400
scdiskutil[2045] lun_id = 0x1000000000000
scdiskutil[2087] fscsi name = cbx1
scdiskutil[2097] dynamic tracking = 0
scdiskutil[2201] Adapter device /dev/cbx1 opened.
scdiskutil[2207] IOCINFO succeeded. devtype=b, devsubtype=F
scdiskutil[2224] connection type = Switch (Fabric)
scdiskutil[2676] ioctl(SCIOLTUR), tur_rc=-1, errno=5
scdiskutil[2676] ioctl(SCIOLTUR), tur_rc=-1, errno=5
scdiskutil[2718] validity=1 scsi_status=0x18
SC_RESERVATION_CONFLICT
scdiskutil[2267] SCSI_TEST_UNIT_READY(0x00), rc1=24
scdiskutil[2285] Issue fcp_issue_stop()
scdiskutil[2321] rc1=24, rc2=0, rc3=0, rc4=0, rc6=0
scdiskutil[155] version 1.17
... ...
1127:
NODEA_res:cl_disk_available[173] cl_scdiskrsrv /dev/hdisk3
openx failed on device /dev/hdisk3.
: Device busy
errorno 16
NODEA_res:cl_disk_available[174] rsrv_rc=255
NODEA_res:cl_disk_available[184] (( 255 == 255 ))
NODEA_res:cl_disk_available[185] cl_log 205 'cl_disk_available: Failed reset/reserve for device: hdisk3.' cl_disk_available hdisk3
NODEA_res:cl_log[50] version=1.9
NODEA_res:cl_log[92] SYSLOG_FILE=/usr/es/adm/cluster.log
***************************
Oct 9 2006 12:39:44 !!!!!!!!!! ERROR !!!!!!!!!!
***************************
Oct 9 2006 12:39:44 cl_disk_available: Failed reset/reserve for device: hdisk3.
NODEA_res:cl_disk_available[189] return 255
1292:
NODEA_res:clvaryonvg[504] varyonvg -n sharevg
PV Status: hdisk3 0038469cbbcbea
0516-013 varyonvg: The volume group cannot be varied on because
there are no good copies of the descriptor area.
1328:
Oct 9 2006 12:39:47 cl_activate_vgs: Failed clvaryonvg of sharevg.
NODEA_res:cl_activate_vgs[99] STATUS=1
1435:
HACMP Event Summary
Event: node_down NODEA
Start time: Mon Oct 9 12:39:22 2006
End time: Mon Oct 9 12:39:52 2006
Action: Resource: Script Name:
----------------------------------------------------------------------------
Acquiring resource group: NODE_res process_resources
Search on: Mon.Oct.9.12:39:25.BEIST.2006.process_resources.NODE_res.ref
Acquiring resource: All_service_addrs acquire_takeover_addr
Search on: Mon.Oct.9.12:39:27.BEIST.2006.acquire_takeover_addr.All_service_addrs.NODEA_res.ref
Resource online: All_nonerror_service_addrs acquire_takeover_addr
Search on: Mon.Oct.9.12:39:34.BEIST.2006.acquire_takeover_addr.All_nonerror_service_addrs.NODEA_res.ref
Acquiring resource: All_disks cl_disk_available
Search on: Mon.Oct.9.12:39:36.BEIST.2006.cl_disk_available.All_disks.NODEA_res.ref
Resource online: All_nonerror_disks cl_disk_available
Search on: Mon.Oct.9.12:39:44.BEIST.2006.cl_disk_available.All_nonerror_disks.NODEA_res.ref
Acquiring resource: All_volume_groups cl_activate_vgs
Search on: Mon.Oct.9.12:39:45.BEIST.2006.cl_activate_vgs.All_volume_groups.NODEA_res.ref
Error encountered with resource: sharevg cl_activate_vgs
Search on: Mon.Oct.9.12:39:46.BEIST.2006.cl_activate_vgs.feqsvg.NODEA_res.ref
Resource online: All_nonerror_volume_groups cl_activate_vgs
Search on: Mon.Oct.9.12:39:47.BEIST.2006.cl_activate_vgs.All_nonerror_volume_groups.NODEA_res.ref
Error encountered with group: NODEA_res process_resources
Search on: Mon.Oct.9.12:39:49.BEIST.2006.process_resources.NODEA_res.ref
----------------------------------------------------------------------------
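
For anyone reading along, here is a small sketch that decodes the numeric codes in the excerpt above. It is not part of HACMP; the function name `decode_line` and both lookup tables are made up for illustration. The SCSI status values and the two errno numbers are the standard ones. Treating `rc1` as the SCSI status byte printed in decimal is only an assumption, suggested by 24 being 0x18.

```python
import re

# Subset of standard SCSI status byte values.
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
}

# The two errno values that appear in the excerpt (standard UNIX numbering).
ERRNO_NAMES = {5: "EIO", 16: "EBUSY"}


def decode_line(line):
    """Return readable notes for any status codes found in one log line."""
    notes = []

    m = re.search(r"scsi_status=0x([0-9a-fA-F]+)", line)
    if m:
        code = int(m.group(1), 16)
        notes.append(f"scsi_status 0x{code:02x}: {SCSI_STATUS.get(code, 'unknown')}")

    # Assumption: rc1 carries the SCSI status byte in decimal (24 == 0x18).
    m = re.search(r"\brc1=(\d+)", line)
    if m:
        code = int(m.group(1))
        notes.append(f"rc1 {code} (0x{code:02x}): {SCSI_STATUS.get(code, 'unknown')}")

    # Matches both "errno=5" and "errorno 16" as spelled in the log.
    m = re.search(r"err(?:or)?no[= ](\d+)", line)
    if m:
        code = int(m.group(1))
        notes.append(f"errno {code}: {ERRNO_NAMES.get(code, 'unknown')}")

    return notes


if __name__ == "__main__":
    sample = [
        "scdiskutil[2676] ioctl(SCIOLTUR), tur_rc=-1, errno=5",
        "scdiskutil[2718] validity=1 scsi_status=0x18",
        "scdiskutil[2267] SCSI_TEST_UNIT_READY(0x00), rc1=24",
        "errorno 16",
    ]
    for line in sample:
        for note in decode_line(line):
            print(note)
```

Read this way, the excerpt is consistent with the description at the top: the disk answers TEST UNIT READY with a reservation conflict, the later open of /dev/hdisk3 fails with a busy error, and varyonvg then finds no usable copy of the descriptor area.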