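In this case, two IBM x-series servers form a two-node cluster under the suresave HA software, with a dual-controller DS4300 attached as shared storage. Each server has two QLogic 2312 HBAs, and the QLogic driver provides failover between them. In production, the ext3 filesystems created on the array frequently went read-only and the application stopped working. The only way to read and write them again was to umount the filesystem, run fsck, and mount it again.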
Below are the tests and the fix.
The test environment matched the production setup:
host1: 10.0.0.1 (test1), 192.168.0.1 (test11, heartbeat)
host2: 10.0.0.2 (test2), 192.168.0.2 (test12, heartbeat)
Virtual address presented to clients: test, 10.0.0.3
1. Set up the test environment, using NFS as the network application for this test.
#vi /etc/exports
/test *(rw,sync,insecure,anonuid=0)
#cat /usr/local/dmnk/bin/ss_nfs
#!/bin/sh
case "$1" in
start)
/etc/init.d/nfsserver start
exit 0
;;
stop)
/etc/init.d/nfsserver stop
;;
*)
echo "Usage: $0 {start|stop}"
exit 1;
esac
Clients mount the export with:
#mount 10.0.0.3:/test /mnt
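2. Run the script below on an NFS client to exercise the network application. The main point is to see whether an HA switchover lets the application carry on without interruption.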
#cat test.sh
#!/bin/bash
echo -n "doing test...."
# 2000 x 50 = 100,000 small-file copies onto the NFS mount
for j in $(seq 1 2000); do
    for i in $(seq 1 50); do
        # if a copy fails, clear out /mnt and keep going
        if ! cp test.file /mnt/testwy${j}-${i}; then
            find /mnt -type f -print | xargs rm -rf
        fi
    done
    echo $j-$i
done
echo "test ok!"
3. Run the script below on the server to reproduce the read-only problem.
#cat test.sh
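#!/bin/bash
# 200 x 1000 small-file copies directly onto the shared filesystem
for j in $(seq 1 200); do
    for i in $(seq 1 1000); do
        # wait until /test is listed in /proc/mounts on this node
        until grep -q ' /test ' /proc/mounts; do
            sleep 1
        done
        # if a copy fails, clear out /test and keep going
        if ! cp test.file /test/test1-${i}-${j}; then
            find /test -type f -print | xargs rm -rf
        fi
    done
    echo ${i}-${j}
done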
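The script above only notices a read-only filesystem indirectly, when cp starts failing. A minimal sketch of a more direct check, assuming the standard /proc/mounts layout (device, mount point, filesystem type, comma-separated options, ...): is_readonly succeeds when the last entry for a mount point carries the ro option, and watch_readonly polls until that happens and logs the time. Both function names and the optional mounts-file argument are hypothetical, added only for illustration and testing.

```shell
#!/bin/bash
# is_readonly MOUNTPOINT [MOUNTS_FILE]: hypothetical helper.
# Succeeds if the last /proc/mounts entry for MOUNTPOINT has the "ro" option;
# fails if it is mounted rw or not mounted at all.
is_readonly() {
    local mnt=$1 mounts=${2:-/proc/mounts}
    local dev mp fstype opts rest state=1
    while read -r dev mp fstype opts rest; do
        [ "$mp" = "$mnt" ] || continue
        # Match "ro" as a whole option, so "errors=remount-ro" does not count.
        case ",$opts," in
            *,ro,*) state=0 ;;
            *)      state=1 ;;
        esac
    done < "$mounts"
    return $state
}

# watch_readonly MOUNTPOINT [INTERVAL] [MOUNTS_FILE]: hypothetical helper.
# Polls every INTERVAL seconds and prints a timestamp once MOUNTPOINT is read-only.
watch_readonly() {
    local mnt=$1 interval=${2:-1} mounts=${3:-/proc/mounts}
    until is_readonly "$mnt" "$mounts"; do
        sleep "$interval"
    done
    echo "$(date '+%F %T') $mnt is now read-only"
}
```

For example, starting watch_readonly /test & on the server before the test loop records the moment /test turns read-only, which can then be lined up against the kernel log.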
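4. Testing showed that SUSE 9 ships version 8.00 of the driver for the QLogic 2312. With it, the filesystem went read-only after failover tests, and the same thing also happened at random during HA switchovers. Following the "proximity principle" of troubleshooting (suspect the components closest to the failure first), the problem most likely lay in the driver or the hardware, so the simplest option was tried first: download a newer driver and repeat the tests.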
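To confirm which driver is actually loaded before and after the upgrade, the version can be read from the driver's banner in the kernel log (the "QLogic Fibre Channel HBA Driver: 8.01.60-fo" line in section 7). A minimal sketch: driver_version extracts the dotted version number from that line, and version_ge compares two dotted versions field by field as integers. The function names are hypothetical, only purely numeric fields are handled, and 8.01.60 is simply the version that resolved the problem in this test, not a documented minimum.

```shell
#!/bin/bash
# driver_version: hypothetical helper. Reads kernel log text on stdin and
# prints the version from the first "QLogic Fibre Channel HBA Driver: X" line,
# e.g. "8.01.60" from "8.01.60-fo".
driver_version() {
    sed -n 's/.*QLogic Fibre Channel HBA Driver: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# version_ge A B: hypothetical helper. Succeeds if dotted version A >= B.
# Missing fields count as 0; "10#" stops bash from reading "08" as octal.
version_ge() {
    local IFS=.
    local -a a=($1) b=($2)
    local i x y
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=$((10#${a[i]:-0}))
        y=$((10#${b[i]:-0}))
        if ((x > y)); then return 0; fi
        if ((x < y)); then return 1; fi
    done
    return 0
}
```

For example, version_ge "$(dmesg | driver_version)" 8.01.60 || echo "old qla2xxx driver" flags a node still running the old driver. If the boot messages have already scrolled out of dmesg, feed a saved copy of the boot log instead.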
5. Download the latest driver tested by IBM, qla2xxx-src-v8.01.60.tar.gz (this driver requires SUSE 9 SP3 or later, or SUSE 10).
Driver installation steps:
#tar -zxvf qla2xxx-src-v8.01.60.tar.gz
#cd qla2xxx-8.01.60
#./extras/build.sh install
#vi /etc/sysconfig/kernel
Change the INITRD_MODULES line to: INITRD_MODULES="ata_piix mptspi mptfc mptsas qla2xxx_conf qla2xxx qla2300 jbd ext3"
#mkinitrd
#reboot
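The new driver only takes effect at boot if its modules end up in the initrd, so a mistyped INITRD_MODULES line may go unnoticed until the reboot. A minimal consistency check, assuming /etc/sysconfig/kernel contains a single INITRD_MODULES="..." line as shown above: check_initrd_order confirms that qla2xxx_conf, qla2xxx and qla2300 are all listed and appear in the same relative order as in that line. The function is hypothetical and only mirrors this document's line; it does not encode a rule documented by SUSE or QLogic.

```shell
#!/bin/bash
# check_initrd_order [FILE]: hypothetical helper. Succeeds if INITRD_MODULES
# in FILE lists qla2xxx_conf, qla2xxx and qla2300 in that order; otherwise
# prints what is wrong and fails.
check_initrd_order() {
    local file=${1:-/etc/sysconfig/kernel}
    local line mods want i idx prev=-1
    local -a list
    # Take the last INITRD_MODULES= line and strip the name and the quotes.
    line=$(grep '^INITRD_MODULES=' "$file" | tail -n 1)
    mods=${line#INITRD_MODULES=}
    mods=${mods//\"/}
    read -r -a list <<< "$mods"
    for want in qla2xxx_conf qla2xxx qla2300; do
        idx=-1
        for i in "${!list[@]}"; do
            if [ "${list[i]}" = "$want" ]; then idx=$i; break; fi
        done
        if ((idx < 0)); then echo "missing: $want"; return 1; fi
        if ((idx < prev)); then echo "out of order: $want"; return 1; fi
        prev=$idx
    done
    return 0
}
```

For example, check_initrd_order /etc/sysconfig/kernel && mkinitrd only rebuilds the initrd when the line looks right.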
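6. Conclusion.
During repeated NFS reads and writes of 100,000 small files, repeated HA switchovers caused no problems.
During repeated local reads and writes of 100,000 small files, with the FC cables repeatedly pulled and reinserted, every failover completed smoothly.
The problem was solved.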
7. Below is some of the QLogic failover log output after the fix.
qla2xxx_conf: module not supported by Novell, setting U taint flag.
qla2xxx: module not supported by Novell, setting U taint flag.
QLogic Fibre Channel HBA Driver
qla2300: module not supported by Novell, setting U taint flag.
ACPI: PCI interrupt 0000:04:02.0[A] -> GSI 24 (level, low) -> IRQ 20
qla2300 0000:04:02.0: Found an ISP2312, irq 20, iobase 0xf98da000
qla2300 0000:04:02.0: Configuring PCI space...
qla2300 0000:04:02.0: Configure NVRAM parameters...
qla2300 0000:04:02.0: Verifying loaded RISC code...
powernow: This module only works with AMD K7 CPUs
qla2300 0000:04:02.0: LIP reset occured (f8f7).
qla2300 0000:04:02.0: Waiting for LIP to complete...
qla2300 0000:04:02.0: LIP occured (f8f7).
hda: ATAPI 24X DVD-ROM drive, 256kB Cache
Uniform CD-ROM driver Revision: 3.20
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
st: Version 20040318, fixed bufsize 32768, s/g segs 256
Attached scsi generic sg0 at scsi1, channel 0, id 0, lun 0, type 0
Attached scsi generic sg1 at scsi1, channel 0, id 8, lun 0, type 3
qla2300 0000:04:02.0: Topology - (Loop), Host Loop address 0x7d
scsi2 : qla2xxx
qla2300 0000:04:02.0:
QLogic Fibre Channel HBA Driver: 8.01.60-fo
QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single Channel
ISP2312: PCI-X (100 MHz) @ 0000:04:02.0 hdma-, host#=2, fw=3.03.15 IPX
Vendor: IBM Model: 1722-600 Rev: 0520
Type: Direct-Access ANSI SCSI revision: 03
qla2300 0000:04:02.0: scsi(2:0:0:0): Enabled tagged queuing, queue depth 32.
SCSI device sdb: 287716672 512-byte hdwr sectors (147311 MB)
SCSI device sdb: drive cache: write back
sdb: sdb1 sdb2 sdb3
Attached scsi disk sdb at scsi2, channel 0, id 0, lun 0
Attached scsi generic sg2 at scsi2, channel 0, id 0, lun 0, type 0
Vendor: IBM Model: 1722-600 Rev: 0520
Type: Direct-Access ANSI SCSI revision: 03
qla2300 0000:04:02.0: scsi(2:0:0:1): Enabled tagged queuing, queue depth 32.
SCSI device sdc: 629145600 512-byte hdwr sectors (322123 MB)
SCSI device sdc: drive cache: write back
sdc: sdc1
Attached scsi disk sdc at scsi2, channel 0, id 0, lun 1
Attached scsi generic sg3 at scsi2, channel 0, id 0, lun 1, type 0
Vendor: IBM Model: Universal Xport Rev: 0520
Type: Direct-Access ANSI SCSI revision: 03
qla2300 0000:04:02.0: scsi(2:0:0:31): Enabled tagged queuing, queue depth 32.
SCSI device sdd: 40960 512-byte hdwr sectors (21 MB)
SCSI device sdd: drive cache: write through
sdd:
Attached scsi disk sdd at scsi2, channel 0, id 0, lun 31
Attached scsi generic sg4 at scsi2, channel 0, id 0, lun 31, type 0
ACPI: PCI interrupt 0000:05:01.0[A] -> GSI 48 (level, low) -> IRQ 21
qla2300 0000:05:01.0: Found an ISP2312, irq 21, iobase 0xf9ab9000
qla2300 0000:05:01.0: Configuring PCI space...
qla2300 0000:05:01.0: Configure NVRAM parameters...
qla2300 0000:05:01.0: Verifying loaded RISC code...
qla2300 0000:05:01.0: Waiting for LIP to complete...
Non-volatile memory driver v1.2
BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
qla2300 0000:05:01.0: LIP reset occured (f7f7).
qla2300 0000:05:01.0: LIP occured (f7f7).
qla2300 0000:05:01.0: LOOP UP detected (2 Gbps).
qla2300 0000:05:01.0: Topology - (Loop), Host Loop address 0x7d
scsi3 : qla2xxx
qla2300 0000:05:01.0:
QLogic Fibre Channel HBA Driver: 8.01.60-fo
QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single Channel
ISP2312: PCI-X (133 MHz) @ 0000:05:01.0 hdma-, host#=3, fw=3.03.15 IPX
lp: driver loaded but no devices found
drivers/usb/serial/usb-serial.c: USB Serial support registered for Generic
usbcore: registered new driver usbserial
drivers/usb/serial/usb-serial.c: USB Serial Driver core v2.0
eth1: no IPv6 routers present
eth0: no IPv6 routers present
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
qla2300 0000:05:01.0: LIP reset occured (f8ef).
qla2300 0000:05:01.0: LOOP DOWN detected (2).
qla2300 0000:05:01.0: LIP occured (f8ef).
qla2300 0000:05:01.0: LOOP UP detected (2 Gbps).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LIP occured (f8f7).
qla2300 0000:05:01.0: LIP reset occured (f8ef).
qla2300 0000:05:01.0: LIP occured (f8f7).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2300 0000:05:01.0: LIP reset occured (f8ef).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2300 0000:05:01.0: LIP occured (f8f7).
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2x00: FAILOVER device 0 from 200500a0b81807a5 -> 200400a0b81807a5 - LUN 00, reason=0x2
qla2x00: FROM HBA 0 to HBA 1
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2x00: FAILBACK device 0 -> 200400a0b81807a4 LUN 00
qla2x00: FROM HBA 1 to HBA 0
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LIP occured (f8f7).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LIP occured (f8f7).
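In a long log like the one above, the failover and failback events are easy to lose among the LIP and LOOP messages. A minimal sketch, assuming the message format shown above (a "qla2x00: FAILOVER ..." or "qla2x00: FAILBACK ..." line followed, possibly after unrelated lines, by "qla2x00: FROM HBA x to HBA y"): fo_events reads kernel log text on stdin and prints one line per event. The function name and its output format are hypothetical.

```shell
#!/bin/bash
# fo_events: hypothetical helper. Reads kernel log text on stdin and prints
# one summary line per qla2x00 failover/failback, pairing each event with the
# "FROM HBA x to HBA y" line that follows it.
fo_events() {
    awk '
        /qla2x00: FAILOVER/ { ev = "FAILOVER" }
        /qla2x00: FAILBACK/ { ev = "FAILBACK" }
        /qla2x00: FAIL(OVER|BACK)/ {
            # the LUN number follows the word "LUN"; drop a trailing comma
            lun = "?"
            for (i = 1; i < NF; i++)
                if ($i == "LUN") { lun = $(i + 1); sub(/,$/, "", lun) }
            next
        }
        /qla2x00: FROM HBA/ && ev != "" {
            for (i = 1; i <= NF; i++) if ($i == "FROM") break
            printf "%s LUN %s: HBA %s -> HBA %s\n", ev, lun, $(i + 2), $(i + 5)
            ev = ""
        }
    '
}
```

Run against the log above (for example dmesg | fo_events), it prints "FAILOVER LUN 00: HBA 0 -> HBA 1" followed by "FAILBACK LUN 00: HBA 1 -> HBA 0".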
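The switchover was very smooth: during both failover and failback the driver held off reads and writes to the array for a few seconds, which kept the switch seamless and the data intact.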
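The "few seconds" of blocked I/O can be measured. A minimal sketch, assuming it runs on the node that currently has /test mounted while an FC cable is pulled (an HA switchover moves the mount to the other node, so it does not apply there): probe_stall writes a small file, calls sync so the data has to reach the array, and reports the longest gap between two consecutive completed writes. The function and file names are hypothetical, and the timer uses whole seconds from date +%s, so it is only meaningful for pauses of a second or more.

```shell
#!/bin/bash
# probe_stall DIR [COUNT]: hypothetical helper. Writes COUNT small files into
# DIR one after another and prints the longest gap, in whole seconds, between
# two consecutive completed writes. Returns 1 on the first failed write
# (for example when the filesystem has gone read-only).
probe_stall() {
    local dir=$1 count=${2:-1000}
    local i now last max=0
    last=$(date +%s)
    for ((i = 1; i <= count; i++)); do
        if ! echo "$i" > "$dir/probe.$i"; then
            echo "write $i failed" >&2
            return 1
        fi
        # Flush dirty data so the write has to reach the array; while the
        # driver holds I/O during a path switch, sync blocks as well.
        sync
        now=$(date +%s)
        if (( now - last > max )); then
            max=$(( now - last ))
        fi
        last=$now
    done
    rm -f "$dir"/probe.*
    echo "$max"
}
```

For example, run probe_stall /test 600 and pull one FC cable partway through; the printed value approximates the longest I/O pause seen during the failover.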
8. Related entries from the driver revision list
/*
* QLogic ISP2XXX Linux Driver Revision List File.
*
*
* Rev 8.01.00test2 August 24, 2005 RA (suspends I/O during the switchover and lengthens the switchover timeouts)
* - Suspend/Unsuspend the target during failback.
* - For failover increase the cmd timeout when tgt is suspended.
* - Suspend/Unsupend the lun during transiton for DSXXX.
* - Fixed the increment of fo_retry_cnt for a failed path.
* - Increased the wait time for lun transition to 190 sec
* after set tgt port grp succeds.
*
*
* Rev 8.01.00b9 July 26, 2005 DG/RA/AV
*
* - Update version to 8.01.00b9.
* - Correct domain/area exclusion logic within FCAL.
* - Remove RISC pause/release barriers during flash
* manipulation.
* - Correct ISP24xx soft-reset handling.
* - Correct LED scheme definition.
* - Fixed DS400 handling of check conditions.
* - Add DSXXX failover support. (official support starts here)