Category: LINUX

2007-09-17 17:13:23

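In this case, two IBM xSeries hosts run the suresave HA software as a two-node cluster, with a dual-controller DS4300 behind them as storage. Each machine has two QLogic 2312 HBA cards, and failover between the two cards is handled by the QLogic driver. In production, the ext3 file system created on the array frequently went read-only and the application could no longer run. The file system had to be unmounted and checked with fsck before it could be mounted again for normal reads and writes.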
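As a rough illustration of how the symptom can be spotted from a script, the sketch below looks up a mount point in the mounts table and checks for the `ro` option. It assumes the kernel reflects the forced read-only state in /proc/mounts; the function name `is_readonly` and the optional file argument are made up for this example.

```shell
# Report whether a mount point is currently mounted read-only.
# Usage: is_readonly MOUNTPOINT [MOUNTS_FILE]
# Exit status: 0 = read-only, 1 = read-write, 2 = not mounted.
# MOUNTS_FILE defaults to /proc/mounts; the extra argument only exists so the
# function can be tried against a saved copy of the table.
is_readonly() {
    local mnt="$1"
    local mounts="${2:-/proc/mounts}"
    # Fields in /proc/mounts: device mountpoint fstype options dump pass
    awk -v m="$mnt" '
        $2 == m { opts = $4; found = 1 }   # keep the last matching entry
        END {
            if (!found) exit 2
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "$mounts"
}
```

A monitoring loop could call `is_readonly /test` and raise an alert when it returns 0, rather than waiting for the application to fail.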
 
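Below are the tests and the solution:
 
The environment matches the production setup:
host1: 10.0.0.1 (test1) 192.168.0.1 (test11, heartbeat)
host2: 10.0.0.2 (test2) 192.168.0.2 (test12, heartbeat)
Virtual address presented to clients, test: 10.0.0.3
1. Set up the test environment, using NFS as the network application for this test.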
#vi /etc/exports
/test *(rw,sync,insecure,anonuid=0)
 
#cat /usr/local/dmnk/bin/ss_nfs
#!/bin/sh
case "$1" in
start)
        /etc/init.d/nfsserver start
        exit 0
        ;;
stop)
        /etc/init.d/nfsserver stop
        ;;
*)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
esac

Mount the export on the client with the following command:
#mount 10.0.0.3:/test /mnt
 
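2. Use the following script to test the network application, mainly to see whether the application carries on smoothly across an HA switchover.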
#cat test.sh
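#!/bin/bash
# Copy test.file onto the NFS mount 2000 x 50 times while HA switchovers are triggered.
echo -n "doing test...."
for j in $(seq 1 2000); do
    for i in $(seq 1 50); do
        # If a copy fails, remove all regular files under /mnt and carry on.
        if ! cp test.file "/mnt/testwy${j}-${i}"; then
            find /mnt -type f -print0 | xargs -0 rm -f
        fi
    done
    echo "$j-$i"
done
echo "test ok!"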
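The script above only checks that each copy eventually succeeds. To see how long the client actually stalls during a switchover, each copy could be timed. The following is a minimal sketch under that assumption; `timed_cp` and `longest_stall` are hypothetical helper names, not part of the original test.

```shell
# Copy SRC to DST and print the elapsed time in whole seconds.
# Returns the exit status of cp so the caller can still react to failures.
timed_cp() {
    local src="$1"
    local dst="$2"
    local start end rc=0
    start=$(date +%s)
    cp "$src" "$dst" || rc=$?
    end=$(date +%s)
    echo $((end - start))
    return "$rc"
}

# Copy SRC to DST_PREFIX1 .. DST_PREFIXn and print the longest single copy time.
# Usage: longest_stall SRC DST_PREFIX COUNT
longest_stall() {
    local src="$1"
    local prefix="$2"
    local count="$3"
    local max=0 i=1 t
    while [ "$i" -le "$count" ]; do
        t=$(timed_cp "$src" "${prefix}${i}") || return 1
        if [ "$t" -gt "$max" ]; then
            max=$t
        fi
        i=$((i + 1))
    done
    echo "$max"
}
```

For example, `longest_stall test.file /mnt/testwy 50` run during a switchover prints the worst-case pause, in seconds, seen by the client.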
 
 
3. On the server, use the following script to reproduce the read-only problem.
#cat test.sh
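#!/bin/bash
# Write 200 x 1000 files to the local /test file system to provoke the read-only condition.
for j in $(seq 1 200); do
    for i in $(seq 1 1000); do
        # Wait until an entry matching "test" shows up in the mounts table again.
        while ! grep -q test /proc/mounts; do
            sleep 1
        done
        # If a copy fails, remove all regular files under /test and carry on.
        if ! cp test.file "/test/test1-${i}-${j}"; then
            find /test -type f -print0 | xargs -0 rm -f
        fi
    done
    echo "${i}-${j}"
done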
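The wait loop in this script matches any mounts entry containing "test" and waits forever if the file system never comes back. A stricter variant, sketched here with a hypothetical `wait_for_mount` helper, matches the mount point field exactly and gives up after a timeout.

```shell
# Wait until MOUNTPOINT appears in the mounts table, or give up after TIMEOUT seconds.
# Usage: wait_for_mount MOUNTPOINT TIMEOUT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts.
wait_for_mount() {
    local mnt="$1"
    local timeout="$2"
    local mounts="${3:-/proc/mounts}"
    local waited=0
    # Compare the second field exactly, so /test does not match /test2 or a device name.
    until awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' "$mounts"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $mnt" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

In the test script, `wait_for_mount /test 300 || exit 1` could replace the inner `while` loop; the 300-second limit is only an example value.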
 
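4. During testing I found that the qlogic 2312 driver shipped with SUSE 9 is version 8.00. The read-only condition appeared after failover tests,
and it also showed up at random during HA switchovers. Following the troubleshooting principle of "proximity" (suspect the components closest to the symptom first), the problem was most likely in the driver or the hardware. I took the simplest route: download a newer driver and rerun the tests.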
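One way to confirm which driver version is actually loaded is to pull it out of the kernel log. The sketch below relies on the banner line `QLogic Fibre Channel HBA Driver: 8.01.60-fo` that appears in the log reproduced in section 7; that the 8.00 driver prints a banner in the same format is an assumption, and `qla_driver_version` is a name invented for this example.

```shell
# Print the distinct qla2xxx driver versions found in kernel log text on stdin.
# Only lines of the form "QLogic Fibre Channel HBA Driver: <version>" are used.
qla_driver_version() {
    sed -n 's/^.*QLogic Fibre Channel HBA Driver: *\([^ ]*\).*$/\1/p' | sort -u
}
```

Typical use would be `dmesg | qla_driver_version`, run before and after the driver upgrade.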
 
5. Download the latest IBM-tested driver, qla2xxx-src-v8.01.60.tar.gz (this driver requires SUSE 9 SP3 or later, or SUSE 10).
Driver installation steps:
#tar -zxvf qla2xxx-src-v8.01.60.tar.gz
#cd qla2xxx-8.01.60
#./extras/build.sh install
#vi /etc/sysconfig/kernel
Change this line to: INITRD_MODULES="ata_piix mptspi mptfc mptsas  qla2xxx_conf qla2xxx qla2300 jbd ext3"
#mkinitrd
#reboot
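Before running mkinitrd and rebooting, it may be worth checking that the edited file really lists the QLogic modules. This is a small sketch, assuming the `INITRD_MODULES="..."` line format shown above; `initrd_has_modules` is a hypothetical helper.

```shell
# Check that every named module is listed in INITRD_MODULES.
# Usage: initrd_has_modules SYSCONFIG_FILE MODULE...
# Prints each missing module and returns 1 if any is missing.
initrd_has_modules() {
    local file="$1"
    local list mod missing=0
    shift
    # Take the last INITRD_MODULES= assignment and strip the surrounding quotes.
    list=$(sed -n 's/^INITRD_MODULES="\{0,1\}\([^"]*\)"\{0,1\}.*$/\1/p' "$file" | tail -n 1)
    for mod in "$@"; do
        case " $list " in
            *" $mod "*) ;;                          # module is present
            *) echo "missing: $mod"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

For example: `initrd_has_modules /etc/sysconfig/kernel qla2xxx_conf qla2xxx qla2300`.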
 
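6. Conclusion.
With an NFS workload repeatedly reading and writing 100,000 small files, repeated HA switchovers showed no problems.
With 100,000 small files repeatedly read and written locally while the FC cables were repeatedly unplugged and replugged, failover went smoothly every time.
The problem is resolved.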
 
 
7. Below is some of the kernel log output from qlogic failover after the fix.
qla2xxx_conf: module not supported by Novell, setting U taint flag.
qla2xxx: module not supported by Novell, setting U taint flag.
QLogic Fibre Channel HBA Driver
qla2300: module not supported by Novell, setting U taint flag.
ACPI: PCI interrupt 0000:04:02.0[A] -> GSI 24 (level, low) -> IRQ 20
qla2300 0000:04:02.0: Found an ISP2312, irq 20, iobase 0xf98da000
qla2300 0000:04:02.0: Configuring PCI space...
qla2300 0000:04:02.0: Configure NVRAM parameters...
qla2300 0000:04:02.0: Verifying loaded RISC code...
powernow: This module only works with AMD K7 CPUs
qla2300 0000:04:02.0: LIP reset occured (f8f7).
qla2300 0000:04:02.0: Waiting for LIP to complete...
qla2300 0000:04:02.0: LIP occured (f8f7).
hda: ATAPI 24X DVD-ROM drive, 256kB Cache
Uniform CD-ROM driver Revision: 3.20
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
st: Version 20040318, fixed bufsize 32768, s/g segs 256
Attached scsi generic sg0 at scsi1, channel 0, id 0, lun 0,  type 0
Attached scsi generic sg1 at scsi1, channel 0, id 8, lun 0,  type 3
qla2300 0000:04:02.0: Topology - (Loop), Host Loop address 0x7d
scsi2 : qla2xxx
qla2300 0000:04:02.0:
 QLogic Fibre Channel HBA Driver: 8.01.60-fo
  QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single Channel
  ISP2312: PCI-X (100 MHz) @ 0000:04:02.0 hdma-, host#=2, fw=3.03.15 IPX
  Vendor: IBM       Model: 1722-600          Rev: 0520
  Type:   Direct-Access                      ANSI SCSI revision: 03
qla2300 0000:04:02.0: scsi(2:0:0:0): Enabled tagged queuing, queue depth 32.
SCSI device sdb: 287716672 512-byte hdwr sectors (147311 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2 sdb3
Attached scsi disk sdb at scsi2, channel 0, id 0, lun 0
Attached scsi generic sg2 at scsi2, channel 0, id 0, lun 0,  type 0
  Vendor: IBM       Model: 1722-600          Rev: 0520
  Type:   Direct-Access                      ANSI SCSI revision: 03
qla2300 0000:04:02.0: scsi(2:0:0:1): Enabled tagged queuing, queue depth 32.
SCSI device sdc: 629145600 512-byte hdwr sectors (322123 MB)
SCSI device sdc: drive cache: write back
 sdc: sdc1
Attached scsi disk sdc at scsi2, channel 0, id 0, lun 1
Attached scsi generic sg3 at scsi2, channel 0, id 0, lun 1,  type 0
  Vendor: IBM       Model: Universal Xport   Rev: 0520
  Type:   Direct-Access                      ANSI SCSI revision: 03
qla2300 0000:04:02.0: scsi(2:0:0:31): Enabled tagged queuing, queue depth 32.
SCSI device sdd: 40960 512-byte hdwr sectors (21 MB)
SCSI device sdd: drive cache: write through
 sdd:
Attached scsi disk sdd at scsi2, channel 0, id 0, lun 31
Attached scsi generic sg4 at scsi2, channel 0, id 0, lun 31,  type 0
ACPI: PCI interrupt 0000:05:01.0[A] -> GSI 48 (level, low) -> IRQ 21
qla2300 0000:05:01.0: Found an ISP2312, irq 21, iobase 0xf9ab9000
qla2300 0000:05:01.0: Configuring PCI space...
qla2300 0000:05:01.0: Configure NVRAM parameters...
qla2300 0000:05:01.0: Verifying loaded RISC code...
qla2300 0000:05:01.0: Waiting for LIP to complete...
Non-volatile memory driver v1.2
BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
qla2300 0000:05:01.0: LIP reset occured (f7f7).
qla2300 0000:05:01.0: LIP occured (f7f7).
qla2300 0000:05:01.0: LOOP UP detected (2 Gbps).
qla2300 0000:05:01.0: Topology - (Loop), Host Loop address 0x7d
scsi3 : qla2xxx
qla2300 0000:05:01.0:
 QLogic Fibre Channel HBA Driver: 8.01.60-fo
  QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single Channel
  ISP2312: PCI-X (133 MHz) @ 0000:05:01.0 hdma-, host#=3, fw=3.03.15 IPX
lp: driver loaded but no devices found
drivers/usb/serial/usb-serial.c: USB Serial support registered for Generic
usbcore: registered new driver usbserial
drivers/usb/serial/usb-serial.c: USB Serial Driver core v2.0
eth1: no IPv6 routers present
eth0: no IPv6 routers present
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
qla2300 0000:05:01.0: LIP reset occured (f8ef).
qla2300 0000:05:01.0: LOOP DOWN detected (2).
qla2300 0000:05:01.0: LIP occured (f8ef).
qla2300 0000:05:01.0: LOOP UP detected (2 Gbps).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LIP occured (f8f7).
qla2300 0000:05:01.0: LIP reset occured (f8ef).
qla2300 0000:05:01.0: LIP occured (f8f7).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2300 0000:05:01.0: LIP reset occured (f8ef).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2300 0000:05:01.0: LIP occured (f8f7).
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2x00: FAILOVER device 0 from 200500a0b81807a5 -> 200400a0b81807a5 - LUN 00, reason=0x2
qla2x00: FROM HBA 0 to HBA 1
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2x00: FAILBACK device 0 -> 200400a0b81807a4 LUN 00
qla2x00: FROM HBA 1 to HBA 0
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LOOP DOWN detected (2).
qla2300 0000:04:02.0: LIP occured (f8e4).
qla2300 0000:04:02.0: LOOP UP detected (2 Gbps).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LIP occured (f8f7).
qla2300 0000:04:02.0: LIP reset occured (f8e4).
qla2300 0000:04:02.0: LIP occured (f8f7).
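The switchover was very smooth. During failover and failback the driver blocked reads and writes to the array for a few seconds, which kept the transition smooth and the data intact.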
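When the cables are pulled many times, counting the events is easier than reading the raw log. The sketch below summarises a log in the format shown above (no timestamp prefix, which is an assumption about the local syslog setup); `failover_summary` is a name made up for this example.

```shell
# Summarise qla2xxx path events from kernel log text on stdin.
# Prints the number of FAILOVER and FAILBACK messages, then one
# "loop_down <pci-address> <count>" line per HBA that lost its loop.
failover_summary() {
    awk '
        /qla2x00: FAILOVER device/ { failover++ }
        /qla2x00: FAILBACK device/ { failback++ }
        /LOOP DOWN detected/ {
            dev = $2                 # second field is the PCI address plus a colon
            sub(/:$/, "", dev)
            down[dev]++
        }
        END {
            printf "failover=%d failback=%d\n", failover, failback
            for (dev in down) printf "loop_down %s %d\n", dev, down[dev]
        }'
}
```

Typical use would be `dmesg | failover_summary`. A FAILOVER count that matches the FAILBACK count suggests every path that was lost also came back.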
 
8. Related notes
/*
 * QLogic ISP2XXX Linux Driver Revision List File.
 *

 *
 *  Rev  8.01.00test2 August 24, 2005  RA (note: suspends I/O during the switchover and increases the switchover timeout)
 *      - Suspend/Unsuspend the target during failback.
 *      - For failover increase the cmd timeout when tgt is suspended.
 *      - Suspend/Unsupend the lun during transiton for DSXXX.
 * - Fixed the increment of fo_retry_cnt for a failed path.
 * - Increased the wait time for lun transition to 190 sec
 *   after set tgt port grp succeds.
 * 

 *
 *  Rev  8.01.00b9 July 26, 2005   DG/RA/AV
 *
 * - Update version to 8.01.00b9.
 * - Correct domain/area exclusion logic within FCAL.
 * - Remove RISC pause/release barriers during flash
 *   manipulation.
 * - Correct ISP24xx soft-reset handling.
 * - Correct LED scheme definition.
 * - Fixed DS400 handling of check conditions.
 * - Add DSXXX failover support. (note: official support starts from this release)
 