2007-11-16 16:31:52
Symptom:
This afternoon the db host reported several EMS events. The notification read as follows:
EMS [3183]: ------ EMS Event Notification ------
Value: "CRITICAL (5)" for Resource: "/adapters/events/TL_adapter/1_0_4_1_0" (Threshold: >= " 3")
Execute the following command to obtain event details:
/opt/resmon/bin/resdata -R 208601093 -r /adapters/events/TL_adapter/1_0_4_1_0 -n 208601089 -a
The event details are as follows:
ARCHIVED MONITOR DATA:
Event Time..........: Tue Nov 13 13:26:37 2007
Severity............: CRITICAL
Monitor.............: dm_TL_adapter
Event #.............: 37
System..............: bjeopd01
Summary:
Adapter at hardware path 1/0/4/1/0 : Target WWN no longer acceptable
Description of Error:
lbolt value: 18270782
World wide name (unique identifier) for following device has
changed
nport Id = 0x13800
Probable Cause / Recommended Action:
Unlike paralled SCSI, Fibre Channel allows easy hot-swapping of disks
Due to address conflicts it is possible for disks to swap loop IDs
accidently when re-plugging them into a hub.
In order to reduce the chance of user data corruption or file system
panics, we normally require the world wide names for a particular device
to stay the same. Requiring the fcmsutil replace_dsk command ensures
that the replacement is intentional.
The driver has detected a new "world-wide name" unique identifier
at a loop position previously occupied by a device with a different
identifier.
If this is due to a deliberate replacement of the
previous device, use the fcmsutil replace_dsk command
to allow the new device to be used. If there has
been no intentional replacing of any devices, check
for conflicts in address assignments of devices on
the loop.
Additional Event Data:
System IP Address...:
Event Id............: 0x4739359000000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_TL_adapter.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x4739358b00000000
Additional System Data:
System Model Number.............: 9000/800/rp7420
OS Version......................: B.11.11
EMS Version.....................: A.04.00
STM Version.....................: A.45.00
Latest information on this event:
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
Component Data:
Physical Device Path....: 1/0/4/1/0
Vendor Id...............: 0x0000103C
Serial Number(WWN)......: 50060B000025DF58
I/O Log Event Data:
Driver Status Code..................: 0x00000025
Length of Logged Hardware Status....: 0 bytes.
Offset to Logged Manager Information: 0 bytes.
Length of Logged Manager Information: 68 bytes.
Manager-Specific Information:
Raw data from FCMS Adapter driver:
00000001 0116CA3E 00000003 00000001 00013800 100000E0 00000000 2F75782F
6B65726E 2F6B6973 752F544C 2F737263 2F636F6D 6D6F6E2F 7773696F 2F74645F
66632E63
(The other EMS events were essentially identical to this one.)
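Judging from this event, the last component of the EMS resource name (1_0_4_1_0) appears to be the adapter's hardware path (1/0/4/1/0) with the slashes replaced by underscores. A minimal sketch of that conversion follows; the function name resource_to_hwpath is hypothetical, and the mapping is only inferred from this one event, not from EMS documentation.

```shell
# Hypothetical helper: derive a hardware path from an EMS resource name.
# Assumption: the last path component is the hardware path with "/" written as "_".
resource_to_hwpath() {
    # $1: EMS resource name, e.g. /adapters/events/TL_adapter/1_0_4_1_0
    # Strip everything up to the last "/", then turn underscores into slashes.
    printf '%s\n' "${1##*/}" | tr '_' '/'
}

resource_to_hwpath /adapters/events/TL_adapter/1_0_4_1_0   # prints 1/0/4/1/0
```

The result can then be matched against the H/W Path column of ioscan.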
************************************************************************
========================================================================
************************************************************************
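Analysis:
The problem was reported by a Fibre Channel adapter, so the devices attached to the Fibre Channel adapters should be checked.
Steps:
1. Use ioscan to list the Fibre Channel adapters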
#ioscan -fnCfc
Class I H/W Path Driver S/W State H/W Type Description
==================================================================
fc 0 1/0/2/1/0 td CLAIMED INTERFACE HP Tachyon XL2 Fibre Channel Mass Storage Adapter
/dev/td0
fc 1 1/0/4/1/0 td CLAIMED INTERFACE HP Tachyon XL2 Fibre Channel Mass Storage Adapter
/dev/td1
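To find which device file belongs to the adapter named in the event, the ioscan listing can be filtered by hardware path. The sketch below assumes the layout shown above, where the device file is printed on the line right after the adapter line; hwpath_to_devfile is a hypothetical helper name, and the sample text stands in for real ioscan output.

```shell
# Hypothetical helper: print the device file listed under a given hardware path.
# Assumption: input looks like `ioscan -fnCfc` output, with the device file
# on the line immediately following the matching adapter line.
hwpath_to_devfile() {
    # $1: hardware path; stdin: ioscan output
    awk -v hw="$1" '
        found    { print $1; exit }   # line after the match holds the device file
        $3 == hw { found = 1 }        # third column is H/W Path
    '
}

# Sample taken from the listing above; on the real host, pipe ioscan instead.
sample='fc 0 1/0/2/1/0 td CLAIMED INTERFACE HP Tachyon XL2 Fibre Channel Mass Storage Adapter
/dev/td0
fc 1 1/0/4/1/0 td CLAIMED INTERFACE HP Tachyon XL2 Fibre Channel Mass Storage Adapter
/dev/td1'

printf '%s\n' "$sample" | hwpath_to_devfile 1/0/4/1/0   # prints /dev/td1
```

On the host itself this would be used as: ioscan -fnCfc | hwpath_to_devfile 1/0/4/1/0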
2. To rule out a problem with the Fibre Channel adapters themselves, check each one:
#fcmsutil /dev/td0
Vendor ID is = 0x00103c
Device ID is = 0x001029
XL2 Chip Revision No is = 2.3
PCI Sub-system Vendor ID is = 0x00103c
PCI Sub-system ID is = 0x00128c
Topology = PTTOPT_FABRIC
Link Speed = 2Gb
Local N_Port_id is = 0x010200
N_Port Node World Wide Name = 0x50060b000025df55
N_Port Port World Wide Name = 0x50060b000025df54
Driver state = ONLINE
Hardware Path is = 1/0/2/1/0
Number of Assisted IOs = 77665912
Number of Active Login Sessions = 2
Dino Present on Card = NO
Maximum Frame Size = 2048
Driver Version = @(#) libtd.a HP Fibre Channel Tachyon TL/TS/XL2 Driver B.11.11.12 PATCH_11.11 (PHSS_31326) /ux/kern/kisu/TL/src/common/wsio/td_glue.c: Sep 5 2005, 10:14:40
#fcmsutil /dev/td1
Vendor ID is = 0x00103c
Device ID is = 0x001029
XL2 Chip Revision No is = 2.3
PCI Sub-system Vendor ID is = 0x00103c
PCI Sub-system ID is = 0x00128c
Topology = PTTOPT_FABRIC
Link Speed = 2Gb
Local N_Port_id is = 0x010200
N_Port Node World Wide Name = 0x50060b000025df59
N_Port Port World Wide Name = 0x50060b000025df58
Driver state = ONLINE
Hardware Path is = 1/0/4/1/0
Number of Assisted IOs = 4775204
Number of Active Login Sessions = 2
Dino Present on Card = NO
Maximum Frame Size = 2048
Driver Version = @(#) libtd.a HP Fibre Channel Tachyon TL/TS/XL2 Driver B.11.11.12 PATCH_11.11 (PHSS_31326) /ux/kern/kisu/TL/src/common/wsio/td_glue.c: Sep 5 2005, 10:14:40
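Both adapters are in the ONLINE state.
3. Find out which of the devices attached to the adapters actually has the problem (the main worry is the external storage, because a fault there would affect applications). Use ioscan to check disks and tapes: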
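Reading the two fcmsutil listings by eye works for two adapters; for more, a single field can be pulled out with awk. This is a minimal sketch that assumes every line has the "label = value" form shown above; fcms_field is a hypothetical helper name, and the sample text stands in for real fcmsutil output.

```shell
# Hypothetical helper: print the value of one field from fcmsutil output.
# Assumption: each line has the form "<label> = <value>".
fcms_field() {
    # $1: field label, e.g. "Driver state"; stdin: fcmsutil output
    awk -F' = ' -v key="$1" '
        { gsub(/^[ \t]+|[ \t]+$/, "", $1) }   # trim padding around the label
        $1 == key { print $2; exit }
    '
}

# Two lines copied from the td1 listing above.
sample='N_Port Port World Wide Name = 0x50060b000025df58
Driver state = ONLINE'

state=$(printf '%s\n' "$sample" | fcms_field 'Driver state')
if [ "$state" = "ONLINE" ]; then
    echo "adapter is online"
else
    echo "adapter state: $state"
fi
```

The same helper with 'N_Port Port World Wide Name' returns 0x50060b000025df58 for td1, which matches the Serial Number(WWN) 50060B000025DF58 in the event details, apart from letter case and the 0x prefix.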
#ioscan -funC disk
All external disks are detected and their state is normal.
#ioscan -funC tape
Several devices are in the NO_HW state:
tape 34 1/0/2/1/0.1.56.255.0.0.0 stape NO_HW DEVICE HP Ultrium 3-SCSI
/dev/rmt/34m /dev/rmt/34mn /dev/rmt/c26t0d0BEST /dev/rmt/c26t0d0BESTn
/dev/rmt/34mb /dev/rmt/34mnb /dev/rmt/c26t0d0BESTb /dev/rmt/c26t0d0BESTnb
tape 37 1/0/2/1/0.1.56.255.0.0.1 stape NO_HW DEVICE HP Ultrium 3-SCSI
/dev/rmt/37m /dev/rmt/37mn /dev/rmt/c26t0d1BEST /dev/rmt/c26t0d1BESTn
/dev/rmt/37mb /dev/rmt/37mnb /dev/rmt/c26t0d1BESTb /dev/rmt/c26t0d1BESTnb
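When the listing is long, the NO_HW entries can be picked out with a small filter instead of by eye. The sketch below assumes the column layout shown above (class, instance, hardware path first); list_no_hw is a hypothetical helper name, and the sample text stands in for real ioscan output.

```shell
# Hypothetical helper: print the hardware path of every entry whose
# S/W State is NO_HW. Assumption: column 3 is the hardware path.
list_no_hw() {
    # stdin: output of `ioscan -funC <class>`
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "NO_HW") { print $3; break }
    }'
}

# Sample lines copied from the tape listing above.
sample='tape 34 1/0/2/1/0.1.56.255.0.0.0 stape NO_HW DEVICE HP Ultrium 3-SCSI
/dev/rmt/34m /dev/rmt/34mn /dev/rmt/c26t0d0BEST /dev/rmt/c26t0d0BESTn
tape 37 1/0/2/1/0.1.56.255.0.0.1 stape NO_HW DEVICE HP Ultrium 3-SCSI
/dev/rmt/37m /dev/rmt/37mn /dev/rmt/c26t0d1BEST /dev/rmt/c26t0d1BESTn'

printf '%s\n' "$sample" | list_no_hw
# prints:
# 1/0/2/1/0.1.56.255.0.0.0
# 1/0/2/1/0.1.56.255.0.0.1
```

On the host itself this would be used as: ioscan -funC tape | list_no_hw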
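4. Conclusion:
The tape drive is the cause. An HP engineer is currently working on the tape drive. This EMS event is not a system problem; it means that the two Fibre Channel adapters td0 and td1 cannot see the tape drive.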
****************************************
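5. Summary:
There is also a relatively simple way to check.
1) Note the nport ID of the device in the error: 0x13800.
2) The 1 right after 0x corresponds to the 1 after the first dot in the device's hardware address.
3) The 38 that follows the 1 in the nport ID is hexadecimal; converted to decimal it is 56. The hardware address of the problem device is therefore 1/0/4/1/0.1.56
4) ioscan shows that this device is the Ultrium 3 tape library.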
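The conversion in the summary can be done with shell arithmetic. A Fibre Channel N_Port ID is a 24-bit value whose top byte is the domain and whose middle byte is the area, so 0x13800 splits into 0x01 and 0x38. The sketch below decodes only those two bytes, since they are the only ones the steps above map onto the hardware path; nport_to_suffix is a hypothetical helper name.

```shell
# Hypothetical helper: decode the domain and area bytes of an N_Port ID
# into the "<domain>.<area>" part of the hardware path, in decimal.
nport_to_suffix() {
    # $1: N_Port ID in hex, e.g. 0x13800
    id=$(( $1 ))
    domain=$(( (id >> 16) & 0xff ))   # top byte:    0x01 -> 1
    area=$(( (id >> 8) & 0xff ))      # middle byte: 0x38 -> 56
    printf '%d.%d\n' "$domain" "$area"
}

nport_to_suffix 0x13800   # prints 1.56
```

Appending the result to the adapter path from the event gives 1/0/4/1/0.1.56, the address to look for in the ioscan output.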