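Arrays:
An array can be accessed in two ways: over the serial port or over the network.
The initial configuration is normally done over the serial port; once the array has been given an IP address, it can be reached over the network with TELNET.
Common commands by model:
T3 and SE6120: the commands on these two models are largely the same; the most frequently used are fru and vol.
fru list lists the FRU components in the array.
fru stat lists the status of each FRU component in the array.
vol list shows, for every volume on the array, its name, capacity, RAID level, member disks, and standby disk.
vol stat shows the status of each volume and of each disk in it.
FRU: field-replaceable unit (controllers, disks, interconnect cards, power modules, and the midplane).
The following shows an array in a healthy state: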
:/:<10>fru list
ID TYPE VENDOR MODEL REVISION SERIAL
------ ----------------- ----------- ----------- ------------- --------
u1ctr controller card 0301 501-5710-02( 0200/020100 120430
u1d1 disk drive SEAGATE ST373405FSUN A338 3EK1E00Y
u1d2 disk drive SEAGATE ST373405FSUN A338 3EK1JXJP
u1d3 disk drive SEAGATE ST373405FSUN A338 3EK1FKJK
u1d4 disk drive SEAGATE ST373405FSUN A338 3EK1JZ3V
u1d5 disk drive SEAGATE ST373405FSUN A338 3EK1EFB4
u1d6 disk drive SEAGATE ST373405FSUN A338 3EK1JVTM
u1d7 disk drive SEAGATE ST373405FSUN A338 3EK1JVLM
u1d8 disk drive SEAGATE ST373405FSUN A338 3EK1JW1N
u1d9 disk drive SEAGATE ST373405FSUN A338 3EK1JVMV
u1l1 loop card SLR-MI 375-0085-01- 5.02 Flash 119447
u1l2 loop card SLR-MI 375-0085-01- 5.02 Flash 119555
u1pcu1 power/cooling unit TECTROL-CAN 300-1454-01( 0000 088393
u1pcu2 power/cooling unit TECTROL-CAN 300-1454-01( 0000 088415
u1mpn mid plane SLR-MI 370-3990-02- 0000 050298
:/:<11>fru stat
CTLR STATUS STATE ROLE PARTNER TEMP
------ ------- ---------- ---------- ------- ----
u1ctr ready enabled master - 34.0
DISK STATUS STATE ROLE PORT1 PORT2 TEMP VOLUME
------ ------- ---------- ---------- --------- --------- ---- ------
u1d1 ready enabled data disk ready ready 38 v0
u1d2 ready enabled data disk ready ready 36 v0
u1d3 ready enabled data disk ready ready 37 v0
u1d4 ready enabled data disk ready ready 38 v0
u1d5 ready enabled data disk ready ready 38 v1
u1d6 ready enabled data disk ready ready 39 v1
u1d7 ready enabled data disk ready ready 38 v1
u1d8 ready enabled data disk ready ready 39 v1
u1d9 ready enabled data disk ready ready 39 v1
LOOP STATUS STATE MODE CABLE1 CABLE2 TEMP
------ ------- ---------- ------- --------- --------- ----
u1l1 ready enabled master - - 31.5
u1l2 ready enabled slave - - 33.5
POWER STATUS STATE SOURCE OUTPUT BATTERY TEMP FAN1 FAN2
------ ------- --------- ------ ------ ------- ------ ------ ------
u1pcu1 ready enabled line normal normal normal normal normal
u1pcu2 ready enabled line normal normal normal normal normal
:/:<13>vol list
volume capacity raid data standby
v0 204.510 GB 5 u1d1-4 none
v1 272.681 GB 5 u1d5-9 none
:/:<14>vol stat
v0 u1d1 u1d2 u1d3 u1d4
mounted 0 0 0 0
v1 u1d5 u1d6 u1d7 u1d8 u1d9
mounted 0 0 0 0 0
An array in an abnormal state:
:/:<42>fru stat
CTLR STATUS STATE ROLE PARTNER TEMP
------ ------- ---------- ---------- ------- ----
u1ctr ready enabled master - 34.5
DISK STATUS STATE ROLE PORT1 PORT2 TEMP VOLUME
------ ------- ---------- ---------- --------- --------- ---- ------
u1d1 ready enabled data disk ready ready 39 v0
u1d2 ready enabled data disk ready ready 38 v0
u1d3 ready enabled data disk ready ready 38 v0
u1d4 ready enabled data disk ready ready 38 v0
u1d5 ready enabled data disk ready ready 38 v1
u1d6 ready enabled data disk ready ready 39 v1
u1d7 ready enabled data disk ready ready 39 v1
u1d8 ready enabled data disk ready ready 40 v1
u1d9 ready enabled data disk ready ready 40 v1
LOOP STATUS STATE MODE CABLE1 CABLE2 TEMP
------ ------- ---------- ------- --------- --------- ----
u1l1 ready enabled master - - 31.0
u1l2 ready enabled slave - - 33.5
POWER STATUS STATE SOURCE OUTPUT BATTERY TEMP FAN1 FAN2
------ ------- --------- ------ ------ ------- ------ ------ ------
u1pcu1 ready enabled line normal fault normal normal normal
u1pcu2 ready enabled line normal normal normal normal normal
In the output above, the BATTERY column for power module u1pcu1 shows fault, which indicates a problem with the battery in u1pcu1.
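When many arrays have to be checked, captured fru stat output can be scanned by a script instead of by eye. The following is a minimal sketch, not a vendor tool: the function name find_faults and the SAMPLE text are illustrative, and it assumes the tables are laid out like the ones shown in this article and that a problem field reads exactly "fault".

```python
# Sketch: flag 'fault' fields in captured `fru stat` output.
# Assumption: tables look like the samples in this article.

SAMPLE = """\
POWER STATUS STATE SOURCE OUTPUT BATTERY TEMP FAN1 FAN2
------ ------- --------- ------ ------ ------- ------ ------ ------
u1pcu1 ready enabled line normal fault normal normal normal
u1pcu2 ready enabled line normal normal normal normal normal
"""


def find_faults(fru_stat_text):
    """Return (unit, column) pairs whose field reads 'fault'."""
    faults = []
    header = None
    lines = fru_stat_text.splitlines()
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields or fields[0].startswith("---"):
            continue
        # A header row is the one directly followed by a dashed rule.
        if i + 1 < len(lines) and lines[i + 1].lstrip().startswith("---"):
            header = fields
            continue
        if header is None:
            continue  # e.g. the prompt line before the first table
        for pos, value in enumerate(fields[1:], start=1):
            if value == "fault":
                # Column names are only reliable when the row splits into
                # as many fields as the header (not so for "data disk").
                name = header[pos] if len(fields) == len(header) else None
                faults.append((fields[0], name))
    return faults


print(find_faults(SAMPLE))
```

For the sample, the only entry reported is u1pcu1 with the BATTERY column, matching the reading given above.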
:/:<14>vol stat
v0 u1d1 u1d2 u1d3 u1d4
mounted 0 0 1 0
v1 u1d5 u1d6 u1d7 u1d8 u1d9
mounted 0 0 0 0 0
In the output above, disk u1d3 in volume v0 has status 1, which indicates a problem with disk u1d3.
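The same check can be sketched for vol stat output. The example below assumes the two-line-per-volume layout shown above (a line of drive names followed by a line of status codes) and treats any non-zero code as needing attention; that rule is an assumption generalized from the single status 1 seen here, and the function name nonzero_drives is illustrative.

```python
# Sketch: list drives whose `vol stat` status code is not 0.
# Assumption: each volume takes two lines, names then codes.

SAMPLE = """\
:/:<14>vol stat
v0 u1d1 u1d2 u1d3 u1d4
mounted 0 0 1 0
v1 u1d5 u1d6 u1d7 u1d8 u1d9
mounted 0 0 0 0 0
"""


def nonzero_drives(vol_stat_text):
    """Return (volume, drive, code) for every drive with a non-zero code."""
    rows = [ln.split() for ln in vol_stat_text.splitlines() if ln.split()]
    # Drop the echoed prompt line, if the capture includes it.
    rows = [r for r in rows if not r[0].startswith(":/:")]
    problems = []
    # Pair each names line with the status line that follows it.
    for names, states in zip(rows[0::2], rows[1::2]):
        volume, drives = names[0], names[1:]
        for drive, code in zip(drives, states[1:]):
            if code != "0":
                problems.append((volume, drive, code))
    return problems


print(nonzero_drives(SAMPLE))
```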
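The SE3310 and SE3510 arrays provide a menu-driven, GUI-like interface that shows the status of each component in the array, the state of the configured volumes and partitions, the mapped LUNs, and so on.
Navigate by selecting from the menus; no commands are needed.
Hosts:
Sun hosts come in many models, but they all run the same operating system; Solaris 8 and 9 are the most common versions today. The OpenBoot PROM (OBP) firmware on the motherboard, whose settings are stored in the NVRAM chip, lets you inspect system status and change configuration while the operating system is offline. OBP is upgraded continually, and later versions generally support more features. On some models, an ALOM or RSC card installed in the system can be given an IP address and then used for remote control, including remote login, reboot, and power on/off.
Common commands:
In the operating system:
The operating system logs are a very important tool for checking system health; they are the messages files in the /var/adm directory. As a rule, pay attention whenever a messages file contains ERROR or WARNING entries.
Take the following message, for example: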
Apr 14 09:02:57 ruiyi scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3/sd@1,0 (sd1):
Apr 14 09:02:57 ruiyi Error for Command: load/start/stop Error Level: Informational
Apr 14 09:02:57 ruiyi scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Apr 14 09:02:57 ruiyi scsi: [ID 107833 kern.notice] Vendor: SEAGATE Serial Number: 0029J13424
Apr 14 09:02:57 ruiyi scsi: [ID 107833 kern.notice] Sense Key: Soft Error
Apr 14 09:02:57 ruiyi scsi: [ID 107833 kern.notice] ASC: 0x5d (), ASCQ: 0x32, FRU: 0x32
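This is a WARNING message indicating that the device at physical path /pci@1f,4000/scsi@3/sd@1,0 has reported an error. On inspection, this device turned out to be a SCSI hard disk that is developing errors; ASC 0x5d is the SCSI sense code for "failure prediction threshold exceeded", so the disk should be replaced.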
Another example:
Dec 3 19:20:38 ncsesn001 unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU1 Data access at TL=0, errID 0x00055008.dee19385
Dec 3 19:20:38 ncsesn001 unix: AFSR 0x00000000.00200000 AFAR 0x00000000.b4cbb350
Dec 3 19:20:38 ncsesn001 unix: AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x11b064
Dec 3 19:20:38 ncsesn001 unix: UDBH 0x0203 UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
Dec 3 19:20:38 ncsesn001 unix: UDBH Syndrome 0x3 Memory Module 170x
Dec 3 19:20:38 ncsesn001 unix: WARNING: [AFT1] errID 0x00055008.dee19385 Syndrome 0x3 indicates that this may not be a memory module problem
Dec 3 19:20:38 ncsesn001 unix: [AFT2] errID 0x00055008.dee19385 PA=0x00000000.b4cbb350
Dec 3 19:20:38 ncsesn001 unix: E$tag 0x00000000.0a401699 E$State: Shared E$parity 0x05
Dec 3 19:20:38 ncsesn001 unix: [AFT2] E$Data (0x00): 0x00000000.00000000
Dec 3 19:20:38 ncsesn001 unix: [AFT2] E$Data (0x08): 0x00000000.0067c078
Dec 3 19:20:38 ncsesn001 unix: [AFT2] E$Data (0x10): 0x004b72e0.00356f08 *Bad* PSYND=0xff00
Dec 3 19:20:38 ncsesn001 unix: [AFT2] E$Data (0x18): 0x0084f938.00000000
Dec 3 19:20:38 ncsesn001 unix: [AFT2] E$Data (0x20): 0x0027f540.00022093
Dec 3 19:20:38 ncsesn001 unix: [AFT2] E$Data (0x28): 0x00000000.004b72e0
Dec 3 19:20:38 ncsesn001 unix: [AFT2] E$Data (0x30): 0x003d448c.008d5374
Dec 3 19:20:38 ncsesn001 unix: [AFT2] E$Data (0x38): 0x00000000.0940192e
Dec 3 19:20:38 ncsesn001 unix: NOTICE: Scheduling clearing of error on page 0x00000000.b4cba000
Dec 3 19:20:38 ncsesn001 unix: [AFT3] errID 0x00055008.dee19385 Above Error is in User Mode
Dec 3 19:20:38 ncsesn001 unix: and is fatal: will reboot
Dec 3 19:20:38 ncsesn001 unix: WARNING: [AFT1] initiating reboot due to above error in pid 22879 (osserver)
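This message shows that the CPU detected a memory error. Such errors can usually be attributed to memory, and the memory module number given in the message identifies which module is involved. Sometimes, however, the CPU itself is at fault (here the log itself notes that Syndrome 0x3 "may not be a memory module problem"), so judge from the specific details in the message and from experience.
Sometimes the system logs WARNING messages that do not cause any actual problem, such as: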
WARNING forceload of misc/md_trans failed
WARNING forceload of misc/md_raid failed
WARNING forceload of misc/md_hotspares failed
WARNING forceload of misc/md_sp failed
Mar 20 12:33:12 jsun-server1 syslogd: line 24: WARNING: loghost could not be resolved
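These are caused by software that has not been fully configured or by incomplete system configuration files; with some experience they can be corrected so that they no longer appear.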
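The triage described above (look for ERROR or WARNING entries, but skip the ones known to be harmless) can be sketched in a few lines. The list of harmless patterns below is only an assumption built from the examples in this article, and the names ALERT, BENIGN and suspicious_lines are illustrative; adjust them to your own site.

```python
import re

# Sketch: pick out ERROR/WARNING lines from messages, skipping
# patterns this article describes as harmless (an assumed list).
ALERT = re.compile(r"\b(WARNING|ERROR)\b")
BENIGN = [
    re.compile(r"forceload of misc/md_"),
    re.compile(r"loghost could not be resolved"),
]

SAMPLE = [
    "Apr 14 09:02:57 ruiyi scsi: [ID 107833 kern.warning] WARNING: "
    "/pci@1f,4000/scsi@3/sd@1,0 (sd1):\n",
    "Apr 14 09:02:57 ruiyi scsi: [ID 107833 kern.notice] Sense Key: Soft Error\n",
    "WARNING forceload of misc/md_trans failed\n",
    "Mar 20 12:33:12 jsun-server1 syslogd: line 24: WARNING: "
    "loghost could not be resolved\n",
]


def suspicious_lines(lines):
    """Return alert lines that match none of the benign patterns."""
    hits = []
    for line in lines:
        if not ALERT.search(line):
            continue
        if any(p.search(line) for p in BENIGN):
            continue
        hits.append(line.rstrip("\r\n"))
    return hits


for hit in suspicious_lines(SAMPLE):
    print(hit)
```

On a live host the same function could be fed the file directly, for example with open("/var/adm/messages") as the source of lines. The match is case-sensitive on purpose, so informational text such as "Soft Error" is not reported on its own.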
The prtdiag command
# /usr/platform/sun4u/sbin/prtdiag -v
System Configuration: Sun Microsystems sun4u Sun Fire 6800
System clock frequency: 150 MHz
Memory size: 16384 Megabytes
========================= CPUs ===============================================
CPU Run E$ CPU CPU
FRU Name ID MHz MB Impl. Mask
---------- ------- ---- ---- ------- ----
/N0/SB3/P0 12 1200 8.0 US-III+ 11.0
/N0/SB3/P1 13 1200 8.0 US-III+ 11.0
/N0/SB3/P2 14 1200 8.0 US-III+ 11.0
/N0/SB3/P3 15 1200 8.0 US-III+ 11.0
/N0/SB5/P0 20 1200 8.0 US-III+ 11.0
/N0/SB5/P1 21 1200 8.0 US-III+ 11.0
/N0/SB5/P2 22 1200 8.0 US-III+ 11.0
/N0/SB5/P3 23 1200 8.0 US-III+ 11.0
========================= Memory Configuration ===============================
Logical Logical Logical
Port Bank Bank Bank DIMM Interleave Interleave
FRU Name ID Num Size Status Size Factor Segment
------------- ---- ---- ------ ----------- ------ ---------- ----------
/N0/SB3/P0/B0 12 0 1024MB pass 512MB 8-way 0
/N0/SB3/P0/B0 12 2 1024MB pass 512MB 8-way 0
/N0/SB3/P1/B0 13 0 1024MB pass 512MB 8-way 0
/N0/SB3/P1/B0 13 2 1024MB pass 512MB 8-way 0
/N0/SB3/P2/B0 14 0 1024MB pass 512MB 8-way 0
/N0/SB3/P2/B0 14 2 1024MB pass 512MB 8-way 0
/N0/SB3/P3/B0 15 0 1024MB pass 512MB 8-way 0
/N0/SB3/P3/B0 15 2 1024MB pass 512MB 8-way 0
/N0/SB5/P0/B0 20 0 1024MB pass 512MB 8-way 1
/N0/SB5/P0/B0 20 2 1024MB pass 512MB 8-way 1
/N0/SB5/P1/B0 21 0 1024MB pass 512MB 8-way 1
/N0/SB5/P1/B0 21 2 1024MB pass 512MB 8-way 1
/N0/SB5/P2/B0 22 0 1024MB pass 512MB 8-way 1
/N0/SB5/P2/B0 22 2 1024MB pass 512MB 8-way 1
/N0/SB5/P3/B0 23 0 1024MB pass 512MB 8-way 1
/N0/SB5/P3/B0 23 2 1024MB pass 512MB 8-way 1
========================= IO Cards =========================
Bus Max
IO Port Bus Freq Bus Dev,
FRU Name Type ID Side Slot MHz Freq Func State Name Model
---------- ---- ---- ---- ---- ---- ---- ---- ----- -------------------------------- ----------------------
/N0/IB6/P0 PCI 24 B 0 33 33 1,0 ok pci-pci8086,b154.0/pci108e,1000 pci-bridge
/N0/IB6/P0 PCI 24 B 0 33 33 0,0 ok pci108e,1000-pci108e,1000.1
/N0/IB6/P0 PCI 24 B 0 33 33 0,1 ok SUNW,qfe-pci108e,1001 SUNW,pci-qfe
/N0/IB6/P0 PCI 24 B 0 33 33 1,0 ok pci108e,1000-pci108e,1000.1
/N0/IB6/P0 PCI 24 B 0 33 33 1,1 ok SUNW,qfe-pci108e,1001 SUNW,pci-qfe
/N0/IB6/P0 PCI 24 B 0 33 33 2,0 ok pci108e,1000-pci108e,1000.1
/N0/IB6/P0 PCI 24 B 0 33 33 2,1 ok SUNW,qfe-pci108e,1001 SUNW,pci-qfe
/N0/IB6/P0 PCI 24 B 0 33 33 3,0 ok pci108e,1000-pci108e,1000.1
/N0/IB6/P0 PCI 24 B 0 33 33 3,1 ok SUNW,qfe-pci108e,1001 SUNW,pci-qfe
/N0/IB6/P0 PCI 24 A 3 66 66 1,0 ok SUNW,qlc-pci1077,2300.1077.106.1+
/N0/IB6/P1 PCI 25 A 7 66 66 1,0 ok SUNW,qlc-pci1077,2300.1077.106.1+
/N0/IB7/P0 PCI 26 A 3 66 66 1,0 ok pci-pci8086,b154.0/network (netw+ pci-bridge
/N0/IB7/P0 PCI 26 A 3 66 66 0,0 ok network-pci108e,abba.20 SUNW,pci-ce
/N0/IB7/P0 PCI 26 A 3 66 66 1,0 ok network-pci108e,abba.20 SUNW,pci-ce
/N0/IB7/P0 PCI 26 A 3 66 66 2,0 ok scsi-pci1000,b.1000.1000.7/disk +
/N0/IB7/P0 PCI 26 A 3 66 66 2,1 ok scsi-pci1000,b.1000.1000.7/disk +
========================= Active Boards for Domain ===========================
Power Fault HotPlug Board
FRU Name LED LED LED Cond.
-------- ----- ----- ------- -------
/N0/SB3 on off off ok
/N0/SB5 on off off ok
/N0/IB6 on off off ok
/N0/IB7 on off off ok
/N0/IB8 on off off ok
/N0/IB9 on off off ok
========================= Available Boards/Slots for Domain ==================
Power Fault HotPlug Board/Slot Board/Slot
FRU Name LED LED LED Condition Assigned
-------- ----- ----- ------- ---------- ----------
/SB0 off off off Empty yes
/SB1 off off off Empty yes
/SB2 off off off Empty yes
/SB4 off off off Empty yes
========================= Hardware Failures ==================================
No Hardware failures found in System
========================= HW Revisions =======================================
System PROM revisions:
----------------------
OBP 5.17.1 05/10/04 14:55
IO ASIC revisions:
------------------
Port
FRU Name Model ID Status Version
----------- --------------- ---- ------ -------
/N0/IB6/P0 SUNW,schizo 24 ok 4
/N0/IB6/P1 SUNW,schizo 25 ok 4
/N0/IB7/P0 SUNW,schizo 26 ok 4
/N0/IB7/P1 SUNW,schizo 27 ok 4
/N0/IB8/P0 SUNW,schizo 28 ok 4
/N0/IB8/P1 SUNW,schizo 29 ok 4
/N0/IB9/P0 SUNW,schizo 30 ok 4
/N0/IB9/P1 SUNW,schizo 31 ok 4
/N0/IB6/P0 SUNW,sgsbbc 24 ok 2
/N0/IB7/P0 SUNW,sgsbbc 26 ok 2
/N0/IB8/P0 SUNW,sgsbbc 28 ok 2
/N0/IB9/P0 SUNW,sgsbbc 30 ok 2
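The above is the system information for a Sun Fire 6800 (F6800). It shows the full system configuration: CPUs, memory, I/O components, hardware LED states, and so on. Combined with the contents of the messages log, this information lets us locate problems fairly accurately.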
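Because prtdiag -v output is long, it can help to pull out a single section, such as Hardware Failures, from a saved copy. This is a rough sketch that assumes section headers begin with a run of "=" characters as in the listing above; the function name section and the SAMPLE text are illustrative.

```python
# Sketch: extract one section from saved `prtdiag -v` output.
# Assumption: every section header line starts with "=====".

SAMPLE = """\
========================= Active Boards for Domain ===========================
Power Fault HotPlug Board
FRU Name LED LED LED Cond.
-------- ----- ----- ------- -------
/N0/SB3 on off off ok
========================= Hardware Failures ==================================
No Hardware failures found in System
========================= HW Revisions =======================================
System PROM revisions:
"""


def section(prtdiag_text, title):
    """Return the non-blank lines under the header containing `title`."""
    body = []
    inside = False
    for line in prtdiag_text.splitlines():
        if line.startswith("====="):
            # Entering a new section; keep only the requested one.
            inside = title in line
            continue
        if inside and line.strip():
            body.append(line.strip())
    return body


print(section(SAMPLE, "Hardware Failures"))
```

A health check could then compare the returned lines against the "No Hardware failures found in System" text shown in the listing above.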
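Disk checking tools
When a disk has a problem, we can usually see an indication in the messages log; we can also use the format command to check the disk more thoroughly.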
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t1d0
/pci@1f,4000/scsi@3/sd@1,0
1. c0t3d0
/pci@1f,4000/scsi@3/sd@3,0
Specify disk (enter its number): 0
selecting c0t1d0
[disk formatted]
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format>
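After selecting a disk in format, a range of disk operations is available: changing partitions, running read/write analysis on the disk, viewing defective blocks, and so on. Disk operations are normally performed by the system administrator; users without superuser privileges cannot run format.
OBP parameter settings
The OBP environment is what is commonly called the ok prompt. At this point the operating system has not yet booted, and we can configure the system boot parameters: set the self-test (diagnostic) level, direct the input and output devices, choose the boot device, inspect the system hardware, and so on.
printenv displays parameter values and setenv sets them; these are the two most frequently used OBP commands.
SC (system controller) is a quite useful feature: it provides an operating environment with many practical commands for controlling the system remotely. It requires hardware support, namely an ALOM or RSC card installed in the system. The first configuration is again done over the serial port; after the remote control card has been given an IP address, you can log in to the system controller environment over the network with TELNET, power the system on or off, and carry out a range of system operations remotely, without anyone having to be on site.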