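I recently dealt with a rather odd failure. The job itself should have been simple: replace the power module of a cell board, planned at about an hour including stopping and starting the workload. But after the partition was powered off, it would not boot again. From the VFP I could see that once all cell boards had finished self-test and were about to join the partition, they suddenly reset. The SEL log showed MEM_DIMM_FAILED. Decoding the event code pointed to the DIMMs involved (7a and 7b, the last pair), and a swap test confirmed the memory was bad. In practice, though, a bad DIMM is not enough to stop a partition from getting past BIB; at worst the memory gets deconfigured. (Of course, if the bad DIMMs are something like 0a 0b, the partition may well fail to come up.)
After clearing the SEL and restarting the partition, the following error appeared: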
SFW 0,1,0 *5 ae800c8710e00e03 0000000000000005 PD_INCOMPATIBLE_FW_REVS
Event description:
The cell indicated in the data field is at a different firmware revision than the reporting cell. This is determined by evaluating the checksums of the 2 rom images.
The reporting cell is at a different firmware revision than the cell reported in the data field. A PD cannot be established. Please reprogram the 2 cells to the same firmware revision
Fix: reflash the cell board firmware so that all cell boards are at the same revision.
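To make the event line above easier to read, here is a minimal Python sketch that splits it into fields. The field layout (source, location, alert level, encoded event word, data word, keyword) is my assumption based on this single line, and `parse_event` and `SelEvent` are hypothetical names, not part of any HP tool. Reading the data word as a cell number follows from the event description for this particular keyword.

```python
import re
from typing import NamedTuple, Tuple


class SelEvent(NamedTuple):
    source: str                # reporting entity, e.g. "SFW"
    location: Tuple[int, ...]  # assumed to be cabinet, cell, cpu
    alert_level: int
    encoded: int               # 64-bit encoded event word
    data: int                  # 64-bit data word
    keyword: str


# Assumed layout: SOURCE LOCATION *ALERT ENCODED(16 hex) DATA(16 hex) KEYWORD
EVENT_RE = re.compile(
    r"^(?P<source>\S+)\s+(?P<loc>\d+(?:,\d+)*)\s+\*?(?P<alert>\d+)\s+"
    r"(?P<encoded>[0-9a-fA-F]{16})\s+(?P<data>[0-9a-fA-F]{16})\s+"
    r"(?P<keyword>\S+)$"
)


def parse_event(line: str) -> SelEvent:
    """Split one event line into its fields; raise ValueError if it does not match."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized event line: {line!r}")
    return SelEvent(
        source=m["source"],
        location=tuple(int(x) for x in m["loc"].split(",")),
        alert_level=int(m["alert"]),
        encoded=int(m["encoded"], 16),
        data=int(m["data"], 16),
        keyword=m["keyword"],
    )


if __name__ == "__main__":
    ev = parse_event(
        "SFW 0,1,0 *5 ae800c8710e00e03 0000000000000005 PD_INCOMPATIBLE_FW_REVS"
    )
    # For PD_INCOMPATIBLE_FW_REVS the data field names the mismatching cell.
    print(f"{ev.keyword}: reported by {ev.location}, data field = cell {ev.data}")
```

For the line in this post the data field decodes to 5, so the cell to look at is cell 5.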
-------------------------------------------Analysis-----------------------------
1. After the reboot
         |     Active      |    Inactive     |     Active     |    Inactive    |
Cab,Slot |     SYS FW      |     SYS FW      |      PDHC      |      PDHC      |
---------+-----------------+-----------------+----------------+----------------|
  0, 0   |  009.048.000 i  |  009.048.000 i  |  023.003.040   |  023.003.040   |
  0, 1   |  009.048.000 i  |  009.066.000 i  |  023.003.040   |  023.003.040   |
  0, 2   |  009.048.000 i  |  009.048.000 i  |  023.003.040   |  023.003.040   |
  0, 3   |  009.048.000 i  |  009.048.000 i  |  023.003.040   |  023.003.040   |
  0, 4   |  009.048.000 i  |  009.048.000 i  |  023.003.040   |  023.003.040   |
  0, 5   |  009.066.000 i  |  009.048.000 i  |  023.003.040   |  023.003.040   |
  0, 6   |  009.048.000 i  |  009.048.000 i  |  023.003.040   |  023.003.040   |
  0, 7   |  009.048.000 i  |  009.022.000 i  |  023.003.040   |  023.003.040   |
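Cell 5 holds two copies of the firmware, and after the reboot it loaded the second copy: its active SYS FW is 009.066.000 while every other cell is running 009.048.000. Normally a reboot should not cause the active and inactive firmware copies on a cell board to swap!
That leaves two questions to think about:
1. Where are the two firmware copies stored?
2. At reset, what mechanism does the cell use to choose which copy to load?
2. How to reflash the firmware. The sx2000 and RX86 are equally simple in this respect: both can do it by duplicating the firmware from another cell board.
Partition at BIB: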
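Spotting the odd cell by eye is easy with eight cells, but the same check can be scripted. The following is a rough Python sketch that parses rows in the layout of the table above and reports cells whose revision differs from the most common one. The function names and the dictionary keys are hypothetical, and the parser assumes the pipe-separated layout shown in this post, nothing more.

```python
from collections import Counter

# Excerpt of the revision table from this post (cells 0, 1, 5 and 7).
SAMPLE = """\
Cab,Slot | SYS FW | SYS FW | PDHC | PDHC |
0, 0 | 009.048.000 i | 009.048.000 i | 023.003.040 | 023.003.040 |
0, 1 | 009.048.000 i | 009.066.000 i | 023.003.040 | 023.003.040 |
0, 5 | 009.066.000 i | 009.048.000 i | 023.003.040 | 023.003.040 |
0, 7 | 009.048.000 i | 009.022.000 i | 023.003.040 | 023.003.040 |
"""

COLUMNS = ("sys_active", "sys_inactive", "pdhc_active", "pdhc_inactive")


def parse_rows(text):
    """Return {(cabinet, slot): {column: revision}} for every data row."""
    rows = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) < 5 or not all(parts[1:5]):
            continue  # separator line or incomplete row
        loc = [x.strip() for x in parts[0].split(",")]
        if len(loc) != 2 or not all(x.isdigit() for x in loc):
            continue  # header line
        # Keep only the revision string, dropping a trailing flag such as "i".
        revs = [p.split()[0] for p in parts[1:5]]
        rows[(int(loc[0]), int(loc[1]))] = dict(zip(COLUMNS, revs))
    return rows


def mismatched(rows, column="sys_active"):
    """Return the cells whose revision in `column` differs from the most common one."""
    majority = Counter(r[column] for r in rows.values()).most_common(1)[0][0]
    return {loc: r[column] for loc, r in rows.items() if r[column] != majority}


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    for column in ("sys_active", "sys_inactive"):
        print(column, mismatched(rows, column))
```

Running it on the inactive column as well is worthwhile: in this table cells 1 and 7 also carry an inactive copy that differs from the rest, which is exactly the kind of thing that bites on a later reset.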
[mp03] MP> fw
Welcome to the Firmware Update Utility
(Use to return to main menu.)
Choose firmware image source:
0) FTP Server
1) Installed firmware image (Duplicate)
Q) Quit
Please enter selection ([0], 1, Q): 1
Firmware source entity types available for update:
Entity Name Entity Type
------------------------- -----------
0) PDHC PDHC
1) IPF System Firmware IPF
2) PDC System Firmware PDC
Q) Quit
Please enter selection (ex. [0], 1, 2, Q): 1
IPF_FW Firmware entity(s) available for duplication:
Firmware
Physical Location Revision
---------------------------------------- --------------
0) IPF_FW Cabinet 0, Cell 0 009.048.000
1) IPF_FW Cabinet 0, Cell 1 009.048.000
2) IPF_FW Cabinet 0, Cell 2 009.048.000
3) IPF_FW Cabinet 0, Cell 3 009.048.000
4) IPF_FW Cabinet 0, Cell 4 009.048.000
5) IPF_FW Cabinet 0, Cell 5 009.066.000
6) IPF_FW Cabinet 0, Cell 6 009.048.000
7) IPF_FW Cabinet 0, Cell 7 009.048.000
Q) Quit
Please enter selection: 1
You have selected:
IPF_FW Cabinet 0, Cell 1
IPF_FW Firmware entity(s) available for update:
Firmware
Physical Location Revision
---------------------------------------- --------------
0) IPF_FW Cabinet 0, Cell 0 009.048.000
1) IPF_FW Cabinet 0, Cell 2 009.048.000
2) IPF_FW Cabinet 0, Cell 3 009.048.000
3) IPF_FW Cabinet 0, Cell 4 009.048.000
4) IPF_FW Cabinet 0, Cell 5 009.066.000
5) IPF_FW Cabinet 0, Cell 6 009.048.000
6) IPF_FW Cabinet 0, Cell 7 009.048.000
R) Refresh entity list
A) All
P) Partition
Q) Quit
Please enter selection: 4
You have selected:
IPF_FW Cabinet 0, Cell 5
Begin update? (Y/[N]) y
Checking update state of IPF_FW in Cabinet 0 Cell 5 .......
WARNING: Erasing the flash can take up to 4 minutes, please wait!
Updating IPF_FW to version 009.048.000
Percent Complete
0..........25..........50..........75..........100
Physical Location New Firmware Revision Update Status
-------------------------------------- ------------------------ -------------
IPF_FW Cabinet 0, Cell 5 009.048.000 PASSED
Reset entity(s) with update status PASSED ([Y]/N) ? y
Successful reset of IPF_FW Cabinet 0, Cell 5
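Notes:
1. The partition must be in the BIB state; you can use RR to put it there.
2. Choose the source and target cell boards carefully. In particular, when you enter the target cell in the second menu, the source cell has already been removed from the list, so the menu numbers no longer match the cell numbers. In the session above, selection 4 is Cell 5.
3. Once you are sure the new firmware works, it is best to flash both ROM copies to the same revision.
4. If you changed the memory configuration of a cell board (added or removed DIMMs), the following error is reported when EFI starts; the partition resets automatically and then boots normally.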
Firmware detected a configuration mismatch. The partition will be reset.
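The renumbering in note 2 is the easiest place to make a mistake, so here is a tiny Python sketch of it. It only models what the session above shows (the source cell is dropped from the target list and the remaining cells are renumbered from 0); `menu_index_for` is a hypothetical helper, not something the MP provides.

```python
def menu_index_for(cells, source, target):
    """Return the menu number of `target` in the update list,
    assuming the list is `cells` with `source` removed, numbered from 0."""
    if target == source:
        raise ValueError("the source cell is not offered as an update target")
    remaining = [c for c in cells if c != source]
    return remaining.index(target)


if __name__ == "__main__":
    cells = list(range(8))  # cells 0..7 in cabinet 0
    # Source is Cell 1, target is Cell 5, as in the session above.
    print(menu_index_for(cells, source=1, target=5))
```

With Cell 1 as the source, Cell 5 lands on menu number 4, which matches the selection entered in the session. Always confirm against the "You have selected:" line before answering the "Begin update?" prompt.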