Handling an invalid cache and a lost partition table on an RA8000 after a power-off restart
2009-01-05 13:33:22
Attachment: RA8000断电重启后CACHE失效和分区表丢失的处理过程.pdf (184KB)
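Because the UPS had to be replaced, all hosts and storage in the machine room were shut down. One ES40 and one GS160 are connected to an RA8000 over SCSI cables to form a TruCluster cluster. The RA8000 was powered off improperly (power was cut without first shutting down the controllers), so the array raised alarms when it was powered back on, and the hosts could not recognize the LUNs on the RA8000. Logging in to the RA8000 showed the following alarms: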
HSZ80_TOP>show this
Controller:
HSZ80 ZG10505171 Software V83Z-1, Hardware E06
NODE_ID = 5000-1FE1-000C-2020
ALLOCATION_CLASS = 0
SCSI_VERSION = SCSI-2
Configured for dual-redundancy with ZG05111402
In dual-redundant configuration
Device Port SCSI address 7
Time: NOT SET
Host PORT_1:
SCSI target(s) (0, 1, 2, 3)
Preferred target(s) (0, 1)
TRANSFER_RATE_REQUESTED = 20MHZ
Host Functionality Mode = A
Command Console LUN is target 0, lun 3
Host PORT_2:
No SCSI targets
No preferred targets
TRANSFER_RATE_REQUESTED = 20MHZ
Cache:
128 megabyte write cache, version 0022
Cache is INVALID. Cache containing unflushed data
has been removed from this controller
Unknown unflushed data in cache
CACHE_FLUSH_TIMER = 60 (seconds)
Mirrored Cache:
128 megabyte write cache, version 0022
Cache is INVALID. Cache containing unflushed data
has been removed from this controller
No unflushed data in cache
Battery:
MORE THAN 50% CHARGED
Expires: 02-APR-2010
NOCACHE_UPS
Previous controller operation terminated by power failure.
This controller has an invalid cache module
Invalid cache -- CLI command set reduced. Type SHOW THIS_CONTROLLER. Please-
see user guide to determine corrective action
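When both controllers have to be checked repeatedly, the cache state can be pulled out of a captured `SHOW THIS_CONTROLLER` transcript with a few lines of Python. This is only a minimal sketch: the function name `cache_states` is hypothetical, and it assumes the output has the same "Cache:" / "Mirrored Cache:" / "Cache is ..." layout as the transcript above.

```python
import re

def cache_states(show_output: str) -> dict:
    """Map each cache section ('Cache', 'Mirrored Cache') to its reported state."""
    states = {}
    section = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line in ("Cache:", "Mirrored Cache:"):
            section = line.rstrip(":")
            continue
        # The state is the first word after "Cache is", e.g. INVALID or GOOD.
        m = re.match(r"Cache is (\w+)", line)
        if m and section:
            states[section] = m.group(1)
            section = None
    return states

# Excerpt copied from the transcript above.
sample = """\
Cache:
128 megabyte write cache, version 0022
Cache is INVALID. Cache containing unflushed data
has been removed from this controller
Unknown unflushed data in cache
CACHE_FLUSH_TIMER = 60 (seconds)
Mirrored Cache:
128 megabyte write cache, version 0022
Cache is INVALID. Cache containing unflushed data
has been removed from this controller
No unflushed data in cache
"""

print(cache_states(sample))
```

Any value other than `GOOD` in the result means the controller still needs attention.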
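According to the reference material, the most common cause of this condition is powering off without running a shutdown first. Once the battery that preserves the cache contents is drained, the controller assumes at the next power-on that data has been lost, whether or not the cache actually held any, and sets the cache state to INVALID_CACHE. The second most common cause is replacing a controller or a cache module, which also leaves the cache in the INVALID_CACHE state. The unflushed data can be cleared with one of the following commands: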
CLEAR_ERRORS OTHER_CONTROLLER INVALID_CACHE DESTROY_UNFLUSHED_DATA
or
CLEAR_ERRORS THIS_CONTROLLER INVALID_CACHE NODESTROY_UNFLUSHED_DATA
Specify NODESTROY_UNFLUSHED_DATA if:
- The controller module is replaced.
- The controller nonvolatile memory (NVMEM) contents are lost.
Specify DESTROY_UNFLUSHED_DATA to retain the controller information and discard unwritten cache data in the following situations:
- The cache module is replaced.
- Any other reason not listed above.
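The selection rule quoted above can be written down as a small helper. This is a sketch under stated assumptions: the function names `retention_policy` and `clear_command` are hypothetical, and the logic encodes only the conditions listed in the excerpt, so the controller user guide remains the authority before any data is destroyed.

```python
def retention_policy(controller_replaced: bool, nvmem_lost: bool) -> str:
    """Pick the data-retention switch according to the rules quoted above."""
    if controller_replaced or nvmem_lost:
        return "NODESTROY_UNFLUSHED_DATA"
    # Cache module replaced, or any other reason (for example an unclean power-off).
    return "DESTROY_UNFLUSHED_DATA"

def clear_command(controller: str, policy: str) -> str:
    """Build the CLI line in the form shown in this post."""
    if controller not in ("THIS_CONTROLLER", "OTHER_CONTROLLER"):
        raise ValueError("controller must be THIS_CONTROLLER or OTHER_CONTROLLER")
    return f"CLEAR_ERRORS {controller} INVALID_CACHE {policy}"

# The situation in this post: no module was replaced, power was simply cut.
policy = retention_policy(controller_replaced=False, nvmem_lost=False)
print(clear_command("THIS_CONTROLLER", policy))
```

The helper only builds the command text; it does not talk to the controller.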
Run the command on each of the two controllers to clear its cache:
HSZ80_BOT>CLEAR THIS_CONTROLLER INVALID_CACHE DESTROY_UNFLUSHED_DATA
Previous controller operation terminated by power failure.
HSZ80_BOT>SHOW THIS
Controller:
HSZ80 ZG05111402 Software V83Z-1, Hardware E06
NODE_ID = 5000-1FE1-000C-2020
ALLOCATION_CLASS = 0
SCSI_VERSION = SCSI-2
Configured for dual-redundancy with ZG10505171
In dual-redundant configuration
Device Port SCSI address 6
Time: NOT SET
Host PORT_1:
SCSI target(s) (0, 1, 2, 3)
Preferred target(s) (2, 3)
TRANSFER_RATE_REQUESTED = 20MHZ
Host Functionality Mode = A
Command Console LUN is target 0, lun 3
Host PORT_2:
No SCSI targets
No preferred targets
TRANSFER_RATE_REQUESTED = 20MHZ
Cache:
128 megabyte write cache, version 0022
Cache is GOOD
No unflushed data in cache
CACHE_FLUSH_TIMER = 60 (seconds)
Mirrored Cache:
128 megabyte write cache, version 0022
Cache is GOOD
No unflushed data in cache
Battery:
MORE THAN 50% CHARGED
Expires: 02-APR-2010
NOCACHE_UPS
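The cache is now in the GOOD state, and the RAID information in the array can be viewed again. The hosts can see the disks but report errors and cannot use them normally. The console log looks like this: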
CPU 0 booting
pga0.0.0.5.1 Link is down.
(boot dka1.0.0.4.1 -flags A)
dka1.0.0.4.1 has no media present or is disabled via the RUN/STOP switch
[the preceding line repeats 58 more times]
retries to pka0.0.0.4.1 exhausted(cmd-0 sts-2)
failed to open dka1.0.0.4.1
P00>>>
Further checks of the RAID state on the RA8000 showed:
HSZ80_BOT>SHOW STORAGES FULL
Name Storageset Uses Used by
------------------------------------------------------------------------------
S1 stripeset M1 D0
M2 D1
D2
D4
Switches:
CHUNKSIZE = NOT YET KNOWN
State:
INOPERATIVE
M1, no information available
M2, no information available
Size: NOT YET KNOWN
M1 mirrorset DISK10300 S1
DISK30200
Switches:
POLICY (for replacement) = BEST_PERFORMANCE
COPY (priority) = NORMAL
READ_SOURCE = LEAST_BUSY
MEMBERSHIP = 2, 2 members present
NODT_SUPPORT
State:
NORMAL
DISK10300 (member 0) is NORMAL
DISK30200 (member 1) is NORMAL
Size: NOT YET KNOWN
M2 mirrorset DISK20300 S1
DISK40200
Switches:
POLICY (for replacement) = BEST_PERFORMANCE
COPY (priority) = NORMAL
READ_SOURCE = LEAST_BUSY
MEMBERSHIP = 2, 2 members present
NODT_SUPPORT
State:
NORMAL
DISK40200 (member 0) is NORMAL
DISK20300 (member 1) is NORMAL
Size: NOT YET KNOWN
R1 raidset DISK10000 D200
DISK10200
DISK20200
DISK30000
DISK40000
DISK50000
DISK50200
DISK60000
DISK60100
DISK60200
Switches:
POLICY (for replacement) = BEST_PERFORMANCE
RECONSTRUCT (priority) = NORMAL
CHUNKSIZE = 128 blocks
State:
RECONSTRUCT 0% complete
DISK10000 (member 0) is RECONSTRUCTING 0% complete
DISK60100 (member 1) is RECONSTRUCTING 0% complete
DISK30000 (member 2) is RECONSTRUCTING 0% complete
DISK40000 (member 3) is RECONSTRUCTING 0% complete
DISK50000 (member 4) is RECONSTRUCTING 0% complete
DISK60000 (member 5) is RECONSTRUCTING 0% complete
DISK50200 (member 6) is RECONSTRUCTING 0% complete
DISK60200 (member 7) is RECONSTRUCTING 0% complete
DISK10200 (member 8) is RECONSTRUCTING 0% complete
DISK20200 (member 9) is RECONSTRUCTING 0% complete
Size: NOT YET KNOWN
R2 raidset DISK10100 D201
DISK20100
DISK30100
DISK30300
DISK40100
DISK40300
DISK50100
DISK50300
DISK60300
Switches:
POLICY (for replacement) = BEST_PERFORMANCE
RECONSTRUCT (priority) = NORMAL
CHUNKSIZE = 256 blocks
State:
RECONSTRUCT 0% complete
DISK10100 (member 0) is RECONSTRUCTING 0% complete
DISK20100 (member 1) is RECONSTRUCTING 0% complete
DISK30100 (member 2) is RECONSTRUCTING 0% complete
DISK40100 (member 3) is RECONSTRUCTING 0% complete
DISK30300 (member 4) is RECONSTRUCTING 0% complete
DISK50100 (member 5) is RECONSTRUCTING 0% complete
DISK40300 (member 6) is RECONSTRUCTING 0% complete
DISK50300 (member 7) is RECONSTRUCTING 0% complete
DISK60300 (member 8) is RECONSTRUCTING 0% complete
Size: NOT YET KNOWN
SPARESET spareset DISK20000
FAILEDSET failedset
Switches:
NOAUTOSPARE
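A listing this long is easier to triage once it is reduced to one state per storageset. The following is a minimal sketch, assuming the flattened layout shown above, where each storageset starts with a "name type ..." line and its state is the first line after "State:". The function name `storageset_states` is hypothetical, and the sample is a shortened excerpt of the output above.

```python
import re

# A storageset entry begins with its name followed by its type.
HEADER = re.compile(r"^(\S+)\s+(stripeset|mirrorset|raidset|spareset|failedset)\b")

def storageset_states(listing: str) -> dict:
    """Return {storageset name: first line following its 'State:' marker}."""
    states = {}
    current = None
    expect_state = False
    for raw in listing.splitlines():
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            current = m.group(1)
            expect_state = False
        elif line == "State:":
            expect_state = current is not None
        elif expect_state:
            states[current] = line
            expect_state = False
    return states

sample = """\
S1 stripeset M1 D0
M2 D1
Switches:
CHUNKSIZE = NOT YET KNOWN
State:
INOPERATIVE
M1, no information available
Size: NOT YET KNOWN
M1 mirrorset DISK10300 S1
DISK30200
State:
NORMAL
DISK10300 (member 0) is NORMAL
R1 raidset DISK10000 D200
State:
RECONSTRUCT 0% complete
SPARESET spareset DISK20000
"""

for name, state in storageset_states(sample).items():
    print(name, "->", state)
```

In the output above, this view makes the mismatch visible: mirrorsets M1 and M2 report NORMAL while the stripeset S1 built on them is INOPERATIVE.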
HSZ80_BOT>SHOW UNIT
LUN Uses
--------------------------------------------------------------
D0 S1 (partition)
D1 S1 (partition)
D2 S1 (partition)
D4 S1 (partition)
D200 R1
D201 R2
HSZ80_BOT>SHOW UNIT FULL
LUN Uses
--------------------------------------------------------------
D0 S1 (partition)
LUN ID: 6000-1FE1-000C-2020-0009-1050-5171-0087
Switches:
RUN NOWRITE_PROTECT READ_CACHE
READAHEAD_CACHE WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
Access:
Access path = ALL
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = OTHER_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: NOT YET KNOWN
Geometry (C/H/S): NOT YET KNOWN
D1 S1 (partition)
LUN ID: 6000-1FE1-000C-2020-0009-1050-5171-0088
Switches:
RUN NOWRITE_PROTECT READ_CACHE
READAHEAD_CACHE WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
Access:
Access path = ALL
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = OTHER_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: NOT YET KNOWN
Geometry (C/H/S): NOT YET KNOWN
D2 S1 (partition)
LUN ID: 6000-1FE1-000C-2020-0009-1050-5171-0089
Switches:
RUN NOWRITE_PROTECT READ_CACHE
READAHEAD_CACHE WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
Access:
Access path = ALL
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = OTHER_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: NOT YET KNOWN
Geometry (C/H/S): NOT YET KNOWN
D4 S1 (partition)
LUN ID: 6000-1FE1-000C-2020-0009-1050-5171-0086
Switches:
RUN NOWRITE_PROTECT READ_CACHE
READAHEAD_CACHE WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
Access:
Access path = ALL
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = OTHER_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: NOT YET KNOWN
Geometry (C/H/S): NOT YET KNOWN
D200 R1
LUN ID: 6000-1FE1-000C-2020-0009-1050-5171-008A
Switches:
RUN NOWRITE_PROTECT READ_CACHE
READAHEAD_CACHE WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
Access:
Access path = ALL
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = THIS_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: NOT YET KNOWN
Geometry (C/H/S): NOT YET KNOWN
D201 R2
LUN ID: 6000-1FE1-000C-2020-0009-1050-5171-0095
Switches:
RUN NOWRITE_PROTECT READ_CACHE
READAHEAD_CACHE WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
Access:
Access path = ALL
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = THIS_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: NOT YET KNOWN
Geometry (C/H/S): NOT YET KNOWN
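To get a quick list of the affected units from a captured `SHOW UNIT FULL` transcript, the "Unit has lost data" lines can be matched back to their unit names. This is a sketch only: the function name `units_with_lost_data` is hypothetical, and it assumes unit names have the `D<number>` form seen above and that each unit's details follow its header line.

```python
import re

# A unit entry begins with its name (D0, D200, ...) followed by what it uses.
UNIT = re.compile(r"^(D\d+)\s+\S+")

def units_with_lost_data(listing: str) -> list:
    """Return unit names whose entry contains 'Unit has lost data', in order."""
    lost = []
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        m = UNIT.match(line)
        if m:
            current = m.group(1)
        elif line == "Unit has lost data" and current and current not in lost:
            lost.append(current)
    return lost

# Shortened excerpt of the output above.
sample = """\
D0 S1 (partition)
LUN ID: 6000-1FE1-000C-2020-0009-1050-5171-0087
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = OTHER_CONTROLLER
D200 R1
LUN ID: 6000-1FE1-000C-2020-0009-1050-5171-008A
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = THIS_CONTROLLER
"""

print(units_with_lost_data(sample))
```

Applied to the full transcript above, every unit (D0, D1, D2, D4, D200, D201) would appear in the list.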