2009-01-05 13:33:22

File: RA8000断电重启后CACHE失效和分区表丢失的处理过程.pdf ("Handling an invalid cache and a lost partition table after an RA8000 power-off and restart")
Size: 184KB

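Because the UPS had to be replaced, every host and storage array in the machine room was shut down. An ES40 and a GS160, joined in a TruCluster cluster, connect to the RA8000 over SCSI cables. The RA8000 was not shut down by the proper procedure: the power was cut directly instead of shutting down the controllers first. When power was restored the array raised alarms at startup, and the hosts could not recognize the LUNs on the RA8000. Logging in to the RA8000 showed the following alarm: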

 

HSZ80_TOP>show this

Controller:

        HSZ80 ZG10505171 Software V83Z-1, Hardware  E06

        NODE_ID          = 5000-1FE1-000C-2020

        ALLOCATION_CLASS = 0

        SCSI_VERSION     = SCSI-2

        Configured for dual-redundancy with ZG05111402

            In dual-redundant configuration

        Device Port SCSI address 7

        Time: NOT SET

Host PORT_1:

        SCSI target(s) (0, 1, 2, 3)

        Preferred target(s) (0, 1)

        TRANSFER_RATE_REQUESTED = 20MHZ

        Host Functionality Mode = A

        Command Console LUN is target 0, lun 3

Host PORT_2:

        No SCSI targets

        No preferred targets

        TRANSFER_RATE_REQUESTED = 20MHZ

Cache:

        128 megabyte write cache, version 0022

        Cache is INVALID.  Cache containing unflushed data

         has been removed from this controller

        Unknown unflushed data in cache

        CACHE_FLUSH_TIMER = 60 (seconds)

Mirrored Cache:

        128 megabyte write cache, version 0022

        Cache is INVALID.  Cache containing unflushed data

         has been removed from this controller

        No unflushed data in cache

Battery:

        MORE THAN 50% CHARGED

        Expires:             02-APR-2010

        NOCACHE_UPS

Previous controller operation terminated by power failure.

This controller has an invalid cache module

Invalid cache -- CLI command set reduced.  Type SHOW THIS_CONTROLLER. Please-

see user guide to determine corrective action

 

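According to the reference material, the most common cause of this symptom is cutting power without running a shutdown first. Once the battery that preserves the cache contents has drained, the controller assumes at the next power-on that data has been lost, whether or not the cache actually held any, and sets the cache state to INVALID_CACHE. The second most common cause is replacing the controller cache, which also leaves the cache in the INVALID_CACHE state. The unflushed data can be cleared with the following commands: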

 

CLEAR_ERRORS OTHER_CONTROLLER INVALID_CACHE DESTROY_UNFLUSHED_DATA

CLEAR_ERRORS THIS_CONTROLLER INVALID_CACHE NODESTROY_UNFLUSHED_DATA

Specify NODESTROY_UNFLUSHED_DATA if:

The controller module is replaced.

The controller nonvolatile memory (NVMEM) contents are lost.

Specify DESTROY_UNFLUSHED_DATA to retain the controller information and discard unwritten cache data in the following situations:

If the cache module is replaced.

Any other reason not listed above.
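
The rule quoted above can be written down as a small decision function. This is only a sketch of the quoted guidance, not part of the HSZ80 tooling: the `Situation` enum and the function name are hypothetical, and treating the power-failure case in this post as "any other reason" is an assumption based on the command actually used below.

```python
from enum import Enum


class Situation(Enum):
    """Why the cache is invalid (illustrative names, not CLI keywords)."""
    CONTROLLER_REPLACED = "controller module replaced"
    NVMEM_LOST = "controller NVMEM contents lost"
    CACHE_REPLACED = "cache module replaced"
    OTHER = "any other reason"


def clear_invalid_cache_command(controller: str, situation: Situation) -> str:
    """Build the CLEAR_ERRORS command line for one controller.

    controller is THIS_CONTROLLER or OTHER_CONTROLLER, as in the examples above.
    """
    if controller not in ("THIS_CONTROLLER", "OTHER_CONTROLLER"):
        raise ValueError(f"unexpected controller keyword: {controller}")
    # Per the quoted guidance: keep the cache data only when the controller
    # module was replaced or its NVMEM was lost; otherwise discard it.
    if situation in (Situation.CONTROLLER_REPLACED, Situation.NVMEM_LOST):
        option = "NODESTROY_UNFLUSHED_DATA"
    else:
        option = "DESTROY_UNFLUSHED_DATA"
    return f"CLEAR_ERRORS {controller} INVALID_CACHE {option}"


if __name__ == "__main__":
    # Assumed classification of the unclean power-off described in this post.
    print(clear_invalid_cache_command("THIS_CONTROLLER", Situation.OTHER))
```

The function only builds the command text; the command still has to be typed at each controller's CLI prompt.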

 

Run the command on each of the two controllers in turn to clear its cache:

 

HSZ80_BOT>CLEAR THIS_CONTROLLER INVALID_CACHE DESTROY_UNFLUSHED_DATA

 Previous controller operation terminated by power failure.

HSZ80_BOT>SHOW THIS

Controller:

        HSZ80 ZG05111402 Software V83Z-1, Hardware  E06

        NODE_ID          = 5000-1FE1-000C-2020

        ALLOCATION_CLASS = 0

        SCSI_VERSION     = SCSI-2

        Configured for dual-redundancy with ZG10505171

            In dual-redundant configuration

        Device Port SCSI address 6

        Time: NOT SET

Host PORT_1:

        SCSI target(s) (0, 1, 2, 3)

        Preferred target(s) (2, 3)

        TRANSFER_RATE_REQUESTED = 20MHZ

        Host Functionality Mode = A

        Command Console LUN is target 0, lun 3

Host PORT_2:

        No SCSI targets

        No preferred targets

        TRANSFER_RATE_REQUESTED = 20MHZ

Cache:

        128 megabyte write cache, version 0022

        Cache is GOOD

        No unflushed data in cache

        CACHE_FLUSH_TIMER = 60 (seconds)

Mirrored Cache:

        128 megabyte write cache, version 0022

        Cache is GOOD

        No unflushed data in cache

Battery:

        MORE THAN 50% CHARGED

        Expires:             02-APR-2010

        NOCACHE_UPS
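
When the same check has to be repeated on both controllers, the cache state can be picked out of a captured listing instead of being read by eye. The following is a minimal sketch that assumes the `SHOW THIS` output has been saved as text in the layout shown above (section headers such as `Cache:` starting in the first column); the function names are hypothetical.

```python
import re


def cache_states(show_output: str) -> dict:
    """Map each cache section name to its reported state (GOOD, INVALID, ...)."""
    states = {}
    section = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Section headers ("Cache:", "Mirrored Cache:") are not indented.
        if not raw[0].isspace() and line.endswith(":"):
            section = line[:-1]
            continue
        match = re.match(r"Cache is (\w+)", line)
        if match and section is not None:
            states[section] = match.group(1)
    return states


def all_caches_good(states: dict) -> bool:
    """True only if at least one cache was found and every one reports GOOD."""
    return bool(states) and all(state == "GOOD" for state in states.values())


# Excerpt of the HSZ80_BOT listing above, taken after the clear.
SAMPLE = """\
Cache:
        128 megabyte write cache, version 0022
        Cache is GOOD
        No unflushed data in cache
        CACHE_FLUSH_TIMER = 60 (seconds)
Mirrored Cache:
        128 megabyte write cache, version 0022
        Cache is GOOD
        No unflushed data in cache
"""

if __name__ == "__main__":
    states = cache_states(SAMPLE)
    print(states, all_caches_good(states))
```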

 

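The cache is now in the GOOD state, and the RAID information in the array can be viewed again. The hosts can see the disks but report errors and cannot use them normally. The console log looks like this: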

CPU 0 booting

 

pga0.0.0.5.1 Link is down.

(boot dka1.0.0.4.1 -flags A)

dka1.0.0.4.1 has no media present or is disabled via the RUN/STOP switch

[the line above repeats 59 times in total]

retries to pka0.0.0.4.1 exhausted(cmd-0 sts-2)

failed to open dka1.0.0.4.1

P00>>>

 

Checking the RAID state on the RA8000 further showed:

 

HSZ80_BOT>SHOW STORAGES FULL

Name          Storageset                     Uses             Used by

------------------------------------------------------------------------------

 

S1            stripeset                      M1               D0

                                             M2               D1

                                                              D2

                                                              D4

        Switches:

          CHUNKSIZE = NOT YET KNOWN

        State:

          INOPERATIVE

          M1, no information available

          M2, no information available

        Size: NOT YET KNOWN

 

M1            mirrorset                      DISK10300        S1

                                             DISK30200       

        Switches:

          POLICY (for replacement) = BEST_PERFORMANCE

          COPY (priority) = NORMAL

          READ_SOURCE = LEAST_BUSY

          MEMBERSHIP = 2, 2 members present

          NODT_SUPPORT

        State:

          NORMAL

          DISK10300 (member  0) is NORMAL

          DISK30200 (member  1) is NORMAL

        Size: NOT YET KNOWN

 

M2            mirrorset                      DISK20300        S1

                                             DISK40200       

        Switches:

          POLICY (for replacement) = BEST_PERFORMANCE

          COPY (priority) = NORMAL

          READ_SOURCE = LEAST_BUSY

          MEMBERSHIP = 2, 2 members present

          NODT_SUPPORT

        State:

          NORMAL

          DISK40200 (member  0) is NORMAL

          DISK20300 (member  1) is NORMAL

        Size: NOT YET KNOWN

 

R1            raidset                        DISK10000        D200

                                             DISK10200       

                                             DISK20200       

                                             DISK30000       

                                             DISK40000       

                                             DISK50000       

                                             DISK50200       

                                             DISK60000       

                                             DISK60100       

                                             DISK60200       

        Switches:

          POLICY (for replacement) = BEST_PERFORMANCE

          RECONSTRUCT (priority) = NORMAL

          CHUNKSIZE = 128 blocks

        State:

          RECONSTRUCT 0% complete

          DISK10000 (member  0) is RECONSTRUCTING   0% complete

          DISK60100 (member  1) is RECONSTRUCTING   0% complete

          DISK30000 (member  2) is RECONSTRUCTING   0% complete

          DISK40000 (member  3) is RECONSTRUCTING   0% complete

          DISK50000 (member  4) is RECONSTRUCTING   0% complete

          DISK60000 (member  5) is RECONSTRUCTING   0% complete

          DISK50200 (member  6) is RECONSTRUCTING   0% complete

          DISK60200 (member  7) is RECONSTRUCTING   0% complete

          DISK10200 (member  8) is RECONSTRUCTING   0% complete

          DISK20200 (member  9) is RECONSTRUCTING   0% complete

        Size: NOT YET KNOWN

 

R2            raidset                        DISK10100        D201

                                             DISK20100       

                                             DISK30100       

                                             DISK30300       

                                             DISK40100       

                                             DISK40300       

                                             DISK50100       

                                             DISK50300       

                                             DISK60300       

        Switches:

          POLICY (for replacement) = BEST_PERFORMANCE

          RECONSTRUCT (priority) = NORMAL

          CHUNKSIZE = 256 blocks

        State:

          RECONSTRUCT 0% complete

          DISK10100 (member  0) is RECONSTRUCTING   0% complete

          DISK20100 (member  1) is RECONSTRUCTING   0% complete

          DISK30100 (member  2) is RECONSTRUCTING   0% complete

          DISK40100 (member  3) is RECONSTRUCTING   0% complete

          DISK30300 (member  4) is RECONSTRUCTING   0% complete

          DISK50100 (member  5) is RECONSTRUCTING   0% complete

          DISK40300 (member  6) is RECONSTRUCTING   0% complete

          DISK50300 (member  7) is RECONSTRUCTING   0% complete

          DISK60300 (member  8) is RECONSTRUCTING   0% complete

        Size: NOT YET KNOWN

 

SPARESET      spareset                       DISK20000       

 

FAILEDSET     failedset                                      

        Switches:

          NOAUTOSPARE

HSZ80_BOT>SHOW UNIT

    LUN                                      Uses

--------------------------------------------------------------

 

  D0                                         S1           (partition)

  D1                                         S1           (partition)

  D2                                         S1           (partition)

  D4                                         S1           (partition)

  D200                                       R1

  D201                                       R2

HSZ80_BOT>SHOW UNIT FULL

    LUN                                      Uses

--------------------------------------------------------------

 

  D0                                         S1           (partition)

        LUN ID:      6000-1FE1-000C-2020-0009-1050-5171-0087

        Switches:

          RUN                    NOWRITE_PROTECT        READ_CACHE           

          READAHEAD_CACHE        WRITEBACK_CACHE      

          MAXIMUM_CACHED_TRANSFER_SIZE = 32

        Access:

          Access path = ALL

        State:

          INOPERATIVE

          Unit has lost data

          PREFERRED_PATH = OTHER_CONTROLLER

          WRITE_PROTECT - DATA SAFETY

        Size: NOT YET KNOWN

        Geometry (C/H/S): NOT YET KNOWN

  D1                                         S1           (partition)

        LUN ID:      6000-1FE1-000C-2020-0009-1050-5171-0088

        Switches:

          RUN                    NOWRITE_PROTECT        READ_CACHE           

          READAHEAD_CACHE        WRITEBACK_CACHE      

          MAXIMUM_CACHED_TRANSFER_SIZE = 32

        Access:

          Access path = ALL

        State:

          INOPERATIVE

          Unit has lost data

          PREFERRED_PATH = OTHER_CONTROLLER

          WRITE_PROTECT - DATA SAFETY

        Size: NOT YET KNOWN

        Geometry (C/H/S): NOT YET KNOWN

  D2                                         S1           (partition)

        LUN ID:      6000-1FE1-000C-2020-0009-1050-5171-0089

        Switches:

          RUN                    NOWRITE_PROTECT        READ_CACHE           

          READAHEAD_CACHE        WRITEBACK_CACHE      

          MAXIMUM_CACHED_TRANSFER_SIZE = 32

        Access:

          Access path = ALL

        State:

          INOPERATIVE

          Unit has lost data

          PREFERRED_PATH = OTHER_CONTROLLER

          WRITE_PROTECT - DATA SAFETY

        Size: NOT YET KNOWN

        Geometry (C/H/S): NOT YET KNOWN

  D4                                         S1           (partition)

        LUN ID:      6000-1FE1-000C-2020-0009-1050-5171-0086

        Switches:

          RUN                    NOWRITE_PROTECT        READ_CACHE           

          READAHEAD_CACHE        WRITEBACK_CACHE      

          MAXIMUM_CACHED_TRANSFER_SIZE = 32

        Access:

          Access path = ALL

        State:

          INOPERATIVE

          Unit has lost data

          PREFERRED_PATH = OTHER_CONTROLLER

          WRITE_PROTECT - DATA SAFETY

        Size: NOT YET KNOWN

        Geometry (C/H/S): NOT YET KNOWN

  D200                                       R1

        LUN ID:      6000-1FE1-000C-2020-0009-1050-5171-008A

        Switches:

          RUN                    NOWRITE_PROTECT        READ_CACHE           

          READAHEAD_CACHE        WRITEBACK_CACHE      

          MAXIMUM_CACHED_TRANSFER_SIZE = 32

        Access:

          Access path = ALL

        State:

          INOPERATIVE

          Unit has lost data

          PREFERRED_PATH = THIS_CONTROLLER

          WRITE_PROTECT - DATA SAFETY

        Size: NOT YET KNOWN

        Geometry (C/H/S): NOT YET KNOWN

  D201                                       R2

        LUN ID:      6000-1FE1-000C-2020-0009-1050-5171-0095

        Switches:

          RUN                    NOWRITE_PROTECT        READ_CACHE           

          READAHEAD_CACHE        WRITEBACK_CACHE      

          MAXIMUM_CACHED_TRANSFER_SIZE = 32

        Access:

          Access path = ALL

        State:

          INOPERATIVE

          Unit has lost data

          PREFERRED_PATH = THIS_CONTROLLER

          WRITE_PROTECT - DATA SAFETY

        Size: NOT YET KNOWN

        Geometry (C/H/S): NOT YET KNOWN

 


chinaunix user, 2009-01-05 13:37:40

Ugh, the comments here come out too messy, so I have added an attachment.


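chinaunix user, 2009-01-05 13:34:55

The hosts can recognize the disks but cannot use them because the units are in the LOST_DATA state. In this state the unit's capacity cannot be displayed either, and the unit (the disk as seen by the host) cannot be operated on. The fix is to clear the error with the command CLEAR_ERRORS unit-number LOST_DATA. A unit on RAID 5 resynchronizes after LOST_DATA is cleared; a unit on a mirror does not. Example:

HSZ80_TOP> clear_errors D201 LOST_DATA

Previous controller operation terminated by power failure.

HSZ80_TOP>SHOW D201

    LUN                                      Uses

--------------------------------------------------------------

  D201                                       R2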
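
With six units affected, the CLEAR_ERRORS lines can be generated from a captured `SHOW UNIT FULL` listing rather than typed from memory. The following is a sketch only: it assumes the listing has been saved as text in the layout shown in the post, that unit names are `D` followed by digits, and the function names are hypothetical. It produces command text for review and does not talk to the controller.

```python
import re

# A unit header line: the unit name (D plus digits) followed by its container.
UNIT_HEADER = re.compile(r"^\s*(D\d+)\s+\S+")


def lost_data_units(show_unit_full: str) -> list:
    """Return the names of units whose listing contains 'Unit has lost data'."""
    units = []
    current = None
    for line in show_unit_full.splitlines():
        header = UNIT_HEADER.match(line)
        if header:
            current = header.group(1)
        elif "Unit has lost data" in line and current and current not in units:
            units.append(current)
    return units


def clear_lost_data_commands(units: list) -> list:
    """Build one CLEAR_ERRORS ... LOST_DATA command per unit name."""
    return [f"CLEAR_ERRORS {unit} LOST_DATA" for unit in units]


# Abbreviated excerpt of the SHOW UNIT FULL listing in the post.
SAMPLE = """\
  D0                                         S1           (partition)
        State:
          INOPERATIVE
          Unit has lost data
  D201                                       R2
        State:
          INOPERATIVE
          Unit has lost data
"""

if __name__ == "__main__":
    for command in clear_lost_data_commands(lost_data_units(SAMPLE)):
        print(command)
```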
