Do not hastily replace any drive when two or more drives are faulted on a CLARiiON array (emc93178) - conqueryou - ChinaUnix blog


Category: Servers and Storage

2015-08-18 22:44:05

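When two or more drives are faulted on a CLARiiON array, do not physically replace any drive without first consulting your LRS or TS2 (Technical Support Level 2) engineer. If two or more of the faulted drives belong to the same RAID group, data in that RAID group may be unavailable to the connected hosts until the RAID group is no longer double faulted. Follow this procedure: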

1.) Run SPcollect on both SPs.
2.) Analyze the log files from both SPs to find the cause of the problem.
3.) Search the logs manually for the cause, or run SPLAT. See solution 6013 ("How to filter a CLARiiON storage processor [SP] event log using the SP log analysis tool [SPLAT]").

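Never replace a drive if one or more other drives in the same RAID group are faulted or rebuilding. Use Navisphere Manager or Navisphere CLI to verify the current state of the other disks in the RAID group. If the RAID group of the drive to be replaced contains another faulted or rebuilding disk, consult an LRS or TS2 engineer before replacing it.

Reseating the faulted drives may work as a remedial action: it can temporarily clear some (or all) of the faults, which may allow a double-faulted RAID group to become available again and bring the customer out of the data unavailable (DU) condition. Reseating may also fail to clear a fault (a real hardware fault will remain faulted), and it may trigger rebuilds, so consult an LRS or TS2 engineer first if you are not sure what reseating a drive may cause. Reseating is a temporary workaround that may bring a disk back online and restore availability of the data in the RAID group. Further troubleshooting is still needed to determine the root cause and prevent recurrence.

Make sure you are familiar with KB28746 (CLARiiON CX-Series Troubleshooting Tree: Multiple Drives are Faulted on the Array), which guides you through troubleshooting an array in this condition. In addition, KB24233 explains how to handle double-faulted RAID groups where data has become unavailable because of multiple drive failures. If you have any doubt about how to proceed in these situations, escalate to your LRS or TS2 engineer immediately.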
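The rule about checking the other disks in the RAID group before a replacement can be sketched in Python. This is a minimal sketch and not an EMC tool: the `Disk` record, the state labels in `BLOCKING_STATES`, and the disk IDs in the usage note are hypothetical, and the disk states are assumed to have been read beforehand from Navisphere Manager or Navisphere CLI.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical state labels (an assumption for this sketch); map them to
# whatever Navisphere Manager or Navisphere CLI actually reports.
BLOCKING_STATES = {"faulted", "rebuilding"}


@dataclass(frozen=True)
class Disk:
    disk_id: str      # hypothetical identifier, e.g. bus_enclosure_slot
    raid_group: int   # RAID group the disk belongs to
    state: str        # state as read by the operator


def blocking_peers(target: Disk, disks: Iterable[Disk]) -> List[Disk]:
    """Return other disks in the target's RAID group that are faulted or rebuilding."""
    return [
        d for d in disks
        if d.raid_group == target.raid_group
        and d.disk_id != target.disk_id
        and d.state.lower() in BLOCKING_STATES
    ]


def may_replace(target: Disk, disks: Iterable[Disk]) -> bool:
    """True only when no other disk in the same RAID group is faulted or rebuilding.

    A False result means: do not replace the drive, consult LRS or TS2 first.
    """
    return not blocking_peers(target, disks)
```

For example, with disks `0_0_4` (faulted) and `0_0_5` (rebuilding) in the same RAID group, `may_replace` returns `False` for `0_0_4`, which corresponds to the case where an LRS or TS2 engineer must be consulted. A `True` result only means this one rule is not violated; it does not replace the troubleshooting described in the referenced solutions.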

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Impact	Do not physically replace any drives while troubleshooting a CLARiiON array with two or more drives faulted

Issue	Troubleshooting a CLARiiON array with two or more drives faulted

Environment	Product: CLARiiON CX Series

Resolution	When troubleshooting a CLARiiON array with two or more drives faulted, no drives should be physically replaced without consulting with your LRS or Technical Support Level 2 (TS2). If two or more of the faulted drives are members of the same RAID group, data from within that RAID group may be unavailable to connected hosts until the RAID group becomes no longer double faulted. Follow this procedure:

1.) Run SPcollect on both Storage Processors (SPs).
2.) Analyze log files from both SPs to look for the cause of the problem.
3.) Manually search the logs for cause or run SPLAT. See solution 6013 ("How to filter a CLARiiON storage processor [SP] event log using the SP log analysis tool [SPLAT]").

You should never replace a disk on a CLARiiON array if one or more other disks in the same RAID group are faulted, or rebuilding. You can use Navisphere Manager or Navisphere CLI to verify the status of the other disks in the RAID group. Consult with an LRS or TS2 before attempting to replace a disk if that disk's RAID group contains another faulted or rebuilding disk.

Reseating the faulted drives may work as a remedial action to temporarily clear some (or all) of the faults and possibly allow the double faulted RAID group to come back available, thus bringing the customer out of the data unavailable (DU) event. Reseating may not clear every fault (real hardware faults will remain faulted) and may incur rebuilds, so consult with an LRS or TS2 before reseating a drive if you are uncertain about the possible ramifications of reseating drives. Reseating is a temporary workaround that may bring a disk back online and that may restore connectivity to data in a RAID group. Further troubleshooting to determine root cause and to prevent recurrence is necessary.

Make sure that you are familiar with solution 28746 (CLARiiON CX-Series Troubleshooting Tree: Multiple Drives are Faulted on the Array), which can help guide you through properly troubleshooting an array in this condition. In addition, solution 24233 tells you how to appropriately deal with double-faulted RAID groups where data has become unavailable due to multiple drive failures. If there are any questions at all about how you should proceed in these situations, escalate to your LRS or to TS2 immediately.

Product	CLARiiON CX Series, CLARiiON, CLARiiON FC

Requested Publish Date	5/25/2013 1:34 PM

External Source	Primus

Primus/Webtop solution ID	emc93178
