
Category: Servers & Storage

2006-11-02 06:32:32

Over the past two days I ran extensive tests on EMC hot spares. The trigger was a failed system disk at a customer site, which caused the global write cache to be disabled. Keep in mind that once the write cache is disabled, I/O performance drops considerably: I/O-intensive applications slow down and become less responsive. Observing with vmstat, idle stayed around 10%-20%, indicating the system was I/O busy, and wait became non-zero (2-5 at the customer site).
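To automate this kind of vmstat check, here is a minimal sketch that samples vmstat and flags the low-idle, non-zero-wait pattern described above. It assumes the Linux procps vmstat layout, where the second output line names the columns (id for idle, wa for I/O wait); on Solaris and other platforms the columns differ, so adjust the lookups accordingly.

```python
import subprocess

def cpu_idle_wait(interval=5):
    """Return (idle%, wait%) from the second vmstat sample.

    Assumes Linux procps vmstat, whose second output line names the
    columns (id = idle, wa = I/O wait); adapt for other platforms.
    """
    out = subprocess.run(["vmstat", str(interval), "2"],
                         capture_output=True, text=True).stdout
    lines = out.strip().splitlines()
    header = lines[1].split()      # column names
    values = lines[-1].split()     # last (steady-state) sample
    return int(values[header.index("id")]), int(values[header.index("wa")])

idle, wait = cpu_idle_wait()
if idle <= 20 and wait > 0:
    print(f"I/O busy: idle={idle}%, wait={wait}% -- check the write cache state")
```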

In fact, the customer had configured one hot spare each of 36, 73, and 146 GB, and the system disks (5 of them, disk0-disk4) were 73 GB. Yet even after a hot spare had fully taken over for the failed system disk, the write cache remained disabled. Since the replacement part had not arrived, in the end we pulled the 146 GB hot spare and inserted it directly into the failed system disk's slot.

From this process we drew the following conclusions (specific to the customer's environment); a small monitoring sketch follows the list:

1) With the write cache disabled, system performance degrades drastically.

2) If a system disk fails, the write cache is disabled, with or without a hot spare.

3) The write cache is automatically re-enabled only after the system disk returns to normal (that is, after data synchronization completes, which usually takes about 1 hour for a 73 GB disk).

4) If a non-system disk fails, the write cache is not disabled.

5) A hot spare of larger capacity can stand in for a smaller failed disk.

6) If one SPS (standby power supply) fails, the write cache is also disabled.
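Given conclusions 1), 2), and 6), it helps to learn as early as possible that the write cache has been disabled. Below is a minimal polling sketch. It assumes the classic Navisphere CLI is installed and that navicli -h <SP address> getcache prints a line containing the write cache state; the SP address and the exact text matched are assumptions to adjust for your array.

```python
import subprocess
import time

SP_ADDRESS = "10.0.0.1"   # assumed SP management address; replace with yours

def write_cache_enabled(sp=SP_ADDRESS):
    """Poll the SP with `navicli getcache` and report the write cache state.

    Assumes the output contains a line such as 'Write Cache State: Enabled';
    verify against your array's actual output before relying on this.
    """
    out = subprocess.run(["navicli", "-h", sp, "getcache"],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "Write Cache State" in line:
            return "Enabled" in line
    return None   # state line not found; output format differs

if __name__ == "__main__":
    while True:
        if write_cache_enabled() is False:
            print(time.ctime(), "WARNING: write cache disabled, expect slow I/O")
        time.sleep(60)
```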

 

We ran the same tests in our CX500 test environment. Most of the results were identical, but point 3) differed. There, the conclusion is:

As soon as the system disk starts synchronizing (rather than waiting for synchronization to complete), the write cache can return to the enabled state. This greatly shortens the time the write cache stays disabled, which is exactly the behavior we want.

So is this perhaps a CX600 bug that has been fixed on the CX500? Or can the problem be avoided simply by upgrading the microcode?

Below are some hot spare-related excerpts from the EMC manual:

Hot spare - A single global spare disk that serves as a temporary replacement for a failed disk in a RAID 5, 3, 1, or 1/0 LUN. Data from the failed disk is reconstructed automatically on the hot spare. It is reconstructed from the parity data or mirrored data on the working disks in the LUN; therefore, the data on the LUN is always accessible. A hot spare LUN cannot belong to a storage group.

 

 

RAID type      Number of disks you can use
RAID 5         3 - 16
RAID 3         5 or 9 (CX-series)
RAID 1/0       2, 4, 6, 8, 10, 12, 14, 16
RAID 1         2
RAID 0         3 - 16
Disk           1
Hot spare      1

 

Note: If you have LUNs consisting of FC drives, allocate an FC drive as a hot spare. If you have LUNs consisting of ATA drives, allocate an ATA drive as a hot spare.
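The disk-count rules in the table, together with the FC/ATA matching rule in the note, are easy to encode as a quick sanity check. The following is an illustrative sketch; the data structures and function names are mine, not part of any EMC tool.

```python
# Disk-count rules from the table above; drive-type rule from the note.
VALID_DISK_COUNTS = {
    "RAID 5":    set(range(3, 17)),
    "RAID 3":    {5, 9},                       # 9 is CX-series only
    "RAID 1/0":  {2, 4, 6, 8, 10, 12, 14, 16},
    "RAID 1":    {2},
    "RAID 0":    set(range(3, 17)),
    "Disk":      {1},
    "Hot spare": {1},
}

def can_bind(raid_type, disks):
    """True if `disks` is a legal disk count for `raid_type`."""
    return disks in VALID_DISK_COUNTS.get(raid_type, set())

def valid_hot_spare(lun_drive_type, spare_drive_type):
    """Per the note: an FC LUN needs an FC spare, an ATA LUN an ATA spare."""
    return lun_drive_type == spare_drive_type

assert can_bind("RAID 1/0", 8) and not can_bind("RAID 5", 2)
assert valid_hot_spare("FC", "FC") and not valid_hot_spare("FC", "ATA")
```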

 

 

Rebuild priority

The rebuild priority is the relative importance of reconstructing data on either a hot spare or a new disk that replaces a failed disk in a LUN. It determines the amount of resources the SP devotes to rebuilding instead of to normal I/O activity. Table 8-3 lists and describes the rebuild time associated with each rebuild value.

 

Value       Target rebuild time in hours
ASAP        0 (as quickly as possible; this is the default)
HIGH        6
MEDIUM      12
LOW         18

 

 

The rebuild priorities correspond to the target times listed above. The storage system attempts to rebuild the LUN in the target time or less. The actual time to rebuild the LUN depends on the I/O workload, the LUN size, and the LUN RAID type. For a RAID group with multiple LUNs, the highest priority specified for any LUN in the group is used for all LUNs in the group.
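The group-wide rule in the last sentence (the highest LUN priority wins for the whole RAID group) can be shown in a few lines. The target hours come from the table above; the structures and names are illustrative assumptions.

```python
# Target rebuild times from the table above, in hours.
TARGET_HOURS = {"ASAP": 0, "HIGH": 6, "MEDIUM": 12, "LOW": 18}
RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "ASAP": 3}

def group_rebuild_priority(lun_priorities):
    """A RAID group rebuilds at the highest priority set on any of its LUNs."""
    return max(lun_priorities, key=RANK.__getitem__)

luns = ["LOW", "MEDIUM", "HIGH"]          # priorities of three LUNs in one group
p = group_rebuild_priority(luns)
print(p, "->", TARGET_HOURS[p], "hours")  # HIGH -> 6 hours for the whole group
```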

Rebuilding a RAID 5, 3, 1, or 1/0 LUN

You can monitor the rebuilding of a new disk from the General tab of its Disk Properties dialog box (page 14-15).

A new disk module’s state changes as follows:

1. Powering up - The disk is powering up.

2. Rebuilding - The storage system is reconstructing the data on the new disk from the information on the other disks in the LUN. If the disk is the replacement for a hot spare that is being integrated into a redundant LUN, the state is Equalizing instead of Rebuilding. In this situation, the storage system is simply copying the data from the hot spare onto the new disk.

3. Enabled - The disk is bound and assigned to the SP being used as the communication channel to the enclosure.

 

A hot spare’s state changes as follows:

1. Rebuilding - The SP is rebuilding the data on the hot spare.

2. Enabled - The hot spare is fully integrated into the LUN, or the failed disk has been replaced with a new disk and the SP is copying the data from the hot spare onto the new disk.

3. Ready - The copy is complete. The LUN consists of the disks in the original slots and the hot spare is on standby.
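Both state sequences above are strictly ordered, so a tiny helper can sanity-check the states reported in the Disk Properties dialog over time. The sequences come from the manual text; the helper itself is an illustrative sketch.

```python
# State sequences from the manual text above; illustrative only. When a hot
# spare is being copied back to a new disk, "Equalizing" appears in place of
# "Rebuilding" in the new disk's sequence.
NEW_DISK = ("Powering up", "Rebuilding", "Enabled")
HOT_SPARE = ("Rebuilding", "Enabled", "Ready")

def states_in_order(sequence, observed):
    """True if the observed states appear in the documented order."""
    positions = [sequence.index(s) for s in observed]
    return positions == sorted(positions)

assert states_in_order(HOT_SPARE, ["Rebuilding", "Enabled", "Ready"])
assert not states_in_order(NEW_DISK, ["Enabled", "Powering up"])
```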

 

Rebuilding occurs at the same time as user I/O. The rebuild priority for the LUN determines the duration of the rebuild process and the amount of SP resources dedicated to rebuilding. A High or ASAP (as soon as possible) rebuild priority consumes many resources and may significantly degrade performance. A Low rebuild priority consumes fewer resources with less effect on performance. You can determine the rebuild priority for a LUN from the General tab of its LUN Properties dialog box (page 14-14).

 

Failed vault disk with storage-system write caching enabled

If you are using write caching, the storage system uses the disks listed in Table 14-3 for its cache vault. If one of these disks fails, the storage system dumps its write cache image to the remaining disks in the vault; then it writes all dirty (modified) pages to disk and disables write caching.

Storage-system write caching remains disabled until a replacement disk is inserted and the storage system rebuilds the LUN with the replacement disk in it. You can determine whether storage-system write caching is enabled or disabled from the Cache tab of its Properties dialog box (page 14-14).

 

Storage-system type        Cache vault disks
CX3-series, CX-series      0-0 through 0-4

 


Chinaunix user, 2008-08-22 16:19:14:

"What is the High Availability Cache Vault (HACV) setting and what is the risk of setting it on or off?"
ID: emc126011   Usage: 23   Date Created: 01/16/2006   Last Modified: 05/10/2007   Status: Approved   Audience: Customer
Knowledgebase Solution
Question: What is the High Availability Cache Vault (HACV) setting and what is the risk of setting it on or off?
Question: Purpose of High Availability Cache Vault (HACV) on a CLARiiON CX- and DL-Series array