Category: Servers and Storage
2009-06-24 23:41:59
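In fact, the customer's site was configured with 36, 73,
From the process above, we reached the following conclusions (for the customer's environment):
1) When write cache is disabled, system performance drops sharply.
2) If a system disk fails, write cache is disabled whether or not a hot spare is present.
3) Once the system disk returns to normal (after data synchronization completes,
4) If a non-system disk fails, write cache is not disabled.
5) A larger-capacity hot spare can stand in for a smaller-capacity failed disk.
6) If one SPS fails, write cache is also disabled.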
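As a rough summary of conclusions 2), 4), 5) and 6), the observed behavior can be sketched in a few lines of Python. The function and parameter names are made up for illustration, and accepting a hot spare of equal capacity is an assumption (the tests only covered a larger spare replacing a smaller disk).

```python
def write_cache_disabled(system_disk_failed: bool, failed_sps_count: int) -> bool:
    """Return True if write cache was observed to be disabled in the tests."""
    # Conclusion 2: a failed system disk disables write cache, hot spare or not.
    # Conclusion 4: a failed non-system disk does not (so it is not an input here).
    # Conclusion 6: one failed SPS is enough to disable write cache.
    return system_disk_failed or failed_sps_count >= 1


def hot_spare_can_cover(spare_gb: int, failed_gb: int) -> bool:
    """Return True if the hot spare is large enough to replace the failed disk."""
    # Conclusion 5 covers a larger spare replacing a smaller disk.
    # Treating equal capacity as acceptable is an assumption.
    return spare_gb >= failed_gb
```

For example, `hot_spare_can_cover(73, 36)` is true while `hot_spare_can_cover(36, 73)` is false.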
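We ran the same tests in our CX500 test environment. Most results were the same, but point 3) differed. The conclusion there is:
Once the system disk begins synchronizing (rather than waiting for synchronization to complete), write cache can return to the enabled state. This greatly shortens the time write cache stays disabled, which is exactly the behavior we want.
So is this perhaps a bug in the CX600 that has already been fixed in the CX500? Or can the problem be avoided by upgrading the microcode?
The following notes on hot spares are excerpted from the EMC manuals: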
Hot spare - A single global spare disk that serves as a temporary replacement for a failed disk in a RAID 5, 3, 1, or 1/0 LUN. Data from the failed disk is reconstructed automatically on the hot spare. It is reconstructed from the parity data or mirrored data on the working disks in the LUN; therefore, the data on the LUN is always accessible. A hot spare LUN cannot belong to a storage group.
RAID type | Number of disks you can use
RAID 5 | 3 - 16
RAID 3 | 5 or 9 (CX-series)
RAID 1/0 | 2, 4, 6, 8, 10, 12, 14, 16
RAID 1 | 2
RAID 0 | 3 - 16
Disk | 1
Hot spare | 1
Note: If you have LUNs consisting of FC drives, allocate an FC drive as a hot spare. If you have LUNs consisting of ATA drives, allocate an ATA drive as a hot spare.
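The disk-count rules and the drive-type note can be checked mechanically. The following is only a sketch that transcribes the table; the dictionary and function names are illustrative and are not part of any EMC tool.

```python
# Allowed disk counts per RAID type, transcribed from the table above.
ALLOWED_DISK_COUNTS = {
    "RAID 5": set(range(3, 17)),       # 3 - 16
    "RAID 3": {5, 9},                  # CX-series
    "RAID 1/0": set(range(2, 17, 2)),  # even counts from 2 to 16
    "RAID 1": {2},
    "RAID 0": set(range(3, 17)),       # 3 - 16
    "Disk": {1},
    "Hot spare": {1},
}


def is_valid_disk_count(raid_type: str, disk_count: int) -> bool:
    """Return True if disk_count is allowed for raid_type."""
    return disk_count in ALLOWED_DISK_COUNTS[raid_type]


def hot_spare_type_matches(lun_drive_type: str, spare_drive_type: str) -> bool:
    """FC LUNs need an FC hot spare; ATA LUNs need an ATA hot spare."""
    return lun_drive_type == spare_drive_type
```

An unknown RAID type raises `KeyError`, which seems preferable to silently accepting a typo.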
Rebuild priority
The rebuild priority is the relative importance of reconstructing data on either a hot spare or a new disk that replaces a failed disk in a LUN. It determines the amount of resource the SP devotes to rebuilding instead of to normal I/O activity. Table 8-3 lists and describes the rebuild time associated with each rebuild value.
Value | Target rebuild time in hours
ASAP | 0 (as quickly as possible). This is the default.
HIGH | 6
MEDIUM | 12
LOW | 18
The rebuild priorities correspond to the target times listed above. The storage system attempts to rebuild the LUN in the target time or less. The actual time to rebuild the LUN depends on the I/O workload, the LUN size, and the LUN RAID type. For a RAID group with multiple LUNs, the highest priority specified for any LUN in the group is used for all LUNs on the group.
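To make the last rule concrete, here is a small sketch of how the effective priority of a RAID group could be derived from its LUNs. It assumes that "highest priority" means the value with the shortest target time (ASAP, then HIGH, MEDIUM, LOW); the names are illustrative.

```python
# Target rebuild times in hours, from the table above.
TARGET_REBUILD_HOURS = {"ASAP": 0, "HIGH": 6, "MEDIUM": 12, "LOW": 18}


def effective_rebuild_priority(lun_priorities):
    """Return the priority applied to every LUN in one RAID group.

    Assumes the highest priority is the one with the shortest target time.
    """
    return min(lun_priorities, key=lambda p: TARGET_REBUILD_HOURS[p])


def target_hours(lun_priorities):
    """Return the target rebuild time in hours for the whole RAID group."""
    return TARGET_REBUILD_HOURS[effective_rebuild_priority(lun_priorities)]
```

So a group holding LUNs set to LOW, HIGH and MEDIUM would rebuild all of them at HIGH, with a 6-hour target.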
Rebuilding a RAID 5, 3, 1, or 1/0 LUN
You can monitor the rebuilding of a new disk from the General tab of its Disk Properties dialog box (page 14-15).
A new disk module’s state changes as follows:
1. Powering up - The disk is powering up.
2. Rebuilding - The storage system is reconstructing the data on the new disk from the information on the other disks in the LUN. If the disk is the replacement for a hot spare that is being integrated into a redundant LUN, the state is Equalizing instead of Rebuilding. In this situation, the storage system is simply copying the data from the hot spare onto the new disk.
3. Enabled - The disk is bound and assigned to the SP being used as the communication channel to the enclosure.
A hot spare’s state changes as follows:
1. Rebuilding - The SP is rebuilding the data on the hot spare.
2. Enabled - The hot spare is fully integrated into the LUN, or the failed disk has been replaced with a new disk and the SP is copying the data from the hot spare onto the new disk.
3. Ready - The copy is complete. The LUN consists of the disks in the original slots and the hot spare is on standby.
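The two state sequences above can be written down as simple ordered lists. This is a minimal sketch for reasoning about what state to expect next; the helper names are made up and nothing here talks to a real array.

```python
from typing import List, Optional

# State sequence of a hot spare, as listed above.
HOT_SPARE_STATES = ["Rebuilding", "Enabled", "Ready"]


def new_disk_states(replaces_hot_spare: bool) -> List[str]:
    """Return the state sequence of a newly inserted disk.

    When the new disk takes over from a hot spare, data is copied from the
    spare and the middle state is Equalizing instead of Rebuilding.
    """
    middle = "Equalizing" if replaces_hot_spare else "Rebuilding"
    return ["Powering up", middle, "Enabled"]


def next_state(sequence: List[str], current: str) -> Optional[str]:
    """Return the state that follows current, or None at the end."""
    index = sequence.index(current)
    return sequence[index + 1] if index + 1 < len(sequence) else None
```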
Rebuilding occurs at the same time as user I/O. The rebuild priority for the LUN determines the duration of the rebuild process and the amount of SP resources dedicated to rebuilding. A High or ASAP (as soon as possible) rebuild priority consumes many resources and may significantly degrade performance. A Low rebuild priority consumes fewer resources with less effect on performance. You can determine the rebuild priority for a LUN from the General tab of its LUN Properties dialog box (page 14-14).
Failed vault disk with storage-system write caching enabled
If you are using write caching, the storage system uses the disks listed in Table 14-3 for its cache vault. If one of these disks fails, the storage system dumps its write cache image to the remaining disks in the vault; then it writes all dirty (modified) pages to disk and disables write caching.
Storage-system write caching remains disabled until a replacement disk is inserted and the storage system rebuilds the LUN with the replacement disk in it. You can determine whether storage-system write caching is enabled or disabled from the Cache tab of its Properties dialog box (page 14-14).
Storage-system type | Cache vault disks
CX3-series, CX-series | 0-0 through 0-4
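The vault-disk behavior described above can be sketched as follows. Disk labels are kept in the same "0-0" to "0-4" form as the table; the function names and the wording of the action strings are illustrative only.

```python
# Cache vault disks for CX3-series and CX-series, labeled as in the table.
VAULT_DISKS = {"0-{}".format(slot) for slot in range(5)}


def is_vault_disk(disk_label: str) -> bool:
    """Return True if the disk is one of the cache vault disks."""
    return disk_label in VAULT_DISKS


def actions_on_disk_failure(disk_label: str, write_cache_enabled: bool):
    """List what the storage system does when the given disk fails.

    Only models the vault-disk case with write caching enabled.
    """
    if not (is_vault_disk(disk_label) and write_cache_enabled):
        return []
    return [
        "dump write cache image to the remaining vault disks",
        "write all dirty pages to disk",
        "disable write caching",
    ]
```

A failure of disk "0-5" returns an empty list here, which matches conclusion 4) earlier in the post: a non-system disk failure does not disable write cache.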
ID: emc126011
Usage: 23
Date Created: 01/16/2006
Last Modified: 05/10/2007
STATUS: Approved
Audience: Customer
Knowledgebase Solution
Question: What is the High Availability Cache Vault (HACV) setting and what is the risk of setting it on or off?
Question: Purpose of High Availability Cache Vault (HACV) on a CLARiiON CX- and DL-Series array
Environment: Product: CLARiiON CX200
Environment: Product: CLARiiON CX300
Environment: Product: CLARiiON CX300i
Environment: Product: CLARiiON CX400
Environment: Product: CLARiiON CX500
Environment: Product: CLARiiON CX500i
Environment: Product: CLARiiON CX600
Environment: Product: CLARiiON CX700
Environment: Product: CLARiiON DL300
Environment: Product: CLARiiON DL310
Environment: Product: CLARiiON DL700
Environment: Product: CLARiiON DL710
Environment: Product: CLARiiON CX3-10c
Environment: Product: CLARiiON CX3-20
Environment: Product: CLARiiON CX3-20c
Environment: Product: CLARiiON CX3-20F
Environment: Product: CLARiiON CX3-40
Environment: Product: CLARiiON CX3-40c
Environment: Product: CLARiiON CX3-40F
Environment: Product: CLARiiON CX3-80
Problem: Does the HA Cache Vault prevent write cache from disabling in case another critical component fails?
Problem: What does the HA Cache Vault check box in Navisphere Manager do?
Problem: What events does HA Cache Vault protect the write cache from?
Fix: If you enable the HA cache vault (HACV), a single drive failure will cause the write cache to become disabled, thus reducing the risk of losing data in the event of a second drive failing. If you disable the HACV, a single drive failure does not disable the write cache, leaving data at risk if a second drive fails. When you disable the HACV, you will receive a warning message stating that this operation will allow write caching to continue even if one of the cache vault drives fails. If there is already a failure on one of the cache vault drives, this operation will not re-enable the write cache. Cache will not re-enable in the event of an SP reboot until the fault condition is corrected.
The following table describes the consequences of having HACV enabled or disabled when a problem occurs.
HACV Matrix
Problem | HACV enabled: cache state after failure | HACV enabled: data loss | HACV disabled: cache state after failure | HACV disabled: data loss
No disk failures | Enabled | No | Enabled | No
Above and the SP panic or reboot | Enabled | No | Enabled | No
Above and double SP panic | Disabled | Yes | Disabled | Yes
Above and Array power cycles | Enabled | No | Enabled | No
Single vault disk fails | Disabled | No | Enabled | No
Above and the SP panic or reboot | Disabled | No | Disabled | No
Above and double SP panic | Disabled | No | Disabled | Yes
Above and Array power cycles | Disabled | No | Disabled | No
Second vault disk fails | Disabled | User LUNs | Disabled | User LUNs
Above and the SP panic or reboot | Disabled | No | Disabled | No
Above and double SP panic | Disabled | No | Disabled | Possible *
Above and Array power cycles | Disabled | No | Disabled | Possible *
* When a second vault disk fails, cache will disable and begin to de-stage because this takes more time than a dump of cache memory to the vault. There is a window of vulnerability for LUNs to end up with dirty cache.
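For quick lookups, the HACV matrix can be encoded as a dictionary. This is just a transcription sketch: the key layout (number of failed vault disks plus an optional follow-up event) and the event names are my own choice, and the "Possible" entries carry the footnote above.

```python
# Key: (failed vault disks, follow-up event or None).
# Value: ((cache state, data loss) with HACV enabled,
#         (cache state, data loss) with HACV disabled).
HACV_MATRIX = {
    (0, None):              (("Enabled", "No"), ("Enabled", "No")),
    (0, "sp_reboot"):       (("Enabled", "No"), ("Enabled", "No")),
    (0, "double_sp_panic"): (("Disabled", "Yes"), ("Disabled", "Yes")),
    (0, "power_cycle"):     (("Enabled", "No"), ("Enabled", "No")),
    (1, None):              (("Disabled", "No"), ("Enabled", "No")),
    (1, "sp_reboot"):       (("Disabled", "No"), ("Disabled", "No")),
    (1, "double_sp_panic"): (("Disabled", "No"), ("Disabled", "Yes")),
    (1, "power_cycle"):     (("Disabled", "No"), ("Disabled", "No")),
    (2, None):              (("Disabled", "User LUNs"), ("Disabled", "User LUNs")),
    (2, "sp_reboot"):       (("Disabled", "No"), ("Disabled", "No")),
    (2, "double_sp_panic"): (("Disabled", "No"), ("Disabled", "Possible")),
    (2, "power_cycle"):     (("Disabled", "No"), ("Disabled", "Possible")),
}


def hacv_outcome(failed_vault_disks, follow_up, hacv_enabled):
    """Return (cache state after failure, data loss) for one matrix row."""
    with_hacv, without_hacv = HACV_MATRIX[(failed_vault_disks, follow_up)]
    return with_hacv if hacv_enabled else without_hacv
```

The row that matters most for the tests in this post is `(1, None)`: with HACV enabled the cache goes to Disabled, with HACV disabled it stays Enabled.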
Notes: There are issues that will cause the cache to become disabled and that the HACV setting has no effect on:
AC power loss
SP failure
Fan failure
Power supply failure
SPS failure
User-induced cache disable
Insufficient number of available cache pages
Over-temperature
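Put together with the Fix section, the rule is: the causes listed above disable the cache regardless of HACV, while a single vault disk failure disables it only when HACV is enabled. A minimal sketch, with cause strings and function name chosen here for illustration:

```python
# Causes that disable the cache no matter how HACV is set (list above).
HACV_INDEPENDENT_CAUSES = frozenset({
    "AC power loss",
    "SP failure",
    "Fan failure",
    "Power supply failure",
    "SPS failure",
    "User-induced cache disable",
    "Insufficient number of available cache pages",
    "Over-temperature",
})


def cache_disabled_by(cause: str, hacv_enabled: bool) -> bool:
    """Return True if the given cause disables the write cache."""
    if cause in HACV_INDEPENDENT_CAUSES:
        return True
    if cause == "Single vault disk failure":
        # Only this case depends on the HACV setting.
        return hacv_enabled
    raise ValueError("cause not covered by this sketch: " + cause)
```

This also lines up with conclusion 6) from the tests: an SPS failure disabled write cache, and per the note above the HACV setting would not have changed that.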
Notes: HA Cache Vault is defined in Navisphere Manager Help as:
"HA Cache Vault Note:
Supported only on CX-Series storage systems. Determines the availability of storage-system write caching when a single drive in the cache vault fails. When the check box is enabled (default), write caching is disabled if a single vault disk fails. When the check box is cleared, write caching is not disabled if a single disk fails.
Important: Disabling the HA Cache Vault check box puts the data at risk if another cache vault disk should fail."