Category: Servers and Storage

2015-01-20 15:55:02

EMC ENHANCED WRITE CACHE AVAILABILITY FOR
CLARIION CX4 AND FLARE RELEASE 28
The short story: the CX4 neither dumps cache nor switches to write-through when it loses an SP (whether through failure or NDU).

This is a big deal. Without WCA, everything stops while the CX dumps cache, and afterwards writes slow down (and possibly reads too, because writes are no longer consolidated).

With WCA, there is no cache dump, so write-back caching continues and performance should be about the same with one SP as with two (unless the I/O load exceeds the capacity of one SP, which is unlikely).

EMC expects the bigger CLARiiON shops will see a positive impact with WCA.

==========

 

 

What is WCA?

Write Cache Availability (WCA) refers to a change in how CX4 and FLARE release 28 (R28) implement cache management. CLARiiON storage systems prior to R28 automatically destaged write cache data to triple-mirrored vault drives when any of a variety of components failed. With WCA, data is now maintained in cache in the following failure scenarios:

·        Single SP failure

·        Single vault drive failure

·        Power supply failure 

·        SPS failure

 

Cache destage to the vault drives will continue to operate as it always has in the following cases:

·        Loss of power to the array on both the A and B power circuits

·        Array overheating conditions affecting both SPs

·        Multi-fan failures (on CX4-80)

 

The following maintenance procedures, which previously triggered automatic cache destage, will no longer do so:

·        Non-disruptive Upgrade (NDU) of FLARE and layered applications

·        Single SP restart

·        Single SP physically removed
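The three lists above amount to a simple decision rule: only the array-wide conditions still force a destage to the vault drives. A minimal Python sketch of that rule follows. The event names and the `cache_action` function are hypothetical labels invented for illustration; they are not FLARE identifiers or part of any EMC interface.

```python
from enum import Enum, auto


class Event(Enum):
    # Hypothetical labels for the scenarios listed above (not FLARE names).
    SINGLE_SP_FAILURE = auto()
    SINGLE_VAULT_DRIVE_FAILURE = auto()
    POWER_SUPPLY_FAILURE = auto()
    SPS_FAILURE = auto()
    NDU = auto()
    SINGLE_SP_RESTART = auto()
    SINGLE_SP_REMOVED = auto()
    DUAL_CIRCUIT_POWER_LOSS = auto()
    OVERHEAT_BOTH_SPS = auto()
    MULTI_FAN_FAILURE_CX4_80 = auto()


# Conditions that still destage write cache to the vault drives under WCA.
DESTAGE_EVENTS = {
    Event.DUAL_CIRCUIT_POWER_LOSS,
    Event.OVERHEAT_BOTH_SPS,
    Event.MULTI_FAN_FAILURE_CX4_80,
}


def cache_action(event: Event) -> str:
    """Return what happens to write cache for a given event under WCA."""
    if event in DESTAGE_EVENTS:
        return "destage to vault"
    return "keep write cache"


if __name__ == "__main__":
    for event in Event:
        print(f"{event.name}: {cache_action(event)}")
```

Before R28, every event in this sketch would have mapped to "destage to vault"; WCA moves the single-component failures and the maintenance procedures to "keep write cache".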

 

What has changed since Cache Destage to Vault was introduced?

Patented in 1994, the CLARiiON disk array had many features that are now standard in the industry. Features mentioned in early patents included the industry’s first dual active controller design, full system redundancy, and mirrored write cache. To protect the integrity of data in cache, this early design introduced cache destage to the vault drives. CLARiiON also included an innovative RAID architecture to protect in-flight writes to RAID groups in the presence of power failures. Subsequent innovations improved on this design, for example algorithms that optimize the performance of the cache destage process.

 

In the ensuing years, CLARiiON customers benefited from continuous improvements in CLARiiON hardware and software architecture. In May 2006, EMC introduced the CX3 UltraScale family of storage systems. Detailed analysis of CX3 systems in the field showed that these systems were delivering 99.999% uptime. After reviewing and analyzing detailed reports on over 10,000 CX3 systems in the field, CLARiiON engineering determined that data could safely be maintained in write cache in specific failure scenarios.

 

Why is WCA being implemented?

The analysis of thousands of CX3 UltraScale systems showed that data can now be kept in write cache during these failures with a very high degree of safety. WCA will provide important benefits to CLARiiON customers:

·        Users will benefit from improved application performance immediately following certain component failures. 

·        In high-end CLARiiON arrays, customers will be able to utilize more cache for writes. A larger write cache will provide improved performance, particularly for applications that perform many random write operations, such as database applications and Exchange.

·        Array performance will be improved during hardware maintenance procedures such as replacing an SP, power supply, or SPS.

·        Array performance will be improved during NDU maintenance procedures.

 

What will happen when a power supply, blower, or SPS fails?

These components are redundant and hot swappable. The storage system will continue to operate normally if any of these components should fail. CLARiiON provides the means for customers to be notified when a failure event occurs. Through CLARalert, EMC Customer Service will also be notified of the event. The best practice has always been to replace the failed component as soon as possible. The probability of two similar components failing within a small time window is extremely low.

 

What will happen when an SP fails or is removed for maintenance?

Data will remain in the write cache of the surviving SP thereby increasing the performance of the array. This is called “single-board write caching”. Consider a situation where SP-A is the surviving storage processor. SP-A will detect that the peer has panicked or been removed and transition to single-board write caching without increasing response times for I/O requests that are currently in flight. When SP-B is rebooted, the write cache image in SP-A will be written to SP-B.

 

What will happen when one SP panics while the other is down?

FLARE is designed to react to a variety of failure scenarios. These will be covered in detail in an engineering white paper. Some examples:

·        When a surviving SP panics, the write cache memory will be persisted through the reboot, and as soon as the SP comes up, the cache will be rebuilt and then enabled.

·        If both SPs panic almost simultaneously, then the SP that comes up first will send its copy of the write cache image to the second SP.

·        While an SP is rebooting, if the surviving SP panics, then the cache will be disabled until the second SP (the original surviving SP) comes up. The cache image will be checked and then transferred to the peer.
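The scenarios above can be walked through with a toy two-SP model. This is only a sketch of the behavior as described in this FAQ, not FLARE's actual logic. The `SP` and `Array` classes, the `has_image` flag, and all method names are hypothetical. The model also assumes that an SP which goes down while its peer keeps caching comes back with a stale copy and must be resynchronized from the peer.

```python
from dataclasses import dataclass, field


@dataclass
class SP:
    name: str
    up: bool = True
    has_image: bool = True  # holds a current copy of the write cache image


@dataclass
class Array:
    sps: dict = field(default_factory=lambda: {"A": SP("A"), "B": SP("B")})

    def peer(self, name: str) -> SP:
        return self.sps["B" if name == "A" else "A"]

    @property
    def cache_enabled(self) -> bool:
        # Write caching needs at least one running SP with a current image.
        return any(sp.up and sp.has_image for sp in self.sps.values())

    def panic(self, name: str) -> None:
        sp, peer = self.sps[name], self.peer(name)
        sp.up = False
        if peer.up and peer.has_image:
            # Assumption: the peer keeps caching, so this SP's copy goes stale.
            sp.has_image = False
        # Otherwise this SP held the only current image; it persists in memory.

    def panic_both(self) -> None:
        # Near-simultaneous panic: both copies persist through the reboot.
        for sp in self.sps.values():
            sp.up = False

    def boot(self, name: str) -> None:
        sp, peer = self.sps[name], self.peer(name)
        sp.up = True
        if peer.up and peer.has_image:
            sp.has_image = True      # running peer sends its image to this SP
        elif sp.has_image and peer.up:
            peer.has_image = True    # persisted image is checked, then sent to peer


if __name__ == "__main__":
    arr = Array()
    arr.panic("B")                    # single-board write caching on SP-A
    arr.panic("A")                    # surviving SP panics while B is down
    arr.boot("B")                     # B has no current image yet
    print("cache enabled after B boots:", arr.cache_enabled)
    arr.boot("A")                     # A returns with the persisted image
    print("cache enabled after A boots:", arr.cache_enabled)
```

In this model the cache stays disabled while only SP-B is up and is re-enabled once SP-A returns with the persisted image, which matches the third scenario.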


What do customers need to do when configuring a CX4 array?

There is no setup involved. WCA is simply part of the built-in functionality for CX4 series storage systems. Cache tuning and cache management functions are not changing.

 

Are there special considerations for performing maintenance or upgrades?

EMC has added an illuminated hand with a red slash through it on all SPs to clearly indicate which SP is active and which is not, to avoid SP replacement gaffes. In addition, the CLARiiON Procedure Generator will continue to generate tested and approved maintenance procedures.

 

