Category: Servers & Storage
2014-01-03 13:26:47
SSD Cache: To Write Through Or To Write Back? That Is The Question
Whether ‘tis nobler in the mind to suffer the slings and arrows of your IT manager, or to take up cache against a sea of data. With apologies to Bill Shakespeare, the IT Dog is discussing two different cache write policies: write through and write back. Let me take a run at telling you what they are, why they are different, and what you need to think about when deciding which policy to use.
Cache Write Through
This is the easier of the two policies to explain and understand. Cache write through is like having your cake and eating it too. Data is written into cache for fast retrieval by a future operation, and at the same time the data is written into the underlying memory location (think central database or primary storage here). Why is cache write through good? Because it ensures the integrity of the data in the database: if another application needs to access the same data, you can be sure the data in the database is correct, or ‘fresh’. Another benefit of cache write through is that there is always a good, reliable copy of the data in the database if something happens to the cache itself (e.g. a power failure). “So Dog, why wouldn’t I do cache write through all the time?” Well, because cache write through has a little problem: the writing application must wait for the write into primary storage to complete before it can proceed with the next operation, and as you well know, writing to primary storage can be slow, which is why you are using cache in the first place. Write through cache can therefore slow down the application. You get data integrity and assurance at the expense of application speed.
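To make the trade-off concrete, here is a minimal sketch of write-through behavior in Python. The SlowStorage class and the key names are hypothetical stand-ins for your primary storage, not any vendor’s API; the point is simply that every write blocks until the slow primary storage confirms it, so the database copy is always fresh:

```python
import time

class SlowStorage:
    """Stand-in for slow primary storage (a disk or central database)."""
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        time.sleep(0.01)          # simulate slow media
        self.data[key] = value

    def read(self, key):
        time.sleep(0.01)
        return self.data[key]

class WriteThroughCache:
    def __init__(self, primary):
        self.cache = {}           # fast in-memory copy
        self.primary = primary    # slow backing store

    def write(self, key, value):
        self.cache[key] = value          # fast write into the cache...
        self.primary.write(key, value)   # ...then block until primary finishes
        # The caller waits on the slow write above; that wait is the
        # price of write through.

    def read(self, key):
        if key in self.cache:            # cache hit: fast path
            return self.cache[key]
        value = self.primary.read(key)   # cache miss: go to primary
        self.cache[key] = value
        return value

cache = WriteThroughCache(SlowStorage())
cache.write("account_42", 100)   # returns only after primary is updated
print(cache.read("account_42"))  # served from cache: fast
```

Notice there is no recovery logic at all: if the cache is lost, nothing is lost with it, because primary storage was updated on every write.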
Getting Down And Dirty
We need to get a little dirty here to explain cache write back. No, not that kind of dirty; we are still talking about data and applications here. Cache write back uses the cache as temporary storage for the ‘freshest’ data, and that same data is updated in primary storage at a later time. This means that when a data location is updated, it is written only to cache, not to primary storage, and the data in cache is termed ‘fresh’. The corresponding location in primary storage no longer matches the ‘fresh’ cache data and is considered ‘stale’. The cache controller keeps track of which state each location is in by using a ‘dirty’ bit that indicates whether the data in primary storage matches the cache copy. If another application requests data whose primary storage copy is stale, the controller has to update the primary storage location before that application can use the data. “Why would I use this cache policy?” Well, because data is not written into slower primary storage each time it is updated, the writing application does not have to wait for the write to primary storage to complete before moving on to the next operation, so the application runs faster. The problem with this policy is that there is a small chance the fresh cache data (or the dirty bit) can be corrupted before the data is written into primary storage. So you get speed at the expense of possible loss of data integrity.
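Here is the matching sketch for write back, under the same assumptions (hypothetical names, with the SlowStorage stub repeated so the snippet stands alone). Note how write() returns immediately and the dirty bit tells flush() which entries still need to reach primary storage:

```python
import time

class SlowStorage:
    """Stand-in for slow primary storage (a disk or central database)."""
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        time.sleep(0.01)          # simulate slow media
        self.data[key] = value

    def read(self, key):
        time.sleep(0.01)
        return self.data.get(key)

class WriteBackCache:
    def __init__(self, primary):
        self.cache = {}           # key -> (value, dirty_bit)
        self.primary = primary

    def write(self, key, value):
        # Write only to cache and set the dirty bit: the cache copy is
        # 'fresh' and primary is now 'stale'. The caller returns at once.
        self.cache[key] = (value, True)

    def read(self, key):
        if key in self.cache:
            return self.cache[key][0]        # always the freshest copy
        value = self.primary.read(key)
        self.cache[key] = (value, False)     # clean: matches primary
        return value

    def flush(self):
        # Write every dirty entry back to primary storage, e.g. on a
        # timer or before eviction. A crash before flush() runs loses
        # the fresh data: that is the write-back risk.
        for key, (value, dirty) in self.cache.items():
            if dirty:
                self.primary.write(key, value)
                self.cache[key] = (value, False)

cache = WriteBackCache(SlowStorage())
cache.write("account_42", 100)   # returns immediately; primary is stale
cache.flush()                    # primary catches up later
```

The speed win comes entirely from deferring the slow write; everything between write() and flush() is the window where a power failure can corrupt or lose the fresh data.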
What To Do When?
Which of these two cache policies to use depends on the application. Applications that absolutely cannot risk data loss (banks storing real-time trading info, continuous data protection in the backing store) should use the write through cache policy; applications that can tolerate some data loss (back-end processing, analytics, etc.) can speed performance by using the write back cache policy.
Tell Me What Your Policy Is
Let me know if you have experience with using either write back or write through cache policy and what you have learned from deploying that technique.