Data validation should not to be used during a performance run. The processor overhead can
impact performance results.
Before I start I want to answer a question that has come up a few times: “why use Vdbench to
check for data corruptions? I can just write large files, calculate a checksum and then re-read and compare the checksums. “
Yes,of course you can do that, but is that really good enough? All you’re doing here is check for data corruptions during sequential data transfers. What about random I/O? Isn’t that important enough to check?If you write the same block X times and the contents you then find are correct, doesn’t it mean that you could have lost X-1 consecutive writes without ever noticing it? You spent 24 hours writing and re-reading large sequential files, which block is the one that’s bad?When was that block written and when was that block read again? Yes, it is nice to say: I have a bad checksum over the weekend. It is much more useful to say “I have a specific error in a specific block,and yes, I know when it was written and when it was found to be in error”,and by the way,this bad block actually came from the wrong disk.”
See data_errors=for information about terminating after a data validation error.
Data validation works as follows: Every write operation against an SD or FSD will be recorded
in an in-memory table. Each 512-byte sector in the block that is written contains an 8-byte
logical byte address (LBA),and a one-byte data validation key. The data validation key is
incremented from 1 to 126 for each write to the same block. Once it reaches the value 126, it will roll over to one. Zero is an internal value indicating that the block has never been written.This key methodology is developed to identify lost writes.If the same block is written several times without changing the contents of the block it is impossible to recognize if one or more of the writes have been lost. Using this key methodology we will have to lose exactly 126 consecutive writes to the same block without being able to identify that writes were lost. After a block has been written once, the data in the block will be validated after each read operation. A write will always be prefixed by a read so that the original content can be validated. Use of the '-vr' execution parameter (or validate=read parameter file option) forces each block to be read immediately after it has been written. However, remember that there is no guarantee that the data has correctly reached the physical disk drive; the data could have been simply read from cache.
Since data validation tables are maintained in memory, data validation will normally not be possible after Vdbench terminates,or after a system crash/reboot. To allow continuous data validation, use journaling.
Journaling: to allow data validation after a Vdbench or system outage, each write is recorded in a journal file.This journal file is flushed to disk using synchronous writes after each update (or we would lose updates after a system outage). Each journal update writes 512 bytes to its disk. Each journal entry is 8 bytes long, thereby allowing 63 entries plus an 8-byte header to be recorded in one journal record. When the last journal entry in a journal record is written, an additional 512 bytes of zeros is appended, allowing Vdbench to keep track of end-of-file in the journal. A journal entry is written before and after each Vdbench write.
Note: I witnessed one scenario where the journal file was properly maintained but the file system structure used for the journal files was invalid after a system outage. I therefore allow now the use of raw devices for journal files to get around this problem.
Since each Vdbench workload write will result in two synchronous journal writes, journaling
will have an impact on throughput/performance for the I/O workload. It is highly recommended
that you use a disk storage unit that has write-behind cache activated.This will minimize the performance impact on the I/O workload. To allow file system buffering on journal writes,
specify '-jn'or'-jrn'(or journal=noflush in your parameter file) to prevent forced flushing.This will speed up journal writes, but they may be lost when the system does not shut down cleanly. It is further recommended that the journals be written to what may be called a 'safe' disk. Do not write the journals to the same disk that you are doing error injection or other scary things With an unreliable journal, data validation may not work.
At the start of a run that requests journaling, two files are created: a map backup file,and a journal file. The contents of the in-memory data validation table (map) are written to both the backup and the journal file (all key entries being zero). Journal updates are continually written at the end of the journal file. When Vdbench restarts after a system failure and journal recovery is requested, the original map is read from the beginning of the journal file and all the updates in the journal are applied to the map. Once the journal file reaches end of file, all blocks that are marked 'modified' will be read and the contents validated.
Next, the in-memory map is written back to the beginning of the journal file,and then to the
backup file. Journal records will then be written immediately behind the map on the journal file. If writing of the map to the journal file fails because of a system outage, the backup file still contains the original map from the start of the previous run.If during the next journal recovery it is determined that not all the writes to the map in the journal file completed, the map will be restored from the backup file and the journal updates again are applied from the journal entries that still reside in the journal file after the incomplete map.
After a journal recovery, there is one specific situation that needs extra effort. Since each write operation has a before and after journal entry, it can happen that an after entry has never been written because of a system outage.In that case, it is not clear whether the block in question contains before or after data.In that case, the block will be read and the data that is compared may consist of either of the two values, either the new data or old data.
Note: I understand that any storage device that is interrupted in the middle of a write operation must have enough residual power available to complete the 512-byte sector that is currently being written,or may be ignored. That means that if one single sector contains both old and new data that there has been a data corruption.
Once the journal recovery is complete, all blocks that are identified in the map as being written to at least once are read sequentially and their contents validated.
During normal termination of a run, the data validation map is written to the journal.This serves two purposes: end of file in the journal file will be reset to just after the map, thus preserving disk space,(at this time unused space is not freed, however)and it avoids the need to re-read the whole journal and apply it to the starting map in case you need to do another journal recovery.
Note: since the history of all data that is being written is maintained on a block by block level using different data transfer sizes within a Vdbench execution has the following restrictions:
? Different data transfer sizes are allowed, as long as they are all multiples of each other.If for instance you use a 1k, 4k and 8k data transfer size, data validation will internally use the 1k value as the ‘data validation key block size’, with therefore a 4k block occupying 4 smaller data validation key blocks.
Note: when you do a data validation test against a large amount of disk space it may take quite a whilefor a random block to be accessed for the second time.(Remember, Vdbench can only
compare the data contents when it knows what is there).This means that a relative short run may appear successful whilein fact no blocks have been re-read and validated. Vdbench therefore since Vdbench 5.00 keeps track of how many blocks were actually read and validated.If the amount of blocks validated at the end of a run is zero, Vdbench will abort.
Example:For a one TB lun running 100 iops of 8k blocks it will take 744 hours or 31 days for
each random block to be accessed at least twice!
Note: since any re-write of a block when running data validation implies a pre-read of that block I suggest that when you specify a read percentage (rdpct=) you specify rdpct=0.This prevents you, especially at the beginning of a test, from reading blocks that Vdbench has not written (yet) and therefore is not able to compare, wasting precious IOPS and bandwidth.In these runs (unless you forcibly request an immediate re-read) you’ll see that the run starts with a zero read percentage, but then slowly climbs to 50% read once Vdbench starts writing (and therefore prereading) blocks that Vdbench has written before.