
vdbench configuration file:


hd=default,vdbench=/home/icfs_test/vdbench504/,user=root,shell=ssh
hd=hd1,system=20.5.29.101
hd=hd2,system=20.5.29.102
hd=hd3,system=20.5.29.103
hd=hd4,system=20.5.29.104
hd=hd5,system=20.5.29.105
hd=hd6,system=20.5.29.106
hd=hd7,system=20.5.29.107
hd=hd8,system=20.5.29.108
hd=hd9,system=20.5.29.109

fsd=fsd1,anchor=/mnt/icfs/testdir1/20181026,depth=3,width=5,files=10000,size=500k,shared=yes
fsd=fsd2,anchor=/mnt/icfs/testdir2/20181026,depth=3,width=5,files=10000,size=500k,shared=yes
fsd=fsd3,anchor=/mnt/icfs/testdir3/20181026,depth=3,width=5,files=10000,size=500k,shared=yes
fsd=fsd4,anchor=/mnt/icfs/testdir4/20181026,depth=3,width=5,files=10000,size=500k,shared=yes
fsd=fsd5,anchor=/mnt/icfs/testdir5/20181026,depth=3,width=5,files=10000,size=500k,shared=yes
fsd=fsd6,anchor=/mnt/icfs/testdir6/20181026,depth=3,width=5,files=10000,size=500k,shared=yes
fsd=fsd7,anchor=/mnt/icfs/testdir7/20181026,depth=3,width=5,files=10000,size=500k,shared=yes
fsd=fsd8,anchor=/mnt/icfs/testdir8/20181026,depth=3,width=5,files=10000,size=500k,shared=yes

fwd=format,threads=32,xfersize=500k
fwd=default,xfersize=500k,fileio=random,fileselect=random,rdpct=0,threads=32

fwd=fwd1,fsd=fsd1,host=hd1
fwd=fwd2,fsd=fsd2,host=hd2
fwd=fwd3,fsd=fsd3,host=hd3
fwd=fwd4,fsd=fsd4,host=hd4
fwd=fwd5,fsd=fsd5,host=hd5
fwd=fwd6,fsd=fsd6,host=hd6
fwd=fwd7,fsd=fsd7,host=hd7
fwd=fwd8,fsd=fsd8,host=hd8

rd=rd1,fwd=fwd*,fwdrate=max,format=restart,elapsed=28800,interval=1
[Note on the fsd shared= option: vdbench does not allow different slaves to share all of the files under the same directory structure, because that would cause a lot of overhead, but they are allowed to share the directory structure itself. If shared=yes is set, the slaves divide the files under a directory equally among themselves, so each slave effectively works only on its own share of the files. This therefore cannot be used to test several clients reading and writing the same file.]

Procedure:


1. Install vdbench on every client, using the same installation path on each.
2. Set up SSH between the nodes: pick one client as the parent (master) node, have every child node trust the parent, and verify that passwordless login works (a sketch is given after this list).
3. Write the configuration file vdbench.conf, following the example given at the top of this post.
4. From the vdbench directory on the parent node, start vdbench detached from the terminal: nohup ./vdbench -f vdbench.conf &
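
A minimal sketch of steps 2 and 4, run on the parent node. It assumes the parent is hd1 (20.5.29.101), root logins as in the configuration above, and the default key path; adjust the IP list and paths to your environment:

    # Generate a key pair once (no passphrase) and copy the public key to every
    # node listed in the configuration, then verify passwordless login.
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    for ip in 20.5.29.101 20.5.29.102 20.5.29.103 20.5.29.104 20.5.29.105 \
              20.5.29.106 20.5.29.107 20.5.29.108 20.5.29.109
    do
        ssh-copy-id root@$ip      # prompts for the root password once per node
        ssh root@$ip hostname     # should return without asking for a password
    done

    # Step 4: start the run detached from the terminal; progress goes to
    # nohup.out and the detailed results to vdbench's output directory.
    cd /home/icfs_test/vdbench504/
    nohup ./vdbench -f vdbench.conf &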


Parameter descriptions:


# Block devices
HD: host definition
hd= use localhost to run on the current host, or a label to identify a remote host.
system= IP address or network name of that host.
clients= number of running clients used to simulate servers.
SD: storage definition
sd= name that identifies the storage.
host= ID of the host where the storage resides.
lun= name of the raw disk, tape, or file system; vdbench can also create a disk file for you.
threads= maximum number of concurrent I/O requests to this SD. Default 8.
hitarea= size of the area used to adjust the read-hit percentage. Default 1m.
openflags= flag_list used to open a lun or a file.
WD: workload definition
wd= name that identifies the workload.
sd= ID of the storage definition(s) to use.
host= ID of the host that runs this workload. Default localhost.
rdpct= percentage of the total requests that are reads.
rhpct= read-hit percentage. Default 0.
whpct= write-hit percentage. Default 0.
xfersize= size of the data transfers. Default 4k.
seekpct= percentage of random seeks; may be random.
openflags= flag_list used to open a lun or a file.
iorate= fixed I/O rate for this workload.
RD: run definition
rd= name that identifies the run.
wd= ID of the workload(s) used for this run.
iorate= (#,#,...) one or more I/O rates.
    curve: generate a performance curve (to be defined).
    max: uncontrolled workload.
elapsed= time: run duration in seconds. Default 30.
warmup= time: warm-up period whose results are excluded from the final totals.
distribution= distribution of I/O requests: exponential, uniform, or deterministic.
pause= number of seconds to sleep before the next run.
openflags= flag_list used to open a lun or a file.
(A combined raw-device example is sketched after this list.)
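
A minimal raw-device parameter file combining the keywords above; the device path /dev/sdb, transfer size and run length are illustrative assumptions, not values from the original post:

    # storage: raw device /dev/sdb, at most 16 outstanding I/Os
    sd=sd1,lun=/dev/sdb,threads=16
    # workload: 8k transfers, 70% reads, 100% random seeks
    wd=wd1,sd=sd1,xfersize=8k,rdpct=70,seekpct=100
    # run: uncontrolled I/O rate for 10 minutes, reporting every 5 seconds
    rd=run1,wd=wd1,iorate=max,elapsed=600,interval=5

It is started the same way as the file-system test, for example ./vdbench -f raw.conf, where raw.conf is whatever name you save the file under.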


# File systems
HD: host definition. Same as for block devices.
FSD: file system definition
fsd= name that identifies the file system definition.
anchor= directory in which the directory structure will be created.
width= number of directories to create under the anchor.
depth= number of directory levels to create under the anchor.
files= number of files to create at the lowest level.
sizes= (size,size,...) sizes of the files that will be created.
distribution= bottom to create files only at the lowest level, or all to create files in every directory.
openflags= flag_list used to open a file system (Solaris).
FWD: file system workload definition
fwd= name that identifies the file system workload definition.
fsd= ID of the file system definition(s) to use.
host= ID of the host to use for this workload.
fileio= random or sequential; how file I/O is performed.
fileselect= random or sequential; how files or directories are selected.
xfersizes= data transfer sizes for the read and write operations.
operation= mkdir, rmdir, create, delete, open, close, read, write, getattr or setattr; selects a single file operation to perform.
rdpct= percentage of read versus write operations (read and write only).
threads= number of concurrent threads for this workload. Each thread needs at least one file.
RD: run definition
fwd= ID of the file system workload definition(s) to use.
fwdrate= number of file system operations per second.
format= yes / no / only / restart / clean / directories; what to do before the run starts.
operations= overrides the fwd operation; same options.
(A small single-host example is sketched after this list.)
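
A small single-host sketch using these keywords; the anchor path, file counts and rates are illustrative assumptions and deliberately much smaller than the cluster configuration at the top of this post:

    # one anchor, 2 levels deep and 4 directories wide, 100 files of 1m at the lowest level
    fsd=fsd1,anchor=/mnt/icfs/fs_demo,depth=2,width=4,files=100,size=1m
    # 64k random I/O inside randomly selected files, 50% reads
    fwd=fwd1,fsd=fsd1,fileio=random,fileselect=random,xfersize=64k,rdpct=50,threads=8
    # create the file structure first, then run 1000 ops/s for 5 minutes
    rd=rd1,fwd=fwd1,fwdrate=1000,format=yes,elapsed=300,interval=1
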
Data consistency validation:


# How validation works
Data validation works as follows: every write to the storage system is recorded in a table. Assuming a 1m write block size, each 512-byte sector within that block carries two items that get recorded: an 8-byte logical byte address (LBA) and a one-byte data validation key. The key marks how many times the block has been written: it starts at 1 for the first write, is incremented up to 126 on each overwrite, and then rolls over to 1 again (0 is reserved to mean the block has never been written). This recording pass generates the validation journal; a second run of the script with -jr then validates the data against the journal recorded in the first run (-vr instead forces each block to be re-read and validated immediately after it is written).
Validation parameters:


# Validation parameters

Data validation is enabled with -v or -j; during the run a record is kept of every write operation so the data can be validated afterwards.

With -v the validation records are kept only in memory; with -j a journal file is written as well, and a second run with -jr recovers that journal and validates against it. -v is faster because everything stays in memory, but if the system is restarted or memory is cleared those records are lost. -j flushes the journal to disk synchronously, which is safe but slower; -jn writes the journal without forced flushing, trading a little safety for speed. (A two-pass command sketch is shown below.)
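
A minimal two-pass sketch of journaled data validation; the output directory names are illustrative assumptions:

    # pass 1: run the workload with a journal recording every write
    ./vdbench -f vdbench.conf -j -o output_write

    # pass 2 (for example after a failover or reboot of the storage under test):
    # recover the journal and validate the data written in pass 1
    ./vdbench -f vdbench.conf -jr -o output_validate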


# Original text (from the Vdbench documentation)

Data validation should not be used during a performance run. The processor overhead can impact performance results.

Before I start I want to answer a question that has come up a few times: "why use Vdbench to check for data corruptions? I can just write large files, calculate a checksum and then re-read and compare the checksums."

Yes, of course you can do that, but is that really good enough? All you're doing here is check for data corruptions during sequential data transfers. What about random I/O? Isn't that important enough to check? If you write the same block X times and the contents you then find are correct, doesn't it mean that you could have lost X-1 consecutive writes without ever noticing it? You spent 24 hours writing and re-reading large sequential files; which block is the one that's bad? When was that block written and when was that block read again? Yes, it is nice to say: I have a bad checksum over the weekend. It is much more useful to say "I have a specific error in a specific block, and yes, I know when it was written and when it was found to be in error", and by the way, this bad block actually came from the wrong disk.

See data_errors= for information about terminating after a data validation error.

Data validation works as follows: Every write operation against an SD or FSD will be recorded in an in-memory table. Each 512-byte sector in the block that is written contains an 8-byte logical byte address (LBA), and a one-byte data validation key. The data validation key is incremented from 1 to 126 for each write to the same block. Once it reaches the value 126, it will roll over to one. Zero is an internal value indicating that the block has never been written. This key methodology is developed to identify lost writes. If the same block is written several times without changing the contents of the block it is impossible to recognize if one or more of the writes have been lost. Using this key methodology we will have to lose exactly 126 consecutive writes to the same block without being able to identify that writes were lost. After a block has been written once, the data in the block will be validated after each read operation. A write will always be prefixed by a read so that the original content can be validated. Use of the '-vr' execution parameter (or validate=read parameter file option) forces each block to be read immediately after it has been written. However, remember that there is no guarantee that the data has correctly reached the physical disk drive; the data could have been simply read from cache.

Since data validation tables are maintained in memory, data validation will normally not be possible after Vdbench terminates, or after a system crash/reboot. To allow continuous data validation, use journaling.

Journaling: to allow data validation after a Vdbench or system outage, each write is recorded in a journal file. This journal file is flushed to disk using synchronous writes after each update (or we would lose updates after a system outage). Each journal update writes 512 bytes to its disk. Each journal entry is 8 bytes long, thereby allowing 63 entries plus an 8-byte header to be recorded in one journal record. When the last journal entry in a journal record is written, an additional 512 bytes of zeros is appended, allowing Vdbench to keep track of end-of-file in the journal. A journal entry is written before and after each Vdbench write.

Note: I witnessed one scenario where the journal file was properly maintained but the file system structure used for the journal files was invalid after a system outage. I therefore now allow the use of raw devices for journal files to get around this problem.

Since each Vdbench workload write will result in two synchronous journal writes, journaling will have an impact on throughput/performance for the I/O workload. It is highly recommended that you use a disk storage unit that has write-behind cache activated. This will minimize the performance impact on the I/O workload. To allow file system buffering on journal writes, specify '-jn' or '-jrn' (or journal=noflush in your parameter file) to prevent forced flushing. This will speed up journal writes, but they may be lost when the system does not shut down cleanly. It is further recommended that the journals be written to what may be called a 'safe' disk. Do not write the journals to the same disk that you are doing error injection or other scary things. With an unreliable journal, data validation may not work.

At the start of a run that requests journaling, two files are created: a map backup file, and a journal file. The contents of the in-memory data validation table (map) are written to both the backup and the journal file (all key entries being zero). Journal updates are continually written at the end of the journal file. When Vdbench restarts after a system failure and journal recovery is requested, the original map is read from the beginning of the journal file and all the updates in the journal are applied to the map. Once the journal file reaches end of file, all blocks that are marked 'modified' will be read and the contents validated.

Next, the in-memory map is written back to the beginning of the journal file, and then to the backup file. Journal records will then be written immediately behind the map on the journal file. If writing of the map to the journal file fails because of a system outage, the backup file still contains the original map from the start of the previous run. If during the next journal recovery it is determined that not all the writes to the map in the journal file completed, the map will be restored from the backup file and the journal updates again are applied from the journal entries that still reside in the journal file after the incomplete map.

After a journal recovery, there is one specific situation that needs extra effort. Since each write operation has a before and after journal entry, it can happen that an after entry has never been written because of a system outage. In that case, it is not clear whether the block in question contains before or after data. In that case, the block will be read and the data that is compared may consist of either of the two values, either the new data or the old data.

Note: I understand that any storage device that is interrupted in the middle of a write operation must have enough residual power available to complete the 512-byte sector that is currently being written, or may be ignored. That means that if one single sector contains both old and new data, there has been a data corruption.

Once the journal recovery is complete, all blocks that are identified in the map as being written to at least once are read sequentially and their contents validated.

During normal termination of a run, the data validation map is written to the journal. This serves two purposes: end of file in the journal file will be reset to just after the map, thus preserving disk space (at this time unused space is not freed, however), and it avoids the need to re-read the whole journal and apply it to the starting map in case you need to do another journal recovery.

Note: since the history of all data that is being written is maintained on a block-by-block level, using different data transfer sizes within a Vdbench execution has the following restrictions:
- Different data transfer sizes are allowed, as long as they are all multiples of each other. If for instance you use a 1k, 4k and 8k data transfer size, data validation will internally use the 1k value as the 'data validation key block size', with therefore a 4k block occupying 4 smaller data validation key blocks.

Note: when you do a data validation test against a large amount of disk space it may take quite a while for a random block to be accessed for the second time. (Remember, Vdbench can only compare the data contents when it knows what is there.) This means that a relatively short run may appear successful while in fact no blocks have been re-read and validated. Since Vdbench 5.00, Vdbench therefore keeps track of how many blocks were actually read and validated. If the amount of blocks validated at the end of a run is zero, Vdbench will abort.

Example: For a one TB lun running 100 iops of 8k blocks it will take 744 hours or 31 days for each random block to be accessed at least twice!

Note: since any re-write of a block when running data validation implies a pre-read of that block, I suggest that when you specify a read percentage (rdpct=) you specify rdpct=0. This prevents you, especially at the beginning of a test, from reading blocks that Vdbench has not written (yet) and therefore is not able to compare, wasting precious IOPS and bandwidth. In these runs (unless you forcibly request an immediate re-read) you'll see that the run starts with a zero read percentage, but then slowly climbs to 50% read once Vdbench starts writing (and therefore pre-reading) blocks that Vdbench has written before.
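
Tying the last recommendation back to a parameter file: a data-validation workload against a single lun might look like the sketch below. The file path, size and run length are illustrative assumptions; rdpct=0 follows the advice above, and the file is run with the same -j / -jr passes shown earlier.

    # write-only, 100% random 8k I/O against a 1g test file created by vdbench
    sd=sd1,lun=/home/icfs_test/dv_testfile,size=1g,threads=4
    wd=wd1,sd=sd1,xfersize=8k,rdpct=0,seekpct=100
    rd=rd1,wd=wd1,iorate=max,elapsed=1800,interval=1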





