An Efficient Two-Stage Circular Buffer: BipBuffer (1) - chenwayne - ChinaUnix Blog


Category: C/C++

2008-11-14 00:19:39

Simon Cooke, USA (original author)
Chen Gang, Beijing Institute of Technology 20981 (translator)

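A word before we begin:

The circular buffer is a very common data storage structure, widely used for storing continuous, streaming data and in communications applications. The traditional approach is to allocate one contiguous block of storage and keep writing data into it; when writing reaches the end of the block, it starts again from the beginning, and repeating this forms the circular buffer. I have written many circular buffers and read many written by other people, but Simon Cooke's article on the "two-stage" circular buffer (original title: The Bip Buffer - The Circular Buffer with a Twist) struck me as genuinely different, so I wanted to introduce it to developers in China. "Twist" literally means "to wind or entwine"; in the phrase "with a twist" it signals an unexpected variation on something familiar. The author chose the word to convey what is distinctive about this circular buffer, but a literal translation would puzzle many readers. So, based on my own understanding, I have translated the title as the "two-stage" circular buffer. Below I give the English original together with my own understanding of it, so interested readers can compare the two. If there are mistakes in the translation, I hope more knowledgeable readers will correct them!

- The translator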

1. Introduction

The Bip-Buffer is like a circular buffer, but slightly different. Instead of keeping one head and tail pointer to the data in the buffer, it maintains two revolving regions, allowing for fast data access without having to worry about wrapping at the end of the buffer.

Buffer allocations are always maintained as contiguous blocks, allowing the buffer to be used in a highly efficient manner with API calls, and also reducing the amount of copying which needs to be performed to put data into the buffer. Finally, a two-phase allocation system allows the user to pessimistically reserve an area of buffer space, and then trim back the buffer to commit to only the space which was used.

Let's cover a little history first. If you don't already know why a circular buffer can be implemented really efficiently in hardware, or why that makes them the buffer of choice in most electronics, here's why.

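The Bip-Buffer is used much like a circular buffer but is structured slightly differently. Internally it uses two revolving storage regions (rather than maintaining a single head pointer and tail pointer) to provide fast data access, and its users never need to worry about a write reaching the end of the buffer and having to continue from the start. The storage a Bip-Buffer hands out is always contiguous, so it can be passed straight to API calls very efficiently, and memory-copy operations such as memcpy() and memmove() can be avoided as far as possible (in an ordinary circular buffer, frequent calls to these memory functions often become the efficiency bottleneck). Finally, the Bip-Buffer's two-phase allocation system lets the user reserve a generously sized block of memory, then use a Commit operation to confirm how much was actually needed, returning the unused remainder.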

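Before that, let's review a little history. If you don't know why circular buffers can be made extremely efficient in hardware, or why they turn up in so many electronic products, the description below will answer that.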
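The two-phase "reserve, then commit" idea from the introduction can be sketched in a few lines of C. This is only an illustration of the calling pattern, not the Bip-Buffer itself: it uses a single linear region and deliberately omits the two revolving regions. The names TwoPhaseBuf, tp_reserve and tp_commit and the 64-byte capacity are all assumptions made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified buffer: one linear region, no wrapping. */
typedef struct {
    unsigned char data[64];
    size_t committed;   /* bytes committed so far */
    size_t reserved;    /* size of the outstanding reservation, 0 if none */
} TwoPhaseBuf;

/* Phase 1: ask for up to `want` contiguous bytes. The caller receives a
   pointer it can hand straight to an API call, plus the size granted. */
unsigned char *tp_reserve(TwoPhaseBuf *b, size_t want, size_t *granted)
{
    size_t free_bytes = sizeof b->data - b->committed;
    assert(b->reserved == 0);            /* one reservation at a time */
    *granted = want < free_bytes ? want : free_bytes;
    b->reserved = *granted;
    return *granted ? b->data + b->committed : NULL;
}

/* Phase 2: keep only the `used` bytes that were really written;
   the rest of the reservation is handed back. */
void tp_commit(TwoPhaseBuf *b, size_t used)
{
    assert(used <= b->reserved);
    b->committed += used;
    b->reserved = 0;
}
```

A typical caller would reserve pessimistically (say 32 bytes), let a receive call write directly into the returned pointer, and then commit only the number of bytes that actually arrived.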

2. Back in Days of Old... The "Stone Age"

Once upon a time, computers were much simpler. They didn't have 64 bit data buses. Heck, they didn't even have real 16 bit registers - although you could occasionally convince a pair of them to sub in for that purpose. These were simpler times, where Real Men programmed in assembly language, and laughed at anyone who didn't know how to use the carry flag for all kinds of nefarious purposes.

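Long ago, computers were much simpler. They had no 64-bit buses and not even true 16-bit registers, although you could occasionally persuade a pair of 8-bit registers to stand in for one. (Translator's note: the author is writing in a joking tone here. "Sub in" means "substitute": some 8-bit CPUs, such as the Z80, let two 8-bit registers be paired and used as a single 16-bit register.) In this "Stone Age", the masters wrote their programs in assembly language (translator: with stone knives and stone axes?) and laughed at developers who did not know how to exploit the carry flag.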

With simpler times came elegant hacks to eke the most power out of every instruction cycle available. Take, for example, a simple terminal communications program. Newer RS232 serial controllers had things like automatic handling of RTS and CTS signal lines to control the flow of data - but this came at a cost. Namely, the connection would be stopping and starting all the time, instead of streaming along. So in between the controller card and the system, would often be found a FIFO. This simple circular buffer was often no more than a couple of bytes long, but it meant that the system could run smoothly along without polling to see if data had arrived, or being hammered by constant interrupts from the serial controller.

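In those simpler times, elegant hacks were devised to squeeze the most out of every available instruction cycle. Take a simple terminal communications program as an example (translator's note: "terminal communications" here means RS232 serial communication; in the era the author describes, the Internet as we know it presumably did not yet exist). Newer RS232 serial controllers could handle the RTS and CTS signal lines automatically to control the flow of data, but this came at a cost: as the names RTS (request to send) and CTS (clear to send) suggest, the connection kept stopping and starting instead of streaming continuously. So between the controller card and the system one would usually find a FIFO (translator's note: think of the first-in, first-out queue from data structures). This simple circular buffer was often only a couple of bytes long, but it meant the whole system could run smoothly without polling to see whether new data had arrived, and without the application being constantly interrupted by hardware interrupts from the serial controller (translator's note: these are the same kind of interrupts taught in microcomputer architecture courses).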

Most FIFOs started out on-chip, but people also added their own in their code - the idea being that if you had some really gnarly dancing that you had to do on the incoming data, you may as well batch it all up into one lump and do it infrequently... giving spare time to the system to do other things. Like scroll the console, or decode GIFs.

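Most FIFOs were implemented on-chip, but developers also brought the idea into their own code. If the incoming data needed some really complicated processing, you might as well collect it into one batch and process it infrequently, and a circular buffer was the natural tool for that. This leaves the computer spare time to do other things, such as scrolling the console or decoding GIF images.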

As I said, a FIFO is a very simple circular buffer. Most are implemented very simply as well; they're typically 2ⁿ bytes in size, which allows the pointers to simply overflow to get back around to the other end of the buffer. The FIFO logic can tell if the FIFO is empty because the head and tail values are the same, and it's full if the head is one greater than the tail.

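As I mentioned, a FIFO is a very simple circular buffer, and most are implemented very simply too. Their length is typically 2 to the power n, which lets a pointer simply overflow to wrap back to the other end of the buffer. The FIFO logic can tell whether the buffer is empty or full from the head and tail values: if they are equal, the buffer is empty; if the head is one greater than the tail (modulo the buffer size), the buffer is full.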
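To make the empty/full rule concrete, here is a minimal C sketch of a power-of-two FIFO. It is an illustration only, not code from the original article: the names Fifo, fifo_put and fifo_get and the size of 16 are assumptions. It also assumes that "head" is the read position and "tail" is the write position, which is the reading under which "full when the head is one greater than the tail" holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 16u                    /* assumed size; must be a power of two */
#define FIFO_MASK (FIFO_SIZE - 1u)

static_assert((FIFO_SIZE & FIFO_MASK) == 0, "FIFO_SIZE must be a power of two");

typedef struct {
    uint8_t  data[FIFO_SIZE];
    unsigned head;                       /* read position  */
    unsigned tail;                       /* write position */
} Fifo;

/* Empty: head and tail are the same. */
bool fifo_empty(const Fifo *f) { return f->head == f->tail; }

/* Full: advancing tail by one would land on head, so one slot stays unused. */
bool fifo_full(const Fifo *f) { return ((f->tail + 1u) & FIFO_MASK) == f->head; }

bool fifo_put(Fifo *f, uint8_t byte)
{
    if (fifo_full(f)) return false;
    f->data[f->tail] = byte;
    f->tail = (f->tail + 1u) & FIFO_MASK;   /* wrap with a mask instead of a branch */
    return true;
}

bool fifo_get(Fifo *f, uint8_t *out)
{
    if (fifo_empty(f)) return false;
    *out = f->data[f->head];
    f->head = (f->head + 1u) & FIFO_MASK;
    return true;
}
```

Because the size is a power of two, the wrap is a single AND with the mask. The price of telling "empty" from "full" with only two indices is that a 16-slot FIFO holds at most 15 bytes.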

Implementing these in software was easy on the old 8 bit systems. Take a 16 bit register pair. Decide on a location in memory (a multiple of 256) to store the FIFO data in. Then, after setting the register to the start of the buffer, don't touch the high register - just increment the low register. This gives you a 256-byte long buffer which you can walk through in one (in the case of the Zilog Z80, 4 cycles - the smallest execution unit available on that system) instruction. You can never go out of the bounds of your buffer, because the low register acts as an index with a value from 0 to 255. When you hit what would have been index 256, the register overflows and clocks back over to zero.

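Implementing such a FIFO on the old 8-bit systems was easy. Take two 8-bit registers that form a 16-bit register pair, set aside a block of memory (at an address that is a multiple of 256) to hold the FIFO data, and point the register pair at the start of that block. (Translator's note: 8-bit systems generally have a 16-bit address space, so two 8-bit registers are needed to hold a 16-bit address. The multiple of 256 is an alignment requirement: with it, only the high 8 bits of the register pair carry the address, and the low 8 bits start at 0.) Leave the high register alone so that it keeps holding the address, and operate on the low register only: use it as an index from 0 to 255, incrementing it after each byte is written. Once it passes 255 it overflows and starts again from 0, so the hardware adjusts the circular buffer pointer automatically. This gives you a 256-byte circular buffer that you can step through in a single instruction (on the Zilog Z80 that instruction takes 4 clock cycles, the smallest execution unit available on that system). Because the pointer wraps by hardware overflow, there is no need to worry about running past the buffer and writing into other memory. (Translator's note: this may be the most efficient and safest circular buffer that can be implemented in software.)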
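The same trick can be imitated in C with an 8-bit unsigned index, which wraps from 255 to 0 exactly as the low register does. This is a rough analogy under stated assumptions, not Z80 code: ByteRing and ring_store are names invented for this sketch, and a C compiler gives no guarantee that the increment compiles to a single instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t data[256];   /* exactly 256 bytes, so every uint8_t value is a valid index */
    uint8_t index;       /* plays the role of the low register */
} ByteRing;

void ring_store(ByteRing *r, uint8_t byte)
{
    assert(r != NULL);
    r->data[r->index] = byte;
    r->index++;          /* 255 + 1 wraps to 0: uint8_t arithmetic is modulo 256 */
}
```

No bounds check is needed anywhere: the index type simply cannot represent a position outside the buffer.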

3. The Modern Day: The "Empire" Age

Unfortunately, there is no solution quite as elegant available to Windows programmers today as that simple old 8-bit solution. Sure, you can dive down into assembly language (provided you can work out how the compiler maps registers to values... something I've never seen a good enough explanation of to get my head around), but most people don't have time for assembly language any more. And besides, we're dealing with 32 bit registers now - incrementing just one low-order byte from inside that register isn't really all that kosher any more. It can lead to cache flushing, pipeline stalling, printer fires, rains of frogs, etc.

Unfortunately, Windows programmers today no longer have any solution as elegant as that old 8-bit one. Of course, you can dive into assembly language (provided you can work out how the compiler maps registers to values, something I have never seen explained well enough to get my head around), but most people no longer have time for assembly. Besides, we now work with 32-bit registers, and incrementing only the low-order byte inside such a register is no longer really acceptable. It can cause cache flushes and pipeline stalls, not to mention printer fires and rains of frogs. (Translator's note: the last two are humorous exaggerations rather than real technologies.)

If you can't just clock the low-order register to walk through the buffer, you have to start worrying about things like checking to see how much buffer you have filled before the end, making sure that you remember to copy the rest of the data from the start of the buffer, and all kinds of other bookkeeping headaches.

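If you cannot reuse the whole buffer simply by incrementing the low register, you have to face headaches such as tracking how much of the buffer has already been filled before the end, and making sure that when a write reaches the end, the remaining data continues from the start of the buffer.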
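The bookkeeping described above looks roughly like this in C. It is a generic sketch of a conventional circular-buffer write, included to show the overhead being discussed; Ring and ring_write are hypothetical names, not part of the Bip-Buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned char *data;
    size_t capacity;
    size_t write_pos;     /* next byte to write, always < capacity */
    size_t used;          /* bytes currently stored */
} Ring;

/* Copy up to `len` bytes in, splitting the copy in two when it crosses the
   end of the buffer. Returns the number of bytes actually stored. */
size_t ring_write(Ring *r, const void *src, size_t len)
{
    size_t free_bytes = r->capacity - r->used;
    size_t to_end, first;

    assert(r->write_pos < r->capacity);
    if (len > free_bytes) len = free_bytes;        /* never overwrite unread data */

    to_end = r->capacity - r->write_pos;           /* room left before the end */
    first  = len < to_end ? len : to_end;

    memcpy(r->data + r->write_pos, src, first);    /* part that fits before the end */
    memcpy(r->data, (const unsigned char *)src + first, len - first); /* wrapped part */

    r->write_pos = (r->write_pos + len) % r->capacity;
    r->used += len;
    return len;
}
```

Every write pays for the free-space check, the split calculation and possibly two copies, and the stored data may end up in two pieces, so it cannot be handed to an API that expects one contiguous block.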

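My first attempt at implementing something like this relied on the vague hope that the virtual memory system could be tricked into setting things up in such a way that you could set up a mirror of a section of memory right next to the original. The idea being that you could still use the rotating allocation of data; a copy operation could go at full speed without any checking to see if you'd walked off the end of the buffer - because as far as your process's address space is concerned, the end of your buffer is also the beginning of your buffer.

My first idea for implementing an efficient FIFO circular buffer on a modern operating system rested on a vague hope: that the virtual memory system could be tricked into placing a mirror of the buffer directly after the buffer itself, so that any write running past the end of the buffer would automatically land at its start. The storage could then still be used in rotation, and operations such as memory copies would not need to check whether the current pointer had reached the end, because in the process's address space the byte just past the end of the buffer is the first byte of the buffer. (Translator's note: the idea is certainly interesting. It depends on the operating system letting a process map the same memory at two adjacent virtual addresses, which Linux, for example, permits through mmap.)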

Now, this mirroring technique may actually work. Due to some restrictions, I decided not to implement it myself (yet - I'm sure I'll find a use for it some day). The idea behind it is that first one reserves two areas of virtual memory, side by side. One then maps the same temporary file into both virtual memory sections. Voila! Instant mirroring, and a nice large buffered expanse one can copy data from willy-nilly.

This mirroring idea may really work, but because of some restrictions I decided not to implement it myself (not yet, anyway; I'm sure I'll find a use for it some day). The idea is to first reserve two regions of virtual memory side by side and then map the same temporary file into both of them. Voila: instant mirroring (translator's note: probably just a name the author coined for the idea in his enthusiasm), and a large buffered expanse from which data can be read and written freely.
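The two steps above (reserve two adjacent regions, then map one temporary file into both) can be sketched as follows. Note the swap: the article is about Windows, while this sketch uses POSIX mmap, simply because the idea is easy to show there. mirror_alloc is a hypothetical name, the /tmp path is an assumption, and error handling is kept minimal.

```c
#define _DEFAULT_SOURCE          /* mkstemp, ftruncate, MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the base of a 2*size window whose two halves alias the same pages,
   or NULL on failure. `size` must be a multiple of the page size. */
unsigned char *mirror_alloc(size_t size)
{
    char path[] = "/tmp/mirror-XXXXXX";   /* assumed location for the temp file */
    unsigned char *base;
    int fd;

    assert(size > 0 && size % (size_t)sysconf(_SC_PAGESIZE) == 0);

    fd = mkstemp(path);
    if (fd < 0) return NULL;
    unlink(path);                         /* the file stays alive while it is open or mapped */
    if (ftruncate(fd, (off_t)size) != 0) { close(fd); return NULL; }

    /* Step 1: reserve two adjacent regions of address space. */
    base = mmap(NULL, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { close(fd); return NULL; }

    /* Step 2: map the same file over both halves of the reservation. */
    if (mmap(base, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED ||
        mmap(base + size, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
        munmap(base, 2 * size);
        close(fd);
        return NULL;
    }
    close(fd);                            /* the mappings keep the file contents alive */
    return base;
}
```

With such a mapping, a byte written at base[size + k] is visible at base[k], so a copy that starts near the end of the first half can run straight into the second half without any wrap check. Release the window with munmap(base, 2 * size).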

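Unfortunately, while it should (again, I've not tried it) indeed work, there is another problem - namely, that files can only be mapped on 64 KB boundaries (possibly larger on larger memory systems). This means that your buffer has to be a minimum of 64 KB in size, and will take up 128 KB of your virtual address space. Depending on your application, this may be a valid technique. However, I don't see writing a server application with thousands of sockets being a valid prospect here.

Unfortunately, although it should indeed work (again, I have not tried it), there is another problem: files can only be mapped on 64 KB boundaries (possibly larger on systems with more memory). This means the buffer must be at least 64 KB in size and will consume 128 KB of virtual address space. Depending on your application, this may be a workable technique, but I don't see it as a realistic option for a server application with thousands of sockets. (Translator's note: the author is resigned to this. The idea is good, but real server development needs reliability, stability and efficiency, and nobody wants to bet their schedule on testing techniques that are still only ideas :P)

So what to do? If mirroring won't work, how close can we get to using a circular buffer in our code? Heck, even if we can get close, why would we want to?

That being so, what should we do? If the mirroring technique cannot be used, how can we find a circular buffer implementation that comes as close as possible to it in efficiency? And if such a buffer can be built, why would we want one?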

(To be continued...)
