Internetworking with TCP/IP, Chapter 12: TCP (3) - digdeep126 - ChinaUnix Blog


Category: LINUX

2011-12-27 17:29:23

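1. Silly Window Syndrome (SWS)

If the sender generates data quickly, the segments its TCP transmits soon fill the receiver's buffer. Eventually the sender receives an acknowledgment whose window advertisement tells it that the receiver's window is full. If the receiving application then reads just 1 byte at a time from the full buffer, the receiver advertises a 1-byte window, the sender sends 1 byte, and so on. The exchange can settle into a steady state in which TCP sends a separate segment for every byte of data. Transmitting such tiny segments badly wastes network bandwidth and adds unnecessary processing load. This phenomenon is called silly window syndrome.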
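A toy simulation makes the steady state concrete. This is a sketch under stated assumptions, not real TCP: the receive buffer starts full, the application reads exactly one byte per step, the receiver naively advertises all free space at once, and the sender always has data queued. The function name `simulate_naive_sws` is hypothetical.

```python
def simulate_naive_sws(buf_size: int, ticks: int) -> list[int]:
    """Return the payload size of each segment the sender transmits (toy model)."""
    buffered = buf_size               # receive buffer starts out completely full
    segments = []
    for _ in range(ticks):
        buffered -= 1                 # the application reads a single byte
        window = buf_size - buffered  # naive receiver: advertise all free space now
        if window > 0:
            segments.append(window)   # sender immediately fills the advertised window
            buffered += window
    return segments


print(simulate_naive_sws(buf_size=4096, ticks=5))  # [1, 1, 1, 1, 1]
```

However large the buffer, every segment in this model carries a single byte.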

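1) On the receiving side, silly window syndrome can be avoided with delayed acknowledgments;

2) On the sending side, it can be avoided with Nagle's algorithm.

TCP requires both ends to implement heuristics that avoid silly window syndrome:

the receiver avoids advertising small windows (delayed acknowledgment);

the sender uses an adaptive mechanism to delay transmission so that data can be collected into larger segments (Nagle's algorithm).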
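One concrete way for a receiver to avoid advertising small windows is the rule suggested in RFC 1122: keep the window closed until the freed space reaches min(MSS, half the receive buffer), then advertise it all at once. The sketch below applies that threshold rule (a window-advertisement rule, not delayed ACKs as such) in a toy model where the receive buffer starts full, the application reads one byte per step, and the sender always has data. The function name is hypothetical.

```python
def simulate_receiver_sws_avoidance(buf_size: int, mss: int, ticks: int) -> list[int]:
    """Toy receiver applying the RFC 1122 window-update threshold."""
    threshold = min(mss, buf_size // 2)  # RFC 1122 suggests the fraction 1/2
    buffered = buf_size                  # receive buffer starts out completely full
    segments = []
    for _ in range(ticks):
        buffered -= 1                    # the application reads a single byte
        free = buf_size - buffered
        if free >= threshold:            # only now is a window update advertised
            while free > 0:              # sender fills the window in MSS-sized pieces
                seg = min(free, mss)
                segments.append(seg)
                buffered += seg
                free -= seg
    return segments


print(simulate_receiver_sws_avoidance(buf_size=4096, mss=536, ticks=1072))  # [536, 536]
```

The same 1072 one-byte reads now produce two full-sized segments instead of 1072 one-byte segments.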

2. Delayed Acknowledgment (Delayed ACK)

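A host that is receiving a stream of TCP data segments can increase efficiency in both the Internet and the hosts by sending fewer than one ACK (acknowledgment) segment per data segment received; this is known as a "delayed ACK".

Delayed acknowledgment changes the original rule of sending an ACK to the sender for every data segment received. Instead, the ACK is deferred until the next TCP segment arrives or the delayed-ACK timer expires, or it is piggybacked on data going back to the sender.

1) The new rule: depending on the situation, the receiver sends one ACK for every two TCP segments it receives,

or one ACK for a single TCP segment.

This reduces traffic and improves throughput.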

2) Microsoft's implementation is as follows:

In Windows Transmission Control Protocol/Internet Protocol (TCP/IP), the stack takes a common approach to implementing delayed ACKs. As data is received by TCP on a connection, the stack only sends an acknowledgment back if one of the following conditions is met:

No ACK was sent for the previous segment received.
A segment is received, but no other segment arrives within 200 milliseconds for that connection.

In other words, an ACK is sent for every other TCP segment received on a connection, unless the delayed ACK timer expires after 200 milliseconds pass.

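In other words, on Windows, delayed acknowledgment works like this: either one ACK is sent for every two TCP segments, or an ACK is sent when the delayed-ACK timer expires (in which case a single segment gets its own ACK). The TCP standard (RFC 1122) requires the delay to be less than 500 ms; Microsoft's delay is 200 ms.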
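The Windows rule quoted above can be sketched as a function that maps segment arrival times to ACK send times. Assumptions: segments arrive in order, the application never has reply data to piggyback on, and the 200 ms timer starts when the first unacknowledged segment arrives (real stacks may use a coarser periodic timer). `delayed_ack_times` is a hypothetical helper, not a Windows API.

```python
def delayed_ack_times(arrivals_ms: list[float], timer_ms: float = 200.0) -> list[float]:
    """Return the times (ms) at which ACKs are sent for in-order segments."""
    acks = []
    pending_since = None  # arrival time of the oldest unacknowledged segment, if any
    for t in arrivals_ms:
        # Timer rule: it fired before this segment arrived, so an ACK went out then.
        if pending_since is not None and t >= pending_since + timer_ms:
            acks.append(pending_since + timer_ms)
            pending_since = None
        if pending_since is None:
            pending_since = t   # previous segment was ACKed: hold this ACK back
        else:
            acks.append(t)      # previous segment got no ACK, so ACK both now
            pending_since = None
    if pending_since is not None:
        acks.append(pending_since + timer_ms)  # a trailing lone segment waits for the timer
    return acks


print(delayed_ack_times([0.0, 10.0, 20.0]))  # [10.0, 220.0]
print(delayed_ack_times([0.0, 300.0]))       # [200.0, 500.0]
```

A steady stream gets one ACK per two segments; a lone segment is ACKed only when the timer fires.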

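3) If the receiving application produces a response immediately after receiving a TCP segment, that is, it sends data back to the sender, the ACK for that segment is piggybacked on the outgoing data segment. (There is still one ACK per TCP segment, but the number of TCP segments on the network does not increase.)

4) If the receiving application reads the data soon after it arrives, TCP slides its window after the read, and the ACK is sent together with the updated window advertisement in the same segment. (Again one ACK per TCP segment, but without increasing the number of TCP segments on the network.)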

Benefits of delayed ACK:
a) to avoid the silly window syndrome (on the receiving side);
b) to allow ACKs to piggyback on a reply frame if one is ready to go when the stack decides to do the ACK;
c) to allow the stack to send one ACK for several frames, if those frames arrive within the delay period (the recommended value for "several" is 2).

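In summary, the goal of delayed acknowledgment is to reduce the number of TCP segments on the network that exist only to carry ACKs, by piggybacking ACKs on data and by letting several segments share one ACK (this can cut such segments roughly in half).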
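To see where the "roughly half" figure comes from, here is a hypothetical counter of ACK-only segments. It is a simplified model, assuming in-order arrival and that a pending ACK is piggybacked whenever the application replies right after a given segment.

```python
def pure_ack_segments(n: int, delayed: bool, replies_after=frozenset()) -> int:
    """Count ACK-only segments a receiver sends for n in-order data segments.

    delayed=False: one ACK per data segment.
    delayed=True:  one ACK per two segments; a trailing odd segment is ACKed by the timer.
    replies_after: indices of segments after which the application sends reply data
                   at once, so the pending ACK is piggybacked instead of sent alone.
    """
    count = 0
    pending = 0                  # data segments received but not yet acknowledged
    for i in range(n):
        pending += 1
        if i in replies_after:
            pending = 0          # ACK rides on the reply: no extra segment
        elif not delayed or pending == 2:
            count += 1           # a separate ACK-only segment
            pending = 0
    if pending:
        count += 1               # delayed-ACK timer flushes the last ACK
    return count


print(pure_ack_segments(100, delayed=False))  # 100
print(pure_ack_segments(100, delayed=True))   # 50
```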

3. Nagle's Algorithm

Nagle's algorithm, named after John Nagle, is a means of improving the efficiency of networks by reducing the number of packets that need to be sent over the network.

Nagle's document, Congestion Control in IP/TCP Internetworks (RFC 896), describes what he called the 'small packet problem', where an application repeatedly emits data in small chunks, frequently only 1 byte in size. Since packets have a 40-byte header (20 bytes for TCP, 20 bytes for IPv4), this results in a 41-byte packet for 1 byte of useful information, a huge overhead. This situation often occurs in Telnet sessions, where most keypresses generate a single byte of data that is transmitted immediately. Worse, over slow links, many such packets can be in transit at the same time, potentially leading to congestion collapse.
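The overhead arithmetic can be checked directly. This sketch assumes the 40-byte TCP+IPv4 header from the text with no options, ignores link-layer framing, and uses 1460 bytes as an example of a full-sized Ethernet payload; the helper names are hypothetical.

```python
HEADER_BYTES = 40  # 20-byte TCP header + 20-byte IPv4 header, no options


def packet_size(payload: int) -> int:
    return payload + HEADER_BYTES


def wire_efficiency(payload: int) -> float:
    """Fraction of the bytes in the packet that are application data."""
    return payload / packet_size(payload)


print(packet_size(1), f"{wire_efficiency(1):.1%}")        # 41 2.4%
print(packet_size(1460), f"{wire_efficiency(1460):.1%}")  # 1500 97.3%
```

A one-keystroke packet is about 2.4% useful data, versus about 97% for a full-sized segment.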

Nagle's algorithm works by combining a number of small outgoing messages, and sending them all at once. Specifically, as long as there is a sent packet for which the sender has received no acknowledgment, the sender should keep buffering its output until it has a full packet's worth of output, so that output can be sent all at once.

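In other words, Nagle's algorithm operates on the sending side (whereas delayed acknowledgment operates on the receiving side). To avoid sending very small TCP segments, the sender transmits a segment only in the following two cases:

1) all previously sent data has been acknowledged (an ACK has arrived, or nothing was outstanding to begin with);

2) the data accumulated at the sender reaches the MSS (maximum segment size).

Nagle's algorithm thus avoids silly window syndrome on the sending side.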

Nagle's algorithm:

if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if
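The pseudocode above translates into a toy sender. This is a sketch, not a real TCP stack: it assumes the advertised window stays fixed and is not reduced by data in flight, and that each call to the hypothetical `on_ack()` acknowledges everything outstanding.

```python
class NagleSender:
    """Toy sender following the Nagle pseudocode."""

    def __init__(self, mss: int, window: int) -> None:
        self.mss = mss
        self.window = window          # peer's advertised window, assumed constant
        self.buffer = bytearray()     # data written by the application, not yet sent
        self.unacked = 0              # bytes sent but not yet acknowledged ("in the pipe")
        self.sent: list[int] = []     # sizes of transmitted segments, for inspection

    def _transmit(self, n: int) -> None:
        del self.buffer[:n]
        self.unacked += n
        self.sent.append(n)

    def _try_send(self) -> None:
        while self.buffer:
            if self.window >= self.mss and len(self.buffer) >= self.mss:
                self._transmit(self.mss)          # full-sized segment: send now
            elif self.unacked > 0:
                return                            # small data with unacked data in flight: wait
            else:
                self._transmit(min(len(self.buffer), self.mss))  # pipe empty: send now

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        self.unacked = 0                          # assume the ACK covers everything outstanding
        self._try_send()


s = NagleSender(mss=1460, window=65535)
s.write(b"a")          # nothing in flight: sent at once
s.write(b"b")
s.write(b"c")          # held, because "a" is still unacknowledged
s.on_ack()             # "bc" leaves as one 2-byte segment
s.write(b"x" * 3000)   # two full segments go out; the 80-byte tail is held
s.on_ack()             # the tail is released
print(s.sent)          # [1, 2, 1460, 1460, 80]
```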

This algorithm interacts badly with TCP delayed acknowledgments. With both algorithms enabled, applications that do two successive writes to a TCP connection, followed by a read that will not be fulfilled until after the data from the second write has reached the destination, experience a constant delay of up to 500 milliseconds, the "ACK delay". For this reason, TCP implementations usually provide applications with an interface to disable the Nagle algorithm. This is typically called the TCP_NODELAY option.
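In Python the option is set through the standard `socket` module; in C the equivalent is `setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one)` with `one` an `int` set to 1. The helper name below is hypothetical.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY: send small segments without waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```

The option is per socket and is usually set right after creation, before connecting.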

Where possible, an application should avoid consecutive small writes in the first place, so that Nagle's algorithm is never triggered. Instead of issuing many small writes, it should buffer its output and send it in a single call (or hand several buffers to the kernel at once with writev()).
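Python exposes the writev()-style gather write on sockets as `socket.sendmsg()` (Unix only; `os.writev()` does the same for plain file descriptors). The sketch below sends a header and body in one call; `send_request` is a hypothetical helper, and the demo uses a Unix-domain `socketpair()` only so it runs without a network (Nagle does not apply there, but the call is the same on a connected TCP socket).

```python
import socket


def send_request(sock: socket.socket, header: bytes, body: bytes) -> None:
    """Hand header and body to the kernel in one gather write, not two small writes."""
    sent = sock.sendmsg([header, body])  # like writev(2): both buffers, one system call
    remaining = (header + body)[sent:]
    if remaining:                        # sendmsg may accept only part of the data
        sock.sendall(remaining)


a, b = socket.socketpair()
send_request(a, b"HEADER:", b"body")
data = b""
while len(data) < len(b"HEADER:body"):
    data += b.recv(64)
print(data)  # b'HEADER:body'
a.close()
b.close()
```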

"The user-level solution is to avoid write-write-read sequences on sockets. write-read-write-read is fine. write-write-write is fine. But write-write-read is a killer. So, if you can, buffer up your little writes to TCP and send them all at once. Using the standard UNIX I/O package and flushing write before each read usually works."
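The "buffer up your little writes and flush before each read" advice maps onto Python's buffered file objects. This sketch assumes the pieces fit in the writer's buffer, so they reach the socket together on `flush()`; `request_reply` is a hypothetical helper, and the peer's reply is queued in advance so the demo needs no thread.

```python
import socket


def request_reply(sock: socket.socket, pieces: list[bytes], reply_len: int) -> bytes:
    """Coalesce small writes in a user-space buffer, flush once, then read."""
    with sock.makefile("wb") as out:   # buffered writer; closing it leaves sock open
        for piece in pieces:
            out.write(piece)           # fills the buffer; no send while it has room
        out.flush()                    # the whole request reaches the socket together
    reply = b""
    while len(reply) < reply_len:
        chunk = sock.recv(reply_len - len(reply))
        if not chunk:                  # peer closed early
            break
        reply += chunk
    return reply


a, b = socket.socketpair()
b.sendall(b"pong")                                      # reply queued in advance
print(request_reply(a, [b"p", b"i", b"n", b"g"], 4))    # b'pong'
a.close()
b.close()
```

The pattern turns write-write-read into write-read, which the quote describes as fine.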

The tinygram problem and silly window syndrome are sometimes confused. The tinygram problem occurs when the window is almost empty. Silly window syndrome occurs when the window is almost full.

Negative Effect on Non-Small Writes

The algorithm applies to data of any size. If the data in a single write spans 2n packets (2n-1 full-sized segments followed by a partial one), the last packet will be withheld, waiting for the ACK for the previous packets; with delayed ACK on the receiver, that ACK may itself be held back. In any request-response application protocol where request data can be larger than a packet, this can artificially impose a few hundred milliseconds of latency between the requester and the responder, even if the requester has properly buffered the request data. Nagle's algorithm must be disabled by the requester in this case. If the response data can be larger than a packet, the responder must also disable Nagle's algorithm so the requester can promptly receive the whole response.
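A simplified model shows why the "2n packets" case hurts. Assumptions (all hypothetical simplifications): the connection is idle before the write, the window never limits the sender, the receiver ACKs every second full segment at once and otherwise waits for a 200 ms delayed-ACK timer (the Windows value quoted earlier), and round-trip time is ignored. `nagle_tail_delay_ms` is a hypothetical helper.

```python
def nagle_tail_delay_ms(write_len: int, mss: int, ack_timer_ms: int = 200) -> int:
    """Extra wait (on top of one RTT) imposed on the last, partial segment of one write."""
    full, tail = divmod(write_len, mss)
    if tail == 0 or full == 0:
        return 0          # no partial segment, or nothing in flight: no Nagle hold
    if full % 2 == 1:
        # Pairs are ACKed at once, but the odd last full segment waits for the
        # delayed-ACK timer, and Nagle holds the tail until that ACK arrives.
        return ack_timer_ms
    return 0              # every full segment is ACKed promptly


for n in (1000, 1560, 3020, 4480):
    print(n, nagle_tail_delay_ms(n, mss=1460))
# 1000 0
# 1560 200
# 3020 0
# 4480 200
```

An odd number of full segments plus a partial tail, i.e. 2n packets in total, is exactly the case that stalls.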

In general, since Nagle's algorithm is only a defense against careless applications, it will not benefit a carefully written application that takes proper care of buffering; the algorithm has either no effect, or a negative effect, on the application.

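In short:

1) Nagle's algorithm is best avoided in request-response interactive programs. It can artificially insert a delay of several hundred milliseconds between the request and the response.

2) Nagle's algorithm does not benefit carefully written programs, because they generally handle buffering properly themselves.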

Interactions with real-time systems

Applications that expect real-time responses can react poorly with Nagle's algorithm. Applications such as networked multiplayer video games expect that actions in the game are sent immediately, while the algorithm purposefully delays transmission, increasing bandwidth efficiency at the expense of latency. For this reason, applications with low-bandwidth, time-sensitive transmissions typically use TCP_NODELAY to bypass the Nagle delay.

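In short: programs that expect real-time interaction should not use Nagle's algorithm. They want data sent immediately, whereas Nagle's algorithm deliberately delays sending, trading latency for better bandwidth efficiency. Applications with low-bandwidth, time-sensitive traffic should therefore set TCP_NODELAY to bypass the delay that Nagle's algorithm introduces.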
