Category: LINUX
2011-04-14 10:32:26
Adapted from Unix Network Programming, Volume 1, with a few modifications.
2.4 listen Function
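The listen function is called only by a TCP server, and it performs two actions:
1. When a socket is created by the socket function, it is assumed to be an active socket, that is, a client socket that will issue a connect. The listen function converts an unconnected socket into a passive socket, indicating that the kernel should accept incoming connection requests directed to this socket. In terms of the TCP state transition diagram, calling listen moves the socket from the CLOSED state to the LISTEN state.
2. The second argument to this function specifies the maximum number of connections the kernel should queue for this socket.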
#include <sys/socket.h>
int listen(int sockfd, int backlog);
Returns: 0 if OK, -1 on error
This function is normally called after both the socket and bind functions and must be called before calling the accept function.
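As a minimal sketch of that call order (not code from the book), the hypothetical helper open_listener below creates a socket, binds it to the loopback address, and calls listen; only after that is the descriptor ready for accept. Binding to 127.0.0.1 and allowing port 0 (so the kernel picks an ephemeral port) are choices made only for this illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP listening socket on 127.0.0.1.
 * port == 0 lets the kernel choose an ephemeral port.
 * Returns the listening descriptor, or -1 on error. */
int open_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);          /* 1. socket */
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||  /* 2. bind */
        listen(fd, backlog) < 0) {                               /* 3. listen */
        close(fd);
        return -1;
    }
    /* 4. accept() may now be called on fd; calling accept() on a socket
     *    that is not listening fails with EINVAL. */
    return fd;
}
```

A caller would typically write fd = open_listener(9877, 5) and then loop on accept(fd, NULL, NULL); the port number in that example is arbitrary.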
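To understand the backlog argument, we must realize that for a given listening socket, the kernel maintains two queues:
1. An incomplete connection queue, which contains an entry for each SYN that has arrived from a client for which the server is awaiting completion of the TCP three-way handshake. These sockets are in the SYN_RCVD state.
2. A completed connection queue, which contains an entry for each client with whom the TCP three-way handshake has completed. These sockets are in the ESTABLISHED state.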
Figure 2.7 depicts these two queues for a given listening socket.
Figure 2.7. The two queues maintained by TCP for a listening socket.
When an entry is created on the incomplete queue, the parameters of the listening socket are copied over to the newly created connection. The connection creation mechanism is completely automatic; the server process is not involved. Figure 2.8 depicts the packets exchanged during connection establishment and how they relate to these two queues.
Figure 2.8. TCP three-way handshake and the two queues for a listening socket.
When a SYN arrives from a client, TCP creates a new entry on the incomplete queue and then responds with the second segment of the three-way handshake: the server's SYN with an ACK of the client's SYN. This entry will remain on the incomplete queue until the third segment of the three-way handshake arrives (the client's ACK of the server's SYN), or until the entry times out. (Berkeley-derived implementations have a timeout of 75 seconds for these incomplete entries.) If the three-way handshake completes normally, the entry moves from the incomplete queue to the end of the completed queue. When the process calls accept, which we will describe in the next section, the first entry on the completed queue is returned to the process, or if the queue is empty, the process is put to sleep until an entry is placed onto the completed queue.
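The queue movements just described can be modeled as a toy data structure. The sketch below is a hypothetical illustration, not kernel code: connections are plain integers, each queue has a fixed capacity MAXQ, the SYN/ACK reply is not modeled, and the 75-second timeout is omitted. on_syn appends to the incomplete queue, on_ack moves an entry to the end of the completed queue, and do_accept removes the first completed entry, returning -1 where a blocking accept would sleep.

```c
#include <stdbool.h>
#include <string.h>

#define MAXQ 16   /* hypothetical fixed capacity for each toy queue */

/* Toy model of the two queues kept for one listening socket.
 * A connection is identified by an int (e.g. the client's port). */
struct listen_queues {
    int incomplete[MAXQ];   /* SYN_RCVD: handshake not yet finished */
    int n_incomplete;
    int completed[MAXQ];    /* ESTABLISHED, waiting for accept() */
    int n_completed;
};

/* A client SYN arrives: append an entry to the incomplete queue.
 * Returns false (SYN ignored) if there is no room. */
bool on_syn(struct listen_queues *q, int conn)
{
    if (q->n_incomplete == MAXQ)
        return false;
    q->incomplete[q->n_incomplete++] = conn;
    return true;
}

/* The client's ACK of our SYN arrives: move the entry to the END of the
 * completed queue. Returns false if conn is not on the incomplete queue
 * or the completed queue is full. */
bool on_ack(struct listen_queues *q, int conn)
{
    for (int i = 0; i < q->n_incomplete; i++) {
        if (q->incomplete[i] != conn)
            continue;
        if (q->n_completed == MAXQ)
            return false;
        /* close the gap left in the incomplete queue */
        memmove(&q->incomplete[i], &q->incomplete[i + 1],
                (size_t)(q->n_incomplete - i - 1) * sizeof q->incomplete[0]);
        q->n_incomplete--;
        q->completed[q->n_completed++] = conn;
        return true;
    }
    return false;
}

/* accept(): remove and return the FIRST completed entry, or -1 where a
 * blocking accept() would put the process to sleep. */
int do_accept(struct listen_queues *q)
{
    if (q->n_completed == 0)
        return -1;
    int conn = q->completed[0];
    memmove(&q->completed[0], &q->completed[1],
            (size_t)(q->n_completed - 1) * sizeof q->completed[0]);
    q->n_completed--;
    return conn;
}
```

For example, if SYNs from clients 1 and 2 arrive and client 2's ACK arrives first, do_accept returns 2 before 1: accept order follows handshake completion, not SYN arrival.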
There are several points to consider regarding the handling of these two queues.
Historically, the backlog argument to listen has specified the maximum value for the sum of both queues. There has never been a formal definition of what the backlog means. The 4.2BSD man page says that it "defines the maximum length the queue of pending connections may grow to." Many man pages and even the POSIX specification copy this definition verbatim, but it does not say whether a pending connection is one in the SYN_RCVD state, one in the ESTABLISHED state that has not yet been accepted, or either. The historical definition given here is the Berkeley implementation, which dates back to 4.2BSD and has been copied by many others.
Berkeley-derived implementations add a fudge factor to the backlog: it is multiplied by 1.5. The reason for adding this fudge factor appears lost to history [Joy 1994]. But if we consider the backlog as specifying the maximum number of completed connections that the kernel will queue for a socket ([Borman 1997b], as discussed shortly), then the reason for the fudge factor is to take into account incomplete connections on the queue.
Many current systems allow the administrator to modify the maximum value for the backlog.
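On Linux, for example, this ceiling is the sysctl net.core.somaxconn, and the listen(2) man page states that a larger backlog is silently capped to it. The small function below is a Linux-specific sketch that reads the current value from /proc; on systems without that file it simply returns -1.

```c
#include <stdio.h>

/* Linux-specific sketch: read the administrator-settable ceiling on the
 * listen backlog (sysctl net.core.somaxconn) from /proc.
 * Returns the value, or -1 if it cannot be read (e.g. not on Linux). */
long read_somaxconn(void)
{
    long value = -1;
    FILE *fp = fopen("/proc/sys/net/core/somaxconn", "r");

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;             /* unexpected file contents */
    fclose(fp);
    return value;
}
```

An administrator can raise the limit with sysctl -w net.core.somaxconn=1024 (as root); 1024 is only an example value.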
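What value should the application specify for the backlog, given that the traditional value of 5 is often inadequate? If we hard-code a value in the source, increasing it later requires recompiling the server. We can provide a simple solution to this problem by modifying our wrapper function for the listen function. Figure 2.9 shows the actual code; we allow the environment variable LISTENQ to override the value specified by the caller.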
Figure 2.9. Wrapper function for listen that allows an environment variable to specify backlog.
lib/wrapsock.c
#include <stdlib.h>       /* getenv, atoi */
#include <sys/socket.h>   /* listen */
#include "unp.h"          /* err_sys, the book's error-reporting helper */

void
Listen(int fd, int backlog)
{
    char *ptr;

    /* can override 2nd argument with environment variable */
    if ((ptr = getenv("LISTENQ")) != NULL)
        backlog = atoi(ptr);

    if (listen(fd, backlog) < 0)
        err_sys("listen error");
}
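The wrapper uses atoi, which returns 0 for a string such as "abc", so a typo in LISTENQ silently becomes a backlog of 0. Below is a sketch of a stricter variant, assuming we would rather fall back to the caller's value on malformed input; effective_backlog is a hypothetical name, and its result would then be passed to listen.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical variant of the LISTENQ override: return the backlog to
 * use instead of calling listen() directly. strtol() lets us reject
 * input such as "abc" or "12x" and keep the caller's default. */
int effective_backlog(int default_backlog)
{
    const char *s = getenv("LISTENQ");
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return default_backlog;     /* no override */
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX)
        return default_backlog;     /* malformed: ignore the override */
    return (int)v;
}
```

With LISTENQ=1024 in the environment, effective_backlog(5) yields 1024; with LISTENQ=12x, or with LISTENQ unset, it yields 5.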
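If the queues are full when a client SYN arrives, TCP ignores the arriving SYN; it does not send an RST. The condition is considered temporary, so the client TCP will retransmit its SYN and will hopefully find room on the queue shortly. If the server TCP responded immediately with an RST, the client's connect would return an error, forcing the application to handle this condition instead of letting TCP's normal retransmission take over. The client also could not distinguish an RST meaning "there is no server at this port" from one meaning "there is a server at this port but its queues are full." Some implementations do send an RST when the queue is full. This behavior is incorrect for the reasons just given, and unless your client specifically needs to interact with such a server, it is best to ignore this possibility. Coding to handle this case reduces the robustness of the client and puts more load on the network in the normal RST case, where the port really has no server listening on it.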
Figure 2.10 shows the actual number of queued connections provided for different values of the backlog argument for the various operating systems. For seven different operating systems there are five distinct columns, showing the variety of interpretations about what backlog means!
Figure 2.10. Actual number of queued connections for values of backlog.
AIX and MacOS have the traditional Berkeley algorithm, and Solaris seems very close to that algorithm as well. FreeBSD just adds one to backlog.
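The traditional Berkeley fudge factor can be written as simple arithmetic. The sketch below assumes a 4.4BSD-style check in which a new connection is refused once the two queues together hold more than 3 * backlog / 2 entries (integer division), so up to 3 * backlog / 2 + 1 entries can be queued; the second function encodes the FreeBSD "adds one" rule stated above. Both function names are hypothetical, and these are illustrations rather than kernel source.

```c
/* Assumed Berkeley rule: refuse when qlen + q0len > 3 * backlog / 2,
 * so the queues can hold 3 * backlog / 2 + 1 entries in total. */
int berkeley_queued(int backlog)
{
    return 3 * backlog / 2 + 1;
}

/* FreeBSD, per the text above: just adds one to backlog. */
int freebsd_queued(int backlog)
{
    return backlog + 1;
}
```

Under this assumption a backlog of 5 allows 8 queued connections (15 / 2 = 7, plus one), while the FreeBSD rule allows 6.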
As we said, historically the backlog has specified the maximum value for the sum of both queues. During 1996, a new type of attack called SYN flooding was launched on the Internet [CERT 1996b]. The hacker writes a program to send SYNs at a high rate to the victim, filling the incomplete connection queue for one or more TCP ports. (We use the term hacker to mean the attacker, as described in [Cheswick, Bellovin, and Rubin 2003].) Additionally, the source IP address of each SYN is set to a random number (this is called IP spoofing), so the server's SYN/ACK goes nowhere; this also prevents the server from learning the hacker's real IP address. Because the incomplete queue is filled with bogus SYNs, legitimate SYNs cannot be queued, which denies service to legitimate clients.
There are two commonly used methods of handling these attacks, summarized in [Borman 1997b]. What is most interesting in this note, however, is its revisiting of what the listen backlog really means. It should specify the maximum number of completed connections for a given socket that the kernel will queue. The purpose of limiting these completed connections is to stop the kernel from accepting new connection requests for a given socket when the application is not accepting them (for whatever reason).
If a system implements this interpretation, as does BSD/OS 3.0, then the application need not specify huge backlog values just because the server handles many client requests (e.g., a busy Web server) or to provide protection against SYN flooding. The kernel handles large numbers of incomplete connections, regardless of whether they are legitimate or from a hacker. But even with this interpretation, scenarios do occur where the traditional value of 5 is inadequate.
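The difference between the two readings of backlog can be captured as a pair of admission checks. The sketch below is a deliberately simplified model (an assumption, not any kernel's code): it ignores the Berkeley fudge factor and any separate limit a real kernel places on incomplete connections, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Historical reading: backlog bounds the SUM of both queues, so bogus
 * SYNs parked on the incomplete queue crowd out legitimate clients. */
bool admit_sum(int incomplete, int completed, int backlog)
{
    return incomplete + completed < backlog;
}

/* Reading argued for in [Borman 1997b]: backlog bounds only the
 * completed queue, i.e. connections the application has not accepted. */
bool admit_completed_only(int incomplete, int completed, int backlog)
{
    (void)incomplete;   /* incomplete entries are not counted here */
    return completed < backlog;
}
```

With 1000 bogus entries on the incomplete queue, no completed connections, and a backlog of 5, admit_sum refuses a legitimate SYN while admit_completed_only admits it; the latter refuses only once 5 completed connections are waiting for accept.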