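I've found that my client program cannot run more than 1024 concurrent processes connecting to a service on one particular port of the server. Does anyone know a way to get past this system limit?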
This is the infamous "TIME_WAIT" state. The TCP specification demands it in order to be "sure" that stray data from an old, defunct connection on the same ports does not find its way onto a new connection. The specification, RFC 793 "Transmission Control Protocol", sets this wait to twice the maximum segment lifetime (2 MSL); with the RFC's two-minute MSL, that comes to four minutes.
The problem is that the four minute wait is far too conservative for high performance transaction-oriented server implementations. Suppose we have a server that is in steady state, receiving and processing 200 requests per second. The current implementation of Enhydra Director opens a new connection to the back-end server for each transaction, much like Apache JServ. This means that on average 200 connections per second go into the "TIME_WAIT" state when they finish. After four minutes, older TIME_WAIT connections begin to disappear as expected. So at any given time there will be, on average, about four minutes' worth of TIME_WAIT connections. Doing the math, we get 200 * 4 * 60 = 48000 connections!
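The steady-state figure above is simply the arrival rate multiplied by how long each connection lingers. A minimal Python sketch of that arithmetic follows; the function name is hypothetical and chosen only for illustration.

```python
def time_wait_backlog(conns_per_sec, time_wait_secs):
    """Average number of connections sitting in TIME_WAIT at steady state.

    Illustrative helper (not from the original article): rate times
    residence time, using the figures quoted in the text.
    """
    return conns_per_sec * time_wait_secs


# 200 connections per second with the four-minute wait from RFC 793
print(time_wait_backlog(200, 4 * 60))  # 48000
```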
The problem here is that the number of connections one host can have open to a given back-end address and port, in any state, is capped at 65535. This is because the TCP specification (RFC 793) defines the "port" that identifies the local endpoint of a connection as a 16-bit unsigned integer, and the largest such number is 65535. Worse, ports 0-1024 are usually reserved for well-known services and are not available as dynamically assigned (or "ephemeral") ports. In the best of all cases, we have 64511 ports available. With a 4 minute TIME_WAIT, we can compute 64511 ports / (4 minutes * 60 seconds per minute) = 268 connections per second. This means that no matter how fast your machine is, connections will start failing at an average load of 268 connections per second. In practice the failures usually start a little before that threshold is reached. The situation here is similar to having a brand new Ferrari on a crowded freeway at rush hour. You might be able to do 180 MPH, but you're still going no faster than 20.
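The ceiling works out as the number of usable ports divided by the time each port stays tied up. A small Python sketch, assuming the 64511-port figure used in the text; the function name is hypothetical.

```python
def max_conn_rate(usable_ports, time_wait_secs):
    """Highest sustained connections per second before ports run out.

    Illustrative helper: every new connection occupies one ephemeral
    port for the whole TIME_WAIT period, so the pool divided by that
    period bounds the rate. Integer division rounds down.
    """
    return usable_ports // time_wait_secs


# 64511 ports (the article's best case) and a four-minute TIME_WAIT
print(max_conn_rate(64511, 4 * 60))  # 268
```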
The current (quick) solution
The current solution to this problem is to reduce the amount of time old connections spend in TIME_WAIT. Some purists will sound off that doing so is a violation of RFC 793. Technically this is true, but I argue that on modern networks, this violation is a very venial sin, compared to the performance improvements it gives. Here's why.
The purpose of TIME_WAIT is to prevent lingering stale duplicates from old connections from finding their way into an existing connection and corrupting its data. Back in the days of 9600 baud X.25 networks, where a packet could very well circulate for many tens of seconds among remote routers, four minutes was a good conservative number. The Web had not been invented yet, and if it had, no one would have accused a PDP-11 of being capable of sustaining 200 or more transactions per second anyhow.
How to fix TIME_WAIT on Linux
On RedHat 6.1, run the following command:
/sbin/sysctl net.ipv4.ip_local_port_range
If you haven't already changed this value, it is 1024 to 4999. This is the range of ephemeral ports available by default, and as you can see it is too low for a high performance server. Use the command
/sbin/sysctl -w net.ipv4.ip_local_port_range="1024 65535"
to change the range to 1024-65535. By default the TIME_WAIT interval on RedHat Linux 6.1 is one minute. With the wider range, this allows up to roughly 1072 connections per second (four times the 268 computed above for a four-minute wait), and 500 connections per second keeps the number of TIME_WAIT connections at about 30000 (500 * 60). If you need more, you have to change a kernel header file and recompile your Linux kernel. Assuming /usr/src/linux is a symlink to your current kernel, the file to change is:
/usr/src/linux/include/net/tcp.h
Change the line:
#define TCP_TIMEWAIT_LEN (60*HZ)
to:
#define TCP_TIMEWAIT_LEN (TIMEWAITSECS * HZ)
where:
TIMEWAITSECS is the number of seconds for connections to remain in the TIME_WAIT state.
Example to change the TIME_WAIT period to 15 seconds:
#define TCP_TIMEWAIT_LEN (15 * HZ)
Once you have saved 'tcp.h', do a COMPLETE rebuild and install of the Linux kernel following the usual Linux kernel rebuild procedure.
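To see what each tuning step buys, the sketch below parses a port range line and computes the resulting connection-rate ceiling. It is a rough illustration only: the function names are hypothetical, the sample line assumes sysctl prints "key = low high", and the range is assumed to be inclusive at both ends.

```python
def parse_port_range(sysctl_line):
    """Parse a line such as 'net.ipv4.ip_local_port_range = 1024 4999'.

    Assumes the 'key = low high' layout; the two values may be
    separated by spaces or a tab.
    """
    _, _, value = sysctl_line.partition("=")
    low, high = (int(field) for field in value.split())
    return low, high


def rate_ceiling(low, high, time_wait_secs):
    """Connections per second before the ephemeral range is exhausted."""
    usable_ports = high - low + 1  # assumed inclusive at both ends
    return usable_ports // time_wait_secs


low, high = parse_port_range("net.ipv4.ip_local_port_range = 1024 4999")
print(rate_ceiling(low, high, 60))    # default range, 60 second TIME_WAIT
print(rate_ceiling(1024, 65535, 60))  # widened range, 60 second TIME_WAIT
print(rate_ceiling(1024, 65535, 15))  # widened range, 15 second TIME_WAIT
```

Counting the ports exactly gives a slightly higher figure for the widened range at 60 seconds than the rounded 1072 quoted earlier; the order of magnitude is what matters when choosing a TIME_WAIT length.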
How to fix TIME_WAIT on Solaris
On Solaris, the TIME_WAIT interval is a tunable parameter and can be changed using 'ndd'. The value is specified in milliseconds, so to specify a value of 30 seconds, use 30000 milliseconds. The parameter name depends on the Solaris release:
Solaris 2.6 and before:
ndd -set /dev/tcp tcp_close_wait_interval 30000
Solaris 2.7 and later:
ndd -set /dev/tcp tcp_time_wait_interval 30000
The change takes effect immediately on all new connections. Old connections will still wait for the old interval until they expire. You should put the above command into a system startup file so that it is run each time the system is rebooted.
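Since the value is given in milliseconds and the parameter name differs between releases, a tiny helper can build the right command line for a startup script. This is only a sketch: the function and its legacy_name flag are hypothetical, and it merely formats the two ndd commands shown above without running anything.

```python
def ndd_time_wait_command(seconds, legacy_name=False):
    """Build the ndd command that sets the TIME_WAIT interval.

    legacy_name=True selects tcp_close_wait_interval (Solaris 2.6 and
    before); otherwise tcp_time_wait_interval is used. ndd expects
    milliseconds, so the seconds value is multiplied by 1000.
    """
    if legacy_name:
        param = "tcp_close_wait_interval"
    else:
        param = "tcp_time_wait_interval"
    return "ndd -set /dev/tcp %s %d" % (param, seconds * 1000)


print(ndd_time_wait_command(30))
print(ndd_time_wait_command(30, legacy_name=True))
```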