Symptom: after rapid page clicks, the web server stops responding; after a while it recovers on its own.
Analysis:
After rapid page clicks the web server stopped responding. In the target shell, netStackSysPoolShow showed that the free count of the network stack's 664-byte cluster pool had dropped to 0. I then ran inetstatShow to check the connection states, but it printed nothing; at that point the reason was unclear.
Initially:
-> netStackSysPoolShow
type number
--------- ------
FREE : 1138
DATA : 2
HEADER : 0
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 0
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
TAG : 0
TOTAL : 1140
number of mbufs: 1140
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage minsize maxsize avgsize
-------------------------------------------------------------------------------
20 250 235 32 16 20 16
44 200 182 34 28 36 29
96 100 67 49 48 76 55
172 150 139 21 124 164 149
292 100 80 123 176 256 216
664 50 46 4 384 584 534
1144 30 26 4 1144 1144 1144
-------------------------------------------------------------------------------
value = 80 = 0x50 = 'P'
After rapid clicking:
-> netStackSysPoolShow
type number
--------- ------
FREE : 1138
DATA : 2
HEADER : 0
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 0
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
TAG : 0
TOTAL : 1140
number of mbufs: 1140
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage minsize maxsize avgsize
-------------------------------------------------------------------------------
20 250 234 197 16 20 16
44 200 181 35 28 36 29
96 100 66 50 48 76 55
172 150 138 118 124 164 138
292 100 34 210 176 256 232
664 50 0 86 384 584 615
1144 30 26 4 1144 1144 1144
-------------------------------------------------------------------------------
value = 80 = 0x50 = 'P'
After a while, it recovered on its own:
-> netStackSysPoolShow
type number
--------- ------
FREE : 1138
DATA : 2
HEADER : 0
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 0
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
TAG : 0
TOTAL : 1140
number of mbufs: 1140
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage minsize maxsize avgsize
-------------------------------------------------------------------------------
20 250 234 197 16 20 16
44 200 181 35 28 36 29
96 100 66 50 48 76 55
172 150 138 118 124 164 138
292 100 80 210 176 256 232
664 50 46 86 384 584 615
1144 30 26 4 1144 1144 1144
-------------------------------------------------------------------------------
value = 80 = 0x50 = 'P'
Running inetstatShow showed nothing:
-> inetstatShow
Active Internet connections (including servers)
PCB Proto Recv-Q Send-Q Local Address Foreign Address (state)
-------- ----- ------ ------ ------------------ ------------------ -------
tcpcb not found
value = -1 = 0xffffffff
->
Since inetstatShow printed nothing under VxWorks, I could not see the TCP connection states there, so I repeated the test under Linux. Rapid page clicks leave many connections in the TIME_WAIT state, and these entries disappear on their own after a while: TIME_WAIT lasts about 1-4 minutes on Linux, and roughly 1 minute on VxWorks.
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:32770 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:32771 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 192.168.12.254:80 192.168.12.2:2947 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2946 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2945 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2912 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2949 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2948 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2955 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2953 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2958 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2962 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2903 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2965 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2906 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2943 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2911 TIME_WAIT
tcp 0 0 192.168.12.254:80 192.168.12.2:2942 TIME_WAIT
tcp 0 256 192.168.12.254:22 192.168.12.2:1052 ESTABLISHED
[root@localhost root]#
Reading the code revealed the cause: for every request GoAhead receives, accept() creates a new socket, and after the response has been sent to the browser that socket is close()d. Each such close leaves the connection in TIME_WAIT, which takes 2*MSL to be released and ties up network stack resources the whole time; the VxWorks netstack pool dropping to 0 is exactly the accumulation of these TIME_WAIT entries. Looking for a fix, I tried setsockopt() with SO_LINGER and tLinger.l_onoff = 1; tLinger.l_linger = 0;. Under VxWorks it had no effect: with fast clicking, netStackSysPoolShow still showed the pool at 0. Under Linux, however, the number of TIME_WAIT entries visibly dropped.
In other words, SO_LINGER with l_onoff = 1 and l_linger = 0 works under Linux but not under VxWorks.
Ways to address the problem:
1. Enlarge the network stack pools; this only delays the problem rather than removing it.
2. Change the system configuration to shorten the TIME_WAIT interval.
3. Check whether the low-level driver is at fault. I raise this because others have reported a similar problem online, although the log messages they quote never appeared on our device. The text below is that online discussion.
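Regarding option 2: on BSD-derived stacks the TIME_WAIT hold time is 2*MSL, and on VxWorks MSL is normally tuned at build time in the kernel configuration. The fragment below is only a sketch under the assumption that this VxWorks version exposes a TCP_MSL_CFG parameter; verify the macro name, units, and default against your BSP's network configuration headers before relying on it.

```c
/* config.h fragment -- ASSUMPTION: TCP_MSL_CFG is the MSL tuning knob in
 * this VxWorks version; confirm the name and units in your BSP headers. */
#undef  TCP_MSL_CFG
#define TCP_MSL_CFG  5   /* smaller MSL => shorter 2*MSL TIME_WAIT hold */
```

Note that shortening MSL trades protocol safety (old duplicate segments may be accepted by a new incarnation of the connection) for faster resource release.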