Our squid cluster is currently suffering from a performance bottleneck.
We have a 4M Internet line that carries HTTP traffic for 230+ users. Since we can't effectively control what users do, some of them watch TV shows or use other bandwidth-hungry services. So we put a squid cluster in front of the Internet connection to work as a proxy server that restricts access to online video & audio services and saves our bandwidth...
The proxy cluster worked well at the beginning, but it has been getting slower and slower lately.
I had to find the root cause and improve the situation.
Our squid proxy cluster works as follows:
IE (auto-detects proxy settings via the WPAD protocol)
||
WPAD SERVER (an HTTP server that hosts a JavaScript file, wpad.js; this script decides which connections go directly to the intranet without going out to the Internet, and which are forwarded to the proxy server)
||
If the connection should go out to the Internet, the request is directed to the proxy cluster, or more exactly, to the IPVS-powered load balancer (virtual IP: 10.0.0.50).
This balancer forwards the request to one of the real servers.
||
We have two real servers (just PCs, not server-class machines), each with 1G memory, a P4 1G CPU, and some disks. The squid software handles the requests. It also needs to resolve the requested host name through DNS.
||
A DNS server is installed on each real server too, acting as a name cache. It's supposed to speed things up.
||
The Router or gateway to Internet.
||
Dest Server( eg)
||
Real server of squid
||
IE (Please note: our cluster is implemented with DR, i.e. direct routing, so replies don't need to go back through the load balancer from the squid real server)
That's the whole path an HTTP connection takes.
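The real wpad.js is JavaScript, but the direct-or-proxy decision it makes can be sketched in Python to show the idea. This is only a rough sketch: the intranet range and the proxy port below are assumptions for illustration (3128 is squid's default port), and only the virtual IP 10.0.0.50 comes from our setup.

```python
import ipaddress

# Assumption: the intranet range is illustrative, not taken from our real wpad.js.
INTRANET_NETS = [ipaddress.ip_network("10.0.0.0/8")]
# 10.0.0.50 is the IPVS virtual IP; port 3128 is assumed (squid's default).
PROXY = "PROXY 10.0.0.50:3128"


def find_proxy_for_ip(dest_ip: str) -> str:
    """Return "DIRECT" for intranet destinations, else the proxy string."""
    addr = ipaddress.ip_address(dest_ip)
    if any(addr in net for net in INTRANET_NETS):
        return "DIRECT"
    return PROXY
```

A destination inside the assumed intranet range is reached directly; everything else is sent to the load balancer's virtual IP.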
The problem is that we mostly wait more than 20 seconds before we get an HTTP page. That's not acceptable, and our users have complained about it many times...
I feel embarrassed about this situation too.
So I started troubleshooting the performance problem.
1. CPU?
No. The real servers' CPUs are all 95% idle, so it can't be a CPU problem.
2. Mem?
Maybe. Squid is software that stores and indexes a lot of data, and most of the memory will be used as filesystem buffer.
And that's true. We have 1G memory, and the squid process itself doesn't use much of it, but only 18M of memory is free.
Sigh, if that's the root cause, then I can do nothing about it. I don't have the privilege to add more memory to the servers.
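One caveat on the memory check: on Linux, memory used for buffers and page cache can be reclaimed when processes need it, so a small "free" figure alone is not conclusive. A minimal sketch of how free and cache memory could be separated, assuming the usual /proc/meminfo field names (MemFree, Buffers, Cached); the function name is made up for illustration:

```python
def summarize_meminfo(meminfo_text: str) -> dict:
    """Split /proc/meminfo-style text into free, cache and their sum (in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # first token is the value
    free = fields.get("MemFree", 0)
    cache = fields.get("Buffers", 0) + fields.get("Cached", 0)
    # free + cache is only a rough estimate of what could be made available
    return {"free_kb": free, "cache_kb": cache, "free_plus_cache_kb": free + cache}
```

On a real server this would be called as `summarize_meminfo(open("/proc/meminfo").read())`. If most of the 1G turns out to be cache, low free memory is probably not the bottleneck by itself.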
3. Network?
That may be the problem, I guessed. So I thought about getting more than 4M of Internet bandwidth, but that couldn't be done soon.
Later I found that I was wrong. We deployed SNMP on the squid servers and found that the network is not fully occupied; the load is just 40% - 60%.
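For reference, a utilization figure like this can be derived from two samples of an SNMP interface octet counter. The sketch below is not our actual monitoring setup; it assumes a 32-bit counter that may wrap once between samples and takes "4M" to mean 4,000,000 bits per second.

```python
def link_utilization(octets_start: int, octets_end: int, interval_s: float,
                     link_bps: float, counter_bits: int = 32) -> float:
    """Fraction of link capacity used between two octet-counter samples."""
    # The modulo handles a single counter wrap between the two samples.
    delta_octets = (octets_end - octets_start) % (1 << counter_bits)
    return delta_octets * 8 / (interval_s * link_bps)
```

For example, 15,000,000 octets in 60 seconds on a 4,000,000 bit/s line works out to a utilization of 0.5, which is in the 40% - 60% range we observed.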
4. Disk IO?
Our hard disks are just IDE disks at 7200 RPM, and each host has only one disk. So IOPS may be a bottleneck, and surely it is to some degree; but I don't think it's the root cause either.
I monitored the IOPS: the capacity is about 70 per second, and currently the average is not greater than 50. The CPU's IOWAIT is also no higher than 2%.
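One way to get an average IOPS number like this is to take two snapshots of /proc/diskstats and divide the difference in completed I/Os by the interval. This is a rough sketch, not the tool I actually used; the device name "hda" in the usage note is an assumption for an IDE disk.

```python
def completed_ios(diskstats_text: str, device: str) -> int:
    """Reads completed + writes completed for one device in /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        parts = line.split()
        # Columns: major, minor, name, reads completed, ..., writes completed (index 7)
        if len(parts) >= 8 and parts[2] == device:
            return int(parts[3]) + int(parts[7])
    raise ValueError(f"device {device!r} not found")


def average_iops(before: str, after: str, device: str, interval_s: float) -> float:
    """Average I/O operations per second between two snapshots."""
    return (completed_ios(after, device) - completed_ios(before, device)) / interval_s
```

Reading /proc/diskstats twice, some seconds apart, and calling `average_iops(before, after, "hda", interval_s)` gives a number that can be compared against the roughly 70 per second the disk can handle.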
5. SQUID itself?
Then I tried to tune squid itself.
I tuned the cache_dir & cache_mem parameters, and the mount options () of the cache dir.
6. ...? What could it be???
I really didn't have any ideas or clues...
I searched through lots of articles about squid performance tuning...
I used squidclient to pull the statistics, and then I got it: DNS lookup time.
It took more than 12 seconds to resolve a domain name.
GOOOOOOOOOOOOOOOOOOOOOOOOOD.
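A slow lookup like this can also be confirmed outside squid by timing name resolution directly on a real server. Below is a minimal sketch using Python's standard socket module; the function name is made up, and the resolver argument exists only so the timing logic can be exercised without touching a real DNS server.

```python
import socket
import time


def time_lookup(hostname: str, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, succeeded) for one name resolution."""
    start = time.perf_counter()
    try:
        resolver(hostname, 80)
        succeeded = True
    except OSError:  # socket.gaierror is a subclass of OSError
        succeeded = False
    return time.perf_counter() - start, succeeded
```

Running `time_lookup` against a few frequently visited host names, before and after changing the DNS setup, gives a quick check of whether resolution time has really dropped. Note that the result goes through the system resolver, so any local cache will affect repeated lookups.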
I started /etc/init.d/nscd on all the servers, and shut down the name servers on the real servers so that they use the official DNS (company-wide) instead.
Balaba, balabala...
Things finally improved, but I think performance always needs further improvement.
I will keep monitoring and diagnosing bottlenecks to make it faster and faster.
I'll share more with you all later.
That's all.
Things have improved.