Among the many load-balancing cluster solutions, there are hardware appliances such as F5 BIG-IP, as well as software products such as HAProxy, LVS, and Nginx. The software products in turn fall into two implementation styles: load balancing built into the operating system (LVS) and load balancing implemented as a third-party application (HAProxy).
1. The Difference Between Layer-4 and Layer-7 Load Balancing
1.1 Layer-4 load balancers
"Layer 4" refers to the fourth layer of the OSI reference model, so layer-4 load balancers are also called layer-4 switches. They work by inspecting traffic at the IP and TCP layers and distributing load on the basis of IP address + port.
Take a typical TCP application as an example. When the load balancer receives the first SYN from a client, it selects the best backend server according to the configured balancing algorithm, rewrites the destination IP address in the packet to that server's IP address, and forwards the packet directly to it; that completes one load-balanced request.
Seen as a whole, the TCP connection is established directly between the client and the server; the load balancer merely performs a router-like forwarding action. Under some balancing strategies, it may also rewrite the packet's original source address while forwarding, to ensure that the backend server's reply travels back through the load balancer correctly.
(Figure: the layer-4 forwarding process)
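To make this concrete, here is a toy Python sketch of the layer-4 idea: pick a backend for each new connection, then blindly shuttle bytes without looking at them. It only illustrates the per-connection selection logic; a real layer-4 device rewrites packet addresses in the kernel or in silicon rather than terminating TCP in user space, and the addresses below are invented.

```python
import itertools
import socket
import threading

# Hypothetical backend pool, served round-robin.
BACKENDS = itertools.cycle([("10.0.0.11", 80), ("10.0.0.12", 80)])

def pipe(src, dst):
    # Copy bytes one way until either side closes.
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        dst.close()

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 8080))
listener.listen()

while True:
    client, _ = listener.accept()
    backend = socket.create_connection(next(BACKENDS))  # the balancing decision
    threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
    threading.Thread(target=pipe, args=(backend, client), daemon=True).start()
```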
1.2 Layer-7 load balancers
By the same logic, layer-7 load balancers are also called layer-7 switches; they sit at the top of the OSI model (the application layer) and support multiple application protocols, commonly HTTP, FTP, and SMTP. Because a layer-7 balancer can choose the backend server based on the content of the message, combined with a balancing algorithm, it is also known as a "content switch".
For web servers, for example, a layer-7 balancer can split traffic not only by IP + port, but also by the site's URL, the domain being visited, browser type, language, and so on. Suppose two web servers host a Chinese site and an English site under domains A and B respectively, and we want requests for domain A to reach the Chinese site and requests for domain B to reach the English site. This is practically impossible with a layer-4 balancer, whereas a layer-7 balancer can route to the appropriate site based on the domain the client is visiting.
Sticking with TCP: since the balancer has to see the message content, it must first stand in for the backend server and complete the connection with the client; only then can it receive the client's actual message, examine the relevant fields, and apply the configured algorithm to pick the final internal server. Throughout this process, a layer-7 load balancer behaves like a proxy server.
(Figure: the layer-7 forwarding process)
Comparing the two flows, in layer-7 mode the balancer establishes separate TCP connections with the client and with the backend server, whereas in layer-4 mode only a single TCP connection is established. Layer-7 load balancing therefore demands more of the device, and its processing capacity is lower than that of layer-4 mode.
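The two-domain example above can be sketched as follows: a minimal, hypothetical Python illustration of layer-7 content switching. The balancer terminates the client connection, looks at the Host header, then opens a second connection to whichever backend serves that domain. The domains, addresses, and one-shot request/response handling are all simplifications; a real deployment would express this as HAProxy ACLs.

```python
import socket

# Invented domain -> backend mapping: the "Chinese site" and the "English site".
SITES = {
    b"cn.example.com": ("10.0.0.21", 80),
    b"en.example.com": ("10.0.0.22", 80),
}
DEFAULT = ("10.0.0.21", 80)

def pick_backend(request: bytes):
    # Find the Host header in the request and map it to a backend.
    for line in request.split(b"\r\n"):
        if line.lower().startswith(b"host:"):
            host = line.split(b":", 1)[1].strip().split(b":")[0]
            return SITES.get(host, DEFAULT)
    return DEFAULT

listener = socket.socket()
listener.bind(("0.0.0.0", 8080))
listener.listen()
while True:
    client, _ = listener.accept()
    request = client.recv(65536)             # 1st TCP connection: client <-> LB
    backend = socket.create_connection(pick_backend(request))
    backend.sendall(request)                 # 2nd TCP connection: LB <-> server
    client.sendall(backend.recv(65536))      # relay one response, then hang up
    backend.close()
    client.close()
```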
2. Similarities and Differences Between HAProxy and LVS
(1) LVS is a software load balancer built into the Linux operating system, while HAProxy is a software load balancer implemented as a third-party application.
(2) LVS is a layer-4 IP load-balancing technology, while HAProxy works at both layer 4 and layer 7, offering a combined load-balancing solution for TCP and HTTP applications.
(3) Because LVS operates at layer 4 of the OSI model, its health checking is rudimentary, whereas HAProxy has powerful health checking that supports port-based, URL-based, script-based, and other check methods.
(4) Although HAProxy is more feature-rich, its overall processing performance is lower than that of LVS, whose network throughput and connection capacity approach those of hardware appliances.
The above is excerpted from 高性能Linux服务器构建实战 (by 高俊峰).
For a more detailed comparison, see this article (for reference only):
http://blog.csdn.net/gzh0222/article/details/8540604
3. Introduction to HAProxy
HAProxy is an open-source, high-performance load balancer for TCP (layer 4) and HTTP (layer 7) applications that supports virtual hosting; it is a free, fast, and reliable solution. HAProxy is particularly suited to heavily loaded web sites that need session persistence or layer-7 processing (portals, e-commerce sites, and the like). Running on today's hardware it can easily support tens of thousands of concurrent connections, and its operating mode makes it simple and safe to integrate into an existing architecture while keeping the web servers off the exposed network.
HAProxy implements an event-driven, single-process model that supports very large numbers of concurrent connections. Multi-process and multi-threaded models can rarely cope with thousands of concurrent connections because of memory limits, system scheduler limits, and ubiquitous locking. Event-driven models avoid these problems by performing all of this work in user space, where resource and time management are finer-grained. Their drawback is that such programs usually scale poorly on multi-core systems, which is why they must be optimized to get more work done in each CPU cycle.
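As a rough sketch of what "event-driven, single-process" means, the Python loop below multiplexes many connections in one process using the standard selectors module (a thin wrapper over epoll/kqueue). It merely echoes bytes back where a proxy would forward them; the point is that there is no thread per connection.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(listener):
    conn, _ = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)          # echo back; a proxy would forward instead
    else:
        sel.unregister(conn)        # peer closed: drop the connection
        conn.close()

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 9000))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ, accept)

while True:                          # the single event loop
    for key, _ in sel.select():      # block until some socket is ready
        key.data(key.fileobj)        # dispatch to the registered callback
```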
4. Versions Currently Supported by HAProxy
We always support at least two active versions in parallel and an extra old one in critical fixes mode only. The currently supported versions are:
Version 1.5: the most featureful version; supports SSL, IPv6, keep-alive, DDoS protection, etc.
Version 1.4: the most stable version for people who don't need SSL; still provides client-side keep-alive.
Version 1.3: the old stable version for companies that cannot upgrade for internal policy reasons.
5. Key Features of Each Version
The most differentiating features of each version are listed below:
Version 1.5:
Released in 2014, this version further expands 1.4 with four years of hard work:
- native SSL support on both sides, with SNI/NPN/ALPN and OCSP stapling
- IPv6 and UNIX sockets supported everywhere
- full HTTP keep-alive, for better support of NTLM and improved efficiency in static farms
- HTTP/1.1 compression (deflate, gzip) to save bandwidth
- PROXY protocol versions 1 and 2 on both sides
- data sampling on anything in the request or response, including the payload
- ACLs able to use any matching method with any input sample
- maps and dynamic ACLs updatable from the CLI
- stick-tables with counters to track activity on any input sample
- custom formats for logs, unique-id, header rewriting, and redirects
- improved health checks (SSL, scripted TCP, check agent, ...)
- a much more scalable configuration that supports hundreds of thousands of backends and certificates without sweating
Version 1.4:
Released in 2010, this version brought its share of new features over 1.3, most of them long awaited:
- client-side keep-alive, to reduce the time to load heavy pages for clients over the net
- TCP speedups, helping the TCP stack save a few packets per connection
- response buffering, for an even lower number of concurrent connections on the servers
- RDP protocol support, with server stickiness and user filtering
- source-based stickiness, to attach a source address to a server
- a much better stats interface, reporting tons of useful information
- more verbose health checks, reporting precise statuses and responses in stats and logs
- traffic-based health, to fast-fail a server above a certain error threshold
- HTTP authentication for any request, including stats, with support for password encryption
- server management from the CLI, to enable/disable servers and change a server's weight without restarting haproxy
- ACL-based persistence, to maintain or disable persistence based on ACLs regardless of the server's state
- a log analyzer that generates fast reports from logs parsed at 1 GB/s
Version 1.3:
Released in 2006, this version brought a lot of new features and improvements over 1.2, among which:
- content switching, to select a server pool based on any request criteria
- ACLs, to write content-switching rules
- a wider choice of load-balancing algorithms, for better integration
- content inspection, allowing unexpected protocols to be blocked
- transparent proxying under Linux, allowing direct connection to the server using the client's IP address
- kernel TCP splicing, to forward data between the two sides without copying, in order to reach multi-gigabit data rates
- a layered design separating socket, TCP, and HTTP processing, for more robust and faster processing and easier evolution
- a fast and fair scheduler, allowing better QoS by assigning priorities to some tasks
- session rate limiting for colocated environments
- etc.
6. Supported Platforms/OS
Linux 2.4 on x86, x86_64, Alpha, Sparc, MIPS, PARISC
Linux 2.6 / 3.x on x86, x86_64, ARM, Sparc, PPC64
Solaris 8/9 on UltraSPARC 2 and 3
Solaris 10 on Opteron and UltraSPARC
FreeBSD 4.10 - 10 on x86
OpenBSD 3.1 to -current on i386, amd64, macppc, alpha, sparc64 and VAX (check the ports)
AIX 5.1 - 5.3 on Power architecture
7. Performance
HAProxy leans on several well-known OS techniques to achieve maximum performance:
- A single-process, event-driven model that significantly reduces the cost of context switches and the memory footprint.
- An O(1) event checker that allows instant detection of any event on any connection, even among tens of thousands of concurrent connections.
- Single buffering: whenever possible, reads and writes complete without copying any data, saving a great deal of CPU cycles and memory bandwidth.
- Zero-copy forwarding via the splice() system call on Linux 2.6 (>= 2.6.27.19), becoming true zero-copy starting with Linux 3.5 (see the sketch after this list).
- An MRU memory allocator over fixed-size memory pools, providing immediate memory allocation and noticeably shortening session creation time.
- Tree-based storage, relying on the elastic binary trees (ebtrees) the author developed years ago, to keep timers ordered, keep the run queue ordered, and manage round-robin and least-connection queues, all at O(log N) cost.
- Optimized HTTP header analysis that never re-reads any memory area during parsing.
- Careful avoidance of expensive system calls, with most of the work (reading the time, aggregating buffers, enabling and disabling file descriptors) done in user space.
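As referenced in the list above, here is a rough Python sketch of the splice() idea, assuming Linux and Python 3.10+ (for os.splice) and an already-established pair of connected, blocking proxy sockets: bytes travel socket -> pipe -> socket entirely inside the kernel, never entering user-space buffers.

```python
import os

def zero_copy_forward(src_sock, dst_sock, chunk=65536):
    """Forward src_sock to dst_sock via splice(); Linux + Python 3.10+ only."""
    r, w = os.pipe()                       # kernel pipe used as the relay
    try:
        while True:
            n = os.splice(src_sock.fileno(), w, chunk)   # socket -> pipe
            if n == 0:                     # peer closed the connection
                break
            left = n
            while left:                    # pipe -> socket, drain what went in
                left -= os.splice(r, dst_sock.fileno(), left)
    finally:
        os.close(r)
        os.close(w)
```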
All of these fine-grained optimizations add up to a very low CPU load at moderate traffic, and even under very heavy load it is common to observe roughly 5% user-space versus 95% system CPU usage, which means the HAProxy process itself consumes over twenty times less than its system-side counterpart.
Tuning the OS for performance is therefore very important. Even if the user-space share doubled, total CPU usage would only rise to about 10%, which explains why layer-7 processing has so little impact on performance, and why, on high-end systems, HAProxy's layer-7 performance can easily exceed that of hardware load balancers.
In production it is also common to see HAProxy used at layer 7 as an emergency fallback when expensive high-end hardware load balancers fail. Hardware load balancers process requests at the packet level, which makes it hard for them to handle requests spanning multiple packets, and since they do no buffering at all they cope poorly with slow responses; software load balancers, by contrast, use TCP buffering and are largely insensitive to long requests and high response times.
8. Three factors for evaluating a load balancer's performance:
1) Session rate
This factor is very important, because it directly determines when the load balancer will no longer be able to distribute all the requests it receives. It is mostly dependent on the CPU. You will sometimes hear about requests/s or hits/s; these are the same as sessions/s in HTTP/1.0, or in HTTP/1.1 with keep-alive disabled. Requests/s with keep-alive enabled is generally much higher (since keep-alive significantly reduces system-side work) but is often meaningless for internet-facing deployments, since clients often open a large number of connections and do not send many requests per connection on average. This factor is measured with varying object sizes, with the fastest results generally coming from empty objects (e.g., HTTP 302, 304, or 404 response codes). Session rates around 100,000 sessions/s could be achieved on Xeon E5 systems in 2014.
2) Session concurrency
This factor is tied to the previous one. Generally, the session rate will drop as the number of concurrent sessions increases (except with the epoll or kqueue polling mechanisms). The slower the servers, the higher the number of concurrent sessions for the same session rate. If a load balancer receives 10,000 sessions per second and the servers respond in 100 ms, it will have 1,000 concurrent sessions. This number is limited by the amount of memory and the number of file descriptors the system can handle. With 16 kB buffers, HAProxy needs about 34 kB per session, which works out to around 30,000 sessions per GB of RAM. In practice, the system's socket buffers also need some memory, and 20,000 sessions per GB of RAM is more realistic. Layer-4 load balancers commonly announce millions of simultaneous sessions because they must deal with the TIME_WAIT sockets that the system handles for free for a proxy. They also process no data, so they need no buffers. Moreover, they are sometimes designed for Direct Server Return mode, in which the load balancer sees only forward traffic and is forced to keep sessions around for a long time after they end, to avoid cutting sessions off before they are closed.
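The arithmetic in that paragraph is easy to check; the small script below simply replays the numbers from the text.

```python
# Concurrency = session rate x server response time.
session_rate = 10_000                 # sessions per second arriving at the LB
response_time = 0.100                 # seconds per server response
print(int(session_rate * response_time))        # -> 1000 concurrent sessions

# Sessions per GB of RAM from the per-session footprint (~2 x 16 kB + overhead).
per_session_kb = 34
print((1024 * 1024) // per_session_kb)          # -> ~30000 sessions per GB
```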
3) Data forwarding rate
This factor generally works against the session rate. It is measured in megabytes per second (MB/s), or sometimes in gigabits per second (Gbps). The highest data rates are achieved with large objects, which minimize the overhead caused by session setup and teardown. Large objects generally increase session concurrency, and high session concurrency combined with a high data rate requires large amounts of memory to support large windows. High data rates burn a lot of CPU and bus cycles on software load balancers, because the data has to be copied from the input interface to memory and then back to the output device. Hardware load balancers tend to switch packets directly from input port to output port for higher data rates, but they cannot process the data and sometimes fail even to touch a header or a cookie. HAProxy on a typical Xeon E5 of 2014 can forward data at up to about 40 Gbps; a fanless 1.6 GHz Atom CPU is slightly above 1 Gbps.
For more details, please see the official site.
Official homepage: