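1 Configuration: a plain LVS setup is complex to configure, logs very little, involves many separate pieces of software, is error-prone and hard to troubleshoot when it fails, is no longer being updated, and has errors in its official documentation.
Third-party configuration methods available today solve most of these problems.
In April 2011 I used keepalived to configure LVS to distribute a web service, and it worked on the first attempt. The setup is summarized below:
a. Topology:
b. IP plan: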
LVS-DR-Master 222.211.74.44
LVS-DR-BACKUP 222.211.74.43
LVS-DR-VIP 222.211.74.42
WEB1-Realserver 222.211.74.36
WEB2-Realserver 222.211.74.37
WEB3-Realserver 222.211.74.38
WEB4-Realserver 222.211.74.39
WEB5-Realserver 222.211.74.40
GateWay 222.211.74.1
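c. Installation and basic configuration are widely covered online, so they are not repeated here. Just note that at present only versions before 1.1.21 compile without problems.
The main configuration file for DR mode is: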
[root@LVS-LB1 ~]# cat /etc/keepalived/keepalived.conf
#-----------------------------------------------------------------
#Configuration File for keepalived
global_defs {
router_id mllvs1 #Note: this does not have to be a unique identifier; master and backup may use the same value
}
#-----------------------------------------------------------------
vrrp_instance VI_1 {
state MASTER #Note: node-specific; on the backup server change to: BACKUP
priority 200 #Note: node-specific; on the backup server change to a different value: 100
#state BACKUP
#priority 100
#interface for inside_network, bound by vrrp
interface bond0 #Note: the NIC used for VRRP
#interface eth2
#Ignore VRRP interface faults (default unset)
#dont_track_primary
#Binding interface for lvs syncd
lvs_sync_daemon_interface bond0 #Note: the NIC used by keepalived for LVS synchronization
#lvs_sync_daemon_interface eth2
#lvs_sync_daemon_interface eth2
virtual_router_id 20 #Note: identifies this VRRP instance; must be the same on master and backup
advert_int 1
authentication {
auth_type PASS
auth_pass 7890 #Note: these 2 lines (auth_type, auth_pass) must be the same on master and backup
}
virtual_ipaddress {
#Web
222.211.74.42/27 brd 222.211.74.255 dev bond0 label bond0:1
#Note: the label is optional; with it, a virtual interface appears once the service is up, which makes checking easier.
}
} # End vrrp_instance VI_1
#-----------------------------------------------------------------
# Web APP Server
virtual_server 222.211.74.42 80 {
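delay_loop 6 #Note: poll realserver status every 6 seconds
lb_algo wrr #Note: the LVS scheduling algorithm (weighted round robin)
lb_kind DR #Note: the LVS forwarding method (direct routing)
persistence_timeout 120 #Note: connections from the same client IP go to the same realserver for 120 seconds
protocol TCP #Note: the virtual service uses TCP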
real_server 222.211.74.36 80 {
weight 2 #Note: weight is 2
TCP_CHECK {
connect_timeout 10 #Note: time out after 10 seconds without a response
nb_get_retry 3
delay_before_retry 3
connect_port 80
}
} #end real_server 1
real_server 222.211.74.37 80 {
weight 3
TCP_CHECK {
connect_timeout 10
nb_get_retry 3
delay_before_retry 3
connect_port 80
}
} #end real_server 2
real_server 222.211.74.38 80 {
weight 3
TCP_CHECK {
connect_timeout 10
nb_get_retry 3
delay_before_retry 3
connect_port 80
}
} #end real_server 3
real_server 222.211.74.39 80 {
weight 2
TCP_CHECK {
connect_timeout 10
nb_get_retry 3
delay_before_retry 3
connect_port 80
}
} #end real_server 4
real_server 222.211.74.40 80 {
weight 2
TCP_CHECK {
connect_timeout 10
nb_get_retry 3
delay_before_retry 3
connect_port 80
}
} #end real_server 5
} # End virtual_server 222.211.74.42 80
#-----------------------------------------------------------------
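According to the comments in the file above, only state and priority differ on the backup node. A minimal sketch of deriving the backup file from the master file with sed, assuming the file is laid out as shown; the function name make_backup_conf and the output file name in the usage note are illustrative, not part of keepalived:

```shell
#!/bin/sh
# make_backup_conf: hypothetical helper, not part of keepalived.
# Reads a master keepalived.conf on stdin and writes the backup variant to
# stdout. Only uncommented "state MASTER" and "priority 200" lines change;
# virtual_router_id, auth_pass and everything else stay identical.
make_backup_conf() {
    sed -e 's/^\([[:space:]]*\)state MASTER/\1state BACKUP/' \
        -e 's/^\([[:space:]]*\)priority 200/\1priority 100/'
}
```

For example, run `make_backup_conf < /etc/keepalived/keepalived.conf > keepalived.conf.backup` on the master and copy the result to the backup node.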
d. Verify the service:
After starting the service with /etc/init.d/keepalived start, the log contains:
cat /var/log/messages|grep -iE "Keepalived"
Jun 7 10:57:05 bnet Keepalived: Starting VRRP child process, pid=10524
Jun 7 10:57:05 bnet Keepalived_healthcheckers: Netlink reflector reports IP 222.211.74.44 added
Jun 7 10:57:05 bnet Keepalived_healthcheckers: Netlink reflector reports IP 10.0.4.44 added
Jun 7 10:57:05 bnet Keepalived_healthcheckers: Registering Kernel netlink reflector
Jun 7 10:57:05 bnet Keepalived_healthcheckers: Registering Kernel netlink command channel
Jun 7 10:57:05 bnet Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Jun 7 10:57:05 bnet Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Jun 7 10:57:05 bnet Keepalived_vrrp: Configuration is using : 35660 Bytes
Jun 7 10:57:05 bnet Keepalived_healthcheckers: Configuration is using : 17829 Bytes
Jun 7 10:57:05 bnet Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Jun 7 10:57:05 bnet Keepalived_healthcheckers: Using LinkWatch kernel netlink reflector...
Jun 7 10:57:06 bnet Keepalived_healthcheckers: Activating healtchecker for service [222.211.74.36:80]
Jun 7 10:57:06 bnet Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(10,11)]
Jun 7 10:57:06 bnet Keepalived_healthcheckers: Activating healtchecker for service [222.211.74.37:80]
Jun 7 10:57:06 bnet Keepalived_healthcheckers: Activating healtchecker for service [222.211.74.38:80]
Jun 7 10:57:06 bnet Keepalived_healthcheckers: Activating healtchecker for service [222.211.74.39:80]
Jun 7 10:57:06 bnet Keepalived_healthcheckers: Activating healtchecker for service [222.211.74.40:80]
Jun 7 10:57:07 bnet Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Jun 7 10:57:07 bnet Keepalived_healthcheckers: TCP connection to [222.211.74.40:80] failed !!!
Jun 7 10:57:07 bnet Keepalived_healthcheckers: Removing service [222.211.74.40:80] from VS [222.211.74.42:80]
Jun 7 10:57:08 bnet Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Jun 7 10:57:08 bnet Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Jun 7 10:57:08 bnet Keepalived_healthcheckers: Netlink reflector reports IP 222.211.74.42 added
Jun 7 10:57:08 bnet Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 222.211.74.42
Jun 7 10:57:08 bnet Keepalived_vrrp: Netlink reflector reports IP 222.211.74.42 added
Jun 7 10:57:13 bnet Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 222.211.74.42
d1 On the master server the following processes should be running:
root 10522 1 0 10:57 ? 00:00:00 keepalived -D
root 10523 10522 0 10:57 ? 00:00:00 keepalived -D
root 10524 10522 0 10:57 ? 00:00:00 keepalived -D
root 10528 1 0 10:57 ? 00:00:00 [ipvs_syncmaster]
d2 On the backup server the following processes should be running:
root 4649 1 0 11:11 ? 00:00:00 keepalived -D
root 4650 4649 0 11:11 ? 00:00:00 keepalived -D
root 4651 4649 0 11:11 ? 00:00:00 keepalived -D
root 4662 1 0 11:11 ? 00:00:00 [ipvs_syncbackup]
d3 On the master server the LVS weights are 2:3:3:2, and the active connection counts are 200, 600, 580 and 180, which roughly follows the weights:
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 222.211.74.42:80 wrr persistent 120
-> 222.211.74.39:80 Route 2 200 190
-> 222.211.74.38:80 Route 3 600 610
-> 222.211.74.37:80 Route 3 580 570
-> 222.211.74.36:80 Route 2 180 160
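To make "roughly follows the weights" concrete, the ipvsadm -ln output above can be compared with an exact weight-proportional split of the same total. The sketch below does that with awk; the function name wrr_expected is made up for illustration. For the numbers above (1560 active connections in total) an exact 2:3:3:2 split would be 312, 468, 468 and 312, so the match is loose. Client persistence (persistence_timeout 120) may be one reason, though that is a guess rather than something measured here.

```shell
#!/bin/sh
# wrr_expected: hypothetical helper. Reads `ipvsadm -ln` output on stdin and,
# for every realserver line ("-> ip:port Route weight active inactive"),
# prints the address, weight, observed ActiveConn, and the ActiveConn value
# an exact weight-proportional split of the total would give.
wrr_expected() {
    awk '
        BEGIN { n = 0 }
        # The header line has "Forward" in column 3, so it is skipped.
        $1 == "->" && $3 == "Route" {
            addr[n] = $2; w[n] = $4; act[n] = $5
            wsum += $4; asum += $5; n++
        }
        END {
            for (i = 0; i < n; i++)
                printf "%s weight=%d active=%d expected=%.0f\n", \
                       addr[i], w[i], act[i], asum * w[i] / wsum
        }'
}
```

Usage on the master: `ipvsadm -ln | wrr_expected`.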
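d4 On the backup server the same 2:3:3:2 weights are visible, but because it is not serving traffic all connection counts are 0.
d5 On the master server, ifconfig shows the virtual interface eth0:1 for the service; the backup server has no such interface because it is not serving traffic: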
eth0:1 Link encap:Ethernet HWaddr 00:14:5E:32:7F:98
inet addr:222.211.74.42 Bcast:222.211.74.255 Mask:255.255.255.224
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:169 Memory:dcff0000-dd000000
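d6 Watch the LVS log: tail -f /var/log/messages
e Note: the iptables firewall and LVS sit close to each other in the Linux kernel, and the official documentation also points out that the two can interfere with each other and recommends using only one of them per machine.
In my environment iptables is therefore simply not enabled, and I suggest the same for other kernel-based firewalls. The security of the LVS hosts then rests on:
1 Shut down every service that listens on a port, except ssh and LVS
2 Restrict ssh to the management IPs
f The realservers must not answer ARP requests for the locally held VIP, so the kernel parameters have to be changed: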
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
The LVS hosts do not need net.ipv4.ip_forward changed.
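The echo commands above take effect immediately but are lost on reboot. A small sketch that prints the same four settings in sysctl.conf syntax so they can be made persistent; the function name arp_sysctl_lines is hypothetical, and appending to /etc/sysctl.conf is an assumption about how the realservers are managed:

```shell
#!/bin/sh
# arp_sysctl_lines: hypothetical helper. Prints the four ARP settings from
# the echo commands above (arp_ignore=1, arp_announce=2 on lo and all)
# in sysctl.conf syntax.
arp_sysctl_lines() {
    for iface in lo all; do
        printf 'net.ipv4.conf.%s.arp_ignore = 1\n' "$iface"
        printf 'net.ipv4.conf.%s.arp_announce = 2\n' "$iface"
    done
}
```

As root on each realserver, something like `arp_sysctl_lines >> /etc/sysctl.conf && sysctl -p` would store and reload them.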
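2 Testing
a Master/backup failover test:
The VIP is 222.211.74.42. From outside the 222.211.74.x network, ping 222.211.74.42 continuously, then stop the service on the master: /etc/init.d/keepalived stop
The ping is interrupted for about 6 s and then recovers; the virtual interface disappears from the master and appears on the backup.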
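For reference, VRRPv2 (RFC 3768) lets a backup take over from a master that vanishes without notice after Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, where Skew_Time = (256 − Priority) / 256 seconds. The sketch below computes this for the settings in the configuration (advert_int 1, backup priority 100); the function name is made up. It does not explain the whole 6 s observed: a clean stop is handled differently by the protocol, and the time for upstream devices to learn the new location of the VIP is not covered by this formula. What dominated in this test was not measured.

```shell
#!/bin/sh
# master_down_interval ADVERT_INT BACKUP_PRIORITY
# Hypothetical helper: prints the RFC 3768 Master_Down_Interval in seconds,
#   3 * advert_int + (256 - priority) / 256
master_down_interval() {
    LC_ALL=C awk -v a="$1" -v p="$2" \
        'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}
```

Calling `master_down_interval 1 100` gives about 3.6 seconds for the backup in this setup.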
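b ab load test:
Prerequisite: on the DNS server (or on the machine that starts the test), make the name resolve to 222.211.74.42
ab -n 1000 -c 200 http:/// #use Apache ab to send 1000 requests in total at a concurrency of 200
Meanwhile, on the master server, ipvsadm -ln shows how LVS distributes the load and ipvsadm -lnc shows the live connections.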
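3 Common problems and how to handle them:
Cannot sync. Causes: 1 the configuration files are inconsistent; 2 a soft fault in the LVS kernel module. Fix: stop the service, rmmod the module, then start the service again.
Cannot fail over. Causes: the same two as above, plus no network connectivity between master and backup, or ARP restrictions in the upstream network (such as ARP binding).
Cannot forward. Cause: configuration file error.
Traffic still sent to a realserver after it goes down. Causes: configuration file error, network fault, or a soft fault in the LVS kernel module.
Distribution ratio problems, found through monitoring:
1 Concurrent test traffic all lands on a single machine. Fix: comment out the line: #persistence_timeout 120
2 During normal distribution, the connection count of one realserver suddenly exceeds what its weight would give and keeps climbing far above the usual level.
This usually happens during a concurrency test against the web service or during an attack; analyzing the web logs reveals the cause.
Open questions: 3.1 What mechanism does keepalived or LVS use to probe the state of the realservers in support of its scheduling algorithm? In essence LVS is only a scheduler and does no data collection or analysis on the real servers.
My guess is that LVS distributes load purely by its algorithm, regardless of the actual load; keepalived does check the real servers, but that is a health check, not a check of load capacity.
3.2 and http: // disagree in how they describe the algorithms and contradict each other, for example in the part about wlc.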
4 Tuning:
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_fin_timeout=30
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.core.netdev_max_backlog=3000
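A misspelled key in a list like this is easy to overlook. A rough sketch that flags keys with no matching entry under /proc/sys on the running kernel; both function names are hypothetical, and the simple dot-to-slash mapping is an assumption that fails for keys containing interface names with dots in them:

```shell
#!/bin/sh
# sysctl_key_path KEY: map a dotted sysctl key to its /proc/sys path.
sysctl_key_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# unknown_sysctl_keys: reads "key=value" lines on stdin and prints every key
# that has no file under /proc/sys on the running kernel.
unknown_sysctl_keys() {
    while IFS='=' read -r key _value; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        [ -n "$key" ] || continue
        [ -e "$(sysctl_key_path "$key")" ] || printf '%s\n' "$key"
    done
}
```

With the settings saved to a file, `unknown_sysctl_keys < tuning.conf` (file name chosen for illustration) prints nothing when every key is known to the kernel.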
5 Next step: use this setup again to build an LVS-clustered forum
6 References:
Basics:
node/95
http://
Tuning: http://machael.blog.51cto.com/829462/211587