2017-11-09 17:40:19
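LVS VIP failover failed.
Cause: one of the switches connected to the servers developed a fault, which caused the pv LVS service to perform a server switchover.
Symptom: between 14:30:37 and 14:30:39 the backup server switched from backup to master and then back to backup. The master server showed no reaction at all, which means it considered itself the master the whole time. As a result, the LVS VIP received no traffic for 20 minutes.
Fault logs:
Logs on the slave (backup) machine: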
Feb 18 14:30:37 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) Transition to MASTER STATE
Feb 18 14:30:37 zw_126_102 Keepalived_vrrp: VRRP_Group(VG003) Syncing instances to MASTER state
Feb 18 14:30:37 zw_126_102 Keepalived_vrrp: Remote SMTP server [192.168.95.47:25] connected.
Feb 18 14:30:38 zw_126_102 Keepalived_vrrp: SMTP alert successfully sent.
Feb 18 14:30:38 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) Entering MASTER STATE
Feb 18 14:30:38 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) setting protocol VIPs.
Feb 18 14:30:38 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) setting protocol Virtual Routes
Feb 18 14:30:38 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) Sending gratuitous ARPs on eth1 for 220.181.11.98
Feb 18 14:30:38 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) Sending gratuitous ARPs on eth0 for 10.10.127.152
Feb 18 14:30:38 zw_126_102 Keepalived_vrrp: Netlink reflector reports IP 220.181.11.98 added
Feb 18 14:30:38 zw_126_102 Keepalived_vrrp: Netlink reflector reports IP 10.10.127.152 added
Feb 18 14:30:38 zw_126_102 Keepalived_healthcheckers: Netlink reflector reports IP 220.181.11.98 added
Feb 18 14:30:38 zw_126_102 Keepalived_healthcheckers: Netlink reflector reports IP 10.10.127.152 added
Feb 18 14:30:39 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) Received higher prio advert
Feb 18 14:30:39 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) Entering BACKUP STATE
Feb 18 14:30:39 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) removing protocol Virtual Routes
Feb 18 14:30:39 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) removing protocol VIPs.
Feb 18 14:30:39 zw_126_102 Keepalived_healthcheckers: Netlink reflector reports IP 220.181.11.98 removed
Feb 18 14:30:39 zw_126_102 Keepalived_vrrp: VRRP_Group(VG003) Syncing instances to BACKUP state
The master machine logged nothing.
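A brief MASTER-to-BACKUP flap like the one in the backup's log above can be spotted automatically. The following is a minimal sketch, assuming syslog lines in the same format as the excerpt; the function name `find_flaps`, the `max_seconds` threshold, and the `year` parameter (syslog timestamps carry no year) are hypothetical and are not part of keepalived.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Feb 18 14:30:38 zw_126_102 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) Entering MASTER STATE"
STATE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ Keepalived_vrrp: "
    r"VRRP_Instance\((\S+?)\) Entering (MASTER|BACKUP) STATE"
)


def find_flaps(lines, max_seconds=5, year=2000):
    """Return (instance, seconds_as_master) for each short MASTER -> BACKUP flap."""
    became_master = {}  # instance name -> time it entered MASTER state
    flaps = []
    for line in lines:
        m = STATE_RE.match(line)
        if not m:
            continue
        # Syslog omits the year, so an assumed one is prepended for parsing.
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        instance, state = m.group(2), m.group(3)
        if state == "MASTER":
            became_master[instance] = stamp
        elif instance in became_master:
            held = (stamp - became_master.pop(instance)).total_seconds()
            if held <= max_seconds:
                flaps.append((instance, held))
    return flaps
```

Feeding it the two "Entering ... STATE" lines from the log above would report one flap for PV_LVS_003 that lasted one second.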
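Fault explanation:
From the logs we found that during the switchover the backup broadcast gratuitous ARPs announcing that the VIP was on the backup, but the master never broadcast that the VIP was on itself. The backup quickly discovered that it was not the master and removed the VIP. At that point, however, the switch still mapped the VIP to the backup's MAC address, so for the next 20 minutes packets sent to the VIP kept going to the backup. The switch's ARP entry timeout is 20 minutes; once the entry expired, the switch learned the correct MAC for the VIP and everything was fine again.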
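Fault analysis:
How a normal LVS switchover works: the LVS master sends a VRRP multicast advertisement once per second to notify the backup server.
While switching to master, the backup server first broadcasts gratuitous ARPs declaring that the VIP's MAC is on the slave, and then sends VRRP multicast advertisements declaring that it is now the master. If the original master can still receive VRRP advertisements, it compares its own priority with the priority carried in the received advertisement. If the received priority is lower than its own, it re-sends gratuitous ARPs for the VIP, declaring that the VIP's MAC is on itself. If the received priority is higher than its own, it removes the VIP and stops sending VRRP advertisements.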
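The priority comparison described above can be sketched as a small model. This is an illustration of the behaviour as this post describes it, not keepalived's actual code: the `Node` class, its field names, and the action strings are all hypothetical, and the tie-break for equal priorities is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    priority: int
    state: str = "BACKUP"
    actions: list = field(default_factory=list)  # what the node did, in order

    def on_advert(self, peer_priority: int) -> None:
        """React to a received VRRP advert while acting as master (simplified)."""
        if self.state != "MASTER":
            return
        if peer_priority < self.priority:
            # "Received lower prio advert": stay master and re-announce the VIP.
            self.actions.append("send_gratuitous_arp")
        elif peer_priority > self.priority:
            # "Received higher prio advert": give up the VIP and step down.
            self.state = "BACKUP"
            self.actions.append("remove_vip")
        # Equal priorities are not modelled in this sketch.
```

With this model, a master of priority 100 that hears an advert of priority 90 re-sends the gratuitous ARP and stays master, while a temporary master of priority 90 that hears priority 100 removes the VIP and returns to backup, which matches the two log excerpts in this post.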
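To check this, we ran an experiment: disconnect the master and the slave at the same time and reconnect them at the same time. The master's log then shows: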
Feb 24 11:24:05 bx_0_186 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) Received lower prio advert, forcing new election
Feb 24 11:24:05 bx_0_186 Keepalived_vrrp: VRRP_Instance(PV_LVS_003) Sending gratuitous ARPs on eth0 for 10.16.0.250
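In other words, the master re-announces the VIP with gratuitous ARP packets.
Only when master -> slave connectivity is restored first, while slave -> master is still down, does the master fail to re-broadcast the VIP ARP.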
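The difference between the two cases can be traced step by step with a toy model. This is only a sketch under stated assumptions: the function `simulate` and its fields are hypothetical, the ordering of events is the one this post infers from the logs, and the backup's gratuitous ARP is assumed to reach the switch even when its adverts do not reach the master.

```python
def simulate(backup_to_master_reachable: bool) -> dict:
    """Trace which host the switch associates with the VIP after a brief flap."""
    # Initially the switch maps the VIP to the real master.
    switch_vip_owner = "master"
    master_reannounced = False

    # 1. The backup stops hearing adverts, promotes itself, and sends a
    #    gratuitous ARP, so the switch now maps the VIP to the backup.
    switch_vip_owner = "backup"

    # 2. The backup sends its own lower-priority advert toward the master.
    if backup_to_master_reachable:
        # The master logs "Received lower prio advert" and re-sends
        # its gratuitous ARP, which corrects the switch.
        master_reannounced = True
        switch_vip_owner = "master"

    # 3. The master -> backup path works again: the backup hears the
    #    higher-priority advert and removes the VIP, so only the master has it.
    vip_configured_on = "master"

    return {
        "switch_vip_owner": switch_vip_owner,
        "master_reannounced": master_reannounced,
        # Traffic is lost while the switch points at a host without the VIP.
        "blackholed": switch_vip_owner != vip_configured_on,
    }
```

In the asymmetric case, `simulate(False)`, the switch keeps pointing at the backup even though the VIP is configured only on the master, and in this incident that lasted until the switch's 20-minute ARP entry expired. In the symmetric case, `simulate(True)`, the master's re-announcement corrects the switch immediately.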