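High-availability, load-balanced MySQL cluster solution -- 2

Category: Mysql/postgreSQL

2015-11-06 17:03:07

Original post: High-availability, load-balanced MySQL cluster solution -- 2, by fjzhuozl

(Continued from the previous post)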

#chmod 755 modprobe.sh

# sh modprobe.sh

# vi /etc/modules

ip_vs_dh

ip_vs_ftp

ip_vs

ip_vs_lblc

ip_vs_lblcr

ip_vs_lc

ip_vs_nq

ip_vs_rr

ip_vs_sed

ip_vs_sh

ip_vs_wlc

ip_vs_wrr

:wq
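To confirm that the modules listed in /etc/modules are actually loaded, the list can be compared against lsmod. This is a minimal sketch; the function name missing_ipvs_modules is made up for illustration, and it reads lsmod-style text from stdin so it can be tried without root.

```shell
# Print every IPVS module from the list above that is NOT present in
# lsmod-style input on stdin (hypothetical helper, not part of heartbeat/LVS).
# Usage on a director: lsmod | missing_ipvs_modules
missing_ipvs_modules() {
    wanted="ip_vs ip_vs_dh ip_vs_ftp ip_vs_lblc ip_vs_lblcr ip_vs_lc ip_vs_nq ip_vs_rr ip_vs_sed ip_vs_sh ip_vs_wlc ip_vs_wrr"
    # The first column of lsmod output is the module name; skip the header line.
    loaded=$(awk 'NR > 1 { print $1 }')
    for m in $wanted; do
        printf '%s\n' "$loaded" | grep -qx "$m" || printf '%s\n' "$m"
    done
}
```

Empty output means every module in the list is loaded.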

# vi

net.ipv4.ip_forward = 0

Change it to:

net.ipv4.ip_forward = 1

Apply the change (sysctl -p reloads /etc/sysctl.conf by default):

/sbin/sysctl -p
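A quick way to verify the setting took effect is to read the live kernel value. A minimal sketch; ip_forward_enabled is a hypothetical helper name, and the optional path argument exists only so the check can be tried against a test file.

```shell
# Return 0 if IPv4 forwarding is enabled, non-zero otherwise.
# With no argument it reads the kernel's live value under /proc.
ip_forward_enabled() {
    f=${1:-/proc/sys/net/ipv4/ip_forward}
    [ "$(cat "$f" 2>/dev/null)" = "1" ]
}
```

For example: ip_forward_enabled && echo "forwarding is on"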

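Install the heartbeat packages on MD and BD

# rpm -Uvh perl-xx-xx-xx.rpm

# yum install heartbeat

# rpm -Uvh arptables-noarp-addr-0.99.2-1.rh.el.um.1.noarch.rpm

# rpm -Uvh perl-Mail-POP3Client-2.17-1.el5.centos.noarch.rpm

If a perl package is missing, install it with yum install perl-xx-xx

# perl -MCPAN -e shell

Perl modules installed this way (through the CPAN shell) did not work for some reason, which is odd.

The VIP is actually bound on the two directors, so the directors need a heartbeat between them. The heartbeat link uses the eth1 ports, connected with a crossover cable.

This keeps the heartbeat traffic from affecting other servers.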

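Configure heartbeat

Heartbeat has 3 configuration files:

ha.cf

authkeys

haresources

The ldirectord process has its own configuration file:

ldirectord.cf

So 4 configuration files need to be set up in total.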

#vi ha.cf

logfacility local0

bcast eth1

mcast eth1 225.0.0.1 694 1 0

auto_failback off

node ndb1

node ndb2

respawn hacluster /usr/lib/heartbeat/ipfail

apiauth ipfail gid=haclient uid=hacluster

:wq

# vi authkeys

auth 3

3 md5 514a49f83820e34c877ff48770e48ea7

:wq
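The md5 string in authkeys is just a shared secret that must be identical on both directors. One possible way to produce a fresh one is sketched below; render_authkeys is a hypothetical helper, and hashing random bytes with md5sum is an assumption about how such a key could be generated, not necessarily how the key above was made.

```shell
# Print an authkeys file body with a freshly generated random key.
# The key index 3 mirrors the example above.
render_authkeys() {
    # Hash 512 random bytes to get a 32-character hex string.
    key=$(head -c 512 /dev/urandom | md5sum | cut -d' ' -f1)
    printf 'auth 3\n3 md5 %s\n' "$key"
}
```

Generate the file once (render_authkeys > authkeys) and copy the same file to the other director, rather than running the function separately on each node.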

# vi haresources

ndb1 \

ldirectord::ldirectord.cf \

LVSSyncDaemonSwap::master \

IPaddr2::192.168.131.105/24/eth0/192.168.131.255

On ndb2, change the host name (ndb1) accordingly.

:wq

Set the permissions and enable heartbeat at boot

# chmod 600 /etc/ha.d/authkeys

#/sbin/chkconfig --level 2345 heartbeat on

#/sbin/chkconfig --del ldirectord

Start heartbeat:

/etc/init.d/ldirectord stop

/etc/init.d/heartbeat start

Check on MD and BD whether the VIP is active:

ip addr sh eth0

[root@ndb1 ha.d]# ip addr sh eth0

2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000

link/ether 00:30:48:28:c6:85 brd ff:ff:ff:ff:ff:ff

inet 192.168.131.164/24 brd 192.168.131.255 scope global eth0

inet 192.168.131.105/24 brd 192.168.131.255 scope global secondary eth0

inet6 fe80::230:48ff:fe28:c685/64 scope link

valid_lft forever preferred_lft forever

[root@ndb1 ha.d]#

[root@ndb2 ~]# ip addr sh eth0

2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000

link/ether 00:30:48:28:c4:af brd ff:ff:ff:ff:ff:ff

inet 192.168.131.26/24 brd 192.168.131.255 scope global eth0

inet6 fe80::230:48ff:fe28:c4af/64 scope link

valid_lft forever preferred_lft forever

[root@ndb2 ~]#

The VIP is now active on MD (164).
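The same check can be scripted by looking for the VIP among the inet lines of the ip output. A minimal sketch; has_vip is a hypothetical helper that reads the output of ip addr from stdin.

```shell
# Return 0 if the given address appears as an inet entry in `ip addr`
# output on stdin, non-zero otherwise.
# Usage: ip addr sh eth0 | has_vip 192.168.131.105
has_vip() {
    awk -v vip="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { exit (found ? 0 : 1) }'
}
```

On the active director the function succeeds; on the standby it fails, matching the two listings above.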

Check the ldirectord process

[root@ndb1 ha.d]# /usr/sbin/ldirectord ldirectord.cf status

ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596

[root@ndb1 ha.d]#

[root@ndb2 ~]# /usr/sbin/ldirectord ldirectord.cf status

ldirectord is stopped for /etc/ha.d/ldirectord.cf

[root@ndb2 ~]#

The director that holds the VIP should be in the running state; the standby should be stopped.

Use ipvsadm to check whether packet forwarding is in effect

[root@ndb1 ha.d]# /sbin/ipvsadm -L -n

IP Virtual Server version 1.2.1 (size=4096)

Prot LocalAddress:Port Scheduler Flags

-> RemoteAddress:Port Forward Weight ActiveConn InActConn

TCP 192.168.131.105:3306 wrr

-> 192.168.131.77:3306 Route 1 3 3034

-> 192.168.131.101:3306 Route 1 3 3038

[root@ndb1 ha.d]#

[root@ndb2 ~]# /sbin/ipvsadm -L -n

IP Virtual Server version 1.2.1 (size=4096)

Prot LocalAddress:Port Scheduler Flags

-> RemoteAddress:Port Forward Weight ActiveConn InActConn

[root@ndb2 ~]#

Forwarding is now in effect on MD.
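To turn this check into something scriptable, the real servers listed under the virtual service can be counted. A minimal sketch; count_real_servers is a hypothetical helper that parses ipvsadm -L -n output from stdin.

```shell
# Count the real servers listed under one virtual service in
# `ipvsadm -L -n` output (stdin).
# Usage: /sbin/ipvsadm -L -n | count_real_servers 192.168.131.105:3306
count_real_servers() {
    awk -v vs="$1" '
        # A TCP/UDP line starts a virtual service section.
        $1 == "TCP" || $1 == "UDP" { in_vs = ($2 == vs); next }
        # Real server lines start with "->" followed by an address.
        in_vs && $1 == "->" && $2 ~ /^[0-9]/ { n++ }
        END { print n + 0 }'
}
```

With the two listings above, the active director would report 2 and the standby 0.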

Check the status of LVSSyncDaemonSwap on MD and BD:

[root@ndb1 ha.d]# /etc/ha.d/resource.d/LVSSyncDaemonSwap master status

master running

(ipvs_syncmaster pid: 5689)

[root@ndb1 ha.d]#

[root@ndb2 ~]# /etc/ha.d/resource.d/LVSSyncDaemonSwap master status

master stopped

(ipvs_syncbackup pid: 5493)

[root@ndb2 ~]#

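Likewise, the standby is in the stopped state.

Run the following on the RS (real servers):

ARP forwarding restrictions

MD or BD forwards client packets to the real servers behind it at the link layer (the "Route" forwarding method shown by ipvsadm), and each real server also carries the VIP. For forwarding to work, the real servers must not answer ARP requests for the VIP, so ARP has to be restricted on them.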

#/etc/init.d/arptables_jf stop

#/usr/sbin/arptables-noarp-addr 192.168.131.105 start

#/etc/init.d/arptables_jf save

#/sbin/chkconfig --level 2345 arptables_jf on

#/etc/init.d/arptables_jf start

View the restriction chains

[root@sql2 mysql-cluster]# /sbin/arptables -L -v -n

Chain IN (policy ACCEPT 29243 packets, 819K bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

54 1512 DROP * * 0.0.0.0/0 192.168.131.105 00/00 00/00 any 0000/0000 0000/0000 0000/0000

Chain OUT (policy ACCEPT 3931 packets, 110K bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

0 0 mangle * eth0 192.168.131.105 0.0.0.0/0 00/00 00/00 any 0000/0000 0000/0000 0000/0000 --mangle-ip-s 192.168.131.101

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

[root@sql2 mysql-cluster]#

[root@sql1 ~]# /sbin/arptables -L -v -n

Chain IN (policy ACCEPT 29375 packets, 823K bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

54 1512 DROP * * 0.0.0.0/0 192.168.131.105 00/00 00/00 any 0000/0000 0000/0000 0000/0000

Chain OUT (policy ACCEPT 3903 packets, 109K bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

0 0 mangle * eth0 192.168.131.105 0.0.0.0/0 00/00 00/00 any 0000/0000 0000/0000 0000/0000 --mangle-ip-s 192.168.131.77

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

[root@sql1 ~]#

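With these chains in place, ARP for the VIP is under control on the real servers: incoming requests for the VIP are dropped, and outgoing ARP sourced from the VIP is rewritten to the server's own address.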
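The two rules in the listings can also be written out by hand. The sketch below only prints the commands rather than running them; noarp_rules is a hypothetical helper, and the IN/OUT chain names follow the arptables_jf listing above (other arptables builds name the chains INPUT/OUTPUT).

```shell
# Print (do not execute) the arptables rules that hide a VIP on a real server.
# $1 = VIP, $2 = the real server's own address, $3 = outgoing interface.
noarp_rules() {
    vip=$1; rip=$2; dev=${3:-eth0}
    # Drop ARP requests that ask for the VIP.
    printf 'arptables -A IN -d %s -j DROP\n' "$vip"
    # Rewrite the source of outgoing ARP so the VIP is never advertised.
    printf 'arptables -A OUT -s %s -o %s -j mangle --mangle-ip-s %s\n' "$vip" "$dev" "$rip"
}
```

For sql2 this would be noarp_rules 192.168.131.105 192.168.131.101, and for sql1 noarp_rules 192.168.131.105 192.168.131.77.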

Configure how ARP packets are received

Run the following on all RS

# cp /etc/sysconfig/network-scripts/ifcfg-lo /etc/sysconfig/network-scripts/ifcfg-lo:0

# vi /etc/sysconfig/network-scripts/ifcfg-lo\:0

DEVICE=lo:0

IPADDR=192.168.131.105

NETMASK=255.255.255.255

NETWORK=192.168.131.0

BROADCAST=192.168.131.255

ONBOOT=yes

NAME=loopback

:wq

#/sbin/ifup lo

Check lo:0

[root@sql1 ~]# ip addr sh lo

1: lo: mtu 16436 qdisc noqueue

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

inet 192.168.131.105/32 brd 192.168.131.255 scope global lo:0

inet6 ::1/128 scope host

valid_lft forever preferred_lft forever

[root@sql1 ~]#

[root@sql2 mysql-cluster]# ip addr sh lo

1: lo: mtu 16436 qdisc noqueue

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

inet 192.168.131.105/32 brd 192.168.131.255 scope global lo:0

inet6 ::1/128 scope host

valid_lft forever preferred_lft forever

[root@sql2 mysql-cluster]#

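Reboot the servers

Run the following on all servers (first confirm the IP, and make sure the server is not running any service that is in use)

reboot

Start mysql cluster:

Order:

ndb_mgmd -- 164/26

ndbd -- 101/77

mysqld -- all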
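The start order can be written down as an explicit plan, one "host daemon" pair per line. This is only an illustrative sketch: start_plan is a hypothetical helper, it prints the order and starts nothing, and the host list is taken from the addresses used in this article.

```shell
# Print the cluster start order: management nodes first, then data nodes,
# then the SQL nodes on every host.
start_plan() {
    net=192.168.131
    for h in 164 26; do echo "$net.$h ndb_mgmd"; done
    for h in 101 77; do echo "$net.$h ndbd"; done
    for h in 164 26 101 77; do echo "$net.$h mysqld"; done
}
```

Each line could then be fed to whatever mechanism is used to start the daemon on that host.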

Check that the services are working

Run the following on ndb

#ndb_mgm

[root@ndb1 ha.d]# ndb_mgm

-- NDB Cluster -- Management Client --

ndb_mgm> show

Connected to Management Server at: 192.168.131.164:1186

Cluster Configuration

---------------------

[ndbd(NDB)] 2 node(s)

id=3 @192.168.131.77 (Version: 5.0.67, Nodegroup: 0, Master)

id=4 @192.168.131.101 (Version: 5.0.67, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)

id=1 @192.168.131.164 (Version: 5.0.67)

id=2 @192.168.131.26 (Version: 5.0.67)

[mysqld(API)] 7 node(s)

id=5 @192.168.131.101 (Version: 5.0.67)

id=6 @192.168.131.26 (Version: 5.0.67)

id=7 @192.168.131.164 (Version: 5.0.67)

id=8 @192.168.131.77 (Version: 5.0.67)

id=9 (not connected, accepting connect from any host)

id=10 (not connected, accepting connect from any host)

id=11 (not connected, accepting connect from any host)

ndb_mgm>

Everything is normal.
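For a non-interactive check, ndb_mgm can run a single command with -e, and the output can be parsed. A minimal sketch; connected_data_nodes is a hypothetical helper that counts the data nodes shown as connected in the show output read from stdin.

```shell
# Count the data nodes reported as connected in `ndb_mgm -e show` output (stdin).
# Usage: ndb_mgm -e show | connected_data_nodes
connected_data_nodes() {
    awk '
        # Section headers look like "[ndbd(NDB)] 2 node(s)".
        /^\[/ { in_ndbd = ($0 ~ /^\[ndbd\(NDB\)\]/); next }
        # Inside the ndbd section, count id= lines that are connected.
        in_ndbd && /^id=/ && $0 !~ /not connected/ { n++ }
        END { print n + 0 }'
}
```

With the output above the count would be 2; it drops when an ndbd process is stopped.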

Check that heartbeat is working:

Shut down BD and check the log on MD:

[root@ndb1 ha.d]# tail -f /var/log/messages

Dec 17 19:42:21 ndb1 heartbeat: [5462]: info: Received shutdown notice from 'ndb2'.

Dec 17 19:42:21 ndb1 heartbeat: [5462]: info: Resources being acquired from ndb2.

Dec 17 19:42:21 ndb1 harc[7085]: info: Running /etc/ha.d/rc.d/status status

Dec 17 19:42:21 ndb1 mach_down[7118]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired

Dec 17 19:42:21 ndb1 mach_down[7118]: info: mach_down takeover complete for node ndb2.

Dec 17 19:42:21 ndb1 heartbeat: [5462]: info: mach_down takeover complete.

Dec 17 19:42:21 ndb1 ldirectord[7153]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status

Dec 17 19:42:21 ndb1 ldirectord[7153]: ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596

Dec 17 19:42:21 ndb1 ldirectord[7153]: Exiting from ldirectord status

Dec 17 19:42:21 ndb1 heartbeat: [7086]: info: Local Resource acquisition completed.

Dec 17 19:42:21 ndb1 harc[7175]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp

Dec 17 19:42:21 ndb1 ip-request-resp[7175]: received ip-request-resp ldirectord::ldirectord.cf OK yes

Dec 17 19:42:21 ndb1 ResourceManager[7196]: info: Acquiring resource group: ndb1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.131.105/24/eth0/192.168.131.255

Dec 17 19:42:22 ndb1 ldirectord[7223]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status

Dec 17 19:42:22 ndb1 ldirectord[7223]: ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596

Dec 17 19:42:22 ndb1 ldirectord[7223]: Exiting from ldirectord status

Dec 17 19:42:22 ndb1 ResourceManager[7196]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start

Dec 17 19:42:23 ndb1 ldirectord[7245]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start

Dec 17 19:42:23 ndb1 IPaddr2[7291]: INFO: Running OK

If no errors appear, everything is working.
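Instead of reading the log by eye, the takeover message can be searched for directly. A minimal sketch; takeover_complete_for is a hypothetical helper, and it relies on the exact "mach_down takeover complete for node" wording seen in the log above.

```shell
# Return 0 if the log on stdin records a completed takeover for the named peer.
# Usage: takeover_complete_for ndb2 < /var/log/messages
takeover_complete_for() {
    grep -q "mach_down takeover complete for node $1\\."
}
```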

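Destructive tests

1) Check ndbd

Stop the ndbd process on either data node, then check in ndb_mgm whether its connection has been lost.

If the node is shown as disconnected, the failure has been detected.

Now add rows to a database table, then start the ndbd that was just stopped and check whether the newly written data has been synchronized to it. If it has, everything is working.

2) Check heartbeat

Shut down MD and watch how BD reacts: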

[root@ndb2 ~]# tail -f /var/log/messages

Dec 17 19:47:22 ndb2 harc[6862]: info: Running /etc/ha.d/rc.d/status status

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Comm_now_up(): updating status to active

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Local status now set to: 'active'

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Starting child client "/usr/lib/heartbeat/ipfail" (498,496)

Dec 17 19:47:23 ndb2 heartbeat: [6879]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 498 gid 496 (pid 6879)

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: remote resource transition completed.

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Local Resource acquisition completed. (none)

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Initial resource acquisition complete (T_RESOURCES(them))

Dec 17 19:47:29 ndb2 ipfail: [6879]: info: Ping node count is balanced.

Dec 17 19:47:43 ndb2 heartbeat: [6852]: info: Received shutdown notice from 'ndb1'.

Dec 17 19:47:43 ndb2 heartbeat: [6852]: info: Resources being acquired from ndb1.

Dec 17 19:47:43 ndb2 heartbeat: [6884]: info: acquire all HA resources (standby).

Dec 17 19:47:43 ndb2 ResourceManager[6911]: info: Acquiring resource group: ndb2 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.131.105/24/eth0/192.168.131.255

Dec 17 19:47:43 ndb2 ldirectord[6957]: ldirectord is stopped for /etc/ha.d/ldirectord.cf

Dec 17 19:47:43 ndb2 ldirectord[6957]: Exiting with exit_status 3: Exiting from ldirectord status

Dec 17 19:47:43 ndb2 heartbeat: [6885]: info: Local Resource acquisition completed.

Dec 17 19:47:43 ndb2 ldirectord[6961]: ldirectord is stopped for /etc/ha.d/ldirectord.cf

Dec 17 19:47:43 ndb2 ldirectord[6961]: Exiting with exit_status 3: Exiting from ldirectord status

Dec 17 19:47:43 ndb2 ResourceManager[6911]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start

Dec 17 19:47:44 ndb2 ldirectord[6986]: Starting Linux Director v1.77.2.32 as daemon

Dec 17 19:47:44 ndb2 ldirectord[6988]: Added virtual server: 192.168.131.105:3306

Dec 17 19:47:44 ndb2 ldirectord[6988]: Quiescent real server: 192.168.131.101:3306 mapped from 192.168.131.101:3306 ( x 192.168.131.105:3306) (Weight set to 0)

Dec 17 19:47:44 ndb2 ldirectord[6988]: Quiescent real server: 192.168.131.77:3306 mapped from 192.168.131.77:3306 ( x 192.168.131.105:3306) (Weight set to 0)

Dec 17 19:47:44 ndb2 ResourceManager[6911]: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master start

Dec 17 19:47:44 ndb2 kernel: IPVS: stopping sync thread 5493 ...

Dec 17 19:47:45 ndb2 kernel: IPVS: sync thread stopped!

Dec 17 19:47:45 ndb2 LVSSyncDaemonSwap[7050]: info: ipvs_syncbackup down

Dec 17 19:47:45 ndb2 kernel: IPVS: sync thread started: state = MASTER, mcast_ifn = eth0, syncid = 0

Dec 17 19:47:45 ndb2 LVSSyncDaemonSwap[7050]: info: ipvs_syncmaster up

Dec 17 19:47:45 ndb2 LVSSyncDaemonSwap[7050]: info: ipvs_syncmaster obtained

Dec 17 19:47:45 ndb2 IPaddr2[7102]: INFO: Resource is stopped

Dec 17 19:47:45 ndb2 ResourceManager[6911]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.131.105/24/eth0/192.168.131.255 start

Dec 17 19:47:45 ndb2 IPaddr2[7214]: INFO: ip -f inet addr add 192.168.131.105/24 brd 192.168.131.255 dev eth0

Dec 17 19:47:45 ndb2 avahi-daemon[2776]: Registering new address record for 192.168.131.105 on eth0.

Dec 17 19:47:45 ndb2 IPaddr2[7214]: INFO: ip link set eth0 up

Dec 17 19:47:45 ndb2 IPaddr2[7214]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.131.105 eth0 192.168.131.105 auto not_used not_used

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 IPaddr2[7185]: INFO: Success

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 heartbeat: [6884]: info: all HA resource acquisition completed (standby).

Dec 17 19:47:45 ndb2 heartbeat: [6852]: info: Standby resource acquisition done [all].

Dec 17 19:47:45 ndb2 harc[7277]: info: Running /etc/ha.d/rc.d/status status

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 last message repeated 14 times

Dec 17 19:47:45 ndb2 mach_down[7293]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 mach_down[7293]: info: mach_down takeover complete for node ndb1.

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 heartbeat: [6852]: info: mach_down takeover complete.

Dec 17 19:47:45 ndb2 harc[7327]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp

Dec 17 19:47:45 ndb2 ip-request-resp[7327]: received ip-request-resp ldirectord::ldirectord.cf OK yes

Dec 17 19:47:45 ndb2 ResourceManager[7348]: info: Acquiring resource group: ndb2 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.131.105/24/eth0/192.168.131.255

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:46 ndb2 last message repeated 3 times

Dec 17 19:47:46 ndb2 ldirectord[7375]: ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 6988

Dec 17 19:47:46 ndb2 ldirectord[7375]: Exiting from ldirectord status

Dec 17 19:47:46 ndb2 ResourceManager[7348]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start

Dec 17 19:47:46 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:46 ndb2 last message repeated 6 times

Dec 17 19:47:46 ndb2 IPaddr2[7443]: INFO: Running OK

Dec 17 19:47:46 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:48:16 ndb2 last message repeated 289 times

Dec 17 19:48:16 ndb2 heartbeat: [6852]: WARN: node ndb1: is dead

Dec 17 19:48:16 ndb2 heartbeat: [6852]: info: Dead node ndb1 gave up resources.

Dec 17 19:48:16 ndb2 heartbeat: [6852]: info: Link ndb1:eth1 dead.

Dec 17 19:48:16 ndb2 ipfail: [6879]: info: Status update: Node ndb1 now has status dead

Dec 17 19:48:16 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:48:17 ndb2 last message repeated 8 times

Dec 17 19:48:17 ndb2 ipfail: [6879]: info: NS: We are dead. :<

Dec 17 19:48:17 ndb2 ipfail: [6879]: info: Link Status update: Link ndb1/eth1 now has status dead

Dec 17 19:48:17 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:48:17 ndb2 ipfail: [6879]: info: We are dead. :<

Dec 17 19:48:17 ndb2 ipfail: [6879]: info: Asking other side for ping node count.

Dec 17 19:48:18 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers