High-Availability, Load-Balanced MySQL Cluster Solution -- 2 (liu1084, ChinaUnix Blog)

Liu Jun's blog: liujun.blog.chinaunix.net

Category: Mysql/postgreSQL

2008-12-17 20:31:53

(Continued from the previous post)

#chmod 755 modprobe.sh

# sh modprobe.sh

# vi /etc/modules

ip_vs_dh

ip_vs_ftp

ip_vs

ip_vs_lblc

ip_vs_lblcr

ip_vs_lc

ip_vs_nq

ip_vs_rr

ip_vs_sed

ip_vs_sh

ip_vs_wlc

ip_vs_wrr

:wq
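
To confirm that the modules listed above are actually loaded, a check along the following lines may help. This is only a sketch: the helper name `missing_ipvs_modules` is made up for illustration, and it assumes the modules are built as loadable modules, so that they show up in /proc/modules (a kernel with IPVS compiled in would not list them).

```shell
# IPVS modules from the /etc/modules listing above.
IPVS_MODULES="ip_vs ip_vs_dh ip_vs_ftp ip_vs_lblc ip_vs_lblcr ip_vs_lc ip_vs_nq ip_vs_rr ip_vs_sed ip_vs_sh ip_vs_wlc ip_vs_wrr"

# Hypothetical helper: print every module from IPVS_MODULES that does not
# appear in the given module table (default: /proc/modules).
missing_ipvs_modules() {
    table="${1:-/proc/modules}"
    for m in $IPVS_MODULES; do
        # The first column of /proc/modules is the module name.
        if ! awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' "$table"; then
            echo "$m"
        fi
    done
}

# Usage on a director (prints nothing when all modules are loaded):
#   missing_ipvs_modules
```

Any name it prints can then be loaded by hand with modprobe before continuing.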

# vi /etc/sysctl.conf

net.ipv4.ip_forward = 0

Change it to:

net.ipv4.ip_forward = 1

Apply the change:

/sbin/sysctl -p
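
The same change can be made without an editor. The sketch below assumes the setting lives in /etc/sysctl.conf (the file that `sysctl -p` reads by default); the helper name `enable_ip_forward` is hypothetical.

```shell
# Hypothetical helper: set net.ipv4.ip_forward = 1 in a sysctl-style file.
# Rewrites an existing line if there is one, otherwise appends a new line.
enable_ip_forward() {
    conf="${1:-/etc/sysctl.conf}"
    if grep -q '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=' "$conf"; then
        # Write to a temporary file first so this works with any sed.
        sed 's/^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=.*/net.ipv4.ip_forward = 1/' "$conf" > "$conf.tmp" &&
            mv "$conf.tmp" "$conf"
    else
        echo 'net.ipv4.ip_forward = 1' >> "$conf"
    fi
}

# Usage (as root), followed by the same reload step as above:
#   enable_ip_forward /etc/sysctl.conf && /sbin/sysctl -p
#   cat /proc/sys/net/ipv4/ip_forward    # the kernel's current value
```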

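Install the heartbeat packages on MD and BD

# rpm -Uvh perl-xx-xx-xx.rpm

# yum install heartbeat

# rpm -Uvh arptables-noarp-addr-0.99.2-1.rh.el.um.1.noarch.rpm

# rpm -Uvh perl-Mail-POP3Client-2.17-1.el5.centos.noarch.rpm

If a perl package is missing, install it with yum install perl-xx-xx.

# perl -MCPAN -e shell

For some reason, perl packages installed this way (through the CPAN shell) did not work for me. Strange.

The VIP is in effect bound to both directors, so the two directors need a heartbeat between them. The heartbeat link uses the eth1 interfaces, connected directly with a crossover cable.

This keeps the heartbeat traffic from affecting other servers.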

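Configuring heartbeat

Heartbeat has 3 configuration files:

ha.cf

authkeys

haresources

The ldirectord daemon has its own configuration file:

ldirectord.cf

So 4 configuration files need to be set up in total.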

#vi ha.cf

logfacility local0

bcast eth1

mcast eth1 225.0.0.1 694 1 0

auto_failback off

node ndb1

node ndb2

respawn hacluster /usr/lib/heartbeat/ipfail

apiauth ipfail gid=haclient uid=hacluster

:wq

# vi authkeys

auth 3

3 md5 514a49f83820e34c877ff48770e48ea7

:wq
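
The key in authkeys is just a shared secret, so it can be generated rather than typed. A minimal sketch, assuming `md5sum` and /dev/urandom are available; the helper name `make_authkeys` is made up, and the key index 3 and md5 method simply mirror the file above.

```shell
# Hypothetical helper: print an authkeys file body with a random key.
make_authkeys() {
    # Hash 512 random bytes to get a 32-character hex string.
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | awk '{ print $1 }')
    printf 'auth 3\n3 md5 %s\n' "$key"
}

# Usage (as root):
#   make_authkeys > /etc/ha.d/authkeys && chmod 600 /etc/ha.d/authkeys
```

Both directors must end up with the same authkeys file, so generate it once and copy it to the other node.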

# vi haresources

ndb1 \

ldirectord::ldirectord.cf \

LVSSyncDaemonSwap::master \

IPaddr2::192.168.131.105/24/eth0/192.168.131.255

:wq

On ndb2, change the host name at the start of this entry to ndb2.

Set file permissions and enable heartbeat at boot

# chmod 600 /etc/ha.d/authkeys

#/sbin/chkconfig --level 2345 heartbeat on

#/sbin/chkconfig --del ldirectord

Start heartbeat:

/etc/init.d/ldirectord stop

/etc/init.d/heartbeat start

Check on MD and BD whether the VIP is active:

ip addr sh eth0

[root@ndb1 ha.d]# ip addr sh eth0

2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000

link/ether 00:30:48:28:c6:85 brd ff:ff:ff:ff:ff:ff

inet 192.168.131.164/24 brd 192.168.131.255 scope global eth0

inet 192.168.131.105/24 brd 192.168.131.255 scope global secondary eth0

inet6 fe80::230:48ff:fe28:c685/64 scope link

valid_lft forever preferred_lft forever

[root@ndb1 ha.d]#

[root@ndb2 ~]# ip addr sh eth0

2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000

link/ether 00:30:48:28:c4:af brd ff:ff:ff:ff:ff:ff

inet 192.168.131.26/24 brd 192.168.131.255 scope global eth0

inet6 fe80::230:48ff:fe28:c4af/64 scope link

valid_lft forever preferred_lft forever

[root@ndb2 ~]#

The VIP is now active on MD (164) and absent on BD (26).
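
The VIP check can also be scripted, for example for use in monitoring. This is a sketch under the assumption that the VIP is 192.168.131.105 as above; `has_vip` is a hypothetical helper that only inspects text in the format printed by `ip addr`.

```shell
VIP="192.168.131.105"

# Hypothetical helper: succeed if the text on stdin (output of `ip addr`)
# lists the VIP as an IPv4 address. -F treats the pattern as a fixed string,
# so the dots in the address are not regex wildcards.
has_vip() {
    grep -Fq "inet ${VIP}/"
}

# Usage on either director:
#   ip addr sh eth0 | has_vip && echo "this node holds the VIP"
```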

Check the ldirectord process

[root@ndb1 ha.d]# /usr/sbin/ldirectord ldirectord.cf status

ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596

[root@ndb1 ha.d]#

[root@ndb2 ~]# /usr/sbin/ldirectord ldirectord.cf status

ldirectord is stopped for /etc/ha.d/ldirectord.cf

[root@ndb2 ~]#

The director that holds the VIP should be in the running state; the standby should be stopped.

Use ipvsadm to check whether packet forwarding is in effect

[root@ndb1 ha.d]# /sbin/ipvsadm -L -n

IP Virtual Server version 1.2.1 (size=4096)

Prot LocalAddress:Port Scheduler Flags

-> RemoteAddress:Port Forward Weight ActiveConn InActConn

TCP 192.168.131.105:3306 wrr

-> 192.168.131.77:3306 Route 1 3 3034

-> 192.168.131.101:3306 Route 1 3 3038

[root@ndb1 ha.d]#

[root@ndb2 ~]# /sbin/ipvsadm -L -n

IP Virtual Server version 1.2.1 (size=4096)

Prot LocalAddress:Port Scheduler Flags

-> RemoteAddress:Port Forward Weight ActiveConn InActConn

[root@ndb2 ~]#

Forwarding is in effect on MD; the table on BD is empty.
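
To turn the ipvsadm listing into a simple number, the output can be filtered with awk. The sketch below is an illustration only; `count_real_servers` is a hypothetical helper, and it relies on the listing layout shown above (a TCP or UDP line per virtual service, followed by `->` lines for its real servers).

```shell
# Hypothetical helper: read `ipvsadm -L -n` output on stdin and print the
# number of real servers listed under the given virtual service.
count_real_servers() {
    awk -v vs="$1" '
        # A protocol line starts a new virtual service section.
        $1 == "TCP" || $1 == "UDP" { in_vs = ($2 == vs); next }
        # Real servers are the "->" lines inside the matching section.
        in_vs && $1 == "->" && $2 != "RemoteAddress:Port" { n++ }
        END { print n + 0 }
    '
}

# Usage on a director:
#   /sbin/ipvsadm -L -n | count_real_servers 192.168.131.105:3306
```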

Check the LVSSyncDaemonSwap status on MD and BD:

[root@ndb1 ha.d]# /etc/ha.d/resource.d/LVSSyncDaemonSwap master status

master running

(ipvs_syncmaster pid: 5689)

[root@ndb1 ha.d]#

[root@ndb2 ~]# /etc/ha.d/resource.d/LVSSyncDaemonSwap master status

master stopped

(ipvs_syncbackup pid: 5493)

[root@ndb2 ~]#

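Likewise, the standby is in the stopped state.

Run the following on the RS (real server) machines:

ARP restrictions

MD or BD forwards packets addressed to the VIP to the real servers behind it (the Route forwarding method in the ipvsadm listing, i.e. LVS direct routing). Because the real servers also carry the VIP, they must be kept from answering ARP requests for it; otherwise forwarding will not work reliably.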

#/etc/init.d/arptables_jf stop

#/usr/sbin/arptables-noarp-addr 192.168.131.105 start

#/etc/init.d/arptables_jf save

#/sbin/chkconfig --level 2345 arptables_jf on

#/etc/init.d/arptables_jf start

View the restriction chains

[root@sql2 mysql-cluster]# /sbin/arptables -L -v -n

Chain IN (policy ACCEPT 29243 packets, 819K bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

54 1512 DROP * * 0.0.0.0/0 192.168.131.105 00/00 00/00 any 0000/0000 0000/0000 0000/0000

Chain OUT (policy ACCEPT 3931 packets, 110K bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

0 0 mangle * eth0 192.168.131.105 0.0.0.0/0 00/00 00/00 any 0000/0000 0000/0000 0000/0000 --mangle-ip-s 192.168.131.101

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

[root@sql2 mysql-cluster]#

[root@sql1 ~]# /sbin/arptables -L -v -n

Chain IN (policy ACCEPT 29375 packets, 823K bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

54 1512 DROP * * 0.0.0.0/0 192.168.131.105 00/00 00/00 any 0000/0000 0000/0000 0000/0000

Chain OUT (policy ACCEPT 3903 packets, 109K bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

0 0 mangle * eth0 192.168.131.105 0.0.0.0/0 00/00 00/00 any 0000/0000 0000/0000 0000/0000 --mangle-ip-s 192.168.131.77

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)

pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro

[root@sql1 ~]#

With these rules in place, ARP traffic involving the VIP on the real servers is governed by the chains above.
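
For reference, the two rules in each listing above correspond roughly to the arptables commands printed by the sketch below. This is inferred from the rule listing, not from the arptables-noarp-addr script itself, whose internals are not shown in this post; `print_arp_rules` is a hypothetical helper that only prints the commands and does not run them.

```shell
# Hypothetical helper: print (not run) arptables commands that would produce
# rules like the listing above for one real server.
print_arp_rules() {
    vip="$1"; rip="$2"; dev="${3:-eth0}"
    # Drop incoming ARP requests that ask for the VIP.
    echo "arptables -A IN -d $vip -j DROP"
    # Rewrite the source IP of outgoing ARP from the VIP to the real IP.
    echo "arptables -A OUT -o $dev -s $vip -j mangle --mangle-ip-s $rip"
}

# Usage: review the output for sql2 before running anything as root.
#   print_arp_rules 192.168.131.105 192.168.131.101
```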

Configure how the real servers accept traffic for the VIP

Run the following on all RS machines

# cp /etc/sysconfig/network-scripts/ifcfg-lo /etc/sysconfig/network-scripts/ifcfg-lo:0

# vi /etc/sysconfig/network-scripts/ifcfg-lo:0

DEVICE=lo:0

IPADDR=192.168.131.105

NETMASK=255.255.255.255

NETWORK=192.168.131.0

BROADCAST=192.168.131.255

ONBOOT=yes

NAME=loopback

:wq

#/sbin/ifup lo
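
When there are several real servers, the ifcfg-lo:0 file can be generated instead of edited by hand. A minimal sketch; `render_ifcfg_lo0` is a hypothetical helper, and the network and broadcast values are passed in as arguments rather than computed.

```shell
# Hypothetical helper: print an ifcfg-lo:0 body for the given VIP.
# The /32 netmask keeps the VIP local to the loopback alias.
render_ifcfg_lo0() {
    vip="$1"; network="$2"; broadcast="$3"
    cat <<EOF
DEVICE=lo:0
IPADDR=$vip
NETMASK=255.255.255.255
NETWORK=$network
BROADCAST=$broadcast
ONBOOT=yes
NAME=loopback
EOF
}

# Usage (as root) on each real server:
#   render_ifcfg_lo0 192.168.131.105 192.168.131.0 192.168.131.255 \
#       > /etc/sysconfig/network-scripts/ifcfg-lo:0
```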

Check lo:0

[root@sql1 ~]# ip addr sh lo

1: lo: mtu 16436 qdisc noqueue

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

inet 192.168.131.105/32 brd 192.168.131.255 scope global lo:0

inet6 ::1/128 scope host

valid_lft forever preferred_lft forever

[root@sql1 ~]#

[root@sql2 mysql-cluster]# ip addr sh lo

1: lo: mtu 16436 qdisc noqueue

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

inet 192.168.131.105/32 brd 192.168.131.255 scope global lo:0

inet6 ::1/128 scope host

valid_lft forever preferred_lft forever

[root@sql2 mysql-cluster]#

Reboot the servers

Run the following on all servers (double-check the IP first, and make sure nothing that is still in use is running on the server)

reboot

Start MySQL Cluster:

Start order:

ndb_mgmd -- 164/26

ndbd -- 101/77

mysqld -- all servers

Check that the services are healthy

Run the following on an ndb management host (ndb1 here)

#ndb_mgm

[root@ndb1 ha.d]# ndb_mgm

-- NDB Cluster -- Management Client --

ndb_mgm> show

Connected to Management Server at: 192.168.131.164:1186

Cluster Configuration

---------------------

[ndbd(NDB)] 2 node(s)

id=3 @192.168.131.77 (Version: 5.0.67, Nodegroup: 0, Master)

id=4 @192.168.131.101 (Version: 5.0.67, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)

id=1 @192.168.131.164 (Version: 5.0.67)

id=2 @192.168.131.26 (Version: 5.0.67)

[mysqld(API)] 7 node(s)

id=5 @192.168.131.101 (Version: 5.0.67)

id=6 @192.168.131.26 (Version: 5.0.67)

id=7 @192.168.131.164 (Version: 5.0.67)

id=8 @192.168.131.77 (Version: 5.0.67)

id=9 (not connected, accepting connect from any host)

id=10 (not connected, accepting connect from any host)

id=11 (not connected, accepting connect from any host)

ndb_mgm>

Everything is normal: both data nodes, both management nodes, and the four mysqld nodes are connected.
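
The same check can be done non-interactively by feeding the output of `ndb_mgm -e show` to a filter. The sketch below is illustrative; `ndbd_all_connected` is a hypothetical helper, and it relies on the section headers and the "not connected" wording shown in the listing above. Unused mysqld slots are ignored on purpose.

```shell
# Hypothetical helper: read `ndb_mgm -e show` output on stdin and succeed
# only if no data node (ndbd) is reported as "not connected".
ndbd_all_connected() {
    awk '
        /^\[ndbd\(NDB\)\]/ { in_ndbd = 1; next }   # start of data node section
        /^\[/              { in_ndbd = 0 }         # any other section header
        in_ndbd && /not connected/ { bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage on a management host:
#   ndb_mgm -e show | ndbd_all_connected || echo "a data node is down"
```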

Check that heartbeat works:

Shut down BD and watch the log on MD:

[root@ndb1 ha.d]# tail -f /var/log/messages

Dec 17 19:42:21 ndb1 heartbeat: [5462]: info: Received shutdown notice from 'ndb2'.

Dec 17 19:42:21 ndb1 heartbeat: [5462]: info: Resources being acquired from ndb2.

Dec 17 19:42:21 ndb1 harc[7085]: info: Running /etc/ha.d/rc.d/status status

Dec 17 19:42:21 ndb1 mach_down[7118]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired

Dec 17 19:42:21 ndb1 mach_down[7118]: info: mach_down takeover complete for node ndb2.

Dec 17 19:42:21 ndb1 heartbeat: [5462]: info: mach_down takeover complete.

Dec 17 19:42:21 ndb1 ldirectord[7153]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status

Dec 17 19:42:21 ndb1 ldirectord[7153]: ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596

Dec 17 19:42:21 ndb1 ldirectord[7153]: Exiting from ldirectord status

Dec 17 19:42:21 ndb1 heartbeat: [7086]: info: Local Resource acquisition completed.

Dec 17 19:42:21 ndb1 harc[7175]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp

Dec 17 19:42:21 ndb1 ip-request-resp[7175]: received ip-request-resp ldirectord::ldirectord.cf OK yes

Dec 17 19:42:21 ndb1 ResourceManager[7196]: info: Acquiring resource group: ndb1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.131.105/24/eth0/192.168.131.255

Dec 17 19:42:22 ndb1 ldirectord[7223]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status

Dec 17 19:42:22 ndb1 ldirectord[7223]: ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596

Dec 17 19:42:22 ndb1 ldirectord[7223]: Exiting from ldirectord status

Dec 17 19:42:22 ndb1 ResourceManager[7196]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start

Dec 17 19:42:23 ndb1 ldirectord[7245]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start

Dec 17 19:42:23 ndb1 IPaddr2[7291]: INFO: Running OK

If no errors appear in the log, everything is working.
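
Rather than reading the whole log by eye, one can search for the line that marks a finished takeover. A small sketch, assuming the message wording seen in the log above; `takeover_complete_for` is a hypothetical helper.

```shell
# Hypothetical helper: read syslog lines on stdin and succeed if heartbeat
# logged a completed takeover for the given peer node.
takeover_complete_for() {
    grep -Fq "mach_down takeover complete for node $1."
}

# Usage on MD after BD (ndb2) has been shut down:
#   takeover_complete_for ndb2 < /var/log/messages && echo "takeover logged"
```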

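Failure tests

1) Test ndbd

Kill the ndbd process on either data node, then check in ndb_mgm whether that node shows as disconnected.

If it does, the failure has been detected.

Now add some rows to a database table, start the ndbd that was just stopped, and check whether the newly written data has been synchronized to it. If it has, everything is working.

2) Test heartbeat

Shut down MD and check how BD reacts: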

[root@ndb2 ~]# tail -f /var/log/messages

Dec 17 19:47:22 ndb2 harc[6862]: info: Running /etc/ha.d/rc.d/status status

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Comm_now_up(): updating status to active

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Local status now set to: 'active'

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Starting child client "/usr/lib/heartbeat/ipfail" (498,496)

Dec 17 19:47:23 ndb2 heartbeat: [6879]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 498 gid 496 (pid 6879)

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: remote resource transition completed.

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Local Resource acquisition completed. (none)

Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Initial resource acquisition complete (T_RESOURCES(them))

Dec 17 19:47:29 ndb2 ipfail: [6879]: info: Ping node count is balanced.

Dec 17 19:47:43 ndb2 heartbeat: [6852]: info: Received shutdown notice from 'ndb1'.

Dec 17 19:47:43 ndb2 heartbeat: [6852]: info: Resources being acquired from ndb1.

Dec 17 19:47:43 ndb2 heartbeat: [6884]: info: acquire all HA resources (standby).

Dec 17 19:47:43 ndb2 ResourceManager[6911]: info: Acquiring resource group: ndb2 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.131.105/24/eth0/192.168.131.255

Dec 17 19:47:43 ndb2 ldirectord[6957]: ldirectord is stopped for /etc/ha.d/ldirectord.cf

Dec 17 19:47:43 ndb2 ldirectord[6957]: Exiting with exit_status 3: Exiting from ldirectord status

Dec 17 19:47:43 ndb2 heartbeat: [6885]: info: Local Resource acquisition completed.

Dec 17 19:47:43 ndb2 ldirectord[6961]: ldirectord is stopped for /etc/ha.d/ldirectord.cf

Dec 17 19:47:43 ndb2 ldirectord[6961]: Exiting with exit_status 3: Exiting from ldirectord status

Dec 17 19:47:43 ndb2 ResourceManager[6911]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start

Dec 17 19:47:44 ndb2 ldirectord[6986]: Starting Linux Director v1.77.2.32 as daemon

Dec 17 19:47:44 ndb2 ldirectord[6988]: Added virtual server: 192.168.131.105:3306

Dec 17 19:47:44 ndb2 ldirectord[6988]: Quiescent real server: 192.168.131.101:3306 mapped from 192.168.131.101:3306 ( x 192.168.131.105:3306) (Weight set to 0)

Dec 17 19:47:44 ndb2 ldirectord[6988]: Quiescent real server: 192.168.131.77:3306 mapped from 192.168.131.77:3306 ( x 192.168.131.105:3306) (Weight set to 0)

Dec 17 19:47:44 ndb2 ResourceManager[6911]: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master start

Dec 17 19:47:44 ndb2 kernel: IPVS: stopping sync thread 5493 ...

Dec 17 19:47:45 ndb2 kernel: IPVS: sync thread stopped!

Dec 17 19:47:45 ndb2 LVSSyncDaemonSwap[7050]: info: ipvs_syncbackup down

Dec 17 19:47:45 ndb2 kernel: IPVS: sync thread started: state = MASTER, mcast_ifn = eth0, syncid = 0

Dec 17 19:47:45 ndb2 LVSSyncDaemonSwap[7050]: info: ipvs_syncmaster up

Dec 17 19:47:45 ndb2 LVSSyncDaemonSwap[7050]: info: ipvs_syncmaster obtained

Dec 17 19:47:45 ndb2 IPaddr2[7102]: INFO: Resource is stopped

Dec 17 19:47:45 ndb2 ResourceManager[6911]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.131.105/24/eth0/192.168.131.255 start

Dec 17 19:47:45 ndb2 IPaddr2[7214]: INFO: ip -f inet addr add 192.168.131.105/24 brd 192.168.131.255 dev eth0

Dec 17 19:47:45 ndb2 avahi-daemon[2776]: Registering new address record for 192.168.131.105 on eth0.

Dec 17 19:47:45 ndb2 IPaddr2[7214]: INFO: ip link set eth0 up

Dec 17 19:47:45 ndb2 IPaddr2[7214]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.131.105 eth0 192.168.131.105 auto not_used not_used

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 IPaddr2[7185]: INFO: Success

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 heartbeat: [6884]: info: all HA resource acquisition completed (standby).

Dec 17 19:47:45 ndb2 heartbeat: [6852]: info: Standby resource acquisition done [all].

Dec 17 19:47:45 ndb2 harc[7277]: info: Running /etc/ha.d/rc.d/status status

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 last message repeated 14 times

Dec 17 19:47:45 ndb2 mach_down[7293]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 mach_down[7293]: info: mach_down takeover complete for node ndb1.

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:45 ndb2 heartbeat: [6852]: info: mach_down takeover complete.

Dec 17 19:47:45 ndb2 harc[7327]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp

Dec 17 19:47:45 ndb2 ip-request-resp[7327]: received ip-request-resp ldirectord::ldirectord.cf OK yes

Dec 17 19:47:45 ndb2 ResourceManager[7348]: info: Acquiring resource group: ndb2 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.131.105/24/eth0/192.168.131.255

Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:46 ndb2 last message repeated 3 times

Dec 17 19:47:46 ndb2 ldirectord[7375]: ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 6988

Dec 17 19:47:46 ndb2 ldirectord[7375]: Exiting from ldirectord status

Dec 17 19:47:46 ndb2 ResourceManager[7348]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start

Dec 17 19:47:46 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:47:46 ndb2 last message repeated 6 times

Dec 17 19:47:46 ndb2 IPaddr2[7443]: INFO: Running OK

Dec 17 19:47:46 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:48:16 ndb2 last message repeated 289 times

Dec 17 19:48:16 ndb2 heartbeat: [6852]: WARN: node ndb1: is dead

Dec 17 19:48:16 ndb2 heartbeat: [6852]: info: Dead node ndb1 gave up resources.

Dec 17 19:48:16 ndb2 heartbeat: [6852]: info: Link ndb1:eth1 dead.

Dec 17 19:48:16 ndb2 ipfail: [6879]: info: Status update: Node ndb1 now has status dead

Dec 17 19:48:16 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:48:17 ndb2 last message repeated 8 times

Dec 17 19:48:17 ndb2 ipfail: [6879]: info: NS: We are dead. :<

Dec 17 19:48:17 ndb2 ipfail: [6879]: info: Link Status update: Link ndb1/eth1 now has status dead

Dec 17 19:48:17 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

Dec 17 19:48:17 ndb2 ipfail: [6879]: info: We are dead. :<

Dec 17 19:48:17 ndb2 ipfail: [6879]: info: Asking other side for ping node count.

Dec 17 19:48:18 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers

[root@ndb2 ~]# tail -f /var/log/messages