Category: MySQL/PostgreSQL
2011-05-05 14:26:22
# chmod 755 modprobe.sh
# sh modprobe.sh
# vi /etc/modules
ip_vs_dh
ip_vs_ftp
ip_vs
ip_vs_lblc
ip_vs_lblcr
ip_vs_lc
ip_vs_nq
ip_vs_rr
ip_vs_sed
ip_vs_sh
ip_vs_wlc
ip_vs_wrr
:wq
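The modprobe.sh script run above is not reproduced in the article; presumably it simply loads each of the modules listed in /etc/modules. A dry-run sketch of that idea (the plain-loop structure and the MODPROBE variable are assumptions, not the author's actual script):

```shell
# Dry-run sketch of modprobe.sh: MODPROBE defaults to echo so this is safe
# to run anywhere; on a real director set MODPROBE=/sbin/modprobe.
MODPROBE="${MODPROBE:-echo}"
IPVS_MODULES="ip_vs ip_vs_dh ip_vs_ftp ip_vs_lblc ip_vs_lblcr ip_vs_lc \
ip_vs_nq ip_vs_rr ip_vs_sed ip_vs_sh ip_vs_wlc ip_vs_wrr"

for m in $IPVS_MODULES; do
    "$MODPROBE" "$m"    # loads (or, in dry-run mode, prints) each module
done
```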
# vi /etc/sysctl.conf
net.ipv4.ip_forward = 0
Change it to:
net.ipv4.ip_forward = 1
Apply the change:
/sbin/sysctl -p
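The edit above can also be scripted. A sketch that flips the flag with sed, run here against a sample copy rather than the real /etc/sysctl.conf:

```shell
# Sample fragment standing in for /etc/sysctl.conf
cat > /tmp/sysctl.conf.sample <<'EOF'
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
EOF

# Flip ip_forward on; on the real file you would then run /sbin/sysctl -p
sed -i 's/^net.ipv4.ip_forward = 0$/net.ipv4.ip_forward = 1/' /tmp/sysctl.conf.sample
grep '^net.ipv4.ip_forward' /tmp/sysctl.conf.sample
```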
Install the heartbeat packages on the MD and BD (master and backup director):
# rpm -Uvh perl-xx-xx-xx.rpm
# yum install heartbeat
# rpm -Uvh arptables-noarp-addr-0.99.2-1.rh.el.um.1.noarch.rpm
# rpm -Uvh perl-Mail-POP3Client-2.17-1.el5.centos.noarch.rpm
If a Perl package is missing, install it with yum install perl-xx-xx.
# perl -MCPAN -e shell
Oddly, Perl modules installed this way via CPAN did not work properly here.
The VIP is actually bound on both directors, so they need a heartbeat between them. The heartbeat link runs over eth1, connected with a crossover cable, which keeps it from affecting the other servers.
Configure heartbeat
Heartbeat has three configuration files:
ha.cf
authkeys
haresources
plus the configuration file for the ldirectord process:
ldirectord.cf
Four configuration files in total.
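ldirectord.cf itself is not reproduced in this article. Based on the ipvsadm output further down (VIP 192.168.131.105:3306, real servers .77 and .101, wrr scheduler, direct routing), it would look roughly like the sketch below; the check settings are assumptions, not the author's actual file:

```shell
# Write a guessed /etc/ha.d/ldirectord.cf (to /tmp here). "gate" means
# LVS direct routing, matching the "Route" forward method in the ipvsadm
# listing; checktimeout/checkinterval/checktype values are assumptions.
cat > /tmp/ldirectord.cf.sketch <<'EOF'
checktimeout=10
checkinterval=5
autoreload=yes
quiescent=yes

virtual=192.168.131.105:3306
        real=192.168.131.77:3306 gate 1
        real=192.168.131.101:3306 gate 1
        scheduler=wrr
        protocol=tcp
        checktype=connect
EOF
```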
#vi ha.cf
logfacility local0
bcast eth1
mcast eth1 225.0.0.1 694 1 0
auto_failback off
node ndb1
node ndb2
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
:wq
# vi authkeys
auth 3
3 md5 514a49f83820e34c877ff48770e48ea7
:wq
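The 32-character md5 key above can be generated from random data; one common way to do it (a sketch, not necessarily how the author produced his key):

```shell
# Generate a random key and print an authkeys file in the same format
KEY=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | awk '{print $1}')
printf 'auth 3\n3 md5 %s\n' "$KEY"
```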
# vi haresources
ndb1 \
ldirectord::ldirectord.cf \
LVSSyncDaemonSwap::master \
IPaddr2::192.168.131.105/24/eth0/192.168.131.255
On ndb2, change the hostname in this file accordingly.
:wq
Set permissions and enable heartbeat at boot
# chmod 600 /etc/ha.d/authkeys
#/sbin/chkconfig --level 2345 heartbeat on
#/sbin/chkconfig --del ldirectord
Start heartbeat:
/etc/init.d/ldirectord stop
/etc/init.d/heartbeat start
Check on the MD and BD whether the VIP is active:
ip addr sh eth0
[root@ndb1 ha.d]# ip addr sh eth0
2: eth0:
link/ether 00:30:48:28:c6:85 brd ff:ff:ff:ff:ff:ff
inet 192.168.131.164/24 brd 192.168.131.255 scope global eth0
inet 192.168.131.105/24 brd 192.168.131.255 scope global secondary eth0
inet6 fe80::230:48ff:fe28:c685/64 scope link
valid_lft forever preferred_lft forever
[root@ndb1 ha.d]#
[root@ndb2 ~]# ip addr sh eth0
2: eth0:
link/ether 00:30:48:28:c4:af brd ff:ff:ff:ff:ff:ff
inet 192.168.131.26/24 brd 192.168.131.255 scope global eth0
inet6 fe80::230:48ff:fe28:c4af/64 scope link
valid_lft forever preferred_lft forever
[root@ndb2 ~]#
The VIP is now active on the MD (.164).
Check the ldirectord process:
[root@ndb1 ha.d]# /usr/sbin/ldirectord ldirectord.cf status
ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596
[root@ndb1 ha.d]#
[root@ndb2 ~]# /usr/sbin/ldirectord ldirectord.cf status
ldirectord is stopped for /etc/ha.d/ldirectord.cf
[root@ndb2 ~]#
The director holding the VIP should report running; the standby should report stopped.
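Those two status strings are what monitoring scripts typically key on; a small sketch that classifies a director from its ldirectord status line (the strings come from the output above; the helper function itself is hypothetical):

```shell
# Classify a director as active or standby from its ldirectord status text
status_of() {
    case "$1" in
        *"is running with pid"*) echo active ;;
        *"is stopped"*)          echo standby ;;
        *)                       echo unknown ;;
    esac
}

status_of "ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596"   # → active
status_of "ldirectord is stopped for /etc/ha.d/ldirectord.cf"                  # → standby
```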
Use ipvsadm to check whether packet forwarding is active:
[root@ndb1 ha.d]# /sbin/ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.131.105:3306 wrr
-> 192.168.131.77:3306 Route 1 3 3034
-> 192.168.131.101:3306 Route 1 3 3038
[root@ndb1 ha.d]#
[root@ndb2 ~]# /sbin/ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
[root@ndb2 ~]#
Forwarding is now active on the MD.
Check the status of LVSSyncDaemonSwap on the MD and BD:
[root@ndb1 ha.d]# /etc/ha.d/resource.d/LVSSyncDaemonSwap master status
master running
(ipvs_syncmaster pid: 5689)
[root@ndb1 ha.d]#
[root@ndb2 ~]# /etc/ha.d/resource.d/LVSSyncDaemonSwap master status
master stopped
(ipvs_syncbackup pid: 5493)
[root@ndb2 ~]#
Likewise, the standby is in the stopped state.
Run the following on the real servers (RS):
ARP restrictions
The MD/BD answers ARP for the VIP and forwards packets to the real servers below; for this to work, the real servers themselves must be kept from answering ARP requests for the VIP.
#/etc/init.d/arptables_jf stop
#/usr/sbin/arptables-noarp-addr 192.168.131.105 start
#/etc/init.d/arptables_jf save
#/sbin/chkconfig --level 2345 arptables_jf on
#/etc/init.d/arptables_jf start
View the restriction chains:
[root@sql2 mysql-cluster]# /sbin/arptables -L -v -n
Chain IN (policy ACCEPT 29243 packets, 819K bytes)
pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro
54 1512 DROP * * 0.0.0.0/0 192.168.131.105 00/00 00/00 any 0000/0000 0000/0000 0000/0000
Chain OUT (policy ACCEPT 3931 packets, 110K bytes)
pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro
0 0 mangle * eth0 192.168.131.105 0.0.0.0/0 00/00 00/00 any 0000/0000 0000/0000 0000/0000 --mangle-ip-s 192.168.131.101
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro
[root@sql2 mysql-cluster]#
[root@sql1 ~]# /sbin/arptables -L -v -n
Chain IN (policy ACCEPT 29375 packets, 823K bytes)
pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro
54 1512 DROP * * 0.0.0.0/0 192.168.131.105 00/00 00/00 any 0000/0000 0000/0000 0000/0000
Chain OUT (policy ACCEPT 3903 packets, 109K bytes)
pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro
0 0 mangle * eth0 192.168.131.105 0.0.0.0/0 00/00 00/00 any 0000/0000 0000/0000 0000/0000 --mangle-ip-s 192.168.131.77
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target in out source-ip destination-ip source-hw destination-hw hlen op hrd pro
[root@sql1 ~]#
With this in place, ARP traffic for the VIP on each real server is governed by these chains.
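As the listings show, the arptables-noarp-addr wrapper boils down to two raw rules per RS: drop inbound ARP for the VIP, and rewrite the source IP of outbound ARP from the VIP to the RS's own address. A dry-run sketch of the equivalent raw commands (ARPTABLES defaults to echo here; RIP is sql2's address):

```shell
# Raw-rule equivalent of arptables-noarp-addr, inferred from the chain
# listing above. Dry run by default; set ARPTABLES=/sbin/arptables on a RS.
ARPTABLES="${ARPTABLES:-echo}"
VIP=192.168.131.105
RIP=192.168.131.101    # this real server's own IP

"$ARPTABLES" -A IN -d "$VIP" -j DROP
"$ARPTABLES" -A OUT -s "$VIP" -o eth0 -j mangle --mangle-ip-s "$RIP"
```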
Configure the real servers to accept packets addressed to the VIP
Run the following on all RS:
# cp /etc/sysconfig/network-scripts/ifcfg-lo /etc/sysconfig/network-scripts/ifcfg-lo:0
# vi /etc/sysconfig/network-scripts/ifcfg-lo\:0
DEVICE=lo:0
IPADDR=192.168.131.105
NETMASK=255.255.255.255
NETWORK=192.168.131.0
BROADCAST=192.168.131.255
ONBOOT=yes
NAME=loopback
:wq
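Creating the file above can be scripted for all RS at once; a sketch that writes the same content (to /tmp here rather than /etc/sysconfig/network-scripts):

```shell
# Write the lo:0 interface file shown above; target path shortened to /tmp
CFG="/tmp/ifcfg-lo:0"
cat > "$CFG" <<'EOF'
DEVICE=lo:0
IPADDR=192.168.131.105
NETMASK=255.255.255.255
NETWORK=192.168.131.0
BROADCAST=192.168.131.255
ONBOOT=yes
NAME=loopback
EOF
```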
#/sbin/ifup lo
Check lo:0:
[root@sql1 ~]# ip addr sh lo
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet 192.168.131.105/32 brd 192.168.131.255 scope global lo:0
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
[root@sql1 ~]#
[root@sql2 mysql-cluster]# ip addr sh lo
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet 192.168.131.105/32 brd 192.168.131.255 scope global lo:0
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
[root@sql2 mysql-cluster]#
Reboot the servers
Run the following on every server (first confirm the IPs and that no in-use services are running):
reboot
Start the MySQL Cluster, in this order:
ndb_mgmd -- on .164/.26
ndbd -- on .101/.77
mysqld -- on all nodes
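The order matters: management nodes first, then data nodes, then SQL nodes. A dry-run sketch of the sequence (RUN is echo for a dry run; on the real hosts, run the commands directly; the config.ini path is an assumption):

```shell
# Start-order sketch for the cluster; echo makes this safe to run anywhere
RUN="${RUN:-echo}"
$RUN ndb_mgmd -f /var/lib/mysql-cluster/config.ini   # on .164 and .26 first
$RUN ndbd                                            # then on .101 and .77
$RUN mysqld_safe                                     # finally on all SQL nodes
```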
Check that the services are healthy
Run the following on an ndb management node:
#ndb_mgm
[root@ndb1 ha.d]# ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: 192.168.131.164:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @192.168.131.77 (Version: 5.0.67, Nodegroup: 0, Master)
id=4 @192.168.131.101 (Version: 5.0.67, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @192.168.131.164 (Version: 5.0.67)
id=2 @192.168.131.26 (Version: 5.0.67)
[mysqld(API)] 7 node(s)
id=5 @192.168.131.101 (Version: 5.0.67)
id=6 @192.168.131.26 (Version: 5.0.67)
id=7 @192.168.131.164 (Version: 5.0.67)
id=8 @192.168.131.77 (Version: 5.0.67)
id=9 (not connected, accepting connect from any host)
id=10 (not connected, accepting connect from any host)
id=11 (not connected, accepting connect from any host)
ndb_mgm>
Everything looks normal.
Check that heartbeat failover works:
Shut down the BD and watch the log on the MD:
[root@ndb1 ha.d]# tail -f /var/log/messages
Dec 17 19:42:21 ndb1 heartbeat: [5462]: info: Received shutdown notice from 'ndb2'.
Dec 17 19:42:21 ndb1 heartbeat: [5462]: info: Resources being acquired from ndb2.
Dec 17 19:42:21 ndb1 harc[7085]: info: Running /etc/ha.d/rc.d/status status
Dec 17 19:42:21 ndb1 mach_down[7118]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec 17 19:42:21 ndb1 mach_down[7118]: info: mach_down takeover complete for node ndb2.
Dec 17 19:42:21 ndb1 heartbeat: [5462]: info: mach_down takeover complete.
Dec 17 19:42:21 ndb1 ldirectord[7153]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
Dec 17 19:42:21 ndb1 ldirectord[7153]: ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596
Dec 17 19:42:21 ndb1 ldirectord[7153]: Exiting from ldirectord status
Dec 17 19:42:21 ndb1 heartbeat: [7086]: info: Local Resource acquisition completed.
Dec 17 19:42:21 ndb1 harc[7175]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Dec 17 19:42:21 ndb1 ip-request-resp[7175]: received ip-request-resp ldirectord::ldirectord.cf OK yes
Dec 17 19:42:21 ndb1 ResourceManager[7196]: info: Acquiring resource group: ndb1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.131.105/24/eth0/192.168.131.255
Dec 17 19:42:22 ndb1 ldirectord[7223]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
Dec 17 19:42:22 ndb1 ldirectord[7223]: ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 5596
Dec 17 19:42:22 ndb1 ldirectord[7223]: Exiting from ldirectord status
Dec 17 19:42:22 ndb1 ResourceManager[7196]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Dec 17 19:42:23 ndb1 ldirectord[7245]: Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Dec 17 19:42:23 ndb1 IPaddr2[7291]: INFO: Running OK
If nothing abnormal shows up, everything is working.
Destructive tests
1) Test ndbd
Kill the ndbd process on either node and check in ndb_mgm whether that node shows as disconnected.
If it does, the failure was detected.
Then insert rows into a table, restart the ndbd you killed, and check whether the newly written rows were synchronized over. If they were, everything is fine.
2) Test heartbeat
Shut down the MD and watch the BD's reaction:
[root@ndb2 ~]# tail -f /var/log/messages
Dec 17 19:47:22 ndb2 harc[6862]: info: Running /etc/ha.d/rc.d/status status
Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Comm_now_up(): updating status to active
Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Local status now set to: 'active'
Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Starting child client "/usr/lib/heartbeat/ipfail" (498,496)
Dec 17 19:47:23 ndb2 heartbeat: [6879]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 498 gid 496 (pid 6879)
Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: remote resource transition completed.
Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: remote resource transition completed.
Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Local Resource acquisition completed. (none)
Dec 17 19:47:23 ndb2 heartbeat: [6852]: info: Initial resource acquisition complete (T_RESOURCES(them))
Dec 17 19:47:29 ndb2 ipfail: [6879]: info: Ping node count is balanced.
Dec 17 19:47:43 ndb2 heartbeat: [6852]: info: Received shutdown notice from 'ndb1'.
Dec 17 19:47:43 ndb2 heartbeat: [6852]: info: Resources being acquired from ndb1.
Dec 17 19:47:43 ndb2 heartbeat: [6884]: info: acquire all HA resources (standby).
Dec 17 19:47:43 ndb2 ResourceManager[6911]: info: Acquiring resource group: ndb2 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.131.105/24/eth0/192.168.131.255
Dec 17 19:47:43 ndb2 ldirectord[6957]: ldirectord is stopped for /etc/ha.d/ldirectord.cf
Dec 17 19:47:43 ndb2 ldirectord[6957]: Exiting with exit_status 3: Exiting from ldirectord status
Dec 17 19:47:43 ndb2 heartbeat: [6885]: info: Local Resource acquisition completed.
Dec 17 19:47:43 ndb2 ldirectord[6961]: ldirectord is stopped for /etc/ha.d/ldirectord.cf
Dec 17 19:47:43 ndb2 ldirectord[6961]: Exiting with exit_status 3: Exiting from ldirectord status
Dec 17 19:47:43 ndb2 ResourceManager[6911]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Dec 17 19:47:44 ndb2 ldirectord[6986]: Starting Linux Director v1.77.2.32 as daemon
Dec 17 19:47:44 ndb2 ldirectord[6988]: Added virtual server: 192.168.131.105:3306
Dec 17 19:47:44 ndb2 ldirectord[6988]: Quiescent real server: 192.168.131.101:3306 mapped from 192.168.131.101:3306 ( x 192.168.131.105:3306) (Weight set to 0)
Dec 17 19:47:44 ndb2 ldirectord[6988]: Quiescent real server: 192.168.131.77:3306 mapped from 192.168.131.77:3306 ( x 192.168.131.105:3306) (Weight set to 0)
Dec 17 19:47:44 ndb2 ResourceManager[6911]: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Dec 17 19:47:44 ndb2 kernel: IPVS: stopping sync thread 5493 ...
Dec 17 19:47:45 ndb2 kernel: IPVS: sync thread stopped!
Dec 17 19:47:45 ndb2 LVSSyncDaemonSwap[7050]: info: ipvs_syncbackup down
Dec 17 19:47:45 ndb2 kernel: IPVS: sync thread started: state = MASTER, mcast_ifn = eth0, syncid = 0
Dec 17 19:47:45 ndb2 LVSSyncDaemonSwap[7050]: info: ipvs_syncmaster up
Dec 17 19:47:45 ndb2 LVSSyncDaemonSwap[7050]: info: ipvs_syncmaster obtained
Dec 17 19:47:45 ndb2 IPaddr2[7102]: INFO: Resource is stopped
Dec 17 19:47:45 ndb2 ResourceManager[6911]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.131.105/24/eth0/192.168.131.255 start
Dec 17 19:47:45 ndb2 IPaddr2[7214]: INFO: ip -f inet addr add 192.168.131.105/24 brd 192.168.131.255 dev eth0
Dec 17 19:47:45 ndb2 avahi-daemon[2776]: Registering new address record for 192.168.131.105 on eth0.
Dec 17 19:47:45 ndb2 IPaddr2[7214]: INFO: ip link set eth0 up
Dec 17 19:47:45 ndb2 IPaddr2[7214]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.131.105 eth0 192.168.131.105 auto not_used not_used
Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:47:45 ndb2 IPaddr2[7185]: INFO: Success
Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:47:45 ndb2 heartbeat: [6884]: info: all HA resource acquisition completed (standby).
Dec 17 19:47:45 ndb2 heartbeat: [6852]: info: Standby resource acquisition done [all].
Dec 17 19:47:45 ndb2 harc[7277]: info: Running /etc/ha.d/rc.d/status status
Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:47:45 ndb2 last message repeated 14 times
Dec 17 19:47:45 ndb2 mach_down[7293]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:47:45 ndb2 mach_down[7293]: info: mach_down takeover complete for node ndb1.
Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:47:45 ndb2 heartbeat: [6852]: info: mach_down takeover complete.
Dec 17 19:47:45 ndb2 harc[7327]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Dec 17 19:47:45 ndb2 ip-request-resp[7327]: received ip-request-resp ldirectord::ldirectord.cf OK yes
Dec 17 19:47:45 ndb2 ResourceManager[7348]: info: Acquiring resource group: ndb2 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.131.105/24/eth0/192.168.131.255
Dec 17 19:47:45 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:47:46 ndb2 last message repeated 3 times
Dec 17 19:47:46 ndb2 ldirectord[7375]: ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 6988
Dec 17 19:47:46 ndb2 ldirectord[7375]: Exiting from ldirectord status
Dec 17 19:47:46 ndb2 ResourceManager[7348]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Dec 17 19:47:46 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:47:46 ndb2 last message repeated 6 times
Dec 17 19:47:46 ndb2 IPaddr2[7443]: INFO: Running OK
Dec 17 19:47:46 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:48:16 ndb2 last message repeated 289 times
Dec 17 19:48:16 ndb2 heartbeat: [6852]: WARN: node ndb1: is dead
Dec 17 19:48:16 ndb2 heartbeat: [6852]: info: Dead node ndb1 gave up resources.
Dec 17 19:48:16 ndb2 heartbeat: [6852]: info: Link ndb1:eth1 dead.
Dec 17 19:48:16 ndb2 ipfail: [6879]: info: Status update: Node ndb1 now has status dead
Dec 17 19:48:16 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:48:17 ndb2 last message repeated 8 times
Dec 17 19:48:17 ndb2 ipfail: [6879]: info: NS: We are dead. :<
Dec 17 19:48:17 ndb2 ipfail: [6879]: info: Link Status update: Link ndb1/eth1 now has status dead
Dec 17 19:48:17 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
Dec 17 19:48:17 ndb2 ipfail: [6879]: info: We are dead. :<
Dec 17 19:48:17 ndb2 ipfail: [6879]: info: Asking other side for ping node count.
Dec 17 19:48:18 ndb2 kernel: IPVS: ip_vs_wrr_schedule(): no available servers
If there are no errors, heartbeat has completed the failover.
Insert data again to verify; if writes still succeed, the configuration is fully working.
MySQL Cluster test report:
Test scripts were deployed on 192.168.8.48, which acts as a client reading from and writing to the database.
Test script 1:
[root@localhost mysql-cluster]# cat /data/pay.kingsoft.com/wwwroot/test.php
<?php
$link = mysql_connect('192.168.131.105', 'ldirector', 'xxxxxxxxx');
mysql_select_db('kingsoft',$link);
$sql = "insert into `preference`(`id`,`preferenceSerialNumber`,`username`,`preferenceTypeId`,`i***pired`,`isUsed`,`preferenceUsername`,`equalMoney`,`genDatetime`,`useDatetime`,`grantDatetime`,`expriedDatetime`) values ( NULL,'514a49f83820e34c877ff48770e48ea7','liujun','2','1','1','kingsoft','512.23','2008-12-03','2008-12-03','2008-12-03','2008-12-03')";
for($i = 0;$i < 100 ;$i++){
mysql_query($sql);
}
mysql_close($link);
?>
Test script 2:
[root@localhost mysql-cluster]# cat test.sh
#!/bin/sh
i=0;
j=0;
while [ $i -lt 1000 ]
do
wget -q
i=`expr $i + 1`;
done
sleep 2;
find . -name "test.php.*" | xargs rm -rf ;
while [ $j -lt 1000 ]
do
mysql -uldirector -pxxxxxxxxxxx -h192.168.131.105 -e "use kingsoft; insert into preference(preferenceSerialNumber,username,preferenceTypeId,preferenceUsername,equalMoney,genDatetime,useDatetime,grantDatetime,expriedDatetime) values('514a49f83820e34c877ff48770e48ea7','liujun2','3','liujun33333','33.8','2008-12-23 7:05:00','2008-12-23 7:15:00','2008-12-23 7:25:00','2008-12-23 7:35:00')";
j=`expr $j + 1`;
done
sleep 3;
server=`mysql -uldirector -pxxxxxxxxxx -h192.168.131.105 -e "use kingsoft;select count(*) from preference"`;
datetime=`date +%T`;
echo $datetime"----------"$server >> /tmp/mysql-cluster/mysql.log;
[root@localhost mysql-cluster]#
Test schedule:
Add this to the crontab on 192.168.8.48:
[root@localhost mysql-cluster]# crontab -e
*/3 * * * * sh /tmp/mysql-cluster/test.sh > /dev/null 2>&1
[root@localhost mysql-cluster]#
Run it continuously for 24 hours.
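With the */3 cron entry, test.sh fires every 3 minutes, so a 24-hour run triggers it up to 480 times:

```shell
# Invocations of a "*/3 * * * *" cron job over 24 hours
echo $(( 24 * 60 / 3 ))    # → 480
```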
Test results:
# cat mysql.log
14:31:54----------count(*) 21022
14:35:00----------count(*) 42634
14:37:57----------count(*) 63608
14:40:55----------count(*) 84708
14:43:55----------count(*) 105887
14:46:54----------count(*) 127045
14:49:58----------count(*) 148512
14:53:01----------count(*) 169795
14:56:27----------count(*) 190714
14:59:29----------count(*) 209921
15:02:03----------count(*) 231380
15:03:51----------count(*) 252231
15:05:12----------count(*) 269825
15:05:33----------count(*) 271824
15:08:05----------count(*) 291141
15:10:59----------count(*) 311836
15:14:00----------count(*) 332951
15:16:57----------count(*) 353841
15:19:59----------count(*) 374977
15:23:03----------count(*) 396181
15:26:01----------count(*) 417064
15:29:01----------count(*) 438098
15:32:03----------count(*) 459191
15:35:05----------count(*) 480229
15:38:05----------count(*) 501222
15:41:02----------count(*) 521868
15:43:59----------count(*) 542721
15:47:02----------count(*) 563841
16:00:32----------count(*) 698215
18:50:49----------count(*) 2105513
19:09:01----------count(*) 2105513
19:26:13----------count(*) 2105513
19:27:28----------count(*) 2105513
[root@localhost mysql-cluster]#
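The per-interval insert rate can be pulled straight out of mysql.log by differencing consecutive counts; a sketch, run here on the first two sample lines from the log above:

```shell
# Difference consecutive count(*) values from mysql.log-style lines.
# Field 2 is the count; field 1 is "HH:MM:SS----------count(*)".
cat > /tmp/mysql.sample.log <<'EOF'
14:31:54----------count(*) 21022
14:35:00----------count(*) 42634
EOF
awk '{n = $2; if (NR > 1) print n - prev; prev = n}' /tmp/mysql.sample.log   # → 21612
```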
Analysis of the results:
1) As load gradually increased, database load stayed modest: CPU usage was about 30%, while memory climbed from 600 MB to 2 GB and eventually hit its limit.
2) High write concurrency caused no database errors, showing that load balancing removed the bottleneck of a single overloaded server.
3) With only 2 GB of memory, the table fills up once it reaches a certain size ("table is full"); adding memory solves this.
4) MySQL Cluster provides high availability and load balancing, and tuning its parameters can make the service more stable.
5) MySQL Cluster 6.3 can be used to reduce ndbd memory usage.
6) MySQL Cluster performance is mediocre: slower than MySQL replication.
Things to watch out for:
1) The first time ndbd starts, or whenever config.ini changes, start it with the --initial flag.
2) Avoid manual intervention as much as possible; treat any problem with caution.
Below are notes and optimization tips on MySQL Cluster written by a foreign author:
1. Broken up into three parts. The MySQL servers sit separate from the NDB Storage Engine, which are storage nodes (NDB nodes). The third part is called a management server. The management server, oddly enough, isn’t required once the cluster is up and running unless you want to add another storage node.
2. Memory based storage engine. If you’re not using 5.1+ then you must have enough RAM in each storage node to store the data set. This means that if you have four machines with 4GB of RAM each you can store 8GB of data (16GB of total storage divided by two for two copies of the data set).
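Point 2's arithmetic: usable data equals total RAM across the storage nodes divided by the number of replicas (two copies of every fragment):

```shell
# 4 nodes x 4 GB RAM, 2 replicas -> 8 GB of usable data
NODES=4; RAM_GB=4; REPLICAS=2
echo $(( NODES * RAM_GB / REPLICAS ))    # → 8
```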
3. Storage nodes are static and pre-allocate resources on startup.
4. Supports transactions and row level locking.
5. Should be noted this is a storage engine so you can’t create MyISAM or InnoDB tables inside of a cluster.
6. Uses fixed sized records. This means if you have a varchar(255) and put a single byte into it that field is still using 255 bytes.
7. No foreign key constraint support.
8. Replication across nodes is synchronous. I assume this means that an INSERT completes once all of the nodes have completed the INSERT. This is different from regular replication, which is asynchronous and introduces race conditions.
9. Tables are divided into fragments (one fragment for each storage node). Each storage node is responsible for each fragment. Each fragment also has a secondary fragment, which is a copy of another node’s primary fragment. This data distribution happens automatically.
10. NDB takes your primary key, creates a hash and then converts that to a fragment. So you’ll have various rows on each different storage node.
11. A node group is a set of nodes that share the same fragment information. If you lose an entire node group you’ve lost half of the table and the cluster will not continue to operate. However, if one node in a node group fails there will still be enough data to keep the cluster up and running.
12. If a node fails and its secondary counterpart takes over, it will, essentially, have to perform the job of two nodes. Until a node has fully recovered it will not rejoin the cluster.
13. Backups are hot and non-locking. Each node writes its own set of backup files. No support for incremental backups.
14. Because it's memory based you could lose data on a system crash (as you might have transactions sitting in RAM when a crash occurs). The COMMIT command does not write changes to disk, meaning that you could have data sitting in memory that's not on disk when a node crashes. This means the odd truth is that MySQL Clusters support synchronous replication, but are not atomic.
15. NDB nodes will checkpoint data to disk (data + logs), which are used for system recovery. They write two logs, the UNDO and REDO logs.
16. They recommend using TRUNCATE to delete all rows from a table.
17. Modification operations are distributed to both the primary and secondary fragments (obviously).
18. NDB will run on 64-bit machines. They recommend Dual CPU 64-bit machines. NDB is threaded. Application nodes (MySQL servers) can be whatever.
19. SCI offers 30-100% better performance over gigabit.
20. They actually recommend avoiding joins and to denormalize your schemas. Are you kidding me? He actually said “Performance for joins sucks.”
Overall, I’m underwhelmed by MySQL Clustering. You’re limited in storage with the RAM and you can’t optimize your schemas due to fixed field sizes. And any RDBMS “solution” that recommends you denormalize puts me off.
That being said the actual technology is pretty interesting and I suspect that in a few years we’ll see the clustering features in MySQL come into their own. As of now I suspect few people would be able to justify the sacrifices for the gains clustering allows.
(The end)