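SUSE 9 heartbeat two-node cluster configuration
Environment: two SUSE 9 machines, each with Apache already set up
Hostnames: node1 (172.16.11.88/255.255.0.0) -- primary
node2 (172.16.12.89/255.255.0.0) -- standby
Heartbeat NICs: node1 - 10.10.10.200
node2 - 10.10.10.201
IP address configuration: omitted
Procedure:
1. On node1, set up the two configuration files /etc/hosts and /etc/HOSTNAME with the following contents: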
node1:/etc # more /etc/HOSTNAME
node1
node1:/etc # more /etc/hosts
........
........
10.10.10.200 node1.com node1
172.16.11.88 node1.com node1
2. On node2, set up the two configuration files /etc/hosts and /etc/HOSTNAME with the following contents:
node2:~ # more /etc/HOSTNAME
node2
node2:~ # more /etc/hosts
........
........
10.10.10.201 node2.com node2
172.16.12.89 node2.com node2
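3. Reboot both machines with init 6
4. SUSE was installed with the full package selection, so the heartbeat software is already present; only the configuration files need to be copied and edited.
All of this is done on node1 only; the finished files are then copied to the standby node2 with scp.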
node1:~ # rpm -q heartbeat -d
/usr/share/doc/packages/heartbeat/AUTHORS
/usr/share/doc/packages/heartbeat/COPYING
/usr/share/doc/packages/heartbeat/ChangeLog
/usr/share/doc/packages/heartbeat/DirectoryMap.txt
/usr/share/doc/packages/heartbeat/GettingStarted.html
/usr/share/doc/packages/heartbeat/GettingStarted.txt
/usr/share/doc/packages/heartbeat/HardwareGuide.html
.....
node1:~ # cd /usr/share/doc/packages/heartbeat
node1:/usr/share/doc/packages/heartbeat # ls
. COPYING GettingStarted.html HardwareGuide.txt Requirements.txt faqntips.html haresources rsync.html
.. ChangeLog GettingStarted.txt README apphbd.cf faqntips.txt heartbeat_api.html rsync.txt
AUTHORS DirectoryMap.txt HardwareGuide.html Requirements.html authkeys ha.cf heartbeat_api.txt startstop
node1:/usr/share/doc/packages/heartbeat # cp ha.cf /etc/ha.d/
node1:/usr/share/doc/packages/heartbeat # cp haresources /etc/ha.d/
node1:/usr/share/doc/packages/heartbeat # cp authkeys /etc/ha.d/
node1:/etc/ha.d # ls
. .. README.config authkeys conf ha.cf harc haresources rc.d resource.d shellfuncs
node1:/etc/ha.d # cp ha.cf ha.cf-bak
Edit ha.cf and uncomment the following directives (remove the leading #)
node1:/etc/ha.d # more ha.cf
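debugfile /var/log/ha-debug  File that stores heartbeat's debug messages
logfile /var/log/ha-log  heartbeat's log file
logfacility local0
keepalive 2  Interval between heartbeats; the default unit is seconds
deadtime 30  If no heartbeat arrives from the peer node within this interval, the peer is considered dead.
warntime 10  If no heartbeat arrives from the peer node within this interval, a warning is written to the log.
initdead 120  On some systems the network needs a while to come up after a boot or reboot; this option allows for that delay. It should be at least twice deadtime.
udpport 694  Port used for broadcast communication; 694 is the default.
baud 19200  Baud rate for serial communication.
serial /dev/ttyS0 # Linux  Serial device, used when the two nodes are connected by a serial cable. If the nodes are connected over Ethernet, this option should be disabled.
bcast eth1 # Linux  Network interface used for broadcast communication.
auto_failback on  The two heartbeat hosts act as primary and secondary node. Under normal conditions the primary holds the resources and runs all services.
With this option set to on, the primary automatically reacquires the resources as soon as it is running again.
node node1 -- modified
node node2 -- modified
ping 172.16.13.90  Used for the heartbeat connectivity test
respawn hacluster /usr/lib/heartbeat/ipfail  Process that is started and stopped together with heartbeat; it is monitored automatically.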
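The timing directives above depend on one another: the listing asks for initdead to be at least twice deadtime, and the sample values put warntime below deadtime so that the warning is logged before the peer is declared dead. The following is a minimal sketch of a check for those two relationships. check_ha_timing is a hypothetical helper, not part of heartbeat; it only handles whole-second values, and the warntime rule is an assumption read off the sample values rather than something the listing states.

```shell
#!/bin/sh
# Sketch: sanity-check timing directives in an ha.cf-style file.
# check_ha_timing is a hypothetical helper, not shipped with heartbeat.
check_ha_timing() {
    # $1: path to ha.cf; commented-out lines (#keepalive ...) are ignored
    awk '
        $1 == "warntime" { warntime = $2 }
        $1 == "deadtime" { deadtime = $2 }
        $1 == "initdead" { initdead = $2 }
        END {
            bad = 0
            # Assumption: a warning should fire before the peer is declared dead.
            if (warntime != "" && deadtime != "" && warntime + 0 >= deadtime + 0) {
                print "warntime should be below deadtime"
                bad = 1
            }
            # From the listing: initdead should be at least twice deadtime.
            if (initdead != "" && deadtime != "" && initdead + 0 < 2 * deadtime) {
                print "initdead should be at least 2 x deadtime"
                bad = 1
            }
            exit bad
        }
    ' "$1"
}
```

On a node this could be run as check_ha_timing /etc/ha.d/ha.cf; it prints nothing and returns 0 when both relationships hold.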
Edit haresources and add the following line
node1:/etc/ha.d # more haresources
node1 IPaddr::172.16.13.90 apache
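The haresources line names the preferred node first, followed by the resources it owns; a "::" separates a resource script from its arguments. Judging by the ha-log excerpt at the end of this document, the scripts are started left to right and stopped right to left. The sketch below prints the calls implied by such a line. resource_calls is a hypothetical helper for illustration only, and it assumes a single-line entry without continuation characters.

```shell
#!/bin/sh
# Sketch: list the resource-script calls implied by one haresources line.
# resource_calls is a hypothetical helper, not part of heartbeat.
resource_calls() {
    # $1: start or stop; $2: one haresources line
    action=$1
    set -- $2        # split the line into words
    shift            # drop the preferred node name
    calls=""
    for res in "$@"; do
        # "script::arg1::arg2" becomes "script arg1 arg2 <action>"
        call="$(printf '%s' "$res" | sed 's/::/ /g') $action"
        if [ "$action" = stop ]; then
            # stopping runs in reverse order, so prepend
            calls="$call
$calls"
        else
            calls="$calls$call
"
        fi
    done
    printf '%s' "$calls"
}
```

For example, resource_calls stop 'node1 IPaddr::172.16.13.90 apache' lists the apache call before the IPaddr call, which is the order the release messages in ha-log show.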
Edit authkeys and uncomment the following lines (remove the leading #)
node1:/etc/ha.d # more authkeys
auth 1
1 crc
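heartbeat checks the permissions of authkeys and expects the file to be readable by root only (mode 600), so a copy taken from the documentation directory may need chmod 600 /etc/ha.d/authkeys on both nodes. Note also that crc only detects corruption and does not authenticate the peer, which is usually acceptable only on a dedicated heartbeat link. A small sketch of a permission check follows; authkeys_mode_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that an authkeys file is a regular file with mode 0600.
# authkeys_mode_ok is a hypothetical helper, not part of heartbeat.
authkeys_mode_ok() {
    # $1: path to authkeys
    [ -f "$1" ] || return 1
    # first ten characters of ls -l are the type and permission bits
    perms=$(ls -ld "$1" | cut -c1-10)
    [ "$perms" = "-rw-------" ]
}
```

Typical use would be authkeys_mode_ok /etc/ha.d/authkeys || chmod 600 /etc/ha.d/authkeys.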
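5. Create the link for the script that starts and stops the HTTP service on failover
node1:/etc/ha.d/resource.d # mv apache apache-bak (repeat this step on node2)
node1:/etc/ha.d/resource.d # ln -s /var/local/apache/bin/apachectl apache (repeat this step on node2)
6. To make heartbeat start automatically after a reboot, create the following links (repeat this step on node2)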
node1:/etc/rc.d/rc0.d # ln -s /etc/init.d/heartbeat K05heartbeat
node1:/etc/rc.d/rc0.d # cd ..
node1:/etc/rc.d # cd rc3.d
node1:/etc/rc.d/rc3.d # ln -s /etc/init.d/heartbeat S75heartbeat
node1:/etc/rc.d/rc3.d # cd ..
node1:/etc/rc.d # cd rc5.d
node1:/etc/rc.d/rc5.d # ln -s /etc/init.d/heartbeat S75heartbeat
node1:/etc/rc.d/rc5.d # cd ..
node1:/etc/rc.d # cd rc6.d
node1:/etc/rc.d/rc6.d # ln -s /etc/init.d/heartbeat K05heartbeat
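The four links above follow one pattern: a K05 (kill) link in runlevels 0 and 6 and an S75 (start) link in runlevels 3 and 5. A sketch that creates the same links in a loop is shown below. link_heartbeat is a hypothetical helper; the rc directory and the init script path are passed in as arguments so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch: create the same K05/S75 links as the manual steps above.
# link_heartbeat is a hypothetical helper name.
link_heartbeat() {
    # $1: rc directory root (/etc/rc.d in the steps above)
    # $2: init script path (/etc/init.d/heartbeat in the steps above)
    for level in 0 6; do
        ln -sf "$2" "$1/rc$level.d/K05heartbeat"   # stop on halt and reboot
    done
    for level in 3 5; do
        ln -sf "$2" "$1/rc$level.d/S75heartbeat"   # start in multi-user runlevels
    done
}
```

On each node this would be called as link_heartbeat /etc/rc.d /etc/init.d/heartbeat.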
7. Use scp to copy the finished configuration files to node2
node1:/etc/ha.d # scp ha.cf haresources authkeys node2:/etc/ha.d/
8. Test 1
Start heartbeat on node1
node1:/etc/init.d # ./heartbeat start
Ping the cluster service address from the physical host machine; it does not answer right away:
C:\Documents and Settings\Administrator>ping 172.16.13.90 -t
Pinging 172.16.13.90 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 172.16.13.90: bytes=32 time=3090ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Check the IP addresses on node1: the cluster address is now bound
node1:/etc/ha.d # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:5C
inet addr:172.16.11.88 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::20c:29ff:fe2f:8d5c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:22532 errors:0 dropped:0 overruns:0 frame:0
TX packets:6594 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2399073 (2.2 Mb) TX bytes:867317 (846.9 Kb)
Interrupt:5 Base address:0x2000
eth0:1 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:5C
inet addr:172.16.13.90 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:5 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:66
inet addr:10.10.10.200 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe2f:8d66/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:26506 errors:0 dropped:0 overruns:0 frame:0
TX packets:412 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8694184 (8.2 Mb) TX bytes:59800 (58.3 Kb)
Interrupt:9 Base address:0x2080
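Scanning the full ifconfig listing by eye gets tedious when the test is repeated on both nodes. As a sketch, the check can be reduced to a single grep over the same output. holds_ip is a hypothetical helper, and it assumes the "inet addr:" format printed by the ifconfig version shown above.

```shell
#!/bin/sh
# Sketch: report whether a given address appears in ifconfig output.
# holds_ip is a hypothetical helper name.
holds_ip() {
    # $1: IPv4 address; stdin: output of ifconfig -a
    # The trailing space keeps 172.16.13.9 from matching 172.16.13.90.
    grep -F -q "inet addr:$1 "
}
```

Usage on either node: ifconfig -a | holds_ip 172.16.13.90 && echo "cluster IP is bound here".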
The heartbeat service is also running
node1:/etc/ha.d # ps -ef|grep heartbeat
root 9396 1 0 09:29 ? 00:00:00 heartbeat: heartbeat: master control process
nobody 9398 9396 0 09:29 ? 00:00:00 heartbeat: heartbeat: FIFO reader
nobody 9399 9396 0 09:29 ? 00:00:00 heartbeat: heartbeat: write: serial /dev/ttyS0
nobody 9400 9396 0 09:29 ? 00:00:00 heartbeat: heartbeat: read: serial /dev/ttyS0
nobody 9401 9396 0 09:29 ? 00:00:00 heartbeat: heartbeat: write: bcast eth1
nobody 9402 9396 0 09:29 ? 00:00:00 heartbeat: heartbeat: read: bcast eth1
nobody 9403 9396 0 09:29 ? 00:00:00 heartbeat: heartbeat: write: ping 172.16.13.90
nobody 9404 9396 0 09:29 ? 00:00:00 heartbeat: heartbeat: read: ping 172.16.13.90
haclust 9441 9396 0 09:31 ? 00:00:00 /usr/lib/heartbeat/ipfail
Start heartbeat on node2; its IP addresses do not change
node2:/etc/init.d # ./heartbeat start
Starting High-Availability servicesheartbeat: 2009/08/21_09:37:42 info: **************************
heartbeat: 2009/08/21_09:37:43 info: Configuration validated. Starting heartbeat 1.2.2
done
node2:/etc/init.d # ps -ef|grep heartbeat
root 9282 1 0 09:37 ? 00:00:00 heartbeat: heartbeat: master control process
nobody 9284 9282 0 09:37 ? 00:00:00 heartbeat: heartbeat: FIFO reader
nobody 9285 9282 0 09:37 ? 00:00:00 heartbeat: heartbeat: write: serial /dev/ttyS0
nobody 9286 9282 0 09:37 ? 00:00:00 heartbeat: heartbeat: read: serial /dev/ttyS0
nobody 9287 9282 0 09:37 ? 00:00:00 heartbeat: heartbeat: write: bcast eth1
nobody 9288 9282 0 09:37 ? 00:00:00 heartbeat: heartbeat: read: bcast eth1
nobody 9289 9282 0 09:37 ? 00:00:00 heartbeat: heartbeat: write: ping 172.16.13.90
nobody 9290 9282 0 09:37 ? 00:00:00 heartbeat: heartbeat: read: ping 172.16.13.90
haclust 9292 9282 0 09:37 ? 00:00:00 /usr/lib/heartbeat/ipfail
root 9319 8594 0 09:41 pts/2 00:00:00 grep heartbeat
node2:/etc/init.d # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:B6:91:D5
inet addr:172.16.12.89 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::20c:29ff:feb6:91d5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21262 errors:0 dropped:0 overruns:0 frame:0
TX packets:1644 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2281613 (2.1 Mb) TX bytes:222878 (217.6 Kb)
Interrupt:5 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:0C:29:B6:91:DF
inet addr:10.10.10.201 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:feb6:91df/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:23005 errors:0 dropped:0 overruns:0 frame:0
TX packets:44 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2449417 (2.3 Mb) TX bytes:2656 (2.5 Kb)
Interrupt:9 Base address:0x2080
Failover test
Stop heartbeat on node1 and check the IP addresses: the cluster address has been released
node1:/etc/init.d # ./heartbeat stop
Stopping High-Availability services done
node1:/etc/init.d # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:5C
inet addr:172.16.11.88 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::20c:29ff:fe2f:8d5c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:27236 errors:0 dropped:0 overruns:0 frame:0
TX packets:7708 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2890393 (2.7 Mb) TX bytes:996856 (973.4 Kb)
Interrupt:5 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:66
inet addr:10.10.10.200 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe2f:8d66/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:30427 errors:0 dropped:0 overruns:0 frame:0
TX packets:741 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9087419 (8.6 Mb) TX bytes:110217 (107.6 Kb)
Interrupt:9 Base address:0x2080
Back on node2, the cluster address is now bound there, and the ping was never interrupted
node2:/etc/init.d # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:B6:91:D5
inet addr:172.16.12.89 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::20c:29ff:feb6:91d5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:24140 errors:0 dropped:0 overruns:0 frame:0
TX packets:2131 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2590821 (2.4 Mb) TX bytes:295284 (288.3 Kb)
Interrupt:5 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:B6:91:D5
inet addr:172.16.13.90 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:5 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:0C:29:B6:91:DF
inet addr:10.10.10.201 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:feb6:91df/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:25335 errors:0 dropped:0 overruns:0 frame:0
TX packets:257 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2690249 (2.5 Mb) TX bytes:35614 (34.7 Kb)
Interrupt:9 Base address:0x2080
C:\Documents and Settings\Administrator>ping 172.16.13.90 -t
Pinging 172.16.13.90 with 32 bytes of data:
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Start heartbeat on node1 again: node1 takes the cluster address back, and again the ping is not interrupted
node1:/etc/init.d # ./heartbeat start
Starting High-Availability servicesheartbeat: 2009/08/21_09:48:38 info: **************************
heartbeat: 2009/08/21_09:48:38 info: Configuration validated. Starting heartbeat 1.2.2
node1:/etc/init.d # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:5C
inet addr:172.16.11.88 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::20c:29ff:fe2f:8d5c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:29224 errors:0 dropped:0 overruns:0 frame:0
TX packets:7782 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3087130 (2.9 Mb) TX bytes:1006644 (983.0 Kb)
Interrupt:5 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:5C
inet addr:172.16.13.90 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:5 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:66
inet addr:10.10.10.200 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe2f:8d66/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:32428 errors:0 dropped:0 overruns:0 frame:0
TX packets:767 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9283107 (8.8 Mb) TX bytes:113956 (111.2 Kb)
Interrupt:9 Base address:0x2080
C:\Documents and Settings\Administrator>ping 172.16.13.90 -t
Pinging 172.16.13.90 with 32 bytes of data:
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
node2 has now released the cluster IP
node2:/etc/init.d # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:B6:91:D5
inet addr:172.16.12.89 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::20c:29ff:feb6:91d5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:25996 errors:0 dropped:0 overruns:0 frame:0
TX packets:2277 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2774784 (2.6 Mb) TX bytes:313663 (306.3 Kb)
Interrupt:5 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:0C:29:B6:91:DF
inet addr:10.10.10.201 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:feb6:91df/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:27078 errors:0 dropped:0 overruns:0 frame:0
TX packets:430 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2859383 (2.7 Mb) TX bytes:62097 (60.6 Kb)
Interrupt:9 Base address:0x2080
Reboot node1: this time the ping is briefly interrupted
node1:/etc/rc.d/rc6.d # init 6
node1:/etc/rc.d/rc6.d #
C:\Documents and Settings\Administrator>ping 172.16.13.90 -t
Pinging 172.16.13.90 with 32 bytes of data:
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
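To compare outages between test runs, the lost probes can be counted from a saved copy of the ping output instead of by eye. The sketch below assumes the Windows ping wording shown above; count_timeouts is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count lost probes in saved Windows ping output.
# count_timeouts is a hypothetical helper name.
count_timeouts() {
    # stdin: saved ping output; prints the number of "Request timed out" lines
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -c 'Request timed out' || true
}
```

Fed the transcript above, this counts the run of timed-out requests seen while node1 was rebooting.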
While node1 is rebooting, node2 temporarily takes over the cluster IP
node2:~ # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:B6:91:D5
inet addr:172.16.12.89 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::20c:29ff:feb6:91d5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:30489 errors:0 dropped:0 overruns:0 frame:0
TX packets:2899 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3254986 (3.1 Mb) TX bytes:397698 (388.3 Kb)
Interrupt:5 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:B6:91:D5
inet addr:172.16.13.90 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:5 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:0C:29:B6:91:DF
inet addr:10.10.10.201 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:feb6:91df/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:31056 errors:0 dropped:0 overruns:0 frame:0
TX packets:798 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3255139 (3.1 Mb) TX bytes:119282 (116.4 Kb)
Interrupt:9 Base address:0x2080
Once node1 has finished booting, the cluster IP returns to node1
node1:~ # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:5C
inet addr:172.16.11.88 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::20c:29ff:fe2f:8d5c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:689 errors:0 dropped:0 overruns:0 frame:0
TX packets:175 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:77787 (75.9 Kb) TX bytes:15691 (15.3 Kb)
Interrupt:5 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:5C
inet addr:172.16.13.90 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:5 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:0C:29:2F:8D:66
inet addr:10.10.10.200 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe2f:8d66/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:519 errors:0 dropped:0 overruns:0 frame:0
TX packets:55 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:52692 (51.4 Kb) TX bytes:7923 (7.7 Kb)
Interrupt:9 Base address:0x2080
node2 has released the cluster IP
node2:~ # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:B6:91:D5
inet addr:172.16.12.89 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::20c:29ff:feb6:91d5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:30795 errors:0 dropped:0 overruns:0 frame:0
TX packets:2937 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3287332 (3.1 Mb) TX bytes:403118 (393.6 Kb)
Interrupt:5 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 00:0C:29:B6:91:DF
inet addr:10.10.10.201 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:feb6:91df/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:31293 errors:0 dropped:0 overruns:0 frame:0
TX packets:833 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3278826 (3.1 Mb) TX bytes:124863 (121.9 Kb)
Interrupt:9 Base address:0x2080
At this point the ping shows no interruption
C:\Documents and Settings\Administrator>ping 172.16.13.90 -t
Pinging 172.16.13.90 with 32 bytes of data:
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
Reply from 172.16.13.90: bytes=32 time<1ms TTL=64
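Note: stop node2 first, then stop node1.
If node2 is then started first, it takes about one minute before node2 acquires the cluster address.
When node1 is started afterwards, node2 releases the cluster address and node1 acquires it.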
9. Test 2: HTTP service failover
On node1, confirm that httpd is already running
node1:/etc/init.d # ps -ef|grep httpd
root 17653 1 0 14:48 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 17654 17653 0 14:48 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 17655 17653 0 14:48 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 17656 17653 0 14:48 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 17657 17653 0 14:48 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 17661 17653 0 14:48 ? 00:00:00 /var/local/apache/bin/httpd -k start
root 17663 6349 0 14:48 pts/0 00:00:00 grep httpd
Stop heartbeat on node1; httpd is shut down along with it
node1:/etc/init.d # ./heartbeat stop
Stopping High-Availability services done
node1:/etc/init.d # ps -ef|grep httpd
root 18705 6349 0 15:18 pts/0 00:00:00 grep httpd
Back on node2, httpd has been started by heartbeat; it was not running before
node2:/etc/ha.d # ps -ef|grep httpd
root 22995 1 0 15:22 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 22996 22995 0 15:22 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 22997 22995 0 15:22 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 22998 22995 0 15:22 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 23008 22995 0 15:22 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 23009 22995 0 15:22 ? 00:00:00 /var/local/apache/bin/httpd -k start
root 23106 8594 0 15:23 pts/2 00:00:00 grep httpd
Start heartbeat on node1 again; httpd is started at the same time
node1:/etc/init.d # ./heartbeat start
Starting High-Availability servicesheartbeat: 2009/08/21_15:21:12 info: **************************
heartbeat: 2009/08/21_15:21:12 info: Configuration validated. Starting heartbeat 1.2.2
done
node1:/etc/init.d #
node1:/etc/init.d # ps -ef|grep httpd
root 18905 1 0 15:21 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 18906 18905 0 15:21 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 18907 18905 0 15:21 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 18908 18905 0 15:21 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 18909 18905 0 15:21 ? 00:00:00 /var/local/apache/bin/httpd -k start
daemon 18913 18905 0 15:21 ? 00:00:00 /var/local/apache/bin/httpd -k start
root 18915 6349 0 15:21 pts/0 00:00:00 grep httpd
httpd on node2 has been shut down
node2:/etc/ha.d # ps -ef|grep httpd
root 24098 8594 0 15:25 pts/2 00:00:00 grep httpd
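In the listings above the grep process itself always shows up in the result, which makes "is httpd running?" awkward to script. One common workaround, sketched here, is a bracket expression that the grep command line does not match. httpd_running is a hypothetical helper, and the bin/httpd pattern assumes the /var/local/apache install path used in this setup.

```shell
#!/bin/sh
# Sketch: detect a running httpd from ps -ef output.
# httpd_running is a hypothetical helper name.
httpd_running() {
    # stdin: output of ps -ef
    # [h]ttpd matches "httpd" but not the literal text "[h]ttpd" in grep's own arguments
    grep -q 'bin/[h]ttpd'
}
```

Usage on either node: ps -ef | httpd_running && echo "httpd is up".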
Messages like the following always appear in the log during a switchover
node1:/var/log # tail -f ha-log
heartbeat: 2009/08/21_15:21:18 info: Acquiring resource group: node1 IPaddr::172.16.13.90 apache
heartbeat: 2009/08/21_15:21:18 info: Running /etc/ha.d/resource.d/IPaddr 172.16.13.90 start
heartbeat: 2009/08/21_15:18:12 info: Releasing resource group: node1 IPaddr::172.16.13.90 apache
heartbeat: 2009/08/21_15:18:12 info: Running /etc/ha.d/resource.d/apache stop
heartbeat: 2009/08/21_15:18:12 info: Running /etc/ha.d/resource.d/IPaddr 172.16.13.90 stop
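When ha-log grows, the acquire and release messages are the quickest way to reconstruct which node held the resources and when. A minimal sketch that extracts only those transitions is given below. resource_events is a hypothetical helper, and it assumes the log line layout shown above (heartbeat 1.2.2).

```shell
#!/bin/sh
# Sketch: print "timestamp action" for each resource-group transition.
# resource_events is a hypothetical helper name.
resource_events() {
    # $1: path to ha-log
    # fields: $1 "heartbeat:", $2 timestamp, $3 "info:", $4 Acquiring or Releasing
    awk '/info: (Acquiring|Releasing) resource group:/ { print $2, $4 }' "$1"
}
```

On node1 this would be called as resource_events /var/log/ha-log.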
Debug log
node1:/var/log # tail -f ha-debug
ls: /var/lib/heartbeat/rsctmp/IPaddr/eth0:*: No such file or directory
heartbeat: 2009/08/21_15:21:19 debug: /etc/ha.d/resource.d/IPaddr 172.16.13.90 start done. RC=0
/etc/ha.d/resource.d/apache: line 94: lynx: command not found
heartbeat: 2009/08/21_15:21:19 debug: Starting /etc/ha.d/resource.d/apache start
heartbeat: 2009/08/21_15:21:19 debug: /etc/ha.d/resource.d/apache start done. RC=0
heartbeat: 2009/08/21_15:32:45 debug: Starting /etc/ha.d/resource.d/apache stop
heartbeat: 2009/08/21_15:32:45 debug: /etc/ha.d/resource.d/apache stop done. RC=0
heartbeat: 2009/08/21_15:32:45 debug: Starting /etc/ha.d/resource.d/IPaddr 172.16.13.90 stop
SIOCDELRT: No such process