Linux HA Cluster:
High availability: a server can go down for any number of reasons, and any single failed component carries risk. Downtime is usually very costly, especially for web sites, so "high availability" means that when one of the servers providing a service goes down, the service itself does not stop.
Heartbeat: multiple servers are connected over a network, and each server keeps sending a very short, very small "I am still online" message to the standby hosts on the same network, telling them that it is still up; this matters most for the primary server. Any host that receives the heartbeat considers the sender to be online.
How are heartbeats sent, and who receives them? It is inter-process communication: two hosts cannot talk to each other directly, they can only use the network, with a process listening on a socket to send and receive data. So every server has to run the same kind of process, and these processes keep talking to each other: the primary node continuously sends its heartbeat to the matching process on the peer node. This software forms the base layer of an HA cluster, also called the heartbeat (messaging) layer, or the message and transaction transport layer. It is a daemon running on every node of the cluster and has to be started before the hosts can exchange messages, which normally flow from the primary node to the standby node.
Resources: taking a web service as an example, the VIP is a resource, the web service is a resource, and the web pages are resources too; one service consists of several resources. Shared storage for the web service is also a resource, and different services need different resources. Shared storage is the hardest problem to solve in an HA cluster.
If the primary node fails, how do the standby nodes choose one of themselves to take over and provide the service? The mechanism that decides which standby node should take over is called the cluster transaction decision process.
ha_aware:
ha_aware: an application that can itself use the underlying heartbeat/messaging layer to complete the cluster transaction decision process is called ha_aware.
non-ha_aware:
DC: Designated Coordinator
DC: Designated Coordinator. When the host the DC runs on fails, a new DC is elected first, and the DC then makes the transaction decisions.
Note: the most fundamental, lowest-level unit of management in an HA cluster is the resource; resources are combined together to form a service.
In an HA cluster, no resource should start on its own; whether a resource is started is decided by the CRM:
CRM: Cluster Resources Manager
CRM: Cluster Resources Manager, the component that actually makes the decisions.
LRM: Local Resources Manager
heartbeat v1 already had the concept of resource management; in v1 the resource manager shipped with heartbeat itself and was called haresources. Its configuration interface is a configuration file, also named haresources.
In heartbeat v2, heartbeat was greatly improved: the resource manager can run as an independent process and accept user requests; it is called crm. At runtime each node runs a process called crmd, which listens on a socket on port 5560, so the server side is crmd and the client side is crm (also known as the crm shell), a command-line interface used to talk to the server-side CRM. heartbeat also has a graphical tool, heartbeat-GUI, which can be used for configuration.
In heartbeat v3, the project was split into three independent projects: heartbeat, pacemaker, and cluster-glue. With the architecture separated, the components can be combined with other software.
RA: resource agent
{ start|stop|restart|status}
status:
running: the resource is running
stopped: the resource is stopped
failover: fail over (move the service to another node when the active node fails)
failback: fail back (move the service back to its original node after recovery)
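Conceptually, an LSB resource agent is just an init script that understands the { start|stop|restart|status } actions above. A minimal sketch, assuming a placeholder service name, daemon path and pid file (nothing installed in the walkthrough below):
#!/bin/bash
# Minimal LSB-style agent sketch: myservice {start|stop|restart|status}
PIDFILE=/var/run/myservice.pid
case "$1" in
  start)   /usr/sbin/myserviced & echo $! > "$PIDFILE" ;;    # launch the (placeholder) daemon in the background
  stop)    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE" ;;
  restart) $0 stop; $0 start ;;
  status)  if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
             echo "myservice is running"                     # the CRM relies on accurate status output
           else
             echo "myservice is stopped"
           fi ;;
  *)       echo "Usage: $0 {start|stop|restart|status}"; exit 1 ;;
esac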
Message Layer:
heartbeat v1,v2,v3
(OpenAIS), corosync
cman
CRM:
heartbeat v1: haresources (configuration interface: a configuration file, also named haresources)
heartbeat v2: crm (each node runs the crmd process; configuration interfaces: the crmsh client (crm shell) and heartbeat-GUI)
heartbeat v3 = heartbeat + pacemaker + cluster-glue:
pacemaker: configuration interfaces:
CLI: crm(SuSE),pcs
GUI:hawk,LCMC,pacemaker-mgmt
cman + rgmanager:
resource group manager: Failover Domain
configuration interface:
RHCS: RedHat Cluster Suite
configuration interface: Conga (a full-lifecycle configuration interface)
RA types:
heartbeat legacy: heartbeat's traditional RA type
LSB: /etc/rc.d/init.d/*
OCF: Open Cluster Framework
provider: pacemaker
linbit
STONITH:
keepalived: vrrp
keepalived+ipvs
keepalived+haproxy
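keepalived achieves high availability of the load balancer itself through VRRP; a minimal vrrp_instance sketch of its configuration (the interface, router ID, password and VIP below are illustrative placeholders, not part of the walkthrough that follows):
vrrp_instance VI_1 {
    state MASTER              # BACKUP on the standby director
    interface eth0            # placeholder interface
    virtual_router_id 51      # must match on both directors
    priority 100              # lower value on the standby director
    advert_int 1              # VRRP advertisement interval in seconds
    authentication {
        auth_type PASS
        auth_pass secret      # placeholder password
    }
    virtual_ipaddress {
        192.168.0.200/24      # placeholder VIP
    }
}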
RHEL or CentOS HA cluster solutions:
5:
bundled: RHCS: cman+rgmanager
third-party options: corosync+pacemaker, heartbeat (v1 or v2), keepalived
6:
bundled: RHCS (cman+rgmanager)
corosync+rgmanager
cman+pacemaker
heartbeat v3 + pacemaker
keepalived
Typical applications:
HA for the front-end load balancer: keepalived
large-scale HA clusters: corosync (or cman) + pacemaker
Resource fencing:
node level:
STONITH
resource level:
Number of HA nodes: more than 2, and an odd total:
HA cluster working models:
A/P: two nodes, active/passive model;
N-M: N > M; N nodes and M services; M nodes are active and N-M are on standby;
N-N: N nodes, N services;
A-A: dual-active (active/active) model;
Ways of transferring resources:
rgmanager:failover domain, priority
pacemaker:
resource stickiness:
resource constraints (3 types):
location constraint: which node a resource prefers to run on;
inf: positive infinity
n: a positive score
-n: a negative score
-inf: negative infinity
colocation constraint: how strongly resources prefer to run on the same node;
inf:
-inf:
order constraint: the order in which resources are started and stopped:
Example: how can the three resources of a web service (the vip, httpd and the filesystem) be made to run on the same node? (see the crm shell sketch after this list)
1. colocation constraints:
2. a resource group (resource group):
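As a sketch of what these two options look like in the pacemaker crm shell (the resource names webip, webstore and webserver are illustrative here; the walkthrough below uses heartbeat's haresources and hb_gui instead):
# Option 2: a resource group keeps the three primitives together and orders them
crm configure group webservice webip webstore webserver
# Option 1: the same intent expressed with constraints
crm configure colocation web_together inf: webserver webstore webip
crm configure order web_order inf: webip webstore webserver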
What should be done with the resources running on a node when that node is no longer a member of the cluster?
stopped
ignore
freeze
suicide
Should a resource be started right after it has been configured?
target-role?
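In pacemaker these two policies roughly correspond to the no-quorum-policy cluster property and the target-role meta attribute; a hedged crm shell sketch (the resource and values are illustrative):
# policy for resources when the partition no longer has quorum
crm configure property no-quorum-policy=stop        # stop | ignore | freeze | suicide
# keep a newly defined (illustrative) resource stopped until it is started explicitly
crm configure primitive webip ocf:heartbeat:IPaddr params ip=192.168.0.200 meta target-role=Stopped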
RA types:
heartbeat legacy
LSB
OCF
STONITH
Resource types:
primitive, native: a primitive resource; it can run on only one node
group: a group resource:
clone: a clone resource;
the total number of clones and the maximum number of clones per node can be set;
e.g. stonith devices, cluster filesystems
master/slave: a master/slave resource
To use crm as the cluster resource manager in heartbeat v2, add the following to ha.cf:
crm respawn
crm listens on 5560/tcp through the mgmtd process
The host that will run hb_gui needs a password set for the hacluster user, which is then used to log in to hb_gui
with quorum: the required number of votes is met
without quorum: the required number of votes is not met
Installing and configuring an HA cluster:
1. Node names: every node in the cluster must be able to resolve the names of all the others
/etc/hosts
The forward and reverse resolution of the host names in hosts must match the output of "uname -n" (see the quick check after this list)
2. Time must be synchronized
Synchronize the time against a network time server
3. The nodes must be able to communicate with each other using SSH key-based authentication:
4. Disable SELinux first
5. Remember to open the required ports in the firewall (see the sketch after this list)
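Two quick sketches for items 1 and 5. The name check simply confirms that the name heartbeat will use resolves locally; the firewall rule assumes iptables and the heartbeat UDP port 694 configured in ha.cf below:
# item 1: on every node, the node name must be resolvable via /etc/hosts
uname -n
grep "$(uname -n)" /etc/hosts
# item 5: allow heartbeat traffic (udpport 694 in ha.cf); adjust to your own firewall setup
iptables -A INPUT -p udp --dport 694 -j ACCEPT
service iptables save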
I. heartbeat v1 (haresources)
Example:
node1: 192.168.0.111
node2: 192.168.0.112
1. Set the node names
node1:
[root@node1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.111 node1.shamereedwine.com
192.168.0.112 node2.shamereedwine.com
[root@node1 ~]# hostname
node1.shamereedwine.com
node2:
[root@node2 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.111 node1.shamereedwine.com
192.168.0.112 node2.shamereedwine.com
[root@node2 ~]# hostname
node2.shamereedwine.com
2. Verify that the nodes can resolve each other's names
node1:
[root@node1 ~]# ping node2.shamereedwine.com
PING node2.shamereedwine.com (192.168.0.112) 56(84) bytes of data.
64 bytes from node2.shamereedwine.com (192.168.0.112): icmp_seq=1 ttl=64 time=1.99 ms
64 bytes from node2.shamereedwine.com (192.168.0.112): icmp_seq=2 ttl=64 time=0.546 ms
64 bytes from node2.shamereedwine.com (192.168.0.112): icmp_seq=3 ttl=64 time=0.425 ms
node2:
[root@node2 ~]# ping node1.shamereedwine.com
PING node1.shamereedwine.com (192.168.0.111) 56(84) bytes of data.
64 bytes from node1.shamereedwine.com (192.168.0.111): icmp_seq=1 ttl=64 time=9.99 ms
64 bytes from node1.shamereedwine.com (192.168.0.111): icmp_seq=2 ttl=64 time=0.544 ms
64 bytes from node1.shamereedwine.com (192.168.0.111): icmp_seq=3 ttl=64 time=0.544 ms
3. Synchronize the time on the nodes
node1:
ntpdate 133.100.11.8
node2:
ntpdate 133.100.11.8
4. Disable SELinux
node1:
[root@node1 ~]# vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
setenforce 0
node2:
[root@node2 ~]# vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
setenforce 0
5. Set up SSH key-based authentication between the nodes:
Generate a key pair on each node and copy the public key to the other server.
node1:
[root@node1 ~]# ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
83:42:29:02:d4:af:df:39:f1:2b:26:13:d0:2c:aa:f4 root@node1.shamereedwine.com
The key's randomart image is:
+--[ RSA 2048]----+
|o.. |
|. . . |
|. . * |
| . = + . |
| . = . S |
| o . o . . |
|o . . o + |
|. E + * . |
| + o.. |
+-----------------+
[root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2.shamereedwine.com
The authenticity of host 'node2.shamereedwine.com (192.168.0.112)' can't be established.
RSA key fingerprint is 96:59:fe:d6:49:44:89:d4:9c:78:c4:da:0a:81:aa:ef.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2.shamereedwine.com,192.168.0.112' (RSA) to the list of known hosts.
root@node2.shamereedwine.com's password:
Now try logging into the machine, with "ssh 'root@node2.shamereedwine.com'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
node2:
[root@node2 ~]# ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
fd:a1:91:70:9b:ad:1b:e6:a3:82:5d:79:f7:e8:1d:05 root@node2.shamereedwine.com
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| . . E |
| + = . |
| S.* o .|
| o .=.. . |
| o . .=..o. |
| . o o.o.... |
| ...oo. . |
+-----------------+
[root@node2 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node1.shamereedwine.com
The authenticity of host 'node1.shamereedwine.com (192.168.0.111)' can't be established.
RSA key fingerprint is ef:16:2d:b3:58:58:71:a5:ba:bd:e7:0a:ff:c1:b5:7e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1.shamereedwine.com,192.168.0.111' (RSA) to the list of known hosts.
root@node1.shamereedwine.com's password:
Now try logging into the machine, with "ssh 'root@node1.shamereedwine.com'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
6. Install heartbeat
node1:
First install the EPEL yum repository
wget
rpm -ivh epel-release-6-8.noarch.rpm
Then install the following dependencies
[root@node1 ~]# yum install perl-TimeDate net-snmp-libs libnet PyXML
Then install heartbeat
[root@node1 ~]# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:heartbeat-pils ########################################### [ 33%]
2:heartbeat-stonith ########################################### [ 67%]
3:heartbeat ########################################### [100%]
node2:
First install the EPEL yum repository
wget
rpm -ivh epel-release-6-8.noarch.rpm
Then install the following dependencies
[root@node2 ~]# yum install perl-TimeDate net-snmp-libs libnet PyXML
Then install heartbeat
[root@node2 ~]# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:heartbeat-pils ########################################### [ 33%]
2:heartbeat-stonith ########################################### [ 67%]
3:heartbeat ########################################### [100%]
7. Configure heartbeat
(1) Copy the sample configuration files
node1:
[root@node1 ~]# cp /usr/share/doc/heartbeat-2.1.4/{authkeys,ha.cf,haresources} /etc/ha.d/
node2:
[root@node2 ~]# cp /usr/share/doc/heartbeat-2.1.4/{authkeys,ha.cf,haresources} /etc/ha.d/
(2) Configure the authentication key
node1:
[root@node1 ha.d]# openssl rand -hex 8
2a4816939b8c73b5
vim authkeys
auth 2
2 sha1 2a4816939b8c73b5
node2:
The authkeys file will simply be copied over from node1 later
(3) Set the permissions on authkeys
node1:
[root@node1 ha.d]# chmod 600 authkeys
node2:
The authkeys file will simply be copied over from node1 later
(4) Configure ha.cf
node1:
vim ha.cf
logfile /var/log/ha-log
keepalive 1000ms
deadtime 8
warntime 4
udpport 694
mcast eth0 225.0.0.1 694 1 0
auto_failback on
node node1.shamereedwine.com
node node2.shamereedwine.com
ping 192.168.0.1
compression bz2
compression_threshold 2
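Briefly, what the directives above do (the comments are added here for clarity, they are not in the file):
logfile /var/log/ha-log          # heartbeat log file
keepalive 1000ms                 # send a heartbeat every second
deadtime 8                       # declare the peer dead after 8 seconds without heartbeats
warntime 4                       # log a "late heartbeat" warning after 4 seconds
udpport 694                      # UDP port used for heartbeats
mcast eth0 225.0.0.1 694 1 0     # multicast heartbeats on eth0: group 225.0.0.1, port 694, TTL 1, loop 0
auto_failback on                 # move resources back when the preferred node comes back
node node1.shamereedwine.com     # cluster members; must match "uname -n"
node node2.shamereedwine.com
ping 192.168.0.1                 # ping node used as a connectivity reference
compression bz2                  # compress cluster messages with bz2
compression_threshold 2          # only compress messages larger than 2KB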
node2:
The ha.cf file will simply be copied over from node1 later
(5) Install httpd on both nodes
node1:
yum install httpd
node2:
yum install httpd
(6) Create the test home pages
node1:
[root@node1 ~]# cd /var/www/html/
[root@node1 html]# vim index.html
[root@node1 html]# cat index.html
node1.shamereedwine.com
node2:
[root@node2 ~]# cd /var/www/html/
[root@node2 html]# vim index.html
[root@node2 html]# cat index.html
node2.shamereedwine.com
(7) Prevent httpd from starting at boot
node1:
[root@node1 ~]# chkconfig httpd off
node2:
[root@node2 ~]# chkconfig httpd off
(8) Edit haresources and append the following line at the end
node1:
node1.shamereedwine.com 192.168.0.200/24/eth0 httpd
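The fields of this line, annotated (the comments are added here, not part of the file):
node1.shamereedwine.com    # the node that owns these resources by default (auto_failback returns them here)
192.168.0.200/24/eth0      # the VIP, added as an alias on eth0 with a /24 prefix
httpd                      # the LSB service started after the VIP, stopped before it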
node2:
Copy the three files under /etc/ha.d on node1 to /etc/ha.d on node2 exactly as they are.
[root@node1 ha.d]# scp -p authkeys ha.cf haresources node2.shamereedwine.com:/etc/ha.d/
authkeys 100% 657 0.6KB/s 00:00
ha.cf 100% 10KB 10.4KB/s 00:00
haresources 100% 5962 5.8KB/s 00:00
(9) Start heartbeat
node1:
[root@node1 ha.d]# service heartbeat start
Starting High-Availability services:
2017/03/03_06:33:19 INFO: Resource is stopped
Done.
node2: start heartbeat on node2 from node1
[root@node1 ha.d]# ssh node2.shamereedwine.com 'service heartbeat start'
Starting High-Availability services:
2017/03/03_06:33:50 INFO: Resource is stopped
Done.
(10) Check the log
node1:
[root@node1 ha.d]# tail /var/log/ha-log
heartbeat[4000]: 2017/03/03_06:33:52 info: remote resource transition completed.
heartbeat[4000]: 2017/03/03_06:33:52 info: node1.shamereedwine.com wants to go standby [foreign]
heartbeat[4000]: 2017/03/03_06:33:52 info: standby: node2.shamereedwine.com can take our foreign resources
heartbeat[4468]: 2017/03/03_06:33:52 info: give up foreign HA resources (standby).
heartbeat[4468]: 2017/03/03_06:33:52 info: foreign HA resource release completed (standby).
heartbeat[4000]: 2017/03/03_06:33:52 info: Local standby process completed [foreign].
heartbeat[4000]: 2017/03/03_06:33:53 WARN: 1 lost packet(s) for [node2.shamereedwine.com] [11:13]
heartbeat[4000]: 2017/03/03_06:33:53 info: remote resource transition completed.
heartbeat[4000]: 2017/03/03_06:33:53 info: No pkts missing from node2.shamereedwine.com!
heartbeat[4000]: 2017/03/03_06:33:53 info: Other node completed standby takeover of foreign resources.
node2:
[root@node1 ha.d]# ssh node2.shamereedwine.com 'tail /var/log/ha-log'
heartbeat[4041]: 2017/03/03_06:33:51 info: remote resource transition completed.
heartbeat[4041]: 2017/03/03_06:33:51 info: remote resource transition completed.
heartbeat[4041]: 2017/03/03_06:33:51 info: Local Resource acquisition completed. (none)
heartbeat[4041]: 2017/03/03_06:33:52 info: node1.shamereedwine.com wants to go standby [foreign]
heartbeat[4041]: 2017/03/03_06:33:53 info: standby: acquire [foreign] resources from node1.shamereedwine.com
heartbeat[4069]: 2017/03/03_06:33:53 info: acquire local HA resources (standby).
heartbeat[4069]: 2017/03/03_06:33:53 info: local HA resource acquisition completed (standby).
heartbeat[4041]: 2017/03/03_06:33:53 info: Standby resource acquisition done [foreign].
heartbeat[4041]: 2017/03/03_06:33:53 info: Initial resource acquisition complete (auto_failback)
heartbeat[4041]: 2017/03/03_06:33:53 info: remote resource transition completed.
(11) Check the httpd and VIP status
node1:
[root@node1 ha.d]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:62:4D:A1
inet addr:192.168.0.111 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe62:4da1/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:14308 errors:0 dropped:0 overruns:0 frame:0
TX packets:5296 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2673563 (2.5 MiB) TX bytes:821893 (802.6 KiB)
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:62:4D:A1
inet addr:192.168.0.200 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
[root@node1 ha.d]# ss -tnlp|grep httpd
LISTEN 0 128 :::80 :::* users:(("httpd",4408,14),("httpd",4411,14),("httpd",4413,14),("httpd",4414,14),("httpd",4415,14),("httpd",4416,14),("httpd",4417,14),("httpd",4418,14),("httpd",4419,14))
node2: (the standby node)
[root@node2 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:BC:50:F2
inet addr:192.168.0.112 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:febc:50f2/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:15609 errors:0 dropped:0 overruns:0 frame:0
TX packets:6232 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2850870 (2.7 MiB) TX bytes:969973 (947.2 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
[root@node2 ~]# ss -tnlp|grep httpd
(12) Access the service at the VIP defined above:
(13) Stop heartbeat on node1 (issued from node2) and check whether the resources fail over automatically
node2:
[root@node2 ~]# ssh node1.shamereedwine.com 'service heartbeat stop'
Stopping High-Availability services:
Done.
[root@node2 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:BC:50:F2
inet addr:192.168.0.112 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:febc:50f2/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:16749 errors:0 dropped:0 overruns:0 frame:0
TX packets:7304 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3066780 (2.9 MiB) TX bytes:1183093 (1.1 MiB)
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:BC:50:F2
inet addr:192.168.0.200 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
[root@node2 ~]# ss -tnlp|grep httpd
LISTEN 0 128 :::80 :::* users:(("httpd",4452,14),("httpd",4454,14),("httpd",4457,14),("httpd",4459,14),("httpd",4461,14),("httpd",4462,14),("httpd",4465,14),("httpd",4469,14),("httpd",4471,14))
The resources automatically fail over to node2, as shown in the figure below:
(14) Bring node1 back online
node2:
[root@node2 ~]# ssh node1.shamereedwine.com 'service heartbeat start'
Starting High-Availability services:
2017/03/03_07:03:44 INFO: Resource is stopped
Done.
Accessing the VIP again shows the service has moved back to node1 (auto_failback is on), as shown in the figure below:

(15) Add another node to act as the NFS share
node3: 192.168.0.113
Create the NFS export directory
[root@node3 ~]# mkdir /www/htdocs -pv
[root@node3 ~]# cd /www/htdocs/
Define the NFS export and grant the apache user read, write and execute permission on the web directory
[root@node3 htdocs]# vim /etc/exports
[root@node3 htdocs]# setfacl -m u:apache:rwx /www/htdocs/
[root@node3 htdocs]# cd /www/htdocs/
Create the shared web page
[root@node3 htdocs]# vim index.html
Page in NFS Server
(16) Define the NFS-backed HA resources
First stop heartbeat on both nodes
node1:
[root@node1 ~]# service heartbeat stop
Stopping High-Availability services:
Done.
node2:
[root@node1 ~]# ssh node2.shamereedwine.com 'service heartbeat stop'
Stopping High-Availability services:
Done.
(17) Redefine the resources
node1:
cd /etc/ha.d
vim haresources
Change the haresources line to the following:
node1.shamereedwine.com 192.168.0.200/24/eth0 Filesystem::192.168.0.113:/www/htdocs::/var/www/html::nfs httpd
node2:
cd /etc/ha.d
vim haresources
Change the haresources line to the following:
node1.shamereedwine.com 192.168.0.200/24/eth0 Filesystem::192.168.0.113:/www/htdocs::/var/www/html::nfs httpd
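In haresources, "::" separates a resource agent from its arguments; the new line, annotated (comments added here, not part of the file):
node1.shamereedwine.com                                      # preferred node
192.168.0.200/24/eth0                                        # the VIP on eth0
Filesystem::192.168.0.113:/www/htdocs::/var/www/html::nfs    # Filesystem agent: device (NFS export)::mount point::fs type
httpd                                                        # started last, stopped first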
(18) Start the NFS service on node3
[root@node3 htdocs]# service nfs start
Starting NFS services:  [  OK  ]
Starting NFS quotas:  [  OK  ]
Starting NFS mountd:  [  OK  ]
Starting NFS daemon:  [  OK  ]
Starting RPC idmapd:  [  OK  ]
(19) Start the heartbeat service
node1:
[root@node1 ha.d]# service heartbeat start
Starting High-Availability services:
2017/03/03_23:14:52 INFO: Resource is stopped
Done.
node2:
[root@node1 ha.d]# ssh node2.shamereedwine.com "service heartbeat start"
Starting High-Availability services:
2017/03/03_23:17:33 INFO: Resource is stopped
Done.
(20) Check the mounts on the cluster node:
[root@node1 ha.d]# mount
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
192.168.0.113:/www/htdocs on /var/www/html type nfs (rw,vers=4,addr=192.168.0.113,clientaddr=192.168.0.111)
(21) Test the web page

The page is served successfully!
(22) Switch between primary and standby with the provided script
node1:
cd /usr/share/heartbeat
[root@node1 heartbeat]# ./hb_standby
2017/03/04_00:06:17 Going standby [all].
The cluster automatically switches over to node2
node2:
[root@node2 ha.d]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:BC:50:F2
inet addr:192.168.0.112 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:febc:50f2/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:14222 errors:0 dropped:0 overruns:0 frame:0
TX packets:11451 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2627490 (2.5 MiB) TX bytes:2389431 (2.2 MiB)
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:BC:50:F2
inet addr:192.168.0.200 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
(23) Verify the cluster:

Access works as expected!
II. heartbeat v2 (crm)
1. Stop heartbeat
node1:
[root@node1 heartbeat]# service heartbeat stop
Stopping High-Availability services:
Done.
node2:
[root@node1 heartbeat]# ssh node2.shamereedwine.com "service heartbeat stop"
Stopping High-Availability services:
Done.
2. Edit the configuration file
node1:
[root@node1 heartbeat]# cd /etc/ha.d/
[root@node1 ha.d]# vim ha.cf
Add the following line to the configuration file
crm respawn
Then push the prepared configuration files to node2 with the command below
[root@node1 ha.d]# /usr/lib64/heartbeat/ha_propagate
Propagating HA configuration files to node node2.shamereedwine.com.
ha.cf 100% 10KB 10.4KB/s 00:00
authkeys 100% 657 0.6KB/s 00:00
3. Install heartbeat-gui
node1:
The install fails at first: heartbeat-gui depends on pygtk2-libglade, which is installed with yum
[root@node1 ~]# rpm -ivh heartbeat-gui-2.1.4-12.el6.x86_64.rpm
error: Failed dependencies:
pygtk2-libglade is needed by heartbeat-gui-2.1.4-12.el6.x86_64
[root@node1 ~]# yum install pygtk2-libglade
Installing heartbeat-gui again now succeeds
[root@node1 ~]# rpm -ivh heartbeat-gui-2.1.4-12.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:heartbeat-gui ########################################### [100%]
node2:
The install fails here too: heartbeat-gui depends on pygtk2-libglade, which is installed with yum
[root@node2 ~]# rpm -ivh heartbeat-gui-2.1.4-12.el6.x86_64.rpm
error: Failed dependencies:
pygtk2-libglade is needed by heartbeat-gui-2.1.4-12.el6.x86_64
[root@node2 ~]# yum install pygtk2-libglade
Installing again now succeeds
[root@node2 ~]# rpm -ivh heartbeat-gui-2.1.4-12.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:heartbeat-gui ########################################### [100%]
4. Restart the heartbeat service
node1:
service heartbeat start
node2:
service heartbeat start
5. Move haresources out of the way
node1:
[root@node1 ~]# cd /etc/ha.d/
[root@node1 ha.d]# mv haresources /root
node2:
[root@node2 ~]# cd /etc/ha.d/
[root@node2 ha.d]# mv haresources /root
6. Restart heartbeat
node1:
[root@node1 ~]# service heartbeat restart
Stopping High-Availability services:
Done.
Waiting to allow resource takeover to complete:
Done.
Starting High-Availability services:
cat: /etc/ha.d/haresources: No such file or directory
Done.
node2:
[root@node2 ~]# service heartbeat restart
Stopping High-Availability services:
Done.
Waiting to allow resource takeover to complete:
Done.
Starting High-Availability services:
cat: /etc/ha.d/haresources: No such file or directory
Done.
7. Check the process and port that crm listens on
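No output was captured for this step; a hedged way to check it (mgmtd should be listening on 5560/tcp, as noted earlier):
ss -tnlp | grep 5560    # expect the heartbeat mgmtd process listening on port 5560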
8. Check the cluster status with crm_mon
[root@node2 ~]# crm_mon
9. Some commonly used crm commands
[root@node2 ~]# crm
crmadmin crm_attribute crm_diff crm_failcount crm_master crm_mon crm_resource crm_sh crm_standby crm_uuid crm_verify
[root@node2 ~]# crm_sh
/usr/sbin/crm_sh:31: DeprecationWarning: The popen2 module is deprecated. Use the subprocess module.
from popen2 import Popen3
crm #
crm #
crm # help
Usage: crm (nodes|config|resources)
crm # help nodes #help on node commands
Usage: nodes (status|list)
crm # nodes status #node status
crm # nodes list #list the nodes
crm # help resources #help on resource commands
Usage: resources (status|list)
10. Check which user runs the HA GUI
[root@node2 ~]# cat /etc/passwd|grep hacluster
hacluster:x:498:498:heartbeat user:/var/lib/heartbeat/cores/hacluster:/sbin/nologin
11. On node2, set a password for the GUI user hacluster
[root@node2 ~]# passwd hacluster
12. Start the GUI
[root@node2 ~]# hb_gui &
[1] 24904
The result is shown in the figure below:
13. After logging in
14. Define a web resource group
node1: 192.168.0.111 (web server)
node2: 192.168.0.112 (web server)
node3: 192.168.0.113 (NFS server)
web service:
vip: 192.168.0.100
httpd
nfs: /www/htdocs, mounted at /var/www/html
(1) Create the resource group
(2) Add webip
(3) Add webstore
(4) Add webserver
(5) Start the group resource, as shown in the figure below:
(6) Check the web home page; as shown below, it is the page served from NFS
(7) Put node2 into "standby"; the service automatically switches over to node1, as shown below:
15. Set up an ipvs (LVS) cluster
node1: 192.168.0.111 (director, DR)
node2: 192.168.0.112
node3: 192.168.0.113 (RS1)
node4: 192.168.0.114 (RS2)
VIP: 192.168.0.150
(1) First stop the HA resources on node1 and node2
[root@node1 ~]# service heartbeat stop
Stopping High-Availability services:
Done.
[root@node1 ~]# ssh node2.shamereedwine.com "service heartbeat stop"
Stopping High-Availability services:
Done.
(2) Install ipvsadm on both director nodes
node1:
[root@node1 ~]# yum install ipvsadm -y
node2:
[root@node2 ~]# yum install ipvsadm -y
(3) Set the kernel parameters on the real servers
node3:
[root@node3 ~]# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
[root@node3 ~]# echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_ignore
[root@node3 ~]# echo 2 > /proc/sys/net/ipv4/conf/eth0/arp_announce
[root@node3 ~]# echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
node4:
[root@node4 ~]# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
[root@node4 ~]# echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_ignore
[root@node4 ~]# echo 2 > /proc/sys/net/ipv4/conf/eth0/arp_announce
[root@node4 ~]# echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
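These echo settings do not survive a reboot; to make them persistent one could also put them in /etc/sysctl.conf (a sketch with the same values) and load them with sysctl -p:
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2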
(4) Configure the VIP on the real servers
node3:
ifconfig lo:0 192.168.0.150 netmask 255.255.255.255 broadcast 192.168.0.150 up
node4:
[root@node4 ~]# ifconfig lo:0 192.168.0.150 netmask 255.255.255.255 broadcast 192.168.0.150 up
(5) Add a host route for the VIP
node3:
[root@node3 ~]# route add -host 192.168.0.150 dev lo:0
node4:
[root@node4 ~]# route add -host 192.168.0.150 dev lo:0
(6) Create the test pages
node3:
vim /var/www/html/index.html
Welcome to mysite1
node4:
vim /var/www/html/index.html
Welcome to mysite2
(7) Start the httpd service
node3:
[root@node3 ~]# service httpd start
Starting httpd:  [  OK  ]
node4:
[root@node4 ~]# service httpd start
Starting httpd:  [  OK  ]
(8) Make sure ipvsadm does not start at boot
node1:
[root@node1 ~]# chkconfig ipvsadm off
node2:
[root@node2 ~]# chkconfig ipvsadm off
(9) Configure the VIP on the director
node1:
[root@node1 ~]# ifconfig eth0:0 192.168.0.150/24 up
[root@node1 ~]# route add -host 192.168.0.150 dev eth0:0
node2:
[root@node2 ~]# ifconfig eth0:0 192.168.0.150/24 up
[root@node2 ~]# route add -host 192.168.0.150 dev eth0:0
(10) Define the ipvs cluster service
node1:
[root@node1 ~]# ipvsadm -A -t 192.168.0.150:80 -s rr
[root@node1 ~]# ipvsadm -a -t 192.168.0.150:80 -r 192.168.0.113 -g
[root@node1 ~]# ipvsadm -a -t 192.168.0.150:80 -r 192.168.0.114 -g
[root@node1 ~]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.0.150:80 rr
-> 192.168.0.113:80 Route 1 0 0
-> 192.168.0.114:80 Route 1 0 0
(11) Test the page, as shown below:

(12) Save the ipvsadm rules:
node1:
[root@node1 ~]# service ipvsadm save
ipvsadm: Saving IPVS table to /etc/sysconfig/ipvsadm:  [  OK  ]
(13) Stop ipvsadm
node1:
[root@node1 ~]# service ipvsadm stop
ipvsadm: Clearing the current IPVS table:  [  OK  ]
ipvsadm: Unloading modules:  [  OK  ]
(14) Take the VIP down
node1:
[root@node1 ~]# ifconfig eth0:0 down
(15) Copy the saved ipvsadm rules to node2
node2:
[root@node1 ~]# scp /etc/sysconfig/ipvsadm node2:/etc/sysconfig/
(16) On node2, configure the cluster service by loading it from /etc/sysconfig/ipvsadm
node2:
[root@node2 ~]# ipvsadm -R < /etc/sysconfig/ipvsadm
[root@node2 ~]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.0.150:80 rr
-> 192.168.0.113:80 Route 1 0 0
-> 192.168.0.114:80 Route 1 0 0
(17) Stop the ipvsadm service
node2:
[root@node2 ~]# service ipvsadm stop
ipvsadm: Clearing the current IPVS table:  [  OK  ]
ipvsadm: Unloading modules:  [  OK  ]
(18) Take down the interface alias and delete the route
node2:
[root@node2 ~]# ifconfig eth0:0 down
[root@node2 ~]# route del -host 192.168.0.150
(19) Start the heartbeat HA service on node1 and node2
node1:
[root@node1 ~]# service heartbeat start
Starting High-Availability services:
cat: /etc/ha.d/haresources: No such file or directory
Done.
[root@node1 ~]# ssh node2.shamereedwine.com "service heartbeat start"
Starting High-Availability services:
cat: /etc/ha.d/haresources: No such file or directory
Done.
(20) Configure the HA service with hb_gui
[root@node1 ~]# hb_gui &
As shown in the figure below:
(21) Add a resource group
(22) Add the first resource, the VIP
(23) Start the resource; it comes up normally
(24) Verify the configuration on node2
(25) Add the second resource:
(26) Start the resource:
(27) Check ipvsadm; as shown below, it is running normally
(28) Put node2 into standby; the resources automatically move over to node1
16. Add health checks
(1) First stop the heartbeat service on node1 and node2
[root@node1 ~]# service heartbeat stop
Stopping High-Availability services:
Done.
[1]+ Done hb_gui
[root@node1 ~]# ssh node2.shamereedwine.com "service heartbeat stop"
Stopping High-Availability services:
Done.
(2) Install ldirectord with yum
node1:
[root@node1 ~]# yum install heartbeat-ldirectord-2.1.4-12.el6.x86_64.rpm
node2:
[root@node2 ~]# yum install heartbeat-ldirectord-2.1.4-12.el6.x86_64.rpm
(3) Prevent ldirectord from starting at boot
node1:
[root@node1 ~]# chkconfig ldirectord off
node2:
[root@node2 ~]# chkconfig ldirectord off
(4) Copy the ldirectord configuration file into place
node1:
[root@node1 ~]# cp /usr/share/doc/heartbeat-ldirectord-2.1.4/ldirectord.cf /etc/ha.d/
node2:
[root@node2 ~]# cp /usr/share/doc/heartbeat-ldirectord-2.1.4/ldirectord.cf /etc/ha.d/
(5) Edit the configuration file
node1:
[root@node1 ~]# cd /etc/ha.d/
vim ldirectord.cf
The configuration is as follows:
[root@node1 ha.d]# grep -v "^#" ldirectord.cf
checktimeout=3                      # seconds before a health check is considered failed
checkinterval=1                     # seconds between checks
autoreload=yes                      # reload automatically when this file changes
logfile="/var/log/ldirectord.log"   # log file path
quiescent=yes                       # quiescent mode: a failed real server gets weight 0 instead of being removed
virtual=192.168.0.150:80            # the virtual (VIP) service
        real=192.168.0.113:80 gate          # real server; gate = direct routing
        real=192.168.0.114:80 gate          # real server; gate = direct routing
        fallback=127.0.0.1:80 gate          # server used when all real servers are down (sorry page)
        service=http                        # application-layer protocol used for the health check
        request=".health.html"              # page requested by the check
        receive="OK"                        # expected content of the response
        scheduler=rr                        # scheduling algorithm
Since the cluster service can now be defined in this configuration file, the previously saved ipvsadm rules can be deleted
node1:
[root@node1 ha.d]# rm -i /etc/sysconfig/ipvsadm
rm: remove regular file '/etc/sysconfig/ipvsadm'? y
node2:
[root@node2 ~]# rm -i /etc/sysconfig/ipvsadm
rm: remove regular file '/etc/sysconfig/ipvsadm'? y
(6) On the two front-end directors, define the error page returned when the back-end servers are down
node1:
[root@node1 ~]# service httpd start
Starting httpd:  [  OK  ]
vim /var/www/html/index.html
sorry the webside is in rest
node2:
[root@node2 ~]# service httpd start
Starting httpd:  [  OK  ]
vim /var/www/html/index.html
sorry the webside is in ill
(7) The two back-end real servers
node3 and node4 each provide a health-check page
node3:
echo OK > /var/www/html/.health.html
node4:
echo OK > /var/www/html/.health.html
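A quick hedged check from a director that the health-check URL answers the way ldirectord expects:
curl http://192.168.0.113/.health.html    # expect: OK
curl http://192.168.0.114/.health.html    # expect: OK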
(8) Copy the configuration file from node1 to /etc/ha.d/ on node2
[root@node1 ha.d]# scp ldirectord.cf node2.shamereedwine.com:/etc/ha.d/
ldirectord.cf 100% 7501 7.3KB/s 00:00
(9) Start the heartbeat service on node1 and node2
node1:
[root@node1 ha.d]# service heartbeat start
Starting High-Availability services:
cat: /etc/ha.d/haresources: No such file or directory
Done.
node2:
[root@node2 ~]# service heartbeat start
Starting High-Availability services:
cat: /etc/ha.d/haresources: No such file or directory
Done.
(10) Configure the HA cluster and define a resource group, as shown in the figure below:
(11) Add the first resource, the VIP, as shown below:
(12) Add an ldirectord resource, as shown below:
(13) Start the director resource group, as shown below:
(14) Verify that the service starts correctly on node2; it is running normally, as shown below:
(15) Stop the httpd service on the two real servers, node3 and node4
[root@node3 ~]# service httpd stop
Stopping httpd:  [  OK  ]
[root@node4 html]# service httpd stop
Stopping httpd:  [  OK  ]
(16) Check the cluster service state on node2
(17) Accessing the page now returns the failure page, as shown below:
(18) Put node2 into "standby"; the service automatically switches to node1:
(19) The failure page now shown is the one defined on node1, as below:

(20) Bring the service on node3 back online
[root@node3 ~]# service httpd start
Starting httpd:  [  OK  ]
(21) The cluster state on node1 now shows:
[root@node1 html]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.0.150:80 rr
-> 192.168.0.113:80 Route 1 0 0
-> 192.168.0.114:80 Route 0 0 0
(22) Refreshing the VIP page shows it is back to normal:
(23) Bring node4 back online as well:
[root@node4 html]# service httpd start
Starting httpd:  [  OK  ]
[root@node1 html]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.0.150:80 rr
-> 192.168.0.113:80 Route 1 0 0
-> 192.168.0.114:80 Route 1 0 0
(24) Refreshing the VIP page now alternates between the home pages of the two real servers (node3 and node4) again: