
Category: Mysql/postgreSQL

2014-09-10 11:07:44

In the previous post <>, a 3-node cluster was configured to ensure the shared resource could not be corrupted; this post looks at what risks arise with only two nodes.

As the Pacemaker documentation also describes, Pacemaker supports two ways of preventing split-brain: quorum voting and resource preemption. Quorum voting is certainly reliable, but it requires at least 3 votes, and dedicating a third machine beyond the active/standby pair just to satisfy quorum seems wasteful. For this reason some commercial cluster suites (e.g. MSCS, RHCS) introduce a quorum disk in addition to the nodes. The quorum disk also counts as one vote, so together with the 2 nodes there are 3 votes in total, and whichever side obtains 2 of them wins.

Reference:
http://www.adirectory.blog.com/2013/01/cluster-quorum-disk/


Using a quorum disk requires shared storage. Shared storage is standard equipment for many enterprise users, so it needs no extra investment.
The open-source Pacemaker, however, does not depend on shared storage, so it has no notion of a quorum disk. What about configuring a preemptible resource in Pacemaker instead? We found no configuration examples, but imagine it could be done like this:
set up a file server (NFS, say) and create a file on it to act as a lock file. Then write a custom RA (it is unclear whether such an RA already exists) whose start operation opens the file for writing, and whose monitor operation writes a value into the file and reads it back. Make this RA a dependency of the other resources; when a split-brain occurs, whichever node grabs this resource becomes the active node. (A rough sketch of such an RA follows below.)
This approach has one problem: the resource itself becomes a single point of failure. If the file server goes down, the HA cluster cannot start either.
(Of course, if the HA service itself naturally includes a resource that only one node can ever hold, so much the better.)
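
As a thought experiment, here is a minimal sketch of what such a lock-file RA might look like. Everything in it is an assumption: the agent is not a shipped one, the lockfile parameter and default path are invented for illustration, and a real implementation would also need OCF metadata plus genuinely atomic locking (a plain open-for-write does not keep a second opener out, especially over NFS).

#!/bin/sh
# Hypothetical lock-file RA sketch; names and paths are invented for illustration.
# start:   claim the lock by writing our node name into the shared file
# monitor: write a token and read it back to prove the file is still writable
# stop:    release the claim
# NOTE: real mutual exclusion would need an atomic exclusive create (e.g. ln);
# a plain write-open does not exclude the other node.
LOCKFILE="${OCF_RESKEY_lockfile:-/mnt/nfs/cluster.lock}"

case "$1" in
start)
    uname -n > "$LOCKFILE" || exit 1          # 1 = OCF_ERR_GENERIC
    exit 0                                    # 0 = OCF_SUCCESS
    ;;
monitor)
    [ -w "$LOCKFILE" ] || exit 7              # 7 = OCF_NOT_RUNNING
    token="$$-`date +%s`"
    echo "$token" > "$LOCKFILE" || exit 1
    readback=`cat "$LOCKFILE"`
    if [ "$readback" = "$token" ]; then
        exit 0
    else
        exit 1
    fi
    ;;
stop)
    rm -f "$LOCKFILE"
    exit 0
    ;;
meta-data)
    # a real RA must print its OCF metadata XML here; omitted in this sketch
    exit 0
    ;;
*)
    exit 3                                    # 3 = OCF_ERR_UNIMPLEMENTED
    ;;
esac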


Below we verify what Pacemaker does when there is neither quorum voting nor a preemptible resource: if a split-brain occurs, can a shared resource end up active on both nodes at the same time?

1. Environment

We reuse the environment from the previous post <> (http://blog.chinaunix.net/uid-20726500-id-4453488.html), but change it to a two-node cluster.


Shared storage server
OS: CentOS release 6.5 (Final)
Hostname: disknode
NIC 1:
  Reserved
  NIC type: NAT
  IP: 192.168.152.120
NIC 2:
  Used for iSCSI traffic to the shared disk
  NIC type: Host-Only
  IP: 192.168.146.120
NIC 3:
  Used for external/ssh fence device traffic
  NIC type: Bridged
  IP: 10.167.217.107


HA node 1
OS: CentOS release 6.5 (Final)
Hostname: hanode1
NIC 1:
  Used for the cluster public IP (192.168.152.200) and cluster messaging
  NIC type: NAT
  IP: 192.168.152.130
NIC 2:
  Used for iSCSI traffic to the shared disk
  NIC type: Host-Only
  IP: 192.168.146.130
NIC 3:
  Used for external/ssh fence device traffic
  NIC type: Bridged
  IP: 10.167.217.169


HA node 2
OS: CentOS release 6.5 (Final)
Hostname: hanode2
NIC 1:
  Used for the cluster public IP (192.168.152.200) and cluster messaging
  NIC type: NAT
  IP: 192.168.152.140
NIC 2:
  Used for iSCSI traffic to the shared disk
  NIC type: Host-Only
  IP: 192.168.146.140
NIC 3:
  Used for external/ssh fence device traffic
  NIC type: Bridged
  IP: 10.167.217.171


Cluster public IP
  192.168.152.200


2. Environment configuration

Starting from the already configured 3-node environment, remove disknode from cluster management.

Change the configuration so that the action on losing quorum is to ignore it:
no-quorum-policy=ignore

and adjust the expected number of quorum votes:
expected-quorum-votes=2

Then delete the disknode-related configuration:
location no_iscsid rs_iscsid -inf: disknode
location votenode ClusterIP -inf: disknode
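
These edits are applied below through crm configure edit; as a sketch, the same changes could also be made non-interactively with the crm shell, using the property names and constraint IDs above:

# set the cluster properties
[root@hanode1 ~]# crm configure property no-quorum-policy=ignore
[root@hanode1 ~]# crm configure property expected-quorum-votes=2
# remove the disknode-related location constraints by their IDs
[root@hanode1 ~]# crm configure delete no_iscsid
[root@hanode1 ~]# crm configure delete votenode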

[root@hanode1 ~]# crm configure edit
node disknode
node hanode1
node hanode2
primitive ClusterIP IPaddr2 \
        params ip=192.168.152.200 cidr_netmask=32 \
        op monitor interval=30s
primitive DataFS Filesystem \
        params device="/dev/sdc" directory="/mnt/pg" fstype=ext4 \
        op monitor interval=15s
primitive pg93 pgsql \
        meta target-role=Started is-managed=true migration-threshold=INFINITY failure-timeout=60s \
        op monitor interval=15s
primitive rs_iscsid lsb:iscsid \
        op monitor interval=30s \
        meta target-role=Started
primitive st-ssh stonith:external/ssh \
        params hostlist="hanode1 hanode2"
group PgGroup ClusterIP rs_iscsid DataFS pg93
clone st-sshclone st-ssh
property cib-bootstrap-options: \
        dc-version=1.1.9-2a917dd \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=true \
        no-quorum-policy=ignore \
        last-lrm-refresh=1409756808
#vim:set syntax=pcmk


Stop the corosync service on disknode:
[root@disknode ~]# /etc/init.d/corosync stop 
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:.                  [  OK  ]
[root@disknode ~]# chkconfig corosync off


On one of the remaining nodes, hanode1, remove disknode from the CIB:
[root@hanode1 ~]# crm_node -R disknode --force

The result is strange: it actually deleted hanode2, and after several more attempts we even saw hanode2 get rebooted.
[root@hanode1 ~]# crm status
Last updated: Fri Sep  5 23:48:50 2014
Last change: Fri Sep  5 23:47:50 2014 via crm_node on hanode1
Stack: classic openais (with plugin)
Current DC: hanode1 - partition with quorum
Version: 1.1.9-2a917dd
2 Nodes configured, 2 expected votes
6 Resources configured.




Node disknode: UNCLEAN (offline)
Online: [ hanode1 ]


 Resource Group: PgGroup
     ClusterIP (ocf::heartbeat:IPaddr2): Started hanode1 
     rs_iscsid (lsb:iscsid): Started hanode1 
     DataFS (ocf::heartbeat:Filesystem): Started hanode1 
     pg93 (ocf::heartbeat:pgsql): Started hanode1 
 Clone Set: st-sshclone [st-ssh]
     Started: [ hanode1 ]
     Stopped: [ st-ssh:1 ]


/var/log/messages contains error messages like these:
Sep  5 21:27:53 hanode1 corosync[3827]:   [pcmk  ] info: pcmk_remove_member: Sent: remove-peer:disknode
Sep  5 21:27:53 hanode1 corosync[3827]:   [pcmk  ] ERROR: ais_get_int: Characters left over after parsing 'disknode': 'disknode'


Running the command again, this time disknode was removed:
[root@hanode1 ~]# crm_node -R disknode --force
[root@hanode1 ~]# crm status
Last updated: Fri Sep  5 23:50:16 2014
Last change: Fri Sep  5 23:50:14 2014 via crm_node on hanode1
Stack: classic openais (with plugin)
Current DC: hanode1 - partition with quorum
Version: 1.1.9-2a917dd
1 Nodes configured, 2 expected votes
5 Resources configured.




Online: [ hanode1 ]


 Resource Group: PgGroup
     ClusterIP (ocf::heartbeat:IPaddr2): Started hanode1 
     rs_iscsid (lsb:iscsid): Started hanode1 
     DataFS (ocf::heartbeat:Filesystem): Started hanode1 
     pg93 (ocf::heartbeat:pgsql): Started hanode1 
 Clone Set: st-sshclone [st-ssh]
     Started: [ hanode1 ]


After rebooting the hanode2 server*, the status was finally normal again.
[root@hanode1 ~]# crm status
Last updated: Sat Sep  6 00:01:16 2014
Last change: Fri Sep  5 23:57:54 2014 via crmd on hanode1
Stack: classic openais (with plugin)
Current DC: hanode1 - partition with quorum
Version: 1.1.9-2a917dd
2 Nodes configured, 2 expected votes
6 Resources configured.




Online: [ hanode1 hanode2 ]


 Resource Group: PgGroup
     ClusterIP (ocf::heartbeat:IPaddr2): Started hanode1 
     rs_iscsid (lsb:iscsid): Started hanode1 
     DataFS (ocf::heartbeat:Filesystem): Started hanode1 
     pg93 (ocf::heartbeat:pgsql): Started hanode1 
 Clone Set: st-sshclone [st-ssh]
     Started: [ hanode1 hanode2 ]

*) The plan was only to restart the corosync service, but it would not stop, so the corosync process was killed; just as the corosync service was about to be started again, hanode2 turned out to have been fenced.


3. Failover tests

3.1 Scenario 1: the corosync process on the active server goes down

Kill the corosync process on the active server:
[root@hanode1 ~]# ps -ef|grep corosync
root      1355     1  0 00:00 ?        00:00:02 corosync
root      4606  2103  0 00:18 pts/0    00:00:00 grep corosync
[root@hanode1 ~]# kill -9 1355


hanode1 was rebooted almost immediately, and hanode2 then took over the services.
[root@hanode2 ~]# crm status
Last updated: Sat Sep  6 00:22:45 2014
Last change: Fri Sep  5 23:57:54 2014 via crmd on hanode1
Stack: classic openais (with plugin)
Current DC: hanode2 - partition with quorum
Version: 1.1.9-2a917dd
2 Nodes configured, 2 expected votes
6 Resources configured.




Online: [ hanode1 hanode2 ]


 Resource Group: PgGroup
     ClusterIP (ocf::heartbeat:IPaddr2): Started hanode2 
     rs_iscsid (lsb:iscsid): Started hanode2 
     DataFS (ocf::heartbeat:Filesystem): Started hanode2 
     pg93 (ocf::heartbeat:pgsql): Started hanode2 
 Clone Set: st-sshclone [st-ssh]
     Started: [ hanode1 hanode2 ]


The hanode2 logs also show that hanode2 took over the resources only after fencing succeeded, which guarantees the resources are never held by both nodes at once.
[root@hanode2 ~]# vi /var/log/messages
Sep  6 00:15:08 hanode2 stonith-ng[1362]:   notice: initiate_remote_stonith_op: Initiating remote operation reboot for hanode1: 3daae4d8-f3f5-47bb-ac32-1a7106099eca (0)
Sep  6 00:15:13 hanode2 stonith-ng[1362]:   notice: log_operation: Operation 'reboot' [2149] (call 0 from crmd.1366) for host 'hanode1' with device 'st-ssh' returned: 0 (OK)
Sep  6 00:15:13 hanode2 stonith-ng[1362]:   notice: remote_op_done: Operation reboot of hanode1 by hanode2 for crmd.1366@hanode2.3daae4d8: OK
...
Sep  6 00:15:14 hanode2 pengine[1365]:   notice: LogActions: Start   rs_iscsid#011(hanode2)
Sep  6 00:15:14 hanode2 pengine[1365]:   notice: LogActions: Start   DataFS#011(hanode2)
Sep  6 00:15:14 hanode2 pengine[1365]:   notice: LogActions: Start   pg93#011(hanode2)


3.2 Scenario 2: the heartbeat network between active and standby is cut

In VMware, disable the heartbeat NIC of the active server, hanode2. Now the show begins: hanode1 and hanode2 killed each other at the same time.
After hanode1 and hanode2 came back up, hanode1 was slightly quicker on the draw and hanode2 got killed; once hanode2 came back up, it killed hanode1 in turn...
In this scenario the two nodes just keep cutting each other down, and neither can ever win.


4. Mitigating split-brain

Scenario 2 above is really a case of the heartbeat network becoming a single point of failure; introducing redundant network equipment on the heartbeat path would make it more reliable. Is there anything else that can be done? Two methods come to mind:


4.1 Method 1

Connect the heartbeat links of hanode1 and hanode2 to the same router, and use the router's IP address as a pingnode that acts as the arbiter.
Let's see whether this method works. The router's IP is 192.168.152.2.


[root@hanode1 ~]# crm configure edit


node hanode1
node hanode2
primitive ClusterIP IPaddr2 \
        params ip=192.168.152.200 cidr_netmask=32 \
        op monitor interval=30s
primitive DataFS Filesystem \
        params device="/dev/sdc" directory="/mnt/pg" fstype=ext4 \
        op monitor interval=15s
primitive pg93 pgsql \
        meta target-role=Started is-managed=true migration-threshold=INFINITY failure-timeout=60s \
        op monitor interval=15s
primitive pingCheck ocf:pacemaker:ping \
        params name=default_ping_set host_list=192.168.152.2 multiplier=100 \
        op start timeout=60s interval=0s on-fail=restart \
        op monitor timeout=60s interval=10s on-fail=restart \
        op stop timeout=60s interval=0s on-fail=ignore
primitive rs_iscsid lsb:iscsid \
        op monitor interval=30s \
        meta target-role=Started
primitive st-ssh stonith:external/ssh \
        params hostlist="hanode1 hanode2"
group PgGroup ClusterIP rs_iscsid DataFS pg93
clone clnPingCheck pingCheck
clone st-sshclone st-ssh
location rsc_location PgGroup \
        rule $id="rsc_location-rule" -inf: not_defined default_ping_set or default_ping_set lt 100
order rsc_orderi 0: clnPingCheck PgGroup
property cib-bootstrap-options: \
        dc-version=1.1.9-2a917dd \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=true \
        no-quorum-policy=ignore \
        last-lrm-refresh=1409756808
#vim:set syntax=pcmk


After the change, restart the corosync service on both nodes; after a short while the status updates.
[root@hanode1 ~]# crm_mon -Afr1
Last updated: Sat Sep  6 01:36:57 2014
Last change: Sat Sep  6 01:36:08 2014 via cibadmin on hanode1
Stack: classic openais (with plugin)
Current DC: hanode1 - partition with quorum
Version: 1.1.9-2a917dd
2 Nodes configured, 2 expected votes
8 Resources configured.




Online: [ hanode1 hanode2 ]


Full list of resources:


 Resource Group: PgGroup
     ClusterIP (ocf::heartbeat:IPaddr2): Started hanode1 
     rs_iscsid (lsb:iscsid): Started hanode1 
     DataFS (ocf::heartbeat:Filesystem): Started hanode1 
     pg93 (ocf::heartbeat:pgsql): Started hanode1 
 Clone Set: st-sshclone [st-ssh]
     Started: [ hanode1 hanode2 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ hanode1 hanode2 ]


Node Attributes:
* Node hanode1:
    + default_ping_set                 : 100       
* Node hanode2:
    + default_ping_set                 : 100       


Migration summary:
* Node hanode2: 
* Node hanode1: 


Now disable the heartbeat NIC on the active server. The result is the same as before: the two machines still kill each other. The reason is that fencing fires before pingCheck gets to act; pingCheck can only influence resource placement. So this method does not work.

4.2 Method 2

Change the stonith-action from reboot to off; that way, whichever side shoots first has a chance to end up the winner.
[root@hanode1 ~]# crm_attribute --attr-name stonith-action  --attr-value off
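
To confirm the property took effect, the same tool can read it back in query mode (crm_attribute defaults to the cluster options section, so this should report the value just set):

[root@hanode1 ~]# crm_attribute --attr-name stonith-action --query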


The external/ssh stonith device used for testing does not support poweroff, so the external/ssh script was modified for this test; the final modified script is given in the appendix.
With the pingCheck added earlier removed, we try again and disable the heartbeat NIC of the active server. Unfortunately both machines powered off.
A look at the external/ssh script shows that it sleeps 2 seconds before powering off; removing that sleep and trying again, one machine finally survived.


Before:
POWEROFF_COMMAND="echo 'sleep 2; /sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"

After:
POWEROFF_COMMAND="echo '/sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"


5. Conclusion

With two nodes, a fencing device is enough to ensure the shared resource is not corrupted. Without a fencing device, a preemptible resource must be configured instead. When the heartbeat network fails there is no way to guarantee that the two-node cluster remains available, but setting stonith-action to off makes it very likely that one machine survives a split-brain.


6. Appendix

The modified external/ssh script:
[root@hanode1 ~]# cat /usr/lib64/stonith/plugins/external/ssh


#!/bin/sh
#
# External STONITH module for ssh.
#
# Copyright (c) 2004 SUSE LINUX AG - Lars Marowsky-Bree
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
SSH_COMMAND="/usr/bin/ssh -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n -l root"
#SSH_COMMAND="/usr/bin/ssh -q -x -n -l root"
REBOOT_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
# Warning: If you select this poweroff command, it'll physically
# power-off the machine, and quite a number of systems won't be remotely
# revivable.
# TODO: Probably should touch a file on the server instead to just
# prevent heartbeat et al from being started after the reboot.
POWEROFF_COMMAND="echo '/sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
#POWEROFF_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"

# Rewrite the hostlist to accept "," as a delimiter for hostnames too.
hostlist=`echo $hostlist | tr ',' ' '`

is_host_up() {
    for j in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    do
        if
            ping -w1 -c1 "$1" >/dev/null 2>&1
        then
            sleep 1
        else
            return 1
        fi
    done
    return 0
}
echo hostlist="$hostlist" para="$*" >>/var/stonith_ssh.log

case $1 in
gethosts)
    for h in $hostlist ; do
        echo $h
    done
    exit 0
    ;;
on)
    # Can't really be implemented because ssh cannot power on a system
    # when it is powered off.
    exit 1
    ;;
off)
    # Shouldn't really be implemented because if ssh cannot power on a
    # system, it shouldn't be allowed to power it off.
    # exit 1
    # ;;
    h_target=`echo $2 | tr A-Z a-z`
    for h in $hostlist
    do
        h=`echo $h | tr A-Z a-z`
        [ "$h" != "$h_target" ] &&
            continue
        if
            case ${livedangerously} in
            [Yy]*)  is_host_up $h;;
            *)      true;;
            esac
        then
            $SSH_COMMAND "$2" "$POWEROFF_COMMAND"
            # Good thing this is only for testing...
            if
                is_host_up $h
            then
                exit 1
            else
                exit 0
            fi
        else
            # well... Let's call it successful, after all this is only for testing...
            exit 0
        fi
    done
    exit 1
    ;;
reset)
    h_target=`echo $2 | tr A-Z a-z`
    for h in $hostlist
    do
        h=`echo $h | tr A-Z a-z`
        [ "$h" != "$h_target" ] &&
            continue
        if
            case ${livedangerously} in
            [Yy]*)  is_host_up $h;;
            *)      true;;
            esac
        then
            $SSH_COMMAND "$2" "$REBOOT_COMMAND"
            # Good thing this is only for testing...
            if
                is_host_up $h
            then
                exit 1
            else
                exit 0
            fi
        else
            # well... Let's call it successful, after all this is only for testing...
            exit 0
        fi
    done
    exit 1
    ;;
status)
    if
        [ -z "$hostlist" ]
    then
        exit 1
    fi
    for h in $hostlist
    do
        if
            ping -w1 -c1 "$h" 2>&1 | grep "unknown host"
        then
            exit 1
        fi
    done
    exit 0
    ;;
getconfignames)
    echo "hostlist"
    exit 0
    ;;
getinfo-devid)
    echo "ssh STONITH device"
    exit 0
    ;;
getinfo-devname)
    echo "ssh STONITH external device"
    exit 0
    ;;
getinfo-devdescr)
    echo "ssh-based host reset"
    echo "Fine for testing, but not suitable for production!"
    echo "Only reboot action supported, no poweroff, and, surprisingly enough, no poweron."
    exit 0
    ;;
getinfo-devurl)
    echo ""
    exit 0
    ;;
getinfo-xml)
    cat << SSHXML
<parameters>
<parameter name="hostlist" unique="1" required="1">
<content type="string" />
<shortdesc lang="en">
Hostlist
</shortdesc>
<longdesc lang="en">
The list of hosts that the STONITH device controls
</longdesc>
</parameter>
<parameter name="livedangerously" unique="0" required="0">
<content type="enum" />
<shortdesc lang="en">
Live Dangerously!!
</shortdesc>
<longdesc lang="en">
Set to "yes" if you want to risk your system's integrity.
Of course, since this plugin isn't for production, using it
in production at all is a bad idea. On the other hand,
setting this parameter to yes makes it an even worse idea.
Viva la Vida Loca!
</longdesc>
</parameter>
</parameters>
SSHXML
    exit 0
    ;;
*)
    exit 1
    ;;
esac

