2010-09-10 13:33:37
Heartbeat 3.0+ Pacemaker
OCF: Open Cluster Framework
[Figure: component relationship model, showing Cluster Glue, GUI console, Pacemaker, Resource Agents, and Heartbeat]
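1. heartbeat: the background daemon responsible for communication between cluster nodes and for tracking node status. It must be combined with a CRM (Cluster Resource Manager) component (Pacemaker) to provide user services.
1.1. Communication module, based on IPv4 UDP unicast, multicast, and broadcast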
· intra-cluster communication - sending and receiving packets to cluster nodes
· configuration queries
· connectivity information (who can the current node hear packets from) - both for queries and state change notifications
· basic group membership services
1.2. CCM (Cluster Consensus Membership): ensures that every node in the cluster can communicate normally; implements the OCF membership API
1.3. Cluster Plumbing Library
1.4. IPC Library
1.5. logging daemon
2. Pacemaker: responsible for cluster resource management. It starts and stops the various user services and provides high availability. Main features:
2.1. Detection and recovery of node and service-level failures
2.2. Automatically replicated configuration that can be updated from any node
2.3. Ability to specify cluster-wide service ordering, colocation and anti-colocation
2.4. Support for advanced service types
· Clones: for services which need to be active on multiple nodes
· Multi-state: for services with multiple modes (eg. master/slave, primary/secondary)
2.5. Unified, scriptable, cluster shell
3. Resource Agents: a standardized interface to cluster resources. Most resource agents are coded as shell scripts. The actions they implement include:
· start: enable or start the given resource
· stop: disable or stop the given resource
· status: return the status of the given resource (running or not running)
· monitor: like status, but also check specifically for unexpected not running states
· validate: validate the resource's configuration
· meta-data: return information about the resource agent itself (used by GUIs and other management utilities, and documentation tools)
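4. Cluster Glue: everything outside the Heartbeat/Pacemaker stack, including the LRM (Local Resource Manager), STONITH (Shoot The Other Node In The Head), and the low-level cluster communication libraries.
4.1. LRM: the interface between the CRM and the Resource Agents. It is itself not cluster aware, nor does it apply any policies. It can: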
· start a resource;
· stop a resource;
· monitor a resource;
· report a resource's status;
· list all resource instances it currently controls, and their status.
4.2. STONITH
4.3. hb_report
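4.4. Cluster Plumbing Library: provides the low-level libraries that the upper-layer components need
Component version compatibility list
· Heartbeat 3.0.3: the first release of Heartbeat 3 was 3.0.2, so 3.0.3 is the first build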
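The resource agent actions listed under item 3 can be illustrated with a minimal sketch. It is not taken from any shipped agent: the "resource" is only a state file, and the names `STATE_FILE`, `dummy_*`, and `ra_dispatch` are hypothetical. The exit codes follow the OCF convention (0 for success, 7 for "not running", 3 for an unimplemented action).

```shell
#!/bin/sh
# Minimal OCF-style resource agent sketch. The managed "resource" is just a
# state file (hypothetical); a real agent would manage a daemon instead.

# Exit codes from the OCF resource agent convention.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Hypothetical state file location; can be overridden from the environment.
STATE_FILE="${STATE_FILE:-/tmp/dummy-ra.state}"

dummy_monitor() {
    # "Running" simply means the state file exists.
    [ -f "$STATE_FILE" ] && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

dummy_start() {
    # Starting an already running resource must succeed.
    dummy_monitor && return $OCF_SUCCESS
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

dummy_stop() {
    # Stopping an already stopped resource must succeed as well.
    rm -f "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

dummy_validate() {
    # The only requirement here is a writable directory for the state file.
    [ -w "$(dirname "$STATE_FILE")" ] && return $OCF_SUCCESS
    return $OCF_ERR_GENERIC
}

dummy_meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="dummy-sketch">
  <version>0.1</version>
  <shortdesc lang="en">State-file demo agent</shortdesc>
  <actions>
    <action name="start" timeout="20"/>
    <action name="stop" timeout="20"/>
    <action name="monitor" timeout="20" interval="10"/>
    <action name="validate-all" timeout="20"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
EOF
}

# Dispatch on the action name, which the cluster passes as the first argument.
ra_dispatch() {
    case "$1" in
        start)                 dummy_start ;;
        stop)                  dummy_stop ;;
        status|monitor)        dummy_monitor ;;
        validate|validate-all) dummy_validate ;;
        meta-data)             dummy_meta_data ;;
        *)                     return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

Saved as a script, the file would end with a line such as `ra_dispatch "$1"; exit $?` so that the cluster can call it with one action per invocation.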
· Cluster Glue 1.0.6
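· Pacemaker 1.0.5: the new 1.1 branch is at version 1.1.2.1, but I could not find a download location for it
· GUI console (Pacemaker GUI) 2.0: compatible with Pacemaker 1.1; building against 1.0 requires a small code change
· Resource Agents 1.0.3: shell scripts that follow the OCF standard; no strict version compatibility constraints so far
Because every component is built from source, there are some differences between x86 and x86_64 platforms, mainly in the compilation errors that show up.
The build dependencies among the components are as follows
[Figure: build order diagram with nodes Cluster Glue, Heartbeat 3, Pacemaker, Resource Agents, Corosync, and Pacemaker GUI]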
OS: CentOS 5.4 X86_64
export PREFIX=/usr/local
groupadd -g 600 haclient
useradd -g 600 -u 600 hacluster
yum -y install libtool-ltdl-devel
yum -y install intltool
yum -y install gnutls-devel
yum -y install gettext-devel
cd /usr/local
wget -O cluster-glue.tar.bz2
tar jxvf cluster-glue.tar.bz2
mv Reusable-Cluster-Components-glue-1.0.6 cluster-glue-1.0.6
cd cluster-glue-1.0.6
./autogen.sh && ./configure --prefix=$PREFIX
make
......
cc1: warnings being treated as errors
main.c:64: warning: function declaration isn't a prototype
main.c:78: warning: function declaration isn't a prototype
gmake[2]: *** [main.o] Error 1
gmake[2]: Leaving directory `/usr/local/cluster-glue-1.0.6/lib/stonith'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/usr/local/cluster-glue-1.0.6/lib'
make: *** [all-recursive] Error 1
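The failure comes from the build configuration, which treats compiler warnings as errors. Editing the Makefile to remove the -Werror flag lets the build succeed.
vi lib/stonith/Makefile
(delete -Werror)
make && make install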
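Removing the flag by hand in vi can also be scripted. The following is a minimal sketch, assuming the flag appears as a separate whitespace-delimited `-Werror` token; the function name `strip_werror` is hypothetical. It keeps a `.orig` backup and leaves flags such as `-Werror=format` alone.

```shell
# Remove standalone -Werror tokens from a Makefile, keeping a .orig backup.
strip_werror() {
    makefile="$1"
    [ -f "$makefile" ] || { echo "no such file: $makefile" >&2; return 1; }
    cp "$makefile" "$makefile.orig" || return 1
    # First expression: -Werror in the middle of a line.
    # Second expression: -Werror as the last token on a line.
    sed -e 's/[[:space:]]-Werror[[:space:]]/ /g' \
        -e 's/[[:space:]]-Werror$//' \
        "$makefile.orig" > "$makefile"
}
```

For the step above, the call would be `strip_werror lib/stonith/Makefile`, run from the cluster-glue source directory. Re-running ./configure regenerates the Makefile, so the edit has to be repeated afterwards.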
cd /usr/local
wget -O cluster-resource-agents.tar.bz2
tar jxvf cluster-resource-agents.tar.bz2
mv Cluster-Resource-Agents-7200186935f1 cluster-agents-1.0.3
cd cluster-agents-1.0.3/
./autogen.sh && ./configure --prefix=$PREFIX
(On i386, you should also run: ln -s /usr/local/lib/libplumb* /usr/lib/)
make && make install
cd /usr/local
wget -O heartbeat.tar.bz2 http://hg.linux-ha.org/dev/archive/tip.tar.bz2
tar jxvf heartbeat.tar.bz2
mv Heartbeat-3-0-3fa50ef7c2bb heartbeat-3-0-3
cd heartbeat-3-0-3/
./bootstrap && ./configure --prefix=$PREFIX
make && make install
cd /usr/local
wget -O pacemaker.tar.bz2
tar jxvf pacemaker.tar.bz2
mv Pacemaker-1-0-49263d12452b/ pacemaker-1-0-9
cd pacemaker-1-0-9/
./autogen.sh && ./configure --prefix=$PREFIX
make && make install
(Reload the shared library cache: ldconfig -v)
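The Cluster Glue, Resource Agents, Heartbeat, and Pacemaker builds above all repeat the same sequence: a bootstrap script, ./configure with the prefix, make, and make install. A minimal sketch of a helper for that sequence follows; the names `run` and `build_component` and the `DRY_RUN` switch are hypothetical conveniences, not part of any of these projects.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_component SRCDIR BOOTSTRAP_SCRIPT
# Runs bootstrap, configure, make, and make install inside SRCDIR.
build_component() {
    srcdir="$1"
    bootstrap="$2"
    prefix="${PREFIX:-/usr/local}"
    (
        # A subshell keeps the caller's working directory unchanged.
        if [ "${DRY_RUN:-0}" != 1 ]; then
            cd "$srcdir" || exit 1
        fi
        run "./$bootstrap" &&
        run ./configure "--prefix=$prefix" &&
        run make &&
        run make install
    )
}
```

For example, `build_component /usr/local/heartbeat-3-0-3 bootstrap` would cover the Heartbeat step, and `autogen.sh` would be passed for the other three. Setting `DRY_RUN=1` first shows the commands without running them.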
cd /usr/local
wget -O pacemaker-gui.tar.bz2
tar jxvf pacemaker-gui.tar.bz2
mv Pacemaker-Python-GUI-94dfb7cb070d Pacemaker-Python-GUI-1992
cd Pacemaker-Python-GUI-1992/
./bootstrap --with-heartbeat-support --prefix=/usr/local
make
......
mgmt_crm.c: In function 'on_cleanup_rsc':
mgmt_crm.c:1307: warning: passing argument 9 of 'delete_attr' makes integer from pointer without a cast
mgmt_crm.c:1307: error: too many arguments to function 'delete_attr'
mgmt_crm.c:1316: warning: passing argument 9 of 'update_attr' makes integer from pointer without a cast
mgmt_crm.c:1316: error: too many arguments to function 'update_attr'
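This is a compatibility problem between the 2.0 mgmt code and Pacemaker 1.0.*: the GUI passes one more NULL argument to delete_attr and update_attr than the 1.0.* API accepts. The change that introduced the extra argument is shown here; to build against 1.0.*, the calls presumably have to go back to the shorter (-) form.
vi mgmt/daemon/mgmt_crm.c
- delete_attr(cib_conn, cib_sync_call, XML_CIB_TAG_STATUS, dest_node, NULL,
+ delete_attr(cib_conn, cib_sync_call, XML_CIB_TAG_STATUS, dest_node, NULL, NULL,
- XML_CIB_TAG_CRMCONFIG, NULL, NULL, NULL, "last-lrm-refresh", now_s, FALSE);
+ XML_CIB_TAG_CRMCONFIG, NULL, NULL, NULL, NULL, "last-lrm-refresh", now_s, FALSE);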
make && make install
ln -s /usr/local/lib /usr/local/lib64
Pick any one node and configure it
vi /usr/local/etc/ha.d/ha.cf
autojoin none
use_logd on
mcast eth0 239.0.0.43 694 1 0
bcast eth0
warntime 5
deadtime 15
initdead 60
keepalive 2
node mawebtest2
node mawebtest
node madbtest
crm respawn
apiauth mgmtd uid=root
respawn root /usr/local/lib64/heartbeat/mgmtd
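When the node list changes often, the same ha.cf can be generated instead of edited by hand. This is a minimal sketch, assuming the timing values, multicast address, and mgmtd path shown above stay fixed; the function name `gen_ha_cf` is hypothetical.

```shell
# Print an ha.cf to stdout.
# Usage: gen_ha_cf IFACE NODE [NODE...]
gen_ha_cf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: gen_ha_cf IFACE NODE [NODE...]" >&2
        return 1
    fi
    iface="$1"
    shift
    # Communication and timing settings, mirroring the configuration above.
    cat <<EOF
autojoin none
use_logd on
mcast $iface 239.0.0.43 694 1 0
bcast $iface
warntime 5
deadtime 15
initdead 60
keepalive 2
EOF
    # One "node" directive per cluster member.
    for n in "$@"; do
        echo "node $n"
    done
    # Hand resource management to Pacemaker and start the GUI daemon.
    cat <<EOF
crm respawn
apiauth mgmtd uid=root
respawn root /usr/local/lib64/heartbeat/mgmtd
EOF
}
```

For example, `gen_ha_cf eth0 mawebtest2 mawebtest madbtest > /usr/local/etc/ha.d/ha.cf` would reproduce the file above. Each node name has to match the output of `uname -n` on that host.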
vi /usr/local/etc/ha.d/authkeys
auth 1
1 crc
chmod 600 /usr/local/etc/ha.d/authkeys
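The crc method only detects corrupted packets and does not authenticate anything, so it is only reasonable on a trusted network. As a minimal sketch of a stronger variant, the function below writes an authkeys file that uses the sha1 method with a random key. The function name `write_authkeys` is hypothetical, and the key is read from /dev/urandom as an assumption about the host.

```shell
# Write an authkeys file using sha1 with a random key.
# Usage: write_authkeys FILE
write_authkeys() {
    target="$1"
    [ -n "$target" ] || { echo "usage: write_authkeys FILE" >&2; return 1; }
    # 32 random bytes rendered as 64 lowercase hex characters.
    key="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
    [ -n "$key" ] || return 1
    # The restrictive umask keeps the key private from the moment of creation.
    ( umask 077 && printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" ) || return 1
    # Heartbeat refuses to start unless the file mode is 600.
    chmod 600 "$target"
}
```

For example, `write_authkeys /usr/local/etc/ha.d/authkeys` on one node; the resulting file must be identical on every node in the cluster.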
Distribute the heartbeat configuration to the other nodes
/usr/local/lib64/heartbeat/ha_propagate
Start the nodes one after another
service heartbeat start
cp /usr/local/etc/pam.d/hbmgmtd /etc/pam.d/
passwd hacluster
...
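A resource can also be thought of as a Resource Agent. Pacemaker supports four resource classes: OCF, LSB, heartbeat, and stonith.
Pacemaker's level of support differs between resource classes, so test strictly against your service requirements. OCF Resource Agents are recommended.
OCF Resource Agents come from two providers, heartbeat and pacemaker; the pacemaker ones are agents optimized on top of the OCF base.
Location, Order, Colocation: these three constraint types need to be understood thoroughly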
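To make the three constraint types concrete, here is a minimal sketch that prints a crm shell configuration with one constraint of each kind. Everything specific in it is an assumption for illustration: the resource names `vip` and `web`, the IP address, the Apache config path, the scores, and the function name `gen_constraints`.

```shell
# Print a crm shell configuration with one location, colocation and order
# constraint. Resource names, IP address and scores are hypothetical.
# Usage: gen_constraints PREFERRED_NODE
gen_constraints() {
    node="$1"
    [ -n "$node" ] || { echo "usage: gen_constraints NODE" >&2; return 1; }
    cat <<EOF
primitive vip ocf:heartbeat:IPaddr2 params ip=192.168.1.100 op monitor interval=10s
primitive web ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s
location vip-prefers-$node vip 100: $node
colocation web-with-vip inf: web vip
order vip-before-web inf: vip web
EOF
}
```

Read line by line: the location constraint gives `vip` a preference score of 100 for the named node, the colocation constraint forces `web` onto whichever node runs `vip`, and the order constraint starts `vip` before `web`. The output of `gen_constraints mawebtest` is meant to be reviewed and then entered through the crm shell, not applied blindly.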
· hb_gui
· crm cli
/var/log/messages
You can also start heartbeat's logging daemon and follow the logs in /var/log/ha-log:
service logd start
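· Starting with 2.1.4, the heartbeat project was restructured. The structure is cleaner than in earlier versions (2.1.3), but the design is also somewhat more complex.
· The whole integration involves many components, which brings version compatibility problems between them; thorough testing is needed.
· Cluster Resource Management is the core of the architecture. The Resource Agents that your services need must be tested strictly: although the scripts are cross-platform, Linux systems can be configured so flexibly that each script still has to be tested and checked.
· Some problems turned up during testing, possibly because the whole project is still under development. Keep an eye on version changes of each component and upgrade regularly.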