
2010-09-10 13:33:37

Heartbeat 3.0+ Pacemaker

OCF: Open Cluster Framework

Common components



Cluster Glue

GUI console

 

Pacemaker

Resource Agents

Heartbeat

Component relationship model

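1.        Heartbeat: the daemon responsible for communication between cluster nodes and for tracking node status. It must be combined with a CRM (Cluster Resource Manager) component such as Pacemaker to provide services to users.

1.1.    Communication module, based on IPv4 UDP unicast, multicast, and broadcast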

·       intra-cluster communication - sending and receiving packets to cluster nodes

·       configuration queries

·       connectivity information (who can the current node hear packets from) - both for queries and state change notifications

·       basic group membership services

1.2.    CCM (Consensus Cluster Membership): ensures that every node in the cluster can communicate with the others; implements the OCF node membership API

1.3.    Cluster Plumbing Library

1.4.    IPC Library

1.5.    logging daemon

2.        Pacemaker: responsible for cluster resource management. It starts and stops the various user services to provide high availability. Main features:

2.1.    Detection and recovery of node and service-level failures

2.2.    Automatically replicated configuration that can be updated from any node

2.3.    Ability to specify cluster-wide service ordering, colocation and anti-colocation

2.4.    Support for advanced service types

·       Clones: for services which need to be active on multiple nodes

·       Multi-state: for services with multiple modes (eg. master/slave, primary/secondary)

2.5.    Unified, scriptable, cluster shell

3.        Resource Agents: a standardized interface to cluster resources. Most resource agents are coded as shell scripts. The actions they implement include:

·       start: enable or start the given resource

·       stop: disable or stop the given resource

·       status: return the status of the given resource (running or not running)

·       monitor: like status, but also check specifically for unexpected not running states

·       validate: validate the resource's configuration

·       meta-data: return information about the resource agent itself (used by GUIs and other management utilities, and documentation tools)

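4.        Cluster Glue: the parts that sit outside the Heartbeat/Pacemaker stack proper, including the LRM (Local Resource Manager), STONITH (Shoot The Other Node In The Head), and the low-level libraries for cluster communication.

4.1.    LRM: the interface between the CRM and the Resource Agents. It is itself not cluster aware, nor does it apply any policies. It can: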

·       start a resource;

·       stop a resource;

·       monitor a resource;

·       report a resource's status;

·       list all resource instances it currently controls, and their status.

4.2.    STONITH

4.3.    hb_report

4.4.    Cluster Plumbing Library: provides the low-level libraries needed by the upper-layer components
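The resource agent actions listed under item 3 can be sketched as a small shell script. This is only an illustration, not one of the shipped agents: the "resource" is a plain state file, the names STATE_FILE, dummy_agent and dummy-sketch are hypothetical, and the meta-data output is heavily abbreviated. The return codes are the standard OCF ones (0 for success, 7 for "not running", 3 for an unimplemented action). The OCF name for the validate action is validate-all, so the sketch accepts both spellings.

```shell
#!/bin/sh
# Minimal OCF-style resource agent sketch. The managed "resource" is only a
# state file; STATE_FILE and the dummy_* names are hypothetical.

# Standard OCF return codes used below.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

STATE_FILE=${STATE_FILE:-/tmp/dummy-resource.state}

dummy_monitor() {
    # The resource counts as running if and only if the state file exists.
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

dummy_start() {
    # Starting an already running resource must succeed (idempotent).
    dummy_monitor && return $OCF_SUCCESS
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

dummy_stop() {
    # Stopping an already stopped resource must succeed as well.
    rm -f "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

dummy_meta_data() {
    # Abbreviated metadata; a real agent describes its parameters here too.
    cat <<EOF
<?xml version="1.0"?>
<resource-agent name="dummy-sketch">
  <version>0.1</version>
  <shortdesc lang="en">State-file demo agent</shortdesc>
  <actions>
    <action name="start" timeout="20"/>
    <action name="stop" timeout="20"/>
    <action name="monitor" timeout="20" interval="10"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
EOF
}

# Dispatch on the action name, which the caller passes as the first argument.
dummy_agent() {
    case "$1" in
        start)                 dummy_start ;;
        stop)                  dummy_stop ;;
        status|monitor)        dummy_monitor ;;
        validate|validate-all) return $OCF_SUCCESS ;;
        meta-data)             dummy_meta_data ;;
        *)                     return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

A standalone agent script would end with dummy_agent "$1"; exit $? so that the return code reaches the caller. Making start and stop idempotent matters because the cluster may repeat an action whose outcome it is unsure about.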

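Component version compatibility list

·       Heartbeat 3.0.3: because the first release of Heartbeat 3 was 3.0.2, 3.0.3 is the first build

·       Cluster Glue 1.0.6

·       Pacemaker 1.0.5: the newer 1.1 branch is at version 1.1.2.1, but I could not find a download path for it

·       GUI console (Pacemaker GUI 2.0): compatible with Pacemaker 1.1; a small code change is needed to make it work with 1.0

·       Resource Agent 1.0.3: shell scripts that follow the OCF standard; no strict version compatibility constraints for now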

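Installing the components

Everything is built from source, so there are some differences between the x86 and x86_64 platforms, mostly in the form of compile errors.

The build dependencies between the components are as follows: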

BUILD  ORDER

Cluster Glue

Heartbeat 3

Pacemaker

Resource Agents

Corosync

Pacemaker GUI

Preparation

OS: CentOS 5.4 X86_64

export PREFIX=/usr/local

groupadd -g 600 haclient

useradd -g 600 -u 600 hacluster

yum -y install libtool-ltdl-devel

yum -y install intltool

yum -y install gnutls-devel

yum -y install gettext-devel

Installing Cluster Glue

cd /usr/local

wget -O cluster-glue.tar.bz2

tar jxvf cluster-glue.tar.bz2

mv Reusable-Cluster-Components-glue-1.0.6 cluster-glue-1.0.6

cd cluster-glue-1.0.6

./autogen.sh  &&  ./configure --prefix=$PREFIX 

make

......

cc1: warnings being treated as errors

main.c:64: warning: function declaration isn’t a prototype

main.c:78: warning: function declaration isn’t a prototype

gmake[2]: *** [main.o] Error 1

gmake[2]: Leaving directory `/usr/local/cluster-glue-1.0.6/lib/stonith'

gmake[1]: *** [all-recursive] Error 1

gmake[1]: Leaving directory `/usr/local/cluster-glue-1.0.6/lib'

make: *** [all-recursive] Error 1

The build configuration treats this warning as an error. Edit the Makefile so that the build can succeed:

vi lib/stonith/Makefile

(remove the -Werror flag)

make && make install
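The manual edit in vi can also be scripted, which helps when the same build is repeated on several nodes. The following is a minimal sketch; strip_werror is a hypothetical helper name, and it assumes the flag appears as a standalone word on the compiler flag lines.

```shell
# Remove standalone -Werror flags from a Makefile, keeping a .orig backup.
# strip_werror is a hypothetical helper name.
strip_werror() {
    makefile=$1
    cp "$makefile" "$makefile.orig" || return 1
    # First expression: -Werror followed by more flags.
    # Second expression: -Werror at the end of a line.
    # Variants such as -Werror=format are left alone for manual review.
    sed -e 's/[[:space:]]-Werror\([[:space:]]\)/\1/g' \
        -e 's/[[:space:]]-Werror$//' \
        "$makefile.orig" > "$makefile"
}
```

From the cluster-glue-1.0.6 directory it would be used as: strip_werror lib/stonith/Makefile && make && make install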

Installing Resource Agents

cd /usr/local

wget -O cluster-resource-agents.tar.bz2

tar jxvf cluster-resource-agents.tar.bz2

mv Cluster-Resource-Agents-7200186935f1 cluster-agents-1.0.3

cd cluster-agents-1.0.3/

./autogen.sh   && ./configure --prefix=$PREFIX

(on i386, also run: ln -s /usr/local/lib/libplumb* /usr/lib/)

make && make install

Installing Heartbeat 3

cd /usr/local

wget -O heartbeat.tar.bz2 http://hg.linux-ha.org/dev/archive/tip.tar.bz2

 tar jxvf heartbeat.tar.bz2

mv Heartbeat-3-0-3fa50ef7c2bb heartbeat-3-0-3

cd heartbeat-3-0-3/

./bootstrap && ./configure --prefix=$PREFIX

make && make install

Installing Pacemaker 1.0

cd /usr/local

wget -O pacemaker.tar.bz2

 tar jxvf pacemaker.tar.bz2

mv Pacemaker-1-0-49263d12452b/ pacemaker-1-0-9

cd pacemaker-1-0-9/

./autogen.sh  &&  ./configure --prefix=$PREFIX

 make &&  make install

(reload the shared library cache: ldconfig -v)

Installing Pacemaker GUI

cd /usr/local

wget -O pacemaker-gui.tar.bz2

tar jxvf pacemaker-gui.tar.bz2

mv Pacemaker-Python-GUI-94dfb7cb070d Pacemaker-Python-GUI-1992

cd Pacemaker-Python-GUI-1992/

./bootstrap --with-heartbeat-support --prefix=/usr/local

make

......

mgmt_crm.c: In function 'on_cleanup_rsc':

mgmt_crm.c:1307: warning: passing argument 9 of 'delete_attr' makes integer from pointer without a cast

mgmt_crm.c:1307: error: too many arguments to function 'delete_attr'

mgmt_crm.c:1316: warning: passing argument 9 of 'update_attr' makes integer from pointer without a cast

mgmt_crm.c:1316: error: too many arguments to function 'update_attr'

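This is a compatibility problem between the 2.0 mgmt code and Pacemaker 1.0.*. The compiler reports too many arguments, so each call has to lose its extra NULL: in the diff below, the GUI source contains the "+" lines, and they should be changed back to the "-" lines.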

vi mgmt/daemon/mgmt_crm.c

-    delete_attr(cib_conn, cib_sync_call, XML_CIB_TAG_STATUS, dest_node, NULL,

+    delete_attr(cib_conn, cib_sync_call, XML_CIB_TAG_STATUS, dest_node, NULL, NULL,

-        XML_CIB_TAG_CRMCONFIG, NULL, NULL, NULL, "last-lrm-refresh", now_s, FALSE);

+        XML_CIB_TAG_CRMCONFIG, NULL, NULL, NULL, NULL, "last-lrm-refresh", now_s, FALSE);

make && make install

ln -s /usr/local/lib /usr/local/lib64

Configuring and starting Heartbeat

Pick any one node and configure it.

vi /usr/local/etc/ha.d/ha.cf

autojoin none

use_logd on

mcast eth0 239.0.0.43 694 1 0

bcast eth0

warntime 5

deadtime 15

initdead 60

keepalive 2

node mawebtest2

node mawebtest

node madbtest

crm respawn

apiauth         mgmtd   uid=root

respawn         root    /usr/local/lib64/heartbeat/mgmtd
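The four timing directives in the ha.cf above depend on each other: keepalive is the heartbeat interval, warntime is when a late heartbeat is logged, deadtime is when a node is declared dead, and initdead is the longer grace period used at startup. A rough sanity check could look like the sketch below. The helper names ha_value and check_ha_timing are hypothetical, the ordering rules are a rule of thumb rather than something Heartbeat enforces, and the script assumes plain integer seconds as in the file above.

```shell
# Sanity-check the timing directives of a Heartbeat ha.cf file.
# Assumes plain integer seconds; ha_value and check_ha_timing are
# hypothetical helper names.

# Print the first value of a directive, e.g. "deadtime 15" gives 15.
ha_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

check_ha_timing() {
    cf=$1
    keepalive=$(ha_value "$cf" keepalive)
    warntime=$(ha_value "$cf" warntime)
    deadtime=$(ha_value "$cf" deadtime)
    initdead=$(ha_value "$cf" initdead)

    # Every directive must be present and numeric before comparing.
    for v in "$keepalive" "$warntime" "$deadtime" "$initdead"; do
        case $v in
            ''|*[!0-9]*)
                echo "missing or non-integer timing value" >&2
                return 2 ;;
        esac
    done

    if [ "$keepalive" -ge "$warntime" ]; then
        echo "keepalive ($keepalive) should be below warntime ($warntime)" >&2
        return 1
    fi
    if [ "$warntime" -ge "$deadtime" ]; then
        echo "warntime ($warntime) should be below deadtime ($deadtime)" >&2
        return 1
    fi
    # Rule of thumb: initdead at least twice deadtime.
    if [ "$initdead" -lt $((deadtime * 2)) ]; then
        echo "initdead ($initdead) should be at least twice deadtime ($deadtime)" >&2
        return 1
    fi
    return 0
}
```

Running check_ha_timing /usr/local/etc/ha.d/ha.cf before propagating the file gives a quick check that the values are at least mutually consistent.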

vi /usr/local/etc/ha.d/authkeys

auth 1

1 crc

chmod 600 /usr/local/etc/ha.d/authkeys
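The crc method above only detects corrupted packets and does not authenticate them, so it suits a trusted, isolated link. The authkeys file also accepts md5 and sha1 with a shared secret. The sketch below writes a sha1 variant with a random key; write_authkeys is a hypothetical helper name, and the key length of 32 random bytes is an arbitrary choice.

```shell
# Write a Heartbeat authkeys file that uses sha1 with a random shared key.
# write_authkeys is a hypothetical helper name.
write_authkeys() {
    out=$1
    # 32 random bytes rendered as 64 hexadecimal characters.
    key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    [ -n "$key" ] || return 1
    # Create the file with owner-only permissions from the start.
    ( umask 077 && printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" ) || return 1
    # Also tighten permissions in case the file already existed.
    chmod 600 "$out"
}
```

For example: write_authkeys /usr/local/etc/ha.d/authkeys. Every node needs the identical file, so generate it once and distribute it rather than running the helper on each node.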

Propagate the Heartbeat configuration to the other nodes

/usr/local/lib64/heartbeat/ha_propagate

Start the nodes one by one

service heartbeat start

Configure the GUI on each node

cp /usr/local/etc/pam.d/hbmgmtd  /etc/pam.d/

passwd hacluster

...

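Configuring cluster resources

Resources can also be thought of as Resource Agents. Pacemaker supports four resource classes: OCF, LSB, heartbeat, and stonith.

Pacemaker's level of support differs between resource classes, so test strictly against your business requirements. OCF Resource Agents are recommended.

OCF Resource Agents come in two groups, heartbeat and pacemaker; the pacemaker ones are agents optimized on top of the OCF base.

Configuring the relationships between resources

Location, Order, and Colocation: these three constraint types need to be understood thoroughly.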
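The three constraint types can be illustrated with a small crm shell configuration. In the sketch below, a location constraint expresses where a resource prefers to run, a colocation constraint keeps two resources on the same node, and an order constraint fixes the start sequence. The resource names vip and web, the IP address, and the choice of agents are hypothetical; only the node name mawebtest comes from the ha.cf above. The helper write_example_constraints is likewise a made-up name that just writes the text to a file.

```shell
# Write an example crm shell configuration with one constraint of each type.
# Resource names, the IP address and the agents are hypothetical; the node
# name mawebtest is taken from the ha.cf shown earlier.
write_example_constraints() {
    cat > "$1" <<'EOF'
primitive vip ocf:heartbeat:IPaddr2 params ip=192.168.1.100 cidr_netmask=24 op monitor interval=10s
primitive web lsb:httpd op monitor interval=30s
location web-prefers-mawebtest web 100: mawebtest
colocation web-with-vip INFINITY: web vip
order vip-before-web mandatory: vip web
EOF
}
```

On a cluster node the generated file could then be applied with crm configure load update followed by the file name, and inspected with crm configure show. This assumes the crm shell that ships with Pacemaker 1.0 is on the PATH. Test on a non-production cluster first, since the colocation score of INFINITY means web will not run anywhere vip cannot.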

Configuration tools

·       hb_gui

·       crm cli

Checking the logs

/var/log/messages

You can also start Heartbeat's own logging daemon and follow the logs in /var/log/ha-log:

service logd start
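When following the log during testing, it helps to pull out only the problem lines. The sketch below is a thin wrapper around grep and tail; recent_problems is a hypothetical helper name, and it assumes the log marks severities with the strings WARN and ERROR, so adjust the pattern if your log format differs.

```shell
# Show the most recent warning and error lines from a Heartbeat log.
# Assumes severities appear as "WARN" and "ERROR" in the log text;
# recent_problems is a hypothetical helper name.
recent_problems() {
    log=${1:-/var/log/ha-log}
    count=${2:-20}
    grep -E 'WARN|ERROR' "$log" | tail -n "$count"
}
```

For example, recent_problems /var/log/messages 50 prints up to the last 50 matching lines from the system log instead of ha-log.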

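Resource configuration example

Reference documents

Notes

·       Starting with heartbeat 2.1.4 the project was restructured. The structure is clearer than in earlier versions (2.1.3), and the design is also somewhat more complex.

·       The overall integration involves many components, which brings version compatibility problems between them, so test thoroughly.

·       The core of the architecture is Cluster Resource Management. The Resource Agents your business needs must be tested strictly: although they are cross-platform scripts, Linux platforms are configured so flexibly that each script still needs to be tested and checked.

·       Some problems turned up during testing, possibly because the project as a whole is still under development. Keep an eye on new versions of each component and upgrade regularly.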


kevinbin 2010-09-26 15:59:37

Did heartbeat 2.1.4 integrate Pacemaker? I installed heartbeat directly, without Pacemaker, and can still use crm. I don't quite understand the relationship between Cluster Glue and heartbeat.