Corosync Configuration:
Preparation: prepare two machines, node1.a.org and node2.a.org, with IP addresses 192.168.0.134 and 192.168.0.3 respectively. The clustered service will be Apache's httpd.
I: Edit the /etc/hosts file on both nodes and add the following:
192.168.0.134 node1.a.org node1
192.168.0.3 node2.a.org node2
1. On node1 and node2, set the hostname with the hostname command, or edit /etc/sysconfig/network directly to change it.
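For example, a minimal sketch for node1 (the hostname value comes from the hosts entries above; run the equivalent on node2):
# hostname node1.a.org
# sed -i 's/^HOSTNAME=.*/HOSTNAME=node1.a.org/' /etc/sysconfig/network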
2. Set up key-based SSH authentication between the two nodes:
node1:
# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub node2
node2:
# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub node1
Install Apache on node1 and node2 via yum. For testing, create an index.html containing 'node1.a.org' on node1 and one containing 'node2.a.org' on node2 (as sketched after these commands), and make sure the service can start. Then stop it and disable it at boot, since the cluster will manage it:
# yum install httpd -y
# service httpd stop
# chkconfig httpd off
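The test pages mentioned above can be created like this, assuming the default DocumentRoot of /var/www/html (run the matching command on its node):
# echo "node1.a.org" > /var/www/html/index.html        (on node1)
# echo "node2.a.org" > /var/www/html/index.html        (on node2)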
II: Install the software packages.
Dependencies: libibverbs, librdmacm, lm_sensors, libtool-ltdl, openhpi-libs, openhpi, perl-TimeDate.
1. Place the corosync/pacemaker RPMs and the dependencies listed above in /root/cluster, then install them:
# cd /root/cluster
# yum -y localinstall *.rpm --nogpgcheck
2. Edit the corosync configuration file:
# cd /etc/corosync
# cp corosync.conf.example corosync.conf
Add the following to the file:

service {
        ver: 0
        name: pacemaker
}

aisexec {
        user: root
        group: root
}

Also change the bindnetaddr line to match the network:
bindnetaddr: 192.168.0.0
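For reference, bindnetaddr sits inside the interface sub-section of the totem stanza; in the stock sample file that stanza looks roughly like this (the multicast address and port shown are the sample defaults, not values this setup prescribes):
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}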
3. Generate the authentication key used for inter-node communication (corosync-keygen reads from /dev/random, so it may take a while to gather enough entropy), copy the key to node2, and create the log directory:
# corosync-keygen
# scp -p authkey node2:/etc/corosync
# mkdir /var/log/cluster
4. Start the service:
# /etc/init.d/corosync start
Note: the steps above were performed on node1. Repeat the same steps on node2, then start node2's service from node1:
# ssh node2 '/etc/init.d/corosync start'
Verify that corosync started correctly.
Check whether the Corosync engine started normally:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check that the initial membership notifications were sent correctly:
# grep TOTEM /var/log/messages
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [192.168.0.5] is now up.
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup:
# grep ERROR: /var/log/messages | grep -v unpack_resources
Check whether pacemaker started normally:
# grep pcmk_startup /var/log/messages
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.a.org
Configure the cluster service:
Create an IP address resource for the web cluster:
# crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=192.168.0.99
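Note that the constraints configured below reference a WebSite resource for httpd that this walkthrough never defines explicitly; a minimal sketch using the LSB init script (the name WebSite is chosen to match the constraints that follow):
# crm configure primitive WebSite lsb:httpd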
Ignore cluster status checks when quorum cannot be satisfied (required in a two-node cluster):
# crm configure property no-quorum-policy=ignore
Set a default stickiness value for resources:
# crm configure rsc_defaults resource-stickiness=100
Disable STONITH, since this test setup has no fencing devices:
# crm configure property stonith-enabled=false
WebIP and WebSite could otherwise run on different nodes; solve this with a colocation constraint:
# crm configure colocation website-with-ip INFINITY: WebSite WebIP
Ensure WebIP is started before WebSite:
# crm configure order httpd-after-ip mandatory: WebIP WebSite
Set a location constraint so WebSite prefers node1:
# crm configure location prefer-node1 WebSite 200: node1.a.org
Start the corosync service on node1 and node2.
Visit 192.168.0.99 in a browser to confirm the page is served, then stop the service on either node and visit again to verify failover:
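One way to simulate the failure is to put a node in standby rather than killing the service (a sketch using the crm node commands introduced later in this article):
# crm node standby        take the current node out of service; resources should move to the other node
# crm status              confirm WebIP and WebSite now run on the surviving node
# crm node online         bring the node back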
This completes the OpenAIS/corosync configuration.
DRBD Configuration
Before configuring, add a disk to both node1 and node2 and create a partition on it (a non-interactive sketch follows):
# fdisk /dev/sdb
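fdisk is interactive; the sketch below creates one primary partition spanning the disk by piping the usual keystrokes (n, p, 1, two defaults, w) into it. Double-check interactively if unsure:
# echo -e "n\np\n1\n\n\nw" | fdisk /dev/sdb
# partprobe /dev/sdb        have the kernel re-read the partition table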
Install the software packages:
DRBD consists of two parts: a kernel module and the userspace management tools. The DRBD kernel module was merged into the mainline Linux kernel as of version 2.6.33, so if your kernel is at or above that version you only need to install the management tools; otherwise you must install both the kernel module package and the management tools, and their version numbers must match. Download the packages and install them, doing the same on node1 and node2:
# yum -y --nogpgcheck localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm
Configure DRBD:
The main file is /etc/drbd.conf; copy the sample into place:
# cp /usr/share/doc/drbd83-8.3.8/drbd.conf /etc
Then configure /etc/drbd.d/global_common.conf:
global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        protocol C;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }

        startup {
                wfc-timeout 120;
                degr-wfc-timeout 120;
        }

        disk {
                on-io-error detach;
                fencing resource-only;
        }

        net {
                cram-hmac-alg "sha1";
                shared-secret "mydrbdlab";
        }

        syncer {
                rate 100M;
        }
}
3. Define a resource in /etc/drbd.d/web.res with the following content:
resource web {
        on node1.a.org {
                device  /dev/drbd0;
                disk    /dev/sdb1;
                address 192.168.0.134:7789;
                meta-disk internal;
        }
        on node2.a.org {
                device  /dev/drbd0;
                disk    /dev/sdb1;
                address 192.168.0.3:7789;
                meta-disk internal;
        }
}
Initialize the resource (run on both nodes) and start the service:
# drbdadm create-md web
Start the service on node1 and node2:
# /etc/init.d/drbd start
Set the primary node (run on node1 only):
# drbdsetup /dev/drbd0 primary -o
or:
# drbdadm -- --overwrite-data-of-peer primary web
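The initial full sync then begins; a simple way to watch its progress is to re-run the status command every second:
# watch -n 1 'cat /proc/drbd'        wait for ds:UpToDate/UpToDate before creating a filesystem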
Create the filesystem; formatting and mounting must be done on the Primary node:
# mke2fs -j -L DRBD /dev/drbd0
# mkdir /web
# mount /dev/drbd0 /web
Verify DRBD:
On the primary node (node1), copy some files into /web, then unmount it and demote the node to secondary:
# umount /web
# drbdadm secondary web
On node2, promote it to primary and mount the device; the files copied on node1 should be visible:
# drbdadm primary web
# mount /dev/drbd0 /web
Overview of common corosync and DRBD commands
Common corosync/pacemaker commands:
corosync-keygen - generate the authentication key
crm status - show the cluster status
crm_verify -L - check the live cluster for problems
classes (in ra mode) - list the resource agent classes
crm_attribute - modify a node or cluster-wide attribute
crm_node - node-related operations
crm_node -q - show the quorum vote count
cibadmin - tool for modifying the cluster configuration
Common options: -Q dump the CIB document, -E erase the CIB contents, -R replace the CIB, -D delete an object, -d delete all matching objects
Example: dump the CIB with cibadmin -Q > /tmp/qq.xml, edit qq.xml, then replace the CIB with it: cibadmin -R -x /tmp/qq.xml
To delete a resource: use crm(live)configure# edit to edit directly, use delete in that mode, or use cibadmin
crm_shadow - work on a shadow copy of the CIB
crm(live)ra# list ocf heartbeat - list the ocf:heartbeat resource agents (for example, the Filesystem agent)
Resource constraints:
- Location: which node a resource prefers to stay on
  help location - view the help
  Example: location Web_on_node1 Web 500: node1.a.org
- Order: define the startup order of resources
  help order - view the help
  Example: order WebServer_after_WebIP mandatory: WebIP WebServer:start
- Colocation: whether resources must run together on the same node
  help colocation - view the help
Common DRBD commands:
# drbd-overview - show the primary/secondary roles
# cat /proc/drbd - show the DRBD status
Introduction to crm interactive mode:
Type crm in the shell to enter interactive mode:
[root@node1 ~]# crm
crm(live)# help - show the help
This is the CRM command line interface program.

Available commands:

        cib              manage shadow CIBs
        resource         resources management
        node             nodes management
        options          user preferences
        configure        CRM cluster configuration
        ra               resource agents information center
        status           show cluster status
        quit,bye,exit    exit the program
        help             show help
        end,cd,up        go back one level

crm(live)#
Type configure to enter configuration mode:
crm(live)configure#
crm(live)configure# cd - go back up a level
crm(live)# status - show the status
============
Last updated: Wed Sep 14 22:09:13 2011
Stack: openais
Current DC: node1.a.org - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node1.a.org ]
OFFLINE: [ node2.a.org ]

Master/Slave Set: MS_Webdrbd
        Slaves: [ node1.a.org ]
        Stopped: [ webdrbd:1 ]
ra mode shows the resource agent classes:
crm(live)ra# classes
heartbeat
lsb
ocf / heartbeat linbit pacemaker
stonith
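Also handy in ra mode: info prints an agent's parameters and actions, for example:
crm(live)ra# info ocf:heartbeat:IPaddr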
After making changes in configure mode, you must run commit for them to be saved and take effect.
crm node standby - run on a node to put it in standby, simulating a failure
crm node online - bring that node back online
DRBD + Pacemaker Configuration:
DRBD is configured as above; next, configure pacemaker:
[root@node1 ~]# crm configure show
node node1.a.org
node node2.a.org
property $id="cib-bootstrap-options" \
        dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \        (make sure this entry is present)
        stonith-enabled="false"            (make sure this entry is present)
[root@node1 ~]# /etc/init.d/drbd stop      (stop drbd on both node1 and node2)
Stopping all DRBD resources: .
[root@node1 ~]# chkconfig drbd off
Configure the DRBD resource:
[root@node1 ~]# crm
crm(live)# configure
crm(live)configure# primitive webdrbd ocf:heartbeat:drbd params drbd_resource=web op monitor role=Master interval=50s timeout=30s op monitor role=Slave interval=60s timeout=30s
WARNING: webdrbd: default timeout 20s for start is smaller than the advised 240
WARNING: webdrbd: default timeout 20s for stop is smaller than the advised 100
crm(live)configure# master MS_Webdrbd webdrbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)configure# show webdrbd
primitive webdrbd ocf:heartbeat:drbd \
        params drbd_resource="web" \
        op monitor interval="50s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="30s"
crm(live)configure# show MS_Webdrbd
ms MS_Webdrbd webdrbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)configure# commit
crm(live)configure#
On node2, check whether a host has become the Primary node:
# drbdadm role web
Primary/Secondary
Create a cluster service that automatically mounts the web resource's filesystem on the Primary node, as sketched below.
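The original walkthrough stops here; a minimal sketch of that remaining step in the same crm style used above (the resource name WebFS, the ext3 fstype, and the constraint names are assumptions, not from the original text):
crm(live)configure# primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/web" fstype="ext3"
crm(live)configure# colocation WebFS_on_MS_Webdrbd inf: WebFS MS_Webdrbd:Master
crm(live)configure# order WebFS_after_MS_Webdrbd inf: MS_Webdrbd:promote WebFS:start
crm(live)configure# commit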