Category: LINUX

2011-09-15 22:15:44

Corosync configuration:

 Preparation: set up two machines, node1.a.org (192.168.0.134) and node2.a.org (192.168.0.3), and install the Apache (httpd) service that the cluster will manage:

Step 1: Edit the /etc/hosts file on both nodes and add the following entries:

192.168.0.134        node1.a.org node1

192.168.0.3          node2.a.org node2

1. On node1 and node2, set the hostname with the hostname command, or edit /etc/sysconfig/network directly to change it permanently.

2. Set up key-based SSH communication between the two nodes (a quick verification check follows the commands below):

  1. On node1:
  2. # ssh-keygen -t rsa
  3. # ssh-copy-id -i /root/.ssh/id_rsa.pub node2
  4. On node2:
  5. # ssh-keygen -t rsa
  6. # ssh-copy-id -i /root/.ssh/id_rsa.pub node1
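If the key-based setup works, running a command on the peer should not ask for a password. A minimal check, assuming the /etc/hosts entries above are in place:

  1. # ssh node2 'hostname'    (run from node1; should print node2.a.org with no password prompt)
  2. # ssh node1 'hostname'    (run from node2; should print node1.a.org with no password prompt)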

Install the Apache service on node1 and node2. For testing, create an index.html containing 'node1.a.org' on node1 and one containing 'node2.a.org' on node2 (a sketch of the test pages follows the install commands), and make sure the service can start. Installation here uses yum:

  1. # yum install httpd -y
  2. # service httpd stop
  3. # chkconfig httpd off
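A minimal way to create the test pages described above, assuming Apache's default DocumentRoot of /var/www/html:

  1. On node1: # echo "node1.a.org" > /var/www/html/index.html
  2. On node2: # echo "node2.a.org" > /var/www/html/index.html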

Step 2: Install the software packages:

 Required packages: libibverbs, librdmacm, lm_sensors, libtool-ltdl, openhpi-libs, openhpi, perl-TimeDate.  1. Place these packages in /root/cluster and install them:
  1. # cd /root/cluster
  2. # yum -y --nogpgcheck localinstall *.rpm
 2. Edit the corosync configuration file (a reference sketch of the resulting totem section follows this list):
  1. # cd /etc/corosync
  2. # cp corosync.conf.example corosync.conf
  3. Add the following to the file:
  4. service {
  5.     ver: 0
  6.     name: pacemaker
  7. }
  8. aisexec {
  9.     user:  root
  10.     group: root
  11. }
  12. Change the bindnetaddr directive to the cluster network address: bindnetaddr: 192.168.0.0
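For reference, after these edits the totem section of corosync.conf typically looks like the sketch below; secauth is shown as on because an authkey is generated in the next step, and mcastaddr/mcastport keep the example file's defaults (both are assumptions, not values given in this post):

  1. totem {
  2.     version: 2
  3.     secauth: on
  4.     threads: 0
  5.     interface {
  6.         ringnumber: 0
  7.         bindnetaddr: 192.168.0.0
  8.         mcastaddr: 226.94.1.1
  9.         mcastport: 5405
  10.     }
  11. }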
 3. Generate the authentication key used for node communication, copy it (and the configuration file, which must be identical) to node2, and create the log directory on both nodes:
  1. # corosync-keygen
  2. # scp -p authkey corosync.conf node2:/etc/corosync/
  3. # mkdir /var/log/cluster        (also run this on node2)

 4. Start the service: /etc/init.d/corosync start

  Note: the steps above were performed on node1. Perform the same operations on node2, then start node2's service from node1 with: ssh node2 '/etc/init.d/corosync start'

Verify that corosync started correctly:

 Check whether the Corosync Cluster Engine started properly:

  1. # grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
  2. Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
  3. Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
  4. Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
  5. Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
  6. Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Check whether the initial membership notifications were sent correctly:
  1. # grep TOTEM /var/log/messages
  2. Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
  3. Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
  4. Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [192.168.0.5] is now up.
  5. Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors were produced during startup:
  1. # grep ERROR: /var/log/messages | grep -v unpack_resources
Check whether pacemaker started correctly:
  1. # grep pcmk_startup /var/log/messages
  2. Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
  3. Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
  4. Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
  5. Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
  6. Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.a.org

Configure the cluster services:

Create an IP address resource for the web cluster (a sketch of the WebSite resource referenced by the later constraints follows the command):
  1. # crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=192.168.0.99
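The WebSite resource used by the constraints below is not defined anywhere in the original post. A hedged sketch of how such an Apache resource is commonly created with the ocf:heartbeat:apache agent (the configfile path and monitor interval here are assumptions):

  1. # crm configure primitive WebSite ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s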
Ignore the cluster status check that fails when quorum cannot be met (typical for a two-node cluster):
  1. # crm configure property no-quorum-policy=ignore
Set a default stickiness value for resources and disable STONITH (no fencing device in this test setup):
  1. # crm configure rsc_defaults resource-stickiness=100
  2. # crm configure property stonith-enabled=false
WebIP and WebSite might end up running on different nodes; solve this with a colocation constraint:
  1. # crm configure colocation website-with-ip INFINITY: WebSite WebIP
Ensure WebIP is started before WebSite on a node:
  1. # crm configure order httpd-after-ip mandatory: WebIP WebSite
Set a location constraint (prefer node1):
  1. # crm configure location prefer-node1 WebSite rule 200: node1

Start the corosync service on node1 and node2:

  Access 192.168.0.99 in a browser to check that it works, then stop the service on one node and access it again to verify failover (a sketch of one way to do this follows):
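One way to simulate a node failure and watch the resources move; these are standard crm commands, not taken from the original post:

  1. # crm node standby node1.a.org     (put node1 in standby; WebIP/WebSite should move to node2)
  2. # crm status                       (confirm the resources are now running on node2)
  3. # crm node online node1.a.org      (bring node1 back online)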

At this point the OpenAIS/Corosync configuration is complete.

 DRBD configuration

Before configuring DRBD, add a disk to node1 and node2 and create a partition (a sketch of the partitioning steps follows):

#fdisk /dev/sdb         
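A hedged sketch of the partitioning, assuming the new disk appears as /dev/sdb and a single primary partition /dev/sdb1 is wanted:

  1. # fdisk /dev/sdb        (inside fdisk: n, p, 1, accept the defaults, then w to write)
  2. # partprobe /dev/sdb    (have the kernel re-read the partition table)
  3. # fdisk -l /dev/sdb     (confirm that /dev/sdb1 now exists)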

Install the software packages:

  DRBD consists of two parts: a kernel module and user-space management tools. The DRBD kernel module was merged into the mainline Linux kernel as of 2.6.33, so if your kernel is newer than that you only need to install the management tools; otherwise you must install both the kernel module package and the management tools, and their version numbers must match. Download the packages and install them; perform the same steps on node1 and node2:
  1. # yum -y --nogpgcheck localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm

Configure drbd:

The main configuration file is /etc/drbd.conf:

  1. # cp /usr/share/doc/drbd83-8.3.8/drbd.conf /etc
  2. Configure /etc/drbd.d/global_common.conf:
  3. global {
  4.         usage-count no;
  5.         # minor-count dialog-refresh disable-ip-verification
  6. }

  7. common {
  8.         protocol C;

  9.         handlers {
  10.                 pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
  11.                 pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
  12.                 local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
  13.                 # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
  14.                 # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  15.                 # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
  16.                 # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
  17.                 # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
  18.         }

  19.         startup {
  20.                 wfc-timeout 120;
  21.                 degr-wfc-timeout 120;
  22.         }

  23.         disk {
  24.                 on-io-error detach;
  25.                 fencing resource-only;
  26.         }

  27.         net {
  28.                 cram-hmac-alg "sha1";
  29.                 shared-secret "mydrbdlab";
  30.         }

  31.         syncer {
  32.                 rate 100M;
  33.         }
  34. }

3. Define a resource in /etc/drbd.d/web.res with the following content:

  1. resource web {
  2.   on node1.a.org {
  3.     device /dev/drbd0;
  4.     disk /dev/sdb1;
  5.     address 192.168.0.134:7789;
  6.     meta-disk internal;
  7.   }
  8.   on node2.a.org {
  9.     device /dev/drbd0;
  10.     disk /dev/sdb1;
  11.     address 192.168.0.3:7789;
  12.     meta-disk internal;
  13.   }
  14. }

Initialize the resource (run on both node1 and node2) and start the service:

# drbdadm create-md web

Start the service on node1 and node2: # /etc/init.d/drbd start (a quick status check follows)
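After both nodes have started, they should both report a Secondary role with Inconsistent data until one of them is promoted. A quick check using standard DRBD commands (also listed later in this post):

  1. # cat /proc/drbd
  2. # drbd-overview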

Set the primary node (run either of the following on node1):

  1. # drbdsetup /dev/drbd0 primary -o

  2. # drbdadm -- --overwrite-data-of-peer primary web

Create the filesystem; creating and mounting it must be done on the Primary node:
  1. # mke2fs -j -L DRBD /dev/drbd0
  2. # mkdir /web
  3. # mount /dev/drbd0 /web

Verify drbd

On the primary node (node1), copy some files into /web, then demote it to Secondary and promote node2 (a quick check on node2 follows this list):

  1. # umount /web
  2. # drbdadm secondary web
  3. On node2, promote it to primary: # drbdadm primary web
  4. # mount /dev/drbd0 /web
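The files copied into /web on node1 should now be visible on node2, confirming that replication works. A minimal check on node2:

  1. # ls /web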


Overview of some corosync and drbd commands

  1. Common corosync/pacemaker commands
  2. corosync-keygen      generate the authentication key
  3. crm status           show the cluster status
  4. crm_verify -L        check the cluster configuration for faults
  5. In ra mode: classes  show the resource agent classes
  6. crm_attribute        modify a node attribute or cluster property
  7. crm_node             node-related commands
  8. crm_node -q          show the node's quorum votes

cibadmin: tool for modifying the cluster configuration (CIB)

     Common options: -Q show the CIB document, -E erase the CIB contents, -R replace the CIB, -D delete an object, -d clear all matching resources

           Example: cibadmin -Q > /tmp/qq.xml, edit qq.xml, then load it back with cibadmin -R -x /tmp/qq.xml

  To delete a resource: use crm(live)configure# edit to edit the configuration directly, use delete in that mode, or use cibadmin.

crm_shadow

   crm(live)configure ra# list ocf heartbeat     list the resource agents in the ocf/heartbeat class (e.g. Filesystem)

Resource constraints:

  1. Location: which node a resource prefers to run on
  2.       help location      show help
  3.       Example: location Web_on_node1 Web 500: node1.a.org
  4. Order: define the order in which resources start
  5.       help order         show help
  6.       Example: order WebServer_after_WebIP mandatory: WebIP WebServer:start
  7. Colocation: whether resources must run together on the same node
  8.      help colocation     show help (an example follows this list)
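A colocation example in the same style, reusing the WebIP and WebSite resources configured earlier in this post:

  1. Example: colocation WebSite_with_WebIP INFINITY: WebSite WebIP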


Common DRBD commands:

  1. # drbd-overview      show the Primary/Secondary roles
  2. # cat /proc/drbd     show the DRBD status


Introduction to the crm interactive mode:

     Type crm directly in the shell to enter interactive mode:

  1. [root@node1 ~]# crm
  2. crm(live)# help        show the help text

  3. This is the CRM command line interface program.

  4. Available commands:

  5.     cib              manage shadow CIBs
  6.     resource         resources management
  7.     node             nodes management
  8.     options          user preferences
  9.     configure        CRM cluster configuration
  10.     ra               resource agents information center
  11.     status           show cluster status
  12.     quit,bye,exit    exit the program
  13.     help             show help
  14.     end,cd,up        go back one level

  15. crm(live)#

Type configure to enter configuration mode:

  1. crm(live)configure#
  2. crm(live)configure# cd        cd switches between levels
  3. crm(live)# status             show the cluster status
  4. ============
  5. Last updated: Wed Sep 14 22:09:13 2011
  6. Stack: openais
  7. Current DC: node1.a.org - partition WITHOUT quorum
  8. Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
  9. 2 Nodes configured, 2 expected votes
  10. 2 Resources configured.
  11. ============

  12. Online: [ node1.a.org ]
  13. OFFLINE: [ node2.a.org ]

  14.  Master/Slave Set: MS_Webdrbd
  15.      Slaves: [ node1.a.org ]
  16.      Stopped: [ webdrbd:1 ]

The ra level can be used to view the resource agent classes:

  1. crm(live)configure ra# classes
  2. heartbeat
  3. lsb
  4. ocf / heartbeat linbit pacemaker
  5. stonith

  After making changes in configure mode, run commit to save them and make them take effect:

  1. crm node standby      run on a node to put it in standby, simulating a failure
  2. crm node online       bring the node back online

drbd + pacemaker configuration:

    DRBD is configured as above; now configure pacemaker:

  1. [root@node1 ~]# crm configure show
  2. node node1.a.org
  3. node node2.a.org
  4. property $id="cib-bootstrap-options" \
  5.     dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
  6.     cluster-infrastructure="openais" \
  7.     expected-quorum-votes="2" \
  8.     no-quorum-policy="ignore" \        (make sure this entry is present)
  9.     stonith-enabled="false"            (make sure this entry is present)
  10. [root@node1 ~]#
  11. [root@node1 ~]#
  12. [root@node1 ~]# /etc/init.d/drbd stop        (stop drbd on both node1 and node2)
  13. Stopping all DRBD resources: .
  14. [root@node1 ~]# chkconfig drbd off

Configure the drbd resource:

  1. [root@node1 ~]# crm
  2. crm(live)# configure
  3. crm(live)configure# primitive webdrbd ocf:heartbeat:drbd params drbd_resource=web op monitor role=Master interval=50s timeout=30s op monitor role=Slave interval=60s timeout=30s
  4. WARNING: webdrbd: default timeout 20s for start is smaller than the advised 240
  5. WARNING: webdrbd: default timeout 20s for stop is smaller than the advised 100
  6. crm(live)configure# master MS_Webdrbd webdrbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
  7. crm(live)configure# show webdrbd
  8. primitive webdrbd ocf:heartbeat:drbd \
  9.     params drbd_resource="web" \
  10.     op monitor interval="50s" role="Master" timeout="30s" \
  11.     op monitor interval="60s" role="Slave" timeout="30s"
  12. crm(live)configure# show MS_Webdrbd
  13. ms MS_Webdrbd webdrbd \
  14.     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
  15. crm(live)configure#

Check on node2 whether the host has become the Primary node:

  1. # drbdadm role web
  2. Primary/Secondary

Create a cluster service that automatically mounts the web DRBD resource on the Primary node (the original post stops here; a hedged sketch of this step follows).
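A hedged sketch of how such a Filesystem resource is commonly tied to the MS_Webdrbd master/slave resource; the resource name WebFS is an assumption, and ext3 matches the filesystem created earlier with mke2fs -j:

  1. crm(live)configure# primitive WebFS ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/web fstype=ext3
  2. crm(live)configure# colocation WebFS_on_MS_Webdrbd inf: WebFS MS_Webdrbd:Master
  3. crm(live)configure# order WebFS_after_MS_Webdrbd inf: MS_Webdrbd:promote WebFS:start
  4. crm(live)configure# commit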

