Category: LINUX

2010-03-11 11:50:15

Setting Up A Highly Available NFS Server with (drbd+heartbeat)

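1: Introduction to DRBD: DRBD consists of a kernel module and related scripts and is used to build high-availability clusters. It works by mirroring an entire block device over the network; you can think of it as a kind of network RAID.

DRBD is a block device that can be used in high-availability (HA) setups. It works much like a network RAID-1: when you write data to the local file system, the data is also sent to another host on the network and recorded there in the same form.
Data on the local host (primary node) and the remote host (secondary node) is kept in sync in real time. If the local system fails, the remote host still holds an identical copy of the data, which can continue to be used.

In an HA setup, DRBD can take the place of a shared disk array. Because the data exists on both the local and the remote host, on failover the remote host simply uses its own copy of the data to continue the service.
The following diagram shows how DRBD works: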

        +----------------+
        |  File system   |
        +----------------+
                |
                V
        +----------------+
        |  Block device  |
        |  (/dev/drbd1)  |
        +----------------+
           |          |
           |          |
           V          V
   +-------------+  +-------------+
   | Local disk  |  | Remote disk |
   | (/dev/hdb1) |  | (/dev/hdb1) |
   +-------------+  +-------------+

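2: Introduction to heartbeat:

heartbeat is one of the packages publicly available from the Linux-HA project web site. It provides the basic functions every HA system needs, such as starting and stopping resources, monitoring the availability of the systems in the cluster, and transferring ownership of a shared IP address between the nodes of the cluster. It monitors the health of a specific service (or several services) over a serial line, an Ethernet interface, or both.

The document above is excellent.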

Note: make install places the drbd module in:

[root@node2 block]# pwd
/lib/modules/2.6.9-42.ELsmp/kernel/drivers/block
[root@node2 block]# ls drbd.ko
drbd.ko
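The drbd tools (drbdadm, drbdsetup) are installed under /sbin,
and a drbd init script is created under /etc/init.d/.

Before starting DRBD, you need to create the metadata block that DRBD uses to record its information on the hdb1 partition of each of the two hosts.
Run on both hosts: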

[root@g105-1 /]# drbdadm create-md r0
[root@g105-2 /]# drbdadm create-md r0

"r0" is the resource name we defined in drbd.conf.
Now we can start DRBD. Run on both hosts:

[root@g105-1 /]# /etc/init.d/drbd start
[root@g105-2 /]# /etc/init.d/drbd start
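If you use drbd on its own, take note of the steps above; since we use it together with heartbeat, they are not needed!

Switching the DRBD primary and secondary

Sometimes you need to swap the DRBD primary and secondary roles. Proceed as follows:
On the primary, first unmount the DRBD device.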

[root@g105-1 /]# umount /mnt/drbd1

Demote the primary to "secondary".

[root@g105-1 /]# drbdadm secondary r0
[root@g105-1 /]# cat /proc/drbd
version: 8.0.4 (api:86/proto:86)
SVN Revision: 2947 build by root@g105-1, 2007-07-28 07:13:14

 1: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:0 nr:5 dw:5 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0

Now both hosts are "secondary".
On the secondary g105-2, promote it to "primary".

[root@g105-2 /]# drbdadm primary r0
[root@g105-2 /]# cat /proc/drbd
version: 8.0.4 (api:86/proto:86)
SVN Revision: 2947 build by root@g105-2, 2007-07-28 07:13:14

 1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:0 nr:5 dw:5 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
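
The role check used during the switch-over can also be scripted. The following is a minimal sketch, not part of the original setup: the function name drbd_local_role is made up for illustration, and it assumes the /proc/drbd layout shown above, where each device line starts with the minor number and carries an st:Local/Peer field.

```shell
#!/bin/sh
# Hypothetical helper: print the local role of one DRBD minor.
# Reads /proc/drbd-style text on stdin; $1 is the minor number (e.g. 1).
drbd_local_role() {
    awk -v minor="$1" '
        # Device lines look like " 1: cs:Connected st:Primary/Secondary ..."
        $1 == (minor ":") {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^st:/) {
                    sub(/^st:/, "", $i)      # drop the "st:" prefix
                    split($i, roles, "/")    # roles[1] = local, roles[2] = peer
                    print roles[1]
                    exit
                }
            }
        }'
}
```

For example, drbd_local_role 1 < /proc/drbd should print Primary on g105-2 after the promotion above. It prints nothing if the minor is not listed.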

Problems encountered along the way:

Error 1

[root@node1 ~]# mkdir /data

[root@node1 ~]# mount -t ext3 /dev/drbd0  /data/

mount: block device /dev/drbd0 is write-protected, mounting read-only

mount: /dev/drbd0 already mounted or /data/ busy

Cause:

Probably your DRBD resources are still Secondary (cat /proc/drbd to find out).
Please see the users guide on how to make resources primary.

Solution:

[root@node1 ~]# drbdadm -- --do-what-I-say primary all

[root@node1 ~]# drbdadm -- connect all

[root@node1 ~]# cat /proc/drbd

version: 0.7.20 (api:79/proto:74)

SVN Revision: 2260 build by root@node1, 2010-03-11 09:38:07

 0: cs:Connected st:Primary/Secondary ld:Consistent

    ns:0 nr:0 dw:0 dr:6144831 al:0 bm:376 lo:0 pe:0 ua:0 ap:0

 1: cs:Unconfigured

[root@node1 ~]# mount -t ext3 /dev/drbd0  /data/

[root@node1 ~]#

Error 2

[root@node1 ~]# mv /var/lib/nfs/ /data/

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/statd': Operation not permitted

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/portmap': Operation not permitted

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/nfs': Operation not permitted

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/mount': Operation not permitted

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/lockd': Operation not permitted

Solution:

[root@node1 ~]# service nfs status

Shutting down NFS mountd: rpc.mountd is stopped

nfsd is stopped

rpc.rquotad is stopped

[root@node1 ~]# service nfslock status

rpc.statd (pid 2528) is running...

[root@node1 ~]# service nfslock stop

Stopping NFS statd:                                        [  OK  ]

[root@node1 ~]# service rpcidmapd status

rpc.idmapd (pid 2557) is running...

[root@node1 ~]# service rpcidmapd stop

Shutting down RPC idmapd:                                  [  OK  ]

[root@node1 ~]# /bin/umount -a -t rpc_pipefs    # extremely important

[root@node1 nfs]# cd /var/lib/nfs/

[root@node1 nfs]# ls

rpc_pipefs

[root@node1 nfs]# ll

total 8

drwxr-xr-x  2 root root 4096 May 23  2006 rpc_pipefs

[root@node1 nfs]# cd rpc_pipefs/

[root@node1 rpc_pipefs]# ls

[root@node1 rpc_pipefs]# ll

total 0

[root@node1 lib]# rm -fr nfs/

[root@node1 lib]# ln -s /data/nfs  nfs

[root@node1 lib]# ll

total 180

drwxr-xr-x   2 root      root    4096 Mar 10 11:50 alternatives

*****

lrwxrwxrwx   1 root      root       9 Mar 11 11:00 nfs -> /data/nfs

***

 

drwxr-xr-x   2 root      root    4096 Mar 11 10:38 xkb

Restart the services:

[root@node1 nfs]# /bin/mount -t rpc_pipefs sunrpc  /var/lib/nfs/rpc_pipefs

[root@node1 nfs]# service rpcidmapd start

Starting RPC idmapd:                                       [  OK  ]

[root@node1 nfs]#
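
The mv in Error 2 appears to fail because rpc_pipefs is still mounted below /var/lib/nfs. A minimal sketch for checking this before touching the directory follows; the function name rpc_pipefs_mounts is hypothetical, and it assumes the usual /proc/mounts layout (device, mount point, filesystem type, ...).

```shell
#!/bin/sh
# Hypothetical helper: list the mount points of all rpc_pipefs filesystems.
# $1 is a /proc/mounts-style file; it defaults to /proc/mounts.
rpc_pipefs_mounts() {
    # Field 2 is the mount point, field 3 the filesystem type.
    awk '$3 == "rpc_pipefs" { print $2 }' "${1:-/proc/mounts}"
}
```

If rpc_pipefs_mounts prints nothing, no rpc_pipefs is mounted and /var/lib/nfs can be moved or removed; otherwise stop nfslock and rpcidmapd and unmount it first, as in the transcript above.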

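With minor changes, the setup above can be turned into an httpd (Apache) cluster! (Of course, httpd must be configured not to start at boot; heartbeat decides when it is started and stopped.)

What needs to change:

1:

On node2: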

[root@node2 www]# cat /etc/ha.d/haresources
node2  IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd   # the modified part was highlighted in red in the original post

Of course, there is an httpd script under my /etc/init.d.

On node1:
[root@node1 ha.d]# cat /etc/ha.d/haresources
node1  IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd
[root@node1 ha.d]#
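
Each haresources entry names the preferred node first, followed by the resources heartbeat manages; the logs further down show them being started from left to right. A minimal sketch that lists them in that order (the function name is hypothetical; it assumes a single-line entry and treats anything after # as a comment):

```shell
#!/bin/sh
# Hypothetical helper: print the resources of one haresources entry,
# one per line, in the order they appear (the start order).
haresources_start_order() {
    # sed strips a trailing "#" comment; awk skips field 1 (the node name)
    # and prints every remaining field on its own line.
    printf '%s\n' "$1" |
        sed 's/#.*$//' |
        awk '{ for (i = 2; i <= NF; i++) print $i }'
}
```

Called with the node1 line above, this prints the IPaddr, drbddisk, Filesystem and httpd resources in that order, which matches the "Running ... start" sequence in the logs.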
2: On node1:

 

mount -t ext3 /dev/drbd0    /data

mv /var/www/html /data

cd /var/www

ln -s /data/html   html

cd html

echo "124">index.html ## for testing only!

On node2: ## the /data directory already exists, of course!

rm -fr /var/www/html

cd /var/www

ln -s /data/html  html

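3: You can test with the method above, or by checking the content and links of the page shown in a browser:

The log output produced by /etc/init.d/heartbeat stop and /etc/init.d/heartbeat start can be followed live with tail -f /var/log/messages!

Log output on 124 (node1) while monitoring: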

Mar 11 15:40:47 node1 Filesystem[5282]: INFO: Filesystem Success
Mar 11 15:40:47 node1 ResourceManager[5221]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
Mar 11 15:40:47 node1 kernel: drbd0: Primary/Secondary --> Secondary/Secondary
Mar 11 15:40:48 node1 ResourceManager[5221]: info: Running /etc/ha.d/resource.d/IPaddr 172.17.61.126/24/eth0 stop
Mar 11 15:40:48 node1 IPaddr[5504]: INFO: /sbin/route -n del -host 172.17.61.126
Mar 11 15:40:48 node1 IPaddr[5504]: INFO: /sbin/ifconfig eth0:0 172.17.61.126 down
Mar 11 15:40:48 node1 IPaddr[5504]: INFO: IP Address 172.17.61.126 released
Mar 11 15:40:48 node1 IPaddr[5422]: INFO: IPaddr Success
Mar 11 15:40:49 node1 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
Mar 11 16:03:16 node1 kernel: drbd0: Secondary/Primary --> Secondary/Secondary
Mar 11 16:03:17 node1 heartbeat: [2483]: info: Received shutdown notice from 'node2'.
Mar 11 16:03:17 node1 heartbeat: [2483]: info: Resources being acquired from node2.
Mar 11 16:03:17 node1 harc[5556]: info: Running /etc/ha.d/rc.d/status status
Mar 11 16:03:17 node1 mach_down[5587]: info: mach_down takeover complete for node node2.
Mar 11 16:03:17 node1 IPaddr[5613]: INFO: IPaddr Resource is stopped
Mar 11 16:03:18 node1 heartbeat: [5557]: info: Local Resource acquisition completed.
Mar 11 16:03:18 node1 harc[5723]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Mar 11 16:03:18 node1 ip-request-resp[5723]: received ip-request-resp IPaddr::172.17.61.126/24/eth0 OK no
Mar 11 16:03:18 node1 ResourceManager[5738]: info: Acquiring resource group: node1 IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd
Mar 11 16:03:19 node1 IPaddr[5762]: INFO: IPaddr Resource is stopped
Mar 11 16:03:19 node1 ResourceManager[5738]: info: Running /etc/ha.d/resource.d/IPaddr 172.17.61.126/24/eth0 start
Mar 11 16:03:19 node1 IPaddr[5973]: INFO: /sbin/ifconfig eth0:0 172.17.61.126 netmask 255.255.255.0
Mar 11 16:03:19 node1 IPaddr[5973]: INFO: Sending Gratuitous Arp for 172.17.61.126 on eth0:0 [eth0]
Mar 11 16:03:19 node1 IPaddr[5973]: INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-172.17.61.126 eth0 172.17.61.126 auto 172.17.61.126 ffffffffffff
Mar 11 16:03:19 node1 IPaddr[5891]: INFO: IPaddr Success
Mar 11 16:03:19 node1 ResourceManager[5738]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
Mar 11 16:03:19 node1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary
Mar 11 16:03:20 node1 Filesystem[6172]: INFO: /data is unmounted (stopped)
Mar 11 16:03:20 node1 Filesystem[6108]: INFO: Filesystem Resource is stopped
Mar 11 16:03:20 node1 ResourceManager[5738]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start
Mar 11 16:03:20 node1 kernel: kjournald starting. Commit interval 5 seconds
Mar 11 16:03:20 node1 kernel: EXT3 FS on drbd0, internal journal
Mar 11 16:03:20 node1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar 11 16:03:20 node1 Filesystem[6210]: INFO: Filesystem Success
Mar 11 16:03:20 node1 ResourceManager[5738]: info: Running /etc/init.d/httpd start
Mar 11 16:03:20 node1 httpd: httpd: Could not determine the server

 

Log output on 125 (node2) while monitoring:

Mar 11 15:59:20 node2 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
Mar 11 15:59:21 node2 logd: [10117]: info: Pid 8026 exited
Mar 11 16:01:11 node2 sshd(pam_unix)[10120]: session opened for user root by root(uid=0)
Mar 11 16:01:29 node2 logd: [10161]: info: logd started with default configuration.
Mar 11 16:01:29 node2 logd: [10162]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Mar 11 16:01:29 node2 logd: [10161]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: Traditional compression selected. Realtime behavior will likely be impacted(!)
Mar 11 16:01:30 node2 heartbeat: [10311]: info: See http://linux-ha.org/ha.cf/TraditionalCompressionDirective for more information.
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: Deprecated 'legacy' auto_failback option selected.
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: Please convert to 'auto_failback on'.
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: See documentation for conversion details.
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Mar 11 16:01:30 node2 heartbeat: [10311]: info: **************************
Mar 11 16:01:30 node2 heartbeat: [10311]: info: Configuration validated. Starting heartbeat 2.0.4
Mar 11 16:01:30 node2 heartbeat: [10312]: info: heartbeat: version 2.0.4
Mar 11 16:01:30 node2 heartbeat: [10312]: info: Heartbeat generation: 6
Mar 11 16:01:30 node2 heartbeat: [10312]: info: G_main_add_TriggerHandler: Added signal manual handler
Mar 11 16:01:30 node2 heartbeat: [10312]: info: G_main_add_TriggerHandler: Added signal manual handler
Mar 11 16:01:30 node2 heartbeat: [10312]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Mar 11 16:01:30 node2 heartbeat: [10312]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Mar 11 16:01:30 node2 heartbeat: [10312]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Mar 11 16:01:30 node2 heartbeat: [10312]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Mar 11 16:01:30 node2 heartbeat: [10312]: info: Local status now set to: 'up'
Mar 11 16:01:30 node2 heartbeat: [10312]: info: Exiting write_hostcachedata process 10318 returned rc 0.
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Link node1:eth0 up.
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Status update for node node1: status active
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Link node2:eth0 up.
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Exiting write_hostcachedata process 10320 returned rc 0.
Mar 11 16:01:31 node2 harc[10319]: info: Running /etc/ha.d/rc.d/status status
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Comm_now_up(): updating status to active
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Local status now set to: 'active'
Mar 11 16:01:32 node2 IPaddr[10357]: INFO: IPaddr Resource is stopped
Mar 11 16:01:32 node2 heartbeat: [10330]: info: Local Resource acquisition completed.
Mar 11 16:01:35 node2 kernel: drbd0: Secondary/Primary --> Secondary/Secondary
Mar 11 16:01:36 node2 harc[10468]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Mar 11 16:01:36 node2 ip-request-resp[10468]: received ip-request-resp IPaddr::172.17.61.126/24/eth0 OK yes
Mar 11 16:01:36 node2 ResourceManager[10483]: info: Acquiring resource group: node2 IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd
Mar 11 16:01:36 node2 IPaddr[10507]: INFO: IPaddr Resource is stopped
Mar 11 16:01:36 node2 ResourceManager[10483]: info: Running /etc/ha.d/resource.d/IPaddr 172.17.61.126/24/eth0 start
Mar 11 16:01:37 node2 IPaddr[10718]: INFO: /sbin/ifconfig eth0:0 172.17.61.126 netmask 255.255.255.0
Mar 11 16:01:37 node2 IPaddr[10718]: INFO: Sending Gratuitous Arp for 172.17.61.126 on eth0:0 [eth0]
Mar 11 16:01:37 node2 IPaddr[10718]: INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-172.17.61.126 eth0 172.17.61.126 auto 172.17.61.126 ffffffffffff
Mar 11 16:01:37 node2 IPaddr[10636]: INFO: IPaddr Success
Mar 11 16:01:37 node2 ResourceManager[10483]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
Mar 11 16:01:37 node2 kernel: drbd0: Secondary/Secondary --> Primary/Secondary
Mar 11 16:01:37 node2 Filesystem[10917]: INFO: /data is unmounted (stopped)
Mar 11 16:01:37 node2 Filesystem[10853]: INFO: Filesystem Resource is stopped
Mar 11 16:01:37 node2 ResourceManager[10483]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start
Mar 11 16:01:38 node2 kernel: kjournald starting. Commit interval 5 seconds
Mar 11 16:01:38 node2 kernel: EXT3 FS on drbd0, internal journal
Mar 11 16:01:38 node2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar 11 16:01:38 node2 Filesystem[10955]: INFO: Filesystem Success
Mar 11 16:01:38 node2 ResourceManager[10483]: info: Running /etc/init.d/httpd start
Mar 11 16:01:38 node2 httpd: httpd: Could not determine the server

 

The logs clearly show the primary -> secondary and secondary -> primary transitions!
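
To pull just these role changes out of a long log, a filter along the following lines may help. It is a sketch only: the function name drbd_transitions is made up, and it assumes syslog lines in exactly the format shown above (month, day, time, host, then kernel: drbdN: old --> new).

```shell
#!/bin/sh
# Hypothetical helper: print DRBD role transitions found in syslog text.
# Reads log lines on stdin; prints "time host old -> new" for each match.
drbd_transitions() {
    awk '$5 == "kernel:" && $6 ~ /^drbd[0-9]+:$/ && $8 == "-->" {
        print $3, $4, $7, "->", $9
    }'
}
```

A typical call would be drbd_transitions < /var/log/messages; lines from heartbeat, IPaddr and the other resource scripts are ignored.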

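Obviously, access the service through the virtual IP:

    ## 172.17.61.126 is the virtual IP address, i.e. the externally visible IP

For machine 124:

[Figure: page displayed for machine 124]

However, 124 and 125 can never serve at the same time: one is always active and the other passive. The logs show the details, and the page displayed in the browser shows it as well!

3: With slight modifications, the setup above can also provide HA for two MySQL machines.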
