Setting Up a Highly Available NFS Server with DRBD + Heartbeat

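1: Introduction to DRBD: DRBD consists of a kernel module and associated scripts and is used to build high-availability clusters. It works by mirroring an entire block device over the network, so you can think of it as a kind of network RAID.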

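DRBD is a block device that can be used in high-availability (HA) setups. It works much like network RAID-1: when you write data to the local filesystem, the same blocks are also sent over the network to another host and written to its disk, so its copy of the filesystem stays identical.
The data on the local host (primary node) and the remote host (secondary node) is kept synchronized in real time, so if the local system fails, the remote host still holds an identical copy of the data and can continue to use it.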

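In an HA setup, DRBD can replace a shared disk array: because the data exists on both the local and the remote host, on failover the remote host simply uses its own copy of the data and carries on providing the service.
DRBD works as shown in the diagram below: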

        +--------------------+
        |     Filesystem     |
        +--------------------+
                  |
                  V
        +--------------------+
        | Block device layer |
        |    (/dev/drbd1)    |
        +--------------------+
           |                |
           V                V
   +--------------+  +------------------+
   |  Local disk  |  | Remote host disk |
   |  (/dev/hdb1) |  |   (/dev/hdb1)    |
   +--------------+  +------------------+

2: Introduction to Heartbeat:

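Heartbeat is one of the packages publicly available from the Linux-HA project website. It provides the basic functions every HA system needs, such as starting and stopping resources, monitoring the availability of the systems in the cluster, and moving ownership of a shared IP address between cluster nodes. It exchanges heartbeat messages over serial lines, Ethernet interfaces, or both to monitor the health of the nodes and the services they provide.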
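Heartbeat in this setup is configured the classic (version 1) way, through /etc/ha.d/ha.cf, /etc/ha.d/authkeys and /etc/ha.d/haresources. The post never shows ha.cf or authkeys, so here is a minimal sketch of both. The node names node1/node2, UDP port 694 and the eth0 broadcast link are taken from the logs later in the post; the timing values and the shared secret are placeholders (assumptions). The script writes example files into the current directory so they can be reviewed before being copied to /etc/ha.d on both nodes.

```shell
#!/bin/sh
# Sketch only: example Heartbeat config files, written to the current
# directory for review. From the logs: node1/node2, UDP port 694, eth0.
# ASSUMPTION: all timing values and the authkeys secret are placeholders.

cat > ha.cf.example <<'EOF'
logfacility local0
# seconds between heartbeat packets
keepalive 2
# declare the peer dead after this many seconds of silence
deadtime 30
warntime 10
# extra grace period right after boot
initdead 120
udpport 694
bcast eth0
# the logs recommend this instead of the deprecated 'legacy'
auto_failback on
# node names must match `uname -n` on each host
node node1
node node2
EOF

cat > authkeys.example <<'EOF'
auth 1
1 sha1 CHANGE_ME_shared_secret
EOF
# Heartbeat refuses an authkeys file that other users can read.
chmod 600 authkeys.example
```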


Note: make install installs the DRBD kernel module here:

[root@node2 block]# pwd
/lib/modules/2.6.9-42.ELsmp/kernel/drivers/block
[root@node2 block]# ls drbd.ko
drbd.ko
The DRBD tools (drbdadm, drbdsetup) are installed under /sbin,
and a drbd startup script is created in /etc/init.d/.
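The commands below refer to a resource r0 defined in drbd.conf, which the post does not show. A minimal sketch of such a definition follows. Assumptions: the hosts are node1 and node2 at 172.17.61.124 and 172.17.61.125 (inferred from the "124"/"125" machine names used later), the backing partition is /dev/hdb1 as in the diagram, the DRBD device is /dev/drbd0 as in the haresources file, and port 7788 and the 10M sync rate are placeholders. The script writes drbd.conf.example to the current directory; the same file would go to /etc/drbd.conf on both hosts.

```shell
#!/bin/sh
# Sketch of /etc/drbd.conf for resource r0 (written to ./drbd.conf.example).
# ASSUMPTIONS: host names, IP addresses, port and sync rate are
# illustrative; each "on" name must match `uname -n` on that host.
cat > drbd.conf.example <<'EOF'
resource r0 {
  # protocol C: a write completes only after both disks have it
  protocol C;

  syncer {
    rate 10M;
  }

  on node1 {
    device    /dev/drbd0;
    disk      /dev/hdb1;
    address   172.17.61.124:7788;
    meta-disk internal;
  }

  on node2 {
    device    /dev/drbd0;
    disk      /dev/hdb1;
    address   172.17.61.125:7788;
    meta-disk internal;
  }
}
EOF
```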

Before starting DRBD, you must create, on the hdb1 partition of each host, the metadata area in which DRBD records its state.
Run on both hosts:

[root@g105-1 /]# drbdadm create-md r0
[root@g105-2 /]# drbdadm create-md r0

"r0" is the resource name we defined in drbd.conf.
Now we can start DRBD. Run on both hosts:

[root@g105-1 /]# /etc/init.d/drbd start
[root@g105-2 /]# /etc/init.d/drbd start

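These steps are needed when DRBD is used on its own; when it is paired with Heartbeat, you do not run the start step by hand each time. Keep in mind, though, that Heartbeat's drbddisk resource only switches the Primary/Secondary role: the drbd service must still be enabled at boot, and with DRBD 8 the metadata must still be created once with drbdadm create-md.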

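Switching the DRBD primary and secondary: sometimes you need to swap the DRBD primary and secondary hosts. Do the following:
On the primary, first unmount the DRBD device.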
[root@g105-1 /]# umount /mnt/drbd1
Demote the primary to "secondary".
[root@g105-1 /]# drbdadm secondary r0
[root@g105-1 /]# cat /proc/drbd
version: 8.0.4 (api:86/proto:86)
SVN Revision: 2947 build by root@g105-1, 2007-07-28 07:13:14

 1: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:0 nr:5 dw:5 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
Now both hosts are "secondary".
On the secondary g105-2, promote it to "primary".
[root@g105-2 /]# drbdadm primary r0
[root@g105-2 /]# cat /proc/drbd
version: 8.0.4 (api:86/proto:86)
SVN Revision: 2947 build by root@g105-2, 2007-07-28 07:13:14

 1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:0 nr:5 dw:5 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
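The manual switch above can be wrapped in two small shell functions: drbd_demote for the current primary and drbd_promote for the node taking over. This is a sketch; the function names are hypothetical, the defaults (r0, /dev/drbd1, /mnt/drbd1) follow the example above, and the final mount on the new primary is an added step that the transcript does not show. By default the functions only print the commands (DRY_RUN=1); set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Hypothetical helpers for a manual DRBD role switch.
# DRY_RUN=1 (the default) only prints each command.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1, on the current primary: unmount, then demote.
drbd_demote() {
  res=${1:-r0}
  mnt=${2:-/mnt/drbd1}
  run umount "$mnt" || return 1
  run drbdadm secondary "$res"
}

# Step 2, on the other node, once /proc/drbd shows Secondary/Secondary:
# promote, then mount the DRBD device (the mount is an added step).
drbd_promote() {
  res=${1:-r0}
  dev=${2:-/dev/drbd1}
  mnt=${3:-/mnt/drbd1}
  run drbdadm primary "$res" || return 1
  run mount "$dev" "$mnt"
}
```

For example, run DRY_RUN=0 drbd_demote r0 /mnt/drbd1 on g105-1, check /proc/drbd for Secondary/Secondary, then run DRY_RUN=0 drbd_promote r0 /dev/drbd1 /mnt/drbd1 on g105-2.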

Problems encountered along the way:

Error 1:

[root@node1 ~]# mkdir /data

[root@node1 ~]# mount -t ext3 /dev/drbd0 /data/

mount: block device /dev/drbd0 is write-protected, mounting read-only

mount: /dev/drbd0 already mounted or /data/ busy

Cause:

Probably your DRBD resources are still Secondary (cat /proc/drbd to find out).
Please see the User's Guide on how to make resources Primary.

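Solution: force this node to Primary. Use --do-what-I-say (DRBD 0.7 syntax) only on a first-time setup, because it declares this node's data authoritative and the peer is then overwritten by a full resync: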

[root@node1 ~]# drbdadm -- --do-what-I-say primary all

[root@node1 ~]# drbdadm -- connect all

[root@node1 ~]# cat /proc/drbd

version: 0.7.20 (api:79/proto:74)

SVN Revision: 2260 build by root@node1, 2010-03-11 09:38:07

0: cs:Connected st:Primary/Secondary ld:Consistent

ns:0 nr:0 dw:0 dr:6144831 al:0 bm:376 lo:0 pe:0 ua:0 ap:0

1: cs:Unconfigured

[root@node1 ~]# mount -t ext3 /dev/drbd0 /data/

[root@node1 ~]#
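Error 1 comes from mounting /dev/drbd0 while the local node is still Secondary. A small guard that reads the local role from /proc/drbd before mounting avoids it. This is a sketch: it parses the st:Local/Peer field seen in both the 0.7.20 and 8.0.4 output in this post, and also the ro: field that newer 8.x releases print instead; the status file is a parameter so the parser can be tried on saved output. The function names are hypothetical, and DRY_RUN=1 (the default) only prints the mount command.

```shell
#!/bin/sh
# Print the local role (Primary/Secondary/...) of DRBD minor $1 by parsing
# /proc/drbd-style text. The field is "st:" in 0.7/8.0 and "ro:" in newer 8.x.
drbd_local_role() {
  minor=${1:-0}
  status=${2:-/proc/drbd}
  awk -v m="$minor:" '
    $1 == m {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^(st|ro):/) {
          sub(/^(st|ro):/, "", $i)
          split($i, r, "/")
          print r[1]
          exit
        }
    }' "$status"
}

# Hypothetical guard: mount /dev/drbd$1 on $2 only when this node is Primary.
mount_if_primary() {
  minor=$1
  mnt=$2
  status=${3:-/proc/drbd}
  role=$(drbd_local_role "$minor" "$status")
  if [ "$role" != "Primary" ]; then
    echo "drbd$minor is '${role:-unconfigured}', not Primary; refusing to mount" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ mount -t ext3 /dev/drbd$minor $mnt"
  else
    mount -t ext3 "/dev/drbd$minor" "$mnt"
  fi
}
```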

Error 2:

[root@node1 ~]# mv /var/lib/nfs/ /data/

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/statd': Operation not permitted

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/portmap': Operation not permitted

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/nfs': Operation not permitted

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/mount': Operation not permitted

mv: cannot remove directory `/var/lib/nfs//rpc_pipefs/lockd': Operation not permitted

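Cause: /var/lib/nfs/rpc_pipefs is a mounted rpc_pipefs pseudo-filesystem (used by rpc.idmapd), and mv cannot remove directories that live on a mounted pseudo-filesystem.
Solution: stop the NFS helper services, unmount rpc_pipefs, then replace /var/lib/nfs with a symlink to /data/nfs: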

[root@node1 ~]# service nfs status

Shutting down NFS mountd: rpc.mountd is stopped

nfsd is stopped

rpc.rquotad is stopped

[root@node1 ~]# service nfslock status

rpc.statd (pid 2528) is running...

[root@node1 ~]# service nfslock stop

Stopping NFS statd: [ OK ]

[root@node1 ~]# service rpcidmapd status

rpc.idmapd (pid 2557) is running...

[root@node1 ~]# service rpcidmapd stop

Shutting down RPC idmapd: [ OK ]

[root@node1 ~]# /bin/umount -a -t rpc_pipefs    # very important

[root@node1 nfs]# cd /var/lib/nfs/

[root@node1 nfs]# ls

rpc_pipefs

[root@node1 nfs]# ll

total 8

drwxr-xr-x 2 root root 4096 May 23 2006 rpc_pipefs

[root@node1 nfs]# cd rpc_pipefs/

[root@node1 rpc_pipefs]# ls

[root@node1 rpc_pipefs]# ll

total 0

[root@node1 lib]# rm -fr nfs/

[root@node1 lib]# ln -s /data/nfs nfs

[root@node1 lib]# ll

total 180

drwxr-xr-x 2 root root 4096 Mar 10 11:50 alternatives

*****

lrwxrwxrwx 1 root root 9 Mar 11 11:00 nfs -> /data/nfs

***

drwxr-xr-x 2 root root 4096 Mar 11 10:38 xkb

Then remount rpc_pipefs and start the service again:

[root@node1 nfs]# /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs

[root@node1 nfs]# service rpcidmapd start

Starting RPC idmapd: [ OK ]

[root@node1 nfs]#
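The Error 2 fix comes down to a fixed order of steps. Below is a sketch that replays the transcript above as one function, to be run on the node where /data is mounted. The function name is hypothetical, the service names are the RHEL ones used above, and with DRY_RUN=1 (the default) it only prints the commands. Unlike the transcript, it unmounts rpc_pipefs before the mv, so the move succeeds in one go and no rm -fr is needed; like the transcript, it restarts only rpcidmapd and leaves nfslock stopped.

```shell
#!/bin/sh
# Sketch: move /var/lib/nfs onto the DRBD-backed /data (run on the node
# where /data is mounted). DRY_RUN=1 (default) only prints the commands.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

relocate_nfs_state() {
  target=${1:-/data}
  # 1. Stop the daemons that keep rpc_pipefs busy.
  run service nfslock stop
  run service rpcidmapd stop
  # 2. Unmount rpc_pipefs; otherwise mv cannot remove the source tree.
  run /bin/umount -a -t rpc_pipefs
  # 3. Move the state directory and leave a symlink behind.
  run mv /var/lib/nfs "$target/"
  run ln -s "$target/nfs" /var/lib/nfs
  # 4. Remount rpc_pipefs (through the symlink) and restart idmapd.
  run /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
  run service rpcidmapd start
}
```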

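With small changes, the setup above can be turned into an httpd (Apache) cluster! (httpd must of course be configured not to start at boot, since Heartbeat decides when it starts and stops.)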
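Since Heartbeat decides when httpd runs, httpd has to be removed from the boot sequence on both nodes, while drbd and heartbeat themselves should start at boot. On the RHEL/CentOS hosts used here this is normally done with chkconfig. A sketch with a hypothetical function name, which only prints the commands unless DRY_RUN=0; enabling drbd and heartbeat at boot is an assumption about this setup, since the post does not show it.

```shell
#!/bin/sh
# Sketch: boot-time service settings for both nodes (RHEL/CentOS chkconfig).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

setup_boot_services() {
  # Services listed in haresources are started by Heartbeat, not at boot.
  for svc in "$@"; do
    run chkconfig "$svc" off
  done
  # DRBD and Heartbeat themselves must come up at boot.
  run chkconfig drbd on
  run chkconfig heartbeat on
}

# Example: httpd is the Heartbeat-managed service in this setup.
setup_boot_services httpd
```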

Changes required:

1:

On node2:

[root@node2 www]# cat /etc/ha.d/haresources
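node2 IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd

The part that changes is the last resource, httpd; Heartbeat runs it as /etc/init.d/httpd, and my /etc/init.d does contain an httpd script. Note that Heartbeat's documentation requires haresources to be identical on all nodes (the first field names the preferred node); my two files differ only in that field.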

node1上：
[root@node1 ha.d]# cat /etc/ha.d/haresources
node1 IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd
[root@node1 ha.d]#
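Each haresources line reads left to right: the first field is the node name, each following field is one resource, and :: separates a resource script name from its arguments. Heartbeat starts the resources in that order and stops them in reverse, which matches the logs further down (IPaddr, drbddisk, Filesystem, then /etc/init.d/httpd on start; the stop sequence on node1 ends with drbddisk r0 stop and IPaddr 172.17.61.126/24/eth0 stop). The sketch below expands a line into those calls so the order can be checked; the function name is hypothetical, and it does not resolve whether a script lives in /etc/ha.d/resource.d or /etc/init.d.

```shell
#!/bin/sh
# Hypothetical helper: expand a v1 haresources line into the resource
# calls Heartbeat makes, in start order or (for "stop") reverse order.
# "name::a::b" becomes "name a b <action>".
expand_haresources() {
  echo "$1" | awk -v action="${2:-start}" '{
    n = 0
    # Field 1 is the node name; every later field is one resource.
    for (i = 2; i <= NF; i++) {
      res = $i
      gsub(/::/, " ", res)
      list[++n] = res
    }
    if (action == "stop")
      for (i = n; i >= 1; i--) print list[i], action
    else
      for (i = 1; i <= n; i++) print list[i], action
  }'
}
```

For example, expand_haresources "node2 IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd" stop prints httpd stop first and IPaddr 172.17.61.126/24/eth0 stop last.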
2: On node1:

mount -t ext3 /dev/drbd0 /data

mv /var/www/html /data

cd /var/www

ln -s /data/html html

cd html

echo "124" > index.html    # for testing only

On node2 (the /data mount point already exists there):

rm -fr /var/www/html

cd /var/www

ln -s /data/html html

3: Test it with the method above, or by checking the content and links the web page displays:

Below is the log output produced when running /etc/init.d/heartbeat stop and /etc/init.d/heartbeat start; you can watch the log live with tail -f /var/log/messages.

Log on 124 (node1), watched live:

Mar 11 15:40:47 node1 Filesystem[5282]: INFO: Filesystem Success
Mar 11 15:40:47 node1 ResourceManager[5221]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
Mar 11 15:40:47 node1 kernel: drbd0: Primary/Secondary --> Secondary/Secondary
Mar 11 15:40:48 node1 ResourceManager[5221]: info: Running /etc/ha.d/resource.d/IPaddr 172.17.61.126/24/eth0 stop
Mar 11 15:40:48 node1 IPaddr[5504]: INFO: /sbin/route -n del -host 172.17.61.126
Mar 11 15:40:48 node1 IPaddr[5504]: INFO: /sbin/ifconfig eth0:0 172.17.61.126 down
Mar 11 15:40:48 node1 IPaddr[5504]: INFO: IP Address 172.17.61.126 released
Mar 11 15:40:48 node1 IPaddr[5422]: INFO: IPaddr Success
Mar 11 15:40:49 node1 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
Mar 11 16:03:16 node1 kernel: drbd0: Secondary/Primary --> Secondary/Secondary
Mar 11 16:03:17 node1 heartbeat: [2483]: info: Received shutdown notice from 'node2'.
Mar 11 16:03:17 node1 heartbeat: [2483]: info: Resources being acquired from node2.
Mar 11 16:03:17 node1 harc[5556]: info: Running /etc/ha.d/rc.d/status status
Mar 11 16:03:17 node1 mach_down[5587]: info: mach_down takeover complete for node node2.
Mar 11 16:03:17 node1 IPaddr[5613]: INFO: IPaddr Resource is stopped
Mar 11 16:03:18 node1 heartbeat: [5557]: info: Local Resource acquisition completed.
Mar 11 16:03:18 node1 harc[5723]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Mar 11 16:03:18 node1 ip-request-resp[5723]: received ip-request-resp IPaddr::172.17.61.126/24/eth0 OK no
Mar 11 16:03:18 node1 ResourceManager[5738]: info: Acquiring resource group: node1 IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd
Mar 11 16:03:19 node1 IPaddr[5762]: INFO: IPaddr Resource is stopped
Mar 11 16:03:19 node1 ResourceManager[5738]: info: Running /etc/ha.d/resource.d/IPaddr 172.17.61.126/24/eth0 start
Mar 11 16:03:19 node1 IPaddr[5973]: INFO: /sbin/ifconfig eth0:0 172.17.61.126 netmask 255.255.255.0
Mar 11 16:03:19 node1 IPaddr[5973]: INFO: Sending Gratuitous Arp for 172.17.61.126 on eth0:0 [eth0]
Mar 11 16:03:19 node1 IPaddr[5973]: INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-172.17.61.126 eth0 172.17.61.126 auto 172.17.61.126 ffffffffffff
Mar 11 16:03:19 node1 IPaddr[5891]: INFO: IPaddr Success
Mar 11 16:03:19 node1 ResourceManager[5738]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
Mar 11 16:03:19 node1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary
Mar 11 16:03:20 node1 Filesystem[6172]: INFO: /data is unmounted (stopped)
Mar 11 16:03:20 node1 Filesystem[6108]: INFO: Filesystem Resource is stopped
Mar 11 16:03:20 node1 ResourceManager[5738]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start
Mar 11 16:03:20 node1 kernel: kjournald starting. Commit interval 5 seconds
Mar 11 16:03:20 node1 kernel: EXT3 FS on drbd0, internal journal
Mar 11 16:03:20 node1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar 11 16:03:20 node1 Filesystem[6210]: INFO: Filesystem Success
Mar 11 16:03:20 node1 ResourceManager[5738]: info: Running /etc/init.d/httpd start
Mar 11 16:03:20 node1 httpd: httpd: Could not determine the server

Log on 125 (node2), watched live:

Mar 11 15:59:20 node2 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
Mar 11 15:59:21 node2 logd: [10117]: info: Pid 8026 exited
Mar 11 16:01:11 node2 sshd(pam_unix)[10120]: session opened for user root by root(uid=0)
Mar 11 16:01:29 node2 logd: [10161]: info: logd started with default configuration.
Mar 11 16:01:29 node2 logd: [10162]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Mar 11 16:01:29 node2 logd: [10161]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: Traditional compression selected. Realtime behavior will likely be impacted(!)
Mar 11 16:01:30 node2 heartbeat: [10311]: info: See http://linux-ha.org/ha.cf/TraditionalCompressionDirective for more information.
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: Deprecated 'legacy' auto_failback option selected.
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: Please convert to 'auto_failback on'.
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: See documentation for conversion details.
Mar 11 16:01:30 node2 heartbeat: [10311]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Mar 11 16:01:30 node2 heartbeat: [10311]: info: **************************
Mar 11 16:01:30 node2 heartbeat: [10311]: info: Configuration validated. Starting heartbeat 2.0.4
Mar 11 16:01:30 node2 heartbeat: [10312]: info: heartbeat: version 2.0.4
Mar 11 16:01:30 node2 heartbeat: [10312]: info: Heartbeat generation: 6
Mar 11 16:01:30 node2 heartbeat: [10312]: info: G_main_add_TriggerHandler: Added signal manual handler
Mar 11 16:01:30 node2 heartbeat: [10312]: info: G_main_add_TriggerHandler: Added signal manual handler
Mar 11 16:01:30 node2 heartbeat: [10312]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Mar 11 16:01:30 node2 heartbeat: [10312]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Mar 11 16:01:30 node2 heartbeat: [10312]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Mar 11 16:01:30 node2 heartbeat: [10312]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Mar 11 16:01:30 node2 heartbeat: [10312]: info: Local status now set to: 'up'
Mar 11 16:01:30 node2 heartbeat: [10312]: info: Exiting write_hostcachedata process 10318 returned rc 0.
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Link node1:eth0 up.
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Status update for node node1: status active
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Link node2:eth0 up.
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Exiting write_hostcachedata process 10320 returned rc 0.
Mar 11 16:01:31 node2 harc[10319]: info: Running /etc/ha.d/rc.d/status status
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Comm_now_up(): updating status to active
Mar 11 16:01:31 node2 heartbeat: [10312]: info: Local status now set to: 'active'
Mar 11 16:01:32 node2 IPaddr[10357]: INFO: IPaddr Resource is stopped
Mar 11 16:01:32 node2 heartbeat: [10330]: info: Local Resource acquisition completed.
Mar 11 16:01:35 node2 kernel: drbd0: Secondary/Primary --> Secondary/Secondary
Mar 11 16:01:36 node2 harc[10468]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Mar 11 16:01:36 node2 ip-request-resp[10468]: received ip-request-resp IPaddr::172.17.61.126/24/eth0 OK yes
Mar 11 16:01:36 node2 ResourceManager[10483]: info: Acquiring resource group: node2 IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd
Mar 11 16:01:36 node2 IPaddr[10507]: INFO: IPaddr Resource is stopped
Mar 11 16:01:36 node2 ResourceManager[10483]: info: Running /etc/ha.d/resource.d/IPaddr 172.17.61.126/24/eth0 start
Mar 11 16:01:37 node2 IPaddr[10718]: INFO: /sbin/ifconfig eth0:0 172.17.61.126 netmask 255.255.255.0
Mar 11 16:01:37 node2 IPaddr[10718]: INFO: Sending Gratuitous Arp for 172.17.61.126 on eth0:0 [eth0]
Mar 11 16:01:37 node2 IPaddr[10718]: INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-172.17.61.126 eth0 172.17.61.126 auto 172.17.61.126 ffffffffffff
Mar 11 16:01:37 node2 IPaddr[10636]: INFO: IPaddr Success
Mar 11 16:01:37 node2 ResourceManager[10483]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
Mar 11 16:01:37 node2 kernel: drbd0: Secondary/Secondary --> Primary/Secondary
Mar 11 16:01:37 node2 Filesystem[10917]: INFO: /data is unmounted (stopped)
Mar 11 16:01:37 node2 Filesystem[10853]: INFO: Filesystem Resource is stopped
Mar 11 16:01:37 node2 ResourceManager[10483]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start
Mar 11 16:01:38 node2 kernel: kjournald starting. Commit interval 5 seconds
Mar 11 16:01:38 node2 kernel: EXT3 FS on drbd0, internal journal
Mar 11 16:01:38 node2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar 11 16:01:38 node2 Filesystem[10955]: INFO: Filesystem Success
Mar 11 16:01:38 node2 ResourceManager[10483]: info: Running /etc/init.d/httpd start
Mar 11 16:01:38 node2 httpd: httpd: Could not determine the server

The logs clearly show the Primary->Secondary and Secondary->Primary transitions!
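When watching tail -f /var/log/messages during a failover, the lines that matter most are the DRBD role changes and the ResourceManager actions. Here is a sketch of a filter for them; the function name is hypothetical and the patterns are based only on the log lines shown above, so other Heartbeat or DRBD versions may need different ones.

```shell
#!/bin/sh
# Print DRBD role transitions and Heartbeat ResourceManager actions from
# syslog text on stdin, e.g.:  tail -f /var/log/messages | ha_events
ha_events() {
  grep -E 'kernel: drbd[0-9]+: [A-Za-z]+/[A-Za-z]+ --> |ResourceManager\[[0-9]+\]: info: (Running|Acquiring)'
}
```

Typical use is tail -f /var/log/messages | ha_events; when piping its output further, GNU grep's --line-buffered option keeps the lines flowing immediately.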

Here 172.17.61.126 is the virtual IP address, i.e. the externally visible IP that clients connect to.

[Figure: machine 124]

However, machines 124 and 125 never serve at the same time: one is always active and the other passive. The logs show the details, and the web page makes it visible too!
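A direct way to see which node is active: only the active node carries the virtual IP 172.17.61.126 (the logs show IPaddr adding it as eth0:0 on start and taking it down on stop). A sketch that checks ip -o -4 addr show output read from stdin; the function name is hypothetical, and reading stdin lets it be tested against saved output (ifconfig output would need a different pattern).

```shell
#!/bin/sh
# Succeed (exit 0) if the IPv4 address $1 appears in `ip -o -4 addr`
# output read from stdin.
holds_vip() {
  vip=$1
  awk -v vip="$vip" '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "inet") {
          # the next field looks like 172.17.61.126/24
          split($(i + 1), a, "/")
          if (a[1] == vip) found = 1
        }
      }
    }
    END { exit !found }'
}
```

For example: if ip -o -4 addr show dev eth0 | holds_vip 172.17.61.126; then echo active; else echo passive; fi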

3: With minor changes, the same setup can also provide HA for a pair of MySQL machines.
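For the MySQL variant, the resource group would end with the MySQL init script instead of httpd, and MySQL's data directory would have to live on /data, moved and symlinked the same way as /var/www/html above. A sketch, assuming the init script is /etc/init.d/mysqld and the data directory is /var/lib/mysql (both assumptions to check on your system); the function name is hypothetical and DRY_RUN=1 (the default) only prints the commands. As with httpd, mysqld must not start at boot, and on the other node /var/lib/mysql is replaced by the same symlink.

```shell
#!/bin/sh
# Sketch only: haresources for a MySQL pair, written to a local example
# file. ASSUMPTION: the init script is /etc/init.d/mysqld. Use the same
# file on both nodes.
cat > haresources.mysql.example <<'EOF'
node1 IPaddr::172.17.61.126/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 mysqld
EOF

# On the node where /data is mounted (with mysqld stopped), move the
# data directory onto DRBD and symlink it back, as was done for html.
# DRY_RUN=1 (default) only prints the commands.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

move_mysql_datadir() {
  run mv /var/lib/mysql /data/
  run ln -s /data/mysql /var/lib/mysql
}
```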
