DRBD的实验历程-benxiong-ChinaUnix博客

benxiongbenxiong.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

benxiong

博客访问： 1744156
博文数量： 163
博客积分： 10591
博客等级：上将
技术积分： 1980
用户组：普通用户
注册时间： 2006-08-08 18:17

文章分类

全部博文（163）

LOG（1）
English（0）
Economy（1）
Monitor（12）

Nagios（9）
Unix（9）

AIX（4）

HP-UX（5）
Middleware（3）

Weblogic（3）
linux kernel（1）
Cloud（9）

Nimbus（8）

IaaS（0）
Virtualization（5）

Xen（5）
Program（33）

VBS（1）

批处理（3）

Perl（2）

Shell（12）

Linux C/C++（15）

HTML（0）
Others（8）

Android（1）

Bank（1）

phone（4）
Database（33）

Oracle PL/SQL（0）

Oracle常见问题（10）

Oracle安装卸载（3）

Oracle性能优化（3）

Oracle（10）

MySQL（7）
Equipment（7）

Tivoli Storage M（1）

Storage（6）

Printer（0）
Windows（1）
linux operation（18）
Network（10）

Voice（1）

Router（5）

Wireless（0）

Switch（2）
linux server（12）

DNS（1）

LVS（4）

Torque（1）

svn-cvs（1）

Cluster（2）

HTTP（2）

SAMBA（0）

MediaWiki（0）

DHCP（0）

LDAP（0）

NIS（0）
未分配的博文（0）

文章存档

2018年（1）

2012年（1）

2011年（47）

2010年（58）

2009年（21）

2008年（35）

我的朋友

相关博文

DRBD的实验历程

分类： LINUX

2009-12-14 14:16:12

一、场景：

两台PC机各有一个空闲分区，把他们做成DRBD。每台PC机有两个网卡，以便于直接互联。

drbd 主机列表和预备分区	IP 地址	主机名
主机1(primary)｜/dev/sda4	eth0:172.16.0.178 eth1:192.168.0.1	drbd_test_1
主机2(secondary)｜/dev/hdb2	eth0:172.16.0.179 eth1:192.168.0.2	drbd_test_2

二、安装，还是很顺利的:（两台机器都要做，呵呵，废话）
1.tar zxf drbd-8.3.1.tar.gz
2.cd drbd-8.3.1
3.make
4.make all install
5.检查一下：modprobe drbd 然后：lsmod |grep drbd看看有没有了

三、初始的配置，也比较简单了（/etc/drbd.conf，两台机器一样的）：
global {
}

common {
syncer { rate 100M; }
}

resource db {
protocol C;
startup { wfc-timeout 0; degr-wfc-timeout 120; }
disk { on-io-error detach; }

on drbd_test_1 {
device /dev/drbd0;
disk /dev/sda4;
address 192.168.0.1:7791;
meta-disk internal;
}

on drbd_test_2 {
device /dev/drbd0;
disk /dev/hdb2;
address 192.168.0.2:7791;
meta-disk internal;
}
}

四、准备启动：
创建相应的元数据保存的数据块，两个机器都要执行一下：drbdadm create-md db。按照提示信息直接按回车即可

五、启动drbd服务：
两台机器都:service drbd start
查看启动状态：service drbd status 或者 cat /proc/drbd

六、定义主节点：
在server1上设置它为主节点：drbdadm primary db。
第一次执行会出现如下错误：
State change failed: (-2) Refusing to be Primary without at least one UpToDate disk Command 'drbdsetup /dev/drbd0 primary' terminated with exit code 11
此时要先用drbdsetup 来做，之后就可以用drbaadmin了：drbdsetup /dev/drbd0 primary –o
查看状态可以发现此时两块磁盘已经开始同步，时间会比较长的

七、创建设备文件：（两台都要创建，本例中创建一个drbd0就好）
mknod /dev/drbd0 b 147 0
mknod /dev/drbd1 b 147 1
........
创建多个：for i in $(seq 0 15) ; do mknod /dev/drbd$i b 147 $i ; done

注：从sles11上作试验时，这些设备已经存在，无须创建。

八、格式化drbd分区：(主server上执行，从设备会同步执行)
mkfs.ext3 /dev/drbd0

九、挂载：(主server上执行，从设备是不让挂载的)
mount /dev/drbd0 /data
注意：secondary节点上不允许对drbd设备进行任何操作，包括只读。所有的读写操作只能在primary节点上进行。知道primary节点挂掉之后，secondary节点才能提升成为primary节点，继续进行读写操作

十、实验历程
1. 把primary的eth1给ifdown了，然后直接在secondary上进行主的提升，并也给mount了，发现在primary上测试拷入的文件确实同步过来了。之后把primary的eth1恢复后，发现没有自动恢复主从关系，经过支持查询，发现出现了drbd检测出现了Split-Brain 的状况，两个节点各自都standalone了。
手动恢复Split-Brain状况：
在secondary上：
drbdadmin secondary db
drbdadmin -- --discard-my-data connect db
在primary上：
drbdadmin connect db

注： If DRBD detects that both nodes are (or were at some point, while disconnected) in the primary role, it immediately tears down the replication connection. The tell-tale sign of this is a message like the following appearing in the system log:

Split-Brain detected, dropping connection!

After split brain has been detected, one node will always have the resource in a StandAlone connection state. The other might either also be in the StandAlone state (if both nodes detected the split brain simultaneously), or in WFConnection (if the peer tore down the connection before the other node had a chance to detect split brain).

At this point, unless you configured DRBD to automatically recover from split brain, you must manually intervene by selecting one node whose modifications will be discarded (this node is referred to as the split brain victim). This intervention is made with the following commands:

drbdadm secondary resource
drbdadm -- --discard-my-data connect resource

On the other node (the split brain survivor), if its connection state is also StandAlone, you would enter:

drbdadm connect resource

You may omit this step if the node is already in the WFConnection state; it will then reconnect automatically.

If the resource affected by the split brain is a , use drbdadm --stacked instead of just drbdadm.

2.突然想到一点：实验的两个分区可以不一样大，应该将小的设置为primary，因为如果primary比secondary的空间大，那超出的部分。。。。不敢想。

3.drbd状态的意义：

输出文件上面最开始是drbd的版本信息，然后就是数据同步的一些状态信息：

cs — connection state

     o Unconfigured：设备在等待配置。
   o Unconnected：连接模块时的过渡状态。
   o WFConnection：设备等待另一测的配置。
   o WFReportParams：过渡状态，等待新TCP 连接的第一个数据包时。.
   o SyncingAll：正将主节点的所有模块复制到次级节点上。.
   o SyncingQuick：通过复制已被更新的模块（因为现在次级节点已经离开了集群）来更新次级节点。
   o Connected：一切正常。
   o Timeout：过渡状态。
st — node state (local/remote)
ld — local data consistency
ds — data consistency
ns — network send
nr — network receive
dw — disk write
dr — disk read
pe — pending (waiting for ack)
ua — unack’d (still need to send ack)
al — access log write count

4.两个节点切换的过程：
a)在primary上umount，然后执行drbdadm secondary all，改成secondary模式
b)在secondary上执行drbdadm primary all 改成primary模式，再mount文件系统

阅读(6497) | 评论(1) | 转发(0) |

上一篇：用脚本实时显示Linux网络流量

下一篇：Weblogic 9.23在SUSE10 SP2上的安装

给主人留下些什么吧！~~

chinaunix网友2009-12-28 21:16:03

一大一小的磁盘也没问题，两机在create的时候，大的那个超过部分就直接忽略了.

回复 | 举报

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6