Category: MySQL/PostgreSQL
2017-04-19 21:27:12
High-Availability MySQL with DRBD + Pacemaker + Corosync
Note: In an Active/Standby architecture, only the Active host ever provides service; the Standby host serves nothing to the outside world (not even MySQL reads).
1. Environment Setup
【IP Configuration】
node1:
- IP: 192.168.1.30 - HostName: node1
node2:
- IP: 192.168.1.31 - HostName: node2
【Virtual IP (VIP)】
- IP: 192.168.1.99
【Network and Server Settings】
Time synchronization:
# ntpdate cn.pool.ntp.org
【SELinux Settings】
SELINUX can be set to either permissive or disabled.
[root@node2 ~]# cat /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
【iptables Firewall Settings】
For convenience, the iptables firewall is simply disabled here:
# service iptables stop
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]
# chkconfig iptables off
Note: In a real environment there is no need to disable the firewall; it is enough to open the relevant ports (DRBD: 7788-7789, Corosync: 3999-4000). See the example after this note.
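A rough sketch of keeping iptables enabled instead (the port numbers come from the note above and the rules assume cluster traffic stays on 192.168.1.0/24; adjust them to your own DRBD and Corosync configuration):
# iptables -A INPUT -s 192.168.1.0/24 -p tcp -m multiport --dports 7788,7789 -j ACCEPT
# iptables -A INPUT -s 192.168.1.0/24 -p udp -m multiport --dports 3999,4000 -j ACCEPT
# service iptables save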
【Set the Machine hostname】
[root@node2 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node2
[root@node2 ~]# source /etc/sysconfig/network
[root@node2 ~]# hostname $HOSTNAME
【/etc/hosts Settings】
Add the hostnames to /etc/hosts on every machine.
[root@node2 ~]# cat /etc/hosts
...
192.168.1.30 node1.heyuxuan.com node1
192.168.1.31 node2.heyuxuan.com node2
Recommendation: do not rely on an external DNS service (it would become an extra point of failure); put these mappings in /etc/hosts on every machine instead.
【Configure SSH Mutual Trust】
[root@node2 ~]# ssh-keygen -t rsa -b 1024
[root@node2 ~]# ssh-copy-id root@192.168.1.30
[root@node1 ~]# ssh-keygen -t rsa -b 1024
[root@node1 ~]# ssh-copy-id root@192.168.1.31
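A quick sanity check that passwordless login now works in both directions (a hedged example; any remote command will do):
[root@node1 ~]# ssh node2 hostname
node2
[root@node2 ~]# ssh node1 hostname
node1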
【Upgrade the Linux 2.6.32 Kernel】
1. Change to the yum repository directory: cd /etc/yum.repos.d
2. Back up the stock repository file: mv CentOS-Base.repo CentOS-Base.repo.bak
3. Download the 163 (NetEase) repository file with wget
4. Rename the file: mv CentOS6-Base-163.repo CentOS-Base.repo
5. Upgrade the kernel first:
[root@node1 yum.repos.d]# yum -y update kernel
Total download size: 48 M
Downloading Packages:
Setting up and reading Presto delta metadata
updates/prestodelta                                          | 545 kB     00:00
Processing delta metadata
Package(s) data still to download: 48 M
(1/4): dracut-004-388.el6.noarch.rpm                         | 125 kB     00:00
(2/4): dracut-kernel-004-388.el6.noarch.rpm                  |  26 kB     00:00
(3/4): kernel-2.6.32-573.22.1.el6.x86_64.rpm                 |  30 MB     00:26
(4/4): kernel-firmware-2.6.32-573.22.1.el6.noarch.rpm        |  18 MB     00:28
--------------------------------------------------------------------------------
Total                                               873 kB/s |  48 MB     00:56
.....(output trimmed)
Installed:
  kernel.x86_64 0:2.6.32-573.22.1.el6
Dependency Updated:
  dracut.noarch 0:004-388.el6                  dracut-kernel.noarch 0:004-388.el6
  kernel-firmware.noarch 0:2.6.32-573.22.1.el6
Complete!
[root@node1 yum.repos.d]# yum install kernel-devel
.....(output trimmed)
Running Transaction
  Installing : kernel-devel-2.6.32-573.22.1.el6.x86_64                      1/1
  Verifying  : kernel-devel-2.6.32-573.22.1.el6.x86_64                      1/1
Installed:
kernel-devel.x86_64 0:2.6.32-573.22.1.el6
Complete!
6. Download and install the ELRepo release package:
[root@node1 yum.repos.d]# rpm -Uvh
Retrieving
warning: /var/tmp/rpm-tmp.AP86LX: Header V4 DSA/SHA1 Signature, key ID baadae52: NOKEY
Preparing...                ########################################### [100%]
   1:elrepo-release         ########################################### [100%]
7. Reboot the system.
2. Installing and Configuring DRBD
#{
==============================================================
【DRBD Download and Installation】-- not used here; applies to RHEL 5 / CentOS 5 only
==============================================================
DRBD consists of two parts: a kernel module and userspace management tools. The DRBD kernel module has been merged into the mainline Linux kernel since 2.6.33, so if your kernel is at or above that version you only need to install the management tools; otherwise you must install both the kernel module package and the management tools, and the versions of the two must match.
The DRBD releases currently in common use are 8.0, 8.2 and 8.3; the corresponding RPM packages are named drbd, drbd82 and drbd83, and the matching kernel-module packages are kmod-drbd, kmod-drbd82 and kmod-drbd83. Features and configuration differ slightly between releases. The platform here is x86 running CentOS 6.5 (kernel 2.6.32), so both the matching kernel module package and the management tools need to be installed.
The latest 8.3 release is chosen here (drbd83-8.3.8-1.el5.centos.i386.rpm and kmod-drbd83-8.3.8-1.el5.centos.i686.rpm).
In practice, download the package versions that match your own platform; individual download links are not listed here.
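A quick way to check which case applies on a given machine (a hedged example; uname simply reports the running kernel version):
[root@node1 ~]# uname -r
2.6.32-573.22.1.el6.x86_64
Because this kernel is older than 2.6.33, the kmod-drbd83 module package is needed in addition to the drbd83-utils management tools.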
[root@node2 home]# yum --nogpgcheck localinstall drbd83-8.3.15-2.el5.centos.x86_64.rpm
Dependencies Resolved
================================================================================
 Package   Arch     Version              Repository                         Size
================================================================================
Installing:
 drbd83    x86_64   8.3.15-2.el5.centos  /drbd83-8.3.15-2.el5.centos.x86_64  487 k

Transaction Summary
================================================================================
Install       1 Package(s)

Total size: 487 k
Installed size: 487 k
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : drbd83-8.3.15-2.el5.centos.x86_64                            1/1
  Verifying  : drbd83-8.3.15-2.el5.centos.x86_64                            1/1

Installed:
  drbd83.x86_64 0:8.3.15-2.el5.centos

Complete!
================================================================================
}
Local test environment: CentOS 6.5 x86_64, installing from a yum repository:
1. Install DRBD (drbd83-utils and kmod-drbd83)
[root@node1 yum.repos.d]# yum -y install drbd83-utils kmod-drbd83
Stopping all DRBD resources: .
  Verifying  : drbd83-utils-8.3.16-1.el6.elrepo.x86_64                      1/3
  Verifying  : kmod-drbd83-8.3.16-3.el6.elrepo.x86_64                       2/3
  Verifying  : drbd83-8.3.15-2.el5.centos.x86_64                            3/3

Installed:
  drbd83-utils.x86_64 0:8.3.16-1.el6.elrepo    kmod-drbd83.x86_64 0:8.3.16-3.el6.elrepo

Replaced:
  drbd83.x86_64 0:8.3.15-2.el5.centos

Complete!
2. Load the DRBD module into the kernel: modprobe drbd
[root@node1 ~]# modprobe drbd
3. Check that DRBD installed successfully: lsmod | grep drbd
[root@node1 ~]# lsmod | grep drbd
drbd                  332493  0
# The same steps are performed on node2.heyuxuan.com.
【DRBD Configuration】
DRBD's main configuration file is /etc/drbd.conf. For easier management this file is usually split into several parts stored under /etc/drbd.d, with the main file simply pulling them in via "include" directives. Typically /etc/drbd.d contains global_common.conf plus one file per resource ending in .res. global_common.conf defines the global and common sections, while each .res file defines one resource.
The global section may appear only once; if everything is kept in a single configuration file rather than split into several, it must sit at the very beginning of that file. Currently the only parameters allowed in the global section are minor-count, dialog-refresh, disable-ip-verification and usage-count.
The common section defines parameters inherited by every resource; any parameter usable in a resource definition may also be set here. It is not mandatory, but putting parameters shared by several resources into common reduces configuration complexity.
The resource sections define the DRBD resources; each resource normally lives in its own .res file under /etc/drbd.d. A resource must be given a name made up of non-whitespace ASCII characters. Each resource definition must contain at least two host (on) sub-sections naming the nodes the resource is bound to; all other parameters can be inherited from the common section or from DRBD's defaults.
Parameters used in this configuration (a sketch that combines them follows after this list):
RESOURCE: the resource name
PROTOCOL: protocol "C" means "synchronous", i.e. a write is only considered complete once the remote write confirmation has been received.
NET: both nodes share the same SHA1 key (shared secret).
after-sb-0pri: when a split brain is detected and no data has changed, the two nodes simply reconnect.
after-sb-1pri: if data has changed, discard the data on the secondary and resynchronize from the primary.
after-sb-2pri: if the previous options cannot be applied, disconnect the two nodes; the split brain must then be resolved manually.
rr-conflict: if the previous settings cannot be applied and the DRBD system ends up with a role conflict, automatically drop the connection between the nodes.
DEVICE: the virtual (DRBD) device
DISK: the physical disk device
META-DISK: the metadata is stored on the same disk (sdb1)
ON: the per-node (host) sub-sections
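As a hedged sketch only (this is not the minimal mydrbd.res actually used later in this article), a DRBD 8.3 resource section combining the options just described might look like the following; the specific after-sb-* policies shown are assumptions that match the behaviour described above:
resource mydrbd {
        protocol C;
        net {
                cram-hmac-alg   sha1;
                shared-secret   "mydrbdlab";
                after-sb-0pri   discard-zero-changes;
                after-sb-1pri   discard-secondary;
                after-sb-2pri   disconnect;
                rr-conflict     disconnect;
        }
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on node1 { address 192.168.1.30:7789; }
        on node2 { address 192.168.1.31:7789; }
}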
The following operations are performed on node1.heyuxuan.com.
1) Copy the sample configuration file into place:
# cp /usr/share/doc/drbd83-8.3.15/drbd.conf /etc/
[root@node1 drbd]# cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
2) Configure /etc/drbd.d/global_common.conf
[root@node1 ~]# cat /etc/drbd.d/global_common.conf
global {
        usage-count yes;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        protocol C;

        handlers {
                # These are EXAMPLE handlers only.
                # They may have severe implications,
                # like hard resetting the node under certain circumstances.
                # Be careful when chosing your poison.

                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }

        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }

        disk {
                # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
                # no-disk-drain no-md-flushes max-bio-bvecs
                on-io-error detach;
        }

        net {
                # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
                # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
                # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
                cram-hmac-alg "sha1";
                shared-secret "mydrbdlab";
        }

        syncer {
                # rate after al-extents use-rle cpu-mask verify-alg csums-alg
                rate 1000M;
        }
}
3) Define a resource in /etc/drbd.d/mydrbd.res with the following contents:
resource mydrbd {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on node1 {
                address 192.168.1.30:7789;
        }
        on node2 {
                address 192.168.1.31:7789;
        }
}
These files must be identical on both nodes, so the configuration just written can be copied to the other node over ssh:
# scp /etc/drbd.* node2:/etc
[root@node1 ~]# scp -r /etc/drbd.* node2:/etc/
drbd.conf                                     100%  100     0.1KB/s   00:00
global_common.conf                            100% 1748     1.7KB/s   00:00
mydrbd.res
【Create the DRBD Resource and Filesystem】
Create the metadata for the resource (mydrbd).
Node1:
[root@node1 drbd.d]# drbdadm create-md mydrbd
  --==  Thank you for participating in the global usage survey  ==--
The server's response is:
you are the 23215th user to install this version
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
Node2:
[root@node2 drbd.d]# drbdadm create-md mydrbd
  --==  Thank you for participating in the global usage survey  ==--
The server's response is:
you are the 23216th user to install this version
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
【Bring Up the Resource】
- First make sure the drbd module is loaded. Check whether it is loaded:
# lsmod | grep drbd
If it is not loaded, load it:
[root@node1 drbd.d]# modprobe drbd
[root@node1 drbd.d]# lsmod | grep drbd
drbd                  332493  0
- Bring up the DRBD resource:
[root@node1 drbd]# drbdadm up mydrbd
[root@node2 drbd]# drbdadm up mydrbd
- Check the DRBD status:
Node1:
[root@node1 drbd.d]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
m:res     cs         ro                   ds                         p  mounted  fstype
0:mydrbd Connected Secondary/Secondary Inconsistent/Inconsistent C
Node2:
[root@node2 drbd.d]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
m:res     cs         ro                   ds                         p  mounted  fstype
0:mydrbd Connected Secondary/Secondary Inconsistent/Inconsistent C
The output above shows that the DRBD service is running on both machines, but neither one is the primary host yet, so the resource (the block device) cannot be accessed.
【Start the Initial Sync from the Primary Node】
- Run this on the intended primary node only (here node1.heyuxuan.com):
[root@node1 drbd]# drbdadm -- --overwrite-data-of-peer primary mydrbd
- Check the sync status:
[root@node1 drbd.d]# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:2914432 nr:0 dw:0 dr:2917060 al:0 bm:177 lo:0 pe:4 ua:15 ap:0 ep:1 wo:f oos:2322912
[==========>.........] sync'ed: 55.7% (2268/5112)M
finish: 0:00:55 speed: 41,592 (41,040) K/sec
[root@node1 drbd.d]# cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:3002368 nr:0 dw:0 dr:3003076 al:0 bm:183 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:2234592
        [==========>.........] sync'ed: 57.4% (2180/5112)M
finish: 0:00:53 speed: 41,812 (41,128) K/sec
# Notes on the fields in the output above:
cs (connection state): state of the network connection
ro (roles): the roles of the two nodes (the local node's role is shown first)
ds (disk states): state of the backing disks
Replication protocol: A, B or C (this configuration uses C)
Once the DRBD state reads "cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate", the initial sync has finished (a small monitoring example follows below).
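A convenient way to follow the progress until that state is reached (a hedged example using the standard watch utility; nothing DRBD-specific beyond /proc/drbd):
[root@node1 ~]# watch -n2 cat /proc/drbd
The display refreshes every two seconds; stop it with Ctrl-C once ds shows UpToDate/UpToDate on both sides.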
The drbd-overview command can also be used:
[root@node1 drbd.d]# drbd-overview
0:mydrbd SyncSource Primary/Secondary UpToDate/Inconsistent C r-----
[>....................] sync'ed: 4.8% (4872/5112)M
[root@node1 drbd.d]# drbd-overview
0:mydrbd SyncSource Primary/Secondary UpToDate/Inconsistent C r-----
[>...................] sync'ed:  9.9% (4612/5112)M
[root@node1 drbd.d]# drbd-overview
  0:mydrbd  SyncSource Primary/Secondary UpToDate/Inconsistent C r-----
        [=>..................] sync'ed: 10.8% (4568/5112)M
[root@node1 drbd.d]# drbd-overview
0:mydrbd SyncSource Primary/Secondary UpToDate/Inconsistent C r-----
[=>..................] sync'ed: 12.2% (4496/5112)
# Output on node1 after the sync completes:
[root@node1 drbd.d]# drbd-overview
  0:mydrbd  Connected Primary/Secondary UpToDate/UpToDate C r-----
# Output on node2 after the sync completes:
[root@node2 drbd.d]# drbd-overview
  0:mydrbd  Connected Secondary/Primary UpToDate/UpToDate C r-----
【Create the Filesystem】
- Create the filesystem on the primary node (Node1):
[root@node1 drbd]# mkfs -t ext4 /dev/drbd0
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
519168 inodes, 2074312 blocks
103715 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2126512128
64 block groups
32768 blocks per group, 32768 fragments per group
8112 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 39 mounts or 180 days, whichever comes first. Use tune2fs -c or -i to override.
Note: there is no need to repeat this on the secondary node (Node2); DRBD takes care of synchronizing the raw device data.
Also, the DRBD device does not need to be mounted permanently on either machine (it will only be mounted temporarily while MySQL is installed), because the cluster manager handles mounting. Just make sure the replicated filesystem is only ever mounted on the Active (primary) server.
3. Installing and Configuring MySQL
【MySQL 5.6 Installation】
Create the mysql group and user (Node1 and Node2):
# groupadd mysql
# useradd -g mysql mysql
【Install MySQL】(Node1 and Node2)
# yum -y install gcc-c++ ncurses-devel cmake
# wget -c http://dev.mysql.com/get/Downloads/MySQL-5.6/mysql-5.6.10.tar.gz/from/http://cdn.mysql.com/
# tar zxvf mysql-5.6.10.tar.gz
# cd mysql-5.6.10
# cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/mysql
# make && make install
【Create the DRBD Mount Directories】(Node1 and Node2)
# mkdir /var/lib/mysql_drbd
# mkdir /var/lib/mysql
# chown mysql:mysql -R /var/lib/mysql_drbd
# chown mysql:mysql -R /var/lib/mysql
【Initialize the MySQL Database】
- Before initializing, temporarily mount the DRBD filesystem on the primary node (Node1):
[root@node1 ~]# mount /dev/drbd0 /var/lib/mysql_drbd/
- Run the initialization (Node1):
[root@node1 mysql]# cd /usr/local/mysql
[root@node1 mysql]# mkdir /var/lib/mysql_drbd/data
[root@node1 mysql]# chown -R mysql:mysql /var/lib/mysql_drbd/data
[root@node1 mysql]# chown -R mysql:mysql .
[root@node1 mysql]# scripts/mysql_install_db --datadir=/var/lib/mysql_drbd/data --user=mysql
- After initialization completes:
[root@node1 mysql]# cp support-files/mysql.server /etc/init.d/mysql
[root@node1 mysql]# mv support-files/my-default.cnf /etc/my.cnf
[root@node1 mysql]# chown mysql /etc/my.cnf
[root@node1 mysql]# chmod 644 /etc/my.cnf
[root@node1 mysql]# chown -R root .
[root@node1 mysql]# cd /var/lib/mysql_drbd
[root@node1 mysql_drbd]# chmod -R uog+rw *
[root@node1 mysql_drbd]# chown -R mysql data
Configure MySQL (Node1):
[root@node1 mysql_drbd]# cat /etc/my.cnf
#
# /etc/my.cnf
#
[client]
port                            = 3306
socket                          = /var/lib/mysql/mysql.sock

[mysqld]
port                            = 3306
socket                          = /var/lib/mysql/mysql.sock
datadir                         = /var/lib/mysql_drbd/data
user                            = mysql
#memlock                        = 1
#table_open_cache               = 3072
#table_definition_cache         = 1024
max_heap_table_size             = 64M
tmp_table_size                  = 64M

# Connections
max_connections                 = 505
max_user_connections            = 500
max_allowed_packet              = 16M
thread_cache_size               = 32

# Buffers
sort_buffer_size                = 8M
join_buffer_size                = 8M
read_buffer_size                = 2M
read_rnd_buffer_size            = 16M

# Query Cache
#query_cache_size               = 64M

# InnoDB
#innodb_buffer_pool_size        = 1G
#innodb_data_file_path          = ibdata1:2G:autoextend
#innodb_log_file_size           = 128M
#innodb_log_files_in_group      = 2

# MyISAM
myisam_recover                  = backup,force

# Logging
#general-log                    = 0
#general_log_file               = /var/lib/mysql/mysql_general.log
log_warnings                    = 2
log_error                       = /var/lib/mysql/mysql_error.log
#slow_query_log                 = 1
#slow_query_log_file            = /var/lib/mysql/mysql_slow.log
#long_query_time                = 0.5
#log_queries_not_using_indexes  = 1
#min_examined_row_limit         = 20

# Binary Log / Replication
server_id                       = 1
log-bin                         = mysql-bin
binlog_cache_size               = 1M
#sync_binlog                    = 8
binlog_format                   = row
expire_logs_days                = 7
max_binlog_size                 = 128M

[mysqldump]
quick
max_allowed_packet              = 16M

[mysql]
no_auto_rehash

[myisamchk]
#key_buffer                     = 512M
#sort_buffer_size               = 512M
read_buffer                     = 8M
write_buffer                    = 8M

[mysqld_safe]
open-files-limit                = 8192
pid-file                        = /var/lib/mysql/mysql.pid
【Test MySQL on the Primary Node (Node1)】
[root@node1 mysql_drbd]# /usr/local/mysql/bin/mysqld_safe --user=mysql > /dev/null &
[root@node1 mysql_drbd]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.10-log Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use test;
Database changed
mysql> show tables;
Empty set (0.10 sec)

mysql> create table tbl (a int);
Query OK, 0 rows affected (3.80 sec)

mysql> insert into tbl values (1), (2);
Query OK, 2 rows affected (0.25 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql> quit;
Bye
[root@node1 mysql_drbd]# /usr/local/mysql/bin/mysqladmin -uroot -p shutdown
Enter password:
[1]+ Done /usr/local/mysql/bin/mysqld_safe --user=mysql > /dev/null
【Unmount the DRBD Filesystem on Node1】
[root@node1 ~]# umount /var/lib/mysql_drbd
[root@node1 ~]# drbdadm secondary mydrbd
【Mount the DRBD Filesystem on Node2】
[root@node2 ~]# drbdadm primary mydrbd
[root@node2 ~]# mount /dev/drbd0 /var/lib/mysql_drbd
[root@node2 ~]# ll /var/lib/mysql_drbd/
total 20
drwxrwxrwx 5 mysql mysql  4096 Mar 12 09:30 data
drwxrw-rw- 2 mysql mysql 16384 Mar 10 07:49 lost+found
【Configure and Test MySQL on Node2】
[root@node2 ~]# scp node1:/etc/my.cnf /etc/my.cnf
[root@node2 ~]# chown mysql /etc/my.cnf
[root@node2 ~]# chmod 644 /etc/my.cnf
[root@node2 ~]# cd /usr/local/mysql/
[root@node2 mysql]# cp support-files/mysql.server /etc/init.d/mysql
[root@node2 mysql]# chown -R root:mysql .
【Test MySQL:】
[root@node2 mysql]# /usr/local/mysql/bin/mysqld_safe --user=mysql > /dev/null &
[1] 15864
[root@node2 mysql]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.10-log Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use test;
Database changed
mysql> select * from tbl;
+------+
| a    |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.26 sec)
mysql> quit
Bye
[root@node2 mysql]# /usr/local/mysql/bin/mysqladmin -uroot -p shutdown
Enter password:
[1]+ Done /usr/local/mysql/bin/mysqld_safe --user=mysql > /dev/null
Unmount the DRBD filesystem on Node2 and hand it back to the cluster manager, Pacemaker:
[root@node2 mysql]# umount /var/lib/mysql_drbd
[root@node2 mysql]# drbdadm secondary mydrbd
[root@node2 mysql]# drbd-overview
  0:mydrbd/0  Connected Secondary/Secondary UpToDate/UpToDate C r-----
4. Corosync and Pacemaker
【Installation on CentOS 6.5 x86_64】
[root@node1 corosync]# yum install corosync
[root@node1 corosync]# yum install pacemaker
## pacemaker depends on heartbeat
## Download heartbeat-3.0.4-2.el6.x86_64 and heartbeat-libs-3.0.4-2.el6.x86_64 yourself and install them:
[root@node1 ~]# yum -y --nogpgcheck localinstall heartbeat-3.0.4-2.el6.x86_64.rpm heartbeat-libs-3.0.4-2.el6.x86_64.rpm
Installing crmsh and a brief introduction to its use
1. Ways to configure Pacemaker resources:
(1) Command-line tools: crmsh, pcs
(2) Graphical tools: pygui, hawk, LCMC, pcs
# Note: this article mainly uses crmsh.
2. Install crmsh
RHEL stopped shipping the crmsh command-line cluster configuration tool as of 6.4 in favour of pcs. If you prefer the crm command you can download the packages and install them yourself; crmsh depends on pssh, so download that as well.
lustering:/Stable/CentOS_CentOS-6/x86_64/
python-pssh-2.3.1-4.2.x86_64 pssh-2.3.1-4.2.x86_64
l_6/com/crmsh-1.2.6-0.rc2.2.1.x86_64.rpm crmsh-1.2.6-0.rc2.2.1.x86_64
[root@node1 ~]# yum -y --nogpgcheck localinstall crmsh-1.2.6-0.rc2.2.1.x86_64.rpm pssh-2.3.1-4.2.x86_64.rpm python-pssh-2.3.1-4.2.x86_64.rpm
## These are the crmsh and pssh package versions used in this experiment. At this point crmsh is installed.
【Configure corosync】(the following commands are run on node1.heyuxuan.com)
# cd /etc/corosync
# cp corosync.conf.example corosync.conf
Then edit corosync.conf and add the following:
service {
ver: 0
name: pacemaker
# use_mgmtd: yes
}
aisexec {
user: root
group: root
}
Also set the IP address after bindnetaddr in this file to the network address of the network your NIC sits on. Our two nodes are on the 192.168.1.0 network, so it is set to 192.168.1.0 as follows (a sketch of the surrounding totem section appears after this line):
bindnetaddr: 192.168.1.0
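For context, a hedged sketch of the totem interface block this line lives in (every value except bindnetaddr is an assumption taken from the stock corosync.conf.example defaults; mcastport 4000 would match the 3999-4000 firewall note earlier, while the shipped example uses 5405):
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 4000
                ttl: 1
        }
}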
【Generate the Authentication Key for Inter-node Communication】
# corosync-keygen
[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 192).
# Note: corosync-keygen reads from /dev/random by default; if the system's interrupt-driven entropy pool runs low, the command can block for a long time (a hedged workaround is sketched below).
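One common workaround in test environments (an assumption, not something done in the original setup) is to feed the kernel entropy pool with rngd from the rng-tools package while the key is generated:
# yum -y install rng-tools
# rngd -r /dev/urandom
# corosync-keygen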
Copy corosync.conf and authkey to node2:
# scp -p corosync.conf authkey node2:/etc/corosync/
Create the directory for the corosync logs on both nodes:
# mkdir /var/log/cluster
# ssh node2 'mkdir /var/log/cluster'
【Try Starting It】(run the following on node1):
# /etc/init.d/corosync start
Check whether the corosync engine started correctly:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Apr 15 23:21:06 corosync [MAIN  ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Apr 15 23:21:06 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check whether the initial membership notifications went out correctly:
# grep TOTEM /var/log/cluster/corosync.log
Apr 15 23:21:06 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Apr 15 23:21:06 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 15 23:21:06 corosync [TOTEM ] The network interface [192.168.1.30] is now up.
Apr 15 23:21:06 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 15 23:21:20 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors were produced during startup:
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Apr 16 00:10:03 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Apr 16 00:10:03 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' () for details on using Pacemaker with CMAN
# The errors above mean that Pacemaker will soon no longer be supported as a corosync plugin and that cman is the recommended cluster infrastructure; they can safely be ignored here.
Check whether pacemaker started correctly:
# grep pcmk_startup /var/log/cluster/corosync.log
Apr 15 23:21:06 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Apr 15 23:21:06 corosync [pcmk ] Logging: Initialized pcmk_startup
Apr 15 23:21:06 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Apr 15 23:21:06 corosync [pcmk  ] info: pcmk_startup: Service: 9
Apr 15 23:21:06 corosync [pcmk ] info: pcmk_startup: Local hostname: node1
If all of the commands above ran without problems, corosync on node2 can be started with:
# ssh node2 -- /etc/init.d/corosync start
Note: start node2 from node1 with the command above; do not start it directly on node2.
Check the startup state of the cluster nodes with:
# crm status
============
[root@node1 ~]# crm status
Last updated: Sat Apr 16 00:48:37 2016
Last change: Sat Apr 16 00:10:04 2016 via crmd on node2
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1 node2 ]
# crm_mon
============
Last updated: Fri Apr 15 23:30:39 2016
Last change: Fri Apr 15 23:21:34 2016 via crmd on node2
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1 node2 ]
# Note: to see how a particular command is used, prefix it with the help keyword:
crm(live)node# help status
Show nodes' status as XML. If the node parameter is omitted then all nodes are shown.
Usage: ...............
status [
The output above shows that both nodes have started correctly and the cluster is working normally.
Running ps auxf shows the processes corosync has started:
189       71694  0.5  1.0  94012 10108 ?     S    22:44   0:06 /usr/libexec/pacemaker/cib
root      71695  0.0  0.3  94380  3480 ?     S    22:44   0:00 /usr/libexec/pacemaker/stonithd
root      71696  0.0  0.2  76088  2628 ?     S    22:44   0:00 /usr/libexec/pacemaker/lrmd
189       71697  0.0  0.2  89628  2916 ?     S    22:44   0:00 /usr/libexec/pacemaker/attrd
189       71698  0.0  1.8 117276 18368 ?     S    22:44   0:00 /usr/libexec/pacemaker/pengine
189       71699  0.0  0.5 147784  5972 ?     S    22:44   0:00 /usr/libexec/pacemaker/crmd
root      71736  0.0  1.5 228840 15900 pts/0 S+   22:46   0:00 /usr/bin/python /usr/sbin/crm
6. Configure cluster properties and disable stonith
corosync enables stonith by default, but this cluster has no stonith devices yet, so the default configuration is not usable as-is. This can be verified with the following command:
# crm_verify -L
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
-V may provide more details
For now, stonith can be disabled with:
# crm configure property stonith-enabled=false
View the current configuration with:
[root@node1 ~]# crm configure show
node node1
node node2
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false"
This shows that stonith has been disabled.
The crm and crm_verify commands used above are the command-line cluster management tools shipped with Pacemaker releases after 1.0; they can be run on any node in the cluster.
Resource configuration
Configure the resources and constraints
Configure the default properties. View the existing configuration first:
[root@node1 ~]# crm configure show
node node1
node node2
property $id="cib-bootstrap-options" \
        dc-version="1.1.8-7.el6-394e906" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2"
Verify that the configuration is correct:
[root@node1 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
-V may provide more details
Disable the STONITH errors:
[root@node1 ~]# crm configure property stonith-enabled=false
[root@node1 ~]# crm_verify -L
Tell the cluster to ignore loss of quorum:
[root@node1 ~]# crm configure property no-quorum-policy=ignore
Prevent resources from moving back after a node recovers:
[root@node1 ~]# crm configure rsc_defaults resource-stickiness=100
Set the default operation timeout:
[root@node1 www]# crm configure property default-action-timeout=180s
Set whether a start failure is treated as fatal by default:
[root@node1 www]# crm configure property start-failure-is-fatal=false
Configure the DRBD resource
- Stop DRBD before configuring it:
[root@node1 ~]# /etc/init.d/drbd stop
[root@node2 ~]# /etc/init.d/drbd stop
【Configure the DRBD Resource:】
- Define drbd as a cluster resource:
[root@node1 ~]# crm configure primitive p_drbd_mysql ocf:linbit:drbd params drbd_resource=mydrbd op monitor role=Master interval=15s op start timeout=240s op stop timeout=100s
or
crm(live)configure# primitive p_drbd_mysql ocf:linbit:drbd params drbd_resource=mydrbd op monitor role=Master interval=15s op start timeout=240s op stop timeout=100s
- Configure the DRBD master/slave relationship (only one Master node is allowed):
[root@node1 ~]# crm configure ms ms_drbd_mysql p_drbd_mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
or
crm(live)configure# ms ms_drbd_mysql p_drbd_mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
- Configure the filesystem resource and define the mount point:
[root@node1 ~]# crm configure primitive p_fs_mysql ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/var/lib/mysql_drbd/ fstype=ext4
or
crm(live)configure# primitive p_fs_mysql ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/var/lib/mysql_drbd/ fstype=ext4
【Configure the VIP Resource】
[root@node1 ~]# crm configure primitive p_ip_mysql ocf:heartbeat:IPaddr params ip=192.168.1.99 cidr_netmask=24 op monitor interval=30s
or
crm(live)configure# primitive p_ip_mysql ocf:heartbeat:IPaddr params ip=192.168.1.99 cidr_netmask=24 op monitor interval=30s
【Configure the MySQL Resource】
Using the LSB resource agent (the approach used in this article):
crm(live)configure# primitive p_mysql lsb:mysql op monitor interval=20s timeout=30s op start interval=0 timeout=180s op stop interval=0 timeout=240s
Or using the OCF resource agent:
crm(live)configure# primitive p_mysql ocf:heartbeat:mysql params binary=/usr/local/mysql/bin/mysqld_safe config=/etc/my.cnf user=mysql group=mysql log=/var/lib/mysql/mysql_error.log pid=/var/lib/mysql/mysql.pid socket=/var/lib/mysql/mysql.sock datadir=/var/lib/mysql_drbd/data op monitor interval=60s timeout=60s op start timeout=180s op stop timeout=240s
【Group Resources and Constraints】
A group ensures that DRBD, MySQL and the VIP stay on the same node (the Master) and fixes the start/stop order of the resources.
Start order: p_fs_mysql -> p_ip_mysql -> p_mysql
Stop order:  p_mysql -> p_ip_mysql -> p_fs_mysql
crm(live)configure# group g_mysql p_fs_mysql p_ip_mysql p_mysql
The group g_mysql must always run on the Master node:
crm(live)configure# colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
MySQL must always start after DRBD has been promoted to Master:
crm(live)configure# order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
Verify and commit the configuration:
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# quit
【Check the Cluster Status and Test Failover】
- Check the status:
[root@node1 mysql]# crm_mon -1r
Last updated: Wed Mar 13 11:24:44 2013
Last change: Wed Mar 13 11:24:04 2013 via crm_attribute on node2
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
5 Resources configured.
Online: [ node1 node2 ]
Full list of resources:
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node1
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node1
     p_mysql    (lsb:mysql):                    Started node1
Failover test:
Put Node1 into standby:
[root@node1 ~]# crm node standby
Check the cluster status after a few moments (if the failover succeeded, the output looks like this):
[root@node1 ~]# crm status
Last updated: Wed Mar 13 11:29:41 2013
Last change: Wed Mar 13 11:26:46 2013 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
5 Resources configured.
Node node1: standby
Online: [ node2 ]
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Stopped: [ p_drbd_mysql:1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node2
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node2
     p_mysql    (lsb:mysql):                    Started node2
Bring Node1 back online:
[root@node1 mysql]# crm node online
[root@node1 mysql]# crm status
Last updated: Wed Mar 13 11:32:49 2013
Last change: Wed Mar 13 11:31:23 2013 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
5 Resources configured.
Online: [ node1 node2 ]
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node2
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node2
     p_mysql    (lsb:mysql):                    Started node2
Network outage handling: stop services on an isolated Master to avoid split brain
Use Pacemaker to ping an independent network target (for example the network gateway); when a host is detected as disconnected (isolated), it is prevented from becoming DRBD master.
[root@node1 ~]# crm configure
crm(live)configure# primitive p_ping ocf:pacemaker:ping params name="ping" multiplier="1000" host_list="192.168.1.1" op monitor interval="15s" timeout="60s" op start timeout="60s"
Because both hosts need to run ping to check their own connectivity, create a clone (cl_ping) so the ping resource runs on every host in the cluster:
crm(live)configure# clone cl_ping p_ping meta interleave="true"
Tell Pacemaker how to act on the ping results:
crm(live)configure# location l_drbd_master_on_ping ms_drbd_mysql rule $role="Master" -inf: not_defined ping or ping number:lte 0
This rule means: if a host is not running the ping resource, or cannot reach at least one of the ping targets, it receives a preference score of negative infinity (-inf), so the location constraint (l_drbd_master_on_ping) keeps the DRBD Master role away from that host.
Verify and commit the configuration:
crm(live)configure# verify
WARNING: p_drbd_mysql: action monitor not advertised in meta-data, it may not be supported by the RA
crm(live)configure# commit
crm(live)configure# quit
Check that the ping resource is running (a further check of the per-node ping attribute is shown after the output below):
[root@node1 ~]# crm_mon -1
Last updated: Thu Mar 14 01:02:14 2013
Last change: Thu Mar 14 01:01:20 2013 via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
7 Resources configured.
Online: [ node1 node2 ]
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node2
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node2
     p_mysql    (lsb:mysql):                    Started node2
 Clone Set: cl_ping [p_ping]
     Started: [ node1 node2 ]
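To also see the connectivity score each node is reporting, crm_mon can print the node attributes as well (a hedged example; with multiplier="1000" and a single reachable host in host_list, the ping attribute would be expected to show 1000 on a healthy node):
[root@node1 ~]# crm_mon -1 -A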
Network-disconnect test
- Stop the network service on the current Master:
[root@node2 ~]# service network stop
[root@node1 ~]# crm resource status
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Slaves: [ node1 ]
     Stopped: [ p_drbd_mysql:1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Stopped
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Stopped
     p_mysql    (lsb:mysql):                    Stopped
 Clone Set: cl_ping [p_ping]
     Started: [ node1 ]
     Stopped: [ p_ping:1 ]
- Restore the network service on the Master:
[root@node2 ~]# service network start
[root@node1 ~]# crm resource status
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started
     p_mysql    (lsb:mysql):                    Started
 Clone Set: cl_ping [p_ping]
     Started: [ node1 node2 ]
[root@node1 ~]# crm status
Last updated: Thu Mar 14 01:09:51 2013
Last change: Thu Mar 14 01:09:49 2013 via crmd on node1
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
7 Resources configured.
Online: [ node1 node2 ]
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node2
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node2
     p_mysql    (lsb:mysql):                    Started node2
 Clone Set: cl_ping [p_ping]
     Started: [ node1 node2 ]
System startup settings
Since DRBD, MySQL and the other services are now managed by Pacemaker, their init-script autostart must be turned off, while corosync and pacemaker must start with the system:
[root@node1 ~]# chkconfig drbd off
[root@node1 ~]# chkconfig mysql off
[root@node1 ~]# chkconfig corosync on
[root@node1 ~]# chkconfig pacemaker on
[root@node2 ~]# chkconfig drbd off
[root@node2 ~]# chkconfig mysql off
[root@node2 ~]# chkconfig corosync on
[root@node2 ~]# chkconfig pacemaker on
Manually resolving a split brain
- Recovering from a split brain
Even with DRBD's Active/Standby design, the data on the two hosts can become inconsistent for various reasons. When that happens DRBD drops the connection between the hosts (their relationship can be checked with /etc/init.d/drbd status or drbd-overview). If the log (/var/log/messages) confirms that the disconnection was caused by a split brain, identify the host that holds the correct data and then let DRBD resynchronize from it.
- Check the DRBD status on both hosts and look at the log:
[root@node1 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:32948 nr:0 dw:4 dr:34009 al:1 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@node1 ~]# cat /var/log/messages | grep Split-Brain
Mar 14 21:11:48 node1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
[root@node2 drbd.d]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:32948 dw:32948 dr:0 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
- Manually resolve the split brain:
Here node1 is the host with the good data and node2 the host with the bad data.
On the bad-data host, node2:
[root@node2 ~]# drbdadm disconnect mydrbd
[root@node2 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10
0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r-----
ns:0 nr:32948 dw:32948 dr:0 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@node2 ~]# drbdadm secondary mydrbd
[root@node2 ~]# drbdadm -- --discard-my-data connect mydrbd
On the good-data host, node1 (if the cs: state below already shows WFConnection, the following step is unnecessary):
[root@node1 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:32948 nr:0 dw:4 dr:34009 al:1 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@node1 ~]# drbdadm connect mydrbd
[root@node1 ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10
m:res     cs         ro                 ds                 p  mounted              fstype
0:mydrbd  Connected  Primary/Secondary  UpToDate/UpToDate  C  /var/lib/mysql_drbd  ext4