
Building Highly Available MySQL with DRBD + Pacemaker + Corosync

Note: in an Active/Standby architecture, only the Active host ever provides service; the Standby host serves nothing at all (not even MySQL reads).

I. Environment Setup

【IP Configuration】

node1:

- IP: 192.168.1.30
- Hostname: node1

node2:

- IP: 192.168.1.31
- Hostname: node2

【Virtual IP (VIP)】

- IP: 192.168.1.99

【Network and Server Settings】

Time synchronization:

# ntpdate cn.pool.ntp.org

【SELinux Settings】

SELINUX can be set to either permissive or disabled:

[root@node2 ~]# cat /etc/sysconfig/selinux

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
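If SELinux was previously enforcing, editing the file only takes effect after a reboot; a hedged extra step (not in the original) is to switch to permissive mode immediately:

[root@node2 ~]# setenforce 0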

【iptables Firewall Settings】

For convenience, the iptables firewall is simply turned off here:

# service iptables stop
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]
# chkconfig iptables off

Note: in a real environment you do not have to disable the firewall; it is enough to open the relevant ports (DRBD: 7788-7789, Corosync: 3999-4000).
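As a hedged sketch of that alternative (the port numbers follow the note above; adjust the subnet to your own), the equivalent iptables rules might look like this:

# Allow DRBD replication and Corosync traffic from the cluster subnet only
[root@node1 ~]# iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 7788:7789 -j ACCEPT
[root@node1 ~]# iptables -A INPUT -s 192.168.1.0/24 -p udp --dport 3999:4000 -j ACCEPT
[root@node1 ~]# service iptables save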

【Set the Machine hostname】

[root@node2 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node2

[root@node2 ~]# source /etc/sysconfig/network
[root@node2 ~]# hostname $HOSTNAME

【/etc/hosts Setup】

Add the hostnames to /etc/hosts on every machine:

[root@node2 ~]# cat /etc/hosts

...

192.168.1.30 node1.heyuxuan.com node1
192.168.1.31 node2.heyuxuan.com node2

Recommendation: do not rely on an external DNS server (it would just be one more point of failure); put these mappings into /etc/hosts on every machine instead.
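A quick way to confirm that the mappings really come from /etc/hosts on both nodes (a hypothetical check, not part of the original steps):

[root@node1 ~]# getent hosts node2
192.168.1.31    node2.heyuxuan.com node2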

【Configure SSH Mutual Trust】

[root@node2 ~]# ssh-keygen -t rsa -b 1024
[root@node2 ~]# ssh-copy-id root@192.168.1.30
[root@node1 ~]# ssh-keygen -t rsa -b 1024
[root@node1 ~]# ssh-copy-id root@192.168.1.31
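To verify the trust works in both directions before continuing (an assumed sanity check, not from the original):

[root@node1 ~]# ssh node2 hostname     # should print "node2" with no password prompt
[root@node2 ~]# ssh node1 hostname     # should print "node1" with no password prompt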

【Upgrade to the Linux 2.6.32 Kernel】

1. Enter the yum repository directory: cd /etc/yum.repos.d

2. Back up the stock repo file: mv CentOS-Base.repo CentOS-Base.repo.bak

3. Download the 163 (NetEase) yum repository: wget

4. Rename the file: mv CentOS6-Base-163.repo CentOS-Base.repo

5. Upgrade the kernel first:

[root@node1 yum.repos.d]# yum -y update kernel
Total download size: 48 M
Downloading Packages:
Setting up and reading Presto delta metadata
updates/prestodelta                                       | 545 kB     00:00
Processing delta metadata
Package(s) data still to download: 48 M
(1/4): dracut-004-388.el6.noarch.rpm                      | 125 kB     00:00
(2/4): dracut-kernel-004-388.el6.noarch.rpm               |  26 kB     00:00
(3/4): kernel-2.6.32-573.22.1.el6.x86_64.rpm              |  30 MB     00:26
(4/4): kernel-firmware-2.6.32-573.22.1.el6.noarch.rpm     |  18 MB     00:28
--------------------------------------------------------------------------------
Total                                             873 kB/s |  48 MB     00:56
.....
Installed:
  kernel.x86_64 0:2.6.32-573.22.1.el6

Dependency Updated:
  dracut.noarch 0:004-388.el6                 dracut-kernel.noarch 0:004-388.el6
  kernel-firmware.noarch 0:2.6.32-573.22.1.el6

Complete!

[root@node1 yum.repos.d]# yum install kernel-devel

.....

Running Transaction

  Installing : kernel-devel-2.6.32-573.22.1.el6.x86_64                      1/1
  Verifying  : kernel-devel-2.6.32-573.22.1.el6.x86_64                      1/1

Installed:

kernel-devel.x86_64 0:2.6.32-573.22.1.el6

Complete!

6. Download and install the ELRepo release package:

[root@node1 yum.repos.d]# rpm -Uvh
Retrieving
warning: /var/tmp/rpm-tmp.AP86LX: Header V4 DSA/SHA1 Signature, key ID baadae52: NOKEY
Preparing...                ########################################### [100%]
   1:elrepo-release         ########################################### [100%]

7. Reboot the system.

II. DRBD Installation and Configuration

#{
==============================================================
【DRBD Download and Install】 -- not used here; applies to the RHEL 5 / CentOS 5 series

==============================================================

DRBD consists of two parts: a kernel module and the userspace management tools. The DRBD kernel module has been merged into the mainline Linux kernel since 2.6.33, so if your kernel is at or above that version you only need to install the management tools; otherwise you must install both the kernel module package and the management tools, and the two versions must match.
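A quick way to check which case applies on your machine (an assumed check, not part of the original text):

[root@node1 ~]# uname -r
2.6.32-573.22.1.el6.x86_64     # below 2.6.33, so both kmod-drbd83 and drbd83-utils are needed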

The DRBD versions in common use are 8.0, 8.2 and 8.3; the corresponding RPM packages are named drbd, drbd82 and drbd83, and the matching kernel-module packages are kmod-drbd, kmod-drbd82 and kmod-drbd83. Features and configuration differ slightly between versions. The platform for this experiment is x86 running CentOS 6.5 (kernel 2.6.32), so both the kernel module package and the management tools are installed.

The latest 8.3 release is used here (drbd83-8.3.8-1.el5.centos.i386.rpm and kmod-drbd83-8.3.8-1.el5.centos.i686.rpm); the download location is:

In practice, download the package versions that match your own platform; individual download links are not listed here.

[root@node2 home]# yum --nogpgcheck localinstall drbd83-8.3.15-2.el5.centos.x86_64.rpm
Dependencies Resolved

================================================================================
 Package    Arch      Version                Repository                           Size
================================================================================
Installing:
 drbd83     x86_64    8.3.15-2.el5.centos    /drbd83-8.3.15-2.el5.centos.x86_64  487 k

Transaction Summary
================================================================================
Install       1 Package(s)

Total size: 487 k
Installed size: 487 k
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : drbd83-8.3.15-2.el5.centos.x86_64                            1/1
  Verifying  : drbd83-8.3.15-2.el5.centos.x86_64                            1/1

Installed:
  drbd83.x86_64 0:8.3.15-2.el5.centos

Complete!
================================================================================

}

Actual test environment here: CentOS 6.5 x86_64, installing from the yum repository:

1. Install the DRBD packages drbd83-utils and kmod-drbd83

[root@node1 yum.repos.d]# yum -y install drbd83-utils kmod-drbd83
Stopping all DRBD resources: .
  Verifying  : drbd83-utils-8.3.16-1.el6.elrepo.x86_64                      1/3
  Verifying  : kmod-drbd83-8.3.16-3.el6.elrepo.x86_64                       2/3
  Verifying  : drbd83-8.3.15-2.el5.centos.x86_64                            3/3

Installed:
  drbd83-utils.x86_64 0:8.3.16-1.el6.elrepo    kmod-drbd83.x86_64 0:8.3.16-3.el6.elrepo

Replaced:
  drbd83.x86_64 0:8.3.15-2.el5.centos

Complete!

2. Load the DRBD module into the kernel: modprobe drbd

[root@node1 ~]# modprobe drbd

3. Check that DRBD is installed correctly: lsmod | grep drbd

[root@node1 ~]# lsmod | grep drbd
drbd                  332493  0

# The same steps are performed on the node2.heyuxuan.com node!

【DRBD Configuration】

DRBD's main configuration file is /etc/drbd.conf. For easier management it is usually split into several pieces kept under /etc/drbd.d, with the main file doing nothing but pulling those fragments in via "include" directives. Typically /etc/drbd.d contains global_common.conf plus one file per resource ending in .res: global_common.conf defines the global and common sections, and each .res file defines one resource.
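Once the configuration below is in place, the resulting layout looks like this (illustrative only; the file names follow from the later steps in this article):

[root@node1 ~]# ls /etc/drbd.d/
global_common.conf  mydrbd.res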

The global section may appear only once in the configuration, and if everything is kept in a single file rather than split across several, it must be at the very top. Currently the only parameters allowed in the global section are minor-count, dialog-refresh, disable-ip-verification and usage-count.

The common section defines parameters that every resource inherits by default; any parameter usable in a resource definition can also appear here. A common section is not strictly required, but putting parameters shared by several resources into it keeps the configuration simpler.

Each resource section defines one DRBD resource, normally in its own .res file under /etc/drbd.d. A resource must be given a name made up of non-whitespace ASCII characters. Every resource definition must contain at least two host (on) subsections naming the nodes the resource is bound to; all other parameters can be inherited from the common section or from DRBD's defaults and need not be spelled out.

Parameters used in the configuration:

RESOURCE: the resource name.

PROTOCOL: protocol "C" means "synchronous", i.e. a write is considered complete only after the remote node confirms it.

NET: both nodes use the same SHA1 key.

after-sb-0pri: if "split brain" occurs and neither side has newer data, reconnect the two nodes normally.

after-sb-1pri: if one side has newer data, discard the data on the secondary and resynchronize from the primary.

after-sb-2pri: if neither of the above is possible, disconnect the nodes; this case requires resolving the "split brain" by hand.

rr-conflict: if none of the previous settings can be applied and the DRBD system has a role conflict, automatically drop the connection between the nodes.

DEVICE: the virtual (DRBD) device.

DISK: the physical disk device.

META-DISK: metadata is kept on the same disk (sdb1).

ON: the nodes that make up the cluster.
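In the global_common.conf used below these split-brain policies are left commented out. A hedged sketch of what a net section with explicit policies could look like (the option values are DRBD 8.3 keywords, not taken from the author's configuration):

net {
    cram-hmac-alg "sha1";
    shared-secret "mydrbdlab";
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
}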

The following operations are performed on node1.heyuxuan.com.

1) Copy the sample configuration file into place:

# cp /usr/share/doc/drbd83-8.3.15/drbd.conf /etc/

[root@node1 drbd]# cat /etc/drbd.conf
# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

2) Configure /etc/drbd.d/global_common.conf

[root@node1 ~]# cat /etc/drbd.d/global_common.conf
global {
    usage-count yes;
    # minor-count dialog-refresh disable-ip-verification
}

common {
    protocol C;

    handlers {
        # These are EXAMPLE handlers only.
        # They may have severe implications,
        # like hard resetting the node under certain circumstances.
        # Be careful when chosing your poison.
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }

    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }

    disk {
        # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
        # no-disk-drain no-md-flushes max-bio-bvecs
        on-io-error detach;
    }

    net {
        # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
        # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
        # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
        cram-hmac-alg "sha1";
        shared-secret "mydrbdlab";
    }

    syncer {
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        rate 1000M;
    }
}

3) Define a resource in /etc/drbd.d/mydrbd.res with the following content:

resource mydrbd {
    device /dev/drbd0;
    disk /dev/sdb1;
    meta-disk internal;
    on node1 {
        address 192.168.1.30:7789;
    }
    on node2 {
        address 192.168.1.31:7789;
    }
}

These files must be identical on both nodes, so the configuration just written can be pushed to the other node over ssh: # scp /etc/drbd.* node2:/etc

[root@node1 ~]# scp -r /etc/drbd.* node2:/etc/
drbd.conf                                     100%  100     0.1KB/s   00:00
global_common.conf                            100% 1748     1.7KB/s   00:00
mydrbd.res

【Create the DRBD Resource and Filesystem】

Create the metadata for the resource (mydrbd):

Node1:

[root@node1 drbd.d]# drbdadm create-md mydrbd

  --== Thank you for participating in the global usage survey ==--
The server's response is:
you are the 23215th user to install this version
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

Node2:

[root@node2 drbd.d]# drbdadm create-md mydrbd

  --== Thank you for participating in the global usage survey ==--
The server's response is:
you are the 23216th user to install this version
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

【Activate the Resource】

- First make sure the drbd module is loaded. To check:

# lsmod | grep drbd

If it is not loaded, load it:

[root@node1 drbd.d]# modprobe drbd
[root@node1 drbd.d]# lsmod | grep drbd
drbd                  332493  0

- Bring up the drbd resource:

[root@node1 drbd]# drbdadm up mydrbd
[root@node2 drbd]# drbdadm up mydrbd

- Check the drbd status:

Node1:

[root@node1 drbd.d]# /etc/init.d/drbd status

drbd driver loaded OK; device status:

version: 8.3.16 (api:88/proto:86-97)

GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
m:res     cs         ro                   ds                         p  mounted  fstype
0:mydrbd  Connected  Secondary/Secondary  Inconsistent/Inconsistent  C

Node2:

[root@node2 drbd.d]# /etc/init.d/drbd status

drbd driver loaded OK; device status:

version: 8.3.16 (api:88/proto:86-97)

GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
m:res     cs         ro                   ds                         p  mounted  fstype
0:mydrbd  Connected  Secondary/Secondary  Inconsistent/Inconsistent  C

The output above shows that the DRBD service is running on both machines, but neither is the primary host yet, so the resource (block device) cannot be accessed.

【Start the Initial Sync from the Primary Node】

- Run this on the primary node only (here node1.heyuxuan.com):

[root@node1 drbd]# drbdadm -- --overwrite-data-of-peer primary mydrbd

- Check the sync progress:

[root@node1 drbd.d]# cat /proc/drbd

version: 8.3.16 (api:88/proto:86-97)

GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37

0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----

ns:2914432 nr:0 dw:0 dr:2917060 al:0 bm:177 lo:0 pe:4 ua:15 ap:0 ep:1 wo:f oos:2322912

[==========>.........] sync'ed: 55.7% (2268/5112)M

finish: 0:00:55 speed: 41,592 (41,040) K/sec

[root@node1 drbd.d]# cat /proc/drbd

version: 8.3.16 (api:88/proto:86-97)

GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37

0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----

ns:3002368 nr:0 dw:0 dr:3003076 al:0 bm:183 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:2234592
        [==========>.........] sync'ed: 57.4% (2180/5112)M

finish: 0:00:53 speed: 41,812 (41,128) K/sec

# Notes on the fields in the output above:

cs (connection state): the network connection state

ro (roles): the role of each node (the local node is shown first)

ds (disk states): the state of the backing disks

Replication protocol: A, B or C (this configuration uses C)

When the drbd status shows "cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate", the synchronization has finished.
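If you prefer to watch the synchronization progress continuously instead of re-running cat by hand, a hedged convenience (not in the original) is:

[root@node1 ~]# watch -n1 cat /proc/drbd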

The same information is also available from the 【drbd-overview command】:

[root@node1 drbd.d]# drbd-overview

0:mydrbd SyncSource Primary/Secondary UpToDate/Inconsistent C r-----

[>....................] sync'ed: 4.8% (4872/5112)M

[root@node1 drbd.d]# drbd-overview

0:mydrbd SyncSource Primary/Secondary UpToDate/Inconsistent C r-----

  [>...................] sync'ed:  9.9% (4612/5112)M
[root@node1 drbd.d]# drbd-overview
  0:mydrbd  SyncSource Primary/Secondary UpToDate/Inconsistent C r-----
  [=>..................] sync'ed: 10.8% (4568/5112)M

[root@node1 drbd.d]# drbd-overview

0:mydrbd SyncSource Primary/Secondary UpToDate/Inconsistent C r-----

[=>..................] sync'ed: 12.2% (4496/5112)

# Output on node1 after the sync is complete:
[root@node1 drbd.d]# drbd-overview
  0:mydrbd  Connected Primary/Secondary UpToDate/UpToDate C r-----

# Output on node2 after the sync is complete:

[root@node2 drbd.d]# drbd-overview

  0:mydrbd  Connected Secondary/Primary UpToDate/UpToDate C r-----

【Create the Filesystem】

- Create the filesystem on the primary node (Node1):

[root@node1 drbd]# mkfs -t ext4 /dev/drbd0
mke2fs 1.41.12 (17-May-2010)

Filesystem label=

OS type: Linux

Block size=4096 (log=2)

Fragment size=4096 (log=2)

Stride=0 blocks, Stripe width=0 blocks

519168 inodes, 2074312 blocks

103715 blocks (5.00%) reserved for the super user
First data block=0

Maximum filesystem blocks=2126512128

64 block groups

32768 blocks per group, 32768 fragments per group

8112 inodes per group

Superblock backups stored on blocks:

32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Writing inode tables: done

Creating journal (32768 blocks): done

Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 39 mounts or 180 days, whichever comes first. Use tune2fs -c or -i to override.

Note: there is no need to do the same on the secondary node (Node2); DRBD takes care of synchronizing the raw disk data.

Also, the DRBD filesystem does not need to be mounted on either machine (apart from a temporary mount while installing MySQL), because the cluster manager will handle that. Just make sure the replicated filesystem is only ever mounted on the Active primary server.

III. Install and Configure MySQL

【MySQL 5.6 Installation】

Create the mysql group/user (Node1, Node2):

# groupadd mysql
# useradd -g mysql mysql

【Install MySQL (Node1, Node2)】

# yum -y install gcc-c++ ncurses-devel cmake
# wget -c http://dev.mysql.com/get/Downloads/MySQL-5.6/mysql-5.6.10.tar.gz/from/http://cdn.mysql.com/
# tar zxvf mysql-5.6.10.tar.gz

# cd mysql-5.6.10

# cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/mysql

# make && make install

【Create the DRBD Partition Mount Directories】(Node1, Node2)

# mkdir /var/lib/mysql_drbd
# mkdir /var/lib/mysql
# chown mysql:mysql -R /var/lib/mysql_drbd
# chown mysql:mysql -R /var/lib/mysql

【Initialize the MySQL Database】

- Before initializing, temporarily mount the DRBD filesystem on the primary node (Node1):

[root@node1 ~]# mount /dev/drbd0 /var/lib/mysql_drbd/

- Run the initialization (Node1):

[root@node1 mysql]# cd /usr/local/mysql

[root@node1 mysql]# mkdir /var/lib/mysql_drbd/data

[root@node1 mysql]# chown -R mysql:mysql /var/lib/mysql_drbd/data

[root@node1 mysql]# chown -R mysql:mysql .

[root@node1 mysql]# scripts/mysql_install_db --datadir=/var/lib/mysql_drbd/data --user=mysql

- After initialization completes:

[root@node1 mysql]# cp support-files/mysql.server /etc/init.d/mysql
[root@node1 mysql]# mv support-files/my-default.cnf /etc/my.cnf
[root@node1 mysql]# chown mysql /etc/my.cnf
[root@node1 mysql]# chmod 644 /etc/my.cnf
[root@node1 mysql]# chown -R root .
[root@node1 mysql]# cd /var/lib/mysql_drbd
[root@node1 mysql_drbd]# chmod -R uog+rw *
[root@node1 mysql_drbd]# chown -R mysql data

Configure MySQL (Node1):

[root@node1 mysql_drbd]# cat /etc/my.cnf
#
# /etc/my.cnf
#
[client]
port                            = 3306
socket                          = /var/lib/mysql/mysql.sock

[mysqld]
port                            = 3306
socket                          = /var/lib/mysql/mysql.sock
datadir                         = /var/lib/mysql_drbd/data
user                            = mysql
#memlock                        = 1

#table_open_cache               = 3072
#table_definition_cache         = 1024
max_heap_table_size             = 64M
tmp_table_size                  = 64M

# Connections
max_connections                 = 505
max_user_connections            = 500
max_allowed_packet              = 16M
thread_cache_size               = 32

# Buffers
sort_buffer_size                = 8M
join_buffer_size                = 8M
read_buffer_size                = 2M
read_rnd_buffer_size            = 16M

# Query Cache
#query_cache_size               = 64M

# InnoDB
#innodb_buffer_pool_size        = 1G
#innodb_data_file_path          = ibdata1:2G:autoextend
#innodb_log_file_size           = 128M
#innodb_log_files_in_group      = 2

# MyISAM
myisam_recover                  = backup,force

# Logging
#general-log                    = 0
#general_log_file               = /var/lib/mysql/mysql_general.log
log_warnings                    = 2
log_error                       = /var/lib/mysql/mysql_error.log
#slow_query_log                 = 1
#slow_query_log_file            = /var/lib/mysql/mysql_slow.log
#long_query_time                = 0.5
#log_queries_not_using_indexes  = 1
#min_examined_row_limit         = 20

# Binary Log / Replication
server_id                       = 1
log-bin                         = mysql-bin
binlog_cache_size               = 1M
#sync_binlog                    = 8
binlog_format                   = row
expire_logs_days                = 7
max_binlog_size                 = 128M

[mysqldump]
quick
max_allowed_packet              = 16M

[mysql]
no_auto_rehash

[myisamchk]
#key_buffer                     = 512M
#sort_buffer_size               = 512M
read_buffer                     = 8M
write_buffer                    = 8M

[mysqld_safe]
open-files-limit                = 8192
pid-file                        = /var/lib/mysql/mysql.pid

【Test MySQL on the Primary Node (Node1)】

[root@node1 mysql_drbd]# /usr/local/mysql/bin/mysqld_safe --user=mysql > /dev/null &
[root@node1 mysql_drbd]# mysql -uroot -p

Enter password:

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 1

Server version: 5.6.10-log Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Database changed
mysql> show tables;
Empty set (0.10 sec)

mysql> create table tbl (a int);
Query OK, 0 rows affected (3.80 sec)

mysql> insert into tbl values (1), (2);
Query OK, 2 rows affected (0.25 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> quit;

Bye

[root@node1 mysql_drbd]# /usr/local/mysql/bin/mysqladmin -uroot -p shutdown
Enter password:

[1]+ Done /usr/local/mysql/bin/mysqld_safe --user=mysql > /dev/null

【Unmount the DRBD Filesystem on Node1】

[root@node1 ~]# umount /var/lib/mysql_drbd

[root@node1 ~]# drbdadm secondary mydrbd

【Mount the DRBD Filesystem on Node2】

[root@node2 ~]# drbdadm primary mydrbd
[root@node2 ~]# mount /dev/drbd0 /var/lib/mysql_drbd
[root@node2 ~]# ll /var/lib/mysql_drbd/
total 20
drwxrwxrwx 5 mysql mysql  4096 Mar 12 09:30 data
drwxrw-rw- 2 mysql mysql 16384 Mar 10 07:49 lost+found

【Configure and Test MySQL on Node2】

[root@node2 ~]# scp node1:/etc/my.cnf /etc/my.cnf
[root@node2 ~]# chown mysql /etc/my.cnf
[root@node2 ~]# chmod 644 /etc/my.cnf
[root@node2 ~]# cd /usr/local/mysql/
[root@node2 mysql]# cp support-files/mysql.server /etc/init.d/mysql
[root@node2 mysql]# chown -R root:mysql .

【Test MySQL】:

[root@node2 mysql]# /usr/local/mysql/bin/mysqld_safe --user=mysql > /dev/null &
[1] 15864
[root@node2 mysql]# mysql -uroot -p

Enter password:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1

Server version: 5.6.10-log Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Database changed
mysql> select * from tbl;
+------+
| a    |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.26 sec)

mysql> quit

Bye

[root@node2 mysql]# /usr/local/mysql/bin/mysqladmin -uroot -p shutdown

Enter password:

[1]+ Done /usr/local/mysql/bin/mysqld_safe --user=mysql > /dev/null

Unmount the DRBD filesystem on Node2 and hand control over to the cluster manager, Pacemaker:

[root@node2 mysql]# umount /var/lib/mysql_drbd
[root@node2 mysql]# drbdadm secondary mydrbd
[root@node2 mysql]# drbd-overview
  0:mydrbd/0  Connected Secondary/Secondary UpToDate/UpToDate C r-----
[root@node2 mysql]#

IV. Corosync and Pacemaker

【Installation on CentOS 6.5 x86_64】

[root@node1 corosync]# yum install corosync
[root@node1 corosync]# yum install pacemaker

## pacemaker depends on heartbeat
## Download heartbeat-3.0.4-2.el6.x86_64 and heartbeat-libs-3.0.4-2.el6.x86_64 yourself and install them:

[root@node1 ~]# yum -y --nogpgcheck localinstall heartbeat-3.0.4-2.el6.x86_64 heartbeat-libs-3.0.4-2.el6.x86_64

4. Installing crmsh and a Brief Introduction to Its Use

1. Ways to configure Pacemaker resources:

(1) Command-line tools: crmsh, pcs
(2) Graphical tools: pygui, hawk, LCMC, pcs

# Note: this article uses crmsh.

2. Install crmsh

Starting with RHEL 6.4 the crmsh command-line cluster configuration tool is no longer shipped; pcs is used instead. If you are used to the crm command you can download the relevant packages and install them yourself. crmsh depends on pssh, so download that as well.

lustering:/Stable/CentOS_CentOS-6/x86_64/
python-pssh-2.3.1-4.2.x86_64  pssh-2.3.1-4.2.x86_64

l_6/com/crmsh-1.2.6-0.rc2.2.1.x86_64.rpm
crmsh-1.2.6-0.rc2.2.1.x86_64

[root@node1 ~]# yum -y --nogpgcheck localinstall crmsh-1.2.6-0.rc2.2.1.x86_64.rpm pssh-2.3.1-4.2.x86_64.rpm python-pssh-2.3.1-4.2.x86_64.rpm

## These are the crmsh and pssh package versions used in this experiment; crmsh is now installed.

【Configure corosync】 (the following commands are run on node1.heyuxuan.com)

# cd /etc/corosync
# cp corosync.conf.example corosync.conf

Then edit corosync.conf and add the following:

service {
    ver:  0
    name: pacemaker
    # use_mgmtd: yes
}

aisexec {
    user:  root
    group: root
}

Also set bindnetaddr in this file to the network address of the subnet your NIC is on. Both nodes here are on the 192.168.1.0 network, so it is set as follows:

bindnetaddr: 192.168.1.0
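For orientation, bindnetaddr lives inside the totem/interface block of corosync.conf; a hedged sketch based on the stock corosync.conf.example (the mcastaddr/mcastport values are the example defaults, not taken from this article):

totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}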

【Generate the Authentication Key Used for Inter-node Communication】

# corosync-keygen

[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 192).

# Note: corosync-keygen reads from the /dev/random device by default; if the system's interrupt-driven randomness runs short, the command can block for a long time waiting for entropy.
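If key generation stalls, one commonly used workaround (an assumption on my part, not from the original) is to create disk activity in a second shell so the kernel entropy pool fills faster:

# In another terminal on node1, generate some I/O while corosync-keygen is waiting:
[root@node1 ~]# find / -type f -exec cat {} \; > /dev/null 2>&1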

Copy corosync.conf and authkey to node2:

# scp -p corosync.conf authkey node2:/etc/corosync/

Create the directory that will hold the corosync logs on both nodes:

# mkdir /var/log/cluster
# ssh node2 'mkdir /var/log/cluster'

【Try Starting It】 (run the following on node1):

# /etc/init.d/corosync start

Check whether the corosync engine started correctly:

# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log

Apr 15 23:21:06 corosync [MAIN  ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Apr 15 23:21:06 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Check whether the initial membership notifications were sent correctly:

# grep TOTEM /var/log/cluster/corosync.log

Apr 15 23:21:06 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

Apr 15 23:21:06 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 15 23:21:06 corosync [TOTEM ] The network interface [192.168.1.30] is now up.
Apr 15 23:21:06 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 15 23:21:20 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

Check whether any errors occurred during startup:

# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources

Apr 16 00:10:03 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.

Apr 16 00:10:03 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' () for details on using Pacemaker with CMAN

# The error above means that Pacemaker will soon no longer run as a Corosync plugin and that CMAN is recommended as the cluster infrastructure service instead; it can be safely ignored here.

Check whether Pacemaker started correctly:

# grep pcmk_startup /var/log/cluster/corosync.log

Apr 15 23:21:06 corosync [pcmk ] info: pcmk_startup: CRM: Initialized

Apr 15 23:21:06 corosync [pcmk ] Logging: Initialized pcmk_startup

Apr 15 23:21:06 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Apr 15 23:21:06 corosync [pcmk  ] info: pcmk_startup: Service: 9

Apr 15 23:21:06 corosync [pcmk ] info: pcmk_startup: Local hostname: node1

If all the commands above ran without problems, corosync on node2 can be started with:

# ssh node2 -- /etc/init.d/corosync start

Note: node2 should be started from node1 with the command above; do not start it directly on node2.

Check the startup state of the cluster nodes with:

# crm status

============

[root@node1 ~]# crm status

Last updated: Sat Apr 16 00:48:37 2016

Last change: Sat Apr 16 00:10:04 2016 via crmd on node2
Stack: classic openais (with plugin)

Current DC: node1 - partition with quorum

Version: 1.1.10-14.el6-368c726

2 Nodes configured, 2 expected votes

0 Resources configured

Online: [ node1 node2 ]


# crm_mon
============

Last updated: Fri Apr 15 23:30:39 2016

Last change: Fri Apr 15 23:21:34 2016 via crmd on node2
Stack: classic openais (with plugin)

Current DC: node1 - partition with quorum

Version: 1.1.10-14.el6-368c726

2 Nodes configured, 2 expected votes

0 Resources configured

Online: [ node1 node2 ]

# Note: to see how a specific command is used, prefix it with the help keyword:

crm(live)node# help status

Show nodes' status as XML. If the node parameter is omitted then all nodes are shown.

Usage:
...............
        status [<node>]
...............

The information above shows that both nodes have started normally and the cluster is in a healthy working state. Running ps auxf shows the processes corosync has started:

189       71694  0.5  1.0  94012 10108 ?     S   22:44  0:06 /usr/libexec/pacemaker/cib
root      71695  0.0  0.3  94380  3480 ?     S   22:44  0:00 /usr/libexec/pacemaker/stonithd
root      71696  0.0  0.2  76088  2628 ?     S   22:44  0:00 /usr/libexec/pacemaker/lrmd
189       71697  0.0  0.2  89628  2916 ?     S   22:44  0:00 /usr/libexec/pacemaker/attrd
189       71698  0.0  1.8 117276 18368 ?     S   22:44  0:00 /usr/libexec/pacemaker/pengine
189       71699  0.0  0.5 147784  5972 ?     S   22:44  0:00 /usr/libexec/pacemaker/crmd
root      71736  0.0  1.5 228840 15900 pts/0 S+  22:46  0:00 /usr/bin/python /usr/sbin/crm

6. Configure cluster-wide properties: disable STONITH

corosync enables STONITH by default, but this cluster has no STONITH device, so the default configuration is not usable yet. This can be confirmed with:

# crm_verify -L

crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined

crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
  -V may provide more details

We can disable STONITH for now with the following command:

# crm configure property stonith-enabled=false

View the current configuration with:

[root@node1 ~]# crm configure show
node node1
node node2
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"

This shows that STONITH is now disabled.

The crm and crm_verify commands used above are the command-line cluster management tools shipped with Pacemaker 1.0 and later; they can be run on any node in the cluster.

Resource Configuration

Configure the resources and constraints.

Configure default properties. View the existing configuration:

[root@node1 ~]# crm configure show
node node1
node node2
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2"

Verify that the configuration is valid:

[root@node1 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
  -V may provide more details

Suppress the STONITH errors:

[root@node1 ~]# crm configure property stonith-enabled=false
[root@node1 ~]# crm_verify -L

Have the cluster ignore quorum:

[root@node1 ~]# crm configure property no-quorum-policy=ignore

Prevent resources from moving back after recovery:

[root@node1 ~]# crm configure rsc_defaults resource-stickiness=100

Set the default operation timeout:

[root@node1 www]# crm configure property default-action-timeout=180s

Set whether a start failure is fatal by default:

[root@node1 www]# crm configure property start-failure-is-fatal=false

Configure the DRBD Resource

- Stop DRBD before configuring it:

[root@node1 ~]# /etc/init.d/drbd stop
[root@node2 ~]# /etc/init.d/drbd stop

【Configure the DRBD resource】 - define drbd as a cluster resource:

[root@node1 ~]# crm configure primitive p_drbd_mysql ocf:linbit:drbd params drbd_resource=mydrbd op monitor role=Master interval=15s op start timeout=240s op stop timeout=100s

or

crm(live)configure# primitive p_drbd_mysql ocf:linbit:drbd params drbd_resource=mydrbd op monitor role=Master interval=15s op start timeout=240s op stop timeout=100s

- Configure the DRBD master/slave relationship (only one Master node):

[root@node1 ~]# crm configure ms ms_drbd_mysql p_drbd_mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

or

crm(live)configure# ms ms_drbd_mysql p_drbd_mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

- Configure the filesystem resource and define the mount point:

[root@node1 ~]# crm configure primitive p_fs_mysql ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/var/lib/mysql_drbd/ fstype=ext4

or

crm(live)configure# primitive p_fs_mysql ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/var/lib/mysql_drbd/ fstype=ext4

【Configure the VIP Resource】

[root@node1 ~]# crm configure primitive p_ip_mysql ocf:heartbeat:IPaddr params ip=192.168.1.99 cidr_netmask=24 op monitor interval=30s

or

crm(live)configure# primitive p_ip_mysql ocf:heartbeat:IPaddr params ip=192.168.1.99 cidr_netmask=24 op monitor interval=30s

【Configure the MySQL Resource】

Using the LSB method (used in this article):

crm(live)configure# primitive p_mysql lsb:mysql op monitor interval=20s timeout=30s op start interval=0 timeout=180s op stop interval=0 timeout=240s

Or using the OCF method:

crm(live)configure# primitive p_mysql ocf:heartbeat:mysql params binary=/usr/local/mysql/bin/mysqld_safe config=/etc/my.cnf user=mysql group=mysql log=/var/lib/mysql/mysql_error.log pid=/var/lib/mysql/mysql.pid socket=/var/lib/mysql/mysql.sock datadir=/var/lib/mysql_drbd/data op monitor interval=60s timeout=60s op start timeout=180s op stop timeout=240s
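The LSB method hands the stock /etc/init.d/mysql script to Pacemaker, so the script must follow LSB exit-code conventions. A hedged sanity check you can run before committing the resource (my own suggestion, not from the original):

[root@node1 ~]# /etc/init.d/mysql start;  echo $?   # expect 0
[root@node1 ~]# /etc/init.d/mysql status; echo $?   # expect 0 while running
[root@node1 ~]# /etc/init.d/mysql stop;   echo $?   # expect 0
[root@node1 ~]# /etc/init.d/mysql status; echo $?   # expect 3 once stopped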

【Group Resources and Constraints】

Use a "group" to ensure that DRBD, MySQL and the VIP stay on the same node (the Master) and to fix the start/stop order of the resources.

Start order: p_fs_mysql -> p_ip_mysql -> p_mysql
Stop order:  p_mysql -> p_ip_mysql -> p_fs_mysql

crm(live)configure# group g_mysql p_fs_mysql p_ip_mysql p_mysql

The group g_mysql must always run on the Master node:

crm(live)configure# colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master

MySQL must always start after the DRBD Master has been promoted:

crm(live)configure# order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start

Verify and commit the configuration:

crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# quit

【Check the Cluster State and Test Failover】

- Check the state:

[root@node1 mysql]# crm_mon -1r
Last updated: Wed Mar 13 11:24:44 2013
Last change: Wed Mar 13 11:24:04 2013 via crm_attribute on node2
Stack: classic openais (with plugin)

Current DC: node1 - partition with quorum

Version: 1.1.8-7.el6-394e906

2 Nodes configured, 2 expected votes

5 Resources configured.

Online: [ node1 node2 ]

Full list of resources:

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node1
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node1
     p_mysql    (lsb:mysql):                    Started node1

Failover test:

Put Node1 into Standby:

[root@node1 ~]# crm node standby

Check the cluster status a few minutes later (if the switchover succeeded, it looks like this):

[root@node1 ~]# crm status

Last updated: Wed Mar 13 11:29:41 2013

Last change: Wed Mar 13 11:26:46 2013 via crm_attribute on node1
Stack: classic openais (with plugin)

Current DC: node1 - partition with quorum

Version: 1.1.8-7.el6-394e906

2 Nodes configured, 2 expected votes

5 Resources configured.

Node node1: standby
Online: [ node2 ]

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Stopped: [ p_drbd_mysql:1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node2
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node2
     p_mysql    (lsb:mysql):                    Started node2

Bring Node1 back online:

[root@node1 mysql]# crm node online

[root@node1 mysql]# crm status

Last updated: Wed Mar 13 11:32:49 2013

Last change: Wed Mar 13 11:31:23 2013 via crm_attribute on node1
Stack: classic openais (with plugin)

Current DC: node1 - partition with quorum
Version: 1.1.8-7.el6-394e906

2 Nodes configured, 2 expected votes

5 Resources configured.

Online: [ node1 node2 ]

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node2
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node2
     p_mysql    (lsb:mysql):                    Started node2

On "network loss" the Master must stop serving, to avoid a "split brain" caused by the outage.

Use Pacemaker to ping an independent network target (for example the network router); when a host's network is found to be down (the host is isolated), it is prevented from becoming the DRBD master.

[root@node1 ~]# crm configure
crm(live)configure# primitive p_ping ocf:pacemaker:ping params name="ping" multiplier="1000" host_list="192.168.1.1" op monitor interval="15s" timeout="60s" op start timeout="60s"

Because both hosts need to run ping to check their own network connectivity, create a clone (cl_ping) so that the ping resource runs on every host in the cluster:

crm(live)configure# clone cl_ping p_ping meta interleave="true"

Tell Pacemaker what to do with the ping result:

crm(live)configure# location l_drbd_master_on_ping ms_drbd_mysql rule $role="Master" -inf: not_defined ping or ping number:lte 0

The rule above means: when a host is not running the ping resource, or cannot ping at least one of the listed targets, it is given a preference score of negative infinity (-inf), so the location constraint (l_drbd_master_on_ping) keeps the DRBD master away from that host. Verify and commit the configuration:

crm(live)configure# verify

WARNING: p_drbd_mysql: action monitor not advertised in meta-data, it may not be supported by the RA
crm(live)configure# commit

crm(live)configure# quit

Check that the ping resource is running:

[root@node1 ~]# crm_mon -1

Last updated: Thu Mar 14 01:02:14 2013

Last change: Thu Mar 14 01:01:20 2013 via cibadmin on node1
Stack: classic openais (with plugin)

Current DC: node1 - partition with quorum
Version: 1.1.8-7.el6-394e906

2 Nodes configured, 2 expected votes

7 Resources configured.

Online: [ node1 node2 ]

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node2
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node2
     p_mysql    (lsb:mysql):                    Started node2
 Clone Set: cl_ping [p_ping]
     Started: [ node1 node2 ]

Network-loss Test

- Stop the network service on the current Master:

[root@node2 ~]# service network stop
[root@node1 ~]# crm resource status
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Slaves: [ node1 ]
     Stopped: [ p_drbd_mysql:1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Stopped
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Stopped
     p_mysql    (lsb:mysql):                    Stopped
 Clone Set: cl_ping [p_ping]
     Started: [ node1 ]
     Stopped: [ p_ping:1 ]

- Restore the network service on the Master:

[root@node2 ~]# service network start
[root@node1 ~]# crm resource status
 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started
     p_mysql    (lsb:mysql):                    Started
 Clone Set: cl_ping [p_ping]
     Started: [ node1 node2 ]

[root@node1 ~]# crm status

Last updated: Thu Mar 14 01:09:51 2013

Last change: Thu Mar 14 01:09:49 2013 via crmd on node1
Stack: classic openais (with plugin)

Current DC: node2 - partition with quorum

Version: 1.1.8-7.el6-394e906

2 Nodes configured, 2 expected votes
7 Resources configured.

Online: [ node1 node2 ]

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem):    Started node2
     p_ip_mysql (ocf::heartbeat:IPaddr2):       Started node2
     p_mysql    (lsb:mysql):                    Started node2
 Clone Set: cl_ping [p_ping]
     Started: [ node1 node2 ]

System Startup Settings

Since DRBD and MySQL are now managed by Pacemaker, their init-script autostart must be turned off, while Corosync and Pacemaker must start with the system:

[root@node1 ~]# chkconfig drbd off
[root@node1 ~]# chkconfig mysql off
[root@node1 ~]# chkconfig corosync on
[root@node1 ~]# chkconfig pacemaker on
[root@node2 ~]# chkconfig drbd off
[root@node2 ~]# chkconfig mysql off
[root@node2 ~]# chkconfig corosync on
[root@node2 ~]# chkconfig pacemaker on

Resolving "Split-Brain" by Hand

- Recovering from "Split-Brain"

In a DRBD Active/Standby design, the data on the two hosts can still diverge for various reasons. When that happens, DRBD drops the connection between the hosts (you can see their relationship state with /etc/init.d/drbd status or drbd-overview). If the log (/var/log/messages) shows that the connection was dropped because of "Split-Brain", you must work out which host holds the correct data and then let DRBD resynchronize from it.

- Check the DRBD host state and the logs:

[root@node1 ~]# cat /proc/drbd

version: 8.4.2 (api:1/proto:86-101)

GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10

0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:32948 nr:0 dw:4 dr:34009 al:1 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@node1 ~]# cat /var/log/messages | grep Split-Brain

Mar 14 21:11:48 node1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!

[root@node2 drbd.d]# cat /proc/drbd

version: 8.4.2 (api:1/proto:86-101)

GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10

0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----

    ns:0 nr:32948 dw:32948 dr:0 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

- Resolving the "Split-Brain" by hand:

Here the host with the "good" data is node1, and the host with the "bad" data is node2. On the "bad data" host, node2:

[root@node2 ~]# drbdadm disconnect mydrbd

[root@node2 ~]# cat /proc/drbd

version: 8.4.2 (api:1/proto:86-101)

GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10

0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r-----

ns:0 nr:32948 dw:32948 dr:0 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@node2 ~]# drbdadm secondary mydrbd

[root@node2 ~]# drbdadm -- --discard-my-data connect mydrbd

On the "good data" host node1 (if the cs: state below is WFConnection, the following step is not needed):

[root@node1 ~]# cat /proc/drbd

version: 8.4.2 (api:1/proto:86-101)

GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10

0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:32948 nr:0 dw:4 dr:34009 al:1 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@node1 ~]# drbdadm connect mydrbd

[root@node1 ~]# /etc/init.d/drbd status

drbd driver loaded OK; device status:

version: 8.4.2 (api:1/proto:86-101)

GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10
m:res     cs         ro                 ds                 p  mounted              fstype
0:mydrbd  Connected  Primary/Secondary  UpToDate/UpToDate  C  /var/lib/mysql_drbd  ext4
