mha+keepalived installation and configuration - 大镇 - ChinaUnix Blog

mha+keepalived installation and configuration

Category: Mysql/postgreSQL

2015-06-03 17:05:14

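Reposted from:

Related documents:
First, here is the MHA project page, where you can find the latest and most authoritative MHA information and documentation.
MHA project page:

《keepalived权威指南》 (The Definitive Guide to keepalived)

%20document.pdf

MHA configuration parameter list (Chinese version)

MHA failover log analysis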

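I. A brief introduction to MHA
MHA is written in Perl and implements high availability for MySQL master-slave replication through external scripts.
MHA automatically detects whether the MySQL master has gone down. If it has, MHA elects a new master within 10-30 seconds, applies all missing binlog events to every slave, and switches all slaves over to the new master.
Besides automatic failure detection, MHA can also switch the master interactively, which is quite useful in day-to-day database maintenance.
MHA itself only switches the master role between databases; the application does not know that the master has changed. To handle this, you can use the script hooks MHA provides to tell the application about the new master, either by moving a virtual IP or by modifying a global configuration file.
MHA is an active project with many production users, including large companies. Its author releases new versions regularly, and the latest version already supports GTID.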

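II. How MHA works
The MHA architecture looks like this:

             Mysql master1 (MHA manager, MHA node)
                           |
                  _________|_________
                 |                   |
      Mysql slave1 (node)    Mysql slave2 (node)

First, the architecture. The diagram above is crude, so please bear with me and read the text below.
MHA only supports a two-tier MySQL replication topology. In the diagram above, if Mysql slave1 had slaves of its own, those would form a third tier, which MHA cannot manage.
An MHA node must be installed on every MySQL server.
There is one MHA manager for the whole setup. The manager must be able to log in to MySQL on every node with the account from its configuration file, and to ssh into every node's operating system non-interactively, so SSH keys are required.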

The MHA manager package contains the following programs:
masterha_manager (monitors the master and performs failover automatically if the master goes down)
masterha_master_switch (performs failover or master switchover manually or interactively)
masterha_master_switch --conf=/etc/app1.cnf --master_state=dead --dead_master_host=192.168.153.150
masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.153.151
masterha_check_status (checks whether masterha_manager is running)
masterha_check_repl (checks whether the replication setup is correct)
masterha_stop (stops MHA)
masterha_conf_host (adds or removes a host entry in the configuration file)
masterha_check_ssh (checks whether each node host can be reached over ssh)
purge_relay_logs (deletes relay logs that are no longer needed, avoiding replication delay)
masterha_secondary_check (checks through other routes whether the master is really down)
masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152 --user=root --master_host=localhost --master_ip=192.168.153.150 --master_port=3306
Master is reachable from 192.168.153.151!

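The MHA node package contains these four programs:
save_binary_logs (saves and copies the binary logs of the crashed master)
apply_diff_relay_logs (identifies differential relay log events and applies them to the other slaves)
purge_relay_logs (purges relay log files)
filter_mysqlbinlog (this script is now deprecated)
MHA node must be installed on every MySQL server, and on the MHA manager server as well, because the manager module depends internally on the node module. The manager connects to the MySQL servers over ssh to manage them and to run the MHA node scripts.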

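MHA failover flow:
#Preparation before startup
#Check the database server status and read the relevant settings
#Test whether the ssh connections succeed
#Test whether MHA node is usable
#Create the MHA log directory
#Check the privileges needed to apply differential logs on the slaves
#Determine the current replication topology
#Check master_ip_failover_script
#Check shutdown_script
#Set up the secondary-check hosts (masterha_secondary_check)
#MHA startup complete; enter monitoring mode
#Detect that the DB1 server is down
#Confirm through the configured secondary check whether the master is really down
#Master confirmed down; enter the failover process
#Try again to connect to the master and to ssh into the master
#Check the status of the other slaves using the MHA configuration file
#Re-check whether the slave configuration has changed and whether the failover conditions are met
#Failover officially begins
#Check the slave configuration once more
#Run master_ip_failover_script and shutdown_script against the old master
#Start recovering the differential logs: get the last binlog position each slave received
#Fetch the binlogs from the old master
#Determine the new master
#Apply the differential binlogs on the new master
#Get the binlog position of the new master
#If master_ip_failover_script is set, assign the VIP to the new master
#Start recovering the other slaves, again by comparing against the old master's binlogs
#Once the differential logs are applied, switch all slaves to the new master
#Failover complete; generate the failover report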

III. Installation and configuration
Environment:
Host  Role    IP               Software installed
db1   MASTER  192.168.153.150  mysql, mha manager, mha node, keepalived
db2   SLAVE1  192.168.153.151  mysql, mha node, keepalived (candidate MASTER)
db3   SLAVE2  192.168.153.152  mysql, mha node
VIP (virtual IP): 192.168.153.100

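Rough installation procedure:
1. Disable SELinux and iptables
2. Install the development and base libraries, related development tools, and Perl libraries
3. Configure SSH public keys for password-less login
4. Install and configure the MySQL databases and grant privileges
5. Install mha node
6. Install mha manager
7. Edit the MHA configuration file
8. Test MHA failover
9. Install, configure, and test keepalived
10. Combine MHA and keepalived, add the related scripts, and test them together.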

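1. cat /etc/sysconfig/selinux
Set SELINUX=disabled
2.
#iptables -F INPUT
#service iptables save
#iptables -xvnL //make sure no rules are listed. If you really need iptables rules, add them after MHA has been installed and tested, then test again to check whether the rules affect MHA.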

3. Configure SSH keys for password-less login.
db1:
#ssh-keygen -t rsa //press Enter at every prompt
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.150
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.151
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.152

db2:
#ssh-keygen -t rsa //press Enter at every prompt
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.150
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.152

db3:
#ssh-keygen -t rsa //press Enter at every prompt
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.150
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.151
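Before moving on, you can confirm from each host that the keys really allow non-interactive logins. The following is a minimal sketch, not part of MHA: `check_ssh_hosts` and the `SSH` override variable are hypothetical names, and it assumes the root keys created above. The ssh option `-o BatchMode=yes` makes the login fail instead of prompting for a password; `masterha_check_ssh` in step 8 runs a fuller check from the manager.

```shell
#!/bin/bash
# Hypothetical helper: verify password-less SSH from this host to each target.
# SSH can be overridden (e.g. for testing); it defaults to the real ssh client.
check_ssh_hosts() {
    local ssh_cmd="${SSH:-ssh}"
    local failed=0 host
    for host in "$@"; do
        # BatchMode=yes: fail immediately instead of asking for a password
        if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "ok:   $host"
        else
            echo "FAIL: $host"
            failed=1
        fi
    done
    return "$failed"
}

# Example (run on each of db1, db2 and db3):
# check_ssh_hosts 192.168.153.150 192.168.153.151 192.168.153.152
```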

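4. Install MySQL and grant privileges
Install MySQL server on all machines, edit the configuration files, and set up master-slave replication across the three machines. This part takes a lot of explaining, so please refer to my notes on installing mysql-mmm:
《mysql-mmm安装手册》 (mysql-mmm installation manual)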
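The MHA configuration in step 7 expects a monitoring account `mha`/`mha` and a replication account `slave`/`slave`. As a sketch, the helper below prints matching GRANT statements. It assumes MySQL 5.5/5.6 syntax (`GRANT ... IDENTIFIED BY` no longer works in MySQL 8.0), that all hosts live in 192.168.153.0/24, and that giving the MHA account ALL PRIVILEGES is acceptable in your environment; `print_mha_grants` is a hypothetical name.

```shell
#!/bin/bash
# Print GRANT statements for the accounts used in /etc/app1.cnf.
# Assumptions: MySQL 5.5/5.6 syntax and a 192.168.153.% host pattern.
print_mha_grants() {
    local net="${1:-192.168.153.%}"
    # Unquoted EOF so ${net} is expanded; *.* is not globbed inside a heredoc
    cat <<EOF
GRANT ALL PRIVILEGES ON *.* TO 'mha'@'${net}' IDENTIFIED BY 'mha';
GRANT REPLICATION SLAVE ON *.* TO 'slave'@'${net}' IDENTIFIED BY 'slave';
FLUSH PRIVILEGES;
EOF
}

# Review the output, then apply it, e.g.:
# print_mha_grants | mysql -uroot -p
```

Apply the statements on every MySQL server, since any slave may later become the master.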

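5. Install mha node on every machine.
Download and install the latest rpm or source package. I used the rpm package; if it has missing dependencies, install the corresponding packages with yum.
#rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm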

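6. Install the MHA manager software on db1.
At first I also tried the rpm package, but ran into two compatibility problems: my yum repository did not have the required packages, and after I installed them through CPAN the rpm still did not recognize them, so I built from source instead. Overall, MHA is fairly easy to install.
#tar -zxvf mha4mysql-manager-0.55.tar.gz
#cd mha4mysql-manager-0.55
#ls
#perl Makefile.PL
#make install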

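7. Edit the configuration file. It only needs to exist on the MHA manager machine.
The default configuration templates are in the source package, at the following location:
/root/soft/mha4mysql-manager-0.55/samples/conf, which contains two files, app1.cnf and masterha_default.cnf. masterha_manager reads both of them.
app1.cnf mainly holds the settings for the node hosts, and masterha_default.cnf mainly holds the server-wide settings. In practice, though, masterha_default.cnf is usually left out and its settings are written into app1.cnf instead.
My app1.cnf is as follows: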

# cat /etc/app1.cnf
[server default]
# user MHA uses to read database settings and status
user=mha
password=mha
# user for the ssh keys
ssh_user=root
# account and password used for MySQL replication
repl_user=slave
repl_password=slave
# directory for MHA status files, logs and differential logs
manager_workdir=/var/log/masterha/app1
# MHA log
manager_log=/var/log/masterha/app1/manager.log
# working directory on the node hosts
remote_workdir=/var/log/masterha/app1
# secondary check: the manager connects to 192.168.153.151 and 192.168.153.152 and tests from there whether the master is reachable, to avoid split-brain
secondary_check_script="masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152"
# The scripts below are covered later; leave them disabled for now.
# failover script that controls the VIP
#master_ip_failover_script="/opt/master_ip_failover.sh"
# script called during an interactively triggered online switch
#master_ip_online_change_script=""
# shutdown script
#shutdown_script="/opt/master_ip_failover.sh"
# notification script
#report_script=""

# per-node settings below
[server1]
hostname=192.168.153.150
candidate_master=1

[server2]
hostname=192.168.153.151
candidate_master=1

[server3]
hostname=192.168.153.152
no_master=1
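Before running MHA's own checks, a quick look at the file can catch typos in key names. This is a hypothetical sketch (`check_app_conf` is not an MHA tool): it only checks that the `[server default]` keys used above are present and that at least one `[serverN]` section has `candidate_master=1`. It is not a substitute for `masterha_check_ssh` and `masterha_check_repl`.

```shell
#!/bin/bash
# Hypothetical pre-flight check for an MHA application config such as /etc/app1.cnf.
# Prints what is missing and returns non-zero if the file looks incomplete.
check_app_conf() {
    awk '
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        /^\[/ { section = $0; next }            # remember the current [section]
        section == "[server default]" && /=/ {
            split($0, kv, "=")
            gsub(/[ \t]/, "", kv[1])
            seen[kv[1]] = 1                     # record the key name
        }
        section != "[server default]" && /^[ \t]*candidate_master[ \t]*=[ \t]*1[ \t]*$/ { cand++ }
        END {
            n = split("user password ssh_user repl_user repl_password manager_workdir", req, " ")
            for (i = 1; i <= n; i++)
                if (!(req[i] in seen)) { print "missing in [server default]: " req[i]; bad = 1 }
            if (cand < 1) { print "no server has candidate_master=1"; bad = 1 }
            exit bad
        }' "$1"
}

# Usage on the manager host:
# check_app_conf /etc/app1.cnf && masterha_check_repl --conf=/etc/app1.cnf
```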

8. Test MHA
First, two small tests:
#Test whether the ssh keys work

# masterha_check_ssh --conf=/etc/app1.cnf
Sun Sep 28 14:39:57 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 28 14:39:57 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Sun Sep 28 14:39:57 2014 - [info] Reading server configurations from /etc/app1.cnf..
Sun Sep 28 14:39:57 2014 - [info] Starting SSH connection tests..
Sun Sep 28 14:40:02 2014 - [debug]
Sun Sep 28 14:39:58 2014 - [debug]  Connecting via SSH from root@192.168.153.150(192.168.153.150:22) to root@192.168.153.151(192.168.153.151:22)..
Sun Sep 28 14:40:01 2014 - [debug]   ok.
Sun Sep 28 14:40:01 2014 - [debug]  Connecting via SSH from root@192.168.153.150(192.168.153.150:22) to root@192.168.153.152(192.168.153.152:22)..
Sun Sep 28 14:40:02 2014 - [debug]   ok.
Sun Sep 28 14:40:02 2014 - [debug]
Sun Sep 28 14:39:58 2014 - [debug]  Connecting via SSH from root@192.168.153.151(192.168.153.151:22) to root@192.168.153.150(192.168.153.150:22)..
Sun Sep 28 14:40:01 2014 - [debug]   ok.
Sun Sep 28 14:40:01 2014 - [debug]  Connecting via SSH from root@192.168.153.151(192.168.153.151:22) to root@192.168.153.152(192.168.153.152:22)..
Sun Sep 28 14:40:02 2014 - [debug]   ok.
Sun Sep 28 14:40:03 2014 - [debug]
Sun Sep 28 14:39:59 2014 - [debug]  Connecting via SSH from root@192.168.153.152(192.168.153.152:22) to root@192.168.153.150(192.168.153.150:22)..
Sun Sep 28 14:40:02 2014 - [debug]   ok.
Sun Sep 28 14:40:02 2014 - [debug]  Connecting via SSH from root@192.168.153.152(192.168.153.152:22) to root@192.168.153.151(192.168.153.151:22)..
Sun Sep 28 14:40:03 2014 - [debug]   ok.
Sun Sep 28 14:40:03 2014 - [info] All SSH connection tests passed successfully.

Test the replication setup

# masterha_check_repl --conf=/etc/app1.cnf
Sun Sep 28 14:40:43 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 28 14:40:43 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Sun Sep 28 14:40:43 2014 - [info] Reading server configurations from /etc/app1.cnf..
Sun Sep 28 14:40:43 2014 - [info] MHA::MasterMonitor version 0.55.
Sun Sep 28 14:40:53 2014 - [info] Checking replication health on 192.168.153.151..

...(several lines omitted)...
Sun Sep 28 14:40:53 2014 - [info]  ok.
Sun Sep 28 14:40:53 2014 - [info] Checking replication health on 192.168.153.152..
Sun Sep 28 14:40:53 2014 - [info]  ok.
Sun Sep 28 14:40:53 2014 - [warning] master_ip_failover_script is not defined.
Sun Sep 28 14:40:53 2014 - [info] Checking shutdown script status:
Sun Sep 28 14:40:53 2014 - [info]   /opt/master_ip_failover.sh --command=status --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150
Sun Sep 28 14:40:53 2014 - [info]  OK.
Sun Sep 28 14:40:53 2014 - [info] Got exit code 0 (Not master dead).

If both tests pass, the environment and configuration are basically OK. Now start MHA:

# masterha_manager  --conf=/etc/app1.cnf
Sun Sep 28 14:42:43 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 28 14:42:43 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Sun Sep 28 14:42:43 2014 - [info] Reading server configurations from /etc/app1.cnf..

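masterha_manager exits automatically once a failover has been triggered. It is best to run this command inside screen.
Now you can look at the logs generated under /var/log/masterha/app1/ (# cd /var/log/masterha/app1/).
If there are no serious error messages, you are ready to try a failover.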
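If screen is not available, a common alternative is to start masterha_manager with nohup so that it survives logout. A minimal sketch, assuming the paths used in this article; `start_mha_manager`, the `MHA_MANAGER` override and the `app1.log` file name are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: start masterha_manager detached from the terminal.
# MHA_MANAGER lets you substitute the binary (e.g. for a dry run); it defaults
# to masterha_manager on the PATH.
start_mha_manager() {
    local conf="${1:-/etc/app1.cnf}"
    local log="${2:-/var/log/masterha/app1/app1.log}"
    local bin="${MHA_MANAGER:-masterha_manager}"
    # nohup plus </dev/null keeps the process running after logout, off the terminal
    nohup "$bin" --conf="$conf" < /dev/null > "$log" 2>&1 &
    echo "$!"   # PID of the background process
}

# Usage on db1, then confirm it is running:
# start_mha_manager /etc/app1.cnf /var/log/masterha/app1/app1.log
# masterha_check_status --conf=/etc/app1.cnf
```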

#Start the failover
Stop the MySQL service on the master and watch how replication changes on db2 and db3.
# service mysql stop
Shutting down MySQL (Percona Server)..... SUCCESS!
Check the log /var/log/masterha/app1/manager.log. If you see the following, the failover succeeded.

----- Failover Report -----

app1: MySQL Master failover 192.168.153.150 to 192.168.153.151 succeeded

Master 192.168.153.150 is down!

Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 192.168.153.151(192.168.153.151:3306) has all relay logs for recovery.
Selected 192.168.153.151 as a new master.
192.168.153.151: OK: Applying all logs succeeded.
192.168.153.152: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.153.152: OK: Applying all logs succeeded. Slave started, replicating from 192.168.153.151.
192.168.153.151: Resetting slave info succeeded.
Master failover to 192.168.153.151(192.168.153.151:3306) completed successfully.

Next, check replication on db2 and db3:
db2:

(root:hostname)[(none)]> show slave status\G
Empty set (0.00 sec)

db3:

(root:hostname)[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.153.151
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000016
          Read_Master_Log_Pos: 688
               Relay_Log_File: mysql-relay.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000016
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 688
              Relay_Log_Space: 452
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 151
                  Master_UUID: dd079e18-4244-11e4-b851-000c29da163e
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
1 row in set (0.00 sec)
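To check this without reading the whole output, the replication thread states can be extracted automatically. A minimal sketch; `slave_ok` is a hypothetical helper that reads `show slave status\G` output on stdin and succeeds only when both `Slave_IO_Running` and `Slave_SQL_Running` are `Yes`. On db2, which is now the master, the empty result makes it fail, as expected.

```shell
#!/bin/bash
# Read "SHOW SLAVE STATUS\G" output on stdin; succeed only if both
# replication threads are running. An empty result counts as failure.
slave_ok() {
    awk -F': ' '
        { v = $2; sub(/[ \t\r]+$/, "", v) }          # value with trailing blanks removed
        $1 ~ /Slave_IO_Running$/  { io = v }         # does not match *_State lines
        $1 ~ /Slave_SQL_Running$/ { sql = v }
        END {
            printf "Slave_IO_Running=%s Slave_SQL_Running=%s\n", io, sql
            exit !(io == "Yes" && sql == "Yes")
        }'
}

# Usage on db3 (credentials are placeholders):
# mysql -uroot -p -e 'show slave status\G' | slave_ok
```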

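9. Install and test keepalived on db1 and db2.
For background on keepalived, read 《LVS+Keepalived使用总结》 (LVS+Keepalived usage summary)
or search for 《keepalived权威指南》 (The Definitive Guide to keepalived).
Download the keepalived package:

 Download the latest tar.gz package.
#yum install kernel-devel
#tar -zxvf keepalived-1.2.13.tar.gz
#cd keepalived-1.2.13
#./configure --prefix=/ --with-kernel-dir=/usr/src/kernels/2.6.32-431.29.2.el6.x86_64/
# make && make install

After installation, edit the configuration file. Below is the configuration on db1; for db2, just lower the priority by 50.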

# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state MASTER
    # network interface used by keepalived
    interface eth0
    virtual_router_id 51
    # the node with the higher priority gets the virtual IP
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        # virtual IP
        192.168.153.100
    }
}
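Since db2's file differs only in the priority, it can be generated from db1's. A minimal sketch using awk; `lower_priority` is a hypothetical helper that rewrites only `priority N` lines (subtracting 50 by default) and leaves everything else, including `state MASTER`, unchanged, as described above.

```shell
#!/bin/bash
# Derive db2's keepalived.conf from db1's by lowering the VRRP priority.
# Only lines of the form "priority N" are changed; indentation is preserved.
lower_priority() {
    local delta="${1:-50}"
    awk -v delta="$delta" '
        match($0, /^[ \t]*priority[ \t]+[0-9]+/) {
            n = $2 - delta
            sub(/priority[ \t]+[0-9]+/, "priority " n)
        }
        { print }'
}

# Example: run on db1 and copy the result to db2
# lower_priority 50 < /etc/keepalived/keepalived.conf | ssh root@192.168.153.151 'cat > /etc/keepalived/keepalived.conf'
```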

Test whether keepalived works properly.

db1#service keepalived restart
db2#service keepalived restart

db1#ip add
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:da:16:3d brd ff:ff:ff:ff:ff:ff
    inet 192.168.153.150/24 brd 192.168.153.255 scope global eth0
    inet 192.168.153.100/32 scope global eth0
    inet6 fe80::20c:29ff:feda:163d/64 scope link
       valid_lft forever preferred_lft forever

Now kill the keepalived process on db1:

db1# killall keepalived
db1#ip add
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:da:16:3d brd ff:ff:ff:ff:ff:ff
    inet 192.168.153.150/24 brd 192.168.153.255 scope global eth0
    inet6 fe80::20c:29ff:feda:163d/64 scope link
       valid_lft forever preferred_lft forever

db2# ip add
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:da:16:3e brd ff:ff:ff:ff:ff:ff
    inet 192.168.153.151/24 brd 192.168.153.255 scope global eth0
    inet 192.168.153.100/32 scope global eth0
    inet6 fe80::20c:29ff:feda:163e/64 scope link
       valid_lft forever preferred_lft forever

As you can see, the virtual IP moved to db2 almost immediately. Debug messages can be found in /var/log/messages.
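The check above can be scripted so you can confirm which host holds the VIP after each test. A minimal sketch; `has_vip` is a hypothetical helper that reads `ip addr` output on stdin and succeeds if the given address appears as an `inet` address.

```shell
#!/bin/bash
# Succeed if the VIP appears as an "inet" address in `ip addr` output on stdin.
has_vip() {
    local vip="$1"
    # Require "VIP/" at the start of the address field so 192.168.153.10
    # does not match 192.168.153.100
    awk -v vip="$vip" '$1 == "inet" && index($2, vip "/") == 1 { found = 1 }
                       END { exit !found }'
}

# Usage on db1 or db2:
# ip addr show eth0 | has_vip 192.168.153.100 && echo "VIP is on this host"
```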

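The point of using keepalived is this: when MHA detects that the master is down, it calls shutdown_script to kill the keepalived process on the master, which moves the virtual IP to the new master.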

10. Testing MHA and keepalived together.
Before testing, we need to understand the scripts we commented out earlier: what they do and when they are called.
#master_ip_failover_script:
It is first called at startup:
/opt/master_ip_failover_script.sh --command=status --ssh_user=root --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306
It is called again in the second step of the actual failover, the Dead Master Shutdown Phase:
/opt/master_ip_failover_script.sh --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306 --command=stopssh --ssh_user=root
It is called once more in step 3.4 of the failover (after the new master has been elected and the differential binlogs applied):
/opt/master_ip_failover_script.sh --command=start --ssh_user=root --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306 --new_master_host=192.168.153.151 --new_master_ip=192.168.153.151 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha'
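The sample scripts in `samples/scripts/` are written in Perl; the shell sketch below only illustrates how a `master_ip_failover_script` could accept the three invocations listed above. In this article's setup keepalived moves the VIP, so `status`, `stop` and `stopssh` simply succeed, and `start` turns off `read_only` on the new master, which is only needed if you follow the read_only tip later in this article. `failover_main` and the `DRY_RUN` variable are hypothetical; with `DRY_RUN=1` the mysql command is printed instead of run.

```shell
#!/bin/bash
# Hypothetical master_ip_failover_script sketch (not MHA's sample Perl script).
# Parses the --key=value arguments MHA passes and dispatches on --command.
failover_main() {
    local arg command="" new_host="" new_port=3306 new_user="" new_pass=""
    for arg in "$@"; do
        case "$arg" in
            --command=*)             command="${arg#*=}" ;;
            --new_master_host=*)     new_host="${arg#*=}" ;;
            --new_master_port=*)     new_port="${arg#*=}" ;;
            --new_master_user=*)     new_user="${arg#*=}" ;;
            --new_master_password=*) new_pass="${arg#*=}" ;;
            *) ;;  # orig_master_*, ssh_user, etc. are not needed here
        esac
    done

    case "$command" in
        status|stop|stopssh)
            # The VIP is moved by keepalived + shutdown_script in this setup
            return 0 ;;
        start)
            if [ -z "$new_host" ]; then
                echo "start: --new_master_host is missing" >&2
                return 1
            fi
            local sql="SET GLOBAL read_only = 0;"
            if [ "${DRY_RUN:-0}" = 1 ]; then
                # Print the command instead of running it (password omitted)
                echo "mysql -h$new_host -P$new_port -u$new_user -e '$sql'"
                return 0
            fi
            mysql -h"$new_host" -P"$new_port" -u"$new_user" -p"$new_pass" -e "$sql" ;;
        *)
            echo "unknown command: $command" >&2
            return 1 ;;
    esac
}

# MHA always passes arguments; only dispatch when some are given.
if [ "$#" -gt 0 ]; then
    failover_main "$@"
    exit $?
fi
```

Save it under the path you set in master_ip_failover_script, make it executable, and run masterha_check_repl again before relying on it.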

#master_ip_online_change_script:
It is called when you switch the MySQL master manually with masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.153.151.
It runs in the second phase of the online switch, when writes to the original master are being blocked:
/opt/master_ip_online_change_script.sh --command=stop --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='mha' --new_master_host=192.168.153.151 --new_master_ip=192.168.153.151 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha'
It is then executed for the new master:
/opt/master_ip_online_change_script.sh --command=start --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='mha' --new_master_host=192.168.153.151 --new_master_ip=192.168.153.151 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha'

#shutdown_script:
It is first run at startup, immediately after the first run of master_ip_failover_script:
/opt/shutdown_script.sh --command=status --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150
It runs a second time immediately after the second run of master_ip_failover_script:
/opt/shutdown_script.sh --command=stopssh --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150 --port=3306

#report_script="" //notification script
It is called once at the very end of an automatic failover performed by masterha_manager.
report_script.sh --orig_master_host=(dead master's hostname) --new_master_host=(new master's hostname) --new_slave_hosts=(new slaves' hostnames, delimited by commas) --subject=(mail subject) --body=(body)
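A report_script can be equally simple. The sketch below is hypothetical: `report_main` collects the arguments listed above and prints a summary to stdout. If the `MAIL_TO` variable (an assumption of this sketch, not an MHA setting) is set and `mail` is installed, it also mails the summary. It always exits 0 so that reporting never counts as a script failure.

```shell
#!/bin/bash
# Hypothetical report_script sketch: summarize the failover arguments MHA passes.
report_main() {
    local arg orig="" new="" slaves="" subject="MHA failover" body="" msg
    for arg in "$@"; do
        case "$arg" in
            --orig_master_host=*) orig="${arg#*=}" ;;
            --new_master_host=*)  new="${arg#*=}" ;;
            --new_slave_hosts=*)  slaves="${arg#*=}" ;;
            --subject=*)          subject="${arg#*=}" ;;
            --body=*)             body="${arg#*=}" ;;
        esac
    done

    # new_slave_hosts is comma-separated; show it space-separated
    msg=$(printf 'old master: %s\nnew master: %s\nslaves: %s\n%s' \
        "$orig" "$new" "${slaves//,/ }" "$body")
    printf '%s\n%s\n' "$subject" "$msg"

    # Optional e-mail; only attempted when MAIL_TO is set and mail(1) exists
    if [ -n "${MAIL_TO:-}" ] && command -v mail >/dev/null 2>&1; then
        printf '%s\n' "$msg" | mail -s "$subject" "$MAIL_TO"
    fi
    return 0   # never let reporting itself be treated as a script error
}

if [ "$#" -gt 0 ]; then
    report_main "$@"
    exit $?
fi
```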

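The samples/scripts/ directory of the mha4mysql-manager source package contains several example scripts written in Perl. I don't know Perl well, so if you are in the same position, you can reimplement them in shell or Python based on the calling parameters above.
When reimplementing these scripts, keep two things in mind:
1. Accept the parameters MHA passes as closely as possible, so the script is easy to use.
2. The script must exit with 0 or 10. Any other exit code is treated as a script error: the following steps are not executed and the failover stops.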

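Now we need to write our own shutdown_script. It checks whether MySQL on the master is really down, and if so, kills the keepalived process on the master to trigger the VIP move.
Edit the commented-out shutdown_script line in app1.cnf to point to this script. My shutdown_script.sh is published at the end of this article. The simplest shutdown_script only needs to do two things: first, check whether MySQL is down; second, if it is, run killall keepalived.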

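Now let's test everything together:
Check replication on all three MySQL servers.
Start keepalived on the master and the standby master, and check that the virtual IP is on the master.
Start the MHA manager (masterha_manager).
Stop MySQL on the master.
Check replication on the slaves and whether the VIP has moved.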

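TIPS: two data-safety improvements worth considering
1. Set read_only=on on all slaves
If you set this, you must use master_ip_failover_script and master_ip_online_change_script to set read_only=off when the new master is initialized. The main purpose is to cover the case where the master's OS crashes and keepalived moves the VIP to the new master before MHA has finished switching over: while read_only is still on, writes from application accounts without the SUPER privilege are rejected.
2. Set relay_log_purge=0 on all slaves
With this setting, relay logs that have already been applied are not purged automatically. The main purpose is that in failover phases 3.3 and 4.1, generating the diff logs may require relay logs that some slave has already applied. The downside is that relay logs keep growing, and purging them can block replication, so MHA node provides the purge_relay_logs script, which purges relay logs without blocking replication.
Add a cron job on each slave:
[app@slave_host1]$ cat /etc/cron.d/purge_relay_logs
# purge relay logs at 5am
0 5 * * * app /usr/bin/purge_relay_logs --user=root --password=PASSWORD --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1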

Below is my shutdown_script. It mainly uses the stopssh method; the stop method is normally not called, so adapt it yourself if you need it.
shutdown_script.sh:

[root@localhost opt]# cat shutdown_script.sh
#!/bin/bash
#       masterha shutdown_script.
#       version:        2013-11-06       first version
#
#                               by andy.feng
#                               copy right
LANG=C
USER="mha"
PASSWORD="mha"
MYSQL_PORT=3306

# Parse the --key=value arguments passed by MHA, e.g.
# --command=stopssh --ssh_user=root --host=... --ip=... --port=3306
for i in "$@"
do
        case "$i" in
                --ip=*)      IP="${i#*=}" ;;
                --command=*) CMD="${i#*=}" ;;
                --port=*)    MYSQL_PORT="${i#*=}" ;;
        esac
done

# Succeeds if MySQL on $IP:$MYSQL_PORT still answers a simple query
function mysql_alive {
        mysql -s -u"$USER" -p"$PASSWORD" -h"$IP" -P"$MYSQL_PORT" -e 'select count(*) as c from mysql.user;' &> /dev/null
}

# If MySQL on the old master is really down, kill keepalived there so the VIP moves
function stopssh {
        if ! mysql_alive
        then
                if ! ssh "$IP" 'killall keepalived'
                then
                        echo "$IP killall keepalived fail....."
                        return 1
                fi
        fi
        return 0
}

# Same check, but power off the old master instead (not dispatched below)
function stop {
        if ! mysql_alive
        then
                if ! ssh "$IP" 'shutdown -h now'
                then
                        echo "$IP shutdown fail....."
                        return 1
                fi
        fi
        return 0
}

# Only stopssh is handled; status and other commands exit 0
if [ "$CMD" = 'stopssh' ]
then
        stopssh
        exit $?
fi
exit 0
