搭建MHA环境-guanshanyue1990-ChinaUnix博客

guanshanyue1990的ChinaUnix博客

首页　| 　博文目录　| 　关于我

guanshanyue1990

博客访问： 25520
博文数量： 5
博客积分： 0
博客等级：民兵
技术积分： 137
用户组：普通用户
注册时间： 2014-03-03 08:00

文章分类

全部博文（5）

www服务器（1）
集群（0）
linux服务器（3）
MYSQL（1）
未分配的博文（0）

文章存档

2014年（5）

我的朋友

相关博文

搭建MHA环境

分类： Mysql/postgreSQL

2014-03-03 08:24:44

1,主机规划
10.10.54.26 计划安装 MHA mha4mysql-manager-0.53.tar.gz
10.10.54.27 计划安装 MHA mha4mysql-node-0.53.tar.gz
10.10.54.25 计划安装 MHA mha4mysql-node-0.53.tar.gz
10.10.54.22 计划安装 MHA mha4mysql-node-0.53.tar.gz
2.下载软件并安装
所需软件依赖
perl-DBD-MySQL.x86_64(软件源安装即可)
源码包
List-MoreUtils-0.12.tar.gz
Log-Dispatch-2.28.tar.gz
Module-Build-0.3601.tar.gz
Module-CoreList-3.07.tar.gz
Module-Implementation-0.07.t
Module-Install-1.06.tar.gz
Module-Metadata-1.000019.tar
Module-Runtime-0.014.tar.gz
Module-ScanDeps-1.13.tar.gz
Parallel-ForkManager-1.06.ta
Params-Validate-1.08.tar.gz
PAR-Dist-0.49.tar.gz
Parse-CPAN-Meta-1.4413.tar.g
Perl-OSType-1.007.tar.gz
Software-License-0.103009.ta
Test-Fatal-0.013.tar.gz
yaml-0.1.4-7.ram0.98.x86_64.
YAML-0.90.tar.gz
YAML-Tiny-1.61.tar.gz

MHA管理软件

mha4mysql-node-0.53.tar.gz
master:
[root@wang mha4mysql-node-0.53]# perl Makefile.PL

点击(此处)折叠或打开

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node

[root@wang mha4mysql-node-0.53]# make && make install

[root@wang mha4mysql-manager-0.53]# perl Makefile.PL

点击(此处)折叠或打开

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
- Time::HiRes ...loaded. (1.9726)
- Config::Tiny ...loaded. (2.20)
- Log::Dispatch ...loaded. (2.28)
- Parallel::ForkManager ...loaded. (1.06)
- MHA::NodeConst ...loaded. (0.53)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::manager

[root@wang mha4mysql-manager-0.53]# make && make install

slave
[root@wang mha4mysql-node-0.53]# perl Makefile.PL
[root@wang mha4mysql-node-0.53]# make && make install
3）检查和配置SSH无密码验证设置
创建配置文件
[root@wang home]# vi /etc/app1.cnf

点击(此处)折叠或打开

[server default]
# mysql user and password
user=root
password=123
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1 #不存在Node会自动创建
manager_log=/var/log/masterha/app1/app1.log
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql/
check_repl_delay=0
[server1]
hostname=10.10.54.27
candidate_master=1
[server2]
hostname=10.10.54.22
candidate_master=1
[server3]
hostname=10.10.54.25
no_master=1

-------------------------
在manager上10.10.54.26配置到个Node的无密码验证：
[root@wang home]# ssh-keygen -t rsa
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.22
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.25
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.27

在Node：10.10.54.22上：
[root@wang home]# ssh-keygen -t rsa
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.25
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.27
在Node：10.10.54.25上：
[root@wang home]# ssh-keygen -t rsa
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.22
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.27
在Node：10.10.54.27上：
[root@wang home]# ssh-keygen -t rsa
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.22
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.27

检查SSH：
[root@wang home]# masterha_check_ssh --conf=/etc/app1.cnf

点击(此处)折叠或打开

Fri Feb 28 10:44:31 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 28 10:44:31 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Fri Feb 28 10:44:31 2014 - [info] Reading server configurations from /etc/app1.cnf..
Fri Feb 28 10:44:31 2014 - [info] Starting SSH connection tests..
Fri Feb 28 10:44:31 2014 - [debug]
Fri Feb 28 10:44:31 2014 - [debug] Connecting via SSH from root@10.10.54.27(10.10.54.27:22) to root@10.10.54.22(10.10.54.22:22)..
Fri Feb 28 10:44:31 2014 - [debug] ok.
Fri Feb 28 10:44:31 2014 - [debug] Connecting via SSH from root@10.10.54.27(10.10.54.27:22) to root@10.10.54.25(10.10.54.25:22)..
Fri Feb 28 10:44:31 2014 - [debug] ok.
Fri Feb 28 10:44:32 2014 - [debug]
Fri Feb 28 10:44:31 2014 - [debug] Connecting via SSH from root@10.10.54.22(10.10.54.22:22) to root@10.10.54.27(10.10.54.27:22)..
Fri Feb 28 10:44:31 2014 - [debug] ok.
Fri Feb 28 10:44:31 2014 - [debug] Connecting via SSH from root@10.10.54.22(10.10.54.22:22) to root@10.10.54.25(10.10.54.25:22)..
Fri Feb 28 10:44:31 2014 - [debug] ok.
Fri Feb 28 10:44:32 2014 - [debug]
Fri Feb 28 10:44:32 2014 - [debug] Connecting via SSH from root@10.10.54.25(10.10.54.25:22) to root@10.10.54.27(10.10.54.27:22)..
Fri Feb 28 10:44:32 2014 - [debug] ok.
Fri Feb 28 10:44:32 2014 - [debug] Connecting via SSH from root@10.10.54.25(10.10.54.25:22) to root@10.10.54.22(10.10.54.22:22)..
Fri Feb 28 10:44:32 2014 - [debug] ok.
Fri Feb 28 10:44:32 2014 - [info] All SSH connection tests passed successfully.

4）检查复制状态
[root@wang mha4mysql-manager-0.53]# masterha_check_repl --conf=/etc/app1.cnf

点击(此处)折叠或打开

Fri Feb 28 10:50:40 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 28 10:50:40 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Fri Feb 28 10:50:40 2014 - [info] Reading server configurations from /etc/app1.cnf..
Fri Feb 28 10:50:40 2014 - [info] MHA::MasterMonitor version 0.53.
Creating directory /var/log/masterha/app1 #不存在Node会自动创建.. done.
Fri Feb 28 10:50:40 2014 - [info] Dead Servers:
Fri Feb 28 10:50:40 2014 - [info] 10.10.54.25(10.10.54.25:3306)
Fri Feb 28 10:50:40 2014 - [info] Alive Servers:
Fri Feb 28 10:50:40 2014 - [info] 10.10.54.27(10.10.54.27:3306)
Fri Feb 28 10:50:40 2014 - [info] 10.10.54.22(10.10.54.22:3306)
Fri Feb 28 10:50:40 2014 - [info] Alive Slaves:
Fri Feb 28 10:50:40 2014 - [info] 10.10.54.27(10.10.54.27:3306) Version=5.5.35-ndb-7.2.15-cluster-gpl-log (oldest major version between slaves) log-bin:enabled

错误处理：
1，Fri Feb 28 11:24:33 2014 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln193] There is no alive slave. We can't do failover
解决办法：
更改10.10.54.22的server-id=10
让10.10.54.22 与10.10.54.27建立主主互备构架即可

2，Fri Feb 28 11:34:39 2014 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln732] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
解决办法：
当前是主主互备构架，设置一个candicate主机22以及其他从机(27)均为read_only。

3,Fri Feb 28 13:24:46 2014 - [info] Connecting to root@10.10.54.27(10.10.54.27:22)..
Can't exec "mysqlbinlog": No such file or directory at /usr/local/share/perl5/MHA/BinlogManager.pm line 99.
mysqlbinlog version not found!

解决办法
在/usr/local/bin目录下建立相关命令的连接如下：
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
2}修改/etc/ssh/sshd_config 文件
PermitUserEnvironment yes
vim ~/.ssh/environment
PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/mysql/bin
然后要重启sshd服务。

正确状态

点击(此处)折叠或打开

Fri Feb 28 13:57:14 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 28 13:57:14 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Fri Feb 28 13:57:14 2014 - [info] Reading server configurations from /etc/app1.cnf..
Fri Feb 28 13:57:14 2014 - [info] MHA::MasterMonitor version 0.53.
Fri Feb 28 13:57:14 2014 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.10.54.22(10.10.54.22:3306)
Fri Feb 28 13:57:14 2014 - [info] Master configurations are as below:
Master 10.10.54.27(10.10.54.27:3306), replicating from 10.10.54.22(10.10.54.22:3306), read-only
Master 10.10.54.22(10.10.54.22:3306), replicating from 10.10.54.27(10.10.54.27:3306)
Fri Feb 28 13:57:14 2014 - [info] Dead Servers:
Fri Feb 28 13:57:14 2014 - [info] Alive Servers:
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.27(10.10.54.27:3306)
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.22(10.10.54.22:3306)
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.25(10.10.54.25:3306)
Fri Feb 28 13:57:14 2014 - [info] Alive Slaves:
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.27(10.10.54.27:3306) Version=5.5.35-ndb-7.2.15-cluster-gpl-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 13:57:14 2014 - [info] Replicating from 10.10.54.22(10.10.54.22:3306)
Fri Feb 28 13:57:14 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.25(10.10.54.25:3306) Version=5.5.35-ndb-7.2.15-cluster-gpl-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 13:57:14 2014 - [info] Replicating from 10.10.54.22(10.10.54.22:3306)
Fri Feb 28 13:57:14 2014 - [info] Not candidate for the new Master (no_master is set)
Fri Feb 28 13:57:14 2014 - [info] Current Alive Master: 10.10.54.22(10.10.54.22:3306)
Fri Feb 28 13:57:14 2014 - [info] Checking slave configurations..
Fri Feb 28 13:57:14 2014 - [info] Checking replication filtering settings..
Fri Feb 28 13:57:14 2014 - [info] binlog_do_db= , binlog_ignore_db=
Fri Feb 28 13:57:14 2014 - [info] Replication filtering check ok.
Fri Feb 28 13:57:14 2014 - [info] Starting SSH connection tests..
Fri Feb 28 13:57:16 2014 - [info] All SSH connection tests passed successfully.
Fri Feb 28 13:57:16 2014 - [info] Checking MHA Node version..
Fri Feb 28 13:57:16 2014 - [info] Version check ok.
Fri Feb 28 13:57:16 2014 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 28 13:57:16 2014 - [info] HealthCheck: SSH to 10.10.54.22 is reachable.
Fri Feb 28 13:57:16 2014 - [info] Master MHA Node version is 0.53.
Fri Feb 28 13:57:16 2014 - [info] Checking recovery script configurations on the current master..
Fri Feb 28 13:57:16 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/ndb --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=slave22-bin.000007
Fri Feb 28 13:57:16 2014 - [info] Connecting to root@10.10.54.22(10.10.54.22)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/ndb, up to slave22-bin.000007
Fri Feb 28 13:57:17 2014 - [info] Master setting check done.
Fri Feb 28 13:57:17 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Feb 28 13:57:17 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.10.54.27 --slave_ip=10.10.54.27 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.35-ndb-7.2.15-cluster-gpl-log --manager_version=0.53 --relay_log_info=/data/ndb/relay-log.info --relay_dir=/data/ndb/ --slave_pass=xxx
Fri Feb 28 13:57:17 2014 - [info] Connecting to root@10.10.54.27(10.10.54.27:22)..
Checking slave recovery environment settings..
Opening /data/ndb/relay-log.info ... ok.
Relay log found at /data/ndb, up to relay-log.000002
Temporary relay log file is /data/ndb/relay-log.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Feb 28 13:57:17 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.10.54.25 --slave_ip=10.10.54.25 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.35-ndb-7.2.15-cluster-gpl-log --manager_version=0.53 --relay_log_info=/data/ndb/relay-log.info --relay_dir=/data/ndb/ --slave_pass=xxx
Fri Feb 28 13:57:17 2014 - [info] Connecting to root@10.10.54.25(10.10.54.25:22)..
Checking slave recovery environment settings..
Opening /data/ndb/relay-log.info ... ok.
Relay log found at /data/ndb, up to wang-relay-bin.000012
Temporary relay log file is /data/ndb/wang-relay-bin.000012
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Feb 28 13:57:17 2014 - [info] Slaves settings check done.
Fri Feb 28 13:57:17 2014 - [info]
10.10.54.22 (current master)
+--10.10.54.27
+--10.10.54.25
Fri Feb 28 13:57:17 2014 - [info] Checking replication health on 10.10.54.27..
Fri Feb 28 13:57:17 2014 - [info] ok.
Fri Feb 28 13:57:17 2014 - [info] Checking replication health on 10.10.54.25..
Fri Feb 28 13:57:17 2014 - [info] ok.
Fri Feb 28 13:57:17 2014 - [warning] master_ip_failover_script is not defined.
Fri Feb 28 13:57:17 2014 - [warning] shutdown_script is not defined.
Fri Feb 28 13:57:17 2014 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.

当前是主主互备构架，设置一个candicate主机22以及其他从机均为read_only。

mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | ON |
+---------------+-------+
1 row in set (0.00 sec)

设置主机和其他从机位relay_log_purge：

mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'relay_log_purge';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| relay_log_purge | OFF |
+-----------------+-------+
1 row in set (0.00 sec)

设置relay_log_purge=0 后必须对relay log文件进行有效的管理：
vi /etc/cron.d/purge_relay_logs
# purge relay logs at 5am
0 5 * * * app /usr/bin/purge_relay_logs --user=root --password=123 --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1
/etc/init.d/crond restart

五，测试运营
1）启动MHA manager：
[root@wang ~]# masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf
Fri Feb 28 14:12:32 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 28 14:12:32 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Fri Feb 28 14:12:32 2014 - [info] Reading server configurations from /etc/app1.cnf..

可以后台运营
[root@wang ~]# nohup masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf < /dev/null > /var/log/masterha/app1/app1.log 2>&1 &

2）检查manager运营状态：
[root@wang ~]# masterha_check_status --conf=/etc/app1.cnf
app1 (pid:1169) is running(0:PING_OK), master:10.10.54.22

3）手工切换测试
方式一：手工对dead master进行切换
如果，MHA manager检测到没有dead的server，将报错，并结束failover：
[root@wang ~]# masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=10.10.54.28 --new_master_host=10.10.54.27
Fri Feb 28 14:16:29 2014 - [info] Dead Servers:
Fri Feb 28 14:16:29 2014 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln181] None of server is dead. Stop failover.
Fri Feb 28 14:16:29 2014 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch

方式二：当前master在正常运行情况的切换：
[root@wang ~]# masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=10.10.54.27

报错：
Fri Feb 28 14:18:41 2014 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
解决办法：
[root@wang ~]# vi /etc/app1.cnf
[root@wang ~]# masterha_stop --conf=/etc/app1.cnf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf < /dev/null > /var/log/masterha/app1/app1.log 2>&1
[root@wang ~]# masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=10.10.54.27

部分结果为

点击(此处)折叠或打开

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.10.54.22(10.10.54.22:3306)? (YES/no): yes
Fri Feb 28 14:24:46 2014 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Feb 28 14:24:46 2014 - [info] ok.
Fri Feb 28 14:24:46 2014 - [info] Checking MHA is not monitoring or doing failover..
Fri Feb 28 14:24:46 2014 - [info] Checking replication health on 10.10.54.27..
Fri Feb 28 14:24:46 2014 - [info] ok.
Fri Feb 28 14:24:46 2014 - [info] Checking replication health on 10.10.54.25..
Fri Feb 28 14:24:46 2014 - [info] ok.
Fri Feb 28 14:24:46 2014 - [info] 10.10.54.27 can be new master.
Fri Feb 28 14:24:46 2014 - [info]
From:
10.10.54.22 (current master)
+--10.10.54.27
+--10.10.54.25
To:
10.10.54.27 (new master)
+--10.10.54.25
Fri Feb 28 14:24:56 2014 - [info] 10.10.54.27: Resetting slave info succeeded.
Fri Feb 28 14:24:56 2014 - [info] Switching master to 10.10.54.27(10.10.54.27:3306) completed successfully.

4）停止manager，不停止mysql server
[root@wang ~]# masterha_stop --conf=/etc/app1.cnf
MHA Manager is not running on app1(2:NOT_RUNNING).

阅读(3799) | 评论(1) | 转发(0) |

上一篇：没有了

下一篇：nfs服务器的搭建

给主人留下些什么吧！~~

啦哆A梦2014-03-04 10:04:07

文明上网，理性发言...

回复 | 举报

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6