1,主机规划
10.10.54.26 计划安装 MHA mha4mysql-manager-0.53.tar.gz
10.10.54.27 计划安装 MHA mha4mysql-node-0.53.tar.gz
10.10.54.25 计划安装 MHA mha4mysql-node-0.53.tar.gz
10.10.54.22 计划安装 MHA mha4mysql-node-0.53.tar.gz
2.下载软件并安装
所需软件依赖
perl-DBD-MySQL.x86_64(软件源安装即可)
源码包
List-MoreUtils-0.12.tar.gz
Log-Dispatch-2.28.tar.gz
Module-Build-0.3601.tar.gz
Module-CoreList-3.07.tar.gz
Module-Implementation-0.07.t
Module-Install-1.06.tar.gz
Module-Metadata-1.000019.tar
Module-Runtime-0.014.tar.gz
Module-ScanDeps-1.13.tar.gz
Parallel-ForkManager-1.06.ta
Params-Validate-1.08.tar.gz
PAR-Dist-0.49.tar.gz
Parse-CPAN-Meta-1.4413.tar.g
Perl-OSType-1.007.tar.gz
Software-License-0.103009.ta
Test-Fatal-0.013.tar.gz
yaml-0.1.4-7.ram0.98.x86_64.
YAML-0.90.tar.gz
YAML-Tiny-1.61.tar.gz
MHA管理软件
mha4mysql-node-0.53.tar.gz
master:
[root@wang mha4mysql-node-0.53]# perl Makefile.PL
-
*** Module::AutoInstall version 1.03
-
*** Checking for Perl dependencies...
-
[Core Features]
-
- DBI ...loaded. (1.609)
-
- DBD::mysql ...loaded. (4.013)
-
*** Module::AutoInstall configuration finished.
-
Checking if your kit is complete...
-
Looks good
-
Writing Makefile for mha4mysql::node
[root@wang mha4mysql-node-0.53]# make && make install
[root@wang mha4mysql-manager-0.53]# perl Makefile.PL
-
*** Module::AutoInstall version 1.03
-
*** Checking for Perl dependencies...
-
[Core Features]
-
- DBI ...loaded. (1.609)
-
- DBD::mysql ...loaded. (4.013)
-
- Time::HiRes ...loaded. (1.9726)
-
- Config::Tiny ...loaded. (2.20)
-
- Log::Dispatch ...loaded. (2.28)
-
- Parallel::ForkManager ...loaded. (1.06)
-
- MHA::NodeConst ...loaded. (0.53)
-
*** Module::AutoInstall configuration finished.
-
Writing Makefile for mha4mysql::manager
[root@wang mha4mysql-manager-0.53]# make && make install
slave
[root@wang mha4mysql-node-0.53]# perl Makefile.PL
[root@wang mha4mysql-node-0.53]# make && make install
3)检查和配置SSH无密码验证设置
创建配置文件
[root@wang home]# vi /etc/app1.cnf
-
[server default]
-
# mysql user and password
-
user=root
-
password=123
-
ssh_user=root
-
# working directory on the manager
-
manager_workdir=/var/log/masterha/app1 #不存在Node会自动创建
-
manager_log=/var/log/masterha/app1/app1.log
-
# working directory on MySQL servers
-
remote_workdir=/var/log/masterha/app1
-
master_binlog_dir=/var/lib/mysql/
-
check_repl_delay=0
-
[server1]
-
hostname=10.10.54.27
-
candidate_master=1
-
[server2]
-
hostname=10.10.54.22
-
candidate_master=1
-
[server3]
-
hostname=10.10.54.25
-
no_master=1
-------------------------
在manager上10.10.54.26配置到个Node的无密码验证:
[root@wang home]# ssh-keygen -t rsa
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.22
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.25
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.27
在Node:10.10.54.22上:
[root@wang home]# ssh-keygen -t rsa
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.25
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.27
在Node:10.10.54.25上:
[root@wang home]# ssh-keygen -t rsa
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.22
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.27
在Node:10.10.54.27上:
[root@wang home]# ssh-keygen -t rsa
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.22
[root@wang home]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.10.54.27
检查SSH:
[
root@wang home]# masterha_check_ssh --conf=/etc/app1.cnf
-
Fri Feb 28 10:44:31 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
-
Fri Feb 28 10:44:31 2014 - [info] Reading application default configurations from /etc/app1.cnf..
-
Fri Feb 28 10:44:31 2014 - [info] Reading server configurations from /etc/app1.cnf..
-
Fri Feb 28 10:44:31 2014 - [info] Starting SSH connection tests..
-
Fri Feb 28 10:44:31 2014 - [debug]
-
Fri Feb 28 10:44:31 2014 - [debug] Connecting via SSH from root@10.10.54.27(10.10.54.27:22) to root@10.10.54.22(10.10.54.22:22)..
-
Fri Feb 28 10:44:31 2014 - [debug] ok.
-
Fri Feb 28 10:44:31 2014 - [debug] Connecting via SSH from root@10.10.54.27(10.10.54.27:22) to root@10.10.54.25(10.10.54.25:22)..
-
Fri Feb 28 10:44:31 2014 - [debug] ok.
-
Fri Feb 28 10:44:32 2014 - [debug]
-
Fri Feb 28 10:44:31 2014 - [debug] Connecting via SSH from root@10.10.54.22(10.10.54.22:22) to root@10.10.54.27(10.10.54.27:22)..
-
Fri Feb 28 10:44:31 2014 - [debug] ok.
-
Fri Feb 28 10:44:31 2014 - [debug] Connecting via SSH from root@10.10.54.22(10.10.54.22:22) to root@10.10.54.25(10.10.54.25:22)..
-
Fri Feb 28 10:44:31 2014 - [debug] ok.
-
Fri Feb 28 10:44:32 2014 - [debug]
-
Fri Feb 28 10:44:32 2014 - [debug] Connecting via SSH from root@10.10.54.25(10.10.54.25:22) to root@10.10.54.27(10.10.54.27:22)..
-
Fri Feb 28 10:44:32 2014 - [debug] ok.
-
Fri Feb 28 10:44:32 2014 - [debug] Connecting via SSH from root@10.10.54.25(10.10.54.25:22) to root@10.10.54.22(10.10.54.22:22)..
-
Fri Feb 28 10:44:32 2014 - [debug] ok.
-
Fri Feb 28 10:44:32 2014 - [info] All SSH connection tests passed successfully.
4)检查复制状态
[root@wang mha4mysql-manager-0.53]# masterha_check_repl --conf=/etc/app1.cnf
-
Fri Feb 28 10:50:40 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
-
Fri Feb 28 10:50:40 2014 - [info] Reading application default configurations from /etc/app1.cnf..
-
Fri Feb 28 10:50:40 2014 - [info] Reading server configurations from /etc/app1.cnf..
-
Fri Feb 28 10:50:40 2014 - [info] MHA::MasterMonitor version 0.53.
-
Creating directory /var/log/masterha/app1 #不存在Node会自动创建.. done.
-
Fri Feb 28 10:50:40 2014 - [info] Dead Servers:
-
Fri Feb 28 10:50:40 2014 - [info] 10.10.54.25(10.10.54.25:3306)
-
Fri Feb 28 10:50:40 2014 - [info] Alive Servers:
-
Fri Feb 28 10:50:40 2014 - [info] 10.10.54.27(10.10.54.27:3306)
-
Fri Feb 28 10:50:40 2014 - [info] 10.10.54.22(10.10.54.22:3306)
-
Fri Feb 28 10:50:40 2014 - [info] Alive Slaves:
-
Fri Feb 28 10:50:40 2014 - [info] 10.10.54.27(10.10.54.27:3306) Version=5.5.35-ndb-7.2.15-cluster-gpl-log (oldest major version between slaves) log-bin:enabled
错误处理:
1,Fri Feb 28 11:24:33 2014 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln193] There is no alive slave. We can't do failover
解决办法:
更改10.10.54.22的server-id=10
让10.10.54.22 与10.10.54.27建立主主互备构架即可
2,Fri Feb 28 11:34:39 2014 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln732] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
解决办法:
当前是主主互备构架,设置一个candicate主机22以及其他从机(27)均为read_only。
3,Fri Feb 28 13:24:46 2014 - [info] Connecting to root@10.10.54.27(10.10.54.27:22)..
Can't exec "mysqlbinlog": No such file or directory at /usr/local/share/perl5/MHA/BinlogManager.pm line 99.
mysqlbinlog version not found!
解决办法
在/usr/local/bin目录下建立相关命令的连接如下:
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
2}修改/etc/ssh/sshd_config 文件
PermitUserEnvironment yes
vim ~/.ssh/environment
PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/mysql/bin
然后要重启sshd服务。
正确状态
-
Fri Feb 28 13:57:14 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
-
Fri Feb 28 13:57:14 2014 - [info] Reading application default configurations from /etc/app1.cnf..
-
Fri Feb 28 13:57:14 2014 - [info] Reading server configurations from /etc/app1.cnf..
-
Fri Feb 28 13:57:14 2014 - [info] MHA::MasterMonitor version 0.53.
-
Fri Feb 28 13:57:14 2014 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.10.54.22(10.10.54.22:3306)
-
Fri Feb 28 13:57:14 2014 - [info] Master configurations are as below:
-
Master 10.10.54.27(10.10.54.27:3306), replicating from 10.10.54.22(10.10.54.22:3306), read-only
-
Master 10.10.54.22(10.10.54.22:3306), replicating from 10.10.54.27(10.10.54.27:3306)
-
-
-
Fri Feb 28 13:57:14 2014 - [info] Dead Servers:
-
Fri Feb 28 13:57:14 2014 - [info] Alive Servers:
-
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.27(10.10.54.27:3306)
-
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.22(10.10.54.22:3306)
-
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.25(10.10.54.25:3306)
-
Fri Feb 28 13:57:14 2014 - [info] Alive Slaves:
-
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.27(10.10.54.27:3306) Version=5.5.35-ndb-7.2.15-cluster-gpl-log (oldest major version between slaves) log-bin:enabled
-
Fri Feb 28 13:57:14 2014 - [info] Replicating from 10.10.54.22(10.10.54.22:3306)
-
Fri Feb 28 13:57:14 2014 - [info] Primary candidate for the new Master (candidate_master is set)
-
Fri Feb 28 13:57:14 2014 - [info] 10.10.54.25(10.10.54.25:3306) Version=5.5.35-ndb-7.2.15-cluster-gpl-log (oldest major version between slaves) log-bin:enabled
-
Fri Feb 28 13:57:14 2014 - [info] Replicating from 10.10.54.22(10.10.54.22:3306)
-
Fri Feb 28 13:57:14 2014 - [info] Not candidate for the new Master (no_master is set)
-
Fri Feb 28 13:57:14 2014 - [info] Current Alive Master: 10.10.54.22(10.10.54.22:3306)
-
Fri Feb 28 13:57:14 2014 - [info] Checking slave configurations..
-
Fri Feb 28 13:57:14 2014 - [info] Checking replication filtering settings..
-
Fri Feb 28 13:57:14 2014 - [info] binlog_do_db= , binlog_ignore_db=
-
Fri Feb 28 13:57:14 2014 - [info] Replication filtering check ok.
-
Fri Feb 28 13:57:14 2014 - [info] Starting SSH connection tests..
-
Fri Feb 28 13:57:16 2014 - [info] All SSH connection tests passed successfully.
-
Fri Feb 28 13:57:16 2014 - [info] Checking MHA Node version..
-
Fri Feb 28 13:57:16 2014 - [info] Version check ok.
-
Fri Feb 28 13:57:16 2014 - [info] Checking SSH publickey authentication settings on the current master..
-
Fri Feb 28 13:57:16 2014 - [info] HealthCheck: SSH to 10.10.54.22 is reachable.
-
Fri Feb 28 13:57:16 2014 - [info] Master MHA Node version is 0.53.
-
Fri Feb 28 13:57:16 2014 - [info] Checking recovery script configurations on the current master..
-
Fri Feb 28 13:57:16 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/ndb --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=slave22-bin.000007
-
Fri Feb 28 13:57:16 2014 - [info] Connecting to root@10.10.54.22(10.10.54.22)..
-
Creating /var/log/masterha/app1 if not exists.. ok.
-
Checking output directory is accessible or not..
-
ok.
-
Binlog found at /data/ndb, up to slave22-bin.000007
-
Fri Feb 28 13:57:17 2014 - [info] Master setting check done.
-
Fri Feb 28 13:57:17 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
-
Fri Feb 28 13:57:17 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.10.54.27 --slave_ip=10.10.54.27 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.35-ndb-7.2.15-cluster-gpl-log --manager_version=0.53 --relay_log_info=/data/ndb/relay-log.info --relay_dir=/data/ndb/ --slave_pass=xxx
-
Fri Feb 28 13:57:17 2014 - [info] Connecting to root@10.10.54.27(10.10.54.27:22)..
-
Checking slave recovery environment settings..
-
Opening /data/ndb/relay-log.info ... ok.
-
Relay log found at /data/ndb, up to relay-log.000002
-
Temporary relay log file is /data/ndb/relay-log.000002
-
Testing mysql connection and privileges.. done.
-
Testing mysqlbinlog output.. done.
-
Cleaning up test file(s).. done.
-
Fri Feb 28 13:57:17 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=10.10.54.25 --slave_ip=10.10.54.25 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.35-ndb-7.2.15-cluster-gpl-log --manager_version=0.53 --relay_log_info=/data/ndb/relay-log.info --relay_dir=/data/ndb/ --slave_pass=xxx
-
Fri Feb 28 13:57:17 2014 - [info] Connecting to root@10.10.54.25(10.10.54.25:22)..
-
Checking slave recovery environment settings..
-
Opening /data/ndb/relay-log.info ... ok.
-
Relay log found at /data/ndb, up to wang-relay-bin.000012
-
Temporary relay log file is /data/ndb/wang-relay-bin.000012
-
Testing mysql connection and privileges.. done.
-
Testing mysqlbinlog output.. done.
-
Cleaning up test file(s).. done.
-
Fri Feb 28 13:57:17 2014 - [info] Slaves settings check done.
-
Fri Feb 28 13:57:17 2014 - [info]
-
10.10.54.22 (current master)
-
+--10.10.54.27
-
+--10.10.54.25
-
-
-
Fri Feb 28 13:57:17 2014 - [info] Checking replication health on 10.10.54.27..
-
Fri Feb 28 13:57:17 2014 - [info] ok.
-
Fri Feb 28 13:57:17 2014 - [info] Checking replication health on 10.10.54.25..
-
Fri Feb 28 13:57:17 2014 - [info] ok.
-
Fri Feb 28 13:57:17 2014 - [warning] master_ip_failover_script is not defined.
-
Fri Feb 28 13:57:17 2014 - [warning] shutdown_script is not defined.
-
Fri Feb 28 13:57:17 2014 - [info] Got exit code 0 (Not master dead).
-
-
-
MySQL Replication Health is OK.
当前是主主互备构架,设置一个candicate主机22以及其他从机均为read_only。
mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | ON |
+---------------+-------+
1 row in set (0.00 sec)
设置主机和其他从机位relay_log_purge:
mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'relay_log_purge';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| relay_log_purge | OFF |
+-----------------+-------+
1 row in set (0.00 sec)
设置relay_log_purge=0 后必须对relay log文件进行有效的管理:
vi /etc/cron.d/purge_relay_logs
# purge relay logs at 5am
0 5 * * * app /usr/bin/purge_relay_logs --user=root --password=123 --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1
/etc/init.d/crond restart
五,测试运营
1)启动MHA manager:
[root@wang ~]# masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf
Fri Feb 28 14:12:32 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 28 14:12:32 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Fri Feb 28 14:12:32 2014 - [info] Reading server configurations from /etc/app1.cnf..
可以后台运营
[root@wang ~]# nohup masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf < /dev/null > /var/log/masterha/app1/app1.log 2>&1 &
2)检查manager运营状态:
[root@wang ~]# masterha_check_status --conf=/etc/app1.cnf
app1 (pid:1169) is running(0:PING_OK), master:10.10.54.22
3)手工切换测试
方式一:手工对dead master进行切换
如果,MHA manager检测到没有dead的server,将报错,并结束failover:
[root@wang ~]# masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=10.10.54.28 --new_master_host=10.10.54.27
Fri Feb 28 14:16:29 2014 - [info] Dead Servers:
Fri Feb 28 14:16:29 2014 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln181] None of server is dead. Stop failover.
Fri Feb 28 14:16:29 2014 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch
方式二:当前master在正常运行情况的切换:
[root@wang ~]# masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=10.10.54.27
报错:
Fri Feb 28 14:18:41 2014 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
解决办法:
[root@wang ~]# vi /etc/app1.cnf
[root@wang ~]# masterha_stop --conf=/etc/app1.cnf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf < /dev/null > /var/log/masterha/app1/app1.log 2>&1
[root@wang ~]# masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=10.10.54.27
部分结果为
-
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.10.54.22(10.10.54.22:3306)? (YES/no): yes
-
Fri Feb 28 14:24:46 2014 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
-
Fri Feb 28 14:24:46 2014 - [info] ok.
-
Fri Feb 28 14:24:46 2014 - [info] Checking MHA is not monitoring or doing failover..
-
Fri Feb 28 14:24:46 2014 - [info] Checking replication health on 10.10.54.27..
-
Fri Feb 28 14:24:46 2014 - [info] ok.
-
Fri Feb 28 14:24:46 2014 - [info] Checking replication health on 10.10.54.25..
-
Fri Feb 28 14:24:46 2014 - [info] ok.
-
Fri Feb 28 14:24:46 2014 - [info] 10.10.54.27 can be new master.
-
Fri Feb 28 14:24:46 2014 - [info]
-
From:
-
10.10.54.22 (current master)
-
+--10.10.54.27
-
+--10.10.54.25
-
To:
-
10.10.54.27 (new master)
-
+--10.10.54.25
-
Fri Feb 28 14:24:56 2014 - [info] 10.10.54.27: Resetting slave info succeeded.
-
Fri Feb 28 14:24:56 2014 - [info] Switching master to 10.10.54.27(10.10.54.27:3306) completed successfully.
4)停止manager,不停止mysql server
[root@wang ~]# masterha_stop --conf=/etc/app1.cnf
MHA Manager is not running on app1(2:NOT_RUNNING).