All of the switching described above is the normal MHA failover workflow. Below we walk through the implementation details of each step:

1. Stop all slaves from receiving data from the master (stop the io_thread), and remove the master's VIP.
This step is implemented mainly by the force_shutdown function, whose core looks like this:
my $slave_io_stopper = new Parallel::ForkManager( $#alive_slaves + 1 );
my $stop_io_failed = 0;
$slave_io_stopper->run_on_start(
  sub {
    my ( $pid, $target ) = @_;
  }
);
$slave_io_stopper->run_on_finish(
  sub {
    my ( $pid, $exit_code, $target ) = @_;
    return if ( $target->{ignore_fail} );
    $stop_io_failed = 1 if ($exit_code);
  }
);

foreach my $target (@alive_slaves) {
  $slave_io_stopper->start($target) and next;
  eval {
    $SIG{INT} = $SIG{HUP} = $SIG{QUIT} = $SIG{TERM} = "DEFAULT";
    my $rc = $target->stop_io_thread();
    $slave_io_stopper->finish($rc);
  };
  if ($@) {
    $log->error($@);
    undef $@;
    $slave_io_stopper->finish(1);
  }
  $slave_io_stopper->finish(0);
}

...

force_shutdown_internal($dead_master);    # remove the dead master's VIP
As the code shows, the function forks a worker for every alive slave to run stop io_thread in parallel, and then calls force_shutdown_internal to remove the dead master's VIP.
Summary:
In this first step, every slave stops receiving data from the master and the master's VIP is removed, so no more writes can land on the old master. This lays the groundwork for filling in the missing data and rebuilding the replication topology later.
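For reference, the VIP removal inside force_shutdown_internal is delegated to the user-supplied master_ip_failover_script. The snippet below is only a sketch of what that call amounts to, not the verbatim MHA code; the exact arguments depend on whether the dead master is still SSH reachable (--command=stopssh vs. --command=stop):

# Sketch only: release the dead master's VIP by shelling out to the
# user-supplied master_ip_failover_script. --command=stopssh is used when
# the dead master is still reachable over SSH, --command=stop otherwise.
my $stop_vip_cmd = sprintf(
  "%s --command=stopssh --ssh_user=%s --orig_master_host=%s --orig_master_ip=%s --orig_master_port=%d",
  $dead_master->{master_ip_failover_script},
  $dead_master->{ssh_user},
  $dead_master->{hostname},
  $dead_master->{ip},
  $dead_master->{port}
);
my $ret = system($stop_vip_cmd);
$log->error("Failed to deactivate the old master's VIP") if ( $ret != 0 );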
2. Use the status of all slaves to identify the latest slaves and the oldest slaves.
The second step, identifying the latest and oldest slaves, is implemented as follows:
sub check_set_latest_slaves {
  $_server_manager->read_slave_status();        # read SHOW SLAVE STATUS from every alive slave
  $_server_manager->identify_latest_slaves();   # mark the latest slaves based on that status
  ...
  $_server_manager->identify_oldest_slaves();   # likewise mark the oldest slaves based on that status
  ...
}
The logic that decides which slaves are the newest and which are the oldest is also fairly simple. The excerpt below is taken from inside a loop over the alive slaves; @latest accumulates the best candidate(s) found so far (the newest when $find_oldest is false, the oldest when it is true):
    if (
      !$find_oldest
      && (
        ( !$a && !defined($b) )
        || ( $_->{Master_Log_File} gt $latest[0]{Master_Log_File} )
        || ( ( $_->{Master_Log_File} ge $latest[0]{Master_Log_File} )
          && $_->{Read_Master_Log_Pos} > $latest[0]{Read_Master_Log_Pos} )
      )
      )
    {
      @latest = ();
      push( @latest, $_ );
    }
    elsif (
      $find_oldest
      && (
        ( !$a && !defined($b) )
        || ( $_->{Master_Log_File} lt $latest[0]{Master_Log_File} )
        || ( ( $_->{Master_Log_File} le $latest[0]{Master_Log_File} )
          && $_->{Read_Master_Log_Pos} < $latest[0]{Read_Master_Log_Pos} )
      )
      )
    {
      @latest = ();
      push( @latest, $_ );
    }
    elsif ( ( $_->{Master_Log_File} eq $latest[0]{Master_Log_File} )
      && ( $_->{Read_Master_Log_Pos} == $latest[0]{Read_Master_Log_Pos} ) )
    {
      push( @latest, $_ );
    }
  }
  foreach (@latest) {
    $_->{latest} = 1 if ( !$find_oldest );
    $_->{oldest} = 1 if ($find_oldest);
  }
As the code shows, the decision is based on how far each slave's IO thread has read from the master's binlog (Master_Log_File plus Read_Master_Log_Pos), and each slave is then flagged as latest or oldest accordingly. For example, a slave that has read up to (mysql-bin.000012, 120) is newer than one that has only read up to (mysql-bin.000011, 8000).
Summary:
This step uses the status of all alive slaves to determine which slave is the newest and which is the oldest, in preparation for saving the binlog events missing relative to the dead master. Knowing the latest slave tells us from which position of the dead master's binlog the remaining events must be copied for recovery.
3. Take a latest slave, compare its position with the master being failed over (the dead master), and save the binlog events in between onto the binlog server (usually the MHA monitor server).
The main job of this step is to save the binlog events that exist on the dead master but have not yet reached the latest slave. The implementation:
sub save_master_binlog {
  my $dead_master = shift;
  # if the dead master is SSH reachable and saving its binlog is not skipped
  if ( $_real_ssh_reachable && !$g_skip_save_master_binlog ) {
    # check the version of apply_diff_relay_logs on the node
    MHA::ManagerUtil::check_node_version(
      $log,
      $dead_master->{ssh_user},
      $dead_master->{ssh_host},
      $dead_master->{ssh_ip},
      $dead_master->{ssh_port}
    );

    # the dead master's binlog file that the latest slave has read up to
    my $latest_file =
      ( $_server_manager->get_latest_slaves() )[0]->{Master_Log_File};

    # the dead master's binlog position that the latest slave has read up to
    my $latest_pos =
      ( $_server_manager->get_latest_slaves() )[0]->{Read_Master_Log_Pos};

    # save the binlog events between the latest slave and the dead master
    # onto the binlog server
    save_master_binlog_internal( $latest_file, $latest_pos, $dead_master, );
  }
  else {
    if ($g_skip_save_master_binlog) {
      $log->info("Skipping trying to save dead master's binary log.");
    }
    elsif ( !$_real_ssh_reachable ) {
      $log->warning(
        "Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost."
      );
    }
  }
}
From the code above, this step does the following:

3.1 If the dead master is SSH reachable and saving its binlog is not skipped, check the version of apply_diff_relay_logs (this looks like a bug: the step should presumably check save_binary_logs instead).

3.2 Get the latest slave's (Master_Log_File, Read_Master_Log_Pos) and pass them to save_master_binlog_internal, which saves the binlog events between the latest slave and the dead master onto the binlog server.
The copying of the binlog events that differ between the dead master and the latest slave is handled by save_master_binlog_internal, which essentially runs the save_binary_logs script:

save_binary_logs --command=save --start_file=$master_log_file --start_pos=$read_master_log_pos --binlog_dir=$dead_master->{master_binlog_dir} --output_file=$_diff_binary_log_remote --handle_raw_binlog=$dead_master->{handle_raw_binlog} --disable_log_bin=$dead_master->{disable_log_bin} --manager_version=$MHA::ManagerConst::VERSION
4. From all the slaves, pick a suitable one as the new master (usually a latest slave).
This is the most critical part of the whole failover: choosing which slave becomes the new master. The implementation:
sub select_new_master($$) {
  my $dead_master       = shift;
  my $latest_base_slave = shift;

  my $new_master =
    $_server_manager->select_new_master( $g_new_master_host, $g_new_master_port,
    $latest_base_slave->{check_repl_delay} );
  unless ($new_master) {
    my $msg =
      "None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too far behind.";
    $log->error($msg);
    $mail_body .= $msg . "\n";
    croak;
  }
  $log->info( "New master is " . $new_master->get_hostinfo() );
  $mail_body .=
    "Selected " . $new_master->get_hostinfo() . " as a new master.\n";
  $log->info("Starting master failover..");
  $_server_manager->print_servers_migration_ascii( $dead_master, $new_master );
  if ($g_interactive) {
    $new_master =
      $_server_manager->manually_decide_new_master( $dead_master, $new_master );
    $log->info(
      "New master decided manually is " . $new_master->get_hostinfo() );
  }
  return $new_master;
}
Here we only consider the normal, fully automatic failover. As the code above shows, selecting the new master comes down to the call:

$_server_manager->select_new_master( $g_new_master_host, $g_new_master_port, $latest_base_slave->{check_repl_delay} );

Let's look at how that is implemented:
  my @pref = $self->get_candidate_masters();    # slaves configured as candidate masters

  # The following servers can not be master:
  # - dead servers
  # - Set no_master in conf files (i.e. DR servers)
  # - log_bin is disabled
  # - Major version is not the oldest
  # - too much replication delay
  my @bad =
    $self->get_bad_candidate_masters( $latest[0], $check_replication_delay );

  ...

  $log->info("Searching new master from slaves..");
  $log->info(" Candidate masters from the configuration file:");
  $self->print_servers( \@pref );
  $log->info(" Non-candidate masters:");
  $self->print_servers( \@bad );

  return $latest[0]
    if ( $#pref < 0 && $#bad < 0 && $latest[0]->{latest_priority} );

  if ( $latest[0]->{latest_priority} ) {
    $log->info(
      " Searching from candidate_master slaves which have received the latest relay log events.."
    ) if ( $#pref >= 0 );
    foreach my $h (@latest) {
      foreach my $p (@pref) {
        if ( $h->{id} eq $p->{id} ) {
          return $h
            if ( !$self->get_server_from_by_id( \@bad, $p->{id} ) );
        }
      }
    }
    $log->info(" Not found.") if ( $#pref >= 0 );
  }

  #new master is not latest
  $log->info(" Searching from all candidate_master slaves..")
    if ( $#pref >= 0 );
  foreach my $s (@slaves) {
    foreach my $p (@pref) {
      if ( $s->{id} eq $p->{id} ) {
        my $a = $self->get_server_from_by_id( \@bad, $p->{id} );
        return $s unless ($a);
      }
    }
  }
  $log->info(" Not found.") if ( $#pref >= 0 );

  if ( $latest[0]->{latest_priority} ) {
    $log->info(
      " Searching from all slaves which have received the latest relay log events.."
    );
    foreach my $h (@latest) {
      my $a = $self->get_server_from_by_id( \@bad, $h->{id} );
      return $h unless ($a);
    }
    $log->info(" Not found.");
  }

  # none of latest servers can not be a master
  $log->info(" Searching from all slaves..");
  foreach my $s (@slaves) {
    my $a = $self->get_server_from_by_id( \@bad, $s->{id} );
    return $s unless ($a);
  }
  $log->info(" Not found.");

  return;
}
From the code above, the new master is chosen according to the following rules, tried in order:

4.1 If a slave is configured as a candidate master, is a latest slave, and is not a bad slave, it becomes the new master. Otherwise keep searching.

4.2 If a slave is configured as a candidate master and is not a bad slave, it becomes the new master. Otherwise keep searching.

4.3 Among the latest slaves, pick the first one that is not a bad slave as the new master.

4.4 Among all slaves, pick the first one that is not a bad slave as the new master.
So what counts as a bad slave? Quoting the comment from the source (a simplified sketch of these rules follows below):
# The following servers can not be master:
# - dead servers
# - Set no_master in conf files (i.e. DR servers)
# - log_bin is disabled
# - Major version is not the oldest
# - too much replication delay
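These rules map naturally onto a small filter function. The sketch below is only an illustration of the idea, not MHA's actual get_bad_candidate_masters; the hash keys it reads ({dead}, {no_master}, {log_bin}, {oldest_major_version}, {behind_latest}) are names assumed for the sketch:

# Simplified illustration of the "bad candidate" rules listed above.
# The hash keys used here are assumptions for this sketch, not necessarily
# the attribute names MHA uses internally.
sub is_bad_candidate {
  my ( $slave, $check_repl_delay ) = @_;
  return 1 if ( $slave->{dead} );                    # dead servers
  return 1 if ( $slave->{no_master} );               # no_master=1 in the conf (e.g. DR servers)
  return 1 if ( !$slave->{log_bin} );                # log_bin is disabled
  return 1 if ( !$slave->{oldest_major_version} );   # major version is newer than the oldest slave's
  return 1 if ( $check_repl_delay && $slave->{behind_latest} );   # too much replication delay
  return 0;                                          # otherwise the slave may become the new master
}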
5. Apply the relevant binlog so that the new master catches up to the dead master's position, and record the new master's binlog coordinates at that point.
Given the new master chosen in the previous step, this phase brings it up to the same position as the dead master, preparing it to be activated for application writes and to serve as the root of the rebuilt master-slave topology.
The phase proceeds as follows:
5.1 Copy the relay log events that the new master is missing relative to the latest slave onto the new master.

5.2 On the new master, apply the relay log events by which its IO thread is ahead of its SQL thread.

5.3 Apply the relay log events copied in 5.1 to the new master; it has now reached the same position as the latest slave.

5.4 Apply the binlog events saved from the gap between the latest slave and the dead master to the new master; it has now reached the same state as the dead master.

5.5 Record the new master's binlog coordinates, which will be used to rebuild the master-slave cluster.
The implementation corresponding to the analysis above:
# apply diffs to master and get master status
# We do not reset slave here
sub recover_slave {
  my ( $target, $logger ) = @_;
  $logger = $log unless ($logger);

  $logger->info(
    sprintf(
      "Starting recovery on %s(%s:%d)..",
      $target->{hostname}, $target->{ip}, $target->{port}
    )
  );

  if ( $target->{latest} eq '0' || $_has_saved_binlog ) {
    $logger->info(" Generating diffs succeeded.");
    my ( $high, $low ) = apply_diff( $target, $logger );
    if ( $high ne '0' || $low ne '0' ) {
      $logger->error(" Applying diffs failed with return code $high:$low.");
      return -1;
    }
  }
  else {
    $logger->info(
      " This server has all relay logs. Waiting all logs to be applied.. ");
    my $ret = $target->wait_until_relay_log_applied($logger);
    if ($ret) {
      $logger->error(" Failed with return code $ret");
      return -1;
    }
    $logger->info(" done.");
    $target->stop_sql_thread($logger);
  }
  $logger->info(" All relay logs were successfully applied.");
  return 0;
}
6. Activate the new master (attach the VIP).
Since the new master has now reached the same state as the dead master, it is brought into service first so that the application can recover as quickly as possible. Attaching the VIP is done by calling the master_ip_failover_script script:

$new_master->{master_ip_failover_script} --command=start --ssh_user=$new_master->{ssh_user} --orig_master_host=$dead_master->{hostname} --orig_master_ip=$dead_master->{ip} --orig_master_port=$dead_master->{port} --new_master_host=$new_master->{hostname} --new_master_ip=$new_master->{ip} --new_master_port=$new_master->{port} --new_master_user=$new_master->{escaped_user} --new_master_password=$new_master->{escaped_password}
7. In parallel, bring every slave up to the latest slave's position, apply the binlog saved on the binlog server, and thereby recover all slaves to the same position as the dead master.
This phase rebuilds the master-slave topology. As we saw earlier, the new master recorded its own binlog coordinates once it had caught up with the dead master; all that remains is to bring every slave to that same point and repoint them at the new master.
The logic of this phase is:
7.1 Copy the relay log events that each alive slave is missing relative to the latest slave onto that slave's host.

7.2 Copy the binlog events saved on the binlog server onto each slave's host.

7.3 On each slave, apply the relay log events by which its IO thread is ahead of its SQL thread.

7.4 Apply the relay log events copied in 7.1 (the gap between each slave and the latest slave); at this point every slave has become a latest slave.

7.5 Each slave applies the binlog events copied from the binlog server; all slaves have now recovered to the dead master's state.
The code for this phase is fairly long, so it is not analyzed line by line here; interested readers can look at the recover_slaves function in MasterFailover.pm. A simplified sketch of its overall shape follows.
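The overall shape is the same fork-per-slave pattern used in step 1: one worker per alive slave runs recover_slave (shown in step 5) plus the binlog apply, and any failing worker marks the whole recovery as failed. The sketch below is a simplification that reuses @alive_slaves, recover_slave and $log from the snippets above; it is not the actual recover_slaves code:

use Parallel::ForkManager;

# Simplified sketch of the parallel recovery loop: one forked worker per
# alive slave; any non-zero exit code marks the whole recovery as failed.
my $pm           = Parallel::ForkManager->new( $#alive_slaves + 1 );
my $recover_fail = 0;
$pm->run_on_finish(
  sub {
    my ( $pid, $exit_code, $target ) = @_;
    $recover_fail = 1 if ($exit_code);
  }
);
foreach my $target (@alive_slaves) {
  $pm->start($target) and next;    # parent continues; child falls through
  # child: copy and apply the diff relay logs and the saved binlog on this
  # slave, then report the result through the exit code
  my $rc = recover_slave( $target, $log );
  $pm->finish( $rc ? 1 : 0 );
}
$pm->wait_all_children;
$log->error("Recovering slaves failed.") if ($recover_fail);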
8. Point all slaves at the new master (using the coordinates saved in step 5).

9. Run start slave on every slave, and run reset slave all on the new master. (A sketch of the SQL these two steps boil down to is shown below.)
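In plain SQL terms, steps 8 and 9 amount to a CHANGE MASTER TO using the coordinates saved in step 5, a START SLAVE on each slave, and a RESET SLAVE ALL on the new master. The DBI sketch below is an illustration only, not how MHA issues these statements internally; the replication account, the coordinate values, and $new_master_dbh are placeholders, while @alive_slaves and $new_master reuse the names from the snippets above:

use DBI;

# Illustration only: the SQL that steps 8 and 9 boil down to.
my ( $master_log_file, $master_log_pos ) = ( 'mysql-bin.000010', 107 );   # new master coordinates saved in step 5 (placeholder values)
my ( $repl_user, $repl_password ) = ( 'repl', 'repl_password' );          # replication account (placeholder)

foreach my $slave (@alive_slaves) {
  my $dbh = DBI->connect(
    "DBI:mysql:host=$slave->{ip};port=$slave->{port}",
    $slave->{user}, $slave->{password}, { RaiseError => 1 }
  );
  # step 8: repoint the slave at the new master, at the saved coordinates
  $dbh->do( "CHANGE MASTER TO MASTER_HOST='$new_master->{ip}',"
      . " MASTER_PORT=$new_master->{port},"
      . " MASTER_LOG_FILE='$master_log_file',"
      . " MASTER_LOG_POS=$master_log_pos,"
      . " MASTER_USER='$repl_user', MASTER_PASSWORD='$repl_password'" );
  # step 9: start replication from the new master
  $dbh->do("START SLAVE");
  $dbh->disconnect();
}

# step 9 (continued): the new master itself stops replicating from anyone.
# $new_master_dbh is a DBI handle connected to the new master (placeholder).
$new_master_dbh->do("RESET SLAVE ALL");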
10. Failover complete.
That completes the walkthrough of the whole process. Only the normal failover path is analyzed here; the error-handling branches are numerous, and covering them would make the article far too complicated, so interested readers can study those code paths on their own.
well done...