1.背景
mysql的replication运行过程中有时会因为网络故障等一些原因,主从停止同步。但此时,show slave status显示IO/SQL线程都是正常的。常规的check_mysql监控方式根本发现不了问题。
2. 解决方案
利用replication中主从的日志位置差判断是否同步。
Master端: show master status中的position 显然,这是master端写binlog的位置
Slave端: show slave stuatus中的exec_master_log_pos 是slave端sql线程读取、执行的位置
这样如果IO/SQL线程不正常,或者同步有异常都能检测出来。
3. 实施
自己写了个shell脚本对比两个值监控nagios同步情况。后来,偶然在网上发现一个perl脚本,写的很不错,和大家共享。
#!/usr/bin/perl -w
use strict;
use Getopt::Long;
use DBI;
my $options = {
'master' =>
'server1',
'slave' =>
'server2',
'crit' => 1000000000,
'warn' => 10000000 ,
'dbuser' =>
'repl_test',
'dbpass' =>
'secret',
'mport' =>
'3306',
'sport' =>
'3306'};
GetOptions($options,
"master=s",
"slave=s",
"dbuser=s",
"dbpass=s",
"crit=i",
"warn=i",
"mport=i",
"sport=i",
"help");
my $max_binlog;
if (defined $options->{
'help'}) {
print <
$0: check replication between mysql databases
check_replication.pl [ --master ] [ --slave ]
[ --crit ] [ --warn ] [ --dbuser ]
[ --dbpass ]
--master - MySQL instance running as a master server
--mport - Port on the master server to connect to
--slave - MySQL instance running as a slave server
--sport - Port on the slave server to connect to
--crit - Number of binlog positions for critical state
--warn - Number of binlog positions for warning state
--dbuser - Username with File and Process privs to check status
--dbpass - Password for above user
--help - This help page
The user that is testing must be the same on all instances, eg:
GRANT File, Process on *.* TO repl_test\@192.168.0.% IDENTIFIED BY
Note: Any mysqldump tables (for backups) may lock large tables for a long
time. If you dump from your slave for this, then your master will gallop
away from your slave, and the difference will become large. The trick is to
set crit above this differnce and warn below.
(c) 2004 Fotango. James Bromberger .
FOO
exit;
}
sub get_status {
my $host = shift;
my $role = shift;
my $port = "";
if (lc ($role) eq 'master') {
$port = $options->{'mport'};
}
if (lc ($role) eq 'slave') {
$port = $options->{'sport'};
}
my $dbh = DBI->connect("DBI:mysql:host=$host;port=$port", $options->{'dbuser'},
$options->{'dbpass'});
if (not $dbh) {
print "UNKNOWN: cannot connect to $host";
exit 3;
}
if (lc ($role) eq 'master') {
my $sql1 = "show variables like 'max_binlog_size'";
my $sth1 = $dbh->prepare($sql1);
my $res1 = $sth1->execute;
my $ref1 = $sth1->fetchrow_hashref;
$max_binlog = $ref1->{'Value'};
}
my $sql = sprintf "SHOW %s STATUS", $role;
my $sth = $dbh->prepare($sql);
my $res = $sth->execute;
if (not $res) {
die "No results";
}
my $ref = $sth->fetchrow_hashref;
$sth->finish;
#print "$host:\n";
#print join (', ', map { sprintf " %s: %s", $_, $ref->{$_} } keys %{$ref}) . "\n";
$dbh->disconnect;
return $ref;
}
sub compare_status {
my ($a, $b) = @_;
my ($master, $slave);
if (defined($a->{'File'})) {
$master = $a;
$slave = $b;
} elsif (defined($b->{'File'})) {
$master = $b;
$slave = $a;
}
$master->{'File_No'} = $1 if ($master->{'File'} =~ /(\d+)$/);
$slave->{'File_No'} = $1 if ($slave->{'Master_Log_File'} =~ /(\d+)$/);
my $diff = ($master->{'File_No'} - $slave->{'File_No'}) * $max_binlog;
#printf "Master: %d Slave: %d\n", $master->{'Position'}, $slave->{'Pos'};
$diff+= $master->{'Position'} - $slave->{'Exec_Master_Log_Pos'};
my $state = sprintf "Master: %d/%d Slave: %d/%d Diff: %d/%d\n",
$master->{'File_No'}, $master->{'Position'}, $slave->{'File_No'},
$slave->{'Exec_Master_Log_Pos'}, ($diff/$max_binlog), ($diff % $max_binlog);
if ($diff >= $options->{'crit'}) {
print "CRITICAL: $state";
exit 2;
} elsif ($diff >= $options->{'warn'}) {
print "WARN: $state";
exit 1;
}
print "OK: $state";
exit 0;
}
compare_status(get_status($options->{'slave'}, 'slave'), get_status($options->{'master'}, 'master'));
Have fun!
Refs:
2.
阅读(1110) | 评论(1) | 转发(0) |