Monitoring hung (RUNNING) MySQL transactions in Nagios, with alert notifications

Category: Architecture Design and Optimization

2015-04-07 09:01:37

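Preface:
The business side reported that order submission was failing and the page hung without responding. Investigation showed the cause was a transaction that had been neither committed nor rolled back. If the running state of transactions were monitored and alerted on in time, problems like this could be located quickly and handed to operations, so I wrote a shell script and plugged it into Nagios to alert on long-running transactions.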

1. Write the transaction monitoring script
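#!/bin/bash
# author: tim.man
# version: 1.0
# desc: check for RUNNING transactions that have been open too long

PROGNAME=$(basename "$0")

# Nagios plugin exit codes
ST_OK=0
ST_WR=1
ST_CR=2
ST_UK=3

# Only transactions open for more than this many seconds are considered
TIME_TRX=10

# Print usage information
print_help() {
    echo "$PROGNAME -w INT -c INT"
    echo "Options:"
    echo "  -w/--warning)"
    echo "     Sets the warning threshold in seconds"
    echo "  -c/--critical)"
    echo "     Sets the critical threshold in seconds"
    exit $ST_UK
}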

# Parse command-line arguments (print_help exits with $ST_UK)
while test -n "$1"; do
    case "$1" in
        --help|-h)
            print_help
            ;;
        --warning|-w)
            warning=$2
            shift
            ;;
        --critical|-c)
            critical=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_help
            ;;
    esac
    shift
done

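# Record which thresholds were supplied and whether they are consistent
get_wcdiff() {
    if [ -n "$warning" ] && [ -n "$critical" ]
    then
        wclvls=1

        # The warning threshold must be strictly lower than the critical one
        if [ "$warning" -ge "$critical" ]
        then
            wcdiff=1
        fi
    elif [ -n "$warning" ] && [ -z "$critical" ]
    then
        wcdiff=2
    elif [ -z "$warning" ] && [ -n "$critical" ]
    then
        wcdiff=3
    fi
}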

# Validate the thresholds and exit with UNKNOWN if they are unusable
val_wcdiff() {
    if [ "$wcdiff" = 1 ]
    then
        echo "Please adjust your warning/critical thresholds. The warning must be lower than the critical level!"
        exit $ST_UK
    elif [ "$wcdiff" = 2 ]
    then
        echo "Please also set a critical value when you want to use warning/critical thresholds!"
        exit $ST_UK
    elif [ "$wcdiff" = 3 ]
    then
        echo "Please also set a warning value when you want to use warning/critical thresholds!"
        exit $ST_UK
    fi
}

get_wcdiff
val_wcdiff

# Longest running time (in seconds) among the transactions open for more than $TIME_TRX seconds
max_over_time=$(/usr/local/mysql/bin/mysql --user=nagios --password="nagiosq@xxx" -NS /usr/local/mysql/mysql.sock -e "SELECT TIME_TO_SEC(TIMEDIFF(NOW(),t.trx_started)) FROM information_schema.INNODB_TRX t WHERE TIME_TO_SEC(TIMEDIFF(NOW(),t.trx_started))>$TIME_TRX ORDER BY TIME_TO_SEC(TIMEDIFF(NOW(),t.trx_started)) DESC LIMIT 1;" | awk '{print $1}')

# If there is no such RUNNING transaction, default to 0 so the comparisons below do not fail
if [ -z "$max_over_time" ]; then
    max_over_time=0
fi

# Number of transactions that have been open for more than $TIME_TRX seconds
num_trx=$(/usr/local/mysql/bin/mysql --user=nagios --password="nagiosq@xxx" -NS /usr/local/mysql/mysql.sock -e "SELECT COUNT(1) FROM information_schema.INNODB_TRX t WHERE TIME_TO_SEC(TIMEDIFF(NOW(),t.trx_started))>$TIME_TRX;" | awk '{print $1}')

# Compare against the thresholds; the critical test comes first so that a value
# equal to or above the critical threshold is never reported as OK
if [ -n "$warning" ] && [ -n "$critical" ]
then
    if [ "$max_over_time" -ge "$critical" ]
    then
        echo "CRITICAL- $num_trx TRANSACTIONS RUNNING,go over for $max_over_time seconds"
        exit $ST_CR
    elif [ "$max_over_time" -ge "$warning" ]
    then
        echo "WARNING - $num_trx TRANSACTIONS RUNNING,go over for $max_over_time seconds"
        exit $ST_WR
    else
        echo "OK- TRANSACTIONS RAN successfully."
        exit $ST_OK
    fi
fi

# No thresholds were given, so there is nothing to evaluate
print_help
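The threshold comparison at the end of the script can be exercised without a MySQL server if it is pulled out into a function. The following is a minimal sketch and not part of the original script: the function name classify_trx_age is hypothetical, it assumes all three arguments are non-negative integers, and it treats a value equal to a threshold as having reached it.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): map the age of the
# longest-running transaction to a Nagios state.
# Usage: classify_trx_age AGE_SECONDS WARNING CRITICAL
# Prints the state name and returns the matching plugin exit code.
classify_trx_age() {
    local age=$1 warn=$2 crit=$3

    if [ "$age" -ge "$crit" ]; then
        echo "CRITICAL"
        return 2
    elif [ "$age" -ge "$warn" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

# Example: thresholds of 30 and 60 seconds, the values used later in nrpe.cfg
for age in 0 45 60 120; do
    state=$(classify_trx_age "$age" 30 60) && rc=0 || rc=$?
    echo "age=${age}s -> $state (exit code $rc)"
done
```

Checking the critical threshold first keeps the two branches from overlapping, so no extra upper-bound test is needed on the warning branch.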

2. Add the script to the Nagios client
Test the script first:
[root@wgq_idc_dbm_3_61 binlog]# /usr/local/nagios/libexec/check_trx -w 30 -c 60
Warning: Using a password on the command line interface can be insecure.
Warning: Using a password on the command line interface can be insecure.
OK- TRANSACTIONS RAN successfully.
[root@wgq_idc_dbm_3_61 binlog]#
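The two warnings in the output above come from passing --password on the command line, once for each mysql call in the script. One way to avoid them is to keep the credentials in an option file that only the monitoring user can read and to pass it with --defaults-extra-file, which the mysql client expects as its first option. The sketch below only writes such a file; the function name write_client_cnf and the use of a temporary file are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: write a MySQL client option file readable only by its owner.
# Usage: write_client_cnf FILE USER PASSWORD
write_client_cnf() {
    local file=$1 user=$2 password=$3

    # Create (or truncate) the file with restrictive permissions before writing the secret
    ( umask 077 && : > "$file" ) || return 1
    chmod 600 "$file" || return 1

    cat > "$file" <<EOF
[client]
user=$user
password="$password"
EOF
}

# Example with a throwaway file; a real deployment would pick a fixed path
cnf=$(mktemp)
write_client_cnf "$cnf" nagios 'nagiosq@xxx'
echo "wrote $cnf"

# The queries in check_trx could then be run like this (not executed here):
#   /usr/local/mysql/bin/mysql --defaults-extra-file="$cnf" -NS /usr/local/mysql/mysql.sock -e "SELECT 1;"
```

With the credentials in the file, the --user and --password options can be dropped from both mysql calls.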
Add the check command to nrpe.cfg:
[root@wgq_idc_dbm_3_61 binlog]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_mysql_trx]=/usr/local/nagios/libexec/check_trx -w 30 -c 60
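Before restarting NRPE it can be worth confirming that the new entry is present and spelled as expected. The following is a small sketch that assumes the `command[name]=command line` syntax shown above; the function name nrpe_command_line is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the command line that an nrpe.cfg file assigns
# to a given check name, or return 1 if the check is not defined.
# Usage: nrpe_command_line CONFIG_FILE CHECK_NAME
nrpe_command_line() {
    local cfg=$1 name=$2 line

    # Match "command[NAME]=" literally and ignore commented-out lines
    line=$(grep -F "command[$name]=" "$cfg" | grep -v '^[[:space:]]*#' | head -n 1)
    [ -n "$line" ] || return 1

    # Everything after the first "=" is the command line NRPE will run
    echo "${line#*=}"
}

# Example against a throwaway copy of the relevant lines
cfg=$(mktemp)
printf '%s\n' \
    '#command[check_mysql_trx]=/bin/false' \
    'command[check_mysql_trx]=/usr/local/nagios/libexec/check_trx -w 30 -c 60' > "$cfg"
nrpe_command_line "$cfg" check_mysql_trx
```

On the client itself the call would be `nrpe_command_line /usr/local/nagios/etc/nrpe.cfg check_mysql_trx`.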

3. Restart NRPE on the client
service nrpe restart

4. Add the configuration on the main Nagios monitoring server
First run the check by hand from the Nagios server:
[root@localhost etc]# /usr/local/nagios/libexec/check_nrpe -H10.254.3.61 -c check_mysql_trx
OK- TRANSACTIONS RAN successfully.
[root@localhost etc]#
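Nagios derives the service state from the plugin's exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which is what the ST_* constants in the script encode; the text on standard output is only displayed. When testing by hand it helps to see that status next to the message. A minimal sketch follows; both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for testing a Nagios plugin by hand.

# Translate a plugin exit status into the corresponding Nagios state name.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT OF RANGE" ;;   # statuses outside 0-3 are not valid plugin results
    esac
}

# Run a check command, then print its output followed by its status and state name.
run_check() {
    local output rc
    output=$("$@" 2>&1) && rc=0 || rc=$?
    echo "$output"
    echo "exit status: $rc ($(nagios_state_name "$rc"))"
}

# Example with a stand-in command that behaves like a plugin in the CRITICAL state
run_check sh -c 'echo "CRITICAL- 2 TRANSACTIONS RUNNING,go over for 75 seconds"; exit 2'
```

Against the real check this would be `run_check /usr/local/nagios/libexec/check_nrpe -H 10.254.3.61 -c check_mysql_trx`.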

Add the transaction monitoring service to services.cfg:
define service{
    host_name               mysqlserver
    service_description     Check mysql transactions
    check_command           check_nrpe!check_mysql_trx
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          opsweb
}
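The three timing values in this definition determine how quickly a stuck transaction turns into a notification. Assuming the default Nagios interval_length of 60 seconds (so the intervals are minutes), a problem found by a regular check is rechecked every retry_check_interval until max_check_attempts consecutive results have been non-OK; only then does the state become hard and notifications go out. The arithmetic, as a rough sketch that ignores scheduling latency (the two result variable names are made up for this example):

```shell
#!/bin/bash
# Values taken from the service definition above
max_check_attempts=5
normal_check_interval=3   # minutes between regular checks
retry_check_interval=2    # minutes between rechecks after a non-OK result

# The first non-OK result counts as attempt 1, so (max_check_attempts - 1) rechecks follow
confirm_minutes=$(( (max_check_attempts - 1) * retry_check_interval ))

# Worst case adds the wait until the next regular check notices the problem
worst_case_minutes=$(( normal_check_interval + confirm_minutes ))

echo "first non-OK result to hard state: ${confirm_minutes} min"
echo "worst case from problem start to hard state: ${worst_case_minutes} min"
```

Lowering max_check_attempts or retry_check_interval shortens this delay at the cost of more alerts for short-lived spikes.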

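Add the transaction monitoring command to commands.cnf:
# added by tim.man on 20141201
define command{
    command_name check_mysql_trx
    command_line $USER1$/check_mysql_trx -w $ARG1$ -c $ARG2$
}
Note that the service above calls check_nrpe!check_mysql_trx, so the check runs on the client through NRPE and the thresholds in effect are the -w 30 -c 60 set in nrpe.cfg.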

Email, SMS, and phone alerts are already configured, so nothing needs to be set up again.

Then reload Nagios:
[root@localhost objects]# service nagios reload
Running configuration check...
Reloading nagios configuration...
done
[root@localhost objects]#

5. Check the result in the main Nagios monitoring interface
Normal state: [screenshot]

Critical state: [screenshot]

----------------------------------------------------------------------------------------------------------------
Original post: http://blog.itpub.net/26230597/viewspace-1355720/
Original author: 黄杉 (mchdba)
----------------------------------------------------------------------------------------------------------------
