nagios的插件监控能
(2007-09-11 14:20)
分类: Nagios
nagios本身并没有监控的功能,所有的监控是由插件完成的,插件将监控的结果返回给nagios,nagios分析这些结果,以web的方式展现给我们,同时提供相应的报警功能(这个报警的功能也是由插件完成的)
Usage: check_disk -w limit -c limit [-p path | -x device] [-t timeout][-m] [-e] [-W limit] [-K limit] [-v] [-q] [-E]
所有的这些插件是一些实现特定功能的可执行程序,默认安装的路径是/usr/local/nagios/libexec,可以查看
例如,我们查看check_disk这个插件的用法则可以使用check_disk –h
[root@xyfxh libexec]# ./check_disk --help
check_disk (nagios-plugins 1.4.9) 1.91
Copyright (c) 1999 Ethan Galstad <nagios@nagios.org>
Copyright (c) 1999-2006 Nagios Plugin Development Team
<nagiosplug-devel@lists.sourceforge.net>
check_disk (nagios-plugins 1.4.9) 1.91
Copyright (c) 1999 Ethan Galstad <nagios@nagios.org>
Copyright (c) 1999-2006 Nagios Plugin Development Team
<nagiosplug-devel@lists.sourceforge.net>
This plugin checks the amount of used disk space on a mounted file system
and generates an alert if free space is less than one of the threshold values
and generates an alert if free space is less than one of the threshold values
Usage: check_disk -w limit -c limit [-p path | -x device] [-t timeout][-m] [-e] [-W limit] [-K limit] [-v] [-q] [-E]
Options:
-h, --help
Print detailed help screen
-V, --version
Print version information
-w, --warning=INTEGER
Exit with WARNING status if less than INTEGER units of disk are free
-w, --warning=PERCENT%
Exit with WARNING status if less than PERCENT of disk space is free
-W, --iwarning=PERCENT%
Exit with WARNING status if less than PERCENT of inode space is free
-K, --icritical=PERCENT%
Exit with CRITICAL status if less than PERCENT of inode space is free
-c, --critical=INTEGER
Exit with CRITICAL status if less than INTEGER units of disk are free
-c, --critical=PERCENT%
Exit with CRITCAL status if less than PERCENT of disk space is free
-C, --clear
Clear thresholds
-u, --units=STRING
Choose bytes, kB, MB, GB, TB (default: MB)
-k, --kilobytes
Same as '--units kB'
-m, --megabytes
Same as '--units MB'
-l, --local
Only check local filesystems
-p, --path=PATH, --partition=PARTITION
Path or partition (may be repeated)
-r, --ereg-path=PATH, --ereg-partition=PARTITION
Regular expression for path or partition (may be repeated)
-R, --eregi-path=PATH, --eregi-partition=PARTITION
Case insensitive regular expression for path/partition (may be repeated)
-g, --group=NAME
Group pathes. Thresholds apply to (free-)space of all partitions together
-x, --exclude_device=PATH <STRING>
Ignore device (only works if -p unspecified)
-X, --exclude-type=TYPE <STRING>
Ignore all filesystems of indicated type (may be repeated)
-M, --mountpoint
Display the mountpoint instead of the partition
-E, --exact-match
For paths or partitions specified with -p, only check for exact paths
-e, --errors-only
Display only devices/mountpoints with errors
-t, --timeout=INTEGER
Seconds before connection times out (default: 10)
-v, --verbose
Show details for command-line debugging (Nagios may truncate output)
-h, --help
Print detailed help screen
-V, --version
Print version information
-w, --warning=INTEGER
Exit with WARNING status if less than INTEGER units of disk are free
-w, --warning=PERCENT%
Exit with WARNING status if less than PERCENT of disk space is free
-W, --iwarning=PERCENT%
Exit with WARNING status if less than PERCENT of inode space is free
-K, --icritical=PERCENT%
Exit with CRITICAL status if less than PERCENT of inode space is free
-c, --critical=INTEGER
Exit with CRITICAL status if less than INTEGER units of disk are free
-c, --critical=PERCENT%
Exit with CRITCAL status if less than PERCENT of disk space is free
-C, --clear
Clear thresholds
-u, --units=STRING
Choose bytes, kB, MB, GB, TB (default: MB)
-k, --kilobytes
Same as '--units kB'
-m, --megabytes
Same as '--units MB'
-l, --local
Only check local filesystems
-p, --path=PATH, --partition=PARTITION
Path or partition (may be repeated)
-r, --ereg-path=PATH, --ereg-partition=PARTITION
Regular expression for path or partition (may be repeated)
-R, --eregi-path=PATH, --eregi-partition=PARTITION
Case insensitive regular expression for path/partition (may be repeated)
-g, --group=NAME
Group pathes. Thresholds apply to (free-)space of all partitions together
-x, --exclude_device=PATH <STRING>
Ignore device (only works if -p unspecified)
-X, --exclude-type=TYPE <STRING>
Ignore all filesystems of indicated type (may be repeated)
-M, --mountpoint
Display the mountpoint instead of the partition
-E, --exact-match
For paths or partitions specified with -p, only check for exact paths
-e, --errors-only
Display only devices/mountpoints with errors
-t, --timeout=INTEGER
Seconds before connection times out (default: 10)
-v, --verbose
Show details for command-line debugging (Nagios may truncate output)
Examples:
check_disk -w 10% -c 5% -p /tmp -p /var -C -w 100000 -c 50000 -p /
Checks /tmp and /var at 10% and 5%, and / at 100MB and 50MB
check_disk -w 10% -c 5% -p /tmp -p /var -C -w 100000 -c 50000 -p /
Checks /tmp and /var at 10% and 5%, and / at 100MB and 50MB
Send email to nagios-users@lists.sourceforge.net if you have questions
regarding use of this software. To submit patches or suggest improvements,
send email to nagiosplug-devel@lists.sourceforge.net
regarding use of this software. To submit patches or suggest improvements,
send email to nagiosplug-devel@lists.sourceforge.net
我现在来独立执行它,例如查看根分区的使用情况,执行
[root@server1 libexec]# ./check_disk -w 10% -c 5% /
命令的含义是检查分区/的使用情况,若剩余10%以下,为警告状态(warning),5%以下为严重状态(critical),
执行后我们会看到下面这条信息
[root@xyfxh libexec]# ./check_disk -w 10% -c 5% /
DISK WARNING - free space: / 505 MB (10% inode=94%);| /=4162MB;4426;4672;0;4918
说明当前是warning的状态,空闲空间只有10%了.如果nagios收到这些状态结果就会采取报警等措施了
commands.cfg中定义的监控命令就是使用的这些插件.举个例子,之前我们已经不止一次用到了check-host-alive这个命令,打开commands.cfg就可以看到这个命令的定义,如下:
################################################################################
#
# SAMPLE HOST CHECK COMMANDS
#
################################################################################
# This command checks to see if a host is "alive" by pinging it
# The check must result in a 100% packet loss or 5 second (5000ms) round trip
# average time to produce a critical error.
# Note: Only one ICMP echo packet is sent (determined by the '-p 1' argument)
# 'check-host-alive' command definition
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
}
command_name check-host-alive
这句话的意思是定义的命令名是check-host-alive,也就是我们在services.cfg中使用的名称
执行的操作是
$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
其中$USER1$是在resource.cfg文件中定义的,代表插件的安装路径.就如我们上面看到的那样$USER1$=/usr/local/nagios/libexec,至于$HOSTADDRESS$,则默认被定义为监控主机的地址.
简单的说,我们在services.cfg中定义了对dbpi执行check-host-alive命令,实际上就是执行了
/usr/local/nagios/libexec/ check_ping -H dbpi的ip地址 -w 3000.0,80% -c 5000.0,100% -p 1
实际上check-host-alive只是这一长串命令的简称而已,而在services.cfg中都是使用简称的.
在commands.cfg中定义了很多这样的命令简称.基本上我们常用的监控项目都包含了,例如ftp,http,本地的磁盘,负载等等.
我们再看一个命令,check_local_disk定义如下
# 'check_local_disk' command definition
define command{
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
check_local_disk实际上是执行的check_disk插件.这里的$ARG1$, $ARG2$, $ARG3$是什么意思呢?在之前我们已经提到了这个check_disk这个插件的用法,-w的参数指定磁盘剩了多少是警告状态,-c的参数指定剩多少是严重状态,-p用来指定路径.
在使用check-host-alive的时候,只需要在services.cfg中直接写上这个命令名check-host-alive.后面没任何的参数.而使用check_local_disk则不同,在services.cfg中这要这么写
check_local_disk!10%!5%!/
在命令名后面用!分隔出了3个参数,10%是$ARG1$的值,5%是$ARG2$的值,/ 是$ARG3$的值
services.cfg定义监控项目用某个命令
↓
这个命令必须在commands.cfg中定义
↓
定义这个命令时使用了libexec下的插件
如果命令不带$ARG1$就可以在services.cfg中直接使用,如果带了使用时就带上参数,以!相隔
1).监控nagios-server的ftp
编辑services.cfg 增加下面的内容,基本上就是copy上节我们定义监控主机存活的代码.略做修改.
define service{
host_name nagios-server
要监控的机器,给出机器名,注意必须是hosts.cfg中定义的
service_description check ftp
给这个监控项目起个名字吧,任意起,你自己懂就行
check_command check_ftp
所用的命令,当然必须是commands.cfg中定义了的
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
2).监控dbpi的ssh
define service{
host_name dbpi
意义同上
service_description check-ssh
意义同上
check_command check_tcp!22
ssh所用的tcp的22号端口,我就用commands中定义的check_tcp命令.至于!22的意思不用我说了吧.
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
3).监控yahoon的IIS
define service{
host_name yahoon
service_description check-http
check_command check_http
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
4).监控nagios-sever的根分区的使用情况.
define service{
host_name nagios-server
service_description check disk
check_command check_local_disk!10%!5%!/
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}


