Nagios Configuration Explained - leigaiting - ChinaUnix Blog

Nagios Configuration Explained

Category: System Operations and Maintenance

2009-10-09 13:06:11

Host definition configuration file

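define host{
     host_name host_name                       ; short name of the host
     alias alias                               ; alias; a longer, more descriptive name for the host
     address address                           ; IP address. If you trust your DNS enough you can use a hostname instead. If this value is not defined, Nagios uses host_name to find the host.
     parents host_names                        ; name(s) of the parent node(s), i.e. the nodes between the Nagios server and the monitored host, such as routers, switches, or other hosts. Each parent must itself be defined and monitored by Nagios.
     hostgroups hostgroup_names                ; short name(s) of the host group(s) this host belongs to
     check_command command_name                ; short name of the check command. If left blank, Nagios will not check whether the host is alive.
     max_check_attempts integer                ; number of times the check is retried when the check command returns a state other than OK
     check_interval number                     ; interval between regular checks, in units of interval_length
     active_checks_enabled [0/1]               ; whether active checks are enabled
     passive_checks_enabled [0/1]              ; whether passive checks are enabled
     check_period timeperiod_name              ; short name of the time period during which checks run. This is only a name; the time period itself must be defined elsewhere in the configuration.
     obsess_over_host [0/1]                    ; whether Nagios "obsesses" over checks of this host by running the ochp_command after each one (used for distributed monitoring)
     check_freshness [0/1]                     ; whether freshness checking is enabled. It applies to hosts monitored passively: Nagios periodically looks at the age of the last reported state and, if it is stale, forces an active check of the host.
     freshness_threshold number                ; freshness threshold in seconds. If set to 0, Nagios determines the threshold automatically.
     event_handler command_name                ; short name of the command run when the host changes state (it can be defined in commands.cfg)
     event_handler_enabled [0/1]               ; whether the event handler is enabled
     low_flap_threshold number                 ; lower flap threshold. Flapping means that the state of a host (or service) changes frequently within a short period, much like a storm of problems or an unstable network.
     high_flap_threshold number                ; upper flap threshold
     flap_detection_enabled [0/1]              ; whether flap detection is enabled
     process_perf_data [0/1]                   ; whether processing of performance data is enabled
     retain_status_information [0/1]           ; whether status-related information about the host is retained across program restarts
     retain_nonstatus_information [0/1]        ; whether non-status information about the host is retained across program restarts
     contact_groups contact_groups             ; contact group(s), defined in contactgroup.cfg. Every contact in the group receives notifications for this host.
     notification_interval integer             ; number of time units (interval_length) to wait before re-sending a notification while the host remains in a problem state. 0 means the notification is sent only once.
     notification_period timeperiod_name       ; time period during which notifications for this host may be sent
     notification_options [d,u,r,f]            ; states that trigger a notification: d = DOWN, u = UNREACHABLE, r = recovery, f = flapping start/stop
     notifications_enabled [0/1]               ; whether notifications are enabled for this host: 1 enables, 0 disables. The main configuration file (nagios.cfg) has a program-wide switch, enable_notifications, that turns notifications on or off for everything.
     stalking_options [o,d,u]                  ; stalking options: o = stalk UP states, d = stalk DOWN states, u = stalk UNREACHABLE states
}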
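The check_period and notification_period directives above only name a time period; the period itself is a separate object. A minimal sketch of such a definition follows. It uses the name 24x7 that the later examples in this post refer to; the alias text and the file it lives in (for example a timeperiods.cfg) are assumptions, not something the original post shows.

```
# Sketch of a time period covering every hour of every day.
# The alias text and the file name are illustrative assumptions.
define timeperiod {
    timeperiod_name 24x7
    alias           24 Hours A Day, 7 Days A Week
    sunday          00:00-24:00
    monday          00:00-24:00
    tuesday         00:00-24:00
    wednesday       00:00-24:00
    thursday        00:00-24:00
    friday          00:00-24:00
    saturday        00:00-24:00
}
```

Any host or service that sets check_period 24x7 or notification_period 24x7 is then checked, or allowed to notify, around the clock.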

Service monitoring configuration

define service {
host_name host_name
service_description service_description
servicegroups servicegroup_names
is_volatile [0/1]
check_command command_name
max_check_attempts
normal_check_interval
retry_check_interval
active_checks_enabled [0/1]
passive_checks_enabled [0/1]
check_period timeperiod_name
parallelize_check [0/1]
obsess_over_service [0/1]
check_freshness [0/1]
freshness_threshold
event_handler command_name
event_handler_enabled [0/1]
low_flap_threshold
high_flap_threshold
flap_detection_enabled [0/1]
process_perf_data [0/1]
retain_status_information [0/1]
retain_nonstatus_information [0/1]
notification_interval
notification_period timeperiod_name
notification_options [w,u,c,r,f]
notifications_enabled [0/1]
contact_groups contact_groups
stalking_options [o,w,u,c]
}

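Service definitions are very similar to host definitions, so the directives are not explained one by one here.

Intervals are calculated as follows, where interval_length is set in nagios.cfg and defaults to 60:
normal_check_interval x interval_length seconds
retry_check_interval x interval_length seconds
notification_interval x interval_length seconds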
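As a worked example of this arithmetic, here is a sketch of the relevant nagios.cfg line with a few intervals computed in comments. The interval values 5, 1, and 120 are made up for illustration and do not come from the original post.

```
# nagios.cfg (main configuration file)
# Number of seconds in one "time unit"; 60 is the usual default.
interval_length=60

# With interval_length=60, hypothetical service settings work out as:
#   normal_check_interval 5    ->    5 x 60 =  300 seconds (5 minutes)
#   retry_check_interval  1    ->    1 x 60 =   60 seconds (1 minute)
#   notification_interval 120  ->  120 x 60 = 7200 seconds (2 hours)
#
# With interval_length=1 the same numbers would simply be seconds:
#   normal_check_interval 5    ->    5 x 1  =    5 seconds
```

The same interval number can therefore mean very different wall-clock times depending on interval_length, so it is worth checking that setting before reading any interval in an object definition.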

Example host monitoring configuration

define host {
host_name web1
alias web1
address 192.168.0.101
contact_groups admins
check_command check-host-alive
max_check_attempts 5
notification_interval 0
notification_period 24x7
notification_options d,u,r
}

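Host web1 is monitored 24x7. By default its state is checked every 10 seconds; after five consecutive failed checks (max_check_attempts 5) a notification is sent, and because notification_interval is 0 the notification is not repeated.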
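The parents directive from the host template can be added to this example. The sketch below assumes a hypothetical router named router1 at 192.168.0.1 sitting between the Nagios server and web1; neither the router nor its address appears in the original post.

```
# Hypothetical upstream router; name and address are assumptions.
define host {
    host_name               router1
    alias                   router1
    address                 192.168.0.1
    contact_groups          admins
    check_command           check-host-alive
    max_check_attempts      5
    notification_interval   0
    notification_period     24x7
    notification_options    d,u,r
}

# The web1 host from the example, now declared as a child of router1.
define host {
    host_name               web1
    alias                   web1
    address                 192.168.0.101
    parents                 router1        ; web1 is reached through router1
    contact_groups          admins
    check_command           check-host-alive
    max_check_attempts      5
    notification_interval   0
    notification_period     24x7
    notification_options    d,u,r
}
```

With this relationship in place, Nagios can report web1 as UNREACHABLE rather than DOWN when router1 itself is down, which is what the u in notification_options refers to.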

Example service monitoring configuration

define service {
host_name web1
service_description check_http
check_period 24x7
max_check_attempts 3
normal_check_interval 30
contact_groups admins
retry_check_interval 15
notification_interval 3600
notification_period 24x7
notification_options w,u,c,r
check_command check_http
}

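Explanation: the HTTP service on host web1 is monitored 24x7. It is checked every 30 time units; after a failed check it is rechecked every 15 time units, and after three consecutive failures it is treated as a real fault and a notification is sent.
The contact group is admins. After the notification, checking returns to the normal_check_interval of 30. If the service has still not recovered, the notification is repeated every 3600 time units. Reading these values as 30 seconds, 15 seconds, and one hour is correct only when interval_length=1 in nagios.cfg; with the default interval_length=60 they mean 30 minutes, 15 minutes, and 60 hours.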
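The check_command value in the service definition is only the name of a command object. A minimal sketch of how such a command might be defined is shown below. The plugin directory is an assumption based on the /usr/local/nagios prefix used in this post, and the exact command_line in your installation may carry additional arguments.

```
# resource.cfg: $USER1$ conventionally points at the plugin directory.
# The path is an assumption; adjust it to your installation.
$USER1$=/usr/local/nagios/libexec

# commands.cfg: maps the short name check_http to the plugin call.
# -I passes the host's address from the host definition to the plugin.
define command {
    command_name    check_http
    command_line    $USER1$/check_http -I $HOSTADDRESS$
}
```

When the service is checked, Nagios replaces $HOSTADDRESS$ with the address of web1 (192.168.0.101 in the example) and runs the resulting command line.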

To monitor a different service, for example to check whether the ssh service is running, change the following two lines:
service_description check_ssh
check_command check_ssh
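Applied to the service example above, the two changed lines give the following definition. This is just the post's own HTTP example with those lines swapped; it assumes a command object named check_ssh is defined in your commands file.

```
# SSH variant of the web1 service example; only service_description
# and check_command differ from the HTTP definition.
define service {
    host_name               web1
    service_description     check_ssh
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   30
    contact_groups          admins
    retry_check_interval    15
    notification_interval   3600
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_ssh    ; assumed to be defined in commands.cfg
}
```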

To make the configuration easier to manage, the files were reorganized as follows.
The following lines were added to nagios.cfg:
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

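In the hosts directory, a configuration file was created for each type of host, for example: app.cfg cache.cfg mysql.cfg web.cfg
A hostgroup.cfg file was also created to group the hosts, for example: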

define hostgroup {
hostgroup_name app-hosts
alias APP Hosts
members app1,app2
}
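Group membership can also be declared from the host side with the hostgroups directive listed in the host template. The sketch below adds a hypothetical host app3 to the app-hosts group this way; the host name and address are assumptions made for illustration.

```
# hosts/app.cfg: hypothetical extra application host.
# It joins app-hosts through its own hostgroups directive, so the
# members line in hostgroup.cfg does not need to be edited.
define host {
    host_name               app3
    alias                   app3
    address                 192.168.0.113    ; assumed address
    hostgroups              app-hosts
    contact_groups          admins
    check_command           check-host-alive
    max_check_attempts      5
    notification_interval   0
    notification_period     24x7
    notification_options    d,u,r
}
```

This keeps everything about a host in one file, which fits the per-type file layout described above.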

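In the services directory, a configuration file was created for each kind of service, for example: disk.cfg http.cfg load.cfg mysql.cfg
A servicegroup.cfg file was also created to group the services. Its members directive is a comma-separated list of host,service pairs, for example: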

define servicegroup {
servicegroup_name disk
alias DISK
members cache1,check_disk,cache2,check_disk
}
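The members line above reads as two pairs: service check_disk on host cache1, and service check_disk on host cache2. The same membership can be expressed from the service side with the servicegroups directive from the service template. The following is a sketch only: the interval values are copied from the HTTP example in this post, and the check_disk command is assumed to be defined elsewhere.

```
# services/disk.cfg: disk check on cache1 that joins the "disk"
# service group through its own servicegroups directive.
define service {
    host_name               cache1
    service_description     check_disk
    servicegroups           disk
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   30
    retry_check_interval    15
    contact_groups          admins
    notification_interval   3600
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_disk    ; assumed to be defined in commands.cfg
}
```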
