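Category: LINUX
2010-01-15 09:52:21
CentOS 5.2 + Nagios Installation and Configuration Manual
(Parts of the content are drawn from the Internet)
Version 1.0
Ant
Preface
This manual lists what Nagios needs on CentOS, walks through the setup process, and explains the related configuration files.
A section on integrating Centreon with Nagios has been added at the end.
Because the manual is short, and reading it out of order is not recommended, it has no table of contents. ^&^
Detailed configuration notes for adding hosts, network devices, Apache, Tomcat, MySQL and so on in this environment are planned for a later version. Web-based management has in fact already been added, so if you do not mind the effort you can explore it yourself. ^!^
Some issues in the manual are flagged in red. If you run into anything else, see the tips at the end of this document.
Ant has done his best; please forgive anything that is incorrect or unclear.
Ant
2010-1-8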
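1. Installing Nagios
I. What is Nagios
Nagios is an application for monitoring systems and networks. It watches the hosts and services you specify and sends alerts both when their state gets worse and when it recovers.
Nagios was originally designed to run on Linux, but it also runs on other Unix-like systems.
Its main features include:
1. Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.);
2. Monitoring of host resources (processor load, disk usage, etc.);
3. A simple plugin design that lets users easily add their own service checks;
4. Parallelized service checks;
5. The ability to define the network hierarchy with "parent" hosts, which lets Nagios detect and distinguish between hosts that are down and hosts that are unreachable;
6. Notifications sent to contacts when service or host problems occur and when they are resolved (by email, SMS, or a user-defined method);
7. The ability to define event handlers that run when host or service events occur, to help locate and resolve problems;
8. Automatic log rotation;
9. Support for redundant monitoring hosts;
10. An optional web interface for viewing current network status, notification and problem history, log files, etc.
II. Preparation before installation
A. The CentOS installation itself is not covered here.
B. Update the system with the following command: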
# yum update
C. After the update has finished and the machine has rebooted, install the following packages:
Install Apache2
# yum install httpd httpd-manual httpd-devel
Install the GD modules
# yum install gd gd-devel perl-GD
Install MySQL
# yum install mysql-server mysql-devel
Install PHP
# yum install php php-mysql php-gd php-pear
Install the DBI modules
# yum install perl-DBI
Install SNMP
# yum install perl-Digest-SHA1 perl-Digest-HMAC net-snmp-utils perl-Socket6 perl-IO-Socket-INET6 net-snmp-devel php-snmp dmidecode net-snmp-perl perl-Crypt-DES
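Install RRDtool
First install the required libraries with yum:
# yum install cairo-devel libxml2-devel pango-devel pango libpng-devel freetype freetype-devel libart_lgpl-devel
Then download the rrdtool source package, build it and install it:
# wget http://oss.oetiker.ch/rrdtool/pub/rrdtool-1.3.1.tar.gz
# tar -zxvf rrdtool-1.3.1.tar.gz
# cd rrdtool-1.3.1
# ./configure --prefix=/usr/local/rrdtool && make && make install
Run rrdtool -v to check that the installation succeeded (use the full path /usr/local/rrdtool/bin/rrdtool if that directory is not in your PATH):
# rrdtool -v //prints the RRDtool usage and version information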
Install other packages
# yum install fping perl-Config-IniFiles graphviz gcc-c++ glib2-devel
D. Set Apache, MySQL and SNMP to start at boot
# chkconfig --level 345 httpd on
# chkconfig --level 345 mysqld on
# chkconfig --level 345 snmpd on
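E. Configure SNMP
Edit /etc/snmp/snmpd.conf and change the following lines to read as shown (red in the original), replacing x.x.x.x with the IP address of the server running Nagios: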
com2sec notConfigUser x.x.x.x public
access notConfigGroup "" any noauth exact all none none
view all included .1 80
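A quick way to confirm that the three lines above are in place is to grep for them. The sketch below is only illustrative: the function name check_snmpd_conf is hypothetical, and the demonstration runs against a throwaway file with a documentation IP address instead of the real /etc/snmp/snmpd.conf.

```shell
# Hypothetical helper: report whether an snmpd.conf-style file contains the
# three directives edited above. Commented-out lines do not count.
check_snmpd_conf() {
    conf="$1"
    missing=0
    for pattern in \
        '^[[:space:]]*com2sec[[:space:]]+notConfigUser[[:space:]]' \
        '^[[:space:]]*access[[:space:]]+notConfigGroup[[:space:]]' \
        '^[[:space:]]*view[[:space:]]+all[[:space:]]+included[[:space:]]+\.1([[:space:]]|$)'
    do
        if ! grep -Eq "$pattern" "$conf"; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration against a temporary file (192.0.2.10 is a placeholder address).
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
com2sec notConfigUser 192.0.2.10 public
access notConfigGroup "" any noauth exact all none none
view all included .1 80
EOF
if check_snmpd_conf "$demo_conf"; then
    echo "snmpd.conf looks complete"
fi
rm -f "$demo_conf"
```

On a real server the same function could be pointed at /etc/snmp/snmpd.conf before restarting snmpd.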
F. Configure PHP PEAR
# pear channel-update pear.php.net
If the following lines appear, the update succeeded:
Retrieving channel.xml from remote server
Update of Channel "pear.php.net" succeeded
III. Installing the Nagios packages
A. Install Nagios
# yum install nagios nagios-devel
The Nagios version installed here is 3.0.6.
B. Install nagios-plugins
# yum install nagios-plugins
C. Create the Nagios web login user
# htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
New password: nagiosadmin
Re-type new password: nagiosadmin
Adding password for user nagiosadmin
Choose whatever user name and password you prefer.
D. Log in to the web interface and check that Nagios is running
# service httpd start
# service nagios start
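Open http://x.x.x.x/nagios in a browser (x.x.x.x is the server's IP address).
If the Nagios web interface appears, Nagios is running normally.
IV. This completes the CentOS + Nagios installation.
Configuration file notes
A. Main configuration file
Location: /etc/nagios/nagios.cfg
Part 1: Log file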
# LOG FILE
# This is the main log file where service and host events are logged
# for historical purposes. This should be the first option specified
# in the config file!!!
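The log file: the main log file in which service and host events are recorded for historical purposes. It should be the first option in the configuration file. (Presumably this matters; the original comment ends with three exclamation marks. = =)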
log_file=/var/log/nagios/nagios.log
Sets the path of the main Nagios log file.
Part 2: Object configuration files
# OBJECT CONFIGURATION FILE(S)
# This is the configuration file in which you define hosts, host
# groups, contacts, contact groups, services, etc. I guess it would
# be better called an object definition file, but for historical
# reasons it isn't. You can split object definitions into several
# different config files by using multiple cfg_file statements here.
# Nagios will read and process all the config files you define.
# This can be very useful if you want to keep command definitions
# separate from host and contact definitions...
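Object configuration files. These files define hosts, host groups, contacts, contact groups, services, and so on; "object definition files" would be a more accurate name. You can split the object definitions across several files by using multiple cfg_file statements, and Nagios will read and process every file listed. This is useful when you want to keep command definitions separate from host and contact definitions.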
# Command definitions
cfg_file=/etc/nagios/commands.cfg
Sets the path of the command definition file.
# Host and service definitions for monitoring this machine
cfg_file=/etc/nagios/localhost.cfg
Sets the path of the file that defines the hosts and services monitored on this machine.
# You can split other types of object definitions across several
# config files if you wish (as done here), or keep them all in a
# single config file.
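The entries below are configuration files for other, more finely divided object types; you may also keep everything in a single file. They are commented out by default and must be uncommented by hand before use. The files do not exist by default either, so you must create them before enabling them.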
#cfg_file=/etc/nagios/contactgroups.cfg
#cfg_file=/etc/nagios/contacts.cfg
#cfg_file=/etc/nagios/dependencies.cfg
#cfg_file=/etc/nagios/escalations.cfg
#cfg_file=/etc/nagios/hostgroups.cfg
#cfg_file=/etc/nagios/hosts.cfg
#cfg_file=/etc/nagios/services.cfg
#cfg_file=/etc/nagios/timeperiods.cfg
#cfg_dir=/etc/nagios/servers
#cfg_dir=/etc/nagios/printers
#cfg_dir=/etc/nagios/switches
#cfg_dir=/etc/nagios/routers
# Extended host/service info definitions are now stored along with
# other object definitions:
The following are configuration files for extended host/service information definitions.
#cfg_file=/etc/nagios/hostextinfo.cfg
#cfg_file=/etc/nagios/serviceextinfo.cfg
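Since the optional object files have to be created by hand before they are enabled, it can help to list which cfg_file and cfg_dir entries are active and whether their targets exist. The following is a minimal sketch under stated assumptions: list_object_configs is a hypothetical helper, not part of Nagios, and the demo paths are made up.

```shell
# Hypothetical helper: print every active (uncommented) cfg_file/cfg_dir
# entry of a nagios.cfg-style file and flag paths that do not exist yet.
list_object_configs() {
    sed -nE 's/^[[:space:]]*(cfg_(file|dir)=)/\1/p' "$1" |
    while IFS='=' read -r key path; do
        if [ -e "$path" ]; then
            echo "ok      $key=$path"
        else
            echo "missing $key=$path"
        fi
    done
}

# Demonstration in a throwaway directory (paths exist only for this demo).
demo_dir=$(mktemp -d)
touch "$demo_dir/commands.cfg"
cat > "$demo_dir/nagios.cfg" <<EOF
cfg_file=$demo_dir/commands.cfg
cfg_file=$demo_dir/hosts.cfg
#cfg_file=$demo_dir/contacts.cfg
EOF
list_object_configs "$demo_dir/nagios.cfg"
rm -rf "$demo_dir"
```

In the demo, commands.cfg is reported as ok, hosts.cfg as missing, and the commented contacts.cfg entry is ignored.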
Part 3: Object cache file
# OBJECT CACHE FILE
# This option determines where object definitions are cached when
# Nagios starts/restarts. The CGIs read object definitions from
# this cache file (rather than looking at the object config files
# directly) in order to prevent inconsistencies that can occur
# when the config files are modified after Nagios starts.
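Object cache file. This option determines where object definitions are cached when Nagios starts or restarts. The CGIs read object definitions from this cache file rather than from the object configuration files directly, which prevents inconsistencies when the configuration files are modified after Nagios has started. Put simply, configuration changes only take effect after Nagios is restarted; until then the cached definitions are what is used.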
object_cache_file=/var/log/nagios/objects.cache
Sets the path of the object cache file.
Part 4: Status file
# STATUS FILE
# This is where the current status of all monitored services and
# hosts is stored. Its contents are read and processed by the CGIs.
# The contents of the status file are deleted every time Nagios
# restarts.
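Status file. This file stores the current status of all monitored services and hosts. Its contents are read and processed by the CGIs, and they are deleted every time Nagios restarts.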
status_file=/var/log/nagios/status.dat
Sets the path of the status file.
Part 5: Nagios user
# NAGIOS USER
# This determines the effective user that Nagios should run as.
# You can either supply a username or a UID.
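The user that the Nagios service runs as; either a user name or a UID may be given. If this user does not exist after installation, create it manually.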
nagios_user=nagios
By default the user is nagios.
Part 6: Nagios group
# NAGIOS GROUP
# This determines the effective group that Nagios should run as.
# You can either supply a group name or a GID.
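The group that the Nagios service runs as; either a group name or a GID may be given. If this group does not exist after installation, create it together with the user.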
nagios_group=nagios
By default the group is nagios.
Part 7: External command option
# EXTERNAL COMMAND OPTION
# This option allows you to specify whether or not Nagios should check
# for external commands (in the command file defined below). By default
# Nagios will *not* check for external commands, just to be on the
# cautious side. If you want to be able to use the CGI command interface
# you will have to enable this. Setting this value to 0 disables command
# checking (the default), other values enable it.
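External command option. This option specifies whether Nagios should check for external commands. By default Nagios does not check for them; to use the CGI command interface you must enable this option. A value of 0 disables command checking (the default); other values enable it.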
#check_external_commands=0
check_external_commands=1
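Sets whether external commands are checked. It is disabled by default, but it must be enabled here because Nagios is to be controlled and managed from the web interface through Apache.
Part 8: External command check interval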
# EXTERNAL COMMAND CHECK INTERVAL
# This is the interval at which Nagios should check for external commands.
# This value works of the interval_length you specify later. If you leave
# that at its default value of 60 (seconds), a value of 1 here will cause
# Nagios to check for external commands every minute. If you specify a
# number followed by an "s" (i.e. 15s), this will be interpreted to mean
# actual seconds rather than a multiple of the interval_length variable.
# Note: In addition to reading the external command file at regularly
# scheduled intervals, Nagios will also check for external commands after
# event handlers are executed.
# NOTE: Setting this value to -1 causes Nagios to check the external
# command file as often as possible.
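This option sets how often Nagios checks for external commands. A value of -1 makes Nagios check as often as possible. A bare number is a multiple of interval_length (60 seconds by default), so 1 means Nagios checks once a minute. To give the value in seconds, append s: 15s means every 15 seconds.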
command_check_interval=15s
#command_check_interval=-1
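Sets the external command check interval. The default is -1, which makes Nagios check as often as possible and can consume a lot of system resources. It is advisable to comment that entry out and enable the alternative 15s entry instead, so that Nagios checks every 15 seconds. Choose the actual value to suit your environment.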
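The three forms of the value (-1, a bare number, a number with an s suffix) can be illustrated with a small shell function. This is only a sketch of the rule described in the comment block above: describe_command_check_interval is a hypothetical name, and the second argument stands in for interval_length, assumed to be 60 when omitted.

```shell
# Hypothetical helper: explain what a command_check_interval value means.
# $1 = the configured value, $2 = interval_length in seconds (default 60).
describe_command_check_interval() {
    value="$1"
    interval_length="${2:-60}"
    case "$value" in
        -1) echo "as often as possible" ;;
        *s) echo "${value%s} seconds" ;;                     # explicit seconds
        *)  echo "$((value * interval_length)) seconds" ;;   # multiple of interval_length
    esac
}

describe_command_check_interval 15s   # prints "15 seconds"
describe_command_check_interval 1     # prints "60 seconds"
describe_command_check_interval -1    # prints "as often as possible"
```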
Part 9: External command file
# EXTERNAL COMMAND FILE
# This is the file that Nagios checks for external command requests.
# It is also where the command CGI will write commands that are submitted
# by users, so it must be writeable by the user that the web server
# is running as (usually 'nobody'). Permissions should be set at the
# directory level instead of on the file, as the file is deleted every
# time its contents are processed.
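This is the file Nagios checks for external command requests. The command CGI also writes the commands submitted by users here, so it must be writable by the user the web server runs as, normally the Apache user. Note that the permissions must be set on the directory that contains the file rather than on the file itself, because the file is deleted every time its contents are processed. (This is already satisfied once the Apache user has been added to the nagios group.)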
command_file=/var/log/nagios/rw/nagios.cmd
Sets the path of the external command file.
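For orientation, an external command is a single line of the form "[timestamp] COMMAND;arg1;arg2...". The sketch below only builds such a line and prints it; format_external_command is a hypothetical helper, and the host name localhost and service description PING are assumptions chosen for the example.

```shell
# Hypothetical helper: build one external command line.
# $1 = Unix timestamp, $2 = command name, remaining arguments = command arguments.
format_external_command() {
    ts="$1"
    shift
    line="[$ts]"
    sep=" "
    for field in "$@"; do
        line="$line$sep$field"
        sep=";"          # after the command name, fields are joined with ';'
    done
    printf '%s\n' "$line"
}

now=$(date +%s)
# Ask for an immediate forced check of service PING on host localhost
# (both names are assumptions for this example).
format_external_command "$now" SCHEDULE_FORCED_SVC_CHECK localhost PING "$now"

# On a real server the line would be written to the command file instead, e.g.
#   format_external_command ... > /var/log/nagios/rw/nagios.cmd
```

Writing to the real command file only works when check_external_commands is enabled and the writing user has permission on the rw directory.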
Part 10: External command buffer slots
# EXTERNAL COMMAND BUFFER SLOTS
# This settings is used to tweak the number of items or "slots" that
# the Nagios daemon should allocate to the buffer that holds incoming
# external commands before they are processed. As external commands
# are processed by the daemon, they are removed from the buffer.
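This sets the number of slots in the buffer where the Nagios daemon holds incoming external commands before they are processed. Each command is removed from the buffer once the daemon has processed it.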
external_command_buffer_slots=4096
Sets the number of external command buffer slots.
Part 11: Comment file
# COMMENT FILE
# This is the file that Nagios will use for storing host and service
# comments.
This is the file Nagios uses to store host and service comments.
comment_file=/var/log/nagios/comments.dat
Sets the path of the comment file.
Part 12: Downtime file
# DOWNTIME FILE
# This is the file that Nagios will use for storing host and service
# downtime data.
This is the file Nagios uses to store host and service downtime data.
downtime_file=/var/log/nagios/downtime.dat
Sets the path of the downtime file.
Part 13: Lock file
# LOCK FILE
# This is the lockfile that Nagios will use to store its PID number
# in when it is running in daemon mode.
This is the file in which Nagios stores its PID when it runs in daemon mode.
lock_file=/var/run/nagios.pid
Sets the Nagios PID file.
Part 14: Temp file
# TEMP FILE
# This is a temporary file that is used as scratch space when Nagios
# updates the status log, cleans the comment file, etc. This file
# is created, used, and deleted throughout the time that Nagios is
# running.
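This is a temporary scratch file that Nagios uses when it updates the status log, cleans the comment file, and so on. It is created, used and deleted repeatedly while Nagios is running.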
temp_file=/var/log/nagios/nagios.tmp
Sets the path of the temp file.
Part 15: Event broker options
# EVENT BROKER OPTIONS
# Controls what (if any) data gets sent to the event broker.
# Values: 0 = Broker nothing
# -1 = Broker everything
#
Controls which data, if any, is sent to the event broker. 0 brokers nothing and -1 brokers everything; for other values, consult the dedicated documentation.
event_broker_options=-1
Sets the event broker options. The default is -1.
Part 16: Event broker modules
# EVENT BROKER MODULE(S)
# This directive is used to specify an event broker module that should
# by loaded by Nagios at startup. Use multiple directives if you want
# to load more than one module. Arguments that should be passed to
# the module at startup are seperated from the module path by a space.
#
# Example:
#
# broker_module=
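Specifies the path of an event broker module that Nagios loads at startup. Use multiple directives to load more than one module; arguments for the module follow the path, separated by a space.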
#broker_module=/somewhere/module1.o
#broker_module=/somewhere/module2.o arg1 arg2=3 debug=0
Nothing is loaded here by default. If you find or write a custom module, list it here. = =||
Part 17: Log rotation method
# LOG ROTATION METHOD
# This is the log rotation method that Nagios should use to rotate
# the main log file. Values are as follows..
# n = None - don't rotate the log
# h = Hourly rotation (top of the hour)
# d = Daily rotation (midnight every day)
# w = Weekly rotation (midnight on Saturday evening)
# m = Monthly rotation (midnight last day of month)
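Specifies the method Nagios uses to rotate the main log file.
n = none: do not rotate the log.
h = hourly: rotate at the top of every hour.
d = daily: rotate at midnight every day.
w = weekly: rotate at midnight on Saturday evening.
m = monthly: rotate at midnight on the last day of the month.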
log_rotation_method=d
Sets the main log rotation method; the default is daily rotation.
Part 18: Log archive path
# LOG ARCHIVE PATH
# This is the directory where archived (rotated) log files should be
# placed (assuming you've chosen to do log rotation).
The directory where archived (rotated) log files are placed; it is only used when log rotation is enabled.
log_archive_path=/var/log/nagios/archives
Sets the log archive path.
Part 19: Syslog logging option
# LOGGING OPTIONS
# If you want messages logged to the syslog facility, as well as the
# NetAlarm log file set this option to 1. If not, set it to 0.
Chooses whether Nagios messages are also logged to the system syslog facility. 1 logs them to syslog; 0 does not.
use_syslog=1
By default Nagios messages are logged to syslog.
Part 20: Notification logging option
# NOTIFICATION LOGGING OPTION
# If you don't want notifications to be logged, set this value to 0.
# If notifications should be logged, set the value to 1.
Set this to 0 if you do not want notifications logged; 1 logs them.
log_notifications=1
By default notifications are logged.
Part 21: Service retry logging option
# SERVICE RETRY LOGGING OPTION
# If you don't want service check retries to be logged, set this value
# to 0. If retries should be logged, set the value to 1.
Set this to 1 to log service check retries (re-checks of a service, not service restarts), or 0 not to log them.
log_service_retries=1
By default service check retries are logged.
Part 22: Host retry logging option
# HOST RETRY LOGGING OPTION
# If you don't want host check retries to be logged, set this value to
# 0. If retries should be logged, set the value to 1.
Set this to 1 to log host check retries (re-checks of a host, not host reboots), or 0 not to log them.
log_host_retries=1
By default host check retries are logged.
Part 23: Event handler logging option
# EVENT HANDLER LOGGING OPTION
# If you don't want host and service event handlers to be logged, set
# this value to 0. If event handlers should be logged, set the value
# to 1.
Set this to 1 to log host and service event handlers, or 0 not to log them.
log_event_handlers=1
By default event handlers are logged.
Part 24: Initial states logging option
# INITIAL STATES LOGGING OPTION
# If you want Nagios to log all initial host and service states to
# the main log file (the first time the service or host is checked)
# you can enable this option by setting this value to 1. If you
# are not using an external application that does long term state
# statistics reporting, you do not need to enable this option. In
# this case, set the value to 0.
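Set this to 1 if you want Nagios to log all initial host and service states to the main log file (the first time each host or service is checked), or to 0 otherwise. It is only needed when an external application does long-term state statistics reporting.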
log_initial_states=0
By default initial states are not logged.
Part 25: External commands logging option
# EXTERNAL COMMANDS LOGGING OPTION
# If you don't want Nagios to log external commands, set this value
# to 0. If external commands should be logged, set this value to 1.
# Note: This option does not include logging of passive service
# checks - see the option below for controlling whether or not
# passive checks are logged.
Set this to 1 to log external commands, or 0 not to log them. Note that this option does not cover passive service checks, which are controlled by the next option.
log_external_commands=1
By default external commands are logged.
Part 26: Passive checks logging option
# PASSIVE CHECKS LOGGING OPTION
# If you don't want Nagios to log passive host and service checks, set
# this value to 0. If passive checks should be logged, set
# this value to 1.
Set this to 1 to log passive host and service checks, or 0 not to log them.
log_passive_checks=1
By default passive checks are logged.
Part 27: Global host and service event handlers
# GLOBAL HOST AND SERVICE EVENT HANDLERS
# These options allow you to specify a host and service event handler
# command that is to be run for every host or service state change.
# The global event handler is executed immediately prior to the event
# handler that you have optionally specified in each host or
# service definition. The command argument is the short name of a
# command definition that you define in your host configuration file.
# Read the HTML docs for more information.
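These options specify a host event handler command and a service event handler command that are run for every host or service state change. The global event handler is executed immediately before any event handler defined on the individual host or service. The argument is the short name of a command definition in your host configuration file.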
#global_host_event_handler=somecommand
#global_service_event_handler=somecommand
These are commented out by default.
Part 28: Service inter-check delay method
# SERVICE INTER-CHECK DELAY METHOD
# This is the method that Nagios should use when initially
# "spreading out" service checks when it starts monitoring. The
# default is to use smart delay calculation, which will try to
# space all service checks out evenly to minimize CPU load.
# Using the dumb setting will cause all checks to be scheduled
# at the same time (with no delay between them)! This is not a
# good thing for production, but is useful when testing the
# parallelization functionality.
# n = None - don't use any delay between checks
# d = Use a "dumb" delay of 1 second between checks
# s = Use "smart" inter-check delay calculation
# x.xx = Use an inter-check delay of x.xx seconds
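This is the method Nagios uses to spread out the initial service checks when it starts monitoring. The default is the smart delay calculation, which tries to space all service checks out evenly to minimize CPU load. Scheduling every check at the same time, with no delay between them, is not a good idea in production but is useful for testing the parallelization functionality. (The sample comment attributes this behaviour to the dumb setting, although in the value list it is n that uses no delay.)
n = none: no delay between checks.
d = dumb: a fixed delay of 1 second between checks.
s = smart: the delay is calculated automatically.
x.xx = a fixed delay of x.xx seconds between checks.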
service_inter_check_delay_method=s
By default the smart calculation is used for the service inter-check delay.
Part 29: Maximum service check spread
# MAXIMUM SERVICE CHECK SPREAD
# This variable determines the timeframe (in minutes) from the
# program start time that an initial check of all services should
# be completed. Default is 30 minutes.
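This variable sets the time frame, in minutes from program start, within which the initial check of all services should be completed. The default is 30 minutes.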
max_service_check_spread=30
The default maximum service check spread is 30 minutes.
Part 30: Service check interleave factor
# SERVICE CHECK INTERLEAVE FACTOR
# This variable determines how service checks are interleaved.
# Interleaving the service checks allows for a more even
# distribution of service checks and reduced load on remote
# hosts. Setting this value to 1 is equivalent to how versions
# of Nagios previous to 0.0.5 did service checks. Set this
# value to s (smart) for automatic calculation of the interleave
# factor unless you have a specific reason to change it.
# s = Use "smart" interleave factor calculation
# x = Use an interleave factor of x, where x is a
# number greater than or equal to 1.
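This variable determines how service checks are interleaved. Interleaving distributes the service checks more evenly and reduces the load on remote hosts. The value is either s (smart, automatic calculation) or a number greater than or equal to 1.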
service_interleave_factor=s
The default interleave factor is s, the smart calculation.
Part 31: Host inter-check delay method
# HOST INTER-CHECK DELAY METHOD
# This is the method that Nagios should use when initially
# "spreading out" host checks when it starts monitoring. The
# default is to use smart delay calculation, which will try to
# space all host checks out evenly to minimize CPU load.
# Using the dumb setting will cause all checks to be scheduled
# at the same time (with no delay between them)!
# n = None - don't use any delay between checks
# d = Use a "dumb" delay of 1 second between checks
# s = Use "smart" inter-check delay calculation
# x.xx = Use an inter-check delay of x.xx second
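This is the method Nagios uses to spread out the initial host checks when it starts monitoring. The default is the smart delay calculation, which tries to space all host checks out evenly to minimize CPU load. Scheduling every check at the same time means there is no delay between them. (The sample comment attributes this behaviour to the dumb setting, although in the value list it is n that uses no delay.)
n = none: no delay between checks.
d = dumb: a fixed delay of 1 second between checks.
s = smart: the delay is calculated automatically.
x.xx = a fixed delay of x.xx seconds between checks.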
host_inter_check_delay_method=s
By default the smart calculation is used for the host inter-check delay.
Part 32: Maximum host check spread
# MAXIMUM HOST CHECK SPREAD
# This variable determines the timeframe (in minutes) from the
# program start time that an initial check of all hosts should
# be completed. Default is 30 minutes.
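This variable sets the time frame, in minutes from program start, within which the initial check of all hosts should be completed. The default is 30 minutes.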
max_host_check_spread=30
The default maximum host check spread is 30 minutes.
Part 33: Maximum concurrent service checks
# MAXIMUM CONCURRENT SERVICE CHECKS
# This option allows you to specify the maximum number of
# service checks that can be run in parallel at any given time.
# Specifying a value of 1 for this variable essentially prevents
# any service checks from being parallelized. A value of 0
# will not restrict the number of concurrent checks that are
# being executed.
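This option sets the maximum number of service checks that may run in parallel at any given time. A value of 1 essentially prevents any service checks from being parallelized. A value of 0 places no limit on the number of concurrent checks.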
max_concurrent_checks=0
The default value is 0.
Part 34: Service check reaper frequency
# SERVICE CHECK REAPER FREQUENCY
# This is the frequency (in seconds!) that Nagios will process
# the results of services that have been checked.
This is how often, in seconds, Nagios processes the results of service checks that have been run.
service_reaper_frequency=10
The default frequency is 10 seconds.
Part 35: Check result buffer slots
# CHECK RESULT BUFFER SLOTS
# This settings is used to tweak the number of items or "slots" that
# the Nagios daemon should allocate to the buffer that holds
# service check results before they are processed. As check results
# are processed by the daemon, they are removed from the buffer.
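This sets the number of slots in the buffer where the Nagios daemon holds service check results before they are processed. Each result is removed from the buffer once the daemon has processed it.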
check_result_buffer_slots=4096
The default is 4096 slots (a count of entries, not a size in megabytes).
Part 36: Auto-rescheduling option
# AUTO-RESCHEDULING OPTION
# This option determines whether or not Nagios will attempt to
# automatically reschedule active host and service checks to
# "smooth" them out over time. This can help balance the load on
# the monitoring server.
# WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
# PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
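This option determines whether Nagios tries to automatically reschedule active host and service checks to smooth them out over time, which can help balance the load on the monitoring server. Warning: this is an experimental feature and can degrade performance, rather than improve it, if used improperly.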
auto_reschedule_checks=0
By default auto-rescheduling is set to 0 (off).
Part 37: Auto-rescheduling interval
# AUTO-RESCHEDULING INTERVAL
# This option determines how often (in seconds) Nagios will
# attempt to automatically reschedule checks. This option only
# has an effect if the auto_reschedule_checks option is enabled.
# Default is 30 seconds.
# WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
# PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
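This option determines how often, in seconds, Nagios tries to automatically reschedule checks. It only has an effect when auto_reschedule_checks is enabled. The default is 30 seconds. Warning: this is an experimental feature and can degrade performance, rather than improve it, if used improperly.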
auto_rescheduling_interval=30
The default auto-rescheduling interval is 30 seconds.
Part 38: Auto-rescheduling window
# AUTO-RESCHEDULING WINDOW
# This option determines the "window" of time (in seconds) that
# Nagios will look at when automatically rescheduling checks.
# Only host and service checks that occur in the next X seconds
# (determined by this variable) will be rescheduled. This option
# only has an effect if the auto_reschedule_checks option is
# enabled. Default is 180 seconds (3 minutes).
# WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
# PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
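This option sets the window of time, in seconds, that Nagios looks at when automatically rescheduling checks: only host and service checks due within the next X seconds (X being this value) are rescheduled. It only has an effect when auto_reschedule_checks is enabled. The default is 180 seconds (3 minutes). Warning: this is an experimental feature and can degrade performance, rather than improve it, if used improperly.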
auto_rescheduling_window=180
The default auto-rescheduling window is 180 seconds.
Part 39: Sleep time
# SLEEP TIME
# This is the number of seconds to sleep between checking for system
# events and service checks that need to be run.
The number of seconds Nagios sleeps between checking for system events and service checks that need to be run.
sleep_time=0.25
The default sleep time is 0.25 seconds.
Part 40: Timeout values
# TIMEOUT VALUES
# These options control how much time Nagios will allow various
# types of commands to execute before killing them off. Options
# are available for controlling maximum time allotted for
# service checks, host checks, event handlers, notifications, the
# ocsp command, and performance data commands. All values are in
# seconds.
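These options control how long Nagios lets the various types of commands run before killing them. There are separate limits for service checks, host checks, event handlers, notifications, the ocsp command and performance data commands; all values are in seconds.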
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
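To review values like these on a running system without opening the file, a directive can be pulled out with sed. This is a minimal sketch: get_directive is a hypothetical helper (not a Nagios tool), and the demo reads a temporary file that copies the six values above instead of /etc/nagios/nagios.cfg.

```shell
# Hypothetical helper: print the value of an active (uncommented) directive
# in a nagios.cfg-style file; if it appears several times, print the last one.
get_directive() {
    sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1
}

demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
EOF
for name in service_check host_check event_handler notification ocsp perfdata; do
    echo "${name}_timeout: $(get_directive "$demo_cfg" "${name}_timeout") seconds"
done
rm -f "$demo_cfg"
```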
Part 41: Retain state information
# RETAIN STATE INFORMATION
# This setting determines whether or not Nagios will save state
# information for services and hosts before it shuts down. Upon
# startup Nagios will reload all saved service and host state
# information before starting to monitor. This is useful for
# maintaining long-term data on state statistics, etc, but will
# slow Nagios down a bit when it (re)starts. Since its only
# a one-time penalty, I think its well worth the additional
# startup delay.
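This setting determines whether Nagios saves state information for services and hosts before it shuts down. On startup Nagios reloads all saved state information before it begins monitoring. This is useful for maintaining long-term state statistics, at the cost of a slightly slower (re)start. Since the penalty is paid only once per start, it is worth enabling.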
retain_state_information=1
State retention is enabled by default.
Part 42: State retention file
# STATE RETENTION FILE
# This is the file that Nagios should use to store host and
# service state information before it shuts down. The state
# information in this file is also read immediately prior to
# starting to monitor the network when Nagios is restarted.
# This file is used only if the preserve_state_information
# variable is set to 1.
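The file in which Nagios stores host and service state information before it shuts down. It is also read immediately before monitoring begins when Nagios is restarted. The file is used only when retain_state_information is set to 1 (the sample comment calls this variable preserve_state_information).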
state_retention_file=/var/log/nagios/retention.dat
Sets the path of the state retention file.
Part 43: Retention data update interval
# RETENTION DATA UPDATE INTERVAL
# This setting determines how often (in minutes) that Nagios
# will automatically save retention data during normal operation.
# If you set this value to 0, Nagios will not save retention
# data at regular interval, but it will still save retention
# data before shutting down or restarting. If you have disabled
# state retention, this option has no effect.
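This setting determines how often, in minutes, Nagios automatically saves retention data during normal operation. With a value of 0 Nagios does not save at regular intervals, but it still saves retention data before shutting down or restarting. The option has no effect if state retention is disabled.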
retention_update_interval=60
The default retention update interval is 60 minutes.
Part 44: Use retained program state
# USE RETAINED PROGRAM STATE
# This setting determines whether or not Nagios will set
# program status variables based on the values saved in the
# retention file. If you want to use retained program status
# information, set this value to 1. If not, set this value
# to 0.
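This setting determines whether Nagios sets its program status variables from the values saved in the retention file. Set it to 1 to use the retained program state, or 0 not to.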
use_retained_program_state=1
Retained program state is used by default.
Part 45: Use retained scheduling info
# USE RETAINED SCHEDULING INFO
# This setting determines whether or not Nagios will retain
# the scheduling info (next check time) for hosts and services
# based on the values saved in the retention file. If you
# If you want to use retained scheduling info, set this
# value to 1. If not, set this value to 0.
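This setting determines whether Nagios keeps the scheduling information (next check time) for hosts and services based on the values saved in the retention file. Set it to 1 to use retained scheduling info, or 0 not to.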
use_retained_scheduling_info=0
Retained scheduling info is not used by default.
Part 46: Interval length
# INTERVAL LENGTH
# This is the seconds per unit interval as used in the
# host/contact/service configuration files. Setting this to 60 means
# that each interval is one minute long (60 seconds). Other settings
# have not been tested much, so your mileage is likely to vary...
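The number of seconds per unit interval used in the host, contact and service configuration files. A value of 60 makes each interval one minute long. Other values have not been tested much.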
interval_length=60
The default interval length is 60 seconds.
Part 47: Aggressive host checking option
# AGGRESSIVE HOST CHECKING OPTION
# If you don't want to turn on aggressive host checking features, set
# this value to 0 (the default). Otherwise set this value to 1 to
# enable the aggressive check option. Read the docs for more info
# on what aggressive host check is or check out the source code in
# base/checks.c
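Set this to 0 (the default) if you do not want aggressive host checking; set it to 1 to enable it. For more on what aggressive host checking does, read the documentation or the source code in base/checks.c.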
use_aggressive_host_checking=0
Aggressive host checking is disabled by default.
Part 48: Service check execution option
# SERVICE CHECK EXECUTION OPTION
# This determines whether or not Nagios will actively execute
# service checks when it initially starts. If this option is
# disabled, checks are not actively made, but Nagios can still
# receive and process passive check results that come in. Unless
# you're implementing redundant hosts or have a special need for
# disabling the execution of service checks, leave this enabled!
# Values: 1 = enable checks, 0 = disable checks
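This determines whether Nagios actively executes service checks when it initially starts. If it is disabled, no active checks are made, but Nagios still receives and processes incoming passive check results. Leave it enabled unless you are implementing redundant hosts or have a special reason to disable service check execution. 1 enables checks; 0 disables them.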
execute_service_checks=1
Service checks are enabled by default.
Part 49: Passive service check acceptance option
# PASSIVE SERVICE CHECK ACCEPTANCE OPTION
# This determines whether or not Nagios will accept passive
# service checks results when it initially (re)starts.
# Values: 1 = accept passive checks, 0 = reject passive checks
This determines whether Nagios accepts passive service check results when it initially (re)starts. 1 accepts passive checks; 0 rejects them.
accept_passive_service_checks=1
Passive service checks are accepted by default.
Part 50: Host check execution option
# HOST CHECK EXECUTION OPTION
# This determines whether or not Nagios will actively execute
# host checks when it initially starts. If this option is
# disabled, checks are not actively made, but Nagios can still
# receive and process passive check results that come in. Unless
# you're implementing redundant hosts or have a special need for
# disabling the execution of host checks, leave this enabled!
# Values: 1 = enable checks, 0 = disable checks
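This determines whether Nagios actively executes host checks when it initially starts. If it is disabled, no active checks are made, but Nagios still receives and processes incoming passive check results. Leave it enabled unless you are implementing redundant hosts or have a special reason to disable host check execution. 1 enables checks; 0 disables them.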
execute_host_checks=1
Host checks are enabled by default.
Part 51: Passive host check acceptance option
# PASSIVE HOST CHECK ACCEPTANCE OPTION
# This determines whether or not Nagios will accept passive
# host checks results when it initially (re)starts.
# Values: 1 = accept passive checks, 0 = reject passive checks
This determines whether Nagios accepts passive host check results when it initially (re)starts. 1 accepts passive checks; 0 rejects them.
accept_passive_host_checks=1
Passive host checks are accepted by default.
Part 52: Notifications option
# NOTIFICATIONS OPTION
# This determines whether or not Nagios will sent out any host or
# service notifications when it is initially (re)started.
# Values: 1 = enable notifications, 0 = disable notifications
This determines whether Nagios sends out any host or service notifications when it is initially (re)started. 1 enables notifications; 0 disables them.
enable_notifications=1
Notifications are enabled by default.
Part 53: Event handler use option
# EVENT HANDLER USE OPTION
# This determines whether or not Nagios will run any host or
# service event handlers when it is initially (re)started. Unless
# you're implementing redundant hosts, leave this option enabled.
# Values: 1 = enable event handlers, 0 = disable event handlers
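This determines whether Nagios runs any host or service event handlers when it is initially (re)started. Leave it enabled unless you are implementing redundant hosts. 1 enables event handlers; 0 disables them.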
enable_event_handlers=1
Event handlers are enabled by default.
Part 54: Process performance data option
# PROCESS PERFORMANCE DATA OPTION
# This determines whether or not Nagios will process performance
# data returned from service and host checks. If this option is
# enabled, host performance data will be processed using the
# host_perfdata_command (defined below) and service performance
# data will be processed using the service_perfdata_command (also
# defined below). Read the HTML docs for more information on
# performance data.
# Values: 1 = process performance data, 0 = do not process performance data
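This option determines whether Nagios processes the performance data returned by service and host checks. If it is enabled, host performance data is processed with host_perfdata_command and service performance data with service_perfdata_command, both defined below. 1 processes performance data; 0 does not.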
process_performance_data=0
Performance data is not processed by default.
Part 55: Host and service performance data processing commands
# HOST AND SERVICE PERFORMANCE DATA PROCESSING COMMANDS
# These commands are run after every host and service check is
# performed. These commands are executed only if the
# enable_performance_data option (above) is set to 1. The command
# argument is the short name of a command definition that you
# define in your host configuration file. Read the HTML docs for
# more information on performance data.
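These commands are run after every host and service check. They are executed only when the process_performance_data option above is set to 1 (the sample comment calls it enable_performance_data). The argument is the short name of a command definition in your host configuration file.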
#host_perfdata_command=process-host-perfdata
#service_perfdata_command=process-service-perfdata
These are commented out by default.
Part 56: Host and service performance data files
# HOST AND SERVICE PERFORMANCE DATA FILES
# These files are used to store host and service performance data.
# Performance data is only written to these files if the
# enable_performance_data option (above) is set to 1.
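These files store host and service performance data. Data is written to them only when the process_performance_data option above is set to 1 (the sample comment calls it enable_performance_data).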
#host_perfdata_file=/tmp/host-perfdata
#service_perfdata_file=/tmp/service-perfdata
The performance data file paths are commented out by default.
Part 57: Host and service performance data file templates
# HOST AND SERVICE PERFORMANCE DATA FILE TEMPLATES
# These options determine what data is written (and how) to the
# performance data files. The templates may contain macros, special
# characters (\t for tab, \r for carriage return, \n for newline)
# and plain text. A newline is automatically added after each write
# to the performance data file. Some examples of what you can do are
# shown below.
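These options determine what data is written to the performance data files, and how. A template may contain macros, special characters (\t for tab, \r for carriage return, \n for newline) and plain text. A newline is added automatically after each write to the performance data file. The entries that follow are example templates.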
#host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
#service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
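With the service template above, every record is one tab-separated line whose fields follow the macro order: tag, $TIMET$, $HOSTNAME$, $SERVICEDESC$, execution time, latency, output, perfdata. The sketch below shows one way such a file could be read with awk. The function name summarize_service_perfdata is hypothetical and the sample record is invented for the demonstration, not real monitoring output.

```shell
# Hypothetical helper: print "host/service: perfdata" for every record
# written with the service template above.
# Fields (tab-separated): 1=tag 2=time 3=host 4=service 5=exec time
#                         6=latency 7=output 8=perfdata
summarize_service_perfdata() {
    awk -F '\t' '$1 == "[SERVICEPERFDATA]" { printf "%s/%s: %s\n", $3, $4, $8 }' "$1"
}

# Demonstration with an invented sample record.
demo_file=$(mktemp)
printf '[SERVICEPERFDATA]\t1262912000\tlocalhost\tPING\t4.012\t0.105\tPING OK\trta=0.050ms;100;500;0\n' > "$demo_file"
summarize_service_perfdata "$demo_file"   # prints "localhost/PING: rta=0.050ms;100;500;0"
rm -f "$demo_file"
```

Because the separator is a tab, spaces inside the plugin output field do not disturb the field positions.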
Part 58: Host and service performance data file modes
# HOST AND SERVICE PERFORMANCE DATA FILE MODES
# This option determines whether or not the host and service
# performance data files are opened in write ("w") or append ("a")
# mode. Unless you are the files are named pipes, you will probably
# want to use the default mode of append ("a").
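These options determine whether the host and service performance data files are opened in write (w) or append (a) mode. Unless the files are named pipes, you will probably want the default append mode, a.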
#host_perfdata_file_mode=a
#service_perfdata_file_mode=a
These are commented out by default.
Part 59: Host and service performance data file processing interval
# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING INTERVAL
# These options determine how often (in seconds) the host and service
# performance data files are processed using the commands defined
# below. A value of 0 indicates the files should not be periodically
# processed.
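These options determine how often, in seconds, the host and service performance data files are processed using the commands defined below. A value of 0 means the files are not processed periodically.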
#host_perfdata_file_processing_interval=0
#service_perfdata_file_processing_interval=0
These are commented out by default.
Part 60: Host and service performance data file processing commands
# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING COMMANDS
# These commands are used to periodically process the host and
# service performance data files. The interval at which the
# processing occurs is determined by the options above.
These commands are used to periodically process the host and service performance data files. The processing interval is set by the options above.
#host_perfdata_file_processing_command=process-host-perfdata-file
#service_perfdata_file_processing_command=process-service-perfdata-file
These are commented out by default.
Part 61: Obsess over service checks option
# OBSESS OVER SERVICE CHECKS OPTION
# This determines whether or not Nagios will obsess over service
# checks and run the ocsp_command defined below. Unless you're
# planning on implementing distributed monitoring, do not enable
# this option. Read the HTML docs for more information on
# implementing distributed monitoring.
# Values: 1 = obsess over services, 0 = do not obsess (default)
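This option determines whether Nagios "obsesses" over service checks, that is, whether it runs the ocsp_command defined below after every service check. Do not enable it unless you plan to implement distributed monitoring. 1 = obsess over services, 0 = do not obsess (default).
obsess_over_services=0
Disabled by default.
Part 62: OCSP Command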
# OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND
# This is the command that is run for every service check that is
# processed by Nagios. This command is executed only if the
# obsess_over_service option (above) is set to 1. The command
# argument is the short name of a command definition that you
# define in your host configuration file. Read the HTML docs for
# more information on implementing distributed monitoring.
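This is the command that Nagios runs for every service check it processes. It is executed only when the obsess_over_services option (above) is set to 1. The argument is the short name of a command definition in your configuration files.
#ocsp_command=somecommand
This line is commented out by default, so no OCSP command is set.
Part 63: Orphaned Service Check Option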
# ORPHANED SERVICE CHECK OPTION
# This determines whether or not Nagios will periodically
# check for orphaned services. Since service checks are not
# rescheduled until the results of their previous execution
# instance are processed, there exists a possibility that some
# checks may never get rescheduled. This seems to be a rare
# problem and should not happen under normal circumstances.
# If you have problems with service checks never getting
# rescheduled, you might want to try enabling this option.
# Values: 1 = enable checks, 0 = disable checks
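This option determines whether Nagios periodically checks for orphaned service checks. Because a service check is not rescheduled until the result of its previous execution has been processed, some checks may never be rescheduled. This is rare and should not happen under normal circumstances; if you do see service checks that never get rescheduled, set this option to 1. 1 = enable, 0 = disable.
check_for_orphaned_services=1
Orphaned service checking is enabled by default.
Part 64: Service Freshness Check Option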
# SERVICE FRESHNESS CHECK OPTION
# This option determines whether or not Nagios will periodically
# check the "freshness" of service results. Enabling this option
# is useful for ensuring passive checks are received in a timely
# manner.
# Values: 1 = enabled freshness checking, 0 = disable freshness checking
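This option determines whether Nagios periodically checks the "freshness" of service check results. Enabling it helps ensure that passive check results arrive in a timely manner. 1 = enabled, 0 = disabled.
check_service_freshness=1
Service freshness checking is enabled by default.
Part 65: Service Freshness Check Interval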
# SERVICE FRESHNESS CHECK INTERVAL
# This setting determines how often (in seconds) Nagios will
# check the "freshness" of service check results. If you have
# disabled service freshness checking, this option has no effect.
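This setting determines how often (in seconds) Nagios checks the freshness of service check results. It has no effect if service freshness checking is disabled.
service_freshness_check_interval=60
The default service freshness check interval is 60 seconds.
Part 66: Host Freshness Check Option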
# HOST FRESHNESS CHECK OPTION
# This option determines whether or not Nagios will periodically
# check the "freshness" of host results. Enabling this option
# is useful for ensuring passive checks are received in a timely
# manner.
# Values: 1 = enabled freshness checking, 0 = disable freshness checking
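This option determines whether Nagios periodically checks the "freshness" of host check results. Enabling it helps ensure that passive check results arrive in a timely manner. 1 = enabled, 0 = disabled.
check_host_freshness=0
Host freshness checking is disabled by default.
Part 67: Host Freshness Check Interval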
# HOST FRESHNESS CHECK INTERVAL
# This setting determines how often (in seconds) Nagios will
# check the "freshness" of host check results. If you have
# disabled host freshness checking, this option has no effect.
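This setting determines how often (in seconds) Nagios checks the freshness of host check results. It has no effect if host freshness checking is disabled.
host_freshness_check_interval=60
The default host freshness check interval is 60 seconds.
Part 68: Aggregated Status Updates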
# AGGREGATED STATUS UPDATES
# This option determines whether or not Nagios will
# aggregate updates of host, service, and program status
# data. Normally, status data is updated immediately when
# a change occurs. This can result in high CPU loads if
# you are monitoring a lot of services. If you want Nagios
# to only refresh status data every few seconds, enable
# this option.
# Values: 1 = enable aggregate updates, 0 = disable aggregate updates
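This option determines whether Nagios aggregates updates of host, service and program status data. Normally, status data is updated immediately whenever a change occurs, which can cause high CPU load when many services are monitored. With aggregation enabled, Nagios writes the status data only once every few seconds, at the interval set by status_update_interval. 1 = enable aggregated updates, 0 = disable.
aggregate_status_updates=1
Aggregated status updates are enabled by default.
Part 69: Aggregated Status Update Interval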
# AGGREGATED STATUS UPDATE INTERVAL
# Combined with the aggregate_status_updates option,
# this option determines the frequency (in seconds!) that
# Nagios will periodically dump program, host, and
# service status data. If you are not using aggregated
# status data updates, this option has no effect.
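Used together with the aggregate_status_updates option, this option sets how often (in seconds) Nagios dumps program, host and service status data. It has no effect if aggregated status updates are disabled.
status_update_interval=15
The default aggregated status update interval is 15 seconds.
Part 70: Flap Detection Option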
# FLAP DETECTION OPTION
# This option determines whether or not Nagios will try
# and detect hosts and services that are "flapping".
# Flapping occurs when a host or service changes between
# states too frequently. When Nagios detects that a
# host or service is flapping, it will temporarily suppress
# notifications for that host/service until it stops
# flapping. Flap detection is very experimental, so read
# the HTML documentation before enabling this feature!
# Values: 1 = enable flap detection
# 0 = disable flap detection (default)
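This option determines whether Nagios tries to detect hosts and services that are "flapping". Flapping occurs when a host or service changes state too frequently. When Nagios detects flapping, it temporarily suppresses notifications for that host or service until the flapping stops. 1 = enable flap detection, 0 = disable flap detection (default).
enable_flap_detection=0
Flap detection is disabled by default.
Part 71: Flap Detection Thresholds for Hosts and Services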
# FLAP DETECTION THRESHOLDS FOR HOSTS AND SERVICES
# Read the HTML documentation on flap detection for
# an explanation of what this option does. This option
# has no effect if flap detection is disabled.
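These options set the low and high flap detection thresholds for hosts and services. The values are percentages of state change, not seconds. They have no effect if flap detection is disabled.
low_service_flap_threshold=5.0
Low service flap threshold; the default is 5.0 percent.
high_service_flap_threshold=20.0
High service flap threshold; the default is 20.0 percent.
low_host_flap_threshold=5.0
Low host flap threshold; the default is 5.0 percent.
high_host_flap_threshold=20.0
High host flap threshold; the default is 20.0 percent.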
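To make the meaning of the thresholds more concrete, here is a minimal sketch of a percent-state-change calculation. It is a deliberate simplification: it counts state transitions in a list of recent check results and ignores the weighting of newer results that the Nagios documentation describes. The function name percent_state_change and the sample history are assumptions made for illustration.

```shell
# Simplified, unweighted sketch: percentage of state changes in a history
# of check results passed as arguments (oldest first).
percent_state_change() {
    prev=""
    changes=0
    transitions=0
    for state in "$@"; do
        if [ -n "$prev" ]; then
            transitions=$((transitions + 1))
            if [ "$state" != "$prev" ]; then
                changes=$((changes + 1))
            fi
        fi
        prev=$state
    done
    # With fewer than two results there is nothing to compare.
    [ "$transitions" -gt 0 ] || { echo 0; return; }
    echo $((changes * 100 / transitions))
}

# Two changes over four transitions.
percent_state_change OK OK CRITICAL OK OK
```

With the default thresholds, a value at or above 20 would be treated as the start of flapping, and flapping would be considered over once the value falls below 5.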
Part 72: Date Format Option
# DATE FORMAT OPTION
# This option determines how short dates are displayed. Valid options
# include:
# us (MM-DD-YYYY HH:MM:SS)
# euro (DD-MM-YYYY HH:MM:SS)
# iso8601 (YYYY-MM-DD HH:MM:SS)
# strict-iso8601 (YYYY-MM-DDTHH:MM:SS)
#
This option determines how short dates are displayed. Valid values are:
US format: us (MM-DD-YYYY HH:MM:SS)
European format: euro (DD-MM-YYYY HH:MM:SS)
ISO 8601: iso8601 (YYYY-MM-DD HH:MM:SS)
Strict ISO 8601: strict-iso8601 (YYYY-MM-DDTHH:MM:SS)
date_format=us
The default is the US format.
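The four layouts can be previewed from a shell. The sketch below maps each date_format value to an equivalent date(1) format string; it assumes GNU date (for the -d @TIMESTAMP form), and the function name format_short_date is hypothetical.

```shell
# Hypothetical helper: render a Unix timestamp (UTC) in the layout that
# corresponds to a given date_format value.
format_short_date() {
    # $1: date_format value, $2: Unix timestamp
    case "$1" in
        us)             fmt='%m-%d-%Y %H:%M:%S' ;;
        euro)           fmt='%d-%m-%Y %H:%M:%S' ;;
        iso8601)        fmt='%Y-%m-%d %H:%M:%S' ;;
        strict-iso8601) fmt='%Y-%m-%dT%H:%M:%S' ;;
        *) echo "unknown date_format: $1" >&2; return 1 ;;
    esac
    date -u -d "@$2" "+$fmt"
}

for f in us euro iso8601 strict-iso8601; do
    format_short_date "$f" 1263549141
done
```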
Part 74: Illegal Object Name Characters
# ILLEGAL OBJECT NAME CHARACTERS
# This option allows you to specify illegal characters that cannot
# be used in host names, service descriptions, or names of other
# object types.
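This option lists the characters that may not be used in host names, service descriptions, or the names of other object types.
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
A set of illegal characters is already listed by default.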
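Before adding a host or service, a candidate name can be checked against this character set. The following is a minimal sketch, not part of Nagios itself; the names ILLEGAL_CHARS and name_is_legal are assumptions made for illustration.

```shell
# Default set from illegal_object_name_chars, read from a quoted
# here-document so the shell does not expand or interpret any character.
IFS= read -r ILLEGAL_CHARS <<'EOF'
`~!$%^&*|'"<>?,()=
EOF

# Succeeds (exit status 0) when the name contains none of the characters.
name_is_legal() {
    stripped=$(printf '%s' "$1" | tr -d "$ILLEGAL_CHARS")
    [ "$stripped" = "$1" ]
}

for name in 'web-01' 'web(01)'; do
    if name_is_legal "$name"; then
        echo "$name: ok"
    else
        echo "$name: contains illegal characters"
    fi
done
```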
Part 75: Illegal Macro Output Characters
# ILLEGAL MACRO OUTPUT CHARACTERS
# This option allows you to specify illegal characters that are
# stripped from macros before being used in notifications, event
# handlers, etc. This DOES NOT affect macros used in service or
# host check commands.
# The following macros are stripped of the characters you specify:
# $HOSTOUTPUT$
# $HOSTPERFDATA$
# $HOSTACKAUTHOR$
# $HOSTACKCOMMENT$
# $SERVICEOUTPUT$
# $SERVICEPERFDATA$
# $SERVICEACKAUTHOR$
# $SERVICEACKCOMMENT$
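This option lists the characters that are stripped from macros before they are used in notifications, event handlers and so on. It does not affect macros used in service or host check commands. The following macros are affected: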
$HOSTOUTPUT$
$HOSTPERFDATA$
$HOSTACKAUTHOR$
$HOSTACKCOMMENT$
$SERVICEOUTPUT$
$SERVICEPERFDATA$
$SERVICEACKAUTHOR$
$SERVICEACKCOMMENT$
illegal_macro_output_chars=`~$&|'"<>
A set of illegal macro output characters is already listed by default.
Part 76: Regular Expression Matching
# REGULAR EXPRESSION MATCHING
# This option controls whether or not regular expression matching
# takes place in the object config files. Regular expression
# matching is used to match host, hostgroup, service, and service
# group names/descriptions in some fields of various object types.
# Values: 1 = enable regexp matching, 0 = disable regexp matching
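This option controls whether regular expression matching takes place in the object config files. Regular expressions are used to match host, host group, service and service group names/descriptions in some fields of various object types. 1 = enable, 0 = disable.
use_regexp_matching=0
Regular expression matching is disabled by default.
Part 77: "True" Regular Expression Matching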
# "TRUE" REGULAR EXPRESSION MATCHING
# This option controls whether or not "true" regular expression
# matching takes place in the object config files. This option
# only has an effect if regular expression matching is enabled
# (see above). If this option is DISABLED, regular expression
# matching only occurs if a string contains wildcard characters
# (* and ?). If the option is ENABLED, regexp matching occurs
# all the time (which can be annoying).
# Values: 1 = enable true matching, 0 = disable true matching
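This option controls whether "true" regular expression matching takes place in the object config files. It only has an effect if regular expression matching is enabled (see above). When this option is disabled, regular expression matching is applied only to strings that contain the wildcard characters * and ?. When it is enabled, regular expression matching is applied all the time, which can have unwanted side effects. 1 = enable, 0 = disable.
use_true_regexp_matching=0
"True" regular expression matching is disabled by default.
Part 78: Administrator Email Address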
# ADMINISTRATOR EMAIL ADDRESS
# The email address of the administrator of *this* machine (the one
# doing the monitoring). Nagios never uses this value itself, but
# you can access this value by using the $ADMINEMAIL$ macro in your
# notification commands.
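The email address of the administrator of the monitoring machine. Nagios never uses this value itself, but notification commands can read it through the $ADMINEMAIL$ macro.
admin_email=nagios
The default value is nagios.
Part 79: Administrator Pager Number/Address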
# ADMINISTRATOR PAGER NUMBER/ADDRESS
# The pager number/address for the administrator of *this* machine.
# Nagios never uses this value itself, but you can access this
# value by using the $ADMINPAGER$ macro in your notification
# commands.
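The pager number or address of the administrator of the monitoring machine. Nagios never uses this value itself, but notification commands can read it through the $ADMINPAGER$ macro.
admin_pager=pagenagios
The default value is pagenagios.
Part 80: Daemon Core Dump Option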
# DAEMON CORE DUMP OPTION
# This option determines whether or not Nagios is allowed to create
# a core dump when it runs as a daemon. Note that it is generally
# considered bad form to allow this, but it may be useful for
# debugging purposes.
# Values: 1 - Allow core dumps
# 0 - Do not allow core dumps (default)
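This option determines whether Nagios may create a core dump when it runs as a daemon. Allowing this is generally considered bad practice, but it can be useful for debugging. 1 = allow core dumps, 0 = do not allow core dumps (default).
daemon_dumps_core=0
Core dumps are not allowed by default.
2. Installing Centreon
I. What is Centreon
Centreon is open-source software used mainly together with Nagios: it manages Nagios through a web interface and, with third-party components, monitors networks, operating systems and applications.
II. Installing NDOUtils
A. Download NDOUtils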
# wget http://nchc.dl.sourceforge.net/sourceforge/nagios/ndoutils-1.4b7.tar.gz
B. Install NDOUtils
# tar -zxvf ndoutils-1.4b7.tar.gz
# cd ndoutils-1.4b7
# ./configure --prefix=/usr/lib/nagios --enable-mysql --disable-pgsql --with-ndo2db-user=nagios --with-ndo2db-group=nagios
# make clean
# make
# cp config/ndomod.cfg /etc/nagios
# cp config/ndo2db.cfg /etc/nagios
# mkdir /usr/lib/nagios/bin
# cp src/ndomod-3x.o /usr/lib/nagios/bin/ndomod.o
# cp src/ndo2db-3x /usr/lib/nagios/bin/ndo2db
# cp src/log2ndo /usr/lib/nagios/bin/log2ndo
# cp src/sockdebug /usr/lib/nagios/bin/sockdebug
# cp src/file2sock /usr/lib/nagios/bin/file2sock
# vi /etc/nagios/nagios.cfg (change as shown below)
broker_module=/usr/lib/nagios/bin/ndomod.o config_file=/etc/nagios/ndomod.cfg
event_broker_options=-1
# vi /etc/nagios/ndomod.cfg (change as shown below)
output_type=tcpsocket
#output_type=unixsocket
output=127.0.0.1
#output=/usr/local/nagios/var/ndo.sock
buffer_file=/var/nagios/ndomod.tmp
# vi /etc/nagios/ndo2db.cfg (change as shown below)
#socket_type=unix
socket_type=tcp
#socket_name=/usr/local/nagios/var/ndo.sock
db_servertype=mysql
db_name=ndo
db_user=ndouser
db_pass=ndopassword
debug_level=-1
debug_verbosity=2
debug_file=/var/log/nagios/ndo2db-debug.log
# touch /var/log/nagios/ndo2db-debug.log
# chown nagios.nagios /var/log/nagios/ndo2db-debug.log
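After editing the three files, it can help to read the active values back and confirm that the uncommented lines are the ones you intended. The helper below is a minimal sketch; the name cfg_get is hypothetical and it only understands plain key=value lines.

```shell
# Hypothetical helper: print the value of the first uncommented
# key=value line for the given key. Commented lines such as
# "#output_type=unixsocket" are skipped because their key starts with "#".
cfg_get() {
    # $1: config file, $2: key
    awk -F '=' -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Example with the paths used in this guide:
#   cfg_get /etc/nagios/ndomod.cfg output_type
#   cfg_get /etc/nagios/ndo2db.cfg socket_type
#   cfg_get /etc/nagios/nagios.cfg broker_module
```

With the settings shown above, the first two calls should print tcpsocket and tcp respectively.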
C. Set up the NDO database
# wget http://download.centreon.com/centreon/centreon-2.0.1.tar.gz
# tar -zxvf centreon-2.0.1.tar.gz
# cd centreon-2.0.1
# service mysqld start
# mysql -u root -p (the password is empty by default; you can set one with mysqladmin -u root password 'password')
mysql> CREATE DATABASE `ndo` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
mysql> exit;
# mysql -u root -p ndo < /$centreon-2.0.1-path$/www/install/createNDODB.sql
# mysql -u root -p
mysql> GRANT SELECT , INSERT , UPDATE , DELETE ON `ndo` . * TO 'ndouser'@'localhost' IDENTIFIED BY 'ndopassword';
mysql> flush privileges;
mysql> exit;
D. Set up the NDO2DB service
# vi /etc/init.d/ndo2db
#!/bin/sh
#
#
# chkconfig: 345 99 01
# description: Nagios to mysql
#
# Author : Gaëtan Lucas
# Realase : 07/02/08
# Version : 0.1 b
# File : ndo2db
# Description: Starts and stops the Ndo2db daemon
# used to provide network services status in a database.
#
status_ndo ()
{
    if ps -p $NdoPID > /dev/null 2>&1; then
        return 0
    else
        return 1
    fi
    return 1
}
printstatus_ndo()
{
    if status_ndo $1 $2; then
        echo "ndo (pid $NdoPID) is running..."
    else
        echo "ndo is not running"
    fi
}
killproc_ndo ()
{
    echo "kill $2 $NdoPID"
    kill $2 $NdoPID
}
pid_ndo ()
{
    if test ! -f $NdoRunFile; then
        echo "No lock file found in $NdoRunFile"
        echo -n " checking runing process..."
        NdoPID=`ps h -C ndo2db -o pid`
        if [ -z "$NdoPID" ]; then
            echo " No ndo2db process found"
            exit 1
        else
            echo " found process pid: $NdoPID"
            echo -n " reinit $NdoRunFile ..."
            touch $NdoRunFile
            chown $NdoUser:$NdoGroup $NdoRunFile
            echo "$NdoPID" > $NdoRunFile
            echo " done"
        fi
    fi
    NdoPID=`head $NdoRunFile`
}
# Source function library
# Solaris doesn't have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
    . /etc/init.d/functions
fi
prefix=/usr/lib/nagios
exec_prefix=${prefix}
NdoBin=/usr/lib/nagios/bin/ndo2db
NdoCfgFile=/etc/nagios/ndo2db.cfg
NdoRunFile=/var/nagios/ndo2db.run
NdoLockDir=/var/lock/subsys
NdoLockFile=ndo2db.lock
NdoUser=nagios
NdoGroup=nagios
# Check that ndo exists.
if [ ! -f $NdoBin ]; then
    echo "Executable file $NdoBin not found. Exiting."
    exit 1
fi
# Check that ndo.cfg exists.
if [ ! -f $NdoCfgFile ]; then
    echo "Configuration file $NdoCfgFile not found. Exiting."
    exit 1
fi
# See how we were called.
case "$1" in
    start)
        echo -n "Starting ndo:"
        touch $NdoRunFile
        chown $NdoUser:$NdoGroup $NdoRunFile
        daemon $NdoBin -c $NdoCfgFile
        if [ -d $NdoLockDir ]; then
            touch $NdoLockDir/$NdoLockFile;
        fi
        ps h -C ndo2db -o pid > $NdoRunFile
        if [ $? -eq 0 ]; then
            echo " done."
            exit 0
        else
            echo " failed."
            $0 stop
            exit 1
        fi
        ;;
    stop)
        echo -n "Stopping ndo: "
        pid_ndo
        killproc_ndo
        # now we have to wait for ndo to exit and remove its
        # own NdoRunFile, otherwise a following "start" could
        # happen, and then the exiting ndo will remove the
        # new NdoRunFile, allowing multiple ndo daemons
        # to (sooner or later) run
        #echo -n 'Waiting for ndo to exit .'
        for i in 1 2 3 4 5 6 7 8 9 10 ; do
            if status_ndo > /dev/null; then
                echo -n '.'
                sleep 1
            else
                break
            fi
        done
        if status_ndo > /dev/null; then
            echo
            echo 'Warning - ndo did not exit in a timely manner'
        else
            echo 'done.'
        fi
        rm -f $NdoRunFile $NdoLockDir/$NdoLockFile
        ;;
    status)
        pid_ndo
        printstatus_ndo ndo
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo "Usage: ndo {start|stop|restart|status}"
        exit 1
        ;;
esac
# End of this script
Next:
# chmod 755 /etc/init.d/ndo2db
# chkconfig --level 345 ndo2db on
# mv /etc/rc3.d/S99ndo2db /etc/rc3.d/S97ndo2db
# mv /etc/rc5.d/S99ndo2db /etc/rc5.d/S97ndo2db
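# service nagios stop ## Nagios must be stopped first
# service ndo2db start
Verify that the ndo2db TCP port is open:
# netstat -tl (see the red box in the screenshot below)
# service nagios start
# tail -f /var/log/nagios/nagios.log
The log must contain the messages highlighted in the red box of the screenshot; they indicate that ndo2db was installed and started successfully.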
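The ndo2db port check above can also be scripted instead of read by eye. The sketch below filters netstat -tln style output for a listening TCP port. It assumes ndo2db uses its default tcp_port of 5668 from the sample ndo2db.cfg, which this guide does not state explicitly; the function name port_is_listening is hypothetical.

```shell
# Reads `netstat -tln` style output on stdin and succeeds (exit status 0)
# if some TCP socket is in LISTEN state on the given port.
port_is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example (5668 is assumed to be the ndo2db tcp_port):
#   netstat -tln | port_is_listening 5668 && echo "ndo2db is listening"
```

Using -n keeps the port numeric, so the match does not depend on service names from /etc/services.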
III. Installing Centreon 2.0.1
A. Install Centreon
# cd centreon-2.0.1
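# visudo (comment out the requiretty line, as shown below)
# Defaults requiretty
Make sure again that PHP PEAR can be updated normally.
B. Run the Centreon installation script
# ./install.sh -i
Note:
Before running this script, upgrade php-pear to 1.5.4 or later; otherwise some upgrade steps of the installer cannot run (as of the time of writing). Once yum can upgrade PEAR to a newer version directly, this problem will go away.
# wget ftp://ftp.pbone.net/mirror/rpms.famillecollet.com/enterprise/5/remi/i386/php-pear-1.9.0-1.el5.remi.noarch.rpm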
# rpm -Uvh php-pear-1.9.0-1.el5.remi.noarch.rpm
# pear version
PHP Warning: Module 'ldap' already loaded in Unknown on line 0
PEAR Version: 1.9.0
PHP Version: 5.1.6
Zend Engine Version: 2.1.0
Running on: Linux localhost 2.6.18-164.9.1.el5 #1 SMP Tue Dec 15 21:04:57 EST 2009 i686
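Once Centreon is installed, you can access it at http://yourServerIPAddress/centreon.
IV. Configuring Centreon
Create an index.html file in Apache's default document root; otherwise errors occur during Centreon configuration.
# touch /var/www/html/index.html
Also change the group of the Nagios configuration directory to apache:
# chown nagios.apache /etc/nagios
Note:
If a cookie-related error appears during initial configuration, the server clock and the local clock may be out of sync. Setting the server clock to the correct local time (Beijing time in the author's case) resolves the problem.
A. Initial configuration
After the first login, the interface shows the following error:
See the figure below for the fix:
After making the change, save it and open "Home" again; the following screen appears: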
B. Configure nagios.cfg
Name | Value
Object Configuration Directory | /etc/nagios/
Log file | /var/log/nagios/nagios.log
Downtime File | /var/log/nagios/downtime.log
Comment File | /var/log/nagios/comment.log
Temp File | /var/nagios/nagios.tmp
P1 File | /usr/bin/p1.pl
Lock File | /var/run/nagios.pid
Object Cache File | /var/nagios/objects.cache
Status File | /var/nagios/status.dat
Aggregated Status Updates Option | Yes
Aggregated Status Data Update Interval | 15 sec
External Command Check Option |
External Command Check Interval | 1s
External Command File | /var/nagios/rw/nagios.cmd
Agressive Host Checks | Yes
Predictive Host Dependency Checks | Yes
Service Check Option |
Soft Service Dependencies Option | No
Predictive Service Dependency Checks | Yes
Global Host Event Handler | none
Global Service Event Handler | none
Service Freshness Check Option | Yes
Service Freshness Check Interval | 60
Host Freshness Check Option | Yes
Host Freshness Check Interval | 60
Additional freshness latency | 15
Flap Detection Option | no
Obsess Over Services Option | No
Obsess Over Hosts Option | no
Cached Host Check | 15
Cached Service Check | 15
Notification Option | Yes
Service Check Execution Option | Yes
Passive Service Check Acceptance Option | Yes
Event Handler Option | Yes
Host Check Execution Option | Yes
Passive Host Check Acceptance Option | Yes
Syslog Logging Option | No
Notification Logging Option | Yes
Service Check Retry Logging Option | Yes
Host Retry Logging Option | Yes
Event Handler Logging Option | Yes
Initial State Logging Option | no
External Command Logging Option | Yes
Passive Check Logging Option | Yes
Log Archive Path | /var/log/nagios/archives/
State Retention Option | Yes
State Retention File | /var/nagios/retention.dat
Automatic State Retention Update Interval | Yes
Use Retained Program State Option | Yes
Use Retained Scheduling Info Option | Yes
Broker Module | /usr/lib/nagios/bin/ndomod.o config_file=/etc/nagios/ndomod.cfg
Broker Module Options | -1
Performance Data Processing Option | Yes
Host Performance Data File | /var/nagios/host-perfdata
Service Performance Data File | /var/nagios/service-perfdata
Host Performance Data File Mode | a
Service Performance Data File Mode | a
Host Performance Data File Processing Interval | 0
Service Performance Data File Processing Interval | 0
Timing Interval Length | 5
Host Inter-Check Delay Method | s
Maximum Host Check Spread | 30
Inter-Check Sleep Time | 1
Service Inter-Check Delay Method | s
Maximum Service Check Spread | 5
Service Interleave Factor | s
Maximum Concurrent Service Checks | 200
Service Repear Frequency | 5
Auto-Rescheduling Option | Yes
Auto-Rescheduling Interval | 30
Auto-Rescheduling Window | 180
Use large installation tweaks | No
Free child process memory | No
Child processes fork twice | No
Enable environment macros | Yes
Date Format | us
Regular Expression Matching Option | No
True Regular Expression Matching Option | no
Enable embedded Perl | default
Use embedded Perl implicitly | default
Debug file (Directory + File) | /var/log/nagios/nagios.debug
Debug file Maximum Size | 100000000
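C. Configure cgi.cfg
D. Apply the changes
Note: every time you modify the configuration files or add a new host, service, etc., you must repeat the steps above for the change to take effect.
E. Final steps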
# vi /etc/centreon/instCentStorage.conf
RRD_PERL=/usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi/RRDs.pm
# vi /etc/centreon/instCentWeb.conf
RRD_PERL=/usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi/RRDs.pm
# vi /etc/centreon/instCentPlugins.conf
RRD_PERL=/usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi/RRDs.pm
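The RRD_PERL path above depends on the Perl version and CPU architecture, so it may differ on your machine. As a hedged sketch, the helper below searches a directory tree for RRDs.pm so the actual location can be pasted into the three files; the name find_rrds_pm is hypothetical, and the directories in the usage comment are only likely candidates.

```shell
# Hypothetical helper: print the first RRDs.pm found under a directory.
find_rrds_pm() {
    # $1: directory to search
    find "$1" -type f -name RRDs.pm 2>/dev/null | head -n 1
}

# Example: try the system Perl tree first, then the RRDtool prefix.
#   find_rrds_pm /usr/lib/perl5
#   find_rrds_pm /usr/local/rrdtool
```

If nothing is printed for either directory, the RRDtool Perl bindings were probably not built or installed.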
The Nagios + Centreon configuration is now complete!
Troubleshooting tips ----- omitted