




2012-02-19 22:24:34

Nagios 3.x
Dr. 田朝阳



Revision 0.0.3 30/01/2008 enochcytian
Revision 0.0.2 20/12/2007 enochcytian
Revision 0.0.1 12/12/2007 enochcytian



1. Acknowledgements


Next, I want to thank Ethan Galstad, the author of Nagios, for giving us such a fine piece of software. His replies to my email also let me know where Nagios is headed.


Chapter 1. Preface







Chapter 2. About Nagios
2.1. What Is Nagios?




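  1. Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.);
  2. Monitoring of host resources (processor load, disk usage, etc.);
  3. A simple plugin design that lets users easily add their own service checks;
  4. Parallelized service checks;
  5. The ability to define a network hierarchy using "parent" host definitions, which is used to detect and distinguish hosts that are down from hosts that are unreachable;
  6. Contact notifications when service or host problems occur and are resolved (via email, SMS, or a user-defined method);
  7. The ability to define event handlers that run when host or service events occur and help pin down the problem;
  8. Automatic log file rotation;
  9. Support for redundant monitoring hosts;
  10. An optional web interface for viewing current network status, notification and problem history, log files, etc.
2.2. System Requirements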



  1. A web server
  2. Thomas Boutell's gd library, version 1.6.3 or higher (required by two of the CGIs)
2.3. License



2.4. Acknowledgements


2.5. Downloading the Latest Version



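Nagios and the Nagios trademarks are owned by Ethan Galstad. All other trademarks, service marks, registered trademarks, and registered service marks are the property of their respective owners.

Chapter 3. What's New in Nagios 3.0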

Important: Make sure you read through the documentation and the FAQs at before sending a question to the mailing lists.

3.1. Change Log


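3.2. Changes and New Features
  • Documentation:
    1. Documentation updates - I'm sorry that work on the documentation is progressing slowly. It will take a while, because there is a lot of documentation and writing it is not my favorite thing to do (translator's note: spending whole days translating it is even less to my liking). Expect some parts of the documentation to differ from others for a while; the differences should be useful to new and experienced Nagios users alike.
  • Macros:
    4. Macros are made available as environment variables when checks, event handlers, notifications, and other external commands are run. This can be fairly CPU-intensive in large Nagios installations, so you can disable it with the option.
    5. Updated information on macros can be found .
  • Scheduled Downtime:
    1. Scheduled downtime entries are no longer stored in their own file (previously specified with the downtime_file directive in the main config file). Current and retained scheduled downtime entries are now stored in the and , respectively.
  • Comments:
    1. Host and service comments are no longer stored in their own file (previously specified with the comment_file directive in the main config file). Current and retained comments are now stored in the and , respectively.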
    2. Acknowledgement comments that are marked as non-persistent are now only deleted when the acknowledgement is removed. They were previously automatically deleted when Nagios restarted, which was not ideal.
  • State Retention Data:
    1. Status information for individual contacts is now retained across program restarts.
    2. Comment and downtime IDs are now retained across program restarts and should be unique unless the retention data is deleted or ignored.
    3. Added and variables to control what host/service attributes are retained globally across program restarts.
    4. Added and variables to control what process attributes are retained across program restarts.
    5. Added and variables to control what contact attributes are retained globally across program restarts.
  • Flap Detection:
    1. Added flap_detection_options directive to host and service definitions to allow you to specify what host/service states should be used by the flap detection logic (by default all states are used).
    2. Percent state change and state history are now retained and recorded even when flap detection is disabled.
    3. Hosts and services are immediately checked for flapping when flap detection is enabled program-wide.
    4. Hosts and services that are flapping when flap detection is disabled program-wide are now logged.
    5. More information on flap detection can be found .
  • External Commands:
    1. Added a new PROCESS_FILE external command to allow processing of external commands found in an external (regular) file. Useful for processing large amounts of passive checks with long output, or for scripting regular commands. More information can be found here.
    2. Custom commands may now be submitted to Nagios. Custom command names are prefixed with an underscore and are not processed internally by the Nagios daemon. They may, however, be processed by a loaded NEB module.
    3. The option is now enabled by default, which means Nagios is configured to check for external commands "out of the box". All 2.x and earlier versions of Nagios had this option disabled by default.
  • Status Data:
    1. Contact status information (last notification times, notifications enabled/disabled, etc.) is now saved in the and files, although it is not processed by the CGIs.
  • Embedded Perl:
    1. Added new and variables to control use of the embedded Perl interpreter.
    2. Perl scripts/plugins can now explicitly tell Nagios whether or not they should be run under the embedded Perl interpreter. This is useful if you have troublesome scripts that don't function well under the ePN.
    3. More information about these new options can be found .
  • Adaptive Monitoring:
    1. The check timeperiod for hosts and services can now be modified on-the-fly with the appropriate external command (CHANGE_HOST_CHECK_TIMEPERIOD or CHANGE_SVC_CHECK_TIMEPERIOD). See this page for more of the available adaptive monitoring commands.
  • Notifications:
    1. A first_notification_delay option has been added to host and service definitions to (what else) introduce a delay between when a host/service problem first occurs and when the first problem notification goes out. In previous versions you had to use some mighty config-fu with escalations to accomplish this. Now this feature is available to normal mortals.
    2. Notifications are now sent out for hosts/services that are flapping when flap detection is disabled on a host- or service-specific basis or on a program-wide basis. The $NOTIFICATIONTYPE$ macro will be set to "FLAPPINGDISABLED" in this situation.
    3. Notifications can now be sent out when scheduled downtime starts, ends, and is cancelled for hosts and services. The $NOTIFICATIONTYPE$ macro will be set to "DOWNTIMESTART", "DOWNTIMEEND", or "DOWNTIMECANCELLED", respectively. In order to receive notifications on scheduled downtime events, specify "s" or "downtime" in your contact, host, and/or service notification options.
    4. More information on notifications can be found .
  • Object Definitions:
    1. Service dependencies can now be created to easily define "same host" dependencies for different services on one or more hosts. ()
    2. Extended host and service definitions (hostextinfo and serviceextinfo, respectively) have been deprecated. All values from extended definitions have been merged with host or service definitions, as appropriate. Nagios 3 will continue to read and process older extended information definitions, but will log a warning. Future versions of Nagios (4.x and later) will not support separate extended info definitions.
    3. New hostgroup_members, servicegroup_members, and contactgroup_members directives have been added to hostgroup, servicegroup, and contactgroup definitions, respectively. This allows you to include hosts, services, or contacts from sub-groups in your group definitions.
    4. New notes, notes_url, and action_url directives have been added to hostgroup and servicegroup definitions.
    5. Contact definitions have the new host_notifications_enabled, service_notifications_enabled, and can_submit_commands directives to better control notifications and determine whether or not they can submit commands through the web interface.
    6. Host and service dependencies now support an optional dependency_period directive. This allows you to limit the times during which dependencies are valid.
    7. The parallelize directive in service definitions is now deprecated and no longer used. All service checks are run in parallel in Nagios 3.
    8. There are no longer any inherent limitations on the length of host names or service descriptions.
    9. Extended regular expressions are now used if you enable the config option. Regular expression matching is only used in certain object definition directives that contain *, ?, +, or \..
    10. A new initial_state directive has been added to host and service definitions, so you can tell Nagios that a host/service should default to a specific state when Nagios starts, rather than UP or OK (which is still the default).
  • Object Inheritance:
    1. You can now inherit object variables/values from multiple templates by specifying more than one template name in the use directive of object definitions. This can allow for some very powerful (and complex) inheritance setups. ()
    2. Services now inherit contact groups, notification interval, and notification period from their associated host if not otherwise specified. ()
    3. Host and service escalations now inherit contact groups, notification interval, and escalation timeperiod from their associated host or service if not otherwise specified. ()
    4. String variables in host, service, and contact definitions can now be prevented from being inherited by specifying a value of "null" (without quotes) for the value of the variable. ()
    5. Most string variables in local object definitions can now be appended to the string values that are inherited. This is quite handy in large configurations. ()
  • Performance Improvements:
    1. Add ability to precache object config files and exclude circular path detection checks from verification process. This can speed up Nagios start time immensely in large environments! Read more .
    2. A new option has been added that should improve performance in large Nagios installations. Read more about this .
    3. A number of internal improvements have been made with regards to how Nagios deals with internal data structures and object (e.g. host and service) relationships. These improvements should result in a speedup for larger installations.
    4. New option has been added to allow you to more easily scale Nagios in large environments. For best results you should consider using Nagios' usage of buffer slots over time.
  • Plugin Output:
    1. Multiline plugin output is now supported for host and service checks. Hooray! The plugin API has been updated to support multiple lines of output in a manner that retains backward compatibility with older plugins. Additional lines of output (aside from the first line) are now stored in new $LONGHOSTOUTPUT$ and $LONGSERVICEOUTPUT$ macros.
    2. The maximum length of plugin output has been increased to 4K (from around 350 bytes in previous versions). This 4K limit has been arbitrarily chosen to protect against runaway plugins that dump back too much data to Nagios.
    3. More information on the plugins, multiline output, and max plugin output length can be found .
  • Service Checks:
    1. Nagios now checks for orphaned service checks by default.
    2. Added a new option to control whether or not Nagios will initiate predictive checks of services that are being depended upon (in dependency definitions). Predictive checks help ensure that the dependency logic is as accurate as possible. ()
    3. A new cached service check feature has been implemented that can significantly improve performance for many people. Instead of executing a plugin to check the status of a service, Nagios can often use a cached service check result instead. More information on this can be found .
  • Host Checks:
    1. Host checks are now run in parallel! Host checks used to be run in a serial fashion, which meant they were a major holdup in terms of performance. No longer! ()
    2. Host check retries are now performed like service check retries. That is to say, host definitions now have a new retry_interval that specifies how much time to wait before trying the host check again. :-)
    3. Regularly scheduled host checks no longer hinder performance. In fact, they can help to increase performance with the new cached check logic (see below).
    4. Added a new option to enable checks of orphaned host checks. This is needed now that host checks are run in parallel.
    5. Added a new option to control whether or not Nagios will initiate predictive checks of hosts that are being depended upon (in dependency definitions). Predictive checks help ensure that the dependency logic is as accurate as possible. ()
    6. A new cached host check feature has been implemented that can significantly improve performance for many people. Instead of executing a plugin to check the status of a host, Nagios can often use a cached host check result instead. More information on this can be found .
    7. Passive host checks that have a DOWN or UNREACHABLE result can now be automatically translated to their proper state from the point of view of the Nagios instance that receives them. This is very useful in failover and distributed monitoring setups. More information on passive host check state translation can be found .
    8. Passive host checks normally put a host into a HARD state. This can now be changed by enabling the option.
  • Freshness checks:
    1. A new option has been added to allow you to specify the number of seconds that should be added to any host or service freshness threshold that is automatically calculated by Nagios.
  • IPC:
    1. The IPC mechanism that is used to transfer host/service check results back to the Nagios daemon from (grand)child processes has changed! This should help to reduce load/latency issues related to processing large numbers of passive checks in distributed monitoring environments.
    2. Check results are now transferred by writing check results to files in the directory specified by the option. Files that are older than the option will be mercilessly deleted without further processing.
  • Timeperiods:
    1. Timeperiods were overdue for a major overhaul and have finally been extended to allow for date exceptions, skip dates (every 3 days), etc! This should help you out when defining notification timeperiods for pager rotations.
    2. More information on the new timeperiod directives can be found and .
  • Event Broker:
    1. Updated NEB API version
    2. Modified callback for adaptive program status data
    3. Added callback for adaptive contact status data
    4. Added precheck callbacks for hosts and services to allow modules to cancel/override internal host/service checks.
  • Web Interface:
    1. Hostgroup and servicegroup summaries now show important/unimportant problem breakdowns like the TAC CGI.
    2. Minor layout changes to host and service detail views in extinfo CGI.
    3. New check statistics and have been added to the "Performance Info" screen.
    4. Added
    5. Added new and options to control what frame notes and action URLs are opened in.
    6. Added new option to prevent alteration of author names when users submit comments, acknowledgements, and scheduled downtime.
  • Debugging Info:
    1. The DEBUGx compile options previously available in the configure script have been removed.
    2. Debugging information can now be written to a separate debug file, which is automatically rotated when it reaches a user-defined size. This should make debugging problems much easier, as you don't need to recompile Nagios. Full support for writing debugging information to file is being added during the alpha development phase, so it may not be complete when you try it.
    3. Variables that affect the debug log in , , , and .
  • Misc:
    1. Temp path variable - A new variable has been added to specify a scratch directory that Nagios can use for temporary scratch space.
    2. Unique notification and event ID numbers - A unique ID number is now assigned to each host and service notification. Another unique ID is now assigned to all host and service state changes as well. The unique IDs can be accessed using the following respective macros: $HOSTNOTIFICATIONID$, $SERVICENOTIFICATIONID$, $HOSTEVENTID$, $SERVICEEVENTID$, $LASTHOSTEVENTID$, $LASTSERVICEEVENTID$.
    3. New macros - A few new macros (other than those already mentioned elsewhere above) have been added. They include $HOSTGROUPNAMES$, $SERVICEGROUPNAMES$, $HOSTACKAUTHORNAME$, $HOSTACKAUTHORALIAS$, $SERVICEACKAUTHORNAME$, and $SERVICEACKAUTHORALIAS$.
    4. Reaper frequency - The old service_reaper_frequency variable has been renamed to , as it is now also used to process host check results.
    5. Max reaper time - A new variable has been added to limit the amount of time a single reaper event is allowed to run.
    6. Fractional intervals - Fractional notification and check intervals (e.g. "3.5" minutes) are now supported in host, service, host escalation, and service escalation definitions.
    7. Escaped command arguments - You can now pass bang (!) characters in your command arguments by escaping them with a backslash (\). If you need to include backslashes in your command arguments, they should also be escaped with a backslash.
    8. Multiline system command output - Nagios will now read multiple lines of output from system commands it runs (notification scripts, etc.), up to 4K. This matches the limits on plugin output mentioned earlier. Output from system commands is not directly processed by Nagios, but support for it is there nonetheless.
    9. Better scheduling information - More detailed information is given when Nagios is executed with the -s command line option. This information can be used to help reduce the time it takes to start/restart Nagios.
    10. Aggregated status file updates - The old aggregate_status_updates option has been removed. All status file updates are now aggregated at a minimum interval of 1 second.
    11. New performance data file mode - A new "p" option has been added to the and options. This new mode will open the file in non-blocking read/write mode, which is useful for pipes.
    12. Timezone offset - A new option has been added to allow you to run different instances of Nagios in timezones different from the local zone.
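
The escaping rule for command arguments described under "Escaped command arguments" can be sketched in a few lines of Python. This is only an illustration of the rule as stated above (an unescaped ! separates arguments, \! is a literal bang, \\ is a literal backslash); it is not the parser Nagios itself uses, and the function name split_command_args is made up for this example.

```python
from typing import List


def split_command_args(check_command: str) -> List[str]:
    """Split 'command!arg1!arg2' on unescaped '!' characters.

    A backslash followed by '!' or by another backslash is treated as an
    escape and yields the literal second character. Any other backslash
    is kept unchanged. (Illustrative sketch, not Nagios source code.)
    """
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(check_command):
        ch = check_command[i]
        nxt = check_command[i + 1] if i + 1 < len(check_command) else ""
        if ch == "\\" and nxt in ("!", "\\"):
            current.append(nxt)   # escaped character, keep it literally
            i += 2
        elif ch == "!":
            parts.append("".join(current))  # unescaped bang ends a field
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


if __name__ == "__main__":
    # First element is the command name, the rest are the arguments.
    print(split_command_args(r"check_http!-s Hello\!"))
```

The first element of the returned list is the command name; the remaining elements are the arguments in order.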
Chapter 4. Getting Started
4.1. Advice for Beginners

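Congratulations on choosing Nagios! Nagios is very powerful and flexible, but it can take a fair amount of work to learn how to configure it the way you want. Once you understand how it works and what it can do, you'll never want to be without it! :-) Here are a few recommendations for first-time Nagios users:

  • Relax - this is going to take some time. Don't expect to get everything working in an instant; it's not that easy. Setting up Nagios takes some work, partly because the Nagios configuration may be unfamiliar, and partly because it may not yet be clear how to monitor your existing network (or how best to monitor it).
  • Use the quickstart guides. They are written to help new users get Nagios installed and running as quickly as possible. In under twenty minutes you can have Nagios installed and monitoring your local system. Once that's done, you can move on to learning how to configure Nagios.
  • Read the documentation. Once you understand how Nagios works, you can configure it efficiently and get a great deal out of it. Make sure you read the documentation (particularly the chapters "Configuring Nagios" and "The Basics"). Leave the advanced topics alone until you have a good grasp of the basic configuration.
  • Get help from others. If you've read the documentation and reviewed the sample config files and are still having problems, send an email describing the problem to the nagios-users mailing list. Because of the amount of work I have to do on this project, I may not be able to answer mail sent directly to me, so the mailing list is your best source of help. If you've done some background reading and describe the problem clearly, someone will probably be able to point you in the right direction. More information can be found at this link.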

4.2. Upgrading Nagios


4.2.1. Upgrading From a Previous 3.x Release



Become the nagios user. Debian/Ubuntu users can use sudo -s nagios instead.

su -l nagios


wget 3.x.tar.gz


tar xzf nagios-3.x.tar.gz
cd nagios-3.x


./configure --with-command-group=nagcmd


make all


make install


/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
/sbin/service nagios restart


4.2.2. Upgrading From Nagios 2.x to 3.x


  1. The old service_reaper_frequency variable in the main config file has been renamed to .
  2. The old $NOTIFICATIONNUMBER$ macro has been deprecated in favor of new and macros.
  3. The old parallelize directive in service definitions is now deprecated and no longer used, as all service checks are run in parallel.
  4. The old aggregate_status_updates option has been removed. All status file updates are now aggregated at a minimum interval of 1 second.
  5. Extended host and extended service definitions have been deprecated. They are still read and processed by Nagios, but it is recommended that you move the directives found in these definitions to your host and service definitions, respectively.
  6. The old downtime_file file variable in the main config file is no longer supported, as scheduled downtime entries are now saved in the . To preserve existing downtime entries, stop Nagios 2.x and append the contents of your old downtime file to the retention file.
  7. The old comment_file file variable in the main config file is no longer supported, as comments are now saved in the . To preserve existing comments, stop Nagios 2.x and append the contents of your old comment file to the retention file.
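
The "append the contents of your old file to the retention file" step in items 6 and 7 could be scripted roughly as follows. This is a sketch under stated assumptions: Nagios 2.x is already stopped, the function name append_to_retention and the .bak suffix are made up for this example, and the file paths you pass in are whatever your own installation uses.

```python
import shutil


def append_to_retention(old_file: str, retention_file: str) -> None:
    """Append the whole contents of old_file to the end of retention_file.

    A copy of the retention file is saved next to it with a '.bak'
    suffix before anything is changed. (Hypothetical helper, run it only
    while Nagios is stopped.)
    """
    shutil.copy2(retention_file, retention_file + ".bak")
    with open(old_file, "rb") as src, open(retention_file, "ab") as dst:
        shutil.copyfileobj(src, dst)
```

Call it once for the old downtime file and once for the old comment file, each time with the path of your retention file as the second argument.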

Also make sure to read the "" section of the documentation. It describes all the changes that were made to the Nagios 3 code since the latest stable release of Nagios 2.x. Quite a bit has changed, so make sure you read it over.

4.2.3. Upgrading From an RPM Installation


  1. Main config file (usually nagios.cfg)
  2. Resource config file (usually resource.cfg)
  3. CGI config file (usually cgi.cfg)
  4. All your object definition files

  1. Configuration files
  2. Retention file (usually retention.dat)
  3. Current Nagios log file (usually nagios.log)
  4. Archived Nagios log files

  1. Backup your existing Nagios installation
  2. Uninstall the original RPM or APT package
  3. Install Nagios from source by following the
  4. Restore your original Nagios configuration files, retention file, and log files
  5. your configuration and Nagios


4.3. Quickstart Installation Guides
4.3.1. Introduction


4.3.2. Guides




4.3.3. Post-Installation Steps


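4.4. Fedora Quickstart
4.4.1. Introduction

This guide is intended to give you simple instructions for installing Nagios from source on Fedora and having it monitor your local machine within 20 minutes. No advanced options are discussed here - just the basics, which are enough to get 95% of users up and running.

These instructions were written for a system based on Fedora Core 6.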



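  1. Nagios and the plugins will be installed under /usr/local/nagios
  2. Nagios will be configured to monitor a few aspects of your local system (CPU load, disk usage, etc.)
  3. The Nagios web interface will be accessible at
4.4.2. Prerequisites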



  1. Apache
  2. GCC compiler
  3. GD library and development libraries


yum install httpd
yum install gcc
yum install glibc glibc-common
yum install gd gd-devel

4.4.3. Procedure



su -l


/usr/sbin/useradd nagios
passwd nagios


/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -G nagcmd nagios
/usr/sbin/usermod -G nagcmd apache



mkdir ~/downloads
cd ~/downloads


wget 3.0rc1.tar.gz wget



cd ~/downloads
tar xzf nagios-3.0rc1.tar.gz
cd nagios-3.0rc1


./configure --with-command-group=nagcmd


make all


make install
make install-init
make install-config
make install-commandmode





vi /usr/local/nagios/etc/objects/contacts.cfg



make install-webconf


htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin


service httpd restart



cd ~/downloads
tar xzf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11


./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install



chkconfig --add nagios
chkconfig nagios on


/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg


service nagios start






setenforce 0



chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
chcon -R -t httpd_sys_content_t /usr/local/nagios/share/










  1. (HTTP, FTP, SSH, etc.)
4.5. openSUSE Quickstart
4.5.1. Introduction

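This guide is intended to give you simple instructions for installing Nagios from source on openSUSE and having it monitor your local machine within 20 minutes. No advanced options are discussed here - just the basics, which are enough to get 95% of users up and running.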


4.5.2. Required Packages


  • apache2
  • C/C++ development libraries
4.5.3. Procedure



su -l


/usr/sbin/useradd nagios

passwd nagios


/usr/sbin/groupadd nagios

/usr/sbin/usermod -G nagios nagios


/usr/sbin/groupadd nagcmd

/usr/sbin/usermod -G nagcmd nagios

/usr/sbin/usermod -G nagcmd wwwrun



mkdir ~/downloads

cd ~/downloads


wget 3.0rc1.tar.gz




cd ~/downloads

tar xzf nagios-3.0rc1.tar.gz

cd nagios-3.0rc1


./configure --with-command-group=nagcmd


make all


make install

make install-init

make install-config

make install-commandmode

Don't start Nagios yet - there's still more that needs to be done...




vi /usr/local/nagios/etc/objects/contacts.cfg



make install-webconf


htpasswd2 -c /usr/local/nagios/etc/htpasswd.users nagiosadmin


service apache2 restart



cd ~/downloads

tar xzf nagios-plugins-1.4.11.tar.gz

cd nagios-plugins-1.4.11


./configure --with-nagios-user=nagios --with-nagios-group=nagios


make install



chkconfig --add nagios

chkconfig nagios on


/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg


service nagios start








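  1. Open the Control Center
  2. Select 'Open Administrator Settings' to open the YaST administrator Control Center
  3. Select 'Firewall' under the 'Security and Users' settings
  4. Click the 'Allowed Services' option in the firewall configuration window
  5. Add 'HTTP Server' to the allowed services, under the 'External Zone' section
  6. Click 'Next' and then 'Accept' to activate the new firewall settings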


4.6. Ubuntu Quickstart
4.6.1. Introduction



What You'll End Up With


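  1. Nagios and the plugins will be installed under /usr/local/nagios
  2. Nagios will be configured to monitor a few aspects of your local system (CPU load, disk usage, etc.)
  3. The Nagios web interface will be accessible at
4.6.2. Required Packages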


  1. Apache2
  2. GCC compiler and development libraries
  3. GD library and development libraries


sudo apt-get install apache2
sudo apt-get install build-essential
sudo apt-get install libgd2-dev

4.6.3. Procedure



sudo -s


/usr/sbin/useradd nagios
passwd nagios


/usr/sbin/groupadd nagios
/usr/sbin/usermod -G nagios nagios


/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -G nagcmd nagios
/usr/sbin/usermod -G nagcmd www-data



mkdir ~/downloads
cd ~/downloads


wget 3.0rc1.tar.gz wget



cd ~/downloads
tar xzf nagios-3.0rc1.tar.gz
cd nagios-3.0rc1


./configure --with-command-group=nagcmd


make all


make install
make install-init
make install-config
make install-commandmode





vi /usr/local/nagios/etc/objects/contacts.cfg



make install-webconf


htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin


/etc/init.d/apache2 reload



cd ~/downloads
tar xzf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11


./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install



ln -s /etc/init.d/nagios /etc/rcS.d/S99nagios


/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg


/etc/init.d/nagios start






sudo apt-get install mailx


sudo /etc/init.d/nagios restart


4.7. Monitoring Windows Machines
4.7.1. Introduction


  1. Memory usage
  2. CPU load
  3. Disk usage
  4. Service states
  5. Running processes
  6. Etc.

Publicly available services that are provided by Windows machines (HTTP, FTP, POP3, etc.) can be monitored easily by following the documentation on .

Note: These instructions assume that you've installed Nagios according to the . The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed if you follow the quickstart.

4.7.2. Overview

Monitoring private services or attributes of a Windows machine requires that you install an agent on it. This agent acts as a proxy between the Nagios plugin that does the monitoring and the actual service or attribute of the Windows machine. Without installing an agent on the Windows box, Nagios would be unable to monitor private services or attributes of the Windows box.

For this example, we will be installing the NSClient++ addon on the Windows machine and using the check_nt plugin to communicate with it. The check_nt plugin should already be installed on the Nagios server if you followed the quickstart guide.

Other Windows agents (like ) could be used instead of NSClient++ if you wish - provided you change command and service definitions, etc. a bit. For the sake of simplicity I will only cover using the NSClient++ addon in these instructions.

4.7.3. Steps

There are several steps you'll need to follow in order to monitor a new Windows machine. They are:

  1. Perform first-time prerequisites
  2. Install a monitoring agent on the Windows machine
  3. Create new host and service definitions for monitoring the Windows machine
  4. Restart the Nagios daemon
4.7.4. What's Already Done For You

To make your life a bit easier, a few configuration tasks have already been done for you:

  1. A check_nt command definition has been added to the commands.cfg file. This allows you to use the check_nt plugin to monitor Windows services.
  2. A Windows server host template (called windows-server) has already been created in the templates.cfg file. This allows you to add new Windows host definitions in a simple manner.

The above-mentioned config files can be found in the /usr/local/nagios/etc/objects/ directory. You can modify the definitions in these and other config files to suit your needs better if you'd like. However, I'd recommend waiting until you're more familiar with configuring Nagios before doing so. For the time being, just follow the directions outlined below and you'll be monitoring your Windows boxes in no time.

4.7.5. Prerequisites

The first time you configure Nagios to monitor a Windows machine, you'll need to do a bit of extra work. Remember, you only need to do this for the *first* Windows machine you monitor.

Edit the main Nagios config file.

vi /usr/local/nagios/etc/nagios.cfg

Remove the leading pound (#) sign from the following line in the main configuration file:


Save the file and exit.

What did you just do? You told Nagios to look to the /usr/local/nagios/etc/objects/windows.cfg to find additional object definitions. That's where you'll be adding Windows host and service definitions. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* Windows machine you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.

4.7.6. Installing the Windows Agent

Before you can begin monitoring private services and attributes of Windows machines, you'll need to install an agent on those machines. I recommend using the NSClient++ addon, which can be found at . These instructions will take you through a basic installation of the NSClient++ addon, as well as the configuration of Nagios for monitoring the Windows machine.

1. Download the latest stable version of the NSClient++ addon from

2. Unzip the NSClient++ files into a new C:\NSClient++ directory

3. Open a command prompt and change to the C:\NSClient++ directory

4. Register the NSClient++ system service with the following command:

nsclient++ /install

5. Install the NSClient++ systray with the following command ('SysTray' is case-sensitive):

nsclient++ SysTray

6. Open the services manager and make sure the NSClientpp service is allowed to interact with the desktop (see the 'Log On' tab of the services manager). If it isn't already allowed to interact with the desktop, check the box to allow it to.

7. Edit the NSC.INI file (located in the C:\NSClient++ directory) and make the following changes:

  1. Uncomment all the modules listed in the [modules] section, except for CheckWMI.dll and RemoteConfiguration.dll
  2. Optionally require a password for clients by changing the 'password' option in the [Settings] section.
  3. Uncomment the 'allowed_hosts' option in the [Settings] section. Add the IP address of the Nagios server to this line, or leave it blank to allow all hosts to connect.
  4. Make sure the 'port' option in the [NSClient] section is uncommented and set to '12489' (the default port).

8. Start the NSClient++ service with the following command:

nsclient++ /start

9. If installed properly, a new icon should appear in your system tray. It will be a yellow circle with a black 'M' inside.

10. Success! The Windows server can now be added to the Nagios monitoring configuration...
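
Before moving on, it can be handy to confirm from the Nagios server that the agent's TCP port is reachable. The following is a minimal sketch, not part of NSClient++ or Nagios: the function name agent_port_open is made up for this example, and the default port 12489 is simply the value from the NSC.INI step above. It only tests that a TCP connection can be opened; it says nothing about the password or about how the agent treats hosts outside allowed_hosts.

```python
import socket


def agent_port_open(host: str, port: int = 12489, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for a quick reachability test. Connection
    refusals, timeouts and name resolution failures all count as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result usually points at the service not running, the wrong port in NSC.INI, or a firewall between the two machines.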

4.7.7. Configuring Nagios

Now it's time to define some object definitions in your Nagios configuration files in order to monitor the new Windows machine.

Open the windows.cfg file for editing.

vi /usr/local/nagios/etc/objects/windows.cfg

Add a new host definition for the Windows machine that you're going to monitor. If this is the *first* Windows machine you're monitoring, you can simply modify the sample host definition in windows.cfg. Change the host_name, alias, and address fields to appropriate values for the Windows box.

define host{
    use         windows-server    ; Inherit default values from a Windows server template (make sure you keep this line!)
    host_name   winserver
    alias       My Windows Server
    }



Good. Now you can add some service definitions (to the same configuration file) in order to tell Nagios to monitor different aspects of the Windows machine. If this is the *first* Windows machine you're monitoring, you can simply modify the sample service definitions in windows.cfg.

Note: Replace "winserver" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.

Add the following service definition to monitor the version of the NSClient++ addon that is running on the Windows server. This is useful when it comes time to upgrade your Windows servers to a newer version of the addon, as you'll be able to tell which Windows machines still need to be upgraded to the latest version of NSClient++.

define service{
    use                   generic-service
    host_name             winserver
    service_description   NSClient++ Version
    check_command         check_nt!CLIENTVERSION
    }


Add the following service definition to monitor the uptime of the Windows server.

define service{
    use                   generic-service
    host_name             winserver
    service_description   Uptime
    check_command         check_nt!UPTIME
    }


Add the following service definition to monitor the CPU utilization on the Windows server and generate a CRITICAL alert if the 5-minute CPU load is 90% or more or a WARNING alert if the 5-minute load is 80% or greater.

define service{
    use                   generic-service
    host_name             winserver
    service_description   CPU Load
    check_command         check_nt!CPULOAD!-l 5,80,90
    }


Add the following service definition to monitor memory usage on the Windows server and generate a CRITICAL alert if memory usage is 90% or more or a WARNING alert if memory usage is 80% or greater.

define service{
    use                     generic-service
    host_name               winserver
    service_description     Memory Usage
    check_command           check_nt!MEMUSE!-w 80 -c 90
    }


Add the following service definition to monitor usage of the C:\ drive on the Windows server and generate a CRITICAL alert if disk usage is 90% or more or a WARNING alert if disk usage is 80% or greater.

define service{
    use                     generic-service
    host_name               winserver
    service_description     C:\ Drive Space
    check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90
    }


Add the following service definition to monitor the W3SVC service state on the Windows machine and generate a CRITICAL alert if the service is stopped.

define service{
    use                     generic-service
    host_name               winserver
    service_description     W3SVC
    check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
    }


Add the following service definition to monitor the Explorer.exe process on the Windows machine and generate a CRITICAL alert if the process is not running.

define service{
    use                     generic-service
    host_name               winserver
    service_description     Explorer
    check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
    }


That's it for now. You've added some basic services that should be monitored on the Windows box. Save the configuration file.

4.7.8. Password Protection

If you specified a password in the NSClient++ configuration file on the Windows machine, you'll need to modify the check_nt command definition to include the password. Open the commands.cfg file for editing.

vi /usr/local/nagios/etc/commands.cfg

Change the definition of the check_nt command to include the "-s PASSWORD" argument (where PASSWORD is the password you specified on the Windows machine) like this:

define command{
    command_name    check_nt
    command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$
    }


Save the file.
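
To make the argument passing concrete, here is a minimal Python sketch of how a check_command value such as check_nt!CPULOAD!-l 5,80,90 could be split on "!" and substituted into the $ARGn$ macros of the command line shown above. Nagios performs this expansion internally; the function name expand_check_command, the address 192.0.2.10, and the /usr/local/nagios/libexec value for $USER1$ are assumptions made for illustration.

```python
import re


def expand_check_command(check_command, command_line, macros):
    """Split 'name!arg1!arg2' and fill $ARGn$ plus other macros (illustrative only)."""
    name, *args = check_command.split("!")
    values = dict(macros)
    for index, arg in enumerate(args, start=1):
        values[f"ARG{index}"] = arg

    # Macros without a value (e.g. $ARG2$ when only one argument is given)
    # are replaced with an empty string in this sketch.
    expanded = re.sub(r"\$(\w+)\$", lambda m: values.get(m.group(1), ""), command_line)
    return name, expanded.strip()


COMMAND_LINE = "$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$"
MACROS = {
    "USER1": "/usr/local/nagios/libexec",  # assumed plugin directory
    "HOSTADDRESS": "192.0.2.10",           # hypothetical address of winserver
}

name, line = expand_check_command("check_nt!CPULOAD!-l 5,80,90", COMMAND_LINE, MACROS)
print(name)
print(line)
```

Because the password is part of the command definition rather than the service definitions, every service that uses check_nt picks it up without further changes.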

4.7.9. Restarting Nagios

You're done with modifying the Nagios configuration, so you'll need to verify your configuration files and restart Nagios.

If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!

4.8. Monitoring Linux/Unix Machines
4.8.1. Introduction


  1. CPU load
  2. Memory usage
  3. Disk usage
  4. Logged-in users
  5. Running processes


4.8.2. Overview




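4.9. Monitoring Routers and Switches
4.9.1. Introduction



  1. Packet loss and round trip average (RTA)
  2. SNMP status information
  3. Bandwidth / traffic rate
4.9.2. Overview




4.9.3. Steps


  1. Perform first-time prerequisites;
  2. Create new host and service definitions for monitoring the device;
  3. Restart the Nagios daemon.
4.9.4. What's Already Done For You


  1. Two command definitions (check_snmp and check_local_mrtgtraf) have been added to the commands.cfg file. These allow you to use the check_snmp and check_mrtgtraf plugins to monitor network switches and routers.
  2. A switch host template (called generic-switch) has already been created in the templates.cfg file. This allows you to add new router and switch host definitions in a simple manner.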


4.9.5. Prerequisites



vi /usr/local/nagios/etc/nagios.cfg





4.9.6. Configuring Nagios



vi /usr/local/nagios/etc/objects/switch.cfg


define host{
    use         generic-switch          ; Inherit default values from a template
    host_name   linksys-srw224p         ; The name we're giving to this switch
    alias       Linksys SRW224P Switch  ; A longer name associated with the switch
    address                             ; IP address of the switch
    hostgroups  allhosts,switches       ; Host groups this switch is associated with
    }

4.9.7. Monitoring Services


4.9.8. Monitoring Packet Loss and RTA


define service{
    use                     generic-service                 ; Inherit values from a template
    host_name               linksys-srw224p                 ; The name of the host the service is associated with
    service_description     PING                            ; The service description
    check_command           check_ping!200.0,20%!600.0,60%  ; The command used to monitor the service
    normal_check_interval   5                               ; Check the service every 5 minutes under normal conditions
    retry_check_interval    1                               ; Re-check the service every minute until its final/hard state is determined
    }


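  1. CRITICAL if the round trip average (RTA) is greater than 600 milliseconds or the packet loss is 60% or more;
  2. WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more;
  3. OK if the RTA is less than 200 ms and the packet loss is less than 20%.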
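
The three states can be expressed as a small Python sketch. It parses the two threshold arguments from the check_command above ("200.0,20%" and "600.0,60%") and applies the listed rules. The function names are hypothetical, and this is an illustration of the rules rather than the check_ping plugin's code.

```python
def parse_threshold(text):
    """Parse 'RTA,LOSS%' (e.g. '200.0,20%') into (rta_ms, loss_percent)."""
    rta, loss = text.split(",")
    return float(rta), float(loss.rstrip("%"))


def ping_state(rta_ms, loss_percent, warning="200.0,20%", critical="600.0,60%"):
    """Classify a ping result using the thresholds from the service definition."""
    warn_rta, warn_loss = parse_threshold(warning)
    crit_rta, crit_loss = parse_threshold(critical)
    if rta_ms > crit_rta or loss_percent >= crit_loss:
        return "CRITICAL"
    if rta_ms > warn_rta or loss_percent >= warn_loss:
        return "WARNING"
    # An RTA of exactly 200 ms is not covered by the rules; this sketch treats it as OK.
    return "OK"


print(ping_state(35.0, 0))
print(ping_state(250.0, 0))
print(ping_state(35.0, 60))
```

Checking the critical thresholds first ensures that a result which exceeds both limits is reported as CRITICAL rather than WARNING.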
4.9.9. Monitoring SNMP Status Information



define service{
    use                     generic-service ; Inherit values from a template
    host_name               linksys-srw224p
    service_description     Uptime
    check_command           check_snmp!-C public -o sysUpTime.0
    }

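In the check_command directive of the service definition above, "-C public" tells the plugin that the SNMP community name to use is "public", and "-o sysUpTime.0" indicates which OID should be checked (an OID identifies a node in the MIB).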


define service{
    use                     generic-service ; Inherit values from a template
    host_name               linksys-srw224p
    service_description     Port 1 Link Status
    check_command           check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
    }

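In the example above, "-o ifOperStatus.1" refers to the OID for the operational status of port 1 on the switch. The "-r 1" option tells the check_snmp plugin to return an OK state if "1" is found in the SNMP result (1 indicates that the port is up) and a CRITICAL state if it isn't found. The "-m RFC1213-MIB" option is optional. It tells the check_snmp plugin to load only the "RFC1213-MIB" instead of every MIB installed on the system, which can speed up the plugin.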
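
The effect of the "-r" option can be sketched in a few lines of Python: the pattern is searched for in the text of the SNMP result, and the state follows from whether it is found. The function name and the sample result strings "up(1)" and "down(2)" are assumptions for illustration, not output captured from a real device.

```python
import re


def snmp_regex_state(snmp_result, pattern):
    """Return OK if the pattern is found in the SNMP result text, else CRITICAL."""
    return "OK" if re.search(pattern, snmp_result) else "CRITICAL"


print(snmp_regex_state("up(1)", "1"))    # hypothetical result for a port that is up
print(snmp_regex_state("down(2)", "1"))  # hypothetical result for a port that is down
```

Because the pattern is a regular expression, a short pattern such as "1" also matches longer values that merely contain that digit, so a more specific pattern may be preferable for OIDs with wider value ranges.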


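You can usually find the OIDs that can be monitored on a switch by running the following command (replace 192.168.1.253 with the IP address of your switch): snmpwalk -v1 -c public -m ALL 192.168.1.253 .1
4.9.10. Monitoring Bandwidth / Traffic Rate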



define service{
    use                     generic-service ; Inherit values from a template
    host_name               linksys-srw224p
    service_description     Port 1 Bandwidth Usage
    check_command           check_local_mrtgtraf!/var/lib/mrtg/!AVG!1000000,2000000!5000000,5000000!10
    }



4.9.11. Restarting Nagios



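4.10. Monitoring Network Printers
4.10.1. Introduction

This document describes how you can monitor network printers. Specifically, it covers HP printers that have internal or external JetDirect cards, as well as other print servers (such as the Troy PocketPro 100S or the Netgear PS101) that support the JetDirect protocol.


  1. Paper jam
  2. Out of paper
  3. Printer offline
  4. Intervention required
  5. Toner low
  6. Insufficient memory
  7. Open door
  8. Output tray is full
  9. And more...
4.10.2. Overview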



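4.10.3. Steps


  1. Perform first-time prerequisites;
  2. Create new host and service definitions for monitoring the printer;
  3. Restart the Nagios daemon.
4.10.4. What's Already Done For You


  1. A check_hpjd command definition has been added to the commands.cfg file. This allows you to use the check_hpjd plugin to monitor network printers;
  2. A printer host template (called generic-printer) has already been created in the templates.cfg file. This allows you to add new printer host definitions in a simple manner.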


4.10.5. Prerequisites



vi /usr/local/nagios/etc/nagios.cfg





4.10.6. Configuring Nagios



vi /usr/local/nagios/etc/objects/printer.cfg


define host{
    use         generic-printer     ; Inherit default values from a template
    host_name   hplj2605dn          ; The name we're giving to this printer
    alias       HP LaserJet 2605dn  ; A longer name associated with the printer
    address                         ; IP address of the printer
    hostgroups  allhosts            ; Host groups this printer is associated with
    }




define service{
    use                     generic-service         ; Inherit values from a template
    host_name               hplj2605dn              ; The name of the host the service is associated with
    service_description     Printer Status          ; The service description
    check_command           check_hpjd!-C public    ; The command used to monitor the service
    normal_check_interval   10                      ; Check the service every 10 minutes under normal conditions
    retry_check_interval    1                       ; Re-check the service every minute until its final/hard state is determined
    }


define service{
    use                     generic-service
    host_name               hplj2605dn
    service_description     PING
    check_command           check_ping!3000.0,80%!5000.0,100%
    normal_check_interval   10
    retry_check_interval    1
    }


4.10.7. Restarting Nagios



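4.11. Monitoring Netware Servers
4.11.1. Introduction


  1. Memory usage
  2. Processor utilization
  3. Buffer usage
  4. Active connections
  5. Disk volume usage


4.11.2. Overview


I'm looking for a volunteer to write this HOWTO. I only have access to an old Netware 4.11 server, so I can't keep this document current. If you are able to update it, please post it.
4.11.3. Other Resources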


4.12. Monitoring Publicly Available Services
4.12.1. Introduction

This document describes how you can monitor publicly available services, applications and protocols. By "public" I mean services that are accessible across the network - either the local network or the greater Internet. Examples of public services include HTTP, POP3, IMAP, FTP, and SSH. There are many more public services that you probably use on a daily basis. These services and applications, as well as their underlying protocols, can usually be monitored by Nagios without any special access requirements.

Private services, in contrast, cannot be monitored with Nagios without an intermediary agent of some kind. Examples of private services associated with hosts are things like CPU load, memory usage, disk usage, current user count, process information, etc. These private services or attributes of hosts are not usually exposed to external clients, so an intermediary monitoring agent must be installed on any host where you need to monitor such information. More information on monitoring private services on different types of hosts can be found in the sections on monitoring Windows machines (4.7), Linux/Unix machines (4.8), and Netware servers (4.11).

Tip: Occasionally you will find that information on private services and applications can be monitored with SNMP. The SNMP agent allows you to remotely monitor otherwise private (and inaccessible) information about the host. For more information about monitoring services using SNMP, check out the documentation on monitoring routers and switches (4.9).

Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample commands.cfg and localhost.cfg config files.

4.12.2. Plugins For Monitoring Services

When you find yourself needing to monitor a particular application, service, or protocol, chances are good that a plugin exists to monitor it. The official Nagios plugins distribution comes with plugins that can be used to monitor a variety of services and protocols. There are also a large number of contributed plugins that can be found in the contrib/ subdirectory of the plugin distribution. The NagiosExchange.org website hosts a number of additional plugins that have been written by users, so check it out when you have a chance.

If you don't happen to find an appropriate plugin for monitoring what you need, you can always write your own. Plugins are easy to write, so don't let this thought scare you off. Read the documentation on developing plugins for more information.

I'll walk you through monitoring some basic services that you'll probably use sooner or later. Each of these services can be monitored using one of the plugins that gets installed as part of the Nagios plugins distribution. Let's get started...

4.12.3. Creating A Host Definition

Before you can monitor a service, you first need to define a host that is associated with the service. You can place host definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive. If you have already created a host definition, you can skip this step.

For this example, let's say you want to monitor a variety of services on a remote host. Let's call that host remotehost. The host definition can be placed in its own file or added to an already existing object configuration file. Here's what the host definition for remotehost might look like:

define host{
    use         generic-host        ; Inherit default values from a template
    host_name   remotehost          ; The name we're giving to this host
    alias       Some Remote Host    ; A longer name associated with the host
    address                         ; IP address of the host
    hostgroups  allhosts            ; Host groups this host is associated with
    }


Now that a definition has been added for the host that will be monitored, we can start defining services that should be monitored. As with host definitions, service definitions can be placed in any object configuration file.

4.12.4. Creating Service Definitions

For each service you want to monitor, you need to define a service in Nagios that is associated with the host definition you just created. You can place service definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive.

Some example service definitions for monitoring common public services (HTTP, FTP, etc.) are given below.

4.12.5. Monitoring HTTP

Chances are you're going to want to monitor web servers at some point - either yours or someone else's. The check_http plugin is designed to do just that. It understands the HTTP protocol and can monitor response time, error codes, strings in the returned HTML, server certificates, and much more.

The commands.cfg file contains a command definition for using the check_http plugin. It looks like this:

define command{
    name            check_http
    command_name    check_http
    command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
    }


A simple service definition for monitoring the HTTP service on the remotehost machine might look like this:

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     HTTP
    check_command           check_http
    }


This simple service definition will monitor the HTTP service running on remotehost. It will produce alerts if the web server doesn't respond within 10 seconds or if it returns HTTP error codes (403, 404, etc.). That's all you need for basic monitoring. Pretty simple, huh?

Tip: For more advanced monitoring, run the check_http plugin manually with --help as a command-line argument to see all the options you can give the plugin. This --help syntax works with all of the plugins I'll cover in this document.

A more advanced definition for monitoring the HTTP service is shown below. This service definition will check to see if the /download/index.php URI contains the string "latest-version.tar.gz". It will produce an error if the string isn't found, the URI isn't valid, or the web server takes longer than 5 seconds to respond.

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     Product Download Link
    check_command           check_http!-u /download/index.php -t 5 -s "latest-version.tar.gz"
    }


4.12.6. Monitoring FTP

When you need to monitor FTP servers, you can use the check_ftp plugin. The commands.cfg file contains a command definition for using the check_ftp plugin, which looks like this:

define command{
    command_name    check_ftp
    command_line    $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
    }


A simple service definition for monitoring the FTP server on remotehost would look like this:

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     FTP
    check_command           check_ftp
    }


This service definition will monitor the FTP service and generate alerts if the FTP server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the FTP server running on port 1023 on remotehost. It will generate an alert if the server doesn't respond within 5 seconds or if the server response doesn't contain the string "Pure-FTPd [TLS]".

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     Special FTP
    check_command           check_ftp!-p 1023 -t 5 -e "Pure-FTPd [TLS]"
    }


4.12.7. Monitoring SSH

When you need to monitor SSH servers, you can use the check_ssh plugin. The commands.cfg file contains a command definition for using the check_ssh plugin, which looks like this:

define command{
    command_name    check_ssh
    command_line    $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
    }


A simple service definition for monitoring the SSH server on remotehost would look like this:

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     SSH
    check_command           check_ssh
    }


This service definition will monitor the SSH service and generate alerts if the SSH server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the SSH server and generate an alert if the server doesn't respond within 5 seconds or if the server version string doesn't match "OpenSSH_4.2".

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     SSH Version Check
    check_command           check_ssh!-t 5 -r "OpenSSH_4.2"
    }


4.12.8. Monitoring SMTP

The check_smtp plugin can be used for monitoring your email servers. The commands.cfg file contains a command definition for using the check_smtp plugin, which looks like this:

define command{
    command_name    check_smtp
    command_line    $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
    }


A simple service definition for monitoring the SMTP server on remotehost would look like this:

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     SMTP
    check_command           check_smtp
    }


This service definition will monitor the SMTP service and generate alerts if the SMTP server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the SMTP server and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "".

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     SMTP Response Check
    check_command           check_smtp!-t 5 -e ""
    }


4.12.9. Monitoring POP3

The check_pop plugin can be used for monitoring the POP3 service on your email servers. The commands.cfg file contains a command definition for using the check_pop plugin, which looks like this:

define command{
    command_name    check_pop
    command_line    $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
    }


A simple service definition for monitoring the POP3 service on remotehost would look like this:

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     POP3
    check_command           check_pop
    }


This service definition will monitor the POP3 service and generate alerts if the POP3 server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the POP3 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "".

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     POP3 Response Check
    check_command           check_pop!-t 5 -e ""
    }


4.12.10. Monitoring IMAP

The check_imap plugin can be used for monitoring the IMAP4 service on your email servers. The commands.cfg file contains a command definition for using the check_imap plugin, which looks like this:

define command{
    command_name    check_imap
    command_line    $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
    }


A simple service definition for monitoring the IMAP4 service on remotehost would look like this:

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     IMAP
    check_command           check_imap
    }


This service definition will monitor the IMAP4 service and generate alerts if the IMAP server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the IMAP4 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "".

define service{
    use                     generic-service ; Inherit default values from a template
    host_name               remotehost
    service_description     IMAP4 Response Check
    check_command           check_imap!-t 5 -e ""
    }


4.12.11. Restarting Nagios

Once you've added the new host and service definitions to your object configuration file(s), you're ready to start monitoring them. To do this, you'll need to verify your configuration files and restart Nagios.

If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!

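Chapter 5. Configuring Nagios
5.1. Configuration Overview
5.1.1. Introduction

There are several different configuration files that you need to create or edit before you start monitoring your network and systems. Be patient! Configuring Nagios can take a while, especially if you're a first-time user. Once you figure out how things work, it will be well worth your time. :-)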



5.1.2. Main Configuration File



5.1.3. Resource File(s)



5.1.4. Object Definition Files




5.1.5. CGI Configuration File



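5.2. Main Configuration File Options


  • Lines that start with a '#' character are taken to be comments and are not processed;
  • Variable names must begin at the start of the line - no white space is allowed before the name;
  • Variable names are case-sensitive.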
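
The three rules above can be illustrated with a minimal Python sketch of a reader for the main configuration file. It is a simplified model, not Nagios's own parser: the function name parse_main_config is hypothetical, and the sample values are taken from the examples later in this section.

```python
def parse_main_config(text):
    """Parse 'variable=value' lines following the three rules listed above."""
    settings = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment: not processed
        if line[0].isspace():
            raise ValueError(f"line {lineno}: variable must begin at the start of the line")
        name, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"line {lineno}: expected variable=value")
        # Names are case-sensitive, so they are stored exactly as written.
        # A directive such as cfg_file may appear several times, so keep a list.
        settings.setdefault(name, []).append(value.strip())
    return settings


SAMPLE = """\
# Sample main configuration
log_file=/usr/local/nagios/var/nagios.log
status_update_interval=15
enable_notifications=1
"""

config = parse_main_config(SAMPLE)
print(config["log_file"])
print("LOG_FILE" in config)
```

Storing each value in a list is a design choice of this sketch so that repeated directives are not silently overwritten.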


5.2.1. Config File Location


5.2.2. Configuration File Variables


Table 5.1. Log File

Format: log_file=
Example: log_file=/usr/local/nagios/var/nagios.log


Table 5.2. Object Configuration File

Format: cfg_file=



Table 5.3. Object Configuration Directory

Format: cfg_dir=



Table 5.4. Object Cache File

Format: object_cache_file=
Example: object_cache_file=/usr/local/nagios/var/objects.cache


Table 5.5. Precached Object File

Format: precached_object_file=
Example: precached_object_file=/usr/local/nagios/var/objects.precache

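This directive is used to specify a file in which a pre-processed, pre-cached copy of object definitions should be stored. This file can be used to drastically reduce startup time in large or complex Nagios installations. More information on how to speed up startup can be found elsewhere in this documentation.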

Table 5.6. Resource File

Format: resource_file=
Example: resource_file=/usr/local/nagios/etc/resource.cfg


Table 5.7. Temp File

Format: temp_file=
Example: temp_file=/usr/local/nagios/var/nagios.tmp


Table 5.8. Temp Path

Format: temp_path=
Example: temp_path=/tmp


Table 5.9. Status File

Format: status_file=
Example: status_file=/usr/local/nagios/var/status.dat


Table 5.10. Status File Update Interval

Format: status_update_interval=
Example: status_update_interval=15


Table 5.11. Nagios User

Format: nagios_user=
Example: nagios_user=nagios


Table 5.12. Nagios Group

Format: nagios_group=
Example: nagios_group=nagios


Table 5.13. Notifications Option

Format: enable_notifications=<0/1>
Example: enable_notifications=1


  1. 0 = Disable notifications
  2. 1 = Enable notifications (default)

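Table 5.14. Service Check Execution Option

Format: execute_service_checks=<0/1>
Example: execute_service_checks=1


  1. 0 = Don't execute service checks
  2. 1 = Execute service checks (default)

Table 5.15. Passive Service Check Acceptance Option

Format: accept_passive_service_checks=<0/1>
Example: accept_passive_service_checks=1


  1. 0 = Don't accept passive service checks
  2. 1 = Accept passive service checks (default)

Table 5.16. Host Check Execution Option

Format: execute_host_checks=<0/1>
Example: execute_host_checks=1


  1. 0 = Don't execute host checks
  2. 1 = Execute host checks (default)

Table 5.17. Passive Host Check Acceptance Option

Format: accept_passive_host_checks=<0/1>
Example: accept_passive_host_checks=1


  1. 0 = Don't accept passive host checks
  2. 1 = Accept passive host checks (default)

Table 5.18. Event Handler Option

Format: enable_event_handlers=<0/1>
Example: enable_event_handlers=1


  1. 0 = Disable event handlers
  2. 1 = Enable event handlers (default)

Table 5.19. Log Rotation Method

Format: log_rotation_method=
Example: log_rotation_method=d


  1. n = None (don't rotate the log - this is the default)
  2. h = Hourly (rotate the log once every hour)
  3. d = Daily (rotate the log at midnight each day)
  4. w = Weekly (rotate the log at midnight on Saturday)
  5. m = Monthly (rotate the log at midnight on the last day of the month)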

Table 5.20. Log Archive Path

Format: log_archive_path=
Example: log_archive_path=/usr/local/nagios/var/archives/


Table 5.21. External Command Check Option

Format: check_external_commands=<0/1>
Example: check_external_commands=1


  1. 0 = Don't check external commands
  2. 1 = Check external commands (default)

Table 5.22. External Command Check Interval

Format: command_check_interval=[s]
Example: command_check_interval=1



Table 5.23. External Command File

Format: command_file=
Example: command_file=/usr/local/nagios/var/rw/nagios.cmd


Table 5.24. External Command Buffer Slots

Format: external_command_buffer_slots=<#>
Example: external_command_buffer_slots=512


Table 5.25. Lock File

Format: lock_file=
Example: lock_file=/tmp/nagios.lock


Table 5.26. State Retention Option

Format: retain_state_information=<0/1>
Example: retain_state_information=1


  1. 0 = Don't retain state information
  2. 1 = Retain state information (default)

Table 5.27. State Retention File

Format: state_retention_file=
Example: state_retention_file=/usr/local/nagios/var/retention.dat


Table 5.28. Automatic State Retention Update Interval

Format: retention_update_interval=
Example: retention_update_interval=60


Table 5.29. Use Retained Program State Option

Format: use_retained_program_state=<0/1>
Example: use_retained_program_state=1


  1. 0 = Don't use retained program state
  2. 1 = Use the program state recorded in the state retention file (default)

Table 5.30. Use Retained Scheduling Info Option

Format: use_retained_scheduling_info=<0/1>
Example: use_retained_scheduling_info=1


  1. 0 = Don't use retained scheduling info
  2. 1 = Use retained scheduling info (default)

Table 5.31. Retained Host and Service Attribute Masks



Table 5.32. Retained Process Attribute Masks



Table 5.33. Retained Contact Attribute Masks









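Table 5.34. Syslog Logging Option

Format: use_syslog=<0/1>
Example: use_syslog=1


  1. 0 = Don't use the syslog facility
  2. 1 = Use the syslog facility

Table 5.35. Notification Logging Option

Format: log_notifications=<0/1>
Example: log_notifications=1


  1. 0 = Don't log notifications
  2. 1 = Log notifications

Table 5.36. Service Check Retry Logging Option

Format: log_service_retries=<0/1>
Example: log_service_retries=1


  1. 0 = Don't log service check retries
  2. 1 = Log service check retries

Table 5.37. Host Check Retry Logging Option

Format: log_host_retries=<0/1>
Example: log_host_retries=1


  1. 0 = Don't log host check retries
  2. 1 = Log host check retries

Table 5.38. Event Handler Logging Option

Format: log_event_handlers=<0/1>
Example: log_event_handlers=1


  1. 0 = Don't log event handlers
  2. 1 = Log event handlers

Table 5.39. Initial States Logging Option

Format: log_initial_states=<0/1>
Example: log_initial_states=1


  1. 0 = Don't log initial states (default)
  2. 1 = Log initial states

Table 5.40. External Command Logging Option

Format: log_external_commands=<0/1>
Example: log_external_commands=1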

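This option determines whether or not Nagios will log the external commands that it receives from the external command file. Note: this option does not control whether or not passive service checks (which are a type of external command) get logged. To enable or disable logging of passive checks, use the log_passive_checks option.

  1. 0 = Don't log external commands
  2. 1 = Log external commands (default)

Table 5.41. Passive Check Logging Option

Format: log_passive_checks=<0/1>
Example: log_passive_checks=1


  1. 0 = Don't log passive checks
  2. 1 = Log passive checks (default)

Table 5.42. Global Host Event Handler Option

Format: global_host_event_handler=
Example: global_host_event_handler=log-host-event-to-db


Table 5.43. Global Service Event Handler Option

Format: global_service_event_handler=
Example: global_service_event_handler=log-service-event-to-db


Table 5.44. Inter-Check Sleep Time

Format: sleep_time=
Example: sleep_time=1


Table 5.45. Service Inter-Check Delay Method

Format: service_inter_check_delay_method=
Example: service_inter_check_delay_method=s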

This option allows you to control how service checks are initially "spread out" in the event queue. Using a "smart" delay calculation (the default) will cause Nagios to calculate an average check interval and spread initial checks of all services out over that interval, thereby helping to eliminate CPU load spikes. Using no delay is generally not recommended, as it will cause all service checks to be scheduled for execution at the same time. This means that you will generally have large CPU spikes when the services are all executed in parallel. More information on how the inter-check delay affects service check scheduling can be found in the documentation on check scheduling. Values are as follows:

  1. n = Don't use any delay - schedule all service checks to run immediately (i.e. at the same time!)
  2. d = Use a "dumb" delay of 1 second between service checks
  3. s = Use a "smart" delay calculation to spread service checks out evenly (default)
  4. x.xx = Use a user-supplied inter-check delay of x.xx seconds

Table 5.46. Maximum Service Check Spread

Format: max_service_check_spread=
Example: max_service_check_spread=30

This option determines the maximum number of minutes from when Nagios starts that all services (that are scheduled to be regularly checked) are checked. This option will automatically adjust the service inter-check delay method (if necessary) to ensure that the initial checks of all services occur within the timeframe you specify. In general, this option will not have an effect on service check scheduling if scheduling information is being retained using the use_retained_scheduling_info option. Default value is 30 (minutes).
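
The interplay between the "smart" inter-check delay and the maximum spread can be sketched in Python. This is a simplified model of the behaviour described above, assuming the delay is the average check interval divided by the number of services and is then shortened if the initial checks would otherwise take longer than the maximum spread. The function name is hypothetical and the real Nagios calculation may differ in detail.

```python
def smart_inter_check_delay(check_intervals_s, max_spread_min=30):
    """Seconds to wait between the initial checks of consecutive services."""
    count = len(check_intervals_s)
    if count == 0:
        return 0.0
    average_interval = sum(check_intervals_s) / count
    delay = average_interval / count       # spread checks over one average interval
    max_spread_s = max_spread_min * 60
    if delay * count > max_spread_s:       # initial checks would exceed the maximum spread
        delay = max_spread_s / count
    return delay


# 100 services checked every 5 minutes: the initial checks fit in 300 seconds.
print(smart_inter_check_delay([300] * 100))
# 100 services checked hourly: the delay is shortened to fit the 30-minute spread.
print(smart_inter_check_delay([3600] * 100))
```

In this model the delay only matters for the initial scheduling pass after startup; later checks follow each service's own check interval.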

Table 5.47. Service Interleave Factor

Format: service_interleave_factor=<s|x>
Example: service_interleave_factor=s

This variable determines how service checks are interleaved. Interleaving allows for a more even distribution of service checks, reduced load on remote hosts, and faster overall detection of host problems. Setting this value to 1 is equivalent to not interleaving the service checks (this is how versions of Nagios previous to 0.0.5 worked). Set this value to s (smart) for automatic calculation of the interleave factor unless you have a specific reason to change it. The best way to understand how interleaving works is to watch the status CGI (detailed view) when Nagios is just starting. You should see that the service check results are spread out as they begin to appear. More information on how interleaving works can be found in the documentation on check scheduling.

  1. x = A number greater than or equal to 1 that specifies the interleave factor to use. An interleave factor of 1 is equivalent to not interleaving the service checks.
  2. s = Use a "smart" interleave factor calculation (default)

Table 5.48. Maximum Concurrent Service Checks

Format: max_concurrent_checks=
Example: max_concurrent_checks=20


Table 5.49. Check Result Reaper Frequency

Format: check_result_reaper_frequency=
Example: check_result_reaper_frequency=5


Table 5.50. Maximum Check Result Reaper Time

Format: max_check_result_reaper_time=
Example: max_check_result_reaper_time=30


Table 5.51. Check Result Path

Format: check_result_path=
Example: check_result_path=/var/spool/nagios/checkresults



Table 5.52. Max Check Result File Age

Format: max_check_result_file_age=
Example: max_check_result_file_age=3600


Table 5.53. Host Inter-Check Delay Method

Format: host_inter_check_delay_method=
Example: host_inter_check_delay_method=s

This option allows you to control how host checks that are scheduled to be checked on a regular basis are initially "spread out" in the event queue. Using a "smart" delay calculation (the default) will cause Nagios to calculate an average check interval and spread initial checks of all hosts out over that interval, thereby helping to eliminate CPU load spikes. Using no delay is generally not recommended, as it will cause all host checks to be scheduled for execution at the same time. More information on how the inter-check delay affects host check scheduling can be found in the documentation on check scheduling. Values are as follows:

  1. n = Don't use any delay - schedule all host checks to run immediately (i.e. at the same time!)
  2. d = Use a "dumb" delay of 1 second between host checks
  3. s = Use a "smart" delay calculation to spread host checks out evenly (default)
  4. x.xx = Use a user-supplied inter-check delay of x.xx seconds

Table 5.54. Maximum Host Check Spread

Format: max_host_check_spread=
Example: max_host_check_spread=30

This option determines the maximum number of minutes from when Nagios starts that all hosts (that are scheduled to be regularly checked) are checked. This option will automatically adjust the host inter-check delay method (if necessary) to ensure that the initial checks of all hosts occur within the timeframe you specify. In general, this option will not have an effect on host check scheduling if scheduling information is being retained using the use_retained_scheduling_info option. Default value is 30 (minutes).

Table 5.55. Timing Interval Length

Format: interval_length=
Example: interval_length=60



Table 5.56. Auto-Rescheduling Option

Format: auto_reschedule_checks=<0/1>
Example: auto_reschedule_checks=1



Table 5.57. Auto-Rescheduling Interval

Format: auto_rescheduling_interval=
Example: auto_rescheduling_interval=30

This option determines how often (in seconds) Nagios will attempt to automatically reschedule checks. This option only has an effect if the auto_reschedule_checks option is enabled. Default is 30 seconds.


Table 5.58. Auto-Rescheduling Window

Format: auto_rescheduling_window=
Example: auto_rescheduling_window=180

This option determines the "window" of time (in seconds) that Nagios will look at when automatically rescheduling checks. Only host and service checks that occur in the next X seconds (determined by this variable) will be rescheduled. This option only has an effect if the auto_reschedule_checks option is enabled. Default is 180 seconds (3 minutes).


Table 5.59. Aggressive Host Checking Option

Format: use_aggressive_host_checking=<0/1>
Example: use_aggressive_host_checking=0

Nagios tries to be smart about how and when it checks the status of hosts. In general, disabling this option will allow Nagios to make some smarter decisions and check hosts a bit faster. Enabling this option will increase the amount of time required to check hosts, but may improve reliability a bit. Unless you have problems with Nagios not recognizing that a host recovered, I would suggest not enabling this option.

  1. 0 = Don't use aggressive host checking (default)
  2. 1 = Use aggressive host checking

Table 5.60. Translate Passive Host Checks Option

Format: translate_passive_host_checks=<0/1>
Example: translate_passive_host_checks=1

This option determines whether or not Nagios will translate DOWN/UNREACHABLE passive host check results to their "correct" state from the viewpoint of the local Nagios instance. This can be very useful in distributed and failover monitoring installations. More information on passive check state translation can be found in the documentation on that topic.

  1. 0 = Disable check translation (default)
  2. 1 = Enable check translation

Table 5.61. Passive Host Checks Are SOFT Option

Format: passive_host_checks_are_soft=<0/1>
Example: passive_host_checks_are_soft=1

This option determines whether or not Nagios will treat passive host checks as HARD states or SOFT states. By default, a passive host check result will put a host into a HARD state. You can change this behavior by enabling this option.

  1. 0 = Passive host checks are HARD (default)
  2. 1 = Passive host checks are SOFT

Table 5.62. Predictive Host Dependency Checks Option

Format: enable_predictive_host_dependency_checks=<0/1>
Example: enable_predictive_host_dependency_checks=1

This option determines whether or not Nagios will execute predictive checks of hosts that are being depended upon (as defined in host dependencies) for a particular host when it changes state.

Predictive checks help ensure that the dependency logic is as accurate as possible. More information on how predictive checks work can be found in the documentation on predictive dependency checks.

  1. 0 = Disable predictive checks
  2. 1 = Enable predictive checks (default)

Table 5.63. Predictive Service Dependency Checks Option

Format: enable_predictive_service_dependency_checks=<0/1>
Example: enable_predictive_service_dependency_checks=1

This option determines whether or not Nagios will execute predictive checks of services that are being depended upon (as defined in service dependencies) for a particular service when it changes state.

Predictive checks help ensure that the dependency logic is as accurate as possible. More information on how predictive checks work can be found in the documentation on predictive dependency checks.

  1. 0 = Disable predictive checks
  2. 1 = Enable predictive checks (default)

Table 5.64. Cached Host Check Horizon

Format: cached_host_check_horizon=
Example: cached_host_check_horizon=15

This option determines the maximum amount of time (in seconds) that the state of a previous host check is considered current. Cached host states (from host checks that were performed more recently than the time specified by this value) can improve host check performance immensely. Too high a value for this option may result in (temporarily) inaccurate host states, while a low value may result in a performance hit for host checks. Use a value of 0 if you want to disable host check caching. More information on cached checks can be found in the documentation on cached checks.
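
The horizon rule can be written as a one-line test. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, times are plain Unix timestamps in seconds, and a result exactly as old as the horizon is treated as still usable, which the text above does not specify.

```python
def can_use_cached_state(last_check_time, now, horizon_s):
    """True if a previous check result is recent enough to be reused."""
    if horizon_s <= 0:
        return False  # a horizon of 0 disables check caching
    return (now - last_check_time) <= horizon_s


# With cached_host_check_horizon=15:
print(can_use_cached_state(last_check_time=1000, now=1010, horizon_s=15))  # 10 s old
print(can_use_cached_state(last_check_time=1000, now=1020, horizon_s=15))  # 20 s old
```

The same test applies to the cached service check horizon described next, with the service's last check time in place of the host's.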

Table 5.65. Cached Service Check Horizon

Format: cached_service_check_horizon=<seconds>
Example: cached_service_check_horizon=15

This option determines the maximum amount of time (in seconds) that the state of a previous service check is considered current. Cached service states (from service checks that were performed more recently than the time specified by this value) can improve service check performance when a lot of service dependencies are used. Too high a value for this option may result in inaccuracies in the service dependency logic. Use a value of 0 if you want to disable service check caching. More information on cached checks can be found in the documentation on cached checks.
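
The horizon test described for both cached check options can be sketched as follows. This is only an illustration of the rule as stated above, not Nagios' actual code; the function name and parameters are made up for the example.

```python
import time


def cached_state_is_current(last_check_time, horizon_seconds, now=None):
    """Return True if a previous check result is recent enough to reuse.

    Hypothetical helper: last_check_time and now are Unix timestamps,
    horizon_seconds plays the role of cached_host_check_horizon or
    cached_service_check_horizon.
    """
    if horizon_seconds <= 0:
        # A horizon of 0 disables check caching.
        return False
    if now is None:
        now = time.time()
    # Only checks performed more recently than the horizon count as current.
    return (now - last_check_time) < horizon_seconds
```

With a horizon of 15, a result that is 10 seconds old would be reused, while one that is 20 seconds old would trigger a fresh check.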

Table 5.66. Large Installation Tweaks Option

Format: use_large_installation_tweaks=<0/1>
Example: use_large_installation_tweaks=0

This option determines whether or not the Nagios daemon will take several shortcuts to improve performance. These shortcuts result in the loss of a few features, but larger installations will likely see a lot of benefit from doing so. More information on what optimizations are taken when you enable this option can be found in the documentation on large installation tweaks.

  1. 0 = Don't use tweaks (default)
  2. 1 = Use tweaks
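
Many options in this section share the name=<0/1> format. A minimal sketch of how a helper script might read such options from a main configuration file is shown below. The function names are hypothetical, and treating lines that start with # as comments is an assumption of the example.

```python
def parse_main_config(text):
    """Parse name=value lines, skipping blank lines and # comments."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name=value line: {raw_line!r}")
        options[name.strip()] = value.strip()
    return options


def get_flag(options, name, default):
    """Read a <0/1> option, falling back to the given default when unset."""
    value = options.get(name)
    if value is None:
        return default
    if value not in ("0", "1"):
        raise ValueError(f"{name} must be 0 or 1, got {value!r}")
    return value == "1"


# Hypothetical excerpt of a main configuration file.
sample = """
# performance tuning
use_large_installation_tweaks=0
enable_predictive_host_dependency_checks=1
cached_host_check_horizon=15
"""
options = parse_main_config(sample)
tweaks_enabled = get_flag(options, "use_large_installation_tweaks", default=False)
```

Passing the documented default explicitly keeps the behavior for an unset option visible at the call site.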

Table 5.67. Child Process Memory Option

Format: free_child_process_memory=<0/1>
Example: free_child_process_memory=0

This option determines whether or not Nagios will free memory in child processes when they are fork()ed off from the main process. By default, Nagios frees memory. However, if the use_large_installation_tweaks option is enabled, it will not. By defining this option in your configuration file, you can override the default and get the behavior you want.

  1. 0 = Don't free memory
  2. 1 = Free memory

Table 5.68. Child Processes Fork Twice Option

Format: child_processes_fork_twice=<0/1>
Example: child_processes_fork_twice=0

This option determines whether or not Nagios will fork() child processes twice when it executes host and service checks. By default, Nagios fork()s twice. However, if the use_large_installation_tweaks option is enabled, it will only fork() once. By defining this option in your configuration file, you can override the default and get the behavior you want.

  1. 0 = Fork() just once
  2. 1 = Fork() twice

Table 5.69. Environment Macros Option

Format: enable_environment_macros=<0/1>
Example: enable_environment_macros=0

This option determines whether or not the Nagios daemon will make all standard macros available as environment variables to your check, notification, event handler, etc. commands. In large Nagios installations this can be problematic because it takes additional memory and (more importantly) CPU to compute the values of all macros and make them available to the environment.

  1. 0 = Don't make macros available as environment variables
  2. 1 = Make macros available as environment variables (default)
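
When environment macros are enabled, a command can read macro values from its environment instead of taking them as arguments. The sketch below assumes the NAGIOS_ prefix convention (for example, $HOSTNAME$ exported as NAGIOS_HOSTNAME), which is not spelled out in this section. The helper name and the sample values are hypothetical.

```python
import os


def read_macro(name, environ=None, default=""):
    """Look up a standard macro such as HOSTNAME in the environment.

    Assumes macros are exported with a NAGIOS_ prefix.
    """
    if environ is None:
        environ = os.environ
    return environ.get("NAGIOS_" + name, default)


# Simulated environment, as a notification command might see it.
fake_env = {"NAGIOS_HOSTNAME": "web01", "NAGIOS_HOSTSTATE": "DOWN"}
message = f"{read_macro('HOSTNAME', fake_env)} is {read_macro('HOSTSTATE', fake_env)}"
```

A command written this way stops working if enable_environment_macros is set to 0, so scripts that must run in large installations may prefer macros passed on the command line.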

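Table 5.70. Flap Detection Option

Format: enable_flap_detection=<0/1>
Example: enable_flap_detection=0

This option determines whether or not Nagios will try to detect hosts and services that are "flapping". Flapping occurs when a host or service changes between states too frequently, resulting in a barrage of notifications being sent out. When Nagios detects that a host or service is flapping, it will temporarily suppress notifications for that host/service until it stops flapping. Flap detection is very experimental at this point, so use this feature with caution! More information on how flap detection and handling works can be found in the documentation on flap detection.

Note: If you have state retention enabled, Nagios will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface. Option values are as follows: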

  1. 0 = Don't enable flap detection (default)
  2. 1 = Enable flap detection

Table 5.71. Low Service Flap Threshold

Format: low_service_flap_threshold=<percent>
Example: low_service_flap_threshold=25.0

This option is used to set the low threshold for detection of service flapping. For more information on how flap detection and handling works (and how this option affects things), read the documentation on flap detection.

Table 5.72. High Service Flap Threshold

Format: high_service_flap_threshold=<percent>
Example: high_service_flap_threshold=50.0

This option is used to set the high threshold for detection of service flapping. For more information on how flap detection and handling works (and how this option affects things), read the documentation on flap detection.

Table 5.73. Low Host Flap Threshold

Format: low_host_flap_threshold=<percent>
Example: low_host_flap_threshold=25.0

This option is used to set the low threshold for detection of host flapping. For more information on how flap detection and handling works (and how this option affects things), read the documentation on flap detection.

Table 5.74. High Host Flap Threshold

Format: high_host_flap_threshold=<percent>
Example: high_host_flap_threshold=50.0

This option is used to set the high threshold for detection of host flapping. For more information on how flap detection and handling works (and how this option affects things), read the documentation on flap detection.
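
The low and high thresholds work as a pair. The sketch below assumes the usual hysteresis reading: flapping starts when the percent state change exceeds the high threshold and stops when it drops below the low threshold. That behavior belongs to the flap detection documentation rather than this section, and the function name is hypothetical.

```python
def update_flapping(is_flapping, percent_state_change, low_threshold, high_threshold):
    """Return the new flapping status for a host or service.

    Assumed hysteresis: start above the high threshold, stop below the
    low threshold, otherwise keep the previous status.
    """
    if not is_flapping and percent_state_change > high_threshold:
        return True
    if is_flapping and percent_state_change < low_threshold:
        return False
    return is_flapping
```

With the sample values 25.0 and 50.0, an object at 40% state change keeps whatever status it already had, which avoids toggling in and out of the flapping state.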

Table 5.75. Soft State Dependencies Option

Format: soft_state_dependencies=<0/1>
Example: soft_state_dependencies=0

This option determines whether or not Nagios will use soft state information when checking host and service dependencies. Normally Nagios will only use the latest hard host or service state when checking dependencies. If you want it to use the latest state (regardless of whether it's a soft or hard state type), enable this option.

  1. 0 = Don't use soft state dependencies (default)
  2. 1 = Use soft state dependencies
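
The choice between the two behaviors can be sketched like this. It is a simplified illustration of the rule above, with hypothetical names and plain strings standing in for states and state types.

```python
def state_for_dependency_check(current_state, state_type, last_hard_state,
                               soft_state_dependencies):
    """Pick the state that dependency logic should look at.

    state_type is "SOFT" or "HARD"; soft_state_dependencies mirrors the
    option of the same name.
    """
    if soft_state_dependencies or state_type == "HARD":
        # Use the latest state, whatever its type.
        return current_state
    # Otherwise fall back to the latest hard state.
    return last_hard_state
```

For a service that is currently CRITICAL (SOFT) with a last hard state of OK, the dependency logic sees OK by default and CRITICAL once the option is enabled.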

Table 5.76. Service Check Timeout

Format: service_check_timeout=<seconds>
Example: service_check_timeout=60

This is the maximum number of seconds that Nagios will allow service checks to run. If checks exceed this limit, they are killed and a CRITICAL state is returned. A timeout error will also be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last-ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each service check normally finishes executing within this time limit. If a service check runs longer than this limit, Nagios will kill it off, thinking it is a runaway process.
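
The last-ditch behavior can be imitated outside Nagios with a small wrapper. This is a sketch only: it uses Python's subprocess module rather than anything Nagios does internally, it assumes the plugin convention that exit code 2 means CRITICAL, and the names are hypothetical.

```python
import subprocess
import sys

STATE_CRITICAL = 2  # assumed plugin exit code for CRITICAL


def run_check(argv, timeout_seconds):
    """Run a check command and kill it if it exceeds the timeout."""
    try:
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return STATE_CRITICAL, f"(check timed out after {timeout_seconds} seconds)"
    return completed.returncode, completed.stdout.strip()


# A stand-in "plugin" that sleeps longer than the limit allows.
slow_plugin = [sys.executable, "-c", "import time; time.sleep(5)"]
state, output = run_check(slow_plugin, timeout_seconds=1)
```

The one-second limit here is deliberately short for demonstration. As the text above says, a real timeout should be high enough that well-behaved checks never reach it.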

Table 5.77. Host Check Timeout

Format: host_check_timeout=<seconds>
Example: host_check_timeout=60

This is the maximum number of seconds that Nagios will allow host checks to run. If checks exceed this limit, they are killed, a CRITICAL state is returned, and the host is assumed to be DOWN. A timeout error will also be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last-ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each host check normally finishes executing within this time limit. If a host check runs longer than this limit, Nagios will kill it off, thinking it is a runaway process.

Table 5.78. Event Handler Timeout

Format: event_handler_timeout=<seconds>
Example: event_handler_timeout=60

This is the maximum number of seconds that Nagios will allow event handlers to be run. If an event handler exceeds this time limit, it will be killed and a warning will be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last-ditch mechanism to kill off commands which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each event handler command normally finishes executing within this time limit. If an event handler runs longer than this limit, Nagios will kill it off, thinking it is a runaway process.

Table 5.79. Notification Timeout

Format: notification_timeout=<seconds>
Example: notification_timeout=60

This is the maximum number of seconds that Nagios will allow notification commands to be run. If a notification command exceeds this time limit, it will be killed and a warning will be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last-ditch mechanism to kill off commands which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each notification command normally finishes executing within this time limit. If a notification command runs longer than this limit, Nagios will kill it off, thinking it is a runaway process.

Table 5.80. Obsessive Compulsive Service Processor Timeout

Format: ocsp_timeout=<seconds>
Example: ocsp_timeout=5

This is the maximum number of seconds that Nagios will allow an obsessive compulsive service processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Table 5.81. Obsessive Compulsive Host Processor Timeout

Format: ochp_timeout=<seconds>
Example: ochp_timeout=5

This is the maximum number of seconds that Nagios will allow an obsessive compulsive host processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Table 5.82. Performance Data Processor Command Timeout

Format: perfdata_timeout=<seconds>
Example: perfdata_timeout=5

This is the maximum number of seconds that Nagios will allow a host performance data processor command or service performance data processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Table 5.83. Obsess Over Services Option

Format: obsess_over_services=<0/1>
Example: obsess_over_services=1

This value determines whether or not Nagios will "obsess" over service check results and run the obsessive compulsive service processor command you define. I know - funny name, but it was all I could think of. This option is useful for performing distributed monitoring. If you're not doing distributed monitoring, don't enable this option.

  1. 0 = Don't obsess over services (default)
  2. 1 = Obsess over services

Table 5.84. Obsessive Compulsive Service Processor Command

Format: ocsp_command=<command>
Example: ocsp_command=obsessive_service_handler

This option allows you to specify a command to be run after every service check, which can be useful in distributed monitoring. This command is executed after any event handler or notification commands. The command argument is the short name of a command definition that you define in your object configuration file. The maximum amount of time that this command can run is controlled by the ocsp_timeout option. More information on distributed monitoring can be found in the documentation on distributed monitoring. This command is only executed if the obsess_over_services option is enabled globally and if the obsess_over_service directive in the service definition is enabled.
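
The ordering and the two-level switch described above can be sketched as follows. This is an illustration only: the function and parameter names are hypothetical, and commands are represented by their short names.

```python
def commands_after_service_check(event_handler, notification_command, ocsp_command,
                                 obsess_over_services, obsess_over_service):
    """Return the commands to run after a service check, in execution order."""
    commands = []
    if event_handler:
        commands.append(event_handler)
    if notification_command:
        commands.append(notification_command)
    # The OCSP command needs both the global option and the per-service directive.
    if ocsp_command and obsess_over_services and obsess_over_service:
        commands.append(ocsp_command)
    return commands
```

The same two-level pattern (a global option plus a per-object directive) applies to the host processor command and to the performance data options later in this section.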

Table 5.85. Obsess Over Hosts Option

Format: obsess_over_hosts=<0/1>
Example: obsess_over_hosts=1

This value determines whether or not Nagios will "obsess" over host check results and run the obsessive compulsive host processor command you define. I know - funny name, but it was all I could think of. This option is useful for performing distributed monitoring. If you're not doing distributed monitoring, don't enable this option.

  1. 0 = Don't obsess over hosts (default)
  2. 1 = Obsess over hosts

Table 5.86. Obsessive Compulsive Host Processor Command

Format: ochp_command=<command>
Example: ochp_command=obsessive_host_handler

This option allows you to specify a command to be run after every host check, which can be useful in distributed monitoring. This command is executed after any event handler or notification commands. The command argument is the short name of a command definition that you define in your object configuration file. The maximum amount of time that this command can run is controlled by the ochp_timeout option. More information on distributed monitoring can be found in the documentation on distributed monitoring. This command is only executed if the obsess_over_hosts option is enabled globally and if the obsess_over_host directive in the host definition is enabled.

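Table 5.87. Performance Data Processing Option

Format: process_performance_data=<0/1>
Example: process_performance_data=1

This value determines whether or not Nagios will process host and service check performance data.
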
  1. 0 = Don't process performance data (default)
  2. 1 = Process performance data

Table 5.88. Host Performance Data Processing Command

Format: host_perfdata_command=<command>
Example: host_perfdata_command=process-host-perfdata

This option allows you to specify a command to be run after every host check to process host performance data that may be returned from the check. The command argument is the short name of a command definition that you define in your object configuration file. This command is only executed if the process_performance_data option is enabled globally and if the process_perf_data directive in the host definition is enabled.

Table 5.89. Service Performance Data Processing Command

Format: service_perfdata_command=<command>
Example: service_perfdata_command=process-service-perfdata

This option allows you to specify a command to be run after every service check to process service performance data that may be returned from the check. The command argument is the short name of a command definition that you define in your object configuration file. This command is only executed if the process_performance_data option is enabled globally and if the process_perf_data directive in the service definition is enabled.

Table 5.90. Host Performance Data File

Format: host_perfdata_file=<file_name>
Example: host_perfdata_file=/usr/local/nagios/var/host-perfdata.dat

This option allows you to specify a file to which host performance data will be written after every host check. Data will be written to the performance file as specified by the host_perfdata_file_template option. Performance data is only written to this file if the process_performance_data option is enabled globally and if the process_perf_data directive in the host definition is enabled.

Table 5.91. Service Performance Data File

Format: service_perfdata_file=<file_name>
Example: service_perfdata_file=/usr/local/nagios/var/service-perfdata.dat

This option allows you to specify a file to which service performance data will be written after every service check. Data will be written to the performance file as specified by the service_perfdata_file_template option. Performance data is only written to this file if the process_performance_data option is enabled globally and if the process_perf_data directive in the service definition is enabled.

Table 5.92. Host Performance Data File Template

Format: host_perfdata_file_template=<template>