Category: LINUX

2012-02-19 22:24:34

Nagios 3.x
Nagios Online Help, Chinese Edition
Dr. 田朝阳
Project Lead
SourceForge software developer organization
nagios-cn project

0.1.0

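Copyright in Nagios belongs to the copyright holders of the Nagios software; this book reserves copyright only in the Chinese-localized content. A reminder: however you quote this book, in whole or in part, please be sure to cite the book's source site and to reproduce the copyright notices and statements of the relevant SourceForge publications.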

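Revision History
Revision 0.0.3 30/01/2008 enochcytian
Generated a preliminary online help document from the sections translated so far.
Revision 0.0.2 20/12/2007 enochcytian
Set up the DocBook project and reverse-generated the XML chapter files from the source HTML files.
Revision 0.0.1 12/12/2007 enochcytian
Created the first draft and began writing the initial files.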

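Abstract

Nagios is an excellent network and host management program with considerable influence in the open source community. Unfortunately, the English messages used throughout its interface and operation are tightly coupled to the source code, which long delayed a Chinese-language interface and limited its adoption among Chinese-speaking users. To promote the use of Nagios, the author started the nagios-cn project, whose main goal is to translate the run-time messages, the generated interface text, and the documentation shipped with the source. After some effort nagios-cn is finally working properly. The main purpose of this book is to promote Nagios in Chinese-speaking regions so that this excellent software can serve users there.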

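Dedication
1. Acknowledgements

First, I thank my family. Their support is what allowed me to do the work I enjoy. My family is the foundation of my pride, and I believe they are proud of what I have done.

Second, I thank the author of Nagios. Ethan Galstad gave us this fine piece of software, and his reply to my letter told me where Nagios was heading.

Finally, I thank the open source community, especially the sourceforge and google.code sites. With the support of these two organizations nagios-cn has been able to survive, and more and more people have come to appreciate what the open source community contributes.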

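Chapter 1. Preface

Anyone who works with computer networks has probably heard of network management software, but I suspect few actually use it in practice and build their work on it; it is not as common as games or word processors. Had circumstances not forced me, I would not have put real effort into learning and using it either. At the end of 2004, because certain tasks could not be assigned to anyone else, I, who "still had spare time", took on the job of evaluating a network management package. There was no final goal, no deadline, and little staff or funding, but some very practical problems had to be solved. That was the starting point of this work.

Fortunately the software was not hard to install and try out. It took me only a day to download, compile, and install it, and after adjusting the configuration files I could operate it. But the ugly interface, the tedious configuration updates, the heavy workload of the initial setup, and a string of other problems made me wonder whether it was needed at all. A commercial package was already at hand; although its customization did not quite meet our requirements, at least it did not put so much responsibility on my shoulders. After all, I did not have to answer for any of this.

After much thought, I decided that giving up was not what I wanted. Since there was no time limit, I would pursue both paths. First I configured the commercial package carefully so that it could be operated and kept running, and opened an issue log to record the problems that came up with later changes. For Nagios, I sorted out my thinking again: what exactly did I need it to do, how many problems would have to be solved in use and to what degree, and, measured against the resources available, could it work?

Repeated trial and adjustment is a long process, especially once I began writing check plugins, which did not go as smoothly as I had imagined. Fortunately time can always be squeezed out, and after writing plugin after plugin I gained some experience and got practice in Perl and Bash along the way (the plugins are tied to my job, so unfortunately they cannot be published). I also fixed several common problems in installing and running Nagios, and wrote a Bash script for deployment and operation so that those who came later could study and use it.

Later, because my work situation changed, I handed over what I had learned so that it could be put to real use. What I did after that was purely out of personal interest: I added SVG support to the nagios-cn project, integrated RRD and Grapher functionality, wrote a SPEC file to build custom RPMs, added a DocBook conversion project, and so on. Finishing each of these brought a fresh sense of satisfaction.

Only in the final stage did I think of publicizing and promoting it. Because this work was no longer part of my job, I was free to publish it online. That is the origin of the growing body of project information on several websites and blogs, and this book is part of it.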

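Chapter 2. About Nagios
2.1. What Is Nagios?

Nagios is an application for monitoring systems and networks. It watches the hosts and services you specify under the conditions you set, and alerts you when their state gets worse and when it gets better.

Nagios was originally designed to run under Linux, although it also runs on other Unix-like systems.

Further features of Nagios include:

  1. Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.);
  2. Monitoring of host resources (processor load, disk usage, etc.);
  3. A simple plugin design that lets users easily develop their own service checks;
  4. Parallelized service checks;
  5. The ability to define a network host hierarchy using "parent" hosts, which makes it possible to detect and distinguish between hosts that are down and hosts that are unreachable;
  6. Contact notifications when service or host problems occur and when they are resolved (via email, pager, or a user-defined method);
  7. The ability to define event handlers that run when host or service events occur, to help locate and resolve problems;
  8. Automatic log file rotation;
  9. Support for implementing redundant monitoring hosts;
  10. An optional web interface for viewing current network status, notification and problem history, log files, etc.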
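
The plugin design mentioned in item 3 can be illustrated with a minimal sketch. A plugin reports its result through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and a line of text on standard output. The function name check_usage, the label USAGE, and the thresholds below are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nagios: map a usage percentage to a
# plugin-style status line and return code.
# Usage: check_usage <percent> <warn_threshold> <crit_threshold>
check_usage() {
    pct=$1; warn=$2; crit=$3
    # Anything that is empty or contains a non-digit is reported as UNKNOWN.
    case "$pct" in
        ''|*[!0-9]*) echo "USAGE UNKNOWN - invalid value '$pct'"; return 3 ;;
    esac
    if [ "$pct" -ge "$crit" ]; then
        echo "USAGE CRITICAL - ${pct}% used"; return 2
    elif [ "$pct" -ge "$warn" ]; then
        echo "USAGE WARNING - ${pct}% used"; return 1
    fi
    echo "USAGE OK - ${pct}% used"; return 0
}
```

A real plugin would measure the value itself (for example from df output) and then exit with the function's return code.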
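2.2. System Requirements

The only requirement for running Nagios is a machine that runs Linux (or a Unix variant) and has a C compiler. TCP/IP must also be configured correctly, because most service checks are performed over the network.

You are not required to use the CGIs included with Nagios. If you do decide to use them, you will need the following software installed...

  1. A web server (preferably Apache)
  2. Thomas Boutell's gd library, version 1.6.3 or higher (two of the CGI modules require this library)
2.3. Licensing

Nagios is licensed under the terms of the GNU General Public License Version 2 as published by the Free Software Foundation. See the GNU website for the text of the license. It gives you legal permission to copy, distribute, and/or modify Nagios under certain conditions. Read the LICENSE file in the Nagios distribution, or the online copy of the license, for more details.

Nagios is provided AS IS with NO WARRANTY OF ANY KIND, INCLUDING THE WARRANTY OF DESIGN, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

2.4. Acknowledgements

Several people have contributed to Nagios by reporting bugs, suggesting improvements, writing plugins, and so on. A list of contributors can be found on the project website.

2.5. Downloading the Latest Version

The latest version can be obtained from the Nagios site.

Note

Nagios and the Nagios logo are trademarks owned by Ethan Galstad. All other trademarks, servicemarks, registered trademarks, and registered servicemarks are the property of their respective owners.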

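Chapter 3. What's New in Nagios 3.0
Important

Make sure you read through the documentation and the FAQs before sending a question to the mailing lists.

3.1. Change Log

The Nagios change log can be found online or in the root directory of the source code distribution.

3.2. Changes and New Features
  • Documentation:
    1. Documentation updates - I apologize for the slow progress in updating the documentation. It will take some time, because there is a lot of it and writing documentation is not my favorite task (translator's note: nor is translating all day). Expect some parts of the documentation to differ from others for a while; the changes should help both new and experienced Nagios users.
  • Macros:
    1. New macros - Several new macros have been added, including: $TEMPPATH$, $LONGHOSTOUTPUT$, $LONGSERVICEOUTPUT$, $HOSTNOTIFICATIONID$, $SERVICENOTIFICATIONID$, $HOSTEVENTID$, $SERVICEEVENTID$, $SERVICEISVOLATILE$, $LASTHOSTEVENTID$, $LASTSERVICEEVENTID$, $HOSTDISPLAYNAME$, $SERVICEDISPLAYNAME$, $MAXHOSTATTEMPTS$, $MAXSERVICEATTEMPTS$, $TOTALHOSTSERVICES$, $TOTALHOSTSERVICESOK$, $TOTALHOSTSERVICESWARNING$, $TOTALHOSTSERVICESUNKNOWN$, $TOTALHOSTSERVICESCRITICAL$, $CONTACTGROUPNAME$, $CONTACTGROUPNAMES$, $CONTACTGROUPALIAS$, $CONTACTGROUPMEMBERS$, $NOTIFICATIONRECIPIENTS$, $NOTIFICATIONISESCALATED$, $NOTIFICATIONAUTHOR$, $NOTIFICATIONAUTHORNAME$, $NOTIFICATIONAUTHORALIAS$, $NOTIFICATIONCOMMENT$, $EVENTSTARTTIME$, $HOSTPROBLEMID$, $LASTHOSTPROBLEMID$, $SERVICEPROBLEMID$, $LASTSERVICEPROBLEMID$, $LASTHOSTSTATE$, $LASTHOSTSTATEID$, $LASTSERVICESTATE$, $LASTSERVICESTATEID$. Two special on-demand time macros have also been added: $ISVALIDTIME:$ and $NEXTVALIDTIME:$.
    2. Removed macros - The old $NOTIFICATIONNUMBER$ macro has been split into two new macros: $HOSTNOTIFICATIONNUMBER$ and $SERVICENOTIFICATIONNUMBER$.
    3. Changed macros - The $HOSTNOTES$ and $SERVICENOTES$ macros may now contain macros themselves, as may the $HOSTNOTESURL$, $HOSTACTIONURL$, $SERVICENOTESURL$, and $SERVICEACTIONURL$ macros.
    4. Macros are available as environment variables when checks, event handlers, notifications, and other external commands are executed. This can be CPU-intensive in large Nagios installations, so it can be disabled with the corresponding configuration option.
    5. Updated information on macros can be found in the macro documentation.
  • Scheduled Downtime:
    1. Scheduled downtime entries are no longer stored in their own file (previously specified by the downtime_file directive in the main config file). Current and retained scheduled downtime entries are now stored in the status file and the retention file, respectively.
  • Comments:
    1. Host and service comments are no longer stored in their own file (previously specified by the comment_file directive in the main config file). Current and retained comments are now stored in the status file and the retention file, respectively.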
    2. Acknowledgement comments that are marked as non-persistent are now only deleted when the acknowledgement is removed. They were previously automatically deleted when Nagios restarted, which was not ideal.
  • State Retention Data:
    1. Status information for individual contacts is now retained across program restarts.
    2. Comment and downtime IDs are now retained across program restarts and should be unique unless the retention data is deleted or ignored.
    3. Added variables to control what host/service attributes are retained globally across program restarts.
    4. Added variables to control what process attributes are retained across program restarts.
    5. Added variables to control what contact attributes are retained globally across program restarts.
  • Flap Detection:
    1. Added flap_detection_options directive to host and service definitions to allow you to specify what host/service states should be used by the flap detection logic (by default all states are used).
    2. Percent state change and state history are now retained and recorded even when flap detection is disabled.
    3. Hosts and services are immediately checked for flapping when flap detection is enabled program-wide.
    4. Hosts and services that are flapping when flap detection is disabled program-wide are now logged.
    5. More information can be found in the flap detection documentation.
  • External Commands:
    1. Added a new PROCESS_FILE external command to allow processing of external commands found in an external (regular) file. Useful for processing large amounts of passive checks with long output, or for scripting regular commands. More information can be found in the external commands documentation.
    2. Custom commands may now be submitted to Nagios. Custom command names are prefixed with an underscore and are not processed internally by the Nagios daemon. They may, however, be processed by a loaded NEB module.
    3. Checking for external commands is now enabled by default, which means Nagios is configured to check for external commands "out of the box". All 2.x and earlier versions of Nagios had this option disabled by default.
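
As a rough sketch of what submitting such a command looks like: an external command is a single line of the form "[timestamp] COMMAND;arg;arg" written to the Nagios command file. The helper name format_extcmd, the batch file path, and the command file location in the commented usage line are assumptions for illustration; PROCESS_FILE is shown here with a file name and a delete flag as its arguments.

```shell
#!/bin/sh
# Hypothetical helper: build one external command line.
# Usage: format_extcmd <unix_timestamp> <COMMAND> [arg ...]
format_extcmd() {
    ts=$1; shift
    line="[$ts] $1"; shift
    # Remaining arguments are joined with semicolons.
    for arg in "$@"; do
        line="$line;$arg"
    done
    printf '%s\n' "$line"
}

# Example with assumed paths: ask Nagios to process a batch file of
# commands and delete it afterwards (non-zero last argument).
# format_extcmd "$(date +%s)" PROCESS_FILE /tmp/batch.cmds 1 \
#     >> /usr/local/nagios/var/rw/nagios.cmd
```

Keeping the formatting in a function makes it easy to check the line before it is appended to the command file.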
  • Status Data:
    1. Contact status information (last notification times, notifications enabled/disabled, etc.) is now saved in the status and retention files, although it is not processed by the CGIs.
  • Embedded Perl:
    1. Added new variables to control use of the embedded Perl interpreter.
    2. Perl scripts/plugins can now explicitly tell Nagios whether or not they should be run under the embedded Perl interpreter. This is useful if you have troublesome scripts that don't function well under the ePN.
    3. More information about these new options can be found in the embedded Perl documentation.
  • Adaptive Monitoring:
    1. The check timeperiod for hosts and services can now be modified on-the-fly with the appropriate external command (CHANGE_HOST_CHECK_TIMEPERIOD or CHANGE_SVC_CHECK_TIMEPERIOD). See the external commands documentation for the other adaptive monitoring commands that are available.
  • Notifications:
    1. A first_notification_delay option has been added to host and service definitions to (what else) introduce a delay between when a host/service problem first occurs and when the first problem notification goes out. In previous versions you had to use some mighty config-fu with escalations to accomplish this. Now this feature is available to normal mortals.
    2. Notifications are now sent out for hosts/services that are flapping when flap detection is disabled on a host- or service-specific basis or on a program-wide basis. The $NOTIFICATIONTYPE$ macro will be set to "FLAPPINGDISABLED" in this situation.
    3. Notifications can now be sent out when scheduled downtime starts, ends, and is cancelled for hosts and services. The $NOTIFICATIONTYPE$ macro will be set to "DOWNTIMESTART", "DOWNTIMEEND", or "DOWNTIMECANCELLED", respectively. In order to receive notifications on scheduled downtime events, specify "s" or "downtime" in your contact, host, and/or service notification options.
    4. More information can be found in the notifications documentation.
  • Object Definitions:
    1. Service dependencies can now be created to easily define "same host" dependencies for different services on one or more hosts.
    2. Extended host and service definitions (hostextinfo and serviceextinfo, respectively) have been deprecated. All values from extended definitions have been merged with host or service definitions, as appropriate. Nagios 3 will continue to read and process older extended information definitions, but will log a warning. Future versions of Nagios (4.x and later) will not support separate extended info definitions.
    3. New hostgroup_members, servicegroup_members, and contactgroup_members directives have been added to hostgroup, servicegroup, and contactgroups definitions, respectively. This allows you to include hosts, services, or contacts from sub-groups in your group definitions.
    4. New notes, notes_url, and action_url have been added to hostgroup and servicegroup definition.
    5. Contact definitions have the new host_notifications_enabled, service_notifications_enabled, and can_submit_commands directives to better control notifications and determine whether or not they can submit commands through the web interface.
    6. Host and service dependencies now support an optional dependency_period directive. This allows you to limit the times during which dependencies are valid.
    7. The parallelize directive in service definitions is now deprecated and no longer used. All service checks are run in parallel in Nagios 3.
    8. There are no longer any inherent limitations on the length of host names or service descriptions.
    9. Extended regular expressions are now used if you enable the corresponding config option. Regular expression matching is only used in certain object definition directives that contain *, ?, +, or \. characters.
    10. A new initial_state directive has been added to host and service definitions, so you can tell Nagios that a host/service should default to a specific state when Nagios starts, rather than UP or OK (which is still the default).
  • Object Inheritance:
    1. You can now inherit object variables/values from multiple templates by specifying more than one template name in the use directive of object definitions. This can allow for some very powerful (and complex) inheritance setups.
    2. Services now inherit contact groups, notification interval, and notification period from their associated host if not otherwise specified.
    3. Host and service escalations now inherit contact groups, notification interval, and escalation timeperiod from their associated host or service if not otherwise specified.
    4. String variables in host, service, and contact definitions can now be prevented from being inherited by specifying a value of "null" (without quotes) for the value of the variable.
    5. Most string variables in local object definitions can now be appended to the string values that are inherited. This is quite handy in large configurations.
  • Performance Improvements:
    1. Added the ability to precache object config files and to exclude circular path detection checks from the verification process. This can speed up Nagios start time immensely in large environments!
    2. A new option has been added that should improve performance in large Nagios installations.
    3. A number of internal improvements have been made with regards to how Nagios deals with internal data structures and object (e.g. host and service) relationships. These improvements should result in a speedup for larger installations.
    4. A new option has been added to allow you to more easily scale Nagios in large environments. For best results you should consider graphing Nagios' usage of buffer slots over time.
  • Plugin Output:
    1. Multiline plugin output is now supported for host and service checks. Hooray! The plugin API has been updated to support multiple lines of output in a manner that retains backward compatibility with older plugins. Additional lines of output (aside from the first line) are now stored in new $LONGHOSTOUTPUT$ and $LONGSERVICEOUTPUT$ macros.
    2. The maximum length of plugin output has been increased to 4K (from around 350 bytes in previous versions). This 4K limit has been arbitrarily chosen to protect against runaway plugins that dump back too much data to Nagios.
    3. More information on the plugins, multiline output, and max plugin output length can be found in the plugin API documentation.
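
A minimal sketch of multiline output, assuming a plugin that reports on two filesystems: the first line is the short summary (optionally followed by "|" and performance data), and the following lines are the additional output that ends up in the $LONGSERVICEOUTPUT$ macro. The function name report_disks and all values shown are invented for illustration.

```shell
#!/bin/sh
# Hypothetical plugin body: print a summary line with performance data,
# followed by extra detail lines, and report OK.
report_disks() {
    # Line 1: short status text | performance data
    printf '%s\n' "DISK OK - 2 filesystems checked | /=41% /home=63%"
    # Further lines: long output
    printf '%s\n' "/ 41% used"
    printf '%s\n' "/home 63% used"
    return 0
}
```

Older consumers that only read the first line still see a complete status message, which is how the format stays backward compatible.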
  • Service Checks:
    1. Nagios now checks for orphaned service checks by default.
    2. Added a new option to control whether or not Nagios will initiate predictive checks of services that are being depended upon (in dependency definitions). Predictive checks help ensure that the dependency logic is as accurate as possible.
    3. A new cached service check feature has been implemented that can significantly improve performance for many people. Instead of executing a plugin to check the status of a service, Nagios can often use a cached service check result instead.
  • Host Checks:
    1. Host checks are now run in parallel! Host checks used to be run in a serial fashion, which meant they were a major holdup in terms of performance. No longer!
    2. Host check retries are now performed like service check retries. That is to say, host definitions now have a new retry_interval that specifies how much time to wait before trying the host check again. :-)
    3. Regularly scheduled host checks no longer hinder performance. In fact, they can help to increase performance with the new cached check logic (see below).
    4. Added a new option to enable checks of orphaned host checks. This is needed now that host checks are run in parallel.
    5. Added a new option to control whether or not Nagios will initiate predictive checks of hosts that are being depended upon (in dependency definitions). Predictive checks help ensure that the dependency logic is as accurate as possible.
    6. A new cached host check feature has been implemented that can significantly improve performance for many people. Instead of executing a plugin to check the status of a host, Nagios can often use a cached host check result instead.
    7. Passive host checks that have a DOWN or UNREACHABLE result can now be automatically translated to their proper state from the point of view of the Nagios instance that receives them. This is very useful in failover and distributed monitoring setups.
    8. Passive host checks normally put a host into a HARD state. This can now be changed by enabling the corresponding option.
  • Freshness checks:
    1. A new option has been added to allow you to specify the number of seconds that should be added to any host or service freshness threshold that is automatically calculated by Nagios.
  • IPC:
    1. The IPC mechanism that is used to transfer host/service check results back to the Nagios daemon from (grand)child processes has changed! This should help to reduce load/latency issues related to processing large numbers of passive checks in distributed monitoring environments.
    2. Check results are now transferred by writing them to files in a directory specified by a configuration option. Files that are older than a configurable maximum age will be mercilessly deleted without further processing.
  • Timeperiods:
    1. Timeperiods were overdue for a major overhaul and have finally been extended to allow for date exceptions, skip dates (every 3 days), etc! This should help you out when defining notification timeperiods for pager rotations.
    2. More information on the new timeperiod directives can be found in the timeperiod documentation.
  • Event Broker:
    1. Updated NEB API version
    2. Modified callback for adaptive program status data
    3. Added callback for adaptive contact status data
    4. Added precheck callbacks for hosts and services to allow modules to cancel/override internal host/service checks.
  • Web Interface:
    1. Hostgroup and servicegroup summaries now show important/unimportant problem breakdowns like the TAC CGI.
    2. Minor layout changes to host and service detail views in extinfo CGI.
    3. New check statistics have been added to the "Performance Info" screen.
    4. Added
    5. Added new options to control what frame notes and action URLs are opened in.
    6. Added a new option to prevent alteration of author names when users submit comments, acknowledgements, and scheduled downtime.
  • Debugging Info:
    1. The DEBUGx compile options available in the configure script have been removed.
    2. Debugging information can now be written to a separate debug file, which is automatically rotated when it reaches a user-defined size. This should make debugging problems much easier, as you don't need to recompile Nagios. Full support for writing debugging information to file is being added during the alpha development phase, so it may not be complete when you try it.
    3. Several main configuration variables affect the debug log.
  • Misc:
    1. Temp path variable - A new variable has been added to specify a scratch directory that Nagios can use for temporary scratch space.
    2. Unique notification and event ID numbers - A unique ID number is now assigned to each host and service notification. Another unique ID is now assigned to all host and service state changes as well. The unique IDs can be accessed using the following respective macros: $HOSTNOTIFICATIONID$, $SERVICENOTIFICATIONID$, $HOSTEVENTID$, $SERVICEEVENTID$, $LASTHOSTEVENTID$, $LASTSERVICEEVENTID$.
    3. New macros - A few new macros (other than those already mentioned elsewhere above) have been added. They include $HOSTGROUPNAMES$, $SERVICEGROUPNAMES$, $HOSTACKAUTHORNAME$, $HOSTACKAUTHORALIAS$, $SERVICEACKAUTHORNAME$, and $SERVICEACKAUTHORALIAS$.
    4. Reaper frequency - The old service_reaper_frequency variable has been renamed, as it is now also used to process host check results.
    5. Max reaper time - A new variable has been added to limit the amount of time a single reaper event is allowed to run.
    6. Fractional intervals - Fractional notification and check intervals (e.g. "3.5" minutes) are now supported in host, service, host escalation, and service escalation definitions.
    7. Escaped command arguments - You can now pass bang (!) characters in your command arguments by escaping them with a backslash (\). If you need to include backslashes in your command arguments, they should also be escaped with a backslash.
    8. Multiline system command output - Nagios will now read multiple lines of output from system commands it runs (notification scripts, etc.), up to 4K. This matches the limits on plugin output mentioned earlier. Output from system commands is not directly processed by Nagios, but support for it is there nonetheless.
    9. Better scheduling information - More detailed information is given when Nagios is executed with the -s command line option. This information can be used to help reduce the time it takes to start/restart Nagios.
    10. Aggregated status file updates - The old aggregate_status_updates option has been removed. All status file updates are now aggregated at a minimum interval of 1 second.
    11. New performance data file mode - A new "p" mode has been added to the performance data file mode options. This new mode will open the file in non-blocking read/write mode, which is useful for pipes.
    12. Timezone offset - A new option has been added to allow you to run different instances of Nagios in timezones different from the local zone.
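
The escaping rule from the "Escaped command arguments" item in the Misc list can be sketched as a small filter. Because "!" separates arguments, a literal "!" becomes "\!" and a literal backslash becomes "\\". The function name escape_arg is hypothetical and not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: escape one command argument so that it can
# contain literal '!' and '\' characters.
# Backslashes are doubled first so the ones added for '!' are not doubled again.
escape_arg() {
    printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/!/\\!/g'
}
```

For example, escape_arg 'wow!' yields wow\!, which can then be placed after the "!" separator in a check command's argument list.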
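Chapter 4. Getting Started
4.1. Advice for Beginners

Congratulations on choosing Nagios! Nagios is powerful and flexible, but it can take a good deal of effort to configure it just the way you want. Once you understand how it works and what it can do for you, you will never want to be without it. :-) Here are some things for first-time Nagios users to keep in mind:

  • Relax - this is going to take some time. Don't expect everything to work in an instant; it is not that easy. Setting up Nagios involves some work, partly because of the options Nagios offers and partly because you need to know what to monitor on your network (and how best to monitor it).
  • Use the quickstart guides. They are written to get new users up and running with a basic Nagios installation as quickly as possible. Within twenty minutes you can have Nagios installed and monitoring your local system. Once that is done, you can move on to learning how to configure Nagios.
  • Read the documentation. Once you understand how Nagios works you can configure it efficiently and make it do almost anything. Make sure you have read the documentation (particularly the chapters "Configuring Nagios" and "The Basics"). Leave the advanced topics until you understand the basic configuration.
  • Seek the help of others. If you have read the documentation and reviewed the sample config files and are still having problems, send an email describing the problem clearly to the nagios-users mailing list. Because of the amount of work I have to do on this project, I may be unable to reply to mail sent directly to me, so the mailing list is your best source of help. If you have done some background reading and describe the problem well, someone will probably point you in the right direction.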

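4.2. Upgrading from Older Nagios Versions

4.2.1. Upgrading from Previous Nagios 3.x Releases

If you are running an older 3.x release, you should upgrade to the current version as soon as possible, because newer releases fix many bugs. The steps below assume that you originally installed Nagios from source by following the quickstart guide and now want to install a newer version. The upgrade itself does not need root privileges; the steps below are run as the nagios user. Here is the upgrade procedure...

First make sure you have a good backup of your existing Nagios installation and configuration files. If anything goes wrong during the upgrade, you can at least roll back to the old version.

Become the nagios user. Debian/Ubuntu users can switch with sudo -s nagios.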

su -l nagios

Download the latest Nagios source tarball.

wget 3.x.tar.gz

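Extract the source tarball.

tar xzf nagios-3.x.tar.gz
cd nagios-3.x

Run the Nagios configure script, passing the name of the group used for external commands, like so:

./configure --with-command-group=nagcmd

Compile the Nagios source code.

make all

Install the updated binaries, documentation, and web interface. Your existing configuration files will not be overwritten by this step.

make install

Verify your configuration files and restart Nagios.

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
/sbin/service nagios restart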
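
The verify-then-restart step can be wrapped so that a restart only happens when the configuration check passes. This is a sketch, not part of the Nagios distribution: the function name and the three variables are hypothetical, and their defaults simply repeat the paths used above.

```shell
#!/bin/sh
# Hypothetical wrapper: restart Nagios only if the pre-flight check succeeds.
# The defaults mirror the commands shown above and can be overridden.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-/sbin/service nagios restart}"

verify_then_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        # Intentionally unquoted so the command string splits into words.
        $RESTART_CMD
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

Run the check by hand (without the redirection) when it fails, so the reported errors are visible.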

That's it - the upgrade is complete!

4.2.2. Upgrading from Nagios 2.x to 3.x

Upgrading from Nagios 2.x to 3.x is not difficult. The procedure is the same as for upgrading from an older 3.x release, described above. There are, however, several configuration changes in Nagios 3.x that you need to be aware of:

  1. The old service_reaper_frequency variable in the main config file has been renamed.
  2. The old $NOTIFICATIONNUMBER$ macro has been deprecated in favor of the new $HOSTNOTIFICATIONNUMBER$ and $SERVICENOTIFICATIONNUMBER$ macros.
  3. The old parallelize directive in service definitions is now deprecated and no longer used, as all service checks are run in parallel.
  4. The old aggregate_status_updates option has been removed. All status file updates are now aggregated at a minimum interval of 1 second.
  5. Extended host and extended service definitions have been deprecated. They are still read and processed by Nagios, but it is recommended that you move the directives found in these definitions to your host and service definitions, respectively.
  6. The old downtime_file variable in the main config file is no longer supported, as scheduled downtime entries are now saved in the retention file. To preserve existing downtime entries, stop Nagios 2.x and append the contents of your old downtime file to the retention file.
  7. The old comment_file variable in the main config file is no longer supported, as comments are now saved in the retention file. To preserve existing comments, stop Nagios 2.x and append the contents of your old comment file to the retention file.
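
Items 6 and 7 both come down to appending an old data file to the retention file while Nagios is stopped. A cautious sketch is shown below; the function name is hypothetical, and the file locations in the commented usage line are assumptions that depend on how your 2.x installation was configured.

```shell
#!/bin/sh
# Hypothetical helper: keep a backup of the retention file, then append
# the old 2.x downtime and comment files to it.
# Run this only while Nagios is stopped.
# Usage: append_to_retention <retention_file> <old_file> [old_file ...]
append_to_retention() {
    retention=$1; shift
    cp -p "$retention" "$retention.bak" || return 1
    for old in "$@"; do
        # Silently skip files that do not exist.
        [ -f "$old" ] && cat "$old" >> "$retention"
    done
    return 0
}

# Example with assumed locations:
# append_to_retention /usr/local/nagios/var/retention.dat \
#     /usr/local/nagios/var/downtime.dat /usr/local/nagios/var/comments.dat
```

The .bak copy gives you a way back if the merged file turns out to be unusable.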

Also make sure to read the chapter on what's new in Nagios 3.0. It describes all the changes that were made to the Nagios 3 code since the latest stable release of Nagios 2.x. Quite a bit has changed, so make sure you read it over.

4.2.3. Upgrading from an RPM Installation

If you currently have Nagios installed from an RPM package, or from a Debian/Ubuntu APT package, and want to upgrade by installing from source, follow these steps:

  1. Backup your existing Nagios installation
    1. Configuration files
      1. Main config file (usually nagios.cfg)
      2. Resource config file (usually resource.cfg)
      3. CGI config file (usually cgi.cfg)
      4. All your object definition files
    2. Retention file (usually retention.dat)
    3. Current Nagios log file (usually nagios.log)
    4. Archived Nagios log files
  2. Uninstall the original RPM or APT package
  3. Install Nagios from source by following the quickstart guide
  4. Restore your original Nagios configuration files, retention file, and log files
  5. Verify your configuration and start Nagios

Note that RPM and APT packages place the Nagios files in different locations. Before upgrading, make sure the configuration files are backed up so that you can revert to the old version if you run into upgrade problems you cannot solve.

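4.3. Quickstart Installation Guides
4.3.1. Introduction

These guides are intended to give you simple instructions for installing Nagios from source and monitoring your local machine within twenty minutes. Advanced options are not discussed here; these are the basics that will work for more than 95% of users who want to get started.

4.3.2. Guides

Quickstart installation guides are currently available for the following Linux distributions: Fedora, openSUSE, and Ubuntu.

You can find more quickstart guides on the NagiosCommunity.org wiki. Can't find a guide for your operating system? Write one on the wiki for others!

If you are installing Nagios on an operating system or Linux distribution that is not listed above, use one of the guides as an overview of what you need to do. Command names, paths, and so on vary between distributions and operating systems, so you will probably need to adapt the installation instructions a little.

4.3.3. Post-Installation Modifications

Once Nagios is installed and running properly, you will no doubt want to monitor more than just your local machine. Look through the rest of the documentation to learn how to do more...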

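4.4. Fedora Quickstart
4.4.1. Introduction

This guide is intended to give you simple instructions for installing Nagios from source on Fedora and monitoring your local machine within 20 minutes. No advanced installation options are discussed here - just the basics, which are enough to get 95% of users started with Nagios.

These instructions were written based on a Fedora Core 6 system.

What You'll End Up With

If you follow these instructions, here is what you will end up with:

  1. Nagios and the plugins will be installed under /usr/local/nagios
  2. Nagios will be configured to monitor a few aspects of your local system (CPU load, disk usage, etc.)
  3. The Nagios web interface will be served by the local Apache instance
4.4.2. Prerequisites

Before installing, make sure you have root access to the machine.

Make sure the following packages are installed on your Fedora system before continuing.

  1. Apache
  2. GCC compiler
  3. GD library and development libraries

You can use yum to install these packages by running the following commands:

yum install httpd
yum install gcc
yum install glibc glibc-common
yum install gd gd-devel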

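4.4.3. Procedure

1) Create Account Information

Become the root user.

su -l

Create a new nagios user account and give it a password.

/usr/sbin/useradd nagios
passwd nagios

Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.

/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd apache

2) Download Nagios and the Plugins

Create a directory for storing the downloads.

mkdir ~/downloads
cd ~/downloads

Download the source code tarballs of both Nagios and the Nagios plugins (visit the Nagios site for the latest versions). At the time of writing, the latest versions were Nagios 3.0rc1 and Nagios Plugins 1.4.11.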

wget 3.0rc1.tar.gz wget

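3) Compile and Install Nagios

Extract the Nagios source code tarball.

cd ~/downloads
tar xzf nagios-3.0rc1.tar.gz
cd nagios-3.0rc1

Run the Nagios configure script, passing the name of the group you created earlier, like so:

./configure --with-command-group=nagcmd

Compile the Nagios source code.

make all

Install the binaries, init script, and sample config files, and set permissions on the external command directory.

make install
make install-init
make install-config
make install-commandmode

Don't start Nagios yet - there is still more to do...

4) Customize Configuration

Sample configuration files have been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. You need to make just one change...

Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with your favorite editor and change the email address in the nagiosadmin contact definition to the address where you want to receive alerts.

vi /usr/local/nagios/etc/objects/contacts.cfg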
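
If you would rather script this change than open an editor, the edit can be sketched with sed. The function name is hypothetical, and the sketch assumes the file contains directives of the form "email address", optionally followed by a ";" comment. Note that it rewrites every email directive in the file, so it is only suitable while the file holds a single contact.

```shell
#!/bin/sh
# Hypothetical helper: replace the address in every "email" directive.
# Usage: set_contact_email <contacts_file> <new_address>
set_contact_email() {
    file=$1; addr=$2
    tmp="$file.tmp.$$"
    # Keep the indentation and keyword (\1), swap only the address token,
    # and leave any trailing ";" comment untouched.
    sed -e "s|^\([[:space:]]*email[[:space:]][[:space:]]*\)[^[:space:];]*|\1$addr|" \
        "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example:
# set_contact_email /usr/local/nagios/etc/objects/contacts.cfg admin@example.org
```

The address must not contain "|" or "&", since both are special in this sed expression.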

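5) Configure the Web Interface

Install the Nagios web config file in the Apache conf.d directory.

make install-webconf

Create a nagiosadmin account for logging into the Nagios web interface. Remember the password you assign to this account - you will need it later.

htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Restart Apache to make the new settings take effect.

service httpd restart

6) Compile and Install the Nagios Plugins

Extract the Nagios plugins source code tarball.

cd ~/downloads
tar xzf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11

Compile and install the plugins.

./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

7) Start Nagios

Add Nagios to the list of system services so that it starts automatically when the system boots.

chkconfig --add nagios
chkconfig nagios on

Verify the sample Nagios configuration files.

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no errors, start Nagios.

service nagios start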

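8) Modify SELinux Settings

Fedora ships with SELinux (Security Enhanced Linux) installed and in Enforcing mode by default. This can result in "Internal Server Error" messages when you attempt to access the Nagios CGIs.

Check whether SELinux is in Enforcing mode.

getenforce

Put SELinux into Permissive mode.

setenforce 0

To make this change permanent, modify the settings in /etc/selinux/config and reboot.

Instead of disabling SELinux or setting it to Permissive mode permanently, you can let the CGIs run under SELinux enforcing/targeted mode with the following commands:

chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
chcon -R -t httpd_sys_content_t /usr/local/nagios/share/

For more information on running the Nagios CGIs under Enforcing mode with a targeted policy, see the NagiosCommunity.org wiki.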

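9) Login to the Web Interface

You should now be able to access the Nagios web interface. You will be prompted for the username (nagiosadmin) and the password you specified earlier. Use the system's default browser to open the Nagios URL.

Click on the "Service Detail" navigation link to see details of what is being monitored on your local machine. It will take a few minutes for Nagios to check all the services associated with your machine, because the checks take some time.

10) Other Modifications

Make sure your machine's firewall rules allow access to the web server if you want to reach the Nagios interface remotely.

Configuring email notifications is outside the scope of this documentation. Refer to your system documentation, search the web, or look for further information online on how to configure your system to send email to external addresses. More information on notifications can be found in the documentation.

11) You're Done

Congratulations, you have successfully installed Nagios. Your work in network monitoring is just beginning. You will no doubt want to monitor more than just your local machine, so check out the following documentation...

  1. Monitoring network services (HTTP, FTP, SSH, etc.)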
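4.5. openSUSE Quickstart
4.5.1. Introduction

This guide is intended to give you simple instructions for installing Nagios from source on openSUSE and monitoring your local machine within 20 minutes. No advanced installation options are discussed here - just the basics, which are enough to get 95% of users started with Nagios.

These instructions were written based on an openSUSE 10.2 system.

4.5.2. Required Packages

Make sure the following packages are installed on your openSUSE system before continuing. You can use yast to install packages under openSUSE.

  • apache2
  • C/C++ development libraries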
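4.5.3. Procedure

1) Create Account Information

Become the root user.

su -l

Create a new nagios user account and give it a password.

/usr/sbin/useradd nagios

passwd nagios

Create a group named nagios and add the nagios user to it.

/usr/sbin/groupadd nagios

/usr/sbin/usermod -G nagios nagios

Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.

/usr/sbin/groupadd nagcmd

/usr/sbin/usermod -G nagcmd nagios

/usr/sbin/usermod -G nagcmd wwwrun

2) Download Nagios and the Plugins

Create a directory for storing the downloads.

mkdir ~/downloads

cd ~/downloads

Download the source code tarballs of both Nagios and the Nagios plugins (visit the Nagios site for the latest versions). At the time of writing, the latest versions were Nagios 3.0rc1 and Nagios Plugins 1.4.11.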

wget 3.0rc1.tar.gz

wget

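3) Compile and Install Nagios

Extract the Nagios source code tarball.

cd ~/downloads

tar xzf nagios-3.0rc1.tar.gz

cd nagios-3.0rc1

Run the Nagios configure script, passing the name of the group you created earlier, like so:

./configure --with-command-group=nagcmd

Compile the Nagios source code.

make all

Install the binaries, init script, and sample config files, and set permissions on the external command directory.

make install

make install-init

make install-config

make install-commandmode

Don't start Nagios yet - there is still more to do...

4) Customize Configuration

Sample configuration files have been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. You need to make just one change...

Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with your favorite editor and change the email address in the nagiosadmin contact definition to the address where you want to receive alerts.

vi /usr/local/nagios/etc/objects/contacts.cfg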

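5) Configure the Web Interface

Install the Nagios web config file in the Apache conf.d directory.

make install-webconf

Create a nagiosadmin account for logging into the Nagios web interface. Remember the password you assign to this account - you will need it later.

htpasswd2 -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Restart Apache to make the new settings take effect.

service apache2 restart

6) Compile and Install the Nagios Plugins

Extract the Nagios plugins source code tarball.

cd ~/downloads

tar xzf nagios-plugins-1.4.11.tar.gz

cd nagios-plugins-1.4.11

Compile and install the plugins.

./configure --with-nagios-user=nagios --with-nagios-group=nagios

make

make install

7) Start Nagios

Add Nagios to the list of system services so that it starts automatically when the system boots.

chkconfig --add nagios

chkconfig nagios on

Verify the sample Nagios configuration files.

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no errors, start Nagios.

service nagios start

8) Login to the Web Interface

You should now be able to access the Nagios web interface. You will be prompted for the username (nagiosadmin) and the password you specified earlier. Open the Nagios URL in the system's default browser: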

konqueror

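Click on the "Service Detail" navigation link to see details of what is being monitored on your local machine. It will take a few minutes for Nagios to check all the services associated with your machine, because the checks take some time.

9) Other Modifications

Make sure your machine's firewall rules allow access to the web server if you want to reach the Nagios interface remotely.

You can do this as follows:

  1. Open the control center
  2. Select 'Open Administrator Settings' to open the YaST administrator control center
  3. Select 'Firewall' under the 'Security and Users' category
  4. Click the 'Allowed Services' option in the firewall configuration window
  5. Add 'HTTP Server' to the allowed services for the 'External Zone'
  6. Click 'Next' and 'Accept' to activate the new firewall settings

Configuring email notifications is outside the scope of this documentation. Refer to your system documentation, search the web, or look for further information online on how to configure your openSUSE system to send email to external addresses.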

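4.6. Ubuntu Quickstart
4.6.1. Introduction

This guide is intended to give you simple instructions for installing Nagios from source on Ubuntu and monitoring your local machine within 20 minutes. No advanced installation options are discussed here - just the basics, which are enough to get 95% of users started with Nagios.

These instructions were written based on an Ubuntu 6.10 (desktop) system.

What You'll End Up With

If you follow these instructions, here is what you will end up with:

  1. Nagios and the plugins will be installed under /usr/local/nagios
  2. Nagios will be configured to monitor a few aspects of your local system (CPU load, disk usage, etc.)
  3. The Nagios web interface will be served by the local Apache instance
4.6.2. Required Packages

Make sure the following packages are installed on your system before continuing.

  1. Apache2
  2. GCC compiler and development libraries
  3. GD library and development libraries

You can use apt-get to install these packages by running the following commands:

sudo apt-get install apache2
sudo apt-get install build-essential
sudo apt-get install libgd2-dev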

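4.6.3. Procedure

1) Create Account Information

Become the root user.

sudo -s

Create a new nagios user account and give it a password.

/usr/sbin/useradd nagios
passwd nagios

On Ubuntu server editions (6.01 or later), create a group named nagios (it is not created by default). Skip this step on Ubuntu desktop editions.

/usr/sbin/groupadd nagios
/usr/sbin/usermod -G nagios nagios

Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.

/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd www-data

2) Download Nagios and the Plugins

Create a directory for storing the downloads.

mkdir ~/downloads
cd ~/downloads

Download the source code tarballs of both Nagios and the Nagios plugins (visit the Nagios site for the latest versions). At the time of writing, the latest versions were Nagios 3.0rc1 and Nagios Plugins 1.4.11.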

wget 3.0rc1.tar.gz wget

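3) Compile and Install Nagios

Extract the Nagios source code tarball.

cd ~/downloads
tar xzf nagios-3.0rc1.tar.gz
cd nagios-3.0rc1

Run the Nagios configure script, passing the name of the group you created earlier, like so:

./configure --with-command-group=nagcmd

Compile the Nagios source code.

make all

Install the binaries, init script, and sample config files, and set permissions on the external command directory.

make install
make install-init
make install-config
make install-commandmode

Don't start Nagios yet - there is still more to do...

4) Customize Configuration

Sample configuration files have been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. You need to make just one change...

Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with your favorite editor and change the email address in the nagiosadmin contact definition to the address where you want to receive alerts.

vi /usr/local/nagios/etc/objects/contacts.cfg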

5)配置WEB接口

安装Nagios的WEB配置文件到Apache的conf.d目录下

make install-webconf

创建一个nagiosadmin的用户用于Nagios的WEB接口登录。记下你所设置的登录口令,一会儿你会用到它。

htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

重启Apache服务以使设置生效。

/etc/init.d/apache2 reload

6) Compile and Install the Nagios Plugins

Extract the Nagios plugins source code tarball.

cd ~/downloads
tar xzf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11

Compile and install the plugins.

./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

7) Start Nagios

Configure Nagios to start automatically when the system boots.

ln -s /etc/init.d/nagios /etc/rcS.d/S99nagios

Verify the sample Nagios configuration files.

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no errors, start Nagios.

/etc/init.d/nagios start

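8) Login to the Web Interface

You should now be able to access the Nagios web interface with the system's default browser. You'll be prompted for the username (nagiosadmin) and the password you specified earlier.

Click on the "Service Detail" navbar link to see details of what's being monitored on your local machine. It will take a few minutes for Nagios to check all the services associated with your machine, as the checks take some time to run.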

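9) Other Modifications

If you want to receive email notifications for Nagios alerts, you need to install the mailx (Postfix) package.

sudo apt-get install mailx

You'll have to edit the Nagios email notification commands found in /usr/local/nagios/etc/objects/commands.cfg and change any '/bin/mail' references to '/usr/bin/mail'. Once you do that you'll need to restart Nagios to make the configuration changes live.

sudo /etc/init.d/nagios restart

Configuring email notifications is outside the scope of this documentation. Refer to your system documentation, search the web, or look to the Ubuntu community resources for specific instructions on configuring your Ubuntu system to send email messages to external addresses.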
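The '/bin/mail' substitution described in this step can also be scripted. The following is a minimal sketch rather than part of the original guide: the function name fix_mail_path is hypothetical, and the pattern is written so that references which already read '/usr/bin/mail' are left untouched.

```shell
# Rewrite '/bin/mail' to '/usr/bin/mail' in a Nagios command file.
# Hypothetical helper: pass the path of the commands.cfg file to edit.
fix_mail_path() {
    cfg="$1"
    tmp="$cfg.tmp.$$"
    # Match /bin/mail only at the start of a line or after a character that
    # cannot belong to a longer path, so '/usr/bin/mail' is not rewritten.
    sed -E 's#(^|[^[:alnum:]_/.-])/bin/mail#\1/usr/bin/mail#g' "$cfg" > "$tmp" &&
        mv "$tmp" "$cfg"
}
```

Run it as root against the commands.cfg file of your installation, then restart Nagios as shown above. Writing to a temporary file first avoids relying on in-place editing options that differ between sed implementations.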

4.7. Monitoring Windows Machines
4.7.1. Introduction

This document describes how you can monitor "private" services and attributes of Windows machines, such as:

  1. Memory usage
  2. CPU load
  3. Disk usage
  4. Service states
  5. Running processes
  6. etc.

Publicly available services that are provided by Windows machines (HTTP, FTP, POP3, etc.) can be monitored easily by following the documentation on monitoring publicly available services.

Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed if you follow the quickstart.

4.7.2. Overview

Monitoring private services or attributes of a Windows machine requires that you install an agent on it. This agent acts as a proxy between the Nagios plugin that does the monitoring and the actual service or attribute of the Windows machine. Without installing an agent on the Windows box, Nagios would be unable to monitor private services or attributes of the Windows box.

For this example, we will be installing the NSClient++ addon on the Windows machine and using the check_nt plugin to communicate with it. The check_nt plugin should already be installed on the Nagios server if you followed the quickstart guide.

Other Windows agents could be used instead of NSClient++ if you wish - provided you change command and service definitions, etc. a bit. For the sake of simplicity I will only cover using the NSClient++ addon in these instructions.

4.7.3. Steps

There are several steps you'll need to follow in order to monitor a new Windows machine. They are:

  1. Perform first-time prerequisites
  2. Install a monitoring agent on the Windows machine
  3. Create new host and service definitions for monitoring the Windows machine
  4. Restart the Nagios daemon
4.7.4. What's Already Done For You

To make your life a bit easier, a few configuration tasks have already been done for you:

  1. A check_nt command definition has been added to the commands.cfg file. This allows you to use the check_nt plugin to monitor Windows services.
  2. A Windows server host template (called windows-server) has already been created in the templates.cfg file. This allows you to add new Windows host definitions in a simple manner.

The above-mentioned config files can be found in the /usr/local/nagios/etc/objects/ directory. You can modify the definitions in these and other config files to suit your needs better if you'd like. However, I'd recommend waiting until you're more familiar with configuring Nagios before doing so. For the time being, just follow the directions outlined below and you'll be monitoring your Windows boxes in no time.

4.7.5. Prerequisites

The first time you configure Nagios to monitor a Windows machine, you'll need to do a bit of extra work. Remember, you only need to do this for the *first* Windows machine you monitor.

Edit the main Nagios config file.

vi /usr/local/nagios/etc/nagios.cfg

Remove the leading pound (#) sign from the following line in the main configuration file:

#cfg_file=/usr/local/nagios/etc/objects/windows.cfg

Save the file and exit.

What did you just do? You told Nagios to look to the /usr/local/nagios/etc/objects/windows.cfg to find additional object definitions. That's where you'll be adding Windows host and service definitions. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* Windows machine you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.
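The edit described above (removing the leading '#' from one cfg_file line) can be sketched as a small shell function for anyone who prefers not to open an editor. This is an illustration under assumptions, not something the original guide provides: the name enable_cfg_file is hypothetical, and only a line that matches the commented-out directive exactly is changed.

```shell
# Uncomment one cfg_file directive in the main Nagios config file.
# $1 = main config file, $2 = object config file named in the directive.
enable_cfg_file() {
    main_cfg="$1"
    obj_cfg="$2"
    tmp="$main_cfg.tmp.$$"
    awk -v want="#cfg_file=$obj_cfg" '
        $0 == want { print substr($0, 2); next }   # drop the leading "#"
        { print }                                  # copy every other line as-is
    ' "$main_cfg" > "$tmp" && mv "$tmp" "$main_cfg"
}
```

For the Windows case in this section the call would be enable_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/windows.cfg. An exact string comparison is used instead of a regular expression so that the dots and slashes in the path need no escaping.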

4.7.6. Installing the Windows Agent

Before you can begin monitoring private services and attributes of Windows machines, you'll need to install an agent on those machines. I recommend using the NSClient++ addon. These instructions will take you through a basic installation of the NSClient++ addon, as well as the configuration of Nagios for monitoring the Windows machine.

1. Download the latest stable version of the NSClient++ addon from the NSClient++ project site

2. Unzip the NSClient++ files into a new C:\NSClient++ directory

3. Open a command prompt and change to the C:\NSClient++ directory

4. Register the NSClient++ system service with the following command:

nsclient++ /install

5. Install the NSClient++ systray with the following command ('SysTray' is case-sensitive):

nsclient++ SysTray

6. Open the services manager and make sure the NSClientpp service is allowed to interact with the desktop (see the 'Log On' tab of the services manager). If it isn't already allowed to interact with the desktop, check the box to allow it to.

7. Edit the NSC.INI file (located in the C:\NSClient++ directory) and make the following changes:

  1. Uncomment all the modules listed in the [modules] section, except for CheckWMI.dll and RemoteConfiguration.dll
  2. Optionally require a password for clients by changing the 'password' option in the [Settings] section.
  3. Uncomment the 'allowed_hosts' option in the [Settings] section. Add the IP address of the Nagios server to this line, or leave it blank to allow all hosts to connect.
  4. Make sure the 'port' option in the [NSClient] section is uncommented and set to '12489' (the default port).

8. Start the NSClient++ service with the following command:

nsclient++ /start

9. If installed properly, a new icon should appear in your system tray. It will be a yellow circle with a black 'M' inside.

10. Success! The Windows server can now be added to the Nagios monitoring configuration...

4.7.7. Configuring Nagios

Now it's time to define some object definitions in your Nagios configuration files in order to monitor the new Windows machine.

Open the windows.cfg file for editing.

vi /usr/local/nagios/etc/objects/windows.cfg

Add a new host definition for the Windows machine that you're going to monitor. If this is the *first* Windows machine you're monitoring, you can simply modify the sample host definition in windows.cfg. Change the host_name, alias, and address fields to appropriate values for the Windows box.

define host{

use windows-server ; Inherit default values from a Windows server template (make sure you keep this line!)

host_name winserver

alias My Windows Server

address 192.168.1.2

}

Good. Now you can add some service definitions (to the same configuration file) in order to tell Nagios to monitor different aspects of the Windows machine. If this is the *first* Windows machine you're monitoring, you can simply modify the sample service definitions in windows.cfg.

Note: Replace "winserver" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.

Add the following service definition to monitor the version of the NSClient++ addon that is running on the Windows server. This is useful when it comes time to upgrade your Windows servers to a newer version of the addon, as you'll be able to tell which Windows machines still need to be upgraded to the latest version of NSClient++.

define service{

use generic-service

host_name winserver

service_description NSClient++ Version

check_command check_nt!CLIENTVERSION

}

Add the following service definition to monitor the uptime of the Windows server.

define service{

use generic-service

host_name winserver

service_description Uptime

check_command check_nt!UPTIME

}

Add the following service definition to monitor the CPU utilization on the Windows server and generate a CRITICAL alert if the 5-minute CPU load is 90% or more or a WARNING alert if the 5-minute load is 80% or greater.

define service{

use generic-service

host_name winserver

service_description CPU Load

check_command check_nt!CPULOAD!-l 5,80,90

}

Add the following service definition to monitor memory usage on the Windows server and generate a CRITICAL alert if memory usage is 90% or more or a WARNING alert if memory usage is 80% or greater.

define service{

use generic-service

host_name winserver

service_description Memory Usage

check_command check_nt!MEMUSE!-w 80 -c 90

}

Add the following service definition to monitor usage of the C:\ drive on the Windows server and generate a CRITICAL alert if disk usage is 90% or more or a WARNING alert if disk usage is 80% or greater.

define service{

use generic-service

host_name winserver

service_description C:\ Drive Space

check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90

}

Add the following service definition to monitor the W3SVC service state on the Windows machine and generate a CRITICAL alert if the service is stopped.

define service{

use generic-service

host_name winserver

service_description W3SVC

check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC

}

Add the following service definition to monitor the Explorer.exe process on the Windows machine and generate a CRITICAL alert if the process is not running.

define service{

use generic-service

host_name winserver

service_description Explorer

check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe

}

That's it for now. You've added some basic services that should be monitored on the Windows box. Save the configuration file.
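Since the service definitions above differ only in their description and check command, they can be generated instead of typed by hand. The sketch below is an illustration under assumptions: the function name emit_service is hypothetical, and it only prints text in the same shape as the definitions shown in this section; it does not validate anything.

```shell
# Print one Nagios service definition that inherits from generic-service.
# $1 = host_name, $2 = service_description, $3 = check_command
emit_service() {
    printf 'define service{\n'
    printf '\tuse\t\t\tgeneric-service\n'
    printf '\thost_name\t\t%s\n' "$1"
    printf '\tservice_description\t%s\n' "$2"
    printf '\tcheck_command\t\t%s\n' "$3"
    printf '\t}\n'
}

# Example: two of the checks described in this section.
emit_service winserver 'Uptime' 'check_nt!UPTIME'
emit_service winserver 'CPU Load' 'check_nt!CPULOAD!-l 5,80,90'
```

Redirecting the output into a scratch file and reviewing it before merging it into windows.cfg keeps the hand-edited sample definitions intact.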

4.7.8. Password Protection

If you specified a password in the NSClient++ configuration file on the Windows machine, you'll need to modify the check_nt command definition to include the password. Open the commands.cfg file for editing.

vi /usr/local/nagios/etc/objects/commands.cfg

Change the definition of the check_nt command to include the "-s PASSWORD" argument (where PASSWORD is the password you specified on the Windows machine) like this:

define command{

command_name check_nt

command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$

}

Save the file.

4.7.9. Restarting Nagios

You're done with modifying the Nagios configuration, so you'll need to verify your configuration files and restart Nagios.

If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!
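The "verify first, restart only if clean" rule can be wrapped in a small function. This is a sketch under assumptions: the function name safe_restart and the three variable names are hypothetical, while the default paths are the ones used in the quickstart earlier in this document.

```shell
# Defaults follow the quickstart install; override them for other layouts.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

# Restart Nagios only when the configuration check exits successfully.
safe_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" > /dev/null; then
        "$NAGIOS_INIT" restart
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}
```

The check's output is discarded here to keep the sketch short; when it fails, rerun the verification command by hand to read the error messages.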

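4.8. Monitoring Linux/Unix Machines
4.8.1. Introduction

This document describes how you can monitor "private" services and attributes of Linux/UNIX servers, such as:

  1. CPU load
  2. Memory usage
  3. Disk usage
  4. Logged in users
  5. Running processes

Publicly available services that are provided by Linux servers (HTTP, FTP, SSH, SMTP, etc.) can be monitored by following the documentation on monitoring publicly available services.

Note
These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed if you follow the quickstart.
4.8.2. Overview

[Note: This document has not been completed. I recommend reading the documentation on the NRPE addon for instructions on how to monitor a remote Linux/Unix server.]

There are several different ways to monitor services and attributes of remote Linux/UNIX servers. One is by using shared SSH keys and the check_by_ssh plugin to execute checks on remote hosts. This method is not covered here. It can result in high load on the Nagios monitoring server, especially if you are monitoring thousands of services across hundreds of hosts, because of the overhead involved in setting up and tearing down SSH connections.

Another common method is to use the NRPE addon. NRPE allows you to execute plugins on remote Linux/Unix hosts. This is useful if you need to monitor local resources such as disk usage, CPU load, and memory usage on a remote host in the same way you would on the local machine.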

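4.9. Monitoring Routers and Switches
4.9.1. Introduction

This document describes how you can monitor the status of network switches and routers. Some cheaper "unmanaged" switches and hubs can't be assigned IP addresses and are essentially invisible on the network, so there is no way to monitor them. More expensive switches and routers can be assigned IP addresses and can be monitored by pinging them or by using SNMP to query status information.

The following describes how you can monitor these things on managed switches, hubs, and routers:

  1. Packet loss, round trip average (RTA)
  2. SNMP status information
  3. Bandwidth / traffic rate
Note
These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed when you follow the quickstart.
4.9.2. Overview

Monitoring switches and routers can be either easy or more involved - depending on what equipment you have and what you want to monitor. As they are critical infrastructure components, you'll no doubt want to monitor them in at least some basic manner.

Switches and routers can be monitored easily by "pinging" them to determine packet loss, RTA, etc. If your switch supports SNMP, you can monitor port status, etc. with the check_snmp plugin and bandwidth (if you're using MRTG) with the check_mrtgtraf plugin.

The check_snmp plugin will only get compiled and installed if you have the net-snmp and net-snmp-utils packages installed on your system. Make sure the plugin exists in /usr/local/nagios/libexec before you continue. If it doesn't, install net-snmp and net-snmp-utils and recompile/reinstall the Nagios plugins.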

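4.9.3. Steps

There are several steps you'll need to follow in order to monitor a new router or switch. They are:

  1. Perform first-time prerequisites
  2. Create new host and service definitions for monitoring the device
  3. Restart the Nagios daemon
4.9.4. What's Already Done For You

To make your life a bit easier, a few configuration tasks have already been done for you:

  1. Two command definitions (check_snmp and check_local_mrtgtraf) have been added to the commands.cfg file. These allow you to use the check_snmp and check_mrtgtraf plugins to monitor network switches and routers.
  2. A switch host template (called generic-switch) has already been created in the templates.cfg file. This allows you to add new router/switch host definitions in a simple manner.

The above-mentioned config files can be found in the /usr/local/nagios/etc/objects/ directory. You can modify the definitions in these and other config files to suit your needs better if you'd like. However, I'd recommend waiting until you're more familiar with configuring Nagios before doing so. For the time being, just follow the directions outlined below and you'll be monitoring your network routers and switches in no time.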

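4.9.5. Prerequisites

The first time you configure Nagios to monitor a network switch, you'll need to do a bit of extra work. Remember, you only need to do this for the *first* switch you monitor.

Edit the main Nagios config file.

vi /usr/local/nagios/etc/nagios.cfg

Remove the leading pound (#) sign from the following line in the main configuration file:

#cfg_file=/usr/local/nagios/etc/objects/switch.cfg

Save the file and exit.

What did you just do? You told Nagios to look in /usr/local/nagios/etc/objects/switch.cfg to find additional object definitions. That's where you'll be adding host and service definitions for routers and switches. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* router/switch you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.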

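4.9.6. Configuring Nagios

You'll need to create some object definitions in order to monitor a new router or switch.

Open the switch.cfg file for editing.

vi /usr/local/nagios/etc/objects/switch.cfg

Add a new host definition for the switch that you're going to monitor. If this is the *first* switch you're monitoring, you can simply modify the sample host definition in switch.cfg. Change the host_name, alias, and address fields to appropriate values for the switch.

define host{
    use         generic-switch          ; Inherit default values from a template
    host_name   linksys-srw224p         ; The name we're giving to this switch
    alias       Linksys SRW224P Switch  ; A longer name associated with the switch
    address     192.168.1.253           ; IP address of the switch
    hostgroups  allhosts,switches       ; Host groups this switch is associated with
    }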

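4.9.7. Monitoring Services

Now you can add some service definitions (to the same configuration file) to monitor different aspects of the switch. If this is the *first* switch you're monitoring, you can simply modify the sample service definitions in switch.cfg.

Note
Replace "linksys-srw224p" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.
4.9.8. Monitoring Packet Loss and RTA

Add the following service definition in order to monitor packet loss and round trip average (RTA) between the Nagios host and the switch every 5 minutes under normal conditions.

define service{
    use                    generic-service                  ; Inherit values from a template
    host_name              linksys-srw224p                  ; The name of the host the service is associated with
    service_description    PING                             ; The service description
    check_command          check_ping!200.0,20%!600.0,60%   ; The command used to monitor the service
    normal_check_interval  5                                ; Check the service every 5 minutes under normal conditions
    retry_check_interval   1                                ; Re-check the service every minute until its final/hard state is determined
    }

This service will be:

  1. CRITICAL if the RTA is greater than 600 ms or the packet loss is 60% or more
  2. WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more
  3. OK if the RTA is less than 200 ms and the packet loss is less than 20%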
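The three states listed above can be sketched as a small decision function. This is an illustration of the thresholds as this section states them, not the check_ping plugin's own code, and the function name classify_ping is hypothetical; the plugin's handling of values that fall exactly on a threshold may differ.

```shell
# Classify a ping result using the thresholds from the service definition
# above (warning 200.0,20% and critical 600.0,60%).
# $1 = round trip average in milliseconds, $2 = packet loss in percent
classify_ping() {
    awk -v rta="$1" -v loss="$2" 'BEGIN {
        if (rta > 600 || loss >= 60)      print "CRITICAL"
        else if (rta > 200 || loss >= 20) print "WARNING"
        else                              print "OK"
    }'
}
```

The critical test comes first because a result that crosses the critical threshold also crosses the warning one; awk is used so that fractional RTA values such as 200.5 compare correctly.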
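4.9.9. Monitoring SNMP Status Information

If your switch or router supports SNMP, you can monitor a lot of information by using the check_snmp plugin. If it doesn't, skip this section.

Add the following service definition, which monitors the uptime of the switch, to the configuration file you just edited.

define service{
    use                  generic-service   ; Inherit values from a template
    host_name            linksys-srw224p
    service_description  Uptime
    check_command        check_snmp!-C public -o sysUpTime.0
    }

In the check_command directive of the service definition above, "-C public" tells the plugin that the SNMP community name to be used is "public", and "-o sysUpTime.0" indicates which OID should be checked (translator's note: the MIB node value).

If you want to ensure that a specific port/interface on the switch is in an up state, you could add a service definition like this:

define service{
    use                  generic-service   ; Inherit values from a template
    host_name            linksys-srw224p
    service_description  Port 1 Link Status
    check_command        check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
    }

In the example above, "-o ifOperStatus.1" refers to the OID for the operational status of port 1 on the switch. The "-r 1" option tells the check_snmp plugin to return an OK state if "1" is found in the SNMP result (1 indicates an "up" state on the port) and CRITICAL if it isn't found. The "-m RFC1213-MIB" is optional and tells the check_snmp plugin to load only the "RFC1213-MIB" instead of every MIB that's installed on your system, which can help speed things up.

That's it for the SNMP monitoring example. There are hundreds or thousands of things that can be monitored via SNMP, so it's up to you to decide what you need and how you want to monitor it. Good luck!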

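Tip
You can usually find the OIDs that can be monitored on a switch by running the following command (replace 192.168.1.253 with the IP address of the switch): snmpwalk -v1 -c public 192.168.1.253 -m ALL .1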
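The output of the snmpwalk command in the tip above can be long, so a filter helps when you only care about port state. The sketch below is a hypothetical helper (down_ports is not part of Nagios or net-snmp) and assumes the usual net-snmp text output, where each interface appears as a line such as "IF-MIB::ifOperStatus.2 = INTEGER: down(2)".

```shell
# Read snmpwalk output on stdin and print the index of every interface
# whose ifOperStatus is anything other than up (value 1).
down_ports() {
    awk -F' = ' '/ifOperStatus\./ {
        n = split($1, parts, ".")            # last component is the port index
        if ($2 !~ /up\(1\)$/ && $2 !~ /INTEGER: 1$/) print parts[n]
    }'
}
```

A typical use would be to pipe the snmpwalk command from the tip into down_ports and create an ifOperStatus service only for the ports you expect to be up.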
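4.9.10. Monitoring Bandwidth / Traffic Rate

If you're monitoring bandwidth usage on your switches or routers using MRTG, you can have Nagios alert you when traffic rates exceed thresholds you specify. The check_mrtgtraf plugin (which is included in the Nagios plugins distribution) allows you to do this.

You'll need to let the check_mrtgtraf plugin know which log file the MRTG data is being stored in, along with thresholds, etc. In this example, I'm monitoring one of the ports on a Linksys switch. The MRTG log file is stored in /var/lib/mrtg/192.168.1.253_1.log. Here's the service definition I use to monitor the bandwidth data that's stored in the log file...

define service{
    use                  generic-service   ; Inherit values from a template
    host_name            linksys-srw224p
    service_description  Port 1 Bandwidth Usage
    check_command        check_local_mrtgtraf!/var/lib/mrtg/192.168.1.253_1.log!AVG!1000000,2000000!5000000,5000000!10
    }

In the example above, the "/var/lib/mrtg/192.168.1.253_1.log" option passed to the check_local_mrtgtraf command tells the plugin which MRTG log file to read from. The "AVG" option tells it to use average bandwidth statistics. The "1000000,2000000" option is the pair of warning thresholds (in bytes) for incoming and outgoing traffic rates, and "5000000,5000000" is the corresponding pair of critical thresholds (in bytes). The "10" option causes the plugin to return a CRITICAL state if the MRTG log file is older than 10 minutes (it should be updated every 5 minutes).

Save the file.

4.9.11. Restarting Nagios

Once you've added the new host and service definitions to the switch.cfg file, you're ready to start monitoring the router/switch. To do this, you'll need to verify your configuration files and restart Nagios.

If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!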

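4.10. Monitoring Network Printers
4.10.1. Introduction

This document describes how you can monitor network printers. Specifically, HP printers that have internal or external JetDirect cards, or other devices (like the Troy PocketPro 100S or the Netgear PS101) that support the JetDirect protocol.

The check_hpjd plugin (which is part of the standard Nagios plugins distribution) allows you to monitor the status of JetDirect-capable printers which have SNMP enabled. The plugin is capable of detecting the following printer states:

  1. Paper jam
  2. Out of paper
  3. Printer offline
  4. Intervention required
  5. Toner low
  6. Insufficient memory
  7. Open door
  8. Output tray is full
  9. and more...
Note
These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed when you follow the quickstart.
4.10.2. Overview

Monitoring the status of a networked printer is pretty simple. JetDirect-enabled printers usually have SNMP enabled, which allows Nagios to monitor their status using the check_hpjd plugin.

The check_hpjd plugin will only get compiled and installed if you have the net-snmp and net-snmp-utils packages installed on your system. Make sure the check_hpjd file exists in /usr/local/nagios/libexec before you continue. If it doesn't, install net-snmp and net-snmp-utils and recompile/reinstall the Nagios plugins.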

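4.10.3. Steps

There are several steps you'll need to follow in order to monitor a new network printer. They are:

  1. Perform first-time prerequisites
  2. Create new host and service definitions for monitoring the printer
  3. Restart the Nagios daemon
4.10.4. What's Already Done For You

To make your life a bit easier, a few configuration tasks have already been done for you:

  1. A check_hpjd command definition has been added to the commands.cfg file. This allows you to use the check_hpjd plugin to monitor network printers.
  2. A printer host template (called generic-printer) has already been created in the templates.cfg file. This allows you to add new printer host definitions in a simple manner.

The above-mentioned config files can be found in the /usr/local/nagios/etc/objects/ directory. You can modify the definitions in them to suit your needs better if you'd like. However, I'd recommend waiting until you're more familiar with configuring Nagios before doing so. For the time being, just follow the directions outlined below and you'll be monitoring your network printers in no time.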

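4.10.5. Prerequisites

The first time you configure Nagios to monitor a network printer, you'll need to do a bit of extra work. Remember, you only need to do this for the *first* printer you monitor.

Edit the main Nagios config file.

vi /usr/local/nagios/etc/nagios.cfg

Remove the leading pound (#) sign from the following line in the main configuration file:

#cfg_file=/usr/local/nagios/etc/objects/printer.cfg

Save the file and exit.

What did you just do? You told Nagios to look in /usr/local/nagios/etc/objects/printer.cfg to find additional object definitions. That's where you'll be adding host and service definitions for the printer. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* printer you monitor, you can simply modify the sample definitions in that file, rather than creating new ones.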

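4.10.6. Configuring Nagios

You'll need to create some object definitions in order to monitor a new printer.

Open the printer.cfg file for editing.

vi /usr/local/nagios/etc/objects/printer.cfg

Add a new host definition for the networked printer that you're going to monitor. If this is the *first* printer you're monitoring, you can simply modify the sample host definition in printer.cfg. Change the host_name, alias, and address fields to appropriate values for the printer.

define host{
    use         generic-printer     ; Inherit default values from a template
    host_name   hplj2605dn          ; The name we're giving to this printer
    alias       HP LaserJet 2605dn  ; A longer name associated with the printer
    address     192.168.1.30        ; IP address of the printer
    hostgroups  allhosts            ; Host groups this printer is associated with
    }

Now you can add some service definitions (to the same configuration file) to monitor different aspects of the printer. If this is the *first* printer you're monitoring, you can simply modify the sample service definitions in printer.cfg.

Note
Replace "hplj2605dn" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.

Add the following service definition to check the status of the printer. The service uses the check_hpjd plugin to check the status of the printer every 10 minutes by default. The SNMP community string used to query the printer is "public" in this example.

define service{
    use                    generic-service      ; Inherit values from a template
    host_name              hplj2605dn           ; The name of the host the service is associated with
    service_description    Printer Status       ; The service description
    check_command          check_hpjd!-C public ; The command used to monitor the service
    normal_check_interval  10                   ; Check the service every 10 minutes under normal conditions
    retry_check_interval   1                    ; Re-check the service every minute until its final/hard state is determined
    }

Add the following service definition to ping the printer every 10 minutes by default. This is useful for monitoring RTA, packet loss, and general network connectivity.

define service{
    use                    generic-service
    host_name              hplj2605dn
    service_description    PING
    check_command          check_ping!3000.0,80%!5000.0,100%
    normal_check_interval  10
    retry_check_interval   1
    }

Save the file.

4.10.7. Restarting Nagios

Once you've added the new host and service definitions to the printer.cfg file, you're ready to start monitoring the printer. To do this, you'll need to verify your configuration files and restart Nagios.

If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!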

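4.11. Monitoring Netware Servers
4.11.1. Introduction

This document describes how you can monitor "private" services and attributes of Netware servers, such as:

  1. Memory usage
  2. Processor utilization
  3. Buffer usage
  4. Active connections
  5. Disk volume usage

Publicly available services that are provided by Netware servers (HTTP, FTP, etc.) can be monitored by following the documentation on monitoring publicly available services.

4.11.2. Overview

TODO...

Note
I'm looking for a volunteer to write this HOWTO. I only have access to an old Netware 4.11 server, so I can't keep this material current. If you are able to update this document, please post it for the community.
4.11.3. Other Resources

Novell has published some documentation on monitoring Netware with Nagios on its website.

4.12. Monitoring Publicly Available Services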
4.12.1. Introduction

This document describes how you can monitor publicly available services, applications and protocols. By "public" I mean services that are accessible across the network - either the local network or the greater Internet. Examples of public services include HTTP, POP3, IMAP, FTP, and SSH. There are many more public services that you probably use on a daily basis. These services and applications, as well as their underlying protocols, can usually be monitored by Nagios without any special access requirements.

Private services, in contrast, cannot be monitored with Nagios without an intermediary agent of some kind. Examples of private services associated with hosts are things like CPU load, memory usage, disk usage, current user count, process information, etc. These private services or attributes of hosts are not usually exposed to external clients. This situation requires that an intermediary monitoring agent be installed on any host that you need to monitor such information on. More information on monitoring private services on different types of hosts can be found in the documentation on monitoring Windows machines, Linux/Unix machines, and Netware servers.

Tip: Occasionally you will find that information on private services and applications can be monitored with SNMP. The SNMP agent allows you to remotely monitor otherwise private (and inaccessible) information about the host. For more information about monitoring services using SNMP, check out the documentation on monitoring routers and switches.

Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample commands.cfg and localhost.cfg config files.

4.12.2. Plugins For Monitoring Services

When you find yourself needing to monitor a particular application, service, or protocol, chances are good that a plugin exists to monitor it. The official Nagios plugins distribution comes with plugins that can be used to monitor a variety of services and protocols. There are also a large number of contributed plugins that can be found in the contrib/ subdirectory of the plugin distribution. A number of additional plugins written by users are also available online, so check them out when you have a chance.

If you don't happen to find an appropriate plugin for monitoring what you need, you can always write your own. Plugins are easy to write, so don't let this thought scare you off. Read the documentation on developing plugins for more information.

I'll walk you through monitoring some basic services that you'll probably use sooner or later. Each of these services can be monitored using one of the plugins that gets installed as part of the Nagios plugins distribution. Let's get started...

4.12.3. Creating A Host Definition

Before you can monitor a service, you first need to define a host that is associated with the service. You can place host definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive. If you have already created a host definition, you can skip this step.

For this example, let's say you want to monitor a variety of services on a remote host. Let's call that host remotehost. The host definition can be placed in its own file or added to an already existing object configuration file. Here's what the host definition for remotehost might look like:

define host{

use generic-host ; Inherit default values from a template

host_name remotehost ; The name we're giving to this host

alias Some Remote Host ; A longer name associated with the host

address 192.168.1.50 ; IP address of the host

hostgroups allhosts ; Host groups this host is associated with

}

Now that a definition has been added for the host that will be monitored, we can start defining services that should be monitored. As with host definitions, service definitions can be placed in any object configuration file.

4.12.4. Creating Service Definitions

For each service you want to monitor, you need to define a service in Nagios that is associated with the host definition you just created. You can place service definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive.

Some example service definitions for monitoring common public services (HTTP, FTP, etc.) are given below.

4.12.5. Monitoring HTTP

Chances are you're going to want to monitor web servers at some point - either yours or someone else's. The check_http plugin is designed to do just that. It understands the HTTP protocol and can monitor response time, error codes, strings in the returned HTML, server certificates, and much more.

The commands.cfg file contains a command definition for using the check_http plugin. It looks like this:

define command{

name check_http

command_name check_http

command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$

}

A simple service definition for monitoring the HTTP service on the remotehost machine might look like this:

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description HTTP

check_command check_http

}

This simple service definition will monitor the HTTP service running on remotehost. It will produce alerts if the web server doesn't respond within 10 seconds or if it returns HTTP error codes (403, 404, etc.). That's all you need for basic monitoring. Pretty simple, huh?

Tip: For more advanced monitoring, run the check_http plugin manually with --help as a command-line argument to see all the options you can give the plugin. This --help syntax works with all of the plugins I'll cover in this document.
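When you run a plugin by hand as the tip suggests, it helps to see the state Nagios would derive from it. Plugins report their result through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The wrapper below is a sketch under assumptions: the names state_name and run_check are hypothetical, and mapping every other exit code to UNKNOWN is a simplification.

```shell
# Translate a plugin exit code into the matching Nagios state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run a plugin command line, print "STATE: first line of output",
# and return the plugin's own exit code.
run_check() {
    output=$("$@" 2>&1) && code=0 || code=$?
    printf '%s: %s\n' "$(state_name "$code")" "$(printf '%s\n' "$output" | head -n 1)"
    return "$code"
}
```

With the plugins installed under /usr/local/nagios/libexec, a manual HTTP check of the example host could look like run_check /usr/local/nagios/libexec/check_http -I 192.168.1.50.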

A more advanced definition for monitoring the HTTP service is shown below. This service definition will check to see if the /download/index.php URI contains the string "latest-version.tar.gz". It will produce an error if the string isn't found, the URI isn't valid, or the web server takes longer than 5 seconds to respond.

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description Product Download Link

check_command check_http!-u /download/index.php -t 5 -s "latest-version.tar.gz"

}

4.12.6. Monitoring FTP

When you need to monitor FTP servers, you can use the check_ftp plugin. The commands.cfg file contains a command definition for using the check_ftp plugin, which looks like this:

define command{

command_name check_ftp

command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$

}

A simple service definition for monitoring the FTP server on remotehost would look like this:

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description FTP

check_command check_ftp

}

This service definition will monitor the FTP service and generate alerts if the FTP server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the FTP server running on port 1023 on remotehost. It will generate an alert if the server doesn't respond within 5 seconds or if the server response doesn't contain the string "Pure-FTPd [TLS]".

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description Special FTP

check_command check_ftp!-p 1023 -t 5 -e "Pure-FTPd [TLS]"

}

4.12.7. Monitoring SSH

When you need to monitor SSH servers, you can use the check_ssh plugin. The commands.cfg file contains a command definition for using the check_ssh plugin, which looks like this:

define command{

command_name check_ssh

command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$

}

A simple service definition for monitoring the SSH server on remotehost would look like this:

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description SSH

check_command check_ssh

}

This service definition will monitor the SSH service and generate alerts if the SSH server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the SSH server and generate an alert if the server doesn't respond within 5 seconds or if the server version string doesn't match "OpenSSH_4.2".

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description SSH Version Check

check_command check_ssh!-t 5 -r "OpenSSH_4.2"

}

4.12.8. Monitoring SMTP

The check_smtp plugin can be used for monitoring your email servers. The commands.cfg file contains a command definition for using the check_smtp plugin, which looks like this:

define command{

command_name check_smtp

command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$

}

A simple service definition for monitoring the SMTP server on remotehost would look like this:

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description SMTP

check_command check_smtp

}

This service definition will monitor the SMTP service and generate alerts if the SMTP server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the SMTP server and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description SMTP Response Check

check_command check_smtp!-t 5 -e "mygreatmailserver.com"

}

4.12.9. Monitoring POP3

The check_pop plugin can be used for monitoring the POP3 service on your email servers. The commands.cfg file contains a command definition for using the check_pop plugin, which looks like this:

define command{

command_name check_pop

command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$

}

A simple service definition for monitoring the POP3 service on remotehost would look like this:

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description POP3

check_command check_pop

}

This service definition will monitor the POP3 service and generate alerts if the POP3 server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the POP3 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description POP3 Response Check

check_command check_pop!-t 5 -e "mygreatmailserver.com"

}

4.12.10. Monitoring IMAP

The check_imap plugin can be used for monitoring the IMAP4 service on your email servers. The commands.cfg file contains a command definition for using the check_imap plugin, which looks like this:

define command{

command_name check_imap

command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$

}

A simple service definition for monitoring the IMAP4 service on remotehost would look like this:

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description IMAP

check_command check_imap

}

This service definition will monitor the IMAP4 service and generate alerts if the IMAP server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the IMAP4 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".

define service{

use generic-service ; Inherit default values from a template

host_name remotehost

service_description IMAP4 Response Check

check_command check_imap!-t 5 -e "mygreatmailserver.com"

}

4.12.11. Restarting Nagios

Once you've added the new host and service definitions to your object configuration file(s), you're ready to start monitoring them. To do this, you'll need to verify your configuration files and restart Nagios.

If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!

第 5 章 准备配置Nagios
5.1. 配置概览
5.1.1. 介绍

在你开始监控网络与系统之前要有同个不同配置文件需要创建和编辑。耐心点,配置Nagios可能是要花些时间特别是对于那些初次使用者。弄清其机理所有的将它们搞定绝对是值得的。 :-)

注意

样本配置文件在安装时放在了/usr/local/nagios/etc/目录下,如果你是按照前面给出的来做的话。

5.1.2. 主配置文件

主配置文件包括了一系列的设置,它们会影响Nagios守护进程。不仅是Nagios守护进程要使用主配置文件,CGIs程序组模块也需要,因此,主配置文件是你开始学习配置其他文件的基础。

有关主配置文件的文档在。

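5.1.3. Resource File(s)

Resource files can be used to store user-defined macros. Their main purpose is to hold sensitive configuration information (such as passwords) without making it available to the CGIs.

You can specify one or more resource files by using the resource_file directive in your main configuration file.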

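5.1.4. Object Definition Files

Object definition files are used to define hosts, services, host groups, service groups, contacts, contact groups, commands, and so on. This is where you define what you want to monitor and how you want to monitor it.

You can specify one or more object definition files by using the cfg_file and/or cfg_dir directives in your main configuration file.

Object definitions, and how they relate to each other, are described in the object definition documentation.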

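5.1.5. CGI Configuration File

The CGI configuration file contains a number of directives that affect the operation of the CGIs. It also contains a reference to the main configuration file, so the CGIs know how you have configured Nagios and where your object definitions are stored.

The CGI configuration file is described in the CGI configuration file documentation.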

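5.2. Main Configuration File Options
Note

When creating or editing configuration files, keep the following in mind:

  • Lines that start with a '#' character are treated as comments and are not processed;
  • Variable names must begin at the start of the line - no white space is allowed before the name;
  • Variable names are case-sensitive.
Tip

A sample main configuration file (/usr/local/nagios/etc/nagios.cfg) is installed for you if you followed the installation guide.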
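The three rules in the note above can be illustrated with a small Python reader. It is a minimal sketch for experimenting with the file format, not the parser Nagios itself uses; the function name parse_main_config and the choice to raise ValueError are assumptions of this example.

```python
def parse_main_config(text):
    """Return (name, value) pairs from nagios.cfg-style text."""
    directives = []
    for number, raw in enumerate(text.splitlines(), start=1):
        # Blank lines and lines starting with '#' are skipped.
        if not raw.strip() or raw.startswith("#"):
            continue
        # A variable name must start in the first column.
        if raw[0].isspace():
            raise ValueError("line {}: leading white space".format(number))
        name, separator, value = raw.partition("=")
        if not separator:
            raise ValueError("line {}: missing '='".format(number))
        # Names are kept exactly as written because they are case-sensitive.
        directives.append((name, value.strip()))
    return directives

sample = """# main log file
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
"""
parsed = parse_main_config(sample)
```

A list of pairs is used rather than a dictionary because directives such as cfg_file may legitimately appear more than once.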

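5.2.1. Configuration File Location

The main configuration file is usually (in practice, always) named nagios.cfg and located in the /usr/local/nagios/etc/ directory (if Nagios was installed from an RPM package, it should be in /etc/nagios/).

5.2.2. Configuration File Variables

Each main configuration file option is described below...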

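Table 5.1. Log File

Format: log_file=
Example: log_file=/usr/local/nagios/var/nagios.log

This variable specifies where Nagios should create its main log file. It should be the first variable that you define in your main configuration file, because Nagios writes any errors it finds in the rest of your configuration to this file. If you have log rotation enabled, this file is rotated every hour, day, week, or month.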

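Table 5.2. Object Configuration File

Format: cfg_file=
Example:

cfg_file=/usr/local/nagios/etc/hosts.cfg

cfg_file=/usr/local/nagios/etc/services.cfg

cfg_file=/usr/local/nagios/etc/commands.cfg


This directive specifies an object configuration file containing object definitions that Nagios should use for monitoring. Object configuration files contain definitions for hosts, host groups, contacts, contact groups, services, commands, and so on. You can split your configuration into several files and use multiple cfg_file= statements to have each of them processed.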

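Table 5.3. Object Configuration Directory

Format: cfg_dir=
Example:

cfg_dir=/usr/local/nagios/etc/commands

cfg_dir=/usr/local/nagios/etc/services

cfg_dir=/usr/local/nagios/etc/hosts


This directive specifies a directory containing object configuration files that Nagios should use for monitoring. All files in the directory with a .cfg extension are processed as object configuration files. In addition, Nagios recurses into the subdirectories of the directory you specify and processes all configuration files found there. You can keep your configuration files in different directories and use multiple cfg_dir= statements to have each directory processed.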
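To preview which files a cfg_dir= statement would pick up, the recursive ".cfg only" rule described above can be approximated in Python. This is a sketch of the selection rule, not Nagios' own directory walker, and the function name find_object_config_files is made up for the example.

```python
import os

def find_object_config_files(cfg_dir):
    """List every *.cfg file under cfg_dir, including subdirectories."""
    found = []
    for root, _dirs, files in os.walk(cfg_dir):
        for name in files:
            # Only files with a .cfg extension count as configuration files.
            if name.endswith(".cfg"):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Calling find_object_config_files("/usr/local/nagios/etc/hosts") would return the .cfg files in that directory and below, ignoring files such as backups with other extensions.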

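Table 5.4. Object Cache File

Format: object_cache_file=
Example: object_cache_file=/usr/local/nagios/var/objects.cache

This directive specifies a file in which a cached copy of the object definitions is stored. The cache file is (re)created every time Nagios is started or restarted, and it is used by the CGIs. It is intended to speed up configuration caching in the CGIs and to let you edit the source object configuration files while Nagios is running without affecting the output displayed by the CGIs.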

Table 5.5. Precached Object File

Format: precached_object_file=
Example: precached_object_file=/usr/local/nagios/var/objects.precache

This directive is used to specify a file in which a pre-processed, pre-cached copy of the object definitions should be stored. This file can be used to drastically reduce startup times in large or complex Nagios installations. See the documentation on speeding up startup for more information.

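Table 5.6. Resource File

Format: resource_file=
Example: resource_file=/usr/local/nagios/etc/resource.cfg

This directive specifies an optional resource file that can contain $USERn$ macro definitions. $USERn$ macros are useful for storing usernames, passwords, and items commonly used in command definitions (such as directory paths). The CGIs do not attempt to read resource files, so you can set restrictive permissions (600 or 660) on them to protect sensitive information. You can include several resource files by adding multiple resource_file statements to the main configuration file - Nagios will process them all. For an example of how to define $USERn$ macros, see the sample resource.cfg file in the sample-config/ subdirectory of the Nagios distribution.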

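Table 5.7. Temp File

Format: temp_file=
Example: temp_file=/usr/local/nagios/var/nagios.tmp

This is a temporary file that Nagios periodically creates when updating comment data, status data, and so on. The file is deleted when it is no longer needed.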

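Table 5.8. Temp Path

Format: temp_path=
Example: temp_path=/tmp

This is a directory that Nagios can use as scratch space for creating temporary files during the monitoring process. You should run tmpwatch, or a similar utility, on this directory to delete files older than 24 hours.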

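Table 5.9. Status File

Format: status_file=
Example: status_file=/usr/local/nagios/var/status.dat

This is the file that Nagios uses to store the current status, comment, and downtime information. The CGIs also use this file to report the current monitoring status through the web interface, so they must have read access to it in order to work properly. The file is deleted every time Nagios stops and recreated when it starts.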

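Table 5.10. Status File Update Interval

Format: status_update_interval=
Example: status_update_interval=15

This setting determines how often (in seconds) Nagios updates the status data in the status file. The minimum update interval is 1 second.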

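Table 5.11. Nagios User

Format: nagios_user=
Example: nagios_user=nagios

This variable sets the effective user that the Nagios process runs as. After initial program startup and before it starts monitoring anything, Nagios drops its privileges and runs as this user. You may specify either a username or a UID.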

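Table 5.12. Nagios Group

Format: nagios_group=
Example: nagios_group=nagios

This variable sets the effective group that the Nagios process runs as. After initial program startup and before it starts monitoring anything, Nagios drops its privileges and runs as this group. You may specify either a group name or a GID.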

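Table 5.13. Notifications Option

Format: enable_notifications=<0/1>
Example: enable_notifications=1

This option determines whether or not Nagios sends out notifications when it initially (re)starts. If this option is disabled, Nagios will not send out notifications for any host or service. Note: If state retention is enabled, Nagios ignores this setting when it (re)starts and uses the last known value (as stored in the state retention file), unless you disable the use_retained_program_state option. To change this option while state retention is active (and use_retained_program_state is enabled), you must use the appropriate external command or change it via the web interface. Values are as follows:

  1. 0 = Disable notifications
  2. 1 = Enable notifications (default)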

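Table 5.14. Service Check Execution Option

Format: execute_service_checks=<0/1>
Example: execute_service_checks=1

This option determines whether or not Nagios executes service checks when it initially (re)starts. If this option is disabled, Nagios will not actively execute any service checks and will remain in a sort of "sleep" mode (it can still accept passive checks unless you have disabled them). This option is most often used when configuring backup monitoring servers, as described in the documentation on redundancy, or when setting up a distributed monitoring environment. Note: If state retention is enabled, Nagios ignores this setting when it (re)starts and uses the last known value (as stored in the state retention file), unless you disable the use_retained_program_state option. To change this option while state retention is active (and use_retained_program_state is enabled), you must use the appropriate external command or change it via the web interface. Values are as follows:

  1. 0 = Don't execute service checks
  2. 1 = Execute service checks (default)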

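Table 5.15. Passive Service Check Acceptance Option

Format: accept_passive_service_checks=<0/1>
Example: accept_passive_service_checks=1

This option determines whether or not Nagios accepts passive service checks when it initially (re)starts. If this option is disabled, Nagios will not accept any passive service check results. Note: If state retention is enabled, Nagios ignores this setting when it (re)starts and uses the last known value (as stored in the state retention file), unless you disable the use_retained_program_state option. To change this option while state retention is active (and use_retained_program_state is enabled), you must use the appropriate external command or change it via the web interface. Values are as follows:

  1. 0 = Don't accept passive service checks
  2. 1 = Accept passive service checks (default)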

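Table 5.16. Host Check Execution Option

Format: execute_host_checks=<0/1>
Example: execute_host_checks=1

This option determines whether or not Nagios executes on-demand and regularly scheduled host checks when it initially (re)starts. If this option is disabled, Nagios will not actively execute any host checks, although it can still accept passive host checks unless you have disabled them. This option is most often used when configuring backup monitoring servers, as described in the documentation on redundancy, or when setting up a distributed monitoring environment. Note: If state retention is enabled, Nagios ignores this setting when it (re)starts and uses the last known value (as stored in the state retention file), unless you disable the use_retained_program_state option. To change this option while state retention is active (and use_retained_program_state is enabled), you must use the appropriate external command or change it via the web interface. Values are as follows:

  1. 0 = Don't execute host checks
  2. 1 = Execute host checks (default)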

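Table 5.17. Passive Host Check Acceptance Option

Format: accept_passive_host_checks=<0/1>
Example: accept_passive_host_checks=1

This option determines whether or not Nagios accepts passive host checks when it initially (re)starts. If this option is disabled, Nagios will not accept any passive host check results. Note: If state retention is enabled, Nagios ignores this setting when it (re)starts and uses the last known value (as stored in the state retention file), unless you disable the use_retained_program_state option. To change this option while state retention is active (and use_retained_program_state is enabled), you must use the appropriate external command or change it via the web interface. Values are as follows:

  1. 0 = Don't accept passive host checks
  2. 1 = Accept passive host checks (default)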

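Table 5.18. Event Handler Option

Format: enable_event_handlers=<0/1>
Example: enable_event_handlers=1

This option determines whether or not Nagios runs event handlers when it initially (re)starts. If this option is disabled, Nagios will not run any host or service event handlers. Note: If state retention is enabled, Nagios ignores this setting when it (re)starts and uses the last known value (as stored in the state retention file), unless you disable the use_retained_program_state option. To change this option while state retention is active (and use_retained_program_state is enabled), you must use the appropriate external command or change it via the web interface. Values are as follows:

  1. 0 = Disable event handlers
  2. 1 = Enable event handlers (default)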

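Table 5.19. Log Rotation Method

Format: log_rotation_method=
Example: log_rotation_method=d

This is the rotation method that you would like Nagios to use for your log file. Values are as follows:

  1. n = None (don't rotate the log - this is the default)
  2. h = Hourly (rotate the log at the top of each hour)
  3. d = Daily (rotate the log at midnight each day)
  4. w = Weekly (rotate the log at midnight on Saturday)
  5. m = Monthly (rotate the log at midnight on the last day of the month)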

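Table 5.20. Log Archive Path

Format: log_archive_path=
Example: log_archive_path=/usr/local/nagios/var/archives/

This is the directory where Nagios places log files that have been rotated. This option is ignored if you do not use the log rotation functionality.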

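Table 5.21. External Command Check Option

Format: check_external_commands=<0/1>
Example: check_external_commands=1

This option determines whether or not Nagios checks the external command file for commands that should be executed. It must be enabled if you plan to issue commands via the web interface. More information can be found in the external commands documentation.

  1. 0 = Don't check external commands
  2. 1 = Check external commands (default)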

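Table 5.22. External Command Check Interval

Format: command_check_interval=[s]
Example: command_check_interval=1

If you specify a number with an "s" appended to it (e.g. 30s), this is the number of seconds to wait between external command checks. If you leave off the "s", this is the number of "time units" to wait between external command checks. Unless you have changed the interval_length value (described below) from its default of 60, a time unit is 60 seconds, so the number means minutes.

Note: Setting this value to -1 makes Nagios check for external commands as often as possible. Each time Nagios checks for external commands, it reads and processes all commands present in the external command file before continuing with its other duties. More information can be found in the external commands documentation.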
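The interaction between the "s" suffix, time units and interval_length can be written out as a small Python helper. This is a sketch of the arithmetic described above, not code taken from Nagios; the function name and the use of None to represent "as often as possible" are choices made for the example.

```python
def command_check_interval_seconds(value, interval_length=60):
    """Translate a command_check_interval setting into seconds."""
    value = value.strip()
    if value == "-1":
        # -1 means "check as often as possible": there is no fixed interval.
        return None
    if value.endswith("s"):
        # An explicit "s" suffix gives the interval directly in seconds.
        return int(value[:-1])
    # Otherwise the number counts time units of interval_length seconds each.
    return int(value) * interval_length
```

For instance, "30s" gives 30 seconds, while "1" with the default interval_length of 60 gives 60 seconds.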

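Table 5.23. External Command File

Format: command_file=
Example: command_file=/usr/local/nagios/var/rw/nagios.cmd

This is the file that Nagios checks for external commands to process; commands to be executed are written to it. The external command file is implemented as a named pipe (FIFO), which is created when Nagios starts and removed when it shuts down. If the file already exists when Nagios starts, Nagios terminates with an error message. More information can be found in the external commands documentation.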
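Writing to the command file from a script can look roughly like the Python sketch below. It assumes the usual "[timestamp] COMMAND;arg;arg" line format for external commands; the helper names are invented for this example, and the list of valid command names should be taken from the external commands documentation rather than from this sketch.

```python
import time

def format_external_command(name, *args, timestamp=None):
    """Build one external command line: '[time] NAME;arg1;arg2' plus newline."""
    if timestamp is None:
        timestamp = int(time.time())
    fields = (name,) + tuple(str(arg) for arg in args)
    return "[{}] {}\n".format(timestamp, ";".join(fields))

def submit_external_command(command_file, line):
    """Write one command line to the named pipe.

    Opening a FIFO for writing blocks until a reader (the Nagios daemon)
    has it open, so call this only while Nagios is running.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)

line = format_external_command(
    "PROCESS_SERVICE_CHECK_RESULT", "remotehost", "IMAP", 0, "IMAP OK",
    timestamp=1200000000)
```

The resulting line could then be passed to submit_external_command together with the command_file path configured above.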

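Table 5.24. External Command Buffer Slots

Format: external_command_buffer_slots=<#>
Example: external_command_buffer_slots=512

Note: This is an advanced feature. This option determines how many buffer slots Nagios reserves for caching external commands that have been read from the external command file by a worker thread but have not yet been processed by the main thread of the Nagios daemon. Each slot can hold one external command, so this option determines how many commands can be buffered. For installations that process a large number of passive checks (e.g. distributed setups), you may need to increase this value. Consider using MRTG to graph the usage of the external command buffers; how to configure the graphing is described in the corresponding documentation.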

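Table 5.25. Lock File

Format: lock_file=
Example: lock_file=/tmp/nagios.lock

This option specifies the location of the lock file that Nagios creates when it runs as a daemon (when started with the -d command line argument). The file contains the process ID (PID) of the running Nagios process.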

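Table 5.26. State Retention Option

Format: retain_state_information=<0/1>
Example: retain_state_information=1

This option determines whether or not Nagios retains state information for hosts and services between program restarts. If you enable this option, you should also supply a value for the state_retention_file variable. When enabled, Nagios saves all state information for hosts and services before it shuts down (or restarts) and reads the previously saved state information when it starts up again.

  1. 0 = Don't retain state information
  2. 1 = Retain state information (default)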

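Table 5.27. State Retention File

Format: state_retention_file=
Example: state_retention_file=/usr/local/nagios/var/retention.dat

This is the file that Nagios uses for storing status, downtime, and comment information before it shuts down. When Nagios is restarted, it uses the information stored in this file to set the initial states of hosts and services before it starts monitoring anything. For Nagios to retain state information between program restarts, you must enable the retain_state_information option.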

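Table 5.28. Automatic State Retention Update Interval

Format: retention_update_interval=
Example: retention_update_interval=60

This setting determines how often (in minutes) Nagios automatically saves retention data during normal operation. If you set this value to 0, Nagios will not save retention data at regular intervals, but it will still save retention data before shutting down or restarting. If you have disabled state retention (with the retain_state_information option), this option has no effect.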

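Table 5.29. Use Retained Program State Option

Format: use_retained_program_state=<0/1>
Example: use_retained_program_state=1

This setting determines whether or not Nagios sets various program-wide state variables based on the values saved in the state retention file. The program-wide state variables that are saved across program restarts include the enable_notifications, enable_flap_detection, enable_event_handlers, execute_service_checks, and accept_passive_service_checks options. If state retention is not enabled, this option has no effect.

  1. 0 = Don't use retained program state
  2. 1 = Use retained program state (default)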

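Table 5.30. Use Retained Scheduling Info Option

Format: use_retained_scheduling_info=<0/1>
Example: use_retained_scheduling_info=1

This setting determines whether or not Nagios uses retained scheduling information (next check times) for hosts and services when it restarts. If you have added a large number (or a large percentage) of hosts and services, it is recommended that you disable this option when you first restart Nagios, because it can skew the spread of the initial checks. Otherwise you will probably want to leave it enabled.

  1. 0 = Don't use retained scheduling info
  2. 1 = Use retained scheduling info (default)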

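Table 5.31. Retained Host and Service Attribute Masks

Format:

retained_host_attribute_mask=

retained_service_attribute_mask=

Example:

retained_host_attribute_mask=0

retained_service_attribute_mask=0


WARNING: This is an advanced feature. You will need to read the Nagios source code to understand how it takes effect.

These options determine which host or service attributes are NOT retained across program restarts. The values for these options are a bitwise AND of values specified by the "MODATTR_" definitions found in the include/common.h source file. By default, all host and service attributes are retained.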
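How such a mask is combined and tested can be shown with a short Python sketch. The four constants below are illustrative placeholders that mirror what I believe the first MODATTR_ values look like; they are assumptions, so check include/common.h in your own source tree for the authoritative names and numbers. The function is_retained is likewise invented for this example.

```python
# Placeholder bit values (assumed; verify against include/common.h).
MODATTR_NOTIFICATIONS_ENABLED = 1
MODATTR_ACTIVE_CHECKS_ENABLED = 2
MODATTR_PASSIVE_CHECKS_ENABLED = 4
MODATTR_EVENT_HANDLER_ENABLED = 8

def is_retained(attribute_bit, mask):
    """An attribute survives a restart unless its bit is set in the mask."""
    return (attribute_bit & mask) == 0

# Combine two bits into one mask: these two attributes would NOT be retained.
mask = MODATTR_NOTIFICATIONS_ENABLED | MODATTR_EVENT_HANDLER_ENABLED
```

With these placeholder values the mask is 9; a mask of 0, as in the example above, retains every attribute.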

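Table 5.32. Retained Process Attribute Masks

Format:

retained_process_host_attribute_mask=

retained_process_service_attribute_mask=

Example:

retained_process_host_attribute_mask=0

retained_process_service_attribute_mask=0


WARNING: This is an advanced feature. You will need to read the Nagios source code to understand how it takes effect.

These options determine which process attributes are NOT retained across program restarts. There are two masks because the host and service process attributes can often be changed separately. For example, host checks can be disabled at the program level while service checks remain enabled. The values for these options are a bitwise AND of values specified by the "MODATTR_" definitions found in the include/common.h source file. By default, all process attributes are retained.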

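Table 5.33. Retained Contact Attribute Masks

Format:

retained_contact_host_attribute_mask=

retained_contact_service_attribute_mask=

Example:

retained_contact_host_attribute_mask=0

retained_contact_service_attribute_mask=0


WARNING: This is an advanced feature. You will need to read the Nagios source code to understand how it takes effect.

These options determine which contact attributes are NOT retained across program restarts. There are two masks because the host and service contact attributes can often be changed separately. The values for these options are a bitwise AND of values specified by the "MODATTR_" definitions found in the include/common.h source file. By default, all contact attributes are retained.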

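Table 5.34. Syslog Logging Option

Format: use_syslog=<0/1>
Example: use_syslog=1

This option determines whether messages are logged to the syslog facility on the local host. Values are as follows:

  1. 0 = Don't use the syslog facility
  2. 1 = Use the syslog facility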

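Table 5.35. Notification Logging Option

Format: log_notifications=<0/1>
Example: log_notifications=1

This option determines whether or not notification messages are logged. If you have a lot of contacts or regular service failures, the log file will grow quickly. Disable this option to keep notifications from being logged.

  1. 0 = Don't log notifications
  2. 1 = Log notifications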

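Table 5.36. Service Check Retry Logging Option

Format: log_service_retries=<0/1>
Example: log_service_retries=1

This option determines whether or not service check retries are logged. Service check retries occur when a service check returns a non-OK state and you have configured Nagios to retry the check more than once before responding to the error. Services in this situation are considered to be in "soft" states. Logging service check retries is mostly useful when debugging Nagios or testing service event handlers.

  1. 0 = Don't log service check retries
  2. 1 = Log service check retries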

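Table 5.37. Host Check Retry Logging Option

Format: log_host_retries=<0/1>
Example: log_host_retries=1

This option determines whether or not host check retries are logged. Logging host check retries is mostly useful when debugging Nagios or testing host event handlers.

  1. 0 = Don't log host check retries
  2. 1 = Log host check retries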

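Table 5.38. Event Handler Logging Option

Format: log_event_handlers=<0/1>
Example: log_event_handlers=1

This option determines whether or not service and host event handlers are logged. Event handlers are optional commands that are run whenever a service or host changes state. Logging event handlers is most useful when debugging Nagios or first trying out your event handler scripts.

  1. 0 = Don't log event handlers
  2. 1 = Log event handlers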

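Table 5.39. Initial States Logging Option

Format: log_initial_states=<0/1>
Example: log_initial_states=1

This option determines whether or not Nagios forces all initial host and service states to be logged, even if they are OK. Normally, initial states are logged only when the first check finds a problem. Enabling this option is useful if you use an application that scans the log file to produce long-term state statistics for hosts and services.

  1. 0 = Don't log initial states (default)
  2. 1 = Log initial states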

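Table 5.40. External Command Logging Option

Format: log_external_commands=<0/1>
Example: log_external_commands=1

This option determines whether or not Nagios logs the external commands that it receives from the external command file. Note: This option does not control whether passive checks (which are a type of external command) are logged. To enable or disable logging of passive checks, use the log_passive_checks option.

  1. 0 = Don't log external commands
  2. 1 = Log external commands (default)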

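Table 5.41. Passive Check Logging Option

Format: log_passive_checks=<0/1>
Example: log_passive_checks=1

This option determines whether or not Nagios logs the passive host and service checks that it receives from the external command file. If you are setting up a distributed monitoring environment or plan to handle a large number of passive checks on a regular basis, you may wish to disable this option so that your log file does not grow too large.

  1. 0 = Don't log passive checks
  2. 1 = Log passive checks (default)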

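Table 5.42. Global Host Event Handler Option

Format: global_host_event_handler=
Example: global_host_event_handler=log-host-event-to-db

This option specifies a host event handler command that is run for every host state change. The global event handler is executed immediately before the event handler specified in each host definition. The command argument is the short name of a command defined in your object configuration file. The maximum amount of time that this command can run is controlled by the event_handler_timeout option. More information can be found in the event handlers documentation.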

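Table 5.43. Global Service Event Handler Option

Format: global_service_event_handler=
Example: global_service_event_handler=log-service-event-to-db

This option specifies a service event handler command that is run for every service state change. The global event handler is executed immediately before the event handler specified in each service definition. The command argument is the short name of a command defined in your object configuration file. The maximum amount of time that this command can run is controlled by the event_handler_timeout option. More information can be found in the event handlers documentation.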

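Table 5.44. Inter-Check Sleep Time

Format: sleep_time=
Example: sleep_time=1

This is the number of seconds that Nagios sleeps before checking whether the next service or host check in the scheduling queue should be executed. Note that Nagios only sleeps after it has caught up with queued service checks that have fallen behind.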

Table 5.45. Service Inter-Check Delay Method

Format: service_inter_check_delay_method=
Example: service_inter_check_delay_method=s

This option allows you to control how service checks are initially "spread out" in the event queue. Using a "smart" delay calculation (the default) will cause Nagios to calculate an average check interval and spread initial checks of all services out over that interval, thereby helping to eliminate CPU load spikes. Using no delay is generally not recommended, as it will cause all service checks to be scheduled for execution at the same time. This means that you will generally have large CPU spikes when the services are all executed in parallel. More information on how the inter-check delay affects service check scheduling can be found in the corresponding section of the Nagios documentation. Values are as follows:

  1. n = Don't use any delay - schedule all service checks to run immediately (i.e. at the same time!)
  2. d = Use a "dumb" delay of 1 second between service checks
  3. s = Use a "smart" delay calculation to spread service checks out evenly (default)
  4. x.xx = Use a user-supplied inter-check delay of x.xx seconds

Table 5.46. Maximum Service Check Spread

Format: max_service_check_spread=
Example: max_service_check_spread=30

This option determines the maximum number of minutes from when Nagios starts that all services (that are scheduled to be regularly checked) are checked. This option will automatically adjust the service inter-check delay method (if necessary) to ensure that the initial checks of all services occur within the timeframe you specify. In general, this option will not have an effect on service check scheduling if scheduling information is being retained using the use_retained_scheduling_info option. Default value is 30 (minutes).
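The relationship between the "smart" inter-check delay and the maximum check spread can be sketched in Python as follows. This is a simplified model based on the description above (average check interval divided by the number of services, then capped so that all initial checks fit in the spread window); it is not the scheduler's actual code, and the function and parameter names are assumptions of this example.

```python
def smart_inter_check_delay(total_services, average_check_interval,
                            max_spread_minutes=30):
    """Return the delay in seconds between initial service checks."""
    if total_services <= 0:
        return 0.0
    # Spread the initial checks evenly over one average check interval.
    delay = average_check_interval / total_services
    # Shrink the delay if the checks would not fit in the maximum spread.
    max_spread_seconds = max_spread_minutes * 60
    if delay * total_services > max_spread_seconds:
        delay = max_spread_seconds / total_services
    return delay
```

For 1000 services checked every 300 seconds on average this gives a delay of 0.3 seconds; for 10 services with a 3600 second interval the 30 minute cap applies and the delay becomes 180 seconds.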

Table 5.47. Service Interleave Factor

Format: service_interleave_factor=<s|x>
Example: service_interleave_factor=s

This variable determines how service checks are interleaved. Interleaving allows for a more even distribution of service checks, reduced load on remote hosts, and faster overall detection of host problems. Setting this value to 1 is equivalent to not interleaving the service checks (this is how versions of Nagios previous to 0.0.5 worked). Set this value to s (smart) for automatic calculation of the interleave factor unless you have a specific reason to change it. The best way to understand how interleaving works is to watch the status CGI (detailed view) when Nagios is just starting. You should see that the service check results are spread out as they begin to appear. More information on how interleaving works can be found in the corresponding section of the Nagios documentation.

  1. x = A number greater than or equal to 1 that specifies the interleave factor to use. An interleave factor of 1 is equivalent to not interleaving the service checks.
  2. s = Use a "smart" interleave factor calculation (default)
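The effect of an interleave factor can be illustrated with a simplified Python model. It assumes the "smart" factor is the number of services divided by the number of hosts, rounded up, and it orders checks by stepping through a host-sorted service list in strides of that factor. The real scheduler is more involved, so treat both functions (whose names are made up here) as an approximation for building intuition.

```python
import math

def smart_interleave_factor(total_services, total_hosts):
    """Assumed 'smart' calculation: services per host, rounded up."""
    return max(1, math.ceil(total_services / total_hosts))

def interleaved_order(service_count, factor):
    """Visit a host-sorted service list in strides of `factor`."""
    return [index
            for start in range(factor)
            for index in range(start, service_count, factor)]

# Six services sorted by host (two hosts with three services each).
order = interleaved_order(6, smart_interleave_factor(6, 2))
```

With a factor of 3 the visiting order alternates between the two hosts instead of checking all services of one host first; a factor of 1 leaves the original order unchanged.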

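Table 5.48. Maximum Concurrent Service Checks

Format: max_concurrent_checks=
Example: max_concurrent_checks=20

This option specifies the maximum number of service checks that can be run in parallel at any given time. A value of 1 prevents any service checks from being run in parallel. A value of 0 (the default) places no restriction on the number of checks that can be run in parallel. You will have to adjust this value to the system resources available on the machine that runs Nagios, because it directly affects the maximum load imposed on the system (processor utilization, memory, etc.). More information on how to estimate how many concurrent checks you should allow can be found in the corresponding section of the Nagios documentation.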

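Table 5.49. Check Result Reaper Frequency

Format: check_result_reaper_frequency=
Example: check_result_reaper_frequency=5

This option controls the frequency (in seconds) of check result "reaper" events. Reaper events process the results of host and service checks that have finished executing. These events form the core of the monitoring logic in Nagios.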

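Table 5.50. Maximum Check Result Reaper Time

Format: max_check_result_reaper_time=
Example: max_check_result_reaper_time=30

This option controls the maximum amount of time (in seconds) that a host and service check result "reaper" event is allowed to run. Reaper events process the results of host and service checks that have finished executing. If there are many results to process, a reaper event may take a long time to finish, which can delay the timely execution of new host and service checks. This option limits how long an individual reaper event runs before it hands control back to Nagios for the other parts of the monitoring logic.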

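Table 5.51. Check Result Path

Format: check_result_path=
Example: check_result_path=/var/spool/nagios/checkresults

This option determines which directory Nagios uses to temporarily store host and service check results before they are processed. This directory should not be used to store any other files, because Nagios periodically cleans old files out of it (see the max_check_result_file_age option for more information).

Note: Make sure that only a single instance of Nagios has access to the check result path. If several instances of Nagios use the same directory, check results will be processed (incorrectly) by the wrong instance of Nagios!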

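Table 5.52. Maximum Check Result File Age

Format: max_check_result_file_age=
Example: max_check_result_file_age=3600

This option determines the maximum age (in seconds) at which Nagios still considers check result files found in the check_result_path directory to be valid. Check result files older than this threshold are deleted by Nagios, and the check results they contain are not processed. If you set this option to 0, Nagios will process all check result files - even if they are older than your hardware.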

Table 5.53. Host Inter-Check Delay Method

Format: host_inter_check_delay_method=
Example: host_inter_check_delay_method=s

This option allows you to control how host checks that are scheduled to be checked on a regular basis are initially "spread out" in the event queue. Using a "smart" delay calculation (the default) will cause Nagios to calculate an average check interval and spread initial checks of all hosts out over that interval, thereby helping to eliminate CPU load spikes. Using no delay is generally not recommended, as it will cause all host checks to be scheduled for execution at the same time. More information on how the inter-check delay affects host check scheduling can be found in the corresponding section of the Nagios documentation. Values are as follows:

  1. n = Don't use any delay - schedule all host checks to run immediately (i.e. at the same time!)
  2. d = Use a "dumb" delay of 1 second between host checks
  3. s = Use a "smart" delay calculation to spread host checks out evenly (default)
  4. x.xx = Use a user-supplied inter-check delay of x.xx seconds

Table 5.54. Maximum Host Check Spread

Format: max_host_check_spread=
Example: max_host_check_spread=30

This option determines the maximum number of minutes from when Nagios starts that all hosts (that are scheduled to be regularly checked) are checked. This option will automatically adjust the host inter-check delay method (if necessary) to ensure that the initial checks of all hosts occur within the timeframe you specify. In general, this option will not have an effect on host check scheduling if scheduling information is being retained using the use_retained_scheduling_info option. Default value is 30 (minutes).

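Table 5.55. Timing Interval Length

Format: interval_length=
Example: interval_length=60

This is the number of seconds per "unit interval" used for timing in the scheduling queue, re-notifications, and so on. Unit intervals are used in the object configuration files to determine how often to run a service check, how often to re-notify a contact, etc.

Important: The default value is 60, which means that a "unit interval" of 1 in the object configuration files equals 60 seconds (1 minute). I have not tested other values for this variable, so use them at your own risk!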

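Table 5.56. Auto-Rescheduling Option

Format: auto_reschedule_checks=<0/1>
Example: auto_reschedule_checks=1

This option determines whether or not Nagios attempts to automatically reschedule active host and service checks to "smooth" them out over time. This can help balance the load on the monitoring server, because it tries to keep the time between consecutive checks consistent, at the expense of executing checks on a more rigid schedule.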

WARNING: THIS IS AN EXPERIMENTAL FEATURE AND MAY BE REMOVED IN FUTURE VERSIONS. ENABLING THIS OPTION CAN DEGRADE PERFORMANCE - RATHER THAN INCREASE IT - IF USED IMPROPERLY!

Table 5.57. Auto-Rescheduling Interval

Format: auto_rescheduling_interval=
Example: auto_rescheduling_interval=30

This option determines how often (in seconds) Nagios will attempt to automatically reschedule checks. This option only has an effect if the auto_reschedule_checks option is enabled. Default is 30 seconds.

WARNING: THIS IS AN EXPERIMENTAL FEATURE AND MAY BE REMOVED IN FUTURE VERSIONS. ENABLING THE AUTO-RESCHEDULING OPTION CAN DEGRADE PERFORMANCE - RATHER THAN INCREASE IT - IF USED IMPROPERLY!

Table 5.58. Auto-Rescheduling Window

Format: auto_rescheduling_window=
Example: auto_rescheduling_window=180

This option determines the "window" of time (in seconds) that Nagios will look at when automatically rescheduling checks. Only host and service checks that occur in the next X seconds (determined by this variable) will be rescheduled. This option only has an effect if the auto_reschedule_checks option is enabled. Default is 180 seconds (3 minutes).

WARNING: THIS IS AN EXPERIMENTAL FEATURE AND MAY BE REMOVED IN FUTURE VERSIONS. ENABLING THE AUTO-RESCHEDULING OPTION CAN DEGRADE PERFORMANCE - RATHER THAN INCREASE IT - IF USED IMPROPERLY!

Table 5.59. Aggressive Host Checking Option

Format: use_aggressive_host_checking=<0/1>
Example: use_aggressive_host_checking=0

Nagios tries to be smart about how and when it checks the status of hosts. In general, disabling this option will allow Nagios to make some smarter decisions and check hosts a bit faster. Enabling this option will increase the amount of time required to check hosts, but may improve reliability a bit. Unless you have problems with Nagios not recognizing that a host recovered, I would suggest not enabling this option.

  1. 0 = Don't use aggressive host checking (default)
  2. 1 = Use aggressive host checking

Table 5.60. Translate Passive Host Checks Option

Format: translate_passive_host_checks=<0/1>
Example: translate_passive_host_checks=1

This option determines whether or not Nagios will translate DOWN/UNREACHABLE passive host check results to their "correct" state from the viewpoint of the local Nagios instance. This can be very useful in distributed and failover monitoring installations. More information on passive check state translation can be found in the corresponding section of the Nagios documentation.

  1. 0 = Disable check translation (default)
  2. 1 = Enable check translation

Table 5.61. Passive Host Checks Are SOFT Option

Format: passive_host_checks_are_soft=<0/1>
Example: passive_host_checks_are_soft=1

This option determines whether or not Nagios will treat passive host checks as HARD states or SOFT states. By default, a passive host check result will put a host into a HARD state. You can change this behavior by enabling this option.

  1. 0 = Passive host checks are HARD (default)
  2. 1 = Passive host checks are SOFT

Table 5.62. Predictive Host Dependency Checks Option

Format: enable_predictive_host_dependency_checks=<0/1>
Example: enable_predictive_host_dependency_checks=1

This option determines whether or not Nagios will execute predictive checks of hosts that are being depended upon (as defined in host dependency definitions) for a particular host when it changes state.

Predictive checks help ensure that the dependency logic is as accurate as possible. More information on how predictive checks work can be found in the corresponding section of the Nagios documentation.

  1. 0 = Disable predictive checks
  2. 1 = Enable predictive checks (default)

Table 5.63. Predictive Service Dependency Checks Option

Format: enable_predictive_service_dependency_checks=<0/1>
Example: enable_predictive_service_dependency_checks=1

This option determines whether or not Nagios will execute predictive checks of services that are being depended upon (as defined in service dependency definitions) for a particular service when it changes state.

Predictive checks help ensure that the dependency logic is as accurate as possible. More information on how predictive checks work can be found in the corresponding section of the Nagios documentation.

  1. 0 = Disable predictive checks
  2. 1 = Enable predictive checks (default)

Table 5.64. Cached Host Check Horizon

Format: cached_host_check_horizon=
Example: cached_host_check_horizon=15

This option determines the maximum amount of time (in seconds) that the state of a previous host check is considered current. Cached host states (from host checks that were performed more recently than the time specified by this value) can improve host check performance immensely. Too high a value for this option may result in (temporarily) inaccurate host states, while a low value may result in a performance hit for host checks. Use a value of 0 if you want to disable host check caching. More information on cached checks can be found in the corresponding section of the Nagios documentation.

Table 5.65. Cached Service Check Horizon

Format: cached_service_check_horizon=
Example: cached_service_check_horizon=15

This option determines the maximum amount of time (in seconds) that the state of a previous service check is considered current. Cached service states (from service checks that were performed more recently than the time specified by this value) can improve service check performance when a lot of service dependencies are used. Too high a value for this option may result in inaccuracies in the service dependency logic. Use a value of 0 if you want to disable service check caching. More information on cached checks can be found in the corresponding section of the Nagios documentation.

Table 5.66. Large Installation Tweaks Option

Format: use_large_installation_tweaks=<0/1>
Example: use_large_installation_tweaks=0

This option determines whether or not the Nagios daemon will take several shortcuts to improve performance. These shortcuts result in the loss of a few features, but larger installations will likely see a lot of benefit from doing so. More information on what optimizations are taken when you enable this option can be found in the corresponding section of the Nagios documentation.

  1. 0 = Don't use tweaks (default)
  2. 1 = Use tweaks

Table 5.67. Child Process Memory Option

Format: free_child_process_memory=<0/1>
Example: free_child_process_memory=0

This option determines whether or not Nagios will free memory in child processes when they are fork()ed off from the main process. By default, Nagios frees memory. However, if the use_large_installation_tweaks option is enabled, it will not. By defining this option in your configuration file, you are able to override things to get the behavior you want.

  1. 0 = Don't free memory
  2. 1 = Free memory

Table 5.68. Child Processes Fork Twice Option

Format: child_processes_fork_twice=<0/1>
Example: child_processes_fork_twice=0

This option determines whether or not Nagios will fork() child processes twice when it executes host and service checks. By default, Nagios fork()s twice. However, if the use_large_installation_tweaks option is enabled, it will only fork() once. By defining this option in your configuration file, you are able to override things to get the behavior you want.

  1. 0 = Fork() just once
  2. 1 = Fork() twice

Table 5.69. Environment Macros Option

Format: enable_environment_macros=<0/1>
Example: enable_environment_macros=0

This option determines whether or not the Nagios daemon will make all standard macros available as environment variables to your check, notification, event handler, etc. commands. In large Nagios installations this can be problematic because it takes additional memory and (more importantly) CPU to compute the values of all macros and make them available to the environment.

  1. 0 = Don't make macros available as environment variables
  2. 1 = Make macros available as environment variables (default)
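When this option is enabled, a plugin or handler script can read macros from its environment, where they appear with a NAGIOS_ prefix (for example NAGIOS_HOSTNAME for $HOSTNAME$). The Python sketch below shows the idea; the function name and the fallback message are assumptions made for this example.

```python
import os

def describe_check_target(environ=None):
    """Describe the host/service being checked from NAGIOS_* variables."""
    env = os.environ if environ is None else environ
    host = env.get("NAGIOS_HOSTNAME")
    if host is None:
        # Happens when enable_environment_macros=0 or outside of Nagios.
        return "environment macros not available"
    service = env.get("NAGIOS_SERVICEDESC")
    return host if service is None else "{}/{}".format(host, service)
```

Scripts written this way should still work, with reduced detail, when the option is turned off on a large installation.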

Table 5.70. Flap Detection Option

Format: enable_flap_detection=<0/1>
Example: enable_flap_detection=0

This option determines whether or not Nagios will try and detect hosts and services that are "flapping". Flapping occurs when a host or service changes between states too frequently, resulting in a barrage of notifications being sent out. When Nagios detects that a host or service is flapping, it will temporarily suppress notifications for that host/service until it stops flapping. Flap detection is very experimental at this point, so use this feature with caution! More information on how flap detection and handling works can be found in the flap detection documentation. Note: If state retention is enabled, Nagios ignores this setting when it (re)starts and uses the last known value (as stored in the state retention file), unless you disable the use_retained_program_state option. To change this option while state retention is active (and use_retained_program_state is enabled), you must use the appropriate external command or change it via the web interface. Values are as follows:

  1. 0 = Don't enable flap detection (default)
  2. 1 = Enable flap detection

Table 5.71. Low Service Flap Threshold

Format: low_service_flap_threshold=
Example: low_service_flap_threshold=25.0

This option is used to set the low threshold for detection of service flapping. For more information on how flap detection and handling works (and how this option affects things), read the flap detection documentation.

Table 5.72. High Service Flap Threshold

Format: high_service_flap_threshold=
Example: high_service_flap_threshold=50.0

This option is used to set the high threshold for detection of service flapping. For more information on how flap detection and handling works (and how this option affects things), read the flap detection documentation.

Table 5.73. Low Host Flap Threshold

Format: low_host_flap_threshold=
Example: low_host_flap_threshold=25.0

This option is used to set the low threshold for detection of host flapping. For more information on how flap detection and handling works (and how this option affects things), read the flap detection documentation.

Table 5.74. High Host Flap Threshold

Format: high_host_flap_threshold=
Example: high_host_flap_threshold=50.0

This option is used to set the high threshold for detection of host flapping. For more information on how flap detection and handling works (and how this option affects things), read the flap detection documentation.
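How a low and a high threshold work together can be sketched in Python. The model below computes an unweighted percent state change over a history of check states and applies the two thresholds as hysteresis: flapping starts above the high threshold and stops below the low one. Nagios' own calculation is, as far as I know, weighted toward recent state changes, so this is a simplification, and the function names are invented for the example.

```python
def percent_state_change(states):
    """Unweighted share of consecutive checks whose state differs."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for previous, current in zip(states, states[1:])
                  if previous != current)
    return 100.0 * changes / (len(states) - 1)

def update_flapping(is_flapping, percent_change, low, high):
    """Apply the low/high thresholds as hysteresis."""
    if not is_flapping and percent_change > high:
        return True   # started flapping
    if is_flapping and percent_change < low:
        return False  # stopped flapping
    return is_flapping  # between the thresholds nothing changes

history = ["OK", "OK", "CRITICAL", "CRITICAL", "OK"]
change = percent_state_change(history)
```

With thresholds of 25.0 and 50.0, a host or service at 30 percent state change keeps whatever flapping status it already had, which is what prevents the status itself from oscillating.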

Table 5.75. Soft State Dependencies Option

Format: soft_state_dependencies=<0/1>
Example: soft_state_dependencies=0

This option determines whether or not Nagios will use soft state information when checking host and service dependencies. Normally Nagios will only use the latest hard host or service state when checking dependencies. If you want it to use the latest state (regardless of whether it's a soft or hard state), enable this option.

  1. 0 = Don't use soft state dependencies (default)
  2. 1 = Use soft state dependencies

Table 5.76. Service Check Timeout

Format: service_check_timeout=
Example: service_check_timeout=60

This is the maximum number of seconds that Nagios will allow service checks to run. If checks exceed this limit, they are killed and a CRITICAL state is returned. A timeout error will also be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each service check normally finishes executing within this time limit. If a service check runs longer than this limit, Nagios will kill it off thinking it is a runaway process.
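The "last ditch" behaviour described above can be mimicked outside of Nagios with a small Python wrapper: run a plugin, kill it if it exceeds the limit, and report CRITICAL. This is an analogy for testing plugins by hand, not how the daemon is implemented; the function name run_check and the wording of the timeout message are assumptions of this example.

```python
import subprocess
import sys

STATE_CRITICAL = 2  # plugin exit code conventionally meaning CRITICAL

def run_check(argv, timeout_seconds):
    """Run a plugin; kill it and return CRITICAL if it runs too long."""
    try:
        completed = subprocess.run(argv, capture_output=True, text=True,
                                   timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return STATE_CRITICAL, "(check timed out after {} seconds)".format(
            timeout_seconds)
    return completed.returncode, completed.stdout.strip()

# A stand-in "plugin" that hangs far longer than the limit allows.
slow_plugin = [sys.executable, "-c", "import time; time.sleep(5)"]
state, output = run_check(slow_plugin, timeout_seconds=0.5)
```

A well-behaved plugin should enforce its own shorter timeout (as the -t option does in the check_imap examples), so that this outer limit is only reached when something has gone badly wrong.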

Table 5.77. Host Check Timeout

Format: host_check_timeout=
Example: host_check_timeout=60

This is the maximum number of seconds that Nagios will allow host checks to run. If checks exceed this limit, they are killed, a CRITICAL state is returned, and the host will be assumed to be DOWN. A timeout error will also be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each host check normally finishes executing within this time limit. If a host check runs longer than this limit, Nagios will kill it off thinking it is a runaway process.

Table 5.78. Event Handler Timeout

Format: event_handler_timeout=
Example: event_handler_timeout=60

This is the maximum number of seconds that Nagios will allow event handlers to be run. If an event handler exceeds this time limit it will be killed and a warning will be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off commands which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each event handler command normally finishes executing within this time limit. If an event handler runs longer than this limit, Nagios will kill it off thinking it is a runaway process.

Table 5.79. Notification Timeout

Format: notification_timeout=
Example: notification_timeout=60

This is the maximum number of seconds that Nagios will allow notification commands to be run. If a notification command exceeds this time limit it will be killed and a warning will be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off commands which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each notification command finishes executing within this time limit. If a notification command runs longer than this limit, Nagios will kill it off thinking it is a runaway process.

Table 5.80. Obsessive Compulsive Service Processor Timeout

Format: ocsp_timeout=
Example: ocsp_timeout=5

This is the maximum number of seconds that Nagios will allow an obsessive compulsive service processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Table 5.81. Obsessive Compulsive Host Processor Timeout

Format: ochp_timeout=
Example: ochp_timeout=5

This is the maximum number of seconds that Nagios will allow an obsessive compulsive host processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Table 5.82. Performance Data Processor Command Timeout

Format: perfdata_timeout=
Example: perfdata_timeout=5

This is the maximum number of seconds that Nagios will allow a host performance data processor command or service performance data processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Table 5.83. Obsess Over Services Option

Format: obsess_over_services=<0/1>
Example: obsess_over_services=1

This value determines whether or not Nagios will "obsess" over service check results and run the obsessive compulsive service processor command you define. I know - funny name, but it was all I could think of. This option is useful for performing distributed monitoring. If you're not doing distributed monitoring, don't enable this option.

  1. 0 = Don't obsess over services (default)
  2. 1 = Obsess over services

Table 5.84. Obsessive Compulsive Service Processor Command

Format: ocsp_command=
Example: ocsp_command=obsessive_service_handler

This option allows you to specify a command to be run after every service check, which can be useful in distributed monitoring. This command is executed after any event handler or notification commands. The command argument is the short name of a command definition that you define in your object configuration file. The maximum amount of time that this command can run is controlled by the ocsp_timeout option. More information can be found in the distributed monitoring documentation. This command is only executed if the obsess_over_services option is enabled globally and if the obsess_over_service directive in the service definition is enabled.

Table 5.85. Obsess Over Hosts Option

Format: obsess_over_hosts=<0/1>
Example: obsess_over_hosts=1

This value determines whether or not Nagios will "obsess" over host check results and run the obsessive compulsive host processor command you define. I know - funny name, but it was all I could think of. This option is useful for performing distributed monitoring. If you're not doing distributed monitoring, don't enable this option.

  1. 0 = Don't obsess over hosts (default)
  2. 1 = Obsess over hosts

Table 5.86. Obsessive Compulsive Host Processor Command

Format: ochp_command=<command>
Example: ochp_command=obsessive_host_handler

This option allows you to specify a command to be run after every host check, which can be useful in distributed monitoring. This command is executed after any event handler or notification commands. The command argument is the short name of a command definition that you define in your object configuration file. The maximum amount of time that this command can run is controlled by the ochp_timeout option. More information can be found in the documentation on distributed monitoring. This command is only executed if the obsess_over_hosts option is enabled globally and if the obsess_over_host directive in the host definition is enabled.
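The host-side setup mirrors the service-side one. The following is a minimal sketch, assuming a hypothetical forwarding script at the path shown and a hypothetical host web1 that inherits from a generic-host template.

```
# Main configuration file
obsess_over_hosts=1
ochp_command=obsessive_host_handler
ochp_timeout=5

# Object configuration file
# The script path below is hypothetical.
define command{
    command_name    obsessive_host_handler
    command_line    /usr/local/nagios/libexec/eventhandlers/submit_host_check_result $HOSTNAME$ $HOSTSTATE$ '$HOSTOUTPUT$'
    }

# Hypothetical host; obsess_over_host must be enabled here too.
define host{
    use                 generic-host
    host_name           web1
    address             192.0.2.10
    obsess_over_host    1
    }
```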

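Table 5.87. Process Performance Data Option

Format: process_performance_data=<0/1>
Example: process_performance_data=1

This value determines whether or not Nagios will process host and service check performance data.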

  1. 0 = Don't process performance data (default)
  2. 1 = Process performance data

Table 5.88. Host Performance Data Processing Command

Format: host_perfdata_command=<command>
Example: host_perfdata_command=process-host-perfdata

This option allows you to specify a command to be run after every host check to process host performance data that may be returned from the check. The command argument is the short name of a command definition that you define in your object configuration file. This command is only executed if the process_performance_data option is enabled globally and if the process_perf_data directive in the host definition is enabled.
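A command definition matching the example name might look like the sketch below, which follows the pattern of the sample configuration shipped with Nagios: it appends one tab-separated line per host check to a file. The output path is an assumption and should be adjusted to your installation.

```
# Object configuration file
# The output file path is an assumption.
define command{
    command_name    process-host-perfdata
    command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
    }
```

Because the command is launched after every host check, a lightweight command like this one is preferable to a heavy script.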

Table 5.89. Service Performance Data Processing Command

Format: service_perfdata_command=<command>
Example: service_perfdata_command=process-service-perfdata

This option allows you to specify a command to be run after every service check to process service performance data that may be returned from the check. The command argument is the short name of a command definition that you define in your object configuration file. This command is only executed if the process_performance_data option is enabled globally and if the process_perf_data directive in the service definition is enabled.
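For services, a corresponding sketch in the style of the stock sample configuration could be written as follows. The output path is again an assumption; the macros are standard service macros.

```
# Object configuration file
# The output file path is an assumption.
define command{
    command_name    process-service-perfdata
    command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
    }
```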

Table 5.90. Host Performance Data File

Format: host_perfdata_file=<file_name>
Example: host_perfdata_file=/usr/local/nagios/var/host-perfdata.dat

This option allows you to specify a file to which host performance data will be written after every host check. Data will be written to the performance file as specified by the host_perfdata_file_template option. Performance data is only written to this file if the process_performance_data option is enabled globally and if the process_perf_data directive in the host definition is enabled.

Table 5.91. Service Performance Data File

Format: service_perfdata_file=<file_name>
Example: service_perfdata_file=/usr/local/nagios/var/service-perfdata.dat

This option allows you to specify a file to which service performance data will be written after every service check. Data will be written to the performance file as specified by the service_perfdata_file_template option. Performance data is only written to this file if the process_performance_data option is enabled globally and if the process_perf_data directive in the service definition is enabled.
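To illustrate how the global switch and the per-object directive interact for file-based output, the sketch below turns processing on globally, names both files from the examples above, and opts one object out. The host name printer1 and the generic-host template are hypothetical.

```
# Main configuration file
process_performance_data=1
host_perfdata_file=/usr/local/nagios/var/host-perfdata.dat
service_perfdata_file=/usr/local/nagios/var/service-perfdata.dat

# Object configuration file
# Hypothetical host that is excluded from performance data
# processing even though the global option is enabled.
define host{
    use                 generic-host
    host_name           printer1
    address             192.0.2.20
    process_perf_data   0
    }
```

Under these assumptions nothing is written to the host file for printer1, while hosts and services that leave process_perf_data enabled are written after each check.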

Table 5.92. Host Performance Data File Template

Format: host_perfdata_file_template=<template>