2011-12-05 10:04:15
Nagios is an open source host, service and network monitoring program. Who uses it? Lots of people, including many big companies and organizations.
1. Downloading the Packages:
You can download the packages on Windows first and then mount them on the Red Hat machine, or download them directly with wget:
Nagios Core OSS consists of various Open Source components that provide the foundation for rock-solid IT infrastructure monitoring. Download all the components you need to get started.
Step 1 - Get Nagios Core
Required. Contains the core monitoring application and web interface.
# wget http://ncu.dl.sourceforge.net/project/nagios/nagios-3.x/nagios-3.2.1/nagios-3.2.1.tar.gz
Step 2 - Get Nagios Plugins
Also Required. Allows you to monitor services, applications, metrics, and more.
# wget http://ncu.dl.sourceforge.net/project/nagiosplug/nagiosplug/1.4.14/nagios-plugins-1.4.14.tar.gz
Step 3 - Get Nagios Addons
Trick out your Nagios install by extending its capabilities with hundreds of community-contributed addons.
# wget http://ncu.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.12/nrpe-2.12.tar.gz
# wget http://ncu.dl.sourceforge.net/project/nagios/nsca-2.x/nsca-2.7.2/nsca-2.7.2.tar.gz
# wget http://ncu.dl.sourceforge.net/project/nagios/ndoutils-1.x/ndoutils-1.4b9/ndoutils-1.4b9.tar.gz
NRPE allows you to remotely execute Nagios plugins on other Linux/Unix machines. This allows you to monitor remote machine metrics (disk usage, CPU load, etc.). NRPE can also communicate with Windows agent addons like NSClient++, so you can check metrics on remote Windows machines as well.
NSCA allows you to integrate passive alerts and checks from remote machines and applications with Nagios. Useful for processing security alerts, as well as deploying redundant and distributed Nagios setups.
NDOUtils allows you to export current and historical data from one or more Nagios instances to a MySQL database. Experimental/beta at this point in time, but several community addons use this as one of their data sources.
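Monitoring Windows machines requires NSClient. For the download, see:
How To Monitor Remote Windows Machine Using Nagios on Linux
Monitoring Linux machines requires installing the NRPE package and the plugins.
Web server: Apache.
2. Installing Nagios
2.1 Steps for installing software on Linux
1) Unpack: tar zxvf *.gz
2) Run ./configure, which generates the Makefile.
3) Compile: make all
4) Install: make install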
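The four steps above can be sketched as a small shell helper. This is only an illustration: build_from_source and run are hypothetical names, and the sketch assumes the archive unpacks into a directory named after the tarball (as nagios-3.2.1.tar.gz does). By default it runs in dry-run mode and only prints the commands.

```shell
#!/bin/sh
# Sketch of the unpack / configure / compile / install cycle.
# build_from_source and run are hypothetical helpers, not part of Nagios.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.

DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# The body is a subshell, so the cd does not leak into the caller.
build_from_source() (
    tarball="$1"                    # e.g. nagios-3.2.1.tar.gz
    srcdir="${tarball%.tar.gz}"     # assumed name of the unpacked directory
    run tar zxvf "$tarball" &&
        run cd "$srcdir" &&
        run ./configure &&
        run make all &&
        run make install
)

build_from_source nagios-3.2.1.tar.gz
```

Set DRY_RUN=0 to actually execute the steps. The chain stops at the first step that fails, so a broken ./configure never reaches make install.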
2.2 Installation order:
1) Install Nagios
... ...
Creating sample config files in sample-config/ ...
*** Configuration summary for nagios 3.2.1 03-09-2010 ***:
General Options:
Nagios executable: nagios
Nagios user/group: nagios,nagios
Command user/group: nagios,nagios
Embedded Perl: no
Event Broker: yes
Install ${prefix}: /usr/local/nagios
Lock file: ${prefix}/var/nagios.lock
Check result directory: ${prefix}/var/spool/checkresults
Init directory: /etc/rc.d/init.d
Apache conf.d directory: /etc/httpd/conf.d
Mail program: /bin/mail
Host OS: linux-gnu
Web Interface Options:
CGI URL: cgi-bin/
Traceroute (used by WAP): /bin/traceroute
Review the options above for accuracy. If they look okay,
type 'make all' to compile the main program and CGIs.
#groupadd nagios
#useradd -g nagios nagios
#passwd nagios
[root@Dave nagios-3.2.1]# cd ./base
[root@Dave base]# pwd
[root@Dave base]# make all
If the main program and CGIs compiled without any errors, you
can continue with installing Nagios as follows (type 'make'
without any arguments for a list of all possible options):
make install
- This installs the main program, CGIs, and HTML files
make install-init
- This installs the init script in /etc/rc.d/init.d
make install-commandmode
- This installs and configures permissions on the
directory for holding the external command file
make install-config
- This installs *SAMPLE* config files in /usr/local/nagios/etc
You'll have to modify these sample files before you can
use Nagios. Read the HTML documentation for more info
on doing this. Pay particular attention to the docs on
object configuration files, as they determine what/how
things get monitored!
make install-webconf
- This installs the Apache config file for the Nagios
web interface
Following the prompts above, run each of these install targets in turn:
make install
make install-init
make install-commandmode
make install-config
make install-webconf
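Verify that the program was installed correctly. Change to the installation path (here /usr/local/nagios) and check that the five directories etc, bin, sbin, share and var exist. If they do, the program was installed correctly.
Directory | Contents
bin | Nagios executables, including the nagios binary
etc | Nagios configuration files; right after installation it holds only the sample configuration files
sbin | Nagios CGI files, i.e. the files needed to run external commands
share | Nagios web page files
var | Nagios log files, the pid file, and similar files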
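The directory check described above is easy to script. The following is a minimal sketch; check_nagios_layout is a hypothetical helper name, and the default prefix is the /usr/local/nagios path used in this post.

```shell
#!/bin/sh
# check_nagios_layout is a hypothetical helper, not a Nagios tool.
# It reports whether the five expected directories exist under a prefix
# and returns non-zero if any of them is missing.
check_nagios_layout() {
    prefix="${1:-/usr/local/nagios}"
    missing=0
    for d in etc bin sbin share var; do
        if [ -d "$prefix/$d" ]; then
            echo "ok       $prefix/$d"
        else
            echo "MISSING  $prefix/$d"
            missing=1
        fi
    done
    return "$missing"
}

check_nagios_layout /usr/local/nagios || echo "Nagios install looks incomplete"
```

A missing directory usually means that one of the make install targets was skipped.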
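2) Install the plugins:
make install
3) Install the Apache web server
1. Unpack and configure: tar zxvf httpd-*.tar.gz ; ./configure --prefix=/usr/local/apache
2. Compile and install: make ; make install
After installation, run /usr/local/apache/bin/apachectl -t to check the configuration syntax and confirm that Apache installed correctly.
4) Get Nagios Addons. On the machines to be monitored, install NRPE (Linux) or NSClient (Windows); this is explained in the configuration section below.
make install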
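3. Preparation before configuring Nagios
1. Add the system account nagios. This step was already done when installing Nagios.
# groupadd nagios
# useradd -g nagios nagios
# passwd nagios
2. Change the directory owner and group: chown -R nagios.nagios /usr/local/nagios. Note that on some Unix/Linux versions the separator between user and group is ":" rather than ".", in which case the command is chown -R nagios:nagios /usr/local/nagios.
3. sendmail. Fault alerts are sent through sendmail, so this package must be working. Most Linux systems ship with sendmail, so we only need to start it.
4. Configuring Nagios
4.1 Apache configuration: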
A sample Apache config file snippet is created when you run the configure script - you can find the sample config file (named httpd.conf) in the sample-config/ subdirectory of the Nagios distribution. You will need to add the contents of this file to your Apache configuration files before you can access the Nagios web interface. The instructions found below detail how to manually add the appropriate configuration entries to Apache.
Configure Aliases and Directory Options For The Web Interface
First you'll need to create appropriate entries for the Nagios web interface (HTML and CGIs) in your web server config file. Add the following snippet to your web server configuration file (i.e. httpd.conf), changing it to match any directory differences on your system.
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
<Directory "/usr/local/nagios/sbin">
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd.users
    Require valid-user
</Directory>
Alias /nagios /usr/local/nagios/share
<Directory "/usr/local/nagios/share">
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd.users
    Require valid-user
</Directory>
Note: The default Nagios installation expects to find the HTML files and CGIs at the /nagios/ and /nagios/cgi-bin/ URL paths on your web server, respectively, matching the aliases above.
Important! If you are installing Nagios on a multi-user system, you may want to use CGIWrap to provide additional security between the CGIs and the external command file. If you decide to use CGIWrap, the ScriptAlias you'll end up using will most likely be different from that mentioned above. More information on doing this can be found in the Nagios documentation.
Restart The Web Server
Once you've finished editing the Apache configuration file, you'll need to restart the web server with a command like this...
/etc/rc.d/init.d/httpd restart
Configure Web Authentication
Once you have installed the web interface properly, you'll need to specify who can access the Nagios web interface.
If you haven't done so already, you'll need to add the appropriate entries to your web server config file to enable basic authentication for the CGI and HTML portions of the Nagios web interface. Instructions for doing so can be found in the section above.
Now that you've configured your web server to require authentication for the Nagios web interface, you'll need to specify who has access. This is done by using the htpasswd command supplied with Apache.
Running the following command will create a new file called htpasswd.users in the /usr/local/nagios/etc directory. It will also create a username/password entry for nagiosadmin. You will be asked to provide a password that will be used when nagiosadmin authenticates to the web server.
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
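Continue adding more users until you've created an account for everyone you want to access the CGIs. Use the following command to add additional users, replacing <username> with the actual username you want to add. Omit the -c option here, since it would overwrite the existing file.
htpasswd /usr/local/nagios/etc/htpasswd.users <username>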
Okay, so you're done with the first part of what needs to be done. At this point you should be prompted for a username and password if you point your web browser to the Nagios web interface. If you have problems getting user authentication to work at this point, read your webserver documentation for more info.
Verify Your Changes
Don't forget to check and see if the changes you made to Apache work. You should be able to point your web browser at the /nagios/ path on your server and get the web interface for Nagios. The CGIs may not display any information, but this will be remedied once you configure everything and start Nagios.
The information above is from the Nagios documentation: Setting Up The Web Interface
4.2 Nagios configuration files
Objects are all the elements that are involved in the monitoring and notification logic. Types of objects include:
Service Groups
Host Groups
Contact Groups
Time Periods
Notification Escalations
Notification and Execution Dependencies
1) /usr/local/nagios/etc/nagios.cfg -- the main Nagios configuration file
# You can specify individual object config files as shown below:
# Definitions for monitoring the local (Linux) host
# Definitions for monitoring a Windows machine
# Definitions for monitoring a router/switch
# Definitions for monitoring a network printer
The file also contains parameters that control whether notifications are sent. They are all enabled by default.
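nagios.cfg pulls in each object file through a cfg_file= line, and a commented-out line means that file is not loaded. As a quick way to see which object files are active, here is a minimal sketch; list_object_files is a hypothetical helper name.

```shell
#!/bin/sh
# list_object_files is a hypothetical helper.
# It prints the path of every active (uncommented) cfg_file= entry
# in the given main configuration file.
list_object_files() {
    sed -n 's/^[[:space:]]*cfg_file=//p' "$1"
}

NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg
if [ -r "$NAGIOS_CFG" ]; then
    list_object_files "$NAGIOS_CFG"
fi
```

If windows.cfg does not appear in the output, its cfg_file line is probably still commented out, and the Windows definitions will be ignored.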
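After changing the Nagios configuration, restart the service:
[root@Dave objects]# service nagios restart
# chkconfig nagios on
2) /usr/local/nagios/etc/cgi.cfg
cgi.cfg lists the users authorized for each part of the web interface (the authorized_for_* directives).
If there are multiple users, separate them with commas. These are the users that were added in the Apache configuration with: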
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
3) /usr/local/nagios/etc/objects/timeperiods.cfg
define timeperiod{
    timeperiod_name 24x7
    alias           24 Hours A Day, 7 Days A Week
    sunday          00:00-24:00
    monday          00:00-24:00
    tuesday         00:00-24:00
    wednesday       00:00-24:00
    thursday        00:00-24:00
    friday          00:00-24:00
    saturday        00:00-24:00
    }
define timeperiod{
    timeperiod_name none
    alias           No Time Is A Good Time
    }
define timeperiod{
    name            us-holidays
    timeperiod_name us-holidays
    alias           U.S. Holidays
    january 1               00:00-00:00 ; New Years
    monday -1 may           00:00-00:00 ; Memorial Day (last Monday in May)
    july 4                  00:00-00:00 ; Independence Day
    monday 1 september      00:00-00:00 ; Labor Day (first Monday in September)
    thursday -1 november    00:00-00:00 ; Thanksgiving (last Thursday in November)
    december 25             00:00-00:00 ; Christmas
    }
define timeperiod{
    timeperiod_name 24x7_sans_holidays
    alias           24x7 Sans Holidays
    use             us-holidays ; Get holiday exceptions from other timeperiod
    sunday          00:00-24:00
    monday          00:00-24:00
    tuesday         00:00-24:00
    wednesday       00:00-24:00
    thursday        00:00-24:00
    friday          00:00-24:00
    saturday        00:00-24:00
    }
4) /usr/local/nagios/etc/objects/commands.cfg
# 'notify-host-by-email' command definition
define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification
DDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $
    }
# 'notify-service-by-email' command definition
define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification
$HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional
Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert:
    }
5) /usr/local/nagios/etc/objects/contacts.cfg -- this file stores the contacts who receive alerts
define contact{
    contact_name                    Dave
    use                             generic-contact
    alias                           Dave
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    email                           tianlesoftware@vip.qq.com
    }
define contact{
    contact_name                    nagiosadmin
    use                             generic-contact
    alias                           Nagios Admin
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    email                           tianlesoftware@qq.com
    }
define contactgroup{
    contactgroup_name   admins
    alias               Nagios Administrators
    members             nagiosadmin,Dave
    }
The file above defines two contacts; to add more, append them in the same format. In a contactgroup, members are separated by commas. To define more contact groups, append them to the file in the same format.
Table 1. Service notification options
Notify on transition | Option
WARNING service states | w
UNKNOWN service states | u
CRITICAL service states | c
Service RECOVERY states | r
Send NO service notifications | n
Table 2. Host notification options
Notify on transition | Option
DOWN host states | d
UNREACHABLE host states | u
HOST RECOVERIES (return to UP state) | r
Send NO host notifications | n
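7) /usr/local/nagios/etc/objects/windows.cfg -- host and service definitions for monitoring Windows clients
For the parameters used in these files, refer to templates.cfg. Note that the Windows and Linux definitions differ somewhat in format.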
define host{
    use                     windows-server
    alias                   My Windows Server
    check_period            24x7
    check_interval          5
    retry_interval          1
    max_check_attempts      10
    check_command           check-host-alive
    notification_period     24x7
    notification_interval   30
    notification_options    d,u,r
    contact_groups          admins
    hostgroups              windows-servers
    }
define host{
    use                     windows-server
    alias                   My Windows
    check_period            24x7
    check_interval          5
    retry_interval          1
    max_check_attempts      10
    check_command           check-host-alive
    notification_period     24x7
    notification_interval   30
    notification_options    d,u,r
    contact_groups          admins
    hostgroups              windows-servers
    }
define hostgroup{
    hostgroup_name  windows-servers ; The name of the hostgroup
    alias           Windows Servers ; Long name of the group
    }
define service{
    use                     generic-service
    service_description     NSClient++ Version
    check_command           check_nt!CLIENTVERSION
    }
define service{
    use                     generic-service
    service_description     Uptime
    check_command           check_nt!UPTIME
    }
define service{
    use                     generic-service
    service_description     CPU Load
    check_command           check_nt!CPULOAD!-l 5,80,90
    }
define service{
    use                     generic-service
    service_description     Memory Usage
    check_command           check_nt!MEMUSE!-w 80 -c 90
    }
define service{
    use                     generic-service
    service_description     C:/ Drive Space
    check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90
    }
define service{
    use                     generic-service
    service_description     W3SVC
    check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
    }
define service{
    use                     generic-service
    service_description     Explorer
    check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
    }
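If there are several Windows PCs, just copy the relevant definitions and change host_name to that of the corresponding PC.
When installing the Windows client, some NSClient parameters also need to be set. For details see this blog post: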
How To Monitor Remote Windows Machine Using Nagios on Linux
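For more than a handful of Windows machines, the copy-and-rename step can be generated instead of typed. The sketch below is only an illustration: gen_windows_host is a hypothetical helper, the host names and addresses in the list are made up, and only one service is emitted per host to keep it short. host_name and address are standard Nagios host directives.

```shell
#!/bin/sh
# gen_windows_host is a hypothetical helper.
# It prints a host definition plus one sample service for a Windows machine.
gen_windows_host() {
    host="$1"
    ip="$2"
    cat <<EOF
define host{
    use             windows-server
    host_name       $host
    alias           $host
    address         $ip
    hostgroups      windows-servers
    }
define service{
    use                     generic-service
    host_name               $host
    service_description     Uptime
    check_command           check_nt!UPTIME
    }
EOF
}

# Hypothetical inventory: one "name address" pair per line.
while read -r name addr; do
    if [ -n "$name" ]; then
        gen_windows_host "$name" "$addr"
    fi
done <<'LIST'
winserver1 192.168.1.21
winserver2 192.168.1.22
LIST
```

Redirect the output into a file under /usr/local/nagios/etc/objects/ and reference it from nagios.cfg with a cfg_file= line. The remaining check_nt services can be added to the template in the same way.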
8) /usr/local/nagios/etc/objects/localhost.cfg -- host and service definitions for monitoring Linux clients. You can choose a different file name; just change it in nagios.cfg.
# Define a host for the local machine
define host{
    use                     linux-server
    host_name               localhost
    check_period            24x7
    check_interval          5
    retry_interval          1
    max_check_attempts      10
    check_command           check-host-alive
    notification_period     24x7
    notification_interval   120
    notification_options    d,u,r
    contact_groups          admins
    }
define host{
    use                     linux-server
    check_period            24x7
    check_interval          5
    retry_interval          1
    max_check_attempts      10
    check_command           check-host-alive
    notification_period     24x7
    notification_interval   120
    notification_options    d,u,r
    contact_groups          admins
    }
# Define an optional hostgroup for Linux machines
define hostgroup{
    hostgroup_name  linux-servers
    alias           Linux Servers
    members         localhost,
    }
# Define a service to "ping" the local machine
define service{
    use                     local-service ; Name of service template to use
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
    }
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
    use                     local-service ; Name of service template to use
    service_description     Root Partition
    check_command           check_local_disk!20%!10%!/
    }
# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
    use                     local-service ; Name of service template to use
    service_description     Current Users
    check_command           check_local_users!20!50
    }
# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.
define service{
    use                     local-service ; Name of service template to use
    service_description     Total Processes
    check_command           check_local_procs!250!400!RSZDT
    }
# Define a service to check the load on the local machine.
define service{
    use                     local-service ; Name of service template to use
    service_description     Current Load
    check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
    }
# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
    use                     local-service ; Name of service template to use
    service_description     Swap Usage
    check_command           check_local_swap!20!10
    }
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
    use                     local-service ; Name of service template to use
    service_description     SSH
    check_command           check_ssh
    notifications_enabled   0
    }
# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
    use                     local-service ; Name of service template to use
    service_description     HTTP
    check_command           check_http
    notifications_enabled   0
    }
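If there are several Linux machines, just copy the relevant definitions and change host_name to that of the corresponding machine. Linux clients need the NRPE package and the plugins installed.
9) Verification:
Run /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg to check all configuration files. If the end of the output shows
Total Warnings: 0
Total Errors: 0
the configuration is correct. If there are errors, the messages point clearly to the problem, as in the example below; fix them and run the check again.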
Error: Invalid hostgroup object directive 'membes'.
Error: Could not add object property in file '/usr/local/nagios/etc/objects/windows.cfg' on line 58.
Error processing object config files!
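The check above can gate a restart so that a broken configuration never reaches the running daemon. This is a minimal sketch under the assumption that the totals appear as "Total Warnings:" and "Total Errors:" lines, as quoted above; preflight_ok is a hypothetical helper name.

```shell
#!/bin/sh
# preflight_ok is a hypothetical helper.
# It reads `nagios -v` output on stdin and succeeds only when both the
# "Total Warnings:" and "Total Errors:" lines are present and equal to 0.
preflight_ok() {
    awk '
        /^Total Warnings:/ { w = $3; seen_w = 1 }
        /^Total Errors:/   { e = $3; seen_e = 1 }
        END { exit !(seen_w && seen_e && w == 0 && e == 0) }
    '
}

NAGIOS_BIN=/usr/local/nagios/bin/nagios
NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg
if [ -x "$NAGIOS_BIN" ]; then
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" | preflight_ok; then
        echo "configuration OK; safe to restart"
    else
        echo "configuration has warnings or errors; not restarting" >&2
    fi
fi
```

Requiring both lines to be present matters: when object parsing fails outright, as in the 'membes' example, no totals are printed at all, and the sketch treats that as a failure too.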
4.3 Configuring Nagios to send alerts to a mobile phone via Fetion
4.3.1 Installing Fetion
[root@localhost src]# tar zxvf library_linux.tar.gz
[root@localhost src]# mv libACE* libcrypto.so.0.9.8 libssl.so.0.9.8 /usr/lib
[root@localhost src]# tar zxvf fetion20090406003-linux.tar.gz
[root@localhost src]# mv install /usr/local/fetion
[root@localhost src]# chmod -R 755 /usr/local/fetion
[root@localhost src]# chown -R nagios:nagios /usr/local/fetion
[root@localhost src]# tar zxvf fetion20091117-linux.tar.gz
[root@localhost src]# cp fx/* /usr/local/fetion
[root@localhost src]# vi /etc/ld.so.conf
include ld.so.conf.d/*.conf
/usr/local/fetion #add this directory
[root@localhost src]# ldconfig
[root@localhost src]# /usr/local/fetion/fetion --mobile=138***** --pwd=*** --to=138***** --msg-utf8="test" --debug
Note: the password here is the Fetion login password. If the text message arrives, Fetion is installed correctly.
[root@localhost src]# /usr/local/fetion/fetion ## show help
[root@localhost src]# cp /usr/local/fetion/fetion /usr/bin/
4.3.2 Fetion configuration in Nagios
1) Add two command definitions to commands.cfg
define command {
    command_name notify-host-by-fetion
    command_line /usr/bin/fetion --mobile=13865997399 --pwd=woshidmm --to=$CONTACTPAGER$ --msg-utf8="Host $HOSTSTATE$ alert for $HOSTNAME$! on '$LONGDATETIME$'" $CONTACTPAGER$
    }
define command {
    command_name notify-service-by-fetion
    command_line /usr/bin/fetion --mobile=13865997399 --pwd=woshidmm --to=$CONTACTPAGER$ --msg-utf8="$HOSTADDRESS$ $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ on $LONGDATETIME$" $CONTACTPAGER$
    }
2) In contacts.cfg, add the pager option to the contact, along with the calls to the new commands.
define contact{
    contact_name                    nagiosadmin
    use                             generic-contact
    alias                           Nagios Admin
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-service-by-email,notify-service-by-fetion
    host_notification_commands      notify-host-by-email,notify-host-by-fetion
    email                           daimm@sf-express.com
    pager                           13888888888,13888888888
    }
Reload Nagios: service nagios reload
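The pager field above holds a comma-separated list, and whether the fetion client accepts such a list in --to is not confirmed here. One cautious option is a small wrapper that sends one message per number. This is a sketch only: send_fetion_alert is a hypothetical helper, it uses just the fetion flags already shown in this post, and it defaults to dry-run mode.

```shell
#!/bin/sh
# send_fetion_alert is a hypothetical wrapper around the fetion client.
# It splits a comma-separated pager list and sends one message per number.
# With DRY_RUN=1 (the default here) it only prints what would be sent.
FETION_BIN="${FETION_BIN:-/usr/bin/fetion}"
DRY_RUN="${DRY_RUN:-1}"

send_fetion_alert() {
    from="$1"       # sender's mobile number
    pass="$2"       # Fetion login password
    pagers="$3"     # e.g. "13800000001,13800000002" (made-up numbers)
    msg="$4"
    printf '%s\n' "$pagers" | tr ',' '\n' | while read -r to; do
        if [ -z "$to" ]; then
            continue
        fi
        if [ "$DRY_RUN" = "0" ]; then
            "$FETION_BIN" --mobile="$from" --pwd="$pass" --to="$to" --msg-utf8="$msg"
        else
            echo "would send to $to: $msg"
        fi
    done
}

send_fetion_alert 13800000000 secret "13800000001,13800000002" "test alert"
```

A command_line in commands.cfg could then call a script built around this function, passing $CONTACTPAGER$ as the third argument, which also keeps the password out of the Nagios object files.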
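Start the Apache service:
[root@Dave bin]# service httpd start
Start the Nagios service:
[root@Dave bin]# service nagios start
4.4 Testing:
Enter the Nagios address in IE and the management interface appears. If you have many more servers, consider using a database to manage the monitored objects.
This post covers a simple installation and deployment of Nagios. Monitoring printers, switches and similar devices requires installing the corresponding plugins as well; I will look into that when I have time. If you are interested, the Nagios documentation is worth studying.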