Category: LINUX
2011-10-21 22:38:15
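Nagios is an application for system and network monitoring. It monitors hosts and services under the conditions you define and sends alerts both when their state gets worse and when it recovers.
A brief overview of Nagios features: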
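Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.);
Monitoring of host resources (processor load, disk usage, etc.);
A simple plugin design that lets users easily write their own service checks;
Parallelized service checks;
The ability to define a network hierarchy using "parent" hosts, which lets Nagios distinguish hosts that are down from hosts that are unreachable;
Contact notifications when service or host problems occur and when they are resolved (via email, SMS, or user-defined methods);
Event handlers that run when host or service events occur, to help locate problems;
Automatic log file rotation;
Support for redundant monitoring hosts;
An optional web interface for viewing current network status, notification and problem history, log files, etc.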
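The plugin design works by convention. A plugin is any executable that prints a one-line status, optionally followed by `|` and performance data, and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal sketch of such a check written as a bash function. `check_value` and its threshold arguments are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# check_value: hypothetical minimal check following the Nagios plugin
# convention: one status line on stdout, optional performance data after
# "|", and an exit code of 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
# Usage: check_value <value> <warn> <crit>   (integers, higher is worse)
check_value() {
    local value=$1 warn=$2 crit=$3
    local re='^(0|[1-9][0-9]*)$'   # plain decimal integers, no leading zeros
    # Missing or non-numeric arguments are reported as UNKNOWN.
    if ! [[ $value =~ $re && $warn =~ $re && $crit =~ $re ]]; then
        echo "UNKNOWN - usage: check_value <value> <warn> <crit>"
        return 3
    fi
    local perf="value=$value;$warn;$crit"
    if (( value >= crit )); then
        echo "CRITICAL - value is $value | $perf"
        return 2
    elif (( value >= warn )); then
        echo "WARNING - value is $value | $perf"
        return 1
    fi
    echo "OK - value is $value | $perf"
    return 0
}
```

In a standalone plugin script, end with `check_value "$@"; exit $?` so that Nagios sees the exit code.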
nagios-3.0.6.tar.gz ------------------------- main program
nagios-plugins-1.4.13.tar.gz ---------------- plugins
nrpe_2.8.1.tar.gz --------------------------- needed for monitoring Linux hosts
nsclient 0.3.5 ------------------------------ needed for monitoring Windows hosts
Nagios server (192.168.1.176)
Monitored Linux host (192.168.1.175)
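I. Installation
Nagios server configuration
1. Prepare the prerequisite packages (I took the lazy route and installed them with yum, heh)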
yum install httpd
yum install gcc
yum install glibc glibc-common
yum install gd gd-devel
yum install mysql mysql-server mysql-devel
yum install gnutls
2. Create the users
useradd nagios
passwd nagios
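Create a group named nagcmd so that external commands can be submitted through the web interface, and add both the nagios user and the apache user to it:
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache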
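Plain `usermod -G` replaces a user's supplementary groups, while `usermod -a -G` appends to them. It is therefore worth confirming that nagios and apache really ended up in nagcmd. Here is a small sketch of that check. `in_group` is a hypothetical helper name, and it relies only on `id -nG`.

```shell
#!/usr/bin/env bash
# in_group: hypothetical helper; returns 0 if the given user is a member
# of the given group according to `id -nG`, non-zero otherwise.
in_group() {
    local user=$1 group=$2
    # id -nG prints all group names on one line, separated by spaces.
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$group"
}
```

Usage: `for u in nagios apache; do in_group "$u" nagcmd && echo "$u ok" || echo "$u is not in nagcmd"; done`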
3. Download the Nagios and plugin packages
wget
wget
wget
4. Install Nagios
tar xzf nagios-3.0.6.tar.gz
cd nagios-3.0.6
Run the Nagios configure script with the user and groups created earlier:
./configure --with-group=nagios --with-user=nagios --with-command-group=nagcmd --with-gd-lib=/usr/lib --with-gd-inc=/usr/include
Compile the Nagios source code:
make all
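Install the binaries, the init script and the sample configuration files, and set the permissions on the external command directory: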
make install
make install-init
make install-config
make install-commandmode
5. Set the mailbox that receives alert emails
vi /usr/local/nagios/etc/objects/contacts.cfg
In the nagiosadmin contact definition, change the email address to your own so that you receive the alerts.
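The same edit can be done non-interactively. The sketch below assumes the sample file still contains the placeholder address `nagios@localhost` and that GNU sed is available. `set_contact_email` is a hypothetical helper and `admin@example.com` is only an example address.

```shell
#!/usr/bin/env bash
# set_contact_email: hypothetical helper that replaces the sample
# placeholder nagios@localhost on "email" lines of contacts.cfg.
# Assumes GNU sed (-i, -E) and an address without "/", "&" or "\".
set_contact_email() {
    local addr=$1 cfg=${2:-/usr/local/nagios/etc/objects/contacts.cfg}
    sed -i -E "s/^([[:space:]]*email[[:space:]]+)nagios@localhost/\1${addr}/" "$cfg"
}
```

Usage: `set_contact_email admin@example.com`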
6. Configure the web interface
Install the Nagios web configuration file into Apache's conf.d directory:
make install-webconf
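Create a user named nagiosadmin for logging in to the Nagios web interface. Remember the password you set; together with this user name it is your login for the Nagios web page.
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Enter the password when prompted.
Restart Apache so the settings take effect: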
service httpd restart
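chown -R nagios.nagios /usr/local/nagios/etc/htpasswd.users
(Do not skip this step. If the file is not owned by nagios, many web pages cannot be opened because of permission errors. This cost me a long debugging session.)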
Edit the httpd.conf configuration file:
vi /etc/httpd/conf/httpd.conf
Append the following at the end of the file:
ScriptAlias "/nagios/cgi-bin" "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd.users
    Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd.users
    Require valid-user
</Directory>
Restart Apache:
killall httpd
service httpd restart
[root@duoduo-test /]# service httpd restart
Stopping httpd: [ OK ]
Starting httpd: [Fri Mar 26 00:51:01 2010] [warn] The ScriptAlias directive in /etc/httpd/conf/httpd.conf at line 992 will probably never match because it overlaps an earlier ScriptAlias.
[Fri Mar 26 00:51:01 2010] [warn] The Alias directive in /etc/httpd/conf/httpd.conf at line 1003 will probably never match because it overlaps an earlier Alias.
httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
[ OK ]
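Restarting httpd prints the warnings above, but they do not affect Nagios. I searched online for a long time without finding a definitive fix. The ScriptAlias and Alias warnings most likely appear because make install-webconf already defined the same aliases in /etc/httpd/conf.d/nagios.conf, which Apache loads before the block appended to the end of httpd.conf. The ServerName warning only means that ServerName is not set in httpd.conf.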
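Warnings of the "overlaps an earlier Alias" kind mean Apache saw the same URL path aliased more than once. Below is a rough sketch for spotting such duplicates across config files. `dup_aliases` is a hypothetical helper, and it only catches exact duplicate paths, not every overlap Apache checks for.

```shell
#!/usr/bin/env bash
# dup_aliases: hypothetical helper; prints URL paths declared by more than
# one Alias/ScriptAlias directive across the given Apache config files.
# Only exact duplicates are reported; quotes around the path are ignored.
dup_aliases() {
    awk 'tolower($1) == "alias" || tolower($1) == "scriptalias" {
             path = $2
             gsub(/"/, "", path)      # treat "/x" and /x as the same path
             seen[path]++
         }
         END { for (p in seen) if (seen[p] > 1) print p }' "$@"
}
```

Usage: `dup_aliases /etc/httpd/conf/httpd.conf /etc/httpd/conf.d/*.conf`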
7. Compile and install the Nagios plugins
tar -zxvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
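8. Verify the sample configuration files
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Output like the following means there are no errors. If there are errors, the output names the configuration file and line behind each one, so you only need to fix them there.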
[root@duoduo-test local]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios 3.0.6
Copyright (c) 1999-2008 Ethan Galstad ()
Last Modified: 12-01-2008
License: GPL
Reading configuration data...
Running pre-flight check on configuration data...
Checking services...
Checked 19 services.
Checking hosts...
Checked 2 hosts.
Checking host groups...
Checked 1 host groups.
Checking service groups...
Checked 0 service groups.
Checking contacts...
Checked 1 contacts.
Checking contact groups...
Checked 1 contact groups.
Checking service escalations...
Checked 0 service escalations.
Checking service dependencies...
Checked 0 service dependencies.
Checking host escalations...
Checked 0 host escalations.
Checking host dependencies...
Checked 0 host dependencies.
Checking commands...
Checked 25 commands.
Checking time periods...
Checked 5 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
[root@duoduo-test local]#
chkconfig --add nagios
chkconfig nagios on
If no errors were reported, start the Nagios service:
service nagios start
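The "check first, then start" routine can be scripted by reading the `Total Errors:` line that `nagios -v` prints, as in the output above. The sketch below uses hypothetical helper names and the paths used in this article.

```shell
#!/usr/bin/env bash
# count_config_errors: read `nagios -v` output on stdin and print the
# number from its "Total Errors:" line; returns 1 if no such line appears.
count_config_errors() {
    awk '/^Total Errors:/ { n = $3; found = 1 }
         END { if (found) print n; exit (found ? 0 : 1) }'
}

# start_if_valid: hypothetical wrapper that runs the pre-flight check and
# starts Nagios only when it reports zero errors.
start_if_valid() {
    local cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    local errors
    if ! errors=$(/usr/local/nagios/bin/nagios -v "$cfg" | count_config_errors); then
        echo "could not find 'Total Errors' in nagios -v output" >&2
        return 2
    fi
    if [ "$errors" -ne 0 ]; then
        echo "nagios -v reported $errors error(s); not starting" >&2
        return 1
    fi
    service nagios start
}
```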
9. Disable SELinux
vi /etc/sysconfig/selinux
SELINUX=disabled
Set SELinux to disabled; the setting takes effect after the system is rebooted.
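On RHEL/CentOS, /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config. An in-place `sed -i` would replace the link with a plain file unless GNU sed's `--follow-symlinks` is used. Below is a minimal sketch of the same edit. `set_selinux_disabled` is a hypothetical helper. `setenforce 0` switches to permissive mode right away, until the next reboot.

```shell
#!/usr/bin/env bash
# set_selinux_disabled: hypothetical helper that rewrites the SELINUX= line
# of an SELinux config file (SELINUXTYPE= is left alone). Assumes GNU sed;
# --follow-symlinks edits the link target instead of replacing the link.
set_selinux_disabled() {
    local cfg=${1:-/etc/sysconfig/selinux}
    sed -i --follow-symlinks -E 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
# Usage (as root):
#   set_selinux_disabled   # persistent, applies after reboot
#   setenforce 0           # permissive immediately, until reboot
```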
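10. Log in to the web interface to view Nagios (http://192.168.1.176/nagios)
Enter the nagiosadmin user name and password you set earlier, and you are in.
I also ran into two problems during configuration:
(1) CGI permission errors
Change the owner and group of /usr/local/nagios to nagios (chown -R nagios.nagios /usr/local/nagios).
(2) Pages would not display
Edit /usr/local/nagios/etc/cgi.cfg (vi /usr/local/nagios/etc/cgi.cfg)
and change use_authentication=1 to use_authentication=0. This turns off the CGIs' own authentication checks, which the comments in the sample cgi.cfg advise against.
Installation complete!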
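II. Monitoring configuration
Linux hosts
1. Configuration on the monitored host (192.168.1.175): install nrpe_2.8.1.tar.gz and the plugins package nagios-plugins-1.4.13.tar.gz
useradd nagios (create the nagios user)
passwd nagios (set its password)
wget (download the Nagios plugins)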
tar -zxvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure
make
make install
After the build, two directories, libexec and share, are created under /usr/local/nagios; check that they are there.
chown -R nagios.nagios /usr/local/nagios (change the directory owner)
2. Install NRPE
tar -zxvf nrpe_2.8.1.tar.gz
cd nrpe_2.8.1
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
vi /usr/local/nagios/etc/nrpe.cfg
Change allowed_hosts=127.0.0.1 to 192.168.1.176 (my Nagios server).
Use the IP address of your own Nagios server.
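allowed_hosts accepts a comma-separated list, so 127.0.0.1 can be kept for local tests next to the server's address. Below is a minimal sketch of the edit, assuming GNU sed. `set_allowed_hosts` is a hypothetical helper. NRPE reads nrpe.cfg when it starts, so restart the daemon after changing it.

```shell
#!/usr/bin/env bash
# set_allowed_hosts: hypothetical helper that replaces the allowed_hosts=
# line in nrpe.cfg with the given comma-separated address list.
set_allowed_hosts() {
    local hosts=$1 cfg=${2:-/usr/local/nagios/etc/nrpe.cfg}
    sed -i -E "s/^allowed_hosts=.*/allowed_hosts=${hosts}/" "$cfg"
}
```

Usage: `set_allowed_hosts 127.0.0.1,192.168.1.176`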
Start NRPE:
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Check that port 5666 is listening, and open port 5666 in the firewall:
netstat -antl | grep 5666
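The file also lists the commands that the Nagios server may ask NRPE to run. The server configuration later also calls check_swap, which is not among these defaults, so add a definition for it as well (and restart NRPE afterwards):
vi /usr/local/nagios/etc/nrpe.cfg
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%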
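When check_nrpe requests a command name, the NRPE daemon looks it up among these `command[...]` definitions and reports an error if the name is not defined. Below is a rough sketch of that lookup. It is handy for checking that every name the server passes to check_nrpe exists on the client. `nrpe_lookup` is hypothetical, prints the first matching definition, and ignores details such as include files.

```shell
#!/usr/bin/env bash
# nrpe_lookup: hypothetical helper; prints the command line that nrpe.cfg
# defines for the given name (first match, commented lines ignored).
# Returns 1 if the name is not defined.
nrpe_lookup() {
    local name=$1 cfg=${2:-/usr/local/nagios/etc/nrpe.cfg}
    # index() is used instead of a regex so "[" and "]" need no escaping.
    awk -v key="command[$name]=" '
        index($0, key) == 1 { print substr($0, length(key) + 1); found = 1; exit }
        END { exit (found ? 0 : 1) }
    ' "$cfg"
}
```

Usage: `for c in check_users check_load check_hda1 check_swap; do nrpe_lookup "$c" >/dev/null || echo "$c not defined"; done`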
Configuration on the Nagios server (192.168.1.176)
1. Install NRPE
tar -zxvf nagios-nrpe_2.12.tar.gz
cd nagios-nrpe_2.12
./configure
make all
make install-plugin
2. Test connectivity
/usr/local/nagios/libexec/check_nrpe -H <IP of the monitored host>
[root@duoduo-test local]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.175
NRPE v2.8.1
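If it returns the NRPE version number, everything is working.
If the connection is refused, first run telnet <ip> 5666, then check the iptables rules on the monitored host.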
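Where telnet is not installed, bash's built-in `/dev/tcp` redirection can do the same reachability test. The sketch below also uses coreutils `timeout`. `nrpe_port_open` is a hypothetical helper. Passing host and port as positional arguments to `bash -c` keeps them out of the command string.

```shell
#!/usr/bin/env bash
# nrpe_port_open: hypothetical telnet-free check; returns 0 if a TCP
# connection to host:port (default 5666) succeeds within 3 seconds.
nrpe_port_open() {
    local host=$1 port=${2:-5666}
    # Inside bash -c, $0 is the host and $1 the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Usage: `nrpe_port_open 192.168.1.175 && echo reachable || echo "blocked: check nrpe and iptables"`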
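3. Modify the configuration files
1) Define the check_nrpe command
Because NRPE is an add-on component outside the Nagios core, a command for it must be defined in commands.cfg: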
[root@duoduo-test local]# vi /usr/local/nagios/etc/objects/commands.cfg
Add the following at the bottom of the file:
#check nrpe
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
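Take a service whose check_command is `check_nrpe!check_load`. The text after each `!` becomes `$ARG1$`, `$ARG2$`, and so on, and `$HOSTADDRESS$` is the host's address directive. `$USER1$` comes from resource.cfg and is the libexec directory in a default install. Below is a simplified sketch of that substitution, meant only to show what Nagios ends up running. `expand_check` is hypothetical and handles just these macros.

```shell
#!/usr/bin/env bash
# expand_check: hypothetical illustration of macro expansion for the
# check_nrpe command_line above. Handles $USER1$, $HOSTADDRESS$, $ARGn$.
expand_check() {
    local check_command=$1 host_address=$2
    local user1=/usr/local/nagios/libexec        # assumed $USER1$ value
    local template='$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$'
    local -a parts
    local IFS='!'
    read -r -a parts <<< "$check_command"       # parts[0] = command name
    local line=${template//'$USER1$'/$user1}
    line=${line//'$HOSTADDRESS$'/$host_address}
    local i
    for (( i = 1; i < ${#parts[@]}; i++ )); do
        line=${line//"\$ARG$i\$"/${parts[i]}}    # $ARGi$ -> i-th argument
    done
    printf '%s\n' "$line"
}
```

For example, `expand_check 'check_nrpe!check_load' 192.168.1.175` prints `/usr/local/nagios/libexec/check_nrpe -H 192.168.1.175 -c check_load`.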
2) Register a configuration file for the monitored host
vi /usr/local/nagios/etc/nagios.cfg
Add:
cfg_file=/usr/local/nagios/etc/objects/linuxserver.cfg
You can rename linuxserver.cfg if you like. Keep the .cfg suffix by convention; files loaded through cfg_dir must end in .cfg.
Create linuxserver.cfg:
vi /usr/local/nagios/etc/objects/linuxserver.cfg
Add:
define host{
use linux-server
host_name aiyo-mailserver
alias aiyo-mailserver
address 210.51.47.213
}
define service{
use generic-service
host_name aiyo-mailserver
service_description HTTP
check_command check_http
}
define service{
use generic-service
host_name aiyo-mailserver
service_description FTP
check_command check_ftp
}
define service{
use generic-service
host_name aiyo-mailserver
service_description SSH
check_command check_ssh
}
define service{
use generic-service
host_name aiyo-mailserver
service_description SMTP
check_command check_smtp
}
define service{
use generic-service
host_name aiyo-mailserver
service_description POP3
check_command check_pop
}
define service{
use generic-service
host_name aiyo-mailserver
service_description check-swap
check_command check_nrpe!check_swap
}
define service{
use generic-service
host_name aiyo-mailserver
service_description check-load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name aiyo-mailserver
service_description check-disk
check_command check_nrpe!check_hda1
}
define service{
use generic-service
host_name aiyo-mailserver
service_description zombie_procs
check_command check_nrpe!check_zombie_procs
}
define service{
use generic-service
host_name aiyo-mailserver
service_description check-users
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name aiyo-mailserver
service_description total_procs
check_command check_nrpe!check_total_procs
}
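The service blocks above differ only in description and check command, so they can be generated rather than typed. Below is a small sketch. `gen_services` is a hypothetical helper, and descriptions must not contain `:`. Quote arguments containing `!` in single quotes in an interactive shell to avoid history expansion.

```shell
#!/usr/bin/env bash
# gen_services: hypothetical helper; prints one "define service" block per
# "description:check_command" pair for the given host.
gen_services() {
    local host=$1; shift
    local pair
    for pair in "$@"; do
        printf 'define service{\n'
        printf '    use                 generic-service\n'
        printf '    host_name           %s\n' "$host"
        printf '    service_description %s\n' "${pair%%:*}"   # before first ":"
        printf '    check_command       %s\n' "${pair#*:}"    # after first ":"
        printf '}\n'
    done
}
```

Usage: `gen_services aiyo-mailserver 'HTTP:check_http' 'SSH:check_ssh' 'check-load:check_nrpe!check_load' >> /usr/local/nagios/etc/objects/linuxserver.cfg`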
Save and exit.
This file defines the monitored host and its services.
3) Check that the configuration is valid
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
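Restart Nagios and check the web interface.
4. Monitoring MySQL
Define the check_mysql command. The check runs on the Nagios server and connects to MySQL on the monitored host over the network, so the MySQL user nagios must be allowed to connect from 192.168.1.176.
vi /usr/local/nagios/etc/objects/commands.cfg
Append at the end: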
# 'check_mysql' command definition
define command{
command_name check_mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u nagios -d nagdb
}
vi /usr/local/nagios/etc/objects/linuxserver.cfg
Add a MySQL service check:
define service{
use generic-service
host_name aiyo-mailserver
service_description mysql
check_command check_mysql
}
Check that the configuration is valid:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Restart Nagios:
killall nagios
service nagios restart
PS:
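Make Nagios and NRPE start automatically at boot. Nagios was already enabled with chkconfig earlier, so the second line is only needed if you skipped that step.
echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local
echo "/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg" >> /etc/rc.local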
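Running such echo lines twice appends duplicate entries, and rc.local then tries to start each daemon twice at boot. Below is a small idempotent alternative. `append_once` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# append_once: hypothetical helper; appends a line to a file (default
# /etc/rc.local) only if an identical line is not already present.
append_once() {
    local line=$1 file=${2:-/etc/rc.local}
    # -x: whole-line match, -F: fixed string (no regex interpretation).
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage: `append_once "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d"`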