Category: System Operations
2009-08-15 00:53:35
Monitoring Linux Servers with Nagios, with Fetion Robot Alerts
Platform:
Monitoring server: RHEL5 (192.168.0.9) + nagios-3.0.5 + nagios-plugins-1.4.11 + nrpe_2.8.1
Monitored host: RHEL5 (192.168.0.10) + nagios-plugins-1.4.11 + nrpe_2.8.1
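As an operations engineer, I need to know the state of the company's servers at all times. My biggest worry is that an important production system goes down, or stops serving the network, without my knowing about it; sometimes a failed service or host goes unnoticed for a long time. Holidays are especially nerve-racking, because so many things can go wrong on a server. That is why we look for ways to monitor servers,
and why monitoring software has developed so quickly. Earlier there were tools such as mon; today the most popular choice is the open-source Nagios, although configuring it is still somewhat tedious.
Below I walk through my own setup.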
I. Configuring the monitoring server
Install Nagios:
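1> Create the nagios user and groups first; otherwise the build will run into problems.
useradd -m nagios
passwd nagios                  # set a password
groupadd nagcmd                # create the command group
usermod -a -G nagcmd nagios    # add nagios to the group
usermod -a -G nagcmd daemon    # add the daemon user to the group as well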
2> Unpack, configure and build Nagios:
tar -xzvf nagios-3.0.3.tar.gz
cd nagios-3.0.3
./configure --prefix=/usr/local/nagios --with-command-group=nagcmd --with-gd-lib=/usr/lib --with-gd-inc=/usr/include
(alternative: ./configure --with-command-group=nagcmd --with-gd-lib=/usr/local/lib --with-gd-inc=/usr/include/)
make all
make install
make install-init
make install-config
make install-commandmode
3> Install nagios-plugins
cd nagios-plugins-1.4.11
./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
(alternative: ./configure --with-openssl --with-nagios-user=nagios --with-nagios-group=nagios)
make
make install
4> Install NRPE
tar -zxvf nagios-nrpe_2.8.1.orig.tar.gz
cd nagios-nrpe_2.8.1
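./configure    # SSL support is detected automatically, but it is better to specify it explicitly:
(./configure --with-kerberos-inc=/usr/include/ --with-nrpe-user=nagios --with-nrpe-group=nagios --enable-ssl --with-ssl=/usr/lib)
# If make fails later, configure with the following options instead (OpenSSL must already be installed):
./configure --enable-ssl --with-ssl-lib=/usr/lib/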
make all
make install-plugin
make install-daemon
make install-daemon-config
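Edit nrpe.cfg:
allowed_hosts: add the IP addresses that are allowed to connect; otherwise the checks will fail.
Start the daemon:
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
# or, to start it at boot, add the same line to rc.local:
vi /etc/rc.d/rc.local
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d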
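The allowed_hosts edit in nrpe.cfg can also be scripted. The following is only a minimal sketch: set_allowed_hosts is a hypothetical helper (it is not part of NRPE), and it assumes the file uses the plain allowed_hosts=... form with no spaces around the equals sign.

```shell
#!/bin/sh
# Sketch: set the allowed_hosts line in an nrpe.cfg-style file.
# set_allowed_hosts is a hypothetical helper name, not an NRPE tool.
set_allowed_hosts() {
    cfg=$1      # path to nrpe.cfg
    hosts=$2    # comma-separated IP list, e.g. 127.0.0.1,192.168.0.9
    if grep -q '^allowed_hosts=' "$cfg"; then
        # Replace the existing line; go through a temp file for portability.
        sed "s/^allowed_hosts=.*/allowed_hosts=$hosts/" "$cfg" > "$cfg.tmp" &&
            mv "$cfg.tmp" "$cfg"
    else
        # No such line yet: append one.
        echo "allowed_hosts=$hosts" >> "$cfg"
    fi
}
```

For example, set_allowed_hosts /usr/local/nagios/etc/nrpe.cfg 127.0.0.1,192.168.0.9 would allow the local host and the monitoring server; nrpe still has to be restarted afterwards for the change to take effect.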
Verify NRPE:
netstat -an | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.8.1
# Test from the monitoring server
/usr/local/nagios/libexec/check_nrpe -H 192.168.0.21
NRPE v2.8.1
# Common errors
1. /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
CHECK_NRPE: Error - Could not complete SSL handshake.
Set allowed_hosts=192.168.0.20,127.0.0.1,192.168.0.99 in nrpe.cfg, then kill the nrpe process and start it again.
2. /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
Connection refused by host
The nrpe process is not running.
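The two failure cases above can be turned into a small triage helper. This is a sketch only: diagnose_nrpe is a hypothetical function, and it matches just the message strings quoted in this post.

```shell
#!/bin/sh
# Sketch: map check_nrpe output to the fixes described above.
# diagnose_nrpe is a hypothetical helper; it only knows the messages shown in this post.
diagnose_nrpe() {
    case "$1" in
        *"Could not complete SSL handshake"*)
            echo "check allowed_hosts in nrpe.cfg, then restart nrpe" ;;
        *"Connection refused"*)
            echo "nrpe is not running; start the daemon" ;;
        "NRPE v"*)
            echo "ok" ;;
        *)
            echo "unknown" ;;
    esac
}
```

It would be called with the captured output, for example diagnose_nrpe "$(/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 2>&1)".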
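5> Install Apache and add the web interface (not needed if you have already compiled it yourself)
vi httpd.conf
Reference: (if you run make install-webconf when building Nagios, a nagios.conf file is generated automatically; its contents serve as an example)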
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
# directory containing the CGI files
<Directory "/usr/local/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
# path to the password file
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
Alias /nagios /usr/local/nagios/share
# directory containing the Nagios web pages
<Directory "/usr/local/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios Access"
# path to the password file
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
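Note: for security, Apache is often run as a dedicated web user such as www,
with a dedicated web directory; in that case Nagios has to be installed under that web directory.
Pay attention to the permissions here:
# ll
nagios www 4096 Jul 2 09:48 nagios
This must match the user Apache runs as; otherwise accessing /nagios/ returns an error.
The files inside the nagios directory keep nagios as both owner and group; that does not change.
When configuring a virtual host, pay attention to the paths.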
Create the login user and password:
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
/usr/local/apache/bin/apachectl -t    # checks the configuration syntax (useful in production, where you cannot simply restart)
6> Define the external check_nrpe command in commands.cfg
vi /usr/local/nagios/etc/objects/commands.cfg
# Add:
#check nrpe
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
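To see what Nagios ends up running, it helps to expand the macros by hand. For a service whose check_command is check_nrpe!check_swap, $ARG1$ is the text after the first "!", and $HOSTADDRESS$ is the host's address. The sketch below mimics that expansion; expand_check_nrpe is a hypothetical helper, and it assumes $USER1$ points at /usr/local/nagios/libexec, which matches the install prefix used in this post (the real value is set in resource.cfg).

```shell
#!/bin/sh
# Sketch: show the command line produced by the check_nrpe command definition.
# expand_check_nrpe is a hypothetical helper; Nagios does this expansion internally.
expand_check_nrpe() {
    check_command=$1                        # e.g. check_nrpe!check_swap
    host_address=$2                         # value of $HOSTADDRESS$
    user1=${3:-/usr/local/nagios/libexec}   # assumed value of $USER1$
    rest=${check_command#*!}                # drop the command name and the first "!"
    arg1=${rest%%!*}                        # $ARG1$ ends at the next "!", if any
    echo "$user1/check_nrpe -H $host_address -c $arg1"
}
```

Running the printed command by hand on the monitoring server is a quick way to test a remote check before Nagios schedules it.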
7> Register the Linux host to be monitored
vi /usr/local/nagios/etc/nagios.cfg
# Add the following line among the other cfg_file entries:
cfg_file=/usr/local/nagios/etc/objects/mylinux.cfg
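If the cfg_file entry is added by a script rather than by hand, it is worth making the step idempotent so that re-running it does not duplicate the line. A minimal sketch, where add_cfg_file is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: append a cfg_file= entry to nagios.cfg only if it is not already present.
# add_cfg_file is a hypothetical helper name.
add_cfg_file() {
    main_cfg=$1      # path to nagios.cfg
    object_cfg=$2    # path to the object file to include
    line="cfg_file=$object_cfg"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$line" "$main_cfg" || echo "$line" >> "$main_cfg"
}
```

After changing the configuration, /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg runs Nagios' own sanity check on it before the service is restarted.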
8> Create mylinux.cfg with the objects to monitor
vi /usr/local/nagios/etc/objects/mylinux.cfg
define host{
use linux-server
host_name mylinux
alias mylinux
address 192.168.0.21 ; client IP, i.e. the IP of the monitored host
}
define service{
use generic-service
host_name mylinux
service_description check-swap
check_command check_nrpe!check_swap
}
define service{
use generic-service
host_name mylinux
service_description check-load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name mylinux
service_description check-disk
check_command check_nrpe!check_hda1
}
define service{
use generic-service
host_name mylinux
service_description check-users
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name mylinux