===============================================================
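NRPE consists of two parts:
check_nrpe plugin, which resides on the monitoring host
NRPE daemon, which runs on the remote Linux host (usually the monitored machine)
As shown in the diagram above, the whole monitoring process works as follows:
When Nagios needs to monitor a service or resource on a remote Linux host:
1. Nagios runs the check_nrpe plugin and tells it what to check.
2. The check_nrpe plugin connects to the remote NRPE daemon over SSL.
3. The NRPE daemon runs the corresponding Nagios plugin to perform the check.
4. The NRPE daemon returns the check result to the check_nrpe plugin, which passes it to Nagios for processing.
Note: the NRPE daemon requires the Nagios plugins to be installed on the remote Linux host; without them the daemon cannot monitor anything.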
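Step 4 depends on the standard plugin convention: the exit status of a plugin carries the result (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), and check_nrpe hands that status on to Nagios. A minimal shell sketch of the mapping follows; nagios_state is a hypothetical helper name used only for illustration, not part of Nagios or NRPE.

```shell
#!/bin/sh
# Map a plugin exit status to the state name shown by Nagios.
# nagios_state is a hypothetical helper, not a Nagios or NRPE command.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Example: a plugin that exited with status 2.
nagios_state 2   # prints CRITICAL
```

In practice the argument would be the exit status ($?) of a check_nrpe run against the remote host.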
===========================================================================
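The monitored host needs:
Nagios plugins: nagios-plugins-1.4.9.tar.gz
nrpe-2.8.1.tar.gz
The monitoring host needs:
nagios-2.0rc2.tar.gz
nrpe-2.8.1.tar.gz (for the check_nrpe plugin; install it with make install-plugin)
The monitored host does not need the check_nrpe plugin.
openssl-0.9.7i.tar.gz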
==========================================================================
Nagios installation
1. Required software
nagios-2.0rc2.tar.gz
nagios-plugins-1.4.tar.gz
imagepak-base.tar.gz
nrpe-2.5.1.tar.gz
=========================================================
2. Installation
2.1 Installing Nagios
tar -xvzf nagios-2.0rc2.tar.gz
useradd nagios -d /usr/local/nagios
chmod 755 /usr/local/nagios
cd nagios-2.0rc2
./configure --prefix=/usr/local/nagios --with-gd-lib=/usr/local/lib --with-gd-inc=/usr/local/include
make all
make install
make install-init ##This installs the init script in /usr/local/etc/rc.d
make install-commandmode
make install-config ## Installs the initial configuration files into /usr/local/nagios/etc
Note: the make install-init step may fail; if it does, adding a root group is enough to get past it.
addgroup root
===================================================================
Installing the monitoring modules
2.2 Installing nagios-plugins
tar -xvzf nagios-plugins-1.4.tar.gz
mkdir /usr/local/nagios-plugins
cd nagios-plugins-1.4
./configure --prefix=/usr/local/nagios-plugins
make all
make install
After installation, /usr/local/nagios-plugins contains a libexec directory. Move the whole directory into /usr/local/nagios:
mv /usr/local/nagios-plugins/libexec /usr/local/nagios
===================================================================
2.3 Installing imagepak-base
tar -xvzf imagepak-base.tar.gz
Unpacking produces a directory named base.
cp -R base /usr/local/nagios/share/images/logos
2.4 The installation is now complete
===================================================================
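Nagios configuration:
1. Configure Apache
Append the following to the Apache configuration file httpd.conf:
ScriptAlias /nagios/cgi-bin/ /usr/local/nagios/sbin/
<Directory "/usr/local/nagios/sbin/">
AllowOverride AuthConfig
Options ExecCGI
Order allow,deny
Allow from all
</Directory>
Alias /nagios/ /usr/local/nagios/share/
<Directory "/usr/local/nagios/share/">
Options None
AllowOverride AuthConfig
Order allow,deny
Allow from all
</Directory>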
===============================================================
2. Set access permissions
2.1 In the /usr/local/nagios/share directory:
vi .htaccess
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/.htpasswd
require valid-user
2.2 In the /usr/local/nagios/sbin directory:
vi .htaccess
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/.htpasswd
require valid-user
2.3 /usr/local/apache/bin/htpasswd -c /usr/local/nagios/etc/.htpasswd nagios
The Apache path depends on where Apache was installed; the point is to use the htpasswd command to create the user name and password.
==================================================================
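3. Configure Nagios
3.1 /usr/local/nagios/etc contains the Nagios configuration templates, named *.cfg-sample. Copy every .cfg-sample file to a matching .cfg file.
For example: cp nagios.cfg-sample nagios.cfg
Repeat this for all of the sample files.
3.2 vi dependencies.cfg (2.0 does not generate this file, so create it yourself)
Then save it. (In 1.2, replace the original dependencies.cfg with an empty file; otherwise Nagios reports an error.)
3.3 Edit minimal.cfg and comment out every command definition in it:
vi /usr/local/nagios/etc/minimal.cfg
Edit cgi.cfg:
Change use_authentication=1 to use_authentication=0, which disables authentication. Otherwise some pages are not displayed.
3.4 Then check the configuration files for errors:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If the output ends with
Total Warnings: 0
Total Errors: 0
the configuration is valid.
If errors are reported, one of the .cfg files has a problem.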
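The check in 3.4 can also be scripted. A minimal sketch, assuming the summary lines look like the two quoted above; config_is_clean is a hypothetical helper name, and the sample text piped into it stands in for real nagios -v output.

```shell
#!/bin/sh
# config_is_clean (hypothetical helper) reads the text printed by
# "nagios -v" on stdin and succeeds only if both summary lines are
# present and both totals are zero.
config_is_clean() {
    awk '
        /^Total Warnings:/ { w = $3; seen_w = 1 }
        /^Total Errors:/   { e = $3; seen_e = 1 }
        END { exit !(seen_w && seen_e && w == 0 && e == 0) }
    '
}

# Example with sample text in place of the real verification output.
printf 'Total Warnings: 0\nTotal Errors:   0\n' | config_is_clean && echo "config OK"
```

On a real installation the input would come from piping the nagios -v command shown in 3.4 into config_is_clean.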
======================================================================
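3.5 Start the daemon
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
or /etc/init.d/nagios start
3.6
(If some pages cannot be viewed, remove the leading # from the authorized_* options in cgi.cfg.)
With these settings in place, the basic Nagios configuration is complete.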
================================================================
Installing NRPE and deploying monitoring commands
Installing and using nrpe
1. Configuring the remote host (FreeBSD)
1.1 Install openssl
#./config --prefix=/usr/local/nagios/openssl
#make
#make install
1.2 Install and configure nrpe
tar -zxvf nrpe-2.5.1.tar.gz
#./configure --enable-ssl --with-ssl-lib=/usr/local/nagios/openssl/lib --with-kerberos-inc=/usr/local/nagios/openssl/include --enable-command-args
#make all
#mkdir /usr/local/nagios/etc
#mkdir /usr/local/nagios/bin
#mkdir /usr/local/nagios/libexec
#chown -R nagios:nagios /usr/local/nagios
#cp src/check_nrpe /usr/local/nagios/libexec
#cp nrpe.cfg /usr/local/nagios/etc
#cp src/nrpe /usr/local/nagios/bin
#vi /usr/local/nagios/etc/nrpe.cfg
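Set this to the IP addresses you allow:
allowed_hosts=127.0.0.1,10.0.153.57
(10.0.153.57 is the Nagios server)
Set this to the services you want to monitor:
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
This checks disk space: a warning (yellow) is raised when free space falls to $ARG1$%, and a critical error (red) when it falls to $ARG2$%; the partition to check follows -p.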
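When check_nrpe passes arguments with -a, the NRPE daemon fills them into the $ARG1$, $ARG2$, ... placeholders of the matching command line before running it. The sketch below only imitates that substitution so the result can be inspected; expand_args is a hypothetical helper, and the values 20%, 10% and / are example thresholds, not values from this setup.

```shell
#!/bin/sh
# expand_args (hypothetical helper): replace $ARG1$, $ARG2$, ... in a
# command template with the arguments that follow it.
# Arguments containing "|" or "&" are not handled by this simple sketch.
expand_args() {
    line=$1
    shift
    i=1
    for arg in "$@"; do
        # Substitute every literal $ARGi$ with the i-th argument.
        line=$(printf '%s\n' "$line" | sed "s|\\\$ARG${i}\\\$|${arg}|g")
        i=$((i + 1))
    done
    printf '%s\n' "$line"
}

expand_args '/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' '20%' '10%' /
# prints: /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
```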
1.3 Start nrpe; it listens on port 5666
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
==================================================================
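Simple Nagios configuration
In /usr/local/nagios/etc, create the two configuration files you need;
the main configuration file (nagios.cfg) already exists.
commands.cfg ..........command definitions
financeweb.cfg.........service definitions
In the main configuration file nagios.cfg, comment out the following configuration files:
checkcommands.cfg
misccommands.cfg
minimal.cfg
==================================
On the monitoring host (172.18.3.205), set up two files:
the command file commands.cfg and the service file financeweb.cfg
The contents of commands.cfg are shown below; add entries as needed.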
[root@localhost etc]# cat commands.cfg
# 'notify-by-email' command definition
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
# 'host-notify-by-email' command definition
define command{
command_name host-notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "Host $HOSTSTATE$ alert for $HOSTNAME$!" $CONTACTEMAIL$
}
############################################################################################
define command{
command_name check_nrpe
command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -n -c $ARG1$ -a $ARG2$
}
#define command{
#command_name check_nrpe
#command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
# }
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
}
define command{
command_name check_nrpe_load
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -n -c $ARG1$ -a $ARG2$ $ARG3$
}
define command{
command_name check_nrpe_swap
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -n -c $ARG1$ -a $ARG2$ $ARG3$
}
=======================================================================
The contents of financeweb.cfg are as follows
[root@localhost etc]# cat financeweb.cfg
# Define the monitoring time period
define timeperiod{
timeperiod_name 24x7
alias Financeweb,7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
##########################################################################
# Define the contact group and its members
define contactgroup{
contactgroup_name LDF-SYS
alias LDF-SYS Administrators
members Financeweb ;,PTS,kabu
}
# Define the contact
define contact{
contact_name Financeweb
alias Finance
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email root
}
# Define the host group and the hosts in it
define hostgroup{
hostgroup_name Financewebgroup
alias Financeweb Servers
members 121
}
# Define the host type (template)
define host{
name financeweb-host
notifications_enabled 1
active_checks_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
register 0
}
##################################################################################
# Define the service group and the services in it.
#define servicegroup{
# servicegroup_name services
# alias Mysql Http services
# members 120,LD_PING
# }
################################################################################
# Define the service type (template)
define service{
name financeweb-service
passive_checks_enabled 0
active_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 1
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
register 0
}
# Using 120 as an example
#############################################
#10.0.153.120 DB2
#############################################
define host{
use financeweb-host
host_name 121
alias DB2
address 172.18.3.207
check_command check-host-alive
max_check_attempts 10
notification_interval 1
notification_period 24x7
notification_options d,u,r
contact_groups LDF-SYS
}
define service{
use financeweb-service
host_name 121
service_description check-load
check_command check_nrpe_load!check_load!5,8,10!20,25,30
max_check_attempts 5
normal_check_interval 1
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
notification_interval 1;960
contact_groups LDF-SYS
}
define service{
host_name 121
service_description check-swap
check_command check_nrpe_swap!check_swap!90!80
max_check_attempts 5
normal_check_interval 1
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups LDF-SYS
}
define service{
host_name 121
service_description check-gw-proce
check_command check_nrpe!PR_check_proce!httpd
max_check_attempts 5
normal_check_interval 1
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups LDF-SYS
}
#define service{
# use financeweb-service
# host_name 121
# service_description check-load_1
# check_command check_nrpe!check_load
# max_check_attempts 5
# normal_check_interval 1
# retry_check_interval 2
# check_period 24x7
# notification_interval 10
# notification_period 24x7
# notification_options w,u,c,r
# notification_interval 1;960
# contact_groups LDF-SYS
# }
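In the service definitions above, check_command joins the command name and its arguments with "!"; Nagios maps the parts after the name to $ARG1$, $ARG2$, and so on in the matching command_line from commands.cfg. The sketch below only prints that mapping for one of the entries above; split_check_command is a hypothetical helper name.

```shell
#!/bin/sh
# split_check_command (hypothetical helper): show which part of a
# check_command value becomes the command name and which parts
# become $ARG1$, $ARG2$, ...
split_check_command() {
    printf '%s\n' "$1" | awk -F'!' '{
        print "command_name = " $1
        for (i = 2; i <= NF; i++)
            print "$ARG" (i - 1) "$ = " $i
    }'
}

split_check_command 'check_nrpe_load!check_load!5,8,10!20,25,30'
# prints:
# command_name = check_nrpe_load
# $ARG1$ = check_load
# $ARG2$ = 5,8,10
# $ARG3$ = 20,25,30
```

With the check_nrpe_load command_line from commands.cfg and the host address 172.18.3.207, this works out to check_nrpe -H 172.18.3.207 -n -c check_load -a 5,8,10 20,25,30.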
================================================================
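Note: commands.cfg defines the names check_nrpe, check-host-alive, check_nrpe_load,
check_nrpe_swap and so on. These are only names and can be chosen freely; the commands
themselves call the check_nrpe and check_ping plugins.
PR_check_proce is a custom plugin: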
[root@localhost libexec]# cat PR_check_proce
#!/bin/sh
name=`basename "$0"`
# Count matching processes, excluding grep, this script, and supervise.
process=`ps aux | grep -w "$1" | grep -vE "grep|$name|supervise" | wc -l | tr -d ' '`
if [ "$process" -ge 1 ]
then
echo "$1 process running...."
exit 0
else
echo "$1 process does not exist!"
exit 2
fi
========================================================
On the client (172.18.3.207), only one file needs to be defined: nrpe.cfg
[root@localhost etc]# cat nrpe.cfg|grep -Ev "#|^$"
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
dont_blame_nrpe=1
debug=0
command_timeout=60
command[check_swap]=/usr/local/nagios/libexec/check_swap -w $ARG1$ -c $ARG2$
command[PR_check_proce]=/usr/local/nagios/libexec/PR_check_proce $ARG1$
command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
=====================================================================
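Note: the monitoring host defines the command as /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -n -c $ARG1$ -a $ARG2$
so nrpe on the client must be started with the matching -n option (no SSL):
/usr/local/nagios/bin/nrpe -n -c /usr/local/nagios/etc/nrpe.cfg -d
Because -a passes arguments, dont_blame_nrpe in nrpe.cfg must also be changed from 0 to 1 for this to work.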
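A quick way to confirm the setting before starting the daemon is to look for an uncommented dont_blame_nrpe=1 line. A minimal sketch; args_allowed is a hypothetical helper that reads the file content on stdin, and the sample lines stand in for the real nrpe.cfg.

```shell
#!/bin/sh
# args_allowed (hypothetical helper): succeed if the text on stdin
# contains an uncommented line that sets dont_blame_nrpe to 1.
args_allowed() {
    grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1[[:space:]]*$'
}

# Example with sample lines in place of the real file.
printf 'debug=0\ndont_blame_nrpe=1\n' | args_allowed && echo "command arguments enabled"
```

Against the real file this would be run as args_allowed < /usr/local/nagios/etc/nrpe.cfg.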
=========================================================
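Check that the configuration files are correct:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Start Nagios:
/etc/init.d/nagios start
Start nrpe:
/usr/local/nagios/bin/nrpe -n -c /usr/local/nagios/etc/nrpe.cfg -d
To be able to stop the Nagios service from the web page,
check the following settings: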
nagios.cfg
check_external_commands=1
cgi.cfg
default_user_name=nagios
use_authentication=1
authorized_for_system_information=nagiosadmin,theboss,jdoe,nagios
authorized_for_configuration_information=nagiosadmin,jdoe,nagios
authorized_for_system_commands=nagiosadmin,nagios
authorized_for_all_services=nagiosadmin,nagios
authorized_for_all_hosts=nagiosadmin,nagios
authorized_for_all_service_commands=nagiosadmin,nagios
authorized_for_all_host_commands=nagiosadmin,nagios
httpd.conf
User nagios
Group nagios
Configure as shown above.
===============================