2013-10-08 14:41:27
Nagios + Centreon Deployment and Usage Guide
I. System Environment
1. LAMP (see http://blog.chinaunix.net/uid-8319462-id-3406527.html)
After installing MySQL, create a database named nagios.
2. Nagios + NRPE (see http://blog.chinaunix.net/uid-8319462-id-3416628.html)
3. NDOUtils
It uses the nagios database created above.
4. RRDtool
5. Centreon
II. Installing Nagios
1. Add the nagios user
groupadd nagios
useradd -g nagios -d /usr/local/nagios -s /bin/false nagios
-d: sets the login (home) directory
-s: sets the login shell
2. Build and install
tar zxvf nagios-3.2.0.tar.gz
cd nagios-3.2.0
./configure --prefix=/usr/local/nagios
make all
make install
make install-init
make install-commandmode
make install-config
3. Install the Nagios plugins
tar zxvf nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14
./configure --prefix=/usr/local/nagios/
(On AS4, add the option --enable-redhat-pthread-workaround.)
make
make install
chown -R nagios:nagios /usr/local/nagios
chmod 755 /usr/local/nagios
4. Integrate with Apache
To require a login, append the following to the end of httpd.conf:
Alias /nagios/cgi-bin/images/ "/usr/local/nagios/share/images/"
<Directory "/usr/local/nagios/share/images">
AllowOverride None
Options None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
ScriptAlias /nagios/cgi-bin/ "/usr/local/nagios/sbin/"
<Directory "/usr/local/nagios/sbin">
AllowOverride None
Options None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
Alias /nagios/ "/usr/local/nagios/share/"
<Directory "/usr/local/nagios/share">
AllowOverride None
Options None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
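/usr/local/apache/bin/htpasswd -c /usr/local/nagios/etc/htpasswd nagios
Here nagios is the login account; you will be prompted to enter its password twice.
6. Fix permissions
Check the ownership of the htpasswd file and change it to user nagios, group nagios.
chmod 755 /usr/local/nagios
Otherwise the web page will report:
You don't have permission to access
Run /usr/local/apache/bin/apachectl -t to check the syntax of httpd.conf, and restart Apache once it reports OK.
Log in through the domain name and enter the user name and password in the dialog that appears (when logging in by IP, the authentication dialog does not appear).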
cd /usr/local/nagios/etc
vi nagios.cfg
vi resource.cfg
vi cgi.cfg
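In cgi.cfg, change use_authentication=1 to use_authentication=0 to turn authentication off; otherwise some pages will not be displayed.
7. Configure Nagios
vi commands.cfg
The key definitions are as follows: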
# 'notify-host-by-email' command definition
define command{
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/local/bin/sendEmail -f vip@east.net -t $CONTACTEMAIL$ -s smtp.east.net -u "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" -m "Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState:$HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" -xu vip -xp 13579
}
# 'notify-service-by-email' command definition
define command{
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/local/bin/sendEmail -f vip@east.net -t lvbin@east.net -s smtp.east.net -u "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" -m "Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState:$HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" -xu vip -xp 13579
}
# 'notify-host-by-sms' command definition
define command{
command_name notify-host-by-sms
command_line /usr/local/nagios/fetion2009/fetion --config=/soft/install/login.conf --index=1 --to=13426201234 --msg-utf8="HOST $HOSTADDRESS$ $HOSTSTATE$"
}
# 'notify-service-by-sms' command definition
define command{
command_name notify-service-by-sms
command_line /usr/local/nagios/fetion2009/fetion --config=/soft/install/login.conf --index=1 --to=$CONTACTPAGER$ --msg-utf8="$SERVICEDESC$ $HOSTADDRESS$ $SERVICESTATE$"
}
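To preview what a notification body looks like before wiring it into Nagios, the printf "%b" part of the command_line can be tried on its own. This is only a sketch: render_body is a hypothetical helper, its three arguments stand in for the $HOSTNAME$, $HOSTSTATE$ and $HOSTADDRESS$ macros, and the sample values are invented.

```shell
#!/bin/sh
# Hypothetical helper: render a host notification body the same way the
# command_line does, using printf "%b" to expand the \n escapes.
# $1, $2, $3 stand in for $HOSTNAME$, $HOSTSTATE$ and $HOSTADDRESS$.
render_body() {
    printf "%b" "***** Nagios *****\n\nHost: $1\nState: $2\nAddress: $3\n"
}

# Sample values, invented for this demonstration.
render_body web1 DOWN 192.168.0.17
```

Because Nagios requires command_line to sit on a single line, the \n escapes plus "%b" are what turn that one line into a multi-line message.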
vi hosts.cfg
vi services.cfg
8. Start automatically at boot
vi /etc/rc.d/rc.local
Add:
/usr/local/apache/bin/apachectl start
/etc/init.d/nagios start
9. Start Nagios
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
/usr/local/etc/rc.d/nagios start
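10. Add the Fetion SMS service
tar zxvf fetion2009.tar.gz
tar zxvf library_linux.tar.gz
cp library_linux/*.* fetion2009/lib/
Test (from a shell, replace $CONTACTPAGER$ with a real mobile number, since the macro is only expanded by Nagios): /usr/local/nagios/fetion2009/fetion --config=/soft/install/login.conf --index=1 --to=$CONTACTPAGER$ --msg-utf8="test"
For Fetion alerts to be sent automatically, the Fetion directory must be owned by nagios:nagios.
11. Install NRPE
tar zxvf nrpe-2.12.tar.gz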
cd nrpe-2.12
./configure --prefix=/usr/local/nrpe
make all
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd
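While building NRPE, configure may stop with:
checking for SSL headers... configure: error: Cannot find ssl headers
The cause is a missing openssl-devel package. To fix it:
yum -y install openssl-devel
The NRPE versions on the server and the client must match for data to be collected correctly; otherwise errors such as the following appear:
CHECK_NRPE: Socket timeout after 10 seconds
Connection refused or timed out
After NRPE is installed, /usr/local/nrpe/libexec contains only one file, check_nrpe, and that file is missing from the Nagios plugin directory, so it must be copied there. Conversely, the plugins that NRPE calls, such as check_disk, are not in NRPE's own directory but do exist among the Nagios plugins, so they must be copied over from the Nagios directory: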
cp /usr/local/nrpe/libexec/check_nrpe /usr/local/nagios/libexec
cp /usr/local/nagios/libexec/check_disk /usr/local/nrpe/libexec
cp /usr/local/nagios/libexec/check_load /usr/local/nrpe/libexec
cp /usr/local/nagios/libexec/check_ping /usr/local/nrpe/libexec
cp /usr/local/nagios/libexec/check_procs /usr/local/nrpe/libexec
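The four cp commands above can also be written as a small loop that reports any plugin that is missing instead of failing silently. This is a sketch only; copy_plugins is a hypothetical helper, not something shipped with Nagios or NRPE.

```shell
#!/bin/sh
# Hypothetical helper: copy the named plugins from one libexec directory
# to another and report any that do not exist in the source directory.
copy_plugins() (
    src=$1; dst=$2; shift 2
    status=0
    for plugin in "$@"; do
        if [ -f "$src/$plugin" ]; then
            cp -p "$src/$plugin" "$dst/" || status=1
        else
            echo "missing: $src/$plugin" >&2
            status=1
        fi
    done
    exit "$status"   # exits only the subshell that forms the function body
)

# Intended use with the paths from this guide (run as root):
# copy_plugins /usr/local/nagios/libexec /usr/local/nrpe/libexec \
#     check_disk check_load check_ping check_procs
```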
vi /etc/services
Add: nrpe 5666/tcp # NRPE
vi /etc/sysconfig/network
HOSTNAME=192.168.0.7 (set the host name in IP form)
service xinetd restart
netstat -at | grep nrpe
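When these steps are repeated on many clients, it helps to add the /etc/services entry only if it is not already there. A minimal sketch follows; ensure_line is a hypothetical helper written for this example.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so running it twice does not duplicate the entry.
ensure_line() {
    # $1 = file, $2 = line to add
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Intended use (as root):
# ensure_line /etc/services "nrpe 5666/tcp # NRPE"
```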
12. Test NRPE
/usr/local/nagios/libexec/check_nrpe -H localhost
Output of NRPE v2.12 means the installation succeeded.
vi /etc/sysconfig/iptables
Insert: -I RH-Firewall-1-INPUT -m tcp -p tcp --dport 5666 -j ACCEPT
service iptables restart
vi /usr/local/nagios/etc/nrpe.cfg
13. Start NRPE
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg --daemon
Nagios is now fully installed and can be used to monitor hosts and the services running on them.
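Monitoring Windows hosts
Download NSClient++-Win32-0.3.5.zip.
Install it on the Windows host:
Unzip it, rename the folder to NSClient, and move it to the root of drive C:.
Open a command prompt:
nsclient++ /install
nsclient++ SysTray    (an error here can be ignored)
Edit NSC.ini:
In the [modules] section, uncomment every line (remove the leading ;) except
CheckWMI.dll and RemoteConfiguration.dll
Set allowed_hosts=210.x.x.x (the IP of the Nagios server).
If you change password at this step, commands.cfg on the Nagios server must be changed to match.
In the [NSClient] section, uncomment port=12489.
NSClient++ listens on port 12489, so the firewall must allow that port.
Then start NSClient++:
nsclient++ /start
Configure nagios.cfg
vi /usr/local/nagios/etc/nagios.cfg
Uncomment this line: #cfg_file=/usr/local/nagios/etc/objects/windows.cfg
To monitor several hosts, add a corresponding configuration file for each one, for example:
#cfg_file=/usr/local/nagios/etc/objects/eastnt14.cfg
Configure windows.cfg
vi /usr/local/nagios/etc/objects/windows.cfg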
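define host{
use windows-server
host_name winserver
alias My Windows Server
address <IP of the monitored host>
}
Be sure to change host_name and address.
Most of the definitions that follow can be left as they are; see the official documentation for the meaning of each one.
In every definition below, change host_name to your own host name; it must match exactly.
Save and exit.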
vi /usr/local/nagios/etc/cgi.cfg
Change
use_authentication=1
to
use_authentication=0
Restart Nagios
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
/usr/local/etc/rc.d/nagios restart
If an error occurs at this point, try to telnet to port 12489 on the Windows server's IP.
Monitoring port 80 (HTTP service)
/usr/local/nagios/libexec/check_http -H test.east.net -p 80 -I 192.168.0.17
Returns: HTTP OK: HTTP/1.1 200 OK - 2095 bytes in 0.007 second response time |time=0.007139s;;;0.000000 size=2095B;;;0
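The plugin output has two parts separated by "|": the human-readable status text and the performance data. The sketch below splits the sample line above and pulls out individual values. perf_value is a hypothetical helper and only handles simple labels without quotes or spaces.

```shell
#!/bin/sh
# Sample plugin output, copied from the check_http result above.
out='HTTP OK: HTTP/1.1 200 OK - 2095 bytes in 0.007 second response time |time=0.007139s;;;0.000000 size=2095B;;;0'

# Everything before the first "|" is status text; everything after is perfdata.
status_text=${out%%|*}
perfdata=${out#*|}

# Hypothetical helper: print the value of one perfdata label, i.e. the
# text between "label=" and the first ";".
perf_value() {
    # $1 = label, $2 = perfdata string
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=\([^;]*\).*/\1/p"
}

echo "status: $status_text"
echo "time:   $(perf_value time "$perfdata")"
echo "size:   $(perf_value size "$perfdata")"
```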
Host grouping
1. Add a host template, named after the host group, in templates.cfg
vi templates.cfg
Append the following at the end of the HOST TEMPLATES section:
define host{
name Daiwei ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_period 24x7 ; By default, hosts are monitored round the clock
check_interval 2 ; Hosts are checked every 2 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each host 10 times (max)
check_command check-host-alive ; Default command to check if hosts are "alive"
notification_period 24x7 ; Send notifications at any time
notification_interval 10 ; Resend notifications every 10 minutes
notification_options d,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}
To group the monitored hosts, add the following to the configuration file of just one of the hosts:
vi localhost.cfg
Append at the end:
define hostgroup{
hostgroup_name linux-server ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members localhost,web3 ; Comma separated list of hosts that belong to this group
}
vi skysymbol_com_cn.cfg
Set use to the template name, which here is the same as the host group name:
define host{
use Daiwei ; Inherit default values from a template
host_name skysymbol.com.cn ; The name we're giving to this host
alias skysymbol.com.cn ; A longer name associated with the host
address 211.100.28.224 ; IP address of the host
}
define service{
use generic-service
host_name skysymbol.com.cn
service_description Http
check_command check_http!-H
}
Finally, append:
define hostgroup{
hostgroup_name Daiwei ; The name of the hostgroup
alias Daiwei ; Long name of the group
members skysymbol.com.cn,citizen.com.cn
}
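Add the host_name of every other member of the group to members, and make sure each host_name is spelled exactly as defined. A given define hostgroup may appear in only one host configuration file; a duplicate definition causes an error when the Nagios configuration is checked.
So if you want three groups, group1, group2 and group3, pick one configuration file among the member hosts of each group, add the define hostgroup block there, and list the host_name of each member (host1, host2, ...) in members. The groups then appear under Host Groups in the Nagios web interface, for example: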
group1 group2
host1 host4
host2 host5
host3 host6
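Since each hostgroup block follows the same pattern, it can be generated from a list of host names rather than typed by hand. The following is a sketch under that assumption; make_hostgroup is a hypothetical helper, and the group and host names are the placeholder ones from the example above.

```shell
#!/bin/sh
# Hypothetical helper: print a hostgroup definition for the given
# group name, alias and member host names.
make_hostgroup() (
    name=$1; alias_text=$2; shift 2
    members=$(IFS=,; printf '%s' "$*")   # join the remaining arguments with commas
    printf 'define hostgroup{\n'
    printf '    hostgroup_name  %s\n' "$name"
    printf '    alias           %s\n' "$alias_text"
    printf '    members         %s\n' "$members"
    printf '}\n'
)

make_hostgroup group1 "Group 1" host1 host2 host3
make_hostgroup group2 "Group 2" host4 host5 host6
```

Redirecting the output into a single file keeps every define hostgroup in one place, which avoids the duplicate-definition error described above.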
Adjusting notification frequency:
vi escalations.cfg
# Host notifications
define hostescalation{
host_name BACKEND_10.75.1.109,BACKEND_10.75.1.108,BACKEND_10.75.1.91,BACKEND_10.69.3.176,BACKEND_10.73.14.229,BACKEND_10.75.1.61,BACKEND_10.54.40.27,BACKEND_10.73.14.45,BACKEND_10.54.40.32,BACKEND_10.75.1.80,BACKEND_10.81.11.27,BACKEND_10.75.1.88
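first_notification 3 ; the escalation takes effect from the third notification
last_notification 0 ; last notification covered by the escalation; 0 means it never ends
notification_interval 120 ; while escalated, notify every 120 minutes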
contact_groups test_admin
}
# Service notifications
define serviceescalation{
host_name BACKEND_10.75.1.109,BACKEND_10.75.1.108,BACKEND_10.75.1.91,BACKEND_10.69.3.176,BACKEND_10.73.14.229,BACKEND_10.75.1.61,BACKEND_10.54.40.27,BACKEND_10.73.14.45,BACKEND_10.54.40.32,BACKEND_10.75.1.80,BACKEND_10.81.11.27,BACKEND_10.75.1.88
service_description PING
first_notification 3
last_notification 0
notification_interval 120
contact_groups test_admin
}
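The first_notification and last_notification rule used in these definitions can be expressed as a small test: an escalation covers notification number N when N is at least first_notification and either last_notification is 0 or N does not exceed it. The sketch below is an illustration only; escalation_applies is a hypothetical helper, not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: succeed if notification number $1 falls inside an
# escalation with first_notification=$2 and last_notification=$3
# (a last_notification of 0 means the escalation never ends).
escalation_applies() {
    [ "$1" -ge "$2" ] && { [ "$3" -eq 0 ] || [ "$1" -le "$3" ]; }
}

# With the values from the definitions above (first=3, last=0):
for n in 1 2 3 4 10; do
    if escalation_applies "$n" 3 0; then
        echo "notification $n: escalated (120-minute interval, test_admin)"
    else
        echo "notification $n: normal"
    fi
done
```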
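III. Problems Encountered While Installing and Using Nagios
1. The command names referenced in contacts.cfg must match the command_name values defined in commands.cfg.
2. The hostgroup defined in timeperiods.cfg must match the hostgroup_name defined in hosts.cfg.
3. In a host configuration file (e.g. eastnt.cfg), the use directive of a host definition must name an existing template such as windows-server or linux-server; otherwise Nagios reports an error at startup.
4. The following cfg_file line in nagios.cfg is commented out by default; the comment marker must be removed, otherwise the configuration check reports an error:
# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
5. Every host to be monitored is added here in the same way, e.g. cfg_file=/usr/local/nagios/etc/objects/windows.cfg, and the matching windows.cfg must then be created under objects before the host can be monitored. The two must correspond one to one.
6. If the following errors appear, check whether timeperiods.cfg has been disabled in nagios.cfg; that file defines the time periods used for service checks.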
Error: Check period '24x7' specified for service 'Total Processes' on host 'localhost' is not defined anywhere!
Error: Notification period '24x7' specified for service 'Total Processes' on host 'localhost' is not defined anywhere!
7. If the following error appears, search for generic-contact to see whether other files contain the same definition; only one may be kept.
Duplicate definition found for contact 'generic-contact' (config file '/usr/local/nagios/etc/objects/templates.cfg', starting on line 28)
8. If the following error appears, check whether commands.cfg defines the command that the service in services.cfg refers to.
Error: Service check command 'check_nrpe' specified in service 'Root Partition' for host 'ChongQing-SERVER-160' not defined anywhere!
For example, if services.cfg contains:
define service{
hostgroup_name test-hosts
service_description Root Partition
check_period 24x7
max_check_attempts 4
normal_check_interval 3
retry_check_interval 2
contact_groups admins
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
check_command check_nrpe!check_disk1
}
then commands.cfg must define the check_nrpe command, for example:
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 7877 -c $ARG1$ -to 20
}
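This kind of mismatch can be spotted before restarting Nagios by comparing the two files directly. The sketch below lists every command used in a check_command directive that has no matching command_name. missing_commands is a hypothetical helper, and it assumes one directive per line as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print each command referenced by check_command in a
# services file that is not defined by command_name in a commands file.
# Anything after the first "!" (the arguments) is ignored.
missing_commands() {
    # $1 = commands file, $2 = services file
    awk '
        FILENAME == ARGV[1] {
            if ($1 == "command_name") defined[$2] = 1
            next
        }
        $1 == "check_command" {
            split($2, parts, "!")
            if (!(parts[1] in defined) && !(parts[1] in reported)) {
                reported[parts[1]] = 1
                print parts[1]
            }
        }
    ' "$1" "$2"
}

# Intended use with the file names from this guide:
# missing_commands /usr/local/nagios/etc/commands.cfg /usr/local/nagios/etc/services.cfg
```

Empty output means every referenced command is defined; the authoritative check remains running nagios -v against nagios.cfg.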
9. If the following error appears, rebuild NRPE with the --enable-command-args option:
$ /usr/local/nagios/libexec/check_nrpe -H 192.168.1.20 -c check_disk -a 60 80 /dev/sdb1
CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
./configure --prefix=/usr/local/nagios --enable-command-args
If /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_disk fails with "NRPE: Unable to read output",
run
$ chmod 755 /usr/local/nagios
10. If "CHECK_NRPE: Error - Could not complete SSL handshake." appears, check allowed_hosts in nrpe.cfg: add the public IP of the Nagios server to the list, then restart the NRPE service.
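A quick way to confirm the allowed_hosts setting on a client is to look for the Nagios server's address in that line. The sketch below does a literal comparison only (it does not interpret network ranges); host_allowed is a hypothetical helper, and the paths and address in the usage line are the ones used elsewhere in this guide.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given address appears as an exact
# entry in the allowed_hosts line of an nrpe.cfg file.
host_allowed() {
    # $1 = path to nrpe.cfg, $2 = IP address of the Nagios server
    sed -n 's/^allowed_hosts=//p' "$1" | tr ',' '\n' | tr -d ' \t' | grep -qxF -- "$2"
}

# Intended use:
# host_allowed /usr/local/nagios/etc/nrpe.cfg 192.168.0.7 || echo "add the server IP to allowed_hosts"
```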
IV. Installing NDOUtils
1. Install
tar zxf ndoutils-1.5.2.tar.gz
cd ndoutils-1.5.2
./configure --prefix=/usr/local/nagios --with-mysql-lib=/usr/lib/mysql --with-mysql-inc=/usr/include/mysql
make
cp src/ndo2db-3x src/file2sock src/log2ndo src/ndomod-3x.o /usr/local/nagios/bin/
cp config/ndo2db.cfg-sample /usr/local/nagios/etc/ndo2db.cfg
cp config/ndomod.cfg-sample /usr/local/nagios/etc/ndomod.cfg
chown nagios.nagios -R /usr/local/nagios/bin /usr/local/nagios/etc/ndo
2. Configure and create the database
vi /usr/local/nagios/etc/ndo2db.cfg
Change the following settings:
db_host=localhost
db_name=nagios
db_prefix=nagios_
db_user=root
db_pass=123456
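If these settings need to be applied on several machines, editing the file by script is less error-prone than vi. A minimal sketch follows, assuming the file uses plain key=value lines as shown above; set_cfg is a hypothetical helper, and values containing backslashes would need extra care because awk -v interprets escape sequences.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a simple key=value file, replacing
# the existing line for that key or appending one if the key is absent.
set_cfg() (
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || exit 1
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; done = 0 }
        $1 == k { print k "=" v; done = 1; next }
        { print }
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file owner and mode
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Intended use:
# set_cfg /usr/local/nagios/etc/ndo2db.cfg db_name nagios
# set_cfg /usr/local/nagios/etc/ndo2db.cfg db_prefix nagios_
```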
cd db/
mysql -u root -p123456 nagios < mysql.sql
vi /usr/local/nagios/etc/nagios.cfg
Add the following:
event_broker_options=-1
broker_module=/usr/local/nagios/bin/ndomod-3x.o config_file=/usr/local/nagios/etc/ndomod.cfg
3. Start ndo2db
/usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg
4. Restart Nagios
/etc/init.d/nagios restart
V. Installing RRDtool
tar zxvf rrdtool-1.4.7.tar.gz
cd rrdtool-1.4.7
./configure --prefix=/usr/local/rrdtool
make && make install
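VI. Installing Centreon
1. Install
tar zxf centreon-2.2.2.tar.gz
cd centreon-2.2.2;./install.sh -i
Answer the prompts as required. The details are not covered here because plenty of material is available online; I will add screenshots the next time I run the installation.
2. To reinstall, first delete the following directories and files to avoid problems: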
rm -rf /usr/local/centreon /etc/centreon /var/lib/centreon /etc/httpd/conf.d/centreon.conf
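3. Log in
Open the Centreon address in a browser (the address depends on how Apache is configured), and continue configuring Centreon from the page that loads.
Once the configuration is finished the login page is displayed, and the Centreon deployment is complete. I still need to work through the rest in practice; configuration and usage, bulk addition of hosts and services, email and SMS alerts, and distributed deployment will be added over time.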