In practice, centcore takes care of the data transfers between the
different servers. The central server has to be equipped with a complete
monitoring installation (Nagios, Centreon, NDOUtils, etc.), in contrast with the satellite monitors, which only have Nagios and NDOUtils installed.
I. Install Nagios
1. Create the user and groups
useradd -m nagios
groupadd nagcmd
usermod -G nagios,nagcmd www-data
2. Install
wget
tar -xvzf nagios-3.5.0.tar.gz
./configure --prefix=/usr/local/nagios --with-command-group=nagcmd --enable-nanosleep --enable-broker
make all
make install
make install-init
make install-commandmode
make install-config
make install-exfoliation # Centreon later uses the images installed by this step
cp ./p1.pl /usr/local/nagios
II. Install the Nagios plugins
./configure --with-nagios-user=nagios --with-nagios-group=nagios --with-openssl=/usr/bin/openssl --enable-perl-modules
make
make install
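About this build error:
Validate.xs:208:5: error: duplicate case value
Validate.xs:205:5: error: previously used here
The problem is in Params::Validate 0.88. In testing, 0.90 builds without this error, so replace the Params::Validate bundled with the Nagios plugins with 0.90.
Download Params::Validate 0.90, copy it into the nagios-plugins-1.4.16/perlmods directory, and delete the bundled 0.88 version.
III. Install NDOUtils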
NDOUtils allows you to export current and historical data from one or more Nagios instances to a database. Several community addons use this as one of their data sources. NDOUtils consists of a standalone daemon, a Nagios event broker module, and several helper utilities.
Centreon gets all its status and performance data directly from the database. To get the data into the database, NDOUtils is used as an additional layer between Nagios and Centreon. NDOUtils consists of two parts, ndomod and ndo2db: the first is the sender, the second the receiver. Each Nagios instance sends data through the ndomod module to the ndo2db daemon, which writes the data into the database. To enable such a setup, some manual configuration steps are needed.
In short, NDOUtils stores monitoring data in a database so that it can be read back later; it sits between Nagios and Centreon, receiving and forwarding data.
./configure --prefix=/usr/local/nagios --enable-mysql --disable-pgsql --with-ndo2db-user=nagios --with-ndo2db-groups=nagios
make
make install
cp ./src/ndomod-3x.o /usr/local/nagios/bin/ndomod.o
cp ./src/ndo2db-3x /usr/local/nagios/bin/ndo2db
cp ./config/ndo2db.cfg-sample /usr/local/nagios/etc/ndo2db.cfg
cp ./config/ndomod.cfg-sample /usr/local/nagios/etc/ndomod.cfg
chmod 774 /usr/local/nagios/bin/ndo*
chown nagios:nagios /usr/local/nagios/bin/ndo*
Create or modify the init script. The script shipped in the package has trouble stopping the service, so adjust it:
# cp ./daemon-init /etc/init.d/ndo2db
# vi /etc/init.d/ndo2db    (change killproc_ndo2db() and the stop) branch as follows)
killproc_ndo2db ()
{
    # the kill line is intentionally repeated twice
    kill `pidof ndo2db |cut -f1 -d " "` >/dev/null 2>&1
    kill `pidof ndo2db |cut -f1 -d " "` >/dev/null 2>&1
}
stop)
    echo "Stopping $servicename..."
    killproc_ndo2db
    ;;
chmod u+x /etc/init.d/ndo2db
IV. Install Centreon
1. Install Centreon
Install the dependencies:
apt-get install sudo tofrodos bsd-mailx lsb-release mysql-server libmysqlclient-dev \
apache2 apache2-mpm-prefork php5 php5-mysql php-pear php5-ldap php5-snmp php5-gd \
rrdtool librrds-perl libconfig-inifiles-perl libcrypt-des-perl libdigest-hmac-perl \
libgd-gd2-perl snmp snmpd libnet-snmp-perl libsnmp-perl
Note: libmysqlclient15-dev has been replaced by libmysqlclient-dev.
Ubuntu 12.04 has no libdigest-sha1-perl package. The reason:
This functionality is already provided by Digest::SHA which is included with perl.
apt-get install snmp-mibs-downloader
cd centreon-2.4.1/
./install.sh -i
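Note: every setting below can also be changed later through the files under /etc/centreon.
Note: the Nagios-related paths default to /var/log/nagios. Create the corresponding directory and set its permissions, otherwise Nagios will not start later. If you enter a wrong value, correct it in main.cfg through the web interface and then export. Remember to tick the two options below.
The location of RRDs.pm can be found as follows:
updatedb
locate RRDs.pm
Not found? Then the dependency packages above were probably not installed.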
Where is your Centreon etc directory
default to [/etc/centreon] (the default is fine)
Where is your Centreon variable library directory?
default to [/var/lib/centreon] (the default is fine)
What is the Centreon group ? [centreon]
default to [centreon] (the default is fine)
What is the Monitoring engine user ?
> nagios
If you are using NDOUtils:
What is the Broker user ? (optional)
> nagios
What is the Monitoring engine log directory ?
>/usr/local/nagios/var/
Note: Where is your monitoring plugins (libexec) directory ?
/usr/local/nagios/libexec/
What is the Monitoring engine init.d script ?
> /etc/init.d/nagios
What is the Monitoring engine binary ?
> /usr/local/nagios/bin/nagios
What is the Monitoring engine configuration directory ?
> /usr/local/nagios/etc
Where is the configuration directory for broker module ?
> /usr/local/nagios/etc
Where is the init script for broker module daemon ?
> /etc/init.d/ndo2db
Do you want me to configure your sudo ? (WARNING)
Answer yes.
Where is your CentPlugins lib directory
default to [/var/lib/centreon/centplugins] (change it to /usr/local/nagios/libexec)
2. Next, open the web page to continue the installation
Monitoring engine nagios
Nagios directory /usr/local/nagios
Nagiostats binary /usr/local/nagios/bin/nagiostats
Nagios image directory /usr/local/nagios/share/images # created by make install-exfoliation during the Nagios installation above
Embedded Perl initialisation file /usr/local/nagios/p1.pl
Broker Module ndoutils
Ndomod binary (ndomod.o) * /usr/local/nagios/bin/ndomod.o
2. Widget
The Centreon Widgets allow you to customise the Centreon web interface and build your own dashboard.
Documentation:
http://documentation.centreon.com/docs/centreon/en/latest/extending/widgets/install.html
Download the widgets, unpack them, and place them under /usr/local//centreon/www/widgets. Note that the directory names must be:
hostgroup-monitoring
host-monitoring
require.php
servicegroup-monitoring
service-monitoring
Otherwise the icons are not displayed when you customise a view.
Log in to the web interface, go to [Administration] > [Modules] > [Widget] > [Setup], and install them.
3. Start the services
/etc/init.d/centcore start
/etc/init.d/centstorage start
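4. How the Centreon web interface applies configuration changes
(1) Change the configuration in the web interface.
(2) When done, go to Configuration -> Monitoring Engines -> Generate. Tick only "Generate Configuration Files" and "Run monitoring engine debug (-v)", then run Export. This pass checks whether the changes are valid; in effect it runs nagios -v to verify the Nagios configuration files.
(3) If the check passes, also tick "Move Export Files" and "Restart Monitoring Engine" and run Export again. This writes the configuration files into the real Nagios configuration directory, /usr/local/nagios/etc.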
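As a rough illustration of what the validation pass in step (2) amounts to, the same check can be reproduced by hand. This is only a sketch: the helper name verify_config is made up here, and the paths in the usage line are the ones used elsewhere in this post, not necessarily the temporary path Centreon itself validates.

```shell
# Hypothetical helper: run "<binary> -v <config>" and report the result.
# Returns 0 when the engine accepts the configuration, 1 otherwise.
verify_config() {
    bin=$1
    cfg=$2
    if "$bin" -v "$cfg" >/dev/null 2>&1; then
        echo "configuration OK"
    else
        echo "configuration has errors"
        return 1
    fi
}

# Usage (paths as used in this post):
#   verify_config /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
```

Drop the redirection to /dev/null to see the full report that nagios -v prints when something is wrong.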
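5. NDO settings
Note: the settings below should not be edited by hand; change them in the Centreon web interface. Even if you edit the files manually, Centreon overwrites them on the next export.
In the web interface almost nothing needs changing, the defaults are fine. Use the configuration files below to verify that the few important options are correct.
/usr/local/nagios/etc/ndomod.cfg
Change the following two settings: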
output_type=tcpsocket
#output_type=unixsocket
#output=/usr/local/nagios/var/ndo.dat
output=127.0.0.1
#output=/usr/local/nagios/var/ndo.sock
/usr/local/nagios/etc/ndo2db.cfg
Change the following settings:
#socket_type=unix
socket_type=tcp
#socket_name=/usr/local/nagios/var/ndo.sock
db_servertype=mysql
db_host=localhost
db_port=3306
db_name=centstatus
#db_prefix=nagios_
db_user=centreon
db_pass=root
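5. Export
Configuration -> Monitoring Engines -> Generate
This step writes the hosts, services, and so on added in the web interface into the Nagios configuration files under /usr/local/nagios/etc.
It takes two passes.
First pass: tick
Generate Configuration Files
Run monitoring engine debug (-v)
and run Export. The configuration files are written to a temporary path and checked for correctness.
If no problems are reported, continue with the second pass.
Second pass: additionally tick
Move Export Files
Restart Monitoring Engine
and run Export again. The configuration files are now written to /usr/local/nagios/etc: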
cgi.cfg contactTemplates.cfg hostTemplates.cfg meta_escalations.cfg misccommands.cfg resource.cfg
checkcommands.cfg dependencies.cfg meta_commands.cfg meta_host.cfg nagios.cfg servicegroups.cfg
connectors.cfg escalations.cfg meta_contact.cfg meta_hostgroup.cfg ndo2db.cfg services.cfg
contactgroups.cfg hostgroups.cfg meta_contactgroup.cfg meta_services.cfg ndomod.cfg serviceTemplates.cfg
contacts.cfg hosts.cfg meta_dependencies.cfg meta_timeperiod.cfg objects timeperiods.cfg
root@firefoxchina:/data/nagios/etc#
Note: changes to main.cfg (that is, nagios.cfg) also take effect at this point.
Error:
sudo: no tty present and no askpass program specified
Cause: the wrong Monitoring engine binary was configured.
Edit /etc/sudoers and change the following lines:
# Monitoring engine test config
CENTREON ALL = NOPASSWD: /usr/local/nagios/bin/nagios* -v *
CENTREON ALL = NOPASSWD: /usr/local/nagios/bin/nagios -v *
# Monitoring engine test for optim config
CENTREON ALL = NOPASSWD: /usr/local/nagios/bin/nagios* -s *
CENTREON ALL = NOPASSWD: /usr/local/nagios/bin/nagios -s *
Edit /etc/centreon/instCentWeb.conf:
MONITORINGENGINE_BINARY=/usr/local/nagios/bin/nagios
In the web console, go to Configuration -> Centreon -> Pollers -> centreon -> Monitoring Engine Information and set Monitoring Engine Binary to /usr/local/nagios/bin/nagios.
6. Change the SSH port
Centreon server: change it under Configuration -> Centreon -> Pollers -> centreon.
Satellite:
7. Check that the services are running
Note: Nagios must be started after ndo2db. After startup, nagios.log should contain "ndomod: Successfully connected to data sink".
# /etc/init.d/mysql status
# /etc/init.d/apache2 status
# /etc/init.d/ndo2db status
# /etc/init.d/centstorage status
# /etc/init.d/nagios status
In a distributed setup centcore is needed additionally on each poller. Check its status and start if needed by executing:
# /etc/init.d/centcore status
# /etc/init.d/centcore start
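The log message mentioned in the note at the top of this step can be checked with a few lines of shell. This is a minimal sketch: the helper name ndomod_connected is invented for illustration, and the log path is an assumption based on the log directory entered during the Centreon installation above.

```shell
# Hypothetical helper: succeeds if the given Nagios log shows that ndomod
# reached ndo2db. The message text is the one quoted in this step.
ndomod_connected() {
    grep -q "ndomod: Successfully connected to data sink" "$1"
}

# Assumed log location; adjust it to the log directory chosen at install time.
LOG=${LOG:-/usr/local/nagios/var/nagios.log}
if [ -r "$LOG" ] && ndomod_connected "$LOG"; then
    echo "ndomod is connected"
else
    echo "no ndomod connection recorded in $LOG"
fi
```

If the message is missing, restart ndo2db first and Nagios second, then look at the log again.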
Nagios does not start?
Test it by hand:
/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
This reveals the error: Error: Could not create external command file '/var/log/nagios/rw/nagios.cmd' as named pipe: (2) -> No such file or directory.
Change the corresponding setting under Configuration -> Monitoring Engines -> main.cfg, then run Export under Generate Configuration Files. Remember to tick the two options below.
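V. Install NRPE
1. NRPE monitors local resources of remote servers, such as CPU and memory. These can also be collected through SNMP, so you may skip NRPE and use SNMP instead (SNMP installation is covered later). For a distributed Centreon deployment NRPE is still recommended: nodes in different regions can be monitored by their local Nagios, whereas with SNMP every node has to be polled by the SNMP client on the Centreon core itself.
NRPE is needed on the monitored hosts as well as on the monitoring server, and the monitored hosts also need nagios-plugins.
Make sure OpenSSL is installed: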
apt-get install openssl libssl-dev
wget
./configure --prefix=/usr/local/nagios --with-ssl=/usr/bin/openssl --with-ssl-lib=/usr/lib/x86_64-linux-gnu/ --with-nrpe-port=7888
make all
make install-plugin
make install-daemon
make install-daemon-config
vi /usr/local/nagios/etc/nrpe.cfg
server_address=192.168.7.223 //address this host listens on for monitoring
allowed_hosts=192.168.7.191,127.0.0.1 //address or domain name of the Nagios monitoring server
Start the NRPE daemon:
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
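2. Monitoring server
Install the Nagios plugins:
./configure --prefix=/usr/local/nagios
make
make install
3. Monitored host
groupadd nagios
useradd -g nagios -d /usr/local/nagios -s /sbin/nologin nagios
Install the Nagios plugins:
./configure --prefix=/usr/local/nagios
make
make install
Add the NRPE commands to Centreon
Because Nagios's command.cfg is managed by Centreon, it cannot be edited directly; add the commands in the web interface and then export.
Path: Configuration -> Commands -> Checks
The matching service templates also have to be added under Configuration -> Services -> Templates.
The small script below creates these commands so they need not be pasted in by hand. The Graph template still has to be chosen manually; this may be improved when time allows.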
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = connect_centreon();

# Commands to add.
# command_type 2 is a check command, 1 is a notification command.
my %add_command = (
    check_nrpe_cpu     => ['2', '$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_cpu'],
    check_nrpe_mem     => ['2', '$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_mem'],
    check_nrpe_procs   => ['2', '$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_total_procs'],
    check_nrpe_swap    => ['2', '$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swap'],
    check_nrpe_traffic => ['2', '$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_traffic'],
    check_nrpe_disk    => ['2', '$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk'],
);

my $sth_c = $dbh->prepare("select command_name from command where command_name=? and command_type=?");
# Placeholders keep the command line intact (no stray newline, no quoting issues).
my $sth_i = $dbh->prepare("insert into command (command_name,command_line,command_type) values (?,?,?)");

foreach my $command_name (keys %add_command) {
    my ($command_type, $command_line) = @{ $add_command{$command_name} };

    # Check whether the command already exists
    $sth_c->execute($command_name, $command_type);

    if ($sth_c->fetchrow_array()) {
        print "$command_name already exists, nothing to do\n";
    } else {
        print "add command $command_name;\n";
        $sth_i->execute($command_name, $command_line, $command_type);
    }
}

# add service template (not implemented yet)
# get generic-service service_id
#my $sth_s = $dbh->prepare("select service_id from service where service_alias='generic-service'");

sub connect_centreon {
    my $host = "127.0.0.1";
    my $port = "3306";
    my $db   = "centreon";
    my $user = "centreon";
    my $pass = 'centreon';
    # DBD::mysql separates DSN attributes with semicolons
    my $dbh = DBI->connect("DBI:mysql:database=$db;host=$host;port=$port",
        $user, $pass, { RaiseError => 1, AutoCommit => 1 })
        or die $DBI::errstr;
    return $dbh;
}
Command Name: check_nrpe_cpu
Command Line: $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_cpu
Graph template:CPU
Command Name:check_nrpe_disk
Command Line:$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk
Graph template:storage
Command Name:check_nrpe_mem
Command Line:$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_mem
Graph template:memory
Command Name:check_nrpe_procs
Command Line:$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_total_procs
Graph template:default-graph
Command Name:check_nrpe_swap
Command Line:$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swap
Graph template:
Command Name:check_nrpe_traffic
Command Line:$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_traffic
Graph template:traffic
VI. SNMP
apt-get install snmp snmpd
Ubuntu 12.04 does not install the MIB files, even if you install the SNMP applications.
Just check the preamble of /etc/snmp/snmp.conf
#
# As the snmp packages come without MIB files due to license reasons, loading
# of MIBs is disabled by default. If you added the MIBs you can reenable
# loaging them by commenting out the following line.
mibs :
Sure, but how can I install them ?
Fortunately there is a package to deal with that.
$ sudo apt-get install snmp-mibs-downloader
It will download the IETF MIB files and install them under the usual /usr/share/mibs/
If for any reason you don’t see it happen force it with
$ sudo download-mibs
You can repeat this command later to update any new MIB file.
After installation, test:
snmpwalk -v 2c -c public localhost
Unlinked OID in IPATM-IPMC-MIB: marsMIB ::= { mib-2 57 }
Undefined identifier: mib-2 near line 18 of /usr/share/mibs/ietf/IPATM-IPMC-MIB
Bad operator (INTEGER): At line 73 in /usr/share/mibs/ietf/SNMPv2-PDU
Undefined OBJECT-GROUP (diffServMIBMultiFieldClfrGroup): At line 2195 in /usr/share/mibs/ietf/IPSEC-SPD-MIB
Undefined OBJECT-GROUP (diffServMultiFieldClfrNextFree): At line 2157 in /usr/share/mibs/ietf/IPSEC-SPD-MIB
Undefined OBJECT-GROUP (diffServMIBMultiFieldClfrGroup): At line 2062 in /usr/share/mibs/ietf/IPSEC-SPD-MIB
Expected "::=" (RFC5644): At line 493 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Expected "{" (EOF): At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Bad object identifier: At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Bad parse of OBJECT-IDENTITY: At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Workaround:
#!/bin/bash
for i in /usr/share/mibs/ietf/IPSEC-SPD-MIB /usr/share/mibs/ietf/IPATM-IPMC-MIB /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB /usr/share/mibs/ietf/SNMPv2-PDU
do
mv $i /usr/share/mibs
done
Also, snmpwalk -v 2c -c public 192.168.1.1 mem works, but snmpwalk -v 2c -c public 192.168.1.1 cpu does not, because no logical name "cpu" exists. Use the OID directly instead:
snmpwalk -v 2c -c public 192.168.1.1 .1.3.6.1.2.1.25.3.3.1.2
HOST-RESOURCES-MIB::hrProcessorLoad.768 = INTEGER: 1
HOST-RESOURCES-MIB::hrProcessorLoad.769 = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad.770 = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad.771 = INTEGER: 1
HOST-RESOURCES-MIB::hrProcessorLoad.772 = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad.773 = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad.774 = INTEGER: 1
HOST-RESOURCES-MIB::hrProcessorLoad.775 = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad.776 = INTEGER: 1
HOST-RESOURCES-MIB::hrProcessorLoad.777 = INTEGER: 1
HOST-RESOURCES-MIB::hrProcessorLoad.778 = INTEGER: 1
HOST-RESOURCES-MIB::hrProcessorLoad.779 = INTEGER: 2
HOST-RESOURCES-MIB::hrProcessorLoad.780 = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad.781 = INTEGER: 1
HOST-RESOURCES-MIB::hrProcessorLoad.782 = INTEGER: 0
HOST-RESOURCES-MIB::hrProcessorLoad.783 = INTEGER: 0
A minimal snmpd.conf
####
# First, map the community name "public" into a "security name"
# sec.name source community
com2sec my_user 192.168.1.1 public
####
# Second, map the security name into a group name:
# groupName securityModel securityName
group my_group v2c my_user
####
# Third, create a view for us to let the group have rights to:
# name incl/excl subtree mask(optional)
view all included .1 80
####
# Finally, grant the group read-only access to the systemview view.
# group context sec.model sec.level prefix read write notif
access my_group "" any noauth exact all none none
Problem
On Ubuntu 10.10, with snmpd package version 5.4.3~dfsg-1ubuntu3, the following problem appeared:
CRITICAL: Interface speed equal 0! Interface must be down.
The interface is actually up, as shown by:
/usr/local/nagios/libexec/check_centreon_snmp_traffic -v 2c -C public -H 192.168.1.2 -s
Interface 1 :: lo :: up
Interface 2 :: eth0 :: down
Interface 3 :: eth1 :: up
/usr/local/nagios/libexec/check_centreon_snmp_traffic -v 2c -C public -H 192.168.1.2 -n -i eth1
CRITICAL: Interface speed equal 0! Interface must be down.|traffic_in=0B/s traffic_out=0B/s
Query the interface information with snmpwalk:
snmpwalk -v 2c -c public 192.168.1.2 1.3.6.1.2.1.31.1.1.1
IF-MIB::ifName.1 = STRING: lo
IF-MIB::ifName.2 = STRING: eth0
IF-MIB::ifName.3 = STRING: eth1
IF-MIB::ifHighSpeed.1 = Gauge32: 10
IF-MIB::ifHighSpeed.2 = Gauge32: 0
IF-MIB::ifHighSpeed.3 = Gauge32: 0
Here IF-MIB::ifHighSpeed.3 = Gauge32: 0 belongs to a 100M NIC, yet the value shown is 0. I also noticed that for about 10 seconds after restarting the snmp service the value is correctly shown as 100, so I decided to switch SNMP versions: building and installing net-snmp from net-snmp-5.7.2.tar.gz solved the problem.
Update 2014-08-12
Ran into this problem again; on Ubuntu 14.04, building from source did not help.
{{{
There is this command snmpwalk -v 1 -c public hostname 1.3.6.1.2.1.31.1.1.1 that lists a lot of OID's and from there you can see 'IF-MIB::ifName' which stand for the interfaces. And if you execute IF-MIB::ifInOctets.x where x corresponds to the interface you are interested in you can find a number in bytes.
The OID for traffic is 1.3.6.1.2.1.31.1.1.1
Traffic in and traffic out can be extracted from it.
Search the plugin for:
if ( $speed_card == 0 ) {
print "CRITICAL: Interface speed equal 0! Interface must be down.|traffic_in=0B/s traffic_out=0B/s\n";
exit($ERRORS{"CRITICAL"});
}
This error is printed when the interface speed retrieved is 0. Searching the script for speed_card gives:
####### Get SPEED of interface
my $speed_card;
if (defined($opt_T)){
$speed_card = $opt_T * 1000000;
} else {
$speed_card = $result->{$OID_SPEED};
if (!defined($result->{$OID_SPEED}) || int($result->{$OID_SPEED}) !~ /^[0-9]+$/) {
print "ERROR: Card speed is null or incorrect. You should force the value with -T option.\n";
exit $ERRORS{'UNKNOWN'};
}
if (defined($OPTION{'64-bits'})) {
$speed_card = $speed_card * 1000000;
}
}
Looking up OID_SPEED and printing the OID gives: .1.3.6.1.2.1.2.2.1.5.2
snmpwalk .1.3.6.1.2.1.2.2.1.5
snmpwalk -v 2c -c public 54.92.17.114 .1.3.6.1.2.1.2.2.1.5
IF-MIB::ifSpeed.1 = Gauge32: 10000000
IF-MIB::ifSpeed.2 = Gauge32: 0
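The cause is that the values for lo and eth0 are swapped.
The script also has the defined($opt_T) branch, so simply pass the -T parameter:
/usr/local/nagios/libexec/check_centreon_snmp_traffic -v 2c -C public -H 54.92.17.114 -n -i eth0 -T 100
Problem solved.
Postscript: Centreon ships a command for this scenario, called check_centreon_traffic_limited.
However, the command template maps -T to ARGV6 although the command only has 4 arguments, so the value cannot be saved when it is entered in the web page. The fix is to edit the template and change ARGV6 to ARGV4.
Where are the OIDs that Centreon uses?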
{{{
check_centreon_snmp_traffic :
centreon = Centreon::SNMP::Utils::load_oids($ERRORS{'UNKNOWN'}, "/usr/local/nagios/libexec/centreon.conf");
/usr/local/nagios/libexec/centreon.conf:
They are defined here.
}}}
}}}
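To spot in advance which interfaces will trigger the "Interface speed equal 0" error, the ifSpeed walk shown above can be filtered. This is a small sketch: the function name zero_speed_indexes is hypothetical, and it assumes the snmpwalk output format shown in this post.

```shell
# Hypothetical helper: read "snmpwalk ... .1.3.6.1.2.1.2.2.1.5" output on
# stdin and print the interface indexes whose reported ifSpeed is 0.
zero_speed_indexes() {
    awk '/ifSpeed\.[0-9]+ = Gauge32: 0$/ {
        sub(/.*ifSpeed\./, "", $1)   # keep only the numeric index
        print $1
    }'
}

# Usage:
#   snmpwalk -v 2c -c public 54.92.17.114 .1.3.6.1.2.1.2.2.1.5 | zero_speed_indexes
```

Any interface listed this way needs the -T option (speed in Mbit/s) when it is checked with check_centreon_snmp_traffic.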
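VII. Usage notes
1. About "Cannot Execute this command due to an path security problem"
Cannot Execute this command due to an path security problem
I had created a symbolic link, and the command was executed with the real directory path, which triggered the error above.
Solution: remove the symbolic link and mount the directory with mount --bind instead.
2. A newly created host shows status down
Use check_host_alive as its Check Command.
3. Using Centreon
Defining a few commonly used templates is important.
4. About graphs
Very good official documentation:
http://documentation.centreon.com/docs/centreon/en/latest/faq/index.html
Symptom: the /var/lib/centreon/metrics directory is empty and contains no rrd files.
Configuration -> Monitoring Engines -> main.cfg -> Data
Performance Data Processing Option should be yes
head -43 /usr/local/nagios/libexec/process-service-perfdata
Configuration -> Centreon -> Pollers: Perfdata file
Configuration -> Monitoring Engines -> main.cfg -> Data
The service appears in Monitoring but has no graph in Views?
Check in Monitoring that the service is healthy: Service Status is OK and Performance Data is present.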
5. About monitoring templates
Define the command under Configuration -> Commands -> Checks
Define the service under Configuration -> Services -> Templates
Only Alias *, Service Template Name *, and Service Template Model (choose generic_service) need to be filled in; every field left empty is inherited from generic_service.
How to add a monitored service to a server?
(1)
(2) Add the service via "by host" -> add, and add the host under Relations.
check_disk, source: ships with NRPE
On the monitored server, edit /usr/local/nagios/etc/nrpe.cfg
and add command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -u GB -p / -p /data
Restart nrpe: pkill -9 -f nrpe &&/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Changed the partitions being monitored?
Change the partitions under Administration -> Options -> -> Manage, then rebuild the rrd database.
Changing the unit from MB to GB needs no action: at the next check the graph and Monitoring update automatically, and the graph scale follows later.
check_mem, source: Nagios Exchange
Put it in /usr/local/nagios/libexec/ and mind the ownership: chown nagios.nagios /usr/local/nagios/libexec/check_mem.pl
On the monitored server, edit /usr/local/nagios/etc/nrpe.cfg
and add command[check_mem]=/usr/local/nagios/libexec/check_mem.pl -w 95% -c 99%
Restart nrpe: pkill -9 -f nrpe &&/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
About memory monitoring: the snmp_memory template that ships with Centreon uses the command check_centreon_snmp_memory, which by default also counts swap in the total memory size. Add the following option to disable that:
-S Do not add swap size in total memory size, perfdata will show ram size corresponding physical size installed
CPU, source: ships with Centreon (check_centreon_snmp_cpu), queried over SNMP
Test: ./check_centreon_snmp_cpu -H 192.168.1.1 -v 2c -C public
About the SNMP problem on Ubuntu:
apt-get install snmp-mibs-downloader
sudo download-mibs
#!/bin/bash
for i in /usr/share/mibs/ietf/IPSEC-SPD-MIB /usr/share/mibs/ietf/IPATM-IPMC-MIB /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB /usr/share/mibs/ietf/SNMPv2-PDU
do
mv $i /usr/share/mibs
done
A minimal snmpd.conf; remember to change the IP to the monitoring server's address
# First, map the community name "public" into a "security name"
# sec.name source community
com2sec my_user 192.168.1.1 public
####
# Second, map the security name into a group name:
# groupName securityModel securityName
group my_group v2c my_user
####
# Third, create a view for us to let the group have rights to:
# name incl/excl subtree mask(optional)
view all included .1 80
####
# Finally, grant the group read-only access to the systemview view.
# group context sec.model sec.level prefix read write notif
access my_group "" any noauth exact all none none
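Also, the args of the service must be filled in in Centreon.
In the CPU graph each CPU is drawn separately. How to merge them?
Views -> Graphs -> Templates -> CPU, untick Split Components
Traffic, source: ships with Centreon (check_centreon_snmp_traffic), queried over SNMP
Several traffic service templates are needed, one each for eth0 and eth1, distinguished by the args of the service. The args of the service must be filled in in Centreon.
On its first run the traffic check creates a buffer that caches the interface information.
Http, source: Nagios plugins
Because a single IP may serve multiple virtual hosts, the default command has to be changed.
Configuration -> Commands -> Checks -> check_http
Change Command Line to $$/check_http -H $HOSTADDRESS$ -u $$ -w 7 -c 10
Change Argument Example to !
Click Describe arguments and add the description: : URL
About check_snmp
snmpwalk returns data, but check_snmp reports:
SNMP OK - = No Such Instance currently exists at this OID |
Reason:
check_snmp doesn't do a walk. You must be very specific to the exact OID you want to check. You'll also need to specify failure criteria if you want to get any kind of alert.
In other words, always specify the exact OID.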
How can I assign the curves to the services?
First, in your web interface, have a look here:
Administration > Options > > Manage
You will see which metrics are used by your service.
Write down the metric names of your service.
Now go here:
Views > Graphs > Curves
You will see all the curve definitions with their "Data Source Name".
Each curve is associated with the metric that has the same name as its "Data Source Name".
In other words, if your metric's name is "time", the curve that will be used is the one with "time" as its "Data Source Name".
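Monitoring MySQL
The MySQL check provides no performance data for graphing; it only monitors MySQL's status.
MySQL listens on 127.0.0.1 and you do not want it to listen on another address. How can it be monitored remotely?
Use SNMP's extend feature.
On the monitored server, add to snmpd.conf:
extend .1.3.6.1.4.1.2021.50 mysql_monitor /bin/bash /opt/nagios_check_mysql.sh
The script contains:
/opt/check_mysql -u root -p password
Here check_mysql is the Nagios plugin, copied over directly. If running it by hand hits shared-library (.so) problems that are hard to resolve, compile and install nagios-plugins locally instead.
Add a command in Centreon:
check_snmp_mysql_liseten_local:
$USER1$/check_snmp -H $HOSTADDRESS$ -P 2c -C public -o $ARG1$ -r Uptime
The -r Uptime option is there because, without a check on the SNMP result, check_snmp also reports OK when SNMP answers that the OID does not exist, which does not reflect MySQL's real state. With it, a result that does not contain Uptime is reported as CRITICAL.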
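The idea behind -r Uptime can be tried out locally before wiring it into Centreon. The sketch below is not how check_snmp is implemented; it only mirrors the decision described above, with a made-up helper name (state_from_output) and invented sample strings in the usage lines.

```shell
# Hypothetical helper mirroring "-r Uptime": read a check result on stdin
# and treat it as healthy only if it mentions Uptime.
state_from_output() {
    if grep -q "Uptime"; then
        echo "OK"
        return 0
    fi
    echo "CRITICAL"
    return 2
}

# Usage with invented sample strings:
#   echo "Uptime: 3600  Threads: 1" | state_from_output          # prints OK
#   echo "No Such Instance currently exists" | state_from_output # prints CRITICAL
```

Matching on a word that only a healthy MySQL reply contains is what turns a missing OID into an alert instead of a silent OK.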
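7. Sending mail
7.1 Ubuntu 12.04 uses postfix by default, configured for local delivery only, so it cannot send external mail.
Reconfigure it with dpkg-reconfigure postfix:
Choose "Internet Site" as the mail type, otherwise mail cannot be sent externally.
The system mail name should be a fully qualified domain name such as aaa.com, otherwise mail to 163 cannot be delivered. It can be changed in /etc/mailname.
Accept the defaults for everything else.
Afterwards, restrict the listening interface to 127.0.0.1 by editing /etc/postfix/main.cf:
inet_interfaces = loopback-only
Restart postfix.
Test: echo "test"|mail aaa@163.com
7.2 Configuration -> Users -> Contacts / Users
Turn on Enable Notifications for the relevant user. If no user has it enabled, then when a service fails, even though notifications are configured for the service, Last Service Notification stays empty, that is, no alert is sent.
7.3 The mail command
Configuration -> Users -> Commands
Check that the commands are correct. I previously had alert mails that were not sent because the mail command was wrong.
Pay particular attention to host-notify-by-email and service-notify-by-email.
For example, service-notify-by-email should look as follows; the default is missing "mail":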
/usr/bin/printf "%b" "***** centreon Notification *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $DATE$ Additional Info : $SERVICEOUTPUT$" |mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
When writing your own plugin, how do you make it report OK or CRITICAL?
Reference:
https://blog.centreon.com/good-practices-how-to-develop-monitoring-plugin-nagios/
Return codes
A plugin has to send a return code. This code is interpreted as the result of the plugin execution. We call this result "status". The two summary tables below list the return codes for hosts and services:
Hosts:
Plugin return code | Host status
0 | UP
1 | DOWN
Other | Maintains last known state
Services:
Return code | Service status
0 | OK
1 | WARNING
2 | CRITICAL
3 | UNKNOWN
Other | CRITICAL : unknown return code
For example, in a shell script simply use exit 0 or exit 2 respectively.
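The service table above translates into shell along these lines. This is a minimal sketch, not a complete plugin: the function name check_value and the "value" perfdata label are invented for illustration, and the thresholds are assumed to be integers where higher means worse.

```shell
# Hypothetical helper: map a numeric value to a service state.
# usage: check_value VALUE WARN CRIT
# Prints one status line (with perfdata after "|") and returns the code
# from the Services table: 0 OK, 1 WARNING, 2 CRITICAL.
check_value() {
    value=$1
    warn=$2
    crit=$3
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value|value=$value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value|value=$value"
        return 1
    fi
    echo "OK - value is $value|value=$value"
    return 0
}

# In a real plugin script the last lines would be:
#   check_value "$1" "$2" "$3"
#   exit $?
```

The status line is what shows up in the monitoring view, while the return code alone decides the state.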
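About this error:
ndo2db: Warning: queue send error, retrying...
ndo2db: Message sent to queue.
Adjust the kernel parameters:
vi /etc/sysctl.conf
kernel.msgmax = 131072000
kernel.msgmnb = 131072000
kernel.msgmni = 65536000
MSGMNB: maximum number of bytes in a single message queue.
MSGMNI: maximum number of message queues system-wide.
MSGMAX: maximum size of a single message. On some operating systems such as BSD you do not need to set this; BSD sets it automatically to MSGSSZ * MSGSEG. On other operating systems you may need to change the default, and you can set it equal to MSGMNB.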