Category: Architecture Design and Optimization

2015-03-25 12:10:37

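     Nagios is a free, open-source network monitoring tool. It can monitor the status of Windows, Linux, and Unix hosts, network devices such as switches and routers, and printers. When a system or service enters an abnormal state, it sends an email or SMS alert to notify the operations staff right away, and it sends a recovery notification by email or SMS once the state returns to normal.
     Nagios provides the following monitoring features:
     1. Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, and so on);
     2. Monitoring of host resources (processor load, disk usage, and so on);
     3. A simple plugin design that lets users easily write their own service checks;
     4. Parallelized service checks;
     5. The ability to define a network hierarchy using "parent" host definitions to express the relationships between hosts, which lets Nagios detect and distinguish hosts that are down from hosts that are unreachable;
     6. Notifications sent to contacts when service or host problems occur and when they are resolved (by email, SMS, or a user-defined method);
     7. Event handlers that can be defined to run when a service or host fails, so that problems can be dealt with proactively;
     8. Automatic log file rotation;
     9. Support for redundant monitoring hosts;
     10. An optional web interface for viewing current network status, notification and problem history, log files, and so on.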
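
The plugin design in item 3 rests on a simple contract: a plugin prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). A minimal sketch of a custom check built on that contract follows; the function name `check_value` and its threshold arguments are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical plugin sketch: compare an integer against warning and
# critical thresholds and report the result the way Nagios plugins do.
# Exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_value() {
    local value=$1 warn=$2 crit=$3 arg
    # Any argument that is not a plain non-negative integer is UNKNOWN.
    for arg in "$value" "$warn" "$crit"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "UNKNOWN - usage: check_value VALUE WARN CRIT"
                return 3 ;;
        esac
    done
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value (threshold $crit)"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value (threshold $warn)"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}
```

A real plugin would obtain the value itself (for example by counting logged-in users) and end with `exit` instead of `return`.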
       
Let's begin the Nagios installation.

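1. Install prerequisite packages and add the user
[root@vm ~]#  yum install httpd gcc glibc perl-ExtUtils-Embed -y
[root@vm ~]#  yum localinstall gd-devel-2.0.35-10.el6.x86_64.rpm -y   # I don't recommend installing it this way: the package installs, but the status map in the web interface then shows errors. Configure a network repository and install gd-devel from it instead.
[root@vm ~]# useradd nagios
[root@vm ~]# usermod -a -G nagios apache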

2. Install Nagios
[root@vm ~]# tar jxf nagios-cn-3.2.3.tar.bz2
[root@vm ~]# cd nagios-cn-3.2.3
[root@vm nagios-cn-3.2.3]# ./configure --enable-embedded-perl
[root@vm nagios-cn-3.2.3]# make all
[root@vm nagios-cn-3.2.3]# make install
[root@vm nagios-cn-3.2.3]# make install-init
[root@vm nagios-cn-3.2.3]# make install-config
[root@vm nagios-cn-3.2.3]# make install-commandmode
[root@vm nagios-cn-3.2.3]# make install-webconf
[root@vm nagios-cn-3.2.3]# htpasswd /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Updating password for user nagiosadmin
[root@vm nagios-cn-3.2.3]# cat /usr/local/nagios/etc/htpasswd.users
nagiosadmin:yCS49o40QLgYU
[root@vm ~]# vim /usr/local/nagios/etc/cgi.cfg
use_authentication=0                # set this to 0 (this disables CGI authentication); otherwise some pages do not display correctly
[root@vm nagios-cn-3.2.3]# /etc/init.d/httpd restart
[root@vm ~]# /etc/init.d/nagios start
[root@vm ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
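In a browser, open 192.168.1.105/nagios and log in with the user name (nagiosadmin) and the password (westos, which we set earlier).

Note: if the login page shows an error, check the SELinux and firewall settings. You can temporarily disable both and then refresh the page.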

3. Install nagios-plugins
nagios-plugins is the official set of plugin programs for Nagios. Nagios actually performs all of its host monitoring by executing plugins.
[root@vm ~]# tar zxf nagios-plugins-1.5.tar.gz
[root@vm ~]# cd nagios-plugins-1.5
[root@vm nagios-plugins-1.5]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[root@vm nagios-plugins-1.5]# make && make install
[root@vm nagios-plugins-1.5]# /etc/init.d/nagios restart
[root@vm nagios-plugins-1.5]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg        # re-check; 0 warnings and 0 errors, as below, mean the configuration is valid
Total Warnings: 0
Total Errors:   0                            # the host status can now be viewed in the web interface
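
The check above is read by eye. As a sketch, the same two summary lines can be parsed so that a restart only happens when both counts are zero. `config_is_clean` is a hypothetical helper name, and the sketch assumes the `Total Warnings:` and `Total Errors:` lines appear in the format shown above.

```shell
#!/bin/bash
# Hypothetical helper: read `nagios -v` output on stdin and succeed only
# if both the "Total Warnings" and "Total Errors" counts are 0.
config_is_clean() {
    awk '
        /^Total Warnings:/ { warnings = $3; seen_w = 1 }
        /^Total Errors:/   { errors = $3;   seen_e = 1 }
        END {
            # Fail if either summary line is missing or non-zero.
            if (seen_w && seen_e && warnings == 0 && errors == 0) exit 0
            exit 1
        }'
}

# Possible use (paths as in this article):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg \
#       | config_is_clean && /etc/init.d/nagios restart
```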

4. Monitor a specific host, with host and service definitions kept in separate files
[root@vm ~]# cd /usr/local/nagios/etc/
[root@vm etc]# vim nagios.cfg
# You can specify individual object config files as shown below:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
# added: host definitions
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
# added: service definitions
cfg_file=/usr/local/nagios/etc/objects/services.cfg
[root@vm etc]# cd objects/
[root@vm objects]# cp -p localhost.cfg hosts.cfg
[root@vm objects]# cp -p localhost.cfg services.cfg
[root@vm objects]# vim hosts.cfg
# Define a host for the monitored machine

define host{
        use                     linux-server            ; Name of host template to use
        host_name               vm1.example.com
        alias                   Nagios monitored host
        address                 192.168.1.104
        icon_image              router.gif
        statusmap_image         router.gd2
        2d_coords               300,100
        3d_coords               300,100,100
        }

# Define an optional hostgroup for Linux machines

define hostgroup{
        hostgroup_name          linux-servers           ; The name of the hostgroup
        alias                   Linux Servers           ; Long name of the group
        members                 *                       ; Comma separated list of hosts that belong to this group
        }
[root@vm objects]# vim services.cfg
###############################################################################

# Define a service to "ping" every defined host

define service{
        use                     local-service           ; Name of service template to use
        host_name               *
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }

# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
        }

# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.

define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     Current Users
        check_command           check_local_users!20!50
        }

# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.

define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     Total Processes
        check_command           check_local_procs!250!400!RSZDT
        }

# Define a service to check the load on the local machine.

define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     Current Load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }

# Define a service to check the swap usage on the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     Swap Usage
        check_command           check_local_swap!20!10
        }

# Define a service to check SSH on the monitored machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     SSH
        check_command           check_tcp!22!1.0!10.0
        notifications_enabled   0
        }

# Define a service to check HTTP on the monitored machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     HTTP
        check_command           check_http
        notifications_enabled   0
        }
[root@vm objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg    # verify that the configuration files are valid
Total Warnings: 0
Total Errors:   0
[root@vm objects]# /etc/init.d/nagios reload

5. Configure the monitored host vm1.example.com (192.168.1.104)
(SELinux and iptables are disabled)
[root@vm1 ~]# yum install mysql-server -y
[root@vm1 ~]# /etc/init.d/mysqld start
[root@vm1 ~]# mysql
mysql> create database nagdb;
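mysql> grant select on nagdb.* to nagios@'192.168.1.104';
mysql> grant select on nagdb.* to nagios@'192.168.1.105';    # the Nagios server connects from this address
mysql> flush privileges;
[root@vm1 ~]# mysql -u nagios -h 192.168.1.104        # test the grant once it is in place
On vm.example.com (192.168.1.105), the host running Nagios:
[root@vm ~]# cd /usr/local/nagios/libexec/
[root@vm libexec]# ./check_mysql -H 192.168.1.104 -u nagios -d nagdb            # check MySQL connectivity to host 104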
Uptime: 4020  Threads: 1  Questions: 44  Slow queries: 0  Opens: 15  Flush tables: 1  Open tables: 8  Queries per second avg: 0.10|Connections=23c;;; Open_files=16;;; Open_tables=8;;; Qcache_free_memory=0;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=0c;;; Qcache_queries_in_cache=0;;; Queries=44c;;; Questions=44c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=4020c;;;
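
In the output above, the text before `|` is the human-readable status and the part after it is performance data: a space-separated list of `label=value` items, each optionally followed by `;`-separated thresholds. A rough sketch for pulling one value out of such a line follows. It assumes labels contain no spaces or quotes, and `perfdata_value` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the value recorded for one performance-data
# label in a line of plugin output read from stdin.
perfdata_value() {
    local label=$1
    # Keep only the text after the first "|", put one item per line,
    # then strip the ";warn;crit;min;max" tail and any unit suffix.
    sed 's/^[^|]*|//' |
        tr ' ' '\n' |
        awk -F'=' -v want="$label" '$1 == want {
            sub(/;.*/, "", $2)          # drop the threshold fields
            sub(/[^0-9.-]+$/, "", $2)   # drop a unit such as "c"
            print $2
        }'
}

# Possible use:
#   ./check_mysql -H 192.168.1.104 -u nagios -d nagdb | perfdata_value Connections
```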
[root@vm libexec]# cd ../etc/objects/
[root@vm objects]# vim commands.cfg         # add the MySQL check command
# 'check_mysql' command definition
define command{
        command_name    check_mysql
        command_line    $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -d $ARG2$
        }
[root@vm objects]# vim services.cfg    # add the following
###############check_mysql###########################
define service{
        use                     local-service
        host_name               vm1.example.com
        service_description     mysql
        check_command           check_mysql!nagios!nagdb    ; nagios fills $ARG1$ and nagdb fills $ARG2$ in the command definition above
        notifications_enabled   0
        }
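
To make the `!` syntax concrete: the first field of `check_command` names the command, and each following field fills `$ARG1$`, `$ARG2$`, and so on in that command's `command_line`. The sketch below only mimics that substitution for illustration. It is not Nagios code, `expand_args` is a hypothetical name, and other macros such as `$USER1$` and `$HOSTADDRESS$` are left untouched.

```shell
#!/bin/bash
# Illustration only: fill $ARG1$, $ARG2$, ... in a command_line from a
# check_command string such as "check_mysql!nagios!nagdb".
expand_args() {
    local check_command=$1 command_line=$2
    local IFS='!'
    local -a parts
    local i
    # Split on "!"; parts[0] is the command name, the rest are arguments.
    read -r -a parts <<< "$check_command"
    for ((i = 1; i < ${#parts[@]}; i++)); do
        # The quoted pattern matches the literal text $ARGn$.
        command_line=${command_line//"\$ARG$i\$"/${parts[i]}}
    done
    printf '%s\n' "$command_line"
}

# Possible use:
#   expand_args 'check_mysql!nagios!nagdb' \
#       '$USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -d $ARG2$'
```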
[root@vm objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg    # verify that the configuration is valid
Total Warnings: 0
Total Errors:   0
[root@vm objects]# /etc/init.d/nagios reload
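Next, check in the web interface that host vm1 is now being monitored; if it appears in the host status view, the configuration works.

Then check that its services report a normal state.

Finally, stop MySQL on host 104 and look at the monitoring status again; the mysql service should now be reported as failing.
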
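6. Monitor remote host system status with Nagios through NRPE
NRPE is a Nagios add-on that runs on the monitored server and reports that server's local state, such as CPU load, memory usage, and disk usage, to the Nagios monitoring platform. NRPE can be thought of as the Nagios client for Linux.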

Setup on the remotely monitored host (192.168.1.104)
[root@vm1 ~]# yum install xinetd openssl-devel mysql-devel perl-ExtUtils-MakeMaker.x86_64 -y
[root@vm1 ~]# tar zxf nagios-plugins-1.4.15.tar.gz 
[root@vm1 ~]# cd nagios-plugins-1.4.15
[root@vm1 nagios-plugins-1.4.15]#  ./configure --with-nagios-user=nagios --with-nagios-group=nagios  --enable-extra-opts --enable-perl-modules --enable-libtap
[root@vm1 nagios-plugins-1.4.15]# make 
[root@vm1 nagios-plugins-1.4.15]# make install 
[root@vm1 ~]# tar zxf nrpe-2.15.tar.gz 
[root@vm1 ~]# cd nrpe-2.15 
[root@vm1 nrpe-2.15]# ./configure
[root@vm1 nrpe-2.15]# make all 
[root@vm1 nrpe-2.15]# make install-plugin 
[root@vm1 nrpe-2.15]# make install-daemon 
[root@vm1 nrpe-2.15]# make install-daemon-config 
[root@vm1 nrpe-2.15]# make install-xinetd 
[root@vm1 nrpe-2.15]# vim /etc/xinetd.d/nrpe
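only_from       = 127.0.0.1 192.168.1.104 192.168.1.105            # add the monitoring host's IP; entries are separated by spaces
[root@vm1 nrpe-2.15]# vim /etc/services
nrpe            5666/tcp                # nrpe                    # append this line at the end of the file
[root@vm1 nrpe-2.15]# vim /usr/local/nagios/etc/nrpe.cfg        # add a check for the root partition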
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p / 
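
Each `command[NAME]=...` line in nrpe.cfg binds a short name to a full command line, and `check_nrpe -c NAME` asks the remote daemon to run the command bound to that name. The sketch below only illustrates the name lookup against a config read from stdin. It is not NRPE code, and `nrpe_lookup` is a hypothetical name.

```shell
#!/bin/bash
# Illustration only: print the command line that an nrpe.cfg-style
# config (read from stdin) binds to the given command name.
# Returns 0 if the name is defined and 1 otherwise.
nrpe_lookup() {
    local name=$1
    awk -v key="command[$name]" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        {
            eq = index($0, "=")              # position of the first "="
            if (eq && substr($0, 1, eq - 1) == key) {
                print substr($0, eq + 1)
                found = 1
                exit
            }
        }
        END { exit !found }'
}

# Possible use (path as in this article):
#   nrpe_lookup check_disk < /usr/local/nagios/etc/nrpe.cfg
```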
[root@vm1 nrpe-2.15]# /etc/init.d/xinetd start 
[root@vm1 nrpe-2.15]# netstat -antlp|grep xinetd 
tcp        0      0 :::5666                     :::*                        LISTEN      50424/xinetd  
[root@vm1 ~]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 
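NRPE v2.15                # seeing this output confirms that NRPE is configured correctly
To view the host's other monitoring information, use the commands from this screenshot in the original post:
[screenshot: QQ截图20150405145455.png]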

On the host running Nagios:
[root@vm ~]# yum install xinetd
[root@vm ~]# tar zxf nrpe-2.15.tar.gz 
[root@vm ~]# cd nrpe-2.15 
[root@vm nrpe-2.15]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[root@vm nrpe-2.15]# make all
[root@vm nrpe-2.15]# make install-plugin
[root@vm nrpe-2.15]# make install-daemon
[root@vm nrpe-2.15]# make install-daemon-config
[root@vm nrpe-2.15]# make install-xinetd 
[root@vm nrpe-2.15]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.104 
NRPE v2.15 
To check other states on 192.168.1.104, run the commands from this screenshot with localhost replaced by 192.168.1.104:
[screenshot: QQ截图20150405145455.png]

[root@vm objects]# vim /usr/local/nagios/etc/objects/commands.cfg        # add the following
# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
[root@vm objects]# vim /usr/local/nagios/etc/objects/services.cfg         # add the following; if the earlier check_local_* definitions of these services are still present, remove them to avoid duplicate definitions
#####################check_nrpe#########################
define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     Root Partition
        check_command           check_nrpe!check_disk
        }
# Define a service to check the number of currently logged in
# users on the remote machine. The thresholds come from the
# check_users entry in the remote host's nrpe.cfg.
define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     Current Users
        check_command           check_nrpe!check_users
        }
# Define a service to check the number of currently running procs
# on the remote machine. The thresholds come from the
# check_total_procs entry in the remote host's nrpe.cfg.
define service{
        use                     local-service           ; Name of service template to use
        host_name               vm1.example.com
        service_description     Total Processes
        check_command           check_nrpe!check_total_procs
        }
[root@vm objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg 
[root@vm objects]# /etc/init.d/nagios reload 
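Test through the web interface; it may take a while for the services to reach a normal state.

7. Integrate Nagios with Fetion (飞信) for alert notifications
Download the fetion main program and its support libraries. We will not download them again here and instead use the copies downloaded earlier.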
[root@vm ~]# mv fetion /usr/local/nagios/libexec/ 
[root@vm ~]# chown nagios:nagios /usr/local/nagios/libexec/fetion
[root@vm ~]# tar zxf linuxso_20101113.tar.gz -C /lib
[root@vm ~]# chmod +x /lib/lib* 
[root@vm ~]# su - nagios 
[nagios@vm ~]$ chmod +x /usr/local/nagios/libexec/fetion 
[nagios@vm ~]$ /usr/local/nagios/libexec/fetion 
When running fetion you may see the following errors. Each is fixed by installing the missing library:
-bash: /usr/local/nagios/libexec/fetion: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory
[root@vm ~]# yum install ld-linux.so.2 -y
[root@vm ~]# su - nagios -c /usr/local/nagios/libexec/fetion 
[root@vm ~]# yum install libstdc++.so.6 -y 
[root@vm yum.repos.d]# su - nagios -c /usr/local/nagios/libexec/fetion 
/usr/local/nagios/libexec/fetion: error while loading shared libraries: libgssapi_krb5.so.2: cannot open shared object file: No such file or directory 
[root@vm yum.repos.d]# yum install libgssapi_krb5.so.2 -y 
[root@vm yum.repos.d]# su - nagios -c /usr/local/nagios/libexec/fetion 
/usr/local/nagios/libexec/fetion: error while loading shared libraries: libz.so.1: cannot open shared object file: No such file or directory 
[root@vm yum.repos.d]# yum install libz.so.1 -y 

With everything installed, we configure it.

[root@vm yum.repos.d]# su - nagios
[nagios@vm ~]$ /usr/local/nagios/libexec/fetion --mobile=13649216631 --pwd=westos --to=13649216631 --msg-utf8="good luck"    # --pwd is the Fetion account password
The CAPTCHA image has been generated as 13649216631.jpg; read it and enter the CAPTCHA:
pmfw
The CAPTCHA you entered is: pmfw
That completes the setup.
[nagios@vm libexec]$ vim fetion.sh 
#!/bin/bash
/usr/local/nagios/libexec/fetion --mobile=13669281264 --pwd="haiying.910201" --to="$1" --msg-utf8="$2"
[nagios@vm libexec]$ chmod +x fetion.sh 
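
The one-line script above sends whatever it is given, even when an argument is empty. A slightly stricter variant is sketched below. `FETION_BIN`, `FETION_MOBILE`, `FETION_PWD`, and `send_fetion` are hypothetical names introduced for this sketch; the `--mobile`, `--pwd`, `--to`, and `--msg-utf8` flags are the ones used in the script above.

```shell
#!/bin/bash
# Sketch of a stricter fetion.sh. The account details are read from the
# environment (hypothetical variable names) instead of being hard-coded.
FETION_BIN=${FETION_BIN:-/usr/local/nagios/libexec/fetion}

send_fetion() {
    local to=${1:-} msg=${2:-}
    # Refuse to run without a recipient and a message.
    if [ -z "$to" ] || [ -z "$msg" ]; then
        echo "usage: send_fetion RECIPIENT MESSAGE" >&2
        return 2
    fi
    # Refuse to run without the sender's account details.
    if [ -z "${FETION_MOBILE:-}" ] || [ -z "${FETION_PWD:-}" ]; then
        echo "FETION_MOBILE and FETION_PWD must be set" >&2
        return 2
    fi
    "$FETION_BIN" --mobile="$FETION_MOBILE" --pwd="$FETION_PWD" \
        --to="$to" --msg-utf8="$msg"
}
```

To use it as a drop-in script, the file would end with `send_fetion "$@"`.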
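Run the fetion.sh script to see whether it can send a message.
Note: the first time the Fetion script is called, it asks for a CAPTCHA. A jpg image named after your phone number, containing the CAPTCHA, is generated in the directory where the fetion program is located (/usr/local/fetion).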
[root@vm yum.repos.d]# /usr/local/nagios/libexec/fetion.sh 13649216631 "hello world" 
[root@vm objects]# vim commands.cfg        # add the following two command definitions
# 'notify-host-by-fetion' command definition
define command{
        command_name    notify-host-by-fetion
        command_line    $USER1$/fetion.sh $CONTACTPAGER$ "$NOTIFICATIONTYPE$ Host Alert: $HOSTALIAS$ is $HOSTSTATE$"
        }
# 'notify-service-by-fetion' command definition
define command{
        command_name    notify-service-by-fetion
        command_line    $USER1$/fetion.sh $CONTACTPAGER$ "$NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$"
        }
[root@vm objects]# vim templates.cfg
# Generic contact definition template - This is NOT a real contact, just a template!
define contact{
        name                            generic-contact         ; The name of this contact template
        service_notification_period     24x7                    ; service notifications can be sent anytime
        host_notification_period        24x7                    ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email,notify-service-by-fetion        ; send service notifications via email and Fetion
        host_notification_commands      notify-host-by-email,notify-host-by-fetion              ; send host notifications via email and Fetion
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }
[root@vm objects]# vim contacts.cfg
# template which is defined elsewhere.
define contact{
        contact_name    nagiosadmin             ; Short name of user
        use             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias           Nagios Admin            ; Full name of user
        email           1020659371@qq.com       ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        pager           13669281264
        }
[root@vm objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg 
Total Warnings: 0 
Total Errors:   0 
[root@vm objects]# /etc/init.d/nagios reload 

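Now stop the mysql service on host 192.168.1.104 to see whether an alert is sent. For this service to alert, notifications_enabled in its definition must be 1 rather than 0.
[root@vm1 ~]# /etc/init.d/mysqld stop
Check the mailbox to see whether the alert email arrived.
If no email arrives, see this post: http://blog.chinaunix.net/uid-29784755-id-4939485.html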


