Category: System Operations

2012-01-14 14:00:45

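Our company recently needed to bring a monitoring system online. It has to cover many hosts, and most of the environments and devices differ from one another, so I wrote this installation guide for our operations staff to follow when deploying the monitoring.

My system is Red Hat 5.4, with iptables and SELinux disabled.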

1. Set up yum with the RPMforge repository (if yum already works on this machine, skip this step and continue with step 2)
    [root@localhost yum.repos.d]# wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.1-1.el5.rf.i386.rpm
    [root@localhost yum.repos.d]# wget http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt
    [root@localhost yum.repos.d]# rpm -Uvh rpmforge-release-0.5.1-1.el5.rf.i386.rpm
    [root@localhost yum.repos.d]# rpm --import RPM-GPG-KEY.dag.txt
    [root@localhost yum.repos.d]# yum install yum-fastestmirror yum-presto

2. Install Apache (skip this if it was installed by default; otherwise install it with yum)

    [root@localhost ~]# yum -y install httpd

Building Nagios requires a few supporting packages:

    [root@localhost etc]# yum -y install gd gd-devel glibc glibc-common gcc

3. Configure Apache to support Nagios

1) Create the nagios user and the groups it needs

    [root@localhost ~]# useradd nagios

    # Add the nagcmd group, used to submit external commands through the web interface
    [root@localhost etc]# /usr/sbin/groupadd nagcmd

    # Add the nagios user to the nagcmd group
    [root@localhost etc]# /usr/sbin/usermod -a -G nagcmd nagios

    # Add the apache user to the nagcmd group
    [root@localhost etc]# /usr/sbin/usermod -a -G nagcmd apache

    # Add the nagios user to the apache group
    [root@localhost etc]# /usr/sbin/usermod -a -G apache nagios

    # Add the apache user to the nagios group
    [root@localhost etc]# /usr/sbin/usermod -a -G nagios apache

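2) Change the user and group that Apache runs as. The default is daemon (httpd installed from the Red Hat package runs as apache), and it needs to be changed to nagios so that Apache has permission to access the Nagios installation directory and run the CGI commands, for example to shut down Nagios from the browser or to stop alerts for a failing object. (This step can be skipped: when I deployed Nagios I did not change Apache's user or group and had no problems.)

3) Add the Nagios access directories (Nagios is installed under /usr/local/nagios) and protect them with HTTP user authentication. Append the following to the end of httpd.conf: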

    ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
    <Directory "/usr/local/nagios/sbin">
        Options ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd
        Require valid-user
    </Directory>
    Alias /nagios /usr/local/nagios/share
    <Directory "/usr/local/nagios/share">
        Options None
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd
        Require valid-user
    </Directory>

4. Install Nagios

    [root@localhost tmp]# tar zxvf nagios-3.3.1.tar.gz
    [root@localhost tmp]# cd nagios
    [root@localhost nagios]# ./configure --prefix=/usr/local/nagios --with-command-group=nagcmd
    [root@localhost nagios]# make all
    [root@localhost nagios]# make install
    [root@localhost nagios]# make install-init
    [root@localhost nagios]# make install-config
    [root@localhost nagios]# make install-commandmode
    [root@localhost nagios]# make install-webconf

5. Install the Nagios plugins (nagios-plugins)

    [root@localhost nagios]# cd /tmp
    [root@localhost tmp]# tar zxvf nagios-plugins-1.4.15.tar.gz
    [root@localhost tmp]# cd nagios-plugins-1.4.15
    [root@localhost nagios-plugins-1.4.15]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
    [root@localhost nagios-plugins-1.4.15]# make
    [root@localhost nagios-plugins-1.4.15]# make install

6. Configure Nagios
    [root@localhost nagios-plugins-1.4.15]# cd /usr/local/
    [root@localhost local]# chown -R nagios:nagios nagios/
    [root@localhost local]# chown -R nagios:nagios nagios/*
    [root@localhost local]# cd nagios/etc/
    [root@localhost etc]# vim nagios.cfg     # edit nagios.cfg so that it contains the following

    # host definitions (new file)
    cfg_file=/usr/local/nagios/etc/hosts.cfg
    # host group definitions (new file)
    cfg_file=/usr/local/nagios/etc/hostgroups.cfg
    # contact definitions (new file)
    cfg_file=/usr/local/nagios/etc/contacts.cfg
    # contact group definitions (new file)
    cfg_file=/usr/local/nagios/etc/contactgroups.cfg
    # service definitions (new file)
    cfg_file=/usr/local/nagios/etc/services.cfg
    # time period definitions
    cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
    # command definitions
    cfg_file=/usr/local/nagios/etc/objects/commands.cfg
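
Several of the files listed above (hosts.cfg, hostgroups.cfg, and so on) are created by hand in the sub-steps that follow, so it can help to confirm that every cfg_file entry points at a file that really exists before running the configuration check in step 8. A minimal shell sketch, assuming one cfg_file= entry per line; missing_cfg_files is an illustrative helper name, not part of Nagios:

```shell
# missing_cfg_files MAIN_CFG
# Print every path named by an uncommented cfg_file= line in MAIN_CFG
# whose target file does not exist. Prints nothing when all files are present.
missing_cfg_files() {
    main_cfg=$1
    # Lines starting with '#' do not match the pattern, so they are skipped.
    sed -n 's/^[[:space:]]*cfg_file=//p' "$main_cfg" | while IFS= read -r path; do
        [ -f "$path" ] || printf '%s\n' "$path"
    done
}
```

For example, `missing_cfg_files /usr/local/nagios/etc/nagios.cfg` prints the paths that still need to be created.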

Edit the cgi.cfg configuration file as follows:

    [root@localhost etc]# vim cgi.cfg
    # separate multiple users with commas
    authorized_for_system_information=nagios
    authorized_for_configuration_information=nagios
    authorized_for_system_commands=nagios
    authorized_for_all_services=nagios
    authorized_for_all_hosts=nagios
    authorized_for_all_service_commands=nagios
    authorized_for_all_host_commands=nagios
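The user specified here, "nagios", can shut down, restart, and otherwise control the Nagios service from the browser. Alternatively, make the same change with this command:
    [root@localhost etc]# sed -i 's/nagiosadmin/nagios/g' cgi.cfg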
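(1) Configure the host file hosts.cfg
    define host{
        host_name                        web1              ; host name is web1; check it with the hostname command
        alias                            Nagios Server     ; host alias
        address                          192.168.10.223    ; IP address of the host
        check_command                    check-host-alive  ; command used for the check; it must exist in the command definition file, where it is defined by default
        check_interval                   5                 ; interval between checks
        retry_interval                   1                 ; interval between retries after a failed check
        max_check_attempts               5                 ; maximum number of check attempts
        check_period                     24x7              ; time period during which checks run
        process_perf_data                0
        retain_nonstatus_information     0
        contact_groups                   admin             ; contact group that receives the alert emails
        notification_interval            30                ; interval between notifications
        notification_period              24x7              ; time period during which notifications are sent
        # Host states that trigger a notification: d = down, u = unreachable, r = recovery.
        # Service definitions use w = warning, u = unknown, c = critical, r = recovery
        # (whether to notify after recovery); w,c,r is usually enough for services in production.
        notification_options             d,u,r
    }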
(2) Configure the host group file hostgroups.cfg
    define hostgroup {
        hostgroup_name                    Nagios-Example    ; name of the host group
        alias                             Nagios Example    ; alias of the host group
        members                           web1              ; group members; must match host_name in hosts.cfg, otherwise the check fails
    }
(3) Configure the contact file contacts.cfg
    define contact{
        contact_name                      nagiosadmin              ; contact name
        alias                             Nagios Admin             ; contact alias
        service_notification_period       24x7                     ; service notifications are sent at any time
        host_notification_period          24x7                     ; host notifications are sent at any time
        service_notification_options      w,u,c,r                  ; service states that trigger a notification
        host_notification_options         d,u,r                    ; host states that trigger a notification
        service_notification_commands     notify-service-by-email  ; alert by email
        host_notification_commands        notify-host-by-email     ; same as above
        email                                                      ; mailbox that receives the alerts
    }
(4) Configure the contact group file contactgroups.cfg
    define contactgroup{
        contactgroup_name                 admin                    ; name of the contact group
        alias                             Nagios Administrators    ; alias of the contact group
        members                           nagiosadmin              ; group members; must match contact_name in contacts.cfg
    }
(5) Configure the service file services.cfg
    define service {
        host_name                         web1              ; must match host_name in hosts.cfg
        service_description               check-host-alive  ; service description
        check_period                      24x7              ; time period during which checks run
        max_check_attempts                4                 ; maximum number of check attempts
        normal_check_interval             3                 ; interval between checks
        retry_check_interval              2                 ; interval between retries
        contact_groups                    admin             ; contact group notified on failure
        notification_interval             10                ; interval between notifications
        notification_period               24x7              ; time period during which notifications are sent
        notification_options              w,u,c,r           ; states that trigger a notification
        check_command                     check-host-alive  ; command used for the check
    }
    define service {
        host_name                         web1
        service_description               PING
        check_period                      24x7
        max_check_attempts                4
        normal_check_interval             3
        retry_check_interval              2
        contact_groups                    admin
        notification_interval             10
        notification_period               24x7
        notification_options              w,u,c,r
        check_command                     check_ping!100.0,20%!500.0,60%
    }
    define service {
        host_name                         web1
        service_description               Root Partition
        check_period                      24x7
        max_check_attempts                4
        normal_check_interval             3
        retry_check_interval              2
        contact_groups                    admin
        notification_interval             10
        notification_period               24x7
        notification_options              w,u,c,r
        check_command                     check_local_disk!20%!10%!/
    }
    define service {
        host_name                         web1
        service_description               Current Users
        check_period                      24x7
        max_check_attempts                4
        normal_check_interval             3
        retry_check_interval              2
        contact_groups                    admin
        notification_interval             10
        notification_period               24x7
        notification_options              w,u,c,r
        check_command                     check_local_users!20!50
    }
    define service {
        host_name                         web1
        service_description               Total Processes
        check_period                      24x7
        max_check_attempts                4
        normal_check_interval             3
        retry_check_interval              2
        contact_groups                    admin
        notification_interval             10
        notification_period               24x7
        notification_options              w,u,c,r
        check_command                     check_local_procs!250!400!RSZDT
    }
    define service {
        host_name                         web1
        service_description               Current Load
        check_period                      24x7
        max_check_attempts                4
        normal_check_interval             3
        retry_check_interval              2
        contact_groups                    admin
        notification_interval             10
        notification_period               24x7
        notification_options              w,u,c,r
        check_command                     check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
    }
    define service {
        host_name                         web1
        service_description               Swap Usage
        check_period                      24x7
        max_check_attempts                4
        normal_check_interval             3
        retry_check_interval              2
        contact_groups                    admin
        notification_interval             10
        notification_period               24x7
        notification_options              w,u,c,r
        check_command                     check_local_swap!20!10
    }
    define service {
        host_name                         web1
        service_description               SSH
        check_period                      24x7
        max_check_attempts                4
        normal_check_interval             3
        retry_check_interval              2
        contact_groups                    admin
        notification_interval             10
        notification_period               24x7
        notifications_enabled             0
        notification_options              w,u,c,r
        check_command                     check_ssh
    }
    define service {
        host_name                         web1
        service_description               HTTP
        check_period                      24x7
        max_check_attempts                4
        normal_check_interval             3
        retry_check_interval              2
        contact_groups                    admin
        notification_interval             10
        notification_period               24x7
        notifications_enabled             0
        notification_options              w,u,c,r
        check_command                     check_http
    }
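
The service definitions in services.cfg differ almost only in service_description and check_command (SSH and HTTP additionally set notifications_enabled). As a rough sketch, a small shell function could generate such blocks instead of copying them by hand. emit_service is a hypothetical helper, and the fixed values are simply the ones used in this guide:

```shell
# emit_service HOST DESCRIPTION CHECK_COMMAND
# Print one Nagios service definition using the fixed settings from this guide.
emit_service() {
    cat <<EOF
define service {
    host_name               $1
    service_description     $2
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          admin
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           $3
}
EOF
}
```

For example, `emit_service web1 PING 'check_ping!100.0,20%!500.0,60%' >> services.cfg` would append a PING block. Single quotes keep an interactive shell from interpreting the `!` characters.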
7. Install NRPE
    [root@localhost etc]# cd /tmp/
    [root@localhost tmp]# tar zxvf nrpe-2.12.tar.gz
    [root@localhost tmp]# cd nrpe-2.12
    [root@localhost nrpe-2.12]# ./configure --prefix=/usr/local/nrpe
    [root@localhost nrpe-2.12]# make
    [root@localhost nrpe-2.12]# make install
Copy the plugin files
    [root@localhost nrpe-2.12]# cp /usr/local/nrpe/libexec/check_nrpe /usr/local/nagios/libexec
    [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_disk /usr/local/nrpe/libexec
    [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_load /usr/local/nrpe/libexec
    [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_ping /usr/local/nrpe/libexec
    [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_procs /usr/local/nrpe/libexec
Configure NRPE
    [root@localhost nrpe-2.12]# mkdir /usr/local/nrpe/etc
    [root@localhost nrpe-2.12]# cp sample-config/nrpe.cfg /usr/local/nrpe/etc/

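Edit nrpe.cfg. On the monitoring server it can be left unchanged; on a client (a monitored host), change the following line:

    allowed_hosts=127.0.0.1

Add the IP address of the monitoring server to allowed_hosts.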
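
On a monitored host, this change can also be scripted. The sketch below assumes allowed_hosts takes a comma-separated address list, as in the NRPE 2.x sample configuration. set_allowed_hosts is an illustrative helper, and 192.168.10.1 in the usage note is only a placeholder for the monitoring server's address:

```shell
# set_allowed_hosts NRPE_CFG SERVER_IP
# Rewrite the allowed_hosts line so it lists 127.0.0.1 plus SERVER_IP.
# The file is rewritten in place through a temporary copy, so its
# owner and permissions are left untouched.
set_allowed_hosts() {
    cfg=$1
    server_ip=$2
    tmp=$(mktemp) || return 1
    sed "s/^allowed_hosts=.*/allowed_hosts=127.0.0.1,$server_ip/" "$cfg" > "$tmp" \
        && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example: `set_allowed_hosts /usr/local/nrpe/etc/nrpe.cfg 192.168.10.1`. If NRPE is already running, restart it afterwards so that it picks up the new value.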

Start NRPE and confirm that it is running and listening on port 5666:

    [root@localhost nrpe-2.12]# /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
    [root@localhost nrpe-2.12]# ps -ef|grep nrpe
    nagios 4465 1 0 21:02 ? 00:00:00 /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
    root 4467 12877 0 21:02 pts/2 00:00:00 grep nrpe
    [root@localhost nrpe-2.12]# lsof -i:5666
    COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
    nrpe 4465 nagios 4u IPv4 81685 TCP *:5666 (LISTEN)

Set the owner and group of the Nagios and NRPE files:

    [root@localhost local]# chown -R nagios:nagios /usr/local/nagios/*
    [root@localhost local]# chown -R nagios:nagios /usr/local/nrpe/*

8. Start Nagios

First check the Nagios configuration for problems:
    [root@localhost etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

    Nagios Core 3.3.1
    Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
    Copyright (c) 1999-2009 Ethan Galstad
    Last Modified: 07-25-2011
    License: GPL

    Website: http://www.nagios.org
    Reading configuration data...
       Read main config file okay...
    Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
    Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
    Processing object config file '/usr/local/nagios/etc/hosts.cfg'...
    Processing object config file '/usr/local/nagios/etc/hostgroups.cfg'...
    Processing object config file '/usr/local/nagios/etc/contacts.cfg'...
    Processing object config file '/usr/local/nagios/etc/contactgroups.cfg'...
    Processing object config file '/usr/local/nagios/etc/services.cfg'...
       Read object config files okay...
    Running pre-flight check on configuration data...

    Checking services...
        Checked 9 services.
    Checking hosts...
        Checked 1 hosts.
    Checking host groups...
        Checked 1 host groups.
    Checking service groups...
        Checked 0 service groups.
    Checking contacts...
        Checked 2 contacts.
    Checking contact groups...
        Checked 1 contact groups.
    Checking service escalations...
        Checked 0 service escalations.
    Checking service dependencies...
        Checked 0 service dependencies.
    Checking host escalations...
        Checked 0 host escalations.
    Checking host dependencies...
        Checked 0 host dependencies.
    Checking commands...
        Checked 24 commands.
    Checking time periods...
        Checked 5 time periods.
    Checking for circular paths between hosts...
    Checking for circular host and service dependencies...
    Checking global event handlers...
    Checking obsessive compulsive processor commands...
    Checking misc settings...
    Total Warnings: 0
    Total Errors: 0
    Things look okay - No serious problems were detected during the pre-flight check
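
When the configuration is changed later, the same check can gate a restart so that a broken configuration never replaces a working one. A minimal sketch that relies only on the "Total Errors" summary line shown in the output above; preflight_ok is a hypothetical helper name:

```shell
# preflight_ok
# Read the output of `nagios -v` on standard input and succeed only if
# the summary line reports exactly zero errors. If the summary line is
# missing altogether, the check fails as well.
preflight_ok() {
    grep -q '^Total Errors:[[:space:]]*0[[:space:]]*$'
}
```

For example: `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok && service nagios restart`.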
If there are no problems, start Nagios:
    [root@localhost etc]# chkconfig --add nagios     # register nagios as a service
    [root@localhost etc]# chkconfig nagios on        # start the service automatically at boot
    [root@localhost etc]# service nagios start       # start nagios
Create the user for web authentication:
    [root@localhost etc]# htpasswd -c /usr/local/nagios/etc/htpasswd nagios
    New password:
    Re-type new password:
    Adding password for user nagios
Make NRPE start at boot:
    [root@localhost etc]# echo "/usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d" >> /etc/rc.local
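
Running that echo command a second time leaves two identical lines in /etc/rc.local, so NRPE would be launched twice at boot. A minimal sketch of an append that is safe to repeat; append_once is a hypothetical helper name:

```shell
# append_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
# grep -x matches whole lines only and -F treats LINE as a literal string.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

For example: `append_once /etc/rc.local "/usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d"`.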

Start sendmail so that the alerts can be delivered:

    [root@localhost etc]# service sendmail start
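After that, stopping the httpd service should produce an alert. Note that the HTTP service in services.cfg sets notifications_enabled 0, which suppresses notifications for that service; set it to 1 if you want the alert email. If you run into a problem you cannot solve, feel free to contact me, or go straight to my next post, "Why Nagios cannot send alert emails".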