A Nagios/Icinga plugin for monitoring Dell PowerEdge server hardware status - coolcole - ChinaUnix blog

Category: LINUX

2014-05-04 10:45:11

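I have a few Dell servers on hand, a PE2850 and a PE R710, and wanted to add their hardware status to Icinga. Most of the approaches available online depend on the SNMP service of Dell OpenManage, which did not sit well with me: I know little about SNMP, and the OIDs in particular are long strings of digits whose meaning is not obvious.

A few days ago I noticed that omreport, the command that ships with OpenManage, can be run directly, so I wrote this script. It is very simple: it checks chassis (the base components, including the motherboard and power supplies) and storage separately.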
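To make the idea concrete, here is a minimal sketch of the filtering step: reduce a chassis health listing to the components whose severity is not Ok. The sample text is a hand-written stand-in for `omreport chassis` output, with its layout assumed from the `SEVERITY : COMPONENT` header that the script strips; real output may differ between OpenManage versions. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical sample standing in for `omreport chassis` output (assumed layout).
sample_chassis_output() {
cat <<'EOF'
Health

Main System Chassis

SEVERITY : COMPONENT
Ok       : Fans
Critical : Power Supplies
Ok       : Temperatures
Non-Critical : Batteries
EOF
}

# Keep only "severity : component" lines whose severity is not Ok,
# and drop the header line.
filter_not_ok() {
    grep ":" | grep -v "^Ok" | grep -v "COMPONENT"
}

sample_chassis_output | filter_not_ok
```

With the sample above, only the Power Supplies and Batteries lines survive the filter, which is the kind of content the plugin later reports.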

1. Script

vim /usr/local/nagios/libexec/check_dell_omreport

#!/bin/bash

# Program : check_dell_omreport
# Version : 1.0
# Date : Jul 28 2012
# Author : huky -
# Summary : a simple nagios/icinga plugin that checks the status of chassis &
# storage on Dell PowerEdge servers with omreport in Dell Openmanager
# Licence : GPL - summary below, full text at

# OpenManage install path; the default is /opt/dell/srvadmin
DELL_SRV_DIR=/opt/dell/srvadmin
PATH=$PATH:$DELL_SRV_DIR/oma/bin:$DELL_SRV_DIR/bin:$DELL_SRV_DIR/sbin
#OMREPORT=`find $DELL_SRV_DIR -name omreport 2> /dev/null`
STOR_CTRL=/tmp/dell.storage.ctr
LOG_FILE=/tmp/dell_omreport.log

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

if [ ! -d "$DELL_SRV_DIR" ]; then
echo "Please install OpenManage and set DELL_SRV_DIR to its install path" && exit $STATE_UNKNOWN
fi

# omreport relies on the dataeng service, so bail out early if it is not running
if ! /etc/init.d/dataeng status > /dev/null 2>&1; then
echo "Please start the service dataeng" && exit $STATE_UNKNOWN
fi

# check chassis: keep the "severity : component" lines whose severity is not Ok
omreport chassis | grep ":" | grep -v "^Ok" | sed '/COMPONENT/d' > $LOG_FILE

#check storage
omreport storage controller | grep "^ID" | cut -d":" -f2 > $STOR_CTRL
if [ ! -s $STOR_CTRL ]; then
echo "Have you installed the package for storage?" >> $LOG_FILE
fi

for CONTR_ID in `cat $STOR_CTRL`
do
omreport storage controller controller=$CONTR_ID | grep -2 ^Status | sed '/--/d' | awk '{if (NR%5==0){print $0} else {printf"%s ",$0}}' | grep -v Ok | tr -s " *" " " >> $LOG_FILE
done

if [ -s $LOG_FILE ]; then
        paste -s $LOG_FILE > $LOG_FILE.2
        if [ `grep -c "Critical" $LOG_FILE` -eq `grep -c "\-Critical" $LOG_FILE` ]; then
                echo `cat $LOG_FILE.2` && exit $STATE_WARNING
        else
                echo `cat $LOG_FILE.2` && exit $STATE_CRITICAL
        fi
else
        echo "Machine is healthy" && exit $STATE_OK
fi
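The final branch of the script decides the exit state by comparing how many lines mention "Critical" with how many mention "-Critical" (that is, Non-Critical). The sketch below isolates that rule in a hypothetical function, `classify_findings`, so it can be tried without omreport. It prints the state code instead of exiting, and the input lines are made-up examples.

```shell
#!/bin/bash
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2

# Read findings from stdin and print the matching Nagios state code.
# No findings -> OK; only Non-Critical findings -> WARNING; otherwise CRITICAL.
classify_findings() {
    local findings total non_critical
    findings=$(cat)
    if [ -z "$findings" ]; then
        echo "$STATE_OK"
        return 0
    fi
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    total=$(printf '%s\n' "$findings" | grep -c "Critical" || true)
    non_critical=$(printf '%s\n' "$findings" | grep -c -e "-Critical" || true)
    if [ "$total" -eq "$non_critical" ]; then
        echo "$STATE_WARNING"
    else
        echo "$STATE_CRITICAL"
    fi
}

printf 'Non-Critical : Batteries\n' | classify_findings
printf 'Critical : Power Supplies\nNon-Critical : Batteries\n' | classify_findings
```

The first call prints 1 (WARNING) and the second prints 2 (CRITICAL). Note that, as in the original script, any finding that contains neither word still counts as a warning, because both counts are then equal.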

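2. Installation

2.1 Put the script in the appropriate location on the monitored host (the default is /usr/local/nagios/libexec/check_dell_omreport)

2.2 Then edit the NRPE configuration file on the monitored host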

vim /usr/local/nagios/etc/nrpe.cfg

Add one line:

command[check_omreport]=/usr/local/nagios/libexec/check_dell_omreport
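If the line is added by a provisioning script rather than by hand, it helps to make the edit idempotent. The following is a rough sketch with a hypothetical helper, `add_nrpe_command`, that appends the definition only when no command of that name exists yet. The demo writes to a scratch file, not to the real nrpe.cfg.

```shell
#!/bin/bash
# Append an NRPE command definition unless one with the same name already exists.
# Arguments: config file, command name, command line.
add_nrpe_command() {
    local cfg="$1" name="$2" cmdline="$3"
    if grep -q "^command\[$name\]=" "$cfg" 2> /dev/null; then
        echo "command[$name] already defined in $cfg"
    else
        echo "command[$name]=$cmdline" >> "$cfg"
        echo "added command[$name] to $cfg"
    fi
}

# Demo against a scratch file; the second call leaves the file unchanged.
demo_cfg=$(mktemp)
add_nrpe_command "$demo_cfg" check_omreport /usr/local/nagios/libexec/check_dell_omreport
add_nrpe_command "$demo_cfg" check_omreport /usr/local/nagios/libexec/check_dell_omreport
cat "$demo_cfg"
rm -f "$demo_cfg"
```

Pointing the helper at /usr/local/nagios/etc/nrpe.cfg would perform the same edit as the manual step above.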

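3. Monitoring

On the monitoring server, edit the corresponding monitoring configuration. I put these services into one service group, as follows: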

define service {
    use generic-service
    host_name host1,host2,host3,host4,host5
    service_description Dell_OM
    check_command check_nrpe_1arg!check_omreport
}

define servicegroup {
    servicegroup_name Hardware_Status
    alias Hardware Status
    members host1,Dell_OM,host2,Dell_OM,host3,Dell_OM,host4,Dell_OM,host5,Dell_OM
}
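The members value of a service group is a flat list of host,service pairs, which gets tedious to type for many hosts. A small sketch of generating it follows; `build_members` is a hypothetical helper and the host names are placeholders.

```shell
#!/bin/bash
# Build a servicegroup "members" value: host,service pairs joined by commas.
# Arguments: service description, then one or more host names.
build_members() {
    local service="$1" out="" host
    shift
    for host in "$@"; do
        # Prefix a comma only when something has already been collected.
        out="${out:+$out,}$host,$service"
    done
    echo "$out"
}

# Placeholder host names; prints host1,Dell_OM,host2,Dell_OM,host3,Dell_OM
build_members Dell_OM host1 host2 host3
```

The printed string can be pasted after the members keyword in the service group definition.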

4. Testing

# /usr/local/icinga/libexec/check_nrpe -H 192.168.10.121 -c check_omreport

Controllers ID : 0 Status : Non-Critical Name : PERC H700 Integrated Slot ID : Embedded Physical Disks ID : 0:0:0 Status : Non-Critical Name : Physical Disk 0:0:0 State : Online ID : 0:0:1 Status : Non-Critical Name : Physical Disk 0:0:1 State : Online ID : 0:0:2 Status : Non-Critical Name : Physical Disk 0:0:2 State : Online
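The text output is only half of the result; Nagios and Icinga act on the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). When testing by hand it can be handy to see both. Below is a minimal sketch with two hypothetical helpers, `state_name` and `run_check`, and a fake check that stands in for the real command.

```shell
#!/bin/bash
# Translate a plugin exit code into the conventional Nagios state name.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run any check command, show its output, then report the state it signalled.
run_check() {
    local output rc
    rc=0
    output=$("$@") || rc=$?
    echo "$output"
    echo "state: $(state_name "$rc")"
}

# Stand-in for a check that reports a warning (prints a message, returns 1).
fake_check() {
    echo "Status : Non-Critical"
    return 1
}

run_check fake_check
```

On the monitoring server the same wrapper could be used as `run_check /usr/local/icinga/libexec/check_nrpe -H 192.168.10.121 -c check_omreport`.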

5. Enabling

After restarting the service, the relevant information shows up in the service group.

[Screenshot: A Nagios/Icinga plugin for monitoring Dell PowerEdge server hardware status - 胡子 - 胡子的博客]

Warning level: RAID controller and disks

Critical level: power supply
