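Monitoring RAID Controllers and Hard Disks from Nagios with check_megaraid_sas (a Plugin Based on the MegaCli Tool)
On servers whose RAID is built on an LSI MegaRAID controller, the MegaCli tool provided by LSI can be used to monitor both the RAID controller and the hard disks.
Note: DELL PERC5/6 (PowerEdge RAID Controller, PERC) cards are in fact LSI MegaRAID SAS controllers.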
Download address for the latest MegaCli package: %3D%22AQ1NaXNjZWxsYW5lb3VzCWFzc2V0dHlwZQEBXgEk%22%20productfacet%3D%22AQxNZWdhUkFJRCBTQVMMcHJvZHVjdGZhY2V0AQJeIgIiJA%3D%3D%22
1. Prerequisites
1) Check the server model
(with newer versions of dmidecode)
# dmidecode -s system-product-name
PowerEdge R710
(with older versions of dmidecode)
# dmidecode | grep "Product Name"
Product Name: PowerEdge R710
Product Name: 0N4YV2
Lenovo WQ R520 G7
2) Confirm that a MegaRAID controller is in use
On a Dell PowerEdge R710 the output looks like this:
# dmesg | grep RAID
scsi0 : LSI SAS based MegaRAID driver
md: Autodetecting RAID arrays.
On a Lenovo WQ R520 G7 the output looks like this:
# dmesg | grep RAID
scsi0 : LSI SAS based MegaRAID driver
Vendor: LSI Model: MegaRAID 8708ELP Rev: 1.20
md: Autodetecting RAID arrays.
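The check above can also be scripted. The following is a minimal Python sketch, assuming the dmesg text has already been captured into a string; the function name uses_megaraid is hypothetical. It matches on "MegaRAID" rather than "RAID", because the "md: Autodetecting RAID arrays." line comes from the Linux md (software RAID) driver and does not by itself indicate a MegaRAID controller.

```python
def uses_megaraid(dmesg_text: str) -> bool:
    """Return True if any dmesg line mentions the MegaRAID driver or device."""
    return any("MegaRAID" in line for line in dmesg_text.splitlines())


# Sample lines copied from the dmesg output shown above.
dell = "scsi0 : LSI SAS based MegaRAID driver\nmd: Autodetecting RAID arrays.\n"
md_only = "md: Autodetecting RAID arrays.\n"

print(uses_megaraid(dell))     # True
print(uses_megaraid(md_only))  # False
```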
3) Check whether MegaCli is already installed
# rpm -qa | egrep 'Lib_Utils|MegaCli'
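2. Install MegaCli
Installing the latest MegaCli is recommended, because it supports monitoring of more SAS disk types (see the link given at the top of this document).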
# mkdir /usr/local/src/megacli
# cd /usr/local/src/megacli
# unzip 8.02.16_MegaCLI.zip (extract the MegaCli package; it contains builds for every supported OS)
[root@localhost megacli-8.02.16]# ll
total 12056
-rw-rw-rw- 1 root root 23852 Aug 12 10:13 8.02.16_MegaCLI.txt
-rw-r--r-- 1 root root 12244704 Aug 12 22:47 8.02.16_MegaCLI.zip
drwxr-xr-x 2 root root 4096 Sep 13 15:19 DOS
drwxr-xr-x 2 root root 4096 Sep 13 15:19 FREEBSD
drwxr-xr-x 2 root root 4096 Sep 13 15:22
drwxr-xr-x 2 root root 4096 Sep 13 15:19 SOLARIS
drwxr-xr-x 2 root root 4096 Sep 13 15:19 VMWARE
drwxr-xr-x 2 root root 4096 Sep 13 15:19 WINDOWS
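# cd LINUX
# unzip MegaCliLin.zip (extract the MegaCliLin package in turn)
The package we need is MegaCli-8.02.16-1.i386.rpm (the same package is used on both 32-bit and 64-bit systems). If the operating system is missing MegaCli's dependencies, install Lib_Utils-1.00-09.noarch.rpm first: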
# rpm -ivh Lib_Utils-1.00-09.noarch.rpm
# rpm -Uvh MegaCli-8.02.16-1.i386.rpm
# rpm -ql MegaCli (list the files installed by the MegaCli package)
/opt/MegaRAID/MegaCli/MegaCli
/opt/MegaRAID/MegaCli/MegaCli64
On a 32-bit system use MegaCli; on a 64-bit system use MegaCli64.
# /opt/MegaRAID/MegaCli/MegaCli (run without arguments, it prints the error below)
or
# /opt/MegaRAID/MegaCli/MegaCli64 (run without arguments, it prints the error below)
Fatal error - Command Tool invoked with wrong parameters
Exit Code: 0x01
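MegaCli ends its output with an "Exit Code: 0x.." line: 0x01 in the failure above and 0x00 after the successful commands later in this article. A script that captures MegaCli output could read that trailer to tell the two cases apart. A small Python sketch, with the hypothetical helper name megacli_exit_code, assuming the trailer always has the form shown here:

```python
import re
from typing import Optional

# Matches a trailer line such as "Exit Code: 0x01".
EXIT_CODE_RE = re.compile(r"^Exit Code:\s*0x([0-9a-fA-F]+)\s*$", re.MULTILINE)


def megacli_exit_code(output: str) -> Optional[int]:
    """Return the value of the last 'Exit Code: 0x..' line, or None if absent."""
    matches = EXIT_CODE_RE.findall(output)
    return int(matches[-1], 16) if matches else None


failed = "Fatal error - Command Tool invoked with wrong parameters\nExit Code: 0x01\n"
print(megacli_exit_code(failed))  # 1
```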
3. Test MegaCli
# arch (determine the operating system architecture)
i686
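The original file name mixes upper and lower case with digits, and its path is long, so creating a symbolic link in /usr/bin is recommended:
# ln -sf /opt/MegaRAID/MegaCli/MegaCli /usr/bin/megacli (32-bit system)
or
# ln -sf /opt/MegaRAID/MegaCli/MegaCli64 /usr/bin/megacli (64-bit system)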
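The choice between the two binaries can be derived from the machine architecture instead of being made by hand. The Python sketch below only picks the path; it does not create the link. The function name pick_megacli_binary is hypothetical, and treating only x86_64/amd64 as 64-bit is an assumption made for this example.

```python
import platform

# Paths as installed by the MegaCli rpm (see the rpm -ql output above).
MEGACLI_32 = "/opt/MegaRAID/MegaCli/MegaCli"
MEGACLI_64 = "/opt/MegaRAID/MegaCli/MegaCli64"


def pick_megacli_binary(machine: str) -> str:
    """Choose the MegaCli binary for a machine string such as 'i686' or 'x86_64'."""
    # Assumption: only x86_64/amd64 count as 64-bit; anything else
    # falls back to the 32-bit binary.
    return MEGACLI_64 if machine.lower() in ("x86_64", "amd64") else MEGACLI_32


# platform.machine() reports the same kind of value as the `arch` command.
print(pick_megacli_binary(platform.machine()))
```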
The linked name can now be run directly:
# megacli -help (show command help)
# megacli -adpCount (show the number of adapters)
# megacli -LdGetNum -aALL (show the number of logical drives)
[root@localhost MegaCli]# megacli -LdInfo -LALL -aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-3, RAID Level Qualifier-0
Size : 744.0 GB
Mirror Data : 744.0 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives per span:2 // every 2 physical disks form one RAID 1 group
Span Depth : 2 // 2 RAID 1 groups in total are combined into RAID 10
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Is VD Cached: No
Exit Code: 0x00
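The "key : value" layout of this output is easy to turn into a dictionary, which makes the State and span figures available to a script. The sketch below is an illustration only: parse_ld_info and total_drives are hypothetical names, and the parser assumes each virtual drive record starts with a "Virtual Drive:" line, as in the output above. Multiplying drives per span by span depth gives the 4 disks of this RAID 10 set.

```python
from typing import Dict, List


def parse_ld_info(text: str) -> List[Dict[str, str]]:
    """Split 'megacli -LdInfo' output into one dict per virtual drive."""
    drives: List[Dict[str, str]] = []
    for raw in text.splitlines():
        # Tolerate the '//' annotations this article adds to the sample output.
        line = raw.split("//", 1)[0]
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Virtual Drive":
            drives.append({"Virtual Drive": value})
        elif drives:
            drives[-1][key] = value
    return drives


def total_drives(vd: Dict[str, str]) -> int:
    """Drives per span multiplied by span depth."""
    return int(vd["Number Of Drives per span"]) * int(vd["Span Depth"])


sample = """Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Number Of Drives per span:2
Span Depth : 2
Exit Code: 0x00
"""
vd = parse_ld_info(sample)[0]
print(vd["State"], total_drives(vd))  # Optimal 4
```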
# megacli -PdList -aAll| more (show information for all physical disks)
Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Drive's postion: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: 0
Device Id: 0
WWN: 5000C5000AE5CFB8
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 372.528 GB [0x2e90edd0 Sectors]
Non Coerced Size: 372.028 GB [0x2e80edd0 Sectors]
Coerced Size: 372.0 GB [0x2e800000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: NS27
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c5000ae5cfb9
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3400755SS NS273RJ1J73Q
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Drive Temperature :28C (82.40 F)
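With several disks the listing above gets long, so a script may want one record per drive. A minimal Python sketch follows; the name parse_pd_list is hypothetical, and the parser assumes that every drive record begins with an "Enclosure Device ID" line, as it does in the output shown here.

```python
from typing import Dict, List


def parse_pd_list(text: str) -> List[Dict[str, str]]:
    """Split 'megacli -PdList' output into one dict per physical drive."""
    drives: List[Dict[str, str]] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines such as "Adapter #0" carry no key/value pair
        key, value = key.strip(), value.strip()
        # Assumption: every drive record begins with 'Enclosure Device ID'.
        if key == "Enclosure Device ID":
            drives.append({})
        if drives:
            drives[-1][key] = value
    return drives


# A shortened copy of the record shown above.
sample = """Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 0
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST3400755SS NS273RJ1J73Q
Drive Temperature :28C (82.40 F)
"""
for pd in parse_pd_list(sample):
    print(pd["Slot Number"], pd["Firmware state"], pd["Drive Temperature"])
# 0 Online, Spun Up 28C (82.40 F)
```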
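# megacli -cfgdsply -aALL | more (show the RAID controller model, RAID configuration, and disk information)
# megacli -FwTermLog -Dsply -aALL | more (show the RAID controller log)
# megacli -AdpAllInfo -aALL | more (show detailed information on the RAID controller's features)
4. Install check_megaraid_sas
check_megaraid_sas is a Nagios plugin, written in Perl, that obtains its monitoring data by running MegaCli commands.
Download: http://www.techno-obscura.com/~delgado/code/check_megaraid_sas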
# vi check_megaraid_sas
-------------------------------------------------------------------------
# change line 35 as follows
use lib qw(/usr/local/nagios/libexec); # possible pathes to your Nagios plugins and utils.pm
# change lines 52-53 as follows
my $megaclibin = '/usr/bin/megacli'; # the full path to your MegaCli binary
my $megacli = "$megaclibin"; # how we actually call MegaCli
-------------------------------------------------------------------------
# cp check_megaraid_sas /usr/local/nagios/libexec/check_megaraid_sas
# chmod 755 /usr/local/nagios/libexec/check_megaraid_sas
# /usr/local/nagios/libexec/check_megaraid_sas -h (show usage help)
Usage: /usr/local/nagios/libexec/check_megaraid_sas [-s number] [-m number] [-o number]
-s is how many hotspares are attached to the controller
-m is the number of media errors to ignore
-p is the predictive error count to ignore
-o is the number of other disk errors to ignore
[root@localhost libexec]# ./check_megaraid_sas
OK: 0:0:RAID-10:4 drives:744.0GB:Optimal Drives:4
If the plugin reports an error, the following command shows which physical disks have errors:
# megacli -PdList -aAll| egrep "Slot Number|Error Count|Failure Count"
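The egrep output still has to be read by eye to match each counter to its slot. The Python sketch below does that grouping and keeps only the slots with a non-zero counter. The name slots_with_errors is hypothetical, and the second slot in the sample is made-up input used to show a non-zero result; it is not output from a real controller.

```python
from typing import Dict, Optional

COUNTERS = ("Media Error Count", "Other Error Count", "Predictive Failure Count")


def slots_with_errors(filtered: str) -> Dict[int, Dict[str, int]]:
    """Map slot number -> non-zero counters, from the egrep-filtered lines."""
    bad: Dict[int, Dict[str, int]] = {}
    slot: Optional[int] = None
    for line in filtered.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            slot = int(value)
        elif key in COUNTERS and slot is not None and int(value) != 0:
            bad.setdefault(slot, {})[key] = int(value)
    return bad


# Slot 0 mirrors the healthy disk shown earlier; slot 1 is invented for illustration.
sample = """Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Slot Number: 1
Media Error Count: 3
Other Error Count: 0
Predictive Failure Count: 0
"""
print(slots_with_errors(sample))  # {1: {'Media Error Count': 3}}
```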
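Format of the plugin's output line:
<adapter>:<logical drive>:<RAID level>:<drives in the logical drive> drives:<size>:<state> Drives:<total physical drives>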
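To make the colon-separated fields concrete, here is a Python sketch that splits the sample line shown earlier ("OK: 0:0:RAID-10:4 drives:744.0GB:Optimal Drives:4") into named parts. The field names and the parse_status helper are this example's own choices, not the plugin's. The regular expression only covers the single-logical-drive line seen in this article; lines listing several logical drives or hot spares may look different. The status-to-exit-code table follows the standard Nagios plugin convention.

```python
import re
from typing import Dict

# Standard Nagios plugin return codes.
NAGIOS_EXIT = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Assumption: one logical drive per line, laid out as in the sample output.
LINE_RE = re.compile(
    r"^(?P<status>[A-Z]+): "
    r"(?P<adapter>\d+):(?P<ld>\d+):(?P<raid>[^:]+):(?P<ndrives>\d+) drives:"
    r"(?P<size>[^:]+):(?P<state>\S+) Drives:(?P<total>\d+)"
)


def parse_status(line: str) -> Dict[str, str]:
    """Split a check_megaraid_sas status line into named fields."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("unrecognised status line: %r" % line)
    return m.groupdict()


fields = parse_status("OK: 0:0:RAID-10:4 drives:744.0GB:Optimal Drives:4")
print(fields["raid"], fields["state"], NAGIOS_EXIT[fields["status"]])
# RAID-10 Optimal 0
```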
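check_megaraid_sas plugin: the MegaCli package can be downloaded from the official website
Common MegaCli options
megacli -adpCount [show the number of adapters]
megacli -AdpGetTime -aALL [show the adapter time]
megacli -AdpAllInfo -aAll [show information for all adapters]
megacli -LDInfo -LALL -aAll [show information for all logical drive groups]
megacli -PDList -aAll [show information for all physical disks]
megacli -AdpBbuCmd -GetBbuStatus -aALL | grep 'Charger Status' [show the charger status]
megacli -AdpBbuCmd -GetBbuStatus -aALL [show BBU status information]
megacli -AdpBbuCmd -GetBbuCapacityInfo -aALL [show BBU capacity information]
megacli -AdpBbuCmd -GetBbuDesignInfo -aALL [show BBU design parameters]
megacli -AdpBbuCmd -GetBbuProperties -aALL [show current BBU properties]
megacli -cfgdsply -aALL [show the RAID controller model, RAID configuration, and disk information]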
How the disk states change over the course of pulling a disk and inserting one:
Device |Normal|Damage|Rebuild|Normal
Virtual Drive |Optimal|Degraded|Degraded|Optimal
Physical Drive |Online|Failed -> Unconfigured|Rebuild|Online
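The table can be used as a lookup when interpreting MegaCli output. The Python sketch below encodes it as a dictionary keyed by virtual drive state and physical drive state; the names PHASES and phase are hypothetical. Because MegaCli may report a longer firmware state than the table's short labels (for example "Online, Spun Up" in the listing earlier), only the part before the first comma is compared, which is an assumption of this sketch.

```python
from typing import Optional

# (virtual drive state, physical drive state) -> phase, following the table above.
PHASES = {
    ("Optimal", "Online"): "Normal",
    ("Degraded", "Failed"): "Damage",
    ("Degraded", "Unconfigured"): "Damage",
    ("Degraded", "Rebuild"): "Rebuild",
}


def phase(vd_state: str, pd_state: str) -> Optional[str]:
    """Return the phase for a state pair, or None if the table has no such row."""
    # Assumption: compare only the text before the first comma,
    # so "Online, Spun Up" is treated as "Online".
    short_pd = pd_state.split(",")[0].strip()
    return PHASES.get((vd_state.strip(), short_pd))


print(phase("Optimal", "Online, Spun Up"))  # Normal
print(phase("Degraded", "Rebuild"))         # Rebuild
```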