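Recently I noticed that many SUSE10-SP1-X86 systems on our Nagios platform were raising 100% CPU usage alerts. Our plugin reads the system's ssCpuIdle value over SNMP; this value is the CPU idle percentage, and 100 minus it gives the usage. The default net-snmp version on SUSE10-SP1-X86 is net-snmp-5.3.0.1-25.25.
Running the following command: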
#snmpwalk -v 2c -c public HOST-IP .1.3.6.1.4.1.2021.11
UCD-SNMP-MIB::ssIndex.0 = INTEGER: 1
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssSwapIn.0 = INTEGER: 0
UCD-SNMP-MIB::ssSwapOut.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOSent.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOReceive.0 = INTEGER: 0
UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 1
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 19587258
UCD-SNMP-MIB::ssCpuRawNice.0 = Counter32: 224742
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 10389181
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 2123699120
UCD-SNMP-MIB::ssCpuRawWait.0 = Counter32: 9040737
UCD-SNMP-MIB::ssCpuRawKernel.0 = Counter32: 10051463
UCD-SNMP-MIB::ssCpuRawInterrupt.0 = Counter32: 191225
UCD-SNMP-MIB::ssIORawSent.0 = Counter32: 944905154
UCD-SNMP-MIB::ssIORawReceived.0 = Counter32: 9114420
UCD-SNMP-MIB::ssRawInterrupts.0 = Counter32: 3164689293
UCD-SNMP-MIB::ssRawContexts.0 = Counter32: 2500739620
UCD-SNMP-MIB::ssCpuRawSoftIRQ.0 = Counter32: 146493
UCD-SNMP-MIB::ssRawSwapIn.0 = Counter32: 0
UCD-SNMP-MIB::ssRawSwapOut.0 = Counter32: 0
ssCpuIdle comes back as 0. That gives 100-0=100, i.e. 100% CPU usage. But logging in to the system and watching top shows nothing of the sort. Why?
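The arithmetic behind the alert can be sketched in shell as follows. This is only an illustration, not our actual plugin: the function name usage_from_idle_line is hypothetical, and it assumes the input is a single snmpwalk output line in the format shown above.

```shell
#!/bin/sh
# Hypothetical helper: derive CPU usage from one ssCpuIdle output line.
# Assumes the line looks like "UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 98".
usage_from_idle_line() {
    # Strip everything up to and including "INTEGER: " to get the idle value
    idle=${1##*INTEGER: }
    # Usage is simply 100 minus the idle percentage
    echo $((100 - idle))
}
```

Fed the line `UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0`, this yields 100, which is exactly the false alert described here.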
I suspected the snmpd process and ran a test: restart snmpd first, then fetch the OID values with the same command as above:
UCD-SNMP-MIB::ssIndex.0 = INTEGER: 1
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssSwapIn.0 = INTEGER: 0
UCD-SNMP-MIB::ssSwapOut.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOSent.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOReceive.0 = INTEGER: 2
UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 1
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 98
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 19575866
UCD-SNMP-MIB::ssCpuRawNice.0 = Counter32: 224742
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 10383126
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 2122567657
UCD-SNMP-MIB::ssCpuRawWait.0 = Counter32: 9033460
UCD-SNMP-MIB::ssCpuRawKernel.0 = Counter32: 10045729
UCD-SNMP-MIB::ssCpuRawInterrupt.0 = Counter32: 191117
UCD-SNMP-MIB::ssIORawSent.0 = Counter32: 944169874
UCD-SNMP-MIB::ssIORawReceived.0 = Counter32: 9114412
UCD-SNMP-MIB::ssRawInterrupts.0 = Counter32: 3162422280
UCD-SNMP-MIB::ssRawContexts.0 = Counter32: 2494482369
UCD-SNMP-MIB::ssCpuRawSoftIRQ.0 = Counter32: 146280
UCD-SNMP-MIB::ssRawSwapIn.0 = Counter32: 0
UCD-SNMP-MIB::ssRawSwapOut.0 = Counter32: 0
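After the restart the values are returned correctly (no longer 0), but about a minute later the fault reappears and the value goes back to 0. How to fix it? I found the following solution online. I have not applied it yet and am recording it here for future reference.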
# while sleep 5; do /usr/local/yujing/snmpwalk -v2c -c cstring ip.ip.ip.ip ssCpuIdle; done
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78 ## after a while the value drops to 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78 ## after restarting snmpd, data is returned again, but the value is not accurate
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78 ## after a while the value drops to 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
Solutions:
Solution one: use the ssCpuRaw* counters (e.g. ssCpuRawIdle) instead of ssCpuIdle.
Solution two: download, compile and install the following two packages:
1. beecrypt-4.1.2.tar.gz
2. libelf-0.8.9.tar.gz
Run the following script:
#!/bin/sh
# Build and install beecrypt
tar zxvf beecrypt-4.1.2.tar.gz
cd beecrypt-4.1.2
./configure --prefix=/usr/local/beecrypt && make && make install
cd ..
# Build and install libelf
tar zxvf libelf-0.8.9.tar.gz
cd libelf-0.8.9
./configure --prefix=/usr/local/libelf && make && make install
cd ..
# Pick lib64 or lib depending on the architecture
systemver=`uname -m`
if [ "${systemver}" = "x86_64" ]; then
    ln -s /usr/local/beecrypt/lib64/libbeecrypt.la /usr/lib64/libbeecrypt.la
    echo "/usr/local/beecrypt/lib64" >> /etc/ld.so.conf
    export CFLAGS="-I/usr/local/beecrypt/include/beecrypt -I/usr/local/libelf/include -L/usr/local/beecrypt/lib64 -L/usr/local/libelf/lib"
else
    ln -s /usr/local/beecrypt/lib/libbeecrypt.la /usr/lib/libbeecrypt.la
    echo "/usr/local/beecrypt/lib" >> /etc/ld.so.conf
    export CFLAGS="-I/usr/local/beecrypt/include/beecrypt -I/usr/local/libelf/include -L/usr/local/beecrypt/lib -L/usr/local/libelf/lib"
fi
echo "/usr/local/libelf/lib" >> /etc/ld.so.conf
# Refresh the dynamic linker cache
ldconfig -v
ldconfig
Then restart snmpd.
If that still does not work, upgrade net-snmp directly:
tar zxvf net-snmp-5.5.tar.gz
cd net-snmp-5.5
./configure --prefix=/usr/local/net-snmp --with-default-snmp-version=2 --with-sys-contact="root@" --with-sys-location="Unknown" --with-logfile=/var/log/snmpd.log --with-persistent-directory=/var/net-snmp
make && make install
cd ..
# Clean up the extracted source trees
rm -rf beecrypt-4.1.2 libelf-0.8.9 net-snmp-5.5
Then restart snmpd.
PS:
The description at http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html#ssCpuIdle states that ssCpuIdle has been deprecated and its use is discouraged: "This object has been deprecated in favour of 'ssCpuRawIdle(53)', which can be used to calculate the same metric, but over any desired time period."
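As a rough sketch of solution one, the idle percentage can be computed from the difference between two samples of the raw counters. I have not run this against the affected hosts. The function names (counter32_delta, cpu_idle_pct, sample_raw) are hypothetical, and the code assumes the shell does 64-bit arithmetic (as bash does). It takes the total as user + nice + system + idle + wait; in both dumps above, ssCpuRawSystem equals ssCpuRawKernel + ssCpuRawInterrupt + ssCpuRawSoftIRQ, so this sum avoids double counting on this system, but other platforms may differ.

```shell
#!/bin/sh
# Difference between two Counter32 readings, allowing for one wrap at 2^32.
counter32_delta() {
    if [ "$2" -ge "$1" ]; then
        echo $(($2 - $1))
    else
        echo $(($2 + 4294967296 - $1))
    fi
}

# cpu_idle_pct "EARLIER" "LATER"
# Each argument is "user nice system idle wait" (raw counter values).
cpu_idle_pct() {
    # Word-split both samples into ten positional parameters
    set -- $1 $2
    [ "$#" -eq 10 ] || { echo "unknown"; return 1; }
    d_user=$(counter32_delta "$1" "$6")
    d_nice=$(counter32_delta "$2" "$7")
    d_sys=$(counter32_delta "$3" "$8")
    d_idle=$(counter32_delta "$4" "$9")
    d_wait=$(counter32_delta "$5" "${10}")
    d_total=$((d_user + d_nice + d_sys + d_idle + d_wait))
    # No ticks elapsed between the samples: nothing to compute
    [ "$d_total" -gt 0 ] || { echo "unknown"; return 1; }
    echo $((100 * d_idle / d_total))
}

# Hypothetical sampler: $1 is the host, $2 the community string.
# -Oqv makes snmpget print only the values, one per line.
sample_raw() {
    echo $(snmpget -v 2c -c "$2" -Oqv "$1" \
        UCD-SNMP-MIB::ssCpuRawUser.0 UCD-SNMP-MIB::ssCpuRawNice.0 \
        UCD-SNMP-MIB::ssCpuRawSystem.0 UCD-SNMP-MIB::ssCpuRawIdle.0 \
        UCD-SNMP-MIB::ssCpuRawWait.0)
}
```

A plugin could call sample_raw, sleep for the desired interval, call it again, and pass both results to cpu_idle_pct. Because the interval is chosen by the caller, this matches the "over any desired time period" wording in the MIB description.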