Category: LINUX

2010-08-02 13:56:08

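Recently we noticed that many SUSE10-SP1-X86 systems on our Nagios platform were raising 100% CPU usage alerts. Our plugin reads the system's ssCpuIdle value over SNMP; this value is the CPU idle percentage, so 100 minus it gives the usage. The default net-snmp version on SUSE10-SP1-X86 is net-snmp-5.3.0.1-25.25.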
 
We ran the following command:
#snmpwalk -v 2c -c public HOST-IP .1.3.6.1.4.1.2021.11
UCD-SNMP-MIB::ssIndex.0 = INTEGER: 1
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssSwapIn.0 = INTEGER: 0
UCD-SNMP-MIB::ssSwapOut.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOSent.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOReceive.0 = INTEGER: 0
UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 1
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 19587258
UCD-SNMP-MIB::ssCpuRawNice.0 = Counter32: 224742
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 10389181
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 2123699120
UCD-SNMP-MIB::ssCpuRawWait.0 = Counter32: 9040737
UCD-SNMP-MIB::ssCpuRawKernel.0 = Counter32: 10051463
UCD-SNMP-MIB::ssCpuRawInterrupt.0 = Counter32: 191225
UCD-SNMP-MIB::ssIORawSent.0 = Counter32: 944905154
UCD-SNMP-MIB::ssIORawReceived.0 = Counter32: 9114420
UCD-SNMP-MIB::ssRawInterrupts.0 = Counter32: 3164689293
UCD-SNMP-MIB::ssRawContexts.0 = Counter32: 2500739620
UCD-SNMP-MIB::ssCpuRawSoftIRQ.0 = Counter32: 146493
UCD-SNMP-MIB::ssRawSwapIn.0 = Counter32: 0
UCD-SNMP-MIB::ssRawSwapOut.0 = Counter32: 0
 
ssCpuIdle comes back as 0, so 100 - 0 = 100 and the reported CPU usage is 100%. Yet logging in to the system and watching top shows nothing of the kind. Why?
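
The arithmetic the plugin performs can be sketched as below. This is only an illustration, not the actual plugin: the function name `usage_from_idle_line` is hypothetical, and it assumes the input is a single line in the snmpwalk output format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not the real Nagios plugin): take one line such as
#   UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 98
# and print the CPU usage computed as 100 - idle.
usage_from_idle_line() {
    # Strip everything up to and including "INTEGER: " to get the idle value.
    idle=${1##*INTEGER: }
    echo $((100 - idle))
}
```

Called with the healthy reading (`INTEGER: 98`) it prints 2; called with the faulty reading (`INTEGER: 0`) it prints 100, which is exactly the false alert described here.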
 
Suspecting the snmpd process, we ran a test: restart snmpd, then fetch the OID values with the same command as above:
UCD-SNMP-MIB::ssIndex.0 = INTEGER: 1
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssSwapIn.0 = INTEGER: 0
UCD-SNMP-MIB::ssSwapOut.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOSent.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOReceive.0 = INTEGER: 2
UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 1
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 98
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 19575866
UCD-SNMP-MIB::ssCpuRawNice.0 = Counter32: 224742
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 10383126
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 2122567657
UCD-SNMP-MIB::ssCpuRawWait.0 = Counter32: 9033460
UCD-SNMP-MIB::ssCpuRawKernel.0 = Counter32: 10045729
UCD-SNMP-MIB::ssCpuRawInterrupt.0 = Counter32: 191117
UCD-SNMP-MIB::ssIORawSent.0 = Counter32: 944169874
UCD-SNMP-MIB::ssIORawReceived.0 = Counter32: 9114412
UCD-SNMP-MIB::ssRawInterrupts.0 = Counter32: 3162422280
UCD-SNMP-MIB::ssRawContexts.0 = Counter32: 2494482369
UCD-SNMP-MIB::ssCpuRawSoftIRQ.0 = Counter32: 146280
UCD-SNMP-MIB::ssRawSwapIn.0 = Counter32: 0
UCD-SNMP-MIB::ssRawSwapOut.0 = Counter32: 0
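After the restart the value is read correctly (it is no longer 0), but about a minute later the fault reappears and the value is 0 again. How can this be fixed? I found the following solution online; I have not tried it yet and am recording it here for future reference.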
 
 

# while sleep 5; do /usr/local/yujing/snmpwalk -v2c -c cstring ip.ip.ip.ip ssCpuIdle; done

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78 ## after a while the value drops to 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78 ## after restarting snmpd, data is returned again, but the value is not accurate

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 78 ## after a while the value drops to 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0

UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
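
To measure how soon after a restart the value drops to 0, the output of a polling loop like the one above could be fed through a small filter. The sketch below is an assumption-laden illustration: the function name `first_zero_reading` is hypothetical, and it expects the loop's output (one ssCpuIdle line per poll) on standard input.

```shell
#!/bin/sh
# Hypothetical filter: read snmpwalk ssCpuIdle lines on stdin and print the
# 1-based position of the first reading whose value is exactly 0.
# Returns non-zero and prints nothing if no zero reading is seen.
first_zero_reading() {
    n=0
    while IFS= read -r line; do
        # Skip blank lines and anything that is not an ssCpuIdle reading.
        case $line in
            *"ssCpuIdle.0 = INTEGER: "*) ;;
            *) continue ;;
        esac
        n=$((n + 1))
        value=${line##*INTEGER: }
        if [ "$value" = "0" ]; then
            echo "$n"
            return 0
        fi
    done
    return 1
}
```

With a 5-second polling interval, multiplying the printed position by 5 gives a rough figure for how many seconds snmpd stayed healthy.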


Solutions:
(solution one): use the ssCpuRaw* counters instead of ssCpuIdle.

(other solution): download, compile, and install the following two packages:

1. beecrypt-4.1.2.tar.gz

2. libelf-0.8.9.tar.gz

 

Run the following script:

#!/bin/sh
# Build and install beecrypt under /usr/local/beecrypt
tar zxvf beecrypt-4.1.2.tar.gz
cd beecrypt-4.1.2
./configure --prefix=/usr/local/beecrypt && make && make install
cd ..

# Build and install libelf under /usr/local/libelf
tar zxvf libelf-0.8.9.tar.gz
cd libelf-0.8.9
./configure --prefix=/usr/local/libelf && make && make install
cd ..

# Use the lib64 directories on x86_64, the lib directories otherwise
systemver=$(uname -m)
if [ "${systemver}" = "x86_64" ]; then
    ln -s /usr/local/beecrypt/lib64/libbeecrypt.la /usr/lib64/libbeecrypt.la
    echo "/usr/local/beecrypt/lib64" >> /etc/ld.so.conf
    export CFLAGS="-I/usr/local/beecrypt/include/beecrypt
                 -I/usr/local/libelf/include
                 -L/usr/local/beecrypt/lib64
                 -L/usr/local/libelf/lib"
else
    ln -s /usr/local/beecrypt/lib/libbeecrypt.la /usr/lib/libbeecrypt.la
    echo "/usr/local/beecrypt/lib" >> /etc/ld.so.conf
    export CFLAGS="-I/usr/local/beecrypt/include/beecrypt
                 -I/usr/local/libelf/include
                 -L/usr/local/beecrypt/lib
                 -L/usr/local/libelf/lib"
fi

# Register the new library directories and rebuild the linker cache
echo "/usr/local/libelf/lib" >> /etc/ld.so.conf
ldconfig -v

Then restart snmpd.

If that still does not help, upgrade net-snmp directly:

 

tar zxvf net-snmp-5.5.tar.gz
cd net-snmp-5.5
./configure --prefix=/usr/local/net-snmp \
      --with-default-snmp-version=2 \
      --with-sys-contact="root@" \
      --with-sys-location="Unknown" \
      --with-logfile=/var/log/snmpd.log \
      --with-persistent-directory=/var/net-snmp
make;make install

cd ..
rm -rf beecrypt-4.1.2 libelf-0.8.9 net-snmp-5.5

Then restart snmpd.


PS:

According to the description at http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html#ssCpuIdle, ssCpuIdle has been deprecated and its use is discouraged:
This object has been deprecated in favour of 'ssCpuRawIdle(53)', which can be used to calculate the same metric, but over any desired time period.
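
A minimal sketch of how the raw counters could replace ssCpuIdle follows. It is untested against a live agent and rests on several assumptions: the function names `counter_delta` and `cpu_idle_pct` are hypothetical, the shell is assumed to have 64-bit arithmetic (as bash and dash normally do), and each Counter32 is assumed to wrap at most once between samples. In both snmpwalk outputs above, ssCpuRawSystem equals ssCpuRawKernel + ssCpuRawInterrupt + ssCpuRawSoftIRQ, so the sketch sums only user, nice, system, idle, and wait to avoid double counting; that is an inference from those two samples, not a guarantee for every net-snmp version.

```shell
#!/bin/sh
# Difference between two Counter32 readings, tolerating a single wrap at 2^32.
# Requires shell arithmetic wider than 32 bits.
counter_delta() {
    old=$1
    new=$2
    echo $(( (new - old + 4294967296) % 4294967296 ))
}

# Hypothetical helper. Arguments are two samples of the raw counters:
#   $1..$5  : earlier ssCpuRawUser, Nice, System, Idle, Wait
#   $6..$10 : later   ssCpuRawUser, Nice, System, Idle, Wait
# Prints the idle percentage over the interval (integer, rounded down).
cpu_idle_pct() {
    d_user=$(counter_delta "$1" "$6")
    d_nice=$(counter_delta "$2" "$7")
    d_system=$(counter_delta "$3" "$8")
    d_idle=$(counter_delta "$4" "$9")
    d_wait=$(counter_delta "$5" "${10}")
    total=$((d_user + d_nice + d_system + d_idle + d_wait))
    if [ "$total" -eq 0 ]; then
        # No ticks elapsed between the samples; the ratio is undefined.
        echo "no ticks elapsed between samples" >&2
        return 1
    fi
    echo $((d_idle * 100 / total))
}
```

As a worked example, treating the smaller set of counters from the second snmpwalk output as the earlier sample and the larger set from the first output as the later one (purely for illustration), `cpu_idle_pct 19575866 224742 10383126 2122567657 9033460 19587258 224742 10389181 2123699120 9040737` prints 97. That is in line with the ssCpuIdle reading of 98 right after the restart and with what top shows.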

