07/08/2009 09:15:03 AM - SPINE: Poller[0] Host[7] DS[67] WARNING: SNMP timeout detected [500 ms], ignoring host '10.249.86.146'
07/08/2009 09:15:03 AM - SPINE: Poller[0] Host[7] DS[68] WARNING: SNMP timeout detected [500 ms], ignoring host '10.249.86.146'
07/08/2009 09:15:03 AM - SPINE: Poller[0] Host[10] DS[93] WARNING: SNMP timeout detected [500 ms], ignoring host '10.255.147.80'
07/08/2009 09:15:03 AM - SPINE: Poller[0] Host[10] DS[94] WARNING: SNMP timeout detected [500 ms], ignoring host '10.255.147.80'
07/08/2009 09:15:03 AM - SPINE: Poller[0] Host[10] DS[95] WARNING: SNMP timeout detected [500 ms], ignoring host '10.255.147.80'
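To see at a glance which hosts are affected and how many data sources each one loses, the warnings can be tallied per host. The Python sketch below assumes the exact line format shown in the excerpt above. Other spine versions may word the message differently, and the regex would need adjusting.

```python
import re
from collections import Counter

# Matches spine timeout warnings in the format shown in the log excerpt above.
TIMEOUT_RE = re.compile(
    r"SPINE: Poller\[(?P<poller>\d+)\] Host\[(?P<host_id>\d+)\] "
    r"DS\[(?P<ds_id>\d+)\] WARNING: SNMP timeout detected "
    r"\[(?P<timeout_ms>\d+) ms\], ignoring host '(?P<ip>[^']+)'"
)

def count_timeouts(lines):
    """Return a Counter mapping (host_id, ip) to the number of timed-out data sources."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:  # lines that are not timeout warnings are skipped
            counts[(int(m.group("host_id")), m.group("ip"))] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "07/08/2009 09:15:03 AM - SPINE: Poller[0] Host[7] DS[67] WARNING: SNMP timeout detected [500 ms], ignoring host '10.249.86.146'",
        "07/08/2009 09:15:03 AM - SPINE: Poller[0] Host[10] DS[93] WARNING: SNMP timeout detected [500 ms], ignoring host '10.255.147.80'",
        "07/08/2009 09:15:03 AM - SPINE: Poller[0] Host[10] DS[94] WARNING: SNMP timeout detected [500 ms], ignoring host '10.255.147.80'",
    ]
    for (host_id, ip), n in sorted(count_timeouts(sample).items()):
        print(f"Host[{host_id}] {ip}: {n} data source(s) timed out")
```

`count_timeouts` accepts any iterable of lines, so an open Cacti log file can be passed to it directly.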
Solution:
CACTID: Host[...] DS[....] WARNING: SNMP timeout detected [500 ms], ignoring host '........'
For "reasonable" timeouts, this may be related to a snmpbulkwalk issue. To change this, go to Settings, Poller and lower the value of The Maximum SNMP OID's Per SNMP Get Request. Start at a value of 10 and increase it again once the poller starts working. Some agents don't have the horsepower to deliver that many OIDs at a time, so the number can be reduced for those older or underpowered devices.
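The setting quoted above caps how many OIDs are packed into one SNMP GET. The arithmetic can be illustrated with a rough Python sketch (not spine's actual code): with a cap of `max_oids`, a device with `n` OIDs to poll needs ⌈n / max_oids⌉ requests. A lower cap therefore means smaller, more numerous requests, which a weak agent is more likely to answer within the timeout. The OID list is a hypothetical example built from IF-MIB `ifInOctets` (1.3.6.1.2.1.2.2.1.10).

```python
def batch_oids(oids, max_oids):
    """Split a list of OIDs into consecutive batches of at most max_oids each.

    Illustrative only: each batch stands for one SNMP GET request.
    """
    if max_oids < 1:
        raise ValueError("max_oids must be at least 1")
    return [oids[i:i + max_oids] for i in range(0, len(oids), max_oids)]

# Hypothetical device with 25 ifInOctets counters to poll.
oids = [f"1.3.6.1.2.1.2.2.1.10.{i}" for i in range(1, 26)]
for cap in (30, 10, 5):
    batches = batch_oids(oids, cap)
    largest = max(len(b) for b in batches)
    print(f"max OIDs {cap}: {len(batches)} request(s), largest has {largest} OIDs")
```

With these 25 OIDs, a cap of 30 sends everything in one request, a cap of 10 needs three requests (10, 10 and 5 OIDs), and a cap of 5 needs five.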
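I increased the value of The Maximum SNMP OID's Per SNMP Get Request from its default of 10 to 30 (it can be raised further if needed). One of the servers started working again, but the others still produced no graphs.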
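For the servers that still produced no graphs, I selected each one under Console > Management > Devices and raised its Maximum OID's Per Get Request option to 30 as well (until then it was still at the default of 10).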
Finally, under Settings I changed both Maximum Threads per Process and Maximum Concurrent Poller Processes to 5 (the default for each is 1).
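One way to reason about these two settings is the rough model below. It is an assumption about how the poller divides its work, not a description of spine internals: each poller process takes its own contiguous slice of the hosts, and each thread polls one host at a time, so up to processes × threads hosts are in flight at once. Under that model, moving from 1 × 1 to 5 × 5 raises the bound from 1 host to 25, so a single slow or unreachable host is less likely to hold up the rest. The function names are hypothetical.

```python
def split_hosts(host_ids, processes):
    """Divide host IDs into `processes` contiguous slices of near-equal size (hypothetical model)."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    base, extra = divmod(len(host_ids), processes)
    slices, start = [], 0
    for p in range(processes):
        size = base + (1 if p < extra else 0)  # the first `extra` slices get one more host
        slices.append(host_ids[start:start + size])
        start += size
    return slices

def max_parallel_hosts(processes, threads_per_process):
    """Upper bound on hosts polled at the same moment under this model."""
    return processes * threads_per_process

hosts = list(range(1, 13))  # hypothetical host IDs 1..12
for procs, threads in ((1, 1), (5, 5)):
    sizes = [len(s) for s in split_hosts(hosts, procs)]
    print(f"{procs} process(es) x {threads} thread(s): up to "
          f"{max_parallel_hosts(procs, threads)} host(s) at once, slice sizes {sizes}")
```

For 12 hosts, one process gets all 12, while five processes get slices of 3, 3, 2, 2 and 2 hosts.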
Version 0.8.7c had no problems; they only appeared after the upgrade.
Also, I found this entry in the spine 0.8.7e changelog:
bug: If host has MAX OID's set to 0, timeouts occur
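The changelog entry suggests that a per-host value of 0 was not handled safely. A defensive way to treat such a setting is to fall back to the global value whenever the per-device one is missing or below 1, and never to use a cap below 1. The Python sketch below illustrates that idea only; the function and parameter names are hypothetical, and this is not the actual spine fix.

```python
from typing import Optional

def effective_max_oids(device_value: Optional[int], global_value: int) -> int:
    """Pick the OID cap to use for one device (hypothetical helper).

    A cap below 1 leaves no room for any OID in a request, so a missing or
    non-positive per-device value falls back to the global setting, and the
    result is clamped to at least 1.
    """
    if device_value is not None and device_value >= 1:
        return device_value
    return max(1, global_value)

print(effective_max_oids(30, 10))    # per-device override wins: 30
print(effective_max_oids(0, 10))     # 0 falls back to the global value: 10
print(effective_max_oids(None, 0))   # a bad global value is clamped: 1
```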
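Update: over the past couple of days I've noticed that whenever a monitored server is rebooted or stopped, Cacti starts reporting this error. Manually changing The Maximum SNMP OID's Per SNMP Get Request in that server's device settings makes the problem go away, whether the value is raised or lowered. It looks like a bug in spine 0.8.7e, but I don't want to downgrade spine, so I'll put up with it and hope the next release fixes it.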