Plugin download:
I. Monitored host
cp scripts/iostat-persist.pl /usr/local/bin/
chmod +x /usr/local/bin/iostat-persist.pl
Add a cron job:
*/2 * * * * cd /tmp && iostat -xkd 30 2 | sed 's/,/\./g' > io.tmp && mv io.tmp iostat.cache
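Note: I tried doing without the cron job. It works, but the data arrives only intermittently. Details are at the end of this article.
Add the OID to snmpd.conf: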
pass_persist .1.3.6.1.3.1 /usr/bin/perl /usr/local/bin/iostat-persist.pl
Difference between iostat-persist.pl and iostat.pl:
There is also a much improved persistent script which involves a lot less forking,
and also a caching mechanism. If you would like to use this version (recommended)
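II. Cacti server
1. Test
snmpwalk -v 2c 59.51.24.18 -c public .1.3.6.1.3.1.13
Note for Ubuntu: data is returned, but the value is w_await rather than the utilization figure %util. iostat there prints two extra columns, r_await and w_await, so iostat-persist.pl has to be modified; the change is described later in this article.
If the command returns
SNMPv2-SMI::experimental.1.13 = No Such Instance currently exists at this OID
even though the pass_persist line above is already in snmpd.conf, SELinux may be the cause. Check /var/log/audit/audit.log for entries like the following (on RHEL, also check /var/log/messages):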
type=AVC msg=audit(1316500461.989:3076): avc: denied { read } for pid=17209 comm="perl" name="iostat.cache" dev=dm-0 ino=1177351 scontext=user_u:system_r:snmpd_t:s0 tcontext=root:object_r:tmp_t:s0 tclass=file
Cannot see this log entry?
Turn on all AVC messages that SELinux is currently "dontaudit"ing:
semodule -DB
Turn the "dontaudit" rules back on:
semodule -B
The log confirms that this really is a permissions problem.
Solution 1: do not put the cache file in /tmp.
Solution 2:
Save the log entry above to a file, e.g. 1.log, and run:
audit2allow -m local -l -i ./1.log >local.te
audit2allow is provided by the following packages:
rhel5: policycoreutils
rhel6: policycoreutils-python.x86_64
local.te will look something like this:
module local 1.0;
require {
type snmpd_t;
type tmp_t;
class file read ;
}
#============= snmpd_t =============
allow snmpd_t tmp_t:file read ;
Continue:
checkmodule -M -m -o local.mod local.te
semodule_package -o local.pp -m local.mod
semodule -i local.pp
Once that succeeds, run snmpwalk -v 2c 59.51.24.18 -c public .1.3.6.1.3.1.13 again. New denials appear:
type=AVC msg=audit(1316500461.989:3076): avc: denied { read } for pid=17209 comm="perl" name="/tmp/iostat.cache" dev=dm-0 ino=1177351 scontext=user_u:system_r:snmpd_t:s0 tcontext=root:object_r:tmp_t:s0 tclass=file
type=AVC msg=audit(1316501743.698:109): avc: denied { ioctl } for pid=3294 comm="perl" path="/tmp/iostat.cache" dev=dm-0 ino=1177349 scontext=user_u:system_r:snmpd_t:s0 tcontext=root:object_r:tmp_t:s0 tclass=file
type=AVC msg=audit(1316501743.698:110): avc: denied { getattr } for pid=3294 comm="perl" path="/tmp/iostat.cache" dev=dm-0 ino=1177349 scontext=user_u:system_r:snmpd_t:s0 tcontext=root:object_r:tmp_t:s0 tclass=file
Repeat the steps above, this time putting the two new entries and the earlier one into the log file together.
audit2allow -m local -l -i ./1.log >local.te
local.te now contains:
module local 1.0;
require {
type snmpd_t;
type tmp_t;
class file { read ioctl getattr };
}
#============= snmpd_t =============
allow snmpd_t tmp_t:file { read ioctl getattr };
checkmodule -M -m -o local.mod local.te && semodule_package -o local.pp -m local.mod && semodule -i local.pp
Run snmpwalk again; this time the data comes back:
snmpwalk -v 2c 1.1.1.1 -c public .1.3.6.1.3.1.13
SNMPv2-SMI::experimental.1.13.1 = STRING: "0.11"
SNMPv2-SMI::experimental.1.13.2 = STRING: "0.00"
SNMPv2-SMI::experimental.1.13.3 = STRING: "0.11"
SNMPv2-SMI::experimental.1.13.4 = STRING: "0.11"
SNMPv2-SMI::experimental.1.13.5 = STRING: "0.00"
Ubuntu returns no data
iso.3.6.1.3.1 = No Such Instance currently exists at this OID
Possible cause 1
/etc/snmp/snmpd.conf
pass_persist .1.3.6.1.3.1 /usr/local/bin/iostat-persist.pl
Do not put an extra name between the OID and the script path.
Possible cause 2
The pass_persist script had not been started by snmpd.
https://community.opmantek.com/display/NMIS/Extending+SNMPd+for+custom+monitoring
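A pass_persist script runs continuously, and ps -ef shows it as a long-lived process. Even if it is not started together with snmpd, it stays resident from the first request onward.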
It's also very efficient because the pass_persist program
is running permanently and there is no repeated startup overhead, and
the program can do whatever it needs to do, whenever and however it
wants to.
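To make the mechanism concrete, here is a minimal sketch of the request/reply loop that snmpd expects from a pass_persist program: it answers PING with PONG, and answers get/getnext with three lines (OID, type, value) or NONE. This is an illustration only, not how iostat-persist.pl is written; the function name persist_loop and the hard-coded OID and value are assumptions made for the example.

```shell
#!/bin/sh
# Illustrative pass_persist responder. OID and VALUE are placeholders;
# a real script would look the value up in iostat.cache.
OID=".1.3.6.1.3.1.13.1"
VALUE="0.11"

persist_loop() {
    # Read one command per line from snmpd on stdin, reply on stdout.
    while IFS= read -r cmd; do
        case "$cmd" in
            PING)
                # Handshake: snmpd expects exactly PONG in return.
                echo "PONG"
                ;;
            get)
                IFS= read -r req
                if [ "$req" = "$OID" ]; then
                    printf '%s\nstring\n%s\n' "$OID" "$VALUE"
                else
                    echo "NONE"
                fi
                ;;
            getnext)
                IFS= read -r req
                # Simplification: only a request that is a proper prefix of
                # OID has a successor here; everything else gets NONE.
                case "$OID" in
                    "$req".*) printf '%s\nstring\n%s\n' "$OID" "$VALUE" ;;
                    *)        echo "NONE" ;;
                esac
                ;;
            *)
                echo "NONE"
                ;;
        esac
    done
}
```

Calling persist_loop as the last line of a script registered with pass_persist would keep it resident, which matches the behaviour described in the quote: one long-lived process, no startup cost per request.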
Debug:
/usr/sbin/snmpd -D -f -u snmp -g snmp >./11111111 2>&1
Check the log.
The pass_persist module is loaded:
snmpd_register_app_config_handler: registering .conf token for "pass_persist"
trace: internal_register_config_handler(): read_config.c, 217:
9:read_config:register_handler: registering snmpd pass_persist
Starting snmpd does not start the iostat script immediately. After one snmpwalk request, the log shows:
ucd-snmp/pass_persist: open_persist_pipe(1,'/usr/bin/perl /usr/local/bin/iostat-persist.pl') recurse=0
trace: open_persist_pipe(): ucd-snmp/pass_persist.c, 543:
ucd-snmp/pass_persist: open_persist_pipe: opened the pipes
trace: open_persist_pipe(): ucd-snmp/pass_persist.c, 579:
ucd-snmp/pass_persist: open_persist_pipe: Got perl: warning: Setting locale failed.
instead of PONG!
# /usr/local/bin/iostat-persist.pl
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
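So the problem is Perl's LC_ALL setting. To keep the impact on the existing system as small as possible, set that environment variable in /etc/init.d/snmpd and restart snmpd; after that it works.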
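The text does not say which value was used. A minimal sketch, assuming the plain C locale is acceptable for snmpd and the scripts it spawns, is to export the variable near the top of the init script:

```shell
# Assumed addition near the top of /etc/init.d/snmpd.
# The value "C" is an assumption; any locale installed on the system
# should also silence the Perl warning.
LC_ALL=C
export LC_ALL
```

Setting it in the init script rather than system-wide limits the change to snmpd and its child processes.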
2. Import the Cacti templates
cp /opt/markround-Cacti-iostat-templates-7394c7b/snmp_queries/linux/iostat.xml /usr/local/nginx/html/cacti/resource/snmp_queries/
chown cacti.cacti /usr/local/nginx/html/cacti/resource/snmp_queries/iostat.xml
I only want to monitor disk utilization, so I imported only the following templates.
Import them from the Import Templates menu:
cacti_data_query_iostat_-_utilisation.xml
cacti_graph_template_iostat_-_utilisation.xml
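Note: after importing, one change is recommended.
The original template puts the line break after Current:, so the current value does not show up in the graph. Move the line break to after max:
Graph Templates -> iostat - Utilisation ->
Uncheck Insert Hard Return on Item # 5.
Check Insert Hard Return on Item # 4.
Create the graphs
Add the data query under Associated Data Queries.
If the data query cannot retrieve any information: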
+ Running data query [12].
+ Found type = '3' [SNMP Query].
+ Found data query XML file at '/usr/local/nginx/html/cacti/resource/snmp_queries/iostat.xm'
+ XML file parsed ok.
+ Invalid field <index_order>ioDescr:ioName:ioIndex</index_order>
+ Must contain <direction>input</direction> fields only
Solution: in iostat.xml, change
<index_order>ioDescr:ioName:ioIndex</index_order>
to
<index_order>ioDescr:ioIndex</index_order>
i.e. remove ioName:
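On Ubuntu, data is returned, but the value is w_await rather than the utilization figure %util, because iostat prints two extra columns, r_await and w_await. iostat-persist.pl therefore has to be modified so that the Linux branch reads: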
if ($ostype eq 'linux') {
    /^([a-z0-9\-\/]+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)/;

    $stats{"$base_oid.1.$devices"} = $devices; # index
    $stats{"$base_oid.2.$devices"} = $1; # device name
    $stats{"$base_oid.3.$devices"} = $2; # rrqm/s
    $stats{"$base_oid.4.$devices"} = $3; # wrqm/s
    $stats{"$base_oid.5.$devices"} = $4; # r/s
    $stats{"$base_oid.6.$devices"} = $5; # w/s
    $stats{"$base_oid.7.$devices"} = $6; # rkB/s
    $stats{"$base_oid.8.$devices"} = $7; # wkB/s
    $stats{"$base_oid.9.$devices"} = $8; # avgrq-sz
    $stats{"$base_oid.10.$devices"} = $9; # avgqu-sz
    $stats{"$base_oid.11.$devices"} = $10; # await
    $stats{"$base_oid.12.$devices"} = $11; # r_await
    $stats{"$base_oid.13.$devices"} = $12; # w_await
    $stats{"$base_oid.14.$devices"} = $13; # svctm
    $stats{"$base_oid.15.$devices"} = $14; # %util
That is, change
if ($ostype eq 'linux') {
    /^([a-z0-9\-\/]+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)/;
to
if ($ostype eq 'linux') {
    /^([a-z0-9\-\/]+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)/;
Then add the two lines for $11 (r_await) and $12 (w_await), shifting svctm and %util to $13 and $14 as in the listing above, and restart snmpd.
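The root cause is that the script picks columns by position. As an alternative sketch (not part of iostat-persist.pl), the %util column can be found by name from the iostat header line, so extra columns such as r_await and w_await no longer shift the result. The function name util_by_header is hypothetical; it reads iostat -xkd output on stdin and prints the device name and %util for every data row.

```shell
# Hypothetical helper: find the %util column from the header instead of
# counting fields. Reads "iostat -xkd ..." output on stdin.
util_by_header() {
    awk '
        # Header row: remember which field is labelled %util.
        /^[Dd]evice/ {
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Data rows after a header: print device name and its %util.
        col && NF >= col { print $1, $col }
    '
}
```

For example, `iostat -xkd 30 2 | util_by_header` would print one line per device for each of the two reports; the first report covers the period since boot.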
About Nagios check_snmp
snmpwalk returns data, but check_snmp reports an error:
SNMP OK - = No Such Instance currently exists at this OID |
Cause:
check_snmp doesn't do a walk. You must be very specific to the exact OID you want to check. You'll also need to specify failure criteria if you want to get any kind of alert.
In other words, the exact OID must be given,
e.g. .1.3.6.1.3.1.13.1
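As a small sketch of what "exact OID" means here: the full OID is the base .1.3.6.1.3.1, then the column number, then the device index. The helper util_oid below is hypothetical; it defaults to column 13, and column 15 can be passed for hosts using the modified Ubuntu listing, where %util moves to .15. The host and community in the commented invocation are the placeholder values used earlier in this article.

```shell
# Hypothetical helper: build the exact OID for one device row.
# $1 = device index (1-based), $2 = column number (default 13).
util_oid() {
    printf '.1.3.6.1.3.1.%s.%s\n' "${2:-13}" "$1"
}

# Example invocation, left commented out because host, community and
# thresholds are site-specific:
# check_snmp -H 1.1.1.1 -P 2c -C public -o "$(util_oid 1)"
```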
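About not caching the iostat data: iostat can instead be run each time snmpd asks for a value, so neither the cron job nor the iostat.cache file is needed. The drawback is that requests often fail to return data.
1. Add to snmpd.conf: pass .1.3.6.1.3.1 /usr/bin/perl /usr/local/bin/iostat.pl
2. Modify iostat.pl so that it reads: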
#!/usr/bin/env perl
use strict;

use constant debug => 0;
my $base_oid = ".1.3.6.1.3.1";
my $req;
my %stats;
my $devices;

process();

my $mode = shift(@ARGV);
if ( $mode eq "-g" ) {
    $req = shift(@ARGV);
    getoid($req);
}
elsif ( $mode eq "-n" ) {
    $req = shift(@ARGV);
    my $next = getnextoid($req);
    getoid($next);
}
else {
    $req = $mode;
    getoid($req);
}

sub process {
    $devices = 1;
    open( IOSTAT, "iostat -xkd 1 2 | sed 's/,/\./g'|" )
      or return ("Could not run iostat : $!");

    my $header_seen = 0;

    while (<IOSTAT>) {
        if (/^[D|d]evice/) {
            $header_seen++;
            next;
        }
        next if ( $header_seen < 2 );
        next if (/^$/);

        /^([a-z0-9\-\/]+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)\s+(\d+[\.,]\d+)/;

        $stats{"$base_oid.1.$devices"} = $devices; # index
        $stats{"$base_oid.2.$devices"} = $1; # device name
        $stats{"$base_oid.3.$devices"} = $2; # rrqm/s
        $stats{"$base_oid.4.$devices"} = $3; # wrqm/s
        $stats{"$base_oid.5.$devices"} = $4; # r/s
        $stats{"$base_oid.6.$devices"} = $5; # w/s
        $stats{"$base_oid.7.$devices"} = $6; # rkB/s
        $stats{"$base_oid.8.$devices"} = $7; # wkB/s
        $stats{"$base_oid.9.$devices"} = $8; # avgrq-sz
        $stats{"$base_oid.10.$devices"} = $9; # avgqu-sz
        $stats{"$base_oid.11.$devices"} = $10; # await
        $stats{"$base_oid.12.$devices"} = $11; # svctm
        $stats{"$base_oid.13.$devices"} = $12; # %util

        $devices++;
    }

}

sub getoid {
    my $oid = shift(@_);
    print "Fetching oid : $oid\n" if (debug);
    if ( $oid =~ /^$base_oid\.(\d+)\.(\d+).*/ && exists( $stats{$oid} ) ) {
        print $oid. "\n";
        if ( $1 == 1 ) {
            print "integer\n";
        }
        else {
            print "string\n";
        }

        print $stats{$oid} . "\n";
    }
}

sub getnextoid {
    my $first_oid = shift(@_);
    my $next_oid = '';
    my $count_id;
    my $index;

    if ( $first_oid =~ /$base_oid\.(\d+)\.(\d+).*/ ) {
        print("getnextoid($first_oid): index: $2, count_id: $1\n") if (debug);
        if ( $2 + 1 >= $devices ) {
            $count_id = $1 + 1;
            $index = 1;
        }
        else {
            $index = $2 + 1;
            $count_id = $1;
        }
        print(
            "getnextoid($first_oid): NEW - index: $index, count_id: $count_id\n"
        ) if (debug);
        $next_oid = "$base_oid.$count_id.$index";
    }
    elsif ( $first_oid =~ /$base_oid\.(\d+).*/ ) {
        $next_oid = "$base_oid.$1.1";
    }
    elsif ( $first_oid eq $base_oid ) {
        $next_oid = "$base_oid.1.1";
    }
    print("getnextoid($first_oid): returning $next_oid\n") if (debug);
    return $next_oid;
}