在日常工作中,有一些經常要處理的事情.
1. 檢查oracle是否正常.
2. oracle job運行是否正常. job有broken的, scheduler job運行FAILED.
3. 業務數據一致性檢測.
4. tablespace空間
5. 有多個不同的數據庫, 每一台需檢測的不一樣
6. nagios的 check_oracle_health需要定義很多server, 且不是很符合我的需求
需要一種較為通用的nagios插件來處理,以簡化流程.
我的解決方案是這樣子.
1. 在oracle管理員帳戶中建立dt_nagios_check表, 保存需要check的sql .
如有需求變更, nagios插件不需修改, 只修改dt_nagios_check中的數據.
-
create table dt_nagios_check
-
( id number(4) not null,
-
check_sql varchar2(2000),
-
msg_ok varchar2(2000),
-
msg_warning varchar2(2000) ,
-
enable char(1) default 'Y' ,
-
constraints pk_dt_nagios_check primary key (id)
-
);
數據範例:
每一台oracle按業務需求分別創建需檢核的sql.
注意: check_sql只能是一個匯總後的錯誤總數. msg_ok或msg_warning中只能一個參數.
如msg_ok為null,則表示檢測無數據時,nagios不顯示信息
-
insert into dt_nagios_check (id,check_sql,msg_ok,msg_warning) values
-
(1, q'# SELECT count(*) FROM dba_scheduler_job_run_details WHERE log_date >SYSDATE-1 AND status<>'SUCCEEDED' #', 'Oracle Scheduler jobs Succeeded ','Warning: <%d> Scheduler jobs failed') ;
-
-
insert into dt_nagios_check (id, check_sql, msg_warning) values
-
(2, ' select sum(num_rows) from yy3mat.dt_data_check_logs ', 'Warning: <%d> yy3mat data check error') ;
2. perl開發nagios 插件,抓dt_nagios_check中的sql,循環執行. 輸出報錯信息
-
#!/usr/bin/perl
-
# nagios: -epn
-
BEGIN {
-
$ENV{ORACLE_HOME} = "/usr/lib/oracle/11.2/client";
-
$ENV{NLS_LANG}="TRADITIONAL CHINESE_TAIWAN.ZHT16MSWIN950";
-
$ENV{LD_LIBRARY_PATH} = "$ENV{ORACLE_HOME}/lib:/usr/lib";
-
-
}
-
use DBI;
-
use strict;
-
use warnings;
-
#use utils ;
-
use constant STATE_CRITICAL=>2 ;
-
use constant STATE_WARNING=>1 ;
-
-
#my ($username,$password,$cmd,$database) ;
-
my $arg ;
-
while ($arg = shift @ARGV) {
-
$main::username = shift @ARGV if ($arg eq "-u");
-
$main::password = shift @ARGV if ($arg eq "-p");
-
$main::database = shift @ARGV if ($arg eq "--db");
-
}
-
-
&usage($0) if ( $main::database eq "" ) ;
-
-
# Connect to database
-
my $dbh = DBI->connect( "dbi:Oracle:$main::database", $main::username, $main::password,
-
{ PrintError => 1, AutoCommit => 0 } )
-
or &die_state(STATE_CRITICAL,"Cannot connect: $DBI::errstr\n\n");
-
-
my $cnt = &check_nagios_dynsql ;
-
$dbh->disconnect;
-
-
exit( STATE_WARNING ) if ($cnt >0 ) ;
-
exit 0 ;
-
#-------------------------------------------
-
-
-
-
sub die_state()
-
{
-
my $aa = shift ;
-
printf shift ;
-
exit $aa ;
-
}
-
-
sub select_count()
-
{
-
my $rs = $dbh->selectall_arrayref(shift );
-
return $rs->[0]->[0];
-
-
}
-
-
-
-
-
sub check_nagios_dynsql()
-
{
-
printf("Database:%s\n",$main::database) ;
-
-
my ($cnt,$err) ;
-
$cnt =0 ;
-
my $sth = $dbh->prepare("SELECT check_sql,msg_ok,msg_warning FROM dt_nagios_check where enable='Y' order by id ");
-
$sth->execute();
-
while (my $data = $sth->fetchrow_hashref) {
-
-
$err = &select_count( $data->{CHECK_SQL}) ;
-
if ($err <=0) {
-
print $data->{MSG_OK}, "\n" if defined($data->{MSG_OK}) ;
-
} else {
-
printf($data->{MSG_WARNING}, $err) ;
-
print "\n";
-
}
-
$cnt += $err ;
-
-
}
-
-
-
return $cnt
-
}
-
-
-
sub usage()
-
{
-
print <<_EOF_;
-
usage: $0
-
[--db database] - the oracle database
-
[-u username] - the user for oracle
-
[-p password] - the oracle password
-
_EOF_
-
exit(1);
-
-
}
3. 配置nagios3服務
-
define command{
-
command_name oracle_nagios_dynsql
-
command_line /usr/local/nagios/plugins/check_oracle_dynsql.pl --db '$ARG1$' -u '$ARG2$' -p '$ARG3$'
-
}
-
-
-
-
define service{
-
use oracle-service-yydg
-
host_name oracle1.yydg.com.cn
-
service_description oracle status oracle1
-
normal_check_interval 15
-
check_command oracle_nagios_dynsql!oracle1.yydg.com.cn!dbauser!dbapass
-
}
4
以上. 如無異常.即可運行.
如想順利配置, 需要了解一些nagios,perl ,oracle相關知識.
如想改為監控別的數據庫,如mysql, postgresql,sqlsever之類的. 也可按照這個思路修改check_oracle_dynsql.pl插件程序.
我的nagios+cacti運行的平台是跑在kvm上的256M內存的Debian Linux.
效果不錯.
如有更好的建議或方法,請與我聯絡. gangjh@gmail.com
阿飛
2014/01/25
阅读(1568) | 评论(0) | 转发(0) |