分类: Oracle
2008-10-17 14:09:10
这篇文章介绍了DBA每天在监控Oracle数据库方面的职责,讲述了如何通过shell脚本来完成这些重复的监控工作。本文首先回顾了一些DBA常用的Unix命令,以及解释了如何通过Unix Cron来定时执行DBA脚本。同时文章还介绍了8个重要的脚本来监控Oracle数据库:
检查实例的可用性
检查监听器的可用性
检查alert日志文件中的错误信息
在存放log文件的地方满以前清空旧的log文件
分析table和index以获得更好的性能
检查表空间的使用情况
找出无效的对象
监控用户和事务
DBA需要的Unix基本知识
基本的UNIX命令
ps--显示进程
grep--搜索文件中的某种文本模式
mailx--读取或者发送mail
cat--连接文件或者显示它们
cut--选择显示的列
awk--模式匹配语言
df--显示剩余的磁盘空间
以下是DBA如何使用这些命令的一些例子:
显示服务器上的可用实例:
$ ps -ef | grep smonoracle ------------------------ 21832 1 0 Feb 24 ? 19:05 ora_smon_oradb1 oracle 898 1 0 Feb 15 ? 0:00 ora_smon_oradb2 dliu 25199 19038 0 10:48:57 pts/6 0:00 grep smon oracle 27798 1 0 05:43:54 ? 0:00 ora_smon_oradb3 oracle 28781 1 0 Mar 03 ? 0:01 ora_smon_oradb4、 |
$ ps -ef | grep listener | grep -v grep ---------------------------------- (译者注:grep命令应该加上-i参数,即grep -i listener,该参数的作用是忽略大小写, 因为有些时候listener是大写的,这时就会看不到结果) oracle 23879 1 0 Feb 24 ? 33:36 /8.1.7/bin/tnslsnr listener_db1 -inherit oracle 27939 1 0 05:44:02 ? 0:00 /8.1.7/bin/tnslsnr listener_db2 -inherit oracle 23536 1 0 Feb 12 ? 4:19 /8.1.7/bin/tnslsnr listener_db3 -inherit oracle 28891 1 0 Mar 03 ? 0:01 /8.1.7/bin/tnslsnr listener_db4 -inherit |
查看Oracle存档目录的文件系统使用情况
$ df -k | grep oraarch ---------------- /dev/vx/dsk/proddg/oraarch 71123968 4754872 65850768 7% /u09/oraarch |
$ cat alert.log | wc -l ----------------- 2984 |
列出alert.log文件中的全部Oracle错误信息:
$ grep ORA- alert.log ----------------- ORA-00600: internal error code, arguments: [kcrrrfswda.1], [], [], [], [], [] ORA-00600: internal error code, arguments: [1881], [25860496], [25857716], [] |
Unix命令或者Shell脚本
要编辑一个crontab文件,输入:
Crontab -e |
要查看一个crontab文件,输入:
Crontab -l ------------------------ 0 4 * * 5 /dba/admin/analyze_table.ksh 30 3 * * 3,6 /dba/admin/hotbackup.ksh /dev/null 2>&1 |
监控数据库的常用Shell脚本
以下提供的8个shell脚本覆盖了DBA每日监控工作的90%,你可能还需要修改UNIX的环境变量。
检查Oracle实例的可用性
oratab文件中列出了服务器上的所有数据库
$ cat /var/opt/oracle/oratab -------------------------------- ####################################### ## /var/opt/oracle/oratab## ####################################### oradb1:/u01/app/oracle/product/8.1.7:Y oradb2:/u01/app/oracle/product/8.1.7:Y oradb3:/u01/app/oracle/product/8.1.7:N oradb4:/u01/app/oracle/product/8.1.7:Y |
以下的脚本检查oratab文件中列出的所有数据库,并且找出该数据库的状态(启动还是关闭)
###################################################################
## ckinstance.ksh ##
################################################################### ORATAB=/var/opt/oracle/oratab echo "`date` " echo "Oracle Database(s) Status `hostname` :\n" db=`egrep -i ":Y|:N" $ORATAB | cut -d":" -f1 | grep -v "\#" | grep -v "\*"` pslist="`ps -ef | grep pmon`" for i in $db ; do echo "$pslist" | grep "ora_pmon_$i" > /dev/null 2>$1 if (( $? )); then echo "Oracle Instance - $i: Down" else echo "Oracle Instance - $i: Up" fi done |
使用以下的命令来确认该脚本是可以执行的:
$ chmod 744 ckinstance.ksh $ ls -l ckinstance.ksh-rwxr- -r-- 1 oracle dba 657 Mar 5 22:59 ckinstance.ksh* |
$ ckinstance.ksh ----------------------------- Mon Mar 4 10:44:12 PST 2002 Oracle Database(s) Status for DBHOST server: Oracle Instance - oradb1: Up Oracle Instance - oradb2: Up Oracle Instance - oradb3: Down Oracle Instance - oradb4: Up |
以下有一个类似的脚本检查Oracle监听器。如果监听器停了,该脚本将会重新启动监听器:
#######################################################################
## cklsnr.sh ##
####################################################################### #!/bin/ksh DBALIST="primary.dba@company.com,another.dba@company.com";export DBALIST cd /var/opt/oracle rm -f lsnr.exist ps -ef | grep mylsnr | grep -v grep > lsnr.exist if [ -s lsnr.exist ] then echo else echo "Alert" | mailx -s "Listener mylsnr on `hostname` is down" $DBALIST TNS_ADMIN=/var/opt/oracle; export TNS_ADMIN ORACLE_SID=db1; export ORACLE_SID ORAENV_ASK=NO; export ORAENV_ASK PATH=$PATH:/bin:/usr/local/bin; export PATH . oraenv LD_LIBRARY_PATH=${ORACLE_HOME}/lib;export LD_LIBRARY_PATH lsnrctl start mylsnr fi |
############################################################
## oracle.profile ##
############################################################ EDITOR=vi;export EDITOR ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE ORACLE_HOME=$ORACLE_BASE/product/8.1.7; export ORACLE_HOME LD_LIBRARY_PATH=$ORACLE_HOME/lib; export LD_LIBRARY_PATH TNS_ADMIN=/var/opt/oracle; export TNS_ADMIN NLS_LANG=american; export NLS_LANG NLS_DATE_FORMAT=Mon DD YYYY HH24:MI:SS; export NLS_DATE_FORMAT ORATAB=/var/opt/oracle/oratab; export ORATAB PATH=$PATH:$ORACLE_HOME:$ORACLE_HOME/bin:/usr/ccs/bin:/bin:/usr/bin:/usr/sbin:/sbin:/usr/openwin/bin:/opt/bin:.; export PATH DBALIST="primary.dba@company.com,another.dba@company.com";export DBALIST |
############################################################ ## ckalertlog.sh ## ############################################################ #!/bin/ksh .. /etc/oracle.profile for SID in `cat $ORACLE_HOME/sidlist` do cd $ORACLE_BASE/admin/$SID/bdump if [ -f alert_${SID}.log ] then mv alert_${SID}.log alert_work.log touch alert_${SID}.log cat alert_work.log >> alert_${SID}.hist grep ORA- alert_work.log > alert.err fi if [ `cat alert.err|wc -l` -gt 0 ] then mailx -s "${SID} ORACLE ALERT ERRORS" $DBALIST < alert.err fi rm -f alert.err rm -f alert_work.log done |
清除旧的归档文件
以下的脚本将会在log文件达到90%容量的时候清空旧的归档文件:
$ df -k | grep arch Filesystem kbytes used avail capacity Mounted on /dev/vx/dsk/proddg/archive 71123968 30210248 40594232 43% /u08/archive |
####################################################################### ## clean_arch.ksh ## ####################################################################### #!/bin/ksh df -k | grep arch > dfk.result archive_filesystem=`awk -F" " { print $6 } dfk.result` archive_capacity=`awk -F" " { print $5 } dfk.result` if [[ $archive_capacity > 90% ]] then echo "Filesystem ${archive_filesystem} is ${archive_capacity} filled" # try one of the following option depend on your need find $archive_filesystem -type f -mtime +2 -exec rm -r {} \; tar rman fi |
分析表和索引(以得到更好的性能)
以下我将展示如果传送参数到一个脚本中:
#################################################################### ## analyze_table.sh ## #################################################################### #!/bin/ksh # input parameter: 1: password # 2: SID if (($#<1)) then echo "Please enter oracle user password as the first parameter !" exit 0 fi if (($#<2)) then echo "Please enter instance name as the second parameter!" exit 0 fi |
$ analyze_table.sh manager oradb1 脚本的第一部分产生了一个analyze.sql文件,里面包含了分析表用的语句。脚本的第二部分分析全部的表: ##################################################################### ## analyze_table.sh ## ##################################################################### sqlplus -s < oracle/$1@$2 set heading off set feed off set pagesize 200 set linesize 100 spool analyze_table.sql select ANALYZE TABLE || owner || . || segment_name || ESTIMATE STATISTICS SAMPLE 10 PERCENT; from dba_segments where segment_type = TABLE and owner not in (SYS, SYSTEM); spool off exit ! sqlplus -s < oracle/$1@$2 @./analyze_table.sql exit ! |
以下是analyze.sql的一个例子:
$ cat analyze.sql -------------------- ANALYZE TABLE HIRWIN.JANUSAGE_SUMMARY ESTIMATE STATISTICS SAMPLE 10 PERCENT; ANALYZE TABLE HIRWIN.JANUSER_PROFILE ESTIMATE STATISTICS SAMPLE 10 PERCENT; ANALYZE TABLE APPSSYS.HIST_SYSTEM_ACTIVITY ESTIMATE STATISTICS SAMPLE 10 PERCENT; ANALYZE TABLE HTOMEH.QUEST_IM_VERSION ESTIMATE STATISTICS SAMPLE 10 PERCENT; ANALYZE TABLE JSTENZEL.HIST_SYS_ACT_0615 ESTIMATE STATISTICS SAMPLE 10 PERCENT; ANALYZE TABLE JSTENZEL.HISTORY_SYSTEM_0614 ESTIMATE STATISTICS SAMPLE 10 PERCENT; ANALYZE TABLE JSTENZEL.CALC_SUMMARY3 ESTIMATE STATISTICS SAMPLE 10 PERCENT; ANALYZE TABLE IMON.QUEST_IM_LOCK_TREE ESTIMATE STATISTICS SAMPLE 10 PERCENT; ANALYZE TABLE APPSSYS.HIST_USAGE_SUMMARY ESTIMATE STATISTICS SAMPLE 10 PERCENT; ANALYZE TABLE PATROL.P$LOCKCONFLICTTX ESTIMATE STATISTICS SAMPLE 10 PERCENT; |
检查表空间的使用
以下的脚本检测表空间的使用。如果表空间只剩下10%,它将会发送一个警告email。 ##################################################################### ## ck_tbsp.sh ## ##################################################################### #!/bin/ksh sqlplus -s < oracle/$1@$2 set feed off set linesize 100 set pagesize 200 spool tablespace.alert SELECT F.TABLESPACE_NAME, TO_CHAR ((T.TOTAL_SPACE - F.FREE_SPACE),999,999) "USED (MB)", TO_CHAR (F.FREE_SPACE, 999,999) "FREE (MB)", TO_CHAR (T.TOTAL_SPACE, 999,999) "TOTAL (MB)", TO_CHAR ((ROUND ((F.FREE_SPACE/T.TOTAL_SPACE)*100)),999)|| % PER_FREE FROM ( SELECT TABLESPACE_NAME, ROUND (SUM (BLOCKS*(SELECT VALUE/1024 FROM V\$PARAMETER WHERE NAME = db_block_size)/1024) ) FREE_SPACE FROM DBA_FREE_SPACE GROUP BY TABLESPACE_NAME ) F, ( SELECT TABLESPACE_NAME, ROUND (SUM (BYTES/1048576)) TOTAL_SPACE FROM DBA_DATA_FILES GROUP BY TABLESPACE_NAME ) T WHERE F.TABLESPACE_NAME = T.TABLESPACE_NAME AND (ROUND ((F.FREE_SPACE/T.TOTAL_SPACE)*100)) < 10; spool off exit ! if [ `cat tablespace.alert|wc -l` -gt 0 ] then cat tablespace.alert -l tablespace.alert > tablespace.tmp mailx -s "TABLESPACE ALERT for ${2}" $DBALIST < tablespace.tmp fi |
警告email输出的例子如下:
TABLESPACE_NAME USED (MB) FREE (MB) TOTAL (MB) PER_FREE
------------------- --------- ----------- ------------------- ------------------
SYSTEM 2,047 203 2,250 9 %
STBS01 302 25 327 8 %
STBS02 241 11 252 4 %
STBS03 233 19 252 8 %
查找出无效的数据库对象
以下查找出无效的数据库对象:
##################################################################### ## invalid_object_alert.sh ## ##################################################################### #!/bin/ksh . /etc/oracle.profile sqlplus -s set heading off column object_name format a30 spool invalid_object.alert SELECT OWNER, OBJECT_NAME, OBJECT_TYPE, STATUS FROM DBA_OBJECTS WHERE STATUS = INVALID ORDER BY OWNER, OBJECT_TYPE, OBJECT_NAME; spool off exit ! if [ `cat invalid_object.alert|wc -l` -gt 0 ] then mailx -s "INVALID OBJECTS for ${2}" $DBALIST < invalid_object.alert fi$ cat invalid_object.alert OWNER OBJECT_NAME OBJECT_TYPE STATUS ---------------------------------------------------------------------- HTOMEH DBMS_SHARED_POOL PACKAGE BODY INVALID HTOMEH X_$KCBFWAIT VIEW INVALID IMON IW_MON PACKAGE INVALID IMON IW_MON PACKAGE BODY INVALID IMON IW_ARCHIVED_LOG VIEW INVALID IMON IW_FILESTAT VIEW INVALID IMON IW_SQL_FULL_TEXT VIEW INVALID IMON IW_SYSTEM_EVENT1 VIEW INVALID IMON IW_SYSTEM_EVENT_CAT VIEW INVALIDLBAILEY CHECK_TABLESPACE_USAGE PROCEDURE INVALID PATROL P$AUTO_EXTEND_TBSP VIEW INVALID SYS DBMS_CRYPTO_TOOLKIT PACKAGE INVALID SYS DBMS_CRYPTO_TOOLKIT PACKAGE BODY INVALID SYS UPGRADE_SYSTEM_TYPES_TO_816 PROCEDURE INVALID SYS AQ$_DEQUEUE_HISTORY_T TYPE INVALID SYS HS_CLASS_CAPS VIEW INVALID SYS HS_CLASS_DD VIEW INVALID |
################################################################### ## deadlock_alert.sh ## ################################################################### #!/bin/ksh .. /etc/oracle.profile sqlplus -s < oracle/$1@$2 set feed off set heading off spool deadlock.alert SELECT SID, DECODE(BLOCK, 0, NO, YES ) BLOCKER, DECODE(REQUEST, 0, NO,YES ) WAITER FROM V$LOCK WHERE REQUEST > 0 OR BLOCK > 0 ORDER BY block DESC; spool off exit ! if [ `cat deadlock.alert|wc -l` -gt 0 ] then mailx -s "DEADLOCK ALERT for ${2}" $DBALIST < deadlock.alert fi |
结论
0,20,40 7-17 * * 1-5 /dba/scripts/ckinstance.sh > /dev/null 2>&1
0,20,40 7-17 * * 1-5 /dba/scripts/cklsnr.sh > /dev/null 2>&1
0,20,40 7-17 * * 1-5 /dba/scripts/ckalertlog.sh > /dev/null 2>&1
30 * * * 0-6 /dba/scripts/clean_arch.sh > /dev/null 2>&1
* 5 * * 1,3 /dba/scripts/analyze_table.sh > /dev/null 2>&1
* 5 * * 0-6 /dba/scripts/ck_tbsp.sh > /dev/null 2>&1
* 5 * * 0-6 /dba/scripts/invalid_object_alert.sh > /dev/null 2>&1
0,20,40 7-17 * * 1-5 /dba/scripts/deadlock_alert.sh > /dev/null 2>&1