OS redhat as4 u2,DB ORACLE 10201
作者:
早晨检查数据库发现,系统负载很高,
检查报警日志,没有错误报出,
有好多sqlplus进程,于是将非oracle系统进行kill掉,负载正常
于是使用sqlplus "/as sysdba"登陆数据库,发现无法登陆
于是使用sqlplus -prelim "/ as sysdba"同样无法登陆,于是判定数据库hang了。
使用strace跟踪sqlplus,得到一些看不懂的内容
[oracle@CCN-BJ-5-5D2 ~]$ strace sqlplus "/as sysdba" 2>&1 |less
execve("/u01/app/oracle/product/10.2.0/db_1/bin/sqlplus", ["sqlplus", "/as sysdba"], [/* 29 vars */]) = 0
uname({sys="Linux", node="CCN-BJ-5-5D2", ...}) = 0
brk(0) = 0x804a000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/u01/app/oracle/product/10.2.0/db_1/lib/tls/i686/sse2/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/u01/app/oracle/product/10.2.0/db_1/lib/tls/i686/sse2", 0xbfffeeac) = -1 ENOENT (No such file or directory)
open("/u01/app/oracle/product/10.2.0/db_1/lib/tls/i686/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/u01/app/oracle/product/10.2.0/db_1/lib/tls/i686", 0xbfffeeac) = -1 ENOENT (No such file or directory)
open("/u01/app/oracle/product/10.2.0/db_1/lib/tls/sse2/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/u01/app/oracle/product/10.2.0/db_1/lib/tls/sse2", 0xbfffeeac) = -1 ENOENT (No such file or directory)
open("/u01/app/oracle/product/10.2.0/db_1/lib/tls/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/u01/app/oracle/product/10.2.0/db_1/lib/tls", 0xbfffeeac) = -1 ENOENT (No such file or directory)
open("/u01/app/oracle/product/10.2.0/db_1/lib/i686/sse2/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/u01/app/oracle/product/10.2.0/db_1/lib/i686/sse2", 0xbfffeeac) = -1 ENOENT (No such file or directory)
open("/u01/app/oracle/product/10.2.0/db_1/lib/i686/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/u01/app/oracle/product/10.2.0/db_1/lib/i686", 0xbfffeeac) = -1 ENOENT (No such file or directory)
open("/u01/app/oracle/product/10.2.0/db_1/lib/sse2/libsqlplus.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/u01/app/oracle/product/10.2.0/db_1/lib/sse2", 0xbfffeeac) = -1 ENOENT (No such file or directory)
open("/u01/app/oracle/product/10.2.0/db_1/lib/libsqlplus.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@H\1\000"..., 512) = 512
...
lseek(6, 1024, SEEK_SET) = 1024
read(6, "\25\7\'\0072\7>\7j\7\276\17$\'\6K5S\24TfT\307T(VsV\222"..., 86) = 86
times(NULL) = -2140869880
times(NULL) = -2140869880
times(NULL) = -2140869880
times(NULL) = -2140869880
times(NULL) = -2140869880
times(NULL) = -2140869880
times(NULL) = -2140869880
times(NULL) = -2140869880
times(NULL) = -2140869880
times(NULL) = -2140869880
[oracle@CCN-BJ-5-5D2 admin]$ ps -ef|grep ora
oracle 4243 1 0 2007 ? 01:15:30 ora_psp0_vipbill
oracle 4245 1 0 2007 ? 01:16:30 ora_mman_vipbill
oracle 4247 1 0 2007 ? 03:01:13 ora_dbw0_vipbill
oracle 4249 1 0 2007 ? 06:16:19 ora_lgwr_vipbill
oracle 4251 1 0 2007 ? 01:46:55 ora_ckpt_vipbill
oracle 4253 1 0 2007 ? 02:35:54 ora_smon_vipbill
oracle 4255 1 0 2007 ? 01:14:33 ora_reco_vipbill
oracle 4257 1 0 2007 ? 01:22:21 ora_cjq0_vipbill
oracle 4259 1 0 2007 ? 01:24:14 ora_mmon_vipbill
oracle 4261 1 0 2007 ? 01:17:34 ora_mmnl_vipbill
oracle 4263 1 0 2007 ? 01:14:29 ora_d000_vipbill
oracle 4265 1 0 2007 ? 00:00:00 ora_s000_vipbill
oracle 4345 1 0 2007 ? 02:29:16 ora_arc0_vipbill
oracle 4347 1 0 2007 ? 02:28:14 ora_arc1_vipbill
oracle 4402 1 0 2007 ? 01:15:12 ora_qmnc_vipbill
oracle 4485 1 0 2007 ? 01:13:51 ora_q000_vipbill
oracle 19170 1 0 Mar23 ? 01:15:03 ora_a001_vipbill
oracle 19172 1 0 Mar23 ? 01:15:57 ora_p000_vipbill
oracle 19174 1 0 Mar23 ? 00:02:12 ora_p001_vipbill
oracle 24168 1 0 Apr02 ? 00:00:00 ora_q002_vipbill
oracle 9708 1 6 Apr02 ? 01:16:11 ora_m000_vipbill
root 23823 23775 0 09:25 pts/0 00:00:00 su - oracle
oracle 23824 23823 0 09:25 pts/0 00:00:00 -bash
oracle 4241 1 0 2007 ? 01:16:12 ora_pmon_vipbill
root 1253 28005 0 10:29 pts/3 00:00:00 su - oracle
oracle 1254 1253 0 10:29 pts/3 00:00:00 -bash
oracle 19412 1 15 11:42 ? 00:12:31 oraclevipbill (LOCAL=NO)
oracle 19714 1 15 11:42 ? 00:12:16 oraclevipbill (LOCAL=NO)
oracle 20060 1 15 11:42 ? 00:12:31 oraclevipbill (LOCAL=NO)
oracle 20259 1 15 11:42 ? 00:12:43 oraclevipbill (LOCAL=NO)
oracle 20599 1 15 11:42 ? 00:12:28 oraclevipbill (LOCAL=NO)
oracle 22751 1 15 11:43 ? 00:12:25 oraclevipbill (LOCAL=NO)
oracle 9373 23824 0 13:02 pts/0 00:00:00 ps -ef
oracle 9374 23824 0 13:02 pts/0 00:00:00 grep ora
[oracle@CCN-BJ-5-5D2 admin]$
[root@CCN-BJ-5-5D2 ~]# uptime
13:14:27 up 199 days, 16:41, 3 users, load average: 6.90, 18.59, 23.06
[root@CCN-BJ-5-5D2 ~]#
查看系统运行时间是199天,而数据库出现故障是在前一天,说明数据库在198天出的错,
这个现象跟bug 4612267 十分相似,唯一不同的是bug 4612267 提到的运行时间是248天。
从google和baidu搜索发现也有其他人在198天出现类似现象,所以可以初步判断是bug,
将所有oracle进程杀掉,数据库仍然不能连接,只能进行重启机器
[oracle@CCN-BJ-5-5D2 admin]$ ps -ef|grep ora
root 23823 23775 0 09:25 pts/0 00:00:00 su - oracle
oracle 23824 23823 0 09:25 pts/0 00:00:00 -bash
root 1253 28005 0 10:29 pts/3 00:00:00 su - oracle
oracle 1254 1253 0 10:29 pts/3 00:00:00 -bash
oracle 10994 23824 0 13:13 pts/0 00:00:00 ps -ef
oracle 10995 23824 0 13:13 pts/0 00:00:00 grep ora
[oracle@CCN-BJ-5-5D2 admin]$ sqlplus -prelim "/ as sysdba"
解决方案:将数据库升级到10203
Bug 4612267 - OCI client spins when machine uptime >= 249 days
文档 ID: 注释:4612267.8
Description
OCI clients may spin once the machine has been up for longer than
249 days (24.9 days on some machines).
The spin will show repeated calls to times() from sltrgatime64().
SQL*Plus With Instant Client 10.2.0.1 Hangs, When System Uptime Is More Than 248 Days
文档 ID: 注释:338461.1
This is a known, unpublished bug.
Bug 4612267 OCI CLIENT IS IN AN INFINITE LOOP WHEN MACHINE UPTIME HITS 248 DAYS
当机器运行时间达到248天,OCI客户端会进入一个无限死循环,意思是不是当我们此时执行sqlplus /as sysdba就进入死循环了,无法登陆,并且资源耗费巨大。
第一篇文档中提到当系统运行超过249天就会出错,第二篇说是超过248天,估计是Oracle方面也不太确定到底是哪天。呵呵。
并且第一篇提到某些机器上24.9天就有可能出现该错误,那么我们是不是可以猜测一下在24.9的倍数也会出现问题呢
24.9*8=199.2
24.8*8=198.4
跟198天很接近
重新启动机器,然后数据库启动正常,随后找机会将其升级到10203
itpub中也有人提到该错误
http://www.itpub.net/viewthread.php?tid=832227&extra=&page=1情况跟我类似,最后解决方法也是重启机器。