Chinaunix首页 | 论坛 | 博客
  • 博客访问: 3673214
  • 博文数量: 715
  • 博客积分: 1860
  • 博客等级: 上尉
  • 技术积分: 7745
  • 用 户 组: 普通用户
  • 注册时间: 2008-04-07 08:51
个人简介

偶尔有空上来看看

文章分类

全部博文(715)

文章存档

2023年(75)

2022年(134)

2021年(238)

2020年(115)

2019年(11)

2018年(9)

2017年(9)

2016年(17)

2015年(7)

2014年(4)

2013年(1)

2012年(11)

2011年(27)

2010年(35)

2009年(11)

2008年(11)

分类: Oracle

2021-04-08 21:08:13


aix 16G内存 单机 10.2.0.4
应用反馈无法访问数据库了。

sqlplus / as sysdba 挂起

看alert.log 不断输出
Tue Apr 6 10:01:41 2021
PMON failed to acquire latch, see PMON dump
Tue Apr 6 10:02:41 2021
PMON failed to acquire latch, see PMON dump
Tue Apr 6 10:02:55 2021
ksvcreate: Process(m000) creation failed
Tue Apr 6 10:03:41 2021
PMON failed to acquire latch, see PMON dump

kill -9 pmon进程后,告警日志输出以下内容:

Tue Apr 6 10:46:30 2021
MMNL: terminating instance due to error 472
Tue Apr 6 10:46:30 2021
Errors in file /home/oracle/admin/orcl/udump/orcl_ora_5865490.trc:
ORA-00600: internal error code, arguments: [504], [0x700000010007608], [640], [5], [session allocation], [0], [0], [0x000000000]
Tue Apr 6 10:46:31 2021
Errors in file /home/oracle/admin/orcl/udump/orcl_ora_5865490.trc:
ORA-07445: exception encountered: core dump [kgscDump+01dc] [SIGSEGV] [Address not mapped to object] [0x7C000FAC7C017000] [] []
ORA-00600: internal error code, arguments: [504], [0x700000010007608], [640], [5], [session allocation], [0], [0], [0x000000000]
Tue Apr 6 10:46:31 2021

奇怪的是,告警日志中提示MMNL进程由于472错误终止实例。
重启实例正常了。




检查当时的其他trc文件,发现mmnl的trc里有很多信息。


而 mmon 的跟踪文件中不断输出以下信息:



再看 pmon 的跟踪文件输出以下信息:


。。。

。。。

还有如下信息:

----------------------------------------
SO: 70000006f6b84c0, type: 2, owner: 0, flag: INIT/-/-/0x00
  (process) Oracle pid=92, calls cur/top: 0/70000002ed1a930, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: none
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
  (latch info) wait_event=0 bits=280
    holding    (efd=6) 70000006b9d8148 Child library cache level=5 child#=2 
        Location from where latch is held: kghfrunp: clatch: nowait: 
        Context saved from call: 0
        state=busy, wlstate=free
        waiters [orapid (seconds since: put on list, posted, alive check)]:
         8 (2181, 1617676902, 0)
         waiter count=1
    holding    (efd=6) 7000000100e7668 Child shared pool level=7 child#=1 
        Location from where latch is held: kghfrunp: alloc: session dur: 
        Context saved from call: 0
        state=busy, wlstate=free
        waiters [orapid (seconds since: put on list, posted, alive check)]:
         89 (2450, 1617676902, 0)
         94 (2450, 1617676902, 0)
         78 (2445, 1617676902, 0)
         28 (2442, 1617676902, 0)
         84 (2442, 1617676902, 0)
         省略500字
         85 (2070, 1617676902, 0)
         130 (2061, 1617676902, 3)
         138 (1992, 1617676902, 0)
         146 (1923, 1617676902, 0)
         waiter count=46
    Process Group: DEFAULT, pseudo proc: 70000006f783298
    O/S info: user: oracle, term: UNKNOWN, ospid: 5865490 
    OSD pid info: Unix process pid: 5865490, image: oracle@p550-zjgl1
    Short stack dump: 
ksdxfstk+002c<-ksdxcb+04e4<-sspuser+0074<-000044F0<-skgpwwait+00bc<-ksliwat+06c0<-kslwaitns_timed+0024<-kskthbwt+022c<-kslwait+00f4<-kksLockWait+01fc<-kgxWait+0168<-kgxExclusive+00bc<-kksFreeHeapGetMutex+0158<-kksCursorFreeCallBack+0088<-kglobf0+0264<-kglhpd_internal+0228<-kglhpd+0010<-kghfrx+0028<-kghfrunp+0b44<-kghfnd+07e8<-kghalo+0a24<-ksp_param_handle_alloc+0168<-kspcrec+01bc<-ksucre+0408<-ksucresg+017c<-kpolna+02c4<-kpogsk+00c4<-opiodr+0ae0<-ttcpip+1020<-opitsk+1124<-opiino+0990<-opiodr+0ae0<-opidrv+0484<-sou2o+0090<-opimai_real+01bc<-main+0098<-__start+0098
Dump of memory from 0x070000006F657380 to 0x070000006F657588
70000006F657380 00000004 00000000 07000000 6946E648  [............iF.H]
70000006F657390 00000010 000313A7 07000000 2ED1A930  [...............0]
70000006F6573A0 00000003 000313A7 07000000 6FD593E0  [............o...]
70000006F6573B0 0000000B 000313A7 07000000 6F9CBEA8  [............o...]
70000006F6573C0 00000004 0003129B 00000000 00000000  [................]
70000006F6573D0 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
70000006F657580 00000000 00000000                    [........]        
*** 2021-04-06 10:41:42.821
PMON unable to acquire latch  7000000100e7668 Child shared pool level=7 child#=1 
        Location from where latch is held: kghfrunp: alloc: session dur: 
        Context saved from call: 0
        state=busy, wlstate=free

通过筛选possible holder关键字发现:



可能时pid=92 130的两个进程持有资源导致pmon异常。

停库前做了 HANG ANALYSIS,根据输出的文件,只找到了 92 这个pid的信息


印象中,10点此数据库不是非常忙,应该是解析方面出了问题,如果杀掉92这个会话可能会轻量级解决。

下次再发生类似问题时,优先看pmon的trc。

搜集信息方法:

sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
oradebug dump systemstate 258
等待30秒
oradebug hanganalyze 3
oradebug dump systemstate 258
oradebug tracefile_name
exit

或者


alter session set events 'immediate trace name systemstate level 258';
alter session set events 'immediate trace name hanganalyze level 3';
等待1分钟
alter session set events 'immediate trace name systemstate level 258';
alter session set events 'immediate trace name hanganalyze level 3';


参考:
Patch For Bug:5377099 Is Superseded By Patch For Bug:8426816 (Doc ID 817033.1)


其他参考:
awk -f ass109.awk /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_9976.trc
ass109.zip
https://fritshoogland.files.wordpress.com/2020/04/mutexes-2.pdf
https://andreynikolaev.files.wordpress.com/2016/03/latches_and_mutexes_in_oracle_12c.pdf

阅读(1152) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~