PMON failed to acquire latch, see PMON dump-brjl-ChinaUnix博客

沧海拾贝brjl.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

brjl

博客访问： 3744211
博文数量： 715
博客积分： 1860
博客等级：上尉
技术积分： 7745
用户组：普通用户
注册时间： 2008-04-07 08:51

个人简介

偶尔有空上来看看

文章分类

全部博文（715）

ogg（1）
达梦（39）
分区（3）
安全（2）
统计信息（8）
安装（31）
基础概念（88）
opatch（5）
asm（11）
脚本（9）
rman（15）
dataguard（32）
故障诊断（71）
优化器（4）
pdb（4）
VMWARE（2）
OMS（15）
借鉴参考（98）
学习目标（33）
遇到疑问（59）
待做实验（36）
参数基线（40）
经验感想（75）
未分配的博文（34）

文章存档

2023年（75）

2022年（134）

2021年（238）

2020年（115）

2019年（11）

2018年（9）

2017年（9）

2016年（17）

2015年（7）

2014年（4）

2013年（1）

2012年（11）

2011年（27）

2010年（35）

2009年（11）

2008年（11）

我的朋友

相关博文

PMON failed to acquire latch, see PMON dump

分类： Oracle

2021-04-08 21:08:13

aix 16G内存单机 10.2.0.4
应用反馈无法访问数据库了。

sqlplus / as sysdba 挂起

看alert.log 不断输出
Tue Apr 6 10:01:41 2021
PMON failed to acquire latch, see PMON dump
Tue Apr 6 10:02:41 2021
PMON failed to acquire latch, see PMON dump
Tue Apr 6 10:02:55 2021
ksvcreate: Process(m000) creation failed
Tue Apr 6 10:03:41 2021
PMON failed to acquire latch, see PMON dump

kill -9 pmon进程后，告警日志输出以下内容：

Tue Apr 6 10:46:30 2021
MMNL: terminating instance due to error 472
Tue Apr 6 10:46:30 2021
Errors in file /home/oracle/admin/orcl/udump/orcl_ora_5865490.trc:
ORA-00600: internal error code, arguments: [504], [0x700000010007608], [640], [5], [session allocation], [0], [0], [0x000000000]
Tue Apr 6 10:46:31 2021
Errors in file /home/oracle/admin/orcl/udump/orcl_ora_5865490.trc:
ORA-07445: exception encountered: core dump [kgscDump+01dc] [SIGSEGV] [Address not mapped to object] [0x7C000FAC7C017000] [] []
ORA-00600: internal error code, arguments: [504], [0x700000010007608], [640], [5], [session allocation], [0], [0], [0x000000000]
Tue Apr 6 10:46:31 2021

奇怪的是，告警日志中提示MMNL进程由于472错误终止实例。
重启实例正常了。

检查当时的其他trc文件，发现mmnl的trc里有很多信息。

而 mmon 的跟踪文件中不断输出以下信息：

再看 pmon 的跟踪文件输出以下信息：

。。。

。。。

还有如下信息：

----------------------------------------
SO: 70000006f6b84c0, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=92, calls cur/top: 0/70000002ed1a930, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=280
holding (efd=6) 70000006b9d8148 Child library cache level=5 child#=2
Location from where latch is held: kghfrunp: clatch: nowait:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
8 (2181, 1617676902, 0)
waiter count=1
holding (efd=6) 7000000100e7668 Child shared pool level=7 child#=1
Location from where latch is held: kghfrunp: alloc: session dur:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
89 (2450, 1617676902, 0)
94 (2450, 1617676902, 0)
78 (2445, 1617676902, 0)
28 (2442, 1617676902, 0)
84 (2442, 1617676902, 0)
省略500字
85 (2070, 1617676902, 0)
130 (2061, 1617676902, 3)
138 (1992, 1617676902, 0)
146 (1923, 1617676902, 0)
waiter count=46
Process Group: DEFAULT, pseudo proc: 70000006f783298
O/S info: user: oracle, term: UNKNOWN, ospid: 5865490
OSD pid info: Unix process pid: 5865490, image: oracle@p550-zjgl1
Short stack dump:
ksdxfstk+002c<-ksdxcb+04e4<-sspuser+0074<-000044F0<-skgpwwait+00bc<-ksliwat+06c0<-kslwaitns_timed+0024<-kskthbwt+022c<-kslwait+00f4<-kksLockWait+01fc<-kgxWait+0168<-kgxExclusive+00bc<-kksFreeHeapGetMutex+0158<-kksCursorFreeCallBack+0088<-kglobf0+0264<-kglhpd_internal+0228<-kglhpd+0010<-kghfrx+0028<-kghfrunp+0b44<-kghfnd+07e8<-kghalo+0a24<-ksp_param_handle_alloc+0168<-kspcrec+01bc<-ksucre+0408<-ksucresg+017c<-kpolna+02c4<-kpogsk+00c4<-opiodr+0ae0<-ttcpip+1020<-opitsk+1124<-opiino+0990<-opiodr+0ae0<-opidrv+0484<-sou2o+0090<-opimai_real+01bc<-main+0098<-__start+0098
Dump of memory from 0x070000006F657380 to 0x070000006F657588
70000006F657380 00000004 00000000 07000000 6946E648 [............iF.H]
70000006F657390 00000010 000313A7 07000000 2ED1A930 [...............0]
70000006F6573A0 00000003 000313A7 07000000 6FD593E0 [............o...]
70000006F6573B0 0000000B 000313A7 07000000 6F9CBEA8 [............o...]
70000006F6573C0 00000004 0003129B 00000000 00000000 [................]
70000006F6573D0 00000000 00000000 00000000 00000000 [................]
Repeat 26 times
70000006F657580 00000000 00000000 [........]
*** 2021-04-06 10:41:42.821
PMON unable to acquire latch 7000000100e7668 Child shared pool level=7 child#=1
Location from where latch is held: kghfrunp: alloc: session dur:
Context saved from call: 0
state=busy, wlstate=free

通过筛选possible holder关键字发现：

可能时pid=92 130的两个进程持有资源导致pmon异常。

停库前做了 HANG ANALYSIS，根据输出的文件，只找到了 92 这个pid的信息

印象中，10点此数据库不是非常忙，应该是解析方面出了问题，如果杀掉92这个会话可能会轻量级解决。

下次再发生类似问题时，优先看pmon的trc。

搜集信息方法：

sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
oradebug dump systemstate 258
等待30秒
oradebug hanganalyze 3
oradebug dump systemstate 258
oradebug tracefile_name
exit

或者

alter session set events 'immediate trace name systemstate level 258';
alter session set events 'immediate trace name hanganalyze level 3';
等待1分钟
alter session set events 'immediate trace name systemstate level 258';
alter session set events 'immediate trace name hanganalyze level 3';

参考：
Patch For Bug:5377099 Is Superseded By Patch For Bug:8426816 (Doc ID 817033.1)

其他参考：
awk -f ass109.awk /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_9976.trc

ass109.zip
https://fritshoogland.files.wordpress.com/2020/04/mutexes-2.pdf
https://andreynikolaev.files.wordpress.com/2016/03/latches_and_mutexes_in_oracle_12c.pdf

阅读(1284) | 评论(0) | 转发(0) |

上一篇：19c 节点2打补丁时遇到 oui-patch.xml 权限不够

下一篇：ahf 再出发，单机安装

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6