Category: Oracle
2011-05-16 15:13:29
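Summary:
I recently handled a failure that I would like to share. A standalone test database used by an application had shut down abnormally and could not be restarted, and I was asked to help analyze it. Drawing on my experience with similar problems, I was able to fix it quickly and open the database.
The procedure was as follows:
After obtaining the root user name and password for the affected database host from the customer, I began the analysis: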
AIX Version 5
Copyright IBM Corporation, 1982, 2008.
login: root
root's Password:
*******************************************************************************
*                                                                             *
*  Welcome to AIX Version 5.3!                                                *
*                                                                             *
*  Please see the README file in /usr/lpp/bos for information pertinent to    *
*  this release of the AIX Operating System.                                  *
*                                                                             *
*******************************************************************************
Last unsuccessful login: Tue Apr 26 11:17:02 BEIST 2011 on /dev/pts/3 from 10.223.26.2
Last login: Thu Apr 28 17:33:33 BEIST 2011 on /dev/pts/3 from 10.224.17.147
p595_1:/# id
uid=0(root) gid=0(system) groups=2(bin),3(sys),7(security),8(cron),10(audit),11(lp)
p595_1:/# su - ora9i
ora9i:p595_1:/oracle/tbcs> id
uid=212(ora9i) gid=206(dba) groups=101(hagsuser)
ora9i:p595_1:/oracle/tbcs> ls -ltr
total 8
drwxrwxrwx 5 ora9i dba  256 Mar 26 08:15 admin
-rw-r--r-- 1 ora9i dba 1754 Mar 26 08:22 jfyy_new.ora
ora9i:p595_1:/oracle/tbcs> sqlplus "/ as sysdba"
SQL*Plus: Release
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release
With the Partitioning and Real Application Clusters options
JServer Release
Connected to an idle instance.
SQL> startup pfile = 'jfyy_new.ora'
ORA-27123: unable to attach to shared memory segment
IBM AIX RISC System/6000 Error: 13: Permission denied
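OK, at last I could see the error they had reported.
Based on my earlier experience with similar problems, my initial suspicions were:
①: A shared memory segment that the database had allocated on the host was never released.
②: Semaphores that the database had held on the host were never released.
Following this line of thought, I did the following.
Troubleshooting:
①: Check the current shared memory segments on the host: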
p595_1:/# ipcs -mop
IPC status from /dev/mem as of Thu Apr 21 10:30:48 BEIST 2011
T  ID        KEY         MODE         OWNER   GROUP   NATTCH  CPID     LPID
Shared Memory:
m  2097152   0x670010ae  --rw-r--r--  root    system  3       286960   462952
m  1048577   0xffffffff  --rw-rw----  root    system  1       139376   139376
m  1048578   0x78000009  --rw-rw-rw-  root    system  2       266374   368716
m  3         0xffffffff  --rw-rw----  root    system  1       139376   139376
m  4         0x0d000cfe  --rw-rw----  root    system  11      397378   3154128
m  5         0x680010ae  --rw-r--r--  root    system  3       286960   462952
m  6         0x700010ae  --rw-------  root    system  3       286960   462952
m  7         0xffffffff  --rw-rw----  root    haemrm  1       462952   462952
m  8         0xffffffff  --rw-rw----  root    system  1       139376   139376
m  15728649  0x4205efec  --rw-r-----  oracle  dba     89      1015914  1908752
m  5242890   0xb9561818  --rw-r-----  ora
m  12582923  0xd7eabc80  --rw-r-----  ora9i   dba     0       1515640  2527412
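==> The shared memory segment has clearly not been released: segment 12582923, owned by ora9i, is still present even though its NATTCH is 0 (no process is attached to it).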
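Picking the leftover segment out of a long listing by eye is error-prone, so here is a minimal sketch of how the same check could be filtered with awk. The function name idle_segments_for_owner is hypothetical, and the sketch assumes the column order shown in the listing above (T ID KEY MODE OWNER GROUP NATTCH CPID LPID); rows that are cut short are skipped rather than guessed at.

```shell
#!/bin/sh
# Sketch only: idle_segments_for_owner is a hypothetical helper, not a
# standard tool. It reads `ipcs -mop`-style output on stdin and prints the
# IDs of shared memory segments that belong to the given owner and have
# no attached processes (NATTCH = 0).
# Assumed columns: T ID KEY MODE OWNER GROUP NATTCH CPID LPID
idle_segments_for_owner() {
    awk -v owner="$1" '
        # Keep only complete segment rows (type "m", all nine columns).
        $1 == "m" && NF >= 9 && $5 == owner && $7 == 0 { print $2 }
    '
}
```

On the host this would be used as `ipcs -mop | idle_segments_for_owner ora9i`; with the listing above it should report only the ora9i segment.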
②: Check the semaphores on the host:
p595_1:/# ipcs -sa
IPC status from /dev/mem as of Thu Apr 21 10:30:12 BEIST 2011
T  ID       KEY         MODE         OWNER  GROUP   CREATOR  CGROUP  NSEMS  OTIME     CTIME
Semaphores:
s  7340032  0x6300b757  --ra-ra----  dsg    dba     dsg      dba     48     10:30:12  11:37:34
s  1        0x
s  4194306  0x63007065  --ra-ra----  dsg    dba     dsg      dba     48     10:30:12  11:33:10
s  3145731  0x58001084  --ra-ra-r--  root   system  root     system  1      10:40:03  10:40:03
s  4194308  0x690010ae  --ra-ra-ra-  root   system  root     system  2      10:28:13  10:40:00
s  3145733  0x010000fe  --ra-------  root   system  root     system  1      10:16:58  10:31:11
s  6        0x63007000  --ra-ra----  dsg    dba     dsg      dba     48     9:39:20   11:39:12
s  7        0x63007021  --ra-ra----  dsg    dba     dsg      dba     48     10:30:12  11:41:38
s  8        0x6300b6de  --ra-ra----  dsg    dba     dsg      dba     48     10:30:11  11:42:38
s  1048585  0x
Conclusion: no leftover semaphores belonging to the database were found.
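The semaphore check can be narrowed to a single owner in the same spirit. The following is a rough sketch, not part of the original procedure: semaphore_sets_for_owner is a hypothetical name, and the column order (T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME CTIME) is assumed from the listing above.

```shell
#!/bin/sh
# Sketch only: semaphore_sets_for_owner is a hypothetical helper. It reads
# `ipcs -sa`-style output on stdin and prints "ID NSEMS" for every
# semaphore set owned by the given user.
# Assumed columns: T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME CTIME
semaphore_sets_for_owner() {
    awk -v owner="$1" '
        # Skip headers and truncated rows; NSEMS is the ninth column.
        $1 == "s" && NF >= 9 && $5 == owner { print $2, $9 }
    '
}
```

Run as `ipcs -sa | semaphore_sets_for_owner ora9i`, empty output would mean that no complete row in the listing belongs to that user.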
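OK, at this point the problem was clear, and it matched suspicion ①: a shared memory segment that the database had allocated on the host was never released.
So I removed that shared memory segment: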
p595_1:/# ipcrm -m 12582923
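Removing a segment that a running instance still uses would take that instance down, so it may be worth adding a guard before calling ipcrm. The sketch below is one possible way to do that under stated assumptions: ipcrm_command_if_idle is a hypothetical helper, the input is `ipcs -mop`-style output with the column order T ID KEY MODE OWNER GROUP NATTCH CPID LPID, and the function only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: ipcrm_command_if_idle is a hypothetical helper. Given a
# segment ID and the expected owner, it reads `ipcs -mop`-style output on
# stdin and prints the matching `ipcrm -m <ID>` command only if that
# segment belongs to the owner and has NATTCH = 0. Nothing is removed;
# the printed command is meant to be reviewed and run by hand.
# Exit status: 0 if a command was printed, 1 otherwise.
ipcrm_command_if_idle() {
    awk -v id="$1" -v owner="$2" '
        $1 == "m" && NF >= 9 && $2 == id && $5 == owner && $7 == 0 {
            print "ipcrm -m " $2
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For the case in this post the call would be `ipcs -mop | ipcrm_command_if_idle 12582923 ora9i`. A segment with attached processes, such as the one owned by oracle with NATTCH 89, produces no command.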
Verification:
After the removal, I checked the host's shared memory segments again:
p595_1:/# ipcs -mop
IPC status from /dev/mem as of Thu Apr 21 10:31:48 BEIST 2011
T  ID        KEY         MODE         OWNER   GROUP   NATTCH  CPID     LPID
Shared Memory:
m  2097152   0x670010ae  --rw-r--r--  root    system  3       286960   462952
m  1048577   0xffffffff  --rw-rw----  root    system  1       139376   139376
m  1048578   0x78000009  --rw-rw-rw-  root    system  2       266374   368716
m  3         0xffffffff  --rw-rw----  root    system  1       139376   139376
m  4         0x0d000cfe  --rw-rw----  root    system  11      397378   3154128
m  5         0x680010ae  --rw-r--r--  root    system  3       286960   462952
m  6         0x700010ae  --rw-------  root    system  3       286960   462952
m  7         0xffffffff  --rw-rw----  root    haemrm  1       462952   462952
m  8         0xffffffff  --rw-rw----  root    system  1       139376   139376
m  15728649  0x4205efec  --rw-r-----  oracle  dba     89      1015914  1908752
m  5242890   0xb9561818  --rw-r-----  ora
Having confirmed that the segment was completely removed, I started the database again:
SQL> startup pfile='jfyy_new.ora';
ORACLE instance started.
Total System Global Area 1.1266E+10 bytes
Fixed Size                    758648 bytes
Variable Size             1577058304 bytes
Database Buffers          9630121984 bytes
Redo Buffers                57946112 bytes
Database mounted.
Database opened.
SQL> select status from v$instance;
STATUS
------------
OPEN
SQL> show parameter instance_name
NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
instance_name                        string                 tbcs1
Conclusion:
With that, the failure was resolved.