昨天机房搞网络灾备切换演练,核心数据库在网络中断恢复之后,session数量突然增高,一般数据库的sessoin在475左右,突然接近600个会话,
因为网络中断之后,会话没有及时释放资源,
导致会话数量居高不下,下面是解决过程:
(注:因为昨天在处理的的时候,来不及整理现场操作的资料,现在虚拟机上实验证明一下,思路是一样的)
1.测试用户xiangyang,开启4个会话连接数据库(真实环境一般是用JDBC_Thin_Client程序连接数据库):
D:\Documents and Settings\Administrator>sqlplus /nolog
SQL*Plus: Release 9.2.0.8.0 - Production on 星期日 3月 13 21:35:31 2011
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> conn xiangyang/xiangyang
已连接。
SQL>
D:\Documents and Settings\Administrator>sqlplus /nolog
SQL*Plus: Release 9.2.0.8.0 - Production on 星期日 3月 13 21:35:31 2011
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> conn xiangyang/xiangyang
已连接。
SQL>
D:\Documents and Settings\Administrator>sqlplus /nolog
SQL*Plus: Release 9.2.0.8.0 - Production on 星期日 3月 13 21:35:31 2011
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> conn xiangyang/xiangyang
已连接。
SQL>
D:\Documents and Settings\Administrator>sqlplus /nolog
SQL*Plus: Release 9.2.0.8.0 - Production on 星期日 3月 13 21:35:31 2011
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> conn xiangyang/xiangyang
已连接。
SQL>
2.查看该用户在数据库的连接数量:
select username,program,terminal,machine,count(1) from v$session where username='XIANGYANG'
group by username,program,terminal,machine order by 5 desc
USERNAME |
PROGRAM |
TERMINAL |
MACHINE |
COUNT(1) |
XIANGYANG |
sqlplus.exe |
SUN-001 |
WORKGROUP\SUN-001 |
4 |
用户名为xiangyang连在数据库的会话数量是4个.
3.查看xiangyang用户的会话数量的详细信息:
select b.*, 'alter system kill session' || '''' || b.sid || ',' || b.SERIAL# || '''' || ';'
from v$session b
where b.type not like 'BACKGROUND'
and b.username='XIANGYANG'
SID SERIAL# PADDR USERNAME STATUS OSUSER PROCESS MACHINE TERMINAL PROGRAM TYPE
19 78 2A243054 XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:35 891 alter system kill session'19,78';
20 38 2A24343C XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:35 873 alter system kill session'20,38';
21 30 2A243824 XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:36 858 alter system kill session'21,30';
23 35 2A243FF4 XIANGYANG INACTIVE SUN-001\Administrator WORKGROUP\SUN-001 SUN-001 sqlplus.exe 2011-3-13 21:36 837 alter system kill session'23,35';
由于会话数量过多,kill掉会话:
alter system kill session'19,78';
alter system kill session'20,38';
alter system kill session'21,30';
alter system kill session'23,35';
4.查看会话状态:
select sid,serial#,paddr,username,status from v$session where status='KILLED';
SID SERIAL# PADDR USERNAME STATUS
19 78 2A263E30 XIANGYANG KILLED
20 38 2A263E30 XIANGYANG KILLED
21 30 2A263E30 XIANGYANG KILLED
23 35 2A263E30 XIANGYANG KILLED
5.但是会话所占用的资源并没有释放,也就是还独占了连接数:
select username,program,terminal,machine,count(1) from v$session where username='XIANGYANG'
group by username,program,terminal,machine order by 5 desc
USERNAME |
PROGRAM |
TERMINAL |
MACHINE |
COUNT(1) |
XIANGYANG |
sqlplus.exe |
SUN-001 |
WORKGROUP\SUN-001 |
4 |
用户XIANGYANG持有的会话数仍然是4个,资源并没有释放.
需要kill掉会话所在的操作
系统进程,使用下面的SQL找出会话的进程:
select spid from v$process where addr in(select paddr from v$session where status='KILLED');
SPID
-------
没有任何显示,说明状态为KILL的会话的进程找不到了,通过v$process.addr=v$session
.paddr的方式.什么原因?
6.查看一下原因:
先看视图v$session
select s.username,s.sid,s.serial#,s.paddr from v$session s where username='XIANGYANG' and s.status='KILLED';
USERNAME |
SID |
SERIAL# |
PADDR |
XIANGYANG |
19 |
78 |
2A263E30 |
XIANGYANG |
20 |
38 |
2A263E30 |
XIANGYANG |
21 |
30 |
2A263E30 |
XIANGYANG |
23 |
35 |
2A263E30 |
v$session.paddr为2A263E30,状态为KILLED的,用户名为XIANGYANG的4个会话的进程地址为同样的
PADDR:2A263E30,说明是同时指向了另一个地址,也许是虚拟地址,现在没有考证.但是可以证明一点就是
4个session的进程地址通过v$session.paddr的方式找不到了.
再看视图v$process记录的会话进程地址:
select p.addr from v$process p where pid <> 1
ADDR
2A23F9A4
2A23FD8C
2A240174
2A24055C
2A240944
2A240D2C
2A241114
2A2414FC
2A2418E4
2A241CCC
2A2420B4
2A24249C
2A242884
2A242C6C
2A243054
2A24343C
2A243824
2A243C0C
2A243FF4
2A2443DC
20 rows selected
这20个会话的进程地址中间肯定有状态为KILLED会话的进程地址
可以通过一条SQL,把状态为KILLED的会话进程找出来,这里要使用minus这个SQL连接符号.
select p.addr from v$process p where pid <> 1
minus
select s.paddr from v$session s;
ADDR
2A243054
2A24343C
2A243824
2A243FF4
以上这4个进程地址就是状态为KILLED会话的进程地址.
通过minus把v$process拥有的addr,但是v$session没有的paddr找出来了.因为在v$session的paddr
有4个相同的paddr,其实就是替换了上面4个真实的进程地址.
7.现在就可以kill掉状态为KILLED的会话的进程,真正的释放资源了.
通过下面的脚本找出状态为KILLED会话的spid:
select 'kill -9 ' || spid||';',addr
from v$process
where --addr in (select paddr from v$session where username='XIANGYANG');
addr in (select p.addr
from v$process p
where pid <> 1
minus
select s.paddr from v$session s);
UNIX/LINUX系统:
killed ADDR
kill -9 4960; 2A243054
kill -9 3732; 2A24343C
kill -9 5036; 2A243824
kill -9 4952; 2A243FF4
kill -9 4952; 2A243FF4
windows系统:
orakill 实例sid spid
orakill westdb 4960
D:\Documents and Settings\Administrator>orakill westdb 4960
Kill of thread id 4960 in instance westdb successfully signalled.
D:\Documents and Settings\Administrator>orakill westdb 3732
Kill of thread id 3732 in instance westdb successfully signalled.
D:\Documents and Settings\Administrator>orakill westdb 5036
Kill of thread id 5036 in instance westdb successfully signalled.
D:\Documents and Settings\Administrator>orakill westdb 4952
Kill of thread id 4952 in instance westdb successfully signalled.
9.资源得到释放,问题得到解决.
select username,program,terminal,machine,count(1) from v$session where username='XIANGYANG'
group by username,program,terminal,machine order by 5 desc
USERNAME |
PROGRAM |
TERMINAL |
MACHINE |
COUNT(1) |
-----------------------------------------------------
没有任何显示,说明资源已经释放,
至此问题解决.