今天客户要进行应用发布,首先在测试环境进行测试,在测试环境测试索引重建的时候报如下错误:
ORA-04030: 在尝试分配 64544 字节 (sort subheap,sort key) 时进程内存不足。
客户在将这个问题发给我的时候,首先让他们检查一下操作系统ORACLE用户资源的限制。
因为根据以往的经验这个问题往往是oracle用户进程的data seg size限制导致的。
但是通过客户过来的信息发现 oracle用户程序数据段并没有任何限制。
$ id oracle
uid=400(oracle) gid=400(oinstall) groups=401(dba)
$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) unlimited
nofiles(descriptors) 2000
$ ulimit -Ha
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) unlimited
nofiles(descriptors) unlimited
然后检查了一下客户的SGA和PGA设置:
SQL> show parameter sga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
lock_sga boolean FALSE
pre_page_sga boolean FALSE
sga_max_size big integer 15G
sga_target big integer 14G
SQL> show parameter pga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 8G
$ lsattr -El mem0
goodsize 40960 Amount of usable physical memory in Mbytes False
size 40960 Total amount of physical memory in Mbytes False
客户的SGA设置为14G,PGA设置为8G ,而操作系统的整个内存是40G,内存远远够用的。
C:\Documents and Settings\shoupeng.yan>sqlplus xxxx/xxxx
SQL*Plus: Release 10.2.0.1.0 - Production on 星期三 3月 21 20:00:37 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining Scoring Engine
and Real Application Testing options
SQL> ALTER INDEX PK_ZJ_BDZTB_01 REBUILD;
ALTER INDEX PK_ZJ_BDZTB_01 REBUILD
*
第 1 行出现错误:
ORA-04030: 在尝试分配 64544 字节 (sort subheap,sort key) 时进程内存不足
我通过windows下的sqlplus远程连到数据库中,亲自执行了一下索引重建,在重建的过程中顺监控了一下进程的PGA使用量。发现进程的PGA使用量大约在110M的时候就抛出ORA-04030错误了。而系统的整个PGA设置为8G,不应该不够用。
因此我还是认为os limits限制导致的问题原因,既然单个进程的内存段有最大限制,我们可以通过开启索引重建并行度以便每个进程占用的数据段减少一半以上。
在我开启了2个并行度之后,索引确实重建成功了。
SQL> ALTER INDEX PK_ZJ_BDZTB_01 REBUILD PARALLEL 2;
索引已更改。
更奇怪的事情是:我直接登录到数据库服务器上进程索引重建,没有加并行度也可以重建成功。
$ id oracle
uid=400(oracle) gid=400(oinstall) groups=401(dba)
$ sqlplus xxxx/xxxx
SQL*Plus: Release 10.2.0.4.0 - Production on Wed Mar 21 20:07:20 2012
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining Scoring Engine
and Real Application Testing options
SQL> ALTER INDEX PK_ZJ_BDZTB_01 REBUILD;
Index altered.
一个是通过Windows客户端连接到数据库上的,一个是通过本地sqlplus登录到数据库中的,一个重建索引不成功,一个能成功。唯一的区别是,客户端连接时通过监听器,那么问题应该出现在监听器上。
为此我做了一个假设:虽然目前的ulimit -a显示的结果表示ORACLE用户进程的数据段没有限制,但是这个修改很可能是在监听器启动之后,监听器启动之后继承了先前的ulimits的所有设置,而在此后重新修改的并没有反馈到监听器中。而通过客户端sqlplus连到数据库中的时候,
监听器派生出的服务器进程继承了监听器中的ulimits相关的设置,导致服务器进程的数据段还是有限制的。而通过本地连接的数据库不通过监听器,直接继承了当前的设置,所以本地连接建立索引不会报ORA-4030错误。
然后在OS上通过smit.script 我找了当前修改oracle用户的limits限制的脚本:
# [Dec 21 2010, 16:36:27]
#
x() {
if [ $# -ge 2 ]
then
for i in "$@"
do
spam="$spam \"$i\""
done
eval chuser $spam
fi
}
x data='-1' stack='-1' oracle
这条命令是在2010年12月21日执行的。
而监听器是在10-DEC-2010 09:44:38 启动的,到目前运行了467天,也就是监听器在chuser data='-1' stack='-1' oracle 命令执行之前就启动了。
$ lsnrctl status
LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Production on 21-MAR-2012 20:11:18
Copyright (c) 1991, 2007, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=SXRYXDB)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Production
Start Date 10-DEC-2010 09:44:38
Uptime 467 days 10 hr. 26 min. 40 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Log File /u01/oracle/product/db10gr2/network/log/listener.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=SXRYXDB)(PORT=1521)))
Services Summary...
Service "zyxdb" has 1 instance(s).
Instance "zyxdb", status READY, has 1 handler(s) for this service...
Service "zyxdb_XPT" has 1 instance(s).
Instance "zyxdb", status READY, has 1 handler(s) for this service...
The command completed successfully
既然问题是由于监听器导致的,那么重启一下监听器,重新读取修改过的ulimits,即可解决问题。
$ lsnrctl stop
LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Production on 21-MAR-2012 20:11:32
Copyright (c) 1991, 2007, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=SXRYXDB)(PORT=1521)))
The command completed successfully
$ lsnrctl start
LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Production on 21-MAR-2012 20:11:43
Copyright (c) 1991, 2007, Oracle. All rights reserved.
Starting /u01/oracle/product/db10gr2/bin/tnslsnr: please wait...
TNSLSNR for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Production
System parameter file is /u01/oracle/product/db10gr2/network/admin/listener.ora
Log messages written to /u01/oracle/product/db10gr2/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=SXRYXDB)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=SXRYXDB)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Production
Start Date 21-MAR-2012 20:11:43
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/oracle/product/db10gr2/network/admin/listener.ora
Listener Log File /u01/oracle/product/db10gr2/network/log/listener.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=SXRYXDB)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))
Services Summary...
Service "PL***tProc" has 1 instance(s).
Instance "PL***tProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Wed Mar 21 20:11:47 2012
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining Scoring Engine
and Real Application Testing options
SQL> alter system register;
System altered.
SQL>
再次通过客户端执行索引重建命令问题解决:
C:\Documents and Settings\shoupeng.yan>sqlplus xxxx/xxxx
SQL*Plus: Release 10.2.0.1.0 - Production on 星期三 3月 21 20:12:18 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining Scoring Engine
and Real Application Testing options
SQL> ALTER INDEX PK_ZJ_BDZTB_01 REBUILD;
索引已更改。
SQL>