在做一次大的DML操作后,UNDO不断增长,对外服务中止,数据库无法立即关闭,在我“Shutting down instance: further logons disabled“后,依然有新的Undo产生,强行重启os,oracle依靠还算强壮的自恢复机制恢复正常,相关日志如下,高手应该能从日志中给出一些调优建议:
-----------
Fri Nov 28 09:49:06 2008
Thread 1 advanced to log sequence 111270
Current log# 1 seq# 111270 mem# 0: /opt/oracle/oradata/jlt/redo01.log
Fri Nov 28 10:03:59 2008
Thread 1 advanced to log sequence 111271
Current log# 4 seq# 111271 mem# 0: /data/jlt/log4.ora
Fri Nov 28 10:10:25 2008
Thread 1 advanced to log sequence 111272
Current log# 5 seq# 111272 mem# 0: /data/jlt/log5.ora
Fri Nov 28 10:14:31 2008
Thread 1 advanced to log sequence 111273
Current log# 2 seq# 111273 mem# 0: /opt/oracle/oradata/jlt/redo02.log
Fri Nov 28 10:24:16 2008
Undo Segment 11 Onlined
Fri Nov 28 10:24:27 2008
Created Undo Segment _SYSSMU12$
Fri Nov 28 10:24:27 2008
Undo Segment 12 Onlined
Fri Nov 28 10:24:27 2008
Created Undo Segment _SYSSMU13$
Undo Segment 13 Onlined
Fri Nov 28 10:25:36 2008
Created Undo Segment _SYSSMU14$
Undo Segment 14 Onlined
Fri Nov 28 10:27:27 2008
Created Undo Segment _SYSSMU15$
Undo Segment 15 Onlined
Fri Nov 28 10:27:47 2008
Created Undo Segment _SYSSMU16$
Undo Segment 16 Onlined
Fri Nov 28 10:27:47 2008
Created Undo Segment _SYSSMU17$
Undo Segment 17 Onlined
Fri Nov 28 10:28:43 2008
Created Undo Segment _SYSSMU18$
Undo Segment 18 Onlined
Fri Nov 28 10:28:59 2008
Created Undo Segment _SYSSMU19$
Undo Segment 19 Onlined
Fri Nov 28 10:29:05 2008
Created Undo Segment _SYSSMU20$
Undo Segment 20 Onlined
Fri Nov 28 10:29:06 2008
Created Undo Segment _SYSSMU21$
Undo Segment 21 Onlined
Fri Nov 28 10:29:14 2008
Created Undo Segment _SYSSMU22$
Undo Segment 22 Onlined
Fri Nov 28 10:29:20 2008
Created Undo Segment _SYSSMU23$
Undo Segment 23 Onlined
Fri Nov 28 10:29:26 2008
Created Undo Segment _SYSSMU24$
Undo Segment 24 Onlined
Fri Nov 28 10:32:08 2008
Undo Segment 25 Onlined
Fri Nov 28 10:32:08 2008
Created Undo Segment _SYSSMU25$
Created Undo Segment _SYSSMU26$
Undo Segment 26 Onlined
Fri Nov 28 10:33:44 2008
Created Undo Segment _SYSSMU27$
Undo Segment 27 Onlined
Fri Nov 28 10:35:15 2008
Created Undo Segment _SYSSMU28$
Undo Segment 28 Onlined
Fri Nov 28 10:35:23 2008
Created Undo Segment _SYSSMU29$
Undo Segment 29 Onlined
Fri Nov 28 10:35:42 2008
Created Undo Segment _SYSSMU30$
Undo Segment 30 Onlined
Fri Nov 28 10:36:20 2008
Created Undo Segment _SYSSMU31$
Undo Segment 31 Onlined
Fri Nov 28 10:54:59 2008
Thread 1 advanced to log sequence 111274
Current log# 3 seq# 111274 mem# 0: /opt/oracle/oradata/jlt/redo03.log
Fri Nov 28 11:12:41 2008
Created Undo Segment _SYSSMU32$
Fri Nov 28 11:12:41 2008
Undo Segment 32 Onlined
Fri Nov 28 11:16:28 2008
Created Undo Segment _SYSSMU33$
Undo Segment 33 Onlined
Fri Nov 28 11:17:10 2008
Shutting down instance: further logons disabled
Fri Nov 28 11:17:34 2008
Created Undo Segment _SYSSMU34$
Undo Segment 34 Onlined
Fri Nov 28 11:29:34 2008
Starting ORACLE instance (normal)
Fri Nov 28 11:29:34 2008
WARNING: EINVAL creating segment of size 0x0000000010400000
fix shm parameters in /etc/system or equivalent
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 2
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.1.0.
System parameters with non-default values:
processes = 800
timed_statistics = TRUE
shared_pool_size = 83886080
large_pool_size = 16777216
java_pool_size = 83886080
control_files = /opt/oracle/oradata/jlt/control01.ctl, /opt/oracle/oradata/jlt/control02.ctl, /opt/oracle/oradata/jlt/control03.ctl
db_block_size = 8192
db_cache_size = 33554432
compatible = 9.2.0.0.0
db_file_multiblock_read_count= 16
fast_start_mttr_target = 300
undo_management = AUTO
undo_tablespace = UNDOTBS1
undo_retention = 10800
remote_login_passwordfile= EXCLUSIVE
db_domain = jh.com
instance_name = jlt
dispatchers = (PROTOCOL=TCP) (SERVICE=jltXDB)
job_queue_processes = 10
hash_join_enabled = TRUE
background_dump_dest = /opt/oracle/admin/jlt/bdump
user_dump_dest = /opt/oracle/admin/jlt/udump
core_dump_dest = /opt/oracle/admin/jlt/cdump
sort_area_size = 524288
db_name = jlt
open_cursors = 3000
star_transformation_enabled= FALSE
query_rewrite_enabled = FALSE
pga_aggregate_target = 25165824
aq_tm_processes = 1
PMON started with pid=3
DBW0 started with pid=4
LGWR started with pid=2
CKPT started with pid=5
SMON started with pid=6
RECO started with pid=7
CJQ0 started with pid=8
QMN0 started with pid=9
Fri Nov 28 11:29:35 2008
starting up 1 shared server(s) ...
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Fri Nov 28 11:29:36 2008
/* OracleOEM */ ALTER DATABASE MOUNT
Fri Nov 28 11:29:41 2008
Successful mount of redo thread 1, with mount id 2999313696.
Fri Nov 28 11:29:41 2008
Database mounted in Exclusive Mode.
Completed: /* OracleOEM */ ALTER DATABASE MOUNT
Fri Nov 28 11:29:41 2008
/* OracleOEM */ ALTER DATABASE OPEN
Fri Nov 28 11:29:41 2008
Beginning crash recovery of 1 threads
Fri Nov 28 11:29:41 2008
Started first pass scan
Fri Nov 28 11:29:50 2008
Completed first pass scan
525585 redo blocks read, 6072 data blocks need recovery
Fri Nov 28 11:30:02 2008
Started recovery at
Thread 1: logseq 111271, block 176369, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 4 Seq 111271 Reading mem 0
Mem# 0 errs 0: /data/jlt/log4.ora
Recovery of Online Redo Log: Thread 1 Group 5 Seq 111272 Reading mem 0
Mem# 0 errs 0: /data/jlt/log5.ora
Recovery of Online Redo Log: Thread 1 Group 2 Seq 111273 Reading mem 0
Mem# 0 errs 0: /opt/oracle/oradata/jlt/redo02.log
Recovery of Online Redo Log: Thread 1 Group 3 Seq 111274 Reading mem 0
Mem# 0 errs 0: /opt/oracle/oradata/jlt/redo03.log
Fri Nov 28 11:30:07 2008
Recovery set larger than buffer cache: start next pass!
Recovery of Online Redo Log: Thread 1 Group 4 Seq 111271 Reading mem 0
Mem# 0 errs 0: /data/jlt/log4.ora
Recovery of Online Redo Log: Thread 1 Group 5 Seq 111272 Reading mem 0
Mem# 0 errs 0: /data/jlt/log5.ora
Recovery of Online Redo Log: Thread 1 Group 2 Seq 111273 Reading mem 0
Mem# 0 errs 0: /opt/oracle/oradata/jlt/redo02.log
Recovery of Online Redo Log: Thread 1 Group 3 Seq 111274 Reading mem 0
Mem# 0 errs 0: /opt/oracle/oradata/jlt/redo03.log
Fri Nov 28 11:30:30 2008
Ended recovery at
Thread 1: logseq 111274, block 87563, scn 2.2358780168
6072 data blocks read, 3065 data blocks written, 525585 redo blocks read
Crash recovery completed successfully
Fri Nov 28 11:30:30 2008
Thread 1 advanced to log sequence 111275
Thread 1 opened at log sequence 111275
Current log# 1 seq# 111275 mem# 0: /opt/oracle/oradata/jlt/redo01.log
Successful open of redo thread 1.
Fri Nov 28 11:30:30 2008
SMON: enabling cache recovery
Fri Nov 28 11:30:31 2008
Undo Segment 1 Onlined
Undo Segment 2 Onlined
Undo Segment 3 Onlined
Undo Segment 4 Onlined
Undo Segment 5 Onlined
Undo Segment 6 Onlined
Undo Segment 7 Onlined
Undo Segment 8 Onlined
Undo Segment 9 Onlined
Undo Segment 10 Onlined
Successfully onlined Undo Tablespace 1.
Fri Nov 28 11:30:31 2008
SMON: enabling tx recovery
Fri Nov 28 11:30:31 2008
Database Characterset is ZHS16GBK
Fri Nov 28 11:30:32 2008
SMON: about to recover undo segment 11
SMON: mark undo segment 11 as available
SMON: about to recover undo segment 12
SMON: mark undo segment 12 as available
SMON: about to recover undo segment 13
SMON: mark undo segment 13 as available
SMON: about to recover undo segment 14
SMON: mark undo segment 14 as available
SMON: about to recover undo segment 15
SMON: mark undo segment 15 as available
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 19
SMON: mark undo segment 19 as available
SMON: about to recover undo segment 20
SMON: mark undo segment 20 as available
SMON: about to recover undo segment 21
SMON: mark undo segment 21 as available
SMON: about to recover undo segment 22
SMON: mark undo segment 22 as available
SMON: about to recover undo segment 23
SMON: mark undo segment 23 as available
SMON: about to recover undo segment 24
SMON: mark undo segment 24 as available
SMON: about to recover undo segment 25
SMON: mark undo segment 25 as available
SMON: about to recover undo segment 26
SMON: mark undo segment 26 as available
SMON: about to recover undo segment 27
SMON: mark undo segment 27 as available
SMON: about to recover undo segment 28
SMON: mark undo segment 28 as available
SMON: about to recover undo segment 29
SMON: mark undo segment 29 as available
SMON: about to recover undo segment 30
SMON: mark undo segment 30 as available
SMON: about to recover undo segment 33
SMON: mark undo segment 33 as available
Fri Nov 28 11:30:35 2008
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: /* OracleOEM */ ALTER DATABASE OPEN
Fri Nov 28 11:32:25 2008
SMON: Parallel transaction recovery tried
Fri Nov 28 11:43:01 2008
Thread 1 advanced to log sequence 111276
Current log# 4 seq# 111276 mem# 0: /data/jlt/log4.ora
Fri Nov 28 11:52:39 2008
Thread 1 advanced to log sequence 111277
Current log# 5 seq# 111277 mem# 0: /data/jlt/log5.ora
---------------
在itpub上找到一片有用的文章:
-------
SMON: Parallel transaction recovery tried 问题 解决思路
大家好,今天维护人员说数据库一直报 SMON: Parallel transaction recovery tried
下面是我解决问题的思路,希望对大家以后解决问题有一定的帮助.
1. 报错 ,故障现象
tali -100 alert_ssss.log
SMON: Parallel transaction recovery tried
Wed Aug 9 07:32:01 2006
SMON: Parallel transaction recovery tried
Wed Aug 9 08:02:11 2006
SMON: Parallel transaction recovery tried
Wed Aug 9 08:17:13 2006
SMON: Parallel transaction recovery tried
Wed Aug 9 08:47:23 2006
SMON: Parallel transaction recovery tried
Wed Aug 9 09:02:27 2006
SMON: Parallel transaction recovery tried
大约每隔半个小时报一次
查询原因
一. top
CPU states: 98.0% idle, 0.0% user, 1.6% kernel, 0.4% iowait, 0.0% swap
Memory: 4.0G real, 815M free, 2.4G swap in use, 4.7G swap free
内存正常
swap 正常
二. vi /etc/system
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmax=2147483648
set semsys:seminfo_semmns=310
set semsys:seminfo_semmni=110
set shmsys:shminfo_shmseg=128
set shmsys:shminfo_shmmni=1024
最大共享内存正常
三. 检查 sga 参数
sqlplus "/ as sysdba"
SQL*Plus: Release 9.2.0.1.0 - Production on Wed Aug 9 09:56:48 2006
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
SQL> show sga
Total System Global Area 1813482680 bytes
Fixed Size 732344 bytes
Variable Size 754974720 bytes
Database Buffers 1056964608 bytes
Redo Buffers 811008 bytes
SQL> show parameter share
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
hi_shared_memory_address integer 0
max_shared_servers integer 20
shared_memory_address integer 0
shared_pool_reserved_size big integer 26843545
shared_pool_size big integer 536870912
shared_server_sessions integer 0
shared_servers integer 0
SQL> show parameter large
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
large_pool_size big integer 67108864
SQL> show parameter db_cache
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_cache_advice string ON
db_cache_size big integer 1056964608
SQL> archive log list;
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination /opt/oracle/product/9.2.0/dbs/arch
Oldest online log sequence 438
Current log sequence 440
结论: sga参数正常
四 。 检查数据库进程
ps -ef|grep ora_
oracle 6151 1 0 09:47:43 ? 0:00 ora_p003_spmcdb
oracle 1269 1 0 Oct 24 ? 0:00 ora_reco_spmcdb
oracle 1271 1 0 Oct 24 ? 0:01 ora_cjq0_spmcdb
oracle 1267 1 0 Oct 24 ? 11:36 ora_smon_spmcdb
oracle 1263 1 0 Oct 24 ? 38:00 ora_lgwr_spmcdb
oracle 1259 1 0 Oct 24 ? 0:01 ora_pmon_spmcdb
oracle 1273 1 0 Oct 24 ? 37:45 ora_qmn0_spmcdb
oracle 1261 1 0 Oct 24 ? 9:39 ora_dbw0_spmcdb
oracle 1265 1 0 Oct 24 ? 103:58 ora_ckpt_spmcdb
oracle 6148 1 0 09:47:43 ? 0:00 ora_p002_spmcdb
oracle 6146 1 0 09:47:43 ? 0:00 ora_p001_spmcdb
oracle 6142 1 0 09:47:43 ? 0:00 ora_p000_spmcdb
发现
ora_p002_spmcdb
ora_p001_spmcdb
ora_p000_spmcdb
ora_p003_spmcdb
结论: 这个地方不正常。这些进程是非常不正常的,说明数据库在清理无用资源不能清理,而导致的,需要 dba 手工干预,重新启动数据库
五 。 检查是不是有系统告警
dmesg
Wed Aug 9 10:00:25 CST 2006
May 12 00:46:14 SPMC-SEV login: [ID 760094 auth.crit] REPEATED LOGIN FAILURES ON /dev/pts/1 FROM 192.168.133.62
May 31 01:55:12 SPMC-SEV su: [ID 810491 auth.crit] 'su root' failed for fdgw on /dev/pts/1
Jul 26 10:07:04 SPMC-SEV su: [ID 810491 auth.crit] 'su spmc_sh' failed for root on /dev/pts/1
Jul 26 11:17:47 SPMC-SEV su: [ID 810491 auth.crit] 'su oracle' failed for root on /dev/pts/1
Jul 27 10:39:49 SPMC-SEV login: [ID 658745 auth.crit] REPEATED LOGIN FAILURES ON /dev/pts/2 FROM 192.168.133.101, root
Jul 27 10:43:04 SPMC-SEV su: [ID 810491 auth.crit] 'su root' failed for oracle on /dev/pts/4
Jul 27 10:43:33 SPMC-SEV last message repeated 1 time
bash-2.05$
结论:没有操作系统告警
这样,没有找出什么特殊的地方
六。 检查系统瓶颈
select sid ,event from v$session_wait where event not like 'SQL%' and event not like 'rdbms%';
SID EVENT
---------- ----------------------------------------------------------------
9 null event
1 pmon timer
5 smon timer
8 wakeup time manager
结论: 不存在系统瓶颈。
七。 检查是不是有数据库锁表现象
(1)查出锁对应的 表名,session 号,锁的级别 :
select b.object_name ,a.session_id,a.locked_mode from v$locked_object a ,dba_objects b where a.object_id=b.object_id;
no rows selected
结论:没有锁表现象,正常
(2)
select b.object_name ,a.xidusn,a.xidslot,a.xidsqn,a.session_id,c.type,a.process,a.locked_mode from v$locked_object a ,dba_objects b,v$lock c
where a.object_id=b.object_id and a.object_id=c.id1;
no rows selected
结论:没有锁表现象,正常。
八。 检查在线日志文件状态。
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS
---------- ---------- ---------- ---------- ---------- --- ----------------
FIRST_CHANGE# FIRST_TIM
------------- ---------
1 1 439 104857600 1 NO INACTIVE
3057207138 07-AUG-06
2 1 440 104857600 1 NO CURRENT
3058948723 09-AUG-06
3 1 438 104857600 1 NO INACTIVE
3051826063 04-AUG-06
结论:
决定重新启动数据库
$sqlplus "/ as sysdba"
sql>shutdown immediate;
sql>startup
问题得以解决