2008-03-27 11:43:04
MST STACK TRACE:
0x2ff3b400 (excpt=00000004:0a000000:00000000:00000004:00000106) (intpri=11)
IAR: .compare_and_swap+2c (0000a4ec): stw r9,0x0(r4)
LR: .[aiopin:untie_knot]+a8 (0143d7a8)
2ff3a2e0: .[aio.ext:qlioreq]+b0 (014376ec)
2ff3a340: .[aio.ext:listio]+128 (01438f5c)
2ff3b3c0: .sys_call_ret+0 (00003a6c)
0001113a: lasttocentry+fead9 (00348001)
0452-771: Cannot read return address at address 0x01892c0b.
> le 0000a4ec
No loader entry found for module address 0x0000a4ec
No loader entry found for module named '0000a4ec'
> le 0143d7a8
LoadList entry at 0x04ea7980
Module *start:0x00000000_0143bef0 Module filesize:0x00000000_0000228c
Module *end:0x00000000_0143e17c
*data:0x00000000_0143dbe8 data length:0x00000000_00000594
Use-count:0x0001 load_count:0x0000 *file:0x00000000
flags:0x00000262 TEXT DATAINTEXT DATA DATAEXISTS
*exp:0x04ed8000 *lex:0x00000000 *deferred:0x00000000 expsize:0x6e6c732f
Name: /usr/lib/drivers/aiopin
ndepend:0x0001 maxdepend:0x0001
*depend[00]:0x05039280
*le_next: 04ea7680
> le 014376ec
LoadList entry at 0x04ea7680
Module *start:0x00000000_014348c0 Module filesize:0x00000000_00007624
Module *end:0x00000000_0143bee4
*data:0x00000000_0143a4c0 data length:0x00000000_00001a24
Use-count:0x0003 load_count:0x0001 *file:0x00000000
flags:0x00000272 TEXT KERNELEX DATAINTEXT DATA DATAEXISTS
*exp:0x051e3000 *lex:0x00000000 *deferred:0x00000000 expsize:0x6c696263
Name: /etc/drivers/aio.ext
ndepend:0x0002 maxdepend:0x0002
*depend[00]:0x04ea7980
*depend[01]:0x05039280
*le_next: 04edb700
> le 01438f5c
LoadList entry at 0x04ea7680
Module *start:0x00000000_014348c0 Module filesize:0x00000000_00007624
Module *end:0x00000000_0143bee4
*data:0x00000000_0143a4c0 data length:0x00000000_00001a24
Use-count:0x0003 load_count:0x0001 *file:0x00000000
flags:0x00000272 TEXT KERNELEX DATAINTEXT DATA DATAEXISTS
*exp:0x051e3000 *lex:0x00000000 *deferred:0x00000000 expsize:0x6c696263
Name: /etc/drivers/aio.ext
ndepend:0x0002 maxdepend:0x0002
*depend[00]:0x04ea7980
*depend[01]:0x05039280
*le_next: 04edb700
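The loader entries show that the crash is related to the module /usr/lib/drivers/aiopin: the link register (LR) address 0143d7a8 falls inside that module.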
> errpt   (view the error log entries generated at the time of the crash)
LAST ERRORS READ BY ERRDEMON (MOST RECENT LAST):
Tue May 16 15:05:18 TAIST: DSI_PROC data storage interrupt : processor
Resource Name: SYSVMM
0a000000 00000000 00000004 00000086
LAST 3 ERRORS READ BY ERRDEMON (MOST RECENT FIRST):
> od vmmerrlog 9 rpco proc - 0
SLT ST PID PPID PGRP UID EUID TCNT NAME
0 a 0 0 0 0 0 1 swapper
FLAGS: swapped_in no_swap fixed_pri kproc
Links: *child:0xe20030c0 *siblings:0x00000000 *uinfo:0x50004020(0x0038)
*ganchor:0x00000000 *pgrpl:0x00000000 *ttyl:0x00000000
Dispatch Fields: pevent:0x00000000 *synch:0xffffffff
lock:0x00000000 lock_d:0x00000000
Thread Fields: *threadlist:0xe6000000 threadcount:1
active:1 suspended:0 local:0 terminating:0
Scheduler Fields: fixed pri: 16 repage:0x00000000 scount:0 sched_pri:0
*sched_next:0x00000000 *sched_back:0x00000000 cpticks:3087
msgcnt:0 majfltsec:0
Misc: adspace:0x0003c00f kstackseg:0x00000000 xstat:0x0000
*p_ipc:0x00000000 *p_dblist:0x00000000 *p_dbnext:0x00000000
Signal Information:
pending:hi 0x00000000,lo 0x00000000
sigcatch:hi 0x00000000,lo 0x00000000 sigignore:hi 0xffffffff,lo 0xfff7ffff
Statistics: size:0x00000000(pages) audit:0x00000000
accounting page frames:0 page space blocks:0
Number of virtual pages in use :0
pctcpu:0 minflt:1987 majflt:7
> thread - 0
SLT ST TID PID CPUID POLICY PRI CPU EVENT PROCNAME
0 s 3 0 unbound FIFO 10 78 swapper
t_flags: wakeonsig kthread
Links: *procp:0xe2000000 *uthreadp:0x2ff3b400 *userp:0x2ff3b6e0
*prevthread:0xe6000000 *nextthread:0xe6000000, *stackp:0x00000000
*wchan1(real):0x00000000 *wchan2(VMM):0x00000000 *swchan:0x00000000
wchan1sid:0x00000000 wchan1offset:0x00000000
pevent:0x00000000 wevent:0x00000001 *slist:0x00000000
Dispatch Fields: *prior:0xe6000000 *next:0xe6000000
polevel:0x0000000a ticks:0x0c0f *synch:0xffffffff result:0x00000000
*eventlst:0x00000000 *wchan(hashed):0x00000000 suspend:0x0001
thread waiting for: event(s)
Scheduler Fields: cpuid:0xffffffff scpuid:0xffffffff pri: 16 policy:FIFO
affinity:0x0001 affinity_ts:0x3b6e31e cpu:0x0078 run_queue:34a900
lpri: 0 wpri:127 time:0x00 sav_pri:0x10
Misc: lockcount:0x00000000 ulock:0x00000000 *graphics:0x00000000
dispct:0x00031718 fpuct:0x00000001 boosted:0x0000
userdata:0x00000000
fsflags: 00000000 adsp_flags: 0000
Signal Information: cursig:0x00 *scp:0x00000000
pending:hi 0x00000000,lo 0x00000000 sigmask:hi 0x00000000,lo 0x00000000
> q
# lslpp -w /usr/lib/drivers/aiopin   (find the fileset that owns this file)
File Fileset Type
----------------------------------------------------------------------------
/usr/lib/drivers/aiopin bos.rte.aio File
# lslpp -ah bos.rte.aio   (check the fileset level: it is 4.3.3.1)
Fileset Level Action Status Date Time
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
bos.rte.aio
4.3.3.0 COMMIT COMPLETE 01/01/70 08:29:52
4.3.3.1 COMMIT COMPLETE 01/07/00 09:57:11
4.3.3.1 APPLY COMPLETE 01/07/00 09:55:52
Path: /etc/objrepos
bos.rte.aio
4.3.3.0 COMMIT COMPLETE 01/01/70 08:29:52
4.3.3.1 COMMIT COMPLETE 01/07/00 09:57:11
4.3.3.1 APPLY COMPLETE 01/07/00 09:55:53
So the crash is related to bos.rte.aio. A search of the IBM website turned up the following APAR:
IY05599: AIO CRASH IN COMPARE_AND_SWAP 00/01/14 PTF PECHANGE
APAR status
Closed as program error.
Error description
When the parameter passed to the compare_and_swap() expected
to be a pointer to an integer, but the code passed an integer.
I/O on this address (small integer) caused the system crashed
with DSI.
Local fix
Problem summary
***************************************************************
*USERS AFFECTED: *
* All users with the following filesets at these levels *
* bos.rte.aio 4.3.3.1.
***************************************************************
*PROBLEM DESCRIPTION: *
* When the parameter passed to the compare_and_swap()
* expected to be a pointer to an integer, but the code
* passed an integer. I/O on this address (small
* integer) caused the system crashed with DSI.
***************************************************************
*RECOMMENDATION: *
* Apply apar IY05599
***************************************************************
Problem conclusion
Corrected the parameter passed to compare_and_swap calls.
Temporary fix
Comments
APAR information
APAR number IY05599
Reported component name AIX 4.3.0
Reported component ID 5765C3403
Reported release 430
Status CLOSED PER
PE YesPE
HIPER NoHIPER
Submitted date 1999-11-02
Closed date 1999-11-08
Last modified date 2000-10-17
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name AIX 4.3.0
Fixed component ID 5765C3403
Applicable component levels
R430 PSY U467596 UP99/12/21 I 1000
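It is now confirmed that this machine needs the corresponding fix (APAR IY05599) applied in order to resolve the crash for good.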