2006-10-15 23:42:00
This is one way of finding out which processes are currently swapped on a Solaris system. There are supposedly other ways of reaching this goal but none of these are currently known to the author.
To use this method you need to be able to execute the Modular Debugger with root permissions. The Modular Debugger is available if you have the package SUNWmdb installed. File is named
Your 'vmstat' output is listing processes as swapped. Something like this:
# vmstat 1 5
 procs     memory            page                  disk          faults        cpu
 r b  w    swap   free  re  mf  pi  po  fr de  sr s0 s1 sd sd   in   sy   cs us sy  id
 0 0 10 3552488  58856  39 222 124 123 157  0  74 14 14 36  2  484    6   37 12  8  80
 0 0 15 3418184  15600  31 326  64 224 264  0 344  7  7  0  0  374 1418 1337  3  4  93
 0 0 15 3418184  15784   1   1   0 152 160  0  91  2  2  0  0  342  956 1238  0  0 100
 0 0 15 3418168  15768   0   1   8   0   0  0   0  0  1  4  0  318 1289 1319  2  0  98
 0 0 15 3418168  15768   0   0   0   0   0  0   0  0  0 16  0  325 1081 1262 23  1  76
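If you want to watch the swapped count from a script rather than by eye, the third field of each data row can be pulled out mechanically. The following is only a minimal Python sketch, not part of the original procedure; the function name `swapped_counts` and the sample text are made up for illustration, and it assumes that data rows consist solely of integers while header rows do not.

```python
def swapped_counts(vmstat_output):
    """Return the third field (the 'w' column) of every vmstat data row."""
    counts = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        # Assumption: a data row is made up of integers only, so the two
        # header rows and the command line itself are skipped.
        if len(fields) >= 3 and all(f.isdigit() for f in fields):
            counts.append(int(fields[2]))
    return counts


# Two rows taken from the vmstat output above, with the header kept in place.
SAMPLE = """\
 r b w swap free re mf pi po fr de sr s0 s1 sd sd in sy cs us sy id
 0 0 10 3552488 58856 39 222 124 123 157 0 74 14 14 36 2 484 6 37 12 8 80
 0 0 15 3418184 15600 31 326 64 224 264 0 344 7 7 0 0 374 1418 1337 3 4 93
"""

print(swapped_counts(SAMPLE))
```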
The numbers in the third column above (the 'w' column) show that we have 15 swapped processes, but how do we identify the process IDs for these processes? Using the Modular Debugger and a lot of patience it is possible to track down these PIDs. We do this following the below steps:
This is an example meant to illustrate the above steps:
First we generate the full thread list. This amount of output can be massive so we will redirect the output to a file named 'allthreads.list'.
# mdb -k
Loading modules: [ unix krtld genunix ip ptm cpc ipc random nfs ]
> ::walk thread | $< thread ! cat > allthreads.list
This is part of the output:
...
0x3000ab09b28:  sleepq          panic_trap      upimutex
                1043fc70        0               0
0x3000ab09b48:  nupinest
                0
0x3000ab09b50:  delay_lock
0x3000ab09b50:  owner/waiters
                0
0x3000ab09b58:  unpark          thlink
                0               0
0x30003932bc0:  link            stk             startpc
                0               2a1003bdaf0     0
0x30003932bd8:  bound_cpu       affinitycnt     bind_cpu
                0               0               -1
0x30003932be4:  flag            proc_flag       schedflag
                2               0               0
0x30003932bea:  preempt         preempt_lk      state
                0               0               1
0x30003932bf0:  pri             epri
                29              0
0x30003932bf8:  pc              sp
                10079ed4        2a1003bcfb1
0x30003932c08:  wchan0          wchan           sobj_ops
                0               3000a4dbafc     1042e238
0x30003932c20:  cid             clfuncs         cldata
                1               10464fc8        30009f9d418
0x30003932c38:  ctx             lofault         onfault
                300035450e0     0               0
0x30003932c50:  ontrap          swap            lock
                0               2a1003ba000     0
0x30003932c62:  pil             pi_lock         cpu
                0               0               1041b428
0x30003932c70:  intr            did             tnf_tpdp
                0               9778913         30005b0f7f0
0x30003932c88:  tid             waitfor         alarmid
                17              -1              0
0x30003932c98:  realitimer
0x30003932c98:  interval.tv_sec interval.tv_usec value.tv_sec
                0               0               0
0x30003932cb0:  value.tv_usec
...
The line starting with '0x30003932be4' has the value of 'schedflag' set to 0. The address of this thread can be found at the start of the line four lines up, the line which also contains the field 'link'. Fortunately for us the distance back to the 'link' address is always the same, namely '0x24', which makes automating the process a lot easier. This means that the address to use in the next step can be written as '0x30003932bc0' or as '0x30003932be4-0x24'.
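The search for 'schedflag' lines and the '0x24' subtraction can be done mechanically. The sketch below is an illustration in Python, not the Perl script mentioned later; the function name `swapped_thread_addrs` is hypothetical. It assumes that 'allthreads.list' holds each row of field names on a line beginning with an address, with the matching values on the line directly below.

```python
import re

# Distance from the 'link' address back from the 'schedflag' line, as given in the text.
LINK_OFFSET = 0x24


def swapped_thread_addrs(thread_dump):
    """Return thread addresses (as hex strings) whose 'schedflag' value is 0."""
    lines = thread_dump.splitlines()
    addrs = []
    for i, line in enumerate(lines[:-1]):
        m = re.match(r"\s*(0x[0-9a-fA-F]+):\s+(.*)", line)
        if not m:
            continue  # a value line, or something that is not a label row
        labels = m.group(2).split()
        if "schedflag" not in labels:
            continue
        # Assumption: the values sit on the next line, in the same order.
        values = lines[i + 1].split()
        idx = labels.index("schedflag")
        if idx < len(values) and values[idx] == "0":
            addrs.append(hex(int(m.group(1), 16) - LINK_OFFSET))
    return addrs


# A few rows from the thread output above, laid out as assumed.
SAMPLE = """\
0x30003932bd8:  bound_cpu  affinitycnt  bind_cpu
                0  0  -1
0x30003932be4:  flag  proc_flag  schedflag
                2  0  0
0x30003932bea:  preempt  preempt_lk  state
                0  0  1
"""

print(swapped_thread_addrs(SAMPLE))
```

For a real run you would read the file instead, for example `swapped_thread_addrs(open("allthreads.list").read())`.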
Now we fetch the 'procp' value from the 'thread.brief' output like this:
# mdb -k
Loading modules: [ unix krtld genunix ip ptm cpc ipc random nfs ]
> 0x30003932be4-0x24 $< thread.brief
============== thread_id 30003932bc0
0x3000c277a28:  process args    ./products/gui/../../java/1.4.2/JRE/bin/java -cp java/1.3.1/lib/psi3I3FP.jar:ja
0x30003932ce8:  lwp             procp           wchan
                300059de448     3000c277568     3000a4dbafc
0x30003932bf8:  pc              sp
                cv_wait_sig_swap+0x194          2a1003bcfb1
The 'procp' has a value of '3000c277568'. This value is needed for our next lookup.
Still inside the same Modular Debugger session, fetch the output of 'proc' using the 'procp' value of '3000c277568':
> 3000c277568 $< proc
0x3000c277568:
0x3000c277568:  exec            as              lockp
                3000463e840     300048bc910     30000f62880
0x3000c277580:  crlock
0x3000c277580:  owner/waiters
                0
0x3000c277588:  cred            swapcnt         stat
                30000f14fc8     1               2
0x3000c277595:  wcode           pidflag         wdata
                0               0               0
0x3000c27759c:  ppid            link            parent
                1               0               30001fb3528
0x3000c2775b0:  child           sibling         psibling
                0               3000cb52040     3000d2e0028
0x3000c2775c8:  sibling_ns      child_ns        next
                0               0               30005724040
0x3000c2775e0:  prev            nextofkin       orphan
                3000cb52040     30003930ac8     0
0x3000c2775f8:  nextorph        pglink          ppglink
                30005724040     0               3000d2e0028
0x3000c277610:  sessp           pidp            pgidp
                30009657818     300005bc440     3000b441240
0x3000c277628:  cv              flag_cv         lwpexit
                0               0               0
0x3000c27762e:  holdlwps        flag            utime
                0               4004208         5937bc
0x3000c277640:  stime           cutime          cstime
                42bd            0               0
0x3000c277658:  segacct         brkbase         brksize
                0               29128           520298
0x3000c277670:  sig             ignore          siginfo
                0               811e000300000006ffbffeff00001fff
0x3000c277688:  sigqueue        sigqhdr         signhdr
                0               0               0
0x3000c2776a0:  stopsig         lwpid           lwpcnt
                0               398             24
0x3000c2776ac:  lwprcnt         lwpwait         zombcnt
                24              0               0
0x3000c2776b8:  zomb_max        zomb_tid        tlist
                0               0               3000b9df0c0
0x3000c2776d0:  sigmask         fltmask         trace
                0               0               0
0x3000c2776e8:  plist           agenttp         warea
                0               0               0
0x3000c277700:  nwarea          wpage           nwpage
                0               0               0
0x3000c277714:  mapcnt          rlink           srwchan_cv
                0               0               0
0x3000c277728:  stksize         mstart          mterm
                e000            a1b6acccd1ff6   0
0x3000c277740:  mlreal          rprof_cyclic    defunct
                0               0               376
0x3000c277828:  pflock
0x3000c277828:  owner/waiters
                0
0x3000c277f20:  server_threads  door_list       unref_list
                0               3000d238dd8     0
0x3000c277f38:  server_cv       unref_thread    tnf_flags
                0               0               0
0x3000c277f40:  audit_data      aslwptp         swrss
                0               3000c6fe2c0     2
0x3000c277f58:  aio             itimer          notifsigs
                0               0               0
0x3000c277f70:  notifcv         alarmid         sc_unblocked
                1               0               0
0x3000c277f88:  sc_door         usrstack        stkprot
                3000d238dd8     ffbf0000        f
0x3000c277f9c:  model           lcp
                100000          30005cf0000
0x3000c277fa8:  lcp_mutexinitlock
0x3000c277fa8:  owner/waiters
                0
0x3000c277fb0:  utraps          corefile        rce
                0               300003fc8f8     0
0x3000c277fc8:  task            taskprev        tasknext
                30001fb1dc8     3000cb52040     3000d011520
0x3000c277fe0:  lwpdaemon       lwpdwait        tidhash
                0               0               3000b4ee000
0x3000c277ff0:  schedctl
                30000559a70
Looking at the above output we can see that the value of 'pidp' is '300005bc440'. Using this value we can finally find the PID of the process this thread belongs to.
> 300005bc440 $< pid
0x300005bc440:  bits
                a2
0x300005bc444:  id              pglink          link
                26125           0               0
Voilà, the PID '26125' is one of the swapped processes!
# ps -ef | grep [2]6125
 precise 26125     1  0   May 23 ?       978:20 ./products/gui/../../java/1.4.2/JRE/bin/java -cp java/1.3.1/lib/psi3I3FP.jar:ja
Following the above process gets tedious pretty quickly, so I wrote a Perl script to automate it. Script improvements are very welcome!
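For readers who just want to see the shape of such automation, here is a rough Python sketch of the thread-to-PID chain described above. It is not the Perl script; the names `run_mdb`, `field` and `pid_for_thread` are invented for this illustration. It assumes that 'mdb -k' accepts a command on standard input, that the 'thread.brief', 'proc' and 'pid' macros print field names on an address line with the values on the line below, and that it runs as root on the Solaris host.

```python
import re
import subprocess


def run_mdb(command):
    """Pipe a single command into 'mdb -k' and return what it prints.

    Assumption: mdb reads the command from standard input when used this way.
    """
    result = subprocess.run(["mdb", "-k"], input=command + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout


def field(output, name):
    """Return the value printed below the label 'name' in macro output."""
    lines = output.splitlines()
    for i, line in enumerate(lines[:-1]):
        m = re.match(r"\s*0x[0-9a-fA-F]+:\s+(.+)", line)
        if not m:
            continue
        labels = m.group(1).split()
        values = lines[i + 1].split()
        # Only trust rows where labels and values pair up one to one.
        if name in labels and len(labels) == len(values):
            return values[labels.index(name)]
    raise KeyError(name)


def pid_for_thread(thread_addr, run=run_mdb):
    """Follow thread -> procp -> pidp -> id, as in the manual steps."""
    procp = field(run(f"{thread_addr} $< thread.brief"), "procp")
    pidp = field(run(f"{procp} $< proc"), "pidp")
    return int(field(run(f"{pidp} $< pid"), "id"))
```

The `run` parameter exists so the chain can be tried against saved macro output without touching a live kernel. Each call starts a fresh mdb, which is slow when many threads are involved; feeding several commands to a single session would be an obvious refinement.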
A big thank you to the SUN engineer Michael Schuster who was very patient and friendly helping me with this problem!