2006-10-15 23:42:00

 How to Find the PIDs for Swapped Processes on a Solaris System

This is one way of finding out which processes are currently swapped out on a Solaris system. There may be other ways of reaching this goal, but the author does not know of any.

To use this method you need to be able to run the Modular Debugger (mdb) with root permissions. The Modular Debugger is available if the package SUNWmdb is installed; the binary is '/usr/bin/mdb'.

Suppose your 'vmstat' output lists processes as swapped, something like this:

  # vmstat 1 5
   procs     memory            page            disk          faults      cpu
   r b w   swap  free  re  mf pi po fr de sr s0 s1 sd sd   in   sy   cs us sy id
   0 0 10 3552488 58856 39 222 124 123 157 0 74 14 14 36 2 484   6   37 12  8 80
   0 0 15 3418184 15600 31 326 64 224 264 0 344 7 7 0  0  374 1418 1337  3  4 93
   0 0 15 3418184 15784 1   1  0 152 160 0 91 2  2  0  0  342  956 1238  0  0 100
   0 0 15 3418168 15768 0   1  8  0  0  0  0  0  1  4  0  318 1289 1319  2  0 98
   0 0 15 3418168 15768 0   0  0  0  0  0  0  0  0 16  0  325 1081 1262 23  1 76
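
The count of swapped processes is the 'w' value in the 'procs' group of the output above. As a minimal sketch, assuming the column layout shown here, it can be pulled out of captured 'vmstat' text like this (the function name 'swapped_counts' is made up for illustration):

```python
def swapped_counts(vmstat_output):
    """Return the 'w' (swapped) value from each data row of vmstat output."""
    counts = []
    w_index = None
    for line in vmstat_output.splitlines():
        fields = line.split()
        if fields[:3] == ["r", "b", "w"]:
            w_index = 2  # 'w' is the third column of the procs group
            continue
        # Data rows come after the column header and start with a number.
        if w_index is not None and fields and fields[0].isdigit():
            counts.append(int(fields[w_index]))
    return counts


sample = """\
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 sd sd   in   sy   cs us sy id
 0 0 10 3552488 58856 39 222 124 123 157 0 74 14 14 36 2 484   6   37 12  8 80
 0 0 15 3418184 15600 31 326 64 224 264 0 344 7 7 0  0  374 1418 1337  3  4 93
"""
print(swapped_counts(sample))
```

The first data row of 'vmstat' reports averages since boot, so the later rows are the ones that reflect the current state.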

The numbers in the third column (w) above show that we have 15 swapped processes, but how do we identify the process IDs of these processes? Using the Modular Debugger and a lot of patience, it is possible to track down these PIDs by following the steps below:

  1. In a full thread list, search for the threads where the flag 'schedflag' is set to 0 (zero). This means that the thread is not in physical memory.
  2. For each of these threads, get the 'procp' value from the 'thread.brief' output.
  3. For this 'procp' value, get the 'pidp' value from the 'proc' output.
  4. For this 'pidp' value, get the 'id' value from the 'pid' output.
  5. This 'id' value is the PID to which the thread in question belongs.
  6. Now repeat for all threads which have 'schedflag = 0'.
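
The steps above amount to following three pointers: thread to proc, proc to pid structure, pid structure to the numeric ID. The toy model below is only a sketch of that chain. It uses plain Python dictionaries in place of kernel structures, and the addresses are borrowed from the worked example that follows. The dictionary layout and the name 'pid_for_thread' are illustrative assumptions, not anything mdb provides.

```python
# Toy stand-ins for the kernel structures, keyed by address.
threads = {0x30003932bc0: {"schedflag": 0, "procp": 0x3000c277568}}
procs = {0x3000c277568: {"pidp": 0x300005bc440}}
pids = {0x300005bc440: {"id": 26125}}


def pid_for_thread(thread_addr):
    """Follow thread -> proc -> pid structure -> numeric PID."""
    procp = threads[thread_addr]["procp"]  # step 2
    pidp = procs[procp]["pidp"]            # step 3
    return pids[pidp]["id"]                # steps 4 and 5


# Steps 1 and 6: consider every thread whose schedflag is 0.
swapped = [pid_for_thread(addr)
           for addr, info in threads.items()
           if info["schedflag"] == 0]
print(swapped)
```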

The following example illustrates the above steps:

(Step #1)

First we generate the full thread list. The output can be massive, so we redirect it to a file named 'allthreads.list'.

  # mdb -k
  Loading modules: [ unix krtld genunix ip ptm cpc ipc random nfs ]
  > ::walk thread | $< thread ! cat > allthreads.list

This is part of the output:

  ...
  0x3000ab09b28:  sleepq          panic_trap      upimutex
                  1043fc70        0               0
  0x3000ab09b48:  nupinest
                  0
  0x3000ab09b50:  delay_lock
  0x3000ab09b50:  owner/waiters
                  0
  0x3000ab09b58:  unpark  thlink
                  0       0
  0x30003932bc0:  link            stk             startpc
                  0               2a1003bdaf0     0
  0x30003932bd8:  bound_cpu       affinitycnt     bind_cpu
                  0               0               -1
  0x30003932be4:  flag    proc_flag       schedflag
                  2       0               0
  0x30003932bea:  preempt preempt_lk      state
                  0       0               1
  0x30003932bf0:  pri     epri
                  29      0
  0x30003932bf8:
                  pc              sp
                  10079ed4        2a1003bcfb1
  0x30003932c08:  wchan0          wchan           sobj_ops
                  0               3000a4dbafc     1042e238
  0x30003932c20:  cid             clfuncs         cldata
                  1               10464fc8        30009f9d418
  0x30003932c38:  ctx             lofault         onfault
                  300035450e0     0               0
  0x30003932c50:  ontrap          swap            lock
                  0               2a1003ba000     0
  0x30003932c62:  pil     pi_lock cpu
                  0       0       1041b428
  0x30003932c70:  intr            did             tnf_tpdp
                  0               9778913         30005b0f7f0
  0x30003932c88:  tid             waitfor         alarmid
                  17              -1              0
  0x30003932c98:  realitimer
  0x30003932c98:  interval.tv_sec interval.tv_usec        value.tv_sec
                  0               0                       0
  0x30003932cb0:  value.tv_usec
  ...

The line starting with '0x30003932be4' has the value of 'schedflag' set to 0. The address of this thread is at the start of the line four lines up, the one that also contains the field 'link'. Fortunately for us the distance back to the 'link' address is always the same, namely '0x24' (this eases the job of automating the process quite a lot). This means that the address to use in the next step can be written as '0x30003932bc0' or '0x30003932be4-0x24'.
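
Because the offset is fixed, the search through 'allthreads.list' can be scripted. The following is a rough sketch rather than the author's script: it assumes the dump always prints 'flag', 'proc_flag' and 'schedflag' together on one header line with their values on the next line, as in the excerpt above. The names 'HEADER', 'LINK_OFFSET' and 'swapped_thread_addresses' are made up for this example.

```python
import re

# Header line of the group that carries schedflag, for example
# "0x30003932be4:  flag    proc_flag       schedflag"
HEADER = re.compile(r"^\s*0x([0-9a-fA-F]+):\s+flag\s+proc_flag\s+schedflag\s*$")
LINK_OFFSET = 0x24  # distance from the 'flag' line back to the 'link' line


def swapped_thread_addresses(dump_text):
    """Return thread addresses (as ints) whose schedflag value is 0."""
    lines = dump_text.splitlines()
    found = []
    for i, line in enumerate(lines[:-1]):
        match = HEADER.match(line)
        if not match:
            continue
        values = lines[i + 1].split()
        # The third value on the following line is schedflag.
        if len(values) == 3 and int(values[2], 16) == 0:
            found.append(int(match.group(1), 16) - LINK_OFFSET)
    return found


sample = """\
0x30003932bd8:  bound_cpu       affinitycnt     bind_cpu
                0               0               -1
0x30003932be4:  flag    proc_flag       schedflag
                2       0               0
"""
print([hex(addr) for addr in swapped_thread_addresses(sample)])
```

On a real system the text would come from reading 'allthreads.list' instead of the inline sample.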

(Step #2)

Now we fetch the 'procp' value from the 'thread.brief' output like this:

  # mdb -k
  Loading modules: [ unix krtld genunix ip ptm cpc ipc random nfs ]
  > 0x30003932be4-0x24 $< thread.brief
  
                  ============== thread_id        30003932bc0
  0x3000c277a28:
                  process args    ./products/gui/../../java/1.4.2/JRE/bin/java -cp java/1.3.1/lib/psi3I3FP.jar:ja
  0x30003932ce8:  lwp             procp           wchan
                  300059de448     3000c277568     3000a4dbafc
  0x30003932bf8:
                  pc              sp
                  cv_wait_sig_swap+0x194  2a1003bcfb1

The 'procp' has a value of '3000c277568'. This value is needed for our next lookup.
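
The macro output pairs a line of field names with a line of values, so a named field can be looked up by its position. Here is a small sketch of such a lookup, assuming each value is separated from its neighbours by whitespace. The helper name 'macro_field' is hypothetical.

```python
def macro_field(output, name):
    """Return the value printed under field `name`, or None if not found."""
    lines = output.splitlines()
    for header, values in zip(lines, lines[1:]):
        head = header.split()
        # Header lines look like "0xADDR:  field1  field2  field3".
        if len(head) < 2 or not head[0].endswith(":"):
            continue
        names = head[1:]
        if name in names:
            vals = values.split()
            # Only trust the row when names and values line up one to one.
            if len(vals) == len(names):
                return vals[names.index(name)]
    return None


sample = """\
0x30003932ce8:  lwp             procp           wchan
                300059de448     3000c277568     3000a4dbafc
0x30003932bf8:
                pc              sp
"""
print(macro_field(sample, "procp"))
```

The same lookup would apply to 'pidp' in the 'proc' output and to 'id' in the 'pid' output. Rows where two values run together without a space are skipped rather than guessed at.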

(Step #3)

Still inside the same Modular Debugger session, fetch the output of 'proc' using the 'procp' value of '3000c277568':

  > 3000c277568 $< proc
  0x3000c277568:
  0x3000c277568:  exec            as              lockp
                  3000463e840     300048bc910     30000f62880
  0x3000c277580:  crlock
  0x3000c277580:  owner/waiters
                  0
  0x3000c277588:  cred            swapcnt         stat
                  30000f14fc8     1               2
  0x3000c277595:  wcode   pidflag wdata
                  0       0       0
  0x3000c27759c:  ppid            link            parent
                  1               0               30001fb3528
  0x3000c2775b0:  child           sibling         psibling
                  0               3000cb52040     3000d2e0028
  0x3000c2775c8:  sibling_ns      child_ns        next
                  0               0               30005724040
  0x3000c2775e0:  prev            nextofkin       orphan
                  3000cb52040     30003930ac8     0
  0x3000c2775f8:  nextorph        pglink          ppglink
                  30005724040     0               3000d2e0028
  0x3000c277610:  sessp           pidp            pgidp
                  30009657818     300005bc440     3000b441240
  0x3000c277628:  cv      flag_cv lwpexit
                  0       0       0
  0x3000c27762e:  holdlwps        flag            utime
                  0               4004208         5937bc
  0x3000c277640:  stime           cutime          cstime
                  42bd            0               0
  0x3000c277658:  segacct         brkbase         brksize
                  0               29128           520298
  0x3000c277670:  sig             ignore          siginfo
                  0               811e000300000006ffbffeff00001fff
  0x3000c277688:  sigqueue        sigqhdr         signhdr
                  0               0               0
  0x3000c2776a0:  stopsig lwpid   lwpcnt
                  0       398             24
  0x3000c2776ac:  lwprcnt         lwpwait         zombcnt
                  24              0               0
  0x3000c2776b8:  zomb_max        zomb_tid        tlist
                  0               0               3000b9df0c0
  0x3000c2776d0:  sigmask         fltmask         trace
                  0               0               0
  0x3000c2776e8:  plist           agenttp         warea
                  0               0               0
  0x3000c277700:  nwarea          wpage           nwpage
                  0               0               0
  0x3000c277714:  mapcnt          rlink           srwchan_cv
                  0               0               0
  0x3000c277728:  stksize         mstart          mterm
                  e000            a1b6acccd1ff6   0
  0x3000c277740:  mlreal          rprof_cyclic    defunct
                  0               0               376
  0x3000c277828:  pflock
  0x3000c277828:  owner/waiters
                  0
  0x3000c277f20:  server_threads  door_list       unref_list
                  0               3000d238dd8     0
  0x3000c277f38:  server_cv       unref_thread    tnf_flags
                  0               0               0
  0x3000c277f40:  audit_data      aslwptp         swrss
                  0               3000c6fe2c0     2
  0x3000c277f58:  aio             itimer          notifsigs
                  0               0               0
  0x3000c277f70:  notifcv alarmid sc_unblocked
                  1       0               0
  0x3000c277f88:  sc_door         usrstack        stkprot
                  3000d238dd8     ffbf0000        f
  0x3000c277f9c:  model           lcp
                  100000          30005cf0000
  0x3000c277fa8:  lcp_mutexinitlock
  0x3000c277fa8:  owner/waiters
                  0
  0x3000c277fb0:  utraps          corefile        rce
                  0               300003fc8f8     0
  0x3000c277fc8:  task            taskprev        tasknext
                  30001fb1dc8     3000cb52040     3000d011520
  0x3000c277fe0:  lwpdaemon       lwpdwait        tidhash
                  0               0               3000b4ee000
  0x3000c277ff0:  schedctl
                  30000559a70

(Step #4)

Looking at the above output we can see that the value of 'pidp' is '300005bc440'. Using this value we can finally find the PID to which this thread belongs.

  > 300005bc440 $< pid
  0x300005bc440:
                  bits
                  a2
  0x300005bc444:  id              pglink          link
                  26125           0               0

(Step #5)

Voilà, PID '26125' is one of the swapped processes!

  # ps -ef | grep [2]6125
   precise 26125     1  0   May 23 ?       978:20 ./products/gui/../../java/1.4.2/JRE/bin/java -cp java/1.3.1/lib/psi3I3FP.jar:ja

Perl Script to automate this

Following the above process by hand gets tedious pretty quickly, so I wrote a Perl script to automate it. Script improvements are very welcome!
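
For readers who want to experiment, the snippet below sketches one possible way to drive the debugger from a program. It is not the author's Perl script. It assumes that 'mdb -k' accepts debugger commands on standard input and that the caller has root permissions. The names 'macro_command' and 'run_mdb' are invented for this illustration.

```python
import subprocess


def macro_command(address, macro):
    """Build a command such as '3000c277568 $< proc'."""
    return f"{address} $< {macro}"


def run_mdb(command, mdb_cmd=("mdb", "-k")):
    """Send one command to the debugger on stdin and return its output.

    `mdb_cmd` is a parameter so that a different binary can be used
    when trying the function on a machine without mdb.
    """
    result = subprocess.run(
        list(mdb_cmd),
        input=command + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With these two pieces, step 3 of the walkthrough would be run_mdb(macro_command("3000c277568", "proc")), and the returned text could then be searched for the 'pidp' value.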


Thanks

A big thank you to the Sun engineer Michael Schuster, who was very patient and friendly in helping me with this problem!
