crash debug case-gxy_0202-ChinaUnix博客

十年

首页　| 　博文目录　| 　关于我

gxy_0202

博客访问： 356919
博文数量： 67
博客积分： 2550
博客等级：少校
技术积分： 990
用户组：普通用户
注册时间： 2008-09-17 18:24

文章分类

全部博文（67）

english（1）
c++（4）
windows（2）
linux study（59）
未分配的博文（1）

文章存档

2011年（6）

2010年（2）

2009年（40）

2008年（19）

我的朋友

A Case Study: `"kernel BUG at pipe.c:120!"`

Upon bringing up a crash session, a great deal of information can be gained just by the invocation data. Here is what what displayed in this particular case:

...
      KERNEL: vmlinux-2.4.9-e.10.13enterprise-g
    DUMPFILE: vmcore-incomplete
        CPUS: 2
        DATE: Mon Feb 17 08:20:56 2003
      UPTIME: 4 days, 20:04:41
LOAD AVERAGE: 0.95, 1.04, 1.25
       TASKS: 110
    NODENAME: testbox.redhat.com
     RELEASE: 2.4.9-e.10.13enterprise
     VERSION: #1 SMP Mon Feb 3 12:59:26 EST 2003
     MACHINE: i686  (2788 Mhz)
      MEMORY: 6 GB
       PANIC: "kernel BUG at pipe.c:120!"
         PID: 20571
     COMMAND: "imp"
        TASK: d1566000
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash>

In this case the PANIC string "kernel BUG at pipe.c:120!" points to the exact kernel source code line at which the panic occurred.

Then, getting a backtrace of panicking task is typically the first order of the day:

crash> bt
PID: 20571  TASK: d1566000  CPU: 1   COMMAND: "imp"
 #0 [d1567e44] die at c010785c
 #1 [d1567e54] do_invalid_op at c0107b2c
 #2 [d1567f0c] error_code (via invalid_op) at c01073de
    EAX: 0000001d  EBX: ed87b2e0  ECX: c02f6064  EDX: 00005fa1  EBP: 00001000
    DS:  0018      ESI: f640e740  ES:  0018      EDI: 00001000
    CS:  0010      EIP: c0150b6d  ERR: ffffffff  EFLAGS: 00010292
 #3 [d1567f48] pipe_read at c0150b6d
 #4 [d1567f6c] sys_read at c01468d4
 #5 [d1567fc0] system_call at c01072dc
    EAX: 00000003  EBX: 0000000a  ECX: 40b4e05c  EDX: 00002000
    DS:  002b      ESI: 00002000  ES:  002b      EDI: 40b4e05c
    SS:  002b      ESP: bffe9e88  EBP: bffe9eb8
    CS:  0023      EIP: 40aaa1d4  ERR: 00000003  EFLAGS: 00000286

The backtrace shows that the call to die() was generated by an invalid_op exception. The exception was caused by the BUG() call in the pipe_read() function:

        if (count && PIPE_WAITING_WRITERS(*inode) &&
                !(filp->f_flags & O_NONBLOCK)) {
                /*
                 * We know that we are going to sleep: signal
                 * writers synchronously that there is more
                 * room.
                 */
                wake_up_interruptible_sync(PIPE_WAIT(*inode));
                if (!PIPE_EMPTY(*inode))
                        BUG();
                goto do_more_read;
        }

In the code segment above, the pipe_read() code has previously down'd the semaphore of the inode associated with the pipe, giving it exclusive access. It had read all data in the pipe, but still needed more to satisfy the count requested. Finding that there was a writer with more data -- and who was waiting on the semaphore -- it woke up the writer. However, after doing the wakeup, it did a sanity-check on the pipe contents, and found that it was no longer empty -- which is theoretically impossible since it was still holding the semaphore. It appeared that the writer process wrote to the pipe while the reader process still had exclusive access -- somehow overriding the semaphore.

Since the semaphore mechanism was seemingly not working, it was first necessary to look at the actual semaphore structure associated with the pipe's inode. This first required looking at the first argument to the pipe_read() function; the command shows that it is a struct file pointer:

crash> whatis pipe_read
ssize_t pipe_read(struct file *, char *, size_t, loff_t *);
crash>

Using the option, each frame in the backtrace is expanded to show all stack data in the frame. Looking at the expansion of the sys_read() frame, we can see that the last thing pushed on the stack before calling pipe_read() was the file pointer address of edf3f740:

...
#3 [d1567f48] pipe_read at c0150b6d
    [RA: c01468d6  SP: d1567f4c  FP: d1567f6c  SIZE: 36]
    d1567f4c: c026701c  00000078  fffffff2  00001000
    d1567f5c: 00000000  edf3f740  ffffffea  00002000
    d1567f6c: c01468d6
 #4 [d1567f6c] sys_read at c01468d4
    [RA: c01072e3  SP: d1567f70  FP: d1567fc0  SIZE: 84]
    d1567f70: edf3f740  40b4f05c  00002000  edf3f760
    d1567f80: c03683d0  fffffffb  00000001  c0120d3b
    d1567f90: 00000046  00000046  0000000b  c0350960
    d1567fa0: 0000000b  f639eb00  c0108e0e  00000020
    d1567fb0: d1566000  00002000  40b4e05c  bffe9eb8
    d1567fc0: c01072e3
...

The task at hand is finding the inode containing the suspect semaphore from the file structure address. The file structure's f_dentry member points to its dentry structure, whose d_inode member in turn points to the pipe's inode. The command can be used to dump the complete contents of a data structure at a given address; by tagging the .member onto the structure name, we can print just the member desired. By following the structure chain, the inode address can be determined like so:

crash> struct file.f_dentry edf3f740
  f_dentry = 0xdb0ec440,
crash> struct dentry.d_inode db0ec440
  d_inode = 0xf640e740,
crash> struct inode.i_sem f640e740

  i_sem = {
    count = {
      counter = 2
    },
    sleepers = 0,
    wait = {
      lock = {
        lock = 1
      },
      task_list = {
        next = 0xf640e7ac,
        prev = 0xf640e7ac
      }
    }
  },
crash>

The dump of the semaphore structure above showed the problem: the counter value of 2 is illegal. It should never be greater than 1; in this case a value of 2 allows two successful down operations, i.e., giving two tasks access to the pipe at the same time.

(As an aside, determining the inode address above could also be accomplished by using the context-sensitive command, which dumps the associated file, dentry and inode structure addresses for each open file descriptor of a task. The dumped file descriptor list would contain one with a reference to the file structure at edf3f740, and would also show the associated inode address of f640e740.)

Before getting a dumpfile, this same panic had occurred several times. It was erroneously presumed that the problem was in the pipe-handling code, but it was eventually determined not to be the case. By instrumenting a kernel with debug code, the starting counter value of a pipe was found to be 3. Compounding that problem was the fact that the inode slab cache is one of a few special cases that presume that the freed inode's contents are left in a legitimate state so that they do not have to be completely reinitialized with each subsequent reallocation. So when the pipe's inode was created, it received an inode with a bogus counter value.

Confirming the existence of bogus inode structures in the slab cache was a multi-stepped procedure. Using the command command to access the inode slab cache, we can get the addresses of all free and currently-allocated inodes. Since there are typically several thousand inodes, the output is extremely verbose, but here is the beginning of it:

crash> kmem -S inode_cache
CACHE    NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
c7666564 inode_cache              448      11563     12339   1371     4k
SLAB      MEMORY    TOTAL  ALLOCATED  FREE
d1d82000  d1d82040      9          9     0
FREE / [ALLOCATED]
  [d1d82040]
  [d1d82200]
  [d1d823c0]
  [d1d82580]
  [d1d82740]
  [d1d82900]
  [d1d82ac0]
  [d1d82c80]
  [d1d82e40]
SLAB      MEMORY    TOTAL  ALLOCATED  FREE
f4e52000  f4e52040      9          7     2
FREE / [ALLOCATED]
   f4e52040  (cpu 1 cache)
   f4e52200  (cpu 1 cache)
  [f4e523c0]
  [f4e52580]
  [f4e52740]
  [f4e52900]
  [f4e52ac0]
  [f4e52c80]
  [f4e52e40]
...

In the truncated output above, all of the inode address in the slab cache are dumped; the ones currently in use are surrounded by brackets, the free ones are not. So, for example, the inodes at addresses f4e52040 and f4e52200 are free; the others are not. The full output was that pulled out just the free inode addresses (i.e., output lines starting with three spaces), and . The file was modified to be a crash by making each extracted inode address to be the arguments of the command, using its short-cut method that allows the dropping of the struct command name; therefore the input file contained hundreds of crash commands of the form:

inode.i_sem f4e52040
inode.i_sem f4e52200
inode.i_sem f5cdc040
inode.i_sem f5cdc200
inode.i_sem f5cdc3c0
inode.i_sem f5cdc580
...

Note that the command would be used by default above, as documented in its help page; if the first command line argument is not a crash or gdb command, but it is the name of a known data structure, it passes the arguments to the command.

Using the capability of feeding an , in this case consisting of hundreds of short-cut commands like those above, the output was again quite verbose, consisting of structure member dumps of the form:

crash> < input.file
crash> inode.i_sem f4e52040
  i_sem = {
    count = {
      counter = 1
    },
    sleepers = 0,
    wait = {
      lock = {
        lock = 1
      },
      task_list = {
        next = 0xf4e520ac,
        prev = 0xf4e520ac
      }
    }
  },
crash> inode.i_sem f4e52200
  i_sem = {
    count = {
      counter = 1
    },
    sleepers = 0,
    wait = {
      lock = {
        lock = 1
      },
      task_list = {
        next = 0xf4e5226c,
        prev = 0xf4e5226c
      }
    }
  },
...

However, it was a simple matter of to grep, and looking for counter values not equal to 1:

crash> < input.file | grep counter | grep -v "= 1"
      counter = 3
      counter = 3
      counter = 3
      counter = 3
crash>

This turned out to be the smoking gun. Another round of debugging with an instrumented kernel that trapped attempts to free an inode with a semaphore counter of 3 caught the perpetrator in the act.

加载新的模块。

crash> mod -d egenera_base
crash> mod -s egenera_base egenera_base.o
MODULE   NAME                     SIZE OBJECT FILE
f8804000 egenera_base           448324 egenera_base.o
crash>
crash> whatis
Usage: whatis [struct | union | typedef | symbol]
Enter "help whatis" for details.
crash> bt -t
PID: 11192 TASK: e3a5a000 CPU: 0   COMMAND: "java"
      START: dump_execute at fa5740eb
[e3a5bc64] __do_vmdump at fa5744f7
[e3a5bc7c] do_vmdump at fa573081
[e3a5bc8c] kdb_dump at c0183c00
[e3a5bc98] kdb_local at c018434c
[e3a5bcc4] kernrpc_rclan_send at f881b516
[e3a5bce4] hash_remove at f881bea8
[e3a5bd0c] kernrpc_unregister_send at f881a300
[e3a5bd54] egenera_config_query_rpc at fb2ed2c1
[e3a5bd84] do_get_write_access at f9a93e85
[e3a5bdd4] journal_dirty_metadata at f9a945b9
[e3a5bdf8] ext3_do_update_inode at f9aa7a64
[e3a5be08] journal_get_write_access at f9a93ef6
[e3a5be9c] wake_up_process at c0119ccb
[e3a5beac] deliver_signal at c012795e
[e3a5bebc] ignored_signal at c0127770
[e3a5bed8] kdba_getregcontents at c023a3ca
[e3a5bf0c] kdb_main_loop at c0184604
[e3a5bf34] kdb at c0184cf6
[e3a5bf70] egenera_nmi_check at c010857d
[e3a5bf90] do_nmi at c01087b8
[e3a5bfb8] nmi at c0107732
crash> whatis kernrpc_rclan_send
int kernrpc_rclan_send(kernrpc_transport_t *, kernrpc_transport_descriptor_t *, long unsigned int, int);

阅读(1961) | 评论(0) | 转发(0) |

上一篇：c语言常见错误

下一篇：shell中的数组使用方法

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6

A Case Study: "kernel BUG at pipe.c:120!"

A Case Study: `"kernel BUG at pipe.c:120!"`