Category: LINUX
2008-12-04 17:04:50
...
In this case the PANIC string "kernel BUG at pipe.c:120!" points to the exact kernel source code line at which the panic occurred.
Getting a backtrace of the panicking task is typically the first order of business:
crash> bt
The backtrace shows that the call to die() was generated by an invalid_op exception. The exception was caused by the BUG() call in the pipe_read() function:
if (count && PIPE_WAITING_WRITERS(*inode) &&
In the code segment above, the pipe_read() code has previously down'd the semaphore of the inode associated with the pipe, giving it exclusive access. It had read all data in the pipe, but still needed more to satisfy the count requested. Finding that there was a writer with more data -- and who was waiting on the semaphore -- it woke up the writer. However, after doing the wakeup, it did a sanity-check on the pipe contents, and found that it was no longer empty -- which is theoretically impossible since it was still holding the semaphore. It appeared that the writer process wrote to the pipe while the reader process still had exclusive access -- somehow overriding the semaphore.
Since the semaphore mechanism was seemingly not working, it was first necessary to look at the actual semaphore structure associated with the pipe's inode. This first required looking at the first argument to the pipe_read() function; the whatis command shows that it is a struct file pointer:
crash> whatis pipe_read
Using the bt -f option, each frame in the backtrace is expanded to show all stack data in the frame. Looking at the expansion of the sys_read() frame, we can see that the last thing pushed on the stack before calling pipe_read() was the file pointer address of edf3f740:
...
The task at hand is finding the inode containing the suspect semaphore from the file structure address. The file structure's f_dentry member points to its dentry structure, whose d_inode member in turn points to the pipe's inode. The struct command can be used to dump the complete contents of a data structure at a given address; by tagging the .member onto the structure name, we can print just the member desired. By following the structure chain, the inode address can be determined like so:
crash> struct file.f_dentry edf3f740
The dump of the semaphore structure above showed the problem: the counter value of 2 is illegal. It should never be greater than 1; in this case a value of 2 allows two successful down operations, i.e., giving two tasks access to the pipe at the same time.
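The effect of a counter of 2 can be illustrated with a toy model. This is only a sketch in Python, not kernel code: `ToySemaphore`, `try_down` and `holders_admitted` are hypothetical names, and the real down() would put the second task to sleep rather than return a failure.

```python
class ToySemaphore:
    """Toy counting semaphore: a down succeeds only while the counter is positive."""

    def __init__(self, counter):
        self.counter = counter

    def try_down(self):
        # Non-blocking stand-in for down(); a real kernel down() would sleep instead.
        if self.counter > 0:
            self.counter -= 1
            return True
        return False

    def up(self):
        self.counter += 1


def holders_admitted(initial_counter, contenders=2):
    """Count how many contenders acquire the semaphore before anyone releases it."""
    sem = ToySemaphore(initial_counter)
    return sum(1 for _ in range(contenders) if sem.try_down())


healthy = holders_admitted(1)   # legal mutex-style starting value
corrupt = holders_admitted(2)   # the illegal value seen in the dump
print(healthy, corrupt)
```

With a starting value of 1 only one contender gets in; with 2, both the reader and the writer are admitted at once, which matches the symptom seen in pipe_read().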
(As an aside, determining the inode address above could also be accomplished by using the context-sensitive files command, which dumps the associated file, dentry and inode structure addresses for each open file descriptor of a task. The dumped file descriptor list would contain one with a reference to the file structure at edf3f740, and would also show the associated inode address of f640e740.)
Before getting a dumpfile, this same panic had occurred several times. It was erroneously presumed that the problem was in the pipe-handling code, but that was eventually determined not to be the case. By instrumenting a kernel with debug code, the starting counter value of a pipe was found to be 3. Compounding that problem was the fact that the inode slab cache is one of a few special cases that presume that the freed inode's contents are left in a legitimate state so that they do not have to be completely reinitialized with each subsequent reallocation. So when the pipe's inode was created, it received an inode with a bogus counter value.
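The way a bad value survives a free and a later reallocation can be sketched with a toy object cache. This is an illustrative Python model under stated assumptions, not the kernel's slab allocator: `ToyInode`, `ToyInodeCache` and the field `i_sem_counter` are hypothetical names.

```python
class ToyInode:
    def __init__(self):
        # Set once, when the object is first constructed.
        self.i_sem_counter = 1
        self.i_ino = 0


class ToyInodeCache:
    """Toy cache that trusts freed objects to be left in a legitimate state."""

    def __init__(self):
        self.free_list = []

    def alloc(self, ino):
        # Reuse a freed object if one exists; otherwise construct a new one.
        inode = self.free_list.pop() if self.free_list else ToyInode()
        inode.i_ino = ino  # only per-use fields are reset; the counter is not
        return inode

    def free(self, inode):
        self.free_list.append(inode)


cache = ToyInodeCache()
first = cache.alloc(ino=1)
first.i_sem_counter = 3          # stand-in for whatever corrupted the counter
cache.free(first)

pipe_inode = cache.alloc(ino=2)  # a later, unrelated allocation
print(pipe_inode.i_sem_counter)
```

The second allocation receives the same object with the stale counter, so the code that eventually trips over it need not be the code that caused the damage.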
Confirming the existence of bogus inode structures in the slab cache was a multi-step procedure. Using the kmem -S command to access the inode slab cache, we can get the addresses of all free and currently-allocated inodes. Since there are typically several thousand inodes, the output is extremely verbose, but here is the beginning of it:
crash> kmem -S inode_cache
In the truncated output above, all of the inode addresses in the slab cache are dumped; the ones currently in use are surrounded by brackets, the free ones are not. So, for example, the inodes at addresses f4e52040 and f4e52200 are free; the others are not. The full output was piped to a script that pulled out just the free inode addresses (i.e., output lines starting with three spaces) and redirected them into a file. The file was then turned into a crash input file by making each extracted inode address the argument of the struct command, using its short-cut form that allows the struct command name to be dropped; the input file therefore contained hundreds of crash commands of the form:
inode.i_sem f4e52040
Note that the struct command would be used by default above, as documented in its help page; if the first command line argument is not a crash or gdb command, but is the name of a known data structure, crash passes the arguments to the struct command.
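The extraction script itself is not shown in the original. A minimal sketch of what it might look like in Python follows, assuming free objects appear on lines of exactly three spaces followed by a hex address and in-use objects appear in brackets; `build_input_file` and the bracketed sample addresses are hypothetical.

```python
import re

# A free object line: exactly three leading spaces, then a bare hex address.
FREE_OBJECT = re.compile(r"^ {3}([0-9a-f]{8,16})\s*$")


def build_input_file(kmem_output, member="inode.i_sem"):
    """Turn free-object addresses into short-cut struct commands for crash."""
    commands = []
    for line in kmem_output.splitlines():
        match = FREE_OBJECT.match(line)
        if match:
            commands.append(f"{member} {match.group(1)}")
    return commands


# f4e52040 and f4e52200 are the free inodes named in the text;
# the bracketed in-use addresses are made up for this example.
sample = "\n".join([
    "   f4e52040",
    "  [f4e520c0]",
    "   f4e52200",
    "  [f4e52380]",
])

commands = build_input_file(sample)
print("\n".join(commands))
```

Writing the resulting lines to a file gives an input file of the kind described above, one short-cut struct command per free inode.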
Using the capability of feeding an input file to crash, in this case one consisting of hundreds of short-cut struct commands like those above, the output was again quite verbose, consisting of structure member dumps of the form:
crash> < input.file
However, it was a simple matter of piping the output to grep and looking for counter values not equal to 1:
crash> < input.file | grep counter | grep -v "= 1"
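For readers post-processing saved output outside of crash, the same filter can be written in a few lines of Python. This is a sketch only: `suspicious_counters` is a hypothetical helper, and the sample dump layout below is assumed for illustration rather than taken from the original session.

```python
def suspicious_counters(struct_output):
    """Mirror `grep counter | grep -v "= 1"` on captured struct output."""
    hits = []
    for line in struct_output.splitlines():
        # Keep lines mentioning the counter, drop those containing "= 1".
        if "counter" in line and "= 1" not in line:
            hits.append(line.strip())
    return hits


# Assumed layout of two dumped semaphores, one healthy and one bogus.
sample = "\n".join([
    "  i_sem = {",
    "    count = {",
    "      counter = 1",
    "    },",
    "  i_sem = {",
    "    count = {",
    "      counter = 3",
    "    },",
])

hits = suspicious_counters(sample)
print(hits)
```

Like the grep pipeline, this is a substring test, so a value such as 10 would also be filtered out; parsing the integer and comparing it to 1 would be stricter.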
This turned out to be the smoking gun. Another round of debugging with
an instrumented kernel that trapped attempts to free an inode with
a semaphore counter of 3 caught the perpetrator in the act.
Load the new module:
crash> mod -d egenera_base
crash> mod -s egenera_base egenera_base.o
MODULE NAME SIZE OBJECT FILE
f8804000 egenera_base 448324 egenera_base.o
crash>
crash> whatis
Usage: whatis [struct | union | typedef | symbol]
Enter "help whatis" for details.
crash> bt -t
PID: 11192 TASK: e3a5a000 CPU: 0 COMMAND: "java"
START: dump_execute at fa5740eb
[e3a5bc64] __do_vmdump at fa5744f7
[e3a5bc7c] do_vmdump at fa573081
[e3a5bc8c] kdb_dump at c0183c00
[e3a5bc98] kdb_local at c018434c
[e3a5bcc4] kernrpc_rclan_send at f881b516
[e3a5bce4] hash_remove at f881bea8
[e3a5bd0c] kernrpc_unregister_send at f881a300
[e3a5bd54] egenera_config_query_rpc at fb2ed2c1
[e3a5bd84] do_get_write_access at f9a93e85
[e3a5bdd4] journal_dirty_metadata at f9a945b9
[e3a5bdf8] ext3_do_update_inode at f9aa7a64
[e3a5be08] journal_get_write_access at f9a93ef6
[e3a5be9c] wake_up_process at c0119ccb
[e3a5beac] deliver_signal at c012795e
[e3a5bebc] ignored_signal at c0127770
[e3a5bed8] kdba_getregcontents at c023a3ca
[e3a5bf0c] kdb_main_loop at c0184604
[e3a5bf34] kdb at c0184cf6
[e3a5bf70] egenera_nmi_check at c010857d
[e3a5bf90] do_nmi at c01087b8
[e3a5bfb8] nmi at c0107732
crash> whatis kernrpc_rclan_send
int kernrpc_rclan_send(kernrpc_transport_t *, kernrpc_transport_descriptor_t *, long unsigned int, int);