Chinaunix首页 | 论坛 | 博客
  • 博客访问: 264820
  • 博文数量: 69
  • 博客积分: 1400
  • 博客等级: 上尉
  • 技术积分: 900
  • 用 户 组: 普通用户
  • 注册时间: 2007-12-07 16:35
文章分类

全部博文(69)

文章存档

2011年(1)

2010年(68)

我的朋友

分类: LINUX

2010-08-23 21:14:18

This week, a total of 49 oopses and warnings have been reported, compared to 53 reports in the previous week," Arjan van de Ven noted, sending out a list of the week's . Al Viro suggested, "FWIW, people moaning about the lack of entry-level kernel work would do well by decoding those to the level of 'this place in this function, called from , with so-and-so variable being ' and posting the results." This was met by multiple requests for documentation on how to actually decode an oops. :

"It's actually not necessarily at all that trivial, unless you have a deep understanding of the code generated for the architecture in question (and even then, some oopses take more time to figure out than others, thanks to inlining and tailcalls etc). If the oops happened with a kernel you generated yourself, it's usually rather easy. Especially if you said 'y' to the 'generate debugging info' question at configuration time."

Linus went on to detail how to debug a random oops reported on the lkml, "you will generally have to disassemble the hex sequence given in the oops (the 'Code:' line), and try to match it up against the source code to try to figure out what is going on." He then offered a number of tips on how this is best accomplished, continuing with an example walking through one of the reports oops. Al Viro replied of accomplishing the same thing, walking through of another oops and isolating a bug.


From: Arjan van de Ven
Subject: Date: Jan 5, 2:06 pm 2008

The  website collects kernel oops and
warning reports from various mailing lists and bugzillas as well as
with a client users can install to auto-submit oopses.
Below is a top 10 list of the oopses collected in the last 7 days.
(Reports prior to 2.6.23 have been omitted in collecting the top 10)

This week, a total of 49 oopses and warnings have been reported,
compared to 53 reports in the previous week.


Rank 1: __ieee80211_rx
	Warning at net/mac80211/rx.c:1672
	Reported 6 times (11 total reports)
	Same issue that was ranked 2nd last week
	Johannes has diagnosed this as a driver bug in the iwlwifi drivers
	More info: 

Rank 2: elv_next_request
	kernel page fault
	Reported 6 times (7 total reports)
	Seems to be related to fast modprobe/rmmod cycles
	More info: 

Rank 3: d_splice_alias
	NULL pointer deref
	Reported 3 times
	Happens in the isofs code
	Only seen in 2.6.24-rc5-mm1
	More info: 

Rank 4: remove_proc_entry
	Was also ranked 4th last week
	Only in tainted oopses
	Reported 3 times (12 total reports)
	More info: 

Rank 5: __d_path
	In the sys_getcwd system call
	Only reported for 2.6.23.x, by one user
	Reported 2 times
	More info: 

Rank 6: device_release
	Was ranked 8th last week
	Same reports as last week, but now entered into bugzilla.kernel.org
	Reported 2 times (8 total reports)
	More info: http://www.kerneloops.org/search.php?search=device_release

Rank 7: pgd_alloc
	Has only been seen on machines tainted with the nvidia module
	Reported 2 times
	More info: 

Rank 8: evdev_disconnect
	kernel page fault
	Reported 2 times (10 total reports)
	Previously seen in older kernels including 2.6.21 but as far back as 2.6.16
	More info: http://www.kerneloops.org/search.php?search=evdev_disconnect

Rank 9: mutex_lock
	kernel null pointer due to rfcomm_tty_close sysfs interaction
	Reported 2 times (9 total reports)
	Ranked 9th last week as well
	More info: 

Rank 10: lock_acquire
	WARNING: at kernel/lockdep.c:2658 check_flags()
	Reported 2 times (8 total reports)
	Seems related to __atomic_notifier_call_chain
	More info: 


kerneloops.org news
* There is now a UI client so that if your kernel has an oops, you'll get asked for permission
   to submit this (rather than having to manually edit a config file as before)
* If you run the Gentoo distribution, please install the client using "emerge kerneloops"
* If you run the Fedora 8 (or rawhide) distribution, please install the rpm you can download
   from the  homepage
* If you run Debian unstable, please install the client via apt-get install kerneloops
--

From: Al Viro Subject: Date: Jan 5, 2:39 pm 2008 On Sat, Jan 05, 2008 at 01:06:17PM -0800, Arjan van de Ven wrote: > The website collects kernel oops and > warning reports from various mailing lists and bugzillas as well as > with a client users can install to auto-submit oopses. > Below is a top 10 list of the oopses collected in the last 7 days. > (Reports prior to 2.6.23 have been omitted in collecting the top 10) > > This week, a total of 49 oopses and warnings have been reported, > compared to 53 reports in the previous week. FWIW, people moaning about the lack of entry-level kernel work would do well by decoding those to the level of "this place in this function, called from , with so-and-so variable being " and posting the results. As skills go, it's far more useful than "how to trim the trailing whitespace" and the rest of checkpatch.pl-inspired crap that got so popular lately... --
From: J. Bruce Fields Subject: Date: Jan 7, 10:44 am 2008 On Sat, Jan 05, 2008 at 09:39:35PM +0000, Al Viro wrote: > On Sat, Jan 05, 2008 at 01:06:17PM -0800, Arjan van de Ven wrote: > > The website collects kernel oops and > > warning reports from various mailing lists and bugzillas as well as > > with a client users can install to auto-submit oopses. > > Below is a top 10 list of the oopses collected in the last 7 days. > > (Reports prior to 2.6.23 have been omitted in collecting the top 10) > > > > This week, a total of 49 oopses and warnings have been reported, > > compared to 53 reports in the previous week. > > FWIW, people moaning about the lack of entry-level kernel work would > do well by decoding those to the level of "this place in this function, > called from , with so-and-so variable being " and posting > the results. As skills go, it's far more useful than "how to trim > the trailing whitespace" and the rest of checkpatch.pl-inspired crap > that got so popular lately... Is there any good basic documentation on this to point people at? --b. --
From: Kevin Winchester Subject: Date: Jan 7, 6:19 pm 2008 J. Bruce Fields wrote: > On Sat, Jan 05, 2008 at 09:39:35PM +0000, Al Viro wrote: >> On Sat, Jan 05, 2008 at 01:06:17PM -0800, Arjan van de Ven wrote: >>> The website collects kernel oops and >>> warning reports from various mailing lists and bugzillas as well >>> as with a client users can install to auto-submit oopses. Below >>> is a top 10 list of the oopses collected in the last 7 days. >>> (Reports prior to 2.6.23 have been omitted in collecting the top >>> 10) >>> >>> This week, a total of 49 oopses and warnings have been reported, >>> compared to 53 reports in the previous week. >> FWIW, people moaning about the lack of entry-level kernel work >> would do well by decoding those to the level of "this place in this >> function, called from , with so-and-so variable being >> " and posting the results. As skills go, it's far more >> useful than "how to trim the trailing whitespace" and the rest of >> checkpatch.pl-inspired crap that got so popular lately... > > Is there any good basic documentation on this to point people at? > I would second this question. I see people "decode" oops on lkml often enough, but I've never been entirely sure how its done. Is it somewhere in Documentation? -- Kevin Winchester --
From: Linus Torvalds Subject: Date: Jan 7, 8:26 pm 2008 On Mon, 7 Jan 2008, Kevin Winchester wrote: > J. Bruce Fields wrote: > > > > Is there any good basic documentation on this to point people at? > > I would second this question. I see people "decode" oops on lkml often > enough, but I've never been entirely sure how its done. Is it somewhere > in Documentation? It's actually not necessarily at all that trivial, unless you have a deep understanding of the code generated for the architecture in question (and even then, some oopses take more time to figure out than others, thanks to inlining and tailcalls etc). If the oops happened with a kernel you generated yourself, it's usually rather easy. Especially if you said "y" to the "generate debugging info" question at configuration time. Because, in that case, you really just do a simple gdb vmlinux and then you can do (for example) something like setting a breakpoint at the EIP that was reported for the oops, and it will tell you what line it came from. However, if you don't have the exact binary - which is the common case for random oopses reported on lkml - you will generally have to disassemble the hex sequence given in the oops (the "Code:" line), and try to match it up against the source code to try to figure out what is going on. Even just the disassembly is not entirely trivial, since the oops will give you the eip that it happened at, but you often want to also disassemble *backwards* in order to get more of a context (the "Code:" line will mark the particular EIP that starts the oopsing instruction by enclosing it in , but with non-constant instruction lengths, you need to use a bit of trial-and-error to figure it out. I usually just compile a small program like const char array[]="\xnn\xnn\xnn..."; int main(int argc, char **argv) { printf("%p\n", array); *(int *)0=0; } and run it under gdb, and then when it gets the SIGSEGV (due to the obvious NULL pointer dereference), I can just ask gdb to disassemble around the array that contains the code[] stuff. Try a few offsets, to see when the disassembly makes sense (and gives the reported EIP as the beginning of one of the disassembled instructions). (You can do it other and smarter ways too, I'm not claiming that's a particularly good way to do it, and the old "ksymoops" program used to do a pretty good job of this, but I'm used to that particular idiotic way myself, since it's how I've basically always done it) After that, you still need to try to match up the assembly code with the source code and figure out what variables the register contents actually are all about. You can often try to do a make the/affected/file.s to generate the asm file in your own tree - the register allocation can be totally different due to different compilers and different options (and things like the fact that maybe the source tree you do this on doesn't match the oops report exactly), but it's usually a good starting point to compare the disassembly from gdb with the *.s file output from the compiler. Quite often, it's all very obvious (you see some constant or other simple pattern). But if you're not used to the assembly format, you'll spend a lot of brainpower just trying to figure that part out even for the obvious stuff, which is why it's a good thing if you are very comfortable indeed with the assembly language of that particular platform. It's not really all that hard. But the first few times you see those oopses, it all looks mostly like just line noise. So it definitely takes some practice to do it well. Anyway, let's take an example, from where the most obviously relevant parts are: BUG: unable to handle kernel paging request at virtual address 00100100 EIP: 0060:[] EIP is at evdev_disconnect+0x65/0x9e eax: 00000000 ebx: 000ffcf0 ecx: c1926760 edx: 00000033 esi: f7415600 edi: f741564c ebp: f7415654 esp: c1967e68 Call Trace: [] input_unregister_device+0x6f/0xff [] klist_release+0x27/0x30 [] kref_put+0x5f/0x6c .. Code: 5e 4c 81 eb 10 04 00 00 eb 21 8d 83 08 04 00 00 b9 06 00 02 00 ba 1d 00 00 00 e8 6a 93 95 c7 8b 9b 10 04 00 00 81 eb 10 04 00 00 <8b> 83 10 04 00 00 0f 18 00 90 8d 83 10 04 00 00 39 f8 75 cb 8d so here let's do the above silly C program: const char array[]="\x5e\x4c\x81\xeb\x10\x04\x00\x00\xeb\x21.. and running it under gdb gives: 0x8048500 Program received signal SIGSEGV, Segmentation fault. 0x080483f7 in main () at test.c:14 14 *(int*)0=0; and now I can just try x/20i 0x8048500 and it turns out that already gives a reasonable disassembly. The first few instructions are bogus: they're really part of the previous instruction, but it looks pretty sane around the actual problem spot, which is "array+43" (there are 42 bytes of code before the EIP one, and 20 bytes after): 0x8048500 : pop %esi 0x8048501 : dec %esp 0x8048502 : sub [mid=540557,540611,545369,546335,546623,546884]x410,%ebx 0x8048508 : jmp 0x804852b 0x804850a : lea 0x408(%ebx),%eax 0x8048510 : mov [mid=540557,540611,545369,546335,546623,546884]x20006,%ecx 0x8048515 : mov [mid=540557,540611,545369,546335,546623,546884]x1d,%edx 0x804851a : call 0xcf9a1889 0x804851f : mov 0x410(%ebx),%ebx 0x8048525 : sub [mid=540557,540611,545369,546335,546623,546884]x410,%ebx 0x804852b : mov 0x410(%ebx),%eax 0x8048531 : prefetchnta (%eax) 0x8048534 : nop 0x8048535 : lea 0x410(%ebx),%eax 0x804853b : cmp %edi,%eax 0x804853d : jne 0x804850a 0x804853f : lea (%eax),%eax .. so now we know that the faulting instruction was that mov 0x410(%ebx),%eax and we can also see that this also matches the address that caused the oops (ebx=000ffcf0, so 0x410(%ebx) is 00100100, which matches the "unable to handle kernel paging request" message). (Now, people used to kernel oopses will also recognize 00100100 as the LIST_POISON1, so this is all about dereferencing the ->next pointer of a list entry that has been removed from the list, but that's a whole separate level of kernel knowledge). Anyway, you can now do make drivers/input/evdev.s and see if you can find that kind of code sequence in there. You can use the "EIP: evdev_disconnect+0x65/0x9e" thing as a hint: if your compiler setup isn't too different, it's likely to be roughly two thirds into that evdev_disconnect function (but inlining really can mean that it's somewhere else entirely in the source tree!) The rest left as an exercise for the reader. Linus --
From: Al Viro Subject: Date: Jan 7, 10:59 pm 2008 On Mon, Jan 07, 2008 at 07:26:12PM -0800, Linus Torvalds wrote: > I usually just compile a small program like > > const char array[]="\xnn\xnn\xnn..."; > > int main(int argc, char **argv) > { > printf("%p\n", array); > *(int *)0=0; > } Heh. I prefer char main[] = {.....}; for the same thing, with gdb a.out and no running at all. FWIW, I'm going to go through Arjan's collection and post blow-by-blow analysis of some of those suckers. Tonight, probably... Let's take e.g. RIP: 0010:[] [] kref_put+0x31/0x80 RSP: 0000:ffff81007ffe5df0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000034333545 RCX: ffffffff80606270 RDX: 0000000000000040 RSI: ffffffff803b38b0 RDI: 0000000034333545 RBP: ffff81007ffe5e00 R08: 0000000000000001 R09: 0000000000000000 R10: ffffffff8094c430 R11: 0000000000000000 R12: ffffffff803b38b0 R13: ffff81011ede44d8 R14: ffffffff804d7d50 R15: ffff81011ff210f0 FS: 0000000002024870(0000) GS:ffff81011ff0dd00(0000) ... Call Trace: [] kobject_put+0x19/0x20 [] kobject_del+0x2b/0x40 [] delayed_delete+0x0/0xb0 [] delayed_delete+0x69/0xb0 [] run_workqueue+0x175/0x210 [] worker_thread+0x71/0xb0 [] autoremove_wake_function+0x0/0x40 [] worker_thread+0x0/0xb0 [] kthread+0x4d/0x80 [] child_rip+0xa/0x12 [] restore_args+0x0/0x30 [] kthread+0x0/0x80 [] child_rip+0x0/0x12 Code: f0 ff 0b 0f 94 c0 31 d2 84 c0 74 0b 48 89 df 41 ff d4 ba 01 What do we have here? It barfs in kref_put() called from kobject_put(). It's -rc6-mm1 and I don't have -mm at hand. Let's see if we can make any sense out of it from the mainline - it might be a good idea for the first pass, unless there are some clear indications to the contrary. kref_put() is fairly low-level (and deep in call chain, here). And it's pretty small: int kref_put(struct kref *kref, void (*release)(struct kref *kref)) { WARN_ON(release == NULL); WARN_ON(release == (void (*)(struct kref *))kfree); if (atomic_dec_and_test(&kref->refcount)) { release(kref); return 1; } return 0; } Poking around on the site (I'm not familiar enough with it, so bear with me) gives a link to posting and an important detail missed in that page: <1>Unable to handle kernel paging request at 0000000034333545 RIP: [] kref_put+0x31/0x80 Now, that's interesting - we barf on dereferencing a pointer that (a) has upper 32 bits zero and (b) doesn't have a bit 7 set in any byte. Smells like ASCII data misinterpreted as a pointer. Conversion to ASCII gives "E534", which sure as hell does look like a fragment of some string. OK, so where does that happen? In this case we have only one candidate, really - atomic_dec_and_test(&kref->refcount). Before it we only compare argument with constants, after it we pass argument to another function. Now, looking at the registers (see above) we notice that this address had come from rbx. Let's try objdump -d lib/kref.o and see what we've got there for kref_put(): 4a: 55 push %rbp 4b: 48 85 f6 test %rsi,%rsi 4e: 48 c7 c1 00 00 00 00 mov [mid=540557,540611,545369,546335,546623,546884]x0,%rcx 55: ba 36 00 00 00 mov [mid=540557,540611,545369,546335,546623,546884]x36,%edx 5a: 48 89 e5 mov %rsp,%rbp 5d: 41 54 push %r12 5f: 49 89 f4 mov %rsi,%r12 62: 53 push %rbx 63: 48 89 fb mov %rdi,%rbx 66: 74 15 je 7d 68: 48 81 fe 00 00 00 00 cmp [mid=540557,540611,545369,546335,546623,546884]x0,%rsi 6f: 75 26 jne 97 71: 48 c7 c1 00 00 00 00 mov [mid=540557,540611,545369,546335,546623,546884]x0,%rcx 78: ba 37 00 00 00 mov [mid=540557,540611,545369,546335,546623,546884]x37,%edx 7d: 48 c7 c6 00 00 00 00 mov [mid=540557,540611,545369,546335,546623,546884]x0,%rsi 84: 48 c7 c7 00 00 00 00 mov [mid=540557,540611,545369,546335,546623,546884]x0,%rdi 8b: 31 c0 xor %eax,%eax 8d: e8 00 00 00 00 callq 92 92: e8 00 00 00 00 callq 97 97: f0 ff 0b lock decl (%rbx) 9a: 0f 94 c0 sete %al 9d: 31 d2 xor %edx,%edx 9f: 84 c0 test %al,%al a1: 74 0b je ae a3: 48 89 df mov %rbx,%rdi a6: 41 ff d4 callq *%r12 a9: ba 01 00 00 00 mov [mid=540557,540611,545369,546335,546623,546884]x1,%edx ae: 5b pop %rbx af: 41 5c pop %r12 b1: c9 leaveq b2: 89 d0 mov %edx,%eax b4: c3 retq It's not necessary identical to what that kernel had; still, not bad for a starting point. We are even lucky enough to find 'f0 ff' immediately in there: 97: f0 ff 0b lock decl (%rbx) Next instructions also match - actually, better than I expected. So. We even have register allocation matching the reported kernel and it all makes sense - this is the instruction where it had puked, the address had, indeed, come from rbx and it's locked decrement of *rbx followed by some testing and conditional jump. Just what one would expect from atomic_dec_and_test(). IOW, we have &kref->refcount equal to 0x0000000034333545. What's the value of kref itself? We could look into definition of struct kref, or just notice that release(kref) should be right after that, so we could see what gets passed to it. a3: 48 89 df mov %rbx,%rdi a6: 41 ff d4 callq *%r12 is clearly it. So what we are passing is rbx itself, so that offset got to be zero. All right, so we have kref_put() getting 0x0000000034333545 instead of a pointer in the first argument. It's called from kobject_put() and unless -mm has changes in there, there's no need to guess where in kobject_put() that had been: void kobject_put(struct kobject * kobj) { if (kobj) kref_put(&kobj->kref, kobject_release); } Aha. Now, we need to work back from &kobj->kref to kobj. Again, assuming that -mm doesn't change struct kobject in a way that would affect the offset, we have struct kobject { const char * k_name; struct kref kref; struct list_head entry; ... Looks like it's not too likely, unless somebody had been deliberately rearranging fields (to fit cachelines better, etc.). We'll need to verify that, of course, but for now it's a good starting assumption. Very well, we have one pointer in front of it. It's an amd64, so it's an 8 byte field and no alignment padding is to be expected. IOW, kobj is 0x0000000034333545 - 8, i.e. 0x000000003433353D. What's _that_ in ASCII? "=534". OK, that makes even more sense for a part of some string... Let's check; time to take a look at that patch, after all struct kobject { - const char * k_name; + const char *name; struct kref kref; OK, not a problem. While we are at it, kobject_put() is unchanged by the patch. Now, back to trace. kobject_put() is called from kobject_del(). Now, that one does differ from mainline: void kobject_del(struct kobject * kobj) { if (!kobj) return; sysfs_remove_dir(kobj); kobj->state_in_sysfs = 0; kobj_kset_leave(kobj); kobject_put(kobj->parent); kobj->parent = NULL; } Humm... So we have kobj->parent containing crap. What about the caller? It's from drivers/md/md.c: static void delayed_delete(struct work_struct *ws) { mdk_rdev_t *rdev = container_of(ws, mdk_rdev_t, del_work); kobject_del(&rdev->kobj); } and it's only used in static void unbind_rdev_from_array(mdk_rdev_t * rdev) { char b[BDEVNAME_SIZE]; if (!rdev->mddev) { MD_BUG(); return; } bd_release_from_disk(rdev->bdev, rdev->mddev->gendisk); list_del_init(&rdev->same_set); printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b)); rdev->mddev = NULL; sysfs_remove_link(&rdev->kobj, "block"); /* We need to delay this, otherwise we can deadlock when * writing to 'remove' to "dev/state" */ INIT_WORK(&rdev->del_work, delayed_delete); schedule_work(&rdev->del_work); } Well, that takes care of the rest of trace - we have used INIT_WORK to set rdev->del_work up, scheduled it for execution and eventually the callback had been called (asynchronously to us). So what do we have so far? unbind_rdev_from_array(rdev); had been called and rdev->kobj.parent turned to contain a crap value (0x000000003433353D) instead of a pointer. That's about all we can get out of trace. Now, let's see what could have triggered it. Curious... Looking through diff shows an interesting bit: - rdev->kobj.parent = &mddev->kobj; - if ((err = kobject_add(&rdev->kobj))) + if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b))) goto fail; in bind_rdev_to_array(). At the first sight the changes in kobject_add() seem to match that. And nothing else in md.c seems to be setting it to anything non-NULL. Very well, so it's one of the following: * unbind_rdev_from_array() called on rdev that didn't pass through bind_rdev_to_array(). * unbind_rdev_from_array() called on rdev that bailed out from bind_rdev_to_array() before that point. * mddev value in the above is crap. That's bloody unlikely, BTW - kobject_add() would increment the refcount of rdev->kobj.parent (or we would be in far more trouble, since it would not match kobject_del() and _that_ would hurt a lot more than just md.c). So &mddev->kobj would better be something saner when it went through that point. * *rdev got corrupted in between. Actually, looking at the callers of unbind_rdev_from_array()... We always follow it with export_rdev(). Which does (presumably) final kobject_put() on &rdev->kobj, freeing rdev itself. What guarantees that it doesn't happen before we get to callback? AFAICS, nothing whatsoever... And if it does happen, we'll get rdev happily freed (by rdev_free(), as ->release() of &rdev->kobj) by the time we get to delayed_delete(). Which explains what's going on just fine. BTW, -mm changes in kobject.c explain WTF it doesn't trigger in mainline - there we managed to get away with that since kobject_add() bumped refcount of kobject by one and kobject_del() decremented it. That masked the bug. --

From: Arjan van de Ven
Subject: Date: Jan 8, 10:42 am 2008

Randy Dunlap wrote:
>>
>> (You can do it other and smarter ways too, I'm not claiming that's a 
>> particularly good way to do it, and the old "ksymoops" program used to do 
>> a pretty good job of this, but I'm used to that particular idiotic way 
>> myself, since it's how I've basically always done it)
> 
> One other way to do it (at least for x86-32/64) is to use
> $kerneltree/scripts/decodecode.  It may work on other $arches also,
> but I haven't tested it on others.

I've made life easier for those using the  website;
at least for x86 oopses the website now does this for you and shows
the decoded Code: line in the raw oops data:


阅读(1484) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~