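This article is based on a RHEL 7.4 environment and the raid5 code. (I ran into a bug and spent quite some time before I found the pointer to the mddev structure, so I am writing the steps down for my own future reference.)
When the system panics, the call trace in the log prints the register names together with the addresses they hold. When the system hangs instead of panicking and you still want to analyse its state from a vmcore, you have to trigger a panic by hand with echo c > /proc/sysrq-trigger. This of course requires kdump to be configured beforehand, which I will not cover here.
Once you have the vmcore and vmcore-dmesg, you also need to install the debuginfo packages. Pick the RPMs that match the kernel version: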
kernel-debuginfo-common-x86_64-3.10.0-547.el7.x86_64.rpm
kernel-debuginfo-3.10.0-547.el7.x86_64.rpm
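When analysing a vmcore, the first job is to find the memory address of the key structure (mddev) from the stack. So start by looking through vmcore-dmesg for functions that deal with mddev.
The prototype is static void raid5_finish_reshape(struct mddev *mddev): mddev is passed to raid5_finish_reshape as its argument, so its address can be recovered from the surrounding stack context.
I picked the following task: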
[440673.133173] INFO: task md0_raid5:27638 blocked for more than 120 seconds.
[440673.140046] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[440673.147968] md0_raid5 D ffffffff81687e70 0 27638 2 0x00000080
[440673.147970] ffff88005499f810 0000000000000046 ffff88022084de20 ffff88005499ffd8
[440673.155545] ffff88005499ffd8 ffff88005499ffd8 ffff88022084de20 ffff88022fcd6d00
[440673.163094] 0000000000000000 7fffffffffffffff ffff88022ff99de8 ffffffff81687e70
[440673.170657] Call Trace:
[440673.173197] [] ? bit_wait+0x50/0x50
[440673.178437] [] schedule+0x29/0x70
[440673.183492] [] schedule_timeout+0x239/0x2c0
[440673.189417] [] ? jbd2_journal_try_to_free_buffers+0xd8/0x120 [jbd2]
[440673.197426] [] ? ext4_releasepage+0x52/0xb0 [ext4]
[440673.203952] [] ? bit_wait+0x50/0x50
[440673.209185] [] io_schedule_timeout+0xae/0x130
[440673.215285] [] io_schedule+0x18/0x20
[440673.220603] [] bit_wait_io+0x11/0x50
[440673.225917] [] __wait_on_bit+0x65/0x90
[440673.231409] [] wait_on_page_bit+0x81/0xa0
[440673.237163] [] ? wake_bit_function+0x40/0x40
[440673.243174] [] truncate_inode_pages_range+0x3bb/0x740
[440673.249962] [] truncate_inode_pages_final+0x5e/0x90
[440673.256582] [] ext4_evict_inode+0x70/0x4d0 [ext4]
[440673.263028] [] evict+0xa7/0x170
[440673.267912] [] dispose_list+0x3e/0x50
[440673.273318] [] invalidate_inodes+0x134/0x160
[440673.279332] [] __invalidate_device+0x3a/0x60
[440673.285341] [] flush_disk+0x2b/0xd0
[440673.290572] [] check_disk_size_change+0x89/0x90
[440673.296845] [] ? put_device+0x17/0x20
[440673.302248] [] revalidate_disk+0x54/0x80
[440673.307916] [] raid5_finish_reshape+0x5f/0x180 [raid456]
[440673.314971] [] md_reap_sync_thread+0x54/0x150
[440673.321066] [] md_check_recovery+0x119/0x4f0
[440673.327081] [] raid5d+0x594/0x930 [raid456]
[440673.333006] [] md_thread+0x155/0x1a0
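Picking out the md-related frames from a long trace can also be done mechanically. The following is only a minimal sketch: the helper name md_related_frames, the "md_" prefix and the raid456 module filter are my own assumptions for illustration, not part of crash or of any kernel tooling. It skips frames marked with "?", since those are unreliable stack leftovers, and it accepts trace lines with or without the bracketed address.

```python
import re

# Matches the "func+0xoff/0xsize [module]" tail of a call-trace line.
FRAME_RE = re.compile(
    r"(?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?\s*$"
)

def md_related_frames(lines, modules=("raid456",), prefixes=("md_",)):
    """Return function names that belong to the given modules or prefixes."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m is None or m.group("unreliable"):
            continue  # not a frame line, or a "?" (unreliable) frame
        func, module = m.group("func"), m.group("module")
        if module in modules or func.startswith(prefixes):
            frames.append(func)
    return frames

# A few lines copied from the trace above.
trace = """\
[440673.296845] [] ? put_device+0x17/0x20
[440673.302248] [] revalidate_disk+0x54/0x80
[440673.307916] [] raid5_finish_reshape+0x5f/0x180 [raid456]
[440673.314971] [] md_reap_sync_thread+0x54/0x150
[440673.321066] [] md_check_recovery+0x119/0x4f0
[440673.327081] [] raid5d+0x594/0x930 [raid456]
[440673.333006] [] md_thread+0x155/0x1a0
""".splitlines()

candidates = md_related_frames(trace)
print(candidates)
```

Any of the listed functions that takes a struct mddev * argument is a reasonable place to start looking; I used raid5_finish_reshape.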
Then look at the raw stack contents:
crash> bt -f
...
#17 [ffff88005499fcc8] revalidate_disk at ffffffff81235984
ffff88005499fcd0: ffff88020e701000 ffff88022388a400
ffff88005499fce0: ffff880210ac0000 ffff88022388a4a8
ffff88005499fcf0: ffff88005499fd18 ffffffffa09f2c9f
#18 [ffff88005499fcf8] raid5_finish_reshape at ffffffffa09f2c9f [raid456]
ffff88005499fd00: ffff88020e701000 ffff88020e701288
ffff88005499fd10: ffff880210ac0000 ffff88005499fd30
ffff88005499fd20: ffffffff814fec14 (the return address into md_reap_sync_thread)
#19 [ffff88005499fd20] md_reap_sync_thread at ffffffff814fec14
ffff88005499fd28: ffff88020e701000 ffff88005499fd58
ffff88005499fd38: ffffffff814ff0e9
Disassembly shows how a function uses each register, so disassemble the caller around that return address:
crash> dis -l md_reap_sync_thread | grep -C 3 ffffffff814fec14
...
/usr/src/debug/kernel-3.10.0-547.el7/linux-3.10.0-547.el7.x86_64/drivers/md/md.c: 8232
0xffffffff814fec0f : mov %rbx,%rdi
0xffffffff814fec12 : callq *%rax
/usr/src/debug/kernel-3.10.0-547.el7/linux-3.10.0-547.el7.x86_64/drivers/md/md.c: 8237
0xffffffff814fec14 : mov 0x21c(%rbx),%eax
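Right before the call to raid5_finish_reshape (the indirect callq *%rax), the argument is copied from rbx into rdi, the register that carries the first argument on x86_64. So at that point both rbx and rdi hold the address of mddev. rbx is callee-saved, so if raid5_finish_reshape uses it, it must first save the caller's value on its own stack.
The next step is to match the stack slots to specific registers. The first thing a function does on entry is push the registers it needs to preserve.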
crash> dis -l raid5_finish_reshape
/usr/src/debug/kernel-3.10.0-547.el7/linux-3.10.0-547.el7.x86_64/drivers/md/raid5.c: 7593
0xffffffffa09f2c40 : data32 data32 data32 xchg %ax,%ax [FTRACE NOP]
0xffffffffa09f2c45 : push %rbp
0xffffffffa09f2c46 : mov %rsp,%rbp
0xffffffffa09f2c49 : push %r13
0xffffffffa09f2c4b : push %r12
0xffffffffa09f2c4d : push %rbx
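Fill in the register names according to the push order. The stack grows downwards, so the first push (rbp) sits just below the return address and the last push (rbx) ends up at the lowest address: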
#18 [ffff88005499fcf8] raid5_finish_reshape at ffffffffa09f2c9f [raid456]
ffff88005499fd00: ffff88020e701000 (rbx) ffff88020e701288 (r12)
ffff88005499fd10: ffff880210ac0000 (r13) ffff88005499fd30 (rbp)
ffff88005499fd20: ffffffff814fec14 (ret ip)
That gives us the value of mddev: ffff88020e701000 (the saved rbx).
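The slot labelling can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming 8-byte pushes and that the function prologue pushes nothing before rbp; label_pushed_registers is a hypothetical helper name, and the stack words are copied from the bt -f output for frame #18.

```python
def label_pushed_registers(ret_addr_slot, push_order, word=8):
    """Map each pushed register to the stack slot holding its saved value."""
    slots = {}
    slot = ret_addr_slot
    for reg in push_order:
        slot -= word  # every push moves rsp down by one word
        slots[reg] = slot
    return slots

# Stack words of frame #18, copied from the bt -f output.
frame_words = {
    0xffff88005499fd00: 0xffff88020e701000,
    0xffff88005499fd08: 0xffff88020e701288,
    0xffff88005499fd10: 0xffff880210ac0000,
    0xffff88005499fd18: 0xffff88005499fd30,
    0xffff88005499fd20: 0xffffffff814fec14,  # return address
}

# Push order taken from the disassembly of raid5_finish_reshape.
slots = label_pushed_registers(0xffff88005499fd20, ["rbp", "r13", "r12", "rbx"])
saved = {reg: frame_words[addr] for reg, addr in slots.items()}

for reg, value in saved.items():
    print(f"{reg}: {value:#x}")
```

The saved rbx is the caller's rbx, which md_reap_sync_thread had just copied into rdi, so it is the mddev pointer.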
To verify, follow mddev->private to the r5conf and check that its mddev member points back:
crash> mddev ffff88020e701000 | grep private
private = 0xffff88022388a400,
crash> r5conf ffff88022388a400 | grep mddev
mddev = 0xffff88020e701000,
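With the mddev address in hand, I also wanted the addresses of the md_rdev structures for the member disks of the raid5 array. All member disks are linked on the mddev->disks list: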
struct mddev {
...
disks = {
next = 0xffff88021127e000,
prev = 0xffff880053c79000
},
What is actually linked on that list, however, is md_rdev->same_set.
crash> struct -ox md_rdev
struct md_rdev {
[0x0] struct list_head same_set; // happens to be the first member, so each pointer on the disks list is also the pointer to the md_rdev itself.
crash> list -H ffff88021127e000
ffff880053c7aa00
ffff880053c7ba00
ffff880053c78400
ffff880053c78a00
ffff880053c78000
ffff880053c79000
ffff88020e701018
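OK, done. One caveat when reading this output: -H treats its argument as the list head and does not print it, so ffff88021127e000 (the first md_rdev, taken from disks.next) is missing, while the last line, ffff88020e701018, lies inside mddev itself (mddev + 0x18) and is the disks list head rather than an md_rdev. Starting from the real head with list -H ffff88020e701018 would print exactly the member disks.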
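The step from list nodes to md_rdev pointers is the usual container_of arithmetic. Below is a minimal sketch with hypothetical helper names. It assumes that ffff88020e701018 is &mddev->disks, which I infer from it lying 0x18 bytes into the mddev found earlier, and it adds back the start address that list -H does not print.

```python
SAME_SET_OFFSET = 0x0           # offset of same_set, from `struct -ox md_rdev`
MDDEV = 0xffff88020e701000
LIST_HEAD = 0xffff88020e701018  # assumed to be &mddev->disks (inside mddev)

def container_of(node, member_offset):
    """Address of the structure that embeds the list node."""
    return node - member_offset

def rdevs_from_nodes(nodes, head, member_offset):
    """Convert list nodes to structure addresses, skipping the list head."""
    return [container_of(n, member_offset) for n in nodes if n != head]

start = 0xffff88021127e000      # mddev->disks.next, the argument given to list -H
printed = [                     # the addresses printed by the list command
    0xffff880053c7aa00,
    0xffff880053c7ba00,
    0xffff880053c78400,
    0xffff880053c78a00,
    0xffff880053c78000,
    0xffff880053c79000,
    0xffff88020e701018,
]

rdevs = rdevs_from_nodes([start] + printed, LIST_HEAD, SAME_SET_OFFSET)
for rdev in rdevs:
    print(f"md_rdev {rdev:#x}")
```

Because same_set sits at offset 0, the subtraction changes nothing here, but the same helper would still work if the list member were not the first field.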