Andrew Huang 转载请注明作者及网址
oops是英语口语"糟糕"的意思,当LINUX 内核发生严重错误时,比如内存段错误时,将会提示一大段信息。就提示 Oops,因此得名,
Oops提示信息相当多,包括出问题时的,各个常用寄存器的值,调用的堆栈,以及出错的可能原因
1.oops 的格式
内核的文档里的详细的Oops的说明,的名字是
Documentation/oops-tracing.txt
http://www.mjmwired.net/kernel/Documentation/oops-tracing.txt
oops第一段出错是内存page地址,
例如提示
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c7510000
往往表示碰到空指针了。
第二段出错时是寄存器的快照,不同CPU显示不同情况。
其中基本上所有CPU都会有的PC寄存器(Program counter register),它保存最后出问题的地址。
LR保存着函数返回地址。这里就比较容易看出是谁出问题
<8>PC is at memcpy+0x48/0x330
<8>LR is at s3c_fimc_v4l2_enum_fmt_vid_cap+0x44/0x4c
这里可以看出,最后出问题代码在 memcpy里,而它是被s3c_fimc_v4l2_enum_fmt_vid_cap调用。结合前面的原因,可以知道应该是在memcpy里碰到空指针。
函数名后面的两个数字,第一个是调用偏移量,第二个是函数总尺寸。
比如上例,在s3c_fimc_v4l2_enum_fmt_vid_cap的函数机器代码总尺寸是0x4c,在偏移量 0x44处调用了memcpy,而memcpy本身的总尺寸是0x330,而在偏移量0x48处,导致出现Oops信息
最重要是第三段,即可出错的调用堆栈(Call Trace).从这里即可推算出错的所在函数,参考下例.
这里被调用函数上,调用函数在下。
- <8>Backtrace:
-
<8>[] (s3c_fimc_v4l2_enum_fmt_vid_cap+0x0/0x4c) from [] (__video_do_ioctl+0x600/0x3238)
-
<8>[] (__video_do_ioctl+0x0/0x3238) from [] (v4l1_compat_get_capabilities+0x1c0/0x25c)
-
<8>[] (v4l1_compat_get_capabilities+0x0/0x25c) from [] (v4l_compat_translate_ioctl+0x150/0x238)
-
<8>[] (v4l_compat_translate_ioctl+0x0/0x238) from [] (__video_do_ioctl+0x118/0x3238)
-
<8>[] (__video_do_ioctl+0x0/0x3238) from [] (__video_ioctl2+0x198/0x2a0)
-
<8>[] (__video_ioctl2+0x0/0x2a0) from [] (video_ioctl2+0x1c/0x20)
-
<8>[] (video_ioctl2+0x0/0x20) from [] (vfs_ioctl+0x64/0x74)
-
<8>[] (vfs_ioctl+0x0/0x74) from [] (do_vfs_ioctl+0x438/0x488)
上述最后几个调用就是
v4l1_compat_get_capabilities--> __video_do_ioctl --> s3c_fimc_v4l2_enum_fmt_vid_cap
函数名前的地址表示调用地址,函数名后面两个数字,也是表示调用入口偏移量(总是0x0)和函数机器代码总尺寸
而from后面的地址是调用该函数的地址。以及对应的函数名和其偏移量/总尺寸
如下例:
[] (s3c_fimc_v4l2_enum_fmt_vid_cap+0x0/0x4c) from [] (__video_do_ioctl+0x600/0x3238)
[] (__video_do_ioctl+0x0/0x3238) from []
__video_do_ioctl在 该函数总尺寸是0x3238,调用子函数的偏移量为0x0,这个偏移量的绝对地址是0xc01ca8e8处,这里调用函数s3c_fimc_v4l2_enum_fmt_vid_cap.
s3c_fimc_v4l2_enum_fmt_vid_cap这个函数总尺寸是0x4c,而它的from后的地址,正好就是上一个调用函数的地址。并且在括号内部注明是__video_do_ioctl的偏移量0x600处.
2.一个oops实例
我自己写了一个v4l2的测试程序,在s3c6410(linux 2.6.28)对cmos的的摄像头的驱动执行了ioctl的VIDIOC_ENUM_FMT操作来查询它支持的视频格式,结果就出了oops信息了。
- ###Unable to handle kernel NULL pointer dereference at virtual address 00000000
-
pgd = c7510000
-
[00000000] *pgd=574f6031, *pte=00000000, *ppte=00000000
-
<8>Internal error: Oops: 17 [#1]
-
<8>Modules linked in:
-
<8>CPU: 0 Not tainted (2.6.28.6-FriendlyARM #70)
-
<8>PC is at memcpy+0x48/0x330
-
<8>LR is at s3c_fimc_v4l2_enum_fmt_vid_cap+0x44/0x4c
-
<8>pc : [<c0170c08>] lr : [<c0200138>] psr: 80000013
-
<8>sp : c74fba84 ip : 00000000 fp : c74fbabc
-
<8>r10: c04dc710 r9 : 00000001 r8 : c0405602
-
<8>r7 : c74766e0 r6 : c04dc710 r5 : 00000000 r4 : c74fbc80
-
<8>r3 : 00000000 r2 : ffffffc0 r1 : 00000000 r0 : c74fbc80
-
<8>Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
-
<8>Control: 00c5387d Table: 57510008 DAC: 00000015
-
<8>Process test_v4l (pid: 642, stack limit = 0xc74fa260)
-
<8>Stack: (0xc74fba84 to 0xc74fc000)
-
<8>ba80: 00000000 c04dc710 c74766e0 c0405602 c74fbc80 c74fbc80 c0200138
-
<8>baa0: c02000f4 c74fbc80 c0495488 c037313c c74fbbac c74fbac0 c01caee8 c0200100
-
<8>bac0: c032ddf4 c73049c0 c005be40 c74fbacc c74fbacc c008ec04 c74fbb04 c74fbae8
-
<8>bae0: c74fbb18 c007107c c04e7c18 00000000 00000000 c0397cc8 ffffffea c74fbb08
-
<8>bb00: c031995c c0319820 00000000 c74aabe0 00000000 c74fbba0 c74fbb3c c74fbb28
-
<8>bb20: c0319a14 00000064 c68cd200 00000000 00000000 c74fbb9c c03141e4 c0319978
-
<8>bb40: c032bd1c 00000000 c74d5e00 c74fbba0 c0397cc8 00000000 bf000000 00000000
-
<8>bb60: c747c804 c74cb00c c74fbb8c c74fbb78 c73049f0 75e3b180 00000000 00000000
-
<8>bb80: c74fbbb4 c74fbe38 c7501100 c74766e0 c01ca8e8 00000000 c74fbc80 00000001
-
<8>bba0: c74fbd2c c74fbbb0 c01d04b4 c01ca8f4 c74fbc1c 00000000 00000000 00000000
-
<8>bbc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-
<8>bbe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-
<8>bc00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-
<8>bc20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-
<8>bc40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-
<8>bc60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-
<8>bc80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-
<8>bca0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-
<8>bcc0: 00000000 00000000 00000000 00000000 c0040a90 00000004 00000000 00000000
-
<8>bce0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
-
<8>bd00: c00417f8 c74fbe38 c0495488 c037313c c74766e0 803c7601 00000000 c04dc710
-
<8>bd20: c74fbd3c c74fbd30 c01d14b0 c01d0300 c74fbe2c c74fbd40 c01caa00 c01d136c
-
<8>bd40: c74fbd5c c74fbd50 c01a96a4 c01941f4 c74fbd74 c74fbd60 c004c4c4 c01a9698
-
<8>bd60: c04cd140 c74fa000 c74fbda4 c74fbd78 c004ca24 c004c8c0 c006c830 c032de08
-
<8>bd80: 00000000 00000045 0000001a c72ff425 00000000 c0483b18 c74fbdb4 c74fbda8
-
<8>bda0: c004ca90 c004c95c c74fbdcc c74fbdb8 c74fbdd4 c74fbdc0 c006bfd4 c003783c
-
<8>bdc0: c0471c88 c0476e4c c74fbdf4 c74fbdd8 c006bf20 c006bfac 00020000 c0476e4c
-
<8>bde0: c04768f4 c680c7b8 c74fbe14 c74fbdf8 c006eb64 c01749b4 00000000 c748b7e8
-
<8>be00: c680c7b4 00000000 c74766e0 c74fbe38 00000000 00000000 00000000 803c7601
-
<8>be20: c74fbee4 c74fbe30 c01cdcb8 c01ca8f4 00000002 bec77c88 2d633373 636d6966
-
<8>be40: 00000030 00000000 00000000 00000000 00000000 00000000 00000029 00000002
-
<8>be60: 00000000 00000000 00000000 00000000 00000000 c748b7e8 c74fbedc c74fbe88
-
<8>be80: c007ca1c c006ee54 c00417f8 0000007d c75249f4 00000000 c74b2870 4007d000
-
<8>bea0: 00000000 00000042 4007d000 c0565960 c74fbed4 00000000 c748b7e8 c74766e0
-
<8>bec0: 803c7601 00000000 00000003 c002df08 c74fa000 00000000 c74fbef4 c74fbee8
-
<8>bee0: c01cdddc c01cdb2c c74fbf14 c74fbef8 c009e108 c01cddcc c68aa1b8 bec77c88
-
<8>bf00: c74766e0 803c7601 c74fbf7c c74fbf18 c009e57c c009e0b0 c74fbfb0 c748b7e8
-
<8>bf20: c74fbf3c c74fbf30 c005f5f8 c0176364 c74fbf7c c74fbf40 c0034394 c005f5f4
-
<8>bf40: c00927d0 c00bab14 00000000 00000200 c70b0dc0 ffffffff c74766e0 bec77c88
-
<8>bf60: 803c7601 00000003 c002df08 c74fa000 c74fbfa4 c74fbf80 c009e60c c009e150
-
<8>bf80: 60000010 00000000 00009154 00000140 00000000 00000036 00000000 c74fbfa8
-
<8>bfa0: c002dd60 c009e5d8 00009154 00000140 00000003 803c7601 bec77c88 bec77c88
-
<8>bfc0: 00009154 00000140 00000000 00000036 00000000 00000000 40180b60 bec77cd4
-
<8>bfe0: 00000000 bec77c80 0000884c 4010ac3c 60000010 00000003 00000000 00000000
-
<8>Backtrace:
-
<8>[<c02000f4>] (s3c_fimc_v4l2_enum_fmt_vid_cap+0x0/0x4c) from [<c01caee8>] (__video_do_ioctl+0x600/0x3238)
-
<8> r6:c037313c r5:c0495488 r4:c74fbc80 r3:c02000f4
-
<8>[<c01ca8e8>] (__video_do_ioctl+0x0/0x3238) from [<c01d04b4>] (v4l1_compat_get_capabilities+0x1c0/0x25c)
-
<8>[<c01d02f4>] (v4l1_compat_get_capabilities+0x0/0x25c) from [<c01d14b0>] (v4l_compat_translate_ioctl+0x150/0x238)
-
<8>[<c01d1360>] (v4l_compat_translate_ioctl+0x0/0x238) from [<c01caa00>] (__video_do_ioctl+0x118/0x3238)
-
<8>[<c01ca8e8>] (__video_do_ioctl+0x0/0x3238) from [<c01cdcb8>] (__video_ioctl2+0x198/0x2a0)
-
<8>[<c01cdb20>] (__video_ioctl2+0x0/0x2a0) from [<c01cdddc>] (video_ioctl2+0x1c/0x20)
-
<8>[<c01cddc0>] (video_ioctl2+0x0/0x20) from [<c009e108>] (vfs_ioctl+0x64/0x74)
-
<8>[<c009e0a4>] (vfs_ioctl+0x0/0x74) from [<c009e57c>] (do_vfs_ioctl+0x438/0x488)
-
<8> r6:803c7601 r5:c74766e0 r4:bec77c88 r3:c68aa1b8
-
<8>[<c009e144>] (do_vfs_ioctl+0x0/0x488) from [<c009e60c>] (sys_ioctl+0x40/0x64)
-
<8> r9:c74fa000 r8:c002df08 r7:00000003 r6:803c7601 r5:bec77c88
-
<8>r4:c74766e0
-
<8>[<c009e5cc>] (sys_ioctl+0x0/0x64) from [<c002dd60>] (ret_fast_syscall+0x0/0x2c)
-
<8> r7:00000036 r6:00000000 r5:00000140 r4:00009154
-
<8>Code: ba000002 f5d1f03c f5d1f05c f5d1f07c (e8b151f8)
-
---[ end trace 1c239a4a0a70678b ]---
-
-
Segmentation fault
如果从这一些信息看出来,是s3c_fimc_v4l2_enum_fmt_vid_cap调用的memcpy中出错,而调用位置在较后的位置(0x4c中0x40处)。查看代码如下:
- static int s3c_fimc_v4l2_enum_fmt_vid_cap(struct file *filp, void *fh,
-
struct v4l2_fmtdesc *f)
-
{
-
struct s3c_fimc_control *ctrl = (struct s3c_fimc_control *) fh;
-
int index = f->index;
-
-
if (index >= S3C_FIMC_MAX_CAPTURE_FORMATS)
-
return -EINVAL;
-
-
memset(f, 0, sizeof(*f));
-
memcpy(f, ctrl->v4l2.fmtdesc + index, sizeof(*f));
-
-
return 0;
-
}
从这里看,比较明显是memcpy碰到空指针了。可以有能是f,也有可以是ctrl(即fh).因此可以,这里一f是应用程序传来,如果有问题,在memset处就出错了,因此只可能是ctrl处为空指地,打印一下,发现
ctrl->v4l2.fmtdesc 空指针。因此加入判断语句
因此修改代码如下
- static int s3c_fimc_v4l2_enum_fmt_vid_cap(struct file *filp, void *fh,
-
struct v4l2_fmtdesc *f)
-
{
-
struct s3c_fimc_control *ctrl = (struct s3c_fimc_control *) fh;
-
int index = f->index;
-
-
if (index >= S3C_FIMC_MAX_CAPTURE_FORMATS)
-
return -EINVAL;
-
-
memset(f, 0, sizeof(*f));
-
-
if(ctrl->v4l2.fmtdesc == 0) //add by Andrew Huang
-
return -EINVAL;
-
-
memcpy(f, ctrl->v4l2.fmtdesc + index, sizeof(*f));
-
-
return 0;
-
}
修改测试后,果然不出现oops信息,应用程序运行正常.
但深层次的原因没有解决,这里是原因是
ctrl = (struct s3c_fimc_control *) fh,而
fh 是 __video_do_ioctl():void *fh = file->private_data;
而file->private_data;初始化在
static int s3c_fimc_open(struct inode *inode, struct file *filp)()
{
id = MINOR(inode->i_rdev);
ctrl = &s3c_fimc.ctrl[id];
filp->private_data = ctrl;
}
在s3c_fimc.ctrl[].v4l2 的各个指针完全没有初始化。这个当然驱动本身没有完全实现功能。
另外一个同样问题是出现在对v4l1的 VIDIOCGPICT支持问题上。
同样没有初始化,出现除0错误.在除0前判断一下。
- static noinline int v4l1_compat_get_picture(
-
struct video_picture *pict,
-
struct file *file,
-
v4l2_kioctl drv)
-
{
-
int err;
-
struct v4l2_format *fmt;
-
-
fmt = kzalloc(sizeof(*fmt), GFP_KERNEL);
-
if (!fmt) {
-
err = -ENOMEM;
-
return err;
-
}
-
-
pict->brightness = get_v4l_control(file,
-
V4L2_CID_BRIGHTNESS, drv);
-
pict->hue = get_v4l_control(file,
-
V4L2_CID_HUE, drv);
-
pict->contrast = get_v4l_control(file,
-
V4L2_CID_CONTRAST, drv);
-
pict->colour = get_v4l_control(file,
-
V4L2_CID_SATURATION, drv);
-
pict->whiteness = get_v4l_control(file,
-
V4L2_CID_WHITENESS, drv);
-
-
fmt->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
-
err = drv(file, VIDIOC_G_FMT, fmt);
-
if (err < 0) {
-
dprintk("VIDIOCGPICT / VIDIOC_G_FMT: %d\n", err);
-
-
goto done;
-
}
-
-
-
if(fmt->fmt.pix.width >0) //add by Andrew Huang
-
pict->depth = ((fmt->fmt.pix.bytesperline << 3)
-
+ (fmt->fmt.pix.width - 1))
-
/ fmt->fmt.pix.width;
-
-
printk(KERN_WARNING "hxy %s:02\n",__func__);
-
-
pict->palette = pixelformat_to_palette(
-
fmt->fmt.pix.pixelformat);
-
done:
-
kfree(fmt);
-
return err;
-
}
3.oops进一步处理
如果需要进一步处理,可以用dmesg 把原始的oops导入到一个文本文件当中,用应用程序工具 ksymoops 来操作.但一般这个对桌面版的LINUX比较有效
ksymoops 需要几项内容:Oops 消息输出、来自正在运行的内核的 System.map 文件,还有 /proc/ksyms、vmlinux 和 /proc/modules。关于如何使用 ksymoops,内核源代码 /usr/src/linux/Documentation/oops-tracing.txt 中或 ksymoops 手册页上有完整的说明可以参考。Ksymoops 反汇编代码部分,指出发生错误的指令,并显示一个跟踪部分表明代码如何被调用。
4.扩展阅读:
<<掌握 Linux 调试技术 -- 在 Linux 上找出并解决程序错误的主要方法>>
http://www.ibm.com/developerworks/cn/linux/sdk/l-debug/index.html
阅读(6048) | 评论(0) | 转发(6) |