Segfault Debugging Notes 1 - ihasudgq - ChinaUnix Blog

Segfault Debugging Notes 1

Category: LINUX

2013-11-08 16:36:05

On host *.*.*.45 the app keeps segfaulting, and no core file is produced (ulimit -c unlimited is already set).

grep segfault /var/log/messages
Oct 31 17:39:40 -45 kernel: *Serve[9909]: segfault at 3946 ip 0000000000003946 sp 00007f8de69a9e18 error 14 in *Server[400000+13000]
Oct 31 18:32:41 r-45 kernel: *Serve[17038]: segfault at 7de6 ip 0000000000007de6 sp 00007f09cacf9128 error 14 in *Server[400000+13000]
Oct 31 18:53:43 r-45 kernel: *Serve[17281]: segfault at 13a4 ip 0000003c0ac0920b sp 00007f1ebdd64bc0 error 4 in ld-2.12.so[3c0ac00000+20000]

dmesg | tail -30
*Info[5879]: segfault at 7d1 ip 0000003c0b0480ac sp 00007fff0ef9d540 error 4 in libc-2.12.so[3c0b000000+18a000]
*Serve[17038]: segfault at 7de6 ip 0000000000007de6 sp 00007f09cacf9128 error 14 in *Server[400000+13000]
*Serve[17281]: segfault at 13a4 ip 0000003c0ac0920b sp 00007f1ebdd64bc0 error 4 in ld-2.12.so[3c0ac00000+20000]

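Checking the glibc version:
ldd --version
(GNU libc) 2.12. Old glibc versions have a malloc that is not thread-safe; at least 2.3.6 is required, which 2.12 satisfies.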

Take "*Serve[17038]: segfault at 7de6 ip 0000000000007de6 sp 00007f09cacf9128 error 14 in *Server[400000+13000]" as an example:
IP is the Instruction Pointer aka Program Counter. SP is the stack pointer.
The 'error' value is the page-fault error code, printed in hexadecimal; its bits are described further down.

objdump -D app | grep 7de6
407de6: bb ff ff ff ff mov $0xffffffff,%ebx
17de6: 01 03 add %eax,(%rbx)

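0x400000 - 0x7de6 (ip) = 0x3F821A

with gdb:
info symbol 0x3F821A

Note that in this record ip (0x7de6) equals the faulting address and lies below the mapping base 0x400000, so it is not an address inside *Server at all. The process most likely jumped through a corrupted function pointer or return address, so this lookup is unlikely to land on the real culprit.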

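For the error code, see the description in the Linux kernel source, arch/*/mm/fault.c:
* Page fault error code bits:
*
* bit 0 == 0: no page found       1: protection fault
* bit 1 == 0: read access         1: write access
* bit 2 == 0: kernel-mode access  1: user-mode access
* bit 3 == 1: use of reserved bit detected
* bit 4 == 1: fault was an instruction fetch
(This shows that the articles online claiming errcode only ranges from 0 to 7 are wrong.)
The kernel prints this value in hexadecimal, so the "error 14" above is 0x14: bits 2 and 4 are set, i.e. a user-mode instruction fetch from a page that was not found.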
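The bit table above can be turned into a small decoder. This is only a sketch: the names pf_info, decode_pf_error and print_pf_error are made up for illustration and are not kernel APIs, and the bit layout is the x86 one quoted above.

```c
#include <stdio.h>

/* Bit masks for the x86 page-fault error code, as listed in fault.c.
   The enum and struct names are hypothetical, chosen for this sketch. */
enum {
    PF_PROT  = 1 << 0,   /* 0: no page found, 1: protection fault */
    PF_WRITE = 1 << 1,   /* 0: read access,   1: write access     */
    PF_USER  = 1 << 2,   /* 0: kernel mode,   1: user mode        */
    PF_RSVD  = 1 << 3,   /* reserved bit detected                 */
    PF_INSTR = 1 << 4    /* fault was an instruction fetch        */
};

struct pf_info {
    int protection, write, user, reserved_bit, instr_fetch;
};

/* Splits the numeric error code into one flag per bit. */
struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info i;
    i.protection   = (code & PF_PROT)  != 0;
    i.write        = (code & PF_WRITE) != 0;
    i.user         = (code & PF_USER)  != 0;
    i.reserved_bit = (code & PF_RSVD)  != 0;
    i.instr_fetch  = (code & PF_INSTR) != 0;
    return i;
}

/* Prints a one-line description of the error code to stdout. */
void print_pf_error(unsigned long code)
{
    struct pf_info i = decode_pf_error(code);
    printf("error %lx: %s, %s, %s-mode%s%s\n", code,
           i.protection ? "protection fault" : "no page found",
           i.write ? "write" : "read",
           i.user ? "user" : "kernel",
           i.reserved_bit ? ", reserved bit set" : "",
           i.instr_fetch ? ", instruction fetch" : "");
}
```

Remember to pass the value as hex: the "error 14" in the log is print_pf_error(0x14), not print_pf_error(14).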

Below are the corresponding explanations I found while looking through the Linux source:

address - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)
ip - instruction pointer, ie. where the code which is trying to do this lives
sp - stack pointer
error - Architecture-specific flags; see arch/*/mm/fault.c for your platform.

Even for a shared lib, the "[3c0ac00000+20000]" part should give a hint where the crashing segment of the lib was mapped in memory. "readelf --segments mylib.so" lists these segments, and then you can calculate the EIP offset into the crashing segment and feed that to addr2line (or view it in "objdump -dgS").

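backtrace, backtrace_symbols and backtrace_symbols_fd can print the function call stack.
Catch the SIGSEGV signal and call these functions in the signal handler to print the stack at the moment the program crashes; see the man pages for details.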
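A minimal sketch of the stack-dumping part, using the glibc functions named above. The helper name dump_backtrace and the limit of 64 frames are my own choices for illustration.

```c
#include <execinfo.h>

#define MAX_FRAMES 64   /* arbitrary limit chosen for this sketch */

/* Writes the current call stack to file descriptor fd, one frame per
   line, and returns the number of frames captured. */
int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* backtrace_symbols_fd writes straight to fd and does not call
       malloc, unlike backtrace_symbols, which makes it the better
       fit for use inside a signal handler. */
    backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

Calling dump_backtrace(2) sends the frames to stderr. According to the backtrace man page, function names may be missing unless the program is linked with extra options such as -rdynamic; the raw addresses are still printed and can be fed to addr2line.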

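gcc has a built-in function that can print stack addresses:
__builtin_return_address() returns an address (as a pointer); you need to know how many levels deep the stack is.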

e.g.:
// return address of the current function
// (the argument must be a constant integer)
void* p = __builtin_return_address(0);
printf("%p\n", p);
// return address of the caller
p = __builtin_return_address(1);
// we cannot get more addresses as we don't have any
// information about how many levels of calls we have

The libunwind library can print a detailed call stack. It has to be downloaded and installed; then check its man pages for usage.

With sigaction, the sa_sigaction handler receives a siginfo_t, whose si_addr field holds the memory address that caused the fault.
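Here is a rough sketch of such a handler. The function names (format_hex, on_segv, install_segv_handler) are hypothetical. The hex formatting is hand-rolled because printf is not async-signal-safe, and SA_RESETHAND is my own addition so that the default action (and a core dump, if enabled) still happens after the handler returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Formats v as "0x..." into out (at least 19 bytes); returns the length. */
size_t format_hex(uintptr_t v, char *out)
{
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;

    do {                                   /* digits, lowest first */
        tmp[n++] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    out[len++] = '0';
    out[len++] = 'x';
    while (n > 0)                          /* reverse into out */
        out[len++] = tmp[--n];
    out[len] = '\0';
    return len;
}

/* write() wrapper that deliberately ignores the result. */
static void put(const char *s, size_t n)
{
    ssize_t r = write(STDERR_FILENO, s, n);
    (void)r;
}

/* SIGSEGV handler: reports si_addr, the address that caused the fault. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    static const char msg[] = "SIGSEGV, fault address ";
    char buf[32];
    size_t len;

    (void)sig;
    (void)ctx;
    put(msg, sizeof msg - 1);
    len = format_hex((uintptr_t)info->si_addr, buf);
    buf[len++] = '\n';
    put(buf, len);
    /* Returning re-executes the faulting instruction; because of
       SA_RESETHAND the default action then terminates the process. */
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_segv_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Call install_segv_handler() once near the start of main. The printed address corresponds to the "segfault at" field in the kernel log.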

Segfault:
Nov 3 17:03:54 haier-45 kernel: haierUpassServe[22708]: segfault at 3946 ip 0000000000003946 sp 00007f56edf20e18 error 14 in haierUpassServer[400000+13000]
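The fields in the message above mean:
the line starts with the current system time
process name and pid
segfault at: the address that caused the fault
ip: memory address of the instruction
sp: stack pointer address, i.e. the top of the stack
error: not an errno nor a signal number, but the page-fault error code
[400000+13000]: start address and size of the virtual memory mapping of the object that crashed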

addr2line -e app ip

demon[3702]:segfault at b771c488 ip b771c4b3 sp bfc8b1a0 error 7 in libmodule.so[b771c000+1000]
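Here b771c000 is the load address of libmodule.so, so run
addr2line -e libmodule.so 0x4b3   (ip - module load address: b771c4b3 - b771c000 = 4b3)
Alternatively, compute ip minus the module load address, disassemble the shared library, and look up that difference in the disassembly (instruction addresses inside a .so file are relative addresses, which is what allows the library to be shared).
If the instruction is in the program itself, simply disassemble the program and look up the instruction address directly.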
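The subtraction can be automated by parsing the kernel message. This is only a sketch: the struct and function names are invented for illustration, and the format string assumes the x86 log layout shown in the examples above.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one kernel "segfault at ..." record (hypothetical struct). */
struct segv_record {
    unsigned long long addr, ip, sp, err, base, size;
    char module[256];
};

/* Parses a log line; returns 0 on success, -1 if it does not match. */
int parse_segfault_line(const char *line, struct segv_record *r)
{
    const char *p = strstr(line, "segfault at ");
    int n;

    if (p == NULL)
        return -1;
    /* every number in the record, including "error", is hexadecimal */
    n = sscanf(p,
               "segfault at %llx ip %llx sp %llx error %llx in %255[^[][%llx+%llx]",
               &r->addr, &r->ip, &r->sp, &r->err,
               r->module, &r->base, &r->size);
    return n == 7 ? 0 : -1;
}

/* True if ip falls inside the [base, base+size) mapping of the module. */
int ip_in_module(const struct segv_record *r)
{
    return r->ip >= r->base && r->ip - r->base < r->size;
}

/* Prints the addr2line command for this record, or a warning when ip
   is outside the module and the subtraction would be meaningless. */
void print_addr2line_hint(const struct segv_record *r)
{
    if (ip_in_module(r))
        printf("addr2line -e %s 0x%llx\n", r->module, r->ip - r->base);
    else
        printf("ip 0x%llx is outside %s; likely a wild jump\n",
               r->ip, r->module);
}
```

For the libmodule.so line this yields offset 0x4b3. For the *Server records with ip 3946 or 7de6 the range check fails, which is a hint that the process jumped to a bogus address rather than crashing on an instruction inside the binary. For a non-PIE executable such as one mapped at 400000, the ip is normally passed to addr2line as-is rather than as an offset.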

Or rerun the program under gdb and set a breakpoint at the corresponding instruction address.

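If the backtrace of interest lies in a shared library, things are a bit more troublesome, because the base address of a shared library is not fixed.
The address at which a library is loaded can differ on every run.
In that case, first get hold of the process's memory map. On Linux it is available in /proc/<pid>/maps. Find the library's base address in that file, subtract it from the address in the backtrace to get the offset address, and finally pass the library and that offset to addr2line -C -f -e. (With this approach, the pid's directory under /proc disappears as soon as the process exits.) The pmap command shows the same information as /proc/<pid>/maps.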
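A sketch of the maps lookup, under the assumption that the library's load base is the lowest start address among its mappings. The names map_entry, parse_maps_line and find_load_base are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* One line of /proc/<pid>/maps (hypothetical struct for this sketch). */
struct map_entry {
    unsigned long long start, end, offset;
    char perms[5];
    char path[256];
};

/* Parses "start-end perms offset dev inode [path]".
   Returns 0 on success, -1 if the line does not match.
   Paths containing spaces are cut at the first space. */
int parse_maps_line(const char *line, struct map_entry *m)
{
    int n;

    m->path[0] = '\0';                     /* anonymous mappings have no path */
    n = sscanf(line, "%llx-%llx %4s %llx %*s %*s %255s",
               &m->start, &m->end, m->perms, &m->offset, m->path);
    return n >= 4 ? 0 : -1;
}

/* Returns the lowest start address among mappings whose path contains
   `name`, or 0 if no such mapping exists. */
unsigned long long find_load_base(FILE *maps, const char *name)
{
    char line[512];
    struct map_entry m;
    unsigned long long base = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        if (parse_maps_line(line, &m) != 0 || strstr(m.path, name) == NULL)
            continue;
        if (base == 0 || m.start < base)
            base = m.start;
    }
    return base;
}
```

A process can inspect itself by opening "/proc/self/maps" with fopen and passing the FILE pointer to find_load_base. The backtrace address minus the returned base is the offset to hand to addr2line.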

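dladdr lets a running program find out which loaded object (and nearest symbol) a given address belongs to.
dl_iterate_phdr serves a similar purpose by walking every object the program has loaded, and I find it nicer to use; see the man pages for details.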
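A small sketch of the dl_iterate_phdr approach on glibc. The helper names print_object and list_loaded_objects are my own; only dl_iterate_phdr and the dlpi_addr and dlpi_name fields come from the man page.

```c
#define _GNU_SOURCE             /* needed for dl_iterate_phdr in <link.h> */
#include <link.h>
#include <stdio.h>

/* Called once per loaded object; `data` points to an int counter. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    const char *name = info->dlpi_name;

    (void)size;
    /* the main program is reported with an empty name */
    printf("0x%llx  %s\n", (unsigned long long)info->dlpi_addr,
           (name != NULL && name[0] != '\0') ? name : "(main program)");
    ++*(int *)data;
    return 0;                   /* a non-zero return stops the iteration */
}

/* Prints the base address and name of every loaded object and
   returns how many objects were visited. */
int list_loaded_objects(void)
{
    int count = 0;

    dl_iterate_phdr(print_object, &count);
    return count;
}
```

Logging this list at startup means the load addresses are still on record after a crash, when /proc/<pid>/maps is already gone.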

Disassembly with objdump -DCl ./app:
08048544 <main>:
main():
/mnt/hgfs/D/code/vm_proj_saving/myproj/c/segfault.c:6
 8048544: 55                      push %ebp
 8048545: 89 e5                   mov %esp,%ebp
 8048547: 83 e4 f0                and $0xfffffff0,%esp
 804854a: 83 ec 20                sub $0x20,%esp
/mnt/hgfs/D/code/vm_proj_saving/myproj/c/segfault.c:7
 804854d: c7 44 24 1c 00 00 00    movl $0x0,0x1c(%esp)
 8048554: 00
/mnt/hgfs/D/code/vm_proj_saving/myproj/c/segfault.c:9
 8048555: b8 44 86 04 08          mov $0x8048644,%eax
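Note that a number without a $ prefix denotes a memory address, not an immediate (an immediate is a literal number rather than a variable).
In a .o file, the symbol addresses used by instructions are all relative addresses. In the next step the linker patches these instructions, replacing those addresses with the memory addresses used at load time; only then can the instructions execute correctly.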

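Summary: the program can print the load addresses of its shared libraries from within its own source (dl_iterate_phdr, dladdr). The main purpose of printing them is to work out the link-time address inside the library where the crash occurred.

Using the offset, disassemble (objdump) the faulting shared library and look up the offending code at that offset (see above).

If no shared library is involved, or the fault is in the application itself, look up ip (the instruction address) directly in the application's disassembly.

As for the segfault signal itself, it can be caught, and the signal handler can print the process's stack at the time of the crash. Any of the three facilities above can do this: __builtin_return_address(), backtrace(), or the libunwind library (which must be installed first; see man libunwind for usage).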

(The above is for reference only and corrections are welcome. For concrete code examples, see Segfault Debugging Notes 2.)
