[Original articles take effort to write; please credit the source link when reposting]
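Using a simple, practical example program and a debugging session, this article prints where variables and code are placed in memory, examines how the program behaves at the assembly-instruction level on an x86 CPU, and derives the stack frame layout of each function in the call chain.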
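1. Printing the memory layout
The source code of the example program is shown below. The line numbers are kept because the GDB session later refers to them.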
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <string.h>
4
5 int gb = 0x50505050;
6 char *gb_string = "/bin/sh";
7
8 unsigned int get_bp(int val)
9 {
10     printf("ebp_caller=%#x.\n", *(&val-2));   /* caller's saved ebp, stored at ebp+0 */
11     printf("ret_caller=%#x.\n", *(&val-1));   /* return address into the caller, at ebp+4 */
12     __asm__("movl %ebp,%eax");                /* hand back get_bp's own ebp through eax */
13 }
14
15 unsigned int get_ip(void)
16 {
17     __asm__("lea main,%eax");                 /* hand back the address of main through eax */
18 }
19
20 int
21 main()
22 {
23     int stk = 0x51515151;
24     char *stk_string = "/bin/sh";
25     char stk_buf[32];
26     char *heap = NULL;
27
28     heap = malloc(32);
29
30     printf("stack: ebp_cur=%#x, firstval:%#x, buf:%#x.\n", *((int*)get_bp(1)), &stk, stk_buf);
31     printf("heap: alloc in %#x(stk_ptr:%#x).\n", heap, &heap);
32     printf("global data: gb:%#x.\n", &gb);
33     printf("rdonly data: stk_string_prt->%#x(stkaddr:%#x), gb_string_prt->%#x(gbaddr:%#x).\n", stk_string, &stk_string, gb_string, &gb_string);
34     printf("code text: main:%#x.\n", get_ip());
35
36     return 0;
37 }
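The get_bp function shows two ways to obtain ebp. The first embeds assembly directly and relies on eax being the register that carries the return value; this yields get_bp's own ebp. The second relies on the argument sitting at a fixed position just above the return address (ra) in the stack frame: val is at ebp+8, so &val-1 is the return address slot at ebp+4 and &val-2 is the slot at ebp+0 that holds the caller's saved ebp.
Run the whole program directly: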
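For comparison, the same two values can be read without inline assembly or pointer arithmetic on the argument. The sketch below is not the author's method: it assumes a GCC or Clang compiler and uses their `__builtin_frame_address` and `__builtin_return_address` builtins. The helper names `own_frame_address` and `own_return_address` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: frame address of this function itself.
 * On 32-bit x86 built with frame pointers this should correspond to ebp. */
__attribute__((noinline)) uintptr_t own_frame_address(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}

/* Hypothetical helper: the address this function will return to,
 * i.e. the value stored in the return-address slot of its frame. */
__attribute__((noinline)) uintptr_t own_return_address(void)
{
    return (uintptr_t)__builtin_return_address(0);
}
```

Only level 0 is used here; the GCC documentation warns that nonzero levels can behave unpredictably.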
(gdb) r
Starting program: /mnt/hgfs/vshare/test_stackframe/test_memory_list2
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xaee000
ebp_caller=0xbfea0b88.
ret_caller=0x8048433.
stack: ebp_cur=0xbfea0b88, firstval:0xbfea0b80, buf:0xbfea0b5c.
heap: alloc in 0x89a9008(stk_ptr:0xbfea0b58).
global data: gb:0x8049784.
rdonly data: stk_string_prt->0x8048580(stkaddr:0xbfea0b7c), gb_string_prt->0x8048580(gbaddr:0x8049788).
code text: main:0x80483f2.
Program exited normally.
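The order of the printed lines is also the order of the regions in memory from high addresses to low: stack, heap, global data, read-only data, code.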
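The ordering claim can be checked mechanically. The sketch below takes one representative address per region, copied from the run above, and tests that each is lower than the one before it. The names `is_strictly_descending`, `region_addr` and `region_count` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns 1 if every address is strictly lower than the one before it. */
int is_strictly_descending(const uint32_t *addr, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (addr[i] >= addr[i - 1])
            return 0;
    }
    return 1;
}

/* One representative address per region, copied from the run above. */
const uint32_t region_addr[] = {
    0xbfea0b88u, /* stack: ebp_cur */
    0x089a9008u, /* heap: result of malloc(32) */
    0x08049784u, /* global data: &gb */
    0x08048580u, /* read-only data: the "/bin/sh" literal */
    0x080483f2u, /* code: main */
};
const size_t region_count = sizeof region_addr / sizeof region_addr[0];
```

These addresses belong to that one run; only the relative order is expected to carry over to other runs.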
2. Multi-level function calls
Set a breakpoint before the print on line 10:
(gdb) b 10
Run:
(gdb) r
Starting program: /mnt/hgfs/vshare/test_stackframe/test_memory_list2
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0x705000
Breakpoint 1, get_bp (val=1) at test_memory_list.c:10
10 printf("ebp_caller=%#x.\n", *(&val-2));
Execution stops at the breakpoint, about to run the print on line 10.
Step at source level, which executes the print on line 10:
(gdb) s
ebp_caller=0xbfc7f7f8.
11 printf("ret_caller=%#x.\n", *(&val-1));
Check the location of the instruction about to execute:
(gdb) x/i $eip
0x080483ce <get_bp+26>: lea 0x4(%ebp),%eax
Disassemble get_bp to see its assembly code before stepping at the instruction level.
(gdb) disass $eip
Dump of assembler code for function get_bp:
0x080483b4 <get_bp+0>: push %ebp
0x080483b5 <get_bp+1>: mov %esp,%ebp
0x080483b7 <get_bp+3>: sub $0x8,%esp
0x080483ba <get_bp+6>: mov %ebp,%eax
0x080483bc <get_bp+8>: mov (%eax),%eax
0x080483be <get_bp+10>: mov %eax,0x4(%esp)
0x080483c2 <get_bp+14>: movl $0x8048588,(%esp)
0x080483c9 <get_bp+21>: call 0x80482f0
0x080483ce <get_bp+26>: lea 0x4(%ebp),%eax
0x080483d1 <get_bp+29>: mov (%eax),%eax
0x080483d3 <get_bp+31>: mov %eax,0x4(%esp)
0x080483d7 <get_bp+35>: movl $0x8048599,(%esp)
0x080483de <get_bp+42>: call 0x80482f0
0x080483e3 <get_bp+47>: mov %ebp,%eax
0x080483e5 <get_bp+49>: leave
0x080483e6 <get_bp+50>: ret
End of assembler dump.
Switch to instruction-level stepping:
(gdb) si
0x080483d1 11 printf("ret_caller=%#x.\n", *(&val-1));
(gdb) si
0x080483d3 11 printf("ret_caller=%#x.\n", *(&val-1));
(gdb) si
0x080483d7 11 printf("ret_caller=%#x.\n", *(&val-1));
(gdb) si
0x080483de 11 printf("ret_caller=%#x.\n", *(&val-1));
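“0x080483de <get_bp+42>: call 0x80482f0” shows that a child function is about to be called, at address 0x80482f0.
While we are here, look at the two arguments of the call, $0x8048599 and %eax: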
(gdb) x/s 0x8048599
0x8048599 <_IO_stdin_used+29>: "ret_caller=%#x.\n"
(gdb) x $eax
0x08048433 <main+65>: mov (%eax),%edx
(gdb) i symbol $eax
main + 65 in section .text
This matches the source line “11 printf("ret_caller=%#x.\n", *(&val-1));”. The second argument, $eax, is exactly the return address of main's call to get_bp, as shown below:
(gdb) disass main
Dump of assembler code for function main:
……
0x08048427 <main+53>: movl $0x1,(%esp)
0x0804842e <main+60>: call 0x80483b4 // call get_bp
0x08048433 <main+65>: mov (%eax),%edx
……
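The return address is simply the address of the instruction that follows the call. The sketch below works that out for the two calls in the listings, assuming both use the 5-byte near relative form (opcode E8 plus a 32-bit displacement), which the address gaps in the disassembly support. The name `return_address_after_call` is hypothetical.

```c
#include <stdint.h>

/* Length in bytes of a near relative call: opcode E8 + 32-bit displacement.
 * Assumption: both calls in the listings use this encoding. */
#define NEAR_CALL_LEN 5u

/* Address that the call instruction at call_addr pushes as its return address. */
uint32_t return_address_after_call(uint32_t call_addr)
{
    return call_addr + NEAR_CALL_LEN;
}
```

For the call in main at 0x0804842e this gives 0x08048433, and for the second printf call in get_bp at 0x080483de it gives 0x080483e3, the two return addresses that appear on the stack.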
Continue stepping at the instruction level:
(gdb) si
0x080482f0 in ?? ()
GDB cannot identify this address!? Check it with info symbol:
(gdb) i symbol 0x080482f0
No symbol matches 0x080482f0.
Sure enough, no valid symbol is found. See what code is there:
(gdb) x/5i 0x080482f0
0x80482f0: jmp *0x8049770
0x80482f6: push $0x10
0x80482fb: jmp 0x80482c0
0x8048300 <__gmon_start__@plt>: jmp *0x8049774
0x8048306 <__gmon_start__@plt+6>: push $0x18
To see where “0x80482f0: jmp *0x8049770” jumps, examine the memory at 0x8049770:
(gdb) x/16x 0x08049770
0x8049770 <_GLOBAL_OFFSET_TABLE_+20>: 0x00a95284 0x08048306 0x00000000 0x00000000
0x8049780 <>: 0x08049688 0x50505050 0x08048580 0x00000000
0x8049790: 0x00000000 0x00000000 0x00000000 0x00000000
0x80497a0: 0x00000000 0x00000000 0x00000000 0x00000000
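So this is the GOT (the Global Offset Table, which holds the run-time addresses of global variables and functions). See what code is at 0x00a95284: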
(gdb) x/i 0x00a95284
0xa95284 <printf>: push %ebp
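This address is the library function printf. Evidently the shared library is loaded at a lower address, here within the low 16 MB of the virtual address space. The address of printf has also already been filled in, presumably by the earlier call on line 10. By contrast, the neighboring GOT slot still holds 0x08048306, which is __gmon_start__@plt+6 and points back into the PLT, suggesting that entry has not been resolved yet. Note that the printf address stored in the GOT can differ each time the program is run.
Continue stepping at the instruction level: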
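The "jump through a GOT slot that gets filled in on first use" pattern can be modeled in a few lines of C. This is a toy model under stated assumptions, not the dynamic linker's actual code: a function pointer stands in for the GOT slot, and every name (`got_slot`, `lazy_stub`, `real_double`, `call_via_plt`, `slot_is_resolved`, `resolutions`) is hypothetical.

```c
typedef int (*func_t)(int);

static int resolve_count = 0;

/* Stands in for the real library function (printf in the article). */
static int real_double(int x) { return 2 * x; }

static int lazy_stub(int x);

/* Stands in for the GOT slot: it initially points back at the resolver stub. */
static func_t got_slot = lazy_stub;

/* Stands in for the resolver path: patch the slot once, then forward the call. */
static int lazy_stub(int x)
{
    resolve_count++;
    got_slot = real_double;
    return got_slot(x);
}

/* Stands in for the PLT entry: an indirect jump through the slot. */
int call_via_plt(int x) { return got_slot(x); }

int slot_is_resolved(void) { return got_slot == real_double; }
int resolutions(void) { return resolve_count; }
```

After the first `call_via_plt` the slot holds the real address, so later calls skip the stub; this mirrors the slot at 0x8049770 already holding printf's address at this point in the session.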
(gdb) si
0x00a95284 in printf () from /lib/libc.so.6
(gdb) si
0x00a95285 in printf () from /lib/libc.so.6
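We are now inside the library function printf.
3. Stack frame layout
We can now lay out the stack frames in stack memory.
First, look at the current CPU register values and state.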
(gdb) i r
eax 0x8048433 134513715
ecx 0x0 0
edx 0xb820b0 12066992
ebx 0xb80ff4 12062708
esp 0xbfc7f798 0xbfc7f798
ebp 0xbfc7f7a8 0xbfc7f7a8
esi 0xa4ecc0 10808512
edi 0x0 0
eip 0xa95285 0xa95285
eflags 0x200282 2097794
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
Disassemble printf to see its assembly code:
(gdb) disass $eip
Dump of assembler code for function printf:
0x00a95284 <printf+0>: push %ebp
0x00a95285 <printf+1>: mov %esp,%ebp
0x00a95287 <printf+3>: push %ebx
0x00a95288 <printf+4>: sub $0x10,%esp
0x00a9528b <printf+7>: call 0xa67650
0x00a95290 <printf+12>: add $0xebd64,%ebx
0x00a95296 <printf+18>: lea 0xc(%ebp),%eax
0x00a95299 <printf+21>: mov %eax,0xfffffff8(%ebp)
0x00a9529c <printf+24>: mov %eax,0x8(%esp)
0x00a952a0 <printf+28>: mov 0x8(%ebp),%eax
0x00a952a3 <printf+31>: mov %eax,0x4(%esp)
0x00a952a7 <printf+35>: mov 0xfffffe74(%ebx),%eax
0x00a952ad <printf+41>: mov (%eax),%eax
0x00a952af <printf+43>: mov %eax,(%esp)
0x00a952b2 <printf+46>: call 0xa8c0f9
0x00a952b7 <printf+51>: add $0x10,%esp
0x00a952ba <printf+54>: pop %ebx
0x00a952bb <printf+55>: pop %ebp
0x00a952bc <printf+56>: ret
0x00a952bd <printf+57>: nop
0x00a952be <printf+58>: nop
0x00a952bf <printf+59>: nop
---Type <return> to continue, or q <return> to quit---
End of assembler dump.
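The next instruction, “0x00a95285 <printf+1>: mov %esp,%ebp”, is the one that updates ebp, and it has not executed yet. In addition, “0x00a95288 <printf+4>: sub $0x10,%esp” shows that printf reserves another 16 bytes for its frame. So when displaying the stack we start 10 words, i.e. 40 bytes, below ebp.
Display the current stack memory: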
(gdb) x/80x $ebp-40
0xbfc7f780: 0x00a952b7 0x00b814c0 0x08048588 0xbfc7f7a4
0xbfc7f790: 0xbfc7f7a4 0x00b80ff4 0xbfc7f7a8 0x080483e3
0xbfc7f7a0: 0x08048599 0x08048433 0xbfc7f7f8 0x08048433
0xbfc7f7b0: 0x00000001 0x00000000 0x00000001 0x00000000
0xbfc7f7c0: 0x00000000 0x00000000 0x0952c008 0x08048340
0xbfc7f7d0: 0x00b80ff4 0x0804975c 0xbfc7f7e8 0x080482bd
0xbfc7f7e0: 0x00b81c80 0xbfc7f89c 0xbfc7f808 0x08048580
0xbfc7f7f0: 0x51515151 0xbfc7f810 0xbfc7f868 0x00a677e4
0xbfc7f800: 0x00a4ecc0 0x080484d8 0xbfc7f868 0x00a677e4
0xbfc7f810: 0x00000001 0xbfc7f894 0xbfc7f89c 0x00a425bb
0xbfc7f820: 0x00000000 0xb7f81690 0x00000001 0x00000001
0xbfc7f830: 0x00b80ff4 0x00a4ecc0 0x00000000 0xbfc7f868
0xbfc7f840: 0x3bdf24f6 0x84beab4f 0x00000000 0x00000000
0xbfc7f850: 0x00000000 0x00a479e0 0x00a42e40 0x00a4efd8
0xbfc7f860: 0x00000001 0x08048310 0x00000000 0x08048331
0xbfc7f870: 0x080483f2 0x00000001 0xbfc7f894 0x080484d8
0xbfc7f880: 0x080484d0 0x00a42e40 0xbfc7f88c 0x00a4b2fb
0xbfc7f890: 0x00000001 0xbfc8199a 0x00000000 0xbfc819cd
0xbfc7f8a0: 0xbfc819e0 0xbfc819ef 0xbfc819fa 0xbfc81a0a
0xbfc7f8b0: 0xbfc81a1e 0xbfc81a2c 0xbfc81a60 0xbfc81a72
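Because printf has executed only its first instruction and has not yet updated ebp, the ebp=0xbfc7f7a8 shown by “i r” is still get_bp's frame base. That first instruction has already pushed 0xbfc7f7a8 onto the stack, and the slot holding it becomes printf's frame base. Looking it up in the dump above, printf's frame base is 0xbfc7f798, which is also the current esp. main's frame base is easy to find: it is the value saved at 0xbfc7f7a8, namely 0xbfc7f7f8, and so on up the chain.
In the original figure, the dark gray area is printf's stack frame, the yellow area is get_bp's, and the cyan area is main's. Bold marks the saved return address (ra) in each frame, and underlining marks the valid ebp-ra pairs. The return address saved in main's frame is said to sit 4 words further toward the stack bottom (higher addresses); the dump is consistent with this, since 0x00a677e4 appears both at 0xbfc7f7fc and, 4 words higher, at 0xbfc7f80c.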
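How each ebp-ra pair comes to exist can be reproduced with a tiny model of the stack. The sketch below is a simplified simulation, not real machine state: it models only esp, ebp and a small word-addressed stack, and every name in it (`Cpu`, `cpu_call`, `cpu_prologue`, `cpu_epilogue`, `STACK_TOP`, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64u
#define STACK_TOP   0x1000u                      /* hypothetical top-of-stack address */
#define STACK_LOW   (STACK_TOP - 4u * STACK_WORDS)

typedef struct {
    uint32_t esp, ebp;
    uint32_t mem[STACK_WORDS];                   /* models [STACK_LOW, STACK_TOP) */
} Cpu;

/* Maps a simulated stack address to its backing word. */
static uint32_t *slot(Cpu *c, uint32_t addr)
{
    assert(addr >= STACK_LOW && addr < STACK_TOP && addr % 4u == 0);
    return &c->mem[(addr - STACK_LOW) / 4u];
}

void cpu_init(Cpu *c, uint32_t caller_ebp)
{
    memset(c, 0, sizeof *c);
    c->esp = STACK_TOP;
    c->ebp = caller_ebp;
}

void cpu_push(Cpu *c, uint32_t v) { c->esp -= 4u; *slot(c, c->esp) = v; }
uint32_t cpu_pop(Cpu *c) { uint32_t v = *slot(c, c->esp); c->esp += 4u; return v; }
uint32_t cpu_load(Cpu *c, uint32_t addr) { return *slot(c, addr); }

/* call: push the return address (the jump itself is not modeled). */
void cpu_call(Cpu *c, uint32_t ra) { cpu_push(c, ra); }

/* push %ebp; mov %esp,%ebp; sub $locals,%esp */
void cpu_prologue(Cpu *c, uint32_t locals)
{
    cpu_push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= locals;
}

/* leave; ret: restores the caller's ebp and returns the popped return address. */
uint32_t cpu_epilogue(Cpu *c)
{
    c->esp = c->ebp;
    c->ebp = cpu_pop(c);
    return cpu_pop(c);
}
```

Pushing one argument, then calling `cpu_call` and `cpu_prologue`, leaves the caller's ebp at ebp+0, the return address at ebp+4 and the argument at ebp+8, the same layout that get_bp exploits with &val-2 and &val-1.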
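4. Stack backtrace
Using the ebp chain and the fact that each saved ebp is paired with a return address, the call relationships can be traced back by hand: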
(gdb) i symbol $eip
printf + 1 in section .text
(gdb) i symbol 0x080483e3
get_bp + 47 in section .text
(gdb) i symbol 0x08048433
main + 65 in section .text
(gdb) i symbol 0x00a677e4
__libc_start_main + 220 in section .text
(gdb) i symbol 0x08048331
_start + 33 in section .text
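The manual walk above can be written as a loop: read the saved ebp at the frame base and the return address one word above it, then move to the saved ebp. The sketch below runs that loop over the first 15 rows of the stack dump from section 3, copied into an array. The names `dump`, `read_word` and `walk_ebp_chain` are hypothetical, and the walk stops at a saved ebp of 0 or at any address outside the copied dump.

```c
#include <stdint.h>
#include <stddef.h>

#define DUMP_BASE 0xbfc7f780u   /* address of the first word in the dump */

/* Rows 0xbfc7f780 .. 0xbfc7f860 of the x/80x output, four words per row. */
static const uint32_t dump[] = {
    0x00a952b7u, 0x00b814c0u, 0x08048588u, 0xbfc7f7a4u,
    0xbfc7f7a4u, 0x00b80ff4u, 0xbfc7f7a8u, 0x080483e3u,
    0x08048599u, 0x08048433u, 0xbfc7f7f8u, 0x08048433u,
    0x00000001u, 0x00000000u, 0x00000001u, 0x00000000u,
    0x00000000u, 0x00000000u, 0x0952c008u, 0x08048340u,
    0x00b80ff4u, 0x0804975cu, 0xbfc7f7e8u, 0x080482bdu,
    0x00b81c80u, 0xbfc7f89cu, 0xbfc7f808u, 0x08048580u,
    0x51515151u, 0xbfc7f810u, 0xbfc7f868u, 0x00a677e4u,
    0x00a4ecc0u, 0x080484d8u, 0xbfc7f868u, 0x00a677e4u,
    0x00000001u, 0xbfc7f894u, 0xbfc7f89cu, 0x00a425bbu,
    0x00000000u, 0xb7f81690u, 0x00000001u, 0x00000001u,
    0x00b80ff4u, 0x00a4ecc0u, 0x00000000u, 0xbfc7f868u,
    0x3bdf24f6u, 0x84beab4fu, 0x00000000u, 0x00000000u,
    0x00000000u, 0x00a479e0u, 0x00a42e40u, 0x00a4efd8u,
    0x00000001u, 0x08048310u, 0x00000000u, 0x08048331u,
};
#define DUMP_WORDS (sizeof dump / sizeof dump[0])

/* Reads one word from the dump; returns 0 if addr is outside it or unaligned. */
static int read_word(uint32_t addr, uint32_t *out)
{
    if (addr < DUMP_BASE || (addr - DUMP_BASE) % 4u != 0)
        return 0;
    size_t idx = (addr - DUMP_BASE) / 4u;
    if (idx >= DUMP_WORDS)
        return 0;
    *out = dump[idx];
    return 1;
}

/* Follows saved-ebp links starting at frame, storing each frame's return
 * address in ra_out. Returns the number of return addresses collected. */
size_t walk_ebp_chain(uint32_t frame, uint32_t *ra_out, size_t max)
{
    size_t n = 0;
    while (n < max && frame != 0) {
        uint32_t saved_ebp, ra;
        if (!read_word(frame, &saved_ebp) || !read_word(frame + 4u, &ra))
            break;
        ra_out[n++] = ra;
        if (saved_ebp != 0 && saved_ebp <= frame)
            break;              /* a valid chain only moves toward higher addresses */
        frame = saved_ebp;
    }
    return n;
}
```

Starting from printf's frame base 0xbfc7f798, the walk yields 0x080483e3, 0x08048433, 0x00a677e4 and 0x08048331, the same four addresses resolved with "i symbol" above.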
GDB's own backtrace command, by comparison, only unwinds as far as main:
(gdb) bt
#0 0x00a95285 in printf () from /lib/libc.so.6
#1 0x080483e3 in get_bp (val=1) at test_memory_list.c:11
#2 0x08048433 in main () at test_memory_list.c:30