Category: C/C++

2009-05-20 22:58:19

Before we start, let's look at a few registers and assembly instructions.
  • ebp: the base (frame) pointer; it points to the base of the current stack frame.
  • eax: holds the return value, and also serves as a scratch register for temporary values.
  • call: pushes the address of the instruction following the call (the return address) onto the stack, then transfers control to the address given by its operand.
  • leave: releases the callee's stack frame and restores the caller's ebp. It is equivalent to 2 steps:
    • movl %ebp, %esp
    • popl %ebp
  • ret: pops the return address from the top of the stack and transfers control to it. With an optional immediate operand (e.g. ret $4), it then adds that many bytes to esp, removing arguments that the caller put on the stack before the corresponding call.
Stack
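The caller takes the following steps before the callee runs:
  1. Save the caller-saved registers (EAX, ECX and EDX) if their values are still needed after the call.
  2. Put the callee's arguments on the stack, so that the first argument ends up at the lowest address.
  3. Invoke the callee with the call instruction.
The code snippet below demonstrates these steps. Note that gcc does not use pushl here; it stores each argument with movl into space it has already reserved at the top of the stack: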

d = f(a, b, c);


movl -20(%ebp), %eax
movl %eax, 8(%esp)    # push c
movl -16(%ebp), %eax
movl %eax, 4(%esp)    # push b
movl -12(%ebp), %eax
movl %eax, (%esp)     # push a
call _f

|               |
|---------------|
|    Arg #1     |<=esp
|---------------|
|    Arg #2     |<=esp+4
|---------------|
|    Arg #3     |<=esp+8
|---------------|
|Saved registers|
|---------------|<=top of stack before the call is initiated.
|     ...       |
|     ...       |
|               |<=ebp(caller's ebp)
After the call instruction executes, and before the first instruction of f() runs, the return address is on top of the stack (see the description of call above):
|               |
|---------------|
|Return Address |<=esp
|---------------|
|    Arg #1     |
|---------------|
|    Arg #2     |
|---------------|
|    Arg #3     |
|---------------|
|Saved registers|
|---------------|<=top of stack before the call is initiated.
|     ...       |
|     ...       |
|               |<=ebp(caller's ebp)
Control is now transferred to the callee, f(). f() takes the following steps to set up its stack frame:
  1. Save the caller's ebp on the stack (pushl %ebp).
  2. Copy esp to ebp (movl %esp, %ebp). From now on ebp is the frame pointer: arguments and local variables are addressed relative to it.
  3. Allocate space for local variables and temporary storage (subl $4, %esp).
Below is the code.

int f(int i, int j, int k)
{
    int m = 0;
    m = i + j * k;
    return m;
}

The assembly code:

pushl %ebp
movl %esp, %ebp
subl $4, %esp
movl $0, -4(%ebp)
movl 12(%ebp), %eax
imull 16(%ebp), %eax
addl 8(%ebp), %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
leave
ret

After the prologue, the stack looks like this:
|               |
|---------------|
|local variable1|<=esp(ebp-4)
|---------------|
| caller's ebp  |<=ebp
|---------------|
|Return Address |
|---------------|
|    Arg #1     |<=ebp+8
|---------------|
|    Arg #2     |<=ebp+12
|---------------|
|    Arg #3     |<=ebp+16
|---------------|
|Saved registers|
|---------------|<=top of stack before the call is initiated.
|     ...       |
|     ...       |
|               |
When f() is done, leave and ret return control to the caller. The diagram below shows the stack after leave and just before ret. The contents of memory are the same as above; only esp and ebp have changed. (That is why a local variable's old value can sometimes still be read after the function returns, although nothing guarantees it.):
|               |
|---------------|
|local variable1|
|---------------|
| caller's ebp  |
|---------------|
|Return Address |<=esp
|---------------|
|    Arg #1     |
|---------------|
|    Arg #2     |
|---------------|
|    Arg #3     |
|---------------|
|Saved registers|
|---------------|<=top of stack before the call is initiated.
|     ...       |
|     ...       |
|               |<=ebp(caller's ebp)
Then the return value of f() is saved to local variable d of caller:

movl %eax, -24(%ebp)

As we can see, %eax alone carries the return value. An object too big to fit in a register therefore has to be returned some other way: it is written to memory that the caller provides.
Say we have:

typedef struct _A
{
    int m1;
    int m2;
    int* m3;
}A;
A foo()
{
    A ret;
    ret.m1 = 0;
    ret.m2 = 0;
    ret.m3 = 0;
    return ret;
}

Then we try to get the returned object through:

aA = foo();

Let's look at the assembly code and the stack.

leal -40(%ebp), %eax
movl %eax, (%esp)
call _foo

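The first 2 instructions store the address of the destination object (ebp-40 in the caller's frame) at the top of the stack, as a hidden first argument. Once foo() has been called and has saved ebp, the stack looks like this: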
| caller's ebp  |<=esp
|---------------|
|Return Address |
|---------------|
|Returned obj --|-----------------------------
|---------------|                            |
|    Arg #1     |                            |
|---------------|                            |
|    Arg #2     |                            |
|---------------|                            |
|    Arg #3     |                            |
|---------------|                            |
|Saved registers|                            |
|---------------|                            |
|     ...       |                            |
|     ...       |<=ebp-40 <-------------------
|     ...       |
|     ...       |
|               |
Now let's look at what foo() does (in assembly):

pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl 8(%ebp), %edx
movl $0, -24(%ebp)
movl $0, -20(%ebp)
movl $0, -16(%ebp)
movl -24(%ebp), %eax
movl %eax, (%edx)
movl -20(%ebp), %eax
movl %eax, 4(%edx)
movl -16(%ebp), %eax
movl %eax, 8(%edx)
movl %edx, %eax
leave
ret $4

And the stack:
|               |<=esp(ebp-24)
|---------------|
|               |
|---------------|
|               |              -------
|---------------|              | edx |--------
|               |              -------       |
|---------------|                            |
|               |                            |
|---------------|                            |
|               |                            |
|---------------|                            |
| caller's ebp  |<=ebp                       |
|---------------|                            |
|Return Address |<=ebp+4                     |
|---------------|                            |
|  Returned obj |<=ebp+8----------------------
|---------------|                            |
|    Arg #1     |                            |
|---------------|                            |
|    Arg #2     |                            |
|---------------|                            |
|    Arg #3     |                            |
|---------------|                            |
|Saved registers|                            |
|---------------|                            |
|     ...       |                            |
|     ...       |<=ebp-40 <-------------------
|     ...       |
|     ...       |
|               |
The last 2 instructions are leave and ret $4. The address of the returned object is treated as a hidden extra argument, and ret $4 removes it from the stack together with the return address. Afterwards the stack and the registers are:
|               |
|---------------|
|               |
|---------------|
|               |
|---------------|
|               |
|---------------|
|               |
|---------------|
|               |
|---------------|
| caller's ebp  |
|---------------|
|Return Address |
|---------------|
|  Returned obj |-----------------------------
|---------------|                            |
|    Arg #1     |<=esp                       |
|---------------|                            |
|    Arg #2     |                            |
|---------------|                            |
|    Arg #3     |                            |
|---------------|                            |
|Saved registers|                            |
|---------------|                            |
|     ...       |                            |
|     ...       |<=ebp-40 <-------------------
|     ...       |
|     ...       |
|               |<=ebp
Since ret $4 left esp 4 bytes higher than it was before the call, the caller moves esp back down so that it again points to the start of its outgoing-argument area:

subl $4, %esp

In fact,

aA = foo();

can be transformed to:

foo(&aA);

Now we have enough information to go further.
Parameter Passing
At the machine level, C/C++ passes every parameter by value. Passing through a pointer or reference is no exception: it is still "passing by value", where the value is not the object referred to but the pointer (an address) itself.

Let's consider the code snippet below:

int foo(int a, int b)
{
    return a + b;
}
int main()
{
    int i = 10;
    int j = 20;
    foo(i, j);
    return 0;
}

Using gcc's -S and -fverbose-asm options to generate the assembly code, we will see:

subl $24, %esp
movl $10, -8(%ebp)
movl $20, -12(%ebp)
movl -12(%ebp), %eax
movl %eax, 4(%esp)
movl -8(%ebp), %eax
movl %eax, (%esp)
call foo

Before calling foo(), the caller (main() here) first sets up its own stack frame. The subl instruction reserves 24 bytes for local variables, temporary objects and outgoing arguments. Note that ebp is now the base pointer of main()'s stack frame. The local variables i and j are stored at -8(%ebp) and -12(%ebp). The 4 instructions just before the call show what "passing by value" means: the values of j and i are copied into the argument slots at 4(%esp) and (%esp). The same holds when a parameter has pointer type: only the pointer itself is copied to the stack, not the object it refers to. Modifying that copy of the pointer has no effect on the caller's variable. However, the callee can modify the object the pointer refers to by dereferencing it. E.g.

int foo(int* a, int* b)
{
    *a = *b;
    return *a + *b;
}

The assembly code is:

pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
movl (%eax), %edx
movl 8(%ebp), %eax
movl %edx, (%eax)
movl 8(%ebp), %eax
movl (%eax), %edx
movl 12(%ebp), %eax
movl (%eax), %eax
leal (%edx,%eax), %eax
popl %ebp
ret

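Here a and b (8(%ebp) and 12(%ebp)) are copies of the pointers the caller passed, so assigning to a or b inside foo() changes nothing in the caller. Writing through them is different: *a = *b modifies the object that a points to, e.g. the local i in main() if foo(&i, &j) is called.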
Value Return
As shown in the first section, "Stack", a value that fits in a register is returned in eax, while a larger object is written through a hidden pointer supplied by the caller.

NOTE: All the assembly code is generated by gcc, in AT&T style.
Copyleft (C) 2007-2009 raof01.
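This article may be used for any purpose other than commercial ones. "Use" here includes (but is not limited to) copying and translating (in part or in whole), and does not include producing code or ideas based on what this article describes. For non-commercial use, please keep this rights notice and cite the article's original address and author information; for commercial use, please contact the author (raof01@gmail.com), otherwise the author will rely on the law to protect these rights.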


Comments (5)

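fera, 2009-10-27 10:35:47

Knowing this, we can write a buffer overflow program:

#include <iostream>
using namespace std;

int injection(int i)
{
    cout << "injection" << " " << i << endl;
    return 0;
}
int main()
{
    int a;
    // Overwrites main()'s return address; the offset 2 depends on the compiler's frame layout.
    *(&a + 2) = (int)injection;
    return 0;
}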

chinaunix user, 2009-06-12 13:21:47

Is there any difference from MASM?

fera, 2009-06-05 17:09:02

Smart guys will now know how va_list works...

fera, 2009-05-25 14:25:39

I'm working on the ARM platform. When I looked at the asm code that the ARM compiler generated from the same C code used in this essay, I found it quite hard to follow. I have to learn some ARM assembly, otherwise I cannot understand some of the source code at work. Tough job...

chinaunix user, 2009-05-22 12:48:36

value is returned by eax.