Hacking The Art of Exploitation-qgx2009-ChinaUnix博客

#include <stdio.h>

int main()
{
  int i;
  for(i=0; i < 10; i++)       // Loop 10 times.
  {
    puts("Hello, world!\n");   // Put the string to the output.
  }
  return 0;                   // Tell OS the program exited without errors.
}

reader@hacking:~/booksrc $ gcc firstprog.c
reader@hacking:~/booksrc $ ls -l a.out
-rwxr-xr-x 1 reader reader 6621 2007-09-06 22:16 a.out
reader@hacking:~/booksrc $ ./a.out
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
reader@hacking:~/booksrc $

08048374 <main>:
8048374:       55                      push   %ebp
8048375:       89 e5                   mov    %esp,%ebp
8048377:       83 ec 08                sub    $0x8,%esp
804837a:       83 e4 f0                and    $0xfffffff0,%esp
804837d:       b8 00 00 00 00          mov    $0x0,%eax
8048382:       29 c4                   sub    %eax,%esp
8048384:       c7 45 fc 00 00 00 00    movl   $0x0,0xfffffffc(%ebp)
804838b:       83 7d fc 09             cmpl   $0x9,0xfffffffc(%ebp)
804838f:       7e 02                   jle    8048393
8048391:       eb 13                   jmp    80483a6
8048393:       c7 04 24 84 84 04 08    movl   $0x8048484,(%esp)
804839a:       e8 01 ff ff ff          call   80482a0
804839f:       8d 45 fc                lea    0xfffffffc(%ebp),%eax
80483a2:       ff 00                   incl   (%eax)
80483a4:       eb e5                   jmp    804838b
80483a6:       c9                      leave
80483a7:       c3                      ret
80483a8:       90                      nop
80483a9:       90                      nop
80483aa:       90                      nop

The objdump program will spit out far too many lines of output to sensibly examine, so the output is piped into grep with the command-line option to only display 20 lines after the regular expression main.: (the line that labels main). Each byte is represented in hexadecimal notation, which is a base-16 numbering system. The numbering system you are most familiar with is base-10: it has ten digit symbols, 0 through 9, and the value 10 needs a second digit. Hexadecimal uses 0 through 9 to represent 0 through 9, but it also uses A through F to represent the values 10 through 15. This is a convenient notation since a byte contains 8 bits, each of which can be either true or false. This means a byte has 256 (2^8) possible values. One hexadecimal digit covers 4 bits (16 values), so each byte can be described with exactly 2 hexadecimal digits (16 x 16 = 256).

The hexadecimal numbers on the far left, starting with 0x8048374, are memory addresses. The bits of the machine language instructions must be put somewhere, and this somewhere is called memory: an executable made of machine language instructions has to be loaded into memory before it can run, and each number in the left column is the location in memory where that line's instruction is stored. Memory is just a collection of bytes of temporary storage space that are numbered with addresses.

The hexadecimal bytes in the middle of the listing above are the machine language instructions for the x86 processor. Of course, these hexadecimal values are only representations of the bytes of binary 1s and 0s that the CPU can understand. But since 0101010110001001111001011000001111101100111100001 … isn't very useful to anything other than the processor, the machine code is displayed as hexadecimal bytes and each instruction is put on its own line, like splitting a paragraph into sentences.

Come to think of it, the hexadecimal bytes really aren't very useful themselves, either; that's where assembly language comes in. The instructions on the far right are in assembly language. Assembly language is really just a collection of mnemonics for the corresponding machine language instructions. The instruction ret is far easier to remember and make sense of than 0xc3 or 11000011. Unlike C and other compiled languages, assembly language instructions have a direct one-to-one relationship with their corresponding machine language instructions. This means that since every processor architecture has different machine language instructions, each also has a different form of assembly language. Assembly is just a way for programmers to represent the machine language instructions that are given to the processor.
Exactly how these machine language instructions are represented is simply a matter of convention and preference. While you can theoretically create your own x86 assembly language syntax, most people stick with one of the two main types: AT&T syntax and Intel syntax. The assembly shown in the output in Section 0x251 is AT&T syntax, as just about all of Linux's disassembly tools use this syntax by default. It's easy to recognize AT&T syntax by the cacophony of % and $ symbols prefixing everything (take a look again at the example in Section 0x251). The same code can be shown in Intel syntax by providing an additional command-line option, -M intel, to objdump, as shown in the output below.

0x250. Getting Your Hands Dirty