Category: Networking and Security

2013-06-28 11:33:39

0x250. Getting Your Hands Dirty

2.5.1.1. firstprog.c
#include <stdio.h>

int main()
{
  int i;
  for(i=0; i < 10; i++)       // Loop 10 times.
  {
    printf("Hello, world!\n");  // Print the string to the output.
  }
  return 0;                   // Tell OS the program exited without errors.
}
reader@hacking:~/booksrc $ gcc firstprog.c
reader@hacking:~/booksrc $ ls -l a.out
-rwxr-xr-x 1 reader reader 6621 2007-09-06 22:16 a.out
reader@hacking:~/booksrc $ ./a.out
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
reader@hacking:~/booksrc $
Okay, this has all been stuff you would learn in an elementary programming class: basic, but essential. Most introductory programming classes just teach how to read and write C. Don't get me wrong, being fluent in C is very useful and is enough to make you a decent programmer, but it's only a piece of the bigger picture. Most programmers learn the language from the top down and never see the big picture. Hackers get their edge from knowing how all the pieces interact within this bigger picture.
To see the bigger picture in the realm of programming, simply realize that C code is meant to be compiled. The code can't actually do anything until it's compiled into an executable binary file. Thinking of C source as a program is a common misconception that is exploited by hackers every day. The binary a.out's instructions are written in machine language, an elementary language the CPU can understand. Compilers are designed to translate the language of C code into machine language for a variety of processor architectures. In this case, the processor is in a family that uses the x86 architecture. There are also the Sparc processor architecture (used in Sun workstations) and the PowerPC processor architecture (used in pre-Intel Macs). Each architecture has a different machine language, so the compiler acts as a middle ground, translating C code into machine language for the target architecture.

As long as the compiled program works, the average programmer is only concerned with source code. But a hacker realizes that the compiled program is what actually gets executed out in the real world. With a better understanding of how the CPU operates, a hacker can manipulate the programs that run on it. We have seen the source code for our first program and compiled it into an executable binary for the x86 architecture. But what does this executable binary look like? The GNU development tools include a program called objdump, which can be used to examine compiled binaries. Let's start by looking at the machine code the main() function was translated into.

reader@hacking:~/booksrc $ objdump -D a.out | grep -A20 main.:
08048374 <main>:
 8048374:       55                      push   %ebp
 8048375:       89 e5                   mov    %esp,%ebp
 8048377:       83 ec 08                sub    $0x8,%esp
 804837a:       83 e4 f0                and    $0xfffffff0,%esp
 804837d:       b8 00 00 00 00          mov    $0x0,%eax
 8048382:       29 c4                   sub    %eax,%esp
 8048384:       c7 45 fc 00 00 00 00    movl   $0x0,0xfffffffc(%ebp)
 804838b:       83 7d fc 09             cmpl   $0x9,0xfffffffc(%ebp)
 804838f:       7e 02                   jle    8048393 <main+0x1f>
 8048391:       eb 13                   jmp    80483a6 <main+0x32>
 8048393:       c7 04 24 84 84 04 08    movl   $0x8048484,(%esp)
 804839a:       e8 01 ff ff ff          call   80482a0 <printf@plt>
 804839f:       8d 45 fc                lea    0xfffffffc(%ebp),%eax
 80483a2:       ff 00                   incl   (%eax)
 80483a4:       eb e5                   jmp    804838b <main+0x17>
 80483a6:       c9                      leave
 80483a7:       c3                      ret
 80483a8:       90                      nop
 80483a9:       90                      nop
 80483aa:       90                      nop

The objdump program will spit out far too many lines of output to sensibly examine, so the output is piped into grep with the -A20 command-line option, which displays only the 20 lines after the line matching the regular expression main.:.
Each byte is represented in hexadecimal notation, which is a base-16 numbering system. The numbering system you are most familiar with is base-10, since at 10 you need to add an extra digit. Hexadecimal uses 0 through 9 to represent 0 through 9, but it also uses A through F to represent the values 10 through 15. This is a convenient notation since a byte contains 8 bits, each of which can be either true or false. This means a byte has 256 (2^8) possible values, so each byte can be described with 2 hexadecimal digits.

The hexadecimal numbers on the far left, starting with 0x8048374, are memory addresses. The bits of the machine language instructions must be put somewhere, and this somewhere is called memory. Memory is just a collection of bytes of temporary storage space that are numbered with addresses. Each address on the left gives the location in memory of the instruction on that line.


The hexadecimal bytes in the middle of the listing above are the machine language instructions for the x86 processor. Of course, these hexadecimal values are only representations of the bytes of binary 1s and 0s the CPU can understand. But since 010101011000100111100101100000111110110000001000 … isn't very useful to anything other than the processor, the machine code is displayed as hexadecimal bytes and each instruction is put on its own line, like splitting a paragraph into sentences.


Come to think of it, the hexadecimal bytes really aren't very useful themselves, either; that's where assembly language comes in. The instructions on the far right are in assembly language. Assembly language is really just a collection of mnemonics for the corresponding machine language instructions. The instruction ret is far easier to remember and make sense of than 0xc3 or 11000011.
Unlike C and other compiled languages, assembly language instructions have a direct one-to-one relationship with their corresponding machine language instructions. This means that since every processor architecture has different machine language instructions, each also has a different form of assembly language. Assembly is just a way for programmers to represent the machine language instructions that are given to the processor.
Exactly how these machine language instructions are represented is simply a matter of convention and preference. While you can theoretically create your own x86 assembly language syntax, most people stick with one of the two main types: AT&T syntax and Intel syntax. The assembly shown in the objdump output above is AT&T syntax, as just about all of Linux's disassembly tools use this syntax by default. It's easy to recognize AT&T syntax by the cacophony of % and $ symbols prefixing everything (take another look at the listing above). The same code can be shown in Intel syntax by providing an additional command-line option, -M intel, to objdump (for example, objdump -M intel -D a.out).
