Chinaunix blog
Category: LINUX

2009-05-05 17:29:47

Compiler, Linker, Loader - An Intro To Relocatable Modules


This example originally came from Dr. Funderlic's course pack for CSC244/246 (Operating Systems) in Spring 2001. The original was hard to read, had incomplete tables and was generally confusing. It is reposted here with great care taken to make it understandable.

First, you need to understand why the idea of relocatability is necessary at all. The short answer is that many programs and their data have to share one bank of memory without their addresses colliding. The technical explanation is as follows. Computers have a bank of memory that stores large numbers of programs and data. Traditionally, memory addresses started at 0 and ran up to 655,359, for a total of 655,360 bytes (640 KiloBytes). This was, of course, not the first memory size used in computers. It is simply an early example that makes the idea easier to understand. Modern computers have much larger memories, with addresses running from 0 to somewhere in the vicinity of a GigaByte.

When a program is loaded into this block of memory, there is no telling exactly where its code will go. It is unlikely to be placed at location 0. It will more likely end up somewhere near the top or the middle, depending on the system being used. Because of this uncertainty, we cannot know in advance where our storage space, the functions within our code, or any other data we might want to reference will be located. How do we solve this problem? The answer is relocatable modules.

A relocatable module is one in which every reference to a location within the module is tagged. The tag lets that reference be updated with the new address once the module is loaded into memory. Before we get into the actual compiler, linker, and loader examples, let's take a quick look at something simpler.

Imagine a simple computer with some RAM (it doesn't matter how much). This computer runs only ONE program. It starts executing at address 0 and stops when it runs out of code. Here's a quick example in pseudo assembly (based loosely on 8086 code using MASM), with roughly corresponding C statements listed next to it.

         .code             ;   No C code - code starts here
    MAIN:                  ;   int main() {
          MOV [X], 1       ;       x = 1;
          ADD [X], 20      ;       x = x + 20;
                           ;   }
         .data             ;   No C code - variable declarations
    X    DB        ?       ;   char x;  (a 1-byte integer)
          END MAIN         ;   No C counterpart: marks the end of the
                           ;   source and makes execution start at MAIN


Hopefully this makes some sense. The program creates a variable X in memory and a function that uses this space to do some elementary calculations. To see how this code runs, we need to compile (assemble) it one line at a time. We won't use numeric OPCODES. Instead we'll leave the instruction names as words for readability. The compiled code mixes two kinds of operands:

- constant values, such as 1 and 20;
- memory addresses that refer to some position in memory, such as the address of X.

Remember also that this code starts running at address 0.

ADDRESS   VALUE
0         MOV
1         6                (address of X)
2         1                (constant)
3         ADD
4         6                (address of X)
5         20               (constant)
6         STORAGE FOR X


Notice that the destinations of both the MOV and the ADD operations have been changed to memory address 6. This program's data segment (.data) is appended to the end of its code, as the example shows. The two instructions occupy addresses 0 through 5, at three cells each (opcode, destination, constant), so the storage for X lands at address 6.

Compilers spare us from dealing with this layer of machine code directly. Computers operate on locations in memory, so every program must eventually be broken down into something nearly unreadable, in which every operation is performed on a particular piece of memory.

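The compile step above can be sketched as a tiny Python program. This is a minimal illustration under assumptions that are not in the original:

- each instruction is encoded as three cells (opcode, destination address, constant), as in the table;
- uninitialised storage is represented by `None`;
- the names `compile_program` and `run` are hypothetical.

```python
# Hypothetical sketch: "compile" the single-program example into memory cells
# and run it on a toy machine that starts executing at address 0.
PROGRAM = [("MOV", "X", 1), ("ADD", "X", 20)]   # (opcode, destination, constant)
DATA = ["X"]                                    # variables in the .data segment


def compile_program(program, data):
    """Lay out code at address 0 and append one storage cell per variable."""
    code_len = 3 * len(program)                 # 3 cells per instruction here
    address = {name: code_len + i for i, name in enumerate(data)}
    memory = []
    for op, dest, const in program:
        memory += [op, address[dest], const]    # symbolic X becomes address 6
    memory += [None] * len(data)                # uninitialised storage ("?")
    return memory, address


def run(memory):
    """Execute from address 0 until the next cell is not an opcode."""
    pc = 0
    while pc < len(memory) and memory[pc] in ("MOV", "ADD"):
        op, dest, const = memory[pc:pc + 3]
        if op == "MOV":
            memory[dest] = const
        else:
            memory[dest] += const
        pc += 3
    return memory


if __name__ == "__main__":
    memory, address = compile_program(PROGRAM, DATA)
    print(memory)                       # ['MOV', 6, 1, 'ADD', 6, 20, None]
    print(run(memory)[address["X"]])   # 21
```

The machine stops when it reaches the storage cell at address 6, which matches "stops when it runs out of code".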

Next, we'll look at what happens when we take the same program and move it to a computer that can hold more than one process in memory at a time. Once we allow this, we have to deal with the fact that our program no longer starts at location zero. The loader was invented to remove this burden from the programmer. A loader is simply a program that takes a compiled and linked program (more on linking in a minute) and replaces each of its references to memory with the new address.

Each executable is built as if it will run at position 0 in memory, so 0 is its base address. This is convenient for the loader. It only has to find a free block of memory and add that block's starting address to each relocatable address inside the code. Constants, such as the 1 and the 20 in our example, are left unchanged. In effect, every memory address we used before becomes an offset from the new base address where the program is placed. Here's a new table that starts memory at position 810 instead of 0:

OLD ADDRESS (OFFSET)   NEW ADDRESS   VALUE
0                      810           MOV
1                      811           816             (address of X)
2                      812           1               (constant)
3                      813           ADD
4                      814           816             (address of X)
5                      815           20              (constant)
6                      816           STORAGE FOR X
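A minimal sketch of what the loader does to produce this table, in Python. It assumes something the original does not state: the compiler hands the loader a list of the cells that hold relocatable addresses.

- Cells 1 and 4 hold the address of X, so they are relocated.
- Cells 2 and 5 hold the constants 1 and 20, which must not change.

`load` and `RELOCATABLE` are hypothetical names.

```python
# Hypothetical loader sketch: copy an image to `base` and relocate addresses.
IMAGE = ["MOV", 6, 1, "ADD", 6, 20, None]   # compiled program, based at 0
RELOCATABLE = {1, 4}                        # offsets of cells holding addresses


def load(image, relocatable, base):
    """Return {new_address: value}, adding `base` to every relocatable cell."""
    memory = {}
    for offset, value in enumerate(image):
        if offset in relocatable:
            value = base + value            # New Address = Base Address + Offset
        memory[base + offset] = value       # the cell itself also moves
    return memory


if __name__ == "__main__":
    for address, value in load(IMAGE, RELOCATABLE, 810).items():
        print(address, value)
```

Loading the same image at base 0 reproduces the original table. This is a quick sanity check that relocation only ever adds the base.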


The table shows that the following formula holds:

New Address = Base Address + Offset

The offset is simply the address the item had in the original program. For example, X moves to 810 + 6 = 816.

Now things get a little more complicated. We're going to introduce the idea of modularization: breaking your code into smaller pieces, such as functions or files.



In this part we'll use the examples from the course pack and add a few tables that were left out. For simplicity, every operation takes at most one operand. An instruction therefore occupies two cells (opcode, operand), or one cell in the case of HALT.

- LOAD moves data from a memory address into a register (the AX, or accumulator, register).
- STORE moves data from the AX register into the given memory location.

In "real" assembly language the same thing happens, except that instruction lengths vary instead of being fixed at one opcode plus one operand.

MODULE 1:
     INTERNAL GLOBAL MAIN(), Y
     EXTERNAL REFERENCE S1()
     INTERNAL REFERENCE X
 
     MAIN:   LOAD     X              0        1
             STORE    Y              2        3
             JUMP     S1             4        5
 
     X:      CONSTANT 1              6
     Y:      STORAGE  1              7

 

In the example above, the first three lines declare what kinds of symbols the module contains:

- The first line says that the symbol MAIN (a function) and the variable Y are both defined INTERNALLY. Both are marked GLOBAL, so other modules may use them.
- The second line says that S1 is an external function, which must be defined in another module (here, module 2).
- The third line says that the variable X is internal and must not be used by other modules, hence the lack of the GLOBAL keyword.

The numbers to the right of each instruction are the offsets at which its opcode and its operand are stored. Using this information, we can build the following symbol table:

SYMBOL      INTERNAL ADDRESS     EXTERNAL REFERENCE   IDENTITY
MAIN()             0                     -            INTERNAL GLOBAL
S1()               5                     ?            EXTERNAL
X                  6                     -            INTERNAL
Y                  7                     -            INTERNAL GLOBAL


This table lists every symbol (every non-keyword name) in the file. Its columns mean the following:

- INTERNAL ADDRESS is the offset where the symbol is defined in this file. For an external symbol such as S1, it is the offset of the operand that refers to it.
- EXTERNAL REFERENCE will hold the address of a symbol that lives in another module once it has been resolved.
- IDENTITY records whether the symbol is defined here and whether other modules may use it.

The function S1 is defined outside the current file, and the compiler doesn't know where it is. We therefore mark its external reference with ? for now.
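The symbol table above can be derived mechanically. Below is a minimal Python sketch. The representation is our own assumption, not the course pack's:

- module 1 is hand-encoded as Python data;
- every instruction occupies two cells and each data item one cell;
- for an external symbol, only the offset of its (single) referencing operand is recorded.

`build_symbol_table` is a hypothetical name.

```python
# Hypothetical hand-encoding of MODULE 1 (this representation is illustrative).
DECLS = {"MAIN": "INTERNAL GLOBAL", "Y": "INTERNAL GLOBAL",
         "S1": "EXTERNAL", "X": "INTERNAL"}
CODE = [("MAIN", "LOAD", "X"),      # (label, opcode, operand): 2 cells each
        (None, "STORE", "Y"),
        (None, "JUMP", "S1")]
DATA = ["X", "Y"]                   # X: CONSTANT 1, Y: STORAGE 1 (1 cell each)


def build_symbol_table(decls, code, data):
    """Return {symbol: (internal address, external reference, identity)}."""
    address = {}
    for i, (label, _op, operand) in enumerate(code):
        if label is not None:
            address[label] = 2 * i              # label = offset of its opcode
        if decls.get(operand) == "EXTERNAL":
            address[operand] = 2 * i + 1        # operand cell to patch later
    for j, name in enumerate(data):
        address[name] = 2 * len(code) + j       # data is laid out after code
    return {sym: (addr, "?" if decls[sym] == "EXTERNAL" else "-", decls[sym])
            for sym, addr in address.items()}


if __name__ == "__main__":
    for sym, row in sorted(build_symbol_table(DECLS, CODE, DATA).items()):
        print(sym, *row)
```

A real symbol table would keep a list of every referencing offset for an external symbol. This sketch keeps only the last one, which is enough here because S1 is referenced once.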

If we build an address table as we did in the first section, we can see where all the data is relative to the start of the current file. Remember that each file starts at 0 and continues until it runs out of code. For module 1, the object module would look like this:

ADDRESS   VALUE
0         LOAD
1         6                (address of X)
2         STORE
3         7                (address of Y)
4         JUMP
5         ??               (address of S1, not yet known)
6         STORAGE FOR X
7         STORAGE FOR Y
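Producing this object module can also be sketched in Python. Again, this is an illustration, not the course pack's method. `assemble` is a hypothetical function, and it assumes every instruction has exactly one operand. That holds for module 1, though not for module 2's HALT.

Alongside the cells, `assemble` records two kinds of tags that a relocatable module needs:

- `relocs`: the offsets that hold module-relative addresses;
- `fixups`: the offsets that hold `"??"` placeholders for external symbols.

```python
# Hypothetical sketch: emit module 1's object code with relocation information.
CODE = [("MAIN", "LOAD", "X"), (None, "STORE", "Y"), (None, "JUMP", "S1")]
DATA = [("X", 1), ("Y", None)]     # X: CONSTANT 1; Y: STORAGE 1 (uninitialised)
EXTERNALS = {"S1"}


def assemble(code, data, externals):
    """Return (cells, relocs, fixups), with addresses relative to offset 0.

    Assumes every instruction is one opcode cell plus one operand cell.
    """
    local = {label: 2 * i for i, (label, _, _) in enumerate(code) if label}
    local.update({name: 2 * len(code) + j for j, (name, _) in enumerate(data)})
    cells, relocs, fixups = [], [], {}
    for _label, op, operand in code:
        cells.append(op)
        if operand in externals:
            fixups.setdefault(operand, []).append(len(cells))
            cells.append("??")                  # unknown until link time
        elif operand in local:
            relocs.append(len(cells))           # this cell holds an address
            cells.append(local[operand])
        else:
            cells.append(operand)               # a constant: never relocated
    cells += [value for _name, value in data]   # data follows the code
    return cells, relocs, fixups


if __name__ == "__main__":
    cells, relocs, fixups = assemble(CODE, DATA, EXTERNALS)
    print(cells)    # ['LOAD', 6, 'STORE', 7, 'JUMP', '??', 1, None]
    print(relocs)   # [1, 3]
    print(fixups)   # {'S1': [5]}
```

X's cell holds its constant value 1. Y's storage is uninitialised and shown as `None`.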

 



Now we're going to take module two and do the exact same series of steps on it. First resolve it into a symbol table and then write out the object module address table.

MODULE 2:
    
     INTERNAL GLOBAL S1()
     EXTERNAL REFERENCE Y
     INTERNAL REFERENCE Z
 
     S1:   LOAD     Y              0        1
           ADD      2              2        3
           STORE    Z              4        5
           HALT                    6
 
     Z:    STORAGE  1              7


And here's the symbol table for module 2:

SYMBOL      INTERNAL ADDRESS     EXTERNAL REFERENCE   IDENTITY
 
S1()               0                     -            INTERNAL GLOBAL
Y                  1                     ?            EXTERNAL
Z                  7                     -            INTERNAL


Again, notice that the external reference for the variable Y is a ?. While module 2 is being compiled, we still know nothing about module 1. When we introduce the linker, we'll resolve the question marks.

Lastly, here is the object module (address) table for module 2:


 

ADDRESS   VALUE
0         LOAD
1         ??               (address of Y, not yet known)
2         ADD
3         2                (constant)
4         STORE
5         7                (address of Z)
6         HALT
7         STORAGE FOR Z

 



Now we need to look at how the two object modules are put together to make one relocatable executable file. This extremely important task is done by the linker. Remember the ?? entries in our tables? These unknown items are what the linker has to resolve. The linker works in three steps:

1. It takes two (or more) object modules and places the one containing the MAIN function at the beginning of the program (at position 0).
2. It makes sure every external reference is resolved.
3. It updates the relocatable addresses (the address operands) to their new positions.

In essence, linking copies the object modules one after another into a single file and then touches up the addresses so they are correct. To make the result easier to read, the new module table lists addresses in steps of 2. Remember that our simple machine uses a 1-cell instruction followed by a 1-cell operand.

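ADDRESS   OPERATOR/DATA    OPERAND/DATA
(Module 1 starts here)
0         LOAD             6                (address of X, unchanged)
2         STORE            7                (address of Y, unchanged)
4         JUMP             8                (address of S1, resolved)
6         STORAGE FOR X    STORAGE FOR Y    (addresses 6 and 7)
(Module 2 starts here)
8         LOAD             7                (address of Y, resolved)
10        ADD              2                (constant)
12        STORE            15               (address of Z, relocated from 7)
14        HALT             STORAGE FOR Z    (addresses 14 and 15)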



The first thing you might notice in the table is the set of address operands: 6, 7, 8, 7, and 15. These are the numbers the linker had to determine or check in order to link the modules into one output module.

- The first two (6 and 7) are left alone. Module 1 was placed at address 0, so X and Y are still where they were when the file was originally created.
- The 8 in the JUMP instruction was previously unknown. The linker determined it by looking up S1 in the symbol tables. S1 is at offset 0 of module 2, and module 2 now starts at address 8, right after module 1's eight cells (0 through 7). S1 is therefore at 8 + 0 = 8.
- The operand of module 2's LOAD instruction was also previously unknown. The linker determined it in the same way: Y is at offset 7 of module 1, which starts at 0, giving 7.
- Lastly, the STORE instruction previously used the value 7. The table makes clear that leaving it unchanged would write into the storage for variable Y rather than Z, which is what the program calls for. Adding module 2's starting address gives 8 + 7 = 15, the new location of Z.
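The linking steps just described can be sketched in Python. The object-module format used here is hypothetical and does not come from the course pack:

- `cells` is the module image, with `"??"` where an external address belongs;
- `relocs` lists the cells that hold module-relative addresses;
- `fixups` maps each external symbol to the cells that need its address;
- `exports` lists the GLOBAL symbols.

The sketch makes two passes. The first assigns each module its starting address and collects the global symbols. The second copies the cells and patches the addresses.

```python
# Hypothetical object-module format (illustrative, not from the course pack).
MOD1 = {"cells": ["LOAD", 6, "STORE", 7, "JUMP", "??", 1, None],
        "relocs": [1, 3], "fixups": {"S1": [5]},
        "exports": {"MAIN": 0, "Y": 7}}
MOD2 = {"cells": ["LOAD", "??", "ADD", 2, "STORE", 7, "HALT", None],
        "relocs": [5], "fixups": {"Y": [1]},
        "exports": {"S1": 0}}


def link(modules):
    """Concatenate modules (the first, holding MAIN, at 0) and patch addresses."""
    bases, global_symbols, base = [], {}, 0
    for m in modules:                       # pass 1: bases and global symbols
        bases.append(base)
        for sym, addr in m["exports"].items():
            global_symbols[sym] = base + addr
        base += len(m["cells"])
    image, relocs = [], []
    for m, b in zip(modules, bases):        # pass 2: copy, relocate, resolve
        cells = list(m["cells"])
        for off in m["relocs"]:
            cells[off] += b                 # internal address: add module base
        for sym, offs in m["fixups"].items():
            for off in offs:
                cells[off] = global_symbols[sym]  # external: look it up
        image += cells
        relocs += [b + off for off in m["relocs"]]
        relocs += [b + off for offs in m["fixups"].values() for off in offs]
    return image, sorted(relocs)


if __name__ == "__main__":
    image, relocs = link([MOD1, MOD2])
    print(image)
    print(relocs)   # address cells a loader would still need to relocate
```

An undefined external symbol makes `global_symbols[sym]` raise `KeyError`, which is roughly the point where a linker would report an unresolved reference. The returned `relocs` list is what a loader would need to relocate the linked image again.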



The last thing to cover is the loader. Its function is exactly the same as in our first example, so refer back there for how it works. Following the course pack, we'll load our compiled and linked program into memory at starting address 140 to be executed. Remember that the addresses in the linked table are nothing more than offsets to be added to the base address. Here's what the resulting table looks like:

ADDRESS   OPERATOR/DATA    OPERAND/DATA
140       LOAD             146
142       STORE            147
144       JUMP             148
146       STORAGE FOR X    STORAGE FOR Y    (addresses 146 and 147)
148       LOAD             147
150       ADD              2                (constant, not relocated)
152       STORE            155
154       HALT             STORAGE FOR Z    (addresses 154 and 155)



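As you can see, all the loader did was add the base address (140) to every address, both the location of each cell and every address operand, so that everything is positioned correctly in memory. Constants, such as the 2 in ADD 2, are left unchanged.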
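To check that the loaded program still does what the source says, here is a small Python sketch. It applies the loader step with base 140 and then executes the result. The accumulator-machine semantics are an assumption:

- LOAD and STORE work as described earlier;
- JUMP takes an absolute address;
- ADD adds the constant 2, consistent with the linker and loader leaving that operand unchanged.

The relocation set `RELOCS` and the names `load` and `run` are hypothetical.

```python
# Hypothetical sketch: load the linked image at base 140 and run it on a toy
# accumulator machine (2 cells per instruction, 1 for HALT).
LINKED = ["LOAD", 6, "STORE", 7, "JUMP", 8, 1, None,        # module 1 + X, Y
          "LOAD", 7, "ADD", 2, "STORE", 15, "HALT", None]   # module 2 + Z
RELOCS = {1, 3, 5, 9, 13}     # cells holding addresses; ADD's 2 is a constant


def load(image, relocs, base):
    """Copy the image to `base`, adding `base` to every address operand."""
    return {base + off: (value + base if off in relocs else value)
            for off, value in enumerate(image)}


def run(memory, start):
    """Execute from `start` until HALT; returns the final memory."""
    acc, pc = 0, start                    # acc plays the role of AX
    while True:
        op = memory[pc]
        if op == "HALT":
            return memory
        arg = memory[pc + 1]
        if op == "LOAD":
            acc = memory[arg]             # memory -> accumulator
        elif op == "STORE":
            memory[arg] = acc             # accumulator -> memory
        elif op == "ADD":
            acc += arg                    # assumed: ADD takes a constant
        elif op == "JUMP":
            pc = arg
            continue
        else:
            raise ValueError(f"unknown opcode {op!r} at address {pc}")
        pc += 2


if __name__ == "__main__":
    memory = run(load(LINKED, RELOCS, 140), 140)
    print(memory[147], memory[155])       # Y and Z: 1 3
```

Running it leaves 1 in Y (address 147) and 3 in Z (address 155). Here is the sequence:

1. X is loaded and stored into Y.
2. Control jumps to S1 at 148.
3. S1 adds 2 to Y and stores the sum in Z.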

If you've followed this far and don't have any questions, then you should understand how the compiling, linking, and loading of relocatable modules works!



Hopefully you found this useful. Questions, comments, corrections, etc. can be addressed to me at: gpneujah@eos.ncsu.edu. Thanks for your time.
