Category: Architecture Design and Optimization

2014-02-06 23:34:22

The Art of Picking Intel Registers


I wrote this article for an online magazine called Scene Zine. Scene Zine caters to the demoscene, a digital art community dedicated to pushing the limits of computers through a mix of music, art, and computer programming. One category of demoscene productions, the 4K intro, focuses on the final production's raw file size. The goal is to put as much high-quality music, graphics, and animation as possible into only 4096 bytes. Doing this requires highly specialized size-optimization techniques, since 4096 bytes is less space than two pages of typed text or a true-color Windows XP icon. This article discusses some of these techniques.




Some people have commented that they want to see more expert programming articles in Scene Zine. To remedy the situation, this article is for all assembly language programmers out there. It discusses the fine art of picking which registers to use in your code. This information should simplify your coding and help you write smaller routines.


When the engineers at Intel designed the original 8086 processor, they had a special purpose in mind for each register. As they designed the instruction set, they created many optimizations and special instructions based on the function they expected each register to perform. Using registers according to Intel's original plan allows the code to take full advantage of these optimizations. Unfortunately, this seems to be a lost art. Few coders are aware of Intel's overall design, and most compilers are either too simplistic or too focused on execution speed to use the registers properly. Understanding how the registers and instruction set fit together, however, is an important step on the road to effortless size-coding.


Using the registers consistently has other advantages besides size optimization. Like using good variable names, using consistent registers makes code more readable. When they are used properly, the registers have meanings almost as clear as the loop counter, i, in higher-level languages. In fact, I occasionally name my variables in C after x86 registers because the register names are so descriptive. With proper register use, x86 assembler can be almost as self-documenting as a high-level language.


Another benefit that consistent register use brings is better compression. In productions which use a compressor to pack the final build, such as 4K intros, creating more redundant code leads to smaller packed sizes. When code uses registers consistently, the same instruction sequences begin to appear over and over. This, in turn, improves the compression ratio.


As a review, all 32-bit x86-family CPUs have eight general-purpose registers. In 32-bit mode the registers are 32 bits wide, although their 16-bit versions are also accessible with a special one-byte instruction prefix. In 16-bit mode, the situation is reversed. The lower 16 bits are accessible by default, and the full registers are accessible only with a prefix byte.
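
As a rough illustration of the prefix cost, here is the same register-to-register move at both widths. This is only a sketch: it assumes NASM syntax and 32-bit code (the `bits 32` line), and the byte values in the comments are the encodings I would expect, so check them against your assembler's listing file.

```nasm
bits 32                         ; assumption: assembling 32-bit code

        mov     eax, ebx        ; 89 D8    - 2 bytes, default 32-bit operand size
        mov     ax, bx          ; 66 89 D8 - 3 bytes, 66h operand-size prefix added
```

Under `bits 16` the roles swap: the 16-bit move is the two-byte form and the 32-bit move picks up the prefix.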


Each register name is really an acronym. This is true even for the "alphabetical" registers EAX, EBX, ECX, and EDX. The following list shows the register names and their meanings:


EAX - Accumulator Register
EBX - Base Register
ECX - Counter Register
EDX - Data Register
ESI - Source Index
EDI - Destination Index
EBP - Base Pointer
ESP - Stack Pointer


In addition to the full-sized general registers, the x86 processor also has eight byte-sized registers. Since these registers map directly into EAX, EBX, ECX, and EDX, most people view them as parts of the larger registers. From the instruction set point of view, however, the 8-bit registers are separate entities. For example, the CL and CH registers share none of the ECX register's useful properties. Except for AL and AH, none of the 8-bit registers have any special significance in the instruction set, so this article does not mention them.


EAX: The Accumulator


There are three major processor architectures: register, stack, and accumulator. In a register architecture, operations such as addition or subtraction can occur between any two arbitrary registers. In a stack architecture, operations occur between the top of the stack and other items on the stack. In an accumulator architecture, the processor has a single calculation register called the accumulator. All calculations occur in the accumulator, and the other registers act as simple data storage locations.


Obviously, the x86 processor does not have an accumulator architecture. It does, however, have an accumulator-like register: EAX / AL. Although most calculations can occur between any two registers, the instruction set gives the accumulator special preference as a calculation register. For example, all nine basic operations (ADD, ADC, AND, CMP, OR, SBB, SUB, TEST, and XOR) have special one-byte opcodes for operations between the accumulator and a constant. Specialized operations, such as the one-operand forms of multiplication and division, CBW/CWDE sign extension, and BCD correction, can only occur in the accumulator.
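
To make the size difference concrete, here is a small sketch comparing the accumulator short forms with the generic forms. It assumes NASM syntax and 32-bit code; the constants are arbitrary, and the encodings in the comments are what I would expect an assembler to emit rather than verified output.

```nasm
bits 32                             ; assumption: assembling 32-bit code

        cmp     al, 10              ; 3C 0A             - 2 bytes, accumulator short form
        cmp     bl, 10              ; 80 FB 0A          - 3 bytes, generic form with a ModRM byte

        test    eax, 0x00FF0000     ; A9 00 00 FF 00    - 5 bytes, accumulator short form
        test    ebx, 0x00FF0000     ; F7 C3 00 00 FF 00 - 6 bytes, generic form

        add     eax, 1              ; 83 C0 01          - 3 bytes, sign-extended 8-bit constant
        add     ebx, 1              ; 83 C3 01          - 3 bytes, same size as above
```

The last pair is a caveat worth knowing: when a 32-bit operation uses a constant that fits in a signed byte, every register gets a three-byte form, so the accumulator saving only shows up for byte-sized operations and for full-width constants.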


Since most calculations occur in the accumulator, the x86 architecture contains many optimized instructions for moving data in and out of this register. To start, the processor has eight one-byte XCHG opcodes (0x90 through 0x97, covering both the 16-bit and 32-bit registers depending on operand size) for swapping data between the accumulator and any other general-purpose register. These aren't terribly useful, but they show how strongly the Intel engineers preferred the accumulator over the other registers. For them, it was better to swap data into the accumulator than to work with it where it was. Other instructions that implicitly use the accumulator are LODS, STOS, IN, OUT, SCAS, and XLAT. Finally, the MOV instruction has a special one-byte opcode for moving data into the accumulator from a constant memory location.
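
The following sketch shows these shortcuts next to their general-purpose equivalents. It assumes NASM syntax and 32-bit code, and `value` is a hypothetical variable that exists only so the fragment assembles. The fragment is meant to be read for its encodings, not executed as-is.

```nasm
bits 32                         ; assumption: assembling 32-bit code

        xchg    eax, ebx        ; 93      - 1 byte, accumulator short form
        xchg    ecx, ebx        ; 87 /r   - 2 bytes, opcode plus ModRM byte
        mov     eax, ebx        ; 89 D8   - 2 bytes

        mov     eax, [value]    ; A1 + 4-byte address    - 5 bytes
        mov     ebx, [value]    ; 8B 1D + 4-byte address - 6 bytes

value:  dd      0               ; hypothetical variable, not part of the original article
```

One practical consequence: if the old contents of EAX are no longer needed, `xchg eax, ebx` can stand in for `mov eax, ebx` and save a byte.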


In your code, try to perform as much work in the accumulator as possible. As you will see, the remaining seven general-purpose registers exist primarily to support the calculation occurring in the accumulator.


EDX: The Data Register


Of the seven remaining general-purpose registers, the data register, EDX, is most closely tied to the accumulator. Instructions that deal with oversized data items, such as multiplication, division, CWD, and CDQ, store the most significant bits in the data register and the least significant bits in the accumulator. In a sense, the data register extends the accumulator to 64 bits (EDX:EAX). The data register also plays a part in I/O instructions. In this case, the accumulator holds the data read from or written to the port, and the data register holds the port address.
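
Here is a short sketch of the EDX:EAX pairing in action. It assumes NASM syntax and 32-bit code, and the numbers are arbitrary values chosen so the product does not fit in 32 bits.

```nasm
bits 32                         ; assumption: assembling 32-bit code

        mov     eax, 100000     ; first factor goes in the accumulator
        mov     ebx, 100000     ; second factor (any register or memory operand)
        mul     ebx             ; EDX:EAX = EAX * EBX (unsigned)
                                ; 10,000,000,000 = 0x2540BE400, so EDX = 2, EAX = 0x540BE400

        mov     ebx, 1000
        div     ebx             ; EAX = EDX:EAX / EBX = 10,000,000, EDX = remainder = 0

        mov     eax, -5
        cdq                     ; sign-extend EAX into EDX, so EDX = 0xFFFFFFFF
        mov     ebx, 2
        idiv    ebx             ; signed divide: EAX = -2, EDX = -1
```

Note that DIV and IDIV always read EDX as the high half of the dividend, which is why the CDQ (or an `xor edx, edx` for unsigned values) before a 32-bit divide is not optional.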


In your code, the data register is most useful for storing data related to the accumulator's calculation. In my experience, most calculations need only these two registers for storage if they are written properly.


ECX: The Count Register


The count register, ECX, is the x86 equivalent of the ubiquitous variable i. Every counting-related instruction in the x86 uses ECX. The most obvious counting instructions are LOOP, LOOPZ, and LOOPNZ. Another counter-based instruction is JCXZ (JECXZ in 32-bit code), which, as the name implies, jumps when the counter is 0. The count register also appears in some bit-shift operations, where its low byte, CL, holds the number of shifts to perform. Finally, the count register controls the string instructions through the REP, REPE, and REPNE prefixes. In this case, the count register determines the maximum number of times the operation will repeat.


Particularly in demos, most calculations occur in a loop. In these situations, ECX is the logical choice for the loop counter, since no other register has so many branching operations built around it. The only problem is that this register counts downward instead of up as in high-level languages. Designing a downward-counting loop is not hard, however, so this is only a minor difficulty.
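
As a sketch of what the LOOP instruction buys you, compare a loop counted in ECX with the same loop counted in another register. This assumes NASM syntax and 32-bit code; the labels and the iteration count are made up for the example.

```nasm
bits 32                         ; assumption: assembling 32-bit code

        mov     ecx, 10         ; hypothetical iteration count
count_down:
        ; loop body goes here; ECX runs 10, 9, ..., 1
        loop    count_down      ; E2 xx - 2 bytes: decrement ECX, jump if ECX != 0

        mov     edx, 10         ; the same loop with a different counter register
count_down2:
        ; loop body goes here
        dec     edx             ; 4A    - 1 byte
        jnz     count_down2     ; 75 xx - 2 bytes, 3 bytes in total
```

One thing to watch: LOOP decrements before it tests, so entering the loop with ECX equal to 0 wraps around and runs it 2^32 times. If the count can be zero, a JECXZ in front of the loop skips it.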


EDI: The Destination Index


Every loop that generates data must store the result in memory, and doing so requires a moving pointer. The destination index, EDI, is that pointer. The destination index holds the implied write address of all string operations. The most useful string instruction, remarkably enough, is the seldom-used STOS. STOS copies data from the accumulator into memory and increments the destination index. This one-byte instruction is perfect, since the final result of any calculation should be in the accumulator anyhow, and storing results in a moving memory address is a common task.
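
To show where the savings come from, here is a sketch of the same store written with an ordinary pointer register and with the destination index. It assumes NASM syntax, 32-bit code, and a flat memory model; `buffer` is a hypothetical 64-dword array that exists only so the fragment assembles.

```nasm
bits 32                         ; assumption: assembling 32-bit code

        ; Explicit store through an arbitrary pointer register:
        mov     [ebx], eax      ; 89 03    - 2 bytes
        add     ebx, 4          ; 83 C3 04 - 3 bytes

        ; The same store through the destination index:
        stosd                   ; AB       - 1 byte: [EDI] = EAX, then EDI += 4 (when DF = 0)

        ; Filling a whole buffer needs only the count register and a prefix:
        mov     edi, buffer     ; EDI = write pointer
        mov     ecx, 64         ; ECX = number of dwords to store
        xor     eax, eax        ; EAX = value to store
        cld                     ; clear the direction flag so EDI counts upward
        rep stosd               ; F3 AB    - store EAX to [EDI], ECX times

buffer: times 64 dd 0           ; hypothetical buffer, not part of the original article
```

The direction flag matters here: with DF set, STOS decrements EDI instead of incrementing it, so code that cannot trust the flag's state should issue CLD first.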


Many coders treat the destination index as no more than extra storage space. This is a mistake. All routines must store data, and some register must serve as the storage pointer. Since the destination index is designed for this job, using it for extra storage is a waste. Use the stack or some other register for storage, and use EDI as your global write pointer.


ESI: The Source Index


The source index, ESI, has the same properties as the destination index. The only difference is that the source index is for reading instead of writing. Although all data-processing routines write, not all read, so the source index is not as universally useful. When the time comes to use it, however, the source index is just as powerful as the destination index, and has the same type of instructions.


In situations where your code does not read any sort of data, of course, using the source index for convenient storage space is acceptable.


ESP and EBP: The Stack Pointer and the Base Pointer


Of the eight general-purpose registers, only the stack pointer, ESP, and the base pointer, EBP, are widely used for their original purpose. These two registers are the heart of the x86 function-call mechanism. When a block of code calls a function, it pushes the parameters and the return address on the stack. Once inside, the function sets the base pointer equal to the stack pointer and then places its own internal variables on the stack. From that point on, the function refers to its parameters and variables relative to the base pointer rather than the stack pointer. Why not the stack pointer? For some reason, the stack pointer has lousy addressing modes. In 16-bit mode, it cannot be a square-bracket memory offset at all. In 32-bit mode, it can appear in square brackets only by adding an expensive SIB byte to the opcode.
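
The sequence described above can be sketched roughly as follows. This assumes NASM syntax, 32-bit code, and a caller that pushed one 4-byte parameter before the CALL; the function name and the single local variable are hypothetical.

```nasm
bits 32                             ; assumption: assembling 32-bit code

my_function:
        push    ebp                 ; 55       - save the caller's frame pointer
        mov     ebp, esp            ; 89 E5    - EBP now marks this function's stack frame
        sub     esp, 4              ; 83 EC 04 - reserve one 4-byte local variable

        mov     eax, [ebp+8]        ; 8B 45 08    - 3 bytes: first parameter via EBP
        mov     eax, [esp+12]       ; 8B 44 24 0C - 4 bytes: same parameter via ESP (extra SIB byte)
        mov     [ebp-4], eax        ; 89 45 FC    - store into the local variable

        mov     esp, ebp            ; 89 EC    - discard the local variables
        pop     ebp                 ; 5D       - restore the caller's frame pointer
        ret                         ; C3
```

Besides the extra byte, the ESP-relative offset changes every time the function pushes or pops something, while the EBP-relative offset stays fixed for the life of the frame.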


In your code, there is never a reason to use the stack pointer for anything other than the stack. The base pointer, however, is up for grabs. If your routines pass parameters by register instead of by stack (they should), there is no reason to copy the stack pointer into the base pointer. The base pointer becomes a free register for whatever you need.


EBX: The Base Register


In 16-bit mode, the base register, EBX, acts as a general-purpose pointer. Besides the specialized ESI, EDI, and EBP registers, it is the only general-purpose register that can appear in a square-bracket memory access (for example, MOV [BX], AX). In the 32-bit world, however, any register may serve as a memory offset, so the base register is no longer special.


The base register gets its name from the XLAT instruction. XLAT looks up a value in a table using AL as the index and EBX as the base. XLAT behaves like a hypothetical MOV AL, [BX+AL] (no real addressing mode accepts AL as an index), which is sometimes useful if you need to replace one 8-bit value with another from a table (think of color look-up).
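
Here is a minimal sketch of such a lookup. It assumes NASM syntax, 32-bit code, and a flat memory model; `palette` and its contents are made up, and a real table would normally have 256 entries so that every value of AL is a valid index.

```nasm
bits 32                         ; assumption: assembling 32-bit code

        mov     ebx, palette    ; EBX = base address of the table
        mov     al, 3           ; AL = index into the table
        xlatb                   ; D7 - 1 byte: AL = [EBX + AL], so AL becomes 0x33 here

palette:
        db      0x00, 0x11, 0x22, 0x33  ; hypothetical table contents
```

Without XLAT, the same lookup needs the index zero-extended into a full register first (for example `movzx eax, al` followed by `mov al, [ebx+eax]`), which costs several bytes more.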


So, of all the general-purpose registers, EBX is the only register without an important dedicated purpose. It is a good place to store an extra pointer or calculation step, but not much more.


Conclusion


The eight general-purpose registers in the x86 processor family each have a unique purpose. Each register has special instructions and opcodes which make fulfilling this purpose more convenient or efficient. The registers and their uses are shown briefly below:


EAX - All major calculations take place in EAX, making it similar to a dedicated accumulator register.
EDX - The data register is an extension of the accumulator. It is most useful for storing data related to the accumulator's current calculation.
ECX - Like the variable i in high-level languages, the count register is the universal loop counter.
EDI - Every loop must store its result somewhere, and the destination index points to that place. With a single-byte STOS instruction to write data out of the accumulator, this register makes data operations much more size-efficient.
ESI - In loops that process data, the source index holds the location of the input data stream. Like the destination index, ESI has a convenient one-byte instruction (LODS) for loading data out of memory into the accumulator.
ESP - ESP is the sacred stack pointer. With the important PUSH, POP, CALL, and RET instructions requiring its value, there is never a good reason to use the stack pointer for anything else.
EBP - In functions that store parameters or variables on the stack, the base pointer holds the location of the current stack frame. In other situations, however, EBP is a free data-storage register.
EBX - In 16-bit mode, the base register was useful as a pointer. Now it is completely free for extra storage space.

As an example of how these registers fit together, here is an outline of a typical routine:


                mov     esi, source_address
                mov     edi, destination_address
                mov     ecx, loop_count
my_loop:        lodsd


                ;Do some calculations with eax here.


                stosd
                loop    my_loop


In this example, ECX is the loop counter, ESI points to the input data, and EDI points to the output data. Some calculation, such as a blur, a filter, or perhaps a color look-up, occurs in the loop using EAX as a variable. This example is a bit simplistic, but hopefully it shows the general idea. A real routine would probably deal with much more complicated data than DWORDs, and would probably involve a bunch of floating-point as well.
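
For a slightly more concrete version of the outline, here is a sketch with the calculation filled in. It assumes NASM syntax, 32-bit code, and a flat memory model with writable data; the bit-inversion "effect", the labels, and the sample data are all made up for the example.

```nasm
bits 32                             ; assumption: assembling 32-bit code

        mov     esi, source         ; ESI = read pointer
        mov     edi, destination    ; EDI = write pointer
        mov     ecx, 4              ; ECX = number of dwords to process
        cld                         ; string instructions move upward through memory
invert: lodsd                       ; EAX = [ESI], ESI += 4
        not     eax                 ; the "calculation": invert every bit of the value
        stosd                       ; [EDI] = EAX, EDI += 4
        loop    invert              ; ECX -= 1, repeat while ECX != 0
        ret                         ; return to the (hypothetical) caller

source:         dd 0x00000000, 0x00FF00FF, 0x12345678, 0xFFFFFFFF  ; made-up input
destination:    times 4 dd 0        ; receives FFFFFFFF, FF00FF00, EDCBA987, 00000000
```

The whole loop body, from LODSD to LOOP, comes to six bytes, because every instruction that touches a pointer or the counter uses an implied register instead of encoding one.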


In conclusion, using the registers as Intel intended has several advantages. In the first place, it allows your code to take advantage of many optimizations and special instructions. It also makes the code more readable, since registers perform predictable functions. Finally, using the registers consistently leads to better compression by promoting more repetitive instruction sequences.
