Category: LINUX

2009-06-26 12:35:27


This article is a 'quick-n-dirty' introduction to the AT&T assembly language syntax, as implemented in the GNU Assembler, as(1). For the first-timer the AT&T syntax may seem a bit confusing, but if you have any kind of assembly language programming background, it's easy to catch up once you have a few rules in mind. I assume you have some familiarity with what is commonly referred to as the INTEL-syntax for assembly language instructions, as described in the x86 manuals. Due to its simplicity, I use the NASM (Netwide Assembler) variant of the INTEL-syntax to cite differences between the formats.

The GNU assembler is a part of the GNU Binary Utilities (binutils), and a back-end to the GNU Compiler Collection. Although as is not the preferred assembler for writing reasonably big assembler programs, it's a vital part of contemporary Unix-like systems, especially for kernel-level hacking. Often criticised for its cryptic AT&T-style syntax, as is said to have been written with an emphasis on being used as a back-end to GCC, with little concern for "developer-friendliness". If you are an assembler programmer hailing from an INTEL-syntax background, you'll feel somewhat stifled when it comes to reading and writing code. Nevertheless, it must be stated that many operating systems' code bases depend on as as the assembler for generating low-level code.

The Basic Format

The structure of a program in AT&T-syntax is similar to any other assembler-syntax: a series of directives, labels, and instructions, where each instruction is composed of a mnemonic followed by a maximum of three operands. The most prominent difference in the AT&T-syntax stems from the ordering of the operands.

For example, the general format of a basic data movement instruction in INTEL-syntax is,

mnemonic	destination, source

whereas, in the case of AT&T, the general format is

mnemonic	source, destination

To some (including myself), this format is more intuitive. The following sections describe the types of operands to AT&T assembler instructions for the x86 architecture.

Registers

All register names of the IA-32 architecture must be prefixed by a '%' sign, e.g. %al, %bx, %ds, %cr0, etc.

mov	%ax, %bx

The above example is the mov instruction that moves the value from the 16-bit register AX to 16-bit register BX.

Literal Values

All literal values must be prefixed by a '$' sign. For example,


mov	$100, %bx
mov	$'A', %al

The first instruction moves the value 100 into the register BX and the second one moves the numerical value of the ASCII character A into the AL register. To make things clearer, note that the below example is not a valid instruction,

mov	%bx,	$100

as it just tries to move the value in register bx to a literal value. It just doesn't make any sense.
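As a rough sketch of how the register and literal operands above fit into a complete program, the following assumes a 32-bit x86 Linux target, where exiting through int $0x80 with EAX = 1 and the status in EBX is the usual convention. The file name regs.s and the build commands in the comments are assumptions for illustration only.

```asm
# regs.s (hypothetical file name). Assumes 32-bit x86 Linux.
# Assumed build: as --32 regs.s -o regs.o && ld -m elf_i386 regs.o -o regs
        .text                   # directive: switch to the code section
        .globl  _start          # directive: export the entry-point symbol
_start:                         # label
        movw    $100, %bx       # BX = 100 (literal, hence the '$')
        movb    $'A', %al       # AL = 65, the ASCII code of 'A'
        movw    %bx, %cx        # CX = BX: source first, destination second
        movzbl  %al, %ebx       # EBX = AL zero-extended, used as the exit status
        movl    $1, %eax        # EAX = 1, the exit system call number (assumption: Linux i386)
        int     $0x80           # exit(EBX)
```

Under those assumptions, running the program and then typing echo $? in the shell should print 65, the ASCII code of A.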

Memory Addressing

In the AT&T Syntax, memory is referenced in the following way,

segment-override:signed-offset(base,index,scale)

parts of which can be omitted depending on the address you want.

%es:100(%eax,%ebx,2)

Please note that the offsets and the scale should not be prefixed by '$'. A few more examples with their equivalent NASM-syntax should make things clearer,

GAS memory operand      NASM memory operand
------------------      -------------------

100                     [100]
%es:100                 [es:100]
(%eax)                  [eax]
(%eax,%ebx)             [eax+ebx]
(%ecx,%ebx,2)           [ecx+ebx*2]
(,%ebx,2)               [ebx*2]
-10(%eax)               [eax-10]
%ds:-10(%ebp)           [ds:ebp-10]

Example instructions,

mov	%ax,	100
mov	%eax,	-100(%eax)

The first instruction moves the value in register AX into offset 100 of the data segment (DS is the default segment register), and the second one moves the value in the EAX register to [eax-100].
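The addressing forms above can be tried out on a small array. This is only a sketch under the same assumptions as before (32-bit x86 Linux, exit through int $0x80). The symbol table, the file name mem.s and the build commands are hypothetical names chosen for illustration.

```asm
# mem.s (hypothetical file name). Assumes 32-bit x86 Linux.
# Assumed build: as --32 mem.s -o mem.o && ld -m elf_i386 mem.o -o mem
        .data
table:  .long   10, 20, 30, 40          # four 32-bit values

        .text
        .globl  _start
_start:
        movl    $table, %eax            # EAX = address of table ('$' gives the address itself)
        movl    $2, %ebx                # EBX = element index
        movl    (%eax,%ebx,4), %ecx     # ECX = [eax+ebx*4] = 30
        addl    4(%eax), %ecx           # ECX += [eax+4] = 20, so ECX = 50
        addl    table(,%ebx,4), %ecx    # ECX += [table+ebx*4] = 30, so ECX = 80
        movl    %ecx, %ebx              # exit status = ECX
        movl    $1, %eax                # exit system call number (assumption: Linux i386)
        int     $0x80                   # exit(80)
```

Note that table without a '$' is a memory reference, while $table is the address as a literal value. With the assumptions above, echo $? should print 80.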

Operand Sizes

At times, especially when moving literal values to memory, it becomes necessary to specify the size-of-transfer or the operand-size. For example the instruction,

mov	$10,	100

only specifies that the value 10 is to be moved to the memory offset 100, but not the transfer size. In NASM this is done by adding the casting keyword byte/word/dword etc. to any of the operands. In AT&T syntax, this is done by adding a suffix (b/w/l) to the instruction. For example,

movb	$10,	%es:(%eax)

moves a byte value 10 to the memory location [es:eax], whereas,

movl	$10,	%es:(%eax)

moves a long value (dword) 10 to the same place.

A few more examples,

movl	$100, %ebx
pushl %eax
popw %ax
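To see that the suffix really controls how many bytes are written, here is a minimal sketch, again assuming 32-bit x86 Linux and the int $0x80 exit convention. The symbol buf, the file name size.s and the build commands are hypothetical.

```asm
# size.s (hypothetical file name). Assumes 32-bit x86 Linux.
# Assumed build: as --32 size.s -o size.o && ld -m elf_i386 size.o -o size
        .data
buf:    .long   0xffffffff      # four bytes, all 0xff

        .text
        .globl  _start
_start:
        movw    $0, buf         # 16-bit store: only the two low bytes of buf are cleared
        movzbl  buf+2, %ebx     # EBX = third byte of buf, still 0xff
        movl    $1, %eax        # exit system call number (assumption: Linux i386)
        int     $0x80           # exit(255)
```

With movw the upper two bytes of buf are left untouched, so the exit status should be 255. Changing the store to movl $0, buf clears all four bytes and the status becomes 0.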

Control Transfer Instructions

The jmp, call, ret, etc., instructions transfer the control from one part of a program to another. They can be classified as control transfers to the same code segment (near) or to different code segments (far). The possible types of branch addressing are - relative offset (label), register, memory operand, and segment-offset pointers.

Relative offsets are specified using labels, as shown below.

label1:
.
.
jmp label1

Branch addressing using registers or memory operands must be prefixed by a '*'. To specify a "far" control transfer, an 'l' must be prefixed, as in 'ljmp', 'lcall', etc. For example,

GAS syntax              NASM syntax
==========              ===========

jmp *100                jmp near [100]
call *100               call near [100]
jmp *%eax               jmp near eax
call *%ecx              call near ecx
jmp *(%eax)             jmp near [eax]
call *(%ebx)            call near [ebx]
ljmp *100               jmp far [100]
lcall *100              call far [100]
ljmp *(%eax)            jmp far [eax]
lcall *(%ebx)           call far [ebx]
ret                     retn
lret                    retf
lret $0x100             retf 0x100
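The near forms can be exercised in an ordinary user-space program. The sketch below assumes 32-bit x86 Linux and the int $0x80 exit convention. The names fptr, add_ten and done, the file name branch.s and the build commands are hypothetical. Far transfers are left out, since they depend on the segment setup of the environment.

```asm
# branch.s (hypothetical file name). Assumes 32-bit x86 Linux.
# Assumed build: as --32 branch.s -o branch.o && ld -m elf_i386 branch.o -o branch
        .data
fptr:   .long   add_ten         # memory operand holding a code address

        .text
        .globl  _start
_start:
        movl    $5, %ebx
        call    add_ten         # direct call through a label: EBX = 15
        movl    $add_ten, %eax  # EAX = address of add_ten
        call    *%eax           # indirect call through a register: EBX = 25
        call    *fptr           # indirect call through memory: EBX = 35
        jmp     done            # relative jump to a label
        movl    $0, %ebx        # never executed
done:
        movl    $1, %eax        # exit system call number (assumption: Linux i386)
        int     $0x80           # exit(35)

add_ten:
        addl    $10, %ebx       # EBX += 10
        ret                     # near return
```

Without the '*', call fptr would be treated as a direct call to the address of fptr itself, not to the address stored there. With the assumptions above, echo $? should print 35.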

Segment-offset pointers are specified using the following format:

ljmp	$segment, $offset

For example:

ljmp	$0x10, $0x100000

If you keep these few things in mind, you'll catch up real soon. As for more details on the GNU assembler, you could try the documentation.