chapter1:
As a computer processor chip runs, it reads instruction codes that are stored in memory.
The instruction pointer is used to help the processor keep track of which instruction codes have already been processed and what code is next in line to be processed. Of course, there are special instruction codes that can change the location of the instruction pointer, such as jumping to a specific location in the program.
Similarly, a data pointer is used to help the processor keep track of where the data area in memory starts. This area is called the stack. As new data elements are placed in the stack, the pointer moves “down” in memory. As data is read from the stack, the pointer moves “up” in memory.
The instructions must be placed in memory in the proper format and order for the processor to properly step through the program code.
Every instruction must contain at least 1 byte called the operation code (or opcode for short).
-------
The IA-32 instruction code format consists of four main parts:
❑ Optional instruction prefix
❑ Operational code (opcode)
❑ Optional modifier
❑ Optional data element
-------
Using memory locations:
1) Defining variables in assembly language consists of two parts:
1. A label that points to a memory location
2. A data type and default value for the memory bytes, i.e.:
test:
.long 150
2) Using the stack
The stack is a region of memory reserved at the end of the memory range that the computer reserves for the application. A pointer (called the stack pointer) is used to point to the next memory location in the stack to put or take data.
When calling functions in an assembly language program, you usually place any data elements that you want passed to the function on the top of the stack. When the function is called, it can retrieve the data elements from the stack.
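The push and pop behaviour described above can be modelled in plain C. This is only a sketch under assumptions: the Stack type, stack_init, stack_push, stack_pop, and the 16-slot size are hypothetical names for illustration, and the array index stands in for the stack pointer.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16  /* hypothetical size, for illustration only */

/* A toy stack that grows toward lower indexes, like the x86 stack. */
typedef struct {
    uint32_t mem[STACK_SLOTS];
    int sp;  /* index of the most recently pushed slot; STACK_SLOTS when empty */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_SLOTS; }

/* PUSH: move the pointer "down" first, then store the value. */
void stack_push(Stack *s, uint32_t value) {
    assert(s->sp > 0);            /* refuse to overflow the toy stack */
    s->sp -= 1;
    s->mem[s->sp] = value;
}

/* POP: read the value, then move the pointer back "up". */
uint32_t stack_pop(Stack *s) {
    assert(s->sp < STACK_SLOTS);  /* refuse to pop an empty stack */
    uint32_t value = s->mem[s->sp];
    s->sp += 1;
    return value;
}
```

Values come back in the reverse order they were pushed, which is why a function's parameters and its return address can share one stack.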
The data section is used to declare the memory region where data elements are stored for the program. This section cannot be expanded after the data elements are declared, and it remains static throughout the program.
The bss section is also a static memory section. It contains buffers for data to be declared later in the program. What makes this section special is that the buffer memory area is zero-filled.
chapter2:
The processor is connected to the other parts of the computer by three buses: a control bus, an address bus, and a data bus.
The main components in the processor are as follows:
❑ Control unit
❑ Execution unit
❑ Registers
❑ Flags
The processor retrieves instruction codes from memory based on the
CS register value, and an offset value contained in the EIP instruction pointer register. A program cannot explicitly load or change the CS register. The processor assigns its value as the program is assigned a memory space.
The instruction pointer register (or EIP register), sometimes called the program counter, keeps track of the next instruction code to execute.
An application program cannot directly modify the instruction pointer per se. You cannot specify a memory address and place it in the EIP register. Instead, you must use normal program control instructions, such as jumps, to alter the next instruction to be read into the prefetch cache.
The flags are divided into three groups based on function:
❑ Status flags
❑ Control flags
❑ System flags
The DS, ES, FS, and GS segment registers are all used to point to data segments. By having four separate data segments, the program can help separate data elements, ensuring that they do not overlap. The program must load the data segment registers with the appropriate pointer value for the segments, and reference individual memory locations using an offset value.
AT&T syntax uses ljmp $section, $offset, whereas Intel syntax uses jmp section:offset.
-------
chapter4:
Defining sections:
❑ The data section
❑ The bss section
❑ The text section
The GNU assembler declares sections using the .section declarative statement. The .section statement takes a single argument, the type of section it is declaring.
The bss section should always be placed before the text section, but the data section can be moved to follow the text section.
Defining the starting point:
The _start label is used to indicate the instruction from which the
program should start running. If the linker cannot find this label, it will produce an error message:
Besides declaring the starting label in the application, you also need to make the entry point available for external applications. This is done with the .globl directive.
In order to debug the assembly language program, you must first reassemble the source code using the
-gstabs parameter:
The format of the break command is: break *label+offset
chapter5:
.data:
.octa 16 byte
.quad 8 byte
arrays:
length:
.int 62, 35, 47
constants: .equ LINUX_SYS_CALL, 0x80
To reference the static data element, you must use a dollar sign before the label name. For example, the instruction
movl $LINUX_SYS_CALL, %eax
moves the value assigned to the LINUX_SYS_CALL symbol to the EAX register.
.bss:
.comm .lcomm
example: .comm symbol, length
One benefit to declaring data in the bss section is that the data is not included in the executable program. When data is defined in the data section, it must be included in the executable program, since it must be initialized with a specific value. Because the data areas declared in the bss section are not initialized with program data, the memory areas are reserved at runtime, and do not have to be included in the final program.
The .fill directive enables the assembler to automatically create the data elements for you.
indexed memory mode:
base_address(offset_address, index, size)
The offset_address and index value must be registers, but the size value can be a numerical value.
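The address that this expression selects is base_address + offset_address + index * size. A minimal C sketch of that arithmetic follows; effective_address is a hypothetical helper name, and the assert reflects the fact that the scale factor can only be 1, 2, 4, or 8.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for base_address(offset_address, index, size):
   base_address + offset_address + index * size. */
uint32_t effective_address(uint32_t base_address, uint32_t offset_address,
                           uint32_t index, uint32_t size) {
    /* the hardware only supports these four scale factors */
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return base_address + offset_address + index * size;
}
```

For an array of 4-byte integers starting at a label, index 2 with size 4 lands 8 bytes past the label, i.e. on the third element.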
While using a label references the data value contained in the memory location, you can get the memory location address of the data value by placing a dollar sign ($) in front of the label in the instruction.
movl value, %ecx
cmp %ebx, %ecx (%ecx - %ebx)
cmova %ecx, %ebx
The conditional move instructions are grouped together in pairs, with two instructions having the same
meaning. For example, a value can be above another value, but it can also be not below or equal to the
value. Both conditions are equivalent, but both have separate conditional move instructions.
The unsigned conditional move instructions rely on the Carry, Zero, and Parity flags to determine the difference between two operands.
The signed conditional move instructions utilize the Sign and Overflow flags to indicate the condition of
the comparison between the operands.
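The difference between the unsigned and signed conditions can be seen by modelling the flags that CMP produces. The sketch below is an illustration only: Flags, cmp_flags, cond_above, and cond_greater are hypothetical names, and only the Carry, Zero, Sign, and Overflow flags are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool cf, zf, sf, of;
} Flags;

/* Flags produced by "cmp src, dst" in AT&T operand order, i.e. by dst - src. */
Flags cmp_flags(uint32_t src, uint32_t dst) {
    uint32_t result = dst - src;
    Flags f;
    f.cf = dst < src;                 /* unsigned borrow */
    f.zf = result == 0;
    f.sf = (result >> 31) != 0;       /* sign bit of the result */
    /* signed overflow: operands differ in sign and the result's sign differs from dst */
    f.of = (((dst ^ src) & (dst ^ result)) >> 31) != 0;
    return f;
}

/* CMOVA / CMOVNBE condition: unsigned "above" (CF=0 and ZF=0). */
bool cond_above(Flags f) { return !f.cf && !f.zf; }

/* CMOVG / CMOVNLE condition: signed "greater" (ZF=0 and SF=OF). */
bool cond_greater(Flags f) { return !f.zf && f.sf == f.of; }
```

With dst = 0xFFFFFFFF and src = 1, the "above" condition holds (a large unsigned value) while "greater" does not (the same bits read as -1), so CMOVA would move and CMOVG would not.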
XCHG Exchanges the values of two registers, or a register and a memory location
BSWAP Reverses the byte order in a 32-bit register
XADD Exchanges two values and stores the sum in the destination operand
CMPXCHG Compares a value with an external value and exchanges it with another
CMPXCHG8B Compares two 64-bit values and exchanges it with another
xchg operand1, operand2
When one of the operands is a memory location, the processor’s LOCK signal is automatically asserted, preventing any other processor from accessing the memory location during the exchange.
Be careful when using the XCHG instruction with memory locations. The LOCK process is very time-consuming, and can be detrimental to your program’s performance.
cmpxchg source, destination
The CMPXCHG instruction compares the destination operand with the value in the EAX, AX, or AL registers.
If the values are equal, the value of the source operand value is loaded into the destination operand. If
the values are not equal, the destination operand value is loaded into the EAX, AX, or AL registers.
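The two outcomes of CMPXCHG can be written out as a small C model. This is a sketch of the data movement only, not of atomicity; cmpxchg32 is a hypothetical name, and the boolean return stands in for the Zero flag, which the real instruction sets when the exchange happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models "cmpxchg source, destination" for 32-bit operands.
   Returns true when the exchange happened (reported in ZF by the real instruction). */
bool cmpxchg32(uint32_t *eax, uint32_t source, uint32_t *destination) {
    assert(eax != 0 && destination != 0);
    if (*destination == *eax) {
        *destination = source;   /* values equal: source is loaded into destination */
        return true;
    }
    *eax = *destination;         /* values differ: destination is loaded into EAX */
    return false;
}
```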
The gas assembler supports the .align directive, which is used to align defined data elements on specific memory boundaries.
chapter6:
Unconditional Branches
three types of unconditional branches:
❑ Jumps
❑ Calls
❑ Interrupts
jmp:
Behind the scenes, the single assembly jump instruction is assembled into one of three different types of jump opcodes:
❑ Short jump
❑ Near jump
❑ Far jump
A short jump is used when the jump offset is less than 128 bytes. A far jump is used in segmented memory models when the jump goes to an instruction in another segment. The near jump is used for all other jumps.
call:
A call is similar to the jump instruction, but it remembers where it
jumped from and has the capability to return there if needed.
The call instruction has two parts. The first part is the actual CALL instruction, which requires a single
operand, the address of the location to jump to:
The second part of the call instruction is the return instruction. This enables the function to return to the
original part of the code, immediately after the CALL instruction. The return instruction has no operands,
just the mnemonic RET. It knows where to return to by looking at the stack.
When the CALL instruction is executed, it places the EIP register onto the stack and then modifies the EIP register to point to the called function address. When the called function is completed, it retrieves the old EIP register value from the stack and returns control back to the original program.
The third type of unconditional branch is the interrupt. An interrupt is a way for the processor to “interrupt” the current instruction code path and switch to a different path. Interrupts come in two varieties:
❑ Software interrupts
❑ Hardware interrupts
Simply using the INT instruction with the 0x80 value transfers control to the
Linux system call program. The Linux system call program has many subfunctions that can be used. The subfunctions are performed based on the value of the EAX register at the time of the interrupt.
CLC Clear the carry flag (set it to zero)
CMC Complement the carry flag (change it to the opposite of what is set)
STC Set the carry flag (set it to one)
Each of these instructions directly modifies the carry flag bit in the EFLAGS register.
The loop instructions use the ECX register as a counter and automatically decrease its value as the loop
instruction is executed. The following table describes the instructions in the loop family.
Instruction Description
LOOP Loop until the ECX register is zero
LOOPE/LOOPZ Loop until either the ECX register is zero, or the ZF flag is not set
LOOPNE/LOOPNZ Loop until either the ECX register is zero, or the ZF flag is set
The format for each of these instructions is
loop address
where address is a label name for a location in the program code to jump to. Unfortunately, the loop
instructions support only an 8-bit offset, so only short jumps can be performed.
An added benefit of the loop instructions is that they decrease the value of the ECX register without
affecting the EFLAGS register flag bits. When the ECX register reaches zero, the Zero flag is not set.
When the LOOP instruction is executed, it first decreases the value in ECX by one,
and then it checks to see whether it is zero. Using this logic, if the value of ECX is already zero before the
LOOP instruction, it will be decreased by one, making it -1. Because this value is not zero, the LOOP
instruction continues on its way, looping back to the defined label.
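The decrement-then-test order is easy to check with a C model. In this sketch loop_taken and loop_body_runs are hypothetical helper names, and a 32-bit unsigned integer plays the role of ECX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One execution of LOOP: decrement ECX first, then jump if it is not zero. */
bool loop_taken(uint32_t *ecx) {
    assert(ecx != 0);
    *ecx -= 1;            /* unsigned wraparound: 0 becomes 0xFFFFFFFF */
    return *ecx != 0;
}

/* How many times a loop body that ends in LOOP runs for a given starting ECX. */
uint64_t loop_body_runs(uint32_t ecx) {
    uint64_t runs = 0;
    do {
        runs += 1;        /* the body executes before LOOP is reached */
    } while (loop_taken(&ecx));
    return runs;
}
```

Starting with ECX = 3 the body runs three times. Starting with ECX = 0, the first LOOP leaves 0xFFFFFFFF in ECX and jumps back, so the body would run 2^32 times rather than not at all.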
chapter 8:
sub source, destination (destination - source, result stored in destination)
chapter11:Using Functions
Three steps are required for creating functions in assembly language programs:
1. Define what input values are required.
2. Define the processes performed on the input values.
3. Define how the output values are produced and passed to the calling program.
Defining input values, three techniques can be employed:
❑ Using registers
❑ Using global variables
❑ Using the stack
.type func1, @function
func1:
Defining output values
❑ Place the result in one or more registers.
❑ Place the result in a global variable memory location.
Most C compilers use a standard method for handling input and output values in assembly language code compiled from C functions. This method works equally well for any assembly language program, even if it wasn’t derived from a C program.
The C solution for passing input values to functions is to use the stack.
Likewise, the C style defines a common method for returning values to the main program, using the EAX register for 32-bit results (such as short integers), the EDX:EAX register pair for 64-bit integer values, and the FPU ST(0) register for floating-point values.
The stack consists of memory locations reserved at the end of the memory area allocated to the program.
When the CALL instruction is executed, it places the return address from the calling program onto the top of the stack as well, so the function knows where to return.
for example:
Function parameter 3
Function parameter 2
Function parameter 1
ESP Return Address
Popping values off of the stack to retrieve the input parameters would cause a problem, as the return address might be lost in the process. Instead, a different method is used to retrieve the input parameters from the stack.
ESP may change in function:
To avoid this problem, it is common practice to copy the ESP register value to the EBP register when entering the function. This ensures that there is a register that always contains the correct pointer to the top of the stack when the function is called. Any data pushed onto the stack during the function would not affect the EBP register value. To avoid corrupting the original EBP register if it is used in the main program, before the ESP register value is copied, the EBP register value is also placed on the stack.
16(%ebp)   Function parameter 3
12(%ebp)   Function parameter 2
8(%ebp)    Function parameter 1
4(%ebp)    Return Address
(%ebp)     Old EBP Value (ESP points here)
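The offsets in this layout can be verified with a toy model of the stack in C. Everything here is a hypothetical illustration: the Machine type, push32, at_ebp, enter_function, and the 64-word memory are assumptions, and the model only mimics what the caller, CALL, and the prologue each push.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64   /* hypothetical toy memory size */

typedef struct {
    uint32_t mem[MEM_WORDS];  /* one 32-bit word per slot */
    uint32_t esp;             /* byte offset into mem, always a multiple of 4 */
    uint32_t ebp;
} Machine;

void push32(Machine *m, uint32_t value) {
    assert(m->esp >= 4);      /* refuse to run off the bottom of the toy memory */
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

/* Read the 32-bit word at offset(%ebp). */
uint32_t at_ebp(const Machine *m, int32_t offset) {
    return m->mem[(m->ebp + (uint32_t)offset) / 4];
}

/* The caller pushes the parameters in reverse order, CALL pushes the return
   address, then the prologue saves EBP and copies ESP into it. */
void enter_function(Machine *m, uint32_t p1, uint32_t p2, uint32_t p3,
                    uint32_t return_address) {
    push32(m, p3);
    push32(m, p2);
    push32(m, p1);
    push32(m, return_address);  /* done by CALL */
    push32(m, m->ebp);          /* pushl %ebp */
    m->ebp = m->esp;            /* movl %esp, %ebp */
}
```

After enter_function, the first parameter is found at 8(%ebp) no matter how many more values the function pushes, because only ESP moves.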
standard c function:
function:
pushl %ebp
movl %esp, %ebp
.
.
movl %ebp, %esp
popl %ebp
ret
The function prologue code now must include one additional line to reserve the space for the local variables by moving the stack pointer down. You must remember to reserve enough space for all of the local variables needed in the function. The new prologue would look like this:
function:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
.
.
The virtual memory addresses assigned to programs running in Linux start at address 0x8048000 and end at address 0xbfffffff. You would expect the stack pointer to be set to 0xbfffffff each time a program starts, but this is not the case. Before the program loads, Linux places a few things into the stack, which is where the command-line parameters come in.
Linux places four types of information into the program stack when the program starts:
❑ The number of command-line parameters (including the program name)
❑ The name of the program as executed from the shell prompt
❑ Any command-line parameters included on the command line
❑ All the current Linux environment variables at the start of the program
chapter12:
The kernel is primarily responsible for four things:
❑ Memory management
❑ Device management
❑ File system management
❑ Process management
The memory locations are grouped into blocks called pages. Each page of memory is located either in the physical memory or the swap space. The kernel must maintain a table of the memory pages that indicates which pages are where.
cat /proc/meminfo
ipcs -m
There are three different classifications of device files:
❑ Character
❑ Block
❑ Network
Device files are created in the file system as nodes. Each node has a unique number pair that identifies it to the Linux kernel. The number pair includes a major and minor device number.
The Linux kernel interfaces with each file system using the Virtual File System (VFS). This provides a standard interface for the kernel to communicate with any type of file system. VFS caches information in memory as each file system is mounted and used.
The kernel provides system calls to help manage and access files on each of the different file systems using VFS. A single system call can be used to access files on any file system type.
system call number: %eax
/usr/include/asm/unistd_32.h
System calls require that input values be placed in registers. There is a specific order in which each input value is placed in the registers. Placing the wrong input value in the wrong register can produce catastrophic results.
The EIP and ESP registers cannot be used to hold input values, as that would adversely affect the program operation. The register order used is EBX, ECX, EDX, ESI, EDI, and EBP. This enables up to six input values to be used; EBP is only needed for a sixth value, and most system calls take fewer. The output value is returned in the EAX register.
The order in which the system calls expect input values is as follows:
❑ EBX (first parameter)
❑ ECX (second parameter)
❑ EDX (third parameter)
❑ ESI (fourth parameter)
❑ EDI (fifth parameter)
❑ EBP (sixth parameter)
System calls that require more than six input parameters use a different method of passing the parameters to the system call. The EBX register is used to contain a pointer to the memory location of the input parameters, stored in sequential order. The system call uses the pointer to access the memory location to read the parameters.
The return value from a system call is placed in the EAX register. It is your job to check the value in the EAX register, especially for failure conditions.
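The failure check can be sketched in C on Linux. At the int $0x80 level the kernel reports failure as a small negative number in EAX; the C library's syscall() wrapper turns that into -1 plus errno, so the hypothetical helper raw_write below converts it back. The sketch assumes Linux with glibc and is not portable elsewhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* makes unistd.h declare syscall() in strict modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue the write system call through the generic syscall() wrapper.
   Returns the byte count on success or a negative errno value on failure,
   which mirrors what the kernel leaves in EAX after the interrupt. */
long raw_write(int fd, const void *buf, size_t count) {
    long ret = syscall(SYS_write, fd, buf, count);
    if (ret == -1) {
        return -(long)errno;  /* the wrapper moved the error code into errno */
    }
    return ret;
}
```

Writing to an invalid file descriptor, for example raw_write(-1, "x", 1), yields -EBADF, the kind of value an assembly program would have to test for in EAX.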
using structure:
.section .data
result:
type1:
.int 0
Now you can use result to reference the structure and $result to reference its address.
chapter13:
1) basic format:
asm( "assembly code" );
The basic inline assembly code can utilize global C variables defined in the application.
The volatile modifier can be placed in the asm statement to indicate that no optimization is desired
on that section of code.
asm ("assembly code" : output locations : input operands : changed registers);
The format of the input and output values list is "constraint"(variable), where variable is a C variable declared within the program. In the extended asm format, both local and global variables can be used. The constraint defines where the variable is placed (for input values) or moved from (for output values). This is what defines whether the value is placed in a register or a memory location.
The constraint is a single-character code. The constraint codes are shown in the following table.
Constraint Description
a Use the %eax, %ax, or %al registers.
b Use the %ebx, %bx, or %bl registers.
c Use the %ecx, %cx, or %cl registers.
d Use the %edx, %dx, or %dl registers.
S Use the %esi or %si registers.
D Use the %edi or %di registers.
r Use any available general-purpose register.
q Use either the %eax, %ebx, %ecx, or %edx register.
A Use the %eax and the %edx registers for a 64-bit value.
f Use a floating-point register.
t Use the first (top) floating-point register.
u Use the second floating-point register.
m Use the variable’s memory location.
o Use an offset memory location.
V Use only a direct memory location.
i Use an immediate integer value.
n Use an immediate integer value with a known value.
g Use any register or memory location available.
In addition to these constraints, output values include a constraint modifier, which indicates how the output value is handled by the compiler. The output modifiers that can be used are shown in the following table.
Output Modifier Description
+ The operand can be both read from and written to.
= The operand can only be written to.
% The operand can be switched with the next operand if necessary.
& The operand can be deleted and reused before the inline functions complete.
In extended asm format, to reference a register in the assembly code you must use two percent signs instead of just one (%%).
Referencing placeholders:
If an input and output value in the inline assembly code share the same C variable from the program,
you can specify that using the placeholders as the constraint value.
asm ("imull %1, %0"
: "=r"(data2)
: "r"(data1), "0"(data2));
The 0 tag signals the compiler to use the first named register for the output value data2. The first named
register is defined in the second line, which assigns a register to the data2 input variable. This ensures
that the same register will be used to hold the input and output values. Of course, the result will be placed
in the data2 value when the inline code is complete.
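A complete function built around this statement might look like the sketch below. multiply_in_place is a hypothetical name; the code uses the __asm__ spelling, which GCC accepts as an alternative to asm even in strict ISO modes, and it falls back to plain C when not compiled for x86 with a GCC-compatible compiler.

```c
#include <assert.h>

/* Multiply data2 by data1 in place, using the "0" placeholder so the
   input and output for data2 share one register. */
int multiply_in_place(int data1, int data2) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("imull %1, %0"
             : "=r"(data2)
             : "r"(data1), "0"(data2));
#else
    data2 *= data1;   /* portable fallback for non-x86 targets */
#endif
    return data2;
}
```

Without the "0" constraint the compiler would be free to pick a different register for the output, and the multiply would start from whatever happened to be in it.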
Alternative placeholders:
The alternative name is defined within the sections in which the input and output values are declared.
The format is as follows:
[name] "constraint"(variable)
and the value is then referenced in the assembly code as %[name].
The name value defined becomes the new placeholder identifier for the variable in the inline assembly
code, as shown in the following example:
asm ("imull %[value1], %[value2]"
: [value2] "=r"(data2)
: [value1] "r"(data1), "0"(data2));
Handling jumps:
Both conditional and unconditional branches allow you to specify a number as a label, along with a directional flag to indicate which way the processor should look for the numerical label. The first occurrence of the label found will be taken.
The labels have been replaced with 0: and 1:. The JGE and JMP instructions use the f modifier to indicate the label is forward from the jump instruction. To move backward, you must use the b modifier.
other:
ANDing with $-16 (= 0xFFFFFFF0) rounds the stack pointer down to a
multiple of 16. It ensures that data pushed onto the stack will be aligned
on a 16-byte boundary.
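The same masking can be tried in C. This is a minimal sketch, with align_down_16 as a hypothetical helper name and a 32-bit unsigned integer standing in for ESP.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent of "andl $-16, %esp": clear the low four bits. */
uint32_t align_down_16(uint32_t esp) {
    uint32_t aligned = esp & (uint32_t)-16;   /* (uint32_t)-16 == 0xFFFFFFF0 */
    assert((aligned & 15u) == 0 && aligned <= esp);
    return aligned;
}
```

Because the stack grows toward lower addresses, rounding down never moves ESP back over data that is already on the stack.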