2009-04-03 14:53:31
Disassembling and Analyzing Code for ARM Processors
RISC processors are used in many small devices such as PDAs, mobile phones, smart coffee machines and so on. There are many RISC instruction sets, each with its own assembly language, but the most widespread one today is ARM.
I am going to talk about ARM7, since that is the one I have worked with.
Let's begin with the ARM architecture. An ARM processor has a total of 37 registers: 31 general-purpose 32-bit registers and 6 status registers. The set of registers available at any moment depends on the processor state. ARM state executes 32-bit instructions, Thumb state executes 16-bit ones.
In ARM state 18 registers are available: the directly accessible R0-R15, CPSR (Current Program Status Register) and SPSR (Saved Program Status Register). Three of the directly accessible registers have special purposes:
(R13) SP - stack pointer.
(R14) LR - link register, the special register that holds the return address when a procedure is called. I.e. the return address is not pushed onto the stack by the call; it simply stays in the register.
(R15) PC - program counter, a pointer to the current instruction. It can be written with an ordinary MOV, which changes the address of the next instruction to be executed.
In Thumb state 13 registers are available: R0-R7, R13-R15, CPSR, SPSR.
Switching between the states does not change the contents of the registers.
Entry into Thumb state is achieved by executing a BX instruction with the state bit (bit 0) set in the operand register. Entry into ARM state is achieved by executing a BX instruction with the state bit (bit 0) clear in the operand register.
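The state switch performed by BX can be sketched in a few lines of Python. This is only an illustrative model, not processor documentation: the function name `bx_target` is made up for this example, and it assumes that bit 0 of the operand selects the state and is not part of the branch address.

```python
def bx_target(operand):
    """Model BX Rm: return (state, address) for the given register value."""
    # Bit 0 of the operand register selects the state after the branch.
    state = "Thumb" if operand & 1 else "ARM"
    # Bit 0 is only a flag, so it is cleared to get the real target address.
    address = operand & ~1
    return state, address


if __name__ == "__main__":
    for value in (0x8000, 0x8001):
        state, address = bx_target(value)
        print(f"BX with 0x{value:X} -> {state} code at 0x{address:X}")
```

When reading a listing, this is why an odd address loaded into a register just before BX usually means that the target is Thumb code.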
The instruction sets of the two states differ, although many instructions are similar. Thumb instructions are 2 bytes long, ARM instructions are 4 bytes long. A description of the Thumb and ARM instructions can be found here:
What is especially interesting is that many instructions operate on several registers at once. For example:
ADD R3, SP, #4
That maps to
R3:=SP+4
Or, for example, an instruction that stores registers on the stack:
PUSH {R2-R4, R7, LR}
This is not an analogue of pushad in x86 assembler. ARM assembler simply allows an arbitrary list of registers to be pushed onto the stack in this way.
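To make the effect of such a multi-register PUSH concrete, here is a small Python model. It is a sketch under stated assumptions: the stack is full-descending, registers are stored in ascending register-number order with the lowest register at the lowest address, and the helper name `push` and the sample register values are invented for the illustration.

```python
def push(regs, sp, memory, reg_list):
    """Model PUSH {reg_list}: returns the new SP and fills `memory`."""
    order = sorted(reg_list)          # register numbers in ascending order
    sp -= 4 * len(order)              # SP moves down by one word per register
    for i, reg in enumerate(order):
        memory[sp + 4 * i] = regs[reg]  # lowest register at lowest address
    return sp


if __name__ == "__main__":
    LR = 14
    regs = {2: 0x22, 3: 0x33, 4: 0x44, 7: 0x77, LR: 0xBCDFC}  # sample values
    memory = {}
    new_sp = push(regs, 0x1000, memory, [2, 3, 4, 7, LR])  # PUSH {R2-R4, R7, LR}
    print(hex(new_sp))
    for address in sorted(memory):
        print(hex(address), hex(memory[address]))
```

One instruction thus moves SP by 20 bytes and saves five registers, including the return address held in LR.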
Data in memory can be either little endian (as on Intel) or big endian (as on Motorola). So, before investigating the code you need to determine which byte order the data uses.
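The difference is easy to see by decoding the same four bytes both ways. The snippet below is a minimal sketch using Python's standard `struct` module; the byte string is a made-up sample chosen so that the little-endian reading gives an address of the kind that appears in firmware listings.

```python
import struct


def read_word(data, big_endian=False):
    """Decode the first 4 bytes of `data` as an unsigned 32-bit word."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack(fmt, data[:4])[0]


if __name__ == "__main__":
    raw = bytes.fromhex("70419700")   # hypothetical bytes from an image
    print(hex(read_word(raw)))                   # little-endian reading
    print(hex(read_word(raw, big_endian=True)))  # big-endian reading
```

If pointers in the image only make sense under one of the two readings, that is a good hint about the byte order of the device.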
There is a whole pile of compilers and tools for developing programs for ARM:
- GNU compiler, with all that it implies: everything through the command line, debugging through gdb.
- an unpretentious ARM assembler.
- the official ARM development tools. These can only be bought.
- an alternative to IDA for ARM. A 30-day trial version is offered.
Features of the ARM assembly generated by C++ compilers
Naturally, when analyzing firmware you deal not with code written in pure assembler but with code generated by a C++ ARM compiler, and it holds some surprises for those accustomed to x86 assembler.
Function calls
There is no choice of calling conventions (cdecl, stdcall and so on) at all! All functions use a single convention similar to Borland's fastcall: parameters go into registers first, and if there are not enough registers, the remaining parameters are passed via the stack.
For example:
ROM:
ROM:
ROM:
ROM:
The order in which parameters are passed matches the register numbers: R0 is the first, R1 is the second, R2 is the third. That is, for
int memcmp (
const void *buf1,
const void *buf2,
size_t count
);
buf1 = R0
buf2 = R1
count = R2
The value returned by the function is passed back in R0:
ROM:
ROM:
ROM:
ROM:
ROM:
ROM:
Here is a call that passes a parameter via the stack:
ROM:000BCDEC MOV R2, #0
ROM:000BCDEE STR R2, [SP]
ROM:000BCDF0 MOV R2, #128
ROM:000BCDF2 MOV R3, #128
ROM:000BCDF4 MOV R1, #14
ROM:000BCDF6 MOV R0, #0
ROM:000BCDF8 BL FillBoxColor
So R0-R3 hold the coordinates, and the fifth parameter (the color) is stored on the stack.
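The rule "first four arguments in R0-R3, the rest on the stack" can be written down as a tiny Python helper. This is a simplified sketch: it assumes every argument fits in one 32-bit word, and the name `place_arguments` is invented for the example.

```python
def place_arguments(args):
    """Split a call's arguments into register and stack parts."""
    # The first four arguments go to R0..R3 in order.
    registers = {f"R{i}": value for i, value in enumerate(args[:4])}
    # Anything beyond the fourth argument is stored on the stack,
    # the fifth argument at [SP], the sixth at [SP+4], and so on.
    stack = list(args[4:])
    return registers, stack


if __name__ == "__main__":
    # Values taken from the FillBoxColor call above: 0, 14, 128, 128, color 0.
    registers, stack = place_arguments([0, 14, 128, 128, 0])
    print(registers)
    print(stack)
```

Applied to the listing above, the helper puts the four coordinates in R0-R3 and leaves the single value 0 (the color) for [SP], matching the STR R2, [SP] instruction.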
The number of arguments can be determined only analytically, i.e. we have to analyze both the function call and the function's prologue. Part of the information can be inferred from which registers the function saves on the stack at its start. For example, in Thumb state the processor works mainly with registers R0-R7 and the special-purpose ones. So, on seeing a function that begins with
ROM:00059ADA getTextBounds
ROM:00059ADA PUSH {R4-R7, LR},
you can assume that it receives its arguments via R0, R1, R2, R3 and the stack, since R4-R7 are saved before the function uses them for its own purposes. Then, at a call site:
ROM:0005924E ADD R0, SP, #0x14
ROM:00059250 ADD R1, SP, #0x
ROM:00059252 ADD R2, SP, #0x68
ROM:00059254 ADD R3, SP, #0x64
ROM:00059256 BL getTextBounds
we see that only R0-R3 are used. That means that 4 parameters are passed.
Transitions
As usual, transitions (jumps) can be conditional or unconditional. The target can be relative or taken from a register; register jumps are often used to switch between Thumb and ARM state. Short unconditional jumps are implemented with the B (Branch) instruction, long ones with the register jump BX (Branch with Exchange). Function calls are performed via BL (Branch with Link), i.e. a jump that stores the return address in LR. It is also possible to change the execution address by writing to the PC register:
ADD PC, #0x64
C compilers rarely do this, though. They write to PC only when implementing branches (switch statements).
Branches
Better known as switch statements. They are implemented in a rather unusual way:
ROM:0027806E CMP R2, #0x4D; 'M'
ROM:00278070 BCS loc_
ROM:00278072 ADR R3, word_
ROM:00278074 ADD R3, R3, R2
ROM:00278076 LDRH R3, [R3, R2]
ROM:00278078 ADD PC, R3
ROM:
ROM:
ROM:
ROM:
ROM:
ROM:
First the case number is checked: it must be less than 0x4D. If it is 0x4D or higher, control goes to the default case, i.e. to loc_
Further the address of branches table word_
0x278078 (current value PC) +0xAA (offset from the table) + 0x4 (!!!) = 0x278126.
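The extra 4 comes from the way ARM processors handle PC: because of instruction prefetch, a Thumb instruction that reads PC sees the address of the current instruction plus 4. (The phrase "to ensure it is word aligned" in the documentation refers to a related detail: in some PC-relative instructions bit 1 of that value is additionally forced to 0.)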
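The arithmetic of the jump can be checked with a short Python function. It is only a sketch of the calculation described above; the name `switch_target` is invented, and it assumes Thumb state, where PC reads as the address of the current instruction plus 4.

```python
THUMB_PC_BIAS = 4  # in Thumb state PC reads as current instruction + 4


def switch_target(add_pc_address, table_offset):
    """Address reached by ADD PC, R3 located at `add_pc_address`."""
    return add_pc_address + THUMB_PC_BIAS + table_offset


if __name__ == "__main__":
    # Values from the text: ADD PC, R3 at 0x278078, table entry 0xAA.
    print(hex(switch_target(0x278078, 0xAA)))
```

Running the same function over every entry of the offset table gives the full list of case handlers, which is handy when the disassembler fails to follow the switch on its own.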
Access to memory
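In Thumb state a 16-bit instruction has no room for a full 32-bit address; a PC-relative load can reach only a small window just after the instruction (an 8-bit word offset, i.e. up to 1020 bytes forward). Therefore memory is accessed not directly but by loading the address into a register first. I.e. it is impossible to address 0x974170 directly, but it can be done via a register. For example: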
ROM:00277FF6 LDR R0, =unk_974170
ROM:00277FF8 LDR R0, [R0]
We have read the value at address 0x974170. But that is not the whole story: the address of the variable (0x974170) is itself stored nearby, within reach of the PC-relative load:
ROM:00278044 off_278044 DCD unk_974170
That is, the opcode of the LDR instruction actually contains the offset of this stored address relative to the current instruction.
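How the offset inside the opcode turns into the address of the stored pointer can be shown with a small decoder for the Thumb "LDR Rd, [PC, #imm]" form. This is a sketch, not a full disassembler. The opcode value 0x4813 does not appear in the listing; it is an assumption, reconstructed from the two addresses shown above (instruction at 0x277FF6, pointer at 0x278044).

```python
def decode_ldr_literal(opcode, address):
    """Decode Thumb LDR Rd, [PC, #imm8*4]; return (rd, literal_address)."""
    if opcode >> 11 != 0b01001:
        raise ValueError("not a PC-relative LDR")
    rd = (opcode >> 8) & 0x7          # destination register R0..R7
    imm8 = opcode & 0xFF              # offset in words
    base = (address + 4) & ~3         # PC value, forced to word alignment
    return rd, base + imm8 * 4


if __name__ == "__main__":
    rd, literal = decode_ldr_literal(0x4813, 0x277FF6)
    print(f"LDR R{rd}, [0x{literal:X}]")
```

With the assumed opcode the decoder lands exactly on off_278044, the word that holds the address of unk_974170.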
There is a clever optimization: if an address can be derived from another one already used in the current function, it can be obtained through arithmetic or indirect access. For example, if a function uses one variable at address 0x100000 and another at address 0x100150, the compiler can either load two separate addresses or generate the following code:
LDR R0, =0x100000
ADD R0, #0xFF
ADD R0, #0x51
LDR R0, [R0]
In x86 this would be read as a reference to a substructure within another structure, but here it is an ordinary optimization. Why? To minimize memory accesses: arithmetic is faster than loading data. In fact, ARM assembler code is full of register calculations of this kind. The 16 registers exist precisely for this, so that memory and the stack are touched less often. For this reason stack variables appear only in very large functions. Working with the stack is no different from doing so in x86.
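The reason the example needs two ADD instructions is that a Thumb ADD with an immediate can add at most 0xFF at a time. The following Python sketch splits an arbitrary offset into such steps; the function name `split_offset` is made up, and a real compiler may choose a different sequence.

```python
def split_offset(offset, max_imm=0xFF):
    """Split `offset` into a list of immediates, each at most `max_imm`."""
    chunks = []
    while offset > 0:
        step = min(offset, max_imm)   # largest immediate that still fits
        chunks.append(step)
        offset -= step
    return chunks


if __name__ == "__main__":
    base, target = 0x100000, 0x100150
    steps = split_offset(target - base)
    print([hex(step) for step in steps])
    print(hex(base + sum(steps)))
```

For the two addresses from the text the result is the pair 0xFF and 0x51, the same immediates as in the listing above.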
Code investigation in IDA
ARM binary images have to be loaded into IDA as binary files, since they have no unified structure. On loading you have to specify the processor type. If the exact processor the code was written for is absent from the list of processor modules, you can specify the generic ARM (little endian) or ARMB (big endian) type. Next, the ROM and RAM segments have to be created. There is no unified approach; it depends on the image and on the architecture of the particular ARM processor. For example, for ARM7 the memory map looks roughly like this:
0x0 - 0x8000 - internal RAM of the processor
0x8000 - 0x1000000 - ROM
0x1000000 - 0x..... - SRAM (the upper bound depends on how much of it the device has)
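While reading a listing it helps to know at a glance which region an address belongs to. The lookup below is a minimal Python sketch of the map above. The table and the function name `region_of` are illustrative only, and the SRAM region is left open-ended because its size depends on the device.

```python
# (start, end, name); end is exclusive, None means "size unknown".
MEMORY_MAP = [
    (0x0, 0x8000, "internal RAM"),
    (0x8000, 0x1000000, "ROM"),
    (0x1000000, None, "SRAM"),
]


def region_of(address):
    """Return the name of the region containing `address`, or None."""
    for start, end, name in MEMORY_MAP:
        if address >= start and (end is None or address < end):
            return name
    return None


if __name__ == "__main__":
    for address in (0x100, 0x8000, 0xBCDEC, 0x1000000):
        print(hex(address), region_of(address))
```

The same table can serve as a checklist when creating segments by hand: one segment per row, with the SRAM end filled in once the amount of memory in the device is known.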
Now we can start analyzing the code. In many devices (in particular, mobile phones) the firmware entry point is 0x8000. The processor starts in ARM state, so the code at address 0x8000 is ARM code. IDA's processor module is rather primitive, and when it tries to follow state switches it very often disassembles large amounts of Thumb code as ARM (and vice versa). To switch the state of the code manually, press ALT-G and enter 0 in the Value field for ARM state or 1 for Thumb.