Repost: Creating Really Tiny ELF Executables on Linux - 顺流o逆流 - ChinaUnix Blog


Category: LINUX

2009-06-15 12:54:44

Creating Really Tiny ELF Executables on Linux (revised edition)

Author: breadbox <>
Original: "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux"
Edited and translated by: alert7 <>
Source: http://www.xfocus.org/
Date: 2001-9-4

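Preface:
Sometimes file size matters. Along the way, this article also explores the inner workings of the ELF file format and of the Linux operating system. It shows how to build a really tiny ELF executable.

The examples given here all run on Linux on the Intel 386 architecture. They may behave the same way on other architectures, but I can't be sure.

Our assembly code is written for Nasm, which uses Intel-style x86 assembly syntax.
NASM is free software and can be obtained from: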

-----------------------------------------------------------------

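Here is a very small example program. The only thing it does is return a number to the operating system.
UNIX programs usually return 0 or 1. Here we use 42 as the return value.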

[alert7@redhat]# set -o noclobber && cat > tiny.c << EOF
/* tiny.c */
int main(void) { return 42; }
EOF

[alert7@redhat]# gcc -Wall tiny.c
[alert7@redhat]# ./a.out ;echo $?
42

Looking at it in gdb shows how simple this program really is:
[alert7@redhat]# gdb -q a.out
(gdb) disass main
Dump of assembler code for function main:
0x80483a0 <main>: push %ebp
0x80483a1 <main+1>: mov %esp,%ebp
0x80483a3 <main+3>: mov $0x2a,%eax      ; 0x2a = 42
0x80483a8 <main+8>: jmp 0x80483b0 <main+16>
0x80483aa <main+10>: lea 0x0(%esi),%esi
0x80483b0 <main+16>: leave
0x80483b1 <main+17>: ret

How big is it?
[alert7@redhat]# wc -c a.out
11648 a.out

On the original author's machine it was 3998 bytes. On my RH system (kernel 2.2.14-5.0) it is 11648 bytes. That's big, so we need to make it smaller.

[alert7@redhat]# gcc -Wall -s tiny.c
[alert7@redhat]# ./a.out ;echo $?
42
[alert7@redhat]# wc -c a.out
2960 a.out
Now it's 2960 bytes, much smaller.

gcc -Wall -s tiny.c is effectively equivalent to these two commands:
gcc -Wall tiny.c
strip a.out          (strip discards all symbols)

[alert7@redhat]# gcc -Wall tiny.c
[alert7@redhat]# wc -c a.out
11648 a.out
[alert7@redhat]# strip a.out
[alert7@redhat]# wc -c a.out
2960 a.out

Next, let's turn on optimization:
[alert7@redhat]# gcc -Wall -s -O3 tiny.c
[alert7@redhat]# wc -c a.out
2944 a.out

That is only 16 bytes smaller than before, so optimization flags are not an effective way to reduce the size.

Unfortunately, the compiler adds extra code when it compiles a C program. From here on, we write the program in assembly.

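As in the previous program, we need the exit code to be 42, so all we have to do is set eax to 42. The return value of main() is passed back in eax, and the startup code then hands it to exit(). The disassembly of main above shows this too.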

[alert7@redhat]# set -o noclobber && cat > tiny.asm << EOF
; tiny.asm
BITS 32
GLOBAL main
SECTION .text
main:
mov eax, 42
ret
EOF

Assemble, link and test:
[alert7@redhat]# nasm -f elf tiny.asm
[alert7@redhat]# gcc -Wall -s tiny.o
[alert7@redhat]# ./a.out ; echo $?
42

Now let's see what difference the assembly version makes to the size:
[alert7@redhat]# wc -c a.out
2892 a.out

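That saves another (2944 - 2892) 52 bytes. However, as long as we use the main() interface, a lot of extra code remains. The linker automatically adds interface code between the OS and main(), and that code essentially just calls main(). So how do we get rid of the code we don't need?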

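By default, the linker uses the symbol _start as the actual entry point. When gcc links, it automatically includes a _start routine. That routine sets up argc and argv, ..., and finally calls main().
# From "Before main() Analysis" (《before main() 分析》): if the application contains a segment describing a dynamic linker (the .interp section), the kernel first loads the application's segments and then the dynamic linker's segments, both into user space. It then returns from the system call to user space and jumps to the entry point of the dynamic linker (/lib/ld-2.2.4.so), which is the global symbol _start defined in the dynamic linker. If the application has no such section, control jumps straight to the application's entry point (the global symbol _start defined in start.s).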

So let's see whether we can skip all that by defining our own _start routine.

[alert7@redhat]# cat >| tiny.asm << EOF
; tiny.asm
BITS 32
GLOBAL _start
SECTION .text
_start:
mov eax, 42
ret
EOF

[alert7@redhat]# nasm -f elf tiny.asm
[alert7@redhat]# gcc -Wall -s tiny.o
tiny.o: In function `_start':
tiny.o(.text+0x0): multiple definition of `_start'
/usr/lib/crt1.o(.text+0x0): first defined here
/usr/lib/crt1.o: In function `_start':
/usr/lib/crt1.o(.text+0x18): undefined reference to `main'
collect2: ld returned 1 exit status

How do we get this to link?
GCC has an option, -nostartfiles, that tells it not to use the standard startup files when linking. Normally the linker links them in.

That's exactly what we want. Let's try again:

[alert7@redhat]# nasm -f elf tiny.asm
[alert7@redhat]# gcc -Wall -s -nostartfiles tiny.o
[alert7@redhat]# ./a.out ; echo $?
Segmentation fault (core dumped)
139

gcc didn't complain, but the program dumped core. What happened?

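The mistake was treating _start as a C function and trying to return (ret) from it. _start is not a function at all. It is just a label: an entry point used by the linker. When the program is run, the kernel's sys_execve() loads the user program, and the interpreter if there is one, into user space. It then changes the saved user-mode eip to the new address (_start), so the CPU enters the new program's entry point when it returns to user space. If an interpreter image is present, this is the interpreter's entry point. Otherwise it is the target image's entry point. Reference: 《漫谈兼容内核之八 ELF映像的装入(一)》 ("Ramblings on the Compatible Kernel, Part 8: Loading ELF Images (1)").
If we look at the stack when execution reaches _start, the value on top of the stack (the one ret pops) is 1. That certainly doesn't look like an address. In fact, the top of the stack holds the program's argc. It is followed by the argv array, terminated by a NULL element, and then by the envp environment pointers. So that value is not a return address at all.
(# _start is a label, not a function _start(). When the kernel's sys_execve() finishes, it returns directly to user space at the _start label, using the kernel stack and the saved EIP. There is no call/ret relationship. A function call would first push the return address (EIP) onto the stack for ret to use.)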
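The stack layout described above can be modelled in a few lines of C. This is a sketch, not code that reads the real stack. It assumes the initial stack is handed over as an array of machine words (`uintptr_t`), and the helper names `stack_argc`, `stack_argv` and `stack_envp` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the words at the stack pointer when the kernel enters _start:
     sp[0]             argc (the "1" that ret tried to use as an address)
     sp[1 .. argc]     argv[0 .. argc-1]
     sp[argc + 1]      NULL, ending argv
     sp[argc + 2 ...]  envp[...], also ending with NULL
   No return address appears anywhere in this layout. */

static long stack_argc(const uintptr_t *sp)
{
    return (long)sp[0];
}

/* argv[i]; asking for i == argc returns the terminating NULL. */
static char *stack_argv(const uintptr_t *sp, long i)
{
    return (char *)sp[1 + i];
}

/* envp[i]; the environment starts right after argv's NULL terminator. */
static char *stack_envp(const uintptr_t *sp, long i)
{
    return (char *)sp[stack_argc(sp) + 2 + i];
}
```

For `./a.out` run with no arguments, `sp[0]` is 1. That is the value `ret` loaded into eip, which caused the crash.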

So, to exit, _start has to call the exit() function.

Actually, we call _exit(), because exit() does too much extra work. Since we skipped the library's startup code, we can skip its shutdown code as well.

OK, let's try again, this time calling _exit(). Its only argument is an integer, so we push a number onto the stack and then call _exit().

[alert7@redhat]# cat >| tiny.asm << EOF
; tiny.asm
BITS 32
EXTERN _exit
GLOBAL _start
SECTION .text
_start:
push dword 42
call _exit
EOF

[alert7@redhat]# nasm -f elf tiny.asm
[alert7@redhat]# gcc -Wall -s -nostartfiles tiny.o
[alert7@redhat]# ./a.out ; echo $?
42

Yeah~~, it works! Let's see how big it is:

[alert7@redhat]# wc -c a.out
1312 a.out

Not bad, nearly half the size again :) Are there any other gcc options of interest?

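GCC has an option, -nostdlib, that links without the standard libraries and startup files. Anything those would have provided must then be passed to the linker explicitly. This is worth looking into: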

[alert7@redhat]# gcc -Wall -s -nostdlib tiny.o
tiny.o: In function `_start':
tiny.o(.text+0x6): undefined reference to `_exit'
collect2: ld returned 1 exit status

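The reason: _exit() is a library function, so it is not available with -nostdlib. We have to do the job ourselves, and for that we first need to know how to make a system call on Linux.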

-----------------------------------------------------------------

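Like other operating systems, Linux provides basic services to programs through system calls.
These include opening files, reading and writing file descriptors, and so on.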

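On i386 Linux, the system call interface is a single instruction, int 0x80. Every system call goes through it.
To make a system call, put the system call number (which call to make) in eax, and put the call's arguments in the other registers:
with one argument, it goes in ebx;
with two arguments, in ebx and ecx;
with three, four or five arguments, in ebx, ecx and edx, then esi, then edi.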

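When the system call returns, eax holds its return value.
If an error occurred, eax is negative, and its absolute value identifies the error (an errno code).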

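The system calls are listed in /usr/include/asm/unistd.h. A quick look shows that exit is system call number 1. It takes one argument, passed in ebx. exit never returns, so that value does not come back in eax. It becomes the exit status that the parent process receives.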
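The register convention just described can be summarised in C. This is only an illustrative sketch, and `syscall_arg_slot` and `syscall_errno` are hypothetical names. The -4095..-1 error range is an assumption that goes beyond the text: it is the convention glibc uses to tell error codes apart from large valid results such as addresses.

```c
#include <string.h>

/* i386 Linux, int 0x80: eax = system call number, arguments in this order. */
static const char *const syscall_arg_regs[] = { "ebx", "ecx", "edx", "esi", "edi" };

/* 1-based argument position carried by register reg, or 0 if it carries
   no argument (eax, for instance, holds the call number). */
static int syscall_arg_slot(const char *reg)
{
    for (int i = 0; i < 5; i++)
        if (strcmp(reg, syscall_arg_regs[i]) == 0)
            return i + 1;
    return 0;
}

/* Interpret the value left in eax: a small negative number is -errno,
   anything else is a successful result. Returns the errno, or 0. */
static int syscall_errno(long ret)
{
    return (ret < 0 && ret >= -4095) ? (int)-ret : 0;
}
```

For example, a failed call that leaves -2 in eax corresponds to errno 2 (ENOENT on Linux).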

OK, now we can get back to work :)

[alert7@redhat]# cat >| tiny.asm << EOF
; tiny.asm
BITS 32
GLOBAL _start
SECTION .text
_start:
mov eax, 1
mov ebx, 42
int 0x80
EOF

[alert7@redhat]# nasm -f elf tiny.asm
[alert7@redhat]# gcc -Wall -s -nostdlib tiny.o
[alert7@redhat]# ./a.out ; echo $?
42

Check the size:

[alert7@redhat]# wc -c a.out
416 a.out

Now that's truly tiny :) Can it get any smaller still?

Use shorter instructions. Compare the following two pieces of assembly:

00000000 B801000000 mov eax, 1
00000005 BB2A000000 mov ebx, 42
0000000A CD80       int 0x80

00000000 31C0 xor eax, eax
00000002 40   inc eax
00000003 B32A mov bl, 42
00000005 CD80 int 0x80

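The two are functionally equivalent. mov bl, 42 sets only the low byte of ebx, but only the low 8 bits of the exit code reach the parent anyway. The second version is 5 bytes shorter than the first.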
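You can check the byte counts by writing both encodings out as C arrays. The bytes are copied from the two listings above. `visible_exit_status` is a hypothetical helper that models which part of ebx ends up as the status the parent sees.

```c
/* First listing: 12 bytes. */
static const unsigned char exit_long[] = {
    0xB8, 0x01, 0x00, 0x00, 0x00,   /* mov eax, 1  */
    0xBB, 0x2A, 0x00, 0x00, 0x00,   /* mov ebx, 42 */
    0xCD, 0x80                      /* int 0x80    */
};

/* Second listing: 7 bytes. */
static const unsigned char exit_short[] = {
    0x31, 0xC0,                     /* xor eax, eax */
    0x40,                           /* inc eax      */
    0xB3, 0x2A,                     /* mov bl, 42   */
    0xCD, 0x80                      /* int 0x80     */
};

/* Only the low 8 bits of the value in ebx become the exit status. */
static int visible_exit_status(unsigned long ebx)
{
    return (int)(ebx & 0xff);
}
```

The difference, 12 - 7 = 5 bytes, is what the shorter encoding saves.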

[alert7@redhat]# cat >| tiny.asm << EOF
; tiny.asm
BITS 32
GLOBAL _start
SECTION .text
_start:
xor eax,eax
inc eax
mov bl,42
int 0x80

EOF
[alert7@redhat]# nasm -f elf tiny.asm
[alert7@redhat]# ld -s tiny.o        # gcc probably can't shrink it any further, so we call the linker, ld, directly
[alert7@redhat]# wc -c a.out
412 a.out

It shrank by only 4 bytes, not the expected 5, because alignment padding took up the remaining byte.

Have we reached the limit? Can it get any smaller?

Hmm. Our program code is now only 7 bytes long. Is there extra overhead in the remaining 405 bytes of the ELF file? What do they contain?

Let's use objdump to look at the file's contents:

[alert7@redhat]# objdump -x a.out | less
a.out: no symbols

a.out: file format elf32-i386
a.out
architecture: i386, flags 0x00000102:
EXEC_P, D_PAGED
start address 0x08048080

Program Header:
LOAD off 0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
filesz 0x00000087 memsz 0x00000087 flags r-x

Sections:
Idx Name    Size      VMA        LMA         File off     Algn
0 .text     00000007  08048080   08048080    00000080     2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .bss      00000001  08049087   08049087    00000087     2**0
CONTENTS
2 .comment 0000001c 00000000   00000000    00000088     2**0
CONTENTS, READONLY

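[Translator's note: on my machine there is an extra .bss section, which I think depends on the ld version. That's why my results have consistently been larger than the original author's :( To get something smaller, you could try an older toolchain :)]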

As shown, the entire .text section is 7 bytes, as we said.

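But there are other sections, such as ".comment". Who put that there? The .comment section is 28 (0x1c) bytes long. We don't yet know what .comment is for, but it's safe to say it isn't necessary.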

The .comment section starts at file offset 0x88 (hex). Let's see what it contains:

[alert7@redhat]# objdump -s a.out

a.out: file format elf32-i386

Contents of section .text:
8048080 31c040b3 2acd80                  1.@.*..
Contents of section .bss:
8049087 00 .
Contents of section .comment:
0000 00546865 204e6574 77696465 20417373 .The Netwide Ass
0010 656d626c 65722030 2e393800          embler 0.98.

Oh, it's an identification string from nasm itself. Maybe we should use gas instead...

Suppose we write:
[alert7@redhat]# set -o noclobber && cat > tiny.s << EOF
.globl _start
.text
_start:
xorl %eax, %eax
incl %eax
movb $42, %bl
int $0x80
EOF

[alert7@redhat]# gcc -s -nostdlib tiny.s
[alert7@redhat]# ./a.out ; echo $?
42
[alert7@redhat]# wc -c a.out
368 a.out

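[Translator's note: on the author's machine the size did not change at this step. On my system it dropped to 368 bytes, the same as on the author's machine and smaller than any previous version.]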

Running objdump again shows some differences:

Sections:
Idx Name Size           VMA         LMA      File off     Algn
0 .text   00000007       08048074    08048074 00000074     2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data   00000000       0804907c    0804907c 0000007c     2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss    00000000       0804907c    0804907c 0000007c     2**2
ALLOC

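The .comment section is gone, but two useless sections remain, .data and .bss, which hold data that doesn't exist. Both have zero length, yet they still make the file bigger, because each needs its own entry in the section header table.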

These sections serve no purpose. How do we get rid of them?

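We need some knowledge of the ELF file format. I have translated the ELF specification myself (《ELF文件格式》, available at http://www.xfocus.org/), but the translation is poor and has long been criticized. I strongly recommend reading the English original instead.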

--------------------------------------------------------------------
Download locations for the English ELF specification:

or

Basically, this is what we need to know:

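Every ELF file begins with a structure called the ELF header. It is 52 bytes long and describes the file's contents and layout. Its first 16 bytes hold an "identifier". The identifier contains the ELF magic number, plus flags that say whether the file is 32-bit or 64-bit, little-endian or big-endian, and so on.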
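On a glibc system the ELF structures are available from `<elf.h>`, so both facts above can be checked directly: the header is 52 bytes, and its first 16 bytes are the identifier. `is_elf32_lsb` is a hypothetical helper that reads only the identifier bytes.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Look only at e_ident, the first EI_NIDENT (16) bytes of the ELF header:
   the magic number 0x7F 'E' 'L' 'F', then the class (32/64-bit) and the
   data encoding (byte order). */
static int is_elf32_lsb(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;                           /* not an ELF file */
    return buf[EI_CLASS] == ELFCLASS32      /* 32-bit, as on i386 */
        && buf[EI_DATA] == ELFDATA2LSB;     /* little-endian */
}
```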

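The ELF header also records the target architecture; whether the file is an executable, an object file (*.o) or a shared library (*.so); the program's entry address; and the file offsets of the program header table and the section header table.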

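Both tables can appear anywhere in the file. Typically the program header table immediately follows the ELF header, and the section header table sits at or near the end of the file. The two tables serve similar purposes: each gives a different view of how the file is laid out. The section header table identifies where the various parts of the file are located on disk. The program header table describes where in memory those parts of the file should be loaded.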

In short, the compiler and linker use the section header table, and the program loader uses the program header table. For object files (*.o), the program header table is optional and in practice is never present. For executables, the section header table is optional, but in practice it is always present.
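The following sketch shows how a tool might locate the two tables from the header fields just described, again using glibc's `<elf.h>`. It assumes the image is already in memory and that the host is little-endian like i386, because it copies the header bytes straight into an `Elf32_Ehdr`. `elf32_tables` and `struct table_info` are hypothetical names.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

struct table_info {
    Elf32_Off  phoff;  Elf32_Half phnum;   /* program headers: used by the loader */
    Elf32_Off  shoff;  Elf32_Half shnum;   /* section headers: used by cc and ld  */
};

/* Returns 0 on success, -1 if the image is shorter than an ELF header.
   memcpy avoids unaligned reads; byte order is assumed to match the host. */
static int elf32_tables(const unsigned char *img, size_t len, struct table_info *out)
{
    Elf32_Ehdr eh;

    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);
    out->phoff = eh.e_phoff;
    out->phnum = eh.e_phnum;
    out->shoff = eh.e_shoff;   /* 0 together with shnum == 0: no section table */
    out->shnum = eh.e_shnum;
    return 0;
}
```

A result with one program header and no section headers is exactly the shape of file that the following steps build by hand.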

So for our program the section header table is completely useless, and the sections do not affect the program's memory image.

So how do we actually get rid of them?

We have to build the program's ELF header ourselves.

For details, you can also consult the ELF specification and /usr/include/linux/elf.h. An empty ELF executable should look like this:

BITS 32

org 0x08048000

ehdr: ; Elf32_Ehdr
db 0x7F, "ELF", 1, 1, 1 ; e_ident
times 9 db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff       # $$ is the start of the image (org 0x08048000)
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx

ehdrsize equ $ - ehdr        ; $ is the current position

phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr              # $$ is the start of the image (org 0x08048000)
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align

phdrsize equ $ - phdr

_start:

; your program here

filesize equ $ - $$

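This image contains one ELF header, no section header table, and a program header table with a single entry. That entry tells the loader to load the whole file into memory at address 0x08048000, the default load address for executables. As usual, the loaded image includes the ELF header and program header table themselves. Execution then starts at _start, which immediately follows the program header table. There is no .data segment, no .bss segment and no .comment section.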
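For comparison, the same layout can be produced from C using glibc's `<elf.h>` definitions. This is a sketch, not the article's method, which uses `nasm -f bin`. `build_tiny_elf` is a hypothetical name. The host is assumed to be little-endian so the structures can be copied byte for byte. Writing the buffer to a file and running `chmod +x` are left out. With 32-bit instruction encodings the image is 52 + 32 + 7 = 91 bytes.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define LOAD_ADDR 0x08048000u   /* default i386 load address used in the text */

/* mov bl,42 ; xor eax,eax ; inc eax ; int 0x80 (32-bit encodings) */
static const unsigned char tiny_code[] = { 0xB3, 0x2A, 0x31, 0xC0, 0x40, 0xCD, 0x80 };

/* Write ELF header + one PT_LOAD program header + code into out.
   Returns the image size, or 0 if out is too small. */
static size_t build_tiny_elf(unsigned char *out, size_t cap)
{
    const size_t code_off = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr);
    const size_t size = code_off + sizeof tiny_code;
    Elf32_Ehdr eh;
    Elf32_Phdr ph;

    if (cap < size)
        return 0;

    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);     /* 0x7F "ELF" */
    eh.e_ident[EI_CLASS] = ELFCLASS32;       /* 1 */
    eh.e_ident[EI_DATA] = ELFDATA2LSB;       /* 1 */
    eh.e_ident[EI_VERSION] = EV_CURRENT;     /* 1 */
    eh.e_type = ET_EXEC;                     /* 2 */
    eh.e_machine = EM_386;                   /* 3 */
    eh.e_version = EV_CURRENT;               /* 1 */
    eh.e_entry = LOAD_ADDR + code_off;       /* _start follows the phdr */
    eh.e_phoff = sizeof eh;                  /* phdr follows the ehdr */
    eh.e_ehsize = sizeof eh;
    eh.e_phentsize = sizeof ph;
    eh.e_phnum = 1;                          /* no section headers at all */

    memset(&ph, 0, sizeof ph);
    ph.p_type = PT_LOAD;                     /* 1 */
    ph.p_offset = 0;                         /* load the whole file ... */
    ph.p_vaddr = LOAD_ADDR;                  /* ... at 0x08048000 */
    ph.p_paddr = LOAD_ADDR;
    ph.p_filesz = size;
    ph.p_memsz = size;
    ph.p_flags = PF_R | PF_X;                /* 5 */
    ph.p_align = 0x1000;

    memcpy(out, &eh, sizeof eh);
    memcpy(out + sizeof eh, &ph, sizeof ph);
    memcpy(out + code_off, tiny_code, sizeof tiny_code);
    return size;
}
```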

OK, our program now looks like this:

[alert7@redhat]# cat tiny.asm
; tiny.asm
org 0x08048000

ehdr: ; Elf32_Ehdr
db 0x7F, "ELF", 1, 1, 1 ; e_ident
times 9 db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx

ehdrsize equ $ - ehdr

phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset                 # offset of the segment in the file
dd $$ ; p_vaddr                 # virtual address the segment is loaded at
dd $$ ; p_paddr
dd filesize ; p_filesz          # size of the segment in the file
dd filesize ; p_memsz           # size of the segment in memory
dd 5 ; p_flags
dd 0x1000 ; p_align

phdrsize equ $ - phdr
_start:
mov bl, 42
xor eax, eax
inc eax
int 0x80

filesize equ $ - $$

[alert7@redhat]# nasm -f bin -o a.out tiny.asm
[alert7@redhat]# chmod +x a.out
[alert7@redhat]# ./a.out ; echo $?
42

Check the size again:

[alert7@redhat]# wc -c a.out
93 a.out

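Amazing: the file is now only 93 bytes. (This listing has no BITS 32 line, so nasm -f bin assembles it in its default 16-bit mode. That mode adds a 0x66 operand-size prefix to xor eax, eax and to inc eax, which makes the code 9 bytes: 52 + 32 + 9 = 93. With BITS 32 the code would be the 7 bytes seen earlier, and the file would be 91 bytes.)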

If we understood every byte in the executable, we might be able to make it smaller still. Or perhaps this is already close to the limit :)

-------------------------------------------------------------------

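You may have noticed that:
1) The different parts of an ELF file may be placed anywhere in the file, except the ELF header, which must come first. They may also overlap.
2) Some of the fields have not actually been used so far.

The last 9 bytes of the 16-byte identification field (e_ident) are zero. Our code is only 7 bytes long, so let's try to fit it into those last 9 bytes, which would leave 2 bytes to spare.
(In the listing below, the code starts at the 9th byte (offset 8), an aligned position, so only 1 byte is left over.)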
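The 16-byte identifier that results from this placement can be spelled out in C. This is an illustrative sketch, and `ident_with_code` is a hypothetical helper. The code bytes are the 32-bit encodings of mov bl,42 / xor eax,eax / inc eax / int 0x80. The result matches the Magic bytes that readelf prints for these files.

```c
#include <string.h>

/* mov bl,42 ; xor eax,eax ; inc eax ; int 0x80 */
static const unsigned char exit42[7] = { 0xB3, 0x2A, 0x31, 0xC0, 0x40, 0xCD, 0x80 };

/* e_ident as used from the 84-byte version on: magic, class/data/version
   set to 1, a 0 at offset 7, the code at offsets 8-14, and one spare
   zero byte at offset 15. */
static void ident_with_code(unsigned char ident[16])
{
    static const unsigned char head[8] = { 0x7F, 'E', 'L', 'F', 1, 1, 1, 0 };

    memcpy(ident, head, sizeof head);
    memcpy(ident + 8, exit42, sizeof exit42);
    ident[15] = 0;
}
```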

(DB = byte; DW = word, 2 bytes; DD = doubleword, 4 bytes)

[alert7@redhat]# cat tiny.asm
; tiny.asm

BITS 32

org 0x08048000

ehdr: ; Elf32_Ehdr
db 0x7F, "ELF" ; e_ident      # 0x7F, 'E', 'L', 'F': 4 bytes
db 1, 1, 1, 0                 ; 1, 1, 1, 0: 4 bytes
_start: mov bl, 42            ; the code starts at the 9th byte (offset 8) and takes 7 bytes
xor eax, eax
inc eax
int 0x80
db 0                          ; 1 byte left over
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx

ehdrsize equ $ - ehdr

phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset                  # offset of the segment in the file
dd $$ ; p_vaddr                  # virtual address the segment is loaded at
dd $$ ; p_paddr
dd filesize ; p_filesz           # size of the segment in the file
dd filesize ; p_memsz            # size of the segment in memory
dd 5 ; p_flags
dd 0x1000 ; p_align

phdrsize equ $ - phdr

filesize equ $ - $$

[alert7@redhat]# nasm -f bin -o a.out tiny.asm
[alert7@redhat]# chmod +x a.out
[alert7@redhat]# ./a.out ; echo $?
42
[alert7@redhat]# wc -c a.out
84 a.out

Our program now consists of only one ELF header and one program header table entry, which is all that is needed to load and run it. So we can't make it any smaller! Unless...

Could we make the ELF header and the program header table partly coincide, that is, overlap?

Of course we can. Look closely at our program and you'll see that the last 8 bytes of the ELF header are identical to the first 8 bytes of the program header table. So...
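Finding such an overlap amounts to searching for the longest suffix of one structure that equals a prefix of the next. Here is a small sketch; the function name `max_overlap` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Largest k such that the last k bytes of a equal the first k bytes of b,
   i.e. how far structure b can be slid back over the end of structure a. */
static size_t max_overlap(const unsigned char *a, size_t alen,
                          const unsigned char *b, size_t blen)
{
    size_t k = alen < blen ? alen : blen;

    for (; k > 0; k--)
        if (memcmp(a + alen - k, b, k) == 0)
            return k;
    return 0;
}
```

Consider the last 12 bytes of the 84-byte file's ELF header (e_ehsize = 0x34, e_phentsize = 0x20, e_phnum = 1, then zeros) and the first 12 bytes of its program header (p_type = 1, p_offset = 0, p_vaddr = 0x08048000). For these, `max_overlap` returns 8, which shrinks the file to 84 - 8 = 76 bytes.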

(DB = byte; DW = word, 2 bytes; DD = doubleword, 4 bytes)

[alert7@redhat]# cat tiny.asm
; tiny.asm

BITS 32

org 0x08048000

ehdr:                 ; start of the ELF header
db 0x7F, "ELF" ; e_ident
db 1, 1, 1, 0
_start: mov bl, 42
xor eax, eax
inc eax
int 0x80
db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
phdr: dd 1 ; e_phnum ; p_type     # start of the program header table
                 ; e_shentsize
dd 0 ; e_shnum ; p_offset
        ; e_shstrndx
ehdrsize equ $ - ehdr             ; end of the ELF header
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
phdrsize equ $ - phdr             ; end of the program header table

filesize equ $ - $$

[alert7@redhat]# nasm -f bin -o a.out tiny.asm
[alert7@redhat]# chmod +x a.out
[alert7@redhat]# ./a.out ; echo $?
42
[alert7@redhat]# wc -c a.out
76 a.out

The two structures can't be overlapped any further now, because no more of their bytes coincide.

However, we can rearrange the two structures so that they have more bytes in common.

How many of these fields does Linux actually check? Does it check e_machine, for example?

Surprisingly, some fields are silently ignored.

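So what really matters in the ELF header? The first four bytes, of course: they hold the magic number, and without it Linux's exec() system call won't go any further. The other 3 bytes of e_ident are not checked, which gives us no fewer than 12 contiguous bytes that we can set to anything. e_type must be 2 (an executable), and e_machine must be 3. Like the version number in e_ident, e_version is completely ignored. This is understandable, since there is only one version of the ELF standard. e_entry must of course be correct, since it points to the start of the program. Likewise, e_phoff must be the correct file offset of the program header table, and e_phnum the correct number of entries in it. However, the current Intel architecture does not use e_flags, so we should be able to reuse it. e_ehsize is meant to validate the expected size of the ELF header, but Linux ignores it. e_phentsize is meant to validate the size of a program header table entry, but only 2.2-series kernels after 2.2.17 check it. Kernels before 2.2, and 2.4.0, ignore it.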
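The rules above can be condensed into a hypothetical checker. It only mirrors the article's description of 2.2/2.4-era kernels and is not the kernel's code. `header_acceptable` and its `check_phentsize` switch are invented for illustration. e_entry and e_phoff are left out because they can only be validated against the rest of the file.

```c
#include <elf.h>
#include <string.h>

/* Model of which ELF header fields the loader relied on, per the text.
   Deliberately ignored: e_ident[4..15], e_version, e_flags, e_ehsize and
   all section-header fields. */
static int header_acceptable(const Elf32_Ehdr *eh, int check_phentsize)
{
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                       /* magic 0x7F "ELF" is required */
    if (eh->e_type != ET_EXEC)
        return 0;                       /* must be 2 */
    if (eh->e_machine != EM_386)
        return 0;                       /* must be 3 */
    if (eh->e_phnum < 1)
        return 0;                       /* need the one PT_LOAD entry */
    if (check_phentsize && eh->e_phentsize != sizeof(Elf32_Phdr))
        return 0;                       /* only 2.2 kernels after 2.2.17 */
    return 1;
}
```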

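What about the program header table?
p_type must be 1 (PT_LOAD), marking a loadable segment. p_offset is the file offset where loading starts. p_vaddr must likewise be a correct load address: usable addresses run from 0 to 0x80000000 and must be page-aligned. The documentation says p_paddr is ignored, so that field is free for us to use. p_filesz says how many bytes are loaded from the file into memory, and p_memsz says how large the memory segment must be, so the two values are related. p_flags specifies the segment's memory permissions: read, write and execute can be set, and other bits may be too, but we only need the minimum. Finally, p_align gives the alignment requirement. It is mainly used when relocating segments that contain position-independent code. So far, Linux ignores it for executables.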
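The program header rules can be modelled the same way. `phdr_acceptable` is a hypothetical sketch, not kernel code. The rule that p_memsz must be at least p_filesz is taken from the discussion of p_memsz and p_filesz later in the article. p_flags is deliberately not checked, because the article goes on to drop the execute bit.

```c
#include <elf.h>

/* Model of the program header rules described in the text: a loadable
   segment, a page-aligned address below 0x80000000, and at least as much
   memory as bytes read from the file. p_paddr, p_flags and p_align are
   not examined. */
static int phdr_acceptable(const Elf32_Phdr *ph)
{
    if (ph->p_type != PT_LOAD)
        return 0;                       /* must be 1 */
    if (ph->p_vaddr % 0x1000 != 0)
        return 0;                       /* page-aligned */
    if (ph->p_vaddr >= 0x80000000u)
        return 0;                       /* usable address range */
    if (ph->p_memsz < ph->p_filesz)
        return 0;                       /* room for the file bytes */
    return 1;
}
```

The odd-looking program header of the final 52-byte version (p_paddr = 0x00030002, p_flags = 4, p_align = 0xc0312ab3, p_memsz = 0x1020) passes this check, because the fields that matter hold sensible values.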

This analysis shows which fields are essential and which are unused, so we can overlap even more bytes.

[alert7@redhat]# cat tiny.asm
; tiny.asm

BITS 32

org 0x00200000

db 0x7F, "ELF" ; e_ident        <-----   # start of the ELF header
db 1, 1, 1, 0
_start:
mov bl, 42
xor eax, eax
inc eax
int 0x80
db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
phdr: dd 1 ; e_shoff ; p_type     <------ # start of the program header table; e_shoff ignored, p_type = 1
dd 0 ; e_flags ; p_offset           # e_flags ignored, p_offset = 0
dd $$ ; e_ehsize ; p_vaddr          # e_ehsize (low half) ignored, p_vaddr = 0x00200000
          ; e_phentsize                     # e_phentsize = 0x0020 (high half; one program header entry is 0x20 bytes)
dw 1 ; e_phnum ; p_paddr          # e_phnum = 1, p_paddr ignored
dw 0 ; e_shentsize
dd filesize ; e_shnum ; p_filesz  # e_shnum and e_shstrndx ignored
                ; e_shstrndx   <-------    # end of the ELF header; Linux doesn't check the header length, so e_ehsize needn't be computed
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align            <------   # end of the program header table

filesize equ $ - $$

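As you can see, the first 12 bytes of the program header table overlap the last 12 bytes of the ELF header, and they fit well. Only two places in the overlap need special handling.
The first is e_phnum, which must be 1. It coincides with p_paddr, which is ignored.
The second is e_phentsize, which coincides with the upper two bytes of p_vaddr. To make these match, a non-standard load address, 0x00200000, is used, so those two bytes are 0x0020.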
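The e_phentsize trick relies on little-endian byte order: the upper two bytes of p_vaddr are read back as e_phentsize. Here is a sketch. `split_vaddr` is a hypothetical helper, and it assumes a little-endian host, as i386 is.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 4 bytes of p_vaddr as the two 16-bit header fields that
   share them in the 64-byte file: e_ehsize (low half) and e_phentsize
   (high half). */
static void split_vaddr(uint32_t p_vaddr, uint16_t *e_ehsize, uint16_t *e_phentsize)
{
    unsigned char b[4];

    memcpy(b, &p_vaddr, sizeof b);
    memcpy(e_ehsize, b, 2);
    memcpy(e_phentsize, b + 2, 2);
}
```

At 0x00200000 the upper half reads as 0x20 = 32, the size of a program header entry. At the default address, 0x08048000, the same bytes would read as e_phentsize = 0x0804, which is why the non-standard address is needed.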

[alert7@redhat]# nasm -f bin -o a.out tiny.asm
[alert7@redhat]# chmod +x a.out
[alert7@redhat]# ./a.out ; echo $?
42
[alert7@redhat]# wc -c a.out
64 a.out

Well, the file is now 64 bytes.

If we could put the program header table entirely inside the ELF header, the file could be smaller still. But would that work?

Yes, it can be done. Start the program header table at offset 4, right after the magic number, and construct the executable ELF file very carefully.

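Note two things.
First, p_memsz specifies how much memory to allocate for the segment. Obviously it must be at least as large as p_filesz, but a larger value does no harm.

Second, the execute bit can be dropped from p_flags, and Linux will set it for us. Why does this work?
The author says he doesn't know. He guesses it may be because the entry point points into this segment.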

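[★Translator's note:
I do know why. Linux does not set the execute bit in p_flags for us at all. It works
only because the Intel (32-bit x86) hardware of the time had no execute protection,
which is exactly why people found it necessary to write kernel patches such as non-executable stack patches.]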

[alert7@redhat]# cat tiny.asm
; tiny.asm

BITS 32

org 0x00001000

db 0x7F, "ELF" ; e_ident         <----     # start of the ELF header; 0x7F, 'E', 'L', 'F' take 4 bytes
dd 1 ; p_type                    <----     # start of the program header table, so e_phoff = 4
dd 0 ; p_offset                  # p_offset = 0
dd $$ ; p_vaddr                  # p_vaddr = $$
dw 2 ; e_type ; p_paddr          # e_type = 2, p_paddr ignored
dw 3 ; e_machine
dd filesize ; e_version ; p_filesz      # e_version ignored, p_filesz = filesize
dd _start ; e_entry ; p_memsz           # e_entry = _start; p_memsz = _start only needs to be at least p_filesz
dd 4 ; e_phoff ; p_flags                # e_phoff = 4, p_flags = 4 (read only); on Intel it still executes
_start:
mov bl, 42 ; e_shoff ; p_align          # e_shoff and p_align ignored
xor eax, eax
inc eax ; e_flags                       # e_flags ignored
int 0x80
db 0                            <-----      # end of the program header table, so e_phentsize = 0x20
dw 0x34 ; e_ehsize              # e_ehsize: the ELF header is always 52 (0x34) bytes
dw 0x20 ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx               <-----    # end of the ELF header

filesize equ $ - $$

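p_flags has changed from 5 to 4. This 4 is also the value of e_phoff, the offset of the program header table in the file. The code starts at e_shoff and ends inside e_flags.
(e_shoff and e_flags are both dd, 8 bytes together, just enough room for our 7 bytes of code.)

Note: the load address has been lowered to 0x1000. This is only so that e_entry holds a suitably small value, because e_entry's value is also the value of p_memsz. p_memsz only needs to be at least p_filesz, but it shouldn't be absurdly large.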

[alert7@redhat]# nasm -f bin -o a.out tiny.asm
[alert7@redhat]# chmod +x a.out
[alert7@redhat]# ./a.out ; echo $?
42
[alert7@redhat]# wc -c a.out
52 a.out
[alert7@redhat]# readelf -a a.out
ELF Header:
Magic: 7f 45 4c 46 01 00 00 00 00 00 00 00 00 10 00 00
Class: ELF32
Data: none
Version: 0
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x34
Entry point address: 0x1020
Start of program headers: 4 (bytes into file)
Start of section headers: -1070519629 (bytes into file)
Flags: 0x80cd40
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 0 (bytes)
Number of section headers: 0
Section header string table index: 0

There are no sections in this file.

Program Header:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00001000 0x00030002 0x00034 0x01020 R 0xc0312ab3

There is no dynamic segment in this file.

There are no relocations in this file.

No version information found in this file.
[alert7@redhat]# strings a.out        # no strings at all in a.out
[alert7@redhat]#

Now the program code and the program header table are both entirely embedded in the ELF header. Our executable is exactly as big as an ELF header, and it still runs correctly.
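To confirm that the same 52 bytes really serve as both structures, here they are as a C array, derived by hand from the listing above. The array is viewed once as an `Elf32_Ehdr` and once, from offset 4, as an `Elf32_Phdr`. The helper name `view_tiny52` is hypothetical, and a little-endian host is assumed. The resulting field values are the ones readelf printed above.

```c
#include <elf.h>
#include <string.h>

/* The 52-byte file, byte for byte (little-endian). */
static const unsigned char tiny52[52] = {
    0x7F, 'E',  'L',  'F',      /* e_ident[0..3]                 */
    0x01, 0x00, 0x00, 0x00,     /* p_type = PT_LOAD              */
    0x00, 0x00, 0x00, 0x00,     /* p_offset = 0                  */
    0x00, 0x10, 0x00, 0x00,     /* p_vaddr = 0x1000              */
    0x02, 0x00, 0x03, 0x00,     /* e_type, e_machine / p_paddr   */
    0x34, 0x00, 0x00, 0x00,     /* e_version / p_filesz = 52     */
    0x20, 0x10, 0x00, 0x00,     /* e_entry / p_memsz = 0x1020    */
    0x04, 0x00, 0x00, 0x00,     /* e_phoff / p_flags = 4         */
    0xB3, 0x2A, 0x31, 0xC0,     /* mov bl,42 ; xor eax,eax       */
    0x40, 0xCD, 0x80, 0x00,     /* inc eax ; int 0x80 ; pad      */
    0x34, 0x00, 0x20, 0x00,     /* e_ehsize, e_phentsize         */
    0x01, 0x00, 0x00, 0x00,     /* e_phnum, e_shentsize          */
    0x00, 0x00, 0x00, 0x00      /* e_shnum, e_shstrndx           */
};

/* View the same bytes through both structures. */
static void view_tiny52(Elf32_Ehdr *eh, Elf32_Phdr *ph)
{
    memcpy(eh, tiny52, sizeof *eh);
    memcpy(ph, tiny52 + 4, sizeof *ph);   /* program header at e_phoff = 4 */
}
```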

Finally, we can't help asking whether we have reached the absolute minimum. After all, we need a complete ELF header, or Linux won't give us a chance to run.

Is that really so?

No. We still have one last dirty trick up our sleeve.

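If the file is shorter than a full ELF header, Linux still runs it and fills in the missing bytes with zeros. Our file happens to end with 7 zero bytes, so we can drop them.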
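The trick can be modelled in C in two steps: count the zero bytes at the end of the 52-byte image, then emulate the zero-filling that the text relies on. Both helpers are hypothetical. `zero_padded_header` models the behaviour described and is not the kernel's loader.

```c
#include <stddef.h>
#include <string.h>

#define EHDR_SIZE 52   /* sizeof(Elf32_Ehdr) */

/* Number of trailing zero bytes, i.e. how much can be cut off the file
   if the reader treats missing bytes as zeros. */
static size_t trailing_zeros(const unsigned char *buf, size_t len)
{
    size_t n = 0;

    while (n < len && buf[len - 1 - n] == 0)
        n++;
    return n;
}

/* Copy a possibly short file into a full-size header buffer and
   zero-fill whatever is missing. */
static void zero_padded_header(const unsigned char *file, size_t len,
                               unsigned char hdr[EHDR_SIZE])
{
    size_t n = len < EHDR_SIZE ? len : EHDR_SIZE;

    memcpy(hdr, file, n);
    memset(hdr + n, 0, EHDR_SIZE - n);
}
```

The 52-byte file ends with ..., 0x01 (low byte of e_phnum), followed by seven zeros. For those bytes `trailing_zeros` returns 7, giving 52 - 7 = 45 bytes.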

[alert7@redhat]# cat tiny.asm
; tiny.asm ◆◆◆

BITS 32

org 0x00001000

db 0x7F, "ELF" ; e_ident
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dw 2 ; e_type ; p_paddr
dw 3 ; e_machine
dd filesize ; e_version ; p_filesz
dd _start ; e_entry ; p_memsz
dd 4 ; e_phoff ; p_flags
_start:
mov bl, 42 ; e_shoff ; p_align
xor eax, eax
inc eax ; e_flags
int 0x80
db 0
dw 0x34 ; e_ehsize
dw 0x20 ; e_phentsize
db 1 ; e_phnum
; e_shentsize
; e_shnum
; e_shstrndx

filesize equ $ - $$

[alert7@redhat]# nasm -f bin -o a.out tiny.asm
[alert7@redhat]# chmod +x a.out
[alert7@redhat]# ./a.out ; echo $?
42
[alert7@redhat]# wc -c a.out
45 a.out

So the smallest ELF executable we can make is 45 bytes, and here we are forced to end our discussion.

--------------------------------------------------------------------------------
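A 45-byte file is less than 1/8 the size of the smallest executable we could create with the standard tools, and less than 1/50 the size of the smallest one we could create from pure C code.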

Half of the ELF field values used in this article violate the standard ELF specification.

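The program marked ◆ above makes readelf dump core. (Judging by the values in this dump, such as the load address 0x00200000 and FileSiz 0x40, it was actually taken from the 64-byte version.)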
[alert7@redhat]# readelf -a a.out
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 b3 2a 31 c0 40 cd 80 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 179
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x200008
Start of program headers: 32 (bytes into file)
Start of section headers: 1 (bytes into file)
Flags: 0x0
Size of this header: 0 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 0 (bytes)
Number of section headers: 64
Section header string table index: 0
readelf: Error: Unable to read in 0 bytes of section headers

Program Header:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00200000 0x00000001 0x00040 0x00040 R E 0x1000

There is no dynamic segment in this file.
Segmentation fault (core dumped)

Ha, an adorable core dump.

[alert7@redhat]# ls -l /usr/bin/readelf
-rwxr-xr-x 1 root root 132368 Feb 5 2000 /usr/bin/readelf

:( It isn't setuid (no s bit), so I couldn't be bothered to track down what goes wrong.

These tiny ELF files really are malformed. Even objdump can't handle them:
[alert7@redhat]# objdump -a a.out
objdump: a.out: File format not recognized
