Chinaunix首页 | 论坛 | 博客
  • 博客访问: 113650
  • 博文数量: 29
  • 博客积分: 2615
  • 博客等级: 少校
  • 技术积分: 461
  • 用 户 组: 普通用户
  • 注册时间: 2006-04-28 16:01
文章分类

全部博文(29)

文章存档

2012年(1)

2011年(4)

2010年(4)

2009年(4)

2008年(16)

分类: LINUX

2009-10-09 16:03:57

Silvio Cesare
<>
October 1998

[also see: ]


INTRODUCTION

This paper documents the algorithms and implementation of UNIX parasite and
virus code using ELF objects. Brief introductions on UNIX virus detection and
evading such detection are given. An implementation of the ELF parasite
infector for UNIX is provided, and an ELF virus for Linux on x86 architecture
is also supplied.

Elementary programming and UNIX knoledge is assumed, and an understanding of
Linux x86 archtitecture is assumed for the Linux implementation. ELF
understanding is not required but may be of help.

The ELF infection method uses is based on utilizing the page padding on the
end of the text segment which provides suitable hosting for parasite code.

This paper does not document any significant virus programming techniques
except those that are only applicable to the UNIX environment. Nor does it
try to replicate the ELF specifications. The interested reader is advised
to read the ELF documentation if this paper is unclear in ELF specifics.

ELF INFECTION

A process image consists of a 'text segment' and a 'data segment'. The text
segment is given the memory protection r-x (from this its obvious that self
modifying code cannot be used in the text segment). The data segment is
given the protection rw-.

The segment as seen from the process image is typically not all in use as
memory used by the process rarely lies on a page border (or we can say, not
congruent to modulo the page size). Padding completes the segment, and in
practice looks like this.

key:
[...] A complete page
M Memory used in this segment
P Padding

Page Nr
#1 [PPPPMMMMMMMMMMMM] \
#2 [MMMMMMMMMMMMMMMM] |- A segment
#3 [MMMMMMMMMMMMPPPP] /

Segments are not bound to use multiple pages, so a single page segment is quite
possible.

Page Nr
#1 [PPPPMMMMMMMMPPPP] <- A segment

Typically, the data segment directly proceeds the text segment which always
starts on a page, but the data segment may not. The memory layout for a
process image is thus.

key:
[...] A complete page
T Text
D Data
P Padding

Page Nr
#1 [TTTTTTTTTTTTTTTT] <- Part of the text segment
#2 [TTTTTTTTTTTTTTTT] <- Part of the text segment
#3 [TTTTTTTTTTTTPPPP] <- Part of the text segment
#4 [PPPPDDDDDDDDDDDD] <- Part of the data segment
#5 [DDDDDDDDDDDDDDDD] <- Part of the data segment
#6 [DDDDDDDDDDDDPPPP] <- Part of the data segment

pages 1, 2, 3 constitute the text segment
pages 4, 5, 6 constitute the data segment

From here on, the segment diagrams may use single pages for simplicity. eg

Page Nr
#1 [TTTTTTTTTTTTPPPP] <- The text segment
#2 [PPPPDDDDDDDDPPPP] <- The data segment

For completeness, on x86, the stack segment is located after the data segment
giving the data segment enough room for growth. Thus the stack is located at
the top of memory (remembering that it grows down).

In an ELF file, loadable segments are present physically in the file, which
completely describe the text and data segments for process image loading. A
simplified ELF format for an executable object relevant in this instance is.

ELF Header
.
.
Segment 1 <- Text
Segment 2 <- Data
.
.

Each segment has a virtual address associated with its starting location.
Absolute code that references within each segment is permissible and very
probable.

To insert parasite code means that the process image must load it so that the
original code and data is still intact. This means, that inserting a
parasite requires the memory used in the segments to be increased.

The text segment compromises not only code, but also the ELF headers including
such things as dynamic linking information. If the parasite code is to be
inserted by extending the text segment backwards and using this extra
memory, problems can arise because these ELF headers may have to move in
memory and thus cause problems with absolute referencing. It may be possible
to keep the text segment as is, and create another segment consisting of the
parasite code, however introducing an extra segment is certainly questionable
and easy to detect.

Extending the text segment forward or extending the data segment backward will
probably overlap the segments. Relocating a segment in memory will cause
problems with any code that absolutely references memory.

It may be possible to extend the data segment, however this isn't preferred,
as its not UNIX portable that properly implement execute memory protection.

Page padding at segment borders however provides a practical location for
parasite code given that its size is able. This space will not interfere with
the original segments, requiring no relocation. Following the guidline just
given of preferencing the text segment, we can see that the padding at the
end of the text segment is a viable solution.

The resulting segments after parasite insertion into text segment padding looks
like this.

key:
[...] A complete page
V Parasite code
T Text
D Data
P Padding

Page Nr
#1 [TTTTTTTTTTTTVVPP] <- Text segment
#2 [PPPPDDDDDDDDPPPP] <- Data segment

...

A more complete ELF executable layout is (ignoring section content - see
below).

ELF Header
Program header table
Segment 1
Segment 2
Section header table optional

In practice, this is what is normally seen.

ELF Header
Program header table
Segment 1
Segment 2
Section header table
Section 1
.
.
Section n

Typically, the extra sections (those not associated with a segment) are such
things as debugging information, symbol tables etc.

From the ELF specifications:

"An ELF header resides at the beginning and holds a ``road map'' describing the
file's organization. Sections hold the bulk of object file information for the
linking view: instructions, data, symbol table, relocation information, and so
on.

...
...

A program header table, if present, tells the system how to create a process
image. Files used to build a process image (execute a program) must have a
program header table; relocatable files do not need one. A section header
table contains information describing the file's sections. Every section has
an entry in the table; each entry gives information such as the section name,
the section size, etc. Files used during linking must have a section header
table; other object files may or may not have one.

...
...

Executable and shared object files statically represent programs. To execute
such programs, the system uses the files to create dynamic program
representations, or process images. A process image has segments that hold
its text, data, stack, and so on. The major sections in this part discuss the
following.

Program header. This section complements Part 1, describing object file
structures that relate directly to program execution. The primary data
structure, a program header table, locates segment images within the file and
contains other information necessary to create the memory image for the
program."

After insertion of parasite code, the layout of the ELF file will look like
this.

ELF Header
Program header table
Segment 1 - The text segment of the host
- The parasite
Segment 2
Section header table
Section 1
.
.
Section n

Thus the parasite code must be physically inserted into the file, and the
text segment extended to see the new code.

An ELF object may also specify an entry point of the program, that is, the
virtual memory location that assumes control of the program. Thus to
activate parasite code, the program flow must include the new parasite. This
can be done by patching the entry point in the ELF object to point (jump)
directly to the parasite. It is then the parasite's responsibility that the
host code be executed - typically, by transferring control back to the host
once the parasite has completed its execution.

From /usr/include/elf.h

typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf32_Half e_type; /* Object file type */
Elf32_Half e_machine; /* Architecture */
Elf32_Word e_version; /* Object file version */
Elf32_Addr e_entry; /* Entry point virtual address */
Elf32_Off e_phoff; /* Program header table file offset */
Elf32_Off e_shoff; /* Section header table file offset */
Elf32_Word e_flags; /* Processor-specific flags */
Elf32_Half e_ehsize; /* ELF header size in bytes */
Elf32_Half e_phentsize; /* Program header table entry size */
Elf32_Half e_phnum; /* Program header table entry count */
Elf32_Half e_shentsize; /* Section header table entry size */
Elf32_Half e_shnum; /* Section header table entry count */
Elf32_Half e_shstrndx; /* Section header string table index */
} Elf32_Ehdr;

e_entry is the entry point of the program given as a virtual address. For
knowledge of the memory layout of the process image and the segments that
compromise it stored in the ELF object see the Program Header information
below.

e_phoff gives use the file offset for the start of the program header table.
Thus to read the header table (and the associated loadable segments), you may
lseek to that position and read e_phnum*sizeof(Elf32_Pdr) bytes associated with
the program header table.

It can also be seen, that the section header table file offset is also given.
It was previously mentioned that the section table resides at the end of
the file, so after inserting of data at the end of the segment on file, the
offset must be updated to reflect the new position.

/* Program segment header. */

typedef struct
{
Elf32_Word p_type; /* Segment type */
Elf32_Off p_offset; /* Segment file offset */
Elf32_Addr p_vaddr; /* Segment virtual address */
Elf32_Addr p_paddr; /* Segment physical address */
Elf32_Word p_filesz; /* Segment size in file */
Elf32_Word p_memsz; /* Segment size in memory */
Elf32_Word p_flags; /* Segment flags */
Elf32_Word p_align; /* Segment alignment */
} Elf32_Phdr;

Loadable program segments (text/data) are identified in a program header by a
p_type of PT_LOAD (1). Again as with the e_shoff in the ELF header, the
file offset (p_offset) must be updated in later phdr's to reflect their new
position in the file.

p_vaddr identifies the virtual address of the start of the segment. As
mentioned above regarding the entry point. It is now possible to identify
where program flow begins, by using p_vaddr as the base index and calculating
the offset to e_entry.

p_filesz and p_memsz are the file sizes and memory sizes respectively that
the segment occupies. The use of this scheme of using file and memory sizes,
is that where its not necessary to load memory in the process from disk, you
may still be able to say that you want the process image to occupy its
memory.

The .bss section (see below for section definitions), which is for uninitialized
data in the data segment is one such case. It is not desirable that
uninitialized data be stored in the file, but the process image must allocated
enough memory. The .bss section resides at the end of the segment and any
memory size past the end of the file size is assumed to be part of this
section.

/* Section header. */

typedef struct
{
Elf32_Word sh_name; /* Section name (string tbl index) */
Elf32_Word sh_type; /* Section type */
Elf32_Word sh_flags; /* Section flags */
Elf32_Addr sh_addr; /* Section virtual addr at execution */
Elf32_Off sh_offset; /* Section file offset */
Elf32_Word sh_size; /* Section size in bytes */
Elf32_Word sh_link; /* Link to another section */
Elf32_Word sh_info; /* Additional section information */
Elf32_Word sh_addralign; /* Section alignment */
Elf32_Word sh_entsize; /* Entry size if section holds table */
} Elf32_Shdr;

The sh_offset is the file offset that points to the actual section. The
shdr should correlate to the segment its located it. It is highly suspicious
if the vaddr of the section is different to what is in from the segments
view.

To insert code at the end of the text segment thus leaves us with the following
to do so far.

* Increase p_shoff to account for the new code in the ELF header
* Locate the text segment program header
* Increase p_filesz to account for the new code
* Increase p_memsz to account for the new code
* For each phdr who's segment is after the insertion (text segment)
* increase p_offset to reflect the new position after insertion
* For each shdr who's section resides after the insertion
* Increase sh_offset to account for the new code
* Physically insert the new code into the file - text segment p_offset
+ p_filesz (original)

There is one hitch however. Following the ELF specifications, p_vaddr and
p_offset in the Phdr must be congruent together, to modulo the page size.

key: ~= is denoting congruency.

p_vaddr (mod PAGE_SIZE) ~= p_offset (mod PAGE_SIZE)

This means, that any insertion of data at the end of the text segment on the
file must be congruent modulo the page size. This does not mean, the text
segment must be increased by such a number, only that the physical file be
increased so.

This also has an interesting side effect in that often a complete page must be
used as padding because the required vaddr isn't available. The following
may thus happen.

key:
[...] A complete page
T Text
D Data
P Padding

Page Nr
#1 [TTTTTTTTTTTTPPPP] <- Text segment
#2 [PPPPPPPPPPPPPPPP] <- Padding
#3 [PPPPDDDDDDDDPPPP] <- Data segment

This can be taken advantage off in that it gives the parasite code more space,
such a spare page cannot be guaranteed.

To take into account of the congruency of p_vaddr and p_offset, our algorithm
is modified to appear as this.

* Increase p_shoff by PAGE_SIZE in the ELF header
* Locate the text segment program header
* Increase p_filesz by account for the new code
* Increase p_memsz to account for the new code
* For each phdr who's segment is after the insertion (text segment)
* increase p_offset by PAGE_SIZE
* For each shdr who's section resides after the insertion
* Increase sh_offset by PAGE_SIZE
* Physically insert the new code and pad to PAGE_SIZE, into the file -
text segment p_offset + p_filesz (original)

Now that the process image loads the new code into being, to run the new code
before the host code is a simple matter of patching the ELF entry point and
the virus jump to host code point.

The new entry point is determined by the text segment v_addr + p_filesz
(original) since all that is being done, is the new code is directly prepending
the original host segment. For complete infection code then.

* Increase p_shoff by PAGE_SIZE in the ELF header
* Patch the insertion code (parasite) to jump to the entry point
(original)
* Locate the text segment program header
* Modify the entry point of the ELF header to point to the new
code (p_vaddr + p_filesz)
* Increase p_filesz by account for the new code (parasite)
* Increase p_memsz to account for the new code (parasite)
* For each phdr who's segment is after the insertion (text segment)
* increase p_offset by PAGE_SIZE
* For each shdr who's section resides after the insertion
* Increase sh_offset by PAGE_SIZE
* Physically insert the new code (parasite) and pad to PAGE_SIZE, into
the file - text segment p_offset + p_filesz (original)

This, while perfectly functional, can arouse suspicion because the the new
code at the end of the text segment isn't accounted for by any sections.
Its an easy matter to associate the entry point with a section however by
extending its size, but the last section in the text segment is going to look
suspicious. Associating the new code to a section must be done however as
programs such as 'strip' use the section header tables and not the program
headers. The final algorithm is using this information is.

* Increase p_shoff by PAGE_SIZE in the ELF header
* Patch the insertion code (parasite) to jump to the entry point
(original)
* Locate the text segment program header
* Modify the entry point of the ELF header to point to the new
code (p_vaddr + p_filesz)
* Increase p_filesz by account for the new code (parasite)
* Increase p_memsz to account for the new code (parasite)
* For each phdr who's segment is after the insertion (text segment)
* increase p_offset by PAGE_SIZE
* For the last shdr in the text segment
* increase sh_len by the parasite length
* For each shdr who's section resides after the insertion
* Increase sh_offset by PAGE_SIZE
* Physically insert the new code (parasite) and pad to PAGE_SIZE, into
the file - text segment p_offset + p_filesz (original)

infect-elf-p is the supplied program (complete with source) that implements
the elf infection using text segment padding as described.

INFECTING INFECTIONS

In the parasite described, infecting infections isn't a problem at all. By
skipping executables that don't have enough padding for the parasite, this
is solved implicitly. Multiple parasites may exist in the host, but their is
a limit of how many depending on the size of the parasite code.

NON (NOT AS) TRIVIAL PARASITE CODE

Parasite code that requires memory access requires the stack to be used
manually naturally. No bss section can be used from within the virus code,
because it can only use part of the text segment. It is strongly suggested
that rodata not be used, in-fact, it is strongly suggested that no location
specific data be used at all that resides outside the parasite at infection
time.

Thus, if initialized data is to be used, it is best to place it in the text
segment, ie at the end of the parasite code - see below on calculating address
locations of initialized data that is not known at compile/infection time.

If the heap is to be used, then it will be operating system dependent. In
Linux, this is done via the 'brk' syscall.

The use of any shared library calls from within the parasite should be removed,
to avoid any linking problems and to maintain a portable parasite in files
that use varying libraries. It is thus naturally recommended to avoid using
libc.

Most importantly, the parasite code must be relocatable. It is possible to
patch the parasite code before inserting it, however the cleanest approach
is to write code that doesn't need to be patched.

In x86 Linux, some syscalls require the use of an absolute address pointing to
initialized data. This can be made relocatable by using a common trick used
in buffer overflow code.

jmp A
B:
pop %eax ; %eax now has the address of the string
. ; continue as usual
.
.

A:
call B
.string \"hello\"

By making a call directly proceeding the string of interest, the address of
the string is pushed onto the stack as the return address.

BEYOND ELF PARASITES AND ENTER VIRUS IN UNIX

In a UNIX environment the most probably method for a typical garden variety
virus to spread is through infecting files that it has legal permission to do
so.

A simple method of locating new files possible to infect, is by scanning the
current directory for writable files. This has the advantage of being
relatively fast (in comparison to large tree walks) but finds only a small
percentage of infect-able files.

Directory searches are however very slow irrespectively, even without large
tree walks. If parasite code does not fork, its very quickly noticed what is
happening. In the sample virus supplied, only a small random set of files
in the current directory are searched.

Forking, as mentioned, easily solves the problem of slowing the startup to
the host code, however new processes on the system can be spotted as abnormal
if careful observation is used.

The parasite code as mentioned, must be completely written in machine code,
this does not however mean that development must be done like this.
Development can easily be done in a high level language such as C and then
compiled to asm to be used as parasite code.

A bootstrap process can be used for initial infection of the virus into a host
program that can then be distributed. That is, the ELF infector code is used,
with the virus as the parasite code to be inserted.

THE LINUX PARASITE VIRUS

This virus implements the ELF infection described by utilizing the padding at
the end of the text segment. In this padding, the virus in its entirety is
copied, and the appropriate entry points patched.

At the end of the parasite code, are the instructions.

movl %ebp, $XXXX
jmp *%ebp

XXXX is patched when the virus replicates to the host entry point. This
approach does have the side effect of trashing the ebp register which may or
may not be destructive to programs who's entry points depend on ebp being set
on entry. In practice, I have not seen this happen (the implemented Linux
virus uses the ebp approach), but extensive replicating has not been performed.

On execution of an infected host, the virus will copy the parasite (virus)
code contained in itself (the file) into memory.

The virus will then scan randomly (random enough for this instance) through
the current directory, looking for ELF files of type ET_EXEC or ET_DYN to
infect. It will infect up to Y_INFECT files, and scan up to N_INFECT files in
total.

If a file can be infected, ie, its of the correct ELF type, and the padding
can sustain the virus, a a modified copy of the file incorporating the virus
is made. It then renames the copy to the file its infecting, and thus it
is infected.

Due to the rather large size of the virus in comparison to the page size
(approx 2.3k) not all files are able to be infected, in fact only near half on
average.

DEVELOPMENT OF THE LINUX VIRUS

The Linux virus was completely written in C, and strongly based around the
ELF infector code. The C code is supplied as elf-p-virus.c The code requires
the use of no libraries, and avoids libc by using a similar scheme to the
_syscall declarations Linux employs modified not to use errno.

Heap memory was used for dynamic allocation of the phdr and shdr tables using
'brk'.

Linux has some syscalls which require the address of initialized strings to
be passed to it, notably, open, rename, and unlink. This requires initialized
data storage. As stated before, rodata cannot be used, so this data was
placed at the end of the code. Making it relocatable required the use of the
above mentioned algorithm of using call to push the address (return value)
onto the stack. To assist in the asm conversion, extra variables were
declared so to leave room on the stack to store the addresses as in some
cases the address was used more than once.

The C code form of the virus allowed for a debugging version which produces
verbose output, and allows argv[0] to be given as argv[1]. This is
advantageous because you can setup a pseudo infected host which is non
replicating. Then run the virus making argv[0] the name of the pseudo infected
host. It would replicate the parasite from that host. Thus it was possible to
test without having a binary version of a replicating virus.

The C code was converted to asm using the c compiler gcc, with the -S flag to
produce assembler. Modifications were made so that use of rodata for
initialized data (strings for open, unlink, and rename), was replaced with
the relocatable data using the call address methodology.

Most of the registers were saved on virus startup and restored on exit
(transference of control to host).

The asm version of the virus, can be improved tremendously in regards to
efficiency, which will in turn improve the expected life time and replication
of the virus (a smaller virus can infect more objects, where previously the
padding would dictate the larger virus couldn't infect it). The asm virus was
written with development time the primary concern and hence almost zero time
was spent on hand optimization of the code gcc generated from the C version.
In actual fact, less than 5 minutes were spent in asm editing - this is
indicative that extensive asm specific skills are not required for a non
optmised virus.

The edited asm code was compiled (elf-p-virus-egg.c), and then using objdump
with the -D flag, the addresses of the parasite start, the required offsets for
patching were recorded. The asm was then edited again using the new
information. The executeable produced was then patched manually for any bytes
needed. elf-text2egg was used to extract hex-codes for the complete length of
the parasite code usable in a C program, ala the ELF infector code. The ELF
infector was then recompiled using the virus parasite.

# objdump -D elf-p-virus-egg
.
.
08048143
阅读(2687) | 评论(0) | 转发(0) |
0

上一篇:黑客基地

下一篇:Python VIM 开发环境配置

给主人留下些什么吧!~~