
Category: BSD

2008-03-20 17:15:23

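5 Run-Time Kernel Memory Patching
5.1 Kernel Data Access Library
    5.1.1 The kvm_openfiles Function
    5.1.2 The kvm_nlist Function
    5.1.3 The kvm_geterr Function
    5.1.4 The kvm_read Function
    5.1.5 The kvm_write Function
    5.1.6 The kvm_close Function
5.2 Patching Code Bytes
5.3 Understanding x86 Call Statements
    5.3.1 Patching Call Statements
5.4 Allocating Kernel Memory
    5.4.1 The malloc Function
    5.4.2 The MALLOC Macro
    5.4.3 The free Function
    5.4.4 The FREE Macro
    5.4.5 Example
5.5 Allocating Kernel Memory from User Space
    5.5.1 Example
5.6 Inline Function Hooking
    5.6.1 Example
    5.6.2 Gotchas
5.7 Cloaking System Call Hooks
5.8 Summary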





5
RUN-TIME MEMORY PATCHING
内核内存的运行时补丁



In the previous chapters we looked at the classic method of introducing code into a running kernel: through a kernel module. In this chapter we’ll look at how to patch and augment a running kernel with userland code. This is accomplished by interacting with the /dev/kmem device, which allows us to read from and write to kernel virtual memory. In other words, /dev/kmem allows us to patch the various code bytes (loaded in executable memory space) that control the logic of the kernel. This is commonly referred to as run-time kernel memory patching.

在前面的章节里,我们着眼于向运行中的内核引入代码的传统方法:通过一个可装载内核模块。在本章,我们 看看如何用用户层代码来修改和扩展一个运行中的内核。这个方法通过与/dev/kmem 设备进行交互来完成,它让我们从内核的虚拟内存读写数据。换句话说,/dev/kmem 允许我们修改各种控制着内核逻辑的代码字节(加载在可执行的内存区域)。这个方法通常称之为内核内存的运行时补丁。



5.1 Kernel Data Access Library
5.1 内核数据访问库

The Kernel Data Access Library (libkvm) provides a uniform interface for accessing kernel virtual memory through the /dev/kmem device. The following six functions from libkvm form the basis of run-time kernel memory patching.

内核数据访问库(libkvm)提供通过/dev/kmem 访问内核虚拟内存历来一致界面。下面从libkvm 6摘录的6个函数构成了内核内存运行时修补的基础。



5.1.1 The kvm_openfiles Function
5.1.1 kvm_openfiles 函数


Access to kernel virtual memory is initialized by calling the kvm_openfiles function. If kvm_openfiles is successful, a descriptor is returned to be used in all subsequent libkvm calls. If an error is encountered, NULL is returned instead. Here is the function prototype for kvm_openfiles:

对内核虚拟内存的访问是通过调用kvm_openfiles 函数进行初始化的。如果kvm_openfiles 函数调用成功,一个后续libkvm 调用都要用到的描述符就会返回。如果遇到错误,就返回NULL。下面kvm_openfiles 的函数原型:

--------------------------------------------------------------------------------
#include <fcntl.h>
#include <kvm.h>

kvm_t *
kvm_openfiles( char *execfile, const char *corefile,
        const char *swapfile, int flags, char *errbuf);
--------------------------------------------------------------------------------

The following is a brief description of each parameter.

下面是各个参数的简单描述

execfile

This specifies the kernel image to be examined, which must contain a symbol table. If this parameter is set to NULL, the currently running kernel image is examined.

它指定要检查的内核映象,内核映象必须包含一个符号表。如果这个参数设置为NULL,就检查当前正在运行的内核映象。


corefile

This is the kernel memory device file; it must be set to either /dev/mem or a crash dump core generated by savecore(8). If this parameter is set to NULL, /dev/mem is used.

这是内核内存设备文件。它必须设置为/dev/mem 或者由savecore(8)产生的崩溃dump core。如果该参数设为NULL,就是使用/dev/mem。


swapfile

This parameter is currently unused; thus, it’s always set to NULL.

这个参数当前没有使用。它总是设为NULL


flags

This parameter indicates the read/write access permissions for the core file. It must be set to one of the following constants:

这个参数指明core文件的读/写访问。它必须设置为以下常数中的一个:

O_RDONLY Open for reading only.
O_WRONLY Open for writing only.
O_RDWR Open for reading and writing.

O_RDONLY 只读打开
O_WRONLY 只写打开
O_RDWR 读写打开


errbuf

If kvm_openfiles encounters an error, an error message is written into this parameter.

如果kvm_openfiles 遇到一个错误,错误信息被写到这个参数中去。



5.1.2 The kvm_nlist Function
5.1.2 kvm_nlist 函数

The kvm_nlist function retrieves the symbol table entries from a kernel image.

kvm_nlist 函数从内核映象中取回符号表入口

--------------------------------------------------------------------------------
#include <kvm.h>
#include <nlist.h>

int
kvm_nlist(kvm_t *kd, struct nlist *nl);
--------------------------------------------------------------------------------

Here, nl is a null-terminated array of nlist structures. To make proper use of kvm_nlist, you’ll need to know two fields in struct nlist, specifically n_name, which is the name of a symbol loaded in memory, and n_value, which is the address of the symbol.

这里,nl是一个null结尾的nlist结构数组。为了恰当地使用kvm_nlist, 你应当了解nlist 中的两个域,特别地,n_name, 它是加载在内存中的符号名称;还有n_value ,它是对应符号的地址。

The kvm_nlist function iterates through nl, looking up each symbol in turn through the n_name field; if found, n_value is filled out appropriately. Otherwise, it is set to 0.

kvm_nlist 函数遍历nl,依次通过n_name 域寻找每个符号。如果找到,n_value 就被恰当地填充。否则,它就被设置为0。
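The lookup contract described above can be modelled in portable C without touching a real kernel. The sketch below is only a toy: struct toy_sym, toy_table, and toy_nlist are hypothetical stand-ins invented for illustration (they are not part of libkvm), and the addresses are made up. It mirrors the two conventions that matter when calling kvm_nlist: the array ends at the first entry whose n_name is NULL, and an unresolved symbol is left with n_value set to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct nlist: only the two fields used here. */
struct toy_sym {
	const char	*n_name;	/* symbol name; NULL ends the array */
	unsigned long	 n_value;	/* symbol address; 0 means not found */
};

/* Made-up "kernel symbol table" for the demonstration. */
static const struct toy_sym toy_table[] = {
	{ "hello",   0xc0de0480UL },
	{ "uprintf", 0xc0de1000UL },
};

/*
 * Fill in n_value for every entry of the NULL-terminated array nl and
 * return how many names could not be resolved.
 */
int
toy_nlist(struct toy_sym *nl)
{
	size_t i, n = sizeof(toy_table) / sizeof(toy_table[0]);
	int missing = 0;

	assert(nl != NULL);
	for (; nl->n_name != NULL; nl++) {
		nl->n_value = 0;
		for (i = 0; i < n; i++) {
			if (strcmp(nl->n_name, toy_table[i].n_name) == 0) {
				nl->n_value = toy_table[i].n_value;
				break;
			}
		}
		if (nl->n_value == 0)
			missing++;
	}
	return (missing);
}
```

Because a missing symbol is not necessarily reported as a failure of the call itself, code that uses kvm_nlist should test each n_value after the call in addition to checking the return value, which is what the listings in this chapter do.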



5.1.3 The kvm_geterr Function
5.1.3 kvm_geterr 函数

The kvm_geterr function returns a string describing the most recent error condition on a kernel virtual memory descriptor.

kvm_geterr 函数返回一个字符串。该字符串描述了与内核虚拟内存描述符有关的,最近的错误情况。
--------------------------------------------------------------------------------
#include <kvm.h>

char *
kvm_geterr(kvm_t *kd);
--------------------------------------------------------------------------------

The results are undefined if the most recent libkvm call did not produce an error.

如果最近的libkvm 调用没有产生错误,该函数的返回没有定义。



5.1.4 The kvm_read Function
5.1.4 kvm_read 函数

Data is read from kernel virtual memory with the kvm_read function. If the read is successful, the number of bytes transferred is returned. Otherwise, -1 is returned.

kvm_read 函数用于从内核虚拟内存中读取数据。如果调用成功,返回已传送数据的以byte为单位的数量。否则,返回-1.

--------------------------------------------------------------------------------
#include <kvm.h>

ssize_t
kvm_read(kvm_t *kd, unsigned long addr, void *buf, size_t nbytes);
--------------------------------------------------------------------------------

Here, nbytes indicates the number of bytes to be read from the kernel space address addr to the buffer buf.

这里,nbytes指明需要从内核空间地址addr读取到缓冲buf的字节数量。



5.1.5 The kvm_write Function
5.1.5 kvm_write 函数

Data is written to kernel virtual memory with the kvm_write function.

kvm_write 函数用于将数据写到内核虚拟内存中。

--------------------------------------------------------------------------------
#include <kvm.h>

ssize_t
kvm_write(kvm_t *kd, unsigned long addr, const void *buf, size_t nbytes);
--------------------------------------------------------------------------------

The return value is usually equal to the nbytes argument, unless an error has occurred, in which case -1 is returned instead. In this definition, nbytes indicates the number of bytes to be written to addr from buf.

返回值通常与参数nbytes相同。除非出现了一个错误。这种情况下,代替之的是,返回-1。在这个定义中,nbytes指明了需要从buf写到addr的字节数。



5.1.6 The kvm_close Function
5.1.6 kvm_close 函数

An open kernel virtual memory descriptor is closed by calling the kvm_close function.

kvm_close 函数关闭一个打开的内核虚拟内存描述符。

--------------------------------------------------------------------------------
#include <fcntl.h>
#include <kvm.h>

int
kvm_close(kvm_t *kd);
--------------------------------------------------------------------------------

If kvm_close is successful, 0 is returned. Otherwise, -1 is returned.

如果kvm_close 调用成功,返回0。否则,返回-1。




5.2 Patching Code Bytes
5.2 代码字节补丁

Now, equipped with the functions from the previous section, let’s patch some kernel virtual memory. I’ll start with a very basic example. Listing 5-1 is a system call module that acts like an over-caffeinated “Hello, world!” function.

现在,具备了前面章节的函数知识,让我们对一些内核虚拟内存进行修改。我将以一个非常基础的例子开始。清单5-1是一个系统调用,它运行起来就像一个咖啡碱过度中毒了的“Hello, world!”函数。

--------------------------------------------------------------------------------
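#include <sys/types.h>
#include <sys/param.h>
#include <sys/proc.h>
#include <sys/module.h>
#include <sys/sysent.h>
#include <sys/kernel.h>
#include <sys/systm.h>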

/* The system call function. */
/* 系统调用函数 */
static int
hello(struct thread *td, void *syscall_args)
{
    int i;
     /*1*/ for (i = 0; i < 10; i++)
        printf("FreeBSD Rocks!\n");

    return(0);
}

/* The sysent for the system call. */
/* 针对新系统调用的sysent */
static struct sysent hello_sysent = {
    0,     /* number of arguments 参数的个数*/
    hello     /* implementing function 实现函数*/
};

/* The offset in sysent[] where the system call is to be allocated. */
/* 新的系统调用将分配在sysent[] 内的offset 处*/
static int offset = NO_SYSCALL;

/* The function called at load/unload. */
/* 加载/卸载模块时调用此函数 */
static int
load(struct module *module, int cmd, void *arg)
{
    int error = 0;

    switch (cmd) {
    case MOD_LOAD:
        uprintf("System call loaded at offset %d.\n", offset);
        break;

    case MOD_UNLOAD:
        uprintf("System call unloaded from offset %d.\n", offset);
        break;

    default:
        error = EOPNOTSUPP;
        break;
    }

    return(error);
}

SYSCALL_MODULE(hello, &offset, &hello_sysent, load, NULL);
--------------------------------------------------------------------------------
Listing 5-1: hello.c
清单5-1:hello.c

As you can see, if we execute this system call, we’ll get some very annoying output. To make this system call less annoying, we can patch out /*1*/ the for loop, which will remove the nine additional calls to printf. However, before we can do that, we’ll need to know what this system call looks like when it’s loaded in main memory.

可以看到,如果我们 执行这个系统调用,将得到一些非常烦人的输出。为了让这个系统调用不那么烦人,我们得修理修理这个for循环。我们期望这个修补能把其余9个对 printf的调用都移走。但是,在我们能够实现这个目标之前,我们得了解,在系统调用加载到内存后,它看起来是个什么样。

--------------------------------------------------------------------------------
$ objdump -dR ./hello.ko

./hello.ko: file format elf32-i386-freebsd

Disassembly of section .text:

00000480 <hello>:
 480:   55                      push   %ebp
 481:   89 e5                   mov    %esp,%ebp
 483:   53                      push   %ebx
 484:   bb 09 00 00 00          mov    $0x9,%ebx
 489:   83 ec 04                sub    $0x4,%esp
 48c:   8d 74 26 00             lea    0x0(%esi),%esi
 490:   c7 04 24 0d 05 00 00    movl   $0x50d,(%esp)
                        493: R_386_RELATIVE    *ABS*
 497:   e8 fc ff ff ff          call   498 <hello+0x18>
                        498: R_386_PC32    printf
 49c:   4b                      dec    %ebx
 49d:   79 f1                   jns    490 <hello+0x10>
 49f:   83 c4 04                add    $0x4,%esp
 4a2:   31 c0                   xor    %eax,%eax
 4a4:   5b                      pop    %ebx
 4a5:   c9                      leave
 4a6:   c3                      ret
 4a7:   89 f6                   mov    %esi,%esi
 4a9:   8d bc 27 00 00 00 00    lea    0x0(%edi),%edi
--------------------------------------------------------------------------------
NOTE The binary hello.ko was compiled explicitly without the -funroll-loops option.

提示 二进制文件hello.ko 在编译时已经明确地把-funroll-loops选项排除在外了。

Notice the instruction at address 49d, which causes the instruction pointer to jump back to address 490 if the sign flag is not set. This instruction is, more or less, the for loop in hello.c. Therefore, if we nop it out, we can make the hello system call somewhat bearable. The program in Listing 5-2 does just that.

注意位于地址49d 的指令,如果sign标志没有被设置,它就导致指令往后跳回到地址490 。这个指令,九不离十,就是hello.c中的for循环。因此,如果把它nop掉,我们就能够让这个hello系统调用变得稍稍能让人忍受一些。清单5-2中的程序要完成的任务就是这个。
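To see why the two bytes 79 f1 at address 49d send execution back to 490, it helps to decode the operand by hand. The following is a minimal sketch, assuming the usual x86 short-jump encoding (one opcode byte followed by one signed 8-bit displacement measured from the next instruction); short_jump_target and check_hello_jns are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SHORT_JUMP_LEN 2	/* one opcode byte plus one displacement byte */

/* Destination of a short jump located at insn_addr with operand rel8. */
uint32_t
short_jump_target(uint32_t insn_addr, uint8_t rel8)
{
	/* Sign-extend the displacement: 0xf1 becomes -15. */
	int32_t disp = (rel8 >= 0x80) ? (int32_t)rel8 - 0x100 : (int32_t)rel8;

	return (insn_addr + SHORT_JUMP_LEN + (uint32_t)disp);
}

/* Worked example: the jns at 0x49d (bytes 79 f1) in the hello disassembly. */
void
check_hello_jns(void)
{
	assert(short_jump_target(0x49d, 0xf1) == 0x490);
}
```

The same arithmetic shows why exactly two nop bytes are needed: the jns instruction occupies two bytes, so replacing it with 90 90 removes the backward jump without disturbing the add instruction that follows at 49f.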

--------------------------------------------------------------------------------
#include <sys/types.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#include <nlist.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE 0x30

/* Replacement code. */
/*代替的代码*/
unsigned char nop_code[] =
    "\x90\x90";         /* nop */

int
main(int argc, char *argv[])
{
    int i, offset;
    char errbuf[_POSIX2_LINE_MAX];
    kvm_t *kd;
    struct nlist nl[] = { {NULL}, {NULL}, };
    unsigned char hello_code[SIZE];

    /* Initialize kernel virtual memory access. */
    /* 初始化对内核虚拟内存的访问 */
    kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf);
    if (kd == NULL) {
        fprintf(stderr, "ERROR: %s\n", errbuf);
        exit(-1);
    }

    nl[0].n_name = "hello";

    /* Find the address of hello. */
    /* 寻找hello的地址 */
    if (kvm_nlist(kd, nl) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    if (!nl[0].n_value) {
        fprintf(stderr, "ERROR: Symbol %s not found\n",
            nl[0].n_name);
        exit(-1);
    }

    /* Save a copy of hello. */
    /* 保存hello的拷贝 */
    if (kvm_read(kd, nl[0].n_value, hello_code, SIZE) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /* Search through hello for the jns instruction. */
    /* 搜索hello中jns指令 */
     /*1*/ for (i = 0; i < SIZE; i++) {
        if (hello_code[i] == 0x79) {
            offset = i;
            break;
        }
    }

    /* Patch hello. */
    /* 修补 hello. */
    if (kvm_write(kd, nl[0].n_value + offset, nop_code,
         /*2*/ sizeof(nop_code) - 1) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /* Close kd. */
    /* 关闭 kd. */
    if (kvm_close(kd) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    exit(0);

}
--------------------------------------------------------------------------------
Listing 5-2: fix_hello.c

Notice how /*1*/ I search through the first 48 bytes of hello, looking for the jns instruction, instead of using a hard-coded offset. Depending on your compiler version, compiler flags, base system, and so on, it is entirely possible for hello.c to compile differently. Therefore, it’s useless to determine the location of jns ahead of time.

注意我搜索hello的前48个字节的方式,我寻找jns指令,而不是使用硬编码偏移量。根据你的编译器的版本,编译器标记,基本系统等等,hello.c编译后jns指令的位置完全有可能不同。因此,提前确定jns的位置是无效的。


In fact, it’s possible that when compiled, hello.c will not even include a jns instruction, as there are multiple ways to represent a for loop in machine code. Furthermore, recall that the disassembly of hello.ko identified two instructions that require dynamic relocation. This means that the first 0x79 byte encountered may be part of those instructions, and not the actual jns instruction. That’s why this is an example and not a real program.

实际上,有可能在编译后,hello.c甚至不包含jns指令,因为用机器码表示一个for循环 存在多种形式。此外,我们记得hello.ko的反汇编把需要动态重定位的两个指令识别为一样。这意味着,第一次遇到的0x79 字节可能是这些指令的一部分,而不是真实的jns指令。这就是示例只是个示范,而不是真实程序的原因。


NOTE  To get around these problems, use longer and/or more search signatures. You could also use hard-coded offsets, but your code would break on some systems.

提示  为了绕开这些问题,可以使用更长和/或更多的搜索标签。你也可以使用硬编码偏移量,但你的代码在某些系统上将会崩溃。
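As a rough illustration of the "longer signature" advice, the sketch below searches for a byte sequence rather than a single byte. It is a sketch under stated assumptions, not part of the original listings: find_signature, dec_jns_sig, and sample_code are hypothetical names, and sample_code is a made-up buffer in which a stray 0x79 appears inside an immediate operand before the real loop tail (dec %ebx followed by jns, as in the hello disassembly).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of sig inside code, or -1 if
 * the signature does not occur.
 */
long
find_signature(const unsigned char *code, size_t code_len,
    const unsigned char *sig, size_t sig_len)
{
	size_t i;

	assert(sig_len > 0);
	for (i = 0; i + sig_len <= code_len; i++) {
		if (memcmp(code + i, sig, sig_len) == 0)
			return ((long)i);
	}
	return (-1);
}

/* dec %ebx followed by the jns opcode: the loop tail seen in hello. */
const unsigned char dec_jns_sig[] = { 0x4b, 0x79 };

/* Made-up code bytes: a misleading 0x79 precedes the real loop tail. */
const unsigned char sample_code[] = {
	0xb8, 0x79, 0x00, 0x00, 0x00,	/* mov $0x79,%eax */
	0x4b,				/* dec %ebx */
	0x79, 0xf1			/* jns (back 15 bytes) */
};
```

In this buffer a one-byte search for 0x79 stops at offset 1, inside the mov operand, while the two-byte signature matches at offset 5, so the jns opcode is at offset 6. A longer signature reduces false matches but is still tied to one particular compilation of the target.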


Another interesting detail worth mentioning is that when I patch hello with kvm_write, I /*2*/ pass sizeof(nop_code) - 1, not sizeof(nop_code), as the nbytes argument. In C, a string literal carries a terminating null byte, so an array initialized from "\x90\x90" holds three bytes and sizeof(nop_code) evaluates to three. However, I only want to write two nops, not two nops and a NUL.

另一个有趣的细节也值得一提,当我用 kvm_write修改hello时,作为nbytes 参数传递的是sizeof(nop_code) – 1,而不是sizeof(nop_code)。在C中,字符数组是以null结素的;因此,sizeof(nop_code)返回3。但是,我想写的只是 两个nops,而不是两个nops和一个NULL。
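The size difference is easy to confirm in isolation. The snippet below is a small sketch (the array and function names are made up for illustration) that contrasts an array initialized from a string literal with one initialized from a brace-enclosed list, which has no terminator and therefore needs no "- 1" adjustment.

```c
#include <assert.h>
#include <stddef.h>

unsigned char nop_from_string[] = "\x90\x90";	/* 0x90 0x90 0x00 */
unsigned char nop_from_list[] = { 0x90, 0x90 };	/* 0x90 0x90 */

/* Check the sizes discussed in the text. */
void
check_nop_sizes(void)
{
	assert(sizeof(nop_from_string) == 3);	/* includes the terminator */
	assert(sizeof(nop_from_string) - 1 == 2);	/* what the patch writes */
	assert(sizeof(nop_from_list) == 2);	/* no terminator to subtract */
	assert(nop_from_string[2] == 0x00);
}
```

Either form works for the patch; the string form is compact for long byte sequences, while the list form avoids the off-by-one trap.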


The following output shows the results of executing hello before and after running fix_hello on ttyv0 (i.e., the system console):

下面的输出显示了在ttyv0 (也就是系统控制台)运行fix_hello 之前和之后,执行hello的结果。

--------------------------------------------------------------------------------
$ sudo kldload ./hello.ko
System call loaded at offset 210.
$ perl -e 'syscall(210);'
FreeBSD Rocks!
FreeBSD Rocks!
FreeBSD Rocks!
FreeBSD Rocks!
FreeBSD Rocks!
FreeBSD Rocks!
FreeBSD Rocks!
FreeBSD Rocks!
FreeBSD Rocks!
FreeBSD Rocks!
$ gcc -o fix_hello fix_hello.c -lkvm
$ sudo ./fix_hello
$ perl -e 'syscall(210);'
FreeBSD Rocks!
--------------------------------------------------------------------------------

Success! Now let’s try something a little more advanced.

成功了!让我们试试稍微高级点的东西。



5.3 Understanding x86 Call Statements
5.3 理解x86的调用语句

In x86 assembly the call statement is a control transfer instruction used to call a function or procedure. There are two types of call statements: near and far. For our purposes, we only need to understand near call statements. The following (contrived) code segment illustrates the details of a near call.

x86汇编的调用语句是用来调用一个函数或过程的控制转移指令。有两种类型的调用语句:近调用和远调用。根据我们的目的,我们只须理解近调用语句。下面的(人写的)代码片段演示了近调用的细节。

--------------------------------------------------------------------------------
200: bb 12 95 00 00     mov $0x9512,%ebx
205: e8 f6 00 00 00     call 300
20a: b8 2f 14 00 00     mov $0x142f,%eax
--------------------------------------------------------------------------------

In the above code snippet, when the instruction pointer reaches address 205—the call statement—it will jump to address 300. The hexadecimal representation for a call statement is e8. However, f6 00 00 00 is obviously not 300. At first glance, it appears that the machine code and assembly code don’t match, but in fact, they do. In a near call, the address of the instruction after the call statement is saved on the stack, so that the called procedure knows where to return to. Thus, the machine code operand for a call statement is the address of the called procedure, minus the address of the instruction following the call statement (0x300 – 0x20a = 0xf6). This explains why the machine code operand for call is f6 00 00 00 in this example, not 00 03 00 00. This is an important point that will come into play shortly.

在上面的代码片段 中,当指令指针到达地址205--调用语句--时它将跳转到地址300。代表调用语句的16进制机器码是e8 。但是,f6 00 00 00 明显不是300。 一眼看过去,好象是机器码和汇编代码不相符。事实上,它们是相对应的。在近调用中,位于调用指令后面的指令的地址,是保存在堆栈的,所以,这个被调用的过 程知道返回到哪里。因此,调用语句的机器码操作数是被调用过程的地址减去紧跟调用语句的指令的地址(0x300 – 0x20a = 0xf6)。这解释了为什么在这个例子里,针对调用语句的机器码操作数是f6 00 00 00,而不是00 03 00 00。 在以后的演示中,记住这点很重要。
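The operand arithmetic above can be written as two small helper functions. This is a sketch for illustration only; near_call_operand, near_call_target, and check_contrived_call are hypothetical names, and the arithmetic is done in 32-bit unsigned integers so that backward calls wrap around the way the processor computes them.

```c
#include <assert.h>
#include <stdint.h>

#define NEAR_CALL_LEN 5	/* one opcode byte (0xe8) plus four operand bytes */

/* Operand for a near call located at call_addr that must reach target. */
uint32_t
near_call_operand(uint32_t call_addr, uint32_t target)
{
	return (target - (call_addr + NEAR_CALL_LEN));
}

/* Inverse: the address reached by a near call at call_addr with operand. */
uint32_t
near_call_target(uint32_t call_addr, uint32_t operand)
{
	return (call_addr + NEAR_CALL_LEN + operand);
}

/* Worked examples taken from the snippets in this chapter. */
void
check_contrived_call(void)
{
	/* The contrived snippet: call at 0x205 reaching 0x300. */
	assert(near_call_operand(0x205, 0x300) == 0xf6);
	assert(near_call_target(0x205, 0xf6) == 0x300);
	/* A backward call yields a negative displacement (two's complement). */
	assert(near_call_operand(0x300, 0x205) == 0xffffff00);
	/* The unrelocated call in hello.ko: e8 fc ff ff ff at 0x497. */
	assert(near_call_target(0x497, 0xfffffffc) == 0x498);
}
```

The last check explains the odd-looking "call 498" in the hello.ko disassembly: before relocation the operand is the placeholder fc ff ff ff (that is, -4), which points one byte past the call opcode.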



5.3.1 Patching Call Statements
5.3.1 调用语句补丁

Going back to Listing 5-1, let’s say that when we nop out the for loop, we also want hello to call uprintf instead of printf. The program in Listing 5-3 patches hello to do just that.

回到清单5-1,在我们nop掉for循环时,我们说过也希望hello调用是uprintf而不是printf。清单5-3的程序就是修改hello来做到那点的。

--------------------------------------------------------------------------------
#include <sys/types.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#include <nlist.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE 0x30

/* Replacement code. */
/* 替代代码 */
unsigned char nop_code[] =
    "\x90\x90";         /* nop */
int
main(int argc, char *argv[])
{
    int i, jns_offset, call_offset;
    char errbuf[_POSIX2_LINE_MAX];
    kvm_t *kd;
    struct nlist nl[] = { {NULL}, {NULL}, {NULL}, };
    unsigned char hello_code[SIZE], call_operand[4];

    /* Initialize kernel virtual memory access. */
    /* 初始化内核内存的访问 */
    kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf);
    if (kd == NULL) {
        fprintf(stderr, "ERROR: %s\n", errbuf);
        exit(-1);
    }

    nl[0].n_name = "hello";
    nl[1].n_name = "uprintf";

    /* Find the address of hello and uprintf. */
    /* 寻找hello 和 uprintf 的地址. */
    if ( /*1*/ kvm_nlist(kd, nl) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    if (!nl[0].n_value) {
        fprintf(stderr, "ERROR: Symbol %s not found\n",
            nl[0].n_name);
        exit(-1);
    }

    if (!nl[1].n_value) {
        fprintf(stderr, "ERROR: Symbol %s not found\n",
            nl[1].n_name);
        exit(-1);
    }

    /* Save a copy of hello. */
    /* 保存hello的拷贝 */
    if (kvm_read(kd, nl[0].n_value, hello_code, SIZE) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /* Search through hello for the jns and call instructions. */
    /* 在hello 中搜索jns 和 call 指令 */
    for (i = 0; i < SIZE; i++) {
        if (hello_code[i] == 0x79)
            jns_offset = i;
        if (hello_code[i] == 0xe8)
             /*2*/ call_offset = i;
    }

    /* Calculate the call statement operand. */
    /* 计算调用语句的操作数 */
    *(unsigned long *)&call_operand[0] = nl[1].n_value -
        /*4*/ (nl[0].n_value + call_offset + 5);

    /* Patch hello. */
    /* 修补hello*/
    if (kvm_write(kd, nl[0].n_value + jns_offset, nop_code,
        sizeof(nop_code) - 1) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    if (/*5*/ kvm_write(kd, nl[0].n_value + call_offset + 1, call_operand,
        sizeof(call_operand)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /* Close kd. */
    /* 关闭kd. */
    if (kvm_close(kd) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    exit(0);
}
--------------------------------------------------------------------------------

Listing 5-3: fix_hello_improved.c
清单 5-3: fix_hello_improved.c

Notice how hello is patched to invoke uprintf instead of printf. First, the addresses of hello and uprintf are /*1*/ stored in nl[0].n_value and nl[1].n_value, respectively. Next, the relative address of call within hello is /*2*/ stored in call_offset. Then, a new call statement operand is calculated by subtracting /*4*/ the address of the instruction following call from the address of uprintf (nl[1].n_value). This value is stored in call_operand[]. Finally, the old call statement operand is /*5*/ overwritten with call_operand[].

注是是如何给hello打补丁, 让它调用uprintf而不是printf的。首先,hello 和uprintf 的地址分别保存到nl[0].n_value 和 nl[1].n_value中。接着,hello内部call的相对地址保存到call_offset。然后,通过把uprintf的地址减去紧跟 call的指令的地址,计算出一个调用语句新的操作码。这个值保存到call_operand[]。最后,调用语句旧的操作码被call_operand []覆盖。
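Listing 5-3 stores the operand through an unsigned long pointer, which fills exactly four bytes only where unsigned long is 32 bits wide, as on i386. The following sketch shows one alternative that builds the four little-endian operand bytes explicitly; encode_call_operand is a hypothetical helper, and the addresses used in the usage note are illustrative rather than real kernel addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEAR_CALL_LEN 5	/* one opcode byte (0xe8) plus four operand bytes */

/*
 * Compute the operand that makes the call at func_addr + call_offset reach
 * target, and store it in out[] in little-endian byte order.
 */
void
encode_call_operand(uint32_t func_addr, uint32_t call_offset,
    uint32_t target, unsigned char out[4])
{
	uint32_t next_insn = func_addr + call_offset + NEAR_CALL_LEN;
	uint32_t operand = target - next_insn;
	int i;

	assert(out != NULL);
	for (i = 0; i < 4; i++)
		out[i] = (unsigned char)(operand >> (8 * i));
}
```

For a function at 0x200 whose call sits at offset 5 and should reach 0x300, the helper produces f6 00 00 00, matching the contrived example in Section 5.3. The resulting array could then be written at offset call_offset + 1, just past the e8 opcode, as the listing does.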


The following output shows the results of executing hello, before and after running fix_hello_improved on ttyv1:

下面的输出显示了在ttyv1 运行fix_hello_improved 之前和之后,执行hello的结果。

--------------------------------------------------------------------------------
$ sudo kldload ./hello.ko
System call loaded at offset 210.
$ perl -e 'syscall(210);'
$ gcc -o fix_hello_improved fix_hello_improved.c -lkvm
$ sudo ./fix_hello_improved
$ perl -e 'syscall(210);'
FreeBSD Rocks!
--------------------------------------------------------------------------------

Success! At this point, you should have no trouble patching any kernel code byte. However, what happens when the patch you want to apply is too big and will overwrite nearby instructions that you require? The answer is . . .

成功了! 由此看来,你编写任何内核代码字节补丁应该没有困难了。但是,当你想要应用的补丁太大以至将要覆盖掉你需要的邻近指令时,该怎么办呢?答案是...



5.4 Allocating Kernel Memory
5.4 分配内核内存

In this section I’ll describe a set of core functions and macros used to allocate
and deallocate kernel memory. We’ll put these functions to use later on, when
we explicitly solve the problem outlined above.

在本节,我将描述一组用来分配和释放内核内存的核心函数和宏。稍后我们将要使用这些函数,在我们要解决上面列出的问题的时候。


5.4.1 The malloc Function
5.4.1 malloc 函数

The malloc function allocates a specified number of bytes of memory in kernel space. If successful, a kernel virtual address (that is suitably aligned for storage of any data object) is returned. If an error is encountered, NULL is returned instead.

malloc 函数在内核空间分配指定字节单位数量的内存。如果成功,一个内核虚拟地址(这个地址已经针对任何数据对象的存储进行了适当的对齐)就返回。如果遇到错误,代替之的是返回NULL.


Here is the function prototype for malloc:

下面是malloc的函数原型

--------------------------------------------------------------------------------
#include <sys/types.h>
#include <sys/malloc.h>

void *
malloc(unsigned long size, struct malloc_type *type, int flags);
--------------------------------------------------------------------------------

The following is a brief description of each parameter.

下面是对每个参数的简单描述.


size

This specifies the amount of uninitialized kernel memory to allocate.

它指定要分配的还没初始化的内核内存的数量


type

This parameter is used to perform statistics on memory usage and for basic sanity checks. (Memory statistics can be viewed by running the command vmstat -m.) Typically, I’ll set this parameter to M_TEMP, which is the malloc_type for miscellaneous temporary data buffers.

这个参数用于执行内存使用的统计以及基本的稳定性检查。(内存统计可通过运行命令 vmstat -m 来查看)。一般,我们把这个参数设置为M_TEMP, 代表malloc_type 是各种各样临时性的数据缓存。


NOTE For more on struct malloc_type, see the malloc(9) manual page.

提示 查看malloc(9) 手册可了解malloc_type 结构的更多信息。


flags

This parameter further qualifies malloc’s operational characteristics. It can be set to any of the following values:

这个参数进一步限制malloc 的操作特征。它可以设置为下列值中任一个:


M_ZERO This causes the allocated memory to be set to zero.

M_ZERO 它导致分配的内存初始化为0


M_NOWAIT This causes malloc to return NULL if the allocation request cannot be fulfilled immediately. This flag should be set when calling malloc in an interrupt context.

M_NOWAIT 它使得malloc在分配请求不能马上得到满足时返回NUL。在中断上下文中调用malloc时,应当设置这个标志。


M_WAITOK This causes malloc to sleep and wait for resources if the allocation request cannot be fulfilled immediately. If this flag is set, malloc cannot return NULL.

M_WAITOK 它导致在分配请求不能马上得到满足时,malloc进入休眠来等待资源。如果设置了这个标志,malloc不可能返回NULL。


Either M_NOWAIT or M_WAITOK must be specified.

M_NOWAIT 或 M_WAITOK 两者中,一定要指定其中的一个。


5.4.2 The MALLOC Macro
5.4.2 MALLOC 宏

For compatibility with legacy code, the malloc function is called with the MALLOC macro, which is defined as follows:

为了与遗留代码相兼容,malloc函数通过MALLOC 宏来调用的。该宏定义如下:

--------------------------------------------------------------------------------
#include <sys/types.h>
#include <sys/malloc.h>

MALLOC(space, cast, unsigned long size, struct malloc_type *type, int flags);
--------------------------------------------------------------------------------

This macro is functionally equivalent to:

这个宏在功能上等价于:

--------------------------------------------------------------------------------
(space) = (cast)malloc((u_long)(size), type, flags)
--------------------------------------------------------------------------------



5.4.3 The free Function
5.4.3 free 函数

To deallocate kernel memory that was previously allocated by malloc, call the free function.

为了释放一个先前通过malloc分配的内存,要调用free 函数

--------------------------------------------------------------------------------
#include <sys/types.h>
#include <sys/malloc.h>

void
free(void *addr, struct malloc_type *type);
--------------------------------------------------------------------------------

Here, addr is the memory address returned by a previous malloc call, and type is its associated malloc_type.

在这里,addr 是由先前malloc 调用返回的内存地址。type 是与之相关联的malloc_type。



5.4.4 The FREE Macro
5.4.4 FREE 宏

For compatibility with legacy code, the free function is called with the FREE macro, which is defined as follows:

为了与遗留代码相兼容,free 函数通过FREE 宏来调用的。该宏定义如下:

--------------------------------------------------------------------------------
#include <sys/types.h>
#include <sys/malloc.h>

FREE(void *addr, struct malloc_type *type);
--------------------------------------------------------------------------------

This macro is functionally equivalent to:

该宏在功能上等价于:

--------------------------------------------------------------------------------
free((addr), type)
--------------------------------------------------------------------------------

NOTE At some point in 4BSD’s history, part of its malloc algorithm was inline in a macro, which is why there is a MALLOC macro in addition to a function call.1 However, FreeBSD’s malloc algorithm is just a function call. Thus, unless you are writing legacy-compatible code, the use of the MALLOC and FREE macros is discouraged.

提示 从4BSD的历史观点看来,它的部分malloc算法是嵌入在宏里面的,这就是为什么除了函数调用之外还有宏的原因。但是,FreeBSD的malloc算法仅仅是一个函数调用。因此,除非你是正在写遗留兼容的代码,MALLOC 和 FREE 的使用是不提倡的.






5.4.5 Example
5.4.5 示例

Listing 5-4 shows a system call module designed to allocate kernel memory. The system call is invoked with two arguments: a long integer containing the amount of memory to allocate and a long integer pointer to store the returned address.

清单5-4演示了一个用于分配内核内存的系统调用。这个系统调用要求两个参数:一个包含要分配内存数量的长整数,还有一个存储返回的地址的长整数指针。

--------------------------------------------------------------------------------
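#include <sys/types.h>
#include <sys/param.h>
#include <sys/proc.h>
#include <sys/module.h>
#include <sys/sysent.h>
#include <sys/kernel.h>
#include <sys/systm.h>
#include <sys/malloc.h>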

struct kmalloc_args {
    unsigned long size;
    unsigned long *addr;
};

/* System call to allocate kernel virtual memory. */
/* 这个系统调用用于分配内核虚拟内存 */
static int
kmalloc(struct thread *td, void *syscall_args)
{
    struct kmalloc_args *uap;
    uap = (struct kmalloc_args *)syscall_args;

    int error;
    unsigned long addr;

    /*1*/ MALLOC(addr, unsigned long, uap->size, M_TEMP, M_NOWAIT);
    /*2*/ error = copyout(&addr, uap->addr, sizeof(addr));

    return(error);
}

/* The sysent for the new system call. */
/* 针对新系统调用的sysent */
static struct sysent kmalloc_sysent = {
    2,         /* number of arguments 参数个数*/
    kmalloc     /* implementing function 实现函数*/
};

/* The offset in sysent[] where the system call is to be allocated. */
/* 新的系统调用将分配在sysent[] 内的offset 处*/
static int offset = NO_SYSCALL;

/*
 * Footnote 1 (cited in the NOTE in Section 5.4.4): John Baldwin, personal
 * communication, 2006–2007.
 */

/* The function called at load/unload. */
/* 加载/卸载模块时调用此函数 */
static int
load(struct module *module, int cmd, void *arg)
{
    int error = 0;    

    switch (cmd) {
    case MOD_LOAD:
        uprintf("System call loaded at offset %d.\n", offset);
        break;

    case MOD_UNLOAD:
        uprintf("System call unloaded from offset %d.\n", offset);
        break;

    default:
        error = EOPNOTSUPP;
        break;
    }    
    return(error);
}

SYSCALL_MODULE(kmalloc, &offset, &kmalloc_sysent, load, NULL);

--------------------------------------------------------------------------------

Listing 5-4: kmalloc.c
清单5-4 kmalloc.c

As you can see, this code simply /*1*/ calls the MALLOC macro to allocate uap->size amount of kernel memory, and then /*2*/ copies out the returned address to user space.

可以看出,这个代码简单地调用MALLOC 来分配uap->size 数量的内核内存,然后把返回的地址拷贝到用户空间。


Listing 5-5 is the user space program designed to execute the system call above.

清单5-5是设计来执行上面系统调用的用户空间程序。

--------------------------------------------------------------------------------
#include <sys/types.h>
#include <sys/module.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    int syscall_num;
    struct module_stat stat;

    unsigned long addr;

    if (argc != 2) {
        printf("Usage:\n%s <size>\n", argv[0]);
        exit(0);
    }

    stat.version = sizeof(stat);
    modstat(modfind("kmalloc"), &stat);
    syscall_num = stat.data.intval;
    syscall(syscall_num, (unsigned long)atoi(argv[1]), &addr);
    printf("Address of allocated kernel memory: 0x%lx\n", addr);

    exit(0);
}
--------------------------------------------------------------------------------
Listing 5-5: interface.c
清单 5-5: interface.c

This program uses the modstat/modfind approach (described in Chapter 1) to pass the first command-line argument to kmalloc; this argument should contain the amount of kernel memory to allocate. It then outputs the kernel virtual address where the recently allocated memory is located.

这个程序使用了modstat/modfind 方法(在第1章中描述)来传递第一个命令行参数给kmalloc;这个参数应当包含要分配的内核内存数量。然后程序输出刚刚分配的内存所处的内核虚拟地址。



5.5 Allocating Kernel Memory from User Space
5.5 从用户空间分配内核内存


Now that you’ve seen how to “properly” allocate kernel memory using module code, let’s do it using run-time kernel memory patching. Here is the algorithm (Cesare, 1998, as cited in sd and devik, 2001) we’ll be using:

你已经知道如何使用模块代码来"正确地"分配内核内存。现在让我们运用内核内存运行时补丁的方法来实现它。下面是我们将要使用的算法(Cesare, 1998, as cited in sd and devik, 2001)

1. Retrieve the in-memory address of the mkdir system call.
1. 取到mkdir 系统调用在内存中的地址。

2. Save sizeof(kmalloc) bytes of mkdir.
2. 保存sizeof(kmalloc)字节大小的mkdir

3. Overwrite mkdir with kmalloc.
3. 把mkdir覆盖写为kmalloc

4. Call mkdir.
4. 调用mkdir

5. Restore mkdir.
5. 恢复mkdir

With this algorithm, you are basically patching a system call with your own code, issuing the system call (which will execute your code instead), and then restoring the system call. This algorithm can be used to execute any piece of code in kernel space without a KLD.

运用这个算法,基本上你是使用你自己的代码修改一个系统调用,请求这个系统调用(替之执行的是你的代码),最后恢复系统调用。这个算法能够让任何一段代码在内核空间执行,而不需要使用KLD,
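The save, overwrite, and restore steps of the algorithm can be rehearsed on an ordinary user space buffer before touching kernel memory. The sketch below is only an analogy under stated assumptions: struct patch, patch_apply, and patch_restore are hypothetical names, and memcpy stands in for the kvm_read and kvm_write calls that would operate on the real system call. Step 4 (issuing the system call) has no counterpart here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATCH 64	/* arbitrary upper bound for this sketch */

/* Bookkeeping needed to undo a patch. */
struct patch {
	unsigned char	saved[MAX_PATCH];	/* original bytes */
	size_t		len;			/* number of bytes replaced */
};

/* Steps 2 and 3: save len bytes of target, then overwrite them. */
void
patch_apply(struct patch *p, unsigned char *target,
    const unsigned char *payload, size_t len)
{
	assert(len <= MAX_PATCH);
	memcpy(p->saved, target, len);
	memcpy(target, payload, len);
	p->len = len;
}

/* Step 5: put the original bytes back. */
void
patch_restore(const struct patch *p, unsigned char *target)
{
	memcpy(target, p->saved, p->len);
}
```

Saving exactly as many bytes as the payload occupies is what makes the restore step exact; anything that runs the target code between patch_apply and patch_restore sees the replacement bytes.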


However, keep in mind that when you overwrite a system call, any process that issues or is currently executing the system call will break, resulting in a kernel panic. In other words, inherent to this algorithm is a race condition or concurrency issue.

但是,要记住的是,在你覆盖一个系统调用时,任何一个请求或正在执行这个系统调用的进程将会崩溃,导致内核panic。换句话说,这个算法的固有缺陷是竞争条件或同步问题。



5.5.1 Example
5.5.1 示例

Listing 5-6 shows a user space program designed to allocate kernel memory. This program is invoked with one command-line argument: an integer containing the number of bytes to allocate.

清单5-6演示一个设计用来分配内核内存的用户空间程序。这个程序调用时带一个命令行参数:一个整数,它包含要分配内存的字节大小

--------------------------------------------------------------------------------
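#include <sys/types.h>
#include <sys/module.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#include <nlist.h>
#include <stdio.h>
#include <stdlib.h>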

/* Kernel memory allocation (kmalloc) function code. */
/* 内核内存分配(kmalloc)函数的代码 */
/*1*/ unsigned char kmalloc[] =
    "\x55"                 /* push %ebp             */
    "\xb9\x01\x00\x00\x00"         /* mov     $0x1,%ecx         */
    "\x89\xe5"             /* mov     %esp,%ebp         */
    "\x53"                 /* push %ebx             */
    "\xba\x00\x00\x00\x00"         /* mov     $0x0,%edx         */
    "\x83\xec\x10"             /* sub     $0x10,%esp         */
    "\x89\x4c\x24\x08"         /* mov     %ecx,0x8(%esp)         */
    "\x8b\x5d\x0c"             /* mov     0xc(%ebp),%ebx         */
    "\x89\x54\x24\x04"         /* mov     %edx,0x4(%esp)         */
    "\x8b\x03"             /* mov     (%ebx),%eax         */
    "\x89\x04\x24"             /* mov     %eax,(%esp)         */
    "\xe8\xfc\xff\xff\xff"         /* call    4e2     */
    "\x89\x45\xf8"             /* mov     %eax,0xfffffff8(%ebp)     */
    "\xb8\x04\x00\x00\x00"         /* mov     $0x4,%eax         */
    "\x89\x44\x24\x08"         /* mov     %eax,0x8(%esp)         */
    "\x8b\x43\x04"             /* mov     0x4(%ebx),%eax         */
    "\x89\x44\x24\x04"         /* mov     %eax,0x4(%esp)         */
    "\x8d\x45\xf8"             /* lea     0xfffffff8(%ebp),%eax     */
    "\x89\x04\x24"             /* mov     %eax,(%esp)         */
    "\xe8\xfc\xff\xff\xff"         /* call 500     */
    "\x83\xc4\x10"             /* add     $0x10,%esp         */
    "\x5b"                 /* pop     %ebx             */
    "\x5d"                 /* pop     %ebp             */
    "\xc3"                 /* ret                 */
    "\x8d\xb6\x00\x00\x00\x00";     /* lea     0x0(%esi),%esi         */

/*
* The relative address of the instructions following the call statements
* within kmalloc.
*/
/*
* 紧跟调用语句的指令在kmalloc内的相对地址
*/
#define OFFSET_1 0x26
#define OFFSET_2 0x44

int
main(int argc, char *argv[])
{
    int i;
    char errbuf[_POSIX2_LINE_MAX];
    kvm_t *kd;
    struct nlist nl[] = { {NULL}, {NULL}, {NULL}, {NULL}, {NULL}, };
    unsigned char mkdir_code[sizeof(kmalloc)];
    unsigned long addr;

    if (argc != 2) {
        printf("Usage:\n%s \n", argv[0]);
        exit(0);
    }

    /* Initialize kernel virtual memory access. */
    /* 初始化内核虚拟内存访问 */
    kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf);
    if (kd == NULL) {
        fprintf(stderr, "ERROR: %s\n", errbuf);
        exit(-1);
    }
    nl[0].n_name = "mkdir";
    nl[1].n_name = "M_TEMP";
    nl[2].n_name = "malloc";
    nl[3].n_name = "copyout";

    /* Find the address of mkdir, M_TEMP, malloc, and copyout. */
    /* 搜索mkdir, M_TEMP, malloc, 和 copyout 的地址 */
    if (kvm_nlist(kd, nl) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    for (i = 0; i < 4; i++) {
        if (!nl[i].n_value) {
            fprintf(stderr, "ERROR: Symbol %s not found\n",
                nl[i].n_name);
            exit(-1);
        }    
    }

    /*
    * Patch the kmalloc function code to contain the correct addresses
    * for M_TEMP, malloc, and copyout.
    */
    /*
    * 修补kmalloc 函数的代码来包含M_TEMP, malloc, 和 copyout 的正确地址
    */
    *(unsigned long *)&kmalloc[10] = nl[1].n_value;
    *(unsigned long *)&kmalloc[34] = nl[2].n_value -
        (nl[0].n_value + OFFSET_1);
    *(unsigned long *)&kmalloc[64] = nl[3].n_value -
        (nl[0].n_value + OFFSET_2);

    /* Save sizeof(kmalloc) bytes of mkdir. */
    /* 保存 sizeof(kmalloc) 字节大小的mkdir. */
    if (kvm_read(kd, nl[0].n_value, mkdir_code, sizeof(kmalloc)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /* Overwrite mkdir with kmalloc. */
    /* 用kmalloc 覆盖mkdir */
    if (kvm_write(kd, nl[0].n_value, kmalloc, sizeof(kmalloc)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /* Allocate kernel memory. */
    /* 分配内核内存 */
    syscall(136, (unsigned long)atoi(argv[1]), &addr);
    printf("Address of allocated kernel memory: 0x%lx\n", addr);

    /* Restore mkdir. */
    /* 恢复 mkdir. */
    if (kvm_write(kd, nl[0].n_value, mkdir_code, sizeof(kmalloc)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /* Close kd. */
    /* 关闭 kd. */
    if (kvm_close(kd) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    exit(0);
}
--------------------------------------------------------------------------------
Listing 5-6: kmalloc_reloaded.c
清单 5-6: kmalloc_reloaded.c

In the preceding code, the /*1*/ kmalloc function code was generated by disassembling the kmalloc system call from Listing 5-4:

在前面的代码中,kmalloc函数的代码是通过反汇编kmalloc系统调用来产生的。看清单5-4

--------------------------------------------------------------------------------
$ objdump -dR ./kmalloc.ko

./kmalloc.ko: file format elf32-i386-freebsd

Disassembly of section .text:

000004c0 <kmalloc>:
4c0: 55         push %ebp
4c1: b9 01 00 00 00     mov $0x1,%ecx
4c6: 89 e5         mov %esp,%ebp
4c8: 53         push %ebx
4c9: ba 00 00 00 00     mov $0x0,%edx
     /*1*/ 4ca: R_386_32 M_TEMP
4ce: 83 ec 10         sub $0x10,%esp
4d1: 89 4c 24 08     mov %ecx,0x8(%esp)
4d5: 8b 5d 0c         mov 0xc(%ebp),%ebx
4d8: 89 54 24 04     mov %edx,0x4(%esp)
4dc: 8b 03         mov (%ebx),%eax
4de: 89 04 24         mov %eax,(%esp)
4e1: e8 fc ff ff ff     call 4e2 <kmalloc+0x22>
     /*2*/ 4e2: R_386_PC32     malloc
4e6: 89 45 f8         mov %eax,0xfffffff8(%ebp)
4e9: b8 04 00 00 00     mov $0x4,%eax
4ee: 89 44 24 08     mov %eax,0x8(%esp)
4f2: 8b 43 04         mov 0x4(%ebx),%eax
4f5: 89 44 24 04     mov %eax,0x4(%esp)
4f9: 8d 45 f8         lea 0xfffffff8(%ebp),%eax
4fc: 89 04 24         mov %eax,(%esp)
4ff: e8 fc ff ff ff     call 500 <kmalloc+0x40>
    /*3*/ 500: R_386_PC32 copyout
504: 83 c4 10         add $0x10,%esp
507: 5b         pop %ebx
508: 5d         pop %ebp
509: c3         ret
50a: 8d b6 00 00 00 00     lea 0x0(%esi),%esi
--------------------------------------------------------------------------------

Notice how objdump(1) reports three instructions that require dynamic relocation. The first, at offset 10, is /*1*/ for the address of M_TEMP. The second, at offset 34, is /*2*/ for the malloc call statement operand. And the third, at offset 64, is  /*3*/ for the copyout call statement operand.

注意objdump(1)报告了需要动态重定位的三个指令。第一个,在偏移10处,是关于M_TEMP 的地址。第二个,在偏移34处,是关于malloc调用语句的操作数。还有第三个,在偏移64处,是关于copyout调用语句的操作数。
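The three offsets, and the two OFFSET constants in Listing 5-6, follow from subtracting kmalloc's start address (0x4c0) from the addresses objdump prints. A minimal sketch of that arithmetic; `offset_in_function` and `print_kmalloc_offsets` are hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>

/* Offset of addr within a function that starts at func_start. */
unsigned long
offset_in_function(unsigned long func_start, unsigned long addr)
{
    assert(addr >= func_start);
    return (addr - func_start);
}

/* Print the offsets implied by the objdump listing of kmalloc.ko. */
void
print_kmalloc_offsets(void)
{
    const unsigned long start = 0x4c0;  /* first byte of kmalloc */

    printf("M_TEMP operand:  %lu\n", offset_in_function(start, 0x4ca));
    printf("malloc operand:  %lu\n", offset_in_function(start, 0x4e2));
    printf("copyout operand: %lu\n", offset_in_function(start, 0x500));
    printf("OFFSET_1: 0x%lx\n", offset_in_function(start, 0x4e6));
    printf("OFFSET_2: 0x%lx\n", offset_in_function(start, 0x504));
}
```

The operand offsets come out as 10, 34, and 64, and the instructions that follow the two calls sit at 0x26 and 0x44, matching the values used in kmalloc_reloaded.c.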


In kmalloc_reloaded.c, we account for this in our kmalloc function code with the following five lines:

在kmalloc_reloaded.c 中,我们用下面5行解决kmalloc 函数代码中的这个问题。

--------------------------------------------------------------------------------
*(unsigned long *)&kmalloc[10] = /*1*/  nl[1].n_value;
*(unsigned long *)&kmalloc[34] = /*2*/ nl[2].n_value -
    /*3*/ (nl[0].n_value + OFFSET_1);
*(unsigned long *)&kmalloc[64] = /*4*/ nl[3].n_value -
    /*5*/ (nl[0].n_value + OFFSET_2);
--------------------------------------------------------------------------------

Notice how kmalloc is patched at offset 10 with /*1*/ the address of M_TEMP. It is also patched at offsets 34 and 64 with /*2*/  the address of malloc minus the /*3*/ address of the instruction following the malloc call, and the /*4*/ address of copyout minus /*5*/ the address of the instruction following the copyout call, respectively.

注意kmalloc在偏移10处是怎样用M_TEMP的地址修补的。同样,kmalloc内的偏移34和64处分别用malloc的地址减去紧跟malloc调用语句的指令的地址,以及copyout的地址减去紧跟copyout调用语句的指令的地址来修补。
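The subtraction used for the two call operands can be written as a pair of small helpers. This is a sketch, not the listing's method: it stores the operand byte by byte in little-endian order rather than through a pointer cast, and the names `patch_call` and `call_target` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store the rel32 operand of a call instruction.
 *   code         local copy of the function code
 *   len          size of that copy in bytes
 *   operand_off  offset of the 4-byte operand inside code
 *   load_addr    address the code will occupy once written
 *   target       address of the function to call
 */
void
patch_call(unsigned char *code, size_t len, size_t operand_off,
    uint32_t load_addr, uint32_t target)
{
    uint32_t next, rel;

    assert(operand_off + 4 <= len);
    /* Address of the instruction that follows the call. */
    next = load_addr + (uint32_t)operand_off + 4;
    rel = target - next;            /* wraps modulo 2^32 */

    code[operand_off + 0] = (unsigned char)(rel & 0xff);
    code[operand_off + 1] = (unsigned char)((rel >> 8) & 0xff);
    code[operand_off + 2] = (unsigned char)((rel >> 16) & 0xff);
    code[operand_off + 3] = (unsigned char)((rel >> 24) & 0xff);
}

/* Recompute the address that a patched call will reach. */
uint32_t
call_target(const unsigned char *code, size_t operand_off,
    uint32_t load_addr)
{
    uint32_t rel = (uint32_t)code[operand_off] |
        ((uint32_t)code[operand_off + 1] << 8) |
        ((uint32_t)code[operand_off + 2] << 16) |
        ((uint32_t)code[operand_off + 3] << 24);

    return (load_addr + (uint32_t)operand_off + 4 + rel);
}
```

For the malloc call, operand_off is 34, so the following instruction is at 34 + 4 = 0x26, which is OFFSET_1. Decoding the operand again with call_target is a cheap way to confirm that a patched call lands where it was meant to before the code is written into the kernel.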


The following output shows kmalloc_reloaded in action:

下面的输出显示了kmalloc_reloaded 的运行

--------------------------------------------------------------------------------
$ gcc -o kmalloc_reloaded kmalloc_reloaded.c -lkvm
$ sudo ./kmalloc_reloaded 10
Address of allocated kernel memory: 0xc1bb91b0
--------------------------------------------------------------------------------

To verify the kernel memory allocation, you can use a kernel-mode debugger like ddb(4):

为了检验内核内存的分配,你可以使用内核模式的调试器,比如ddb(4):
 
--------------------------------------------------------------------------------
KDB: enter: manual escape to debugger
[thread pid 13 tid 100003 ]
Stopped at     kdb_enter+0x2c: leave
db> examine/x 0xc1bb91b0
0xc1bb91b0:     70707070
db>
0xc1bb91b4:     70707070
db>
0xc1bb91b8:     dead7070
--------------------------------------------------------------------------------




5.6 Inline Function Hooking
5.6 嵌入函数挂勾

Recall the problem posed at the end of Section 5.3.1: What do you do when you want to patch some kernel code, but your patch is too big and will overwrite nearby instructions that you require? The answer is: You use an inline function hook.

回忆一下在章节5.3.1末尾提到的问题:当你想修改一些内核代码,但是你的补丁太大导致将要覆盖你需要的邻近的指令时,你该怎么做?答案是:使用嵌入函数挂勾


In general, an inline function hook places an unconditional jump within the body of a function to a region of memory under your control. This memory will contain the “new” code you want the function to execute, the code bytes that were overwritten by the unconditional jump, and an unconditional jump back to the original function. This will extend functionality while preserving original behavior. Of course, you don’t have to preserve the original behavior.

一般来说,嵌入函数挂钩在函数体内放置一个无条件转移指令,jump到受你控制的内存区域。这片内存应该包含你希望这个函数去执行的“新”代码和被你用无条件jump给覆盖了的代码字节,以及一个跳转回原先函数的无条件跳转指令。这样做将扩展原函数的功能同时保留原先的行为。当然,你不一定非要保留原先的行为不可。

5.6.1 Example
5.6.1 示例

In this section we’ll patch the mkdir system call with an inline function hook so that it will output the phrase “Hello, world!\n” each time it creates a directory.


Now, let’s take a look at the disassembly of mkdir to see where we should place the jump, which bytes we need to preserve, and where we should jump back to.
--------------------------------------------------------------------------------
$ nm /boot/kernel/kernel | grep mkdir
c04dfc00 T devfs_vmkdir
c06a84e0 t handle_written_mkdir
c05bfa10 T kern_mkdir
c05bfec0 T mkdir
c07d1f40 B mkdirlisthd
c04ef6a0 t msdosfs_mkdir
c06579e0 t nfs4_mkdir
c066a910 t nfs_mkdir
c067a830 T nfsrv_mkdir
c07515b6 r nfsv3err_mkdir
c06c32e0 t ufs_mkdir
c07b8d20 D vop_mkdir_desc
c05b77f0 T vop_mkdir_post
c07b8d44 d vop_mkdir_vp_offsets
$ objdump -d --start-address=0xc05bfec0 /boot/kernel/kernel

/boot/kernel/kernel: file format elf32-i386-freebsd

Disassembly of section .text:

c05bfec0 <mkdir>:
c05bfec0: 55             push %ebp
c05bfec1: 89 e5         mov %esp,%ebp
c05bfec3: 83 ec 10         sub $0x10,%esp
c05bfec6: 8b 55 0c         mov 0xc(%ebp),%edx
c05bfec9: 8b 42 04         mov 0x4(%edx),%eax
c05bfecc: 89 44 24 0c         mov %eax,0xc(%esp)
c05bfed0: 31 c0         xor %eax,%eax
c05bfed2: 89 44 24 08         mov %eax,0x8(%esp)
c05bfed6: 8b 02         mov (%edx),%eax
c05bfed8: 89 44 24 04         mov %eax,0x4(%esp)
c05bfedc: 8b 45 08         mov 0x8(%ebp),%eax
c05bfedf: 89 04 24         mov %eax,(%esp)
c05bfee2: e8 29 fb ff ff     call c05bfa10 <kern_mkdir>
c05bfee7: c9             leave
c05bfee8: c3             ret
c05bfee9: 8d b4 26 00 00 00 00     lea 0x0(%esi),%esi
--------------------------------------------------------------------------------

Because I want to extend the functionality of mkdir, rather than change it, the best place for the unconditional jump is at the beginning. An unconditional jump requires seven bytes. If you overwrite the first seven bytes of mkdir, the first three instructions will be eliminated, and the fourth instruction (which starts at offset six) will be mangled. Therefore, we’ll need to save the first four instructions (i.e., the first nine bytes) in order to preserve mkdir’s functionality; this also means that you should jump back to offset nine to resume execution from the fifth instruction.

因为我想扩展mkdir的功能,而不是改变它,所以放置无条件跳转jump的最佳位置是在开头。一个无条件jump需要7字节。如果你覆盖mkdir的前7个字节,那它前3个指令就会被删除,还有第4个指令(开始于偏移6处)就会被破坏。因此,为了保留mkdir的功能,我们得保存前面4个指令(也就是说前9个字节);这也意味着,你应该跳回到偏移9处,从第5个指令开始恢复mkdir的运行。


Before committing to this plan, however, let’s look at the disassembly of mkdir on a different machine.

在开始这个计划之前,让我们观察一下在不同机器上mkdir的反汇编。

--------------------------------------------------------------------------------
$ nm /boot/kernel/kernel | grep mkdir
c047c560 T devfs_vmkdir
c0620e40 t handle_written_mkdir
c0556ca0 T kern_mkdir
c0557030 T mkdir
c071d57c B mkdirlisthd
c048a3e0 t msdosfs_mkdir
c05e2ed0 t nfs4_mkdir
c05d8710 t nfs_mkdir
c05f9140 T nfsrv_mkdir
c06b4856 r nfsv3err_mkdir
c063a670 t ufs_mkdir
c0702f40 D vop_mkdir_desc
c0702f64 d vop_mkdir_vp_offsets
$ objdump -d --start-address=0xc0557030 /boot/kernel/kernel

/boot/kernel/kernel: file format elf32-i386-freebsd

Disassembly of section .text:

c0557030 <mkdir>:
c0557030: 55             push %ebp
c0557031: 31 c9         xor %ecx,%ecx
c0557033: 89 e5         mov %esp,%ebp
c0557035: 83 ec 10         sub $0x10,%esp
c0557038: 8b 55 0c         mov 0xc(%ebp),%edx
c055703b: 8b 42 04         mov 0x4(%edx),%eax
c055703e: 89 4c 24 08         mov %ecx,0x8(%esp)
c0557042: 89 44 24 0c         mov %eax,0xc(%esp)
c0557046: 8b 02         mov (%edx),%eax
c0557048: 89 44 24 04         mov %eax,0x4(%esp)
c055704c: 8b 45 08         mov 0x8(%ebp),%eax
c055704f: 89 04 24         mov %eax,(%esp)
c0557052: e8 49 fc ff ff     call c0556ca0 <kern_mkdir>
c0557057: c9             leave
c0557058: c3             ret
c0557059: 8d b4 26 00 00 00 00     lea 0x0(%esi),%esi
--------------------------------------------------------------------------------

Notice how the two disassemblies are quite different. In fact, this time around the fifth instruction starts at offset eight, not nine. If the code were to jump back to offset nine, it would most definitely crash this system. What this boils down to is that when writing an inline function hook, in general, you’ll have to avoid using hard-coded offsets if you want to apply the hook to a wide range of systems.

注意到这两个反汇编代码完全不一样。实际上,这次第5个指令开始于偏移8处,而不是9。如果代码往后跳回到偏移9处,它无疑会导致系统崩溃。这就是写一个嵌入函数挂勾的难度的在。一般来说,如果你想让挂勾适用于大范围的系统,就必须避免使用硬编码的偏移
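The difference between the two machines comes down to where the first instruction boundary at or beyond the 7-byte jump falls. A minimal sketch of that calculation; `bytes_to_preserve` is a hypothetical helper, and the instruction lengths are read by hand from the two disassemblies (finding them automatically would need an x86 length decoder, which this sketch does not attempt).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return how many leading bytes must be preserved when the first
 * patch_len bytes of a function are overwritten: the end of the first
 * whole instruction that reaches patch_len. insn_len[] holds the
 * length of each instruction in order. Returns 0 if the instructions
 * listed do not cover patch_len bytes.
 */
size_t
bytes_to_preserve(const size_t *insn_len, size_t n, size_t patch_len)
{
    size_t i, end = 0;

    assert(insn_len != NULL);
    for (i = 0; i < n; i++) {
        end += insn_len[i];
        if (end >= patch_len)
            return (end);
    }
    return (0);
}
```

With the lengths {1, 2, 3, 3, 3} from the first machine and a 7-byte jump, the result is 9; with {1, 2, 2, 3, 3} from the second machine it is 8. A value hard-coded from one machine is therefore wrong on the other.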


Looking back at the two disassemblies, notice how mkdir calls kern_mkdir every time. Therefore, we can jump back to that call statement (i.e., to the byte 0xe8, which is the opcode of the call). In order to preserve mkdir’s functionality, we’ll now have to save every byte up to, but not including, 0xe8.

往后看看那两个反汇编代码,注意到mkdir每次都要调用kern_mkdir。因此,我们可以跳回到那里(也就是0xe8)。为了保留mkdir的功能,现在我们得保存mkdir中上至但不包含0xe8的全部字节。
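A plain scan for the byte 0xe8 can also match a byte that happens to sit inside another instruction's operand. The sketch below is a stricter variant, not the approach taken in Listing 5-7: it accepts an 0xe8 only if its operand resolves to a known address. It assumes the address of kern_mkdir is available (for example from the symbol table); `find_call_to` and `mkdir_sample` are hypothetical names, and the sample bytes are copied from the first disassembly above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mkdir as disassembled on the first machine (loaded at 0xc05bfec0). */
const unsigned char mkdir_sample[] = {
    0x55, 0x89, 0xe5, 0x83, 0xec, 0x10, 0x8b, 0x55, 0x0c,
    0x8b, 0x42, 0x04, 0x89, 0x44, 0x24, 0x0c, 0x31, 0xc0,
    0x89, 0x44, 0x24, 0x08, 0x8b, 0x02, 0x89, 0x44, 0x24, 0x04,
    0x8b, 0x45, 0x08, 0x89, 0x04, 0x24,
    0xe8, 0x29, 0xfb, 0xff, 0xff, 0xc9, 0xc3
};

/*
 * Return the offset of the first 0xe8 byte whose rel32 operand
 * resolves to "want", or -1 if there is none. load_addr is the
 * address of code[0] in the running kernel.
 */
long
find_call_to(const unsigned char *code, size_t len, uint32_t load_addr,
    uint32_t want)
{
    size_t i;
    uint32_t rel;

    assert(code != NULL);
    for (i = 0; i + 5 <= len; i++) {
        if (code[i] != 0xe8)
            continue;
        /* Decode the little-endian rel32 operand. */
        rel = (uint32_t)code[i + 1] |
            ((uint32_t)code[i + 2] << 8) |
            ((uint32_t)code[i + 3] << 16) |
            ((uint32_t)code[i + 4] << 24);
        /* The operand is relative to the end of the call. */
        if (load_addr + (uint32_t)i + 5 + rel == want)
            return ((long)i);
    }
    return (-1);
}
```

On the sample bytes, searching for a call to 0xc05bfa10 (kern_mkdir on that machine) finds offset 34, which is the call statement at c05bfee2. Returning -1 when nothing matches also gives the caller something to check before it goes on to patch.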

Listing 5-7 shows my mkdir inline function hook.

清单5-7演示了我的mkdir嵌入函数挂勾。

NOTE To save space, the kmalloc function code is omitted.

注意 为了节省空间,kmalloc函数的代码被省略了。

--------------------------------------------------------------------------------
#include <sys/types.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#include <nlist.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Kernel memory allocation (kmalloc) function code. */
/* 内核内存分配(kmalloc)函数的代码 */
unsigned char kmalloc[] =
. . .

/*
* The relative address of the instructions following the call statements
* within kmalloc.
*/
/*
* 紧跟调用语句的指令在kmalloc内的相对地址
*/
#define K_OFFSET_1 0x26
#define K_OFFSET_2 0x44

/* "Hello, world!\n" function code. */
/* "Hello, world!\n" 函数代码. */
/*1*/ unsigned char hello[] =
    "\x48"                 /* H             */
    "\x65"                 /* e             */
    "\x6c"                 /* l             */
    "\x6c"                 /* l             */
    "\x6f"                 /* o             */
    "\x2c"                 /* ,             */
    "\x20"                 /*             */
    "\x77"                 /* w             */
    "\x6f"                 /* o             */
    "\x72"                 /* r             */
    "\x6c"                 /* l             */
    "\x64"                 /* d             */
    "\x21"                 /* !             */
    "\x0a"                 /* \n             */
    "\x00"                 /* NULL         */
    "\x55"                 /* push %ebp         */
    "\x89\xe5"             /* mov %esp,%ebp     */
    "\x83\xec\x04"             /* sub $0x4,%esp     */
    "\xc7\x04\x24\x00\x00\x00\x00"     /* movl $0x0,(%esp)     */
    "\xe8\xfc\xff\xff\xff"         /* call uprintf     */
    "\x31\xc0"             /* xor %eax,%eax     */
    "\x83\xc4\x04"             /* add $0x4,%esp     */
    "\x5d";             /* pop %ebp         */

/*
* The relative address of the instruction following the call uprintf
* statement within hello.
*/
/*
* 紧跟调用uprintf语句的指令在hello内的相对地址
*/
#define H_OFFSET_1 0x21

/* Unconditional jump code. */
/* 无条件跳转代码 */
unsigned char jump[] =
    "\xb8\x00\x00\x00\x00"         /* movl $0x0,%eax     */
    "\xff\xe0";             /* jmp *%eax         */

int
main(int argc, char *argv[])
{
    int i, call_offset;
    char errbuf[_POSIX2_LINE_MAX];
    kvm_t *kd;
    struct nlist nl[] = { {NULL}, {NULL}, {NULL}, {NULL}, {NULL},
        {NULL}, };
    unsigned char mkdir_code[sizeof(kmalloc)];
    unsigned long addr, size;

    /* Initialize kernel virtual memory access. */
    /* 初始化对内核虚拟内存的访问 */
    kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf);
    if (kd == NULL) {
        fprintf(stderr, "ERROR: %s\n", errbuf);
        exit(-1);
    }

    nl[0].n_name = "mkdir";
    nl[1].n_name = "M_TEMP";
    nl[2].n_name = "malloc";
    nl[3].n_name = "copyout";
    nl[4].n_name = "uprintf";

    /*
    * Find the address of mkdir, M_TEMP, malloc, copyout,
    * and uprintf.
    */
    /*
    * 查找 mkdir, M_TEMP, malloc, copyout 和uprintf的地址
    */
    if (kvm_nlist(kd, nl) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    for (i = 0; i < 5; i++) {
        if (!nl[i].n_value) {
            fprintf(stderr, "ERROR: Symbol %s not found\n",
                nl[i].n_name);
            exit(-1);
        }
    }

    /* Save sizeof(kmalloc) bytes of mkdir. */
    /* 保存 sizeof(kmalloc) 字节大小的  mkdir. */
    if (kvm_read(kd, nl[0].n_value, mkdir_code, sizeof(kmalloc)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

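    /* Search through mkdir for call kern_mkdir. */
    /* 在mkdir 中查找kern_mkdir.  */
    call_offset = -1;
    for (i = 0; i < sizeof(kmalloc); i++) {
        if (mkdir_code[i] == 0xe8) {
            call_offset = i;
            break;
        }
    }
    if (call_offset < 0) {
        fprintf(stderr, "ERROR: call kern_mkdir not found\n");
        exit(-1);
    }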

    /* Determine how much memory you need to allocate. */
    /* 确定需要分配多少内存. */
    size = (unsigned long)sizeof(hello) + (unsigned long)call_offset +
        (unsigned long)sizeof(jump);

    /*
    * Patch the kmalloc function code to contain the correct addresses
    * for M_TEMP, malloc, and copyout.
    */
    /*
    * 修补kmalloc 函数代码来包含M_TEMP, malloc, 和 copyout 的正确地址
    */
    *(unsigned long *)&kmalloc[10] = nl[1].n_value;
    *(unsigned long *)&kmalloc[34] = nl[2].n_value -
        (nl[0].n_value + K_OFFSET_1);
    *(unsigned long *)&kmalloc[64] = nl[3].n_value -
        (nl[0].n_value + K_OFFSET_2);

    /* Overwrite mkdir with kmalloc. */
    /* 用 kmalloc 覆盖 mkdir */
    if (kvm_write(kd, nl[0].n_value, kmalloc, sizeof(kmalloc)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /* Allocate kernel memory. */
    /* 分配内核内存. */
    syscall(136, size, &addr);

    /* Restore mkdir. */
    /* 恢复 mkdir. */
    if (kvm_write(kd, nl[0].n_value, mkdir_code, sizeof(kmalloc)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /*
    * Patch the "Hello, world!\n" function code to contain the
    * correct addresses for the "Hello, world!\n" string and uprintf.
    */
    /*
    * 修改 "Hello, world!\n" 函数代码来包含"Hello, world!\n"字符串
    * 和uprintf的正确地址
    */
    *(unsigned long *)&hello[24] = addr;
    *(unsigned long *)&hello[29] = nl[4].n_value - (addr + H_OFFSET_1);

    /*
    * Place the "Hello, world!\n" function code into the recently
    * allocated kernel memory.
    */
    /*
    * 把 "Hello, world!\n" 函数代码放置到最近分配的内存中
    */
    if (kvm_write(kd, addr, hello, sizeof(hello)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /*
    * Place all the mkdir code up to but not including call kern_mkdir
    * after the "Hello, world!\n" function code.
    */
    /*
    * 把mkdir 中上至但不包含call kern_mkdir的代码放置到"Hello, world!\n"函数代码的后面
    */
    if (kvm_write(kd, addr + (unsigned long)sizeof(hello) - 1,
        mkdir_code, call_offset) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /*
    * Patch the unconditional jump code to jump back to the call
    * kern_mkdir statement within mkdir.
    */
    /*
    * 修补jump代码来跳转到mkdir内部的调用kern_mkdir 语句
    */
    *(unsigned long *)&jump[1] = nl[0].n_value +
        (unsigned long)call_offset;

    /*
    * Place the unconditional jump code into the recently allocated
    * kernel memory, after the mkdir code.
    */
    /*
    * 把无条件jump代码放置到最近分配的内核内存,位于mkdir代码的后面
    */
    if (kvm_write(kd, addr + (unsigned long)sizeof(hello) - 1 +
        (unsigned long)call_offset, jump, sizeof(jump)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /*
    * Patch the unconditional jump code to jump to the start of the
    * "Hello, world!\n" function code.
    */
    /*
    * 修补无条件jump代码来跳转到"Hello, world!\n" 函数代码的开头
    */
    /*2*/ *(unsigned long *)&jump[1] = addr + 0x0f;

    /*
    * Overwrite the beginning of mkdir with the unconditional
    * jump code.
    */
    /*
    * 用无条件jump代码覆盖mkdir的前端
    */
    if (kvm_write(kd, nl[0].n_value, jump, sizeof(jump)) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    /* Close kd. */
    /* 关闭 kd. */
    if (kvm_close(kd) < 0) {
        fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));
        exit(-1);
    }

    exit(0);
}

--------------------------------------------------------------------------------

Listing 5-7: mkdir_patch.c

清单 5-7 : mkdir_patch.c

As you can see, employing an inline function hook is relatively straightforward (although it’s somewhat lengthy). In fact, the only piece of code you haven’t seen before is /*1*/ the "Hello, world!\n" function code. It is rather simplistic, but there are two important points about it.

你可以看到,采用嵌入函数挂钩的方法相对比较简单(虽然它有点长)。实际上,唯一你以前没见过的一段代码是"Hello, world!\n" 函数的代码。它相当简单,但是这里有重要的两点。


First, notice how the first 15 bytes of hello are actually data; to be exact, these bytes make up the string Hello, world!\n. The actual assembly language instructions don’t start until offset 15. This is why the unconditional jump code, which overwrites mkdir, is /*2*/ set to addr + 0x0f.

首先,注意到hello开头的前15字节其实是数据;准确地说,这些字节构成了字符串 Hello, world!\n 。实际的汇编语言指令是从偏移15字节处开始的。这就是为什么覆盖mkdir的无条件jump代码被设置为 addr + 0x0f 。

Second, note hello’s final three instructions. The first zeros out the %eax register, the second cleans up the stack, and the last restores the %ebp register. This is done so that when mkdir actually begins executing, it’s as if the hook never happened.

第二,注意到hello的最后3个指令。第1个对%eax 寄存器清零,第2个清空堆栈,第3个恢复%ebp寄存器。由于这些代码的执行,使得当mkdir 实际开始运行时,看起来挂钩从没发生过一样。
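The arrangement of the three pieces inside the allocated region can be summarized as a small layout calculation. This is a sketch only; `struct hook_layout` and `layout_hook` are hypothetical names, and the lengths passed in exclude the trailing NUL that the compiler appends to a string literal (which is why Listing 5-7 subtracts 1 from sizeof(hello)).

```c
#include <assert.h>
#include <stddef.h>

/* Offsets of the parts of the hook inside the allocated region. */
struct hook_layout {
    size_t entry;   /* first instruction of the new code     */
    size_t saved;   /* copy of the overwritten mkdir bytes   */
    size_t jump;    /* unconditional jump back into mkdir    */
    size_t total;   /* number of bytes occupied in all       */
};

/*
 *   str_len      length of the data at the start of hello
 *   hello_len    length of hello (data plus instructions)
 *   call_offset  number of mkdir bytes copied before the call
 *   jump_len     length of the unconditional jump code
 */
struct hook_layout
layout_hook(size_t str_len, size_t hello_len, size_t call_offset,
    size_t jump_len)
{
    struct hook_layout l;

    assert(str_len <= hello_len);
    l.entry = str_len;                  /* code follows the string  */
    l.saved = hello_len;                /* mkdir bytes follow hello */
    l.jump = hello_len + call_offset;   /* jump follows mkdir bytes */
    l.total = l.jump + jump_len;
    return (l);
}
```

For the listing's values (a 15-byte string, 39 bytes of hello in total, 34 bytes of mkdir before the call on the first machine, and a 7-byte jump), the entry point is at offset 0x0f, the saved bytes start at 39, the jump starts at 73, and 80 bytes are used.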


The following output shows mkdir_patch in action:

下面的输出显示了mkdir_patch 运行情况。

--------------------------------------------------------------------------------
$ gcc -o mkdir_patch mkdir_patch.c -lkvm
$ sudo ./mkdir_patch
$ mkdir TESTING
Hello, world!
$ ls -F
TESTING/ mkdir_patch* mkdir_patch.c
--------------------------------------------------------------------------------


5.6.2 Gotchas

Because mkdir_patch.c is a simple example, it fails to reveal some typical gotchas associated with inline function hooking.

因为mkdir_patch.c 是个简单的例子,它无法展现与内嵌函数挂钩相关的一些典型gotchas 。


First, by placing an unconditional jump within the body of a function, whose behavior you intend to preserve, there is a good chance that you’ll cause a kernel panic. This is because the unconditional jump code requires the use of a general-purpose register; however, it is likely that within the body of a function, all the general-purpose registers will already be in use. To get around this, push the register you are going to use onto the stack before jumping, and then pop it off after.

首先,在你希望保留其行为的函数体内放置一个无条件跳转jump,很有可能导致内核panic。这是因为这个无条件跳转jump代码需要使用一个通用寄存器;但是,很可能在函数内部,所有的通用寄存器全部已经在用了。为了绕开这点,在跳转之前得把你打算要使用的寄存器push到堆栈,最后再把它pop回去。
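One way to apply the push-before-jump advice is to put a push in front of the mov/jmp pair and have the destination code start with the matching pop. The encoder below is a sketch under that assumption; `encode_safe_jump` and `SAFE_JUMP_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAFE_JUMP_LEN 8     /* push (1) + mov (5) + jmp (2) */

/*
 * Encode "push %eax; mov $dest,%eax; jmp *%eax" into buf and return
 * the number of bytes written. The code at dest is expected to begin
 * with "pop %eax" (0x58) so that %eax gets its old value back.
 */
size_t
encode_safe_jump(unsigned char *buf, size_t buflen, uint32_t dest)
{
    assert(buflen >= SAFE_JUMP_LEN);
    buf[0] = 0x50;                                  /* push %eax      */
    buf[1] = 0xb8;                                  /* mov $dest,%eax */
    buf[2] = (unsigned char)(dest & 0xff);
    buf[3] = (unsigned char)((dest >> 8) & 0xff);
    buf[4] = (unsigned char)((dest >> 16) & 0xff);
    buf[5] = (unsigned char)((dest >> 24) & 0xff);
    buf[6] = 0xff;                                  /* jmp *%eax      */
    buf[7] = 0xe0;
    return (SAFE_JUMP_LEN);
}
```

The stub grows from seven bytes to eight, so the number of original bytes that must be preserved has to be worked out again for the larger size.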


Second, if you copy a call or jump statement and place it into a different region of memory, you can’t execute it as is; you have to adjust its operand first. This is because a call or jump statement’s machine code operand is a relative address.

第二,如果你拷贝一个调用或跳转语句并放置到内存的不同区域,你无法象以前那样执行它;你必须首先调整它的操作数。这是因为调用或跳转语句的机器码操作数是相对地址。
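The adjustment follows directly from how a relative operand is defined. A short sketch, with hypothetical helper names, that keeps the absolute target fixed while the instruction moves; the addresses in the self-check are the call kern_mkdir statement from the first mkdir disassembly and an arbitrary destination.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute target of a relative call or jump: end of insn + rel. */
uint32_t
rel32_target(uint32_t insn_addr, uint32_t insn_len, uint32_t rel)
{
    return (insn_addr + insn_len + rel);
}

/*
 * Operand to use once the instruction has moved from old_addr to
 * new_addr. The instruction length is unchanged, so it cancels out:
 * new_rel = old_rel + old_addr - new_addr (modulo 2^32).
 */
uint32_t
relocate_rel32(uint32_t old_rel, uint32_t old_addr, uint32_t new_addr)
{
    return (old_rel + old_addr - new_addr);
}

/* Self-check using "call kern_mkdir" at 0xc05bfee2. */
void
check_relocation_example(void)
{
    const uint32_t old_addr = 0xc05bfee2u;
    const uint32_t old_rel = 0xfffffb29u;   /* bytes 29 fb ff ff      */
    const uint32_t new_addr = 0xc1bb91b0u;  /* arbitrary new location */
    uint32_t new_rel = relocate_rel32(old_rel, old_addr, new_addr);

    assert(rel32_target(old_addr, 5, old_rel) == 0xc05bfa10u);
    assert(rel32_target(new_addr, 5, new_rel) == 0xc05bfa10u);
}
```

Only relative forms need this treatment; an indirect jump through a register, like the jmp *%eax used in mkdir_patch.c, carries an absolute address and can be copied unchanged.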


Finally, it’s possible for your code to be preempted while patching, and during that time, your target function may execute in its incomplete state. Therefore, if possible, you should avoid patching with multiple writes.

最后,在打补丁时你的代码被抢占也是可能的事,并且在那段时间里,你的目标函数可能用它的不完整状态执行。因此,如果可能,你应当避免需要多次写操作才能完成的补丁。



5.7 Cloaking System Call Hooks
5.7 掩盖系统调用挂钩

Before concluding this chapter, let’s take a brief look at a nontrivial application for run-time kernel memory patching: cloaking system call hooks. That is, implementing a system call hook without patching the system call table or any system call function. This is achieved by patching the system call dispatcher with an inline function hook so it references a Trojan system call table instead of the original. This renders the original table functionless, but maintains its integrity, enabling the Trojan table to direct system call requests to any handler you like.

本章结束之前,让我们看看内核内存补丁的一个非常规应用:掩盖系统调用挂钩。也就是,实现系统调用的挂钩,而不需要修改系统调用表或任何系统调用函数。这个效果是通过用一个嵌入函数挂钩来修改系统调用派遣程序,让它引用一个Trojan系统调用表而不是原先的来达成的。这样做致使原先的系统调用表丧失了它的功能,但又维持它的完整性,使得Trojan系统调用表把系统调用请求引导到任何一个你喜欢的处理程序去。


Because the code to do this is rather lengthy (it’s longer than mkdir_patch.c), I’ll simply explain how it’s done and leave the actual code to you.

因为实现代码相当地长(它比mkdir_patch.c要长),我仅简单地解释它是怎么做的,实际代码留给你完成。


The system call dispatcher in FreeBSD is syscall, which is implemented in the file /sys/i386/i386/trap.c as follows.

FreeBSD的系统调用派遣程序是syscall。它在文件/sys/i386/i386/trap.c 中实现如下


NOTE In the interest of saving space, any code irrelevant to this discussion is omitted.

提示 为了节省空间,与讨论无关的代码都给忽略了。

--------------------------------------------------------------------------------

syscall(frame)
    struct trapframe frame;
{
    caddr_t params;
    struct sysent *callp;
    struct thread *td = curthread;
    struct proc *p = td->td_proc;
    register_t orig_tf_eflags;
    u_int sticks;
    int error;
    int narg;
    int args[8];
    u_int code;
. . .
    if (code >= p->p_sysent->sv_size)
        callp = &p->p_sysent->sv_table[0];
    else
     /*1*/ callp = &p->p_sysent->sv_table[code];/* <-- 1 */
. . .
}
--------------------------------------------------------------------------------

In syscall, line /*1*/ references the system call table and stores the address of the system call to be dispatched into callp. Here is what this line looks like disassembled:

在syscall中,该行引用系统调用表,把需要派遣的系统调用的地址保存到callp中。下面是该行在反汇编后的样子:

--------------------------------------------------------------------------------
486: 64 a1 00 00 00 00         mov     %fs:0x0,%eax
48c: 8b 00             mov     (%eax),%eax
48e: 8b 80 a0 01 00 00         mov     0x1a0(%eax),%eax
494: 8b 40 04             mov     0x4(%eax),%eax
--------------------------------------------------------------------------------

The first instruction loads curthread, the currently running thread, into %eax; the pointer is read through the %fs segment register (%fs:0x0). The first field in a thread structure is a pointer to its associated proc structure; hence, the second instruction loads the current process into %eax. The next instruction loads p_sysent into %eax. This can be verified, as the p_sysent field (which is a sysentvec pointer) is located at an offset of 0x1a0 within a proc structure. The last instruction loads the system call table into %eax. This can be verified, as the sv_table field is located at an offset of 0x4 within a sysentvec structure. This last instruction is the one you’ll need to scan for and patch. However, be aware that, depending on the system, the system call table can be loaded into a different general-purpose register.

第1个指令把curthread,即当前运行的线程,装载到%eax;这个指针是通过%fs段寄存器(%fs:0x0)读取的。thread 结构中的第1个域是指向与它相关联的proc结构的指针。因此,第2个指令装载当前的进程到%eax。接下来的指令把p_sysent 装载到%eax。这点是能够检验的,因为p_sysent (它是一个sysentvec 的指针)位于proc结构内偏移0x1a0 的地方。最后一条指令装载系统调用表到%eax。这点也可以去查证,因为域sv_table 位于sysentvec 结构体内部偏移0x4 的地方。这最后一条指令就是你要去搜索和进行修改的。但是,必须意识到,依赖于系统,系统调用表可能装载到一个不同的通用寄存器中。
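Because the register can differ, a scan for the last instruction has to look at the instruction's form rather than at fixed bytes. The sketch below is a heuristic under stated assumptions, not a complete solution: `find_sv_table_load` and `dispatcher_bytes` are hypothetical names, the sample bytes are the four instructions shown above, and the match accepts any base and destination register.

```c
#include <assert.h>
#include <stddef.h>

/* The four instructions from the syscall disassembly above. */
const unsigned char dispatcher_bytes[] = {
    0x64, 0xa1, 0x00, 0x00, 0x00, 0x00,     /* mov %fs:0x0,%eax     */
    0x8b, 0x00,                             /* mov (%eax),%eax      */
    0x8b, 0x80, 0xa0, 0x01, 0x00, 0x00,     /* mov 0x1a0(%eax),%eax */
    0x8b, 0x40, 0x04                        /* mov 0x4(%eax),%eax   */
};

/*
 * Return the offset of the first "mov 0x4(%reg),%reg" in code, or -1.
 * The form is opcode 0x8b, a ModRM byte with mod=01 (8-bit
 * displacement), then the displacement 0x04. The rm=100 form is
 * skipped because it is followed by a SIB byte instead.
 */
long
find_sv_table_load(const unsigned char *code, size_t len)
{
    size_t i;

    assert(code != NULL);
    for (i = 0; i + 3 <= len; i++) {
        if (code[i] == 0x8b &&
            (code[i + 1] & 0xc0) == 0x40 &&
            (code[i + 1] & 0x07) != 0x04 &&
            code[i + 2] == 0x04)
            return ((long)i);
    }
    return (-1);
}
```

On the sample bytes the match is at offset 14. The form is a common one (mkdir itself contains 8b 42 04), so a scan over the real dispatcher would also need to confirm the instructions leading up to it, such as the load at displacement 0x1a0, and that displacement is specific to the system shown here.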


Also, after Trojaning the system call table, any system call modules that are loaded won’t work. However, since you now control the system calls responsible for loading a module, this can be fixed.

同样,在木马化系统调用表之后,任何后来加载的系统调用模块都不能工作。但是,既然现在你控制了负责加载模块的系统调用,这个缺陷可以被修正。
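The effect on later-loaded system calls can be seen in a user-space model of the dispatch step. Everything below is hypothetical (the table size, the handler names, and the `active_table` pointer that stands in for the single patched reference); it models only the table lookup, not the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSYSCALLS 4

typedef int (*handler_t)(void);

int nosys_model(void)      { return (-1); } /* unimplemented slot     */
int orig_mkdir(void)       { return (1); }  /* original handler       */
int hooked_mkdir(void)     { return (2); }  /* replacement handler    */
int late_module_call(void) { return (3); }  /* registered after hook  */

handler_t original_table[NSYSCALLS];
handler_t trojan_table[NSYSCALLS];
handler_t *active_table = original_table;   /* what dispatch reads */

void
init_tables(void)
{
    size_t i;

    for (i = 0; i < NSYSCALLS; i++)
        original_table[i] = nosys_model;
    original_table[1] = orig_mkdir;
    active_table = original_table;
}

/* Out-of-range codes fall back to entry 0, as in syscall(). */
int
dispatch(size_t code)
{
    if (code >= NSYSCALLS)
        code = 0;
    assert(active_table[code] != NULL);
    return (active_table[code]());
}

/* Copy the original table, hook one entry, redirect the dispatcher. */
void
install_trojan_table(size_t code, handler_t hook)
{
    assert(code < NSYSCALLS);
    memcpy(trojan_table, original_table, sizeof(trojan_table));
    trojan_table[code] = hook;
    active_table = trojan_table;
}
```

After install_trojan_table(1, hooked_mkdir), dispatch(1) reaches the hook while original_table[1] still holds orig_mkdir, so the original table stays intact. A handler stored in original_table afterwards is not reachable through dispatch until it is copied into trojan_table as well, which is the problem with later-loaded modules described above.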


That’s about it! All you really need to do is patch one spot. Of course, the devil is in the details. (In fact, all the gotchas I listed in Section 5.6.2 are a direct result of trying to patch that one spot.)

就这样!你真正要做的只是修改一个位置。当然,难点在于细节的处理。(实际上,章节5.6.2 列出的所有gotchas 都是尝试修改那个位置时遇到的直接结果。)


NOTE If you Trojan your own system call table, you’ll null the effects of traditional system call hooking. In other words, this technique of cloaking system calls can be applied defensively.

注意 如果你木马化了自己的系统调用表,传统的系统调用挂钩就会失效。换句话说,掩盖系统调用这项技术也可以用于防御。


5.8 Concluding Remarks
5.8 小结

Run-time kernel memory patching is one of the strongest techniques for modifying software logic. Theoretically, you can use it to rewrite the entire operating system on the fly. Furthermore, it’s somewhat difficult to detect, depending on where you place your patches and whether or not you use inline function hooks.

内核内存运行时修补是修改软件逻辑的最强大的技术之一。理论上,你可以使用它改写整个操作系统。此外,它相对地难以探测,这取决于你把补丁放在哪里以及你是否使用嵌入函数挂钩。


At the time of this writing, a technique to cloak run-time kernel memory patching has been published. See “Raising The Bar For Windows Rootkit Detection” by Jamie Butler and Sherri Sparks, published in Phrack magazine, issue 63. Although this article is written from a Windows perspective, the theory can be applied to any x86 operating system.

在写本章的时候,一种掩盖内核内存运行时补丁的技术已经被公布了。见于Jamie Butler 和 Sherri Sparks 写的“Raising The Bar For Windows Rootkit Detection”,发表在Phrack 杂志第63期。尽管这篇文章是从Windows的角度写的,但它的理论也适用于任何基于x86的操作系统。


Finally, like most rootkit techniques, run-time kernel memory patching has legitimate uses. For example, Microsoft calls it hot patching and uses it to patch systems without requiring a reboot.

最后,象大多数rootkit技术一样,内核运行时内存补丁技术有它的合法使用。比如,微软把它叫做热补丁,使用它来修补系统而不需要系统的重启。