Category: LINUX

2010-05-22 17:29:26

            Patching: a sedative for performance fanatics
           -- quoted from the Lguest code comments (Powerfully Placating Performance Pedants)

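   Main goal: self-modifying code that improves performance and lets a single kernel image adapt to different configurations (such as UP and SMP).
    Cost: larger code and a larger memory footprint (the address of every patched instruction has to be recorded, and each lock prefix is replaced by a nop, whereas a kernel built without CONFIG_SMP does not even contain the nop), so it is not necessarily a win for embedded systems.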

Origins of the technique

The earliest origin: Runtime memory barrier patching
The discussion in that thread is well worth reading.



SMP alternatives



See the comments in include/asm-i386/alternative.h
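
The lock-prefix idea can be modelled in user space. The following is a minimal sketch under stated assumptions, not the kernel's code: the byte buffer stands in for kernel text, and the names `patch_locks_for_up`, `patch_locks_for_smp` and `lock_sites` are hypothetical. It only illustrates the bookkeeping described above: the location of every lock prefix is recorded, and those single bytes are rewritten when the configuration changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_PREFIX_BYTE 0xF0 /* x86 "lock" prefix */
#define NOP_BYTE         0x90 /* x86 one-byte nop */

/*
 * `lock_sites` is a hypothetical stand-in for the recorded addresses:
 * it holds the offset of every lock prefix inside `text`.
 */

/* Running on one CPU: overwrite every recorded lock prefix with a nop. */
static void patch_locks_for_up(uint8_t *text, const size_t *lock_sites, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t cur = text[lock_sites[i]];

        /* A recorded site must hold either a lock prefix or an earlier nop. */
        assert(cur == LOCK_PREFIX_BYTE || cur == NOP_BYTE);
        text[lock_sites[i]] = NOP_BYTE;
    }
}

/* A second CPU comes up: restore the lock prefix at every recorded site. */
static void patch_locks_for_smp(uint8_t *text, const size_t *lock_sites, size_t n)
{
    for (size_t i = 0; i < n; i++)
        text[lock_sites[i]] = LOCK_PREFIX_BYTE;
}
```

The table of offsets is exactly the footprint cost mentioned at the top of this post: one recorded location per patchable instruction.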



Finally figured out what LOCK_PREFIX means in the Linux kernel

http://blog.chinaunix.net/u/12325/showart_1999057.html

Lguest

In this case, the address is a runtime constant, so at
the very least we can patch the call to a simple direct call, or
ideally, patch an inline implementation into the callsite. 

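(1) Eliminating the indirect jump
See the comments in linux-2.6.23.2/include/asm-i386/paravirt.h and the paravirt_patch_default function.
The key point is that the address stored in paravirt_ops.operations is a runtime constant, which is what makes patching the call site feasible.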
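
To make "patch the call to a simple direct call" concrete, here is a minimal user-space sketch of how a 5-byte x86 `call rel32` can be encoded once the target address is known. The function name `encode_direct_call` and its parameters are hypothetical and chosen for illustration; this is not the kernel's routine, and it assumes 32-bit addresses as on i386.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_REL32_OPCODE 0xE8 /* x86 near call, relative 32-bit displacement */
#define CALL_REL32_LEN    5    /* 1 opcode byte + 4 displacement bytes */

/*
 * Write a direct call to `target` into `buf`, assuming the instruction
 * will live at address `site_addr`. Returns the number of bytes written,
 * or 0 if the call site is too small to hold the instruction.
 */
static size_t encode_direct_call(uint8_t *buf, size_t len,
                                 uint32_t site_addr, uint32_t target)
{
    assert(buf != NULL);

    if (len < CALL_REL32_LEN)
        return 0; /* not enough room: leave the site unpatched */

    /* The displacement is relative to the address of the NEXT instruction. */
    uint32_t delta = target - (site_addr + CALL_REL32_LEN);

    buf[0] = CALL_REL32_OPCODE;
    buf[1] = (uint8_t)(delta & 0xFF);         /* little-endian rel32 */
    buf[2] = (uint8_t)((delta >> 8) & 0xFF);
    buf[3] = (uint8_t)((delta >> 16) & 0xFF);
    buf[4] = (uint8_t)((delta >> 24) & 0xFF);
    return CALL_REL32_LEN;
}
```

Because the displacement is fixed at patch time, the CPU no longer has to load the destination from memory before it can follow the branch, which is the cost of the indirect form.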


[patch 14/26] Xen-paravirt_ops: add common patching machinery



/*
 * These macros are intended to wrap calls into a paravirt_ops
 * operation, so that they can be later identified and patched at
 * runtime.
 *
 * Normally, a call to a pv_op function is a simple indirect call:
 * (paravirt_ops.operations)(args...).
 *
 * Unfortunately, this is a relatively slow operation for modern CPUs,
 * because it cannot necessarily determine what the destination
 * address is.  In this case, the address is a runtime constant, so at
 * the very least we can patch the call to a simple direct call, or
 * ideally, patch an inline implementation into the callsite.  (Direct
 * calls are essentially free, because the call and return addresses
 * are completely predictable.)
 *
 * These macros rely on the standard gcc "regparm(3)" calling
 * convention, in which the first three arguments are placed in %eax,
 * %edx, %ecx (in that order), and the remaining arguments are placed
 * on the stack.  All caller-save registers (eax,edx,ecx) are expected
 * to be modified (either clobbered or used for return values).
 *
 * The call instruction itself is marked by placing its start address
 * and size into the .parainstructions section, so that
 * apply_paravirt() in arch/i386/kernel/alternative.c can do the
 * appropriate patching under the control of the backend paravirt_ops
 * implementation.
 *
 * Unfortunately there's no way to get gcc to generate the args setup
 * for the call, and then allow the call itself to be generated by an
 * inline asm.  Because of this, we must do the complete arg setup and
 * return value handling from within these macros.  This is fairly
 * cumbersome.
 *
 * There are 5 sets of PVOP_* macros for dealing with 0-4 arguments.
 * It could be extended to more arguments, but there would be little
 * to be gained from that.  For each number of arguments, there are
 * the two VCALL and CALL variants for void and non-void functions.
 *
 * When there is a return value, the invoker of the macro must specify
 * the return type.  The macro then uses sizeof() on that type to
 * determine whether it's a 32 or 64 bit value, and places the return
 * in the right register(s) (just %eax for 32-bit, and %edx:%eax for
 * 64-bit).
 *
 * 64-bit arguments are passed as a pair of adjacent 32-bit arguments
 * in low,high order.
 *
 * Small structures are passed and returned in registers.  The macro
 * calling convention can't directly deal with this, so the wrapper
 * functions must do this.
 *
 * These PVOP_* macros are only defined within this header.  This
 * means that all uses must be wrapped in inline functions.  This also
 * makes sure the incoming and outgoing types are always correct.
 */






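(2) Replacing instructions directly
For the rationale, see the comment below. Frequently used instructions such as cli and sti are patched in place, so they do not need a hypercall that switches to the Host.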


linux-2.6.23.2/arch/i386/kernel/vmlinux.lds.S
  . = ALIGN(4);
  .parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
      __parainstructions = .;
      *(.parainstructions)
      __parainstructions_end = .;
  }



alternative_instructions(), reached via start_kernel() => check_bugs() => alternative_instructions(), contains the following code:
apply_paravirt(__parainstructions, __parainstructions_end);

void apply_paravirt(struct paravirt_patch_site *start,
            struct paravirt_patch_site *end)
{
    struct paravirt_patch_site *p;
    char insnbuf[MAX_PATCH_LEN];

    if (noreplace_paravirt)
        return;

    for (p = start; p < end; p++) {
        unsigned int used;

        BUG_ON(p->len > MAX_PATCH_LEN);
        /* prep the buffer with the original instructions */
        memcpy(insnbuf, p->instr, p->len);
        used = paravirt_ops.patch(p->instrtype, p->clobbers, insnbuf,
                      (unsigned long)p->instr, p->len);

        BUG_ON(used > p->len);

        /* Pad the rest with nops */
        add_nops(insnbuf + used, p->len - used);
        text_poke(p->instr, insnbuf, p->len);
    }
}
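
The loop above can be tried out in user space. The following is a simplified sketch, not the kernel code: `struct patch_site`, `apply_patches` and the back end `cli_only_patch` are hypothetical names, the byte buffer stands in for kernel text, plain `memcpy` replaces `text_poke`, and padding uses one-byte nops where the kernel's `add_nops` may choose longer nop encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PATCH_LEN 16   /* hypothetical bound for this sketch */
#define NOP_BYTE      0x90 /* x86 one-byte nop */

/* One entry per patchable call site, as collected by the linker section. */
struct patch_site {
    uint8_t *instr; /* where the original instruction lives */
    uint8_t  type;  /* which operation this site invokes */
    uint8_t  len;   /* bytes available at the site */
};

/* Back-end hook: rewrite insnbuf in place, return the bytes actually used. */
typedef unsigned (*patch_fn)(uint8_t type, uint8_t *insnbuf, unsigned len);

static void apply_patches(struct patch_site *start, struct patch_site *end,
                          patch_fn patch)
{
    uint8_t insnbuf[MAX_PATCH_LEN];

    for (struct patch_site *p = start; p < end; p++) {
        assert(p->len <= MAX_PATCH_LEN);

        /* Start from the original bytes so a back end may leave them alone. */
        memcpy(insnbuf, p->instr, p->len);
        unsigned used = patch(p->type, insnbuf, p->len);
        assert(used <= p->len);

        /* Pad the unused tail with nops, then write the whole site back. */
        memset(insnbuf + used, NOP_BYTE, p->len - used);
        memcpy(p->instr, insnbuf, p->len);
    }
}

/* Hypothetical back end: type 0 becomes a one-byte cli (0xFA), the rest stay. */
static unsigned cli_only_patch(uint8_t type, uint8_t *insnbuf, unsigned len)
{
    if (type == 0 && len >= 1) {
        insnbuf[0] = 0xFA;
        return 1;
    }
    return len; /* report the whole site as used: original bytes survive */
}
```

Returning `len` from the back end is what keeps an unpatched site intact, since no byte is then left over to be padded.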

lguest_init() contains the following code:
paravirt_ops.patch = lguest_patch;


/*G:050
 * Patching (Powerfully Placating Performance Pedants)
 *
 * We have already seen that "struct paravirt_ops" lets us replace simple
 * native instructions with calls to the appropriate back end all throughout
 * the kernel.  This allows the same kernel to run as a Guest and as a native
 * kernel, but it's slow because of all the indirect branches.
 *
 * Remember that David Wheeler quote about "Any problem in computer science can
 * be solved with another layer of indirection"?  The rest of that quote is
 * "... But that usually will create another problem."  This is the first of
 * those problems.
 *
 * Our current solution is to allow the paravirt back end to optionally patch
 * over the indirect calls to replace them with something more efficient.  We
 * patch the four most commonly called functions: disable interrupts, enable
 * interrupts, restore interrupts and save interrupts.  We usually have 10
 * bytes to patch into: the Guest versions of these operations are small enough
 * that we can fit comfortably.
 *
 * First we need assembly templates of each of the patchable Guest operations,
 * and these are in lguest_asm.S. */

/*G:060 We construct a table from the assembler templates: */
static const struct lguest_insns
{
    const char *start, *end;
} lguest_insns[] = {
    [PARAVIRT_PATCH(irq_disable)] = { lgstart_cli, lgend_cli },
    [PARAVIRT_PATCH(irq_enable)] = { lgstart_sti, lgend_sti },
    [PARAVIRT_PATCH(restore_fl)] = { lgstart_popf, lgend_popf },
    [PARAVIRT_PATCH(save_fl)] = { lgstart_pushf, lgend_pushf },
};

/* Now our patch routine is fairly simple (based on the native one in
 * paravirt.c).  If we have a replacement, we copy it in and return how much of
 * the available space we used. */
static unsigned lguest_patch(u8 type, u16 clobber, void *ibuf,
                 unsigned long addr, unsigned len)
{
    unsigned int insn_len;

    /* Don't do anything special if we don't have a replacement */
    if (type >= ARRAY_SIZE(lguest_insns) || !lguest_insns[type].start)
        return paravirt_patch_default(type, clobber, ibuf, addr, len);

    insn_len = lguest_insns[type].end - lguest_insns[type].start;

    /* Similarly if we can't fit replacement (shouldn't happen, but let's
     * be thorough). */
    if (len < insn_len)
        return paravirt_patch_default(type, clobber, ibuf, addr, len);

    /* Copy in our instructions. */
    memcpy(ibuf, lguest_insns[type].start, insn_len);
    return insn_len;
}
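
The table-driven lookup in lguest_patch() can be exercised outside the kernel. This is a sketch under stated assumptions: the `demo_*` names are hypothetical, the template bytes are placeholders rather than real instruction encodings, and the fallback simply keeps the original bytes instead of calling paravirt_patch_default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical patch types; the kernel derives its indices with PARAVIRT_PATCH(). */
enum demo_patch_type {
    DEMO_IRQ_DISABLE,
    DEMO_IRQ_ENABLE,
    DEMO_NO_TEMPLATE, /* deliberately has no table entry */
};

/* Placeholder bytes, NOT real instruction encodings. */
static const uint8_t tmpl_disable[] = { 0xA1, 0xA2, 0xA3 };
static const uint8_t tmpl_enable[]  = { 0xB1, 0xB2, 0xB3, 0xB4 };

/* Each template is delimited by a start and an end pointer, like the asm labels. */
struct demo_template {
    const uint8_t *start, *end;
};

static const struct demo_template demo_templates[] = {
    [DEMO_IRQ_DISABLE] = { tmpl_disable, tmpl_disable + sizeof tmpl_disable },
    [DEMO_IRQ_ENABLE]  = { tmpl_enable,  tmpl_enable  + sizeof tmpl_enable  },
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Fallback for this sketch: keep the original bytes, report the site as full. */
static unsigned demo_patch_default(uint8_t *ibuf, unsigned len)
{
    (void)ibuf;
    return len;
}

static unsigned demo_patch(unsigned type, uint8_t *ibuf, unsigned len)
{
    assert(ibuf != NULL);

    /* No template for this type: fall back. */
    if (type >= DEMO_ARRAY_SIZE(demo_templates) || !demo_templates[type].start)
        return demo_patch_default(ibuf, len);

    unsigned insn_len =
        (unsigned)(demo_templates[type].end - demo_templates[type].start);

    /* Template does not fit into the call site: fall back as well. */
    if (len < insn_len)
        return demo_patch_default(ibuf, len);

    memcpy(ibuf, demo_templates[type].start, insn_len);
    return insn_len;
}
```

The template length is never stored; it is computed as `end - start`, which is why the assembler templates only need a pair of labels around each sequence.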


Note: until apply_paravirt() has run, sensitive instructions are still handled through paravirt_ops.

    paravirt_ops.save_fl = save_fl;
    paravirt_ops.restore_fl = restore_fl;
    paravirt_ops.irq_disable = irq_disable;
    paravirt_ops.irq_enable = irq_enable;
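
What "handled through paravirt_ops" looks like from the caller's side can be sketched with an ordinary table of function pointers. Everything here is hypothetical (`struct demo_pv_ops`, the `demo_*` functions, and the `virtual_irq_flag` variable); in particular, modelling the Guest's interrupt state as a plain variable is an assumption of this sketch, not a statement about the lguest source.

```c
#include <assert.h>
#include <stddef.h>

#define EFLAGS_IF 0x200UL /* interrupt-enable bit in x86 EFLAGS */

/* Hypothetical miniature of the ops table: four interrupt-flag operations. */
struct demo_pv_ops {
    unsigned long (*save_fl)(void);
    void (*restore_fl)(unsigned long flags);
    void (*irq_disable)(void);
    void (*irq_enable)(void);
};

/* Assumption: the Guest's "interrupts enabled" state is just a variable. */
static unsigned long virtual_irq_flag = EFLAGS_IF;

static unsigned long demo_save_fl(void)            { return virtual_irq_flag; }
static void demo_restore_fl(unsigned long flags)   { virtual_irq_flag = flags; }
static void demo_irq_disable(void)                 { virtual_irq_flag = 0; }
static void demo_irq_enable(void)                  { virtual_irq_flag = EFLAGS_IF; }

static struct demo_pv_ops demo_ops;

/* Mirrors the four assignments above: install the back-end implementations. */
static void demo_init(void)
{
    demo_ops.save_fl     = demo_save_fl;
    demo_ops.restore_fl  = demo_restore_fl;
    demo_ops.irq_disable = demo_irq_disable;
    demo_ops.irq_enable  = demo_irq_enable;
}

/*
 * Caller-side pattern before patching: every operation is an indirect call
 * through the table. Returns the flag value observed inside the section.
 */
static unsigned long demo_critical_section(void)
{
    assert(demo_ops.save_fl != NULL && demo_ops.restore_fl != NULL);
    assert(demo_ops.irq_disable != NULL);

    unsigned long flags = demo_ops.save_fl();
    demo_ops.irq_disable();
    unsigned long seen_inside = demo_ops.save_fl();
    demo_ops.restore_fl(flags);
    return seen_inside;
}
```

Each of the four calls in `demo_critical_section` is the kind of indirect branch that patching later replaces with a direct call or an inline instruction sequence.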
