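Patching: a sedative for performance fanatics
-- quoted from an Lguest code comment (Powerfully Placating Performance Pedants)
Main goal: self-modifying code that improves performance and adapts to different configurations (such as UP and SMP).
Cost:
Larger code and a larger footprint (the address of every patched instruction has to be recorded, and the lock prefix gets replaced by a nop, and so on; without CONFIG_SMP there is not even a nop), which is not necessarily a win for embedded systems.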
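To make the cost and the benefit concrete, here is a minimal user-space sketch of the idea. It does not touch real kernel text: an ordinary byte array stands in for the code, and a table of offsets stands in for the recorded addresses that cause the extra footprint. The helper name patch_lock_sites and its parameters are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_PREFIX_BYTE 0xf0 /* x86 LOCK prefix */
#define NOP_BYTE         0x90 /* x86 one-byte NOP */

/*
 * Hypothetical helper: switch every recorded site in `text` between the
 * SMP form (LOCK prefix) and the UP form (NOP). `sites` plays the role of
 * the address table that has to be stored alongside the code.
 * Returns the number of bytes that were actually rewritten.
 */
size_t patch_lock_sites(uint8_t *text, size_t text_len,
                        const size_t *sites, size_t nsites, int smp)
{
    size_t changed = 0;
    uint8_t want = smp ? LOCK_PREFIX_BYTE : NOP_BYTE;

    for (size_t i = 0; i < nsites; i++) {
        /* a recorded site must lie inside the text being patched */
        assert(sites[i] < text_len);
        if (text[sites[i]] != want) {
            text[sites[i]] = want;
            changed++;
        }
    }
    return changed;
}
```

Calling it with smp = 0 turns each recorded 0xf0 into 0x90, and calling it again with smp = 1 restores the prefix, which is the round trip a kernel would need when CPUs come and go.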
Origins of the technique
The earliest origin is the "Runtime memory barrier patching" thread.
The discussion in it is well worth reading.
SMP alternatives
See the comment in include/asm-i386/alternative.h.
Finally figured out the meaning of LOCK_PREFIX in the Linux kernel
http://blog.chinaunix.net/u/12325/showart_1999057.html
Lguest
In this case, the address is a runtime constant, so at the very least we can patch the call to a simple direct call, or ideally, patch an inline implementation into the callsite.
(1) Eliminating indirect jumps
See the comment in linux-2.6.23.2/include/asm-i386/paravirt.h and the paravirt_patch_default function.
The key point is that for paravirt_ops.operations the address is a runtime constant, which is what makes the patching feasible.
[patch 14/26] Xen-paravirt_ops: add common patching machinery
/*
 * These macros are intended to wrap calls into a paravirt_ops
 * operation, so that they can be later identified and patched at
 * runtime.
 *
 * Normally, a call to a pv_op function is a simple indirect call:
 * (paravirt_ops.operations)(args...).
 *
 * Unfortunately, this is a relatively slow operation for modern CPUs,
 * because it cannot necessarily determine what the destination
 * address is.  In this case, the address is a runtime constant, so at
 * the very least we can patch the call to a simple direct call, or
 * ideally, patch an inline implementation into the callsite.  (Direct
 * calls are essentially free, because the call and return addresses
 * are completely predictable.)
 *
 * These macros rely on the standard gcc "regparm(3)" calling
 * convention, in which the first three arguments are placed in %eax,
 * %edx, %ecx (in that order), and the remaining arguments are placed
 * on the stack.  All caller-save registers (eax,edx,ecx) are expected
 * to be modified (either clobbered or used for return values).
 *
 * The call instruction itself is marked by placing its start address
 * and size into the .parainstructions section, so that
 * apply_paravirt() in arch/i386/kernel/alternative.c can do the
 * appropriate patching under the control of the backend paravirt_ops
 * implementation.
 *
 * Unfortunately there's no way to get gcc to generate the args setup
 * for the call, and then allow the call itself to be generated by an
 * inline asm.  Because of this, we must do the complete arg setup and
 * return value handling from within these macros.  This is fairly
 * cumbersome.
 *
 * There are 5 sets of PVOP_* macros for dealing with 0-4 arguments.
 * It could be extended to more arguments, but there would be little
 * to be gained from that.  For each number of arguments, there are
 * the two VCALL and CALL variants for void and non-void functions.
 *
 * When there is a return value, the invoker of the macro must specify
 * the return type.  The macro then uses sizeof() on that type to
 * determine whether it's a 32 or 64 bit value, and places the return
 * in the right register(s) (just %eax for 32-bit, and %edx:%eax for
 * 64-bit).
 *
 * 64-bit arguments are passed as a pair of adjacent 32-bit arguments
 * in low,high order.
 *
 * Small structures are passed and returned in registers.  The macro
 * calling convention can't directly deal with this, so the wrapper
 * functions must do this.
 *
 * These PVOP_* macros are only defined within this header.  This
 * means that all uses must be wrapped in inline functions.  This also
 * makes sure the incoming and outgoing types are always correct.
 */
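The step from an indirect call to a direct call can be sketched in user space as follows. This is only an illustration of the encoding, not the kernel's own routine: the helper name encode_direct_call is hypothetical, and it writes into a plain buffer instead of live code. On i386 a direct near call is the opcode 0xe8 followed by a 32-bit displacement measured from the end of the instruction, so the displacement can be computed once the (runtime-constant) target address is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_REL32_OPCODE 0xe8
#define CALL_REL32_LEN    5

/*
 * Hypothetical helper: write "call target" into buf as if the instruction
 * were located at address `site`. Returns the number of bytes used, or 0
 * if the patch site is too small to hold a direct call.
 */
unsigned encode_direct_call(uint8_t *buf, unsigned len,
                            uint32_t site, uint32_t target)
{
    assert(buf != NULL);
    if (len < CALL_REL32_LEN)
        return 0;

    /* the displacement is relative to the end of the call instruction */
    uint32_t delta = target - (site + CALL_REL32_LEN);

    buf[0] = CALL_REL32_OPCODE;
    /* store the displacement byte by byte in little-endian order */
    buf[1] = (uint8_t)(delta & 0xff);
    buf[2] = (uint8_t)((delta >> 8) & 0xff);
    buf[3] = (uint8_t)((delta >> 16) & 0xff);
    buf[4] = (uint8_t)((delta >> 24) & 0xff);
    return CALL_REL32_LEN;
}
```

An indirect call through an absolute memory operand (ff 15 followed by a 32-bit address) occupies 6 bytes on i386, so the 5-byte direct call fits in the same slot and leaves one byte to be filled with a nop.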
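(2) Replacing instructions directly
The comment below explains the motivation: frequently used instructions such as cli and sti are patched in place, so they no longer need a hypercall that switches to the Host.
linux-2.6.23.2/arch/i386/kernel/vmlinux.lds.S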
. = ALIGN(4);
.parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
	__parainstructions = .;
	*(.parainstructions)
	__parainstructions_end = .;
}
alternative_instructions(), reached via start_kernel() => check_bugs() => alternative_instructions(), contains the following code:
apply_paravirt(__parainstructions, __parainstructions_end);
void apply_paravirt(struct paravirt_patch_site *start,
		    struct paravirt_patch_site *end)
{
	struct paravirt_patch_site *p;
	char insnbuf[MAX_PATCH_LEN];

	if (noreplace_paravirt)
		return;

	for (p = start; p < end; p++) {
		unsigned int used;

		BUG_ON(p->len > MAX_PATCH_LEN);
		/* prep the buffer with the original instructions */
		memcpy(insnbuf, p->instr, p->len);
		used = paravirt_ops.patch(p->instrtype, p->clobbers, insnbuf,
					  (unsigned long)p->instr, p->len);

		BUG_ON(used > p->len);

		/* Pad the rest with nops */
		add_nops(insnbuf + used, p->len - used);
		text_poke(p->instr, insnbuf, p->len);
	}
}
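The loop above can be tried out in user space with a simplified model. The sketch below is an assumption-laden stand-in, not kernel code: struct patch_site drops the clobbers field, MAX_PATCH_LEN is an arbitrary illustrative value, memset and memcpy replace add_nops and text_poke, and demo_patch is a hypothetical backend that turns type 0 into the one-byte cli opcode (0xfa) and leaves everything else alone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PATCH_LEN 16   /* illustrative value, not taken from the kernel */
#define NOP_BYTE      0x90 /* x86 one-byte NOP */

struct patch_site {        /* simplified stand-in for paravirt_patch_site */
    uint8_t *instr;        /* start of the instruction to patch */
    uint8_t type;          /* which operation this call site invokes */
    uint8_t len;           /* bytes available at the site */
};

/* backend hook: rewrite buf in place, return how many bytes were used */
typedef unsigned (*patch_fn)(uint8_t type, uint8_t *buf, unsigned len);

void apply_patches(struct patch_site *start, struct patch_site *end,
                   patch_fn patch)
{
    uint8_t insnbuf[MAX_PATCH_LEN];

    for (struct patch_site *p = start; p < end; p++) {
        assert(p->len <= MAX_PATCH_LEN);
        /* start from the original bytes so "no change" is a valid answer */
        memcpy(insnbuf, p->instr, p->len);
        unsigned used = patch(p->type, insnbuf, p->len);
        assert(used <= p->len);
        /* pad whatever the backend did not use with nops */
        memset(insnbuf + used, NOP_BYTE, p->len - used);
        memcpy(p->instr, insnbuf, p->len);
    }
}

/* Hypothetical backend: type 0 becomes a single cli, others stay as-is. */
unsigned demo_patch(uint8_t type, uint8_t *buf, unsigned len)
{
    if (type == 0 && len >= 1) {
        buf[0] = 0xfa;
        return 1;
    }
    return len; /* claim the whole site, leaving the original bytes */
}
```

Pre-filling the buffer with the original instruction is what lets a backend decline to patch simply by returning len.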
lguest_init() contains the following code:
paravirt_ops.patch = lguest_patch;
/*G:050
 * Patching (Powerfully Placating Performance Pedants)
 *
 * We have already seen that "struct paravirt_ops" lets us replace simple
 * native instructions with calls to the appropriate back end all throughout
 * the kernel.  This allows the same kernel to run as a Guest and as a native
 * kernel, but it's slow because of all the indirect branches.
 *
 * Remember that David Wheeler quote about "Any problem in computer science can
 * be solved with another layer of indirection"?  The rest of that quote is
 * "... But that usually will create another problem."  This is the first of
 * those problems.
 *
 * Our current solution is to allow the paravirt back end to optionally patch
 * over the indirect calls to replace them with something more efficient.  We
 * patch the four most commonly called functions: disable interrupts, enable
 * interrupts, restore interrupts and save interrupts.  We usually have 10
 * bytes to patch into: the Guest versions of these operations are small enough
 * that we can fit comfortably.
 *
 * First we need assembly templates of each of the patchable Guest operations,
 * and these are in lguest_asm.S. */
/*G:060 We construct a table from the assembler templates: */
static const struct lguest_insns
{
	const char *start, *end;
} lguest_insns[] = {
	[PARAVIRT_PATCH(irq_disable)] = { lgstart_cli, lgend_cli },
	[PARAVIRT_PATCH(irq_enable)] = { lgstart_sti, lgend_sti },
	[PARAVIRT_PATCH(restore_fl)] = { lgstart_popf, lgend_popf },
	[PARAVIRT_PATCH(save_fl)] = { lgstart_pushf, lgend_pushf },
};
/* Now our patch routine is fairly simple (based on the native one in
 * paravirt.c).  If we have a replacement, we copy it in and return how much of
 * the available space we used. */
static unsigned lguest_patch(u8 type, u16 clobber, void *ibuf,
			     unsigned long addr, unsigned len)
{
	unsigned int insn_len;

	/* Don't do anything special if we don't have a replacement */
	if (type >= ARRAY_SIZE(lguest_insns) || !lguest_insns[type].start)
		return paravirt_patch_default(type, clobber, ibuf, addr, len);

	insn_len = lguest_insns[type].end - lguest_insns[type].start;

	/* Similarly if we can't fit replacement (shouldn't happen, but let's
	 * be thorough). */
	if (len < insn_len)
		return paravirt_patch_default(type, clobber, ibuf, addr, len);

	/* Copy in our instructions. */
	memcpy(ibuf, lguest_insns[type].start, insn_len);
	return insn_len;
}
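The table-plus-fallback pattern in lguest_patch can be exercised outside the kernel with a small model. Everything named here is hypothetical: the enum values, the placeholder template bytes (which are not real Guest instructions), and patch_default, which merely stands in for paravirt_patch_default by leaving the buffer alone and reporting the whole site as used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical operation types; OP_UNPATCHED has no template on purpose. */
enum { OP_IRQ_DISABLE, OP_UNPATCHED, OP_IRQ_ENABLE, OP_OTHER };

struct insn_template { const uint8_t *start, *end; };

/* Placeholder bytes standing in for the assembly templates. */
static const uint8_t tmpl_disable[] = { 0xaa, 0xbb, 0xcc };
static const uint8_t tmpl_enable[]  = { 0xdd, 0xee };

static const struct insn_template templates[] = {
    [OP_IRQ_DISABLE] = { tmpl_disable, tmpl_disable + sizeof tmpl_disable },
    [OP_IRQ_ENABLE]  = { tmpl_enable,  tmpl_enable + sizeof tmpl_enable },
};

/* Stand-in for the default patcher: keep the original bytes untouched. */
unsigned patch_default(uint8_t *ibuf, unsigned len)
{
    (void)ibuf;
    return len;
}

unsigned patch_from_template(unsigned type, uint8_t *ibuf, unsigned len)
{
    assert(ibuf != NULL);

    /* no template for this type: fall back */
    if (type >= ARRAY_SIZE(templates) || !templates[type].start)
        return patch_default(ibuf, len);

    unsigned insn_len = (unsigned)(templates[type].end - templates[type].start);

    /* template does not fit into the call site: fall back */
    if (len < insn_len)
        return patch_default(ibuf, len);

    memcpy(ibuf, templates[type].start, insn_len);
    return insn_len;
}
```

Designated initializers leave the OP_UNPATCHED slot zeroed, so the NULL start pointer doubles as the "no replacement" marker, just as in the original table.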
Note: before apply_paravirt runs, sensitive instructions are still handled through paravirt_ops.
paravirt_ops.save_fl = save_fl;
paravirt_ops.restore_fl = restore_fl;
paravirt_ops.irq_disable = irq_disable;
paravirt_ops.irq_enable = irq_enable;
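As a rough picture of that pre-patching state, the sketch below routes the four interrupt operations through a table of function pointers. It is a user-space model under stated assumptions: struct irq_ops, guest_init and local_irq_save_demo are hypothetical names, and a plain variable holding the EFLAGS interrupt-enable bit (0x200) stands in for the Guest's virtual interrupt state.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the interrupt-related part of paravirt_ops. */
struct irq_ops {
    unsigned long (*save_fl)(void);
    void (*restore_fl)(unsigned long flags);
    void (*irq_disable)(void);
    void (*irq_enable)(void);
};

#define FLAG_IF 0x200UL /* interrupt-enable bit in EFLAGS */

/* Assumed software copy of the interrupt flag, for illustration only. */
static unsigned long virtual_flags = FLAG_IF;

unsigned long guest_save_fl(void)          { return virtual_flags; }
void guest_restore_fl(unsigned long flags) { virtual_flags = flags; }
void guest_irq_disable(void)               { virtual_flags &= ~FLAG_IF; }
void guest_irq_enable(void)                { virtual_flags |= FLAG_IF; }

struct irq_ops ops;

/* Hypothetical init step mirroring the four assignments above. */
void guest_init(void)
{
    ops.save_fl = guest_save_fl;
    ops.restore_fl = guest_restore_fl;
    ops.irq_disable = guest_irq_disable;
    ops.irq_enable = guest_irq_enable;
}

/* Until a call site is patched, each operation costs an indirect call. */
unsigned long local_irq_save_demo(void)
{
    assert(ops.save_fl != NULL && ops.irq_disable != NULL);
    unsigned long flags = ops.save_fl();
    ops.irq_disable();
    return flags;
}
```

Patching later replaces exactly these pointer-based calls at each recorded site, while the table itself stays in place for any call site that is left unpatched.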