Category: LINUX

2011-06-08 14:35:14

I’ve been investigating using LLVM for a project, so I’ve been doing some very small-scale experiments. I typically do this on most projects, to get a mental handle on the problem.

In doing this, I’ve written a very tiny VM to play with how LLVM handles it. Here is the breakdown of the entire VM:

  • Only operates on ints.
  • Uses numbered registers for operations.
  • 3 Instructions:
    1. set(reg, val): set register number reg to the integer value val
    2. add(result, reg, val): add register reg to val and put the result in register result
    3. show(reg): print out the contents of register reg

Pretty trivial. Not Turing complete, because there is no branching. But it’s simple enough to explore how to handle bytecode in LLVM.

Sample Program

So, I’ve written a sample program, encoded directly as bytecode:

[ 0, 0, 3, 1, 0, 0, 4, 2, 0 ]

This is:

  • Set register 0 to 3
  • Add register 0 to 4 and put the result in register 0
  • Show register 0

Again, totally trivial. Should just output 7.
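To make the encoding explicit, here is the same program written out as a C array with one comment per instruction (this annotated copy is just an illustration; the values are exactly the nine integers above):

int program[] = {
    0, 0, 3,      /* set(reg 0, 3)           => registers[0] = 3                */
    1, 0, 0, 4,   /* add(result 0, reg 0, 4) => registers[0] = registers[0] + 4 */
    2, 0          /* show(reg 0)             => prints "=> 7"                   */
};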

C Switch

So, the VM Design 101 way is to build a C switch statement that interprets each bytecode and performs its operation. It would look like this, given that int ops[] contains the above integer sequence for the program:

#include <stdio.h>

void add(int* ops, int* registers) {
    registers[ops[1]] = registers[ops[2]] + ops[3];
}

void set(int* ops, int* registers) {
    registers[ops[1]] = ops[2];
}

void show(int* ops, int* registers) {
    printf("=> %d\n", registers[ops[1]]);
}

void run(int* ops, int* registers) {
    /* Dispatch loop: advance ops past each instruction's operands. */
    for(;;) {
        switch(*ops) {
        case 0:
            set(ops, registers);
            ops += 3;
            break;
        case 1:
            add(ops, registers);
            ops += 4;
            break;
        case 2:
            show(ops, registers);
            return;
        }
    }
}

Very easy. We increment ops to move forward, using ops as a pointer to the current instruction. This is how every VM starts. But this code is very slow compared to machine code, because it obscures the execution flow from the CPU. And besides, it doesn’t use LLVM, which is the whole point of this post.
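For completeness, a minimal driver for this interpreter could look like the following (this main and the registers array are my own illustration, not part of the original write-up):

int main() {
    int registers[2] = {0, 0};
    int program[] = { 0, 0, 3, 1, 0, 0, 4, 2, 0 };
    run(program, registers);   /* prints "=> 7" */
    return 0;
}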

A static result…

As we look at the code that was run, we see that the program is set, add, then show. Let’s pretend for a second that, given the above functions, we wanted to perform the same operations by hand; we’d write:

void my_program() {
    int registers[2] = {0, 0};
    int program[9] = { 0, 0, 3, 1, 0, 0, 4, 2, 0 };
    int* ops = (int*)program;

    set(ops, registers);
    ops += 3;

    add(ops, registers);
    ops += 4;

    show(ops, registers);
    ops += 2;
}

So, my_program would perform the same operations as the bytecode above, and considerably faster, because we avoid all the overhead of the switch statement.

Combining the 2

If you look at the bytecode, then at the hand-written C, you can see that there is a programmatic way to go from our array of numbers to the C code.

A common approach that people have taken in the past is to actually write a C code emitter that parses the integers once and spits out a .c file, which you’d then compile and run. This works OK for some situations, but it doesn’t allow for any kind of dynamic ability to run code. And besides, it doesn’t use LLVM.

The idea is to write a function that takes the array of numbers as input and dynamically builds the equivalent of the above C function. And that’s where LLVM comes in.

Phase 1: importing the functions

A key part of this scheme is to have the add/set/show functions we defined above available to LLVM. Doing so lets us use our normal tools to write each instruction as a small operation that can easily be tested. So, given that we have put those 3 functions into ops.c, we run:

llvm-gcc -emit-llvm -O3 -c ops.c

which generates ops.o as an LLVM bitcode file. Using llvm-dis < ops.o we see it contains:

@.str = internal constant [7 x i8] c"=> %d\0A\00"		; [#uses=1]

define void @add(i32* %ops, i32* %registers) nounwind {
entry:
	%tmp1 = getelementptr i32* %ops, i32 1		; [#uses=1]
	%tmp2 = load i32* %tmp1, align 4		; [#uses=1]
	%tmp4 = getelementptr i32* %ops, i32 2		; [#uses=1]
	%tmp5 = load i32* %tmp4, align 4		; [#uses=1]
	%tmp7 = getelementptr i32* %registers, i32 %tmp5		; [#uses=1]
	%tmp8 = load i32* %tmp7, align 4		; [#uses=1]
	%tmp10 = getelementptr i32* %ops, i32 3		; [#uses=1]
	%tmp11 = load i32* %tmp10, align 4		; [#uses=1]
	%tmp12 = add i32 %tmp11, %tmp8		; [#uses=1]
	%tmp14 = getelementptr i32* %registers, i32 %tmp2		; [#uses=1]
	store i32 %tmp12, i32* %tmp14, align 4
	ret void
}

define void @set(i32* %ops, i32* %registers) nounwind {
entry:
	%tmp1 = getelementptr i32* %ops, i32 1		; [#uses=1]
	%tmp2 = load i32* %tmp1, align 4		; [#uses=1]
	%tmp4 = getelementptr i32* %ops, i32 2		; [#uses=1]
	%tmp5 = load i32* %tmp4, align 4		; [#uses=1]
	%tmp7 = getelementptr i32* %registers, i32 %tmp2		; [#uses=1]
	store i32 %tmp5, i32* %tmp7, align 4
	ret void
}

declare i32 @printf(i8*, ...) nounwind

define void @show(i32* %ops, i32* %registers) nounwind {
entry:
	%tmp1 = getelementptr i32* %ops, i32 1		; [#uses=1]
	%tmp2 = load i32* %tmp1, align 4		; [#uses=1]
	%tmp4 = getelementptr i32* %registers, i32 %tmp2		; [#uses=1]
	%tmp5 = load i32* %tmp4, align 4		; [#uses=1]
	%tmp7 = tail call i32 (i8*, ...)* @printf( i8* getelementptr ([7 x i8]* @.str, i32 0, i32 0), i32 %tmp5 ) nounwind		; [#uses=0]
	ret void
}

Now we have our functions ready for easy importing.

Phase 2: LLVM C++ API

When I did this experiment, I decided that the function that, given an array of ints, would drive LLVM wasn’t necessary. After all, all I was concerned with was whether the output of that process would actually work. So instead, I hand-wrote a function that uses the LLVM C++ API the same way it would be used if this were dynamic:

Function* create(Module** out) {
    std::string error;
    Module* jit;

    // Load in the bitcode file containing the functions for each
    // bytecode operation.
    if(MemoryBuffer* buffer = MemoryBuffer::getFile("ops.o", &error)) {
        jit = ParseBitcodeFile(buffer, &error);
        delete buffer;
    }

    // Pull out references to them.
    Function* set  = jit->getFunction(std::string("set"));
    Function* add  = jit->getFunction(std::string("add"));
    Function* show = jit->getFunction(std::string("show"));

    // Now, begin building our new function, which calls the
    // above functions.
    Function* body = cast<Function>(
        jit->getOrInsertFunction("body", Type::VoidTy,
                                 PointerType::getUnqual(Type::Int32Ty),
                                 PointerType::getUnqual(Type::Int32Ty),
                                 (Type*)0));

    // Our function will be passed the ops pointer and the
    // registers pointer, just like before.
    Function::arg_iterator args = body->arg_begin();
    Value* ops = args++;
    ops->setName("ops");
    Value* registers = args++;
    registers->setName("registers");

    BasicBlock* bb = BasicBlock::Create("entry", body);

    // Set up our arguments to be passed to set.
    std::vector<Value*> params;
    params.push_back(ops);
    params.push_back(registers);

    // Call out to set, passing ops and registers down.
    CallInst* call = CallInst::Create(set, params.begin(), params.end(), "", bb);

    ConstantInt* const_3 = ConstantInt::get(APInt(32, "3", 10));
    ConstantInt* const_4 = ConstantInt::get(APInt(32, "4", 10));

    // Add 3 to the ops pointer.
    GetElementPtrInst* ptr1 = GetElementPtrInst::Create(ops, const_3, "tmp3", bb);

    // Setup and call add; notice we pass down the updated ops pointer
    // rather than the original, so that we've moved down.
    std::vector<Value*> params2;
    params2.push_back(ptr1);
    params2.push_back(registers);
    CallInst* call2 = CallInst::Create(add, params2.begin(), params2.end(), "", bb);

    // Compute ops + 4 and use that pointer for show.
    GetElementPtrInst* ptr2 = GetElementPtrInst::Create(ops, const_4, "tmp3", bb);

    // Setup and call show.
    std::vector<Value*> params3;
    params3.push_back(ptr2);
    params3.push_back(registers);
    CallInst* call3 = CallInst::Create(show, params3.begin(), params3.end(), "", bb);

    // And we're done!
    ReturnInst::Create(bb);

    *out = jit;
    return body;
}

Now, let’s write a function that calls create() and executes the result:

int main() {
    // The registers.
    int registers[2] = {0, 0};

    // Our program.
    int program[20] = {0, 0, 3, 1, 0, 0, 4, 2, 0};
    int* ops = (int*)program;

    // Create our function and give us the Module and Function back.
    Module* jit;
    Function* func = create(&jit);

    // Add in optimizations. These were taken from a list that 'opt',
    // LLVM's optimization tool, uses.
    PassManager p;
    /* Optimizations commented out for the first run:
    p.add(new TargetData(jit));
    p.add(createVerifierPass());
    p.add(createLowerSetJmpPass());
    p.add(createRaiseAllocationsPass());
    p.add(createCFGSimplificationPass());
    p.add(createPromoteMemoryToRegisterPass());
    p.add(createGlobalOptimizerPass());
    p.add(createGlobalDCEPass());
    p.add(createFunctionInliningPass());
    */

    // Run these optimizations on our Module.
    p.run(*jit);

    // Setup for JIT.
    ExistingModuleProvider* mp = new ExistingModuleProvider(jit);
    ExecutionEngine* engine = ExecutionEngine::create(mp);

    // Show us what we've created!
    std::cout << "Created\n" << *jit;

    // Have our function JIT'd into machine code and returned. We cast it to a
    // particular C function pointer signature so we can call it nicely.
    void (*fp)(int*, int*) = (void (*)(int*, int*))engine->getPointerToFunction(func);

    // Call what we've created!
    fp(ops, registers);
}

We drive our create() function and then execute the result. As you can see, we’ve commented out all optimizations as a first try. The output from running this is:

define void @body(i32* %ops, i32* %registers) {
entry:
	call void @set( i32* %ops, i32* %registers )
	%tmp3 = getelementptr i32* %ops, i32 3		; [#uses=1]
	call void @add( i32* %tmp3, i32* %registers )
	%tmp31 = getelementptr i32* %ops, i32 4		; [#uses=1]
	call void @show( i32* %tmp31, i32* %registers )
	ret void
}

=> 7

Hey! Look at that! It runs! And we can see what it ran: it performs the calls to our functions that implement each opcode. Going back, it would be easy to write a function that dynamically drives the LLVM C++ API to generate this code; it’s simply one call per bytecode instruction, mapped directly.
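To make that concrete, here is a rough sketch of what such a dynamic code generator could look like, written against the same (older) LLVM C++ API that create() uses above. The function compile_body, its signature, and the cumulative offsets are my own illustration, not code from the original experiment:

Function* compile_body(Module* jit, Function* set, Function* add, Function* show,
                       int* program, int size) {
    // Declare the new function with the same signature as before:
    // void body(i32* ops, i32* registers)
    Function* body = cast<Function>(
        jit->getOrInsertFunction("body", Type::VoidTy,
                                 PointerType::getUnqual(Type::Int32Ty),
                                 PointerType::getUnqual(Type::Int32Ty),
                                 (Type*)0));

    Function::arg_iterator args = body->arg_begin();
    Value* ops = args++;       ops->setName("ops");
    Value* registers = args++; registers->setName("registers");

    BasicBlock* bb = BasicBlock::Create("entry", body);

    int offset = 0;
    while(offset < size) {
        // Pointer to the current instruction: ops + offset.
        Value* cur = ops;
        if(offset > 0) {
            ConstantInt* idx = ConstantInt::get(APInt(32, offset));
            cur = GetElementPtrInst::Create(ops, idx, "cur", bb);
        }

        std::vector<Value*> params;
        params.push_back(cur);
        params.push_back(registers);

        switch(program[offset]) {
        case 0: // set(reg, val)
            CallInst::Create(set, params.begin(), params.end(), "", bb);
            offset += 3;
            break;
        case 1: // add(result, reg, val)
            CallInst::Create(add, params.begin(), params.end(), "", bb);
            offset += 4;
            break;
        case 2: // show(reg)
            CallInst::Create(show, params.begin(), params.end(), "", bb);
            offset += 2;
            break;
        default: // unknown opcode: stop emitting
            offset = size;
            break;
        }
    }

    ReturnInst::Create(bb);
    return body;
}

Each bytecode maps to one emitted call, exactly as in the hand-built create(); only the pointer arithmetic is computed from the bytecode offsets instead of being hard-coded.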
Even if that were all that LLVM let us do, it would be worth it, but…

Wait, there’s more!

Before, we ran without optimizations to keep the output simple; let’s see what happens when we turn them on. Turning them on just means running the pass list that was commented out in main() above.
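For reference, that setup is simply the commented-out block from main(), uncommented:

// The same pass list from main(); it was taken from the list of passes
// that 'opt', LLVM's optimization tool, uses.
PassManager p;
p.add(new TargetData(jit));
p.add(createVerifierPass());
p.add(createLowerSetJmpPass());
p.add(createRaiseAllocationsPass());
p.add(createCFGSimplificationPass());
p.add(createPromoteMemoryToRegisterPass());
p.add(createGlobalOptimizerPass());
p.add(createGlobalDCEPass());
p.add(createFunctionInliningPass());

// Run these optimizations on our Module.
p.run(*jit);

With those passes run over the module before JITing, the generated body becomes: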

define void @body(i32* %ops, i32* %registers) {
entry:
	%tmp1.i = getelementptr i32* %ops, i32 1		; [#uses=1]
	%tmp2.i = load i32* %tmp1.i, align 4		; [#uses=1]
	%tmp4.i = getelementptr i32* %ops, i32 2		; [#uses=1]
	%tmp5.i = load i32* %tmp4.i, align 4		; [#uses=1]
	%tmp7.i = getelementptr i32* %registers, i32 %tmp2.i		; [#uses=1]
	store i32 %tmp5.i, i32* %tmp7.i, align 4
	%tmp3 = getelementptr i32* %ops, i32 3		; [#uses=3]
	%tmp1.i7 = getelementptr i32* %tmp3, i32 1		; [#uses=1]
	%tmp2.i8 = load i32* %tmp1.i7, align 4		; [#uses=1]
	%tmp4.i9 = getelementptr i32* %tmp3, i32 2		; [#uses=1]
	%tmp5.i10 = load i32* %tmp4.i9, align 4		; [#uses=1]
	%tmp7.i11 = getelementptr i32* %registers, i32 %tmp5.i10		; [#uses=1]
	%tmp8.i = load i32* %tmp7.i11, align 4		; [#uses=1]
	%tmp10.i = getelementptr i32* %tmp3, i32 3		; [#uses=1]
	%tmp11.i = load i32* %tmp10.i, align 4		; [#uses=1]
	%tmp12.i = add i32 %tmp11.i, %tmp8.i		; [#uses=1]
	%tmp14.i = getelementptr i32* %registers, i32 %tmp2.i8		; [#uses=1]
	store i32 %tmp12.i, i32* %tmp14.i, align 4
	%tmp31 = getelementptr i32* %ops, i32 4		; [#uses=1]
	%tmp1.i2 = getelementptr i32* %tmp31, i32 1		; [#uses=1]
	%tmp2.i3 = load i32* %tmp1.i2, align 4		; [#uses=1]
	%tmp4.i4 = getelementptr i32* %registers, i32 %tmp2.i3		; [#uses=1]
	%tmp5.i5 = load i32* %tmp4.i4, align 4		; [#uses=1]
	%tmp7.i6 = call i32 (i8*, ...)* @printf( i8* getelementptr ([7 x i8]* @.str, i32 0, i32 0), i32 %tmp5.i5 ) nounwind		; [#uses=0]
	ret void
}

=> 7

WOW! Now we’re cooking with hot flaming nuclear power! LLVM has gone the extra mile: rather than just calling our functions that implement each instruction, it has inlined all that code directly into our dynamically generated function! This means a lot of additional overhead is discarded: since our functions themselves only did simple operations like storing into memory or adding two numbers, that code now runs a lot more quickly.

It’s commonly known that inlining can dramatically improve performance, because it eliminates the overhead of function calls (spilling and reloading registers, setting up stack frames, etc.). And LLVM has just given us a powerful form of that for free.
This doesn’t allow for inlining across things like the kind of method call that Ruby does, but it puts us on track to being able to feed LLVM enough information to do just that.

LLVM is an amazing piece of software. I can’t wait to start using it more.
