分类: LINUX
2010-04-14 11:12:27
This lab requires you to write a virtual machine (an interpreter) for a small stack-based low-level language called mini-bytecode, just like the well-known Java bytecode.
You should this package, in which we have offered you a code skeleton. Here is a brief discription of the files:
call-main.sml
: the entry point of whole program
main.sig, main.sml
: control all parts of the mini-JVM
file-reader.sig, file-reader.sml
: read as input a bytecode file, to form a char list
bytecode.sig, bytecode.fun
: the internal representation of a bytecode file
parser.sig, parser.fun
: parse the input char list, and form
the internal representation of a bytecode file
jvm.sig, jvm.fun
: the mini-JVM execution engine
sources.cm, sources.mlb, jvm.mlb
: various makefiles
test/*
: some test cases
util/*
: various utility library
First, compile the sources with MLton:
mlton -verbose 2 jvm.mlbwhich would normally produce an executable
jvm.exe
or jvm
(If you don't produce the executable, contact us!). And now you may try to
run a test case and see the result by typing:
jvm -verbose 2 test/gcd.classYou may see the execution information ended with an exeception
unhandled exeception: AddYourCodewhich means that you must add your code at some point (as we'd point out a little later) to make the executable run correctly. (Question: what other options jvm supports besides
-verbose
?)
The high-level execution flow is: first, file-reader
read from a text file and return its content as a list of char; and
then parser
parses the char list into bytecode
, and
finally, the jvm
executes the internal representation of
bytecode and print out the final result to standard out.
Your job is to finish
the files "bytecode.fun", "parser.fun"
and
"jvm.fun"
. The code you should modify looks like:
raise AddYourCodeSo you should find all place marked as this and replace them with your code.
We have also provided you two test cases:
"gcd.class"
and "sum.class"
, as can be
found in the "test/"
sub-directory. The correct output should be "Result:4"
and "Result:55"
separately. After you finish all the files,
you can compile by "CM.make "sources.cm";"
using SML/NJ
or by "mlton jvm.mlb"
in command-line using MLton to test your code.
The mini-Bytecode language is stack-based, supporting following instructions, in which i is an integer between 0 and 127.
Program is a sequence of these instructions. For example, here is a program to find the greatest common divider of 12 and 8.
0 : PUSH 12
1 : PUSH 8
2 : JEQ 8
3 : JLT 6
4 : SUB
5 : JUMP 2
6 : SWAP
7 : JUMP 2
8 : STOP
Java bytecode program are encoded in binary form, which are relatively hard and tedious to decode. Hence, we have made things much simpler by encoding the mini-Bytecode in Ascii form, rather than in binary form.
First, integers will be encoded as a character. Here ord maps a character to a number according to . And chr is the opposite of ord.
encInteger i = chr(ord "0" + i)
So, 1,2,3 will be encoded as "1","2","3" and 41,42,43 as "Y","Z","[".
The encoding of all the instructions is defined as function enc. Here ^ means to concatenate two strings. "a" ^ "b" = "ab".
enc(pop) | "p" |
enc(add) | "+" |
enc(sub) | "-" |
enc(swap) | "s" |
enc(push i) | "c" ^ encInteger(i) |
enc(jump i) | "j" ^ encInteger(i) |
enc(jeq i) | "=" ^ encInteger(i) |
enc(jlt i) | "<" ^ encInteger(i) |
enc(load i) | "r" ^ encInteger(i) |
enc(store i) | "w" ^ encInteger(i) |
enc(stop) | "." |
For example, the program showed in previous section is encoded as "c
A mini-Bytecode program is a list of instructions and
we mark them from 0 as showed in .
The machine state during the execution is (stack, store, pc)
. Here
stack is the data stack,
store is a memory containing all variables, pc
is the program counter (i.e., the next instruction in the instructions
list to be executed). For example, ([1,2], [0:12,1:8], 3)
means we have a stack of [1,2] , a store mapping variable 0 to 12
and variable 1 to 8, and we are going to execute the 3rd instruction.
Suppose that before an instruction instr
executes, the
machine state is ([x, y, ...], store, pc)
(remember that
now pc
points to instr
),
we describe the effect of executing the instruction instr
in the table below.
Instruction | Machine state after execution |
---|---|
pop | ([y,…],var,pc+1) |
add | ([(x+y),y,…],var,pc+1) |
sub | ([(x-y),y,…],var,pc+1) |
swap | ([y,x,…],var,pc+1) |
push i | ([i,x,y,…],var,pc+1) |
jump i | ([x,y,…],var,i) |
jeq i | if x=y then ([x,y,..],var,i) else ([x,y,…],var,pc+1) |
jlt i | if x |
load i | ([var(i),x,y,…],var,pc+1) |
store i | ([y,…],update var(i) to be x,pc+1) |
stop | ([x,y,..],var,pc) |
The initial mathine state is ([], [], 0)
, which
means an empty stack and store, and we run from the 0th instruction.
Program stops when we reach stop
, and the top element
on stack is treated as the final result.
For example, the program in will execute as below. And the result is 4.
PUSH 12 | ([12 ],[ ],1) |
PUSH 8 | ([8,12],[ ],2) |
JEQ 8 | ([8,12],[ ],3) |
JLT 6 | ([8,12],[ ],6) |
SWAP | ([12,8],[ ],7) |
JUMP 2 | ([12,8],[ ],2) |
JEQ 8 | ([12,8],[ ],3) |
JLT 6 | ([12,8],[ ],4) |
SUB | ([4,8],[ ],5) |
JUMP 2 | ([4,8],[ ],2) |
JEQ 8 | ([4,8],[ ],3) |
JLT 6 | ([4,8],[ ],6) |
SWAP | ([8,4],[ ],7) |
JUMP 2 | ([8,4],[ ],2) |
JEQ 8 | ([8,4],[ ],3) |
JLT 6 | ([8,4],[ ],4) |
SUB | ([4,4],[ ],5) |
JUMP 2 | ([4,4],[ ],2) |
JEQ 8 | ([4,4],[ ],8) |
STOP | ([4,4],[ ],8) |
"Result:***"
.
The last thing to note is that you should not change the file name
of jvm.mlb
(but you can modify its content).