有一部分是GCCwiki上给出的介绍,另外一些是两篇参考文章的阅读笔记。
* GENERIC and GIMPLE overview
GENERIC is designed for a common tree representation for all front end of
gcc, it can represent all constructs needed by a front end, it is language
independed. But GENERIC is too complex to analysis and optimization. To
address this problem, we get another IR GIMPLE. GIMPLE is a very simple
C-like three address language and keeps all the high level attributes of
date types. The data structures are the same trees used by gcc, but we
impose rules on how the trees can be combined. gimplify.c implements the
conversion from GINERIC to GIMPLE. Front ends may define helping functions
like in c-simplify.c, cp/cp-simplify.c and so on. Predicates to determine
whether a tree is in GIMPLE form is defined in tree-simple.c
* SSA
Conversion to SSA form is a three step process driven by tree-optimize.c
1. Convert the functions into GIMPLE form
2. Find variable references in the code, in tree-dfa.c
3. Build a control flow graph in tree-cfg.c. This implementation uses the
same basic_block structure used by the RTL optimizers. (To share a lot of
existing cfg code)
4. Rewrite the tree in SSA form in tree-ssa.c
* Designing the McCat compiler based on a family of structured IR.
(Main reference of GIMPLE design)
1. The design of the family of IR should be driven by the analysis and
transformations that are most important for effective compilation.
2. Parallelism expoitation can be classified:
a. high level restructuring(source to source)
b. low level code optimization and scheduling
These two opts are treated apart as they donot share a common objective. The
first is usually target at automatic loop vectorization or coarse grain
parallelism, do not usually handling fine grain parallelism such as software
pipelining and other scheduling. But the high level one perform dependent
analysis which is needed by the low level one two. This information is
usually lost in IR transformations.
3. Main objectives of designing of McCat compiler
a. A family of IR
b. Accurate, general, and pervasive alias analysis
c. Application and development of advanced compiler tools
4. Main concerns
a. Aiming at structured programs(may contain "structured" use of gotos)
b. Cleaning design
5. McCat system mainly consist of two part
a. A retargetable compiler
b. An architecture testbed
6. Sturcture of mcCat compiler
a. Front-end transform source language to FIRST
The popurse of FIRST is cleanly separate front end processing of
parsing, type checking from mid end processing of analysis and
transformations. FIRST completely captures the information of
current module.
b. Simplify transform FIRST to SIMPLE
Simpler and more structured, suitable for alias analysis and high
level transformations.
c. Blastify transform SIMPLE to LAST
A low level IR suitable for low level transformations
d. Transform into assemble code
7. SIMPLE design
a. Compositional. Control flow is regular an explicit. Donot support
unrestricted use of gotos.
b. Explicit array and structure references.
c. Types and type castings.
d. Pervasive data flow information.
e. Simple enough to analysis.
f. Clear senmantics. Make implict semantics in source language expilict.
g. Interprocedural analysis support. Retain all the information of a
complete Module.
h. Standard IR
8. Three major areas of transformation to generate SIMPLE IR
a. in the representation of variables
We define a varname to be a simple variable name, a pure structure
reference(a.b.c), a pure array reference(a[22][33]), or a pointer to a
structure reference((*a).b.c). Any other reference is broken down into
these.
b. in the format of basic statements, only a restricted number of
operands are allowed
Basic statements: x, y are varnames, a, b, p, q are const or identifer
x = a binop b
*p = a binop b
x =unop a
*p =unop a
x = a
*p = a
x = cast b
*p = cast b
x = *q
*p = *q
x = &y
*p = &y
f(args)
x = f(args)
*p = f(args)
c. in the representation of control flow constructs
restricted version of these control structure:
statements sequence
for-loop, while-loop, do-loop
switch/case
if/else
return
break, continue
while, do, if/else's conditional expression is reduced, all side effect
are moved out, but for-loop still can have side-effects in its head.
switch/case transformations is harder because of its unstructured
possibility. In SIMPLE, its requires are:
(1). switch/case consists of a sequence of cases, ending with a
default
(2). each case is made up of one or more case expressions followed
by a statements list, ending with a stop statement(break, continue,
return)
9. LAST design
a. Support for structured analysis
b. Support for pervasive analysis
c. Support for Load/Store machines
d. Support for simple code generator
e. Expose all opportunities for transformation, means that LAST should
be able to represent assemble level instructions.
10. Diffs between LAST and SIMPLE
a. Explicit address calculations
b. Relation between LAST and SIMPLE, each instruction in LAST keeps a
pointer to its corresponding SIMPLE statement.
c. Expose function prologue and epilogue
d. Expose memory hierarchy
e. Explicit load and store instructions
f. Expose labels and delay-slots
11. Other part of this paper introduce a alias analysis based on SIMPLE and
the motivation of designing McCat system. I think the analysor generator is
very interesting.
* GENERIC and GIMPLE: a new tree representation for entire functions
This paper discuss these two representations.
1. RTL is very useful for low level optimizations but has significant
limitations for higher level optimizations.
a. Its data types is limited to machine words, no concept of array or
structure.
b. It introduce the stack too soon. Taking the address of an object
force it into the stack, even if later opts removes the need of taking
address.
2. The major shortcoming of C trees, from an optimization point of view, is
that they are highly context-dependent. Many _STMT codes just serve as a
place holder for calling expand_ functions, and rely on RTL layer to manage
the scoping. Many these kinds of informations should be made explicit in
trees.
3. GENERIC is simply to provide a language independent way of representing
functions as trees.
4. In GENERIC, a statement is any expression whose value, if any, is
ignored. A statement will always has TREE_SIDE_EFFECTS set(or it will be
discarded), but non-statement expressions may also have side effects.
5. It would be possible to perform opts on GENERIC, but current compiler
don't.
6. If necessary, a front end can use some language dependent tree codes in
its GENERIC representation. In this situation, it must:
a. provide a hook to convert this tree code into GIMPLE
b. not expect them to work with any pass that run befor the conversion
to GIMPLE
7. GIMPLE is a simplified subset of GENERIC for use in opts. It is heavily
influenced by SIMPLE in McCat. One thing changed is that SIMPLE does
not support gotos, but GIMPLE does.
8. GIMPLE retains much of the structures of parse tree, but all expressions
are broken down into a 3-address form.
9. In GIMPLE scopes and control constructs like loops are represented as
container nodes. This kind of nodes is never used for its value. If a
COND_EXPR of BIND_EXPR has a value, it is stored in a temporary in the
controlled block, used inplace of the container.
10. Interface
a. Tree representation of a function is stored in DECL_SAVED_TREE, can
be lowered into GIMPLE by calling simplify_function_tree.
b. Language specific tree codes need its own simplifer, defined in
LANG_HOOKS_SIMPLIFY_EXPR.
c. Once LANG_HOOKS_SIMPLIFY_EXPR is defined, front end can call
simplify_function_tree and then optimizae_function_tree.
11. GIMPLE reference
a. Temporaries
When a subexpression is too complicated, gimplifier creates temporary to
hold its value. This kind of temporary is "expression temporary", and
are allocated with get_formal_tmp_var. Compiler will try elimination of
redundant calculations. Other temporary can be allocated using
get_initialized_tmp_var or create_tmp_var.
b. Expressions
An expression consists of an operation and right number of operands.
Each operand is a constant or a variable. The same rule holds for
arguments to a CALL_EXPR.
The target of an assignment can be a variable, a INDIREC_REF or a
compound lvalue.
The left hand side of a C comma expression is simply moved into a
separate statement.
Compound lvalue, currently, lvalues involving array and structure
references are not broken down.
Conditional expression is converted into an if statement with each
branch assigning to the same Temporary
c. Statements
Most statements will be assignment statements, represented by
MODIFY_EXPR. A CALL_EXPR whose value is ignored can also be a
statement. No other C expressions can appear in statements level.
(1). Blocks
Blocks are represented in GENERIC and GIMPLE using the BIND_EXPR
code. Variables in a block are collected into BIND_EXPR_VARS in
order. Any runtime initialization is moved out of DECL_INITIAL
into a statement in the controlled block. DECL_SAVED_TREE for a
GIMPLE function will always be a BIND_EXPR.
(2). Empty statement
(void) 0
(3). Loops are expressed using LOOP_EXPR, which represents an
infinite loop. Loop condition, break, continue are explicit gotos.
(4). Selection statements
Simple selection statements(like if) is represented using COND_EXPR,
if only one branch is used, the other is filled with a empty
statement. Normally, condition expression of a COND_EXPR is reduced
to a simple comparison. Shortcuts(&& and ||) is converted into
multiple ifs. Switch/case is represented using a SWITCH_EXPR,
contianing the condition, the body and a TREE_VEC of LABEL_DECLS
which the switch can jump to. The case labels are represented in the
body by CASE_LABEL_EXPR
(5). Jumps
Other jumps are represented by GOTO_EXPR of RETURN_EXPR. The operand
of GOTO_EXPR is a label or a variable containing target address. The
operand of RETURN_EXPR is a NULL_TREE or a MODIFY_EXPR which set the
return value.
(6). Cleanups. Destructors for local C++ objects and similar things
are represented using a TRY_FINALLY_EXPR. When the controlled block
exit, the cleanups are run.
(7) Exception handling represented using TRY_CATCH_EXPR. The
operands of this can be a normal statement to be executed when the
controlled block throwing exception, or of the following forms:
.. A CATCH_EXPR executes its handle if the thrown exception
match one of the allowed types. Multiple handler can be
expressed by a sequence of CATCH_EXPR.
.. An EH_FILTER_EXPR executes its handler if the thrown
exception does not match ont of the allows types.
Currently throwing exceptions is implemented by calling a function.
%{
/* Rough GIMPLE Grammar(GENERIC grammar is very similar) */
function:
FUNCTION_DECL
DECL_SAVED_TREE -> block
block:
BIND_EXPR
BIND_EXPR_VARS -> DECL chain
BIND_EXPR_BLOCK -> BLOCK
BIND_EXPR_BODY -> compound-stmt
compound-stmt:
COMPOUND_EXPR
cp0 -> non-compound-stmt
cp1 -> stmt
stmt: compound-stmt
| non-compound-stmt
non-compound-stmt:
block
| loop-stmt
| if=stmt
| switch-stmt
| jump-stmt
| label-stmt
| try-stmt
| modify-stmt
| call-stmt
loop-stmt:
LOOP_EXPR
LOOP_EXPR_BODY -> stmt | NULL_TREE
DO_LOOP_EXPR
(to be defined later)
if-stmt:
COND_EXPR
op0 -> condition
op1 -> stmt
op2 -> stmt
switch-stmt:
SWITCH_EXPR
op0 -> val
op1 -> stmt
op2 -> TREE_VEC of LABEL_DECLs
jump-stmt:
GOTO_EXPR
op0 -> LABEL_DECL | ,*, ID
RETURN_EXPR
op0 -> modify-stmt | NULL_TREE
label-stmt:
LABEL_EXPR
op0 -> LABEL_DECL
| CASE_LABEL_EXPR
CASE_LOW -> val | NULL_TREE
CASE_HIGH -> val | NULL_TREE
CASE_LABEL -> LABEL_DECL
try-stmt:
TRY_CATCH_EXPR
op0 -> stmt
op1 -> handler
| TRY_FINALLY_EXPR
op0 -> stmt
op1 -> stmt
handler:
catch-seq
| EH_FILTER_EXPR
| stmt
catch-seq:
CATCH_EXPR
| COMPOUND_EXPR
op0 -> CATCH_EXPR
op1 -> catch-seq
modify-stmt:
MODIFY_EXPR
op0 -> lhs
op1 -> rhs
call-stmt:
CALL_EXPR
op0 -> DECL | ,&, DECL
op1 -> arglist
arglist:
NULL_TREE
| TREE_LIST
op0 -> val
op1 -> arglist
varname : compref | _DECL
lhs: varname | ,*, _DECL
pseudo-lval: _DECL | ,*, _DECL
compref:
COMPONENT_REF
op0 -> compref | pseudo-lval
| ARRAY_REF
op0 -> compref | pseudo-lval
op1 -> val
condition: val | val relop val
val: _DECL | CONST
rhs:
varname
| CONST
| ,*, _DECL
| ,&, varname
| call_expr
| unop val
| val binop val
| ,(, cast ,), val
unop: ,+, | ,-, | ,!, | ,~,
binop: relop | ,-, | ,+, | ,/, | ,*, | ,%, | ,&, | ,|, | ,<<, | ,>>, | ,^, |
relop : All tree codes of class ,<,
}%
阅读(2556) | 评论(0) | 转发(0) |