Compiler Design, Lecture 12: Intermediate Code Generation, Translating Expressions
Phases of a Compiler
Source program -> Lexical Analyzer (Scanner) -> tokens -> Syntax Analyzer (Parser) -> parse tree -> Semantic Analyzer -> abstract syntax tree with attributes -> Intermediate-code Generator -> non-optimized intermediate code -> Intermediate-code Optimizer -> optimized intermediate code -> Target-code Generator and Optimizer -> target machine code.
The figure groups these phases into a front end and a back end.
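Many Source and Target Languages - Problem
With n source languages (S1 ... Sn) and m target languages (T1 ... Tm), how many compilers are needed if every source language is compiled directly to every target? n * m. For example, n = 10 and m = 20 requires 200 compilers!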
Many Source and Target Languages - Solution
Put an intermediate language between the source languages (S1 ... Sn) and the target languages (T1 ... Tm). How many compilers are needed? n front ends + m back ends. For example, n = 10 and m = 20 requires only 30.
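The two counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for illustration.

```python
def compilers_without_ir(n_sources, m_targets):
    # One complete compiler for every (source, target) pair.
    return n_sources * m_targets


def components_with_ir(n_sources, m_targets):
    # One front end per source language plus one back end per target.
    return n_sources + m_targets


print(compilers_without_ir(10, 20))  # 200
print(components_with_ir(10, 20))    # 30
```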
Intermediate Language (IR) Benefits
- Generate code for several machines: one front end, several back ends.
- Compile several high-level languages for one machine: several front ends, one back end.
- The intermediate language can be interpreted by a small program written in machine code. Example: Java and the JVM.
- Code optimization can be done on the IR, so the optimizer is written only once for all compilers.
- The intermediate form may be more compact than machine code.
Choosing an Intermediate Language (IR)
An intermediate language should be:
- Easy to translate into from a high-level language.
- Easy to translate from into machine code.
- Suitable for optimization.
The first two requirements pull in opposite directions, so there is a trade-off: a high-level IR makes the back ends harder to write, and a low-level IR makes the front ends harder to write.
Solution: use two IRs, one high-level and one low-level, with a translator between them.
Some Intermediate Representations
- Abstract Syntax Tree (AST)
- Directed Acyclic Graph (DAG)
- Postfix Notation
- Three-Address Code: the most common. It is a set of instructions, each referring to at most three addresses in memory.
Three-address code is an intermediate form consisting of a sequence of assembly-like instructions with at most three operands per instruction. Each operand can act like a register.
1. Each three-address assignment instruction has at most one operator on the right side. Thus, these instructions fix the order in which operations are to be done.
2. The compiler must generate a temporary name to hold the value computed by a three-address instruction.
3. Some "three-address instructions" have fewer than three operands.
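The three properties above can be made concrete with a small data structure. The sketch below is one possible representation, not a standard one: the class name Instr and its field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """One three-address instruction: dest := arg1 [op arg2]."""
    dest: str
    arg1: str
    op: Optional[str] = None    # at most one operator per instruction
    arg2: Optional[str] = None  # missing for copies and unary operators

    def __str__(self):
        if self.op is None:                 # copy: two addresses only
            return f"{self.dest} := {self.arg1}"
        if self.arg2 is None:               # unary operator
            return f"{self.dest} := {self.op} {self.arg1}"
        return f"{self.dest} := {self.arg1} {self.op} {self.arg2}"


# a + b * c needs two instructions and a temporary (t1) for the
# intermediate value; the instruction order fixes the evaluation order.
program = [
    Instr("t1", "b", "*", "c"),
    Instr("t2", "a", "+", "t1"),
]
for instr in program:
    print(instr)
```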
Grammar for an Example IR
[Figure: grammar of the example intermediate language; source: basics_lulu.pdf, p. 151]
Annotations on the grammar: binary operator; a transfer from memory; relational operator (=, ≠, <, >, ≤ or ≥).
Example
High-level language expression: z*(x-y)+w
Translated to IR (not optimized):
  t1 := z
  t5 := x
  t6 := y
  t2 := t5 - t6
  t3 := t1 * t2
  t4 := w
  t0 := t3 + t4
The result is not optimized. That is not a problem, because it can be optimized later. Why generate it this way? To simplify the translation to IR.
Translating Expressions
The main complication: the expression language is tree-structured, while the IR is flat (sequential).
- The result of every operation should be stored in a temporary variable.
- Every argument of an operation should also be stored in a temporary variable.
Syntax-Directed Definition for Expression Translation
[Figure: translation rules for expressions, annotated to separate what is done at compile time from what happens at run time; p. 154]
The function transop translates the name of an operator in the expression language into the name of the corresponding operator in the intermediate language.
Expression Example: z*x-y (id z * id x - id y)
Parse tree: Exp -> Exp1 - Exp2, where Exp1 -> Exp * Exp (id z, id x) and Exp2 -> id y. The result place of the root is t0.
Root node (Exp1 - Exp2), place t0:
  place1 = newvar() = t1, place2 = newvar() = t2
  code1 = Exp1.code(t1), code2 = Exp2.code(t2)
  code = [t3 = z, t4 = x, t1 = t3 * t4, t2 = y, t0 = t1 - t2]
Exp1 node (Exp * Exp), place t1:
  place1 = newvar() = t3, place2 = newvar() = t4
  code = [t3 = z, t4 = x, t1 = t3 * t4]
Leaf id z, place t3: code = [t3 = z]
Leaf id x, place t4: code = [t4 = x]
Exp2 node (id y), place t2: code = [t2 = y]
Generated code:
  t3 = z
  t4 = x
  t1 = t3 * t4
  t2 = y
  t0 = t1 - t2
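The walk-through above can be reproduced with a short recursive translator. This is a minimal sketch under stated assumptions: expressions are encoded as nested tuples (op, left, right) with variable names as strings, and the names Translator, trans_exp and TRANSOP are made up here. Only newvar, transop and the place attribute come from the slides.

```python
import itertools

# Stand-in for transop: maps source operators to IR operators.
# Assumed to be the identity for this small language.
TRANSOP = {"+": "+", "-": "-", "*": "*"}


class Translator:
    def __init__(self):
        self._counter = itertools.count()

    def newvar(self):
        """Return a fresh temporary name: t0, t1, t2, ..."""
        return f"t{next(self._counter)}"

    def trans_exp(self, exp, place):
        """Return code that leaves the value of exp in `place`."""
        if isinstance(exp, str):                 # Exp -> id
            return [f"{place} = {exp}"]
        op, left, right = exp                    # Exp -> Exp1 op Exp2
        place1 = self.newvar()                   # inherited by Exp1
        place2 = self.newvar()                   # inherited by Exp2
        code1 = self.trans_exp(left, place1)
        code2 = self.trans_exp(right, place2)
        return code1 + code2 + [f"{place} = {place1} {TRANSOP[op]} {place2}"]


def translate(exp):
    t = Translator()
    result = t.newvar()                          # t0 holds the final value
    return t.trans_exp(exp, result)


# z*x-y, written as a tree
for line in translate(("-", ("*", "z", "x"), "y")):
    print(line)
```

Because both places are allocated before either operand is translated, the temporaries come out numbered as in the example: t1 and t2 at the subtraction, then t3 and t4 at the multiplication.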
Translating Expressions without Inherited Attributes (Three-Address Code)
- top denotes the current symbol table; the function top.get retrieves the entry for an identifier.
- Attribute E.addr denotes the address that will hold the value of E.
- new Temp() creates the temporaries t1, t2, ...
- gen(x '=' y '+' z) represents the three-address instruction x = y + z. Expressions appearing in place of variables like x, y, and z are evaluated when passed to gen.
Source: Compilers: Principles, Techniques, and Tools, 2nd ed., Aho, Ullman (379, 402).
Advantage: no code is generated for variable access (last rule).
Disadvantage: it does not handle side effects well, e.g. x-(x=3).
Example statement: a = b + - c (the unary minus applies to c)
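The scheme with a synthesized E.addr can be sketched as follows for the statement a = b + - c. This is an illustrative sketch, not the textbook's code: the tuple encoding, the function names, and the spelling "minus" for unary negation in the output are assumptions, and plain variable names stand in for the symbol-table entries that top.get would return.

```python
import itertools


def translate_assignment(target, expr):
    """Translate `target = expr` and return the three-address code."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))   # plays the role of new Temp()

    def gen(instr):
        code.append(instr)

    def addr(e):
        """Emit code for e and return E.addr, the name holding its value."""
        if isinstance(e, str):          # E -> id: no code, addr is the name
            return e
        if e[0] == "minus":             # E -> - E1
            a = addr(e[1])
            t = next(temps)
            gen(f"{t} = minus {a}")
            return t
        op, left, right = e             # E -> E1 op E2
        a1 = addr(left)
        a2 = addr(right)
        t = next(temps)
        gen(f"{t} = {a1} {op} {a2}")
        return t

    result = addr(expr)
    gen(f"{target} = {result}")         # S -> id = E
    return code


for line in translate_assignment("a", ("+", "b", ("minus", "c"))):
    print(line)
# t1 = minus c
# t2 = b + t1
# a = t2
```

Note that no instruction is emitted for reading b or c: their names are used directly as addresses.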
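Expression Example 3: x-(x=3) and x*z-(x=3), translated without inherited attributes
Parse tree for x-(x=3): Exp -> Exp1 - Exp2, where Exp1 -> id x and Exp2 -> (id x = Exp), with the inner Exp -> num 3.
Root node (Exp1 - Exp2):
  place = newvar() = t0
  code1 = Exp1.code = '' (empty), code2 = Exp2.code = [t1 = 3, x = t1]
  code = [t1 = 3, x = t1, t0 = x - x]
Exp1 node (id x): place = x, code = '' (empty)
Exp2 node (id x = Exp): place = x, code = [t1 = 3, x = t1]
Inner Exp node (num 3): place = newvar() = t1, code = [t1 = 3]
Generated code for x-(x=3), which always gives 0 because the first argument takes the new value of x:
  t1 = 3
  x = t1
  t0 = x - x
Generated code for x*z-(x=3):
  t1 = x * z    (old value of x: correct)
  t2 = 3
  x = t2
  t0 = t1 - x   (new value of x)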
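Expression Example 2: x-(x=3), translated with inherited place attributes
Parse tree: Exp -> Exp1 - Exp2, where Exp1 -> id x and Exp2 -> (id x = Exp), with the inner Exp -> num 3. The result place of the root is t0.
Root node (Exp1 - Exp2), place t0:
  place1 = newvar() = t1, place2 = newvar() = t2
  code1 = Exp1.code(t1), code2 = Exp2.code(t2)
  code = [t1 = x, t3 = 3, x = t3, t2 = x, t0 = t1 - t2]
Exp1 node (id x), place t1: code = [t1 = x]
Exp2 node (id x = Exp), place t2:
  place1 = newvar() = t3
  code = [t3 = 3, x = t3, t2 = x]
Inner Exp node (num 3), place t3: code = [t3 = 3]
Generated code:
  t1 = x    (old value)
  t3 = 3
  x = t3
  t2 = x    (new value)
  t0 = t1 - t2
t1 holds the old value of x, so the result is correct.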
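To see the difference between the two translations of x-(x=3), both instruction sequences can be executed by a tiny interpreter. The interpreter below is a hypothetical helper written only for this comparison; it understands just copies and subtraction, and the starting value x = 10 is an arbitrary choice.

```python
def run(code, env):
    """Execute instructions of the form 'd = a' or 'd = a - b'."""
    env = dict(env)                       # do not modify the caller's dict

    def value(token):
        return int(token) if token.isdigit() else env[token]

    for line in code:
        dest, rhs = (part.strip() for part in line.split("=", 1))
        parts = rhs.split()
        if len(parts) == 1:               # copy
            env[dest] = value(parts[0])
        else:                             # subtraction
            left, op, right = parts
            assert op == "-", "only subtraction is supported in this sketch"
            env[dest] = value(left) - value(right)
    return env


# Variables used directly as operands (no inherited place)
without_places = ["t1 = 3", "x = t1", "t0 = x - x"]
# Every argument copied into its own temporary first
with_places = ["t1 = x", "t3 = 3", "x = t3", "t2 = x", "t0 = t1 - t2"]

print(run(without_places, {"x": 10})["t0"])  # 0: both operands see the new x
print(run(with_places, {"x": 10})["t0"])     # 7: old x (10) minus new x (3)
```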
Translating Function Call
Example 4: translating the expression 3+f(x-y,z)
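One way to translate a call such as 3+f(x-y,z) is to extend the inherited-place scheme: translate each argument into its own fresh temporary, then emit a single call instruction that receives those temporaries. The sketch below assumes a CALL instruction of the form place := CALL f(t1, ..., tn) and a tuple encoding ("call", name, [args]) for call nodes; both are illustrative choices, and the output is what this sketch produces, not a copy of the slide's figure.

```python
import itertools


def translate(exp):
    counter = itertools.count()

    def newvar():
        return f"t{next(counter)}"

    def trans_exp(e, place):
        """Return code that leaves the value of e in `place`."""
        if isinstance(e, (int, str)):            # number or variable
            return [f"{place} := {e}"]
        if e[0] == "call":                       # Exp -> id(Exps)
            _, fname, args = e
            code, places = [], []
            for arg in args:                     # arguments, left to right
                p = newvar()
                code += trans_exp(arg, p)
                places.append(p)
            return code + [f"{place} := CALL {fname}({', '.join(places)})"]
        op, left, right = e                      # Exp -> Exp1 op Exp2
        p1, p2 = newvar(), newvar()
        return (trans_exp(left, p1) + trans_exp(right, p2)
                + [f"{place} := {p1} {op} {p2}"])

    return trans_exp(exp, newvar())              # t0 holds the final value


tree = ("+", 3, ("call", "f", [("-", "x", "y"), "z"]))
for line in translate(tree):
    print(line)
# t1 := 3
# t4 := x
# t5 := y
# t3 := t4 - t5
# t6 := z
# t2 := CALL f(t3, t6)
# t0 := t1 + t2
```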
References
Torben Ægidius Mogensen. Basics of Compiler Design. Published through lulu.com, 2008.
Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. 2nd edition, Addison-Wesley, 2007 (section 6.4).