Presentation is loading. Please wait.

Presentation is loading. Please wait.

Intermediate Representations Saumya Debray Dept. of Computer Science The University of Arizona Tucson, AZ 85721.

Similar presentations


Presentation on theme: "Intermediate Representations Saumya Debray Dept. of Computer Science The University of Arizona Tucson, AZ 85721."— Presentation transcript:

1 Intermediate Representations Saumya Debray Dept. of Computer Science The University of Arizona Tucson, AZ 85721

2 The Role of Intermediate Code Intermediate Code2 lexical analysis syntax analysis static checking intermediate code generation final code generation source code final code tokensintermediate code

3 Intermediate Code3 Why Intermediate Code? Closer to target language. –simplifies code generation. Machine-independent. –simplifies retargeting of the compiler. –Allows a variety of optimizations to be implemented in a machine-independent way. Many compilers use several different intermediate representations.

4 Intermediate Code4 Different Kinds of IRs Graphical IRs: the program structure is represented as a graph (or tree) structure. Example: parse trees, syntax trees, DAGs. Linear IRs: the program is represented as a list of instructions for some virtual machine. Example: three-address code. Hybrid IRs: combines elements of graphical and linear IRs. Example: control flow graphs with 3-address code.

5 Intermediate Code5 Graphical IRs 1: Parse Trees A parse tree is a tree representation of a derivation during parsing. Constructing a parse tree: –The root is the start symbol S of the grammar. –Given a parse tree for  X , if the next derivation step is  X     1 …  n  then the parse tree is obtained as:

6 Intermediate Code6 Graphical IRs 2: Abstract Syntax Trees (AST) A syntax tree shows the structure of a program by abstracting away irrelevant details from a parse tree. –Each node represents a computation to be performed; –The children of the node represents what that computation is performed on.

7 Intermediate Code7 Abstract Syntax Trees: Example Grammar : E  E + T | T T  T * F | F F  ( E ) | id Input: id + id * id Parse tree: Syntax tree:

8 Intermediate Code8 Syntax Trees: Structure Expressions: –leaves: identifiers or constants; –internal nodes are labeled with operators; –the children of a node are its operands. Statements: –a node’s label indicates what kind of statement it is; –the children correspond to the components of the statement.

9 Intermediate Code9 Graphical IRs 3: Directed Acyclic Graphs (DAGs) A DAG is a contraction of an AST that avoids duplication of nodes. reduces compiler memory requirements; exposes redundancies. E.g.: for the expression (x+y)*(x+y), we have: AST: DAG:

10 Intermediate Code10 Linear IRs A linear IR consists of a sequence of instructions that execute in order. –“machine-independent assembly code” Instructions may contain multiple operations, which (if present) execute in parallel. They often form a starting point for hybrid representations (e.g., control flow graphs).

11 Intermediate Code11 Linear IR 1: Three Address Code Instructions are of the form ‘ x = y op z,’ where x, y, z are variables, constants, or “temporaries”. At most one operator allowed on RHS, so no ‘built-up” expressions. Instead, expressions are computed using temporaries (compiler-generated variables). The specific set of operators represented, and their level of abstraction, can vary widely.

12 Intermediate Code12 Three Address Code: Example Source: if ( x + y*z > x*y + z) a = 0; Three Address Code: t1 = y*z t2 = x+t1 // x + y*z t3 = x*y t4 = t3+z // x*y + z if (t2  t4) goto L a = 0 L:

13 Intermediate Code13 An Example Intermediate Instruction Set Assignment: –x = y op z (op binary) –x = op y (op unary); –x = y Jumps: –if ( x op y ) goto L (L a label); –goto L Pointer and indexed assignments: –x = y[ z ] –y[ z ] = x –x = &y –x = *y –*y = x. Procedure call/return: –param x, k (x is the k th param) –retval x –call p –enter p –leave p –return –retrieve x Type Conversion: –x = cvt_ A _to_ B y ( A, B base types) e.g.: cvt_int_to_float Miscellaneous –label L

14 Intermediate Code14 Three Address Code: Representation Each instruction represented as a structure called a quadruple (or “quad”): –contains info about the operation, up to 3 operands. –for operands: use a bit to indicate whether constant or Symbol Table pointer. E.g.: x = y + z if ( x  y ) goto L

15 Intermediate Code15 Linear IRs 2: Stack Machine Code Sometimes called “One-address code.” Assumes the presence of an operand stack. –Most operations take (pop) their operands from the stack and push the result on the stack. Example: code for “x*y + z” Stack machine code push x push y mult push z add Three Address Code tmp1 = x tmp2 = y tmp3 = tmp1 * tmp2 tmp4 = z tmp5 = tmp3 + tmp4

16 Intermediate Code16 Stack Machine Code: Features Compact –the stack creates an implicit name space, so many operands don’t have to be named explicitly in instructions. –this shrinks the size of the IR. Necessitates new operations for manipulating the stack, e.g., “swap top two values”, “duplicate value on top.” Simple to generate and execute. Interpreted stack machine codes easy to port.

17 Intermediate Code17 Linear IRs 3: Register Transfer Language (GNU RTL) Inspired by (and has syntax resembling) Lisp lists. Expressions are not “flattened” as in three- address code, but may be nested. –gives them a tree structure. Incorporates a variety of machine-level information.

18 Intermediate Code18 RTLs (cont ’ d) Low-level information associated with an RTL expression include: “machine modes” – gives the size of a data object; information about access to registers and memory; information relating to instruction scheduling and delay slots; whether a memory reference is “volatile.”

19 Intermediate Code19 RTLs: Examples Example operations: –(plus: m x y), (minus: m x y), (compare: m x y), etc., where m is a machine mode. –(cond [test 1 value 1 test 2 value 2 …] default) –(set lval x) ( assigns x to the place denoted by lval ). –(call func argsz), (return) –(parallel [x 0 x 1 …]) (simultaneous side effects). –(sequence [ins 1 ins 2 … ])

20 Intermediate Code20 RTL Examples (cont ’ d) A call to a function at address a passing n bytes of arguments, where the return value is in a (“hard”) register r : (set (reg:m r ) (call (mem:fm a ) n )) –here m and fm are machine modes. A division operation where the result is truncated to a smaller size: (truncate:m 1 (div:m 2 x (sign_extend:m 2 y)))

21 Intermediate Code21 Hybrid IRs Combine features of graphical and linear IRs: –linear IR aspects capture a lower-level program representation; –graphical IR aspects make control flow behavior explicit. Examples: –control flow graphs –static single assignment form (SSA).

22 Intermediate Code22 Hybrid IRs 1: Control Flow Graphs Example: L1: if x > y goto L0 t1 = x+1 x = t1 L0: y = 0 goto L1 Definition: A control flow graph for a function is a directed graph G = (V, E) such that: –each v  V is a straight-line code sequence (“basic block”); and –there is an edge a  b  E iff control can go directly from a to b.

23 Intermediate Code23 Basic Blocks Definition: A basic block B is a sequence of consecutive instructions such that: 1.control enters B only at its beginning; and 2.control leaves B only at its end (under normal execution); and This implies that if any instruction in a basic block B is executed, then all instructions in B are executed.  for program analysis purposes, we can treat a basic block as a single entity.

24 Intermediate Code24 Identifying Basic Blocks 1.Determine the set of leaders, i.e., the first instruction of each basic block: –the entry point of the function is a leader; –any instruction that is the target of a branch is a leader; –any instruction following a (conditional or unconditional) branch is a leader. 2.For each leader, its basic block consists of: –the leader itself; –all subsequent instructions upto, but not including, the next leader.

25 Intermediate Code25 Example int dotprod(int a[], int b[], int N) { int i, prod = 0; for (i = 1; i  N; i++) { prod += a[i]  b[i]; } return prod; } No.Instructionleader?Block No. 1 enter dotprod Y 1 2 prod = 0 1 3 i = 1 1 4 t1 = 4*i Y 2 5 t2 = a[t1] 2 6 t3 = 4*i 2 7 t4 = b[t3] 2 8 t5 = t2*t4 2 9 t6 = prod+t5 2 10 prod = t6 2 11 t7 = i+i 2 12 i = t7 2 13 if i  N goto 4 2 14 retval prod Y 3 15 leave dotprod 3 16 return 3

26 Intermediate Code26 Hybrid IRs 2: Static Single Assignment Form The Static Single Assignment (SSA) form of a program makes information about variable definitions and uses explicit. –This can simplify program analysis. A program is in SSA form if it satisfies: –each definition has a distinct name; and –each use refers to a single definition. To make this work, the compiler inserts special operations, called  -functions, at points where control flow paths join.

27 Intermediate Code27 SSA Form:  - Functions A  -function behaves as follows: x 1 = … x 2 = … x 3 =  (x 1, x 2 ) This assigns to x 3 the value of x 1, if control comes from the left, and that of x 2 if control comes from the right. On entry to a basic block, all the  -functions in the block execute (conceptually) in parallel.

28 Intermediate Code28 SSA Form: Example Example: Original code Code in SSA form


Download ppt "Intermediate Representations Saumya Debray Dept. of Computer Science The University of Arizona Tucson, AZ 85721."

Similar presentations


Ads by Google