1 Structure of a Compiler Front end of a compiler is efficient and can be automated Back end is generally hard to automate and finding the optimum solution.


1 Structure of a Compiler

Scanner → Parser → Semantic Analysis → Intermediate Code Generation → IR → Code Optimization → Instruction Selection → Instruction Scheduling → Register Allocation

The front end of a compiler is efficient and can be automated
The back end is generally hard to automate, and finding the optimum solution requires exponential time
Intermediate code generation can affect the performance of the back end

2 Intermediate Representations

Abstract Syntax Trees (AST)
Directed Acyclic Graphs (DAG)
Control Flow Graphs (CFG)
Static Single Assignment Form (SSA)
Stack Machine Code
Three Address Code

(Diagram: the representations range from high-level to low-level and are grouped into graphical IRs and linear IRs.)

Hybrid approaches mix graphical and linear representations
  – SGI and SUN compilers use three address code but provide ASTs for loops, if-statements and array references
  – Use three-address code in basic blocks in control flow graphs

3 Abstract Syntax Trees (ASTs)

    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;

AST for this fragment (children indented under their parent):

    Statements
      IfStmt
        <
          x
          y
        AssignStmt
          x
          +
            *
              5
              y
            /
              *
                5
                y
              3
        AssignStmt
          y
          5
      AssignStmt
        x
        +
          x
          y

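4 Directed Acyclic Graphs (DAGs)

Use directed acyclic graphs to represent expressions
  – Use a unique node for each expression

    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;

(Diagram: DAG for this fragment. It has the same Statements, IfStmt and AssignStmt structure as the AST, but the leaves x, y and 5 each appear once and the subexpression 5*y is a single * node shared by + and /.)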
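One common way to get "a unique node for each expression" is to look each node up in a table before creating it. The sketch below is only an illustration of that idea; the class name `DagBuilder` and its `node` method are hypothetical and do not come from the slides.

```python
# Hypothetical sketch: reuse an existing node when the same (op, left, right)
# combination has already been built, so equal expressions share one node.
class DagBuilder:
    def __init__(self):
        self.nodes = []   # node id -> (op, left id, right id)
        self.index = {}   # (op, left id, right id) -> node id

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]


dag = DagBuilder()
x, y, five, three = (dag.node(name) for name in ("x", "y", "5", "3"))
prod1 = dag.node("*", five, y)      # 5*y
prod2 = dag.node("*", five, y)      # second 5*y: the existing node is reused
quot = dag.node("/", prod2, three)  # 5*y/3
total = dag.node("+", prod1, quot)  # 5*y + 5*y/3
```

Here `prod1` and `prod2` are the same node id, which is the sharing that distinguishes the DAG from the AST.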

5 Control Flow Graphs (CFGs)

Nodes in the control flow graph are basic blocks
  – A basic block is a sequence of statements always entered at the beginning of the block and exited at the end
Edges in the control flow graph represent the control flow

    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;

    B0: if (x < y) goto B1 else goto B2
    B1: x = 5*y + 5*y/3
    B2: y = 5
    B3: x = x+y

    Edges: B0 → B1, B0 → B2, B1 → B3, B2 → B3

Each block has a sequence of statements
  – No jump from or to the middle of the block
  – Once a block starts executing, it will execute till the end

6 Stack Machine Code

    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;

        load x
        load y
        iflt L1
        goto L2
    L1: push 5
        load y
        multiply
        push 5
        load y
        multiply
        push 3
        divide
        add
        store x
        goto L3
    L2: push 5
        store y
    L3: load x
        load y
        add
        store x

  – load x: pushes the value at the location x to the stack
  – iflt L1: pops the top two elements and compares them
  – multiply: pops the top two elements, multiplies them, and pushes the result back to the stack
  – store x: stores the value at the top of the stack to the location x
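To see how the stack evolves, the listing above can be run on a tiny interpreter. This is a minimal sketch under stated assumptions: the tuple encoding of instructions, the `run` function, and the use of integer division for `divide` are choices made for this illustration, not something the slides define.

```python
# Hypothetical encoding of the slide's listing: (opcode, argument) pairs,
# with labels represented as ("label", name) pseudo-instructions.
PROGRAM = [
    ("load", "x"), ("load", "y"), ("iflt", "L1"), ("goto", "L2"),
    ("label", "L1"),
    ("push", 5), ("load", "y"), ("multiply", None),
    ("push", 5), ("load", "y"), ("multiply", None),
    ("push", 3), ("divide", None), ("add", None),
    ("store", "x"), ("goto", "L3"),
    ("label", "L2"),
    ("push", 5), ("store", "y"),
    ("label", "L3"),
    ("load", "x"), ("load", "y"), ("add", None), ("store", "x"),
]


def run(program, memory):
    """Execute the program and return the updated memory dictionary."""
    labels = {arg: i for i, (op, arg) in enumerate(program) if op == "label"}
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "load":            # push the value stored at location arg
            stack.append(memory[arg])
        elif op == "push":          # push a constant
            stack.append(arg)
        elif op == "store":         # pop the top of the stack into location arg
            memory[arg] = stack.pop()
        elif op in ("add", "multiply", "divide"):
            right, left = stack.pop(), stack.pop()
            if op == "add":
                stack.append(left + right)
            elif op == "multiply":
                stack.append(left * right)
            else:                   # assumption: integer division
                stack.append(left // right)
        elif op == "iflt":          # pop two values, branch if left < right
            right, left = stack.pop(), stack.pop()
            if left < right:
                pc = labels[arg]
        elif op == "goto":
            pc = labels[arg]
        # "label" entries are markers only and do nothing at run time
    return memory


taken = run(PROGRAM, {"x": 1, "y": 6})      # x < y, so the then-branch runs
not_taken = run(PROGRAM, {"x": 7, "y": 2})  # x >= y, so the else-branch runs
```

With x = 1 and y = 6 the then-branch computes 5*6 + 5*6/3 = 40 and the final statement adds y; with x = 7 and y = 2 the else-branch sets y to 5 first.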

7 Three-Address Code

Each instruction can have at most three operands

Assignments
  – x := y
  – x := y op z      op: binary arithmetic or logical operators
  – x := op y        op: unary operators (minus, negation, integer to float conversion)
Branch
  – goto L           Execute the statement labeled L next
Conditional Branch
  – if x relop y goto L      relop: <, <=, ==, !=, >=, >
      if the condition holds, we execute the statement labeled L next
      if the condition does not hold, we execute the statement following this statement next

8 Data structures for three address codes

Quadruples
  – Has four fields: op, arg1, arg2 and result
Triples
  – Temporaries are not used and instead references to instructions are made
Indirect triples
  – In addition to triples we use a list of pointers to triples

9 Quadruples

A record structure with 4 fields
  – op, arg1, arg2 and result
Examples
  – For x := y op z we have: y in arg1, z in arg2 and x in result
  – For unary operators, arg2 is not used
The contents of the fields are pointers to symbol table (ST) entries

10 Triples

Temps generated in quadruples must be entered in the symbol table
To avoid this, we can refer to a temp value by the location of the relevant statement
  – We can have records with only 3 fields: op, arg1 and arg2
  – Fields arg1 and arg2 can be pointers to ST entries or to the triple structure for temp values

11 Indirect Triples

Listing of pointers to triples, rather than triples themselves
Example
  – We can use an array to list pointers to triples in the desired order

12 Example

Expression: b * minus c + b * minus c (assigned to a)

Three address code

    t1 = minus c
    t2 = b * t1
    t3 = minus c
    t4 = b * t3
    t5 = t2 + t4
    a = t5

Quadruples

         op      arg1    arg2    result
    0    minus   c               t1
    1    *       b       t1      t2
    2    minus   c               t3
    3    *       b       t3      t4
    4    +       t2      t4      t5
    5    =       t5              a

Triples

         op      arg1    arg2
    0    minus   c
    1    *       b       (0)
    2    minus   c
    3    *       b       (2)
    4    +       (1)     (3)
    5    =       a       (4)

Indirect Triples

    instruction list          op      arg1    arg2
    35   (0)             0    minus   c
    36   (1)             1    *       b       (0)
    37   (2)             2    minus   c
    38   (3)             3    *       b       (2)
    39   (4)             4    +       (1)     (3)
    40   (5)             5    =       a       (4)
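The relationship between the three forms in the example above can be sketched in code. This is an illustrative sketch only: the function name `quads_to_triples` and the `("ref", n)` encoding of a reference to triple n are hypothetical, and the conversion assumes every non-assignment result is a compiler temporary.

```python
# Quadruples from the example: (op, arg1, arg2, result).
quads = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def quads_to_triples(quads):
    """Replace each temporary by a reference to the triple that computes it."""
    defined_at = {}  # temporary name -> index of the triple that produces it
    triples = []

    def ref(arg):
        return ("ref", defined_at[arg]) if arg in defined_at else arg

    for op, arg1, arg2, result in quads:
        if op == "=":
            # An assignment keeps the target name in arg1 and the value in arg2.
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            defined_at[result] = len(triples) - 1
    return triples


triples = quads_to_triples(quads)

# Indirect triples: a separate list gives the execution order, so instructions
# can be reordered without renumbering the references inside the triples.
statement_list = list(range(len(triples)))
```

Reordering code then means permuting `statement_list`, while the `("ref", n)` entries inside the triples stay valid.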

13 Examples

1. X = (a+b) * -c / d

14 Three-Address Code Generation for a Simple Grammar

Productions and semantic rules

    S → id := E
        id.place ← lookup(id.name);
        S.code ← E.code || gen(id.place ':=' E.place);

    E → E1 + E2
        E.place ← newtemp();
        E.code ← E1.code || E2.code || gen(E.place ':=' E1.place '+' E2.place);

    E → E1 * E2
        E.place ← newtemp();
        E.code ← E1.code || E2.code || gen(E.place ':=' E1.place '*' E2.place);

    E → ( E1 )
        E.code ← E1.code;
        E.place ← E1.place;

    E → - E1
        E.place ← newtemp();
        E.code ← E1.code || gen(E.place ':=' 'uminus' E1.place);

    E → id
        E.place ← lookup(id.name);
        E.code ← '' (empty string)

Attributes:
  – E.place: location that holds the value of expression E
  – E.code: sequence of instructions that are generated for E

Procedures:
  – newtemp(): returns a new temporary each time it is called
  – gen(): generates an instruction (it has to be called with appropriate arguments)
  – lookup(id.name): returns the location of id from the symbol table
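The semantic rules above can be read as a recursive function that returns the pair (E.place, E.code). The following is a rough sketch under assumptions: expressions are given as nested tuples rather than parsed, identifiers stand for their own locations (so `lookup` is omitted), and the names `translate` and `translate_assign` are hypothetical.

```python
import itertools

_temp_ids = itertools.count(1)


def newtemp():
    """Return a fresh temporary name each time it is called."""
    return f"t{next(_temp_ids)}"


def translate(expr):
    """Return (place, code) for an expression given as a nested tuple.

    Assumed encoding: "b" is an identifier, ("uminus", e) is unary minus,
    and ("+", e1, e2) or ("*", e1, e2) is a binary operation.
    """
    if isinstance(expr, str):           # E -> id: no code, place is the id
        return expr, []
    if expr[0] == "uminus":             # E -> - E1
        place1, code1 = translate(expr[1])
        place = newtemp()
        return place, code1 + [f"{place} := uminus {place1}"]
    op, left, right = expr              # E -> E1 + E2  or  E -> E1 * E2
    place1, code1 = translate(left)
    place2, code2 = translate(right)
    place = newtemp()
    return place, code1 + code2 + [f"{place} := {place1} {op} {place2}"]


def translate_assign(name, expr):
    """S -> id := E: the code for E followed by the assignment to id."""
    place, code = translate(expr)
    return code + [f"{name} := {place}"]


term = ("*", "b", ("uminus", "c"))
code = translate_assign("a", ("+", term, term))
```

For `a := b * -c + b * -c` this yields one temporary per operator, in the order in which the subexpressions are completed.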

15 Stack Machine Code Generation for a Simple Grammar

Productions and semantic rules

    S → id := E
        id.place ← lookup(id.name);
        S.code ← E.code || gen('store' id.place);

    E → E1 + E2
        E.code ← E1.code || E2.code || gen('add');
        (arguments for the add instruction are at the top of the stack)

    E → E1 * E2
        E.code ← E1.code || E2.code || gen('multiply');

    E → ( E1 )
        E.code ← E1.code;

    E → - E1
        E.code ← E1.code || gen('negate');

    E → id
        E.code ← gen('load' id.place)

Attributes:
  – E.code: sequence of instructions that are generated for E (no place for an expression is needed since the result of an expression is stored in the operand stack)

Procedures:
  – newtemp(): returns a new temporary each time it is called
  – gen(): generates an instruction (it has to be called with appropriate arguments)
  – lookup(id.name): returns the location of id from the symbol table
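Because results live on the operand stack, the stack machine rules reduce to a post-order walk of the expression. A minimal sketch, assuming expressions are nested tuples and identifiers stand for their own locations; the function names `stack_code` and `stack_assign` are hypothetical.

```python
# Mnemonics used on the slide for the two binary operators.
BINARY_OPS = {"+": "add", "*": "multiply"}


def stack_code(expr):
    """Return the instruction list that leaves the value of expr on the stack."""
    if isinstance(expr, str):                 # E -> id
        return [f"load {expr}"]
    if expr[0] == "negate":                   # E -> - E1
        return stack_code(expr[1]) + ["negate"]
    op, left, right = expr                    # E -> E1 + E2  or  E -> E1 * E2
    return stack_code(left) + stack_code(right) + [BINARY_OPS[op]]


def stack_assign(name, expr):
    """S -> id := E: evaluate E, then store the top of the stack into id."""
    return stack_code(expr) + [f"store {name}"]


listing = stack_assign("a", ("+", "b", ("*", "c", ("negate", "d"))))
```

No temporaries are allocated here, which matches the note that no place attribute is needed for this target.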

16 Code Generation for Boolean Expressions

Two approaches
  – Numerical representation
  – Implicit representation

Numerical representation
  – Use 1 to represent true, use 0 to represent false
  – For three-address code, store this result in a temporary
  – For stack machine code, store this result in the stack

Implicit representation
  – Boolean expressions used in flow-of-control statements (such as if-statements, while-statements etc.) do not have to explicitly compute a value; they just need to branch to the right instruction
  – Generate code for boolean expressions which branches to the appropriate instruction based on the result of the boolean expression

17 Boolean Expressions: Numerical Representation

Attributes:
  – E.place: location that holds the value of expression E
  – E.code: sequence of instructions that are generated for E
  – id.place: location for id

Global variable:
  – nextstat: holds the location of the next instruction to be generated (each call to gen() increments nextstat by 1)

Productions and semantic rules

    E → id1 relop id2
        E.place ← newtemp();
        E.code ← gen('if' id1.place relop.op id2.place 'goto' nextstat+3)
                 || gen(E.place ':=' '0')
                 || gen('goto' nextstat+2)
                 || gen(E.place ':=' '1');

    E → E1 and E2
        E.place ← newtemp();
        E.code ← E1.code || E2.code || gen(E.place ':=' E1.place 'and' E2.place);
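The role of nextstat is easiest to see by tracking instruction locations while emitting code. The sketch below is an assumption-laden illustration: the class `Emitter`, its starting location 100, and the method names `relop` and `and_` are hypothetical choices, with nextstat modelled as a property derived from the number of instructions emitted so far.

```python
class Emitter:
    """Collects three-address instructions and tracks the next location."""

    def __init__(self, start=100):
        self.start = start
        self.code = []
        self.temp_count = 0

    @property
    def nextstat(self):
        # Location of the next instruction to be generated.
        return self.start + len(self.code)

    def gen(self, text):
        self.code.append(text)

    def newtemp(self):
        self.temp_count += 1
        return f"t{self.temp_count}"

    def relop(self, left, op, right):
        # E -> id1 relop id2: four instructions that set E.place to 0 or 1.
        place = self.newtemp()
        self.gen(f"if {left} {op} {right} goto {self.nextstat + 3}")
        self.gen(f"{place} := 0")
        self.gen(f"goto {self.nextstat + 2}")
        self.gen(f"{place} := 1")
        return place

    def and_(self, place1, place2):
        # E -> E1 and E2: combine the two numerical results.
        place = self.newtemp()
        self.gen(f"{place} := {place1} and {place2}")
        return place


emitter = Emitter()
t1 = emitter.relop("x", "<", "y")
t2 = emitter.relop("a", "==", "b")
t3 = emitter.and_(t1, t2)
numbered = [f"{emitter.start + i} {line}" for i, line in enumerate(emitter.code)]
```

Each jump target is computed relative to the location of the instruction being generated, so the first test jumps to 103 and its fall-through path jumps to 104.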

18 Boolean Expressions: Implicit Representation

Productions and semantic rules

    E → id1 relop id2
        E.code ← gen('if' id1.place relop.op id2.place 'goto' E.true) || gen('goto' E.false);
        (relop can be any relational operator: <, <=, ==, !=, >=, >)
        (the places for E.true and E.false will be filled with labels later on, when they become available)

    E → E1 and E2
        E1.true ← newlabel();
        E1.false ← E.false;    (short-circuiting)
        E2.true ← E.true;
        E2.false ← E.false;
        E.code ← E1.code || gen(E1.true ':') || E2.code;
        (the generated label will be inserted to the place for E1.true in the code generated for E1)

Attributes:
  – E.code: sequence of instructions that are generated for E
  – E.false: instruction to branch to if E evaluates to false
  – E.true: instruction to branch to if E evaluates to true
    (E.code is synthesized whereas E.true and E.false are inherited)
  – id.place: location for id

19 Example

Input boolean expression: x < y and a == b

Numerical representation:

    100  if x < y goto 103
    101  t1 := 0
    102  goto 104
    103  t1 := 1
    104  if a == b goto 107
    105  t2 := 0
    106  goto 108
    107  t2 := 1
    108  t3 := t1 and t2

    (100 to 108 are the locations of three-address code instructions; they are not labels)

Implicit representation:

            if x < y goto L1
            goto LFalse
    L1:     if a == b goto LTrue
            goto LFalse
            ...
    LTrue:
    LFalse:

    (the labels LTrue and LFalse will be generated later on, and will be inserted to the corresponding places)

20 Flow-of-Control Statements

If-then-else
  – Branch based on the result of the boolean expression
Loops
  – Evaluate condition before loop (if needed)
  – Evaluate condition after loop
  – Branch back to the top if condition holds
Merges test with last block of loop body
While, for, do, and until all fit this basic model

(Diagram: Pre-test → Loop body → Post-test → Next block)

21 Flow-of-Control Statements: Code Structure

S → if E then S1 else S2

                 E.code          (jumps to E.true if E evaluates to true, to E.false if E evaluates to false)
    E.true:      S1.code
                 goto S.next
    E.false:     S2.code
    S.next:      ...

S → while E do S1

    S.begin:     E.code          (jumps to E.true or to E.false)
    E.true:      S1.code
                 goto S.begin
    E.false:     ...

Another approach is to place E.code after S1.code

22 Flow-of-Control Statements

Productions and semantic rules

    S → if E then S1 else S2
        E.true ← newlabel();
        E.false ← newlabel();
        S1.next ← S.next;
        S2.next ← S.next;
        S.code ← E.code || gen(E.true ':') || S1.code || gen('goto' S.next) || gen(E.false ':') || S2.code;

    S → while E do S1
        S.begin ← newlabel();
        E.true ← newlabel();
        E.false ← S.next;
        S1.next ← S.begin;
        S.code ← gen(S.begin ':') || E.code || gen(E.true ':') || S1.code || gen('goto' S.begin);

    S → S1 ; S2
        S1.next ← newlabel();
        S2.next ← S.next;
        S.code ← S1.code || gen(S1.next ':') || S2.code

Attributes:
  – S.code: sequence of instructions that are generated for S
  – S.next: label of the instruction that will be executed immediately after S (S.next is an inherited attribute)
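The inherited attributes in these rules can be modelled as ordinary function parameters: the caller passes S.next down, and each statement form creates its own fresh labels. The sketch below assumes statements are nested tuples and that conditions are single relational tests translated with the implicit representation; the class `Gen` and its method names are hypothetical.

```python
import itertools


class Gen:
    def __init__(self):
        self.label_ids = itertools.count(1)
        self.temp_ids = itertools.count(1)

    def newlabel(self):
        return f"L{next(self.label_ids)}"

    def newtemp(self):
        return f"t{next(self.temp_ids)}"

    def cond(self, test, true_label, false_label):
        # E -> id1 relop id2 with inherited E.true and E.false.
        left, relop, right = test
        return [f"if {left} {relop} {right} goto {true_label}",
                f"goto {false_label}"]

    def stmt(self, s, next_label):
        kind = s[0]
        if kind == "assign":                       # id := id op id
            _, name, (op, left, right) = s
            temp = self.newtemp()
            return [f"{temp} := {left} {op} {right}", f"{name} := {temp}"]
        if kind == "if":                           # if E then S1 else S2
            _, test, s1, s2 = s
            e_true, e_false = self.newlabel(), self.newlabel()
            return (self.cond(test, e_true, e_false)
                    + [f"{e_true}:"] + self.stmt(s1, next_label)
                    + [f"goto {next_label}", f"{e_false}:"]
                    + self.stmt(s2, next_label))
        if kind == "while":                        # while E do S1
            _, test, body = s
            begin, e_true = self.newlabel(), self.newlabel()
            return ([f"{begin}:"] + self.cond(test, e_true, next_label)
                    + [f"{e_true}:"] + self.stmt(body, begin)
                    + [f"goto {begin}"])
        raise ValueError(f"unknown statement kind: {kind}")


loop = ("while", ("a", "<", "b"),
        ("if", ("c", "<", "d"),
         ("assign", "x", ("+", "y", "z")),
         ("assign", "x", ("-", "y", "z"))))
generated = Gen().stmt(loop, "LNext")
```

Passing `begin` as the body's next label is what makes both branches of the inner if-statement jump back to the loop test.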

23 Example

Input code fragment:

    while (a < b) {
        if (c < d)
            x = y + z;
        else
            x = y - z
    }

Generated code:

    L1:     if a < b goto L2
            goto LNext
    L2:     if c < d goto L3
            goto L4
    L3:     t1 := y + z
            x := t1
            goto L1
    L4:     t2 := y - z
            x := t2
            goto L1
    LNext:  ...

24 Backpatching

E.true, E.false, S.next may not be computed in a single pass (they are inherited attributes)
Backpatching is a technique for generating labels for E.true, E.false, S.next and inserting them into the appropriate locations
Basic idea
  – Keep lists E.truelist, E.falselist, S.nextlist
      E.truelist: the list of instructions where the label for E.true has to be inserted when it becomes available
      S.nextlist: the list of instructions where the label for S.next has to be inserted when it becomes available
  – When the labels E.true, E.false, S.next are computed, they are inserted into the instructions in these lists

25 Flow-of-Control Statements: Case Statements

Case Statements
  1 Evaluate the controlling expression
  2 Branch to the selected case
  3 Execute the code for that case
  4 Branch to the statement after the case
Part 2 is the key

Strategies
  – Linear search (nested if-then-else constructs)
  – Build a table of case values & binary search it
  – Directly compute an address (requires dense case set)
      Use an array of labels that is addressed by the case value
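The third strategy, an array of labels addressed by the case value, can be sketched as follows. This is only an illustration: the function names `build_jump_table` and `select`, the label strings, and the choice to fill gaps in the case set with the default label are assumptions made here.

```python
def build_jump_table(cases, default):
    """Return (low, table) where table[v - low] is the label for case value v.

    cases maps integer case values to labels; values missing from a dense
    range fall back to the default label.
    """
    low, high = min(cases), max(cases)
    table = [cases.get(value, default) for value in range(low, high + 1)]
    return low, table


def select(value, low, table, default):
    """Step 2 of the case statement: pick the branch target in constant time."""
    index = value - low
    if 0 <= index < len(table):
        return table[index]
    return default


cases = {1: "L1", 2: "L2", 4: "L4"}
low, table = build_jump_table(cases, "LDefault")
```

The table has one slot per value between the smallest and largest case, which is why this strategy only pays off when the case set is dense.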

26 Type Conversions

Mixed-type expressions
  – Insert conversions as needed from a conversion table
  – i2f r1, r2 (convert the integer value in register r1 to float, and store the result in register r2)
Most languages have symmetric conversion tables

(Table: Typical Addition Table)

27 translation of a simple if-statement

28 Backpatching

The previous code for Boolean expressions inserts symbolic labels for jumps
It therefore needs a separate pass to set them to appropriate addresses
We can use a technique named backpatching to avoid this
We assume we save instructions into an array, and labels will be indices in the array
For a nonterminal B we use two attributes, B.truelist and B.falselist, together with the following functions:
  – makelist(i): creates a new list containing only i, an index into the array of instructions
  – merge(p1, p2): concatenates the lists pointed to by p1 and p2 and returns a pointer to the concatenated list
  – backpatch(p, i): inserts i as the target label for each of the instructions on the list pointed to by p
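A small sketch may help show how the three functions cooperate. It assumes instructions are stored as [text, target] pairs in a Python list, with the target left as None until it is backpatched; the `Code` class, its `relop` method, and the handling of `and` (patch the first operand's true jumps to the start of the second operand, merge the false lists) follow the usual textbook scheme and are not spelled out on this slide.

```python
def makelist(i):
    """Create a new list containing only the instruction index i."""
    return [i]


def merge(p1, p2):
    """Concatenate two lists of instruction indices."""
    return p1 + p2


class Code:
    def __init__(self):
        self.instrs = []  # each entry is [text, jump target or None]

    @property
    def nextinstr(self):
        return len(self.instrs)

    def gen(self, text):
        self.instrs.append([text, None])

    def backpatch(self, p, i):
        """Insert i as the jump target of every instruction listed in p."""
        for index in p:
            self.instrs[index][1] = i

    def relop(self, left, op, right):
        # Emit a conditional jump and an unconditional jump, both with
        # unknown targets, and remember where they are.
        truelist = makelist(self.nextinstr)
        falselist = makelist(self.nextinstr + 1)
        self.gen(f"if {left} {op} {right} goto")
        self.gen("goto")
        return truelist, falselist


code = Code()
true1, false1 = code.relop("x", "<", "y")
second_start = code.nextinstr             # index where the second operand begins
true2, false2 = code.relop("a", "==", "b")

# x < y and a == b: a true first operand continues with the second operand.
code.backpatch(true1, second_start)
truelist, falselist = true2, merge(false1, false2)
```

The remaining entries in `truelist` and `falselist` stay unpatched until the enclosing statement knows where the true and false exits are, so no separate pass over the code is needed.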

29 Backpatching for Boolean Expressions

30 Backpatching for Boolean Expressions

Annotated parse tree for x < 100 || x > 200 && x != y

31 Flow-of-Control Statements

32 Translation of a switch-statement

