Topic #7: Intermediate Code EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.

Slides:



Advertisements
Similar presentations
Chapter 6 Intermediate Code Generation
Advertisements

Intermediate Code Generation
Intermediate Code Generation. 2 Intermediate languages Declarations Expressions Statements.
Backpatching: The syntax directed definition we discussed before can be implemented in two or more passes (we have both synthesized attributes and inheritent.
Lecture 08a – Backpatching & Recap Eran Yahav 1 Reference: Dragon 6.2,6.3,6.4,6.6.
8 Intermediate code generation
Chapter 8 Intermediate Code Generation. Intermediate languages: Syntax trees, three-address code, quadruples. Types of Three – Address Statements: x :=
1 Compiler Construction Intermediate Code Generation.
Overview of Previous Lesson(s) Over View  Front end analyzes a source program and creates an intermediate representation from which the back end generates.
Compiler Construction
Compiler Construction Sohail Aslam Lecture Boolean Expressions E → E 1 and M E 2 {backpatch(E 1.truelist, M.quad); E.truelist = E 2.truelist; E.falselist.
Compiler Designs and Constructions
Three Address Code Generation Backpatching-I Prepared By: Siddharth Tiwary 04CS3010.
CS412/413 Introduction to Compilers Radu Rugina Lecture 16: Efficient Translation to Low IR 25 Feb 02.
1 CMPSC 160 Translation of Programming Languages Fall 2002 Lecture-Modules 17 and 18 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon.
Intermediate Code Generation Professor Yihjia Tsai Tamkang University.
1 Intermediate Code generation. 2 Intermediate Code Generation l Intermediate languages l Declarations l Expressions l Statements l Reference: »Chapter.
Compiler Construction A Compulsory Module for Students in Computer Science Department Faculty of IT / Al – Al Bayt University Second Semester 2008/2009.
CH4.1 CSE244 Intermediate Code Generation Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road, Unit.
Test Yourself #2 Test Yourself #3 Initial code: a := x ** 2 b := 3 c := x d := c * c e := b * 2 f := a + d g := e * f.
CSc 453 Intermediate Code Generation Saumya Debray The University of Arizona Tucson.
What is Three Address Code? A statement of the form x = y op z is a three address statement. x, y and z here are the three operands and op is any logical.
CSc 453 Intermediate Code Generation Saumya Debray The University of Arizona Tucson.
1 Structure of a Compiler Front end of a compiler is efficient and can be automated Back end is generally hard to automate and finding the optimum solution.
Compiler Chapter# 5 Intermediate code generation.
1 Intermediate Code Generation Part I Chapter 8 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Chapter 8: Intermediate Code Generation
Review: –What is an activation record? –What are the typical fields in an activation record? –What are the storage allocation strategies? Which program.
1 October 25, October 25, 2015October 25, 2015October 25, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
Intermediate Code Generation
1 June 3, June 3, 2016June 3, 2016June 3, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa Pacific University,
Topic #1: Introduction EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
1 Intermediate Code Generation Abstraction at the source level identifiers, operators, expressions, statements, conditionals, iteration, functions (user.
Boolean expressions 1 productionsemantic action E  E1 or E2E1.trueLabel = E.trueLabel; E1.falseLabel = freshLabel(); E2.trueLabel = E.trueLabel; E2.falseLabel.
Chap. 4, Intermediate Code Generation
1 Structure of a Compiler Source Language Target Language Semantic Analyzer Syntax Analyzer Lexical Analyzer Front End Code Optimizer Target Code Generator.
1 February 28, February 28, 2016February 28, 2016February 28, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University.
Three Address Code Generation of Control Statements continued..
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 10 Ahmed Ezzat.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CS 404 Introduction to Compiler Design
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Compiler Construction
Compiler Construction
Subject Name:COMPILER DESIGN Subject Code:10CS63
Compiler Optimization and Code Generation
Intermediate Code Generation Part I
Intermediate Code Generation
Intermediate Code Generation Part II
Chapter 6 Intermediate-Code Generation
Intermediate Code Generation Part II
Three Address Code Generation - Control Statements
Intermediate Code Generation Part I
Intermediate Code Generation Part II
Intermediate code generation
Three-address code A more common representation is THREE-ADDRESS CODE . Three address code is close to assembly language, making machine code generation.
7.4 Boolean Expression and Control Flow Statements
Compiler Design 21. Intermediate Code Generation
Intermediate Code Generation Part I
Three Address Code Generation – Backpatching
Compiler Construction
Intermediate Code Generation Part II
Intermediate Code Generation
Compiler Construction
Intermediate Code Generation Part II
Review: For array a[2,5], how would the memory for this array looks like using row major layout? What about column major layout? How to compute the address.
Intermediate Code Generation Part I
Compiler Design 21. Intermediate Code Generation
Review: What is an activation record?
Compiler Construction
Presentation transcript:

Topic #7: Intermediate Code EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003

Intermediate Code Generation The front end of a compiler translates a source program into an intermediate representation Details of the back end are left to the back end Benefit include: –Retargeting –Machine-independent code optimization

Intermediate Languages Consider code: a := b * -c + b * -c A syntax tree graphically depicts code (example at right) Postfix notation is a linearized representation of a syntax tree: a b c uminus * b c uminus * + assign

Three-Address Code Three-address code is a sequence of statements of the general form x := y op z x, y, and z are names, constants, or compiler- generated temporaries op can be any operator Three-address code is a linearized representation of a syntax tree Explicit names correspond to interior nodes of the graph

Three-Address Code Example t1 := -c t2 := b * t1 t3 := -c t4 := b * t3 t5 := t2 + t4 a := t5

Types of Statements Assignment statements of the form x := y op z, where op is a binary operator Assignment statements of the form x := op y where op is a unary operator Copy statements of the form x := y The unconditional jump goto L Conditional jumps such is if x relop y goto L param x and call p, n and return y relating to procedure calls Assignments of the form x := y[i] and x[i] := y Address and pointer assignments of the form x := &y, x := *y, and *x = y

Generating Three-Address Code Temporary names are made up for the interior nodes of a syntax tree The synthesized attribute S.code represents the code for the assignment S The nonterminal E has attributes: –E.place is the name that will hold the value of E –E.code is a sequence of three-address statements evaluating E The function newtemp returns a sequence of distinct names The function newlabel returns a sequence of distinct labels

Assignments (1) ProductionSemantic Rules S  id := ES.code := E.code || gen(id.place ':=' E.place) E  E 1 + E 2 E.place := newtemp; E.code := E 1.code || E 2.code || gen(E.place ':=' E 1.place '+' E 2.place) E  E 1 * E 2 E.place := newtemp; E.code := E 1.code || E 2.code || gen(E.place ':=' E 1.place '*' E 2.place)

Assignments (2) ProductionSemantic Rules E  -E 1 E.place := newtemp; E.code := E 1.code || gen(E.place ':=' 'uminus' E 1.place) E  (E 1 )E.place := E 1.place; E.code := E 1.code E  idE.place := id.place; E.code := ''

Example: S  while E do S 1 S.begin := newlabel; S.after := newlabel; S.code := gen(S.begin ':') || E.code || gen('if' E.place '=' '0' 'goto' S.after) || S 1.code || gen('goto' S.begin) gen(S.after ':')

Quadruples A quadruple is a record structure with four fields: op, arg1, arg2, and result –The op field contains an internal code for an operator –Statements with unary operators do not use arg2 –Operators like param use neither arg2 nor result –The target label for conditional and unconditional jumps are in result The contents of fields arg1, arg2, and result are typically pointers to symbol table entries –If so, temporaries must be entered into the symbol table as they are created –Obviously, constants need to be handled differently

Quadruples Example oparg1arg2result (0)uminusct1 (1)*bt1t2 (2)uminusct3 (3)*bt3t4 (4)+t2t4t5 (5):=t5a

Triples Triples refer to a temporary value by the position of the statement that computes it –Statements can be represented by a record with only three fields: op, arg1, and arg2 –Avoids the need to enter temporary names into the symbol table Contents of arg1 and arg2: –Pointer into symbol table (for programmer defined names) –Pointer into triple structure (for temporaries) –Of course, still need to handle constants differently

Triples Example oparg1arg2 (0)uminusc (1)*b(0) (2)uminusc (3)*b(2) (4)+t2(3) (5)assigna(4)

Declarations (1) A symbol table entry is created for every declared name Information includes name, type, relative address of storage, etc. Relative address consists of an offset: –Offset is from the base of the static data area for globals –Offset is from the field for local data in an activation record for locals to procedures Types are assigned attributes type and width Becomes more complex if we need to deal with nested procedures or records

Declarations (2) ProductionSemantic Rules P  M D M  εoffset := 0 D  D ; D D  id : T enter(id.name, T.type, offset); offset := offset + T.width T  integer T.type := integer; T.width := 4 T  real T.type := real T.width := 8 T  array[num] of T 1 T.type := array(num, T 1.type); T.width := num * T 1.width

Addressing Array Elements The location of the i th element of array A is: base + (i – low) * w –w is the width of each element –low is the lower bound of the subscript –base is the relative address of A[low] The expression for the location can be rewritten as: i * w + (base – low * w) –The subexpression in parentheses is a constant –That subexpression can be evaluated at compile time More complex for multi-dimensional arrays

Translating Assignments (1) ProductionSemantic Rules S  id := E p := lookup(id.name); if p != NULL then emit(p ':=' E.place) else error E  E 1 + E 2 E.place := newtemp; emit(E.place ':=' E 1.place '+' E 2.place) E  E 1 * E 2 E.place := newtemp; emit(E.place ':=' E 1.place '*' E 2.place)

Translating Assignments (2) ProductionSemantic Rules E  -E 1 E.place := newtemp; emit(E.place ':=' 'uminus' E 1.place) E  (E 1 )E.place := E 1.place E  id p := lookup(id.name); if p != NULL then E.place := p else error

Type Conversions There are multiple types (e.g. integer, real ) for variables and constants –Compiler may need to reject certain mixed- type operations –At times, a compiler needs to general type conversion instructions An attribute E.type holds the type of an expression

Semantic Action: E  E 1 + E 2 E.place := newtemp; if E 1.type = integer and E 2.type = integer then begin emit(E.place ':=' E 1.place 'int+' E 2.place); E.type := integer end else if E 1.type = real and E2.type = real then … else if E 1.type = integer and E 2.type = real then begin u := newtemp; emit(u ':=' 'inttoreal' E 1.place); emit(E.place ':=' u 'real+' E 2.place); E.type := real end else if E 1.type = real and E 2.type = integer then … else E.type := type_error;

Example: x := y + i * j In this example, x and y have type real i and j have type integer The intermediate code is shown below: t1 := i int* j t3 := inttoreal t1 t2 := y real+ t3 x := t2

Boolean Expressions Boolean expressions compute logical values Often used with flow-of-control statements Methods of translating boolean expression: –Numerical methods: True is represented as 1 and false is represented as 0 Nonzero values are considered true and zero values are considered false –Flow-of-control methods: Represent the value of a boolean by the position reached in a program Often not necessary to evaluate entire expression

Numerical Representation (1) Expressions evaluated left to right using 1 to denote true and 0 to donate false Example: a or b and not c t1 := not c t2 := b and t1 t3 := a or t2 Another example: a < b 100: if a < b goto : t : = 0 102: goto : t : = 1 104: …

Numerical Representation (2) ProductionSemantic Rules E  E 1 or E 2 E.place := newtemp; emit(E.place ':=' E 1.place 'or' E 2.place) E  E 1 and E 2 E.place := newtemp; emit(E.place ':=' E 1.place 'and' E 2.place) E  not E 1 E.place := newtemp; emit(E.place ':=' 'not' E 1.place) E  (E 1 )E.place := E1.place;

Numerical Representation (3) ProductionSemantic Rules E  id 1 relop id 2 E.place := newtemp; emit('if' id 1.place relop.op id 2.place 'goto' nextstat+3); emit(E.place ':=' '0'); emit('goto' nextstat+2); emit(E.place ':=' '1'); E  true E.place := newtemp; emit(E.place ':=' '1') E  false E.place := newtemp; emit(E.place ':=' '0')

Example: a<b or c<d and e<f 100: if a < b goto : t1 := 0 102: goto : t1 := 1 104: if c < d goto : t2 := 0 106: goto : t2 := 1 108: if e < f goto : t3 := 0 110: goto : goto 23 := 1 112: t4 := t2 and t3 113: t5 := t1 or t4

Flow-of-Control (1) The function newlabel will return a new symbolic label each time it is called Each boolean expression will have two new attributes: –E.true is the label to which control flows if E is true –E.false is the label to which control flows if E is false Attribute S.next of a statement S : –Inherited attribute whose value is the label attached to the first instruction to be executed after the code for S –Used to avoid jumps to jumps

Flow-of-Control (2) ProductionSemantic Rules S  if E then S 1 E.true := newlabel; E.false := S.next; S1.next := S.next; S.code := E.code || gen(E.true ':') || S 1.code

Flow-of-Control (3) ProductionSemantic Rules S  if E then S 1 else S 2 E.true := newlabel; E.false := newlabel; S1.next := S.next; S2.next := S.next; S.code := E.code || gen(E.true ':') || S 1.code || gen('goto' S.next) || gen(E.false ':') || S 2.code

Flow-of-Control (4) ProductionSemantic Rules S  while E do S 1 S.begin := newlabel; E.true := newlabel; E.false := S.next; S1.next := S.begin; S.code := gen(S.begin ':') || E.code || gen(E.true ':') || S 1.code || gen('goto' S.begin)

Boolean Expressions Revisited (1) ProductionSemantic Rules E  E 1 or E 2 E 1.true := E.true; E 1.false := newlabel; E 2.true := E.true; E 2.false := E.false; E.code := E 1.code || gen(E 1.false ':') || E 2.code E  E 1 and E 2 E 1.true := newlabel; E 1.false := E.false; E 2.true := E.true; E 2.false := E.false; E.code := E 1.code || gen(E 1.true ':') || E 2.code

Boolean Expressions Revisited (2) ProductionSemantic Rules E  not E 1 E 1.true := E.false; E 1.false := E.true; E.code := E1.code E  (E 1 )E 1.true := E.true; E 1.false := E.false; E.code := E 1.code E  id 1 relop id 2 E.code := gen('if' id.place relop.op id2.place 'goto' E.true) || gen('goto' E.false) E  trueE.code := gen('goto' E.true) E  falseE.code := gen('goto' E.false)

Revisited: a<b or c<d and e<f if a < b goto Ltrue goto L1 L1:if c < d goto L2 goto Lfalse L2:if e < f goto Ltrue goto Lfalse

Another Example while a < b do if c < d then x := y + z else x := y - z L1:if a < b goto L2 goto Lnext L2:if c < d goto L3 goto L4 L3:t1 := y + z x:= t1 goto L1 L4:t2 := y – z X := t2 goto L1 Lnext:

Mixed-Mode Expressions Boolean expressions often have arithmetic subexpressions, e.g. (a + b) < c If false has the value 0 and true has the value 1 –arithmetic expressions can have boolean subexpressions –Example: (a < b) + (b < a) has value 0 if a and b are equal and 1 otherwise Some operators may require both operands to be boolean Other operators may take both types of arguments, including mixed arguments

Revisited: E  E 1 + E 2 E.type := arith; if E1.type = arith and E2.type = arith then begin /* normal arithmetic add */ E.place := newtemp; E.code := E 1.code || E 2.code || gen(E.place ':=' E 1.place '+' E 2.place) end else if E1.type := arith and E2.type = bool then begin E2.place := newtemp; E2.true := newlabel; E2.flase := newlabel; E.code := E1.code || E2.code || gen(E2.true ':' E.place ':=' E1.place + 1) || gen('goto' nextstat+1) || gen(E2.false ':' E.place ':=' E1.place) else if …

Two-Pass Implementation First, construct syntax tree Walk through syntax tree in depth-first order, computing translations May not know the labels to which control must flow at the time a jump is generated –Affect boolean expressions and flow control statements –Leave targets of jumps temporarily unspecified –Add each such statement to a list of goto statements whose labels will be filled in later –This filling in of labels is called backpatching

Backpatching This filling in of labels mentioned on the last slide is called backpatching Can be used to generate the code in a single pass Translations generated are basically the same as before except for labels Book notation – Quadruples are generated into quadruple array Labels are indices into the array

Lists of Labels Lists of labels (really statements with jumps to labels) are maintained New functions used to manipulate these lists –makelist(i) Creates a new list containing only i, and index into the array of quadruples Returns a pointer to the new list –merge(p1, p2) Concatenates two lists of labels Returns a pointer to the new list –backpatch(p, i) – inserts i as target label for each statement on the list pointed to by p

Boolean Expressions and Markers E  E 1 or M E 2 | E 1 and M E 2 | not E 1 | (E 1 ) | id 1 relop id 2 | true | false M  ε

The New Marker Translation scheme suitable for producing quadruples during bottom-up pass –The new marker an has associated semantic action which –Picks up, at appropriate times, the index of the next quadruple to be generated Nonterminal E will have two new synthesized attributes: –E.truelist contains a list of statements that jump when expression is true –E.falselist contains a list of statements that jump when expression is false

Example: E  E 1 and M E 2 If E 1 is false: –Then E is also false –So statements on E 1.falselist become part of E.falselist If E 1 is true: –Still need to test E 2 –Target for statements on E 1.truelist must be the beginning of code generated for E 2 –Target is obtained using the marker M

New Syntax-Directed Definition (1) ProductionSemantic Rules E  E 1 or M E 2 backpatch(E 1.falselist, M.quad); E.truelist := merge(E 1.truelist, E 2.truelist); E.falselist := E 2.falstlist E  E 1 and M E 2 backpatch(E 1.truelist, M.quad); E.truelist := E 2.truelist; E.falselist := merge(E 1.falselist, E 2.falselist) E  not E 1 E.truelist := E 1.falselist; E.falselist := E 1.truelist E  (E 1 ) E.truelist := E 1.truelist; E.falselist := E 1.falselist

New Syntax-Directed Definition (2) ProductionSemantic Rules E  id 1 relop id 2 E.truelist := makelist(nextquad); E.falselist := makelist(nextquad+1); emit('if' id 1.place relop.op id 2.place 'goto _'); emit('goto _') E  true E.truelist := makelist(nextquad); emit('goto _') E  false E.falselist := makelist(nextquad); emit('goto _') M  εM.quad := nextquad

Example Revisited (1) Reconsider: a<b or c<d and e<f First, a<b will be reduced, generating: 100: if a < b goto _ 101: goto _ Next, the marker M in E  E 1 or M E 2 will be reduced, and M.quad will be set to 102 Next, c<d will be reduced, generating: 102: if c < d goto _ 103: goto _

Example Revisited (2) Next, the marker M in E  E 1 and M E 2 will be reduced, and M.quad will be set to 104 Next, e<f will be reduced, generating: 104: if e < f goto _ 105: goto _ Next, we reduce by E  E 1 and M E 2 –Semantic action calls backpatch({102}, 104) –E 1.truelist contains only 102 –Line 102 now reads: if c <d goto 104

Example Revisited (3) Next, we reduce by E  E 1 or M E 2 –Semantic action calls backpatch({101}, 102) –E 1.falselist contains only 101 –Line 101 now reads: goto 102 Statements generated so far: 100: if a < b goto _ 101: goto : if c < d goto : goto _ 104: if e < f goto _ 105: goto _ Remaining goto instructions will have their addresses filled in later

Annotated Parse Tree

Flow-of-Control Revisited (1) S, L, A, and E represent statements, statement lists, assignments, and boolean expressions Clearly, other productions (not shown) are also necessary S  if E then S | if E then S else S | while E do S | begin L end | A L  L ; S | S

Flow-of-Control Revisited (2) New attributes and markers will be needed E will still have attributes E.truelist and E.falselist as previously indicated S.nextlist is list of statements that jump to quadruple following statement S L.nextlist is defined similarly New marker N will emit a goto statement and have an associated list as an attribute

New Flow-of-Control Rules (1) ProductionSemantic Rules M  εM.quad := nextquad N  ε N.nextlist := makelist(nextquad); emit('goto _') S  if E then M S 1 backpatch(E.truelist, M.quad); S.nextlist := merge(E.falselist, S 1.nextlist) S  if E then M 1 S 1 N else M 2 S 2 backpatch(E.truelist, M 1.quad); backpatch(E.falselist, M 2.quad); S.nextlist := merge(S 1.nextlist, merge(N.nextlist, S 2.nextlist))

New Flow-of-Control Rules (2) ProductionSemantic Rules S  while M 1 E do M 2 S 1 backpatch(S 1.next, M 1.quad); backpatch(E.truelist, M 2.quad); S.nextlist := E.falselist; emit('goto', M 1.quad) S  begin L endS.nextlist := L.nextlist S  AS.nextlist := NULL L  L 1 ; M S backpatch(L 1.nextlist, M.quad); L.nextlist := S.nextlist L  SL.nextlist := S.nextlist

Labels and Goto Statements The definition of a label is treated as a declaration of the label Labels are typically entered into the symbol table –Entry is created the first time the label is seen –This may be before the definition of the label if it is the target of any forward goto When a compiler encounters a goto statement: –It must ensure that there is exactly one appropriate label in the current scope –If so, it must generate the appropriate code; otherwise, an error should be indicated

Procedures The procedure is an extremely important, very commonly used construct Imperative that a compiler generates good calls and returns Much of the support for procedure calls is provided by a run-time support package S  call id ( Elist) Elist  Elist, E Elist  E

Calling Sequence Calling sequences can differ for different implementations of the same language Certain actions typically take place: –Space must be allocated for the activation record of the called procedure –The passed arguments must be evaluated and made available to the called procedure –Environment pointers must be established to enable the called procedure to access appropriate data –The state of the calling procedure must be saved –The return address must be stored in a known place –An appropriate jump statement must be generated

Return Statements Several actions must also take place when a procedure terminates –If the called procedure is a function, the result must be stored in a known place –The activation record of the calling procedure must be restored –A jump to the calling procedure's return address must be generated No exact division of run-time tasks between the calling and called procedure

Pass by Reference The param statements can be used as placeholders for arguments The called procedure is passed a pointer to the first of the param statements Any argument can by obtained by using the proper offset from the base pointer Arguments other than simple names: –First generate three-address statements needed to evaluate these arguments –Follow this by a list of param three-address statements

Using a Queue The code to evaluate arguments is emitted first, followed by param statements and then a call If desired, could augment rules to count the number of parameters ProductionSemantic Rules S  call id ( Elist ) for each item p on queue do emit('param' p); emit('call' id.place) Elist  Elist, Epush E.place to queue Elist  Einitialize queue to contain E