Chapter 2. A Simple Syntax-Directed Translator. Dr. Nidjo Sandjojo, M.Sc.

1 Chapter 2. A Simple Syntax-Directed Translator. Dr. Nidjo Sandjojo, M.Sc.

2 Outline Syntax Definition Syntax-Directed Translation Parsing A Translator for Simple Expressions Lexical Analysis Symbol Tables Intermediate Code Generation

3 Introduction This chapter introduces the compiling techniques covered in Chap. 3-6. The working Java translator appears in Appendix A.

4 A Code Fragment to be Translated
    {
        int i; int j; float[100] a; float v; float x;
        while (true) {
            do i = i+1; while (a[i] < v);
            do j = j-1; while (a[j] > v);
            if (i >= j) break;
            x = a[i]; a[i] = a[j]; a[j] = x;
        }
    }

5 Simplified Intermediate Code
     1: i = i + 1
     2: t1 = a [ i ]
     3: if t1 < v goto 1
     4: j = j - 1
     5: t2 = a [ j ]
     6: if t2 > v goto 4
     7: ifFalse i >= j goto 9
     8: goto 14
     9: x = a [ i ]
    10: t3 = a [ j ]
    11: a [ i ] = t3
    12: a [ j ] = x
    13: goto 1
    14:

6 A Model of a Compiler Front End source program → Lexical Analyzer → tokens → Parser → syntax tree → Intermediate Code Generator → three-address code. The Symbol Table is shared by all three phases.

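7 Intermediate Code for “do i=i+1; while (a[i]<v);”
    1: i = i + 1
    2: t1 = a [ i ]
    3: if t1 < v goto 1
[Figure: syntax tree with root do-while. Its body child is assign with children i and + (operands i and 1); its condition child is < with children [] (operands a and i) and v.]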

8 Syntax Definition Grammar: describes the hierarchical structure of programming language constructs –Ex: if-else statement in Java: if ( expression ) statement else statement –A production: stmt → if ( expr ) stmt else stmt –terminals: if, else, (, ) –nonterminals: expr, stmt

9 Definition of Grammars A context-free grammar –A set of terminal symbols (tokens) –A set of nonterminals (syntactic variables) –A set of productions –A designation of one of the nonterminals as the start symbol

10 Ex. “Lists of digits separated by plus or minus signs”
    list → list + digit    (2.1)
    list → list - digit    (2.2)
    list → digit    (2.3)
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9    (2.4)

11 Derivations A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the body of a production The terminal strings that can be derived from the start symbol form the language defined by the grammar Ex. 9-5+2 is a list
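As a worked illustration of the claim that 9-5+2 is a list, here is one possible derivation from the start symbol list, using productions (2.1)-(2.4) of the digit-list grammar. Other orders of replacement reach the same string; the steps shown are one choice.

```latex
\begin{align*}
\mathit{list} &\Rightarrow \mathit{list} + \mathit{digit}                     && \text{by (2.1)} \\
              &\Rightarrow \mathit{list} - \mathit{digit} + \mathit{digit}    && \text{by (2.2)} \\
              &\Rightarrow \mathit{digit} - \mathit{digit} + \mathit{digit}   && \text{by (2.3)} \\
              &\Rightarrow 9 - \mathit{digit} + \mathit{digit}                && \text{by (2.4)} \\
              &\Rightarrow 9 - 5 + \mathit{digit}                             && \text{by (2.4)} \\
              &\Rightarrow 9 - 5 + 2                                          && \text{by (2.4)}
\end{align*}
```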

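12 Parsing Parsing: the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar, or reporting syntax errors if it cannot be derived (Chap. 4) Parse Trees: if A → XYZ is a production, a parse tree may have an interior node labeled A with children X, Y, Z, from left to right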

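13 Example Parse Tree Ex. Parse tree for 9-5+2 [Figure: the root list has children list, +, digit (2); its left child list has children list, -, digit (5); the innermost list has the single child digit (9).]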

14 Ambiguity A grammar is ambiguous if some string of terminals has more than one parse tree Ex. string → string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

15 Two parse trees for 9-5+2 [Figure: one tree groups the string as (9-5)+2, the other as 9-(5+2).]

16 Associativity of Operators Left-associative: +, -, *, / Right-associative: ^, = (assignment) Ex. a=b=c is generated by right → letter = right | letter, letter → a | b | … | z

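17 Parse trees for left- and right-associative grammars [Figure: the tree for 9-5-2 (grammar list) grows down towards the left, grouping (9-5)-2; the tree for a=b=c (grammar right) grows down towards the right, grouping a=(b=c).]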

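18 Precedence of Operators Precedence: decides which operator takes its operands first when different operators compete, e.g. 9+5*2 is read as 9+(5*2) because * has higher precedence than + Levels, in order of increasing precedence: –Left-associative: + - –Left-associative: * / A grammar for arithmetic expressions:
    expr → expr + term | expr - term | term
    term → term * factor | term / factor | factor
    factor → digit | ( expr )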

19 Ex. An ambiguous grammar for a subset of Java statements
    stmt → id = expr ;
         | if ( expr ) stmt
         | if ( expr ) stmt else stmt
         | while ( expr ) stmt
         | do stmt while ( expr ) ;
         | { stmts }
    stmts → stmts stmt | ε

20 Syntax-Directed Translation Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar E.g. for expr → expr1 + term: translate expr1; translate term; handle + Ex. Translation of infix expressions into postfix notation

21 Two Concepts –Attributes: any quantity associated with a programming construct –(Syntax-directed) translation schemes: a notation for attaching program fragments to the productions of a grammar (Chap. 5)

22 Postfix Notation –If E is a variable or constant, the postfix notation for E is E itself –If E is an expression of the form E1 op E2, the postfix notation is E1’ E2’ op, where E1’ and E2’ are the postfix notations for E1 and E2, respectively –If E is a parenthesized expression of the form ( E1 ), the postfix notation for E is the same as the postfix notation for E1 Ex: –(9-5)+2 → 95-2+ –9-(5+2) → 952+- –952+-3* = ?
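To check postfix strings like the ones on this slide, a stack-based evaluator is enough: operands are pushed, and each operator pops its two operands and pushes the result. The sketch below is only an illustration; the class name PostfixEval and the restriction to single-digit operands without white space are assumptions made here, not part of the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PostfixEval {
    // Evaluates a postfix string made of single-digit operands and + - * /.
    static int eval(String postfix) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (char c : postfix.toCharArray()) {
            if (Character.isDigit(c)) {
                stack.push(c - '0');              // operand: push its value
            } else {
                int right = stack.pop();          // the right operand is on top
                int left = stack.pop();
                switch (c) {
                    case '+': stack.push(left + right); break;
                    case '-': stack.push(left - right); break;
                    case '*': stack.push(left * right); break;
                    case '/': stack.push(left / right); break;
                    default: throw new IllegalArgumentException("unexpected symbol: " + c);
                }
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(eval("95-2+"));    // (9-5)+2
        System.out.println(eval("952+-"));    // 9-(5+2)
        System.out.println(eval("952+-3*"));  // (9-(5+2))*3
    }
}
```

Under these assumptions the three calls print 6, 2 and 6, so the open question on the slide, 952+-3*, corresponds to (9-(5+2))*3 = 6.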

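23 Synthesized Attributes A syntax-directed definition associates –With each grammar symbol, a set of attributes –With each production, a set of semantic rules for computing the values of the attributes An attribute is synthesized if its value at a parse-tree node is determined from the attribute values at the children of that node and at the node itself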

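24 Attribute values at nodes in a parse tree for 9-5+2 [Figure: the root has expr.t = 95-2+, with children expr.t = 95- and term.t = 2; the node with expr.t = 95- has children expr.t = 9 and term.t = 5; the node with expr.t = 9 has the child term.t = 9.]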

25 A Simple Syntax-Directed Definition Ex.
    Production               Semantic rule
    expr → expr1 + term      expr.t = expr1.t || term.t || '+'
    expr → expr1 - term      expr.t = expr1.t || term.t || '-'
    expr → term              expr.t = term.t
    term → 0                 term.t = '0'
    …                        …
    term → 9                 term.t = '9'

26 Tree Traversals Depth-first traversal
    procedure visit(node N) {
        for (each child C of N, from left to right) {
            visit(C);
        }
        evaluate semantic rules at node N;
    }
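The traversal above can be tried out on the parse tree for 9-5+2. The following Java sketch is one way to do it under stated assumptions: the class ParseNode, its fields, and the way the semantic rules for the attribute t are hard-coded by node label are all choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ParseNode {
    final String label;                        // grammar symbol or terminal, e.g. "expr", "+", "9"
    final List<ParseNode> children = new ArrayList<>();
    String t = "";                             // synthesized attribute: postfix translation

    ParseNode(String label, ParseNode... kids) {
        this.label = label;
        for (ParseNode k : kids) children.add(k);
    }

    // Depth-first traversal: visit the children left to right, then apply the rule at this node.
    static void visit(ParseNode n) {
        for (ParseNode c : n.children) visit(c);
        if (n.label.equals("term")) {
            n.t = n.children.get(0).label;     // term.t = '0' ... '9'
        } else if (n.label.equals("expr")) {
            if (n.children.size() == 1) {
                n.t = n.children.get(0).t;     // expr.t = term.t
            } else {                           // expr.t = expr1.t || term.t || op
                n.t = n.children.get(0).t + n.children.get(2).t + n.children.get(1).label;
            }
        }
    }

    // Helper that builds the subtree term -> digit.
    static ParseNode term(String digit) {
        return new ParseNode("term", new ParseNode(digit));
    }

    public static void main(String[] args) {
        ParseNode nine = new ParseNode("expr", term("9"));
        ParseNode minus = new ParseNode("expr", nine, new ParseNode("-"), term("5"));
        ParseNode root = new ParseNode("expr", minus, new ParseNode("+"), term("2"));
        visit(root);
        System.out.println(root.t);            // prints 95-2+
    }
}
```

Because every rule only reads attributes of children, a single postorder pass is sufficient here.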

27 Translation Schemes Program fragments embedded within production bodies are called semantic actions –Ex. rest → + term {print('+')} rest1 [Figure: in the parse tree, the action becomes an extra leaf, so rest has the children +, term, {print('+')}, rest1.]

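28 Actions translating 9-5+2 into 95-2+ [Figure: parse tree for 9-5+2 with the actions attached as extra leaves; a depth-first traversal executes them in the order {print('9')}, {print('5')}, {print('-')}, {print('2')}, {print('+')}.]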

29 Actions for translating into postfix notation
    expr → expr + term   {print('+')}
    expr → expr - term   {print('-')}
    expr → term
    term → 0             {print('0')}
    …
    term → 9             {print('9')}

30 Parsing Two classes, depending on the order in which nodes in the parse tree are constructed: –Top-down: construction starts at the root and proceeds towards the leaves; efficient parsers can be constructed more easily by hand –Bottom-up: construction starts at the leaves and proceeds towards the root; can handle a larger class of grammars, often used by software tools

31 Top-Down Parsing Two steps: –At node N (labeled with nonterminal A), select one of the productions for nonterminal A, and construct children at N –Find the next node at which a subtree is to be constructed, typically the leftmost unexpanded nonterminal

32 Ex: A grammar for some statements in C and Java
    stmt → expr ;
         | if ( expr ) stmt
         | for ( optexpr ; optexpr ; optexpr ) stmt
         | other
    optexpr → ε | expr

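33 A parse tree according to the grammar [Figure: the root stmt has the children for, (, optexpr, ;, optexpr, ;, optexpr, ), stmt; the three optexpr nodes derive ε, expr and expr, and the inner stmt derives other.]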

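34 Input: for ( ; expr ; expr ) other Parse tree: [Figure: top-down construction starts with the root stmt; with lookahead for, the for-production is selected, giving the children for, (, optexpr, ;, optexpr, ;, optexpr, ), stmt.]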

35 Predictive Parsing Recursive-descent parsing: a top-down method – Predictive parsing: a simple form of recursive-descent parsing, in which the lookahead symbol unambiguously determines the flow of control for each nonterminal

36 Pseudocode for a Predictive Parser
    void stmt() {
        switch (lookahead) {
        case expr:
            match(expr); match(';'); break;
        case if:
            match(if); match('('); match(expr); match(')'); stmt(); break;
        case for:
            match(for); match('(');
            optexpr(); match(';'); optexpr(); match(';'); optexpr();
            match(')'); stmt(); break;
        case other:
            match(other); break;
        default:
            report("syntax error");
        }
    }

37
    void optexpr() {
        if (lookahead == expr) match(expr);
    }
    void match(terminal t) {
        if (lookahead == t) lookahead = nextTerminal;
        else report("syntax error");
    }

38 FIRST(α): the set of terminals that appear as the first symbols of one or more strings of terminals generated from α Ex. –FIRST( stmt ) = { expr, if, for, other } –FIRST( expr ; ) = { expr } In predictive parsing, for any two productions A → α and A → β of the same nonterminal, FIRST(α) and FIRST(β) must be disjoint

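39 ε-Productions A predictive parser uses an ε-production as a default when no other production for the nonterminal can be used; doing nothing corresponds to applying the ε-production –Ex. in optexpr(): if (lookahead == expr) match(expr); (For more details, see LL(1) grammars in Section 4.4.3)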

40 Designing a Predictive Parser The procedure for nonterminal A: –Decides which A-production to use by examining the lookahead symbol: the production with body α is used if the lookahead symbol is in FIRST(α) –“Executes” the symbols of the body in turn: for a nonterminal, call the procedure for that nonterminal; for a terminal that matches the lookahead symbol, read the next input symbol
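The design rules above can be sketched as a small runnable Java recognizer for the statement grammar with the nonterminals stmt and optexpr. This is an illustration under assumptions, not the slides' code: tokens are plain strings such as "expr" or ";", "$" stands for the end of input, and the names PredictiveParser and accepts are introduced here.

```java
import java.util.Arrays;
import java.util.List;

class PredictiveParser {
    private final List<String> tokens;
    private int pos = 0;

    PredictiveParser(String... tokens) { this.tokens = Arrays.asList(tokens); }

    // Current lookahead symbol; "$" marks the end of the input.
    private String lookahead() { return pos < tokens.size() ? tokens.get(pos) : "$"; }

    private void match(String t) {
        if (lookahead().equals(t)) pos++;
        else throw new IllegalStateException("syntax error: expected " + t + ", saw " + lookahead());
    }

    // One case per production; the case labels are the FIRST sets of the bodies.
    void stmt() {
        switch (lookahead()) {
            case "expr":
                match("expr"); match(";"); break;
            case "if":
                match("if"); match("("); match("expr"); match(")"); stmt(); break;
            case "for":
                match("for"); match("(");
                optexpr(); match(";"); optexpr(); match(";"); optexpr();
                match(")"); stmt(); break;
            case "other":
                match("other"); break;
            default:
                throw new IllegalStateException("syntax error at " + lookahead());
        }
    }

    void optexpr() {
        if (lookahead().equals("expr")) match("expr");   // otherwise: optexpr -> epsilon, do nothing
    }

    // Returns true if the whole token sequence is derived from stmt.
    static boolean accepts(String... tokens) {
        PredictiveParser p = new PredictiveParser(tokens);
        try {
            p.stmt();
            return p.lookahead().equals("$");
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("for", "(", ";", "expr", ";", "expr", ")", "other")); // true
        System.out.println(accepts("for", "(", "expr", ")", "other"));                   // false
    }
}
```

The switch works without backtracking only because the FIRST sets of the four stmt bodies are pairwise disjoint.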

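41 Left Recursion Problem: a recursive-descent parser can loop forever on a left-recursive production, because the procedure calls itself without consuming any input –Ex: expr → expr + term A left-recursive production can be eliminated by rewriting the offending production –Original: A → Aα | β –Becomes: A → βR, R → αR | ε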

42 A Translator for Simple Expressions A conflict: –We need a grammar that facilitates translation –We need a grammar that facilitates parsing Solution: –To begin with the grammar for easy translation –To carefully transform it to facilitate parsing

43 Abstract and Concrete Syntax Abstract syntax tree –Interior node: an operator (any programming construct) –Children of the node: the operands Parse tree (a concrete syntax tree) –Interior node: a nonterminal; some represent programming constructs, some are “helpers” It is desirable for a translation scheme to be based on a grammar whose parse trees are as close to syntax trees as possible [Figure: syntax tree for 9-5+2: the root + has the children - and 2; the node - has the children 9 and 5.]

44 Adapting the Translation Scheme The left-recursion-elimination technique extends to multiple productions: –Original: A → Aα | Aβ | γ –Becomes: A → γR, R → αR | βR | ε We need to transform productions that have embedded actions, not just terminals and nonterminals

45 Let A = expr, α = + term {print('+')}, β = - term {print('-')}, γ = term After left-recursion elimination:
    expr → term rest
    rest → + term {print('+')} rest
         | - term {print('-')} rest
         | ε
    term → 0 {print('0')}
         | 1 {print('1')}
         …
         | 9 {print('9')}

46 Translation of 9-5+2 to 95-2+ [Figure: parse tree built with expr → term rest; the actions are executed in the order {print('9')}, {print('5')}, {print('-')}, {print('2')}, {print('+')}, and the last rest derives ε.]

47 Procedures for the Nonterminals
    void expr() {
        term(); rest();
    }
    void rest() {
        if (lookahead == '+') {
            match('+'); term(); print('+'); rest();
        } else if (lookahead == '-') {
            match('-'); term(); print('-'); rest();
        } else { }  /* do nothing with the input: rest → ε */
    }
    void term() {
        if (lookahead is a digit) {
            t = lookahead; match(lookahead); print(t);
        } else report("syntax error");
    }

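48 Simplifying the Translator Certain recursive calls can be replaced by iterations –Tail-recursive calls: the last statement executed in rest() is a call to rest() itself, so the call can be replaced by a jump back to the start of the procedure
    void rest() {
        while (true) {
            if (lookahead == '+') {
                match('+'); term(); print('+'); continue;
            } else if (lookahead == '-') {
                match('-'); term(); print('-'); continue;
            }
            break;
        }
    }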

49 The Complete Program (See Fig. 2.27)
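For readers without the book at hand, the pieces developed so far (expr, the loop form of rest, term, match) fit together roughly as follows. This is a sketch in the spirit of the referenced figure, not a reproduction of it: it reads from a string instead of standard input, collects the output in a StringBuilder, and the names Postfix and translate are assumptions of this example.

```java
class Postfix {
    private final String input;
    private int pos = 0;
    private final StringBuilder out = new StringBuilder();

    Postfix(String input) { this.input = input; }

    // Lookahead character; '\0' marks the end of the input.
    private char lookahead() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    private void match(char t) {
        if (lookahead() == t) pos++;
        else throw new IllegalStateException("syntax error at position " + pos);
    }

    // expr -> term rest, with rest written as a loop instead of a tail-recursive procedure.
    void expr() {
        term();
        while (true) {
            if (lookahead() == '+') { match('+'); term(); out.append('+'); }
            else if (lookahead() == '-') { match('-'); term(); out.append('-'); }
            else return;                       // rest -> epsilon
        }
    }

    // term -> 0 | 1 | ... | 9, each with the action print(digit).
    void term() {
        char t = lookahead();
        if (Character.isDigit(t)) { match(t); out.append(t); }
        else throw new IllegalStateException("syntax error at position " + pos);
    }

    // Translates an infix string of digits, + and - (no white space) into postfix.
    static String translate(String infix) {
        Postfix p = new Postfix(infix);
        p.expr();
        if (p.pos != infix.length()) throw new IllegalStateException("syntax error at position " + p.pos);
        return p.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("9-5+2"));   // prints 95-2+
    }
}
```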

50 Lexical Analysis Lexeme: a sequence of input characters that comprises a single token The extended translation scheme:
    expr → expr + term     {print('+')}
         | expr - term     {print('-')}
         | term
    term → term * factor   {print('*')}
         | term / factor   {print('/')}
         | factor
    factor → ( expr )
           | num           {print(num.value)}
           | id            {print(id.lexeme)}

51 Removal of White Space and Comments
    for ( ; ; peek = next input character) {
        if (peek is a blank or a tab) do nothing;
        else if (peek is a newline) line = line + 1;
        else break;
    }

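52 Reading Ahead The lexical analyzer may need to read ahead some characters before it can decide on the token to return, e.g. after seeing > it must look at the next character to distinguish > from >= –An input buffer is maintained from which the lexical analyzer can read and push back characters –The lexical analyzer reads ahead only when it must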

53 Constants The job of collecting characters into integers and computing their numerical value is given to the lexical analyzer –31+28+59 is transformed into <num, 31> <+> <num, 28> <+> <num, 59>
    if (peek holds a digit) {
        v = 0;
        do {
            v = v*10 + integer value of digit peek;
            peek = next input character;
        } while (peek holds a digit);
        return token <num, v>;
    }

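54 Recognizing Keywords and Identifiers Ex: –count = count + increment; Using a table to hold character strings provides: –Single representation: the rest of the compiler is insulated from the representation of strings –Reserved words: implemented by initializing the table with the reserved strings and their tokens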

55 In Java, –Hashtable words = new Hashtable();
    if (peek holds a letter) {
        collect letters or digits into a buffer b;
        s = string formed from the characters in b;
        w = token returned by words.get(s);
        if (w is not null) return w;
        else {
            enter the key-value pair (s, <id, s>) into words;
            return token <id, s>;
        }
    }

56 A Lexical Analyzer
    Token scan() {
        skip white space;
        handle numbers;
        handle reserved words and identifiers;
        /* if we get here, treat the read-ahead character peek as a token */
        Token t = new Token(peek);
        peek = blank;
        return t;
    }
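The scan() outline above can be made concrete in a few dozen lines. The sketch below is an assumption-laden illustration rather than the book's classes: it scans a string instead of an input stream, uses a single Token class with a tag and a text field, and the tag values NUM, ID, TRUE, FALSE and EOF are chosen arbitrarily for this example.

```java
import java.util.HashMap;
import java.util.Map;

class MiniLexer {
    // Hypothetical tags; single-character tokens use their character code as the tag.
    static final int NUM = 256, ID = 257, TRUE = 258, FALSE = 259, EOF = -1;

    static class Token {
        final int tag;
        final String text;
        Token(int tag, String text) { this.tag = tag; this.text = text; }
        @Override public String toString() { return "<" + tag + ", " + text + ">"; }
    }

    private final String src;
    private int pos = 0;
    int line = 1;
    private final Map<String, Token> words = new HashMap<>();

    MiniLexer(String src) {
        this.src = src;
        words.put("true", new Token(TRUE, "true"));     // reserved words are entered up front
        words.put("false", new Token(FALSE, "false"));
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    Token scan() {
        for (; ; pos++) {                                // skip white space, counting lines
            char c = peek();
            if (c == ' ' || c == '\t') continue;
            else if (c == '\n') line++;
            else break;
        }
        if (pos >= src.length()) return new Token(EOF, "");
        char c = peek();
        if (Character.isDigit(c)) {                      // collect digits into an integer value
            int v = 0;
            do { v = 10 * v + Character.digit(peek(), 10); pos++; } while (Character.isDigit(peek()));
            return new Token(NUM, Integer.toString(v));
        }
        if (Character.isLetter(c)) {                     // reserved word or identifier
            StringBuilder b = new StringBuilder();
            do { b.append(peek()); pos++; } while (Character.isLetterOrDigit(peek()));
            String s = b.toString();
            Token w = words.get(s);
            if (w == null) { w = new Token(ID, s); words.put(s, w); }
            return w;
        }
        pos++;                                           // any other character is a token by itself
        return new Token(c, String.valueOf(c));
    }

    public static void main(String[] args) {
        MiniLexer lexer = new MiniLexer("count = count + 31");
        for (Token t = lexer.scan(); t.tag != EOF; t = lexer.scan()) System.out.println(t);
    }
}
```

Since identifiers are stored in the words table, both occurrences of count yield the same Token object, which is the "single representation" idea from the keyword slide.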

57 class Token, with subclasses class Num (for integers) and class Word (for reserved words and identifiers) (details in Sec. 2.6.5 and Appendix A) Code: (Fig. 2.34 & 2.35)

58 Symbol Table Holds information about source-program constructs –Identifiers: lexeme, type, position in storage, … Scopes are implemented by setting up a separate symbol table for each scope –Input: { int x; char y; { bool y; x; y; } x; y; } –Output, with each use annotated by its type: { { x:int; y:bool; } x:int; y:char; }

59 Symbol Table Per Scope The most-closely nested rule: an identifier x is in the scope of the most-closely nested declaration of x It can be implemented by chaining symbol tables

60 Chained Symbol Table Ex: (Fig. 2.36) chained symbol tables for the blocks B0, B1, B2

61 Chained Symbol Table Code: (Fig. 2.37)
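A chained symbol table can be sketched as a class whose instances each hold one scope's table plus a link to the enclosing scope. The version below is a minimal illustration and not the code of the referenced figure; in particular, storing types as plain strings and the method names put and get are assumptions of this example.

```java
import java.util.HashMap;
import java.util.Map;

class Env {
    private final Map<String, String> table = new HashMap<>();  // name -> type for this scope
    private final Env prev;                                      // enclosing scope, or null

    Env(Env prev) { this.prev = prev; }

    // Declares a name in the current scope.
    void put(String name, String type) { table.put(name, type); }

    // Most-closely nested rule: search this scope first, then the enclosing ones.
    String get(String name) {
        for (Env e = this; e != null; e = e.prev) {
            String found = e.table.get(name);
            if (found != null) return found;
        }
        return null;                                             // undeclared
    }

    public static void main(String[] args) {
        Env outer = new Env(null);      // { int x; char y;
        outer.put("x", "int");
        outer.put("y", "char");
        Env inner = new Env(outer);     //   { bool y;
        inner.put("y", "bool");
        System.out.println("x:" + inner.get("x") + " y:" + inner.get("y"));  // x:int y:bool
        System.out.println("x:" + outer.get("x") + " y:" + outer.get("y"));  // x:int y:char
    }
}
```

Leaving a block needs no cleanup in this design: the translator simply goes back to using the enclosing Env.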

62 The Use of Symbol Tables (Fig. 2.38)

63 Intermediate Code Generation Two kinds of intermediate representations –Trees: parse trees, syntax trees –Linear representations, e.g. “three-address code”: a sequence of elementary program steps Static checking for syntax and semantic rules

64 Construction of Syntax Trees (See Fig. 2.39)

65 Static Checking Static checking includes: –Syntactic checking –Type checking L-values and R-values –l-value: a location, as is appropriate on the left side of an assignment –r-value: a value, as is appropriate on the right side of an assignment

66 Coercion: the type of an operand is automatically converted to the type expected by the operator Overloading: an operator is overloaded if it has different meanings depending on the context, e.g. + in Java denotes either addition or string concatenation

67 Three-Address Code General form: x = y op z –Array accesses: x [ y ] = z, x = y [ z ] Translation of statements –if expr then stmt1 –(Fig. 2.42):
    code to compute expr into x
    ifFalse x goto after
    code for stmt1
    after:
–(Fig. 2.43)

68 Translation of expressions ▫i-j+k translates into: t1 = i - j; t2 = t1 + k ▫2*a[i] translates into: t1 = a [ i ]; t2 = 2 * t1 ▫Two functions: lvalue(), rvalue() ▫(Fig. 2.44, 2.45)

69
    Expr lvalue(x: Expr) {
        if (x is an Id node) return x;
        else if (x is an Access(y, z) node and y is an Id node) {
            return new Access(y, rvalue(z));
        }
        else error;
    }

70
    Expr rvalue(x: Expr) {
        if (x is an Id or a Constant node) return x;
        else if (x is an Op(op, y, z) or a Rel(op, y, z) node) {
            t = new temporary;
            emit string for t = rvalue(y) op rvalue(z);
            return a new node for t;
        }
        else if (x is an Access(y, z) node) {
            t = new temporary;
            call lvalue(x), which returns Access(y, z’);
            emit string for t = Access(y, z’);
            return a new node for t;
        }
        else if (x is an Assign(y, z) node) {
            z’ = rvalue(z);
            emit string for lvalue(y) = z’;
            return z’;
        }
    }

71 Ex: ▫a[i] = 2*a[j-k] translates into:
    t3 = j - k
    t2 = a [ t3 ]
    t1 = 2 * t2
    a [ i ] = t1
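The lvalue/rvalue pseudocode can be turned into a small Java sketch that reproduces the instruction sequence of this example. Everything beyond the pseudocode is an assumption of this illustration: the node classes Leaf, Op, Access and Assign, the omission of Rel nodes, and the choice to collect emitted instructions in a list named code.

```java
import java.util.ArrayList;
import java.util.List;

class ThreeAddress {
    static abstract class Expr {}
    static class Leaf extends Expr {                       // an Id or a Constant
        final String name;
        Leaf(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }
    static class Op extends Expr {                         // y op z
        final String op; final Expr y, z;
        Op(String op, Expr y, Expr z) { this.op = op; this.y = y; this.z = z; }
    }
    static class Access extends Expr {                     // y [ z ]
        final Expr y, z;
        Access(Expr y, Expr z) { this.y = y; this.z = z; }
        @Override public String toString() { return y + " [ " + z + " ]"; }
    }
    static class Assign extends Expr {                     // y = z
        final Expr y, z;
        Assign(Expr y, Expr z) { this.y = y; this.z = z; }
    }

    final List<String> code = new ArrayList<>();           // emitted instructions, in order
    private int temps = 0;
    private Leaf newTemp() { return new Leaf("t" + (++temps)); }

    Expr lvalue(Expr x) {
        if (x instanceof Leaf) return x;
        if (x instanceof Access && ((Access) x).y instanceof Leaf) {
            Access a = (Access) x;
            return new Access(a.y, rvalue(a.z));           // only the index is reduced to a value
        }
        throw new IllegalArgumentException("not an l-value");
    }

    Expr rvalue(Expr x) {
        if (x instanceof Leaf) return x;
        if (x instanceof Op) {
            Op o = (Op) x;
            Leaf t = newTemp();                            // temporary is allocated before the operands
            Expr left = rvalue(o.y);
            Expr right = rvalue(o.z);
            code.add(t + " = " + left + " " + o.op + " " + right);
            return t;
        }
        if (x instanceof Access) {
            Leaf t = newTemp();
            Expr loc = lvalue(x);
            code.add(t + " = " + loc);
            return t;
        }
        if (x instanceof Assign) {
            Assign a = (Assign) x;
            Expr z = rvalue(a.z);
            code.add(lvalue(a.y) + " = " + z);
            return z;
        }
        throw new IllegalArgumentException("unknown node");
    }

    public static void main(String[] args) {
        // a[i] = 2*a[j-k]
        Expr rhs = new Op("*", new Leaf("2"),
                new Access(new Leaf("a"), new Op("-", new Leaf("j"), new Leaf("k"))));
        ThreeAddress gen = new ThreeAddress();
        gen.rvalue(new Assign(new Access(new Leaf("a"), new Leaf("i")), rhs));
        gen.code.forEach(System.out::println);
    }
}
```

Allocating each temporary before translating the operands is what makes the numbering run "backwards" (t3 is computed first), matching the sequence shown on the slide.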

72 Better code for expressions ▫Reduce the number of copy instructions in a subsequent optimization phase ▫Generate fewer instructions in the first place by taking context into account

73 Summary (Fig. 2.46)

