Yu-Chen Kuo, Chapter 2: A Simple One-Pass Compiler


2 2.1 Overview
A programming language is described by:
–what its programs look like (syntax: specified with context-free grammars)
–what its programs mean (semantics: much harder to specify)

3 2.2 Syntax Definition
Context-free grammar: a grammar specifies the hierarchical structure of a language.
–stmt → if ( expr ) stmt else stmt
–a rule of this form is a production
–tokens (terminals): if, (, ), else
–nonterminals: expr, stmt

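4 Context-free Grammar
A context-free grammar has four components:
1. A set of tokens (terminals), e.g. digits, signs (+, -, <, =), if, while
2. A set of nonterminals
3. A set of productions, each of the form nonterminal → sequence of terminals and/or nonterminals (left side → right side)
4. A designated nonterminal, the start symbol (by convention, the nonterminal of the first production listed)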

5 Example 2.1: Grammar for expressions such as 9-5+2
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The three list productions can be written together as list → list + digit | list - digit | digit
Nonterminals: list (the start symbol), digit
Terminals (tokens): 0 1 2 3 4 5 6 7 8 9 + -

6 Example 2.1 (Cont.)
Token strings are derived by beginning with the start symbol and repeatedly replacing a nonterminal by the right side of one of its productions.
The string of zero tokens is the empty string, written ε.
The set of all token strings that can be derived from the start symbol is the language defined by the grammar.
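As an illustration of these definitions (a worked example, not taken from the slides), one derivation of 9-5+2 from the start symbol list of Example 2.1, replacing the leftmost nonterminal at each step, is:

```latex
\begin{aligned}
\mathit{list} &\Rightarrow \mathit{list} + \mathit{digit}
  \Rightarrow \mathit{list} - \mathit{digit} + \mathit{digit}
  \Rightarrow \mathit{digit} - \mathit{digit} + \mathit{digit} \\
 &\Rightarrow 9 - \mathit{digit} + \mathit{digit}
  \Rightarrow 9 - 5 + \mathit{digit}
  \Rightarrow 9 - 5 + 2
\end{aligned}
```

Every step applies one production, so 9-5+2 is in the language of the grammar.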

7 Example 2.2: Parse Tree
A parse tree shows how the start symbol derives a string, here 9-5+2, using the grammar
list → list + digit | list - digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

8 Parse Trees
(Figure: a node A with children X, Y, Z, for the production A → X Y Z.)
1. The root is labeled by the start symbol.
2. Each leaf is labeled by a token or by ε.
3. Each interior node is labeled by a nonterminal.
4. If A labels an interior node and X1, X2, ..., Xn label its children from left to right, then A → X1 X2 ... Xn is a production.
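A minimal C sketch of these rules, with hypothetical names (struct node, tree_yield, example_tree): interior nodes carry nonterminals, leaves carry tokens, and reading the leaves from left to right (the yield of the tree) gives back the derived string 9-5+2.

```c
#include <assert.h>
#include <string.h>

#define MAX_KIDS 3   /* enough for list -> list + digit */

/* A parse-tree node: interior nodes are labeled by nonterminals,
   leaves by tokens. */
struct node {
    const char *label;
    int nkids;                           /* 0 for a leaf */
    const struct node *kid[MAX_KIDS];    /* children, left to right */
};

/* Append the labels of the leaves, left to right, to out. */
void tree_yield(const struct node *t, char *out)
{
    assert(t != NULL);
    if (t->nkids == 0) {
        strcat(out, t->label);
        return;
    }
    for (int i = 0; i < t->nkids; i++)
        tree_yield(t->kid[i], out);
}

/* Parse tree of 9-5+2 for the grammar of Example 2.1: each interior node
   and its children form one production of the grammar. */
const struct node *example_tree(void)
{
    static const struct node n9 = {"9", 0, {0}}, n5 = {"5", 0, {0}}, n2 = {"2", 0, {0}};
    static const struct node minus = {"-", 0, {0}}, plus = {"+", 0, {0}};
    static const struct node d9 = {"digit", 1, {&n9}};               /* digit -> 9 */
    static const struct node d5 = {"digit", 1, {&n5}};               /* digit -> 5 */
    static const struct node d2 = {"digit", 1, {&n2}};               /* digit -> 2 */
    static const struct node l1 = {"list", 1, {&d9}};                /* list -> digit */
    static const struct node l2 = {"list", 3, {&l1, &minus, &d5}};   /* list -> list - digit */
    static const struct node l3 = {"list", 3, {&l2, &plus, &d2}};    /* list -> list + digit */
    return &l3;   /* the root, labeled by the start symbol */
}
```

For example, `char buf[16] = ""; tree_yield(example_tree(), buf);` leaves "9-5+2" in buf.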

9 Example 2.3: Pascal begin-end blocks
block → begin opt_stmts end
opt_stmts → stmt_list | ε
stmt_list → stmt_list ; stmt | stmt
stmt → if ( expr ) stmt else stmt | assignment statement

10 Ambiguity of a Grammar
A grammar is said to be ambiguous if it can have more than one parse tree generating a given string of tokens.

11 Ambiguity of a Grammar (Cont.)
string → string + string | string - string
string → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The string 9-5+2 has two parse trees, corresponding to the two groupings (9-5)+2 and 9-(5+2).
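The two trees are not just a formality: they give different values. A small sketch, assuming well-formed input of single digits joined by + and - (eval_left and eval_right are hypothetical names), evaluates the same string under each grouping:

```c
#include <assert.h>

/* Group as in the tree for (9-5)+2: combine from the left. */
int eval_left(const char *s)
{
    int v = *s++ - '0';
    while (*s == '+' || *s == '-') {
        char op = *s++;
        int d = *s++ - '0';
        v = (op == '+') ? v + d : v - d;
    }
    return v;
}

/* Group as in the tree for 9-(5+2): the first digit is combined with
   the value of everything to its right. */
int eval_right(const char *s)
{
    int d = s[0] - '0';
    if (s[1] == '\0')
        return d;
    int rest = eval_right(s + 2);
    return (s[1] == '+') ? d + rest : d - rest;
}
```

eval_left("9-5+2") gives 6 and eval_right("9-5+2") gives 2, so a compiler must know which tree is meant.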

12 Associativity of Operators
Left associative: 9+5-2 means (9+5)-2
–the operators +, -, *, / are left associative
–the parse tree grows down towards the left
Right associative: a=b=c means a=(b=c)
–the parse tree grows down towards the right

13 Associativity of Operators (Cont.)
A grammar for right-associative assignment:
right → letter = right | letter
letter → a | b | c | … | z
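A recursive procedure for this grammar shows why the tree grows to the right: the recursive call for right comes after the =, so each later assignment nests inside the earlier one. A sketch assuming single lowercase letters and no blanks; group_assign is a hypothetical name, and the output is fully parenthesized only to make the grouping visible:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *p;   /* next unread input character */

/* right -> letter = right | letter */
static void right(char *out)
{
    char letter[2] = { *p++, '\0' };
    assert(islower((unsigned char)letter[0]));
    if (*p == '=') {            /* right -> letter = right */
        p++;
        strcat(out, "(");
        strcat(out, letter);
        strcat(out, "=");
        right(out);             /* recursion on the right: the tree grows to the right */
        strcat(out, ")");
    } else {                    /* right -> letter */
        strcat(out, letter);
    }
}

/* Parse src and write its grouping into out (assumed large enough). */
void group_assign(const char *src, char *out)
{
    p = src;
    out[0] = '\0';
    right(out);
}
```

group_assign("a=b=c", buf) produces "(a=(b=c))".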

14 Precedence of Operators
9+5*2 means 9+(5*2): * and / have higher precedence than + and -.
*, /, +, - are all left associative.
Use one nonterminal per precedence level:
–term for *, /: term → term * factor | term / factor | factor
–expr for +, -: expr → expr + term | expr - term | term
factor → digit | ( expr )

15 Precedence of Operators (Cont.)
Syntax of expressions:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
Syntax of statements for a Pascal subset (is it ambiguous?):
stmt → id := expr
     | if expr then stmt
     | if expr then stmt else stmt
     | while expr do stmt
     | begin opt_stmts end
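A small evaluator makes the effect of this expression grammar visible. The sketch assumes single-digit operands and no blanks and, anticipating the elimination of left recursion discussed later, writes each left-recursive rule as a loop; evaluate is a hypothetical name, and division by zero is not checked:

```c
#include <assert.h>
#include <ctype.h>

static const char *p;   /* next unread character */

static int expr(void);

/* factor -> digit | ( expr ) */
static int factor(void)
{
    if (*p == '(') {
        p++;
        int v = expr();
        assert(*p == ')');
        p++;
        return v;
    }
    assert(isdigit((unsigned char)*p));
    return *p++ - '0';
}

/* term -> term * factor | term / factor | factor (as a loop: left associative) */
static int term(void)
{
    int v = factor();
    while (*p == '*' || *p == '/') {
        char op = *p++;
        int f = factor();
        v = (op == '*') ? v * f : v / f;
    }
    return v;
}

/* expr -> expr + term | expr - term | term (as a loop: left associative) */
static int expr(void)
{
    int v = term();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        int t = term();
        v = (op == '+') ? v + t : v - t;
    }
    return v;
}

int evaluate(const char *s)
{
    p = s;
    int v = expr();
    assert(*p == '\0');
    return v;
}
```

Because * and / are handled one level deeper (in term), evaluate("9+5*2") is 19, while evaluate("(9+5)*2") is 28.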

16 2.3 Syntax-Directed Translation
Syntax-directed definitions and translation schemes are two formalisms for specifying translations of programming-language constructs.
A syntax-directed definition uses a context-free grammar to specify the syntactic structure.
It associates a set of attributes with each grammar symbol X, and a set of semantic rules with each production for computing the values of the attributes X.a of the symbols in that production.
The grammar together with its set of semantic rules constitutes the syntax-directed definition.

17 2.3 Syntax-Directed Translation (Cont.)
A syntax-directed definition for translating expressions consisting of digits separated by plus or minus signs into postfix notation
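The slide's table did not survive extraction. A definition of this kind can be written as follows (a sketch in the usual style, assuming a string-valued attribute t and || for string concatenation; not a copy of the slide):

```latex
\begin{array}{ll}
\textbf{Production} & \textbf{Semantic rule} \\
\mathit{expr} \to \mathit{expr}_1 + \mathit{term} &
  \mathit{expr}.t := \mathit{expr}_1.t \,\|\, \mathit{term}.t \,\|\, \texttt{'+'} \\
\mathit{expr} \to \mathit{expr}_1 - \mathit{term} &
  \mathit{expr}.t := \mathit{expr}_1.t \,\|\, \mathit{term}.t \,\|\, \texttt{'-'} \\
\mathit{expr} \to \mathit{term} & \mathit{expr}.t := \mathit{term}.t \\
\mathit{term} \to 0 & \mathit{term}.t := \texttt{'0'} \\
\quad\vdots & \quad\vdots \\
\mathit{term} \to 9 & \mathit{term}.t := \texttt{'9'}
\end{array}
```

At the root of the parse tree for 9-5+2, these rules give expr.t = 95-2+.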

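18 Postfix Notation
1. If $E$ is a variable or constant, then $\mathrm{postfix}(E) = E$.
2. If $E$ is an expression of the form $E_1 \;\mathit{op}\; E_2$, then $\mathrm{postfix}(E) = E_1' \, E_2' \;\mathit{op}$, where $E_1' = \mathrm{postfix}(E_1)$ and $E_2' = \mathrm{postfix}(E_2)$.
3. If $E$ is an expression of the form $(E_1)$, then $\mathrm{postfix}(E) = \mathrm{postfix}(E_1)$.
Example: postfix(9-5+2) = 95-2+, since 9-5+2 means (9-5)+2 and postfix(9-5) = 95-.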
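One reason postfix is convenient: it needs no parentheses, and a single left-to-right scan with a stack evaluates it. A sketch, assuming single-digit operands and well-formed input (eval_postfix is a hypothetical name):

```c
#include <assert.h>
#include <ctype.h>

#define STACK_MAX 64

/* Evaluate a postfix string of single digits and the operators + - * /.
   Operands are pushed; an operator pops its two operands (right operand
   first) and pushes the result. */
int eval_postfix(const char *s)
{
    int stack[STACK_MAX];
    int top = 0;
    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            assert(top < STACK_MAX);
            stack[top++] = *s - '0';
        } else {
            assert(top >= 2);
            int b = stack[--top];   /* right operand */
            int a = stack[--top];   /* left operand */
            switch (*s) {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            case '/': stack[top++] = a / b; break;
            default:  assert(0 && "unexpected character");
            }
        }
    }
    assert(top == 1);
    return stack[0];
}
```

eval_postfix("95-2+") is 6, matching (9-5)+2; the other grouping, 9-(5+2), is written 952+- and evaluates to 2.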

19 Postfix Notation (Cont.)

20 Robot’s Position

21 Robot’s Position (Cont.)

22 Robot’s Position (Cont.)

23 Depth-First Traversals

24 Translation Schemes
A translation scheme is a context-free grammar in which program fragments called semantic actions are embedded within the right sides of productions, e.g. rest → + term { print('+') } rest.
A translation scheme is like a syntax-directed definition, except that the order of evaluation of the semantic rules is shown explicitly.

25 Translation Schemes (Cont.)

26 2.4 Parsing
Parsing is the process of determining whether a string of tokens can be generated by a grammar.
For any context-free grammar there is a parser that takes at most $O(n^3)$ time to parse a string of $n$ tokens, which is too expensive for a compiler.
For a programming language, we can generally construct a grammar that can be parsed in linear time, making a single left-to-right scan and looking ahead one token at a time.

27 2.4 Parsing (Cont.)
Top-down parser: parse-tree construction starts at the root and proceeds towards the leaves.
Bottom-up parser: parse-tree construction starts at the leaves and proceeds towards the root; bottom-up methods can handle a larger class of grammars.

28 Top-Down Parsing
The parse tree is constructed by starting with the root, labeled with the start nonterminal, and repeatedly performing the following two steps:
1. At node n, labeled with nonterminal A, select one of the productions for A and construct children at n for the symbols on the right side of that production.
2. Find the next node at which a subtree is to be constructed.

29 Example
type → simple | ↑ id | array [ simple ] of type
simple → integer | char | num dotdot num
e.g., array [ num dotdot num ] of integer

30 Example (Cont.)

31 Example (Cont.)

32 Example (Cont.)

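33 Predictive Parsing
Recursive-descent parsing is a top-down parsing method in which a procedure for each nonterminal processes the input; predictive parsing is a form of recursive-descent parsing.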

34 Predictive Parsing (Cont.)

35 Predictive Parsing (Cont.)
The lookahead symbol and the first symbols (FIRST) of each production's right side determine unambiguously which procedure is selected for each nonterminal.
FIRST(α): the set of tokens that appear as the first symbols of one or more strings generated from α
–FIRST(simple) = { integer, char, num }
–FIRST(↑ id) = { ↑ }
–FIRST(array [ simple ] of type) = { array }
For predictive parsing, if A → α | β, the two FIRST sets must be disjoint: $\mathrm{FIRST}(\alpha) \cap \mathrm{FIRST}(\beta) = \emptyset$.

36 When to Use ε-Productions
stmt → begin opt_stmts end
opt_stmts → stmt_list | ε
While parsing opt_stmts, if the lookahead symbol is not in FIRST(stmt_list), the ε-production is used. This choice is correct only if the lookahead symbol is end; any other symbol is an error, detected when end is matched.
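A sketch of this decision in C. The token codes are hypothetical, a single STMT token stands in for a whole statement, and the left-recursive stmt_list is written as a loop:

```c
#include <assert.h>

/* Hypothetical token codes; STMT stands for a complete statement. */
enum token { BEGIN, END, STMT, SEMI, DONE };

static const enum token *la;   /* lookahead: current token */
static int ok;                 /* cleared on the first syntax error */

static void match(enum token t)
{
    if (ok && *la == t) la++;
    else ok = 0;
}

/* stmt_list -> stmt_list ; stmt | stmt, written as: stmt { ; stmt } */
static void stmt_list(void)
{
    match(STMT);
    while (ok && *la == SEMI) {
        match(SEMI);
        match(STMT);
    }
}

/* opt_stmts -> stmt_list | epsilon */
static void opt_stmts(void)
{
    if (*la == STMT)          /* lookahead is in FIRST(stmt_list) */
        stmt_list();
    /* otherwise use opt_stmts -> epsilon and consume nothing; if the
       lookahead is not end, the error shows up when end is matched */
}

/* stmt -> begin opt_stmts end; input must be terminated by DONE */
int parse_block(const enum token *input)
{
    assert(input != 0);
    la = input;
    ok = 1;
    match(BEGIN);
    if (ok)
        opt_stmts();
    match(END);
    match(DONE);
    return ok;
}
```

Both begin end and begin STMT ; STMT end are accepted; begin ; end selects the ε-production and then fails at match(END).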

37 Designing a Predictive Parser
A predictive parser consists of a procedure for every nonterminal. Each procedure does two things:
1. It decides which production to use by looking at the lookahead symbol. The production with right side α is used if the lookahead symbol is in FIRST(α). If the lookahead symbol is not in the FIRST set of any other right side, a production with ε on the right side is used.
2. It uses a production by mimicking the right side: a nonterminal results in a call to the procedure for that nonterminal, and a token matching the lookahead symbol results in reading the next input token.
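A sketch of such a parser for the type grammar of the example above. The token codes are hypothetical, UPARROW stands for the pointer symbol ↑, and the input is an array of tokens ending in DONE rather than the output of a real lexical analyzer. Each switch case lists the FIRST set of one right side:

```c
#include <assert.h>

/* Hypothetical token codes; UPARROW is the pointer (up-arrow) symbol. */
enum tok { INTEGER, CHAR, NUM, DOTDOT, ARRAY, OF, LBRACK, RBRACK, UPARROW, ID, DONE };

static const enum tok *la;   /* lookahead: current token */
static int ok;               /* cleared on the first syntax error */

static void match(enum tok t)
{
    if (ok && *la == t) la++;   /* consume t, move to the next token */
    else ok = 0;
}

/* simple -> integer | char | num dotdot num */
static void simple(void)
{
    if (!ok) return;
    switch (*la) {
    case INTEGER: match(INTEGER); break;
    case CHAR:    match(CHAR); break;
    case NUM:     match(NUM); match(DOTDOT); match(NUM); break;
    default:      ok = 0;       /* lookahead not in FIRST(simple) */
    }
}

/* type -> simple | ^ id | array [ simple ] of type */
static void type(void)
{
    if (!ok) return;            /* no progress after an error: stop */
    switch (*la) {
    case INTEGER: case CHAR: case NUM:   /* FIRST(simple) */
        simple();
        break;
    case UPARROW:                        /* FIRST(^ id) */
        match(UPARROW); match(ID);
        break;
    case ARRAY:                          /* FIRST(array [ simple ] of type) */
        match(ARRAY); match(LBRACK); simple(); match(RBRACK);
        match(OF); type();
        break;
    default:
        ok = 0;
    }
}

/* Returns 1 if input (terminated by DONE) is a type. */
int parse_type(const enum tok *input)
{
    assert(input != 0);
    la = input;
    ok = 1;
    type();
    match(DONE);
    return ok;
}
```

The early return when ok is 0 matters: without it, a failed match leaves the lookahead unchanged and the recursive call in the array case could repeat forever.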

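38 Eliminating Left Recursion
expr → expr + term | term
–a recursive-descent procedure expr() would call itself before consuming any input, and loop forever
A left-recursive production A → A α | β can be replaced by
A → β R
R → α R | ε
For example, expr → expr + term | term becomes
expr → term rest
rest → + term rest | ε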

39 Eliminating Left Recursion (Cont.)

40 A Translator for Simple Expressions

41 Adapting the Translation Scheme
Eliminate left recursion, treating the semantic actions like grammar symbols:
A → A α | A β | γ
becomes
A → γ R
R → α R | β R | ε
Applied to the translation scheme:
expr → expr + term { print('+') } | expr - term { print('-') } | term
becomes
expr → term rest
rest → + term { print('+') } rest | - term { print('-') } rest | ε
term → 0 { print('0') }
...
term → 9 { print('9') }
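The adapted scheme turns directly into recursive procedures, one per nonterminal, with each action executed where it appears in the right side. A sketch assuming single-digit terms and no blanks; it appends to a caller-supplied buffer instead of printing, and to_postfix and the other names are hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *in;   /* next unread input character */
static char *out;        /* end of the output written so far */
static int ok;           /* cleared on a syntax error */

static void emit(char c) { *out++ = c; *out = '\0'; }   /* stands in for print */

static void match(char c)
{
    if (ok && *in == c) in++;
    else ok = 0;
}

/* term -> 0 { print('0') } | ... | 9 { print('9') } */
static void term(void)
{
    if (ok && isdigit((unsigned char)*in)) {
        emit(*in);
        in++;
    } else {
        ok = 0;
    }
}

/* rest -> + term { print('+') } rest | - term { print('-') } rest | epsilon */
static void rest(void)
{
    if (!ok) return;
    if (*in == '+') {
        match('+'); term(); emit('+'); rest();
    } else if (*in == '-') {
        match('-'); term(); emit('-'); rest();
    }
    /* any other lookahead: rest -> epsilon, do nothing */
}

/* expr -> term rest. Writes the postfix form into buf (assumed large
   enough) and returns 1 if the whole input was translated. */
int to_postfix(const char *src, char *buf)
{
    assert(src != NULL && buf != NULL);
    in = src;
    out = buf;
    *out = '\0';
    ok = 1;
    term();
    rest();
    return ok && *in == '\0';
}
```

to_postfix("9-5+2", buf) leaves 95-2+ in buf, the same result as the syntax-directed definition.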

42 Adapting the Translation Scheme (Cont.)

43 Procedures for the Nonterminals expr, term, and rest

44 Optimizing the Translator
Replacing tail recursion by iteration:

    void rest(void)
    {
    L:  if (lookahead == '+') {
            match('+'); term(); putchar('+'); goto L;
        } else if (lookahead == '-') {
            match('-'); term(); putchar('-'); goto L;
        }
        /* otherwise rest -> ε: do nothing */
    }
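Going one step further (a sketch, not the slides' own code): once rest is a loop, it can be folded into expr, so no procedure for rest is needed at all. Output goes to a caller-supplied buffer instead of putchar so the result can be checked; in, out, ok, emit, and to_postfix_iter are hypothetical names, and terms are assumed to be single digits with no blanks:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *in;   /* next unread input character */
static char *out;        /* end of the output written so far */
static int ok;           /* cleared on a syntax error */

static void emit(char c) { *out++ = c; *out = '\0'; }   /* stands in for putchar */

static void match(char c)
{
    if (ok && *in == c) in++;
    else ok = 0;
}

static void term(void)
{
    if (ok && isdigit((unsigned char)*in)) { emit(*in); in++; }
    else ok = 0;
}

/* expr -> term rest, with rest's loop folded into expr: each iteration
   handles one "+ term" or "- term" that a call of rest would have handled. */
static void expr(void)
{
    term();
    while (ok) {
        if (*in == '+') {
            match('+'); term(); emit('+');
        } else if (*in == '-') {
            match('-'); term(); emit('-');
        } else {
            break;          /* rest -> epsilon */
        }
    }
}

/* Returns 1 if all of src was translated; the postfix form is left in buf. */
int to_postfix_iter(const char *src, char *buf)
{
    assert(src != NULL && buf != NULL);
    in = src; out = buf; *out = '\0'; ok = 1;
    expr();
    return ok && *in == '\0';
}
```

The loop uses constant stack depth however many terms the input has, which is the point of removing the tail recursion.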

45 Optimizing the Translator (Cont.)

46 The Complete Program

47 The Complete Program (Cont.)

48 The Complete Program (Cont.)

49 2.6 Lexical Analysis
Removal of white space and comments
–blanks, tabs, newlines
Constants
–add a production for constants to the grammar for expressions
–create a token num for constants, so that 31 + 28 + 59 reaches the parser as num + num + num
Recognizing identifiers and keywords
–keywords are reserved
–begin /* keyword */ count = count + increment; /* id = id + id */ end
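A sketch of the first task, assuming comments are delimited by /* and */ as in the example above; skip_ws_and_comments is a hypothetical name, and an unterminated comment is simply skipped to the end of the string:

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the first character at or after p that is not a
   blank, tab, or newline and is not inside a C-style comment. */
const char *skip_ws_and_comments(const char *p)
{
    assert(p != NULL);
    for (;;) {
        if (*p == ' ' || *p == '\t' || *p == '\n') {
            p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;                                   /* step over the opening delimiter */
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;                               /* step over the closing delimiter */
        } else {
            return p;                                 /* first significant character */
        }
    }
}
```

Applied to the input "  /* keyword */\n\tcount", it returns a pointer to the c of count, so the parser never sees the comment.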

50 Interface to the Lexical Analyzer
A lexical analyzer reads characters from the input, groups them into lexemes, and passes the tokens formed by the lexemes, together with their attribute values, to the later stages of the compiler.

51 Interface to the Lexical Analyzer (Cont.)
In some situations, the lexical analyzer has to read some characters ahead before it can decide which token to return to the parser.
–e.g., to decide between > and >=
–characters read ahead that are not part of the token are pushed back
–this uses an input buffer and a pointer that keeps track of the next character
The lexical analyzer produces tokens and the parser consumes them. Usually, the parser calls the lexical analyzer to return tokens on demand.
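A sketch of the > / >= decision on an in-memory buffer, where pushing back the extra character amounts to not advancing the pointer past it. The token codes GT and GE and the function relop are hypothetical; a lexer reading from a FILE * could use the standard library's ungetc for the push-back instead:

```c
#include <assert.h>

enum { GT = 300, GE = 301 };   /* hypothetical token codes */

/* *pp points at a '>'. Decide between > and >= by looking one character
   ahead; advance *pp past the lexeme and return its token code. */
int relop(const char **pp)
{
    const char *p = *pp;
    assert(*p == '>');
    p++;                    /* consume '>' */
    if (*p == '=') {        /* the lookahead completes ">=" */
        *pp = p + 1;
        return GE;
    }
    *pp = p;                /* lookahead character is "pushed back": left unread */
    return GT;
}
```

On ">=5" it returns GE and leaves the pointer at 5; on ">5" it returns GT and leaves the pointer at the same 5, which the next call will read.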

52 A Lexical Analyzer
The lexical analyzer allows white space and numbers to appear within expressions.

53 A Lexical Analyzer (Cont.)
If a data structure cannot be returned, tokens and their attributes have to be passed separately.
–lexan() returns an integer encoding of the token; e.g., the integer 256 encodes num
–the variable tokenval holds the token's attribute value
–when lexan scans the integer 13, it returns token num (256) and sets tokenval to 13
–when it scans the identifier initial, it returns token id (259) and sets tokenval to p, the symbol-table index of initial
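A minimal sketch of this interface, handling only numbers and single-character tokens (identifiers need the symbol table of the next section). NUM = 256 follows the slide; DONE = 260 as the end-of-input code, lexan_init, and reading from a string are assumptions:

```c
#include <assert.h>
#include <ctype.h>

#define NUM  256   /* token code for num, as on the slide */
#define DONE 260   /* assumed code for end of input */

int tokenval;             /* attribute value of the last token returned */
static const char *src;   /* next unread input character */

void lexan_init(const char *s) { src = s; }

/* Return the next token: NUM (with its value in tokenval), DONE at the
   end of the input, or any other character as its own code. */
int lexan(void)
{
    assert(src != 0);
    while (*src == ' ' || *src == '\t' || *src == '\n')
        src++;                                  /* strip white space */
    if (isdigit((unsigned char)*src)) {
        tokenval = 0;
        while (isdigit((unsigned char)*src))
            tokenval = tokenval * 10 + (*src++ - '0');
        return NUM;
    }
    if (*src == '\0')
        return DONE;
    tokenval = 0;                               /* no attribute for operators (assumed) */
    return (unsigned char)*src++;
}
```

After lexan_init("31 + 28"), successive calls return NUM (tokenval 31), '+', NUM (tokenval 28), and DONE.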

54 A Lexical Analyzer (Cont.)
Allowing numbers within expressions requires a change in the grammar; in the expression grammar, the productions for factor become
factor → ( expr ) | num { print(num.value) }

55 A Lexical Analyzer (Cont.)

56 A Lexical Analyzer (Cont.)

57 2.7 Incorporating a Symbol Table
Information in the symbol table is collected by the analysis phases of the compiler (lexical analysis: identifiers; syntax analysis: types) and used by the synthesis phases (code generation).
The primary routines save and retrieve lexemes:
–insert(s, t): returns the index of a new entry for string s and token t
–lookup(s): returns the index of the entry for string s, or 0 if s is not found
The lexical analyzer uses lookup to determine whether there is an entry for a lexeme in the symbol table. If no entry exists, it uses insert to create one.

58 Handling Reserved Keywords
Reserved keywords are inserted into the symbol table before any input is read. For example, consider tokens div and mod with lexemes div and mod, respectively. We can initialize the symbol table with the calls
–insert("div", div);
–insert("mod", mod);
Any subsequent call lookup("div") finds the entry whose token is div, so div cannot be used as an identifier.
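A sketch of insert, lookup, and keyword initialization. Only the behavior of insert and lookup comes from the slides; the layout (entries pointing into one shared lexeme array), the capacities, DIV = 257 and MOD = 258 (assumed to continue the slides' numbering with num = 256 and id = 259), and the helper token_for are assumptions:

```c
#include <assert.h>
#include <string.h>

#define DIV 257   /* assumed token codes, continuing num = 256 */
#define MOD 258
#define ID  259   /* id = 259, as on the slides */

#define STRMAX 999   /* size of the lexeme array (assumed) */
#define SYMMAX 100   /* size of the symbol table (assumed) */

struct entry {
    const char *lexptr;   /* points into lexemes[] */
    int token;
};

static char lexemes[STRMAX];           /* lexemes stored back to back, each ending in '\0' */
static int lastchar = -1;              /* last used position in lexemes[] */
static struct entry symtable[SYMMAX];
static int lastentry = 0;              /* entry 0 is unused, so 0 means "not found" */

/* Return the index of the entry for s, or 0 if s is not found. */
int lookup(const char *s)
{
    for (int p = lastentry; p > 0; p--)
        if (strcmp(symtable[p].lexptr, s) == 0)
            return p;
    return 0;
}

/* Return the index of a new entry for string s and token tok. */
int insert(const char *s, int tok)
{
    int len = (int)strlen(s);
    assert(lastentry + 1 < SYMMAX);
    assert(lastchar + len + 1 < STRMAX);
    lastentry++;
    symtable[lastentry].token = tok;
    symtable[lastentry].lexptr = &lexemes[lastchar + 1];
    strcpy(&lexemes[lastchar + 1], s);
    lastchar += len + 1;
    return lastentry;
}

/* Reserve the keywords before any input is read. */
void init(void)
{
    insert("div", DIV);
    insert("mod", MOD);
}

/* What the lexical analyzer does with an identifier-like lexeme: reuse the
   existing entry (a keyword or an earlier identifier) or create a new one. */
int token_for(const char *lexeme)
{
    int p = lookup(lexeme);
    if (p == 0)
        p = insert(lexeme, ID);
    return symtable[p].token;
}
```

After init(), token_for("div") yields DIV rather than ID, which is exactly what makes div reserved; token_for("count") creates an ID entry the first time and reuses it afterwards.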

59 A Symbol-Table Implementation
(Figure: symbol-table layout; surviving labels include the token codes 257, 258, 259 and the type names integer and real.)

60 Pseudo-code for a Lexical Analyzer

61 Pseudo-code for a Lexical Analyzer (Cont.)

62 2.9 Putting the Techniques Together (infix → postfix translator)
An infix-to-postfix translator for expressions:
–expressions consist of numbers, identifiers, and the operators +, -, *, /, div, and mod
–id: a sequence of letters and digits beginning with a letter
–num: a sequence of digits
–tokens may be separated by blanks, tabs, and newlines (white space)
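A compact sketch of the whole translator in one file, keeping the lexer, emitter, and parser of the module list as separate functions. Assumptions: the input is a single expression in a string, output goes to a buffer with tokens separated by blanks, an ID token carries its text in lexbuf instead of a symbol-table index, and DONE = 260 marks the end of input:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { NUM = 256, DIV, MOD, ID, DONE };   /* 256..260; DONE = end of input (assumed) */

static const char *src;   /* lexer: next unread input character */
static int tokenval;      /* lexer: value of the current NUM */
static char lexbuf[64];   /* lexer: text of the current ID (the slides use a symbol-table index) */
static int lookahead;     /* parser: current token */
static int ok;            /* parser: cleared on the first syntax error */
static char *outbuf;      /* emitter: postfix output, tokens separated by blanks */

/* Lexical analysis: skip white space, then return NUM, ID, DIV, MOD, DONE,
   or a single character such as '+' or '(' as its own code. */
static int lexan(void)
{
    while (*src == ' ' || *src == '\t' || *src == '\n') src++;
    if (isdigit((unsigned char)*src)) {
        for (tokenval = 0; isdigit((unsigned char)*src); src++)
            tokenval = tokenval * 10 + (*src - '0');
        return NUM;
    }
    if (isalpha((unsigned char)*src)) {
        size_t n = 0;
        while (isalnum((unsigned char)*src)) {
            assert(n < sizeof lexbuf - 1);
            lexbuf[n++] = *src++;
        }
        lexbuf[n] = '\0';
        if (strcmp(lexbuf, "div") == 0) return DIV;   /* reserved keywords */
        if (strcmp(lexbuf, "mod") == 0) return MOD;
        return ID;
    }
    return *src == '\0' ? DONE : (unsigned char)*src++;
}

/* Emitter: emit(t, tval) appends the output for token t. */
static void emit(int t, int tval)
{
    char text[80];
    if (t == NUM)                   snprintf(text, sizeof text, "%d", tval);
    else if (t == ID)               snprintf(text, sizeof text, "%s", lexbuf);
    else if (t == DIV || t == MOD)  snprintf(text, sizeof text, "%s", t == DIV ? "div" : "mod");
    else                            snprintf(text, sizeof text, "%c", t);
    if (outbuf[0] != '\0')
        strcat(outbuf, " ");
    strcat(outbuf, text);
}

/* Parser */
static void expr(void);

static void match(int t)
{
    if (ok && lookahead == t) lookahead = lexan();
    else ok = 0;
}

/* factor -> ( expr ) | id | num */
static void factor(void)
{
    if (!ok) return;
    switch (lookahead) {
    case '(': match('('); expr(); match(')'); break;
    case NUM: emit(NUM, tokenval); match(NUM); break;
    case ID:  emit(ID, 0); match(ID); break;
    default:  ok = 0;
    }
}

/* term -> term * factor | term / factor | term div factor | term mod factor | factor */
static void term(void)
{
    factor();
    while (ok && (lookahead == '*' || lookahead == '/' || lookahead == DIV || lookahead == MOD)) {
        int t = lookahead;
        match(t); factor(); emit(t, 0);
    }
}

/* expr -> expr + term | expr - term | term (left recursion replaced by a loop) */
static void expr(void)
{
    term();
    while (ok && (lookahead == '+' || lookahead == '-')) {
        int t = lookahead;
        match(t); term(); emit(t, 0);
    }
}

/* Translate one expression into buf (assumed large enough); 1 = success. */
int translate(const char *s, char *buf)
{
    src = s; outbuf = buf; buf[0] = '\0'; ok = 1;
    lookahead = lexan();
    expr();
    return ok && lookahead == DONE;
}
```

For example, translate("a + b * 3 div 2", buf) produces a b 3 * 2 div +: div shares the precedence of * and /, and both are left associative.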

63 Infix → Postfix Translator (Cont.)

64 Modules of the Infix-to-Postfix Translator

65 The Lexical Analysis Module lexer.c

66 The Parser Module parser.c

67 The Emitter Module emitter.c
emit(t, tval)
–generates the output for token t with attribute value tval

68 The Symbol-Table Module symbol.c and init.c
Implements the symtable data structure and the functions
–lookup(s)
–insert(s, tok)

69 The Error Module error.c
Error reporting (with printf)

