
1 4. Formal Grammars and Parsing and Top-down Parsing. Chih-Hung Wang. Compilers.
References:
1. C. N. Fischer, R. K. Cytron, and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010.
2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000.
3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. (2nd Ed. 2006)

2 Introduction: Context-free Grammar. The syntax of programming language constructs can be described by a context-free grammar. Important aspects: (1) A grammar imposes a structure on the linear sequence of tokens that makes up the program. (2) Using techniques from the field of formal languages, a grammar can be used to construct a parser for it automatically. (3) Grammars help programmers write syntactically correct programs and provide answers to detailed questions about the syntax.

3 The role of the parser

4 Context-Free Grammars. A context-free grammar (CFG) is a compact, finite representation of a language, defined by the following four components: a finite terminal alphabet Σ; a finite non-terminal alphabet N; a start symbol S ∈ N; and a finite set of productions P.

5 A Simple Expression Grammar

6 Leftmost Derivations. A sentential form produced via a leftmost derivation is called a left sentential form. The production sequence discovered by a large class of parsers (the top-down parsers) is a leftmost derivation. Hence, these parsers are said to produce a leftmost parse. Example, f(V+V):
    E ⇒lm Prefix(E) ⇒lm f(E) ⇒lm f(V Tail) ⇒lm f(V+E) ⇒lm f(V+V Tail) ⇒lm f(V+V)

7 Rightmost Derivations. As a bottom-up parser discovers the productions that derive a given token sequence, it traces a rightmost derivation, but the productions are applied in reverse order. The result is called a rightmost or canonical parse. Example, f(V+V):
    E ⇒rm Prefix(E) ⇒rm Prefix(V Tail) ⇒rm Prefix(V+E) ⇒rm Prefix(V+V Tail) ⇒rm Prefix(V+V) ⇒rm f(V+V)

8 Parse Tree. It is rooted by the start symbol S. Each node is either a grammar symbol or ε (the empty string).

9 Properties of CFGs. The grammar may include useless symbols. The grammar may allow multiple, distinct derivations (parse trees) for some input string. The grammar may include strings that do not belong in the language, or the grammar may exclude strings that are in the language.

10 Ambiguity (1). Some grammars allow a derived string to have two or more different parse trees (and thus a nonunique structure). Example: (1) Expr → Expr - Expr, (2) | id. This grammar allows two different parse trees for id - id - id.
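The two parse trees for id - id - id are not just a formal curiosity: they compute different values. The following C sketch evaluates both groupings; the function names and the sample operands are illustrative assumptions, not part of the slides.

```c
#include <stdio.h>

/* Tree 1: (id - id) - id, the left-associative reading. */
int eval_left_tree(int a, int b, int c) {
    return (a - b) - c;
}

/* Tree 2: id - (id - id), the right-associative reading. */
int eval_right_tree(int a, int b, int c) {
    return a - (b - c);
}

/* Print both results so the difference between the trees is visible. */
void show_ambiguity(int a, int b, int c) {
    printf("(%d - %d) - %d = %d\n", a, b, c, eval_left_tree(a, b, c));
    printf("%d - (%d - %d) = %d\n", a, b, c, eval_right_tree(a, b, c));
}
```

Calling show_ambiguity(7, 3, 2) prints the two results side by side; they differ, so a compiler has to commit to one of the two trees.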

11 Ambiguity (2)

12 Parsers and Recognizers. Two approaches: A parser is considered top-down if it generates a parse tree by starting at the root of the tree and expanding it by applying productions in a depth-first manner. A bottom-up parser generates a parse tree by starting at the tree's leaves and working toward its root.

13 Two approaches to parsing: the deterministic left-to-right top-down LL method and the deterministic left-to-right bottom-up LR method. Left-to-right: the sequence of tokens is processed from left to right. Deterministic: no searching is involved; each token brings the parser one step closer to the goal of constructing the syntax tree.

14 Parsers (top-down)

15 Parsers (bottom-up)

16 Pre-order and post-order (1). The top-down method constructs the syntax tree in pre-order. The bottom-up method constructs the syntax tree in post-order.

17 Pre-order and post-order (2)

18 Principles of top-down parsing. The main task of a top-down parser is to choose the correct alternatives for known non-terminals.

19 Principles of bottom-up parsing. The main task of a bottom-up parser is to repeatedly find the first node all of whose children have already been constructed.

20 Creating a top-down parser manually. Recursive descent parsing: the simplest way, but it has its limitations.

21 Recursive descent parsing program (1)

22 Recursive descent parsing program (2)

23 Drawbacks. Three drawbacks: (1) there is still some searching through the alternatives; (2) the method often fails to produce a correct parser; (3) error handling leaves much to be desired.

24 Second problem (1). Example 1: Index_element will never be tried (input: IDENTIFIER '[').

25 Second problem (2). Example 2: The recognizer will not recognize ab.
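The grammar for this example is not preserved in the transcript. The sketch below assumes it is S → A 'a' 'b', A → 'a' | ε, the grammar shown on the FIRST/FOLLOW conflict slide; all routine names are hypothetical. It shows a naive recursive descent recognizer that commits to the first alternative that succeeds and therefore rejects ab, even though ab is in the language (take A → ε).

```c
#include <stddef.h>

/* Input cursor shared by the recognizer routines. */
static const char *input;
static size_t pos;

/* Match one expected character; advance only on success. */
static int token(char expected) {
    if (input[pos] == expected) { pos++; return 1; }
    return 0;
}

/* A -> 'a' | epsilon: commit to the first alternative that succeeds. */
static int A(void) {
    if (token('a')) return 1;   /* alternative 1: 'a' */
    return 1;                   /* alternative 2: empty */
}

/* S -> A 'a' 'b' (assumed grammar, see lead-in) */
static int S(void) {
    return A() && token('a') && token('b');
}

/* Returns 1 if the whole string is recognized as an S, 0 otherwise. */
int recognize(const char *text) {
    input = text;
    pos = 0;
    return S() && input[pos] == '\0';
}
```

On ab, A() consumes the a, so the following token('a') meets b and the recognizer fails. Nothing ever goes back to retry A with its empty alternative.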

26 Second problem (3). Example 3: Recursive descent parsers cannot handle left-recursive grammars.

27 Creating a top-down parser automatically. The principles of constructing a top-down parser automatically derive from those of writing one by hand, by applying precomputation. Grammars that allow such a top-down parser to be constructed are called LL(1) grammars.

28 LL(1) parsing. FIRST set: the set of first tokens produced by each alternative in the grammar. We have to precompute the FIRST sets of all non-terminals. The FIRST sets of the terminals are obvious. Finding FIRST(α) is trivial when α starts with a terminal. FIRST(N) is the union of the FIRST sets of its alternatives. FIRST(α) = {a ∈ Σ | α ⇒* aβ}
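A minimal C sketch of the FIRST precomputation as a closure (fixed-point) algorithm. It is not the deck's own code: the grammar table, the bitmask encoding, and all identifiers are assumptions, with the grammar modelled on the expression / rest_expression example used in these slides. The sketch records ε in a separate nullable flag instead of as a member of the FIRST set.

```c
#include <stddef.h>

/* Terminals come first so a terminal's index doubles as its bit position. */
enum { T_IDENT, T_PLUS, T_LPAR, T_RPAR, N_TERMINALS,
       EXPRESSION = N_TERMINALS, TERM, REST_EXPR, N_SYMBOLS };

#define MAX_RHS 3
struct production { int lhs; int len; int rhs[MAX_RHS]; };

/* Illustrative grammar (an assumption for this sketch). */
static const struct production grammar[] = {
    { EXPRESSION, 2, { TERM, REST_EXPR } },
    { TERM,       1, { T_IDENT } },
    { TERM,       3, { T_LPAR, EXPRESSION, T_RPAR } },
    { REST_EXPR,  2, { T_PLUS, EXPRESSION } },
    { REST_EXPR,  0, { 0 } },                  /* empty alternative */
};
#define N_PRODS (sizeof grammar / sizeof grammar[0])

unsigned first[N_SYMBOLS];   /* bit t set: terminal t can start the symbol */
int nullable[N_SYMBOLS];     /* 1 if the symbol can derive the empty string */

/* Apply the inference rules until no set changes any more. */
void compute_first(void) {
    int changed = 1;
    for (int s = 0; s < N_SYMBOLS; s++) {
        first[s] = (s < N_TERMINALS) ? 1u << s : 0u;   /* FIRST(a) = {a} */
        nullable[s] = 0;
    }
    while (changed) {
        changed = 0;
        for (size_t p = 0; p < N_PRODS; p++) {
            const struct production *pr = &grammar[p];
            unsigned add = 0;
            int i = 0;
            /* Add FIRST of each leading symbol while the prefix is nullable. */
            for (; i < pr->len; i++) {
                add |= first[pr->rhs[i]];
                if (!nullable[pr->rhs[i]]) break;
            }
            if ((first[pr->lhs] | add) != first[pr->lhs]) {
                first[pr->lhs] |= add;
                changed = 1;
            }
            if (i == pr->len && !nullable[pr->lhs]) {  /* whole RHS nullable */
                nullable[pr->lhs] = 1;
                changed = 1;
            }
        }
    }
}
```

After compute_first() returns, first[EXPRESSION] holds the bits for IDENT and '(' and nullable[REST_EXPR] is 1. Bitmasks keep the set union a single OR, which is enough for a grammar with at most 32 terminals.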

29 Predictive recursive descent parser. The FIRST sets can be used in the construction of a predictive parser, so called because it predicts the presence of a given alternative without trying to find out if it is there.

30 Closure algorithm for computing the FIRST set (1): data definitions

31 Closure algorithm for computing the FIRST set (2): initializations

32 Closure algorithm for computing the FIRST set (3): inference rules

33 FIRST sets example (1): grammar

34 FIRST sets example (2): the initial FIRST sets

35 FIRST sets example (3): the final FIRST sets

36 Another Example of First Set

37 Another Example of First Set (II)

38 Algorithms of Computing First(α)

39 The predictive parser (1)

40 The predictive parser (2)

41 Practice. Find the FIRST sets of all alternatives of the following grammar:
    E → T E'
    E' → + T E' | ε
    T → F T'
    T' → * F T' | ε
    F → ( E ) | id

42 Nullable alternatives. A complication arises with the case label for the empty alternative (e.g., rest_expression). Since it does not itself start with any token, how can we decide whether it is the correct alternative?

43 FOLLOW sets. The FOLLOW set of a non-terminal N is the set of tokens that can immediately follow N. LL(1) parser: 'LL' because the parser works from Left to right, identifying the nodes in what is called Leftmost derivation order; '(1)' because all choices are based on a one-token look-ahead. FOLLOW(A) = {b ∈ Σ | S ⇒+ αAbβ}

44 Closure algorithm for computing the FOLLOW sets

45 The FIRST and FOLLOW sets

46 Another Example of Follow Set

47 Another Example of Follow Set (II)

48 Algorithm of Follow(A)

49 Recall the predictive parser
    rest_expression → '+' expression | ε
    FIRST(rest_expr) = {'+', ε}
    FOLLOW(rest_expr) = {EOF, ')'}

    void rest_expression(void) {
        switch (Token.class) {
        case '+': token('+'); expression(); break;  /* FIRST of the first alternative */
        case EOF: case ')': break;                  /* FOLLOW set: take the empty alternative */
        default: error();
        }
    }

50 LL(1) conflicts. Example: the code

51 LL(1) conflicts: FIRST/FIRST conflict.
    term → IDENTIFIER | IDENTIFIER '[' expression ']' | '(' expression ')'

52 LL(1) conflicts: FIRST/FOLLOW conflict.
    Rule               FIRST set     FOLLOW set
    S → A 'a' 'b'      {'a'}         {}
    A → 'a' | ε        {'a', ε}      {'a'}

53 LL(1) conflicts: left recursion.
    expression → expression '-' term | term
Look-ahead token: the LL(1) method predicts the alternative A_k for a non-terminal N when the look-ahead token is in FIRST(A_k) ∪ (if A_k is nullable then FOLLOW(N)). LL(1) grammar: no FIRST/FIRST conflicts; no FIRST/FOLLOW conflicts; no multiple nullable alternatives (no non-terminal can have more than one nullable alternative).
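The three LL(1) conditions can be checked mechanically once FIRST and FOLLOW are known. The C sketch below does this for the alternatives of a single non-terminal. The bitmask token encoding, the struct layout, and the function names are assumptions made for illustration; the FIRST and FOLLOW values are entered by hand rather than computed.

```c
/* Token sets as bitmasks; this encoding is an assumption of the sketch. */
enum { TOK_A = 1, TOK_PLUS = 2, TOK_RPAR = 4, TOK_EOF = 8 };

struct alternative {
    unsigned first;   /* FIRST set of the alternative, without epsilon */
    int nullable;     /* 1 if the alternative can derive epsilon */
};

/* Returns 1 if the alternatives of one non-terminal satisfy the LL(1)
   conditions, given the FOLLOW set of that non-terminal. */
int is_ll1(const struct alternative *alts, int n, unsigned follow) {
    unsigned seen = 0;        /* union of the prediction sets so far */
    int nullable_count = 0;
    for (int i = 0; i < n; i++) {
        unsigned predict = alts[i].first;
        if (alts[i].nullable) {
            predict |= follow;                   /* FOLLOW(N) predicts it too */
            if (++nullable_count > 1) return 0;  /* multiple nullable alternatives */
        }
        if (predict & seen) return 0;  /* FIRST/FIRST or FIRST/FOLLOW conflict */
        seen |= predict;
    }
    return 1;
}

/* A -> 'a' | epsilon with FOLLOW(A) = {'a'}: FIRST/FOLLOW conflict. */
int check_conflict_example(void) {
    const struct alternative alts[] = { { TOK_A, 0 }, { 0, 1 } };
    return is_ll1(alts, 2, TOK_A);
}

/* rest_expression -> '+' expression | epsilon with FOLLOW = {EOF, ')'}. */
int check_rest_expression_example(void) {
    const struct alternative alts[] = { { TOK_PLUS, 0 }, { 0, 1 } };
    return is_ll1(alts, 2, TOK_EOF | TOK_RPAR);
}
```

The first example reports a conflict because the token 'a' predicts both alternatives of A. The second passes because '+' is not in FOLLOW(rest_expression).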

54 Solving the LL(1) conflicts. Two options: use a stronger parser, or make the grammar LL(1).

55 Making a grammar LL(1). Manual labour: rewrite the grammar and adjust the semantic actions. Three rewrite methods: left factoring, substitution, and left-recursion removal.

56 Left-factoring.
    term → IDENTIFIER | IDENTIFIER '[' expression ']'
Factor out the common prefix:
    term → IDENTIFIER after_identifier
    after_identifier → ε | '[' expression ']'
    '[' ∉ FOLLOW(after_identifier)

57 Substitution.
    A → a | B c | ε
    S → p A q
Replace the non-terminal by its alternatives:
    S → p a q | p B c q | p q
Example:
    S → A 'a' 'b'
    A → 'a' | ε
Replace the non-terminal by its alternatives:
    S → 'a' 'a' 'b' | 'a' 'b'

58 Left-recursion removal. Three types of left recursion:
    Direct left recursion: N → N α | …
    Indirect left recursion (chain structure): N → A …, A → B …, …, Z → N …
    Hidden left recursion: N → α N | … (α can produce ε)

59 Left-recursion removal.
    N → N α | β
is replaced by
    N → β M
    M → α M | ε
(N derives β α α α …). Example:
    expression → expression '-' term | term
is replaced by
    expression → term expression_tail_option
    expression_tail_option → '-' term expression_tail_option | ε

60 Practice. Make the following grammar LL(1):
    expression → expression '+' term | expression '-' term | term
    term → term '*' factor | term '/' factor | factor
    factor → '(' expression ')' | func-call | identifier | constant
    func-call → identifier '(' expr-list? ')'
    expr-list → expression (',' expression)*

61 Answers.
Substitution:
    F → '(' E ')' | ID '(' expr-list? ')' | ID | constant
Left factoring:
    E → E ('+' | '-') T | T
    T → T ('*' | '/') F | F
    F → '(' E ')' | ID ('(' expr-list? ')')? | constant
Left-recursion removal:
    E → T (('+' | '-') T)*
    T → F (('*' | '/') F)*

62 Undoing the semantic effects of grammar transformations. While it is often possible to transform our grammar into a new grammar that is acceptable to a parser generator and that generates the same language, the new grammar usually assigns a different structure to strings in the language than our original grammar did. Fortunately, in many cases we are not really interested in the structure but rather in the semantics implied by it.
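One common way to recover the original semantics is to let the parser carry a running value through the repetition. The C sketch below parses the non-left-recursive form expression → term ('-' term)* but still computes the left-associative result that the left-recursive grammar implies. Single-digit operands and all function names are simplifying assumptions of this sketch.

```c
#include <ctype.h>

static const char *cursor;   /* next unread character */

/* term -> DIGIT (single digits keep the sketch short) */
static int term(int *value) {
    if (!isdigit((unsigned char)*cursor)) return 0;
    *value = *cursor++ - '0';
    return 1;
}

/* expression -> term ( '-' term )*
   The loop folds each new term into the running value, which
   reproduces the left-associative meaning of the original grammar. */
static int expression(int *value) {
    int right;
    if (!term(value)) return 0;
    while (*cursor == '-') {
        cursor++;
        if (!term(&right)) return 0;
        *value = *value - right;
    }
    return 1;
}

/* Parse a whole string; returns 1 on success and stores the value
   in *result, or returns 0 on a syntax error. */
int evaluate(const char *text, int *result) {
    cursor = text;
    return expression(result) && *cursor == '\0';
}
```

For the input 9-3-2 the loop computes (9 - 3) - 2, not 9 - (3 - 2), so the transformed grammar keeps the intended meaning even though its parse tree has a different shape.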

63 Semantics: non-left-recursive equivalent

64 Automatic conflict resolution (1). There are two ways in which LL parsers can be strengthened. (1) By increasing the look-ahead: distinguishing alternatives not by their first token but by their first two tokens is called LL(2). Disadvantage: the parser code can get much bigger. (2) By allowing dynamic conflict resolvers: when a conflict arises during parsing, some conditions are evaluated to resolve it. The parser generator LLgen requires a conflict resolver to be placed on the first of two conflicting alternatives.

65 Automatic conflict resolution (2). If-else statement in C: for else_tail_option, both the FIRST set and the FOLLOW set contain the token 'else'. Conflict resolver.

66 The LL(1) push-down automaton. Transition table for an LL(1) parser.

67 Push-down automaton (PDA). Types of moves. Prediction move: the top of the prediction stack is a non-terminal N; N is removed from the stack, the prediction table is looked up, and the alternative of N is pushed onto the prediction stack. Match move: the top of the prediction stack is a terminal. Termination: parsing terminates when the prediction stack is exhausted.
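The prediction and match moves can be sketched as a small table-driven parser in C. The grammar is the expression grammar of the PDA example slides; which look-ahead column each table entry belongs to was worked out from that grammar and is an assumption, as are the encoding and all identifiers.

```c
enum symbol { IDENT, PLUS, LPAR, RPAR, END, N_TOKENS,        /* terminals */
              INPUT = N_TOKENS, EXPRESSION, TERM, REST_EXPR  /* non-terminals */
};

#define NONE (-1)       /* empty table entry: syntax error */
#define MAX_STACK 64

/* Right-hand sides, each terminated by NONE. */
static const int rhs[][4] = {
    { EXPRESSION, END, NONE },         /* 0: input -> expression EOF */
    { TERM, REST_EXPR, NONE },         /* 1: expression -> term rest_expr */
    { IDENT, NONE },                   /* 2: term -> IDENT */
    { LPAR, EXPRESSION, RPAR, NONE },  /* 3: term -> '(' expression ')' */
    { PLUS, EXPRESSION, NONE },        /* 4: rest_expr -> '+' expression */
    { NONE },                          /* 5: rest_expr -> epsilon */
};

/* Transition table: table[non-terminal][look-ahead] = production index. */
static const int table[4][N_TOKENS] = {
    /*                 IDENT  PLUS  LPAR  RPAR  END  */
    /* input      */ { 0,     NONE, 0,    NONE, NONE },
    /* expression */ { 1,     NONE, 1,    NONE, NONE },
    /* term       */ { 2,     NONE, 3,    NONE, NONE },
    /* rest_expr  */ { NONE,  4,    NONE, 5,    5    },
};

/* Returns 1 if the END-terminated sequence of valid token codes is accepted. */
int ll1_parse(const int *tokens) {
    int stack[MAX_STACK];
    int top = 0;
    stack[top++] = INPUT;
    while (top > 0) {
        int sym = stack[--top];
        if (sym < N_TOKENS) {                  /* match move */
            if (*tokens != sym) return 0;
            if (sym != END) tokens++;
        } else {                               /* prediction move */
            int prod = table[sym - INPUT][*tokens];
            int len = 0;
            if (prod == NONE) return 0;
            while (rhs[prod][len] != NONE) len++;
            if (top + len > MAX_STACK) return 0;
            while (len > 0) stack[top++] = rhs[prod][--len];  /* push reversed */
        }
    }
    return 1;                                  /* prediction stack exhausted */
}

/* Token sequence for: aap + ( noot + mies ) EOF */
int parse_slide_example(void) {
    const int tokens[] = { IDENT, PLUS, LPAR, IDENT, PLUS, IDENT, RPAR, END };
    return ll1_parse(tokens);
}
```

The right-hand side is pushed in reverse so that its leftmost symbol ends up on top of the prediction stack, which is what makes the parse a leftmost derivation.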

68 Prediction move in an LL(1) PDA

69 Match move in an LL(1) PDA

70 Predictive parsing with an LL(1) PDA

71 PDA example (1). Input: aap + ( noot + mies ) EOF. Prediction stack: input. Transition table (state = top of stack; look-ahead tokens: IDENT, +, (, ), EOF):
    input: expression EOF
    expression: term rest-expr
    term: IDENT | ( expression )
    rest-expr: + expression | ε

72 PDA example (2). Input: aap + ( noot + mies ) EOF. Prediction stack: input. Replace the non-terminal by its transition entry.

73 PDA example (3). Input: aap + ( noot + mies ) EOF. Prediction stack: expression EOF.

74 PDA example (4). Input: aap + ( noot + mies ) EOF. Prediction stack: expression EOF. Replace the non-terminal by its transition entry.

75 PDA example (5). Input: aap + ( noot + mies ) EOF. Prediction stack: term rest-expr EOF.

76 PDA example (6). Input: aap + ( noot + mies ) EOF. Prediction stack: term rest-expr EOF. Replace the non-terminal by its transition entry.

77 PDA example (7). Please continue! Example of parsing (i+i)+i

78 Another Example (1)

79 Another Example (2)

80 LL Parser Table

81 Trace of an LL(1) Parse

82 Obtaining LL(1) Grammars. Most LL(1) prediction conflicts can be grouped into two categories: common prefix and left recursion.

83 Common Prefixes. Factoring method.

84 Algorithm of Factoring

85 Left Recursion

86 Algorithm of Eliminating Left Recursion

87 LLgen. LLgen is part of the Amsterdam Compiler Kit. It takes an LL(1) grammar plus semantic actions in C and generates a recursive descent parser. The non-terminals in the grammar can have parameters, and rules can have local variables, both again expressed in C. LLgen features: repetition operators, advanced error handling, parameter passing, control over semantic actions, and dynamic conflict resolvers.

88 LLgen. Start from an LR(1) grammar, make the grammar LL(1), and use repetition operators:
    %token DIGIT;
    main : [line]+ ;
    line : expr '\n' ;
    expr : term [ '+' term ]* ;
    term : factor [ '*' factor ]* ;
    factor : '(' expr ')' | DIGIT ;
Then add semantic actions: attach parameters to grammar rules and insert C code between the symbols.

89 Minimal non-left-recursive grammar for expressions

90 LLgen code for a parser: grammar and semantics

91 LLgen code for a parser. The code from the previous page resides in a file called parser.g. LLgen converts the file to one called parser.c, which contains a recursive descent parser.

92 LLgen interface to lexical analyzer

93 LLgen interface to back-end. LLgen handles syntax errors by inserting missing tokens and deleting unexpected tokens. LLmessage() is invoked to notify the lexical analyzer.

