
1 CS308 Compiler Principles Syntax Analyzer Fan Wu Department of Computer Science and Engineering Shanghai Jiao Tong University Fall 2012

2 Compiler Principles Syntax Analyzer The syntax analyzer creates the syntactic structure of the given source program; this structure is usually a parse tree. The syntax analyzer is also known as the parser. The syntax of a program is described by a context-free grammar (CFG); we will use BNF (Backus-Naur Form) notation to describe CFGs. The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar: if it does, the parser creates the parse tree of that program; otherwise the parser gives error messages. A context-free grammar gives a precise syntactic specification of a programming language; designing the grammar is an initial phase of designing a compiler; and a grammar can be directly converted into a parser by some tools.

3 Compiler Principles Parser / Syntax Analyzer [Figure: source program → Lexical Analyzer → (token / get next token) → Parser → parse tree] The parser works on a stream of tokens; the smallest item is a token. The parser obtains a string of tokens from the lexical analyzer and verifies that the string of token names can be generated by the grammar of the source language.

4 Compiler Principles Parsers Cont’d We categorize parsers into two groups: 1. Top-Down Parsers – the parse tree is created from top to bottom, starting from the root. 2. Bottom-Up Parsers – the parse tree is created from bottom to top, starting from the leaves. Both scan the input from left to right (one symbol at a time). Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars: LL for top-down parsing, LR for bottom-up parsing.

5 Compiler Principles Context-Free Grammars Recursive structures of a programming language are defined by a context-free grammar. A context-free grammar consists of: a finite set of terminals (in our case, the set of tokens); a finite set of non-terminals (syntactic variables); a finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string); and a start symbol (one of the non-terminals). Example:
E → E + E | E – E | E * E | E / E | - E
E → ( E )
E → id

6 Compiler Principles Derivations E ⇒ E+E means E derives E+E: we can replace E by E+E because we have a production rule E → E+E in our grammar. E ⇒ E+E ⇒ id+E ⇒ id+id: a sequence of replacements of non-terminal symbols is called a derivation of id+id from E. In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols. α1 ⇒ α2 ⇒ ... ⇒ αn (αn is derived from α1, or α1 derives αn). Notation: ⇒ derives in one step, ⇒* derives in zero or more steps, ⇒+ derives in one or more steps.

7 Compiler Principles CFG - Terminology L(G) is the language of grammar G (the language generated by G); it is a set of sentences. A sentence of L(G) is a string of terminal symbols of G. If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒* ω, where ω is a string of terminals of G. If G is a context-free grammar, L(G) is a context-free language. Two grammars are equivalent if they generate the same language. For S ⇒* α: if α contains non-terminals, it is called a sentential form of G; if α does not contain non-terminals, it is called a sentence of G.

8 Compiler Principles Derivation Example E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id) OR E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id). At each derivation step we can choose any of the non-terminals in the sentential form of G for the replacement. If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

9 Compiler Principles Left-Most and Right-Most Derivations Left-most derivation: E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id). Right-most derivation: E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id). Top-down parsers try to find the left-most derivation of the given source program; bottom-up parsers try to find the right-most derivation of the given source program in reverse order.

10 Compiler Principles Parse Tree A parse tree is a graphical representation of a derivation. Inner nodes of a parse tree are non-terminal symbols; the leaves are terminal symbols. [Figure: the parse tree grown step by step for the derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)]

11 Compiler Principles Ambiguity A grammar that produces more than one parse tree for some sentence is an ambiguous grammar. Example: E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id and E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id are two different derivations of id+id*id. [Figure: the two corresponding parse trees]

12 Compiler Principles Ambiguity Cont’d For most parsers, the grammar must be unambiguous: an unambiguous grammar gives a unique parse tree for each sentence. We should eliminate ambiguity in the grammar during the design phase of the compiler: an ambiguous grammar should be rewritten to eliminate the ambiguity. To disambiguate a grammar, we choose one of the parse trees that the ambiguous grammar generates for a sentence and rewrite the grammar to restrict it to that choice.

13 Compiler Principles Ambiguity Elimination Cont’d Ambiguous grammars (ambiguous because of their operators) can be disambiguated according to precedence and associativity rules. Example: E → E+E | E*E | E^E | id | (E), with precedence ^ (highest, right to left), * (left to right), + (lowest, left to right), is disambiguated into:
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)

14 Compiler Principles Ambiguity Cont’d stmt → if expr then stmt | if expr then stmt else stmt | otherstmts. The sentence if E1 then if E2 then S1 else S2 has two parse trees: one in which the else part belongs to the outer if (matching E1), and one in which it belongs to the inner if (matching E2). [Figure: the two parse trees]

15 Compiler Principles Ambiguity Elimination Cont’d We prefer the parse tree in which the else matches the closest if, so we disambiguate our grammar to reflect this choice. The unambiguous grammar is:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt

16 Compiler Principles Left Recursion A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α. Top-down parsing techniques cannot handle left-recursive grammars, so we have to convert a left-recursive grammar into an equivalent grammar which is not left-recursive. The left-recursion may appear in a single step of the derivation (immediate left-recursion), or it may appear in more than one step of the derivation.

17 Compiler Principles Immediate Left-Recursion Elimination
A → Aα | β, where β does not start with A
⇒ eliminate immediate left recursion:
A → βA'
A' → αA' | ε (an equivalent grammar)
In general:
A → Aα1 | ... | Aαm | β1 | ... | βn, where β1 ... βn do not start with A
⇒ eliminate immediate left recursion:
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε (an equivalent grammar)

18 Compiler Principles Immediate Left-Recursion Elimination Example
E → E+T | T
T → T*F | F
F → id | (E)
⇒ eliminate immediate left recursion:
E → T E'
E' → +T E' | ε
T → F T'
T' → *F T' | ε
F → id | (E)

19 Compiler Principles Non-Immediate Left-Recursion Just eliminating the immediate left-recursion is not enough to get a left-recursion-free grammar. S → Aa | b, A → Sc | d. This grammar is still left-recursive: S ⇒ Aa ⇒ Sca or A ⇒ Sc ⇒ Aac causes a left-recursion. We have to eliminate all left-recursion from our grammar.

20 Compiler Principles Algorithm for Eliminating Left-Recursion
- Arrange the non-terminals in some order: A1 ... An
- for i from 1 to n do {
    for j from 1 to i-1 do {
      replace each production Ai → Aj γ by Ai → δ1 γ | ... | δk γ, where Aj → δ1 | ... | δk are the current Aj-productions
    }
    eliminate the immediate left-recursion among the Ai-productions
  }
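A minimal Python sketch of this algorithm (not from the slides): it assumes a grammar represented as a dict mapping each non-terminal to a list of alternative bodies, each body being a list of symbols and the empty list standing for ε; the helper names are my own.

```python
def eliminate_immediate(A, prods):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as on slide 17."""
    recursive = [body[1:] for body in prods[A] if body[:1] == [A]]
    others = [body for body in prods[A] if body[:1] != [A]]
    if not recursive:
        return
    A2 = A + "'"                                            # fresh non-terminal A'
    prods[A] = [body + [A2] for body in others]             # A  -> b1 A' | ... | bn A'
    prods[A2] = [body + [A2] for body in recursive] + [[]]  # A' -> a1 A' | ... | eps

def eliminate_left_recursion(prods, order):
    """The algorithm of this slide; 'order' fixes the ordering A1 ... An."""
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            new = []
            for body in prods[Ai]:
                if body[:1] == [Aj]:                        # Ai -> Aj gamma
                    new += [delta + body[1:] for delta in prods[Aj]]
                else:
                    new.append(body)
            prods[Ai] = new
        eliminate_immediate(Ai, prods)

# Example from slide 21: S -> Aa | b,  A -> Ac | Sd | f
g = {'S': [['A', 'a'], ['b']], 'A': [['A', 'c'], ['S', 'd'], ['f']]}
eliminate_left_recursion(g, ['S', 'A'])
for head, bodies in g.items():
    print(head, '->', ' | '.join(' '.join(b) or 'eps' for b in bodies))
```

Running it reproduces the result of slide 21: S → Aa | b, A → bdA' | fA', A' → cA' | adA' | ε.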

21 Compiler Principles Example for Eliminating Left-Recursion
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S, A
for S: we do not enter the inner loop; there is no immediate left recursion in S.
for A: replace A → Sd with A → Aad | bd, so we have A → Ac | Aad | bd | f; then eliminate the immediate left-recursion in A: A → bdA' | fA', A' → cA' | adA' | ε.
The resulting equivalent grammar, which is not left-recursive, is:
S → Aa | b
A → bdA' | fA'
A' → cA' | adA' | ε

22 Compiler Principles Example for Eliminating Left-Recursion Cont’d
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: A, S
for A: eliminate the immediate left-recursion in A: A → SdA' | fA', A' → cA' | ε.
for S: replace S → Aa with S → SdA'a | fA'a, so we have S → SdA'a | fA'a | b; then eliminate the immediate left-recursion in S: S → fA'aS' | bS', S' → dA'aS' | ε.
The resulting equivalent grammar, which is not left-recursive, is:
S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε

23 Compiler Principles Left-Factoring A predictive parser (a top-down parser without backtracking) needs the grammar to be left-factored: grammar → a new equivalent grammar suitable for predictive parsing. Example: stmt → if expr then stmt else stmt | if expr then stmt. When we see if, we cannot know which production rule to choose to rewrite stmt in the derivation.

24 Compiler Principles Left-Factoring Cont’d In general, for A → αβ1 | αβ2, where α is non-empty and the first symbols of β1 and β2 (if they have one) are different: when processing α we cannot know whether to expand A to αβ1 or to αβ2. But if we rewrite the grammar as A → αA', A' → β1 | β2, we can immediately expand A to αA'.

25 Compiler Principles Algorithm for Left-Factoring For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, say
A → αβ1 | ... | αβn | γ1 | ... | γm, where α is the longest common prefix,
convert it into
A → αA' | γ1 | ... | γm
A' → β1 | ... | βn
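A small Python sketch of one left-factoring pass (the helper names are my own, not from the slides), using the same dict representation of a grammar as in the previous sketch. A full left-factoring, like Example 2 on slide 27, may need the pass to be repeated until nothing changes, which is what the returned flag is for.

```python
from itertools import count

def left_factor(A, prods, fresh=count(1)):
    """Apply the left-factoring rule of this slide once to non-terminal A."""
    by_first = {}
    for body in prods[A]:                # group alternatives by their first symbol
        by_first.setdefault(body[0] if body else None, []).append(body)
    new_bodies, changed = [], False
    for first, group in by_first.items():
        if first is None or len(group) == 1:
            new_bodies += group          # nothing to factor in this group
            continue
        prefix = group[0]                # longest common prefix of the group
        for body in group[1:]:
            k = 0
            while k < min(len(prefix), len(body)) and prefix[k] == body[k]:
                k += 1
            prefix = prefix[:k]
        A2 = A + "'" * next(fresh)       # fresh name A', A'', ...
        prods[A2] = [body[len(prefix):] for body in group]   # A' -> beta1 | ... | betan
        new_bodies.append(list(prefix) + [A2])               # A  -> alpha A'
        changed = True
    prods[A] = new_bodies
    return changed

# Example 1 from slide 26: A -> abB | aB | cdg | cdeB | cdfB
g = {'A': [['a','b','B'], ['a','B'], ['c','d','g'], ['c','d','e','B'], ['c','d','f','B']]}
left_factor('A', g)
for head, bodies in g.items():
    print(head, '->', ' | '.join(' '.join(b) or 'eps' for b in bodies))
```

The output matches slide 26: A → aA' | cdA'', A' → bB | B, A'' → g | eB | fB.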

26 Compiler Principles Left-Factoring – Example1
A → abB | aB | cdg | cdeB | cdfB
⇒ A → aA' | cdg | cdeB | cdfB
   A' → bB | B
⇒ A → aA' | cdA''
   A' → bB | B
   A'' → g | eB | fB

27 Compiler Principles Left-Factoring – Example2
A → ad | a | ab | abc | b
⇒ A → aA' | b
   A' → d | ε | b | bc
⇒ A → aA' | b
   A' → d | ε | bA''
   A'' → ε | c

28 Compiler Principles CFG vs. Regular Expression A grammar is a more powerful notation than a regular expression: every language described by a regular expression can also be described by a grammar. For each state i of the FA, create a non-terminal Ai. If state i has a transition to state j on input a (including ε), add the production Ai → aAj. If i is an accepting state, add Ai → ε. If i is the start state, make Ai the start symbol of the grammar. Example, for the DFA of (a|b)*ab:
A0 → bA0 | aA1
A1 → aA1 | bA2
A2 → aA1 | bA0 | ε
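A tiny Python sketch of this state-to-non-terminal conversion; the encoding of the transition table is my own, and the DFA below is a 3-state DFA for (a|b)*ab consistent with the productions on this slide.

```python
def fa_to_grammar(transitions, start, accepting):
    """Apply the rules of this slide: Ai -> a Aj for each transition, Ai -> eps for accepting i."""
    prods = {}
    for (i, a), j in transitions.items():
        prods.setdefault(f"A{i}", []).append(f"{a} A{j}")
    for i in accepting:
        prods.setdefault(f"A{i}", []).append("eps")
    return f"A{start}", prods

# 3-state DFA for (a|b)*ab; state 0 is the start state, state 2 is accepting
dfa = {(0, 'b'): 0, (0, 'a'): 1, (1, 'a'): 1, (1, 'b'): 2, (2, 'a'): 1, (2, 'b'): 0}
start, prods = fa_to_grammar(dfa, 0, {2})
print("start symbol:", start)
for head, bodies in prods.items():
    print(head, '->', ' | '.join(bodies))
```

This prints the same productions as on the slide: A0 → b A0 | a A1, A1 → a A1 | b A2, A2 → a A1 | b A0 | eps.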

29 Compiler Principles CFG Vs. Regular Expression Cont’d A language described by a grammar may not be describable by any regular expression, because regular expressions / finite automata cannot count. Example: the language L = {a^n b^n | n ≥ 1} can be written as the grammar S → aSb | ab, but it cannot be expressed by a regular expression.

30 Compiler Principles Non-Context-Free Language Constructs There are some language constructs in programming languages that are not context-free. L1 = { ωcω | ω is in (a|b)* } is not context-free: it abstracts declaring an identifier and later checking whether it has been declared. We cannot do this with a context-free grammar; we need the semantic analyzer (which is not context-free). L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 } is not context-free: it abstracts declaring two functions (one with n parameters, the other with m parameters) and then calling them with the matching numbers of actual parameters.

31 CS308 Compiler Principles Top-Down Parsing

32 Compiler Principles Top-Down Parsing The parse tree is created from top to bottom. Top-down parsers: Recursive-Descent Parsing – backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives); it is a general parsing technique, but it is not widely used because it is not efficient. Predictive Parsing – no backtracking, efficient; Recursive Predictive Parsing is a special form of recursive-descent parsing without backtracking; the Non-Recursive (Table-Driven) Predictive Parser is also known as the LL(1) parser.

33 Compiler Principles Recursive-Descent Parsing A recursive-descent parsing program consists of a set of procedures, one for each non-terminal. Backtracking is needed (it may require repeated scans over the input). It tries to find the left-most derivation. Example: S → aBc, B → bc | b, input: abc. Expanding S to aBc and trying B → bc first fails (the trailing c of S is then missing), so the parser backtracks and tries B → b, which succeeds. [Figure: the partial parse trees of the failed and successful attempts]
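A minimal Python sketch of recursive-descent parsing with backtracking for this grammar (the generator-based style is my own choice, not the slides'): each procedure yields every input position at which its non-terminal could end, so when one alternative is rejected by the caller, the next one is tried.

```python
def parse_S(s, i):
    """S -> a B c : yield every position where an S starting at i can end."""
    if i < len(s) and s[i] == 'a':
        for j in parse_B(s, i + 1):          # try each way B can match
            if j < len(s) and s[j] == 'c':
                yield j + 1

def parse_B(s, i):
    """B -> bc | b : alternatives tried in order; the caller backtracks over them."""
    if s[i:i + 2] == 'bc':
        yield i + 2
    if s[i:i + 1] == 'b':
        yield i + 1

def accepts(s):
    return any(j == len(s) for j in parse_S(s, 0))

print(accepts('abc'))    # True: B -> bc is tried first and fails, then B -> b succeeds
print(accepts('abcc'))   # True: here B -> bc is the right choice
print(accepts('ac'))     # False
```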

34 Compiler Principles Procedure for stmt A left-recursive grammar can cause a recursive- descent parser to go into an infinite loop.

35 Compiler Principles Predictive Parser a grammar → (eliminating left recursion, left factoring) → a grammar suitable for predictive parsing (an LL(1) grammar); there is no 100% guarantee. When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string: A → α1 | ... | αn, input: ... a ... (the current token).

36 Compiler Principles Predictive Parser Example stmt → if ... | while ... | begin ... | for ... When we are trying to rewrite the non-terminal stmt, we can uniquely choose the production rule by just looking at the current token: if the current token is if, we have to choose the first production rule.

37 Compiler Principles Recursive Predictive Parsing Each non-terminal corresponds to a procedure. Example: A → aBb (the only production rule for A)
proc A {
  match the current token with a, and move to the next token;
  call proc B;
  match the current token with b, and move to the next token;
}

38 Compiler Principles Recursive Predictive Parsing Cont’d A → aBb | bAB
proc A {
  case of the current token {
    'a': match the current token with a, and move to the next token;
         call B;
         match the current token with b, and move to the next token;
    'b': match the current token with b, and move to the next token;
         call A;
         call B;
  }
}

39 Compiler Principles Recursive Predictive Parsing Cont’d When do we apply ε-productions? A → aA | bB | ε. If all other productions fail, we should apply an ε-production; for example, if the current token is not a or b, we may apply the ε-production. The most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the follow set of A (the terminals that can follow A in sentential forms).

40 Compiler Principles Recursive Predictive Parsing Example
A → aBe | cBd | C
B → bB | ε
C → f

proc A {
  case of the current token {
    a: match the current token with a, and move to the next token;
       call B;
       match the current token with e, and move to the next token;
    c: match the current token with c, and move to the next token;
       call B;
       match the current token with d, and move to the next token;
    f: call C                          // f is the first set of C
  }
}
proc B {
  case of the current token {
    b: match the current token with b, and move to the next token;
       call B;
    d, e: do nothing                   // d, e form the follow set of B
  }
}
proc C {
  match the current token with f, and move to the next token;
}
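The same parser written as a runnable Python sketch (the class and method names are mine, not from the slides):

```python
class Parser:
    def __init__(self, tokens):
        self.toks = list(tokens) + ['$']   # '$' marks the end of the input
        self.pos = 0

    def look(self):
        return self.toks[self.pos]

    def match(self, t):
        if self.look() != t:
            raise SyntaxError(f"expected {t!r}, got {self.look()!r}")
        self.pos += 1

    def A(self):                           # A -> aBe | cBd | C
        if self.look() == 'a':
            self.match('a'); self.B(); self.match('e')
        elif self.look() == 'c':
            self.match('c'); self.B(); self.match('d')
        elif self.look() == 'f':           # FIRST(C) = { f }
            self.C()
        else:
            raise SyntaxError(f"unexpected {self.look()!r} while parsing A")

    def B(self):                           # B -> bB | eps
        if self.look() == 'b':
            self.match('b'); self.B()
        elif self.look() not in ('d', 'e'):    # apply B -> eps only on FOLLOW(B) = { d, e }
            raise SyntaxError(f"unexpected {self.look()!r} while parsing B")

    def C(self):                           # C -> f
        self.match('f')

for s in ('abbe', 'cbd', 'f', 'abd'):
    p = Parser(s)
    try:
        p.A(); p.match('$'); print(s, 'accepted')
    except SyntaxError as e:
        print(s, 'rejected:', e)
```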

41 Compiler Principles Non-Recursive Predictive Parsing Non-recursive predictive parsing is a table-driven parsing method. It is a top-down parser, also known as the LL(1) parser: the input is scanned from Left to right, a Left-most derivation is produced, and 1 input symbol is used as a lookahead to determine the parser action. [Figure: input buffer, stack, and parsing table feeding the non-recursive predictive parser, which produces the output]

42 Compiler Principles LL(1) Parser Input buffer: the string of tokens to be parsed, followed by the endmarker $. Output: a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer. Stack: contains the grammar symbols; at the bottom of the stack there is a special endmarker $; initially the stack contains only the symbol $ and the starting symbol S (i.e., $S); when the stack is emptied (only $ is left in the stack), parsing is complete. Parsing table: a two-dimensional array M[A,a]; each row is a non-terminal symbol, each column is a terminal symbol or the special symbol $, and each entry holds a production rule.

43 Compiler Principles LL(1) Parser – Parser Actions The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action. There are four possible parser actions. 1. If X and a are both $ → the parser halts (successful completion). 2. If X and a are the same terminal symbol (different from $) → the parser pops X from the stack and moves to the next symbol in the input buffer. 3. If X is a non-terminal → the parser looks at the parsing table entry M[X,a]; if M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. 4. None of the above → error; all empty entries in the parsing table are errors, and if X is a terminal symbol different from a, this is also an error case.
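A compact Python sketch of this driver loop, using the grammar and LL(1) parsing table of slide 44 (the table encoding is my own; an empty body stands for ε):

```python
# LL(1) table for S -> aBa, B -> bB | eps (slide 44)
TABLE = {('S', 'a'): ['a', 'B', 'a'],
         ('B', 'a'): [],                          # B -> eps
         ('B', 'b'): ['b', 'B']}
NONTERMS = {'S', 'B'}

def ll1_parse(tokens, start='S'):
    stack = ['$', start]
    toks = list(tokens) + ['$']
    i = 0
    while True:
        X, a = stack[-1], toks[i]
        if X == '$' and a == '$':                 # action 1: halt
            print('accept'); return
        if X not in NONTERMS:                     # X is a terminal (or $)
            if X != a:
                raise SyntaxError(f"expected {X!r}, got {a!r}")   # action 4
            stack.pop(); i += 1                   # action 2: match and advance
        else:                                     # action 3: consult M[X, a]
            body = TABLE.get((X, a))
            if body is None:
                raise SyntaxError(f"no rule for ({X}, {a})")      # action 4
            print(X, '->', ' '.join(body) or 'eps')
            stack.pop()
            stack.extend(reversed(body))          # push Yk ... Y1

ll1_parse('abba')   # prints the same output sequence as the trace on slide 44
```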

44 Compiler Principles LL(1) Parser – Example1
S → aBa
B → bB | ε

LL(1) parsing table:
         a          b
S     S → aBa
B     B → ε      B → bB

stack    input    output
$S       abba$    S → aBa
$aBa     abba$
$aB      bba$     B → bB
$aBb     bba$
$aB      ba$      B → bB
$aBb     ba$
$aB      a$       B → ε
$a       a$
$        $        accept, successful completion

45 Compiler Principles LL(1) Parser – Example1 Cont’d Outputs: S → aBa, B → bB, B → bB, B → ε. Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba. [Figure: the corresponding parse tree]

46 Compiler Principles LL(1) Parser – Example2
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

LL(1) parsing table entries (all other entries are empty):
M[E,id] = M[E,(] = E → TE'
M[E',+] = E' → +TE'      M[E',)] = M[E',$] = E' → ε
M[T,id] = M[T,(] = T → FT'
M[T',*] = T' → *FT'      M[T',+] = M[T',)] = M[T',$] = T' → ε
M[F,id] = F → id         M[F,(] = F → (E)

47 Compiler Principles LL(1) Parser – Example2 Cont’d
stack      input     output
$E         id+id$    E → TE'
$E'T       id+id$    T → FT'
$E'T'F     id+id$    F → id
$E'T'id    id+id$
$E'T'      +id$      T' → ε
$E'        +id$      E' → +TE'
$E'T+      +id$
$E'T       id$       T → FT'
$E'T'F     id$       F → id
$E'T'id    id$
$E'T'      $         T' → ε
$E'        $         E' → ε
$          $         accept

48 Compiler Principles Constructing LL(1) Parsing Tables Two functions are used in the construction of LL(1) parsing tables. FIRST(α) is the set of terminal symbols that occur as first symbols in strings derived from α, where α is any string of grammar symbols; if α derives ε, then ε is also in FIRST(α). FOLLOW(A) is the set of terminals that occur immediately after (follow) the non-terminal A in strings derived from the start symbol: a terminal a is in FOLLOW(A) if S ⇒* αAaβ, and the endmarker $ is in FOLLOW(A) if S ⇒* αA.

49 Compiler Principles Computing FIRST(X)
- If X is a terminal symbol → FIRST(X) = {X}.
- If X is ε → FIRST(X) = {ε}.
- If X is a non-terminal symbol and X → ε is a production rule → ε is in FIRST(X).
- If X is a non-terminal symbol and X → Y1Y2...Yn is a production rule → if a terminal a is in FIRST(Yi) and ε is in FIRST(Yj) for all j = 1,...,i-1, then a is in FIRST(X); if ε is in FIRST(Yj) for all j = 1,...,n, then ε is in FIRST(X).
We apply these rules until nothing more can be added to any FIRST set.
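A Python sketch of this fixed-point computation for the grammar of slide 50 ('eps' stands for ε, an empty body stands for an ε-production; the encoding is my own):

```python
GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], []],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], []],
    'F':  [['(', 'E', ')'], ['id']],
}

def first_sets(grammar):
    """Iterate the rules of this slide until no FIRST set changes."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                before = len(first[A])
                nullable_prefix = True
                for Y in body:
                    if Y not in grammar:          # terminal: FIRST(Y) = {Y}
                        first[A].add(Y)
                        nullable_prefix = False
                        break
                    first[A] |= first[Y] - {'eps'}
                    if 'eps' not in first[Y]:
                        nullable_prefix = False
                        break
                if nullable_prefix:               # every Yi (or the empty body) derives eps
                    first[A].add('eps')
                if len(first[A]) != before:
                    changed = True
    return first

for A, s in first_sets(GRAMMAR).items():
    print(f"FIRST({A}) =", sorted(s))             # matches the sets on slide 50
```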

50 Compiler Principles FIRST Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(F) = { (, id }      FIRST(TE') = { (, id }
FIRST(T') = { *, ε }      FIRST(+TE') = { + }
FIRST(T) = { (, id }      FIRST(ε) = { ε }
FIRST(E') = { +, ε }      FIRST(FT') = { (, id }
FIRST(E) = { (, id }      FIRST(*FT') = { * }
FIRST((E)) = { ( }
FIRST(id) = { id }

51 Compiler Principles Computing FOLLOW(X)
- If S is the start symbol → $ is in FOLLOW(S).
- If A → αBβ is a production rule → everything in FIRST(β) except ε is in FOLLOW(B).
- If A → αB is a production rule, or A → αBβ is a production rule and ε is in FIRST(β) → everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any FOLLOW set.
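A Python sketch of the FOLLOW fixed point for the same grammar; the FIRST sets are taken as given from slide 50 rather than recomputed (names and encoding are mine):

```python
GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], []],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], []],
    'F':  [['(', 'E', ')'], ['id']],
}
# FIRST sets as computed on slide 50 ('eps' marks the empty string)
FIRST = {'E': {'(', 'id'}, "E'": {'+', 'eps'}, 'T': {'(', 'id'},
         "T'": {'*', 'eps'}, 'F': {'(', 'id'}}

def first_of(seq):
    """FIRST of a string of grammar symbols, using the sets above."""
    out = set()
    for Y in seq:
        f = FIRST.get(Y, {Y})          # a terminal's FIRST set is itself
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    out.add('eps')                     # the whole string (possibly empty) derives eps
    return out

def follow_sets(grammar, start='E'):
    follow = {A: set() for A in grammar}
    follow[start].add('$')                           # rule 1
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:
                        continue                     # only non-terminals have FOLLOW sets
                    before = len(follow[B])
                    beta = first_of(body[i + 1:])
                    follow[B] |= beta - {'eps'}      # rule 2
                    if 'eps' in beta:                # rule 3: beta is empty or nullable
                        follow[B] |= follow[A]
                    if len(follow[B]) != before:
                        changed = True
    return follow

for A, s in follow_sets(GRAMMAR).items():
    print(f"FOLLOW({A}) =", sorted(s))               # matches the sets on slide 52
```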

52 Compiler Principles FOLLOW Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FOLLOW(E) = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T) = { +, ), $ }      (using FIRST(E') = { +, ε })
FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }   (using FIRST(T') = { *, ε })

53 Compiler Principles Constructing LL(1) Parsing Table For each production A → α of grammar G:
– for each terminal a in FIRST(α), add A → α to M[A,a];
– if ε is in FIRST(α), then for each terminal a in FOLLOW(A), add A → α to M[A,a];
– if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$].
All other undefined entries of the parsing table are error entries.
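A Python sketch of the table construction, again with the FIRST and FOLLOW sets of slides 50 and 52 supplied as data (the dict-based table encoding is my own):

```python
GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], []],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], []],
    'F':  [['(', 'E', ')'], ['id']],
}
FIRST = {'E': {'(', 'id'}, "E'": {'+', 'eps'}, 'T': {'(', 'id'},
         "T'": {'*', 'eps'}, 'F': {'(', 'id'}}
FOLLOW = {'E': {')', '$'}, "E'": {')', '$'}, 'T': {'+', ')', '$'},
          "T'": {'+', ')', '$'}, 'F': {'+', '*', ')', '$'}}

def first_of(seq):
    out = set()
    for Y in seq:
        f = FIRST.get(Y, {Y})
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    out.add('eps')
    return out

def build_ll1_table(grammar):
    table = {}
    for A, bodies in grammar.items():
        for body in bodies:
            f = first_of(body)
            # terminals in FIRST(alpha); plus FOLLOW(A) (which includes $) if alpha is nullable
            targets = (f - {'eps'}) | (FOLLOW[A] if 'eps' in f else set())
            for a in targets:
                if (A, a) in table:
                    raise ValueError(f"not LL(1): conflict at M[{A},{a}]")
                table[(A, a)] = body
    return table

for (A, a), body in sorted(build_ll1_table(GRAMMAR).items()):
    print(f"M[{A}, {a}] = {A} -> {' '.join(body) or 'eps'}")   # matches slide 54
```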

54 Compiler Principles Constructing LL(1) Parsing Table Example
E → TE' : FIRST(TE') = {(, id} → add E → TE' to M[E,(] and M[E,id]
E' → +TE' : FIRST(+TE') = {+} → add E' → +TE' to M[E',+]
E' → ε : FIRST(ε) = {ε} → none, but since ε is in FIRST(ε) and FOLLOW(E') = {$, )} → add E' → ε to M[E',$] and M[E',)]
T → FT' : FIRST(FT') = {(, id} → add T → FT' to M[T,(] and M[T,id]
T' → *FT' : FIRST(*FT') = {*} → add T' → *FT' to M[T',*]
T' → ε : FIRST(ε) = {ε} → none, but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +} → add T' → ε to M[T',$], M[T',)] and M[T',+]
F → (E) : FIRST((E)) = {(} → add F → (E) to M[F,(]
F → id : FIRST(id) = {id} → add F → id to M[F,id]

55 Compiler Principles LL(1) Grammars A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar. An entry of the parsing table may contain more than one production rule; in that case we say that the grammar is not LL(1). a grammar → (eliminating left recursion, left factoring) → an LL(1) grammar (no 100% guarantee).

56 Compiler Principles A Grammar which is not LL(1)
S → iCtSE | a        FOLLOW(S) = { $, e }
E → eS | ε           FOLLOW(E) = { $, e }
C → b                FOLLOW(C) = { t }
FIRST(iCtSE) = { i }   FIRST(a) = { a }   FIRST(eS) = { e }   FIRST(ε) = { ε }   FIRST(b) = { b }
Parsing table entries: M[S,a] = S → a, M[S,i] = S → iCtSE; M[E,e] contains both E → eS and E → ε (and M[E,$] = E → ε); M[C,b] = C → b.
Problem: ambiguity — two production rules in M[E,e].

57 Compiler Principles A Grammar which is not LL(1) Cont’d What can we do if the resulting parsing table contains multiply defined entries? Eliminate the left recursion and left factor the grammar; if the parsing table still contains multiply defined entries, the grammar is ambiguous or it is inherently not LL(1). A left-recursive grammar cannot be an LL(1) grammar: for A → Aα | β, any terminal that appears in FIRST(β) also appears in FIRST(Aα) because Aα ⇒ βα; and if β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and FOLLOW(A). A grammar that is not left-factored cannot be an LL(1) grammar: for A → αβ1 | αβ2, any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2). An ambiguous grammar cannot be an LL(1) grammar.

58 Compiler Principles Properties of LL(1) Grammars A grammar G is LL(1) if and only if the following conditions hold for any two distinct production rules A → α and A → β: 1. α and β do not derive any strings starting with the same terminal. 2. At most one of α and β can derive ε. 3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).

59 Compiler Principles Error Recovery in Predictive Parsing An error may occur in predictive parsing (LL(1) parsing) if the terminal symbol on the top of the stack does not match the current input symbol, or if the top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A,a] is empty. What should the parser do in an error case? The parser should give an error message (as meaningful an error message as possible), it should recover from the error, and it should be able to continue parsing the rest of the input.

60 Compiler Principles Error Recovery Techniques Panic-Mode Error Recovery – skipping the input symbols until a synchronizing token is found. Phrase-Level Error Recovery – each empty entry in the parsing table is filled with a pointer to a specific error routine that takes care of that error case.

61 Compiler Principles Panic-Mode Error Recovery In panic-mode error recovery, we skip all the input symbols until a synchronizing token is found. What is a synchronizing token? All the terminal symbols in the follow set of a non-terminal can be used as the synchronizing token set for that non-terminal. A simple panic-mode error recovery for LL(1) parsing: all the empty entries are marked as synch to indicate that the parser will skip all the input symbols until it sees a symbol in the follow set of the non-terminal A on the top of the stack; then the parser pops that non-terminal A from the stack and parsing continues from that state. To handle unmatched terminal symbols, the parser pops the unmatched terminal symbol from the stack and issues an error message saying that the unmatched terminal was inserted.

62 Compiler Principles Panic-Mode Error Recovery Example
S → AbS | e | ε
A → a | cAd
FOLLOW(S) = {$}   FOLLOW(A) = {b, d}

Parsing table (synch marks synchronizing entries):
        a          b       c          d       e       $
S    S → AbS    synch   S → AbS    synch   S → e   S → ε
A    A → a      synch   A → cAd    synch

First input, aab$:
stack    input   output
$S       aab$    S → AbS
$SbA     aab$    A → a
$Sba     aab$
$Sb      ab$     Error: missing b (the unmatched b is popped, as if inserted)
$S       ab$     S → AbS
$SbA     ab$     A → a
$Sba     ab$
$Sb      b$
$S       $       S → ε
$        $       accept

Second input, ceadb$:
stack    input    output
$S       ceadb$   S → AbS
$SbA     ceadb$   A → cAd
$SbdAc   ceadb$
$SbdA    eadb$    Error: unexpected e (illegal A) — remove all input tokens until the first b or d, pop A
$Sbd     db$
$Sb      b$
$S       $        S → ε
$        $        accept

63 Compiler Principles Phrase-Level Error Recovery Each empty entry in the parsing table is filled with a pointer to a special error routine that takes care of that error case. These error routines may change, insert, or delete input symbols, issue appropriate error messages, and pop items from the stack. We should be careful when we design these error routines, because we may put the parser into an infinite loop.

64 CS308 Compiler Principles Bottom-Up Parsing

65 Compiler Principles Bottom-Up Parsing A bottom-up parser creates the parse tree of the given input starting from the leaves and working towards the root. A bottom-up parser tries to find the right-most derivation of the given input in reverse order: S ⇒ ... ⇒ ω is discovered as ω, ..., S. Bottom-up parsing is also known as shift-reduce parsing because its two main actions are shift and reduce: at each shift action, the current symbol in the input string is pushed onto a stack; at each reduction step, the symbols at the top of the stack (this symbol sequence is the right side of a production) are replaced by the non-terminal on the left side of that production.

66 Compiler Principles Shift-Reduce Parsing A shift-reduce parser tries to reduce the given input string to the starting symbol: a string is reduced, step by step, to the starting symbol. At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal on the left side of that production rule. If the substring is chosen correctly, the right-most derivation of the string is created in reverse order. Rightmost derivation: S ⇒*rm ω; the shift-reduce parser finds ω, ..., S (the same derivation in reverse).

67 Compiler Principles Shift-Reduce Parsing -- Example
S → aABb
A → aA | a
B → bB | b
Input string: aaabb. Reduction sequence: aaabb, aaAbb, aAbb, aABb, S.
Right-most derivation (the right sentential forms, in reverse): S ⇒rm aABb ⇒rm aAbb ⇒rm aaAbb ⇒rm aaabb.

68 Compiler Principles Handle In the derivation S ⇒*rm αAw ⇒rm αβw (where w is a string of terminals), a handle of αβw is the body β of the production A → β, in the position following α. A handle is a substring that matches the right side of a production rule, but not every substring that matches the right side of a production rule is a handle: only one whose reduction moves us toward the start symbol along the reverse of a rightmost derivation. If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.

69 Compiler Principles Handle Example
S → aB | bA
A → a | aS | bAA
B → aBB | bS | b
What is the handle of aabbAb? S ⇒ aB ⇒ aaBB ⇒ aaBb ⇒ aabSb ⇒ aabbAb. The handle is bA (the body of S → bA, whose reduction undoes the last derivation step).

70 Compiler Principles Handle Pruning A right-most derivation in reverse can be obtained by handle pruning. S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (the input string). From γn, find a handle An → βn in γn and replace βn by An to get γn-1; then find a handle An-1 → βn-1 in γn-1 and replace βn-1 by An-1 to get γn-2; repeat this until we reach S.

71 Compiler Principles Handle Pruning Example
E → E+T | T
T → T*F | F
F → (E) | id
Right-most derivation of id+id*id: E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right Sentential Form    Reducing Production
id+id*id                 F → id
F+id*id                  T → F
T+id*id                  E → T
E+id*id                  F → id
E+F*id                   T → F
E+T*id                   F → id
E+T*F                    T → T*F
E+T                      E → E+T
E
(On the slide, the handles are shown red and underlined in the right-sentential forms.)

72 Compiler Principles Shift-Reduce Parsing The initial stack contains only the end-marker $, and the end of the input string is also marked by the end-marker $. There are four possible actions in a shift-reduce parser: Shift – the next input symbol is shifted onto the top of the stack. Reduce – replace the handle on the top of the stack by the corresponding non-terminal. Accept – successful completion of parsing. Error – the parser discovers a syntax error and calls an error recovery routine.

73 Compiler Principles Shift-Reduce Parsing Example
Grammar: E → E+T | T, T → T*F | F, F → (E) | id

Stack      Input       Action
$          id+id*id$   shift
$id        +id*id$     reduce by F → id
$F         +id*id$     reduce by T → F
$T         +id*id$     reduce by E → T
$E         +id*id$     shift
$E+        id*id$      shift
$E+id      *id$        reduce by F → id
$E+F       *id$        reduce by T → F
$E+T       *id$        shift
$E+T*      id$         shift
$E+T*id    $           reduce by F → id
$E+T*F     $           reduce by T → T*F
$E+T       $           reduce by E → E+T
$E         $           accept
[Figure: the resulting parse tree]

74 Compiler Principles Conflicts During Shift-Reduce Parsing There are context-free grammars for which shift-reduce parsers cannot be used: the stack contents and the next input symbol may not be enough to decide the action. Shift/reduce conflict: the parser cannot decide whether to make a shift operation or a reduction. Reduce/reduce conflict: the parser cannot decide which of several reductions to make. If a shift-reduce parser cannot be used for a grammar, that grammar is called a non-LR(k) grammar. An ambiguous grammar can never be an LR grammar.

75 Compiler Principles Shift-Reduce Parsers There are two main categories of shift-reduce parsers. 1. Operator-Precedence Parsers – simple, but handle only a small class of grammars. 2. LR-Parsers – cover a wide range of grammars: SLR (simple LR parser), LR (the most general LR parser), LALR (intermediate, lookahead LR parser). SLR, LR and LALR parsers work the same way; only their parsing tables are different. [Figure: SLR ⊂ LALR ⊂ LR ⊂ CFG]

76 Compiler Principles LR Parsers The most powerful (yet efficient) shift-reduce parsing method is LR(k) parsing: Left-to-right scanning of the input, Rightmost derivation in reverse, k symbols of lookahead (when k is omitted, it is 1). Advantages of LR parsing: it is the most general non-backtracking shift-reduce parsing method, yet it is still efficient; the class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers (LL(1) grammars ⊂ LR(1) grammars); an LR parser can detect a syntactic error as soon as possible during a left-to-right scan of the input.

77 Compiler Principles Model of LR Parser [Figure: the LR parsing algorithm reads an input buffer a1 ... ai ... an $ and uses a stack holding S0 X1 S1 ... Xm-1 Sm-1 Xm Sm (alternating grammar symbols and states) to produce the output; the Action table is indexed by states and by the terminals and $, each entry holding one of four actions, and the Goto table is indexed by states and non-terminals, each entry holding a state number.]

78 Compiler Principles A Configuration of LR Parsing Algorithm A configuration of an LR parser is (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $): the stack and the rest of the input. Sm and ai determine the parser action by consulting the parsing action table (the initial stack contains just S0). A configuration of the LR parser represents the right sentential form X1 ... Xm ai ai+1 ... an $.

79 Compiler Principles Actions of A LR-Parser
1. shift s – shift the next input symbol and the state s onto the stack: (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $) → (S0 X1 S1 ... Xm Sm ai s, ai+1 ... an $)
2. reduce A → β – pop 2|β| items (r = |β| symbol/state pairs) from the stack, then push A and s, where s = goto[sm-r, A]: (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $) → (S0 X1 S1 ... Xm-r Sm-r A s, ai ... an $); the output is the reducing production A → β.
3. Accept – parsing successfully completed.
4. Error – the parser detected an error (an empty entry in the action table).
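A Python sketch of this driver, hard-coding the SLR(1) action and goto tables of slide 81 for the expression grammar (the dict encoding, with 'sN' for shift and 'rN' for reduce, is my own):

```python
# Rules: 1) E->E+T  2) E->T  3) T->T*F  4) T->F  5) F->(E)  6) F->id
RULES = [None, ('E', ['E', '+', 'T']), ('E', ['T']), ('T', ['T', '*', 'F']),
         ('T', ['F']), ('F', ['(', 'E', ')']), ('F', ['id'])]
ACTION = {
    (0, 'id'): 's5', (0, '('): 's4',
    (1, '+'): 's6', (1, '$'): 'acc',
    (2, '+'): 'r2', (2, '*'): 's7', (2, ')'): 'r2', (2, '$'): 'r2',
    (3, '+'): 'r4', (3, '*'): 'r4', (3, ')'): 'r4', (3, '$'): 'r4',
    (4, 'id'): 's5', (4, '('): 's4',
    (5, '+'): 'r6', (5, '*'): 'r6', (5, ')'): 'r6', (5, '$'): 'r6',
    (6, 'id'): 's5', (6, '('): 's4',
    (7, 'id'): 's5', (7, '('): 's4',
    (8, '+'): 's6', (8, ')'): 's11',
    (9, '+'): 'r1', (9, '*'): 's7', (9, ')'): 'r1', (9, '$'): 'r1',
    (10, '+'): 'r3', (10, '*'): 'r3', (10, ')'): 'r3', (10, '$'): 'r3',
    (11, '+'): 'r5', (11, '*'): 'r5', (11, ')'): 'r5', (11, '$'): 'r5',
}
GOTO = {(0, 'E'): 1, (0, 'T'): 2, (0, 'F'): 3, (4, 'E'): 8, (4, 'T'): 2, (4, 'F'): 3,
        (6, 'T'): 9, (6, 'F'): 3, (7, 'F'): 10}

def lr_parse(tokens):
    stack = [0]                              # the state stack (symbols are implicit)
    toks = list(tokens) + ['$']
    i = 0
    while True:
        act = ACTION.get((stack[-1], toks[i]))
        if act is None:
            raise SyntaxError(f"unexpected {toks[i]!r}")
        if act == 'acc':
            print('accept'); return
        if act[0] == 's':                    # shift: push the new state
            stack.append(int(act[1:])); i += 1
        else:                                # reduce by rule number act[1:]
            head, body = RULES[int(act[1:])]
            del stack[len(stack) - len(body):]   # pop |body| states
            print(head, '->', ' '.join(body))
            stack.append(GOTO[(stack[-1], head)])

lr_parse(['id', '*', 'id', '+', 'id'])       # same moves as the trace on slide 82
```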

80 Compiler Principles Reduce Action Pop 2|β| items (r = |β|) from the stack; assume that β = Y1Y2...Yr. Push A and s, where s = goto[sm-r, A]: (S0 X1 S1 ... Xm-r Sm-r Y1 Sm-r+1 ... Yr Sm, ai ai+1 ... an $) → (S0 X1 S1 ... Xm-r Sm-r A s, ai ... an $). In fact, Y1Y2...Yr is a handle: X1 ... Xm-r A ai ... an $ ⇒ X1 ... Xm-r Y1...Yr ai ai+1 ... an $.

81 Compiler Principles (SLR) Parsing Table
Rules: 1) E → E+T  2) E → T  3) T → T*F  4) T → F  5) F → (E)  6) F → id

Action table (sN = shift and go to state N, rN = reduce by rule N) and Goto table:
state   id    +    *    (    )    $    |   E   T   F
0       s5              s4             |   1   2   3
1             s6                  acc  |
2             r2   s7        r2   r2   |
3             r4   r4        r4   r4   |
4       s5              s4             |   8   2   3
5             r6   r6        r6   r6   |
6       s5              s4             |       9   3
7       s5              s4             |          10
8             s6             s11       |
9             r1   s7        r1   r1   |
10            r3   r3        r3   r3   |
11            r5   r5        r5   r5   |

82 Compiler Principles Moves of A LR-Parser Example
stack            input       action               output
0                id*id+id$   shift 5
0 id 5           *id+id$     reduce by F → id     F → id
0 F 3            *id+id$     reduce by T → F      T → F
0 T 2            *id+id$     shift 7
0 T 2 * 7        id+id$      shift 5
0 T 2 * 7 id 5   +id$        reduce by F → id     F → id
0 T 2 * 7 F 10   +id$        reduce by T → T*F    T → T*F
0 T 2            +id$        reduce by E → T      E → T
0 E 1            +id$        shift 6
0 E 1 + 6        id$         shift 5
0 E 1 + 6 id 5   $           reduce by F → id     F → id
0 E 1 + 6 F 3    $           reduce by T → F      T → F
0 E 1 + 6 T 9    $           reduce by E → E+T    E → E+T
0 E 1            $           accept

83 Compiler Principles Constructing SLR Parsing Tables – LR(0) Item An LR(0) item of a grammar G is a production of G with a dot at some position of the body. Ex: A → aBb has four possible LR(0) items: A → .aBb, A → a.Bb, A → aB.b, A → aBb. A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers (the LR(0) automaton); the sets of LR(0) items will be the states. Augmented grammar: G' is G with a new production rule S' → S, where S' is the new start symbol. We will need the CLOSURE and GOTO functions.

84 Compiler Principles The Closure Operation If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0) items constructed from I by two rules: 1. Initially, every LR(0) item in I is added to closure(I). 2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G, then B → .γ will be in closure(I). Apply this rule until no more new LR(0) items can be added to closure(I).

85 Compiler Principles The Closure Operation -- Example
Grammar: E' → E, E → E+T | T, T → T*F | F, F → (E) | id
closure({E' → .E}) = { E' → .E (the kernel item), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
Kernel items: the initial item S' → .S, and all items whose dots are not at the left end. Nonkernel items: all items with their dots at the left end, except for S' → .S.

86 Compiler Principles Goto Operation If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows: if A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in goto(I,X).
Example: I = { E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
goto(I,E) = { E' → E., E → E.+T }
goto(I,T) = { E → T., T → T.*F }
goto(I,F) = { T → F. }
goto(I,() = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
goto(I,id) = { F → id. }

87 Compiler Principles Construction of The Canonical LR(0) Collection To create the SLR parsing tables for a grammar G, we create the canonical LR(0) collection of the augmented grammar G'. Algorithm: C := { closure({S' → .S}) }; repeat the following until no more sets of LR(0) items can be added to C: for each I in C and each grammar symbol X, if goto(I,X) is not empty and not in C, add goto(I,X) to C. The goto function is a DFA on the sets in C.
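A self-contained Python sketch of closure, goto, and this fixpoint construction for the augmented expression grammar (an item is encoded as a (head, body, dot position) triple; the encoding is my own):

```python
# Augmented grammar of slide 83
GRAMMAR = [("E'", ('E',)),
           ('E', ('E', '+', 'T')), ('E', ('T',)),
           ('T', ('T', '*', 'F')), ('T', ('F',)),
           ('F', ('(', 'E', ')')), ('F', ('id',))]
NONTERMS = {head for head, _ in GRAMMAR}

def closure(items):
    """Slide 84: keep adding B -> .gamma for every item A -> alpha . B beta."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMS:
                for h, b in GRAMMAR:
                    if h == body[dot] and (h, b, 0) not in items:
                        items.add((h, b, 0))
                        changed = True
    return frozenset(items)

def goto(items, X):
    """Slide 86: move the dot over X and take the closure."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X}
    return closure(moved)

def canonical_collection():
    """This slide: start from closure({E' -> .E}) and apply goto until no new sets appear."""
    start = closure({("E'", ('E',), 0)})
    states, worklist = [start], [start]
    symbols = {s for _, b in GRAMMAR for s in b}
    while worklist:
        I = worklist.pop()
        for X in symbols:
            J = goto(I, X)
            if J and J not in states:
                states.append(J)
                worklist.append(J)
    return states

states = canonical_collection()
print(len(states), 'sets of LR(0) items')        # 12, the sets I0 ... I11 of slide 88
for head, body, dot in sorted(states[0]):        # the items of the start state
    print(' ', head, '->', ' '.join(body[:dot]) + '.' + ' '.join(body[dot:]))
```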

88 Compiler Principles The Canonical LR(0) Collection Example
I0: E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id
I1: E' → E., E → E.+T
I2: E → T., T → T.*F
I3: T → F.
I4: F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id
I5: F → id.
I6: E → E+.T, T → .T*F, T → .F, F → .(E), F → .id
I7: T → T*.F, F → .(E), F → .id
I8: F → (E.), E → E.+T
I9: E → E+T., T → T.*F
I10: T → T*F.
I11: F → (E).

89 Compiler Principles Transition Diagram (DFA) of Goto Function [Figure: the DFA over the item sets I0–I11, with transitions labeled by the grammar symbols E, T, F, id, (, ), + and *.]

90 Compiler Principles Constructing SLR Parsing Table
1. Construct the canonical collection of sets of LR(0) items for G': C = {I0, ..., In}.
2. Create the parsing action table as follows:
   – If a is a terminal, A → α.aβ is in Ii and goto(Ii,a) = Ij, then action[i,a] is shift j.
   – If A → α. is in Ii, then action[i,a] is reduce A → α for all a in FOLLOW(A), where A ≠ S'.
   – If S' → S. is in Ii, then action[i,$] is accept.
   – If any conflicting actions are generated by these rules, the grammar is not SLR.
3. Create the parsing goto table: for all non-terminals A, if goto(Ii,A) = Ij, then goto[i,A] = j.
4. All entries not defined by (2) and (3) are errors.
5. The initial state of the parser is the one containing S' → .S.

91 Compiler Principles Parsing Tables of Expression Grammar (The action and goto tables produced by this construction for the grammar 1) E → E+T, 2) E → T, 3) T → T*F, 4) T → F, 5) F → (E), 6) F → id are exactly the SLR parsing tables shown on slide 81.)

92 Compiler Principles SLR(1) Grammar An LR parser using SLR(1) parsing tables for a grammar G is called an SLR(1) parser for G. If a grammar G has an SLR(1) parsing table, it is called an SLR(1) grammar (SLR grammar for short). Every SLR grammar is unambiguous, but not every unambiguous grammar is an SLR grammar.

93 Compiler Principles Shift/Reduce and Reduce/Reduce Conflicts If a state does not know whether to make a shift operation or a reduction for a terminal, we say that there is a shift/reduce conflict. If a state does not know whether to make a reduction using production rule i or production rule j for a terminal, we say that there is a reduce/reduce conflict. If the SLR parsing table of a grammar G has a conflict, we say that the grammar is not an SLR grammar.

94 Compiler Principles Conflict Example 1
Grammar: S → L=R, S → R, L → *R, L → id, R → L
I0: S' → .S, S → .L=R, S → .R, L → .*R, L → .id, R → .L
I1: S' → S.
I2: S → L.=R, R → L.
I3: S → R.
I4: L → *.R, R → .L, L → .*R, L → .id
I5: L → id.
I6: S → L=.R, R → .L, L → .*R, L → .id
I7: L → *R.
I8: R → L.
I9: S → L=R.
Problem: FOLLOW(R) = {=, $}, so in state I2 on input = the parser can either shift 6 or reduce by R → L: a shift/reduce conflict.

95 Compiler Principles Conflict Example 2
Grammar: S → AaAb, S → BbBa, A → ε, B → ε
I0: S' → .S, S → .AaAb, S → .BbBa, A → ., B → .
Problem: FOLLOW(A) = {a, b} and FOLLOW(B) = {a, b}, so in I0 on input a (and likewise on input b) the parser can reduce by A → ε or by B → ε: a reduce/reduce conflict.

96 Compiler Principles Constructing Canonical LR(1) Items In the SLR method, state i makes a reduction by A → α when the current token is a if A → α. is in Ii and a is in FOLLOW(A). In some situations, however, βA cannot be followed by the terminal a in a right-sentential form when βα and state i are on top of the stack; making the reduction in that case is not correct (consider the previous Example 1).

97 Compiler Principles LR(1) Item To avoid some of these invalid reductions, the states need to carry more information. The extra information is put into a state by including a terminal symbol as a second component of each item. An LR(1) item is A → α.β, a, where a is the lookahead of the LR(1) item (a terminal or the end-marker).

98 Compiler Principles LR(1) Item Cont’d When β (in the LR(1) item A → α.β, a) is not empty, the lookahead has no effect. When β is empty (A → α., a), we do the reduction by A → α only if the next input symbol is a (not for every terminal in FOLLOW(A)). A state will contain items A → α., a1 ... A → α., an, where {a1,...,an} ⊆ FOLLOW(A).

99 Compiler Principles Canonical Collection of Sets of LR(1) Items The construction of the canonical collection of sets of LR(1) items is similar to that of the sets of LR(0) items, except that the closure and goto operations work a little differently. closure(I), where I is a set of LR(1) items: every LR(1) item in I is in closure(I); if A → α.Bβ, a is in closure(I) and B → γ is a production rule of G, then B → .γ, b will be in closure(I) for each terminal b in FIRST(βa).

100 Compiler Principles goto operation If I is a set of LR(1) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows: if A → α.Xβ, a is in I, then every item in closure({A → αX.β, a}) will be in goto(I,X).

101 Compiler Principles Construction of The Canonical LR(1) Collection Algorithm: C := { closure({S' → .S, $}) }; repeat the following until no more sets of LR(1) items can be added to C: for each I in C and each grammar symbol X, if goto(I,X) is not empty and not in C, add goto(I,X) to C. The goto function is a DFA on the sets in C.

102 Compiler Principles A Short Notation A set of LR(1) items containing the items A → α.β, a1 ... A → α.β, an can be written as A → α.β, a1/a2/.../an.

103 Compiler Principles Canonical LR(1) Collection Example 1
Grammar: S' → S, 1) S → L=R, 2) S → R, 3) L → *R, 4) L → id, 5) R → L
I0: S' → .S,$   S → .L=R,$   S → .R,$   L → .*R,$/=   L → .id,$/=   R → .L,$
I1: S' → S.,$
I2: S → L.=R,$   R → L.,$
I3: S → R.,$
I4: L → *.R,$/=   R → .L,$/=   L → .*R,$/=   L → .id,$/=
I5: L → id.,$/=
I6: S → L=.R,$   R → .L,$   L → .*R,$   L → .id,$
I7: L → *R.,$/=
I8: R → L.,$/=
I9: S → L=R.,$
I10: R → L.,$
I11: L → *.R,$   R → .L,$   L → .*R,$   L → .id,$
I12: L → id.,$
I13: L → *R.,$
(The pairs I4 and I11, I5 and I12, I7 and I13, I8 and I10 have the same cores. [Figure: the goto transitions between the sets.])

104 Compiler Principles Canonical LR(1) Collection Example 2
Grammar: S → AaAb, S → BbBa, A → ε, B → ε
I0: S' → .S,$   S → .AaAb,$   S → .BbBa,$   A → .,a   B → .,b
I1: S' → S.,$
I2: S → A.aAb,$
I3: S → B.bBa,$
I4: S → Aa.Ab,$   A → .,b
I5: S → Bb.Ba,$   B → .,a
I6: S → AaA.b,$
I7: S → BbB.a,$
I8: S → AaAb.,$
I9: S → BbBa.,$

105 Compiler Principles Construction of LR(1) Parsing Tables
1. Construct the canonical collection of sets of LR(1) items for G': C = {I0, ..., In}.
2. Create the parsing action table as follows:
   – If a is a terminal, A → α.aβ, b is in Ii and goto(Ii,a) = Ij, then action[i,a] is shift j.
   – If A → α., a is in Ii, then action[i,a] is reduce A → α, where A ≠ S'.
   – If S' → S., $ is in Ii, then action[i,$] is accept.
   – If any conflicting actions are generated by these rules, the grammar is not LR(1).
3. Create the parsing goto table: for all non-terminals A, if goto(Ii,A) = Ij, then goto[i,A] = j.
4. All entries not defined by (2) and (3) are errors.
5. The initial state of the parser is the one containing S' → .S, $.

106 Compiler Principles LR(1) Parsing Tables for Example 1
state   id    *    =    $    |   S   L   R
0       s5    s4             |   1   2   3
1                       acc  |
2                  s6   r5   |
3                       r2   |
4       s5    s4             |       8   7
5                  r4   r4   |
6       s12   s11            |      10   9
7                  r3   r3   |
8                  r5   r5   |
9                       r1   |
10                      r5   |
11      s12   s11            |      10  13
12                      r4   |
13                      r3   |
There is no shift/reduce or reduce/reduce conflict, so it is an LR(1) grammar.

107 Compiler Principles LALR Parsing Tables LALR stands for LookAhead LR. LALR parsers are often used in practice because LALR parsing tables are smaller than LR(1) parsing tables. The number of states in the SLR and LALR parsing tables for a grammar G is the same, but LALR parsers recognize more grammars than SLR parsers do. A state of an LALR parser is a set of LR(1) items, with modifications. Yacc creates an LALR parser for the given grammar.

108 Compiler Principles Creating LALR Parsing Tables Canonical LR(1) parser → (shrink the number of states) → LALR parser. This shrinking process may introduce a reduce/reduce conflict in the resulting LALR parser (in which case the grammar is NOT LALR), but it never produces a shift/reduce conflict.

109 Compiler Principles The Core of A Set of LR(1) Items The core of a set of LR(1) items is the set of its first components. For example, the core of { S → L.=R,$ ; R → L.,$ } is { S → L.=R ; R → L. }. Find the states (sets of LR(1) items) in a canonical LR(1) parser with the same core, and merge them into a single state: I1: L → id.,= and I2: L → id.,$ merge into a new state I12: L → id.,=/$. Do this for all states of the canonical LR(1) parser to get the states of the LALR parser.

110 Compiler Principles Shift/Reduce Conflict We cannot introduce a shift/reduce conflict during the shrinking process used to create the states of an LALR parser. Assume that we could: then a state of the LALR parser would have to contain A → α., a and B → β.aγ, b. This means that some state of the canonical LR(1) parser must contain A → α., a and B → β.aγ, c. But that state already has a shift/reduce conflict, i.e., the original canonical LR(1) parser already had the conflict.

111 Compiler Principles Reduce/Reduce Conflict But we may introduce a reduce/reduce conflict during the shrinking process used to create the states of an LALR parser: I1: { A → α.,a ; B → β.,b } and I2: { A → α.,b ; B → β.,c } merge into I12: { A → α.,a/b ; B → β.,b/c }, which has a reduce/reduce conflict.

112 Compiler Principles Creation of LALR Parsing Tables Create the canonical LR(1) collection of the sets of LR(1) items for the given grammar. For each core, find all sets having that core and replace them by their union: C = {I0,...,In} → C' = {J0,...,Jm}, where m ≤ n. Create the parsing table (action and goto tables) in the same way as for an LR(1) parser. Note: if J = I1 ∪ ... ∪ Ik, then since I1,...,Ik have the same core, the cores of goto(I1,X),...,goto(Ik,X) must also be the same; so goto(J,X) = K, where K is the union of all sets of items having the same core as goto(I1,X). If no conflict is introduced, the grammar is an LALR(1) grammar.

113 Compiler Principles Canonical LR(1) Collection Example 1 (The canonical LR(1) collection of slide 103 is repeated here; the pairs of states with the same cores are I4 and I11, I5 and I12, I7 and I13, and I8 and I10.)

114 Compiler Principles Canonical LALR(1) Collection Example 1
Grammar: S' → S, 1) S → L=R, 2) S → R, 3) L → *R, 4) L → id, 5) R → L
Merging the same-core states (I4/I11, I5/I12, I7/I13, I8/I10) gives:
I0: S' → .S,$   S → .L=R,$   S → .R,$   L → .*R,$/=   L → .id,$/=   R → .L,$
I1: S' → S.,$
I2: S → L.=R,$   R → L.,$
I3: S → R.,$
I411: L → *.R,$/=   R → .L,$/=   L → .*R,$/=   L → .id,$/=
I512: L → id.,$/=
I6: S → L=.R,$   R → .L,$   L → .*R,$   L → .id,$
I713: L → *R.,$/=
I810: R → L.,$/=
I9: S → L=R.,$

115 Compiler Principles LALR(1) Parsing Tables for Example 1
state   id    *    =    $    |   S   L   R
0       s5    s4             |   1   2   3
1                       acc  |
2                  s6   r5   |
3                       r2   |
4       s5    s4             |       8   7
5                  r4   r4   |
6       s12   s11            |      10   9
7                  r3   r3   |
8                  r5   r5   |
9                       r1   |
There is no shift/reduce or reduce/reduce conflict, so it is an LALR(1) grammar.

116 Compiler Principles Using Ambiguous Grammars Why would we want to use an ambiguous grammar? Some ambiguous grammars are much more natural, and a corresponding unambiguous grammar can be very complex; using an ambiguous grammar may also eliminate unnecessary reductions. All grammars used in the construction of LR parsing tables must be unambiguous. Can we create LR parsing tables for ambiguous grammars? Yes, but they will have conflicts. We can resolve these conflicts in favor of one of the alternatives to disambiguate the grammar, and in the end we again have an unambiguous grammar. Example: E → E+E | E*E | (E) | id is the ambiguous counterpart of E → E+T | T, T → T*F | F, F → (E) | id.

117 Compiler Principles Sets of LR(0) Items for Ambiguous Grammar
I0: E' → .E, E → .E+E, E → .E*E, E → .(E), E → .id
I1: E' → E., E → E.+E, E → E.*E
I2: E → (.E), E → .E+E, E → .E*E, E → .(E), E → .id
I3: E → id.
I4: E → E+.E, E → .E+E, E → .E*E, E → .(E), E → .id
I5: E → E*.E, E → .E+E, E → .E*E, E → .(E), E → .id
I6: E → (E.), E → E.+E, E → E.*E
I7: E → E+E., E → E.+E, E → E.*E
I8: E → E*E., E → E.+E, E → E.*E
I9: E → (E).
[Figure: the goto transitions between these item sets]

118 Compiler Principles SLR-Parsing Tables for Ambiguous Grammar FOLLOW(E) = { $, +, *, ) }. State I7 (reached from I0 via I1 and I4 on E+E) has shift/reduce conflicts on the symbols + and *. When the current token is +: shifting means + is right-associative, reducing means + is left-associative. When the current token is *: shifting means * has higher precedence than +, reducing means + has higher precedence than *.

119 Compiler Principles SLR-Parsing Tables for Ambiguous Grammar FOLLOW(E) = { $, +, *, ) }. State I8 (reached from I0 via I1 and I5 on E*E) has shift/reduce conflicts on the symbols + and *. When the current token is *: shifting means * is right-associative, reducing means * is left-associative. When the current token is +: shifting means + has higher precedence than *, reducing means * has higher precedence than +.

120 Compiler Principles SLR-Parsing Tables for Ambiguous Grammar
Rules: 1) E → E+E  2) E → E*E  3) E → (E)  4) E → id (conflicts resolved so that + and * are left-associative and * has higher precedence than +)
state   id    +    *    (    )    $   |   E
0       s3              s2            |   1
1             s4   s5            acc  |
2       s3              s2            |   6
3             r4   r4        r4   r4  |
4       s3              s2            |   7
5       s3              s2            |   8
6             s4   s5        s9       |
7             r1   s5        r1   r1  |
8             r2   r2        r2   r2  |
9             r3   r3        r3   r3  |

121 Compiler Principles Error Recovery in LR Parsing An LR parser detects an error when it consults the parsing action table and finds an error entry. All empty entries in the action table are error entries; errors are never detected by consulting the goto table. An LR parser announces an error as soon as there is no valid continuation for the scanned portion of the input. A canonical LR parser (LR(1) parser) never makes even a single reduction before announcing an error; the SLR and LALR parsers may make several reductions before announcing an error; but none of the LR parsers (LR(1), LALR and SLR) will ever shift an erroneous input symbol onto the stack.

122 Compiler Principles Panic Mode Error Recovery in LR Parsing Scan down the stack until a state s with a goto on a particular nonterminal A is found (get rid of everything on the stack above this state s). Discard zero or more input symbols until a symbol a is found that can legitimately follow A — the symbol a is simply in FOLLOW(A). The parser then pushes the nonterminal A and the state goto[s,A], and resumes normal parsing. This nonterminal A is normally one that represents a basic programming construct (there can be more than one choice for A): stmt, expr, block, ...

123 Compiler Principles Phrase-Level Error Recovery in LR Parsing Each empty entry in the action table is marked with a specific error routine. An error routine reflects the error that the user most likely made in that case. An error routine may insert symbols into the stack or the input, delete symbols from the stack and the input, or do both insertion and deletion — e.g., for a missing operand or an unbalanced right parenthesis.

124 Compiler Principles Homework Exercise 4.2.1 Exercise 4.4.1(e), 4.4.12 Exercise 4.6.5 Exercise 4.7.1 Due date: Oct. 15 (Monday), 2012

