COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.

2 Overview of the Subject (COMP 3438): Course Organization
Part I: Unix System Programming (Device Driver Development): Overview of Unix System Programming; Process; File System; Overview of Device Driver Development; Character Device Driver Development; Introduction to Block Device Driver.
Part II: Compiler Design: Overview of Compiler Design; Lexical Analysis; Syntax Analysis (HW #4).
(The current lecture, Syntax Analysis, was highlighted in red on the slide.)

3 Outline
Part I: Introduction to Syntax Analysis
 1. Input (tokens) and output (parse tree)
 2. How to specify syntax? Context-free grammar (CFG)
 3. How to obtain a parse tree?
  CFG → remove left recursion, left factoring, ambiguity → LL (leftmost derivation)
  CFG → (remove ambiguity) → LR (reverse rightmost derivation)
Part II: Context-Free Grammar, Parse Tree and Ambiguity
Part III: Bottom-up Parsing (LR): SLR, Canonical LR, LALR
Part IV: Top-down Parsing (LL): left recursion and left factoring (tutorial); recursive-descent parsing; predictive parsing without backtracking (HW4); nonrecursive predictive parsing
Software tool: yacc (lab)

4 Part IV: Predictive Parsing & Nonrecursive Predictive Parsing

5 Predictive Parser
A predictive parser is a special case of a recursive-descent parser that does NOT need backtracking. To obtain a grammar that a predictive parser can handle:
 1. Remove left recursion.
 2. Left-factor the resulting grammar.
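Step 1 (removing left recursion) can be sketched in Python. This is a minimal sketch under our own assumptions, not a complete algorithm: it handles only immediate left recursion of the form A → Aα1 | ... | Aαm | β1 | ... | βn, represents each right side as a list of symbols (an empty list stands for ε), and names the new nonterminal by appending a prime. The function name `eliminate_immediate_left_recursion` is hypothetical.

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | eps   (eps is the empty list)
    Only immediate left recursion is handled; indirect recursion is not.
    """
    new_nt = nt + "'"
    # Alternatives that start with A itself: keep only their tails (the alphas).
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    # All other alternatives (the betas), including an empty one.
    others = [p for p in productions if not p or p[0] != nt]
    if not recursive:
        return {nt: productions}  # nothing to do
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


# E -> E + T | T
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

Applied to the left-recursive rules E → E + T | T and T → T * F | F, it produces the E, E', T and T' productions of the grammar G used on the following slides.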

6 Predictive parsers
To eliminate backtracking, given the input symbol a and a nonterminal A, we must know which alternative of A → α1 | α2 | ... | αn is the one that derives a string beginning with a. That is, we must be able to detect the proper alternative by looking only at the first symbol it derives. Predictive parsing therefore relies on knowing which first symbols can be generated by the right side of each production.

7 Predictive parsers
Let α be the right side of a production for nonterminal A, i.e., A → α is a production. We define FIRST(α) to be the set of tokens a that appear as the first symbols of the strings derived from α:
 FIRST(α) = { a | α ⇒* aβ }
(with ε added when α can derive the empty string).
Consider again the example grammar G:
 E → TE'
 E' → +TE' | ε
 T → FT'
 T' → *FT' | ε
 F → ( E ) | id
Then FIRST(E) = FIRST(T) = FIRST(F) = {(, id}, FIRST(E') = {+, ε} and FIRST(T') = {*, ε}.
For two productions A → α and A → β, predictive parsing requires FIRST(α) and FIRST(β) to be disjoint, so that the lookahead symbol can decide which production to use.
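The FIRST sets of G can be checked by following leftmost derivations. The worked derivation below shows why E, T and F share one FIRST set and why ε belongs to FIRST(E'):

```latex
\begin{aligned}
E &\Rightarrow TE' \Rightarrow FT'E' \Rightarrow (E)\,T'E'
  && \text{via } F \to (E) \\
E &\Rightarrow TE' \Rightarrow FT'E' \Rightarrow \mathbf{id}\,T'E'
  && \text{via } F \to \mathbf{id} \\
\mathrm{FIRST}(E) &= \mathrm{FIRST}(T) = \mathrm{FIRST}(F) = \{\,(,\ \mathbf{id}\,\}
  && \text{since } T \text{ and } F \text{ never derive } \varepsilon \\
E' &\Rightarrow +TE', \quad E' \Rightarrow \varepsilon
  && \Longrightarrow\ \mathrm{FIRST}(E') = \{\,+,\ \varepsilon\,\}
\end{aligned}
```

For F → ( E ) | id the alternatives begin with different tokens, FIRST(( E )) = {(} and FIRST(id) = {id}, so they are disjoint as required.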

8 Implement a predictive parser
For each nonterminal, we write one corresponding procedure. Each procedure does two things:
 1. Select a production based on the lookahead symbol. Use the production with right side α if the lookahead symbol is in FIRST(α). A production with ε on the right side is used if the lookahead symbol is not in the FIRST set of any other right-hand side.
 2. Apply the production by mimicking its right side: for each nonterminal, call its procedure; for each token, if it matches the lookahead symbol, read the next input token. If at some point a token in the production does not match the lookahead symbol, an error is declared.
The parser begins with the procedure for the start symbol. It matches terminal symbols against the input and makes a potentially recursive procedure call whenever it has to expand a nonterminal.
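The two-step scheme above can be sketched as a recursive-descent predictive parser for grammar G in Python. Assumptions: the input is already a list of token strings such as "id" and "+", the end of input is marked with "$", and the names `PredictiveParser`, `E_prime`, `T_prime` and `ParseError` are illustrative. Each ε-production is chosen by simply returning when the lookahead is not in the FIRST set of the other alternative; a wrong token is then caught later by `match`.

```python
class ParseError(Exception):
    pass


class PredictiveParser:
    """Recursive-descent predictive parser for grammar G:
    E -> T E'    E' -> + T E' | eps
    T -> F T'    T' -> * F T' | eps
    F -> ( E ) | id
    One method per nonterminal; the lookahead alone selects the production.
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # A token in the right side must equal the lookahead; then read on.
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        self.E()          # begin with the procedure for the start symbol
        self.match("$")   # all input must have been consumed

    def E(self):          # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.lookahead == "+":   # "+" is in FIRST(+ T E')
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise use E' -> eps: consume nothing

    def T(self):          # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.lookahead == "*":   # "*" is in FIRST(* F T')
            self.match("*")
            self.F()
            self.T_prime()
        # otherwise use T' -> eps

    def F(self):
        if self.lookahead == "(":     # F -> ( E )
            self.match("(")
            self.E()
            self.match(")")
        elif self.lookahead == "id":  # F -> id
            self.match("id")
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")
```

For example, `PredictiveParser(["id", "+", "id", "*", "id"]).parse()` returns normally, while `PredictiveParser(["id", "+"]).parse()` raises `ParseError` because F finds "$" where it expects "(" or "id".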

9 Predictive parsers
The above approach works only if the grammar has no nondeterminism, i.e., there is no conflict between right sides for any lookahead symbol. If ambiguity occurs, we try to resolve it in an ad hoc way. If the nondeterminism cannot be eliminated, use a recursive-descent parser with backtracking to try all possibilities systematically.

10 Non-recursive predictive parsers
If we don't have a recursive language for writing the parser, or the overhead of recursive calls is too high, a non-recursive version (a tabular implementation of predictive parsing) can be used. The parser maintains an input buffer, a stack and a parsing table:
 Input buffer: contains the input tokens followed by $, which denotes the end (e.g., a + b $).
 Stack: a sequence of grammar symbols with $ on the bottom (e.g., X Y Z $, with X on top).
 Parsing table M: a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $.
[Figure: the predictive parsing program reads the input buffer, consults the stack and parsing table M, and writes the OUTPUT.]

11 Non-recursive predictive parsers
The parser is controlled by a program that behaves as follows. Given the top stack symbol X and the current input symbol a:
 If X = a = $, it stops and announces successful completion of parsing.
 If X = a ≠ $, it pops X off the stack and advances the input pointer to the next input symbol.
 If X is a nonterminal, it looks up entry M[X, a] of the parsing table:
  If M[X, a] = {X → UVW}, it replaces X on top of the stack by WVU with U on top, i.e., the stack X ... $ becomes U V W ... $.
  If M[X, a] = error, it calls an error recovery routine.
 If X is a terminal other than a, an error is reported as well.

12 Example
Input: id + id * id
Grammar: E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → ( E ) | id
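The driver loop of the previous slide can be sketched in Python and run on this example. Assumptions: the table entries are filled in by hand for grammar G (they are the entries the FIRST/FOLLOW construction later in this lecture produces), ε is represented by an empty tuple, the names `TABLE`, `ll1_parse` and `ParseError` are illustrative, and an error entry simply raises an exception instead of calling an error recovery routine.

```python
class ParseError(Exception):
    pass


# Hand-filled parsing table M[A, a] for grammar G.
# Right sides are tuples of grammar symbols; () stands for epsilon.
TABLE = {
    ("E", "id"): ("T", "E'"),
    ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),
    ("E'", ")"): (),
    ("E'", "$"): (),
    ("T", "id"): ("F", "T'"),
    ("T", "("): ("F", "T'"),
    ("T'", "+"): (),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (),
    ("T'", "$"): (),
    ("F", "id"): ("id",),
    ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, table=TABLE, start="E"):
    """Table-driven predictive parse; returns the productions applied (OUTPUT)."""
    inp = list(tokens) + ["$"]      # input buffer ends with $
    stack = ["$", start]            # $ on the bottom; the list end is the top
    i = 0
    output = []
    while True:
        X, a = stack[-1], inp[i]
        if X == a == "$":           # X = a = $: accept
            return output
        if X not in NONTERMINALS:   # X is a terminal (or $)
            if X != a:
                raise ParseError(f"expected {X!r}, found {a!r}")
            stack.pop()             # X = a != $: pop X, advance the input
            i += 1
        elif (X, a) in table:       # M[X, a] = {X -> UVW}
            rhs = table[(X, a)]
            stack.pop()
            stack.extend(reversed(rhs))  # push W, V, U so that U is on top
            output.append(f"{X} -> {' '.join(rhs) or 'eps'}")
        else:                       # empty entry: error
            raise ParseError(f"no entry for M[{X}, {a}]")


for step in ll1_parse("id + id * id".split()):
    print(step)
```

On id + id * id the parser emits 11 productions, from E -> T E' through E' -> eps, which together form a leftmost derivation of the input.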

13 Construct predictive parsing table
Use two functions, FIRST and FOLLOW:
 FIRST: let α be any string of grammar symbols. FIRST(α) is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
 FOLLOW: let A be a nonterminal. FOLLOW(A) is the set of terminals a that can appear immediately to the right of A in some sentential form, i.e., there exists a derivation of the form S ⇒* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form, then everything in FOLLOW(S) is also in FOLLOW(A). If A is the start symbol, then $ is in FOLLOW(A).
How do we compute FIRST and FOLLOW?

14 FIRST(X)
To compute FIRST(X) for all grammar symbols X, apply the rules:
 1. If t is a terminal, then FIRST(t) = {t}.
 2. If X → ε is a production, then add ε to FIRST(X).
 3. If X → A1 … An α and ε ∈ FIRST(Ai) for all i, 1 ≤ i ≤ n, then add FIRST(α) to FIRST(X).
 4. For each X → A1 … An such that ε ∈ FIRST(Ai) for all i, 1 ≤ i ≤ n, add ε to FIRST(X).
 5. Repeat steps 3 and 4 until no FIRST set can grow.
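Rules 1 to 5 can be sketched as a fixpoint computation in Python. This is a minimal sketch with our own representation: a dict from each nonterminal to a list of right-side tuples, () for an ε right side, and the string "eps" standing for ε. The names `compute_first` and `first_of_string` are hypothetical. Run on grammar G, it reproduces the FIRST sets of the following example.

```python
EPS = "eps"  # stands for the empty string epsilon

# Grammar G: nonterminal -> list of right sides; () is an epsilon right side.
GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}


def first_of_string(symbols, first):
    """FIRST(X1 ... Xn) from the current FIRST sets of the single symbols."""
    result = set()
    for X in symbols:
        result |= first[X] - {EPS}
        if EPS not in first[X]:     # X cannot vanish, so stop here
            return result
    result.add(EPS)                 # every Xi can derive eps (or n = 0)
    return result


def compute_first(grammar):
    terminals = {s for rhss in grammar.values() for rhs in rhss
                 for s in rhs if s not in grammar}
    first = {t: {t} for t in terminals}           # rule 1
    first.update({A: set() for A in grammar})
    changed = True
    while changed:                                 # rule 5: until nothing grows
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                # Rules 2-4: an empty rhs contributes eps; otherwise FIRST of
                # the leading symbols, plus eps only if all of them can vanish.
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for A in GRAMMAR:
    print(A, sorted(FIRST[A]))
```

The loop needs a few passes because FIRST(E) depends on FIRST(T), which depends on FIRST(F); iterating to a fixpoint avoids having to order the nonterminals.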

15 Example for FIRST
Given the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, compute the FIRST sets:
 FIRST(() = {(}, FIRST()) = {)}, FIRST(id) = {id}, FIRST(+) = {+}, FIRST(*) = {*}
 FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
 FIRST(E') = {+, ε}
 FIRST(T') = {*, ε}

16 FOLLOW(A)
To compute FOLLOW(A) for all nonterminals A, apply the rules (repeatedly, until no FOLLOW set changes):
 1. If S is the start symbol, then $ ∈ FOLLOW(S).
 2. If A → αDβ, then everything in FIRST(β) except ε is placed in FOLLOW(D).
 3. If A → αD, or A → αDβ where ε ∈ FIRST(β), then everything in FOLLOW(A) is in FOLLOW(D).
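The FOLLOW rules can likewise be iterated until no set grows. In this minimal sketch the FIRST sets of G's nonterminals are copied from the example as input, so that only the FOLLOW rules are shown; the "eps" marker and the names `first_of_string` and `compute_follow` are our own assumptions. The result matches the FOLLOW sets of the next example.

```python
EPS = "eps"  # marker for the empty string epsilon

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
# FIRST sets of the nonterminals, copied from the FIRST example.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols; a terminal's FIRST set is itself."""
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}            # empty string, or every symbol can vanish


def compute_follow(grammar, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")           # rule 1: $ in FOLLOW(start symbol)
    changed = True
    while changed:                   # repeat until no FOLLOW set grows
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                for i, D in enumerate(rhs):
                    if D not in grammar:      # FOLLOW is only for nonterminals
                        continue
                    beta = first_of_string(rhs[i + 1:])
                    new = beta - {EPS}        # rule 2: FIRST(beta) minus eps
                    if EPS in beta:           # rule 3: beta empty or can vanish
                        new |= follow[A]
                    if not new <= follow[D]:
                        follow[D] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
```

For instance, ")" enters FOLLOW(E) through F → ( E ), and "+" enters FOLLOW(T) through E → TE' because + ∈ FIRST(E').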

17 Example for FOLLOW(A)
Given the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id and its FIRST sets:
 FIRST(() = {(}, FIRST()) = {)}, FIRST(id) = {id}, FIRST(+) = {+}, FIRST(*) = {*}
 FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
 FIRST(E') = {+, ε}
 FIRST(T') = {*, ε}
compute the FOLLOW sets:
 FOLLOW(E) = FOLLOW(E') = {), $}
 FOLLOW(T) = FOLLOW(T') = {+, ), $}
 FOLLOW(F) = {*, +, ), $}

18 Construction of Parse Table
For each production A → α of the grammar:
 1. For each terminal a ∈ FIRST(α), add A → α to M[A, a].
 2. If ε ∈ FIRST(α), then for each terminal b ∈ FOLLOW(A), add A → α to M[A, b].
 3. If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $].
Entries left empty mean error.
The idea behind it: when A is on top of the stack and a is the current input symbol, replace A by α on the stack if a ∈ FIRST(α); otherwise, if α ⇒* ε, expand A by α when a ∈ FOLLOW(A).
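The three construction rules can be sketched in Python, taking the FIRST and FOLLOW sets of grammar G from the examples as given inputs. Because $ is already a member of the FOLLOW sets here, rule 3 is covered by rule 2. Each table cell holds a list of right sides, so a cell with two or more entries exposes a conflict; `build_table` and `first_of_string` are hypothetical names, and "eps" stands for ε.

```python
EPS = "eps"  # marker for the empty string epsilon

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
# FIRST and FOLLOW sets of the nonterminals, copied from the examples.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}


def first_of_string(symbols):
    """FIRST of a right side; a terminal's FIRST set is the terminal itself."""
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}


def build_table(grammar):
    """Return M as a dict (A, a) -> list of right sides for A."""
    table = {}
    for A, rhss in grammar.items():
        for rhs in rhss:
            f = first_of_string(rhs)
            targets = f - {EPS}              # rule 1: a in FIRST(alpha)
            if EPS in f:
                targets |= FOLLOW[A]         # rules 2 and 3: b (or $) in FOLLOW(A)
            for a in targets:
                table.setdefault((A, a), []).append(rhs)
    return table


M = build_table(GRAMMAR)
conflicts = {cell: rhss for cell, rhss in M.items() if len(rhss) > 1}
print(len(M), "cells filled; conflicts:", conflicts)
```

For G every filled cell receives exactly one production (13 cells in total), so the table has no multiply-defined entries.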

19 LL(1) parsing
The recursive-descent method is a special case of so-called LL(k) parsing: scan the input string from Left to right, apply productions to the Leftmost nonterminal in the sentential form being manipulated, and look ahead only as far as the next k terminals in the input string. LL(1) is the most common form of LL(k) parsing in practice. A parsing table built with the above method that has no multiply-defined entries is an LL(1) parsing table.

20 Is a grammar LL(1)?
If the grammar is left-recursive or ambiguous, some entry M[A, a] will have multiple entries. A grammar G is LL(1) if, for every pair of rules A → α | β:
 1. There is no terminal a such that a ∈ FIRST(α) and also a ∈ FIRST(β);
 2. At most one of α and β can derive the empty string;
 3. If β derives the empty string, then α does not derive any string beginning with a terminal in FOLLOW(A).

21 Ambiguous Grammars
Some grammars need more than one symbol of lookahead (k > 1). However, some grammars are not LL regardless of how they are changed, for example:
 S → if C then S | if C then S else S | a (other statements)
 C → b
Change to:
 S → if C then S X | a
 X → else S | ε
 C → b
Here else ∈ FIRST(X) and FIRST(X) − {ε} ⊆ FOLLOW(S). Because X ends the right side of S → if C then S X, FOLLOW(S) ⊆ FOLLOW(X), so else ∈ FOLLOW(X). With else as the lookahead, both X → else S and X → ε apply.
Problem sentence: if b then if b then a else a
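The three LL(1) conditions of the previous slide can be checked mechanically for a pair of alternatives A → α | β once FIRST and FOLLOW are known. This is a minimal sketch, assuming FIRST sets that may contain the marker "eps" for ε; condition 3 is checked in both directions because the names α and β are interchangeable. The function `ll1_problems` is hypothetical.

```python
EPS = "eps"  # marker for the empty string epsilon


def ll1_problems(first_alpha, first_beta, follow_A):
    """Check the LL(1) conditions for one pair of alternatives A -> alpha | beta.

    first_alpha, first_beta: FIRST sets of the two right sides (may contain EPS).
    follow_A: FOLLOW set of A.  Returns a list of violated conditions.
    """
    problems = []
    # Condition 1: no terminal begins strings derived from both alternatives.
    common = (first_alpha & first_beta) - {EPS}
    if common:
        problems.append(f"condition 1: both FIRST sets contain {sorted(common)}")
    # Condition 2: at most one alternative derives the empty string.
    if EPS in first_alpha and EPS in first_beta:
        problems.append("condition 2: both alternatives derive the empty string")
    # Condition 3: if one side can vanish, the other must not start with
    # a terminal in FOLLOW(A).
    for nullable, other in ((first_beta, first_alpha), (first_alpha, first_beta)):
        if EPS in nullable:
            clash = (other - {EPS}) & follow_A
            if clash:
                problems.append(f"condition 3: FIRST/FOLLOW(A) overlap on {sorted(clash)}")
    return problems


# E' -> + T E' | eps in grammar G, with FOLLOW(E') = {), $}
print(ll1_problems({"+"}, {EPS}, {")", "$"}))
# X -> else S | eps, with FOLLOW(X) = FOLLOW(S) = {else, $}
print(ll1_problems({"else"}, {EPS}, {"else", "$"}))
```

The first call finds no problem, while the second reports a condition-3 violation on else: this is the M[X, else] conflict that makes the if-then-else grammar non-LL(1).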

22 Complexity of LL(1) Parser
LL(1) parsers operate in linear time and need linear space relative to the length of the input, because:
 Time: each input symbol is processed a constant number of times.
 Space: the stack is smaller than the input.
But changing the grammar to make it LL(1) might make the other phases of the compiler more difficult: it becomes harder to determine semantics and generate code.

23 Summary
A non-recursive predictive parser maintains an input buffer, a stack and a parsing table. The parsing table is constructed using two functions, FIRST and FOLLOW, and a set of rules was introduced to compute them. Based on FIRST and FOLLOW, we construct the parsing table.
LL(1) parsing: what is LL(1)? Is a given grammar LL(1)? What is the complexity of LL(1) parsing?

