COMP 3438 – Part II, Lecture 6: Syntax Analysis III. Dr. Zili Shao, Department of Computing, The Hong Kong Polytechnic Univ.


Course Organization: Overview of the Subject (COMP 3438)
Part I: Unix System Programming (Device Driver Development)
  Overview of Unix Sys. Prog., Process, File System, Overview of Device Driver Development, Character Device Driver Development, Introduction to Block Device Driver
Part II: Compiler Design
  Overview of Compiler Design, Lexical Analysis, Syntax Analysis (HW #4)
(On the original slide, the topic of this lecture is highlighted in red.)

Outline
Part I: Introduction to Syntax Analysis
  1. Input (tokens) and output (parse tree)
  2. How to specify syntax? Context-Free Grammar (CFG)
  3. How to obtain a parse tree?
     CFG → remove left recursion, left factoring, ambiguity → LL (leftmost derivation)
     CFG → (remove ambiguity) → LR (reverse rightmost derivation)
Part II: Context-Free Grammar, Parse Tree and Ambiguity
Part III: Bottom-up Parsing (LR): SLR, Canonical LR, LALR
Part IV: Top-down Parsing (LL)
  Left recursion, left factoring (Tutorial)
  Recursive-descent parsing
  Predictive parsing (without backtracking) – HW4
  Nonrecursive predictive parsing
  Software tool: yacc (Lab)

Part IV: Predictive Parsing & Nonrecursive Predictive Parsing

Predictive Parser
A special case of recursive-descent parser that does NOT need backtracking.
How to get a grammar that can be parsed by a predictive parser:
  1. Remove left recursion.
  2. Left-factor the resulting grammar.
(A worked example follows.)
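For example (a standard transformation, using the expression grammar that appears later in this lecture), the left-recursive grammar
  E → E + T | T
  T → T * F | F
  F → ( E ) | id
becomes, after eliminating left recursion,
  E → T E'      E' → + T E' | ε
  T → F T'      T' → * F T' | ε
  F → ( E ) | id
No left factoring is needed here, since no two alternatives of the same nonterminal share a common prefix; a grammar such as S → if C then S else S | if C then S would additionally need left factoring (see the dangling-else example near the end of this lecture).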

Predictive parsers
In order to eliminate backtracking, we must know, given input symbol a and nonterminal A, which alternative of A → α1 | α2 | ... | αn is the right one to derive a string beginning with a. That is, we must be able to detect the proper alternative by looking only at the FIRST symbol it derives. Predictive parsing relies on information about which FIRST symbols can be generated by the right side of a production.

Predictive parsers
Let α be the right side of a production for nonterminal A, i.e., A → α is a production. We define FIRST(α) to be the set of tokens a that appear as the first symbols of the strings generated from α, i.e., FIRST(α) = { a | α ⇒* aβ }.
Consider again the example grammar G:
  E → T E'
  E' → + T E' | ε
  T → F T'
  T' → * F T' | ε
  F → ( E ) | id
Then FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(E') = { +, ε }, and FIRST(T') = { *, ε }.
For two productions A → α and A → β, predictive parsing requires FIRST(α) and FIRST(β) to be disjoint, so that the lookahead symbol can be used to decide which production to use.

Implementing a predictive parser
For each nonterminal, we have one corresponding procedure. Each procedure does two things:
  1. Select a production to use, based on the lookahead symbol: use the production with right side α if the lookahead symbol is in FIRST(α); a production with ε on the right side is used if the lookahead symbol is not in the FIRST set of any other right-hand side.
  2. Apply the production by mimicking its right side: call the procedure for each nonterminal, and when a token matches the lookahead symbol, read the next input token. If at some point a token in the production does not match the lookahead symbol, declare an error.
The parser begins with the procedure for the start symbol; it matches terminal symbols against the input and makes a (potentially recursive) procedure call whenever it has to expand a nonterminal. A sketch in code is given below.
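The following is a minimal sketch of such a parser (Python; not code from the lecture, and the class/method names are illustrative), for the expression grammar used throughout these slides:

```python
# Predictive recursive-descent parser for:
#   E -> T E'      E' -> + T E' | eps
#   T -> F T'      T' -> * F T' | eps
#   F -> ( E ) | id
# One procedure per nonterminal; the lookahead token selects the production.

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ['$']   # '$' marks the end of input
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead() == t:
            self.pos += 1                    # consume the token
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead()!r}")

    def E(self):                             # E -> T E'
        self.T(); self.Eprime()

    def Eprime(self):                        # E' -> + T E' | eps
        if self.lookahead() == '+':
            self.match('+'); self.T(); self.Eprime()
        # otherwise apply E' -> eps (lookahead should be in FOLLOW(E') = {')', '$'})

    def T(self):                             # T -> F T'
        self.F(); self.Tprime()

    def Tprime(self):                        # T' -> * F T' | eps
        if self.lookahead() == '*':
            self.match('*'); self.F(); self.Tprime()

    def F(self):                             # F -> ( E ) | id
        if self.lookahead() == '(':
            self.match('('); self.E(); self.match(')')
        elif self.lookahead() == 'id':
            self.match('id')
        else:
            raise ParseError(f"unexpected token {self.lookahead()!r}")

    def parse(self):
        self.E(); self.match('$')            # the whole input must be consumed

# Usage: Parser(['id', '+', 'id', '*', 'id']).parse() accepts the input silently;
# Parser(['id', '+']).parse() raises ParseError.
```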

Predictive parsers
The above approach works only if the given grammar has no nondeterminism, i.e., there is no conflict between right sides for any lookahead symbol. If a conflict occurs, we try to resolve it in an ad-hoc way. If the nondeterminism cannot be eliminated, use a recursive-descent parser with backtracking to systematically try all possibilities.

Non-recursive predictive parsers
If we don't have a recursive language for writing the parser, or the overhead of recursive calls is too high, a non-recursive version (a tabular implementation of predictive parsing) can be used. The parser maintains an input buffer, a stack and a parsing table:
  Input buffer: contains the input tokens, followed by "$" (denoting the end of input).
  Stack: a sequence of grammar symbols with $ on the bottom.
  Parsing table M: a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $.
(The original slide shows these three components feeding the predictive parsing program, which produces the output.)

Non-recursive predictive parsers
The parser is controlled by a program that behaves as follows. Given the top stack symbol X and the current input symbol a:
  If X = a = $, stop and announce successful completion of parsing.
  If X = a ≠ $, pop X off the stack and advance the input pointer to the next input symbol.
  If X is a nonterminal, look up entry M[X, a] of the parsing table:
    If M[X, a] = {X → UVW}, replace X on top of the stack by WVU (so that U is on top).
    If M[X, a] = error, call an error-recovery routine.
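A minimal sketch of this driver loop (Python, not from the lecture; the table encoding is an assumption of the sketch: productions are stored as lists of symbols, and an ε-production as the empty list):

```python
# Table-driven LL(1) parser loop.
# table: dict mapping (nonterminal, terminal-or-'$') -> production body (list of symbols).
def ll1_parse(tokens, table, start_symbol, nonterminals):
    stack = ['$', start_symbol]          # start symbol on top of the end marker
    tokens = list(tokens) + ['$']
    i = 0
    while True:
        X, a = stack[-1], tokens[i]
        if X == a == '$':                # success: stack and input both exhausted
            return True
        elif X == a:                     # terminal on top matches the input symbol
            stack.pop()
            i += 1
        elif X in nonterminals:
            body = table.get((X, a))
            if body is None:             # M[X, a] = error
                raise SyntaxError(f"no table entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(body)) # push right side so its leftmost symbol is on top
        else:                            # terminal on top does not match the input
            raise SyntaxError(f"expected {X!r}, found {a!r}")
```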

Example
Input: id + id * id
Grammar:
  E → T E'
  E' → + T E' | ε
  T → F T'
  T' → * F T' | ε
  F → ( E ) | id
(The parsing table for this grammar and a run of the driver loop on this input are sketched below.)
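Continuing the driver-loop sketch above, the LL(1) parsing table for this grammar (it follows from the FIRST/FOLLOW rules on the next slides) and a run on the example input could be encoded as follows; the dictionary encoding is an assumption of the sketch:

```python
# Productions as lists of symbols; [] encodes an epsilon production.
table = {
    ('E', 'id'): ['T', "E'"],    ('E', '('): ['T', "E'"],
    ("E'", '+'): ['+', 'T', "E'"],
    ("E'", ')'): [],             ("E'", '$'): [],
    ('T', 'id'): ['F', "T'"],    ('T', '('): ['F', "T'"],
    ("T'", '*'): ['*', 'F', "T'"],
    ("T'", '+'): [],             ("T'", ')'): [],  ("T'", '$'): [],
    ('F', 'id'): ['id'],         ('F', '('): ['(', 'E', ')'],
}
nonterminals = {'E', "E'", 'T', "T'", 'F'}

# Using the ll1_parse sketch above:
ll1_parse(['id', '+', 'id', '*', 'id'], table, 'E', nonterminals)   # returns True
```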

Constructing the predictive parsing table
We use two functions, FIRST and FOLLOW.
FIRST: let α be any string of grammar symbols. FIRST(α) is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
FOLLOW: let A be a nonterminal. FOLLOW(A) is the set of terminals a that can appear immediately to the right of A in some sentential form, i.e., there exists a derivation of the form S ⇒* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form, then $ is also in FOLLOW(A); in particular, if A is the start symbol, then $ is in FOLLOW(A).
How do we compute FIRST and FOLLOW?

FIRST(X)
To compute FIRST(X) for all grammar symbols X, apply the following rules:
  1. If t is a terminal, then FIRST(t) = { t }.
  2. If X → ε is a production, then add ε to FIRST(X).
  3. If X → A1 ... An α is a production and ε ∈ FIRST(Ai) for all i, 1 ≤ i ≤ n, then add FIRST(α) to FIRST(X). (With n = 0 this says that FIRST of the first symbol of a right side is always added.)
  4. If X → A1 ... An is a production and ε ∈ FIRST(Ai) for all i, 1 ≤ i ≤ n, then add ε to FIRST(X).
  5. Repeat steps 3 and 4 until no FIRST set can be grown.
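A minimal sketch of this fixpoint computation (Python, not from the lecture; the grammar encoding, a dict from each nonterminal to a list of right-hand sides with the string 'eps' standing for ε, is an assumption of the sketch):

```python
EPS = 'eps'   # stands for the empty string

def compute_first(grammar, terminals):
    """Compute FIRST for every symbol by applying the rules above until no set grows."""
    first = {t: {t} for t in terminals}            # rule 1
    first.update({A: set() for A in grammar})
    changed = True
    while changed:                                 # rule 5: repeat to a fixpoint
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                if body == [EPS]:                  # rule 2
                    new = {EPS}
                else:
                    new = set()
                    for Y in body:                 # rules 3 and 4, prefix by prefix
                        new |= first[Y] - {EPS}
                        if EPS not in first[Y]:
                            break
                    else:                          # every symbol of the body derives eps
                        new.add(EPS)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first
```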

Example for FIRST
Given the grammar
  E → T E'
  E' → + T E' | ε
  T → F T'
  T' → * F T' | ε
  F → ( E ) | id
compute the FIRST sets:
  FIRST( ( ) = { ( }   FIRST( ) ) = { ) }   FIRST( id ) = { id }   FIRST( + ) = { + }   FIRST( * ) = { * }
  FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
  FIRST(E') = { +, ε }
  FIRST(T') = { *, ε }
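With the compute_first sketch above, this grammar and its FIRST sets can be reproduced as follows (the encoding is illustrative):

```python
grammar = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], ['eps']],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], ['eps']],
    'F':  [['(', 'E', ')'], ['id']],
}
terminals = {'+', '*', '(', ')', 'id'}

first = compute_first(grammar, terminals)
# first['E'] == first['T'] == first['F'] == {'(', 'id'}
# first["E'"] == {'+', 'eps'}
# first["T'"] == {'*', 'eps'}
```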

FOLLOW(A)
To compute FOLLOW(A) for all nonterminals A, apply the following rules:
  1. If S is the start symbol, then $ ∈ FOLLOW(S).
  2. If A → α D β is a production, then everything in FIRST(β) except ε is placed in FOLLOW(D).
  3. If A → α D is a production, or A → α D β is a production where ε ∈ FIRST(β), then everything in FOLLOW(A) is in FOLLOW(D).
(As with FIRST, rules 2 and 3 are applied repeatedly until no FOLLOW set can be grown.)
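A minimal sketch of the FOLLOW computation (Python, not from the lecture; it assumes the grammar encoding and the compute_first sketch above, and the helper first_of_string is an illustrative addition):

```python
EPS = 'eps'

def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols (e.g., the beta to the right of D)."""
    out = set()
    for Y in symbols:
        out |= first[Y] - {EPS}
        if EPS not in first[Y]:
            return out
    out.add(EPS)          # every symbol (or the empty string itself) derives eps
    return out

def compute_follow(grammar, first, start_symbol):
    follow = {A: set() for A in grammar}
    follow[start_symbol].add('$')                      # rule 1
    changed = True
    while changed:                                     # repeat to a fixpoint
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, D in enumerate(body):
                    if D not in grammar:               # only nonterminal occurrences matter
                        continue
                    fb = first_of_string(body[i + 1:], first)
                    new = fb - {EPS}                   # rule 2
                    if EPS in fb:
                        new |= follow[A]               # rule 3
                    if not new <= follow[D]:
                        follow[D] |= new
                        changed = True
    return follow

# For the expression grammar above, compute_follow(grammar, first, 'E') gives
# FOLLOW(E) = FOLLOW(E') = {')', '$'}, FOLLOW(T) = FOLLOW(T') = {'+', ')', '$'},
# FOLLOW(F) = {'*', '+', ')', '$'}.
```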

Example for FOLLOW(A)
Given the grammar
  E → T E'
  E' → + T E' | ε
  T → F T'
  T' → * F T' | ε
  F → ( E ) | id
and the FIRST sets computed above
  FIRST(E) = FIRST(T) = FIRST(F) = { (, id }   FIRST(E') = { +, ε }   FIRST(T') = { *, ε }
  FIRST( ( ) = { ( }   FIRST( ) ) = { ) }   FIRST( id ) = { id }   FIRST( + ) = { + }   FIRST( * ) = { * }
compute the FOLLOW sets:
  FOLLOW(E) = FOLLOW(E') = { ), $ }
  FOLLOW(T) = FOLLOW(T') = { +, ), $ }
  FOLLOW(F) = { *, +, ), $ }

Construction of the Parsing Table
For each production A → α of the grammar:
  1. For each terminal a ∈ FIRST(α), add A → α to M[A, a].
  2. If ε ∈ FIRST(α), then for each terminal b ∈ FOLLOW(A), add A → α to M[A, b].
  3. If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $].
Idea behind the rules: for a production A → α with a ∈ FIRST(α), if A is on top of the stack and a is the input symbol, then replace A by α on the stack; otherwise, if α ⇒* ε, expand A by α when the current input symbol a ∈ FOLLOW(A).
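A minimal sketch of this construction (Python, not from the lecture; it reuses EPS, first_of_string, compute_first and compute_follow from the sketches above, and reports a multiply-defined entry by raising an error):

```python
def build_ll1_table(grammar, first, follow):
    table = {}
    for A, bodies in grammar.items():
        for body in bodies:
            fb = {EPS} if body == [EPS] else first_of_string(body, first)
            # Rules 1-3: FIRST(body), plus FOLLOW(A) (including '$') if the body can derive eps.
            lookaheads = (fb - {EPS}) | (follow[A] if EPS in fb else set())
            entry = [] if body == [EPS] else body      # [] encodes A -> eps
            for a in lookaheads:
                if (A, a) in table and table[(A, a)] != entry:
                    raise ValueError(f"not LL(1): conflict at M[{A}, {a}]")
                table[(A, a)] = entry
    return table

# For the expression grammar: build_ll1_table(grammar, first,
# compute_follow(grammar, first, 'E')) reproduces the table shown in the earlier example.
```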

LL(1) parsing
The recursive-descent method is a special case of so-called LL(k) parsing:
  scan the input string from Left to right,
  apply productions to the Leftmost nonterminal in the sentential form being manipulated, and
  look ahead only as far as the next k terminals in the input string.
LL(1) parsing is the most common form of LL(k) parsing in practice. A parsing table constructed by the above method that has no multiply-defined entries is an LL(1) parsing table.

Is a grammar LL(1)?
If the grammar is left-recursive or ambiguous, then M[A, a] will have multiple entries.
Given a grammar G, G is LL(1) if for every pair of rules A → α | β:
  1. There exists no terminal a such that a ∈ FIRST(α) and a ∈ FIRST(β);
  2. At most one of α and β can derive the empty string;
  3. If β derives the empty string, then α does not derive any string beginning with a terminal in FOLLOW(A).
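These conditions are exactly what makes the table construction above produce at most one entry per cell, so a quick (illustrative) way to test a grammar is to try to build its table with the earlier sketches:

```python
# Condition 1 violated: both alternatives of S start with the terminal 'a',
# so the grammar needs left factoring and is not LL(1).
g = {'S': [['a', 'b'], ['a', 'c']]}
f = compute_first(g, {'a', 'b', 'c'})
build_ll1_table(g, f, compute_follow(g, f, 'S'))   # raises: not LL(1): conflict at M[S, a]

# The expression grammar above, in contrast, yields a conflict-free table, so it is LL(1).
```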

Ambiguous Grammars
Some grammars may need a lookahead of more than one symbol (k > 1); moreover, some grammars are not LL regardless of how the grammar is changed. Consider the dangling-else grammar:
  S → if C then S | if C then S else S | a   (a stands for the other statements)
  C → b
After left factoring, it becomes:
  S → if C then S X | a
  X → else S | ε
  C → b
Now "else" ∈ FIRST(X), and since X can be followed by whatever follows S (from S → if C then S X), we also get "else" ∈ FOLLOW(X). So M[X, else] receives both X → else S and X → ε, and the grammar is still not LL(1). The problem sentence is "if b then if b then a else a": the else can be associated with either if, so the grammar remains ambiguous.
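Running the table-construction sketch from earlier on the left-factored dangling-else grammar makes the conflict concrete (the token names and encoding are illustrative):

```python
g = {
    'S': [['if', 'C', 'then', 'S', 'X'], ['a']],
    'X': [['else', 'S'], ['eps']],
    'C': [['b']],
}
f = compute_first(g, {'if', 'then', 'else', 'a', 'b'})
build_ll1_table(g, f, compute_follow(g, f, 'S'))
# raises: not LL(1): conflict at M[X, else]
# ('else' is in FIRST(X -> else S) and, because X -> eps and 'else' can follow S,
#  it is also in FOLLOW(X), so both productions land in the same table cell)
```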

Complexity of an LL(1) Parser
LL(1) parsers run in linear time and use linear space relative to the length of the input, because:
  Time: each input symbol is processed a constant number of times.
  Space: the stack never grows beyond a size proportional to the input.
However, changing the grammar (removing left recursion, left factoring, etc.) may make the other phases of the compiler more difficult: it can become harder to determine the semantics and to generate code from the transformed grammar.

Summary
A non-recursive predictive parser maintains an input buffer, a stack and a parsing table.
The parsing table is constructed using two functions, FIRST and FOLLOW; a set of rules was introduced for computing FIRST and FOLLOW, and for constructing the parsing table from them.
LL(1) parsing: what LL(1) means, how to check whether a grammar is LL(1), and the complexity of LL(1) parsing.