Lecture 8: Top-Down Parsing

[Figure: compiler structure. Source code → Front-End (Lexical Analysis, Syntax Analysis) → IR → Back-End → Object code]
Parsing:
  Context-free syntax is expressed with a context-free grammar.
  Parsing is the process of discovering a derivation for some sentence.
Today's lecture: Top-down parsing
4-Dec-18 COMP36512 Lecture 8

Recursive-Descent Parsing
1. Construct the root with the starting symbol of the grammar.
2. Repeat until the fringe of the parse tree matches the input string:
   a. Assuming a node labelled A, select a production with A on its left-hand side and, for each symbol on its right-hand side, construct the appropriate child.
   b. When a terminal symbol is added to the fringe and it doesn't match the input, backtrack.
   c. Find the next node to be expanded.
The key is picking the right production in step 2a: that choice should be guided by the input string.
Example:
  1. Goal   → Expr
  2. Expr   → Expr + Term
  3.        | Expr - Term
  4.        | Term
  5. Term   → Term * Factor
  6.        | Term / Factor
  7.        | Factor
  8. Factor → number
  9.        | id

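Example: Parse x-2*y
[Figure: parse tree for x - 2 * y, rooted at Goal, with a table of the expansion steps (one scenario from many)]
Other choices for expansion are possible, and a wrong choice leads to non-termination! For example, always expanding Expr with Expr → Expr + Term grows the tree forever without consuming any input. This is a bad property for a parser: the parser must make the right choice!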

Left-Recursive Grammars
Definition: A grammar is left-recursive if it has a non-terminal symbol A such that there is a derivation A ⇒+ Aα, for some string α.
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop.
Eliminating left recursion: In many cases, it is sufficient to replace
  A  → Aα | β
with
  A  → βA'
  A' → αA' | ε
Example:
  Sum  → Sum + number | number
would become:
  Sum  → number Sum'
  Sum' → + number Sum' | ε
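The replacement rule above can be sketched in a few lines of Python. This is only an illustration under assumed conventions that are not part of the slides: each alternative is a tuple of symbols, the empty tuple stands for ε, and the helper name `eliminate_immediate_left_recursion` is hypothetical.

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Rewrite A -> A a | b as A -> b A' and A' -> a A' | epsilon.

    alternatives is a list of tuples of symbols; () stands for epsilon.
    Returns a dict from each (possibly new) non-terminal to its alternatives.
    """
    # The "a" parts: what follows A in each left-recursive alternative.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # The "b" parts: alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: list(alternatives)}
    prime = head + "'"  # assumed naming scheme for the new non-terminal
    return {
        head: [beta + (prime,) for beta in others],
        prime: [alpha + (prime,) for alpha in recursive] + [()],
    }


result = eliminate_immediate_left_recursion(
    "Sum", [("Sum", "+", "number"), ("number",)]
)
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

Applied to the Sum grammar, the sketch yields the two productions given in the example: Sum → number Sum' and Sum' → + number Sum' | ε.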

Eliminating Left Recursion
Applying the transformation to the Grammar of the Example in Slide 2 we get:
  Expr  → Term Expr'
  Expr' → + Term Expr' | - Term Expr' | ε
  Term  → Factor Term'
  Term' → * Factor Term' | / Factor Term' | ε
(Goal → Expr and Factor → number | id remain unchanged)
Non-intuitive, but it works!
General algorithm (works for grammars with no cycles and no ε-productions):
1. Arrange the non-terminal symbols in order: A1, A2, A3, …, An
2. For i=1 to n do
     for j=1 to i-1 do
       I) replace each production of the form Ai → Aj γ with the productions Ai → δ1 γ | δ2 γ | … | δk γ, where Aj → δ1 | δ2 | … | δk are all the current Aj productions
     II) eliminate the immediate left recursion among the Ai productions
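A minimal Python sketch of the general algorithm follows. The grammar representation (a dict from non-terminal to a list of symbol tuples, with () for ε), the function names, and the small test grammar S → A a | b, A → S d | c are all assumptions chosen for illustration; that grammar does not appear in the slides, and it was picked because its left recursion is indirect.

```python
def eliminate_immediate(head, alternatives):
    """Step II: replace A -> A a | b with A -> b A' and A' -> a A' | epsilon."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: list(alternatives)}
    prime = head + "'"
    return {
        head: [beta + (prime,) for beta in others],
        prime: [alpha + (prime,) for alpha in recursive] + [()],
    }


def eliminate_left_recursion(grammar, order):
    """grammar: dict head -> list of symbol tuples; order: [A1, ..., An]."""
    result = {head: list(alts) for head, alts in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            # Step I: Ai -> Aj gamma becomes Ai -> delta gamma
            # for every current production Aj -> delta.
            new_alts = []
            for alt in result[a_i]:
                if alt and alt[0] == a_j:
                    new_alts.extend(delta + alt[1:] for delta in result[a_j])
                else:
                    new_alts.append(alt)
            result[a_i] = new_alts
        result.update(eliminate_immediate(a_i, result[a_i]))
    return result


# Hypothetical grammar with indirect left recursion (S => A a => S d a).
indirect = {"S": [("A", "a"), ("b",)], "A": [("S", "d"), ("c",)]}
fixed = eliminate_left_recursion(indirect, ["S", "A"])
for head, alts in fixed.items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

With the ordering S, A, the substitution in step I turns A → S d into A → A a d | b d, which exposes the recursion as immediate so that step II can remove it.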

Where are we?
We can produce a top-down parser, but if it picks the wrong production rule it has to backtrack.
Idea: look ahead in the input and use context to pick correctly.
How much lookahead is needed? In general, an arbitrarily large amount.
Fortunately, most programming language constructs fall into subclasses of context-free grammars that can be parsed with limited lookahead.

Predictive Parsing
Basic idea: For any production A → α | β we would like to have a distinct way of choosing the correct production to expand.
FIRST sets: For any symbol A, FIRST(A) is defined as the set of terminal symbols that appear as the first symbol of one or more strings derived from A.
E.g. (grammar in Slide 5): FIRST(Expr') = {+, -, ε}, FIRST(Term') = {*, /, ε}, FIRST(Factor) = {number, id}
The LL(1) property: If A → α and A → β both appear in the grammar, we would like to have: FIRST(α) ∩ FIRST(β) = ∅.
This would allow the parser to make a correct choice with a lookahead of exactly one symbol!
The Grammar of Slide 5 has this property!
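The FIRST sets quoted above can be reproduced with a small fixed-point computation. The sketch below is one common way to do it, not necessarily the method used in the course; the grammar encoding (tuples of symbols, () for ε) and the names `first_of_sequence`, `compute_first` and `alternatives_are_disjoint` are assumptions made for this example.

```python
EPSILON = "ε"

# Grammar of Slide 5 (left recursion already removed); () is the ε-alternative.
GRAMMAR = {
    "Goal": [("Expr",)],
    "Expr": [("Term", "Expr'")],
    "Expr'": [("+", "Term", "Expr'"), ("-", "Term", "Expr'"), ()],
    "Term": [("Factor", "Term'")],
    "Term'": [("*", "Factor", "Term'"), ("/", "Factor", "Term'"), ()],
    "Factor": [("number",), ("id",)],
}


def first_of_sequence(symbols, first):
    """FIRST of a string of symbols, given the FIRST sets found so far."""
    result = set()
    for symbol in symbols:
        # A terminal's FIRST set is just the terminal itself.
        symbol_first = first[symbol] if symbol in first else {symbol}
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result
    result.add(EPSILON)  # the whole string can derive ε (or is empty)
    return result


def compute_first(grammar):
    """Iterate until no FIRST set grows any more."""
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def alternatives_are_disjoint(grammar, first):
    """Check FIRST(α) ∩ FIRST(β) = ∅ for every pair of alternatives."""
    for alternatives in grammar.values():
        seen = set()
        for alt in alternatives:
            alt_first = first_of_sequence(alt, first)
            if seen & alt_first:
                return False
            seen |= alt_first
    return True


FIRST = compute_first(GRAMMAR)
for head in GRAMMAR:
    print(f"FIRST({head}) =", sorted(FIRST[head]))
print("disjoint alternatives:", alternatives_are_disjoint(GRAMMAR, FIRST))
```

The check at the end tests only the FIRST-set condition stated on the slide, applied pairwise to the alternatives of each non-terminal.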

Recursive Descent Predictive Parsing
(a practical implementation of the Grammar in Slide 5)

  Main()
    token=next_token();
    if (Expr()!=false)
      then <next_compilation_step>
      else return false;

  Expr()
    if (Term()==false)
      then result=false
    else if (EPrime()==false)
      then result=false
    else result=true
    return result

  EPrime()
    if (token=='+' or token=='-') then
      token=next_token()
      if (Term()==false)
        then result=false
      else if (EPrime()==false)
        then result=false
      else result=true
    else result=true /* ε */
    return result

  Term()
    if (Factor()==false)
      then result=false
    else if (TPrime()==false)
      then result=false
    else result=true
    return result

  TPrime()
    if (token=='*' or token=='/') then
      token=next_token()
      if (Factor()==false)
        then result=false
      else if (TPrime()==false)
        then result=false
      else result=true
    else result=true /* ε */
    return result

  Factor()
    if (token=='number' or token=='id') then
      token=next_token()
      result=true
    else
      report syntax_error
      result=false
    return result

No backtracking is needed! check :-)
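For readers who want to run the parser, here is one possible Python rendering of the pseudocode above. It is a sketch under assumptions that go beyond the slide: the regular-expression tokenizer, the `Parser` class, the `'eof'` marker, and the final check that all input was consumed are additions made for this example.

```python
import re

# Assumed lexical rules: integers, identifiers, and the four operators.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([-+*/]))")


def tokenize(text):
    """Return token kinds: 'number', 'id', an operator, and a final 'eof'."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, ident, op = match.groups()
        tokens.append("number" if number else "id" if ident else op)
        pos = match.end()
    tokens.append("eof")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def expr(self):  # Expr -> Term Expr'
        return self.term() and self.expr_prime()

    def expr_prime(self):  # Expr' -> + Term Expr' | - Term Expr' | ε
        if self.token in ("+", "-"):
            self.advance()
            return self.term() and self.expr_prime()
        return True  # ε

    def term(self):  # Term -> Factor Term'
        return self.factor() and self.term_prime()

    def term_prime(self):  # Term' -> * Factor Term' | / Factor Term' | ε
        if self.token in ("*", "/"):
            self.advance()
            return self.factor() and self.term_prime()
        return True  # ε

    def factor(self):  # Factor -> number | id
        if self.token in ("number", "id"):
            self.advance()
            return True
        return False  # syntax error


def parse(text):
    """True if text is a sentence of the grammar and no input is left over."""
    parser = Parser(tokenize(text))
    return parser.expr() and parser.token == "eof"


print(parse("x - 2 * y"))
print(parse("x - * y"))
```

Each method inspects only the current token before choosing an alternative, so the parser never has to undo a choice.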

Left Factoring
What if my grammar does not have the LL(1) property? Sometimes, we can transform a grammar to have this property.
Algorithm:
1. For each non-terminal A, find the longest prefix, say α, common to two or more of its alternatives.
2. If α ≠ ε then replace all the A productions, A → αβ1 | αβ2 | αβ3 | ... | αβn | γ, where γ is anything that does not begin with α, with A → αZ | γ and Z → β1 | β2 | β3 | ... | βn.
Repeat the above until no common prefixes remain.
Example: A → αβ1 | αβ2 | αβ3 would become A → αZ and Z → β1 | β2 | β3
[Figure: graphical representation. Before: A branches directly to αβ1, αβ2 and αβ3. After: A leads to αZ, and Z branches to β1, β2 and β3]
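The algorithm above might be sketched in Python as follows. The representation (tuples of symbols, () for ε), the worklist loop, and the convention of naming the new non-terminal Z by appending a prime to A are assumptions made for this example; the input grammar is a right-recursive expression grammar with common prefixes.

```python
def longest_common_prefix(alternatives):
    """Longest prefix shared by two or more alternatives (() if none)."""
    best = ()
    for alt in alternatives:
        # Only try prefixes longer than the best one found so far.
        for length in range(len(alt), len(best), -1):
            prefix = alt[:length]
            sharing = sum(1 for other in alternatives if other[:length] == prefix)
            if sharing >= 2:
                best = prefix
                break
    return best


def left_factor(grammar):
    """Repeat the factoring step until no common prefixes remain."""
    result = {head: list(alts) for head, alts in grammar.items()}
    worklist = list(result)
    while worklist:
        head = worklist.pop()
        prefix = longest_common_prefix(result[head])
        if not prefix:
            continue
        new_head = head + "'"  # plays the role of Z
        while new_head in result:
            new_head += "'"
        n = len(prefix)
        factored = [alt[n:] for alt in result[head] if alt[:n] == prefix]
        others = [alt for alt in result[head] if alt[:n] != prefix]
        result[head] = [prefix + (new_head,)] + others  # A -> α Z | γ
        result[new_head] = factored                      # Z -> β1 | ... | βn
        worklist.extend([head, new_head])
    return result


grammar = {
    "Goal": [("Expr",)],
    "Expr": [("Term", "+", "Expr"), ("Term", "-", "Expr"), ("Term",)],
    "Term": [("Factor", "*", "Term"), ("Factor", "/", "Term"), ("Factor",)],
    "Factor": [("number",), ("id",)],
}
factored = left_factor(grammar)
for head, alts in factored.items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

Re-queuing both A and the new non-terminal after each step is what implements the instruction to repeat until no common prefixes remain.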

Example
(NB: this is a different grammar from the one in Slide 2)
  Goal   → Expr
  Expr   → Term + Expr | Term - Expr | Term
  Term   → Factor * Term | Factor / Term | Factor
  Factor → number | id
We have a problem with the different rules for Expr as well as those for Term. In both cases, the first symbol of the right-hand side is the same (Term and Factor, respectively). E.g.:
  FIRST(Term) = FIRST(Term) ∩ FIRST(Term) = {number, id}
  FIRST(Factor) = FIRST(Factor) ∩ FIRST(Factor) = {number, id}
Applying left factoring:
  Expr  → Term Expr'
  Expr' → + Expr | - Expr | ε
    FIRST(+) = {+}; FIRST(-) = {-}; FIRST(ε) = {ε}; FIRST(-) ∩ FIRST(+) ∩ FIRST(ε) = ∅
  Term  → Factor Term'
  Term' → * Term | / Term | ε
    FIRST(*) = {*}; FIRST(/) = {/}; FIRST(ε) = {ε}; FIRST(*) ∩ FIRST(/) ∩ FIRST(ε) = ∅

Example (cont.)
   1. Goal   → Expr
   2. Expr   → Term Expr'
   3. Expr'  → + Expr
   4.        | - Expr
   5.        | ε
   6. Term   → Factor Term'
   7. Term'  → * Term
   8.        | / Term
   9.        | ε
  10. Factor → number
  11.        | id
The next symbol determines each choice correctly. No backtracking needed.

Conclusion
Top-down parsing:
  recursive with backtracking (not often used in practice)
  recursive predictive
Nonrecursive Predictive Parsing is possible too: maintain a stack explicitly rather than implicitly via recursion, and determine the production to be applied using a table (Aho, pp. 186-190).
Given a Context Free Grammar that doesn't meet the LL(1) condition, it is undecidable whether or not an equivalent LL(1) grammar exists.
Next time: Bottom-Up Parsing
Reading: Aho2, Sections 4.3.3, 4.3.4, 4.4; Aho1, pp. 176-178, 181-185; Grune pp. 117-133; Hunter pp. 72-93; Cooper, Section 3.3.
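To illustrate the nonrecursive variant mentioned above, here is a small Python sketch of a table-driven predictive parser with an explicit stack. The table was written by hand for the grammar of Slide 5 and is an assumption of this example: in particular, the entries that select an ε-production depend on which tokens may follow Expr' and Term', which the slides do not derive. The names `TABLE`, `EOF` and `parse` are likewise illustrative.

```python
EOF = "$"  # assumed end-of-input marker

# (non-terminal, lookahead) -> right-hand side to push; () means ε.
TABLE = {
    ("Expr", "number"): ("Term", "Expr'"),
    ("Expr", "id"): ("Term", "Expr'"),
    ("Expr'", "+"): ("+", "Term", "Expr'"),
    ("Expr'", "-"): ("-", "Term", "Expr'"),
    ("Expr'", EOF): (),
    ("Term", "number"): ("Factor", "Term'"),
    ("Term", "id"): ("Factor", "Term'"),
    ("Term'", "*"): ("*", "Factor", "Term'"),
    ("Term'", "/"): ("/", "Factor", "Term'"),
    ("Term'", "+"): (),
    ("Term'", "-"): (),
    ("Term'", EOF): (),
    ("Factor", "number"): ("number",),
    ("Factor", "id"): ("id",),
}
NONTERMINALS = {head for head, _ in TABLE}


def parse(tokens):
    """tokens: list of token kinds such as ['id', '-', 'number']."""
    tokens = list(tokens) + [EOF]
    stack = [EOF, "Expr"]  # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            production = TABLE.get((top, lookahead))
            if production is None:
                return False  # no table entry: syntax error
            # Push the right-hand side so its first symbol ends up on top.
            stack.extend(reversed(production))
        elif top == lookahead:
            pos += 1  # terminal matches the input: consume it
        else:
            return False
    return pos == len(tokens)


print(parse(["id", "-", "number", "*", "id"]))
print(parse(["id", "-", "*", "id"]))
```

The stack plays the role of the call stack in the recursive version: popping a non-terminal and pushing a right-hand side corresponds to calling the procedure for that non-terminal.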