Parsing II : Top-down Parsing Lecture 7 CS 4318/5331 Apan Qasem Texas State University Spring 2015 *some slides adopted from Cooper and Torczon.

Slides:



Advertisements
Similar presentations
Parsing V: Bottom-up Parsing
Advertisements

Parsing II : Top-down Parsing
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
Prof. Bodik CS 164 Lecture 81 Grammars and ambiguity CS164 3:30-5:00 TT 10 Evans.
Parsing III (Eliminating left recursion, recursive descent parsing)
Prof. Bodik CS 164 Lecture 61 Building a Parser II CS164 3:30-5:00 TT 10 Evans.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #5 Introduction.
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Context-Free Grammars and Parsing 1.
Compiler Construction Parsing Part I
EECS 6083 Intro to Parsing Context Free Grammars
8/19/2015© Hal Perkins & UW CSEC-1 CSE P 501 – Compilers Parsing & Context-Free Grammars Hal Perkins Winter 2008.
Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Introduction to Parsing Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
-Mandakinee Singh (11CS10026).  What is parsing? ◦ Discovering the derivation of a string: If one exists. ◦ Harder than generating strings.  Two major.
CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring CSE3302 Programming Languages,
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Bernd Fischer RW713: Compiler and Software Language Engineering.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
1 CIS 461 Compiler Design and Construction Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #8 Parsing Techniques.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
Top-Down Parsing.
Syntax Analyzer (Parser)
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
COMP 3438 – Part II-Lecture 5 Syntax Analysis II Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #6 Parsing.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Introduction to Parsing
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
CSE 3302 Programming Languages
lec02-parserCFG May 8, 2018 Syntax Analyzer
Parsing — Part II (Top-down parsing, left-recursion removal)
Parsing & Context-Free Grammars
CS510 Compiler Lecture 4.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Parsing IV Bottom-up Parsing
Parsing — Part II (Top-down parsing, left-recursion removal)
Parsing Techniques.
Parsing & Context-Free Grammars Hal Perkins Autumn 2011
(Slides copied liberally from Ruth Anderson, Hal Perkins and others)
Top-Down Parsing CS 671 January 29, 2008.
Lecture 7: Introduction to Parsing (Syntax Analysis)
Lecture 8: Top-Down Parsing
R.Rajkumar Asst.Professor CSE
Parsing IV Bottom-up Parsing
Introduction to Parsing
Introduction to Parsing
Parsing — Part II (Top-down parsing, left-recursion removal)
Parsing & Context-Free Grammars Hal Perkins Summer 2004
Parsing & Context-Free Grammars Hal Perkins Autumn 2005
Presentation transcript:

Parsing II : Top-down Parsing Lecture 7 CS 4318/5331 Apan Qasem Texas State University Spring 2015 *some slides adopted from Cooper and Torczon

Announcements

Review Parsing Goals Context-free grammars Derivations Sequence of production rules leading to a sentence Leftmost derivations Rightmost derivations Parse Trees Tree representation of a derivation Transforms into IR Precedence in languages Can manipulate grammar to enforce precedence Cannot do this with REs

Review Practice with CFG Based on the expression grammar, show the derivation of x + 2 * 17 – y / 31 2 * 2 * 2 * 2 Extend the expression grammar to add parentheses add mod operation Based on the extended grammar show the derivation of (x + y) * 2 - z (a + b + c + d)

Chomsky Hierarchy RL CFL CSL Unrestricted LR(1) LL(1) Noam Chomsky Three Models for the Description of Language, 1956 Turing machines Recursively enumerable DFA/NFA PDA Many parsers LBA

Today Top-down parsing algorithm Issues in parsing Ambiguity Backtracking Left Recursion

Leftmost Derivation for x – 2 * y

Another Derivation for x – 2 * y What kind of derivation is this?

Two Leftmost Derivations for x – 2 * y Original choiceNew choice

Multiple Leftmost Derivations Having multiple leftmost (or multiple rightmost) derivation is a problem for syntax analysis Why? Implies non-determinism Difficult to automate

Ambiguous Grammar If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous If a grammar has more than one rightmost derivation for a single sentential form, the grammar is ambiguous May have a leftmost and rightmost derivation for a single sentential form even in an unambiguous grammar

C++ Example #include using namespace std; int score; int main() { cin >> score; string grade = “B”; if (score <= 100) if (score > 90) grade = “A”; else grade = “A+”; cout << grade << endl; return 0; } What is the output when input is 95? What is the output when input is 105? What is the output when input is 85? No syntactic way to specify how we want the else and if paired up C++ says, else matches “nearest” if without an else

The Dangling else The dangling else is a classic example of ambiguity C++ CFG rules for the if statement Stmt  if Expr then Stmt | if Expr then Stmt else Stmt | … other stmts …

Ambiguity : Derivations Input: if E1 then if E2 then S1 else S2 First derivation stmt 2 if expr then stmt else stmt if E1 then stmt else stmt 1 If E1 then if expr then stmt else stmt if E1 then if E2 then S1 else S2 Second derivation stmt 1 if expr then stmt if E1 then stmt 2 if E1 then if expr then stmt else stmt if E1 then if E2 then S1 else S2

Ambiguity : Parse Trees then else if then if E1E1 E2E2 S2S2 S1S1 production 2, then production 1 then if then if E1E1 E2E2 S1S1 else S2S2 production 1, then production 2 Input: if E1 then if E2 then S1 else S2

Deeper Ambiguity Ambiguity usually refers to confusion in the CFG Overloading can create deeper ambiguity a = f(17) In some languages (e.g., Fortran), f could be either a function or a subscripted variable Disambiguating this one requires context Need values of declarations Really an issue of type, not context-free syntax Requires an extra-grammatical solution (not in CFG ) Must handle these with a different mechanism Step outside grammar rather than use a more complex grammar

Ambiguity in C int main() { { foo(); } foo() { int x = 1; } foo(); return 0; } int main() { { foo(); } foo() { int x = 1; } foo(); return 0; } Will this compile? // prototype or call?

Dealing with Ambiguity Ambiguity arises from two distinct sources Confusion in the context-free syntax (if-then-else) Confusion that requires context to resolve (overloading) Resolving ambiguity To remove context-free ambiguity, rewrite the grammar To handle context-sensitive ambiguity takes cooperation Knowledge of declarations, types, … Accept a superset of L(G) & check it by other means This is a language design problem Sometimes, the compiler writer accepts an ambiguous grammar Parsing techniques that “do the right thing” i.e., always select the same derivation

Parsing Goal Is there a derivation that produces a string of terminals that matches the input string? Answer this question by attempting to build a parse tree

Two Approaches to Parsing Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves At each step pick a re-write rule to apply When the sentential form consists of only terminals check if it matches the input Bottom-up parsers (LR(1)) Start at the leaves and grow toward root At each step consume input string and find a matching rule to create parent node in parse tree When a node with the start symbol is created we are done Very high-level sketch, Lots of holes Plug-in the holes as we go along which one’s more intuitive?

Top-down Parsing Algorithm 1.Construct the root node of the parse tree with the start symbol 2.Repeat until input string matches fringe Pick a re-write rule to apply Start symbol Also called goal symbol (comes from bottom-up parsing) Fringe Leaf nodes from left to right (order is important) At any stage of the construction they can be labeled with both terminals and non-terminals

Top-down Parsing Algorithm 1.Construct the root node of the parse tree with the start symbol 2.Repeat until input string matches fringe Pick a re-write rule to apply Need to expand this step

Top-down Parsing Algorithm 1.Construct the root node of the parse tree with the start symbol 2.Repeat 1.Pick the leftmost node on the fringe labeled with an NT to expand 2.If the expansion adds a terminal to the leftmost node of the fringe compare terminal with input symbol if there is a match move the cursor on the input string until fringe consists of only terminals What type of derivation is this? What do we do if there is no match? What if it doesn’t?

Selecting The Right Rules What re-write rule do we pick? Can specify leftmost or rightmost NT Sentential Form: a B C d b A a B C d b A (Leftmost : Pick B to re-write) A B C d b A (Rightmost : Pick A to re-write) Solves one problem : which NT to re-write But we can still have multiple options for each NT B  a | b | c Grammar does not need to be ambiguous for this to happen Different choices may lead to different strings in (or not in) the language What happens if we pick the wrong re-write rule?

Back to the Expression Grammar Added a unique start symbol Enforced arithmetic precedence

Example : Problematic Parse of x – 2 * y S Expr Term + Expr Term Fact. Leftmost derivation, choose productions in an order that exposes problems

Example : Problematic Parse of x – 2 * y S Expr Term + Expr Term Fact. Followed legal production rules but “–” doesn’t match “+” The parser must backtrack to the second re-write applied

Example : Problematic Parse of x – 2 * y S Expr Term – Expr Term Fact.

Example : Problematic Parse of x – 2 * y S Expr Term – Expr Term Fact. This time “--” and “--” match Now, we need to expand Term, the last NT on the fringe

Example : Problematic Parse of x – 2 * y S Expr Term – Expr Term Fact. Fact.

Example : Problematic Parse of x – 2 * y S Expr Term - Expr Term Fact. Fact. “2” matches “2” We have more input, but no NT s left to expand The expansion terminated too soon This is also a problem !

Example : Problematic Parse of x – 2 * y S Expr Term – Expr Term Fact. Fact. Term Fact. * This time, we matched and consumed all the input Success!

Backtracking Whenever we have multiple production rules for the same NT there is a possibility that our parser might choose the wrong rule To get around this problem most parsers will do backtracking If the parser realizes that there is no match, it will go back and try other options Only when all the options have been tried out the parser will reject an input string In a way, the parser is simulating all possible paths Is this approach similar to something we have seen before?

Top-down Parsing Algorithm with Backtracking Construct the root node of the parse tree with the start symbol Repeat Pick the leftmost node on the fringe labeled with an NT to expand Let NT = A, select a production with A on its lhs and for each symbol on its rhs, construct the appropriate child If the expansion adds a terminal to the leftmost node of the fringe compare terminal with input symbol If there is a match move the cursor on the input string else backtrack Until fringe consists of only terminals Another stab at the algorithm

Another Possible Parse of x – 2 * y This doesn’t terminate Wrong choice of expansion leads to non-termination Non-termination is a bad property for a parser Parser must make the right choice consuming no input! Can’t backtrack

Left Recursion Top-down parsers cannot handle left-recursive grammars Formally, A grammar is left recursive if  A  NT such that  a sequence of productions A  + A , for some string   (NT  T ) + Example B  Babcd C  Dab D  Ecd E  Cde Our expression grammar is left recursive This can lead to non-termination in a top-down parser For a top-down parser, any recursion must be right recursion We would like to convert the left recursion to right recursion What do we do if we are doing a rightmost derivation?

Eliminating Left Recursion To remove left recursion, we can transform the grammar Consider a grammar fragment of the form Foo  Foo  |  where neither  nor  start with Foo We can rewrite this as Foo   Bar Bar   Bar |  where Bar is a new non-terminal This accepts the same language, but uses only right recursion Generate all  ’s and then  Generate  and then all  ’s

Eliminating Left Recursion The expression grammar contains two cases of left recursion

Eliminating Left Recursion The expression grammar contains two cases of left recursion

Eliminating Left Recursion We can eliminate both cases without changing the language Case 1 Case 2

Eliminating Left Recursion : Expr First step is to identify the  and   = + Term = - Term  = Term Expr  Term Expr’ Expr’  + Term Expr’ | - Term Expr’ | ε Foo  Foo  |  Foo  Foo  |  Foo   Bar Bar   Bar |  Foo   Bar Bar   Bar | 

Eliminating Left Recursion : Expr First step is to identify the  and   = * Factor = / Factor  = Factor Term  Factor Term’ Term’  * Factor Term’ | / Factor Term’ | ε Foo  Foo  |  Foo  Foo  |  Foo   Bar Bar   Bar |  Foo   Bar Bar   Bar | 

Eliminating Left Recursion Foo   Bar Bar   Bar |  Foo  Foo  |  Identify  and   = + Term = - Term  = Term Identify  and   = * Factor = / Factor  = Factor

Eliminating Left Recursion This grammar uses only right recursion Retains the original left associativity This grammar is correct, if somewhat non-intuitive. A top-down parser will terminate using it A top-down parser may need to backtrack with it

Eliminating General Left Recursion The  -  transformation eliminates immediate left recursion Need a generalized method arrange the NTs into some order A 1, A 2, …, A n for i  1 to n for s  1 to i – 1 replace each production A i  A s  with A i   1  2  k , where A s   1  2  k are all the current productions for A s eliminate any immediate left recursion on A i using the direct transformation This assumes that the initial grammar has no cycles (A i  + A i ) no ε productions

Eliminating Left Recursion How does this algorithm work? Impose arbitrary order on the non-terminals Outer loop cycles through NT in order Inner loop ensures that a production expanding A i has no non- terminal A s in its rhs, for s < I Last step in outer loop converts any direct recursion on A i to right recursion using the transformation shown earlier New non-terminals are added at the end of the order and have no left recursion

Example : Eliminating Left Recursion Order of symbols: G, E, T G  E E  E + T E  T T  E - T T  id

Example : Eliminating Left Recursion Order of symbols: G, E, T A i = G G  E E  E + T E  T T  E - T T  id No need to expand No non-terminal preceded G

Example : Eliminating Left Recursion Order of symbols: G, E, T Eliminate immediate left recursion G  E E  E + T E  T T  E - T T  id Identify  and   = + T  = T G  E E  T E' E'  + T E' E'  e T  E - T T  id G  E E  E + T E  T T  E - T T  id

Example : Eliminating Left Recursion Order of symbols: G, E, T A i = E, A s = G No matches for A i  A s G  E E  T E' E'  + T E' E'  e T  E - T T  id No immediate left recursion

Example : Eliminating Left Recursion Order of symbols: G, E, T A i = T, A s = G G  E E  T E' E'  + T E' E'  e T  E - T T  id No matches for A i  A s

Example : Eliminating Left Recursion Order of symbols: G, E, T A i = T, A s = E Match found G  E E  T E' E'  + T E' E'  e T  E - T T  id G  E E  T E' E'  + T E' E'  e T  T E’ - T T  id G  E E  T E' E'  + T E' E'  e T  E - T T  id Substitute

Example : Eliminating Left Recursion Order of symbols: G, E, T Eliminate immediate left recursion G  E E  T E' E'  + T E' E'  e T  T E’ - T T  id G  E E  T E' E'  + T E' E'  e T  id T' T'  E' - T T' T'  e Identify  and   = E’ - T  = id G  E E  T E' E'  + T E' E'  e T  T E’ - T T  id