Top-Down Parsing CS 671 January 29, 2008.

Slides:



Advertisements
Similar presentations
Parsing V: Bottom-up Parsing
Advertisements

Parsing III (Eliminating left recursion, recursive descent parsing)
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
Parsing V Introduction to LR(1) Parsers. from Cooper & Torczon2 LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
MIT Top-Down Parsing Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.
Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Syntax and Semantics Structure of programming languages.
Parsing Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.
4 4 (c) parsing. Parsing A grammar describes the strings of tokens that are syntactically legal in a PL A recogniser simply accepts or rejects strings.
Parsing III (Top-down parsing: recursive descent & LL(1) )
-Mandakinee Singh (11CS10026).  What is parsing? ◦ Discovering the derivation of a string: If one exists. ◦ Harder than generating strings.  Two major.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
Syntax and Semantics Structure of programming languages.
4 4 (c) parsing. Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Parsing III (Top-down parsing: recursive descent & LL(1) ) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
Top-Down Parsing.
UMBC  CSEE   1 Chapter 4 Chapter 4 (b) parsing.
Parsing III (Top-down parsing: recursive descent & LL(1) )
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Syntax and Semantics Structure of programming languages.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Parsing #1 Leonidas Fegaras.
Parsing — Part II (Top-down parsing, left-recursion removal)
Parsing III (Top-down parsing: recursive descent & LL(1) )
Programming Languages Translator
Compiler design Bottom-up parsing Concepts
Lexical and Syntax Analysis
Unit-3 Bottom-Up-Parsing.
Lecture #12 Parsing Types.
Parsing IV Bottom-up Parsing
Table-driven parsing Parsing performed by a finite state machine.
Parsing — Part II (Top-down parsing, left-recursion removal)
Parsing.
CS 404 Introduction to Compiler Design
4 (c) parsing.
Parsing Techniques.
CSC 4181 Compiler Construction Parsing
Syntax Analysis source program lexical analyzer tokens syntax analyzer
4d Bottom Up Parsing.
Lecture 8 Bottom Up Parsing
Compiler Design 7. Top-Down Table-Driven Parsing
Lecture 7: Introduction to Parsing (Syntax Analysis)
Lecture 8: Top-Down Parsing
Bottom Up Parsing.
LL and Recursive-Descent Parsing Hal Perkins Autumn 2011
Parsing IV Bottom-up Parsing
Parsing — Part II (Top-down parsing, left-recursion removal)
LL and Recursive-Descent Parsing
Syntax Analysis - Parsing
4d Bottom Up Parsing.
Kanat Bolazar February 16, 2010
4d Bottom Up Parsing.
Compiler Construction
4d Bottom Up Parsing.
LL and Recursive-Descent Parsing Hal Perkins Autumn 2009
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
LL and Recursive-Descent Parsing Hal Perkins Winter 2008
4d Bottom Up Parsing.
Presentation transcript:

Top-Down Parsing CS 671 January 29, 2008

Where Are We? Do tokens conform to the language syntax? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract Syntax Tree (AST) Lexical Analysis Syntactic Analysis if Semantic Analysis == ; = b a “Hi” Do tokens conform to the language syntax?

Last Time Parse trees vs. ASTs Derivations Grammar ambiguity Leftmost vs. Rightmost Grammar ambiguity

Won’t work on all context-free grammars Parsing What is parsing? Discovering the derivation of a string: If one exists Harder than generating strings Two major approaches Top-down parsing Bottom-up parsing Won’t work on all context-free grammars Properties of grammar determine parse-ability We may be able to transform a grammar

Two Approaches Top-down parsers LL(1), recursive descent Start at the root of the parse tree and grow toward leaves Pick a production & try to match the input Bad “pick”  may need to backtrack Bottom-up parsers LR(1), operator precedence Start at the leaves and grow toward root As input is consumed, encode possible parse trees in an internal state Bottom-up parsers handle a large class of grammars

Grammars and Parsers LL(1) parsers LR(1) parsers Left-to-right input Leftmost derivation 1 symbol of look-ahead LR(1) parsers Rightmost derivation Also: LL(k), LR(k), SLR, LALR, … Grammars that this can handle are called LL(1) grammars Grammars that this can handle are called LR(1) grammars

Top-Down Parsing Start with the root of the parse tree Algorithm: Root of the tree: node labeled with the start symbol Algorithm: Repeat until the fringe of the parse tree matches input string At a node A, select a production for A Add a child node for each symbol on rhs If a terminal symbol is added that doesn’t match, backtrack Find the next node to be expanded (a non-terminal) Done when: Leaves of parse tree match input string (success) All productions exhausted in backtracking (failure)

Example Expression grammar (with precedence) Input string x – 2 * y # Production rule 1 2 3 4 5 6 7 8 expr → expr + term | expr - term | term term → term * factor | term / factor | factor factor → number | identifier

Current position in the input stream Example Current position in the input stream Problem: Can’t match next terminal We guessed wrong at step 2 Rule Sentential form Input string - expr expr  x - 2 * y 2 expr + term  x - 2 * y 3 term + term  x – 2 * y expr + term 6 factor + term  x – 2 * y 8 <id> + term x  – 2 * y - <id,x> + term x  – 2 * y term fact x

Backtracking Rollback productions Choose a different production for expr Continue Rule Sentential form Input string - expr  x - 2 * y 2 expr + term  x - 2 * y 3 term + term  x – 2 * y Undo all these productions 6 factor + term  x – 2 * y 8 <id> + term x  – 2 * y ? <id,x> + term x  – 2 * y

Retrying Problem: - More input to read Another cause of backtracking 2 Rule Sentential form Input string - expr expr  x - 2 * y 2 expr - term  x - 2 * y expr - term 3 term - term  x – 2 * y 6 factor - term  x – 2 * y 8 <id> - term x  – 2 * y term fact - <id,x> - term x –  2 * y 3 <id,x> - factor x –  2 * y 7 <id,x> - <num> x – 2  * y fact 2 x

Successful Parse All terminals match – we’re finished - * y 2 x Rule Sentential form Input string - expr expr  x - 2 * y 2 expr - term  x - 2 * y expr - term 3 term - term  x – 2 * y 6 factor - term  x – 2 * y term * fact 8 <id> - term x  – 2 * y term - <id,x> - term x –  2 * y 4 <id,x> - term * fact x –  2 * y fact y 6 <id,x> - fact * fact x –  2 * y fact 7 <id,x> - <num> * fact x – 2  * y 2 - <id,x> - <num,2> * fact x – 2 *  y x 8 <id,x> - <num,2> * <id> x – 2 * y 

Other Possible Parses Problem: termination Wrong choice leads to infinite expansion (More importantly: without consuming any input!) May not be as obvious as this Our grammar is left recursive Rule Sentential form Input string - expr  x - 2 * y 2 expr + term  x - 2 * y 2 expr + term + term  x – 2 * y 2 expr + term + term + term  x – 2 * y 2 expr + term + term + term + term  x – 2 * y

Left Recursion Formally, Bad news: Good news: What does →* mean? A grammar is left recursive if  a non-terminal A such that A →* A a (for some set of symbols a) Bad news: Top-down parsers cannot handle left recursion Good news: We can systematically eliminate left recursion What does →* mean? A → B x B → A y

Removing Left Recursion Two cases of left recursion: Transform as follows: # Production rule 1 2 3 expr → expr + term | expr - term | term # Production rule 4 5 6 term → term * factor | term / factor | factor # Production rule 1 2 3 4 expr → term expr2 expr2 → + term expr2 | - term expr2 | e # Production rule 4 5 6 term → factor term2 term2 → * factor term2 | / factor term2 | e

Right-Recursive Grammar # Production rule 1 2 3 4 5 6 7 8 9 10 expr → term expr2 expr2 → + term expr2 | - term expr2 | e term → factor term2 term2 → * factor term2 | / factor term2 | e factor → number | identifier We can choose the right production by looking at the next input symbol This is called lookahead BUT, this can be tricky… Two productions with no choice at all All other productions are uniquely identified by a terminal symbol at the start of RHS

Top-Down Parsing Goal: Solution: FIRST sets Given productions A → a | b , the parser should be able to choose between a and b How can the next input token help us decide? Solution: FIRST sets Informally: FIRST(a) is the set of tokens that could appear as the first symbol in a string derived from a Def: x in FIRST(a) iff a →* x g

The LL(1) Property Given A → a and A → b, we would like: FIRST(a)  FIRST(b) =  Parser can make right choice by looking at one lookahead token ..almost..

Example: Calculating FIRST Sets # Production rule 1 2 3 4 5 6 7 8 9 10 11 goal → expr expr → term expr2 expr2 → + term expr2 | - term expr2 | e term → factor term2 term2 → * factor term2 | / factor term2 | e factor → number | identifier FIRST(3) = { + } FIRST(4) = { - } FIRST(5) = { e } FIRST(7) = { * } FIRST(8) = { / } FIRST(9) = { e } FIRST(1) = ? FIRST(1) = FIRST(2) = FIRST(6) = FIRST(10)  FIRST(11) = { number, identifier }

Top-Down Parsing What about e productions? Example: Solution Complicates the definition of LL(1) Consider A → a and A → b and a may be empty In this case there is no symbol to identify a Solution Build a FOLLOW set for each production with e Example: What is FIRST(3)? = {  } What lookahead symbol tells us we are matching production 3? # Production rule 1 2 3 A → x B | y C | 

FIRST and FOLLOW Sets FIRST() FOLLOW(A) For some  (T  NT)*, define FIRST() as the set of tokens that appear as the first symbol in some string that derives from  That is, x  FIRST() iff  * x , for some  FOLLOW(A) For some A  NT, define FOLLOW(A) as the set of symbols that can occur immediately after A in a valid sentence. FOLLOW(G) = {EOF}, where G is the start symbol

Example: Calculating Follow Sets (1) # Production rule 1 2 3 4 5 6 7 8 9 10 11 goal → expr expr → term expr2 expr2 → + term expr2 | - term expr2 | e term → factor term2 term2 → * factor term2 | / factor term2 | e factor → number | identifier FOLLOW(goal) = { EOF } FOLLOW(expr) = FOLLOW(goal) = { EOF } FOLLOW(expr2) = FOLLOW(expr) = { EOF } FOLLOW(term) = ? FOLLOW(term) += FIRST(expr2) += { +, -, e } += { +, -, FOLLOW(expr)} += { +, -, EOF }

Example: Calculating Follow Sets (2) # Production rule 1 2 3 4 5 6 7 8 9 10 11 goal → expr expr → term expr2 expr2 → + term expr2 | - term expr2 | e term → factor term2 term2 → * factor term2 | / factor term2 | e factor → number | identifier FOLLOW(term2) += FOLLOW(term) FOLLOW(factor) = ? FOLLOW(factor) += FIRST(term2) += { *, / ,  } += { *, / , FOLLOW(term)} += { *, / , +, -, EOF }

Updated LL(1) Property Including e productions FOLLOW(A) = the set of terminal symbols that can immediately follow A Def: FIRST+(A → a) as FIRST(a) U FOLLOW(A), if e  FIRST(a) FIRST(a), otherwise Def: a grammar is LL(1) iff A → a and A → b and FIRST+(A → a)  FIRST+(A → b) = 

Predictive Parsing Given an LL(1) Grammar The parser can “predict” the correct expansion Using lookahead and FIRST and FOLLOW sets Two kinds of predictive parsers Recursive descent Often hand-written Table-driven Generate tables from First and Follow sets

Recursive Descent This produces a parser with six mutually recursive routines: Goal Expr Expr2 Term Term2 Factor Each recognizes one NT or T The term descent refers to the direction in which the parse tree is built. # Production rule 1 2 3 4 5 6 7 8 9 10 11 12 goal → expr expr → term expr2 expr2 → + term expr2 | - term expr2 | e term → factor term2 term2 → * factor term2 | / factor term2 | e factor → number | identifier | ( expr )

Example Code Goal symbol: Top-level expression main() /* Match goal --> expr */ tok = nextToken(); if (expr() && tok == EOF) then proceed to next step; else return false; expr() /* Match expr --> term expr2 */ if (term() && expr2()); return true; else return false;

Check FIRST and FOLLOW sets to distinguish Example Code Match expr2 expr2() /* Match expr2 --> + term expr2 */ /* Match expr2 --> - term expr2 */ if (tok == ‘+’ or tok == ‘-’) tok = nextToken(); if (term()) then return expr2(); else return false; /* Match expr2 --> empty */ return true; Check FIRST and FOLLOW sets to distinguish

Example Code factor() /* Match factor --> ( expr ) */ if (tok == ‘(‘) tok = nextToken(); if (expr() && tok == ‘)’) return true; else syntax error: expecting ) return false /* Match factor --> num */ if (tok is a num) return true /* Match factor --> id */ if (tok is an id)

Top-Down Parsing So far: Add actions to matching routines Gives us a yes or no answer We want to build the parse tree How? Add actions to matching routines Create a node for each production How do we assemble the tree?

Building a Parse Tree Notice: Idea: use a stack Recursive calls match the shape of the tree Idea: use a stack Each routine: Pops off the children it needs Creates its own node Pushes that node back on the stack main expr term factor expr2 term

Building a Parse Tree With stack operations expr() /* Match expr --> term expr2 */ if (term() && expr2()) expr2_node = pop(); term_node = pop(); expr_node = new exprNode(term_node, expr2_node) push(expr_node); return true; else return false;

Recursive Descent Parsing Massage grammar to have LL(1) condition Remove left recursion Left factor, where possible Build FIRST (and FOLLOW) sets Define a procedure for each non-terminal Implement a case for each right-hand side Call procedures as needed for non-terminals Add extra code, as needed Can we automate this process?

Table-driven approach Encode mapping in a table Row for each non-terminal Column for each terminal symbol Table[NT, symbol] = rule# if symbol  FIRST+(NT  rhs(#)) +,- *, / id, num expr2 term expr2 error term2 factor term2 factor (do nothing)

Code push the start symbol, G, onto Stack top  top of Stack Note: Missing else conditions for errors push the start symbol, G, onto Stack top  top of Stack loop forever if top = EOF and token = EOF then break & report success if top is a terminal then if top matches token then pop Stack // recognized top token  next_token() else // top is a non-terminal if TABLE[top,token] is A B1B2…Bk then pop Stack // get rid of A push Bk, Bk-1, …, B1 // in that order

Next Time … Bottom-up Parsers Overview of YACC More powerful Widely used – yacc, bison, JavaCUP Overview of YACC Removing shift/reduce reduce/reduce conflicts Just in case you haven’t started your homework!