Top-Down Parsing, CS 671 – Spring 2008, January 29, 2008


1 Top-Down Parsing CS 671 January 29, 2008

2 Where Are We?
Source code: if (b==0) a = “Hi”;
  ↓ Lexical analysis
Token stream: if (b == 0) a = “Hi”;
  ↓ Syntactic analysis: do the tokens conform to the language syntax?
Abstract Syntax Tree (AST): [Figure: AST rooted at if, with subtrees == (b, 0) and = (a, “Hi”)]
  ↓ Semantic analysis

3 Last Time
Parse trees vs. ASTs
Derivations
– Leftmost vs. rightmost
Grammar ambiguity

4 Parsing
What is parsing?
– Discovering the derivation of a string, if one exists
– Harder than generating strings
Two major approaches:
– Top-down parsing
– Bottom-up parsing
Neither works on all context-free grammars:
– Properties of the grammar determine whether it can be parsed
– We may be able to transform a grammar so that it can be

5 Two Approaches
Top-down parsers (LL(1), recursive descent)
– Start at the root of the parse tree and grow toward the leaves
– Pick a production and try to match the input
– Bad “pick” ⇒ may need to backtrack
Bottom-up parsers (LR(1), operator precedence)
– Start at the leaves and grow toward the root
– As input is consumed, encode possible parse trees in an internal state
– Bottom-up parsers handle a larger class of grammars

6 Grammars and Parsers
LL(1) parsers
– Left-to-right input
– Leftmost derivation
– 1 symbol of look-ahead
– Grammars that this can handle are called LL(1) grammars
LR(1) parsers
– Left-to-right input
– Rightmost derivation (built in reverse)
– 1 symbol of look-ahead
– Grammars that this can handle are called LR(1) grammars
Also: LL(k), LR(k), SLR, LALR, …

7 Top-Down Parsing
Start with the root of the parse tree: a node labeled with the start symbol
Algorithm: repeat until the fringe of the parse tree matches the input string
– At a node A, select a production for A
– Add a child node for each symbol on its right-hand side
– If a terminal symbol is added that doesn't match the input, backtrack
– Find the next node to be expanded (a non-terminal)
Done when:
– The leaves of the parse tree match the input string (success)
– All productions are exhausted in backtracking (failure)
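The loop above can be sketched as a backtracking recognizer. This is a minimal Python sketch, not the course's implementation: it assumes a grammar stored as a dict from each non-terminal to its list of alternatives, treats any symbol that is not a dict key as a terminal, and uses token kinds ("identifier", "number") as input. It also assumes a right-recursive variant of the expression grammar, because the left-recursive original would make this search expand forever.

```python
from typing import Dict, Iterator, List

Grammar = Dict[str, List[List[str]]]

# Assumption: a right-recursive variant of the expression grammar. It accepts
# the same strings, but the left-recursive original would make this search
# expand forever without consuming input.
GRAMMAR: Grammar = {
    "expr": [["term", "+", "expr"], ["term", "-", "expr"], ["term"]],
    "term": [["factor", "*", "term"], ["factor", "/", "term"], ["factor"]],
    "factor": [["number"], ["identifier"]],
}


def expand(grammar: Grammar, symbols: List[str], tokens: List[str],
           pos: int) -> Iterator[int]:
    """Yield each input position reachable by matching `symbols` at `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in grammar:
        # Non-terminal: try its productions in order. Resuming this loop
        # after a later dead end is the backtracking step.
        for rhs in grammar[first]:
            yield from expand(grammar, rhs + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == first:
        # Terminal that matches the input: consume it and continue.
        yield from expand(grammar, rest, tokens, pos + 1)
    # Otherwise the terminal does not match: yield nothing (dead end).


def parses(grammar: Grammar, start: str, tokens: List[str]) -> bool:
    """Success iff some sequence of choices consumes the entire input."""
    return any(end == len(tokens)
               for end in expand(grammar, [start], tokens, 0))


print(parses(GRAMMAR, "expr", ["identifier", "-", "number", "*", "identifier"]))  # True
```

Resuming a generator after a dead end plays the role of "undo and pick the next production". Like any naive backtracking parser, this can take time exponential in the input length.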

8 Example
Expression grammar (with precedence); input string: x - 2 * y
#   Production rule
1   expr → expr + term
2        | expr - term
3        | term
4   term → term * factor
5        | term / factor
6        | factor
7   factor → number
8          | identifier

9 Example
Rule  Sentential form        Input string
–     expr                   ↑x - 2 * y
1     expr + term            ↑x - 2 * y
3     term + term            ↑x - 2 * y
6     factor + term          ↑x - 2 * y
8     x + term               x ↑- 2 * y
→     x + term               x ↑- 2 * y
(↑ marks the current position in the input stream; → marks a step that tries to match the next terminal.)
Problem: can't match the next terminal (+ in the sentential form vs. - in the input)
We guessed wrong at step 2 (the first expansion, which chose rule 1)

10 Backtracking
Roll back productions, choose a different production for expr, and continue
Rule  Sentential form        Input string
–     expr                   ↑x - 2 * y
1     expr + term            ↑x - 2 * y
3     term + term            ↑x - 2 * y
6     factor + term          ↑x - 2 * y
8     x + term               x ↑- 2 * y
?     x + term               x ↑- 2 * y
Undo all these productions (the rule 1, 3, 6 and 8 expansions)

11 Retrying
Rule  Sentential form        Input string
–     expr                   ↑x - 2 * y
2     expr - term            ↑x - 2 * y
3     term - term            ↑x - 2 * y
6     factor - term          ↑x - 2 * y
8     x - term               x ↑- 2 * y
→     x - term               x - ↑2 * y
6     x - factor             x - ↑2 * y
7     x - 2                  x - 2 ↑* y
Problem: more input to read. The sentential form is fully matched, but * y remains; this is another cause of backtracking.

12 Successful Parse
Rule  Sentential form        Input string
–     expr                   ↑x - 2 * y
2     expr - term            ↑x - 2 * y
3     term - term            ↑x - 2 * y
6     factor - term          ↑x - 2 * y
8     x - term               x ↑- 2 * y
→     x - term               x - ↑2 * y
4     x - term * factor      x - ↑2 * y
6     x - factor * factor    x - ↑2 * y
7     x - 2 * factor         x - 2 ↑* y
→     x - 2 * factor         x - 2 * ↑y
8     x - 2 * y              x - 2 * y↑
All terminals match: we're finished.
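The Rule column of this trace is a leftmost derivation. As a quick check, here is a small Python sketch (the `RULES` table and the `apply_leftmost` and `derive` helpers are illustrative assumptions, not course code) that replays rules 2, 3, 6, 8, 4, 6, 7, 8 using the Example slide's numbering, skipping the → match rows and writing x, 2 and y as the token kinds identifier and number.

```python
from typing import Dict, List, Tuple

# Rules 1-8 of the expression grammar, numbered as on the Example slide.
RULES: Dict[int, Tuple[str, List[str]]] = {
    1: ("expr", ["expr", "+", "term"]),
    2: ("expr", ["expr", "-", "term"]),
    3: ("expr", ["term"]),
    4: ("term", ["term", "*", "factor"]),
    5: ("term", ["term", "/", "factor"]),
    6: ("term", ["factor"]),
    7: ("factor", ["number"]),
    8: ("factor", ["identifier"]),
}
NONTERMINALS = {"expr", "term", "factor"}


def apply_leftmost(form: List[str], rule: int) -> List[str]:
    """Rewrite the leftmost non-terminal of `form` using `rule`."""
    lhs, rhs = RULES[rule]
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            if sym != lhs:
                raise ValueError(f"rule {rule} cannot expand {sym}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to expand")


def derive(rules: List[int], start: str = "expr") -> List[List[str]]:
    """Return every sentential form of the leftmost derivation."""
    forms = [[start]]
    for rule in rules:
        forms.append(apply_leftmost(forms[-1], rule))
    return forms


# The rule column of the successful trace, without the → (match) rows.
for form in derive([2, 3, 6, 8, 4, 6, 7, 8]):
    print(" ".join(form))
```

The last line printed is `identifier - number * identifier`, which is x - 2 * y as token kinds.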

13 Other Possible Parses
Rule  Sentential form                     Input string
–     expr                                ↑x - 2 * y
1     expr + term                         ↑x - 2 * y
1     expr + term + term                  ↑x - 2 * y
1     expr + term + term + term           ↑x - 2 * y
1     expr + term + term + term + term    ↑x - 2 * y
Problem: termination
– A wrong choice leads to infinite expansion
– More importantly, it does so without consuming any input!
– It may not be as obvious as this
Our grammar is left recursive

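14 Left Recursion
Formally, a grammar is left recursive if ∃ a non-terminal A such that A →+ Aα (for some string of symbols α)
Bad news: top-down parsers cannot handle left recursion; they can expand forever without consuming input
Good news: we can systematically eliminate left recursion
What do →* and →+ mean? Derivation in zero or more steps, and in one or more steps. Left recursion can be indirect: with A → B x and B → A y, we get A → B x → A y x.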
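Checking for left recursion, including the indirect kind, amounts to looking for a cycle among the non-terminals that can start a right-hand side. A minimal Python sketch, assuming a dict-of-alternatives grammar with no ε-productions (so only the first symbol of each right-hand side matters); the rule B → z in the indirect example is a hypothetical addition so that B also has a non-recursive alternative.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]

# The left-recursive expression grammar from the Example slide.
EXPR: Grammar = {
    "expr": [["expr", "+", "term"], ["expr", "-", "term"], ["term"]],
    "term": [["term", "*", "factor"], ["term", "/", "factor"], ["factor"]],
    "factor": [["number"], ["identifier"]],
}
# The slide's indirect example A → B x, B → A y (B → z is hypothetical).
INDIRECT: Grammar = {"A": [["B", "x"]], "B": [["A", "y"], ["z"]]}


def left_recursive(grammar: Grammar) -> Set[str]:
    """Return every non-terminal A with A →+ A α.

    Assumes no ε-productions, so only the first symbol of a right-hand
    side can begin the derived string.
    """
    # Edge A -> B whenever some production A → B ... starts with non-terminal B.
    corners = {
        a: {rhs[0] for rhs in prods if rhs and rhs[0] in grammar}
        for a, prods in grammar.items()
    }
    result: Set[str] = set()
    for a in grammar:
        # Depth-first search for a path of one or more edges from a back to a.
        stack, seen = list(corners[a]), set()
        while stack:
            b = stack.pop()
            if b == a:
                result.add(a)
                break
            if b not in seen:
                seen.add(b)
                stack.extend(corners[b])
    return result


print(sorted(left_recursive(EXPR)))      # ['expr', 'term']
print(sorted(left_recursive(INDIRECT)))  # ['A', 'B']
```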

15 Removing Left Recursion
Two cases of left recursion:
#   Production rule
1   expr → expr + term
2        | expr - term
3        | term
4   term → term * factor
5        | term / factor
6        | factor
Transform as follows:
#   Production rule
1   expr → term expr2
2   expr2 → + term expr2
3         | - term expr2
4         | ε
5   term → factor term2
6   term2 → * factor term2
7         | / factor term2
8         | ε
In general, A → A α | β becomes A → β A2 and A2 → α A2 | ε.
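The transformation above is mechanical for direct left recursion. A minimal Python sketch, assuming a dict-of-alternatives grammar where `[]` stands for ε; the helper name is hypothetical, and the new non-terminal is named by appending "2" (following the slide's expr2/term2), assuming that name is not already in use. Indirect left recursion would first need substitution, which this sketch does not do.

```python
from typing import Dict, List

Grammar = Dict[str, List[List[str]]]

EXPR: Grammar = {
    "expr": [["expr", "+", "term"], ["expr", "-", "term"], ["term"]],
    "term": [["term", "*", "factor"], ["term", "/", "factor"], ["factor"]],
    "factor": [["number"], ["identifier"]],
}


def remove_immediate_left_recursion(grammar: Grammar) -> Grammar:
    """Rewrite A → A α1 | … | A αm | β1 | … | βn as
    A → β1 A2 | … | βn A2 and A2 → α1 A2 | … | αm A2 | ε.

    Only direct left recursion is handled; [] stands for ε.
    """
    result: Grammar = {}
    for a, prods in grammar.items():
        alphas = [rhs[1:] for rhs in prods if rhs and rhs[0] == a]
        betas = [rhs for rhs in prods if not rhs or rhs[0] != a]
        if not alphas:                 # not left recursive: keep as-is
            result[a] = prods
            continue
        a2 = a + "2"
        result[a] = [beta + [a2] for beta in betas]
        result[a2] = [alpha + [a2] for alpha in alphas] + [[]]
    return result


for lhs, alts in remove_immediate_left_recursion(EXPR).items():
    print(lhs, "→", " | ".join(" ".join(rhs) or "ε" for rhs in alts))
```

The printed grammar is rules 1–8 of the transformed grammar, plus the unchanged factor rules.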

16 Right-Recursive Grammar
We can choose the right production by looking at the next input symbol
This is called lookahead
BUT, this can be tricky…
#   Production rule
1   expr → term expr2
2   expr2 → + term expr2
3         | - term expr2
4         | ε
5   term → factor term2
6   term2 → * factor term2
7         | / factor term2
8         | ε
9   factor → number
10         | identifier
Rules 1 and 5 are the only productions for their non-terminals: no choice at all
Every other production except the ε-productions (4 and 8) is uniquely identified by the terminal symbol at the start of its right-hand side; the ε-productions are the tricky case

17 Top-Down Parsing
Goal: given productions A → α | β, the parser should be able to choose between α and β
How can the next input token help us decide?
Solution: FIRST sets
Informally: FIRST(α) is the set of tokens that could appear as the first symbol in a string derived from α
Def: x ∈ FIRST(α) iff α →* x γ, for some γ

18 The LL(1) Property
Given A → α and A → β, we would like: FIRST(α) ∩ FIRST(β) = ∅
Then the parser can make the right choice by looking at one lookahead token… almost.

19 Example: Calculating FIRST Sets
#   Production rule
1   goal → expr
2   expr → term expr2
3   expr2 → + term expr2
4         | - term expr2
5         | ε
6   term → factor term2
7   term2 → * factor term2
8         | / factor term2
9         | ε
10  factor → number
11         | identifier
Here FIRST(n) is the FIRST set of the right-hand side of rule n:
FIRST(3) = { + }   FIRST(4) = { - }   FIRST(5) = { ε }
FIRST(7) = { * }   FIRST(8) = { / }   FIRST(9) = { ε }
FIRST(1) = ?
FIRST(1) = FIRST(2) = FIRST(6) = FIRST(10) ∪ FIRST(11) = { number, identifier }
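The fixed-point computation behind these FIRST sets can be sketched in Python. This is one common formulation, not necessarily the course's: it assumes a dict-of-alternatives grammar, represents ε as the string "ε" and an ε right-hand side as an empty list, and keeps sweeping the rules until no set grows.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]
EPS = "ε"

# Rules 1-11; [] is an ε right-hand side, and any symbol that is not a key
# is a terminal.
GRAMMAR: Grammar = {
    "goal": [["expr"]],
    "expr": [["term", "expr2"]],
    "expr2": [["+", "term", "expr2"], ["-", "term", "expr2"], []],
    "term": [["factor", "term2"]],
    "term2": [["*", "factor", "term2"], ["/", "factor", "term2"], []],
    "factor": [["number"], ["identifier"]],
}


def first_of_string(symbols: List[str], first: Dict[str, Set[str]],
                    grammar: Grammar) -> Set[str]:
    """FIRST of a symbol string; contains ε iff every symbol can derive ε."""
    out: Set[str] = set()
    for sym in symbols:
        if sym not in grammar:        # terminal: it is the first symbol
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:     # sym cannot vanish, so stop here
            return out
    out.add(EPS)                      # all symbols (or none) derive ε
    return out


def first_sets(grammar: Grammar) -> Dict[str, Set[str]]:
    """Grow every FIRST(A) until a full pass over the rules changes nothing."""
    first: Dict[str, Set[str]] = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


first = first_sets(GRAMMAR)
print(sorted(first["goal"]))   # ['identifier', 'number']
print(sorted(first["expr2"]))  # ['+', '-', 'ε']
```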

20 Top-Down Parsing
What about ε-productions? They complicate the definition of LL(1)
Consider A → α and A → β where α may be empty
In this case there is no symbol to identify α
Solution: build a FOLLOW set for each non-terminal with an ε-production
Example:
#   Production rule
1   A → x B
2     | y C
3     | ε
What is FIRST(3)? { ε }
What lookahead symbol tells us we are matching production 3?

21 FIRST and FOLLOW Sets
FIRST(α)
– For some α ∈ (T ∪ NT)*, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α
– That is, x ∈ FIRST(α) iff α →* x γ, for some γ
– By convention, ε ∈ FIRST(α) if α →* ε
FOLLOW(A)
– For some A ∈ NT, define FOLLOW(A) as the set of symbols that can occur immediately after A in a valid sentence
– FOLLOW(G) contains EOF, where G is the start symbol

22 Example: Calculating FOLLOW Sets (1)
#   Production rule
1   goal → expr
2   expr → term expr2
3   expr2 → + term expr2
4         | - term expr2
5         | ε
6   term → factor term2
7   term2 → * factor term2
8         | / factor term2
9         | ε
10  factor → number
11         | identifier
FOLLOW(goal) = { EOF }
FOLLOW(expr) = FOLLOW(goal) = { EOF }
FOLLOW(expr2) = FOLLOW(expr) = { EOF }
FOLLOW(term) = ?
FOLLOW(term) += FIRST(expr2) = { +, -, ε }
Since expr2 can derive ε, replace ε with FOLLOW(expr): FOLLOW(term) = { +, -, EOF }

23 Example: Calculating FOLLOW Sets (2)
#   Production rule
1   goal → expr
2   expr → term expr2
3   expr2 → + term expr2
4         | - term expr2
5         | ε
6   term → factor term2
7   term2 → * factor term2
8         | / factor term2
9         | ε
10  factor → number
11         | identifier
FOLLOW(term2) += FOLLOW(term) = { +, -, EOF }
FOLLOW(factor) = ?
FOLLOW(factor) += FIRST(term2) = { *, /, ε }
Since term2 can derive ε, replace ε with FOLLOW(term): FOLLOW(factor) = { *, /, +, -, EOF }
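The hand calculation above generalizes to another fixed-point loop: for every rule A → α B β, add FIRST(β) minus ε to FOLLOW(B), and add FOLLOW(A) as well when β can derive ε. A minimal, self-contained Python sketch under the same assumptions as a dict-of-alternatives encoding (ε as "ε", an ε right-hand side as `[]`, end of input as "EOF"); it includes a compact FIRST computation because FOLLOW depends on it.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]
EPS, EOF = "ε", "EOF"

# Rules 1-11; [] is an ε right-hand side.
GRAMMAR: Grammar = {
    "goal": [["expr"]],
    "expr": [["term", "expr2"]],
    "expr2": [["+", "term", "expr2"], ["-", "term", "expr2"], []],
    "term": [["factor", "term2"]],
    "term2": [["*", "factor", "term2"], ["/", "factor", "term2"], []],
    "factor": [["number"], ["identifier"]],
}


def first_of_string(symbols: List[str], first: Dict[str, Set[str]],
                    grammar: Grammar) -> Set[str]:
    """FIRST of a symbol string; includes ε iff the whole string can vanish."""
    out: Set[str] = set()
    for sym in symbols:
        if sym not in grammar:
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out


def first_sets(grammar: Grammar) -> Dict[str, Set[str]]:
    first: Dict[str, Set[str]] = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def follow_sets(grammar: Grammar, start: str) -> Dict[str, Set[str]]:
    """Fixed point of: for A → α B β, add FIRST(β) - {ε} to FOLLOW(B),
    and also FOLLOW(A) when β can derive ε."""
    first = first_sets(grammar)
    follow: Dict[str, Set[str]] = {a: set() for a in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # only non-terminals have FOLLOW sets
                    tail = first_of_string(rhs[i + 1:], first, grammar)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[a]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, "goal")
print(sorted(follow["factor"]))  # ['*', '+', '-', '/', 'EOF']
```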

24 Updated LL(1) Property
Including ε-productions
FOLLOW(A) = the set of terminal symbols that can immediately follow A
Def: FIRST+(A → α) is
– FIRST(α) ∪ FOLLOW(A), if ε ∈ FIRST(α)
– FIRST(α), otherwise
Def: a grammar is LL(1) iff for all productions A → α and A → β (α ≠ β): FIRST+(A → α) ∩ FIRST+(A → β) = ∅
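The LL(1) test can be checked mechanically once FIRST and FOLLOW are known. A short Python sketch: the FIRST and FOLLOW values below are copied from the worked examples for rules 1–11 (rules 1, 2 and 6 have a single alternative and cannot conflict), and the helper names `first_plus` and `ll1_for` are hypothetical. The last line of the test shows the left-recursive expr → expr + term | term failing, since both alternatives start with number or identifier.

```python
from itertools import combinations
from typing import Dict, List, Set

EPS, EOF = "ε", "EOF"

# FIRST of each alternative's right-hand side, from the worked examples
# (rules 3-5, 7-9 and 10-11 of the 11-rule grammar).
FIRST_RHS: Dict[str, List[Set[str]]] = {
    "expr2": [{"+"}, {"-"}, {EPS}],
    "term2": [{"*"}, {"/"}, {EPS}],
    "factor": [{"number"}, {"identifier"}],
}
# FOLLOW sets of those left-hand sides, also from the worked examples.
FOLLOW: Dict[str, Set[str]] = {
    "expr2": {EOF},
    "term2": {"+", "-", EOF},
    "factor": {"*", "/", "+", "-", EOF},
}


def first_plus(first_rhs: Set[str], follow_lhs: Set[str]) -> Set[str]:
    """FIRST+(A → α): FIRST(α) ∪ FOLLOW(A) if ε ∈ FIRST(α), else FIRST(α)."""
    if EPS in first_rhs:
        return first_rhs | follow_lhs
    return set(first_rhs)


def ll1_for(lhs: str, first_rhs: List[Set[str]],
            follow: Dict[str, Set[str]]) -> bool:
    """True iff the FIRST+ sets of lhs's alternatives are pairwise disjoint."""
    sets = [first_plus(f, follow[lhs]) for f in first_rhs]
    return all(not (s & t) for s, t in combinations(sets, 2))


print(all(ll1_for(a, alts, FOLLOW) for a, alts in FIRST_RHS.items()))  # True
```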

25 Predictive Parsing
Given an LL(1) grammar, the parser can “predict” the correct expansion using lookahead and the FIRST and FOLLOW sets
Two kinds of predictive parsers:
– Recursive descent: often hand-written
– Table-driven: generate tables from FIRST and FOLLOW sets

26 Recursive Descent
This produces a parser with six mutually recursive routines: Goal, Expr, Expr2, Term, Term2, Factor
Each routine recognizes one non-terminal; terminals are matched inside the routines
The term “descent” refers to the direction in which the parse tree is built: from the root down
#   Production rule
1   goal → expr
2   expr → term expr2
3   expr2 → + term expr2
4         | - term expr2
5         | ε
6   term → factor term2
7   term2 → * factor term2
8         | / factor term2
9         | ε
10  factor → number
11         | identifier
12         | ( expr )

27 Example Code
Goal symbol: top-level expression
main()
    /* Match goal → expr */
    tok = nextToken();
    if (expr() && tok == EOF)
        then proceed to next step;
    else return false;
expr()
    /* Match expr → term expr2 */
    if (term() && expr2())
        return true;
    else return false;

28 Example Code
Match expr2
expr2()
    /* Match expr2 → + term expr2 */
    /* Match expr2 → - term expr2 */
    if (tok == '+' or tok == '-')
        tok = nextToken();
        if (term())
            then return expr2();
        else return false;
    /* Match expr2 → ε */
    return true;
Check FIRST and FOLLOW sets to distinguish: take a + or - branch when tok is in FIRST of that right-hand side, and the ε branch when tok ∈ FOLLOW(expr2). Returning true unconditionally, as here, defers the error to the caller's EOF check.

29 Example Code
factor()
    /* Match factor → ( expr ) */
    if (tok == '(')
        tok = nextToken();
        if (expr() && tok == ')')
            tok = nextToken();    /* consume ')' */
            return true;
        else
            syntax error: expecting ')'
            return false;
    /* Match factor → num */
    if (tok is a num)
        tok = nextToken();
        return true;
    /* Match factor → id */
    if (tok is an id)
        tok = nextToken();
        return true;
    syntax error: expecting '(', num, or id
    return false;
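Put together, the routines form a complete recognizer. A minimal runnable Python sketch for rules 1–12, not the course's code: it assumes the input is already a list of token kinds ("number", "identifier", "+", "(", …), with the lexer out of scope. The ε branches are taken only when the lookahead is in the FOLLOW set, which for this 12-rule grammar (assumed here, by the same FOLLOW computation with ")" contributed by factor → ( expr )) is { EOF, ) } for expr2 and { +, -, EOF, ) } for term2.

```python
from typing import List


class Parser:
    """Recursive-descent recognizer for rules 1-12 (including parentheses).

    Input is a list of token kinds such as "number", "identifier", "+", "(";
    producing that list (the lexer) is outside this sketch.
    """

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens + ["EOF"]
        self.pos = 0

    @property
    def tok(self) -> str:
        return self.tokens[self.pos]

    def next_token(self) -> None:
        self.pos += 1

    def goal(self) -> bool:      # goal → expr, and the input must be used up
        return self.expr() and self.tok == "EOF"

    def expr(self) -> bool:      # expr → term expr2
        return self.term() and self.expr2()

    def expr2(self) -> bool:     # expr2 → + term expr2 | - term expr2 | ε
        if self.tok in ("+", "-"):
            self.next_token()
            return self.term() and self.expr2()
        return self.tok in ("EOF", ")")            # ε only on FOLLOW(expr2)

    def term(self) -> bool:      # term → factor term2
        return self.factor() and self.term2()

    def term2(self) -> bool:     # term2 → * factor term2 | / factor term2 | ε
        if self.tok in ("*", "/"):
            self.next_token()
            return self.factor() and self.term2()
        return self.tok in ("+", "-", "EOF", ")")  # ε only on FOLLOW(term2)

    def factor(self) -> bool:    # factor → ( expr ) | number | identifier
        if self.tok == "(":
            self.next_token()
            if self.expr() and self.tok == ")":
                self.next_token()                  # consume ")"
                return True
            return False                           # syntax error: expecting )
        if self.tok in ("number", "identifier"):
            self.next_token()                      # consume the operand
            return True
        return False                               # expecting (, number, or id


print(Parser(["identifier", "-", "number", "*", "identifier"]).goal())  # True
```

Because each routine advances only past tokens it has matched, `next_token` is never called on EOF.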

30 Top-Down Parsing
So far: gives us a yes or no answer
We want to build the parse tree. How?
– Add actions to the matching routines
– Create a node for each production
How do we assemble the tree?

31 Building a Parse Tree
Notice: recursive calls match the shape of the tree
[Figure: call tree rooted at main, with calls to expr, term, factor, expr2, term, …]
Idea: use a stack. Each routine:
– Pops off the children it needs
– Creates its own node
– Pushes that node back on the stack

32 Building a Parse Tree
With stack operations:
expr()
    /* Match expr → term expr2 */
    if (term() && expr2())
        expr2_node = pop();    /* expr2 pushed its node last, so pop it first */
        term_node = pop();
        expr_node = new exprNode(term_node, expr2_node);
        push(expr_node);
        return true;
    else return false;
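The same stack discipline can be sketched in Python for rules 1–11. This is a minimal sketch under assumptions: the `Node` class, the `shift`, `reduce` and `epsilon` helpers, and the explicit "ε" leaf are illustrative choices, not part of the slides. `reduce` pops the top n nodes in one slice, which keeps them in left-to-right order, the same effect as the slide's popping expr2_node before term_node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Fringe of the tree, left to right; ε leaves contribute nothing."""
        if not self.children:
            return [] if self.label == "ε" else [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]


class TreeBuilder:
    """Recursive descent for rules 1-11; each routine leaves one node on the stack."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens + ["EOF"]
        self.pos = 0
        self.stack: List[Node] = []

    @property
    def tok(self) -> str:
        return self.tokens[self.pos]

    def shift(self) -> None:
        """Push a leaf for the current terminal and advance the input."""
        self.stack.append(Node(self.tok))
        self.pos += 1

    def reduce(self, label: str, n: int) -> None:
        """Pop the top n nodes (children, left to right) and push their parent."""
        children = self.stack[-n:]
        del self.stack[-n:]
        self.stack.append(Node(label, children))

    def epsilon(self, label: str) -> None:
        """Build label → ε as a node with a single ε leaf."""
        self.stack.append(Node("ε"))
        self.reduce(label, 1)

    def parse(self) -> Optional[Node]:
        """goal → expr: return the tree, or None on a syntax error."""
        if self.expr() and self.tok == "EOF":
            return self.stack.pop()
        return None

    def expr(self) -> bool:   # expr → term expr2
        if self.term() and self.expr2():
            self.reduce("expr", 2)
            return True
        return False

    def expr2(self) -> bool:  # expr2 → + term expr2 | - term expr2 | ε
        if self.tok in ("+", "-"):
            self.shift()
            if self.term() and self.expr2():
                self.reduce("expr2", 3)
                return True
            return False
        self.epsilon("expr2")
        return True

    def term(self) -> bool:   # term → factor term2
        if self.factor() and self.term2():
            self.reduce("term", 2)
            return True
        return False

    def term2(self) -> bool:  # term2 → * factor term2 | / factor term2 | ε
        if self.tok in ("*", "/"):
            self.shift()
            if self.factor() and self.term2():
                self.reduce("term2", 3)
                return True
            return False
        self.epsilon("term2")
        return True

    def factor(self) -> bool:  # factor → number | identifier
        if self.tok in ("number", "identifier"):
            self.shift()
            self.reduce("factor", 1)
            return True
        return False


tree = TreeBuilder(["identifier", "-", "number", "*", "identifier"]).parse()
print(tree.leaves() if tree else None)
```

For x - 2 * y the root is an expr node with children term and expr2, and its fringe is the input again as token kinds.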

33 Recursive Descent Parsing
Massage the grammar to have the LL(1) condition
– Remove left recursion
– Left factor, where possible
Build FIRST (and FOLLOW) sets
Define a procedure for each non-terminal
– Implement a case for each right-hand side
– Call procedures as needed for non-terminals
– Add extra code, as needed
Can we automate this process?

34 Table-Driven Approach
Encode the mapping in a table
– Row for each non-terminal
– Column for each terminal symbol
– Table[NT, symbol] = rule# if symbol ∈ FIRST+(NT → rhs(#))
[Table excerpt: columns +, - | *, / | id, num; the surviving entries include term expr2, factor term2, error, and (do nothing) for an ε-production]

35 Code
Note: missing else conditions for errors
push EOF onto Stack
push the start symbol, G, onto Stack
token ← next_token()
top ← top of Stack
loop forever
    if top = EOF and token = EOF then break & report success
    if top is a terminal then
        if top matches token then
            pop Stack              // recognized top
            token ← next_token()
    else // top is a non-terminal
        if TABLE[top, token] is A → B1 B2 … Bk then
            pop Stack              // get rid of A
            push Bk, Bk-1, …, B1   // in that order
    top ← top of Stack
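A runnable version of this skeleton parser, as a hedged Python sketch: the table is filled in by hand from the FIRST+ sets of rules 1–11 (an assumption standing in for a generated table), entries hold the right-hand side to expand rather than a rule number, and the missing else branches simply return False. EOF is pushed first so the success test on top = EOF can fire.

```python
from typing import Dict, List, Tuple

EOF = "EOF"
NONTERMINALS = {"goal", "expr", "expr2", "term", "term2", "factor"}

# LL(1) table for rules 1-11, filled in by hand from the FIRST+ sets:
# TABLE[(A, token)] is the right-hand side to expand ([] is ε).
# A missing entry is an error.
TABLE: Dict[Tuple[str, str], List[str]] = {}
for t in ("number", "identifier"):
    TABLE["goal", t] = ["expr"]
    TABLE["expr", t] = ["term", "expr2"]
    TABLE["term", t] = ["factor", "term2"]
    TABLE["factor", t] = [t]
for op in ("+", "-"):
    TABLE["expr2", op] = [op, "term", "expr2"]
    TABLE["term2", op] = []  # ε, since + and - are in FOLLOW(term2)
for op in ("*", "/"):
    TABLE["term2", op] = [op, "factor", "term2"]
TABLE["expr2", EOF] = []     # ε, since FOLLOW(expr2) = { EOF }
TABLE["term2", EOF] = []


def ll1_parse(tokens: List[str]) -> bool:
    """Skeleton LL(1) parser: True iff `tokens` is a sentence of the grammar."""
    tokens = tokens + [EOF]
    pos = 0
    stack = [EOF, "goal"]           # EOF sits at the bottom of the stack
    while True:
        top, token = stack[-1], tokens[pos]
        if top == EOF and token == EOF:
            return True             # success
        if top not in NONTERMINALS:
            if top != token:
                return False        # error: terminal does not match
            stack.pop()             # recognized top
            pos += 1
        else:
            rhs = TABLE.get((top, token))
            if rhs is None:
                return False        # error: empty table entry
            stack.pop()             # get rid of A
            stack.extend(reversed(rhs))  # push Bk, ..., B1 so B1 is on top


print(ll1_parse(["identifier", "-", "number", "*", "identifier"]))  # True
```

Keeping the grammar in a table makes the driver independent of the grammar; only `TABLE` changes when the language does.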

36 Next Time …
Bottom-up parsers
– More powerful
– Widely used: yacc, bison, JavaCUP
Overview of YACC
Removing shift/reduce and reduce/reduce conflicts
Just in case you haven't started your homework!

