CSE 5317/4305 L3: Parsing #1. Leonidas Fegaras.


1 Parsing #1
Leonidas Fegaras

2 Parser
A parser recognizes sequences of tokens according to some grammar and generates Abstract Syntax Trees (ASTs).
A context-free grammar (CFG) has
– a finite set of terminals (tokens)
– a finite set of nonterminals, one of which is the start symbol
– a finite set of productions of the form
    A ::= X1 X2 ... Xn
  where A is a nonterminal and each Xi is either a terminal or a nonterminal symbol
[Figure: source file -> scanner -> parser -> AST. The parser asks the scanner to "get token" and receives a token; the scanner asks the source file to "get next character".]

3 Example
Expressions:
    E ::= E + T | E - T | T
    T ::= T * F | T / F | F
    F ::= num | id
Nonterminals: E T F
Start symbol: E
Terminals: + - * / id num
Example: x+2*y
... or equivalently:
    E ::= E + T
    E ::= E - T
    E ::= T
    T ::= T * F
    T ::= T / F
    T ::= F
    F ::= num
    F ::= id

4 Derivations
Notation:
– terminals: t, s, ...
– nonterminals: A, B, ...
– symbol (terminal or nonterminal): X, Y, ...
– sequence of symbols: a, b, ...
Given a production
    A ::= X1 X2 ... Xn
the form
    aAb => aX1 X2 ... Xn b
is called a derivation.
eg, using the production T ::= T * F we get
    T / F + 1 - x => T * F / F + 1 - x
Leftmost derivation: when you always expand the leftmost nonterminal in the sequence
Rightmost derivation: ... rightmost nonterminal
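One derivation step can be sketched in code. The following is only an illustration, assuming a symbol sequence is stored as a list of strings; the class name `DerivationStep` and the method `leftmostStep` are hypothetical and do not come from the slides. It replaces the leftmost nonterminal with the right-hand side of the chosen production, as in a leftmost derivation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class DerivationStep {
    // Apply the production lhs ::= rhs to the leftmost nonterminal of `form`.
    // (Hypothetical helper: names and representation are assumptions.)
    static List<String> leftmostStep(List<String> form, Set<String> nonterminals,
                                     String lhs, List<String> rhs) {
        for (int i = 0; i < form.size(); i++) {
            String symbol = form.get(i);
            if (!nonterminals.contains(symbol)) continue;   // skip terminals
            if (!symbol.equals(lhs))
                throw new IllegalArgumentException(
                    "leftmost nonterminal is " + symbol + ", not " + lhs);
            // aAb => a X1 ... Xn b
            List<String> next = new ArrayList<>(form.subList(0, i));
            next.addAll(rhs);
            next.addAll(form.subList(i + 1, form.size()));
            return next;
        }
        throw new IllegalArgumentException("no nonterminal left to expand");
    }

    public static void main(String[] args) {
        List<String> form = List.of("T", "/", "F", "+", "1", "-", "x");
        List<String> next = leftmostStep(form, Set.of("E", "T", "F"),
                                         "T", List.of("T", "*", "F"));
        System.out.println(String.join(" ", next));   // prints: T * F / F + 1 - x
    }
}
```

The example reproduces the step T / F + 1 - x => T * F / F + 1 - x from the slide.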

5 Top-down Parsing
It starts from the start symbol of the grammar and applies derivations until the entire input string is derived.
Example that matches the input sequence id(x) + num(2) * id(y):
    E => E + T            use E ::= E + T
      => E + T * F        use T ::= T * F
      => T + T * F        use E ::= T
      => T + F * F        use T ::= F
      => T + num * F      use F ::= num
      => F + num * F      use T ::= F
      => id + num * F     use F ::= id
      => id + num * id    use F ::= id
You may have more than one choice at each derivation step:
– there may be multiple nonterminals in each sequence
– for each nonterminal in the sequence, there may be many rules to choose from
Wrong predictions will cause backtracking
– we need predictive parsing that never backtracks

6 Bottom-up Parsing
It starts from the input string and uses derivations in the opposite direction (from the right-hand side of a production to its left-hand side) until you derive the start symbol.
Previous example:
       id(x) + num(2) * id(y)
    <= id(x) + num(2) * F    use F ::= id
    <= id(x) + F * F         use F ::= num
    <= id(x) + T * F         use T ::= F
    <= id(x) + T             use T ::= T * F
    <= F + T                 use F ::= id
    <= T + T                 use T ::= F
    <= E + T                 use E ::= T
    <= E                     use E ::= E + T
At each derivation step, we need to recognize a handle (the sequence of symbols that matches the right-hand side of a production).

7 Parse Tree
Given the derivations used in the top-down/bottom-up parsing of an input sequence, a parse tree has
– the start symbol as the root
– the terminals of the input sequence as leaves
– for each production A ::= X1 X2 ... Xn used in a derivation, a node A with children X1 X2 ... Xn
[Figure: parse tree for id(x) + num(2) * id(y), with E at the root, interior nodes E, T and F, and the tokens id(x) + num(2) * id(y) as leaves.]
    E => E + T
      => E + T * F
      => T + T * F
      => T + F * F
      => T + num * F
      => F + num * F
      => id + num * F
      => id + num * id

8 Playing with Associativity
What about this grammar?
    E ::= T + E | T - E | T
    T ::= F * T | F / T | F
    F ::= num | id
Right associative
Now x+y+z is equivalent to x+(y+z)
[Figure: parse tree for id(x) + id(y) + id(z) that nests to the right, with each E expanding to T + E.]
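The effect of the two grammar shapes on grouping can be sketched as follows. This is only an illustration for a single precedence level; the class `Associativity` and its methods are hypothetical names. The left-recursive rule E ::= E op T nests to the left, while the right-recursive rule E ::= T op E nests to the right.

```java
import java.util.List;

final class Associativity {
    // Grouping produced by the left-recursive rule E ::= E op T | T.
    static String leftAssoc(List<String> operands, List<String> ops) {
        String result = operands.get(0);
        for (int i = 0; i < ops.size(); i++)
            result = "(" + result + ops.get(i) + operands.get(i + 1) + ")";
        return result;
    }

    // Grouping produced by the right-recursive rule E ::= T op E | T.
    static String rightAssoc(List<String> operands, List<String> ops) {
        return rightAssoc(operands, ops, 0);
    }

    private static String rightAssoc(List<String> operands, List<String> ops, int i) {
        if (i == ops.size()) return operands.get(i);   // last operand: E ::= T
        return "(" + operands.get(i) + ops.get(i) + rightAssoc(operands, ops, i + 1) + ")";
    }

    public static void main(String[] args) {
        List<String> operands = List.of("x", "y", "z");
        List<String> ops = List.of("+", "+");
        System.out.println(leftAssoc(operands, ops));    // prints: ((x+y)+z)
        System.out.println(rightAssoc(operands, ops));   // prints: (x+(y+z))
    }
}
```

For + the two groupings give the same value, but for - or / they do not: (x-y)-z differs from x-(y-z).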

9 Ambiguous Grammars
What about this grammar?
    E ::= E + E | E - E | E * E | E / E | num | id
Operators + - * / have the same precedence!
It is ambiguous: it has more than one parse tree for the same input sequence (depending on which derivations are applied each time)
[Figure: two different parse trees for id(x) * id(y) + id(z), one grouping id(x) * id(y) first and the other grouping id(y) + id(z) first.]
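To get a feeling for how quickly the ambiguity grows, the number of parse trees can be counted. The sketch below is an illustration only (the class `AmbiguityCount` is a hypothetical name): for the grammar E ::= E op E | id, a tree over n operands is formed by choosing which operator is at the root and combining any tree for the left part with any tree for the right part.

```java
final class AmbiguityCount {
    // Number of parse trees of E ::= E op E | id for an input with `operands`
    // operands (and operands - 1 operators).
    static long parseTrees(int operands) {
        if (operands < 1) throw new IllegalArgumentException("need at least one operand");
        long[] count = new long[operands + 1];
        count[1] = 1;                                  // a single id: E ::= id
        for (int n = 2; n <= operands; n++)
            for (int left = 1; left < n; left++)       // operands left of the root operator
                count[n] += count[left] * count[n - left];
        return count[operands];
    }

    public static void main(String[] args) {
        System.out.println(parseTrees(3));   // prints: 2  (the two trees for x * y + z)
        System.out.println(parseTrees(4));   // prints: 5
    }
}
```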

10 Predictive Parsing
The goal is to construct a top-down parser that never backtracks.
It always uses leftmost derivations
– left recursion is bad!
We must transform a grammar in two ways:
– eliminate left recursion
– perform left factoring
These transformations eliminate the most common causes of backtracking, although they do not guarantee completely backtrack-free parsing.

11 Left Recursion Elimination
For example, the grammar
    A ::= A a | b
recognizes the regular expression ba*. But a top-down parser may have a hard time deciding which rule to use.
We need to get rid of left recursion:
    A ::= b A'
    A' ::= a A' |          (the last alternative is empty)
ie, A' parses the RE a*.
The second rule is recursive, but not left recursive.

12 Left Recursion Elimination (cont.)
For each nonterminal X, we partition the productions for X into two groups:
– one that contains the left recursive productions
– the other with the rest
That is:
    X ::= X a1
    ...
    X ::= X an
    X ::= b1
    ...
    X ::= bm
where a and b are symbol sequences. Then we eliminate the left recursion by rewriting these rules into:
    X ::= b1 X'
    ...
    X ::= bm X'
    X' ::= a1 X'
    ...
    X' ::= an X'
    X' ::=                 (empty)
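The rewriting above can be sketched in a few lines. This is a minimal illustration, assuming each right-hand side is a list of symbol strings and that only immediate left recursion (X ::= X a) has to be handled; the class `LeftRecursion` and the method `eliminate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeftRecursion {
    // Rewrites the productions of nonterminal x; the new nonterminal is named x + "'".
    static Map<String, List<List<String>>> eliminate(String x, List<List<String>> rules) {
        String xPrime = x + "'";
        List<List<String>> forX = new ArrayList<>();
        List<List<String>> forPrime = new ArrayList<>();
        for (List<String> rhs : rules) {
            if (!rhs.isEmpty() && rhs.get(0).equals(x)) {
                // X ::= X a   becomes   X' ::= a X'
                List<String> r = new ArrayList<>(rhs.subList(1, rhs.size()));
                r.add(xPrime);
                forPrime.add(r);
            } else {
                // X ::= b     becomes   X ::= b X'
                List<String> r = new ArrayList<>(rhs);
                r.add(xPrime);
                forX.add(r);
            }
        }
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        if (forPrime.isEmpty()) {          // no left recursion: keep the rules as they are
            result.put(x, rules);
            return result;
        }
        forPrime.add(new ArrayList<>());   // X' ::= (empty)
        result.put(x, forX);
        result.put(xPrime, forPrime);
        return result;
    }

    public static void main(String[] args) {
        // E ::= E + T | E - T | T
        System.out.println(eliminate("E", List.of(
            List.of("E", "+", "T"), List.of("E", "-", "T"), List.of("T"))));
        // prints: {E=[[T, E']], E'=[[+, T, E'], [-, T, E'], []]}
    }
}
```

The empty list in the output stands for the empty alternative of E'.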

13 Example
    E ::= E + T | E - T | T
    T ::= T * F | T / F | F
    F ::= num | id
becomes:
    E ::= T E'
    E' ::= + T E' | - T E' |
    T ::= F T'
    T' ::= * F T' | / F T' |
    F ::= num | id

14 Example
A grammar that recognizes regular expressions:
    R ::= R R | R bar R | R * | ( R ) | char
After left recursion elimination:
    R ::= ( R ) R' | char R'
    R' ::= R R' | bar R R' | * R' |

15 Left Factoring
Factors out common prefixes:
    X ::= a b1
    ...
    X ::= a bn
becomes:
    X ::= a X'
    X' ::= b1
    ...
    X' ::= bn
Example:
    E ::= T + E | T - E | T
becomes:
    E ::= T E'
    E' ::= + E | - E |
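Left factoring can be sketched in the same style. The code below is a simplified illustration: it assumes that all alternatives of X share one common prefix (the general case would first group alternatives by prefix), and the class `LeftFactoring` and method `factor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeftFactoring {
    // Factors the longest prefix shared by ALL alternatives of x into x ::= prefix x'.
    static Map<String, List<List<String>>> factor(String x, List<List<String>> rules) {
        // Longest common prefix of all right-hand sides.
        List<String> prefix = new ArrayList<>(rules.get(0));
        for (List<String> rhs : rules) {
            int k = 0;
            while (k < prefix.size() && k < rhs.size() && prefix.get(k).equals(rhs.get(k)))
                k++;
            prefix = new ArrayList<>(prefix.subList(0, k));
        }
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        if (prefix.isEmpty() || rules.size() < 2) {   // nothing to factor out
            result.put(x, rules);
            return result;
        }
        String xPrime = x + "'";
        List<String> head = new ArrayList<>(prefix);  // X ::= a X'
        head.add(xPrime);
        List<List<String>> tails = new ArrayList<>(); // X' ::= b1 | ... | bn
        for (List<String> rhs : rules)
            tails.add(new ArrayList<>(rhs.subList(prefix.size(), rhs.size())));
        result.put(x, List.of(head));
        result.put(xPrime, tails);
        return result;
    }

    public static void main(String[] args) {
        // E ::= T + E | T - E | T
        System.out.println(factor("E", List.of(
            List.of("T", "+", "E"), List.of("T", "-", "E"), List.of("T"))));
        // prints: {E=[[T, E']], E'=[[+, E], [-, E], []]}
    }
}
```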

16 Recursive Descent Parsing
    E ::= T E'
    E' ::= + T E' | - T E' |
    T ::= F T'
    T' ::= * F T' | / F T' |
    F ::= num | id

    // One method per nonterminal; current_token, read_next_token() and
    // error() are provided by the scanner and the error handler.
    static void E () { T(); Eprime(); }

    static void Eprime () {
        if (current_token == PLUS)
            { read_next_token(); T(); Eprime(); }
        else if (current_token == MINUS)
            { read_next_token(); T(); Eprime(); }
        // otherwise use E' ::= (empty) and consume nothing
    }

    static void T () { F(); Tprime(); }

    static void Tprime () {
        if (current_token == TIMES)
            { read_next_token(); F(); Tprime(); }
        else if (current_token == DIV)
            { read_next_token(); F(); Tprime(); }
        // otherwise use T' ::= (empty) and consume nothing
    }

    static void F () {
        if (current_token == NUM || current_token == ID)
            read_next_token();
        else error();
    }
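A self-contained variant of this parser can be sketched as follows. It is an illustration under several assumptions that are not in the slides: the scanner is replaced by a ready-made list of token kinds ("num", "id", "+", "-", "*", "/"), running out of input is treated as the end marker "$", and errors are reported by throwing an exception. The class name `RecursiveDescent` and the method `accepts` are hypothetical.

```java
import java.util.List;

final class RecursiveDescent {
    private final List<String> tokens;   // token kinds, standing in for a scanner
    private int pos = 0;

    RecursiveDescent(List<String> tokens) { this.tokens = tokens; }

    // Current token kind; "$" once the input is exhausted.
    private String current() { return pos < tokens.size() ? tokens.get(pos) : "$"; }
    private void next() { pos++; }

    void E() { T(); Eprime(); }

    void Eprime() {                       // E' ::= + T E' | - T E' | (empty)
        if (current().equals("+") || current().equals("-")) { next(); T(); Eprime(); }
    }

    void T() { F(); Tprime(); }

    void Tprime() {                       // T' ::= * F T' | / F T' | (empty)
        if (current().equals("*") || current().equals("/")) { next(); F(); Tprime(); }
    }

    void F() {                            // F ::= num | id
        if (current().equals("num") || current().equals("id")) next();
        else throw new IllegalStateException(
            "expected num or id at position " + pos + ", found " + current());
    }

    // True if the whole token sequence is an expression of the grammar.
    static boolean accepts(List<String> tokens) {
        RecursiveDescent parser = new RecursiveDescent(tokens);
        try {
            parser.E();
        } catch (IllegalStateException e) {
            return false;
        }
        return parser.current().equals("$");   // all input must be consumed
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("id", "-", "num", "*", "id")));   // prints: true
        System.out.println(accepts(List.of("id", "+")));                     // prints: false
    }
}
```

Merging the + and - cases (and the * and / cases) into one branch is a small simplification; a parser that builds an AST would keep them apart.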

17 Predictive Parsing Using a Table
The symbol sequence from a derivation is stored in a stack (first symbol on top):
– if the top of the stack is a terminal, it should match the current token from the input
– if the top of the stack is a nonterminal X and the current input token is t, we get a rule from the parse table: M[X,t]
– the rule is used as a derivation to replace X in the stack with the right-hand symbols
    push(S);
    read_next_token();
    repeat
        X = pop();
        if (X is a terminal or '$')
            if (X == current_token)
                read_next_token();
            else error();
        else if (M[X,current_token] == "X ::= Y1 Y2 ... Yk")
            { push(Yk); ... push(Y1); }
        else error();
    until X == '$';

18 Parsing Table Example
     1) E  ::= T E' $
     2) E' ::= + T E'
     3)      | - T E'
     4)      |
     5) T  ::= F T'
     6) T' ::= * F T'
     7)      | / F T'
     8)      |
     9) F  ::= num
    10)      | id

          num  id   +    -    *    /    $
    E     1    1
    E'              2    3              4
    T     5    5
    T'              8    8    6    7    8
    F     9    10

19 Example: Parsing x-2*y$
    Stack (top at right)   current_token   Rule
    E                      x               M[E,id] = 1 (using E ::= T E' $)
    $ E' T                 x               M[T,id] = 5 (using T ::= F T')
    $ E' T' F              x               M[F,id] = 10 (using F ::= id)
    $ E' T' id             x               read_next_token
    $ E' T'                -               M[T',-] = 8 (using T' ::= )
    $ E'                   -               M[E',-] = 3 (using E' ::= - T E')
    $ E' T -               -               read_next_token
    $ E' T                 2               M[T,num] = 5 (using T ::= F T')
    $ E' T' F              2               M[F,num] = 9 (using F ::= num)
    $ E' T' num            2               read_next_token
    $ E' T'                *               M[T',*] = 6 (using T' ::= * F T')
    $ E' T' F *            *               read_next_token
    $ E' T' F              y               M[F,id] = 10 (using F ::= id)
    $ E' T' id             y               read_next_token
    $ E' T'                $               M[T',$] = 8 (using T' ::= )
    $ E'                   $               M[E',$] = 4 (using E' ::= )
    $                      $               stop (accept)
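The trace above can be reproduced with a small table-driven parser. This is a sketch under stated assumptions: the table M and the rule numbers 1 to 10 are copied from the parsing table example, the input is given as a list of token kinds ending in "$", and the class `TableParser` with its methods `parse` and `accepts` are hypothetical names. `parse` returns the rule numbers in the order they are applied.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TableParser {
    // Right-hand sides of rules 1..10 (index 0 is unused).
    static final String[][] RHS = {
        {}, {"T", "E'", "$"}, {"+", "T", "E'"}, {"-", "T", "E'"}, {},
        {"F", "T'"}, {"*", "F", "T'"}, {"/", "F", "T'"}, {}, {"num"}, {"id"}
    };
    static final Set<String> NONTERMINALS = Set.of("E", "E'", "T", "T'", "F");
    static final Map<String, Integer> M = new HashMap<>();   // key: "X t"
    static {
        put("E", 1, "num", "id");
        put("E'", 2, "+"); put("E'", 3, "-"); put("E'", 4, "$");
        put("T", 5, "num", "id");
        put("T'", 6, "*"); put("T'", 7, "/"); put("T'", 8, "+", "-", "$");
        put("F", 9, "num"); put("F", 10, "id");
    }
    private static void put(String x, int rule, String... tokens) {
        for (String t : tokens) M.put(x + " " + t, rule);
    }

    // Returns the rule numbers used, in order; throws on a syntax error.
    static List<Integer> parse(List<String> tokens) {
        Deque<String> stack = new ArrayDeque<>();
        List<Integer> used = new ArrayList<>();
        int pos = 0;
        stack.push("E");
        String x;
        do {
            if (stack.isEmpty() || pos >= tokens.size())
                throw new IllegalStateException("unexpected end of input");
            x = stack.pop();
            String current = tokens.get(pos);
            if (!NONTERMINALS.contains(x)) {              // terminal or '$': must match
                if (!x.equals(current))
                    throw new IllegalStateException("expected " + x + ", found " + current);
                pos++;
            } else {
                Integer rule = M.get(x + " " + current);
                if (rule == null)
                    throw new IllegalStateException("no rule for M[" + x + "," + current + "]");
                used.add(rule);
                String[] rhs = RHS[rule];
                for (int i = rhs.length - 1; i >= 0; i--) stack.push(rhs[i]);   // Y1 on top
            }
        } while (!x.equals("$"));
        return used;
    }

    static boolean accepts(List<String> tokens) {
        try { parse(tokens); return true; } catch (IllegalStateException e) { return false; }
    }

    public static void main(String[] args) {
        // x - 2 * y $
        System.out.println(parse(List.of("id", "-", "num", "*", "id", "$")));
        // prints: [1, 5, 10, 8, 3, 5, 9, 6, 10, 8, 4]
    }
}
```

The printed rule numbers are the same ones that appear in the Rule column of the trace.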

20 Constructing the Parsing Table
FIRST[a] is the set of terminals t that can appear first after a number of derivations on the symbol sequence a, ie, a => ... => tb for some symbol sequence b
– FIRST[ta] = {t}, eg, FIRST[3+E] = {3}
– FIRST[X] = FIRST[a1] ∪ ... ∪ FIRST[an] for each production X ::= ai
– FIRST[Xa] = FIRST[X], but if X has an empty derivation then FIRST[Xa] = FIRST[X] ∪ FIRST[a]
FOLLOW[X] is the set of all terminals that follow X in any legal derivation
– find all productions Z ::= a X b in which X appears at the RHS; then
  – FIRST[b] must be included in FOLLOW[X]
  – if b has an empty derivation, FOLLOW[Z] must be included in FOLLOW[X]

21 Example
     1) E  ::= T E' $
     2) E' ::= + T E'
     3)      | - T E'
     4)      |
     5) T  ::= F T'
     6) T' ::= * F T'
     7)      | / F T'
     8)      |
     9) F  ::= num
    10)      | id

          FIRST      FOLLOW
    E     {num,id}   {}
    E'    {+,-}      {$}
    T     {num,id}   {+,-,$}
    T'    {*,/}      {+,-,$}
    F     {num,id}   {+,-,*,/,$}
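The FIRST and FOLLOW rules can be turned into code by applying them repeatedly until no set changes any more. The sketch below is one possible way to do this, not necessarily how the course tools do it; the class `FirstFollow` and all of its member names are hypothetical. A symbol counts as a nonterminal if it has at least one production, so $ is treated as an ordinary terminal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FirstFollow {
    final Map<String, List<List<String>>> rules = new LinkedHashMap<>();
    final Set<String> nullable = new HashSet<>();       // nonterminals with an empty derivation
    final Map<String, Set<String>> first = new HashMap<>();
    final Map<String, Set<String>> follow = new HashMap<>();

    void rule(String lhs, String... rhs) {
        rules.computeIfAbsent(lhs, k -> new ArrayList<>()).add(Arrays.asList(rhs));
    }

    boolean canBeEmpty(List<String> seq) {
        for (String s : seq) if (!nullable.contains(s)) return false;
        return true;
    }

    // FIRST of a symbol sequence, using the FIRST sets computed so far.
    Set<String> firstOf(List<String> seq) {
        Set<String> result = new TreeSet<>();
        for (String s : seq) {
            if (!rules.containsKey(s)) { result.add(s); break; }   // FIRST[ta] = {t}
            result.addAll(first.get(s));
            if (!nullable.contains(s)) break;                      // FIRST[Xa] = FIRST[X]
        }
        return result;
    }

    void compute() {
        for (String x : rules.keySet()) {
            first.put(x, new TreeSet<>());
            follow.put(x, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {                                          // iterate to a fixed point
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : rules.entrySet()) {
                String z = e.getKey();
                for (List<String> rhs : e.getValue()) {
                    if (canBeEmpty(rhs)) changed |= nullable.add(z);
                    changed |= first.get(z).addAll(firstOf(rhs));
                    for (int i = 0; i < rhs.size(); i++) {         // Z ::= a X b
                        String x = rhs.get(i);
                        if (!rules.containsKey(x)) continue;
                        List<String> b = rhs.subList(i + 1, rhs.size());
                        changed |= follow.get(x).addAll(firstOf(b));
                        if (canBeEmpty(b))
                            changed |= follow.get(x).addAll(new ArrayList<>(follow.get(z)));
                    }
                }
            }
        }
    }

    // The example grammar with rules 1) to 10).
    static FirstFollow expressionGrammar() {
        FirstFollow g = new FirstFollow();
        g.rule("E", "T", "E'", "$");
        g.rule("E'", "+", "T", "E'"); g.rule("E'", "-", "T", "E'"); g.rule("E'");
        g.rule("T", "F", "T'");
        g.rule("T'", "*", "F", "T'"); g.rule("T'", "/", "F", "T'"); g.rule("T'");
        g.rule("F", "num"); g.rule("F", "id");
        g.compute();
        return g;
    }
}
```

For the example grammar, `expressionGrammar()` should yield the same FIRST and FOLLOW sets as the table above, for instance FOLLOW[F] = {+,-,*,/,$}.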

22 Constructing the Parsing Table (cont.)
For each rule X ::= a do:
– for each t in FIRST[a], add X ::= a to M[X,t]
– if a can be reduced to the empty sequence, then for each t in FOLLOW[X], add X ::= a to M[X,t]

          FIRST      FOLLOW
    E     {num,id}   {}
    E'    {+,-}      {$}
    T     {num,id}   {+,-,$}
    T'    {*,/}      {+,-,$}
    F     {num,id}   {+,-,*,/,$}

     1) E  ::= T E' $
     2) E' ::= + T E'
     3)      | - T E'
     4)      |
     5) T  ::= F T'
     6) T' ::= * F T'
     7)      | / F T'
     8)      |
     9) F  ::= num
    10)      | id

          num  id   +    -    *    /    $
    E     1    1
    E'              2    3              4
    T     5    5
    T'              8    8    6    7    8
    F     9    10
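The two construction rules can be sketched as follows. To keep the example short, it assumes that FIRST of each right-hand side, the "can be empty" flag of each right-hand side, and the FOLLOW sets are already known and are passed in by hand (here copied from the tables above); the class `TableBuilder` and its methods are hypothetical names. Each table cell holds a list of rule numbers so that a cell with more than one rule shows up as a conflict.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class TableBuilder {
    // Builds M as a map from "X t" to the numbers (1-based) of the rules in that cell.
    static Map<String, List<Integer>> build(String[] lhs, String[][] firstOfRhs,
                                            boolean[] rhsCanBeEmpty,
                                            Map<String, Set<String>> follow) {
        Map<String, List<Integer>> m = new TreeMap<>();
        for (int r = 0; r < lhs.length; r++) {
            // for each t in FIRST[a] ...
            Set<String> tokens = new TreeSet<>(Arrays.asList(firstOfRhs[r]));
            // ... and, if a can be empty, for each t in FOLLOW[X]
            if (rhsCanBeEmpty[r]) tokens.addAll(follow.getOrDefault(lhs[r], Set.of()));
            for (String t : tokens)
                m.computeIfAbsent(lhs[r] + " " + t, k -> new ArrayList<>()).add(r + 1);
        }
        return m;
    }

    // True if no cell of the table holds more than one rule.
    static boolean isLL1(Map<String, List<Integer>> m) {
        for (List<Integer> cell : m.values())
            if (cell.size() > 1) return false;
        return true;
    }

    // The example grammar, rules 1) to 10).
    static Map<String, List<Integer>> expressionTable() {
        String[] lhs = {"E", "E'", "E'", "E'", "T", "T'", "T'", "T'", "F", "F"};
        String[][] first = {{"num", "id"}, {"+"}, {"-"}, {}, {"num", "id"},
                            {"*"}, {"/"}, {}, {"num"}, {"id"}};
        boolean[] empty = {false, false, false, true, false,
                           false, false, true, false, false};
        Map<String, Set<String>> follow =
            Map.of("E'", Set.of("$"), "T'", Set.of("+", "-", "$"));
        return build(lhs, first, empty, follow);
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> m = expressionTable();
        System.out.println(m.get("T' $"));   // prints: [8]
        System.out.println(isLL1(m));        // prints: true
    }
}
```

Feeding the same builder the left-recursive rules E ::= E + T and E ::= T, which both have FIRST = {num,id}, puts two rules into the cells M[E,num] and M[E,id], which is the kind of conflict that left recursion elimination removes.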

23 Another Example
    G ::= S $
    S ::= ( L ) | a
    L ::= L , S | S
After left recursion elimination:
    0) G  ::= S $
    1) S  ::= ( L )
    2) S  ::= a
    3) L  ::= S L'
    4) L' ::= , S L'
    5) L' ::=

          (    )    a    ,    $
    G     0         0
    S     1         2
    L     3         3
    L'         5         4

24 LL(1)
A grammar is called LL(1) if each entry of its parsing table contains at most one production
– the first L in LL(1) means that we read the input from left to right
– the second L means that it uses leftmost derivations only
– the number 1 means that we need to look one token ahead in the input

