Grammar vs Recursive Descent Parser

Slides:



Advertisements
Similar presentations
Parsing 4 Dr William Harrison Fall 2008
Advertisements

Compiler construction in4020 – lecture 4 Koen Langendoen Delft University of Technology The Netherlands.
Bottom-up Parsing A general style of bottom-up syntax analysis, known as shift-reduce parsing. Two types of bottom-up parsing: Operator-Precedence parsing.
Exercise: Balanced Parentheses
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Exercise 1: Balanced Parentheses Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence)
Chap. 5, Top-Down Parsing J. H. Wang Mar. 29, 2011.
Pushdown Automata Consists of –Pushdown stack (can have terminals and nonterminals) –Finite state automaton control Can do one of three actions (based.
Lecture # 11 Grammar Problems.
Lexical and Syntactic Analysis Here, we look at two of the tasks involved in the compilation process –Given source code, we need to first break it into.
Chapter 4 Lexical and Syntax Analysis Sections 1-4.
Parsing III (Eliminating left recursion, recursive descent parsing)
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
CH4.1 CSE244 Sections 4.5,4.6 Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road, Box U-155 Storrs,
1 Compilers. 2 Compiler Program v = 5; if (v>5) x = 12 + v; while (x !=3) { x = x - 3; v = 10; } Add v,v,0 cmp v,5 jmplt ELSE THEN: add x, 12,v.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
Parsing. Goals of Parsing Check the input for syntactic accuracy Return appropriate error messages Recover if possible Produce, or at least traverse,
1 Chapter 5 LL (1) Grammars and Parsers. 2 Naming of parsing techniques The way to parse token sequence L: Leftmost R: Righmost Top-down  LL Bottom-up.
Chapter 5 Top-Down Parsing.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Exercise 1 Consider a language with the following tokens and token classes: ident ::= letter (letter|digit)* LT ::= " " shiftL ::= " >" dot ::= "." LP.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Exercise 1 A ::= B EOF B ::=  | B B | (B) Tokens: EOF, (, ) Generate constraints and compute nullable and first for this grammar. Check whether first.
Compiler verification for fun and profit a week from now, 8:55am in Salle Polyvalente Xavier LeroyInria Paris–Rocquencourt …Compilers are, however, vulnerable.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
Computing if a token can follow first(B 1... B p ) = {a  | B 1...B p ...  aw } follow(X) = {a  | S ... ...Xa... } There exists a derivation from.
Syntax Analyzer (Parser)
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Bernd Fischer RW713: Compiler and Software Language Engineering.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Lecture # 10 Grammar Problems. Problems with grammar Ambiguity Left Recursion Left Factoring Removal of Useless Symbols These can create problems for.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Chapter 3 – Describing Syntax
Describing Syntax and Semantics
Parsing #1 Leonidas Fegaras.
lec02-parserCFG May 8, 2018 Syntax Analyzer
Chapter 4 - Parsing CSCE 343.
CS510 Compiler Lecture 4.
Lecture #12 Parsing Types.
Introduction to Parsing (adapted from CS 164 at Berkeley)
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 5, 09/25/2003 Prof. Roy Levow.
Chapter 3 – Describing Syntax
Simplifications of Context-Free Grammars
Syntax versus Semantics
Programming Language Syntax 7
Top-Down Parsing CS 671 January 29, 2008.
Compiler Design 7. Top-Down Table-Driven Parsing
Lecture 7: Introduction to Parsing (Syntax Analysis)
R.Rajkumar Asst.Professor CSE
Programming Language Syntax 5
Syntax Analysis - Parsing
BNF 9-Apr-19.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Predictive Parsing Program
Presentation transcript:

Grammar vs Recursive Descent Parser def expr = { term; termList } def termList = if (token==PLUS) { skip(PLUS); term; termList } else if (token==MINUS) skip(MINUS); term; termList } def term = { factor; factorList } ... def factor = if (token==IDENT) name else if (token==OPAR) { skip(OPAR); expr; skip(CPAR) } else error("expected ident or )") expr ::= term termList termList ::= + term termList | - term termList |  term ::= factor factorList factorList ::= * factor factorList | / factor factorList |  factor ::= name | ( expr ) name ::= ident

T1, T2, T3 should be disjoint sets of tokens. Rough General Idea def A = if (token  T1) { B1 ... Bp else if (token  T2) { C1 ... Cq } else if (token  T3) { D1 ... Dr } else error("expected T1,T2,T3") A ::= B1 ... Bp | C1 ... Cq | D1 ... Dr where: T1 = first(B1 ... Bp) T2 = first(C1 ... Cq) T3 = first(D1 ... Dr) first(B1 ... Bp) = {a | B1...Bp ...  aw } T1, T2, T3 should be disjoint sets of tokens.

Computing first in the example first(name) = {ident} first(( expr ) ) = { ( } first(factor) = first(name) U first( ( expr ) ) = {ident} U{ ( } = {ident, ( } first(* factor factorList) = { * } first(/ factor factorList) = { / } first(factorList) = { *, / } first(term) = first(factor) = {ident, ( } first(termList) = { + , - } first(expr) = first(term) = {ident, ( } expr ::= term termList termList ::= + term termList | - term termList |  term ::= factor factorList factorList ::= * factor factorList | / factor factorList |  factor ::= name | ( expr ) name ::= ident

Algorithm for first Given an arbitrary context-free grammar with a set of rules of the form X ::= Y1 ... Yn compute first for each right-hand side and for each symbol. How to handle alternatives for one non-terminal sequences of symbols nullable non-terminals recursion

Rules with Multiple Alternatives A ::= B1 ... Bp | C1 ... Cq | D1 ... Dr first(A) = first(B1... Bp) U first(C1 ... Cq) U first(D1 ... Dr) Sequences first(B1... Bp) = first(B1) if not nullable(B1) first(B1... Bp) = first(B1) U ... U first(Bk) if nullable(B1), ..., nullable(Bk-1) and not nullable(Bk) or k=p

Abstracting into Constraints recursive grammar: constraints over finite sets: expr' is first(expr) expr ::= term termList termList ::= + term termList | - term termList |  term ::= factor factorList factorList ::= * factor factorList | / factor factorList |  factor ::= name | ( expr ) name ::= ident expr' = term' termList' = {+} U {-} term' = factor' factorList' = {*} U { / } factor' = name' U { ( } name' = { ident } For this nice grammar, there is no recursion in constraints. Solve by substitution. nullable: termList, factorList

Example to Generate Constraints S ::= X | Y X ::= b | S Y Y ::= Z X b | Y b Z ::=  | a S' = X' U Y' X' = terminals: a,b non-terminals: S, X, Y, Z reachable (from S): productive: nullable: First sets of terminals: S', X', Y', Z'  {a,b}

Example to Generate Constraints S ::= X | Y X ::= b | S Y Y ::= Z X b | Y b Z ::=  | a S' = X' U Y' X' = {b} U S' Y' = Z' U X' U Y' Z' = {a} terminals: a,b non-terminals: S, X, Y, Z These constraints are recursive. How to solve them? S', X', Y', Z'  {a,b} How many candidate solutions in this case? for k tokens, n nonterminals? reachable (from S): S, X, Y, Z productive: X, Z, S, Y nullable: Z

Iterative Solution of first Constraints 1. 2. 3. 4. 5. S' X' Y' Z' {} {} {} {} {} {b} {b} {a} {b} {b} {a,b} {a} {a,b} {a,b} {a,b} {a} {a,b} {a,b} {a,b} {a} S' = X' U Y' X' = {b} U S' Y' = Z' U X' U Y' Z' = {a} Start from all sets empty. Evaluate right-hand side and assign it to left-hand side. Repeat until it stabilizes. Sets grow in each step initially they are empty, so they can only grow if sets grow, the RHS grows (U is monotonic), and so does LHS they cannot grow forever: in the worst case contain all tokens

Constraints for Computing Nullable Non-terminal is nullable if it can derive  S ::= X | Y X ::= b | S Y Y ::= Z X b | Y b Z ::=  | a S' = X' | Y' X' = 0 | (S' & Y') Y' = (Z' & X' & 0) | (Y' & 0) Z' = 1 | 0 S', X', Y', Z'  {0,1} 0 - not nullable 1 - nullable | - disjunction & - conjunction S' X' Y' Z' 0 0 0 0 0 0 0 1 0 0 0 1 1. 2. 3. again monotonically growing

Computing first and nullable Given any grammar we can compute for each non-terminal X whether nullable(X) using this, the set first(X) for each non-terminal X General approach: generate constraints over finite domains, following the structure of each rule solve the constraints iteratively start from least elements keep evaluating RHS and re-assigning the value to LHS stop when there is no more change

T1, T2, T3 should be disjoint sets of tokens. Rough General Idea def A = if (token  T1) { B1 ... Bp else if (token  T2) { C1 ... Cq } else if (token  T3) { D1 ... Dr } else error("expected T1,T2,T3") A ::= B1 ... Bp | C1 ... Cq | D1 ... Dr where: T1 = first(B1 ... Bp) T2 = first(C1 ... Cq) T3 = first(D1 ... Dr) T1, T2, T3 should be disjoint sets of tokens.

Exercise 1 A ::= B EOF B ::=  | B B | (B) Tokens: EOF, (, ) Generate constraints and compute nullable and first for this grammar. Check whether first sets for different alternatives are disjoint.

Exercise 2 S ::= B EOF B ::=  | B (B) Tokens: EOF, (, ) Generate constraints and compute nullable and first for this grammar. Check whether first sets for different alternatives are disjoint.

Exercise 3 Compute nullable, first for this grammar: stmtList ::=  | stmt stmtList stmt ::= assign | block assign ::= ID = ID ; block ::= beginof ID stmtList ID ends Describe a parser for this grammar and explain how it behaves on this input: beginof myPrettyCode x = u; y = v; myPrettyCode ends

Problem Identified stmtList ::=  | stmt stmtList stmt ::= assign | block assign ::= ID = ID ; block ::= beginof ID stmtList ID ends Problem parsing stmtList: ID could start alternative stmt stmtList ID could follow stmt, so we may wish to parse  that is, do nothing and return For nullable non-terminals, we must also compute what follows them

General Idea for nullable(A) def A = if (token  T1) { B1 ... Bp else if (token  (T2 U TF)) { C1 ... Cq } else if (token  T3) { D1 ... Dr } // no else error, just return A ::= B1 ... Bp | C1 ... Cq | D1 ... Dr where: T1 = first(B1 ... Bp) T2 = first(C1 ... Cq) T3 = first(D1 ... Dr) TF = follow(A) Only one of the alternatives can be nullable (e.g. second) T1, T2, T3, TF should be pairwise disjoint sets of tokens.

LL(1) Grammar - good for building recursive descent parsers Grammar is LL(1) if for each nonterminal X first sets of different alternatives of X are disjoint if nullable(X), first(X) must be disjoint from follow(X) For each LL(1) grammar we can build recursive-descent parser Each LL(1) grammar is unambiguous If a grammar is not LL(1), we can sometimes transform it into equivalent LL(1) grammar

Computing if a token can follow first(B1 ... Bp) = {a | B1...Bp ...  aw } follow(X) = {a | S ...  ...Xa... } There exists a derivation from the start symbol that produces a sequence of terminals and nonterminals of the form ...Xa... (the token a follows the non-terminal X)

Rule for Computing Follow Given X ::= YZ (for reachable X) then first(Z)  follow(Y) and follow(X)  follow(Z) now take care of nullable ones as well: For each rule X ::= Y1 ... Yp ... Yq ... Yr follow(Yp) should contain: first(Yp+1Yp+2...Yr) also follow(X) if nullable(Yp+1Yp+2Yr)

Compute nullable, first, follow stmtList ::=  | stmt stmtList stmt ::= assign | block assign ::= ID = ID ; block ::= beginof ID stmtList ID ends Is this grammar LL(1)?

Conclusion of the Solution The grammar is not LL(1) because we have nullable(stmtList) first(stmt)  follow(stmtList) = {ID} If a recursive-descent parser sees ID, it does not know if it should finish parsing stmtList or parse another stmt