Parsing & Context-Free Grammars Hal Perkins Autumn 2011

Slides:



Advertisements
Similar presentations
lec02-parserCFG March 27, 2017 Syntax Analyzer
Advertisements

CPSC Compiler Tutorial 4 Midterm Review. Deterministic Finite Automata (DFA) Q: finite set of states Σ: finite set of “letters” (input alphabet)
Prof. Bodik CS 164 Lecture 81 Grammars and ambiguity CS164 3:30-5:00 TT 10 Evans.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Context-Free Grammars and Parsing 1.
EECS 6083 Intro to Parsing Context Free Grammars
8/19/2015© Hal Perkins & UW CSEC-1 CSE P 501 – Compilers Parsing & Context-Free Grammars Hal Perkins Winter 2008.
Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Grammars CPSC 5135.
CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring CSE3302 Programming Languages,
Bernd Fischer RW713: Compiler and Software Language Engineering.
Introduction to Parsing
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Syntax Analyzer (Parser)
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.
CSE P501 – Compilers Parsing Context Free Grammars (CFG) Ambiguous Grammars Next Spring 2014Jim Hogg - UW - CSE - P501C-1.
Introduction to Parsing
Comp 411 Principles of Programming Languages Lecture 3 Parsing
CSE 3302 Programming Languages
Chapter 3 – Describing Syntax
lec02-parserCFG May 8, 2018 Syntax Analyzer
Introduction to Parsing
Parsing & Context-Free Grammars
G. Pullaiah College of Engineering and Technology
Programming Languages Translator
CS510 Compiler Lecture 4.
Lexical and Syntax Analysis
Chapter 3 Context-Free Grammar and Parsing
Introduction to Parsing (adapted from CS 164 at Berkeley)
Chapter 3 – Describing Syntax
Syntax Specification and Analysis
What does it mean? Notes from Robert Sebesta Programming Languages
Syntax versus Semantics
CSE 3302 Programming Languages
Lecture 3: Introduction to Syntax (Cont’)
Compiler Design 4. Language Grammars
Lexical and Syntax Analysis
(Slides copied liberally from Ruth Anderson, Hal Perkins and others)
Top-Down Parsing CS 671 January 29, 2008.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Compiler Design 7. Top-Down Table-Driven Parsing
Lecture 7: Introduction to Parsing (Syntax Analysis)
CSC 4181Compiler Construction Context-Free Grammars
R.Rajkumar Asst.Professor CSE
Finite Automata and Formal Languages
LL and Recursive-Descent Parsing Hal Perkins Autumn 2011
Introduction to Parsing
Introduction to Parsing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
LL and Recursive-Descent Parsing
CSC 4181 Compiler Construction Context-Free Grammars
BNF 9-Apr-19.
Parsing & Context-Free Grammars Hal Perkins Summer 2004
LL and Recursive-Descent Parsing Hal Perkins Autumn 2009
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
LL and Recursive-Descent Parsing Hal Perkins Winter 2008
lec02-parserCFG May 27, 2019 Syntax Analyzer
Parsing & Context-Free Grammars Hal Perkins Autumn 2005
COMPILER CONSTRUCTION
Presentation transcript:

Parsing & Context-Free Grammars Hal Perkins Autumn 2011 CSE P 501 – Compilers Parsing & Context-Free Grammars Hal Perkins Autumn 2011 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Agenda for Today Parsing overview Context free grammars Ambiguous grammars Reading: Cooper/Torczon ch. 3, or Dragon Book ch. 4, or Appel ch. 3 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Parsing The syntax of most programming languages can be specified by a context-free grammar (CGF) Parsing: Given a grammar G and a sentence w in L(G ), traverse the derivation (parse tree) for w in some standard order and do something useful at each node The tree might not be produced explicitly, but the control flow of a parser corresponds to a traversal 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Old Example a = 1 ; if ( a + 1 ) b = 2 ; G w program program statement program ::= statement | program statement statement ::= assignStmt | ifStmt assignStmt ::= id = expr ; ifStmt ::= if ( expr ) statement expr ::= id | int | expr + expr Id ::= a | b | c | i | j | k | n | x | y | z int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Old Example G program a = 1 ; if ( a + 1 ) b = 2 ; program statement statement ifStmt statement assignStmt assignStmt expr id expr expr expr id expr int id int int w 11/13/2018 © 2002-11 Hal Perkins & UW CSE

“Standard Order” For practical reasons we want the parser to be deterministic (no backtracking), and we want to examine the source program from left to right. (i.e., parse the program in linear time in the order it appears in the source file) 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Common Orderings Top-down Bottom-up Start with the root Traverse the parse tree depth-first, left-to-right (leftmost derivation) LL(k) Bottom-up Start at leaves and build up to the root Effectively a rightmost derivation in reverse(!) LR(k) and subsets (LALR(k), SLR(k), etc.) 11/13/2018 © 2002-11 Hal Perkins & UW CSE

“Something Useful” At each point (node) in the traversal, perform some semantic action Construct nodes of full parse tree (rare) Construct abstract syntax tree (common) Construct linear, lower-level representation (more common in later parts of a modern compiler) Generate target code on the fly (1-pass compiler; not common in production compilers – can’t generate very good code in one pass – but great if you need a quick ‘n dirty working compiler) 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Context-Free Grammars Formally, a grammar G is a tuple <N,Σ,P,S> where N a finite set of non-terminal symbols Σ a finite set of terminal symbols P a finite set of productions A subset of N × (N  Σ )* S the start symbol, a distinguished element of N If not specified otherwise, this is usually assumed to be the non-terminal on the left of the first production 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Standard Notations a, b, c elements of Σ w, x, y, z elements of Σ* A, B, C elements of N X, Y, Z elements of N Σ , ,  elements of (N Σ )* A  or A ::=  if <A, > in P 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Derivation Relations (1)  A  =>    iff A ::=  in P derives A =>*  if there is a chain of productions starting with A that generates  transitive closure 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Derivation Relations (2) w A  =>lm w   iff A ::=  in P derives leftmost  A w =>rm   w iff A ::=  in P derives rightmost We will only be interested in leftmost and rightmost derivations – not random orderings 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Languages For A in N, L(A) = { w | A =>* w } If S is the start symbol of grammar G, define L(G ) = L(S ) Nonterminal on the left of the first rule is taken to be the start symbol if one is not specified explicitly 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Reduced Grammars Grammar G is reduced iff for every production A ::=  in G there is some derivation S =>* x A z => x  z =>* xyz i.e., no production is useless Convention: we will use only reduced grammars 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Ambiguity Grammar G is unambiguous iff every w in L(G ) has a unique leftmost (or rightmost) derivation Fact: unique leftmost or unique rightmost implies the other A grammar lacking this property is ambiguous Note that other grammars that generate the same language may be unambiguous We need unambiguous grammars for parsing 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Example: Ambiguous Grammar for Arithmetic Expressions expr ::= expr + expr | expr - expr | expr * expr | expr / expr | int int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Exercise: show that this is ambiguous How? Show two different leftmost or rightmost derivations for the same string Equivalently: show two different parse trees for the same string 11/13/2018 © 2002-11 Hal Perkins & UW CSE

expr ::= expr + expr | expr - expr | expr * expr | expr / expr | int Example (cont) Give a leftmost derivation of 2+3*4 and show the parse tree 11/13/2018 © 2002-11 Hal Perkins & UW CSE

expr ::= expr + expr | expr - expr | expr * expr | expr / expr | int Example (cont) Give a different leftmost derivation of 2+3*4 and show the parse tree 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Another example Give two different derivations of 5+6+7 expr ::= expr + expr | expr - expr | expr * expr | expr / expr | int int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Another example Give two different derivations of 5+6+7 11/13/2018 © 2002-11 Hal Perkins & UW CSE

What’s going on here? The grammar has no notion of precedence or associatively Solution Create a non-terminal for each level of precedence Isolate the corresponding part of the grammar Force the parser to recognize higher precedence subexpressions first Au02: taken from one of Cooper’s slides 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Classic Expression Grammar expr ::= expr + term | expr – term | term term ::= term * factor | term / factor | factor factor ::= int | ( expr ) int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Check: Derive 2 + 3 * 4 expr ::= expr + term | expr – term | term term ::= term * factor | term / factor | factor factor ::= int | ( expr ) int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 Check: Derive 2 + 3 * 4 11/13/2018 © 2002-11 Hal Perkins & UW CSE

expr ::= expr + term | expr – term | term term ::= term * factor | term / factor | factor factor ::= int | ( expr ) int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 Check: Derive 5 + 6 + 7 Note interaction between left- vs right-recursive rules and resulting associativity 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Check: Derive 5 + (6 + 7) expr ::= expr + term | expr – term | term term ::= term * factor | term / factor | factor factor ::= int | ( expr ) int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 Check: Derive 5 + (6 + 7) 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Another Classic Example Grammar for conditional statements stmt ::= if ( cond ) stmt | if ( cond ) stmt else stmt Exercise: show that this is ambiguous How? 11/13/2018 © 2002-11 Hal Perkins & UW CSE

| if ( cond ) stmt else stmt stmt ::= if ( cond ) stmt | if ( cond ) stmt else stmt One Derivation if ( cond ) if ( cond ) stmt else stmt 11/13/2018 © 2002-11 Hal Perkins & UW CSE

| if ( cond ) stmt else stmt stmt ::= if ( cond ) stmt | if ( cond ) stmt else stmt Another Derivation if ( cond ) if ( cond ) stmt else stmt 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Solving “if” Ambiguity Fix the grammar to separate if statements with else clause and if statements with no else Done in Java reference grammar Adds lots of non-terminals Use some ad-hoc rule in parser “else matches closest unpaired if” Change the language You better have permission to do this 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Resolving Ambiguity with Grammar (1) Stmt ::= MatchedStmt | UnmatchedStmt MatchedStmt ::= ... | if ( Expr ) MatchedStmt else MatchedStmt UnmatchedStmt ::= if ( Expr ) Stmt | if ( Expr ) MatchedStmt else UnmatchedStmt formal, no additional rules beyond syntax sometimes obscures original grammar 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Resolving Ambiguity with Grammar (2) If you can (re-)design the language, avoid the problem entirely Stmt ::= ... | if Expr then Stmt end | if Expr then Stmt else Stmt end formal, clear, elegant allows sequence of Stmts in then and else branches, no { , } needed extra end required for every if (But maybe this is a good idea anyway?) 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Parser Tools and Operators Most parser tools can cope with ambiguous grammars Makes life simpler if used with discipline Usually can specify operator precedence & associativity Allows simpler, ambiguous grammar with fewer nonterminals as basis for generated parser, without creating problems 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Parser Tools and Ambiguous Grammars Possible rules for resolving other problems Earlier productions in the grammar preferred to later ones Longest match used if there is a choice Parser tools normally allow for this But be sure that what the tool does is really what you want 11/13/2018 © 2002-11 Hal Perkins & UW CSE

Coming Attractions Next topic: LR parsing Continue reading ch. 3 or 4 or 3 (depending on your book) 11/13/2018 © 2002-11 Hal Perkins & UW CSE