Bernd Fischer RW713: Compiler and Software Language Engineering.



Parsing

Prelude: Reading text…

thisissometextwithoutspacesandpunctuationmarkswhichisthereforequitedifficulttoreadbyhumanslexicalanalysiswillbreakthistextupintowordswhiletheparsingphasewillextractthegrammaticalstructureofthetext

this is some text without spaces and punctuation marks which is therefore quite difficult to read by humans lexical analysis will break this text up into words while the parsing phase will extract the grammatical structure of the text

This is some text without spaces and punctuation-marks which is therefore quite difficult to read by humans. Lexical-analysis will break this text up into words while the parsing-phase will extract the grammatical-structure of the text.

Syntax analysis determines the structure behind the token stream.
[Figure: compiler pipeline from source program to machine language. Front end (analysis): lexical, syntax, and contextual analysis. Back end (synthesis): intermediate code and object code. Data passed between phases: text, tokens, (abstract) syntax tree.]

Syntax analysis determines the structure behind the token stream.
Example token stream:
IF ID(n) ROP(==) NUM(0) THEN RETURN NUM(0) ELSE RETURN ID(n) AOP(*) ID(f) LPAR ID(n) AOP(-) NUM(1) RPAR
Syntax analysis…
- recovers implied structure: converts flat token stream into (abstract) syntax tree
- discards redundant tokens (keywords, …)
- drives lexical analysis and sometimes all phases (syntax-directed translation)
[Figure: part of the resulting syntax tree, with nodes if, test, expr (==, id, num), and return (num).]

Regular expressions are not expressive enough for parsing.
Consider arithmetic expressions over natural numbers using + and *:
L = { 0, 1, 2, 3, …, 0+0, 0+1, 0+2, …, 1+0, 1+1, 1+2, …, 0*0, 0*1, 0*2, …, 0+0+0, 0+0+1, …, 0+0*0, 0+0*1, 0+0*2, …, 0*0+0, 0*0+1, …, 1*0+0, 1*0+1, … }
Pop-Quiz: Write a regular expression for L!
Nat = 0 | [1-9][0-9]*
Ex = Nat (( + | * ) Nat)*
L(Ex) = L … but scanning does not recover operator precedences.
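
The regular expression for L can be tried out directly. The following is a minimal sketch using Python's `re` module; the names `NAT`, `EX`, and `in_L` are illustrative and not from the slides. It accepts the words of L, but a match is only a flat sequence of characters: nothing in it says that `*` binds tighter than `+`.

```python
import re

# Nat = 0 | [1-9][0-9]*
NAT = r"(?:0|[1-9][0-9]*)"
# Ex = Nat (( + | * ) Nat)*
EX = re.compile(rf"{NAT}(?:[+*]{NAT})*")


def in_L(word: str) -> bool:
    """Return True if word is an expression over naturals, + and *."""
    return EX.fullmatch(word) is not None


print(in_L("1+2*3"))    # accepted, but only as a flat sequence
print(in_L("01+2"))     # rejected: leading zero
print(in_L("(1+2)*3"))  # rejected: parentheses are not part of L
```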

Regular expressions are not expressive enough for parsing.
Consider arithmetic expressions over natural numbers using +, *, (, and ):
L’ = { 0, 1, 2, 3, …, (0), (1), (2), …, 0+0, 0+1, 0+2, …, 1+0, 1+1, 1+2, …, 0*0, 0*1, 0*2, …, 0+0+0, 0+0+1, …, 0+0*0, 0+0*1, 0+0*2, …, 0*0+0, 0*0+1, …, 1*0+0, 1*0+1, …, (0+0), (0+1), (0+2), …, (1+0), (1+1), (1+2), …, (0*0), (0*1), (0*2), …, (0+0)+0, (0+0)+1, … }
Pop-Quiz: Write a regular expression for L’!

Regular expressions are not expressive enough for parsing.
Consider arithmetic expressions over natural numbers using +, *, (, and ):
Pop-Quiz: Write a regular expression for L’!
Nat = 0 | [1-9][0-9]*
Ex_1 = Nat | Nat + Nat | Nat * Nat
Ex_2 = Ex_1 | (Ex_1) | Ex_1 + Ex_1 | Ex_1 * Ex_1
Ex_3 = Ex_2 | (Ex_2) | Ex_2 + Ex_2 | Ex_2 * Ex_2
Ex_4 = …
No finite regular expression accepts L’!

Regular expressions are not expressive enough for parsing.
Describing {“()”, “(())”, “((()))”, …} using a regular expression is impossible!
– infinitely large expression!
Similarly, a finite automaton cannot recognise it
– DFA states have no memory
⇒ need more expressive specification formalism: context-free grammars
[Figure: automaton with an unbounded chain of states connected by “(” and “)” transitions.]
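
To make the memory argument concrete, here is a minimal sketch of a recogniser for {“()”, “(())”, “((()))”, …}. The function name `is_nested` is hypothetical. The point is the variable `depth`: it can grow without bound, which is exactly what a fixed, finite set of DFA states cannot represent.

```python
def is_nested(word: str) -> bool:
    """Recognise n opening parentheses followed by n closing ones, n >= 1."""
    depth = 0        # number of "(" still waiting for their ")"
    closing = False  # becomes True once the first ")" has been seen
    for ch in word:
        if ch == "(":
            if closing:  # a "(" after a ")" breaks the nested shape
                return False
            depth += 1
        elif ch == ")":
            closing = True
            depth -= 1
            if depth < 0:  # more ")" than "(" so far
                return False
        else:
            return False
    return closing and depth == 0


print(is_nested("((()))"))
print(is_nested("(()"))
```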

Reminder: Context-Free Grammars
Definition: A context-free grammar (CFG) G = (N, T, P, S) consists of
- a set N of non-terminal symbols,
- a set T of terminal symbols, with N ∩ T = ∅,
- a set of productions P ⊆ N × (N ∪ T)*,
- a start symbol S ∈ N.

Reminder: Context-Free Grammars
Notation: (simplified from the Dragon Book, 4.2.2)
- A, B, C, ... denote non-terminals
- a, b, c, ... denote terminals
- X, Y, Z denote grammar symbols (i.e., X ∈ N ∪ T)
- u, v, ..., z denote strings of terminals (i.e., u ∈ T*)
- α, β, γ, ... denote strings of grammar symbols (i.e., α ∈ (N ∪ T)*)
- A → α denotes a production rule
- A → α | β | γ | ... denotes A → α, A → β, A → γ, ...

Context-Free Grammars: Pop-Quiz
Write a context-free grammar for L’!
L’ = { 0, 1, 2, 3, …, (0), (1), (2), …, 0+0, 0+1, 0+2, …, 1+0, 1+1, 1+2, …, 0*0, 0*1, 0*2, …, 0+0+0, 0+0+1, …, 0+0*0, 0+0*1, 0+0*2, …, 0*0+0, 0*0+1, …, 1*0+0, 1*0+1, …, (0+0), (0+1), (0+2), …, (1+0), (1+1), (1+2), …, (0*0), (0*1), (0*2), …, (0+0)+0, (0+0)+1, … }
P = {Ex → NAT | (Ex) | Ex + Ex | Ex * Ex}
N = {Ex}, S = Ex, T = {NAT, (, ), +, *}
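
One possible way to write the tuple G = (N, T, P, S) down as data is sketched below. The class `Grammar` and its field names are assumptions made for illustration; the checks in `__post_init__` mirror the side conditions of the definition (N and T disjoint, S ∈ N, every production in N × (N ∪ T)*).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    productions: tuple       # P, as pairs (A, rhs) with rhs a tuple of symbols
    start: str               # S

    def __post_init__(self):
        symbols = self.nonterminals | self.terminals
        if self.nonterminals & self.terminals:
            raise ValueError("N and T must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("S must be a non-terminal")
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not in N")
            if any(sym not in symbols for sym in rhs):
                raise ValueError(f"unknown symbol in {rhs!r}")


# The expression grammar for L' from the slide.
G = Grammar(
    nonterminals=frozenset({"Ex"}),
    terminals=frozenset({"NAT", "(", ")", "+", "*"}),
    productions=(
        ("Ex", ("NAT",)),
        ("Ex", ("(", "Ex", ")")),
        ("Ex", ("Ex", "+", "Ex")),
        ("Ex", ("Ex", "*", "Ex")),
    ),
    start="Ex",
)
```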

Context-Free Grammars: Pop-Quiz
Write a “scanner-less” context-free grammar for L’!
P = { NonZero → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      Digit → 0 | NonZero
      Any → ε | Digit Any
      Nat → 0 | NonZero Any
      Ex → Nat | (Ex) | Ex + Ex | Ex * Ex }
N = {Ex, Nat, NonZero, Any, Digit}, S = Ex, T = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, (, ), +, *}

Context-Free Languages
How do we construct the language described by...
- regular expressions: recursively build from languages of sub-expressions (syntax-directed)
- context-free grammars: derive individual words by recursively applying productions
  – start with ω = S
  – pick an occurrence of a non-terminal A in ω
  – pick a production A → α in P
  – replace A by α in ω
  – repeat until ω ∈ T*
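
The derivation steps listed above can be sketched as a small loop. This is an illustration only, using the expression grammar Ex → NAT | (Ex) | Ex + Ex | Ex * Ex; the names `PRODUCTIONS`, `START`, and `derive` are hypothetical, and the `max_steps` cut-off (after which the shortest production is forced) is an addition that merely guarantees termination.

```python
import random

PRODUCTIONS = {
    "Ex": [["NAT"], ["(", "Ex", ")"], ["Ex", "+", "Ex"], ["Ex", "*", "Ex"]],
}
START = "Ex"


def derive(rng: random.Random, max_steps: int = 20) -> list:
    """Apply productions until only terminals remain; return every form."""
    omega = [START]                      # start with omega = S
    forms = [list(omega)]
    steps = 0
    while any(sym in PRODUCTIONS for sym in omega):
        # pick an occurrence of a non-terminal A in omega
        positions = [i for i, sym in enumerate(omega) if sym in PRODUCTIONS]
        i = rng.choice(positions)
        # pick a production A -> alpha (forced to the shortest one late on)
        options = PRODUCTIONS[omega[i]]
        alpha = rng.choice(options) if steps < max_steps else min(options, key=len)
        # replace A by alpha in omega
        omega[i:i + 1] = alpha
        forms.append(list(omega))
        steps += 1
    return forms                         # the last form is in T*


for form in derive(random.Random(0)):
    print(" ".join(form))
```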

Context-Free Derivations
Definition: Let G = (N, T, P, S) be a context-free grammar.
- ψ is directly derivable in G from φ (written as φ ⇒ ψ) if there are A, α, σ, τ with φ = σAτ, A → α ∈ P, and ψ = σατ.
- ψ is derivable in G from φ (written as φ ⇒* ψ) if there are φ_0, φ_1, ..., φ_n with φ = φ_0, ψ = φ_n, and φ_i ⇒ φ_{i+1} for 0 ≤ i < n.
- φ_0, ..., φ_n is called a derivation of ψ from φ.
Note: ⇒* is the reflexive-transitive closure of ⇒.

Context-Free Languages
Definition: Let G = (N, T, P, S) be a context-free grammar.
- The language generated by G is defined as L(G) = {x ∈ T* | S ⇒* x}.
- x ∈ L(G) is also called a sentence.
- φ ∈ (N ∪ T)* is a sentential form if S ⇒* φ.
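
As a sketch of the definition L(G) = {x ∈ T* | S ⇒* x}, the following enumerates all sentences of the expression grammar up to a given number of tokens by exploring sentential forms breadth-first. The names are hypothetical. Pruning by length is only sound here because no production of this grammar shrinks a sentential form; a grammar with ε-productions would need a different cut-off.

```python
from collections import deque

PRODUCTIONS = {
    "Ex": [("NAT",), ("(", "Ex", ")"), ("Ex", "+", "Ex"), ("Ex", "*", "Ex")],
}
START = "Ex"


def sentences(max_len: int) -> set:
    """All x in L(G) with at most max_len tokens."""
    seen = {(START,)}
    queue = deque(seen)
    result = set()
    while queue:
        form = queue.popleft()
        # expand the leftmost non-terminal; every sentence is still reached
        idx = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if idx is None:
            result.add(form)             # form is in T*, hence a sentence
            continue
        for alpha in PRODUCTIONS[form[idx]]:
            new = form[:idx] + alpha + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result


for sentence in sorted(sentences(3)):
    print(" ".join(sentence))
```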

Derivations can be represented by derivation trees.
Definition: Let G = (N, T, P, S) be a context-free grammar. A tree is a derivation tree for G if:
- every node is labelled with a symbol of N ∪ T;
- the root is labelled with S;
- if a node n is labelled with A and has at least one descendant, then A ∈ N;
- if a node n is labelled with A and the nodes n_1, ..., n_k with labels X_1, ..., X_k are the direct descendants of n (from left to right), then A → X_1 ... X_k ∈ P.
Note: Derivation trees are also called parse trees or (concrete) syntax trees.

Derivations can be represented by derivation trees: Pop-Quiz.
Consider the grammar
P = {Ex → NAT | (Ex) | Ex + Ex | Ex * Ex}
N = {Ex}, S = Ex, T = {NAT, (, ), +, *}
and construct a derivation tree for the derivation
Ex ⇒ Ex * Ex
   ⇒ Ex + Ex * Ex
   ⇒ Ex + Ex * NAT(3)
   ⇒ (Ex) + Ex * NAT(3)
   ⇒ (NAT(1)) + Ex * NAT(3)
   ⇒ (NAT(1)) + NAT(2) * NAT(3)
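
One possible answer, written as data rather than drawn: the sketch below encodes the derivation tree as nested (label, children) pairs and reads the derived sentence off its leaves. The helper names `leaf`, `ex`, `frontier`, and `show` are made up for this illustration. Because the derivation starts with Ex ⇒ Ex * Ex, the root of this tree is the multiplication.

```python
def leaf(label):
    """A leaf node: a terminal with no children."""
    return (label, [])


def ex(*children):
    """An inner node labelled Ex with the given children, left to right."""
    return ("Ex", list(children))


tree = ex(
    ex(
        ex(leaf("("), ex(leaf("NAT(1)")), leaf(")")),
        leaf("+"),
        ex(leaf("NAT(2)")),
    ),
    leaf("*"),
    ex(leaf("NAT(3)")),
)


def frontier(node):
    """Concatenate the leaf labels from left to right."""
    label, children = node
    if not children:
        return [label]
    return [tok for child in children for tok in frontier(child)]


def show(node, depth=0):
    """Print the tree with one level of indentation per tree level."""
    label, children = node
    print("  " * depth + label)
    for child in children:
        show(child, depth + 1)


show(tree)
print(" ".join(frontier(tree)))  # ( NAT(1) ) + NAT(2) * NAT(3)
```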


Ambiguity
Each derivation corresponds to one derivation tree, ... but the same tree can be derived in different ways.
Leftmost derivation:
Ex ⇒ Ex * Ex ⇒ Ex + Ex * Ex ⇒ (Ex) + Ex * Ex ⇒ (NAT(1)) + Ex * Ex ⇒ (NAT(1)) + NAT(2) * Ex ⇒ (NAT(1)) + NAT(2) * NAT(3)
Rightmost derivation:
Ex ⇒ Ex * Ex ⇒ Ex * NAT(3) ⇒ Ex + Ex * NAT(3) ⇒ Ex + NAT(2) * NAT(3) ⇒ (Ex) + NAT(2) * NAT(3) ⇒ (NAT(1)) + NAT(2) * NAT(3)
Grammar: P = {Ex → NAT | (Ex) | Ex + Ex | Ex * Ex}, N = {Ex}, S = Ex, T = {NAT, (, ), +, *}

Left-/rightmost derivations
Definition: Let G = (N, T, P, S) be a context-free grammar.
- φ ⇒_l ψ is a leftmost derivation step if there are u, τ with φ = uAτ, A → α ∈ P, and ψ = uατ.
- φ ⇒_r ψ is a rightmost derivation step if there are σ, u with φ = σAu, A → α ∈ P, and ψ = σαu.
- φ_0, ..., φ_n is called a left-/rightmost derivation if every step is a left-/rightmost derivation step.
Note: For each syntax tree, there exists exactly one leftmost derivation and exactly one rightmost derivation.

Ambiguity, again
Leftmost derivation:
Ex ⇒_l Ex * Ex ⇒_l Ex + Ex * Ex ⇒_l (Ex) + Ex * Ex ⇒_l (NAT(1)) + Ex * Ex ⇒_l (NAT(1)) + NAT(2) * Ex ⇒_l (NAT(1)) + NAT(2) * NAT(3)
Another leftmost derivation ??
Ex ⇒_l Ex + Ex ⇒_l (Ex) + Ex ⇒_l (NAT(1)) + Ex ⇒_l (NAT(1)) + Ex * Ex ⇒_l (NAT(1)) + NAT(2) * Ex ⇒_l (NAT(1)) + NAT(2) * NAT(3)
⇒ syntax trees must be different!
Grammar: P = {Ex → NAT | (Ex) | Ex + Ex | Ex * Ex}, N = {Ex}, S = Ex, T = {NAT, (, ), +, *}

Ambiguity of sentences, grammars, and languages
Definition: A sentence is called unambiguous if it has exactly one syntax tree; it is called ambiguous otherwise.
Definition: A grammar G is called ambiguous if L(G) contains an ambiguous sentence.
Definition: A language for which every grammar is ambiguous is called inherently ambiguous.
Note: There are inherently ambiguous context-free languages (Parikh’s Theorem, 1961).
Programming language grammars should be unambiguous because semantics and translation are defined over the structure of the syntax trees.
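
Whether a given sentence is ambiguous can be checked mechanically by counting its syntax trees. The sketch below does this for the expression grammar Ex → NAT | (Ex) | Ex + Ex | Ex * Ex with a memoised recursion over token spans; `count_trees` is a hypothetical name, and the approach is just one simple way to count, specialised to this grammar.

```python
from functools import lru_cache


def count_trees(tokens) -> int:
    """Number of syntax trees with root Ex for the given token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def ex(i: int, j: int) -> int:
        """Number of trees for Ex deriving tokens[i:j]."""
        n = 0
        if j - i == 1 and tokens[i] == "NAT":
            n += 1                                    # Ex -> NAT
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += ex(i + 1, j - 1)                     # Ex -> ( Ex )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):               # Ex -> Ex + Ex | Ex * Ex
                n += ex(i, k) * ex(k + 1, j)
        return n

    return ex(0, len(tokens))


print(count_trees("( NAT ) + NAT * NAT".split()))  # 2, so the sentence is ambiguous
print(count_trees("( NAT + NAT ) * NAT".split()))  # 1
```

A count of 2 for ( NAT ) + NAT * NAT matches the two leftmost derivations of (NAT(1)) + NAT(2) * NAT(3): one tree has + at the root, the other has *.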

Ambiguity, again: “dangling else”
Consider the following C statement:
if(a) if(b) c=1; else c=2;
Which if owns the else?
Pop-Quiz: Draw the two syntax trees!
Pop-Quiz: How would you implement the C way?

Resolving ambiguity: “dangling else”
Consider the following original production:
stmt → if (expr) stmt; | if (expr) stmt else stmt; | other
Solution: introduce two new stmt-like non-terminals for the two different contexts.
stmt → stmt_u | stmt_b
stmt_u → if (expr) stmt; | if (expr) stmt_b else stmt_u;
stmt_b → if (expr) stmt_b else stmt_b; | other
Note: the stmt_b in front of an else cannot be an unbalanced if (one without a matching else).
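
In a hand-written recursive-descent parser, the C rule (an else belongs to the nearest if that does not have one yet) falls out of a greedy choice: after parsing the then-branch, consume an else if one follows. The sketch below illustrates this on a deliberately simplified token format, which is an assumption of this example: conditions and plain statements are single tokens, and parentheses and semicolons are dropped. `parse_stmt` is a hypothetical name.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos)."""
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        then_branch, pos = parse_stmt(tokens, pos + 2)
        else_branch = None
        # Greedy choice: an else that follows right here is attached to this
        # if, which is the innermost one still without an else.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return tokens[pos], pos + 1  # any other token is an "other" statement


# if(a) if(b) c=1; else c=2;
tree, _ = parse_stmt(["if", "a", "if", "b", "c=1", "else", "c=2"])
print(tree)  # ('if', 'a', ('if', 'b', 'c=1', 'c=2'), None)
```

The else ends up inside the inner if, so the outer if has no else branch.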

Resolving ambiguity: priorities
Consider the following grammar:
S → E
E → E + E | E - E | E * E | (E) | id
It does not respect usual operator priorities (all operators are at the same precedence level).
Solution: introduce a new non-terminal for each precedence level.
S → E
E → T + E | T - E | T
T → F * T | F
F → (E) | id
(precedence levels increase from E to F)

Resolving ambiguity: associativity
Consider the following grammar:
S → E
E → T + E | T - E | T
T → F * T | F
F → (E) | id
It does not respect usual operator associativity (interprets a - b - c as a - (b - c)).
Solution: use left (right) recursion for left- (right-) associative operators.
S → E
E → E + T | E - T | T
…
Note: not compatible with LL parsers (use %left directive)
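
The effect of the precedence levels and of left associativity can be seen in a small hand-written parser. Since recursive descent is an LL-style technique and cannot use the left-recursive rule E → E + T directly, this sketch swaps in the common workaround of a loop per level (E → T { (+|-) T }, T → F { * F }) and folds the results to the left. Treating * as left-associative, the tuple-based tree format, and all function names are assumptions of this example.

```python
def parse(tokens):
    """Parse a token list into nested tuples (op, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                          # E -> T { (+|-) T }
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())    # fold to the left
        return node

    def term():                          # T -> F { * F }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():                        # F -> ( E ) | id
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("+", "-", "*", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok                       # any other token counts as an id

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("a - b - c".split()))  # ('-', ('-', 'a', 'b'), 'c')
print(parse("a + b * c".split()))  # ('+', 'a', ('*', 'b', 'c'))
```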

(Extended) Backus-Naur Form
Backus-Naur form is an ASCII notation for grammars:
  ::= (“+” | “-”) |
Many tools use an extended grammar formalism:
EBNF
  ::= [“-”]
  ::= {“,” }
BNF + regular operators
  ::= (“,” )*

Parsing is the search for derivations.
There are three classes of parsing algorithms that search...
- exhaustively
  – Earley, CYK: bottom-up and memoization
  – GLR / GLL: split stacks on ambiguity
- for leftmost derivations
  – derive input from start symbol (top-down)
  – predictive parsing, LL
- for rightmost derivations
  – reduce input to start symbol (bottom-up)
  – shift-reduce parsing, LR
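
As an illustration of the exhaustive, bottom-up class, here is a minimal sketch of a CYK recogniser. CYK needs the grammar in Chomsky normal form, so the expression grammar Ex → NAT | (Ex) | Ex + Ex | Ex * Ex is rewritten with helper non-terminals (`LP`, `RP`, `PLUS`, `TIMES`, `R1`, `P1`, `M1`); that conversion and all names below are assumptions of this example, not part of the slides.

```python
# Rules A -> a (a single terminal)
UNARY = [("Ex", "NAT"), ("LP", "("), ("RP", ")"), ("PLUS", "+"), ("TIMES", "*")]
# Rules A -> B C (two non-terminals)
BINARY = [
    ("Ex", ("LP", "R1")), ("R1", ("Ex", "RP")),      # Ex -> ( Ex )
    ("Ex", ("Ex", "P1")), ("P1", ("PLUS", "Ex")),    # Ex -> Ex + Ex
    ("Ex", ("Ex", "M1")), ("M1", ("TIMES", "Ex")),   # Ex -> Ex * Ex
]


def cyk(tokens, start="Ex") -> bool:
    """Return True if start derives the token sequence."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][l] holds the non-terminals that derive tokens[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][1] = {lhs for lhs, terminal in UNARY if terminal == tok}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, (b, c) in BINARY:
                    if b in left and c in right:
                        table[i][length].add(lhs)
    return start in table[0][n]


print(cyk("( NAT ) + NAT * NAT".split()))
print(cyk("NAT + * NAT".split()))
```

The table plays the role of the memoization mentioned above: every span of the input is analysed once, and longer spans are built from the recorded results for shorter ones.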