CMSC 330: Organization of Programming Languages

Slides:



Advertisements
Similar presentations
Lecture # 8 Chapter # 4: Syntax Analysis. Practice Context Free Grammars a) CFG generating alternating sequence of 0’s and 1’s b) CFG in which no consecutive.
Advertisements

1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
CS5371 Theory of Computation
Prof. Bodik CS 164 Lecture 81 Grammars and ambiguity CS164 3:30-5:00 TT 10 Evans.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
CS 536 Spring Ambiguity Lecture 8. CS 536 Spring Announcement Reading Assignment –“Context-Free Grammars” (Sections 4.1, 4.2) Programming.
Chapter 3: Formal Translation Models
Specifying Languages CS 480/680 – Comparative Languages.
Lecture 9UofH - COSC Dr. Verma 1 COSC 3340: Introduction to Theory of Computation University of Houston Dr. Verma Lecture 9.
COP4020 Programming Languages
Context-Free Grammars Chapter 3. 2 Context-Free Grammars and Languages n Defn A context-free grammar is a quadruple (V, , P, S), where  V is.
Problem of the DAY Create a regular context-free grammar that generates L= {w  {a,b}* : the number of a’s in w is not divisible by 3} Hint: start by designing.
Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
1 Section 3.3 Grammars A grammar is a finite set of rules, called productions, that are used to describe the strings of a language. Notational Example.
1 Chapter Construction Techniques. 2 Section 3.3 Grammars A grammar is a finite set of rules, called productions, that are used to describe the.
Context-free Grammars Example : S   Shortened notation : S  aSaS   | aSa | bSb S  bSb Which strings can be generated from S ? [Section 6.1]
Syntax Analysis The recognition problem: given a grammar G and a string w, is w  L(G)? The parsing problem: if G is a grammar and w  L(G), how can w.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
1 Context-Free Languages Not all languages are regular. L 1 = {a n b n | n  0} is not regular. L 2 = {(), (()), ((())),...} is not regular.  some properties.
Context Free Grammars CIS 361. Introduction Finite Automata accept all regular languages and only regular languages Many simple languages are non regular:
Context-Free Grammars
Lecture # 19. Example Consider the following CFG ∑ = {a, b} Consider the following CFG ∑ = {a, b} 1. S  aSa | bSb | a | b | Λ The above CFG generates.
Chapter 5 Context-Free Grammars
Grammars CPSC 5135.
PART I: overview material
Languages & Grammars. Grammars  A set of rules which govern the structure of a language Fritz Fritz The dog The dog ate ate left left.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
Context Free Grammars. Context Free Languages (CFL) The pumping lemma showed there are languages that are not regular –There are many classes “larger”
CMSC 330: Organization of Programming Languages Context-Free Grammars.
Syntax Context-Free Grammars, Syntax Trees. CFG for Arithmetic Expressions E::= E op E| (E) | num op ::= + | * num ::= 0 | 1 | 2 | … Backus-Naur Form.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
CFG1 CSC 4181Compiler Construction Context-Free Grammars Using grammars in parsers.
Context Free Grammars.
CMSC 330: Organization of Programming Languages
1 Section 12.3 Context-Free Parsing We know (via a theorem) that the context-free languages are exactly those languages that are accepted by PDAs. When.
CS 3240 – Chapter 5. LanguageMachineGrammar RegularFinite AutomatonRegular Expression, Regular Grammar Context-FreePushdown AutomatonContext-Free Grammar.
Re-enter Chomsky More about grammars. 2 Parse trees S  A B A  aA | a B  bB | b Consider L = { a m b n | m, n > 0 } (one/more a ’s followed by one/more.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
Grammars CS 130: Theory of Computation HMU textbook, Chap 5.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Module 29 Parse/Derivation Trees Ambiguous Grammars
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006.
Top-Down Parsing.
Syntax Analyzer (Parser)
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Transparency No. 1 Formal Language and Automata Theory Homework 5.
Lecture # 10 Grammar Problems. Problems with grammar Ambiguity Left Recursion Left Factoring Removal of Useless Symbols These can create problems for.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 will be out this evening Due Monday, 2/8 Submit in HW Server AND at start of class on 2/8 A review.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
Theory of Computation Automata Theory Dr. Ayman Srour.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
5. Context-Free Grammars and Languages
CONTEXT-FREE LANGUAGES
Even-Even Devise a grammar that generates strings with even number of a’s and even number of b’s.
Compiler Construction
Context-Free Grammars
5. Context-Free Grammars and Languages
Theory of Computation Lecture #
Context-Free Grammars
Presentation transcript:

CMSC 330: Organization of Programming Languages Context-Free Grammars Ambiguity

Review Why should we study CFGs? What are the four parts of a CFG? How do we tell if a string is accepted by a CFG? What’s a parse tree? CMSC 330

Review A sentential form is a string of terminals and non- terminals produced from the start symbol Inductively: The start symbol If αAδ is a sentential form for a grammar, where (α and δ ∊ (V|Σ)*), and A → γ is a production, then αγδ is a sentential form for the grammar In this case, we say that αAδ derives αγδ in one step, which is written as αAδ ⇒ αγδ CMSC 330

Leftmost and Rightmost Derivation Example: S → a | SbS String: aba Leftmost Derivation Rightmost Derivation S ⇒ SbS ⇒ abS ⇒ aba S ⇒ SbS ⇒ Sba ⇒ aba At every step, apply production At every step, apply production to leftmost non-terminal to rightmost non-terminal Both derivations happen to have the same parse tree A parse tree has a unique leftmost and a unique rightmost derivation Not every string has a unique parse tree Parse trees don’t show the order productions are applied CMSC 330

More on Leftmost/Rightmost Derivations Is the following derivation leftmost or rightmost? S ⇒ aS ⇒ aT ⇒ aU ⇒ acU ⇒ ac There’s at most one non-terminal in each sentential form, so there's no choice between left or right non- terminals to expand How about the following derivation? S ⇒ SbS ⇒ SbSbS ⇒ SbabS ⇒ ababS ⇒ ababa CMSC 330

Multiple Leftmost Derivations S → a | SbS Can we find more than one leftmost derivation? A leftmost derivation S ⇒ SbS ⇒ abS ⇒ abSbS ⇒ ababS ⇒ ababa Another leftmost derivation S ⇒ SbS ⇒ SbSbS ⇒ abSbS ⇒ ababS ⇒ ababa CMSC 330

Ambiguity A string is ambiguous for a grammar if it has more than one parse tree Equivalent to more than one leftmost (or more than one rightmost) derivation A grammar is ambiguous if it generates an ambiguous string It can be hard to see this with manual inspection Exercise: can you create an unambiguous grammar for S → a | SbS ? CMSC 330

Are these Grammars Ambiguous? (1) S → aS | T T → bT | U U → cU | ε (2) S → T | T T → Tx | Tx | x | x (3) S → SS | () | (S) CMSC 330

Ambiguity of Grammar (Example 3) 2 different parse trees for the same string: ()()() 2 distinct leftmost derivations : S  SS  SSS ()SS ()()S ()()() S  SS  ()S ()SS ()()S ()()() We need unambiguous grammars to manage programming language semantics CMSC 330

Tips for Designing Grammars Closures: use recursive productions to generate an arbitrary number of symbols A → xA | ε Zero or more x’s A → yA | y One or more y’s CMSC 330

Tips for Designing Grammars Concatenation: use separate non-terminals to generate disjoint parts of a language, and then combine in a production G = S → AB A → aA | ε B → bB | ε L(G) = a*b* CMSC 330

Tips for Designing Grammars (cont’d) Matching constructs: write productions which generate strings from the middle {anbn | n ≥ 0} (not a regular language!) S → aSb | ε Example: S ⇒ aSb ⇒ aaSbb ⇒ aabb {anb2n | n ≥ 0} S → aSbb | ε CMSC 330

Tips for Designing Grammars (cont’d) {anbm | m ≥ 2n, n ≥ 0} S → aSbb | B | ε B → bB | b The following grammar also works: S → aSbb | B B → bB | ε How about the following? S → aSbb | bS | ε CMSC 330

Tips for Designing Grammars (cont’d) {anbman+m | n ≥ 0, m ≥ 0} Rewrite as anbmaman, which now has matching superscripts (two pairs) Would this grammar work? S → aSa | B B → bBa | ba Corrected: B → bBa | ε Doesn’t allow m = 0 The outer anan are generated first, then the inner bmam CMSC 330

Tips for Designing Grammars (cont’d) Union: use separate nonterminals for each part of the union and then combine { an(bm|cm) | m > n ≥ 0} Can be rewritten as { anbm | m > n ≥ 0} ∪ { ancm | m > n ≥ 0} CMSC 330

Tips for Designing Grammars (cont’d) { anbm | m > n ≥ 0} ∪ { ancm | m > n ≥ 0} S → T | U T → aTb | Tb | b T generates the first set U → aUc | Uc | c U generates the second set What’s the parse tree for string abbb? Ambiguous! CMSC 330

Tips for Designing Grammars (cont’d) { anbm | m > n ≥ 0} ∪ { ancm | m > n ≥ 0} Will this fix the ambiguity? S → T | U T → aTb | bT | b U → aUc | cU | c It's not ambiguous, but it can generate invalid strings such as babb CMSC 330

Tips for Designing Grammars (cont’d) { anbm | m > n ≥ 0} ∪ { ancm | m > n ≥ 0} Unambiguous version S → T | V T → aTb | U U → Ub | b V → aVc | W W → Wc | c CMSC 330

CFGs for Languages Recall that our goal is to describe programming languages with CFGs We had the following example which describes limited arithmetic expressions E → a | b | c | E+E | E-E | E*E | (E) What’s wrong with using this grammar? It’s ambiguous! CMSC 330

Example: a-b-c Corresponds to a-(b-c) Corresponds to (a-b)-c E ⇒ E-E ⇒ a-E ⇒ a-E-E ⇒ a-b-E ⇒ a-b-c E ⇒ E-E ⇒ E-E-E ⇒ a-E-E ⇒ a-b-E ⇒ a-b-c Corresponds to a-(b-c) Corresponds to (a-b)-c CMSC 330

The Issue: Associativity Ambiguity is bad here because if the compiler needs to generate code for this expression, it doesn’t know what the programmer intended So what do we mean when we write a-b-c? In mathematics, this only has one possible meaning It’s (a-b)-c, since subtraction is left-associative a-(b-c) would be the meaning if subtraction was right- associative CMSC 330

Another Example: If-Then-Else <stmt> ::= <assignment> | <if-stmt> | ... <if-stmt> ::= if (<expr>) <stmt> | if (<expr>) <stmt> else <stmt> (Here <>’s are used to denote nonterminals and ::= for productions) Consider the following program fragment: if (x > y) if (x < z) a = 1; else a = 2; Note: Ignore newlines CMSC 330

Parse Tree #1 Else belongs to inner if CMSC 330

Parse Tree #2 Else belongs to outer if CMSC 330

Fixing the Expression Grammar Idea: Require that the right operand of all of the operators is not a bare expression E → E+T | E-T | E*T | T T → a | b | c | (E) Now there's only one parse tree for a-b-c Exercise: Give a derivation for the string a-(b-c) CMSC 330

What if We Wanted Right-Associativity? Left-recursive productions are used for left- associative operators Right-recursive productions are used for right- associative operators Left: E → E+T | E-T | E*T | T T → a | b | c | (E) Right: E → T+E | T-E | T*E | T CMSC 330

Parse Tree Shape The kind of recursion/associativity determines the shape of the parse tree Exercise: draw a parse tree for a-b-c in the prior grammar in which subtraction is right-associative left recursion right recursion CMSC 330

A Different Problem How about the string a+b*c ? Doesn’t have correct E → E+T | E-T | E*T | T T → a | b | c | (E) Doesn’t have correct precedence for * When a nonterminal has productions for several operators, they effectively have the same precedence How can we fix this? CMSC 330

Final Expression Grammar E → E+T | E-T | T lowest precedence operators T → T*P | P higher precedence P → a | b | c | (E) highest precedence (parentheses) Each non-terminal represents a level of precedence At each level, operations are left-associative CMSC 330

Final Expression Grammar E → E+T | E-T | T T → T*P | P P → a | b | c | (E) Exercises: Construct tree and left and and right derivations for a+b*c a*(b+c) a*b+c a-b-c See what happens if you change the first set of productions to E → E +T | E-T | T | P See what happens if you change the last set of productions to P → a | b | c | E | (E) CMSC 330