David Evans cs302: Theory of Computation University of Virginia Computer Science Lecture 11: Parsimonious Parsing.

Slides:



Advertisements
Similar presentations
Simplifications of Context-Free Grammars
Advertisements

PDAs Accept Context-Free Languages
Parsing 4 Dr William Harrison Fall 2008
Chapter 1 The Study of Body Function Image PowerPoint
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
By John E. Hopcroft, Rajeev Motwani and Jeffrey D. Ullman
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 10 second questions
Lecture 10: Context-Free Languages Contextually David Evans
CSCI 3130: Formal Languages and Automata Theory Tutorial 5
Chapter 2-2 A Simple One-Pass Compiler
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
David Evans cs302: Theory of Computation University of Virginia Computer Science Lecture 17: ProvingUndecidability.
1 CS 3261 Computability Course Summary Zeph Grunschlag.
Data Structures ADT List
1 Symbol Tables. 2 Contents Introduction Introduction A Simple Compiler A Simple Compiler Scanning – Theory and Practice Scanning – Theory and Practice.
Hash Tables.
1 Undirected Breadth First Search F A BCG DE H 2 F A BCG DE H Queue: A get Undiscovered Fringe Finished Active 0 distance from A visit(A)
Green Eggs and Ham.
VOORBLAD.
15. Oktober Oktober Oktober 2012.
Lecture 11 Context-Free Grammar. Definition A context-free grammar (CFG) G is a quadruple (V, Σ, R, S) where V: a set of non-terminal symbols Σ: a set.
Hector Miguel Chavez Western Michigan University.
1 public class Newton { public static double sqrt(double c) { double epsilon = 1E-15; if (c < 0) return Double.NaN; double t = c; while (Math.abs(t - c/t)
How to convert a left linear grammar to a right linear grammar
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Squares and Square Root WALK. Solve each problem REVIEW:
1..
© 2012 National Heart Foundation of Australia. Slide 2.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
CS 240 Computer Programming 1
25 seconds left…...
Slippery Slope
Finite-state Recognizers
Januar MDMDFSSMDMDFSSS
Week 1.
1 Let’s Recapitulate. 2 Regular Languages DFAs NFAs Regular Expressions Regular Grammars.
1 Week 9 Questions / Concerns Hand back Test#2 What’s due: Final Project due next Thursday June 5. Final Project check-off on Friday June 6 in class. Next.
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Intracellular Compartments and Transport
PSSA Preparation.
Essential Cell Biology
Chapter 30 Induction and Inductance In this chapter we will study the following topics: -Faraday’s law of induction -Lenz’s rule -Electric field induced.
1 Decidability continued…. 2 Theorem: For a recursively enumerable language it is undecidable to determine whether is finite Proof: We will reduce the.
David Evans CS150: Computer Science University of Virginia Computer Science Lecture 26: Proving Uncomputability Visualization.
FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY
The Pumping Lemma for CFL’s
1 Programming Languages (CS 550) Mini Language Interpreter Jeremy R. Johnson.
1 Chapter Parsing Techniques. 2 Section 12.3 Parsing Techniques We know (via a theorem) that the context- free languages are exactly those languages.
Theory of Computation CS3102 – Spring 2014 A tale of computers, math, problem solving, life, love and tragic death Nathan Brunelle Department of Computer.
David Evans cs302: Theory of Computation University of Virginia Computer Science Lecture 13: Turing Machines.
1 Introduction to Computability Theory Lecture5: Context Free Languages Prof. Amos Israeli.
Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans.
Cs3102: Theory of Computation Class 6: Pushdown Automata Spring 2010 University of Virginia David Evans TexPoint fonts used in EMF. Read the TexPoint manual.
Class 21: Introducing Complexity David Evans cs302: Theory of Computation University of Virginia Computer Science.
Cs3102: Theory of Computation Class 8: Non-Context-Free Languages Spring 2010 University of Virginia David Evans.
Closure Properties Lemma: Let A 1 and A 2 be two CF languages, then the union A 1  A 2 is context free as well. Proof: Assume that the two grammars are.
Pumping Lemma for CFLs. Theorem 7.17: Let G be a CFG in CNF and w a string in L(G). Suppose we have a parse tree for w. If the length of the longest path.
Closed book, closed notes
Spring 2010 University of Virginia David Evans
Presentation transcript:

David Evans cs302: Theory of Computation University of Virginia Computer Science Lecture 11: Parsimonious Parsing

2 Lecture 11: Parsimonious Parsing Menu Fix proof from last class Interpretive Dance! Parsimonious Parsing (Parsimoniously) PS3 Comments Available Today PS3 will be returned Tuesday

3 Lecture 11: Parsimonious Parsing Closure Properties of CFLs If A and B are context free languages then: A R is a context-free language TRUE A * is a context-free language TRUE A is a context-free language (complement)? A  B is a context-free language TRUE A  B is a context-free language ?

4 Lecture 11: Parsimonious Parsing Complementing Non-CFLs L ww = {ww | w  Σ* } is not a CFL. Is its complement? Yes. This CFG recognizes is: S  0S0 | 1S1 | 0X1 | 1X0 X  0X0 | 1X1 | 0X1 | 1X0 | 0 | 1 | ε Bogus Proof! S  0X1  01X01  0101  L ww What is the actual language?

5 Lecture 11: Parsimonious Parsing CFG for L ww (L  ww ) S  S Odd | S Even All odd length strings are in L  ww S Odd  0R | 1R | 0 | 1 R  0S Odd | 1S Odd S Even  XY | YX X  ZXZ | 0 Y  ZYZ | 1 Z  0 | 1 How can we prove this is correct?

6 Lecture 11: Parsimonious Parsing S odd generates all odd-length strings S Odd  0R | 1R | 0 | 1 R  0S Odd | 1S Odd Proof by induction on the length of the string. Basis. S Odd generates all odd-length strings of length 1. There are two possible strings: 0 and 1. They are produces from the 3 rd and 4 th rules. Induction. Assume S Odd generates all odd-length strings of length n for n = 2k+1, k  0. Show it can generate all odd-length string of length n+2.

7 Lecture 11: Parsimonious Parsing S Odd generates all odd-length strings S Odd  0R | 1R | 0 | 1 R  0S Odd | 1S Odd Induction. Assume S Odd generates all odd-length strings of length n for n = 2k+1, k  0. Show it can generate all odd-length string of length n+2. All n+2 length strings are of the form abt where t is an n - length string and a  {0, 1}, b  {0, 1}. There is some derivation from S Odd  * t (by the induction hypothesis). We can generate all four possibilities for a and b : 00t : S Odd  0R  00S Odd  * 00t 01t : S Odd  0R  01S Odd  * 01t 10t : S Odd  1R  10S Odd  * 10t 11t : S Odd  1R  11S Odd  * 01t

8 Lecture 11: Parsimonious Parsing CFG for L ww (L  ww ) S  S Odd | S Even S Odd  0R | 1R | 0 | 1 R  0S Odd | 1S Odd S Even  XY | YX X  ZXZ | 0 Y  ZYZ | 1 Z  0 | 1 ? Proof-by-leaving-as-“Challenge Problem” (note: you cannot use this proof technique in your answers)

9 Lecture 11: Parsimonious Parsing Even Strings Show S Even generates the set of all even-length strings that are not in L ww. Proof by induction on the length of the string. Basis. S Even generates all even-length strings of length 0 that are not in L ww. The only length 0 string is ε. ε is in L ww since ε = εε, so ε should not be generated by S Even. Since S Even does not contain any right sides that go to ε, this is correct. S Even  XY | YX X  ZXZ | 0 Y  ZYZ | 1 Z  0 | 1

10 Lecture 11: Parsimonious Parsing Closure Properties of CFLs If A and B are context free languages then: A R is a context-free language TRUE A * is a context-free language TRUE A is not necessarily a context-free language (complement) A  B is a context-free language TRUE A  B is a context-free language ? Left for you to solve (possibly on Exam 1)

11 Lecture 11: Parsimonious Parsing Where is English? Regular Languages Context-Free Languages Violates Pumping Lemma For RLs Violates Pumping Lemma For CFLs Described by DFA, NFA, RegExp, RegGram Described by CFG, NDPDA 0n1n0n1n 0n1n2n0n1n2n 0n0n w A ww Deterministic CFLs

12 Lecture 11: Parsimonious Parsing English  Regular Languages The cat likes fish. The cat the dog chased likes fish. The cat the dog the rat bit chased likes fish. … This is a pumping lemma proof!

13 Lecture 11: Parsimonious Parsing Chomsky’s Answer (Syntactic Structures, 1957) = DFA = CFG

14 Lecture 11: Parsimonious Parsing Current Answer Most linguists argue that most natural languages are not context- free But, it is hard to really answer this question: e.g., “The cat the dog the rat bit chased likes fish.”  English?

15 Lecture 11: Parsimonious Parsing Where is Java? Regular Languages Context-Free Languages Violates Pumping Lemma For RLs Violates Pumping Lemma For CFLs Described by DFA, NFA, RegExp, RegGram Described by CFG, NDPDA 0n1n0n1n 0n1n2n0n1n2n 0n0n w A ww Deterministic CFLs

16 Lecture 11: Parsimonious Parsing Interpretive Dance

17 Lecture 11: Parsimonious Parsing Where is Java? Regular Languages Context-Free Languages Violates Pumping Lemma For RLs Violates Pumping Lemma For CFLs Described by DFA, NFA, RegExp, RegGram Described by CFG, NDPDA 0n1n0n1n 0n1n2n0n1n2n 0n0n w A ww Deterministic CFLs

18 Lecture 11: Parsimonious Parsing What is the Java Language? public class Test { public static void main(String [] a) { println("Hello World!"); } Test.java:3: cannot resolve symbol symbol : method println (java.lang.String) // C:\users\luser\Test.java public class Test { public static void main(String [] a) { println ("Hello Universe!"); } Test.java:1: illegal unicode escape // C:\users\luser\Test.java In the Java Language Not in the Java Language

19 Lecture 11: Parsimonious Parsing // C:\users\luser\Test.java public class Test { public static void main(String [] a) { println ("Hello Universe!"); } > javac Test.java Test.java:1: illegal unicode escape // C:\users\luser\Test.java ^ Test.java:6: 'class' or 'interface' expected } ^ Test.java:7: 'class' or 'interface' expected ^ Test.java:4: cannot resolve symbol symbol : method println (java.lang.String) location: class Test println ("Hello World"); ^ 4 errors Parsing errors Scanning error Static semantic errors

20 Lecture 11: Parsimonious Parsing Defining the Java Language { w | w can be generated by the CFG for Java in the Java Language Specification } { w | a correct Java compiler can build a parse tree for w }

21 Lecture 11: Parsimonious Parsing Parsing S  S + M | M M  M * T | T T  (S) | number * 1 S S M + MT * 1 T 2 M T 3 Derivation Parsing Programming languages are (should be) designed to make parsing easy, efficient, and unambiguous.

22 Lecture 11: Parsimonious Parsing Unambiguous S  S + S | S * S | (S) | number * 1 S S S + S * S S S S * 1 S S + 3 2

23 Lecture 11: Parsimonious Parsing Ambiguity How can one determine if a CFG is ambiguous? Super-duper-challenge problem: create a program that solve the “is this CFG ambiguous” problem: Input: CFG Output: “Yes” (ambiguous)/“No” (unambiguous) Warning: Undecidable Problem Alert! (Not only can you not do this, it is impossible for any program to do this.) (We will cover undecidable problems after Spring Break)

24 Lecture 11: Parsimonious Parsing Parsing S  S + M | M M  M * T | T T  (S) | number * 1 S S M + MT * 1 T 2 M T 3 Derivation Parsing Programming languages are (should be) designed to make parsing easy, efficient, and unambiguous.

25 Lecture 11: Parsimonious Parsing “Easy” and “Efficient” “Easy” - we can automate the process of building a parser from a description of a grammar “Efficient” – the resulting parser can build a parse tree quickly (linear time in the length of the input)

26 Lecture 11: Parsimonious Parsing Recursive Descent Parsing S  S + M | M M  M * T | T T  (S) | number Parse() { S(); } S() { try { S(); expect(“+”); M(); } catch { backup(); } try { M(); } catch {backup(); } error(); } M() { try { M(); expect(“*”); T(); } catch … try { T(); } catch { backup(); } error (); } T() { try { expect(“(“); S(); expect(“)”); } catch …; try { number(); } catch …; } Advantages: Easy to produce and understand Can be done for any CFG Problems: Inefficient (might not even finish) “Nondeterministic”

27 Lecture 11: Parsimonious Parsing LL(k) (Lookahead-Left) A CFG is an LL(k) grammar if it can be parser deterministically with  tokens lookahead S  S + M | M M  M * T | T T  (S) | number 1 + S  S + M S  M S  S + M 2 LL(1) grammar

28 Lecture 11: Parsimonious Parsing Look-ahead Parser Parse() { S(); } S() { if (lookahead(1, “+”)) { S(); eat(“+”); M(); } else { M();} M() { if (lookahead(1, “*”)) { M(); eat(“*”); T(); } else { T(); } } T() { if (lookahead(0, “(“)) { eat(“(“); S(); eat(“)”); } else { number();} S  S + M | M M  M * T | T T  (S) | number

29 Lecture 11: Parsimonious Parsing JavaCC Input: Grammar specification Output: A Java program that is a recursive descent parser for the specified grammar Doesn’t work for all CFGs: only for LL(k) grammars

30 Lecture 11: Parsimonious Parsing Language Classes Regular Languages Context-Free Languages Violates Pumping Lemma For RLs Violates Pumping Lemma For CFLs Described by DFA, NFA, RegExp, RegGram Described by CFG, NDPDA 0n1n0n1n 0n1n2n0n1n2n 0n0n w ww Deterministic CFLs LL(k) Languages Described by LL(k) Grammar Java Python Scheme

31 Lecture 11: Parsimonious Parsing Next Week Monday (2): Office Hours (Qi Mi in 226D) Monday (5:30): TA help session Tuesday’s class (Pieter Hooimeyer): starting to get outside the yellow circle: using grammars to solve security problems Wednesday (9:30am): Office Hours (Qi Mi in 226D) Wednesday (6pm): TAs’ Exam Review Thursday: exam in class