Automating Grammar Comparison by Ravichandhran Madhavan, EPFL Mikael Mayër, EPFL Sumit Gulwani, MSR Viktor Kuncak, EPFL.

Slides:



Advertisements
Similar presentations
Context-Free and Noncontext-Free Languages
Advertisements

Automated Grading of DFA Constructions Rajeev Alur (Penn), Loris D’Antoni (Penn), Sumit Gulwani (MSR), Bjoern Hartmann (Berkeley), Dileep Kini (UIUC),
Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of tokens abstract program string of integers (object.
101 The Cocke-Kasami-Younger Algorithm An example of bottom-up parsing, for CFG in Chomsky normal form G :S  AB | BB A  CC | AB | a B  BB | CA | b C.
About Grammars CS 130 Theory of Computation HMU Textbook: Sec 7.1, 6.3, 5.4.
CS466(Prasad)L7Parse1 Parsing Recognition of strings in a language.
CSCI 2670 Introduction to Theory of Computing September 15, 2004.
1 Polynomial Time Probabilistic Learning of a Subclass of Linear Languages with Queries Yasuhiro TAJIMA, Yoshiyuki KOTANI Tokyo Univ. of Agri. & Tech.
Introduction to Computability Theory
CS5371 Theory of Computation
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
Top-Down Parsing.
CS Master – Introduction to the Theory of Computation Jan Maluszynski - HT Lecture 4 Context-free grammars Jan Maluszynski, IDA, 2007
Context-Free Grammars Lecture 7
1 Simplifications of Context-Free Grammars. 2 A Substitution Rule Substitute Equivalent grammar.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Normal forms for Context-Free Grammars
Parsing SLP Chapter 13. 7/2/2015 Speech and Language Processing - Jurafsky and Martin 2 Outline  Parsing with CFGs  Bottom-up, top-down  CKY parsing.
January 15, 2014CS21 Lecture 61 CS21 Decidability and Tractability Lecture 6 January 16, 2015.
Cs466(Prasad)L8Norm1 Normal Forms Chomsky Normal Form Griebach Normal Form.
Fall 2006Costas Busch - RPI1 PDAs Accept Context-Free Languages.
CS 3813 Introduction to Formal Languages and Automata Chapter 6 Simplification of Context-free Grammars and Normal Forms These class notes are based on.
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Context-Free Grammars and Parsing 1.
EECS 6083 Intro to Parsing Context Free Grammars
8/19/2015© Hal Perkins & UW CSEC-1 CSE P 501 – Compilers Parsing & Context-Free Grammars Hal Perkins Winter 2008.
Grammars and Parsing. Sentence  Noun Verb Noun Noun  boys Noun  girls Noun  dogs Verb  like Verb  see Grammars Grammar: set of rules for generating.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Lecture 16 Oct 18 Context-Free Languages (CFL) - basic definitions Examples.
Compiler1 Chapter V: Compiler Overview: r To study the design and operation of compiler for high-level programming languages. r Contents m Basic compiler.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
CONVERTING TO CHOMSKY NORMAL FORM
CISC 471 First Exam Review Game Questions. Overview 1 Draw the standard phases of a compiler for compiling a high level language to machine code, showing.
Context-free Grammars Example : S   Shortened notation : S  aSaS   | aSa | bSb S  bSb Which strings can be generated from S ? [Section 6.1]
Testing Grammars For Top Down Parsers By Asma M Paracha, Frantisek F. Franek Dept. of Computing & Software McMaster University Hamilton, Ont.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Lexical and Syntax Analysis
Introduction to Parsing
CPS 506 Comparative Programming Languages Syntax Specification.
Chapter 6 Simplification of Context-free Grammars and Normal Forms These class notes are based on material from our textbook, An Introduction to Formal.
Compiler Design Introduction 1. 2 Course Outline Introduction to Compiling Lexical Analysis Syntax Analysis –Context Free Grammars –Top-Down Parsing –Bottom-Up.
Automating Grammar Comparison Ravichandhran Madhavan, EPFL Joint work with Mikael Mayër, EPFL Sumit Gulwani, MSR Viktor Kuncak, EPFL.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Context-Free and Noncontext-Free Languages Chapter 13.
Syntax and Semantics Form and Meaning of Programming Languages Copyright © by Curt Hill.
Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006.
Top-Down Parsing.
Syntax Analyzer (Parser)
Concepts and Realization of a Diagram Editor Generator Based on Hypergraph Transformation Author: Mark Minas Presenter: Song Gu.
CSCI 2670 Introduction to Theory of Computing October 13, 2005.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
Learning regular tree languages from correction and equivalence queries Cătălin Ionuţ Tîrnăucă Research Group on Mathematical Linguistics, Rovira i Virgili.
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 2 Context-Free Languages Some slides are in courtesy.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.
Exercises on Chomsky Normal Form and CYK parsing
Mid-Terms Exam Scope and Introduction. Format Grades: 100 points -> 20% in the final grade Multiple Choice Questions –8 questions, 7 points each Short.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
About Grammars Hopcroft, Motawi, Ullman, Chap 7.1, 6.3, 5.4.
Costas Busch - LSU1 PDAs Accept Context-Free Languages.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
Last Chapter Review Source code characters combination lexemes tokens pattern Non-Formalization Description Formalization Description Regular Expression.
Context-Free Languages & Grammars (CFLs & CFGs) (part 2)
Context-free grammars, derivation trees, and ambiguity
Simplifications of Context-Free Grammars
4 (c) parsing.
Decidability continued….
Context-Free Languages
Presentation transcript:

Automating Grammar Comparison by Ravichandhran Madhavan, EPFL Mikael Mayër, EPFL Sumit Gulwani, MSR Viktor Kuncak, EPFL

Overview of the Talk Proving equivalence of two CFGs Generation of words belonging to a CFG Finding counter-examples for equivalence

Applications Online grammar tutoring system : grammar.epfl.ch grammar.epfl.ch Compatibility of programming language grammars

Motivation & Contributions

Example Well formed function types over Int Int Int => Int Int, Int => Int => Int... type → args => type | Int args → Int, args | Int

LL(1) Grammar for Function Types type → Int rest rest → => type |, Int args | ε args →, Int args | => type ●Equivalence is not obvious ●Hard to verify manually ●But, this is exactly what we do while grading

An Incorrect Solution type → args => type | Int args → Int, args | Int Smallest counter-example ? Int, Int, Int => Int T → A => T | Int A → Int, T | Int Buggy

Comparing PL Grammars Grammars are often rewritten for efficient parsing Need to catch errors introduced during rewriting Identify sources of imprecision

Imprecisions Example enum ID implements Char {... } accepted by ANTLR v4 Java 7 Grammar private private public public class ID {... } accepted by ANTLR v4 and Oracle JLS 7 Grammars

Summary of Results

Comparing PL Grammars ●Evaluated the tool on 5 pairs of CFGs of 5 languages ●Our tool found hundreds of discrepancies in < 1 min ●More than 40% of words sampled uniformly at random are counter-examples

Finding Regression Errors ●Automatically injected 3 kinds of errors into each grammar ●Generated a total of 300 benchmarks ●Our tool detected 87.3% of the errors ○ 3 times more errors than available state of the art and 10 times faster

Grammar Tutoring System ●Analyzed a total of 1395 unique equivalence queries ●Provided a verdict on 95% of queries ○Disproved 1042 (74.6%) by finding counter-examples ○Proved 289 (20.7%)

Proving Equivalence

Background Undecidable for CFGs but decidable for proper subclasses ○Deterministic CFGs [G. Snizergues, ‘01] ○LL grammars [Korenjak & Hopcroft, ‘66], [Olshansky & Pnueli, ‘77], [Rosenkrantz & Stearns, ‘69] ○Sub-deterministic CFGs [Valiant, 1970s] [Harrison, Havel & Yehudai, ‘79]

Generating Subgoals S → a T T → a T b | b P → a R R → a b b | a R b | b Ambiguous S ≣ P T ≣ R S => a T and P => a R Branch Rule Hypothesize this holds for all strings with length < k suffices to hold for strings of length < k

Equivalence To Inclusion S → a T T → a T b | b P → a R R → a b b | a R b | b S ≣ P T ≣ R S => a T and P => a R T b ≣ (bb U Rb) ≣ T b ⊆ (bb U Rb) (bb U Rb) ⊆ T b

Exploiting Test cases S → a T T → a T b | b P → a R R → a b b | a R b | b S ≣ P T ≣ R ≣ (bb U Rb) ⊆ T b T b ⊆ R b Use examples to simplify

Preventing Divergence S → a T T → a T b | b P → a R R → a b b | a R b | b T b ⊆ R b T ⊆ R b ⊆ b Derive shortest word of T from R b R b =>* b b R is used in the derivation Compare the parts that were used by the derivation and those that weren’t Split Rule

Using Inductive Hypotheses S → a T T → a T b | b P → a R R → a b b | a R b | b T ⊆ R b ⊆ b Holds by hypothesis S ≣ P T ≣ R ≣ (bb U Rb) ⊆ T b T b ⊆ Rb

Discovering Counter- Examples

Enumerating Words / Parse Trees Enumerators are bijective functions from natural numbers to parse trees We define the function explicitly ●Allows random access ●Sequential enumeration ●Randomly sampling ●Incremental O(|i| 2 |w||maxRHS|)

●Sample words from one grammar and check if they can be parsed by the other ●We use two parsers: ○CYK with optimizations for batch parsing ○Antlr v4 Parser Discovering Counter-examples for Equivalence

Results

Benchmark PL Grammars ●ANSI C 2011 grammars ●Javascript (Mozilla Spec Vs Antlr) ●Pascal ●Java 7 (Oracle Spec Vs Antlr) ●VHDL ●Avg. size: nonterminals & productions

Comparing Grammars for the Same Language ●More than 40% of words sampled uniformly at random are counter-examples ●~ 1ms to sample u.a.r one word of length between 1 and 50

Detecting Fine grained Errors ●We injected 3 types of Error in each benchmark ○Removing a production ○Removing a nonterminal of a production ○Making a production infeasible in a specific context A -> a B A -> a B’ B -> a c | dB’ -> a c B -> a c | d

Detecting Fine grained Errors [Cont.] # Errors Found (T1, T2, T3) Avg. time taken / error found Avg. counter example size Our tool262 / 300 (97%, 92%, 73%) 28.1s35 cfga81 / 300 (33%, 29%, 19%) 342.6s12.2

Offers exercises on ●Constructing grammars from english descriptions ●Conversions to normal forms CNF, GNF ●Leftmost derivation of a string from a grammar ●Has 60 problems with different difficulty level ●Used it in a 3rd year undergraduate course on computer language processing Grammar Tutoring System

Summary of Evaluation QueriesRefutedProvedUnknownAvg. Time / query (74.6%) 289 (20.7%) 64 (4.6%) 107ms

Results on Proving Equivalence Queries (fed to the proof engine) ProvedTimeLL(2)Ambiguous (81.9%) 410ms63 18% %

Conclusion We propose, implement, and evaluate approaches for ●Enumerating and sampling words in polynomial time ●Discovering counter-examples for equivalence ●Proving equivalence that is ○complete for LL grammars ○applicable to arbitrary (ambiguous) grammars