Compositional Program Synthesis from Natural Language and Examples Mohammad Raza, Sumit Gulwani & Natasa Milic-Frayling Microsoft.

Presentation transcript:


Introduction
- End-user programming from NL and examples
  - Empowering the 99% of computer users who are non-programmers with the ability to program computers
  - Important application area: text manipulation and string transformations in spreadsheets, word processing tools, etc.
- Domain Specific Language (DSL): formal programming language
- Task Specification: examples, NL, both, …
- Program Synthesis Algorithm: DSL-specific or DSL-agnostic
- Program

State of the art
- Regular expressions from NL: Kushman & Barzilay, NAACL 2013
- Excel Flash Fill: Gulwani, POPL 2011
- Synthesis from NL + examples: Manshadi, Gildea & Allen, AAAI 2013

Challenges
- Programming by example (PBE): expressivity bottleneck
  - Strong language bias to learn effectively from few examples
- Programming by natural language (PBNL): supervision bottleneck
  - Availability of training data for language learning
  - Ambiguity and inaccuracy of NL descriptions of tasks
- Main challenge: scalability
  - Supporting expressive DSLs to allow a wide range of tasks, e.g. remove “Mr” or “Mrs” or “Miss” from all the names
  - Supporting complex tasks, e.g. find “G” followed by 1-5 numbers or “G” followed by 4 numbers followed by a single letter “A”-“Z”
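To make the two example tasks above concrete, here is one possible reading of them as Python regular expressions. This is only an illustrative sketch: the talk targets its own DSL rather than Python, and the names `strip_title` and `is_g_code` are hypothetical.

```python
import re

# Assumed reading of: remove "Mr" or "Mrs" or "Miss" from all the names.
# An optional trailing period is tolerated; that detail is an assumption.
TITLE = re.compile(r"^(?:Mrs|Mr|Miss)\.?\s+")

# Assumed reading of: "G" followed by 1-5 numbers, or "G" followed by
# 4 numbers followed by a single letter "A"-"Z".
G_CODE = re.compile(r"G\d{1,5}|G\d{4}[A-Z]")


def strip_title(name: str) -> str:
    """Drop a leading title from a name, leaving other names unchanged."""
    return TITLE.sub("", name)


def is_g_code(text: str) -> bool:
    """True if the whole string matches one of the two alternatives."""
    return G_CODE.fullmatch(text) is not None
```

Even this small case shows why the task is hard to pin down from the description alone: the second pattern needs a disjunction whose two branches overlap on their first five characters.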

The Lack of Compositionality
- Compositionality is fundamental to achieving scalability in programming
  - Expressions, subroutines, classes, libraries, …
  - Reasoning with declarative pre/post conditions, unit tests
- Compositionality is present in end user interactions with expert programmers
  - Iterative descriptions of tasks and elaboration
- Compositionality is a challenge in existing PBE and PBNL approaches:
  - End users are unaware of the formal DSL

A Compositional Synthesis Paradigm
- Use compositionality in natural language to decompose the task into tractable subtasks
- User provides:
  - NL specification of task
  - Input-output examples
  - Examples for constituent concepts
- Program synthesis using constituent examples:
  - Aids search and ranking of synthesis
  - Not relying on language training
  - Not restricting DSL expressivity
- Synthesized program: “G” followed by 1-5 numbers or “G” followed by 4 numbers followed by a single letter “A”-“Z”

Domain Specific Language (DSL)
- Context-free grammar
  - Terminal symbols
  - Non-terminal symbols
  - Start symbol
  - Rules: (name, head, body)
- Semantics
  - Each symbol is a type ranging over a set of values
  - A rule is a function from the tuple of body types to the head type
- A program is a concrete syntax tree constructed from the CFG
  - Complete program: root is the start symbol
  - Program component: root is not the start symbol
- Example DSL: Flash Fill with no expressivity constraints
  - int k, nat n, char c, string s
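The DSL definition above (rules as (name, head, body) triples, each with a function as its semantics, and programs as syntax trees) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the authors' implementation: the `Rule` and `Node` classes, the symbol names `charclass`, `nat` and `regex`, and the choice of regex strings as values are all hypothetical. Only the rule names `Interval`, `Concat`, `UpperChar` and `NumChar` come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Rule:
    name: str               # rule name, e.g. "Concat"
    head: str               # symbol (type) the rule produces
    body: Tuple[str, ...]   # symbols (types) the rule consumes
    fn: Callable            # semantics: body values -> head value


@dataclass(frozen=True)
class Node:
    rule: Rule
    children: Tuple["Node", ...] = ()

    def eval(self):
        """Evaluate the tree bottom-up using each rule's semantics."""
        return self.rule.fn(*(c.eval() for c in self.children))

    def is_complete(self, start: str) -> bool:
        """A complete program is rooted at the start symbol."""
        return self.rule.head == start


# Terminals are modelled as rules with an empty body (an assumption).
UPPER = Rule("UpperChar", "charclass", (), lambda: "[A-Z]")
NUM = Rule("NumChar", "charclass", (), lambda: "[0-9]")
TWO = Rule("2", "nat", (), lambda: 2)
SIX = Rule("6", "nat", (), lambda: 6)
INTERVAL = Rule("Interval", "regex", ("charclass", "nat"),
                lambda c, n: f"{c}{{{n}}}")
CONCAT = Rule("Concat", "regex", ("regex", "regex"), lambda a, b: a + b)

letters = Node(INTERVAL, (Node(UPPER), Node(TWO)))
digits = Node(INTERVAL, (Node(NUM), Node(SIX)))
program = Node(CONCAT, (letters, digits))
```

With `regex` taken as the start symbol, `program` is a complete program, while `Node(UPPER)` is only a program component.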

Compositional Task Specifications
- Standard input-output examples specification:
  Input: (“AB345678”, “RJ123456”, “DDD12345”)
  Output: (“AB345678”, “RJ123456”, null)
- Compositional examples specification: output is a tree structure including constituent examples
  Input: (“AB345678”, “RJ123456”, “DDD12345”)
  Output: (“AB345678”, “RJ123456”, null)
  Constituent examples: (“AB”, “RJ”, Ø) and (“345678”, “123456”, Ø)
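The tree-shaped specification above can be written down as a small data structure. The sketch below is an assumption about representation only: `SpecTree` and `constituents_in_inputs` are hypothetical names, and both `null` and Ø are modelled as `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Examples = Tuple[Optional[str], ...]


@dataclass
class SpecTree:
    examples: Examples                      # one entry per input row
    children: List["SpecTree"] = field(default_factory=list)


inputs = ("AB345678", "RJ123456", "DDD12345")

spec = SpecTree(
    ("AB345678", "RJ123456", None),         # top-level output examples
    [
        SpecTree(("AB", "RJ", None)),             # constituent: the letters
        SpecTree(("345678", "123456", None)),     # constituent: the numbers
    ],
)


def constituents_in_inputs(tree: SpecTree, rows: Examples) -> bool:
    """Sanity check: every given constituent example occurs in its input."""
    for child in tree.children:
        for example, row in zip(child.examples, rows):
            if example is not None and example not in row:
                return False
        if not constituents_in_inputs(child, rows):
            return False
    return True
```

A standard input-output specification is then just a `SpecTree` with no children.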

Program Synthesis Algorithm

SynthProgs(I, O)
    P ← InitializeTerminals()
    while (true)
        P ← P ∪ ApplyDSLRules(P)
        P’ ← { p ∈ P | p(I) = O }
        if (P’ ≠ Ø) return P’

Rank(P)
    return smallest p ∈ P

Example:
I = (“AB345678”, “RJ123456”, “DDD12345”)
O = (“AB345678”, “RJ123456”, null)
NL description: “Any 2 letters followed by any combination of 6 whole numbers”

Search progression:
{ …, 2, …, 6, … }
{ …, 2, …, 6, …, UpperChar, …, NumChar, … }
{ …, Interval(UpperChar,2), …, Interval(NumChar,6), … }
{ …, Concat(Interval(UpperChar,2),Interval(NumChar,6)), … }
{ …, Filter(Concat(Interval(UpperChar,2),Interval(NumChar,6))), … }
{ …, Filter(Concat(Interval(UpperChar,2),Interval(NumChar,6))), …, Filter(Concat(Interval(UpperChar,2),KleeneStar(NumChar))), … }

Selected program: Filter(Concat(Interval(UpperChar,2),KleeneStar(NumChar)))
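The enumerate-and-check loop above can be sketched as a small bottom-up search in Python. This is a toy under stated assumptions, not the system from the talk: the DSL is cut down to four rules, programs are compiled to Python regexes, `Filter` is assumed to return the input when the regex matches the whole string and `None` otherwise, and the `max_rounds` bound and lexicographic tie-break are additions for the sake of a terminating, deterministic example.

```python
import re
from typing import NamedTuple


class Prog(NamedTuple):
    kind: str      # "cc" (char class), "nat", "regex" or "prog"
    text: str      # printable form of the program
    size: int      # number of nodes in the syntax tree
    value: object  # regex fragment or number


TERMINALS = [
    Prog("cc", "UpperChar", 1, "[A-Z]"),
    Prog("cc", "NumChar", 1, "[0-9]"),
    Prog("nat", "2", 1, 2),
    Prog("nat", "6", 1, 6),
]


def apply_rules(progs):
    """Apply every rule once to every combination of existing programs."""
    kind = lambda k: [p for p in progs if p.kind == k]
    new = []
    for c in kind("cc"):
        new.append(Prog("regex", f"KleeneStar({c.text})", c.size + 1,
                        f"{c.value}*"))
        for n in kind("nat"):
            new.append(Prog("regex", f"Interval({c.text},{n.text})",
                            c.size + n.size + 1, f"{c.value}{{{n.value}}}"))
    for a in kind("regex"):
        new.append(Prog("prog", f"Filter({a.text})", a.size + 1, a.value))
        for b in kind("regex"):
            new.append(Prog("regex", f"Concat({a.text},{b.text})",
                            a.size + b.size + 1, a.value + b.value))
    return new


def run(p, inputs):
    """Assumed Filter semantics: keep inputs the regex matches in full."""
    return tuple(s if re.fullmatch(p.value, s) else None for s in inputs)


def synth_progs(inputs, outputs, max_rounds=4):
    pool = {p.text: p for p in TERMINALS}
    for _ in range(max_rounds):
        for p in apply_rules(list(pool.values())):
            pool.setdefault(p.text, p)
        consistent = [p for p in pool.values()
                      if p.kind == "prog" and run(p, inputs) == outputs]
        if consistent:
            return consistent
    return []


def rank(progs):
    """Smallest program first; ties broken by text (an assumption)."""
    return min(progs, key=lambda p: (p.size, p.text))


I = ("AB345678", "RJ123456", "DDD12345")
O = ("AB345678", "RJ123456", None)
candidates = synth_progs(I, O)
best = rank(candidates)
```

In this toy setting the intended program, with `Interval(NumChar,6)`, is among the consistent candidates, but size-based ranking prefers the shorter `KleeneStar` variant, which is the ambiguity that constituent examples are meant to resolve.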

Program Synthesis Algorithm

SynthesizeProgs(I, T)
    let T = O[T1, …, Tn]
    P ← InitializeTerminals()
    P ← P ∪ ⋃ i=1…n SynthesizeProgs(I, Ti)
    while (true)
        P ← P ∪ ApplyDSLRules(P)
        P’ ← { p ∈ P | p(I) ~CSR O }
        if (P’ ≠ Ø) return P’

Rank(P)
    return smallest p ∈ P with the most CSR-satisfying components

Example:
I = (“AB345678”, “RJ123456”, “DDD12345”)
T = O0[O1, O2]
O0 = (“AB345678”, “RJ123456”, null)
O1 = (“AB”, “RJ”, Ø)
O2 = (“345678”, “123456”, Ø)
NL description: “Any 2 letters followed by any combination of 6 whole numbers”

Search progression:
{ …, 2, …, 6, … }
SynthesizeProgs(I, O1) = { …, Interval(UpperChar,2), … }
SynthesizeProgs(I, O2) = { …, Interval(NumChar,6), … }
{ …, Concat(Interval(UpperChar,2),Interval(NumChar,6)), … }
{ …, Filter(Concat(Interval(UpperChar,2),Interval(NumChar,6))), … }
{ …, Filter(Concat(Interval(UpperChar,2),Interval(NumChar,6))), …, Filter(Concat(Interval(UpperChar,2),KleeneStar(NumChar))), … }

Selected program: Filter(Concat(Interval(UpperChar,2),Interval(NumChar,6)))
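The ranking step above, which prefers the program with the most CSR-satisfying components and then the smallest size, can be sketched as follows. This is a hedged illustration: `Candidate` and `rank` are hypothetical names, component sets are written out by hand instead of being computed from syntax trees, and `csr_components` contains only the members of SynthesizeProgs(I, O1) and SynthesizeProgs(I, O2) that the slide actually shows (the elided members are unknown and left out).

```python
from typing import FrozenSet, NamedTuple


class Candidate(NamedTuple):
    text: str                   # printable form of the program
    size: int                   # number of nodes in the syntax tree
    components: FrozenSet[str]  # sub-programs, listed by hand here


def rank(candidates, csr_components):
    """Most CSR-satisfying components first, then smallest size."""
    return min(
        candidates,
        key=lambda c: (-len(c.components & csr_components), c.size),
    )


# Only the components visible on the slide; the rest are elided there.
csr_components = frozenset({"Interval(UpperChar,2)", "Interval(NumChar,6)"})

with_interval = Candidate(
    "Filter(Concat(Interval(UpperChar,2),Interval(NumChar,6)))",
    8,
    frozenset({"Interval(UpperChar,2)", "Interval(NumChar,6)"}),
)
with_star = Candidate(
    "Filter(Concat(Interval(UpperChar,2),KleeneStar(NumChar)))",
    7,
    frozenset({"Interval(UpperChar,2)", "KleeneStar(NumChar)"}),
)

chosen = rank([with_star, with_interval], csr_components)
```

Under these assumptions the larger program wins because both of its components come from the recursive calls on the constituent examples, whereas size alone would have picked the `KleeneStar` variant.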

Component Satisfaction Relation (CSR)
- Given input I, examples E and p(I) = V, CSR(I, E, V) determines when values V of type Type are relevant for examples E on inputs I
- CSR for types in the string DSL:
  - String: if the values are equal to the example strings
  - Regex: if the value is a regex that matches the example string in the input string
  - Char Class: if the characters in the examples and the values fall under the same minimal character class
  - Position: if the value is the start or end position of the example string in the input string
- Example:
  Input I = (“AB345678”, “RJ123456”, “DDD12345”)
  Output (“AB345678”, “RJ123456”, null)
  E = (“AB”, “RJ”, Ø), (“345678”, “123456”, Ø)
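The four per-type checks above can be sketched for a single (input, example, value) triple. This is one possible reading, not the talk's definition: all function names are hypothetical, the character-class hierarchy in `CLASSES` is an assumption (only `UpperChar` and `NumChar` appear in the slides), and the regex check is taken to mean that the example occurs in the input and the regex matches it in full.

```python
import re

# Assumed hierarchy, ordered from narrowest to widest class.
CLASSES = [
    ("NumChar", "0-9"),
    ("UpperChar", "A-Z"),
    ("LowerChar", "a-z"),
    ("AlphaChar", "A-Za-z"),
    ("AlphaNumChar", "A-Za-z0-9"),
]


def minimal_class(text: str) -> str:
    """Narrowest class in CLASSES covering every character of text."""
    for name, chars in CLASSES:
        if re.fullmatch(f"[{chars}]+", text):
            return name
    return "AnyChar"


def csr_string(value: str, example: str) -> bool:
    return value == example


def csr_regex(regex: str, example: str, inp: str) -> bool:
    return example in inp and re.fullmatch(regex, example) is not None


def csr_char_class(class_name: str, example: str) -> bool:
    return minimal_class(example) == class_name


def csr_position(value: int, example: str, inp: str) -> bool:
    start = inp.find(example)
    return start != -1 and value in (start, start + len(example))
```

For the first input row “AB345678” with constituent example “AB”, the regex `[A-Z]{2}`, the class `UpperChar` and the positions 0 and 2 would all count as relevant under this reading.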

Program synthesis algorithm
- Parametric in DSL, CSR and compositional specification
- Systematic search
  - Soundness and completeness
- Specification-guided optimization
  - Search with recursive component synthesis using CSR
  - Semantic equivalence optimization
  - DSL-agnostic rule application patterns
- Ranking
  - Based on constituent components and size

Evaluation
- Problems from online help forums covering a range of DSL features: Excel, StackOverflow and Regex
- Used original NL description of the task; detected noun phrases for constituent concepts using Stanford and MSR Splat parsers
- Average number of examples required: 2.73
- Average number of constituent concepts: 1.53
- Baselines:
  - FF: Flash Fill (8 of 48 tasks expressible, of which 2 inferred correctly)
  - B1: Our system without constituent examples
  - B2: Our system without ranking based only on size
- Results table: columns FF, B1, B2, CPS; rows Number of correct results, Number of incorrect results, Number of timeouts, Avg. time (seconds). Surviving fragments: timeouts “02676”, avg. time “<”.

Task: replace within match If the cells contain a 16 digit number then Replace the first 12 digits of each string with “xxxxXXXXxxxx”
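As an illustration of what a target program for this task might compute, here is a minimal Python sketch. It reflects one assumed reading of the description (a standalone run of exactly 16 digits, anywhere in the cell), and the name `mask_number` is hypothetical.

```python
import re

# Exactly 16 digits, not embedded in a longer digit run (an assumption).
SIXTEEN_DIGITS = re.compile(r"(?<!\d)\d{12}(\d{4})(?!\d)")


def mask_number(cell: str) -> str:
    """Replace the first 12 of 16 digits, keeping the last four."""
    return SIXTEEN_DIGITS.sub(r"xxxxXXXXxxxx\1", cell)
```

Cells without a 16-digit number are returned unchanged, which covers the "if" part of the description.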

Task: dependent position expressions extract any numbers after “SN”. The numbers can be vary in digits. Also, at times there is some other text in between numbers and search word
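A possible Python rendering of this task is sketched below. It assumes that "some other text" between the search word and the number contains no digits and that only the first number after “SN” is wanted; `extract_after_sn` is a hypothetical name.

```python
import re
from typing import Optional


def extract_after_sn(text: str) -> Optional[str]:
    """Return the first digit run after "SN", skipping non-digit text."""
    match = re.search(r"SN\D*(\d+)", text)
    return match.group(1) if match else None
```

The start of the extracted number depends on where “SN” was found, which is what makes the position expression dependent.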

Task: conditional with disjunction If column A contains the words “ear” or “mouth”, then I want to return the value of “face” otherwise I want to return the value of “body”
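This task could be rendered in Python roughly as follows. The sketch assumes that "contains the words" means whole-word, case-sensitive matches; `classify` is a hypothetical name.

```python
import re

# Whole-word match on either alternative (an assumed reading).
FACE_WORDS = re.compile(r"\b(?:ear|mouth)\b")


def classify(cell: str) -> str:
    """Return "face" if the cell mentions ear or mouth, else "body"."""
    return "face" if FACE_WORDS.search(cell) else "body"
```

Under a substring reading instead, a cell such as "heart" would also map to "face", so the examples the user supplies are what settle the choice.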

Task: inaccuracy in NL description The string must start with “1” or “2” (only once and mandatory) and then followed by any character between “a” to “z” (only once)

Conclusion
- New paradigm with NL, examples and compositionality
- Lifting the “expressivity” and “supervision” bottlenecks
- Domain-agnostic synthesis approach

Future work
- Synthesis technique
  - Language learning/probabilistic relevance models from training data (potentially obtained from our system)
  - Domain specific optimizations
- Interaction
  - Dialog-based user interaction model
  - Paraphrased NL descriptions of programs shown to user
  - Counter-examples, and iterative elaboration
- Application domains
  - Numerical algorithms, task completion (web, OS), robotics, …