Cultivating Research Taste (illustrated via a journey in Program Synthesis research) Programming Languages Mentoring Workshop 2015 Sumit Gulwani Microsoft.


Cultivating Research Taste (illustrated via a journey in Program Synthesis research) Programming Languages Mentoring Workshop 2015 Sumit Gulwani Microsoft Research, Redmond

1 Dimensions in Research
Problem Definition
–Advisor’s interest and funding, Internship, Course project
–Intersection with your collaborator’s interest
–Next logical advance in your current portfolio
–Talk to potential customers, market surveys
Solution Strategy
–Develop new techniques vs. Apply existing techniques
–Cross-disciplinary
Impact
–Paper, Tool, Awards, Media
–Personal happiness
Cultivating research taste is a journey! Once you develop it, you start on another journey!

2 Program Synthesis
Goal: Synthesize a program in the underlying domain-specific language (DSL) from user intent using some search algorithm.
An old problem, but more significant today.
Diverse computational platforms & programming languages.
Enabling technology: Better algorithms & faster machines.
Synthesis can revolutionize end-user programming if we:
target the right set of application domains
–such as Data manipulation
allow the right intent specification mechanism
–Examples, Natural Language
can tame the huge search space for real-time interaction
–Domain-specific search algorithms
PPDP 2010 [Invited talk paper]: “Dimensions in Program Synthesis”;

3 Graduation Advice (2005) George Necula UC-Berkeley You will have too many problems to solve; you can’t pursue them all. Make thoughtful choices.

4 From Program Verification to Program Synthesis
Statement s, Precondition P, Postcondition Q
Forward dataflow analysis: From s, P, compute Q
Backward dataflow analysis: From s, Q, compute P
Program Synthesis: From P, Q, compute s
Nebojsa Jojic MSR Redmond (2005)

5 Synthesis using SAT/SMT Constraint Solvers
Venkie MSR Bangalore (2006)
Try using SAT solvers, which have been engineered to solve huge instances.
Program synthesis is an extremely hard combinatorial search task!

6 Initial results in program synthesis
Approach: Reduce synthesis to solving SAT/SMT constraints.
Results: Managed to synthesize a wide variety of programs from logic specs.
–Bit-vector algorithms (e.g., turn off the rightmost one bit) [PLDI 2011, ICSE 2010]
–SIMD algorithms (e.g., vectorization of CountIf) [PPoPP 2013]
–Undergraduate book algorithms (e.g., sorting, dynamic programming) [POPL 2010]
–Program inverses (e.g., deserializers from serializers) [PLDI 2011]
–Graph algorithms (e.g., bipartiteness check) [OOPSLA 2010]
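To make the bit-vector entry concrete, here is a minimal Python sketch of synthesizing "turn off the rightmost one bit" from a specification. It is not the SAT/SMT encoding the slide refers to: exhaustive enumeration over a tiny, hypothetical component library stands in for the constraint solver, and the 8-bit width, the component names and the `spec` function are all assumptions made for illustration.

```python
from itertools import product

WIDTH = 8                       # assumed bit-width, small enough to check every input
MASK = (1 << WIDTH) - 1

def spec(x: int) -> int:
    """Reference specification: clear the lowest set bit of x, found by scanning."""
    for i in range(WIDTH):
        if x & (1 << i):
            return x & ~(1 << i) & MASK
    return 0

# Hypothetical component library (not taken from the cited papers).
UNARY = {
    "x": lambda x: x,
    "x-1": lambda x: (x - 1) & MASK,
    "x+1": lambda x: (x + 1) & MASK,
    "-x": lambda x: (-x) & MASK,
    "~x": lambda x: (~x) & MASK,
}
BINARY = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}

def synthesize():
    """Return the first candidate '(a) op (b)' that agrees with spec on all inputs."""
    for (ln, lf), (op, of), (rn, rf) in product(
            UNARY.items(), BINARY.items(), UNARY.items()):
        if all(of(lf(x), rf(x)) == spec(x) for x in range(1 << WIDTH)):
            return f"({ln}) {op} ({rn})"
    return None

result = synthesize()
print(result)
```

A solver-based approach replaces both loops (over candidates and over inputs) with constraint solving, which is what lets it scale beyond a search space this small.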

Mid-life Awakening (2010)
Software developers
End users
Two orders of magnitude more users

8 Dimensions in Research
→ Problem Definition
–Advisor’s interest and funding, Internship, Course project
–Intersection with your collaborator’s interest
–Next logical advance in your current portfolio
–Talk to potential customers, market surveys
Solution Strategy
–Develop new techniques vs. Apply existing techniques
–Cross-disciplinary
Impact
–Paper, Tool, Media, Awards
–Personal happiness
Cultivating research taste is a journey! Once you develop it, you start on another journey!

Problem Definition: Inspired by Excel help forums

Typical help-forum interaction
300_w5_aniSh_c1_b → w5
=MID(B1,5,2)
300_w30_aniSh_c1_b → w30
=MID(B1,FIND("_",$B:$B)+1, FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B),""))-1)
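The difference between the two formulas is easier to see when they are run side by side. The sketch below re-implements simplified versions of Excel's MID, FIND and REPLACE in Python (1-based indexing, no error values) and assumes that `$B:$B` resolves to the cell in the current row; the helper names `fixed_offsets` and `second_field` are illustrative only.

```python
def mid(text, start, length):
    """Excel-style MID: 1-based start, returns up to `length` characters."""
    return text[start - 1:start - 1 + length]

def find(needle, text):
    """Excel-style FIND: 1-based index of the first occurrence (raises if absent)."""
    return text.index(needle) + 1

def replace(text, start, count, new):
    """Excel-style REPLACE: swap `count` characters starting at 1-based `start`."""
    return text[:start - 1] + new + text[start - 1 + count:]

def fixed_offsets(cell):
    # =MID(B1,5,2): always two characters starting at position 5
    return mid(cell, 5, 2)

def second_field(cell):
    # =MID(B1,FIND("_",B1)+1,FIND("_",REPLACE(B1,1,FIND("_",B1),""))-1)
    first = find("_", cell)
    rest = replace(cell, 1, first, "")      # drop everything up to the first "_"
    return mid(cell, first + 1, find("_", rest) - 1)

for cell in ("300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b"):
    print(cell, fixed_offsets(cell), second_field(cell))
```

The fixed-offset formula fits the first row but truncates the second, which is why the forum answer has to grow into the longer formula.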

Flash Fill (Excel 2013 feature)

12 Dimensions in Research
Problem Definition
–Advisor’s interest and funding, Internship, Course project
–Intersection with your collaborator’s interest
–Next logical advance in your current portfolio
–Talk to potential customers, market surveys
→ Solution Strategy
–Develop new techniques vs. Apply existing techniques
–Cross-disciplinary
Impact
–Paper, Tool, Awards, Media
–Personal happiness
Cultivating research taste is a journey! Once you develop it, you start on another journey!

13 Flash Fill: Domain Specific Language
Guarded Expression G := Switch((b1,e1), …, (bn,en))
Boolean Expression b := c1 ∧ … ∧ cn
Atomic Predicate c := Match(vi,k,r)
Trace Expression e := Concatenate(f1, …, fn)
Atomic Expression f := s // Constant String
 | SubStr(vi, p1, p2)
 | Loop(λw: e)
Index Expression p := k // Constant Integer
 | Pos(r1, r2, k) // kth position in string whose left/right side matches with r1/r2
Regular Expression r := TokenSequence(T1,…,Tn)
POPL 2011: “Automating String Processing in Spreadsheets using Input-Output Examples”; Sumit Gulwani.

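14 Substring Operator
Let w = SubString(s, p, p’) where p = Pos(r1, r2, k) and p’ = Pos(r1’, r2’, k’)
r1 matches w1, r2 matches w2
r1’ matches w1’, r2’ matches w2’
[Figure: string s with positions p and p’ marked; w is the substring between them, w1 and w2 are the substrings adjacent to p, and w1’ and w2’ are those adjacent to p’.]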
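The position operator can be sketched in a few lines of Python. This is a simplified reading, not the Flash Fill implementation: ordinary Python regular expressions stand in for token sequences, the empty pattern plays the role of "matches anything", positions are 0-based offsets, and only positive k (counting from the left) is handled. The function names `pos` and `substring` are illustrative.

```python
import re

def pos(s, r1, r2, k):
    """k-th (1-based) offset t such that r1 matches a suffix of s[:t]
    and r2 matches a prefix of s[t:]."""
    left = re.compile(f"(?:{r1})\\Z")   # r1 must end exactly at offset t
    right = re.compile(r2)              # r2 must start exactly at offset t
    hits = [t for t in range(len(s) + 1)
            if left.search(s[:t]) and right.match(s[t:])]
    if not 1 <= k <= len(hits):
        raise ValueError("no such position")
    return hits[k - 1]

def substring(s, p, p_prime):
    """The text of s between the two offsets."""
    return s[p:p_prime]

s = "300_w30_aniSh_c1_b"
p = pos(s, "_", "", 1)         # just after the first underscore
p_prime = pos(s, "", "_", 2)   # just before the second underscore
print(substring(s, p, p_prime))
```

Because both positions are defined by their surroundings rather than by fixed offsets, the same pair of positions extracts "w5" from "300_w5_aniSh_c1_b" without any change.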

15 Syntactic String Transformations: Example
Format phone numbers. Input v1 Output (425)
Switch((b1, e1), (b2, e2)), where
b1 ≡ Match(v1,NumTok,3),
b2 ≡ ¬Match(v1,NumTok,3),
e1 ≡ Concatenate(SubStr2(v1,NumTok,1), ConstStr(“-”), SubStr2(v1,NumTok,2), ConstStr(“-”), SubStr2(v1,NumTok,3))
e2 ≡ Concatenate(ConstStr(“425-”), SubStr2(v1,NumTok,1), ConstStr(“-”), SubStr2(v1,NumTok,2))
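Read as code, the Switch program above is a two-branch conditional. The following Python sketch evaluates it under two assumptions that the slide does not spell out: NumTok is taken to be a maximal run of digits, and SubStr2(v, r, c) is taken to be the c-th match of r in v. The sample inputs are made up for illustration and are not the rows of the original table.

```python
import re

NUM_TOK = r"\d+"   # assumed reading of NumTok: a maximal run of digits

def match(v, r, k):
    """Match(v, r, k): v contains at least k matches of r."""
    return len(re.findall(r, v)) >= k

def substr2(v, r, c):
    """SubStr2(v, r, c): the c-th (1-based) match of r in v."""
    return re.findall(r, v)[c - 1]

def concatenate(*parts):
    return "".join(parts)

def format_phone(v1):
    if match(v1, NUM_TOK, 3):                      # b1
        return concatenate(substr2(v1, NUM_TOK, 1), "-",
                           substr2(v1, NUM_TOK, 2), "-",
                           substr2(v1, NUM_TOK, 3))
    # b2 is the negation of b1: fewer than three number tokens
    return concatenate("425-", substr2(v1, NUM_TOK, 1), "-",
                       substr2(v1, NUM_TOK, 2))

for v in ("(323) 708 7700", "555-0100"):           # hypothetical inputs
    print(v, "->", format_phone(v))
```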

16 Flash Fill: Search Algorithm
Goal: Given input-output pairs (i1,o1), (i2,o2), (i3,o3), (i4,o4), find P such that P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4.
Algorithm:
1. Learn set S1 of trace expressions s.t. ∀e in S1, [[e]] i1 = o1. Similarly compute S2, S3, S4. Let S = S1 ∩ S2 ∩ S3 ∩ S4.
2(a). If S ≠ ∅ then result is S.
Challenge: Each Sj may have a huge number of expressions.
Key Idea: We have a DAG based data-structure that allows for succinct representation and manipulation of Sj.

17 Flash Fill: Search Algorithm
Goal: Given input-output pairs (i1,o1), (i2,o2), (i3,o3), (i4,o4), find P such that P(i1)=o1, P(i2)=o2, P(i3)=o3, P(i4)=o4.
Algorithm:
1. Learn set S1 of trace expressions s.t. ∀e in S1, [[e]] i1 = o1. Similarly compute S2, S3, S4. Let S = S1 ∩ S2 ∩ S3 ∩ S4.
2(a). If S ≠ ∅ then result is S.
2(b). Else find a smallest partition, say {S1,S2}, {S3,S4}, s.t. S1 ∩ S2 ≠ ∅ and S3 ∩ S4 ≠ ∅.
3. Learn boolean formulas b1, b2 s.t. b1 maps i1, i2 to true, and b2 maps i3, i4 to true.
4. Result is: Switch((b1,S1 ∩ S2), (b2,S3 ∩ S4))
Search Methodology: Reduce learning of an expression to learning of sub-expressions (Divide-and-Conquer!)
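Steps 1, 2(a) and 2(b) can be sketched on a toy scale in Python. Everything concrete here is an assumption made for illustration: the program space is a small hypothetical one (field extraction and fixed slices), program sets are explicit Python sets rather than the DAG data structure, and the grouping in 2(b) is greedy, so it is not guaranteed to find a smallest partition. Learning the guards (steps 3 and 4) is left out.

```python
def run(prog, s):
    """Evaluate a toy program on input s; None means the program is undefined."""
    kind, a, b = prog
    if kind == "field":                  # b-th field of s split on separator a
        parts = s.split(a)
        return parts[b] if b < len(parts) else None
    return s[a:b]                        # kind == "slice": fixed character offsets

# Hypothetical, deliberately tiny program space.
SPACE = ([("field", sep, k) for sep in "_-" for k in range(4)] +
         [("slice", a, b) for a in range(8) for b in range(a + 1, 9)])

def consistent(inp, out):
    """Step 1: all programs in SPACE that map inp to out."""
    return {p for p in SPACE if run(p, inp) == out}

def learn(examples):
    """Return a list of (example indices, shared program set) groups."""
    sets = [consistent(i, o) for i, o in examples]
    common = set.intersection(*sets)
    if common:                           # step 2(a): one program set fits everything
        return [(list(range(len(sets))), common)]
    groups = []                          # step 2(b): greedy grouping
    for idx, s in enumerate(sets):
        for g in groups:
            if g[1] & s:                 # joining keeps the intersection non-empty
                g[0].append(idx)
                g[1] &= s
                break
        else:
            groups.append([[idx], set(s)])
    return [(ids, progs) for ids, progs in groups]

examples = [("300_w5_aniSh_c1_b", "w5"), ("300_w30_aniSh_c1_b", "w30")]
print(learn(examples))
print(learn(examples + [("2015-08-10", "08")]))
```

With two examples the fixed-offset slices are eliminated and only the field program survives the intersection; adding the date example empties the intersection and forces a second group, which is the situation a Switch is meant to handle.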

18 Ranking
General Principles
Prefer shorter programs.
–Fewer conditionals.
–Shorter string expressions and regular expressions.
Prefer programs with fewer constants.
Strategies
Baseline: Pick any minimal-sized program using a minimal number of constants.
Machine Learning: Programs are scored using a weighted combination of program features.
–Weights are learned using training data.
Rishabh Singh
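The two strategies differ only in how candidates are ordered, which a short Python sketch can show. The candidates, their feature values and the weights below are all hypothetical; in particular the weights are hard-coded here, whereas the slide says they are learned from training data, and the baseline is read as "smallest size first, then fewest constants".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    conditionals: int
    size: int          # overall expression size
    constants: int

def baseline_key(c):
    """Baseline: minimal size first, then minimal number of constants."""
    return (c.size, c.constants)

# Hypothetical weights; a learned ranker would fit these to training data.
WEIGHTS = {"conditionals": -3.0, "size": -1.0, "constants": -1.0}

def score(c):
    """Weighted combination of program features (higher is better)."""
    return sum(w * getattr(c, feature) for feature, w in WEIGHTS.items())

candidates = [
    Candidate("P1", conditionals=1, size=4, constants=0),
    Candidate("P2", conditionals=0, size=5, constants=3),
    Candidate("P3", conditionals=0, size=6, constants=0),
]

baseline_pick = min(candidates, key=baseline_key)
learned_pick = max(candidates, key=score)
print(baseline_pick.name, learned_pick.name)
```

With these made-up numbers the baseline picks the smallest program even though it needs a conditional, while the weighted score prefers a slightly larger program with no conditional and no constants.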

19 Experimental Comparison of various Ranking Strategies
Strategy    Average # of examples required
Baseline    4.17
Learning    1.48
Technical Report: “Predicting a correct program in Programming by Example”; Singh, Gulwani

20 User Interaction Model
Current Flash Fill Model
–Auto-prediction avoids discoverability issue.
–User inspects output and may provide additional examples.
Show programs in any desired language (after conversion from DSL).
Paraphrase in English.
Computer initiated interactivity
–Highlight less confident entries in the output.
–Ask directed questions based on distinguishing inputs.
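A distinguishing input is one on which the remaining candidate programs disagree, so asking the user about it rules out at least one candidate. A minimal Python sketch, with two hypothetical candidate programs written as plain functions:

```python
def distinguishing_inputs(programs, inputs):
    """Inputs on which at least two candidate programs give different outputs."""
    return [x for x in inputs if len({p(x) for p in programs}) > 1]

# Two hypothetical candidates that both fit the example 300_w5_aniSh_c1_b -> w5.
candidates = [
    lambda s: s[4:6],            # fixed character offsets
    lambda s: s.split("_")[1],   # second underscore-delimited field
]

rows = ["300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b", "12_ab_x"]
ask_about = distinguishing_inputs(candidates, rows)
print(ask_about)
```

Rows where all candidates agree need no question; the others are natural places to highlight low-confidence output or ask the user directly.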

21 Dimensions in Research
Problem Definition
–Advisor’s interest and funding, Internship, Course project
–Intersection with your collaborator’s interest
–Next logical advance in your current portfolio
–Talk to potential customers, market surveys
Solution Strategy
–Develop new techniques vs. Apply existing techniques
–Cross-disciplinary
→ Impact
–Paper, Tool, Awards, Media
–Personal happiness
Cultivating research taste is a journey! Once you develop it, you start on another journey!

Initial Success: Media articles & Blogposts

23 Broader Impact
Defined a new research trajectory, which keeps me busy with a passionate sense of purpose.
–End-user Programming using Examples and Natural Language
–Intelligent Tutoring systems

Conclusion
Dimensions in Research
–Problem definition, Solution strategy, Impact
Cultivating research taste is a journey
–Mine involved: “Program analysis” -> “Program synthesis” -> “Program synthesis for end-users using examples”
Once you develop it, you start a new journey
–Mine involves: having fun with cross-disciplinary research in “Frameworks for end-user programming using examples & NL” and “Intelligent Tutoring systems”

25 Backup Slides for Flash Fill Demo