
Automating String Processing in Spreadsheets using Input-Output Examples Sumit Gulwani Microsoft Research, Redmond.


1 Automating String Processing in Spreadsheets using Input-Output Examples Sumit Gulwani (sumitg@microsoft.com) Microsoft Research, Redmond

2 Automated Program Synthesis
Deserves renewed interest today!
Natural goal given that computing has become accessible, but most people are not expert programmers.
Enabling technology is now available
–Better search techniques: AI-style search techniques and logical-reasoning-based techniques (SAT/SMT solvers).
–Faster machines (good application for multi-cores)
State of the art: We can synthesize 10-20 lines of code.

3 Recent Success in Program Synthesis
Our techniques can synthesize a wide variety of algorithms/programs from logic and examples.
Undergraduate book algorithms (e.g., sorting, dynamic programming)
–[Srivastava/Gulwani/Foster, POPL 2010]
Program inverses (e.g., deserializers from serializers)
–[Srivastava/Gulwani/Chaudhuri/Foster, MSR-TR-2010-34]
Graph algorithms (e.g., bipartiteness check)
–[Itzhaky/Gulwani/Immerman/Sagiv, OOPSLA 2010]
Bit-vector algorithms (e.g., turn off the rightmost one bit)
–[Jha/Gulwani/Seshia/Tiwari, ICSE 2010]

4 Potential Consumers of Synthesis Technology
[Figure: Pyramid of Technology Users, with tiers End-Users, Algorithm Designers, and Software Developers, and a label "Most Useful Target"]

5 Demo

6 Outline
→ Language of String Programs
Synthesis Algorithm
Ranking Strategy
Limitations

7 Language for Constructing Output Strings
Guarded Expression G := Switch((b_1,e_1), …, (b_n,e_n))
String Expression e := Concatenate(f_1, …, f_n)
Base Expression f := s // Constant String
 | SubStr(v_i, p_1, p_2)
Index Expression p := k // Constant Integer
 | Pos(r_1, r_2, k) // k-th position in the string whose left/right side matches r_1/r_2
Notation: SubStr2(v_i,r,k) ≡ SubStr(v_i, Pos(ε,r,k), Pos(r,ε,k))
–Denotes the k-th occurrence of regular expression r in v_i
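The grammar above can be made concrete with a small interpreter. What follows is a minimal sketch, not the implementation from the talk: the token regexes in TOKENS are assumptions (the slides name NumTok and HyphenTok without defining them), and Pos is assumed to consider only maximal, non-overlapping token matches.

```python
import re

# Hypothetical token definitions; the slides only name these tokens.
TOKENS = {"NumTok": r"[0-9]+", "HyphenTok": r"-"}
EPS = None  # stands for the empty regular expression ε


def pos(s, r1, r2, k):
    """k-th (1-based) index t such that a match of r1 ends at t and a match of r2 starts at t."""
    ends = None if r1 is EPS else {m.end() for m in re.finditer(TOKENS[r1], s)}
    starts = None if r2 is EPS else {m.start() for m in re.finditer(TOKENS[r2], s)}
    hits = [t for t in range(len(s) + 1)
            if (ends is None or t in ends) and (starts is None or t in starts)]
    return hits[k - 1]


def substr(s, p1, p2):
    """SubStr(v, p1, p2): the characters of s between indices p1 and p2."""
    return s[p1:p2]


def substr2(s, r, k):
    """SubStr2(v, r, k): the k-th occurrence of token r in s."""
    return substr(s, pos(s, EPS, r, k), pos(s, r, EPS, k))


def concatenate(*parts):
    """Concatenate(f1, ..., fn)."""
    return "".join(parts)
```

For example, substr2("425-706-7709", "NumTok", 2) picks out the second number in the string.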

8 Example: Format phone numbers
Input v_1          Output
(425)-706-7709     425-706-7709
510.220.5586       510-220-5586
235 7654           425-235-7654
745-8139           425-745-8139
Switch((b_1, e_1), (b_2, e_2)), where
b_1 ≡ Match(v_1,NumTok,3),
b_2 ≡ ¬Match(v_1,NumTok,3),
e_1 ≡ Concatenate(SubStr2(v_1,NumTok,1), ConstStr(“-”), SubStr2(v_1,NumTok,2), ConstStr(“-”), SubStr2(v_1,NumTok,3))
e_2 ≡ Concatenate(ConstStr(“425-”), SubStr2(v_1,NumTok,1), ConstStr(“-”), SubStr2(v_1,NumTok,2))
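The phone-number program can be transcribed by hand into ordinary code to check it against the four rows. This is a sketch under two assumptions not stated on the slide: NumTok is taken to be a maximal run of digits, and Match(v, NumTok, k) is taken to mean "v contains at least k NumTok matches". The helper names are hypothetical.

```python
import re


def num_tokens(v):
    """All maximal digit runs in v (assumed meaning of NumTok)."""
    return re.findall(r"[0-9]+", v)


def match_numtok(v, k):
    """Assumed meaning of Match(v, NumTok, k): at least k NumTok matches in v."""
    return len(num_tokens(v)) >= k


def format_phone(v1):
    nums = num_tokens(v1)
    if match_numtok(v1, 3):
        # Branch e_1: first, second and third numbers joined by "-".
        return nums[0] + "-" + nums[1] + "-" + nums[2]
    # Branch e_2 (guard b_2 is the negation of b_1): constant "425-" prefix.
    return "425-" + nums[0] + "-" + nums[1]


for row in ["(425)-706-7709", "510.220.5586", "235 7654", "745-8139"]:
    print(row, "->", format_phone(row))
```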

9 Outline
Language of String Programs
→ Synthesis Algorithm
Ranking Strategy
Limitations

10 Key Synthesis Idea: Divide and Conquer
Reduce the problem of synthesizing expressions into sub-problems of synthesizing sub-expressions.
Reduction requires computing all solutions for each of the sub-problems:
–This also allows us to rank the various solutions and select the highest-ranked solution at the top level.
–A challenge here is to efficiently represent, compute, and manipulate a huge number of such solutions.
I will show three applications of this idea in the talk.
–Read the paper for more tricks!

11 Synthesizing Guarded Expression
Goal: Given input-output pairs (i_1,o_1), (i_2,o_2), (i_3,o_3), (i_4,o_4), find P such that P(i_1)=o_1, P(i_2)=o_2, P(i_3)=o_3, P(i_4)=o_4.
Algorithm:
1. Learn set S_1 of string expressions s.t. ∀e in S_1, [[e]] i_1 = o_1. Similarly compute S_2, S_3, S_4. Let S = S_1 ∩ S_2 ∩ S_3 ∩ S_4.
2(a). If S ≠ ∅ then result is Switch((true,S)).
Application #1: We reduce the problem of learning guarded expression P to the problem of learning string expressions for each input-output pair.

12 Example: Various choices for a String Expression
[Figure with labels: Input, Output, Constant]

13 Synthesizing String Expressions
The number of all possible string expressions (that can construct a given output string o_1 from a given input string i_1) is exponential in the size of the output string.
Application #2: To represent/learn all string expressions, it suffices to represent/learn all base expressions for each substring of the output.
–The number of substrings is just quadratic in the size of the output string!
–We use a DAG-based data structure, and it supports an efficient intersection operation!
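The gap between "exponentially many expressions" and "quadratically many substrings" can be seen in a toy version of such a DAG. This is only an illustrative sketch: nodes are positions in the output string, each edge (i, j) carries the base expressions that produce out[i:j], and SubStr choices are simplified here to concrete index pairs rather than Pos expressions. The function names are hypothetical, and the intersection operation mentioned on the slide is not shown.

```python
def build_dag(inp, out):
    """Map each edge (i, j) to the list of base expressions producing out[i:j]."""
    n = len(out)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            piece = out[i:j]
            labels = [("ConstStr", piece)]  # a constant always works
            start = inp.find(piece)
            while start != -1:  # every occurrence of the piece in the input
                labels.append(("SubStr", start, start + len(piece)))
                start = inp.find(piece, start + 1)
            edges[(i, j)] = labels
    return edges


def count_expressions(edges, n):
    """Count the Concatenate expressions encoded by the DAG (labelled paths from 0 to n)."""
    ways = [0] * (n + 1)
    ways[n] = 1
    for i in range(n - 1, -1, -1):
        ways[i] = sum(len(labels) * ways[j]
                      for (a, j), labels in edges.items() if a == i)
    return ways[0]


dag = build_dag("", "abcd")
print(len(dag), count_expressions(dag, 4))
```

With an empty input and a 4-character output, the DAG has 10 edges but already encodes 8 distinct concatenations of constants; the edge count grows quadratically while the path count doubles with each extra character.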

14 Example: Various choices for a SubStr Expression
Various ways to extract “706” from “425-706-7709”:
Chars after the 1st hyphen and before the 2nd hyphen.
SubStr(v_1, Pos(HyphenTok,ε,1), Pos(ε,HyphenTok,2))
Chars from the start of the 2nd number up to the end of the 2nd number.
SubStr(v_1, Pos(ε,NumTok,2), Pos(NumTok,ε,2))
Chars from the start of the 2nd number and before the 2nd hyphen.
SubStr(v_1, Pos(ε,NumTok,2), Pos(ε,HyphenTok,2))
Chars after the 1st hyphen and up to the end of the 2nd number.
SubStr(v_1, Pos(HyphenTok,ε,1), Pos(NumTok,ε,2))

15 Synthesizing SubStr Expressions
The number of SubStr(v,p_1,p_2) expressions that can extract a given substring w from a given string v can be large!
Application #3: To represent/learn all SubStr expressions, we can independently represent/learn all choices for each of the two index expressions.
–This allows for representing and computing O(n_1*n_2) choices for SubStr using size/time O(n_1+n_2), where n_1 and n_2 are the numbers of choices for p_1 and p_2.
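The factoring into two independent position sets can be sketched as follows. This is a simplified illustration, not the talk's algorithm: the token table is an assumption, only Pos expressions with ε on one side are enumerated, and the function names are hypothetical.

```python
import re
from itertools import product

# Hypothetical token definitions; the slides only name these tokens.
TOKENS = {"NumTok": r"[0-9]+", "HyphenTok": r"-"}


def position_exprs(s, t):
    """All Pos(r1, r2, k) descriptions, with ε on one side, that evaluate to index t in s."""
    found = []
    for name, regex in TOKENS.items():
        for k, m in enumerate(re.finditer(regex, s), start=1):
            if m.end() == t:
                found.append(("Pos", name, "ε", k))   # position right after the k-th token
            if m.start() == t:
                found.append(("Pos", "ε", name, k))   # position right before the k-th token
    return found


def learn_substr(s, left, right):
    """Store the choices for p_1 and p_2 separately; their product is never materialized."""
    return position_exprs(s, left), position_exprs(s, right)


p1_choices, p2_choices = learn_substr("425-706-7709", 4, 7)
print(len(p1_choices) + len(p2_choices), "stored,",
      len(list(product(p1_choices, p2_choices))), "SubStr expressions represented")
```

For “706” in “425-706-7709” this stores two choices per endpoint, and their product gives four SubStr expressions, one for each pairing of a start choice with an end choice.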

16 Back to Synthesizing Guarded Expression
Goal: Given input-output pairs (i_1,o_1), (i_2,o_2), (i_3,o_3), (i_4,o_4), find P such that P(i_1)=o_1, P(i_2)=o_2, P(i_3)=o_3, P(i_4)=o_4.
Algorithm:
1. Learn set S_1 of string expressions s.t. ∀e in S_1, [[e]] i_1 = o_1. Similarly compute S_2, S_3, S_4. Let S = S_1 ∩ S_2 ∩ S_3 ∩ S_4.
2(a). If S ≠ ∅ then result is Switch((true,S)).
2(b). Else find a smallest partition, say {S_1,S_2}, {S_3,S_4}, s.t. S_1 ∩ S_2 ≠ ∅ and S_3 ∩ S_4 ≠ ∅.
3. Learn boolean formulas b_1, b_2 s.t. b_1 maps i_1, i_2 to true and i_3, i_4 to false, and b_2 maps i_3, i_4 to true and i_1, i_2 to false.
4. Result is: Switch((b_1, S_1 ∩ S_2), (b_2, S_3 ∩ S_4))
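Steps 1 to 2(b) can be sketched with plain Python sets standing in for the sets S_1, …, S_4. Two simplifications are assumed here: candidate expressions are opaque labels rather than a DAG, and the partition is built greedily, which is a heuristic and is not guaranteed to be the smallest partition the slide asks for. Learning the guards b_1, b_2 (step 3) is not shown.

```python
def intersect_all(candidate_sets):
    """Step 1: S = S_1 ∩ S_2 ∩ ... over the per-example candidate sets."""
    result = set(candidate_sets[0])
    for s in candidate_sets[1:]:
        result &= s
    return result


def greedy_partition(candidate_sets):
    """Step 2(b), greedy variant: group examples whose candidate sets still intersect."""
    groups = []  # each entry is [example indices, shared candidates]
    for idx, cands in enumerate(candidate_sets):
        for group in groups:
            common = group[1] & cands
            if common:
                group[0].append(idx)
                group[1] = common
                break
        else:
            # No existing group shares a candidate with this example.
            groups.append([[idx], set(cands)])
    return [(ids, shared) for ids, shared in groups]


# Hypothetical candidate sets for four examples.
S = [{"e1", "x"}, {"e1"}, {"e2"}, {"e2", "y"}]
if intersect_all(S):
    print("single branch:", intersect_all(S))
else:
    print("branches:", greedy_partition(S))
```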

17 Outline
Language of String Programs
Synthesis Algorithm
→ Ranking Strategy
Limitations

18 Ranking Strategy
Prefer shorter programs.
–Fewer conditionals.
–Shorter string expressions and regular expressions.
Prefer programs with fewer constants.
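One way to turn these preferences into a comparison is a cost tuple. This is a sketch only: the program representation is hypothetical, and comparing the three criteria lexicographically in this order is an assumption, since the slide does not say how they are weighed against each other.

```python
def cost(program):
    """program: a list of branches; each branch is a list of atoms
    such as ("ConstStr", "-") or ("SubStr2", "NumTok", 1)."""
    n_branches = len(program)                           # fewer conditionals
    n_atoms = sum(len(branch) for branch in program)    # shorter string expressions
    n_consts = sum(1 for branch in program
                   for atom in branch if atom[0] == "ConstStr")  # fewer constants
    return (n_branches, n_atoms, n_consts)


def best(programs):
    """Pick the candidate with the lowest cost (assumed lexicographic order)."""
    return min(programs, key=cost)


with_extraction = [[("SubStr2", "NumTok", 1), ("ConstStr", "-"), ("SubStr2", "NumTok", 2)]]
with_constant = [[("SubStr2", "NumTok", 1), ("ConstStr", "-"), ("ConstStr", "706")]]
two_branches = [with_extraction[0], with_constant[0]]

print(best([two_branches, with_constant, with_extraction]) is with_extraction)
```

Under this ordering the single-branch program that extracts both numbers from the input beats the one that hard-codes “706”, and both beat the two-branch program.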

19 Outline
Language of String Programs
Synthesis Algorithm
Ranking Strategy
→ Limitations

20 Limitations and Follow-up Work
This paper: Syntactic manipulation of strings
Extension 1: Semantic manipulation of strings
–Joint work with intern Rishabh Singh (MIT)
Extension 2: Layout manipulation of tables
–Joint work with intern Bill Harris (UW-Madison)

21 Demo

22 Conclusion
Problem: End-user Programming
Solution: Program Synthesis with inter-disciplinary inspirations
Programming Languages
–Design of an expressive language that can succinctly represent string computations and is amenable to learning.
Machine Learning
–Version space algebra for learning straight-line code.
–Boolean classification technique for learning control flow.
HCI
–Input-output based interaction model
–Several usability features: ranking scheme, feedback to user, quick convergence, noise tolerance.

