Jiasi Shen, Martin Rinard MIT EECS & CSAIL


1 Using Active Learning to Synthesize Models of Applications That Access Databases
Jiasi Shen, Martin Rinard
MIT EECS & CSAIL
PLDI '19, 6/24/19

2 Motivation
Diagram labels: Input/output examples, Synthesizer, Synthesized program (?)
I/O examples often underspecify the program behavior
I/O examples are not necessarily easier to write than the program

3 Motivation
Diagram labels: Program (black box), Input/output examples, Synthesizer
Leverage a program as the specification

4 Motivation
Diagram labels: Choose inputs, Program (black box), Observe outputs, Synthesizer, Synthesized program
Leverage a program as the specification
Use active learning to select inputs that eliminate uncertainty

5 Motivation
Diagram labels: Choose inputs, Program (black box), Observe outputs, Inference and regeneration (Synthesizer), Regenerated program (Synthesized program)
Leverage a program as the specification
Use active learning to select inputs that eliminate uncertainty

6 Why synthesize another program?

7 Why synthesize another program?
Migrate implemented functionality between platforms / languages

8 Why synthesize another program?
Migrate implemented functionality between platforms / languages
Write seed program, then regenerate for new platforms / languages [Rinard et al, Onward! ’18]

9 Why synthesize another program?
Migrate implemented functionality between platforms / languages
Write seed program, then regenerate for new platforms / languages [Rinard et al, Onward! ’18]
Reverse engineering when source code is unavailable or obfuscated

10 Why synthesize another program?
Migrate implemented functionality between platforms / languages
Write seed program, then regenerate for new platforms / languages [Rinard et al, Onward! ’18]
Reverse engineering when source code is unavailable or obfuscated
Rewrite overly engineered legacy code with simple core functionality

11 Inference and regeneration
Diagram labels: Choose inputs, Program (black box), Observe outputs, Inference and regeneration, Regenerated program

12 Inference and regeneration
Diagram labels: Choose inputs, Program (?), Observe traffic and outputs, Inference and regeneration, Regenerated program
Observe component interactions in addition to final outputs

13 Data retrieval applications
Diagram labels: Program (?), SQL query, Retrieved data, DB
Prevalent
Potentially complex implementation
Simple core functionality

14 Example: Student registration app
Input s (ID)
Input p (Password)
Database tables: students, teachers, courses, registration
if student s exists:
    if student s has password p:
        Retrieve registration records
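A minimal Python sketch of what such an application might look like, using the standard-library sqlite3 module. The table name student follows the queries shown later in the deck; the registration columns (student_id, course_id) and the function names are hypothetical and chosen only for illustration.

```python
import sqlite3

def list_student_courses(conn, s, p):
    """Return registration rows for student s if the password p matches."""
    cur = conn.cursor()
    # "if student s exists"
    if not cur.execute("SELECT * FROM student WHERE id = ?", (s,)).fetchall():
        return []
    # "if student s has password p"
    if not cur.execute(
        "SELECT * FROM student WHERE id = ? AND password = ?", (s, p)
    ).fetchall():
        return []
    # "Retrieve registration records" (column names are assumptions)
    return cur.execute(
        "SELECT * FROM registration WHERE student_id = ?", (s,)
    ).fetchall()

def make_demo_db():
    """Build a tiny in-memory database with one student and one record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, password INTEGER)")
    conn.execute("CREATE TABLE registration (student_id INTEGER, course_id INTEGER)")
    conn.execute("INSERT INTO student VALUES (0, 2)")
    conn.execute("INSERT INTO registration VALUES (0, 7)")
    return conn
```

Each branch of the pseudocode corresponds to one query, which is the property the rest of the deck relies on.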

15 Observations of data retrieval apps
Data flow often manifests as SQL queries
Control flow largely depends on query results
Observe database queries during program execution
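One way to picture "observing database queries" is to log every SQL statement an application issues. The sketch below uses sqlite3's Connection.set_trace_callback as a stand-in; how Konure actually intercepts traffic is not stated on the slides, and observe_queries is a hypothetical helper name.

```python
import sqlite3

def observe_queries(conn, run):
    """Run `run(conn)` and return the SQL statements issued on `conn`."""
    log = []
    conn.set_trace_callback(log.append)  # called once per executed statement
    try:
        run(conn)
    finally:
        conn.set_trace_callback(None)    # stop logging afterwards
    return log
```

For example, calling observe_queries with a function that runs a single SELECT on the student table yields a one-element log containing that statement.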

16 Konure
Diagram labels: Program (?), Konure, DB

17 Konure
Diagram labels: Program (?), Konure, Conure, DB

18 Konure
Given: Input parameter format, Database schema
Diagram labels: Choose inputs, Program (?), Konure, Choose DB values, DB

19 Konure
Given: Input parameter format, Database schema
Diagram labels: Choose inputs, Program (?), Konure, Observe outputs, Observe DB traffic, Choose DB values, DB

20 Konure
Given: Input parameter format, Database schema
Diagram labels: Choose inputs, Program (?), Konure, Observe outputs, Observe DB traffic, Choose DB values, DB, Regenerated program

21 Infeasible
if x == 23076821 then A else B

22 Degenerate solution
if x == i1 then o1 else if x == i2 then o2 else if x == i3 then o3 ...
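The two failure modes can be illustrated together with a small Python sketch. The black box is the program from the "Infeasible" slide; lookup_table_model is a hypothetical name for the degenerate "one branch per observed input" solution.

```python
def black_box(x):
    """The program from the 'Infeasible' slide: one magic constant."""
    return "A" if x == 23076821 else "B"

def lookup_table_model(examples):
    """Degenerate model: memorize each observed (input, output) pair."""
    table = dict(examples)

    def model(x):
        # Unseen inputs have no entry, so the model says nothing about them.
        return table.get(x)

    return model
```

A model built from the inputs 1, 2 and 3 agrees with the black box on those inputs, says nothing about input 4, and never learns that the "A" branch exists unless the input 23076821 happens to be chosen.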

23 Use DSL to precisely capture programs that can be inferred
Rule out uninferable programs
Rule out degenerate solutions
Design DSL and inference algorithm together
Restrictive: If program expressible in DSL, guarantee correct inference
Expressive: DSL supports applications of practical interest (data retrieval apps)

24 Each statement performs a query
y ← select from (joined) tables the rows that satisfy an expression
Retrieve data, store data in y, reference y later
Expressions:
    Reference retrieved data: Col = y.Col
    Reference input parameter: Col = x
    Compare columns: Col = Col
    Conjunctions: Expr /\ Expr

25 Control flow directly tied to query results
if y ← select … then { code if y is nonempty } else { code if y is empty }
for y ← select … do { code for each row in y } else { code if y is empty }
Dependency complications
Observe control flow by observing DB traffic

26 Control flow directly tied to query results
if y ← select … then { code if y is nonempty } else { code if y is empty }
for y ← select … do { code for each row in y } else { code if y is empty }
Dependency complications
Observe control flow by observing DB traffic
Force execution down a path by populating DB with chosen values
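To see why DB traffic reveals control flow in such a DSL, here is a small Python sketch of the four statement forms and a toy interpreter. It is a simplification under stated assumptions: queries are reduced to names, the database is reduced to a map from query name to row count, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass(frozen=True)
class Eps:
    """Empty program (the 𝜖 production)."""

@dataclass(frozen=True)
class Seq:
    query: str
    rest: "Prog"

@dataclass(frozen=True)
class If:
    query: str
    then: "Prog"
    orelse: "Prog"

@dataclass(frozen=True)
class For:
    query: str
    body: "Prog"
    orelse: "Prog"

Prog = Union[Eps, Seq, If, For]

def run(prog: Prog, row_count: Callable[[str], int], trace: List[str]) -> None:
    """Execute prog, appending each issued query name to trace."""
    if isinstance(prog, Eps):
        return
    trace.append(prog.query)          # every statement performs a query
    rows = row_count(prog.query)
    if isinstance(prog, Seq):
        run(prog.rest, row_count, trace)
    elif isinstance(prog, If):
        run(prog.then if rows > 0 else prog.orelse, row_count, trace)
    elif rows == 0:                   # For with an empty result
        run(prog.orelse, row_count, trace)
    else:                             # For: one body execution per row
        for _ in range(rows):
            run(prog.body, row_count, trace)

def observed_traffic(prog: Prog, db: dict) -> List[str]:
    """Return the query trace when db maps query names to row counts."""
    trace: List[str] = []
    run(prog, lambda q: db.get(q, 0), trace)
    return trace
```

Changing the row counts in db changes which branch runs, and the difference shows up directly in the returned trace.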

27 Konure inference algorithm

28 Two aspects to infer from program executions
Concrete SQL query with concrete values ⇢ Abstract query template with variable references
Unstructured sequence of queries ⇢ Structured control flow of the program
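The first aspect can be sketched in Python as a literal-to-variable substitution. The sketch assumes every input and DB value is distinct, so that each literal in an observed query has exactly one possible origin; abstract_query and the origins map are hypothetical names, and real queries would need proper SQL parsing rather than a regular expression.

```python
import re

def abstract_query(sql, origins):
    """Replace quoted literals in sql by the variable they came from.

    origins maps a literal value (as text) to a variable reference,
    for example {"0": "s", "1": "p"}. Unknown literals are left as-is.
    """
    def substitute(match):
        return origins.get(match.group(1), match.group(0))

    return re.sub(r"'([^']*)'", substitute, sql)
```

With inputs s = 0 and p = 1, the concrete query with id = '0' and password = '1' becomes a template over s and p.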

29 Represent hypothesis in DSL sentential form
Prog
y1 ← Q1 y2 ← Q2 y3 ← Q3 Prog
for y1 ← Q1 do { if y2 ← Q2 then 𝜖 else Prog } else Prog
Resolve each Prog nonterminal by applying the appropriate production

30 Prog

31 First execution
Inputs: s = 0, p = 1
DB: Empty
Hypothesis: Prog

32 Empty DB
Tables: students, teachers, courses, registration

33 First execution: observed query
Inputs: s = 0, p = 1
DB: Empty
Hypothesis: Prog
Observed: SELECT * FROM student WHERE id = '0'
Q1: select student.* where id = s

34 First execution: query result
Inputs: s = 0, p = 1
DB: Empty
Hypothesis: Prog
Observed: SELECT * FROM student WHERE id = '0'
Result: Empty
Q1: select student.* where id = s (0 rows)

35 Can we make Q1 retrieve rows?
Hypothesis: Prog
Candidate productions:
    Prog := 𝜖 ?      𝜖
    Prog := Seq ?    y ← Q1 Prog
    Prog := If ?     if y ← Q1 then Prog else Prog
    Prog := For ?    for y ← Q1 do Prog else Prog
Executions: E0, E1 (1+ rows), E2 (2+ rows)
Q1: select student.* where id = s (0 rows)

36 Ask for three executions to resolve Prog
Prog := Seq
    Execution E0: Q1 (0 rows), Q2
    Execution E1: Q1 (1+ rows), Q2
Prog := If
    Execution E0: Q1 (0 rows), Q2
    Execution E1: Q1 (1+ rows), Q3
Prog := For
    Execution E2: Q1 (N≥2 rows), [rep1] ... [repN]
DSL restrictions
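The three columns on this slide suggest a simple decision rule, sketched below in Python. This is a simplified reading of the slide, not the algorithm from the paper: traces are lists of query template names starting with Q1, an execution that cannot be constructed is passed as None, and a loop body is assumed to issue the same queries on every iteration. The names resolve_prog and repeats are hypothetical.

```python
from typing import List, Optional

def repeats(tail: List[str], rows: int) -> bool:
    """True if tail is the same block of queries repeated once per row."""
    if rows < 2 or not tail or len(tail) % rows != 0:
        return False
    size = len(tail) // rows
    first = tail[:size]
    return all(tail[i:i + size] == first for i in range(0, len(tail), size))

def resolve_prog(e0: List[str],
                 e1: Optional[List[str]] = None,
                 e2: Optional[List[str]] = None,
                 e2_rows: int = 0) -> str:
    """Pick a production for Prog from executions E0, E1 and E2."""
    if not e0:
        return "Eps"                      # the program issues no query
    if e2 is not None and repeats(e2[1:], e2_rows):
        return "For"                      # body repeated once per row
    if e1 is not None and e1[1:] != e0[1:]:
        return "If"                       # behavior depends on Q1's result
    return "Seq"                          # same continuation either way
```

In the student registration walkthrough, E0 is Q1 alone, E1 is Q1 followed by Q2, and E2 is unsatisfiable, so this rule selects If.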

37 Three executions for Q1
Hypothesis: Prog
Candidate productions:
    Prog := 𝜖 ?      𝜖
    Prog := Seq ?    y ← Q1 Prog
    Prog := If ?     if y ← Q1 then Prog else Prog
    Prog := For ?    for y ← Q1 do Prog else Prog
E0 (Q1 gets 0 rows): Previous execution
E2 (Q1 gets 2+ rows): Unsat
E1 (Q1 gets 1+ rows): Next execution…
Execution E0: Q1: select student.* where id = s (0 rows)

38 Execution E1: chosen inputs and DB values
Candidate productions: Prog := 𝜖 ?, Prog := Seq ?, Prog := If ?, Prog := For ?
Inputs: s = 0, p = 1
DB: student: id = 0, password = 2, firstname = 3, …

39 DB contents for execution E1
students: id = 0, password = 2, firstname = 3, lastname = 4
teachers, courses, registration: Empty

40 Execution E1: first observed query
Inputs: s = 0, p = 1
DB: student: id = 0, password = 2, firstname = 3, …
Observed: SELECT * FROM student WHERE id = '0'
Q1: select student.* where id = s

41 Execution E1: first query result
Inputs: s = 0, p = 1
DB: student: id = 0, password = 2, firstname = 3, …
Observed: SELECT * FROM student WHERE id = '0'
Result: student: id = 0, password = 2, firstname = 3, …
Q1: select student.* where id = s (1 row)

42 Execution E1: after the first query
Inputs: s = 0, p = 1
DB: student: id = 0, password = 2, firstname = 3, …
Q1: select student.* where id = s (1 row)

43 Execution E1: second observed query
Inputs: s = 0, p = 1
DB: student: id = 0, password = 2, firstname = 3, …
Observed: SELECT * FROM student WHERE id = '0' AND password = '1'
Q1: select student.* where id = s (1 row)
Q2: select student.* where id = s ∧ password = p

44 Execution E1: second query result
Inputs: s = 0, p = 1
DB: student: id = 0, password = 2, firstname = 3, …
Observed: SELECT * FROM student WHERE id = '0' AND password = '1'
Result: Empty
Q1: select student.* where id = s (1 row)
Q2: select student.* where id = s ∧ password = p (0 rows)

45 Execution E1: summary
Candidate productions: Prog := 𝜖 ?, Prog := Seq ?, Prog := If ?, Prog := For ?
Q1: select student.* where id = s (1 row)
Q2: select student.* where id = s ∧ password = p (0 rows)

46 if student s exists then (E1) else (E0)
Candidate productions: Prog := 𝜖 ?, Prog := Seq ?, Prog := If ?, Prog := For ?
Execution E0:
    Q1: select student.* where id = s (0 rows)
Execution E1:
    Q1: select student.* where id = s (1 row)
    Q2: select student.* where id = s ∧ password = p (0 rows)
Execution E2: Does not exist (Unsat)
Hypothesis: Prog becomes if y1 ← Q1 then Prog else Prog

47 Resolve subprograms recursively (No backtracking)
Before: if y1 ← Q1 then Prog else Prog
Production: Prog := If
After: if y1 ← Q1 then { if y2 ← Q2 then Prog else Prog } else Prog

48 Resolve subprograms recursively (No backtracking)
Before: if y1 ← Q1 then { if y2 ← Q2 then Prog else Prog } else Prog
Production: Prog := For
After: if y1 ← Q1 then { if y2 ← Q2 then { for y3 ← Q3 do Prog else Prog } else Prog } else Prog

49 Resolve subprograms recursively (No backtracking)
Before: if y1 ← Q1 then { if y2 ← Q2 then { for y3 ← Q3 do Prog else Prog } else Prog } else Prog
Production: Prog := Seq
After: if y1 ← Q1 then { if y2 ← Q2 then { for y3 ← Q3 do { y4 ← Q4 Prog } else Prog } else Prog } else Prog
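The refinement steps on these slides can be replayed with a few lines of Python. The sketch treats a sentential form as a string and always expands the leftmost Prog, which matches the order shown on the slides but is an assumption about the general algorithm; apply_production is a hypothetical helper.

```python
def apply_production(form: str, rhs: str) -> str:
    """Replace the leftmost Prog nonterminal in form by rhs."""
    head, found, tail = form.partition("Prog")
    if not found:
        raise ValueError("no unresolved Prog nonterminal left")
    # Wrap nested subprograms in braces, as the slides do.
    if form != "Prog":
        rhs = "{ " + rhs + " }"
    return head + rhs + tail

def refine(productions):
    """Apply productions in order, starting from the hypothesis Prog."""
    form = "Prog"
    for rhs in productions:
        form = apply_production(form, rhs)
    return form
```

Applying the If, If, For and Seq productions in that order reproduces the final form shown on the last of these slides, with no step ever undone.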

50 Choosing inputs
Encode paths as quantifier-free SMT formulas
Solve for inputs and DB values to force execution down this path
Complications: Dependency, Ambiguity
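The idea of solving for inputs and DB values can be sketched without an SMT solver by enumerating a small domain. The brute-force search below is a plainly named stand-in for the SMT encoding, and the constraints are assumptions reconstructed from the walkthrough: Q1 must return the student row, Q2 must return nothing, and values are kept distinct so the origin of each literal is unambiguous. The uniqueness of student ids in the second example is also an assumption.

```python
from itertools import product

def solve(names, constraints, domain):
    """Return the first assignment over domain satisfying all constraints."""
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if all(check(env) for check in constraints):
            return env
    return None  # corresponds to "Unsat"

# Path E1: Q1 retrieves a row, Q2 retrieves none.
e1_names = ["s", "p", "id", "password"]
e1_constraints = [
    lambda e: e["id"] == e["s"],                    # Q1 matches the row
    lambda e: e["password"] != e["p"],              # Q2 matches nothing
    lambda e: e["s"] != e["p"],                     # avoid ambiguous inputs
    lambda e: e["password"] not in (e["s"], e["p"]),
]

# Path E2: Q1 retrieves two rows, assuming student ids are unique.
e2_names = ["s", "id1", "id2"]
e2_constraints = [
    lambda e: e["id1"] == e["s"],
    lambda e: e["id2"] == e["s"],
    lambda e: e["id1"] != e["id2"],                 # uniqueness of id
]
```

Solving the E1 constraints over the domain 0..2 yields s = 0, p = 1 and a student row with id = 0 and password = 2, the same values used in the walkthrough, while the E2 constraints have no solution.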

51 Key ideas
Represent hypothesis as DSL sentential form
Refine hypothesis using active learning
    Resolve each Prog nonterminal with three executions
    Top-down recursion
    No backtracking
DSL and inference algorithm developed together
Associate control flow with query results
Component-based inference

52 Soundness and completeness
Soundness and completeness result: Given any program in DSL, Konure infers a correct program
Equivalently: Given an original program in DSL,
    If Konure produces an inferred program, the inferred program has the same behavior as the original program (soundness)
    Konure always produces an inferred program (completeness)

53 Benchmark applications
Fulcrum task manager (Ruby on Rails)
Kandan chat room (Ruby on Rails)
Enki blogging app (Ruby on Rails)
Blog (Ruby on Rails)
Student registration (Java)
Synthesized new implementations in Python

54 Command Params App Runs Time Py LoC SQL If For Print
get_projects 1 Fulcrum 5 8 mins 21 9
get_projects_id 2 12 29 mins 25 8
get_projects_id_stories 11 7 mins 31 3
get_projects_id_stories_id
get_projects_id_stories_id_notes 24 4
get_projects_id_stories_id_notes_id 13 10 mins 28 10
get_projects_id_users 30 mins
get_channels Kandan 105 mins 63 16 27
get_channels_id_activities 23 39 mins 49 6
get_channels_id_activities_id 14
get_me 6 mins 44
get_users 15 9 mins 67 45
get_users_id
get_admin_comments_id Enki 22 secs
get_admin_pages
get_admin_pages_id 23 secs
get_admin_posts 33 secs
get_articles Blog 21 secs
get_articles_id 42 secs
liststudentcourses Student 41 secs

58 Related work
Automata learning
    Angluin [Information and Computation 1987]
    Isberner, Howar, Steffen [RV 2014]
    Vaandrager [review article, CACM 2017]
Oracle-guided component-based program synthesis
    Jha, Gulwani, Seshia, Tiwari [ICSE 2010]
Black box active learning
    Rinard, Shen, Mangalick [Onward 2018]

59 Conclusion
Component-based program inference
Use DSL to represent program functionality
Use DSL sentential form to represent hypothesis
Use active learning to generate inputs and resolve nonterminals
Applied to several Java and Ruby on Rails applications and synthesized new implementations in Python

