Learning Semantic String Transformations from Examples Rishabh Singh and Sumit Gulwani.

1 Learning Semantic String Transformations from Examples Rishabh Singh and Sumit Gulwani

2 FlashFill

3

4 Transformations
Syntactic Transformations
–Concatenation of regular-expression-based substrings
–“VLDB2012” → “VLDB”
Semantic Transformations
–More than just characters
–“1/5/2010” → “May 1st 2010”

5 Semantic Transformations
Semantic information as relational tables
–1 → January, 2 → February
Learn table lookup queries
–VLOOKUP macro 2nd most problematic

6 Outline
–Lookup Transformations
–Lookup + Syntactic Transformations
–Case Studies

7 Table Lookup Transformations Demo

8 Learning Framework
Input Strings → F → Output String (candidates F1, …, Fn)
1. Domain-specific Language L
2. Algorithm to learn all Fs from (i, o)

9

10 Example - Lookup
Emp Record
SSN | EmpId | Name
027-36-4557 | 1254 | John Henry
034-83-7683 | 2412 | William Johnson
044-58-3429 | 1125 | Steve Russell
018-45-8949 | 4257 | Ian Jordan
023-34-3254 | 6418 | Mary Dina
Input v1 | Output
044-58-3429 | Steve Russell
Select(Name, EmpRecord, (SSN = v1))
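The lookup on this slide can be sketched in a few lines of Python. This is only an illustration of what the Select expression computes, not the authors' implementation. The dictionary encoding of the table, the helper name `select`, and the choice to return None for missing or ambiguous keys are all assumptions made for the sketch.

```python
# Hypothetical in-memory encoding of the Emp Record table from the slide.
EMP_RECORD = [
    {"SSN": "027-36-4557", "EmpId": "1254", "Name": "John Henry"},
    {"SSN": "034-83-7683", "EmpId": "2412", "Name": "William Johnson"},
    {"SSN": "044-58-3429", "EmpId": "1125", "Name": "Steve Russell"},
    {"SSN": "018-45-8949", "EmpId": "4257", "Name": "Ian Jordan"},
    {"SSN": "023-34-3254", "EmpId": "6418", "Name": "Mary Dina"},
]


def select(column, table, key_column, key_value):
    """Return `column` from the single row whose `key_column` equals `key_value`.

    Returns None when no row, or more than one row, matches. That policy is
    an assumption of this sketch, not something stated on the slide.
    """
    matches = [row[column] for row in table if row[key_column] == key_value]
    return matches[0] if len(matches) == 1 else None


# Select(Name, EmpRecord, (SSN = v1)) with v1 = "044-58-3429"
name = select("Name", EMP_RECORD, "SSN", "044-58-3429")
print(name)
```

Running the same call with v1 = "023-34-3254" would return the Name cell of the last row.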

11 Example – Transitive Lookup
ItemRec
ItemId | Item
ST-340 | Stroller
BI-567 | Bib
DI-328 | Diapers
WI-989 | Wipes
AS-469 | Aspirator
PriceRec
ItemId | Price
ST-340 | $145.67
BI-567 | $3.56
DI-328 | $21.45
WI-989 | $5.12
AS-469 | $2.56
Input v1 | Output
Stroller | $145.67
Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))
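A transitive lookup is a Select whose key is itself the result of another Select. The sketch below chains two lookups over the ItemRec and PriceRec tables from the slide. The table encoding and the helper names `select` and `price_of` are hypothetical and chosen only for illustration.

```python
# Hypothetical encodings of the two tables shown on the slide.
ITEM_REC = [
    {"ItemId": "ST-340", "Item": "Stroller"},
    {"ItemId": "BI-567", "Item": "Bib"},
    {"ItemId": "DI-328", "Item": "Diapers"},
    {"ItemId": "WI-989", "Item": "Wipes"},
    {"ItemId": "AS-469", "Item": "Aspirator"},
]
PRICE_REC = [
    {"ItemId": "ST-340", "Price": "$145.67"},
    {"ItemId": "BI-567", "Price": "$3.56"},
    {"ItemId": "DI-328", "Price": "$21.45"},
    {"ItemId": "WI-989", "Price": "$5.12"},
    {"ItemId": "AS-469", "Price": "$2.56"},
]


def select(column, table, key_column, key_value):
    """Return `column` from the unique row where `key_column` == `key_value`, else None."""
    matches = [row[column] for row in table if row[key_column] == key_value]
    return matches[0] if len(matches) == 1 else None


def price_of(item_name):
    """Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))."""
    item_id = select("ItemId", ITEM_REC, "Item", item_name)  # inner lookup
    return select("Price", PRICE_REC, "ItemId", item_id)     # outer lookup


print(price_of("Stroller"))
```

If the inner lookup finds no item, `item_id` is None, no PriceRec row matches it, and the outer lookup returns None as well.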

12 Learn Query
ItemRec
ItemId | Item
ST-340 | Stroller
BI-567 | Bib
DI-328 | Diapers
WI-989 | Wipes
AS-469 | Aspirator
PriceRec
ItemId | Price
ST-340 | $145.67
BI-567 | $3.56
DI-328 | $21.45
WI-989 | $5.12
AS-469 | $2.56
Input v1 | Output
Stroller | $145.67
Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))

13

14 Strings reachable from input row 044-58-3429
Emp Record
SSN | EmpId | Name
027-36-4557 | 1254 | John Henry
034-83-7683 | 2412 | William Johnson
044-58-3429 | 1125 | Steve Russell
018-45-8949 | 4257 | Ian Jordan

15 Strings in table rows of visited nodes
044-58-3429 | 1125 | Steve Russell

16 Repeat until k steps or fixpoint

17 Steve Russell

18

19 Maintains tree structure
–share common sub-expressions
CNF of Boolean Conditionals
–independent column predicates

20

21 Synthesize Procedure
Synthesize((i1, o1), …, (in, on))
  P = GenerateStr_t(i1, o1)
  for j = 2 to n:
    P’ = GenerateStr_t(ij, oj)
    P = Intersect_t(P’, P)
  return P

22 Semantic String Transformations Demo

23 [Gulwani POPL11]

24 Syntactic manipulations over lookup outputs
Syntactic manipulations before indexing

25

26 SSN: 044-58-3429
Emp Record
SSN | EmpId | Name
027-36-4557 | 1254 | John Henry
034-83-7683 | 2412 | William Johnson
044-58-3429 | 1125 | Steve Russell
018-45-8949 | 4257 | Ian Jordan
Mr. Steve Russell

27 SSN: 044-58-3429
Emp Record
SSN | EmpId | Name
027-36-4557 | 1254 | John Henry
034-83-7683 | 2412 | William Johnson
044-58-3429 | 1125 | Steve Russell
018-45-8949 | 4257 | Ian Jordan

28 SSN: 044-58-3429
Emp Record
SSN | EmpId | Name
027-36-4557 | 1254 | John Henry
034-83-7683 | 2412 | William Johnson
044-58-3429 | 1125 | Steve Russell
018-45-8949 | 4257 | Ian Jordan

29 { “SSN: 044-58-3429”, “044-58-3429”, “1125”, “Steve Russell” } Set of reachable strings

30 { “SSN: 044-58-3429”, “044-58-3429”, “1125”, “Steve Russell” } Mr. Steve Russell
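The running example combines both kinds of manipulation: a syntactic step before indexing (pull "044-58-3429" out of "SSN: 044-58-3429"), a table lookup, and a syntactic step over the lookup output (prepend "Mr. "). The sketch below hand-writes one such program to show its shape. It is not a learner, and the regular expression, the constant prefix, and the function name `title_for` are assumptions made for illustration.

```python
import re

# Hypothetical encoding of the Emp Record table: (SSN, EmpId, Name).
EMP_RECORD = [
    ("027-36-4557", "1254", "John Henry"),
    ("034-83-7683", "2412", "William Johnson"),
    ("044-58-3429", "1125", "Steve Russell"),
    ("018-45-8949", "4257", "Ian Jordan"),
]

# Assumed shape of an SSN token inside the raw input string.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def title_for(raw_input):
    """Substring extraction, then lookup by SSN, then concatenation with a constant."""
    match = SSN_PATTERN.search(raw_input)  # syntactic manipulation before indexing
    if match is None:
        return None
    ssn = match.group(0)
    for row_ssn, _emp_id, name in EMP_RECORD:  # lookup: Select(Name, EmpRecord, SSN = ssn)
        if row_ssn == ssn:
            return "Mr. " + name  # syntactic manipulation over the lookup output
    return None


print(title_for("SSN: 044-58-3429"))
```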

31 Experiments

32 Related Work
Matching strings for table joins
–Record Matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
–Schema Matching [Dhamankar et al. SIGMOD04, Warren & Tompa VLDB06]
Query Synthesis
–from representative view [Das Sarma et al. ICDT10, Tran et al. SIGMOD09]
Text-editing by example
–QuickCode [Gulwani POPL11]
–SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]

33 Thanks! End-Users Algorithm Designers Software Developers Large potential

34 Backup slides

35 Semantic String Transformations
Time (24 Hr) | Time (12 Hr)
0930 | 9:30 AM
1520 | 3:20 PM
1648 |
0830 |
1015 |
2010 |
1012 |
1425 |
=TEXT(C,”00 00”)+0
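For comparison with the spreadsheet formula, the two filled-in rows of this table are consistent with the following hand-written conversion from a four-digit 24-hour string to a 12-hour time. This is a sketch of the target transformation only, not of how the tool learns it, and the function name `to_12_hour` is hypothetical.

```python
def to_12_hour(hhmm):
    """Convert a 4-digit 24-hour string such as "1520" to a string such as "3:20 PM"."""
    hour, minute = int(hhmm[:2]), int(hhmm[2:])
    suffix = "AM" if hour < 12 else "PM"
    hour12 = hour % 12 or 12  # hours 0 and 12 are both shown as 12
    return f"{hour12}:{minute:02d} {suffix}"


for value in ("0930", "1520"):
    print(value, "->", to_12_hour(value))
```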

36 Semantic String Transformations
Date | Formatted Date
06-03-2008 | Jun 3rd, 2008
03-26-2010 |
08-01-2009 |
09-24-2007 |
05-14-2010 |
07-20-1998 |
10-24-2004 |
08-24-1972 |
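The date example needs semantic knowledge that character manipulation alone cannot supply: a month-number-to-name table and a day-to-ordinal rule. A minimal sketch, assuming the input is always MM-DD-YYYY and the month abbreviations follow the "Jun" style of the filled-in row. The table `MONTHS` and the helpers `ordinal` and `format_date` are hypothetical names.

```python
# Assumed lookup table from two-digit month to abbreviated month name.
MONTHS = {
    "01": "Jan", "02": "Feb", "03": "Mar", "04": "Apr",
    "05": "May", "06": "Jun", "07": "Jul", "08": "Aug",
    "09": "Sep", "10": "Oct", "11": "Nov", "12": "Dec",
}


def ordinal(day):
    """Return the day with its English ordinal suffix, e.g. 3 -> "3rd"."""
    if 11 <= day % 100 <= 13:  # 11th, 12th, 13th are exceptions
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def format_date(mm_dd_yyyy):
    """Split on "-", look up the month name, and reassemble the pieces."""
    month, day, year = mm_dd_yyyy.split("-")
    return f"{MONTHS[month]} {ordinal(int(day))}, {year}"


print(format_date("06-03-2008"))
```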

37 Idea 1: Share sub-expressions
T1
C1 | C2 | C3
s1 | s2 | s3
T2
C1 | C2 | C3
s2 | s3 | s4
T3
C1 | C2 | C3
s3 | s4 | s5
Select(C3, T2, C1 = e)
Select(C2, T3, C1 = Select(C2, T2, C1 = e))

38 YouTube Videos: French, Polish, Urdu, German, Serbian, Russian
http://bit.ly/flashfill

39 Idea 2: CNF conditionals
T
C1 | C2 | C3 | … | Cn | Cn+1
s | s | s | … | s | t
v1 | v2 | … | vm | Out
s | s | … | s | t

40 No. of Consistent Expressions

41 Succinct Representation

42 Performance

43 Ranking

44 Idea 2: CNF conditionals

45

46 Related Work
Record Matching
–Similarity functions for matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
–Customizable similarity function [Arasu et al. VLDB09]
Learning Schema Matches
–iMAP [Dhamankar et al. SIGMOD04]: concatenation of column strings using domain-specific knowledge
–[Warren & Tompa VLDB06]: concatenation of column substrings, single table

47 Related Work
Query Synthesis [Das Sarma et al. ICDT10, Tran et al. SIGMOD09]
–Infer relation from large representative example view
–no joins or projections
Text-editing using examples
–QuickCode [Gulwani POPL11]: string transformations
–SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]: programming by demonstration

48 General Framework
A Domain-specific Transformation Language L
–Expressive and succinct
Efficient data structures for sets of expressions
–Version-space algebra
GenerateStr
–All sets of expressions from I-O example
Intersect
–Intersect two sets of expressions

49 Example - Lookup
Emp Record
SSN | EmpId | Name
027-36-4557 | 1254 | John Henry
034-83-7683 | 2412 | William Johnson
044-58-3429 | 1125 | Steve Russell
018-45-8949 | 4257 | Ian Jordan
023-34-3254 | 6418 | Mary Dina
Input v1 | Output
044-58-3429 | Steve Russell
023-34-3254 |
Select(Name, EmpRecord, (SSN = v1))

50 Example – Transitive Lookups
ItemRec
ItemId | Item
ST-340 | Stroller
BI-567 | Bib
DI-328 | Diapers
WI-989 | Wipes
AS-469 | Aspirator
PriceRec
ItemId | Price
ST-340 | $145.67
BI-567 | $3.56
DI-328 | $21.45
WI-989 | $5.12
AS-469 | $2.56
Input v1 | Output
Stroller | $145.67
Bib |
Aspirator |
Wipes |
Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))

51

52

53

54

55

56 Example
T_1
C1 | C2 | C3
s_1 | s_2 | s_3
T_2
C1 | C2 | C3
s_2 | s_3 | s_4
…
T_i
C1 | C2 | C3
s_i | s_{i+1} | s_{i+2}
…
T_m
Input v1 | Output
s_1 | s_m

57 Sub-expression Sharing
T_{i-2}
C1 | C2 | C3
s_{i-2} | s_{i-1} | s_i
T_{i-1}
C1 | C2 | C3
s_{i-1} | s_i | s_{i+1}

58

59

60

61

62 Current State of the Art: Help forums

63 Observations
Semantic string transformations
Input-output examples based interaction
–New disambiguating inputs
Add-in with the same interface

