Discovering Queries based on Example Tuples Yanyan Shen 1, Kaushik Chakrabarti 2, Surajit Chaudhuri 2, Bolin Ding 2, Lev Novik 2 1 National University of.


1 Discovering Queries based on Example Tuples
Yanyan Shen (1), Kaushik Chakrabarti (2), Surajit Chaudhuri (2), Bolin Ding (2), Lev Novik (2)
(1) National University of Singapore, (2) Microsoft Corporation

2 Complex Database Schema

3 Challenge: Querying Complex Databases
SQL:
    SELECT CustName, DevName, AppName
    FROM Customer, Sales, Device, App
    WHERE Sales.CustId = Customer.CustId
      AND Sales.DevId = Device.DevId
      AND Sales.AppId = App.AppId
Annotations: target schema (SELECT list), relevant tables (FROM clause), join path (WHERE clause).
Any help to formulate a SQL query?

4 Can Keyword Search Help?

Customer
CustId  CustName
c1      Mike Jones
c2      Mary Smith
c3      Bob Evans

Employee
EmpId  EmpName
e1     Mike Stone
e2     Mary Lee
e3     Bob Nash

Device
DevId  DevName
d1     ThinkPad X1
d2     iPad Air
d3     Nexus 7

App
AppId  AppName
a1     Office 2013
a2     Evernote
a3     Dropbox

Sales
SId  CustId  DevId  AppId
s1   c1      d1     a1
s2   c2      d2     a2
s3   c3      d3     a3

Owner
OId  EmpId  DevId  AppId
o1   e1     d1     a1
o2   e2     d3     a3
o3   e3     d2     a2

ESR
ESRId  EmpId  AppId  Desc
sr1    e1     a1     Office crash
sr2    e2     a3     Dropbox can’t sync

Input: Mike ThinkPad Office (*search for sales tuples)
Output: matched rows
– Mike Jones, s1, ThinkPad X1, Office 2013
– Mike Stone, o1, ThinkPad X1, Office 2013
Problems:
– Where is schema information?
– Ambiguity

5 Our Proposal
Input (example table): *Who bought which product with which app installed.

A     B         C
Mike  ThinkPad  Office
Mary  iPad
Bob             Dropbox

Output (project-join query): Sales joined with Customer (A: CustName), Device (B: DevName), and App (C: AppName).

6 Roadmap
– Motivation & proposal
– Problem statement
– Solution
  – Candidate query generation
  – Candidate query verification: VerifyAll, SimplePrune, Filter
– Experimental results
– Conclusion

7 Problem Statement
[Figure: the example table with candidate queries labeled "minimal" and "not minimal"; the table name Developer also appears in the figure.]

8 Solution Overview
– Candidate query generation: example table → candidate projection column retrieval → schema graph traversal (uses the database schema)
– Candidate query verification (uses the database instance) → result queries
– IR engine maintaining an inverted index on text columns (CI)

9 Roadmap
– Motivation & proposal
– Problem statement
– Solution
  – Candidate query generation
  – Candidate query verification: VerifyAll, SimplePrune, Filter
– Experimental results
– Conclusion

10 Candidate Query Generation
Candidate Projection Column Retrieval
– For each column in the example table, find candidate projection columns in the database satisfying the column constraint: the database column must contain all the keywords in the example column.

Input column  Candidate projection columns
A             Customer.CustName, Employee.EmpName
B             Device.DevName
C             App.AppName, ESR.Desc
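The column constraint can be illustrated with a minimal Python sketch. It assumes the sample text columns from slide 4 and the example table from slide 5 (the placement of the empty cells is reconstructed from later slides), and it treats "contains a keyword" as whole-token containment in some value of the column. The names `TEXT_COLUMNS`, `EXAMPLE_TABLE`, `cell_matches`, and `candidate_projection_columns` are hypothetical; the slides describe an IR engine with an inverted index rather than the linear scan used here.

```python
# Text columns of the sample database (slide 4), keyed by table.column.
TEXT_COLUMNS = {
    "Customer.CustName": ["Mike Jones", "Mary Smith", "Bob Evans"],
    "Employee.EmpName": ["Mike Stone", "Mary Lee", "Bob Nash"],
    "Device.DevName": ["ThinkPad X1", "iPad Air", "Nexus 7"],
    "App.AppName": ["Office 2013", "Evernote", "Dropbox"],
    "ESR.Desc": ["Office crash", "Dropbox can't sync"],
}

# Example table (slide 5); None marks an empty cell (assumed layout).
EXAMPLE_TABLE = {
    "A": ["Mike", "Mary", "Bob"],
    "B": ["ThinkPad", "iPad", None],
    "C": ["Office", None, "Dropbox"],
}


def cell_matches(value, cell):
    """True if every token of the example cell occurs as a token of the value."""
    return set(cell.lower().split()) <= set(value.lower().split())


def candidate_projection_columns(example_table, text_columns):
    """Map each example column to the database columns containing all its keywords."""
    result = {}
    for col, cells in example_table.items():
        keywords = [c for c in cells if c is not None]
        result[col] = sorted(
            name
            for name, values in text_columns.items()
            if all(any(cell_matches(v, k) for v in values) for k in keywords)
        )
    return result


candidates = candidate_projection_columns(EXAMPLE_TABLE, TEXT_COLUMNS)
for col, names in candidates.items():
    print(col, names)
```

On the sample data this reproduces the mapping in the table above: two candidates for A, one for B, and two for C.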

11 Candidate Query Generation
Candidate Query Enumeration
– Follow the candidate network generation algorithm [1]
[Figure: join trees of five candidate queries, CQ1 to CQ5, over the tables Sales, Customer, Device, App, Owner, Employee, and ESR, with A, B, and C marking the projection columns. CQ1 joins Sales with Customer (A), Device (B), and App (C). One annotation reads "No join is required!"]
[1] V. Hristidis and Y. Papakonstantinou. Discover: Keyword search in relational databases. VLDB

12 Roadmap
– Motivation & proposal
– Problem statement
– Solution
  – Candidate query generation
  – Candidate query verification: VerifyAll, SimplePrune, Filter
– Experimental results
– Conclusion

13 Algorithm 1: VerifyAll
[Figure: example table with rows (Mike, ThinkPad, Office), (Mary, iPad) and (Bob, Dropbox)]
– Performing (CQ, r)-verification is expensive!
– VerifyAll is wasteful, as most candidate queries are invalid!
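A rough sketch of what a (CQ, r)-verification and the VerifyAll loop might look like, using Python's built-in `sqlite3` module on the sample data from slide 4. Several things here are assumptions rather than the authors' implementation: keyword containment is approximated with SQL `LIKE`, only two candidates (CQ1 over Sales and CQ2 over Owner) are spelled out, the loop stops checking a candidate at its first failing row, and all names (`CANDIDATES`, `ROWS`, `verify`, `verify_all`) are hypothetical.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE Customer(CustId TEXT, CustName TEXT);
CREATE TABLE Employee(EmpId TEXT, EmpName TEXT);
CREATE TABLE Device(DevId TEXT, DevName TEXT);
CREATE TABLE App(AppId TEXT, AppName TEXT);
CREATE TABLE Sales(SId TEXT, CustId TEXT, DevId TEXT, AppId TEXT);
CREATE TABLE Owner(OId TEXT, EmpId TEXT, DevId TEXT, AppId TEXT);
INSERT INTO Customer VALUES ('c1','Mike Jones'),('c2','Mary Smith'),('c3','Bob Evans');
INSERT INTO Employee VALUES ('e1','Mike Stone'),('e2','Mary Lee'),('e3','Bob Nash');
INSERT INTO Device VALUES ('d1','ThinkPad X1'),('d2','iPad Air'),('d3','Nexus 7');
INSERT INTO App VALUES ('a1','Office 2013'),('a2','Evernote'),('a3','Dropbox');
INSERT INTO Sales VALUES ('s1','c1','d1','a1'),('s2','c2','d2','a2'),('s3','c3','d3','a3');
INSERT INTO Owner VALUES ('o1','e1','d1','a1'),('o2','e2','d3','a3'),('o3','e3','d2','a2');
"""

# Two candidate project-join queries: a join expression plus the
# database column that each example column (A, B, C) is mapped to.
CANDIDATES = {
    "CQ1": {
        "from": "Sales JOIN Customer ON Sales.CustId = Customer.CustId "
                "JOIN Device ON Sales.DevId = Device.DevId "
                "JOIN App ON Sales.AppId = App.AppId",
        "cols": {"A": "Customer.CustName", "B": "Device.DevName", "C": "App.AppName"},
    },
    "CQ2": {
        "from": "Owner JOIN Employee ON Owner.EmpId = Employee.EmpId "
                "JOIN Device ON Owner.DevId = Device.DevId "
                "JOIN App ON Owner.AppId = App.AppId",
        "cols": {"A": "Employee.EmpName", "B": "Device.DevName", "C": "App.AppName"},
    },
}

# Example table rows; None marks an empty cell (assumed layout).
ROWS = [
    {"A": "Mike", "B": "ThinkPad", "C": "Office"},
    {"A": "Mary", "B": "iPad", "C": None},
    {"A": "Bob", "B": None, "C": "Dropbox"},
]


def verify(con, cq, row, counter):
    """(CQ, r)-verification: does some output tuple of cq contain row r?"""
    counter[0] += 1
    conds, params = [], []
    for col, cell in row.items():
        if cell is not None:
            conds.append(f"{cq['cols'][col]} LIKE ?")
            params.append(f"%{cell}%")
    sql = f"SELECT 1 FROM {cq['from']} WHERE {' AND '.join(conds)} LIMIT 1"
    return con.execute(sql, params).fetchone() is not None


def verify_all(con, candidates, rows):
    """Check every candidate against the rows; return valid names and #verifications."""
    counter = [0]
    valid = [name for name, cq in candidates.items()
             if all(verify(con, cq, r, counter) for r in rows)]
    return valid, counter[0]


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA_AND_DATA)
valid, n_verifications = verify_all(con, CANDIDATES, ROWS)
print(valid, n_verifications)
```

Each verification is a join query against the database instance, which is why the number of verifications is the cost that the later algorithms try to reduce.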

14 Opportunity of Pruning
[Figure: example table and candidate join trees]
– (CQ2, 2) fails implies (CQ5, 2) fails
– Failure dependency: verifying candidates with smaller join trees is more beneficial!

15 Algorithm 2: SimplePrune
– Order candidate queries by increasing join tree size.
– Keep a list of the CQ-row verifications performed so far that failed.
– Iterate over the ordered candidate queries in the outer loop and over rows in the inner loop.
  – When verifying candidate Q, check whether its failure is implied by the verifications in the list. If so, prune Q immediately. Otherwise, verify Q for all the rows.
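The steps above can be sketched in Python as follows. This is a simplified reading, not the authors' code: a candidate is modeled as a set of join edges plus a column mapping, a recorded failure of Q' is taken to imply failure of Q when Q' is a sub-join tree of Q with the same column mapping, and verification stops at a candidate's first failing row. The `Candidate` class, the `verify` callback, and the exact shape of CQ5 (CQ2 plus an assumed ESR join) are hypothetical.

```python
from dataclasses import dataclass


def edge(a, b):
    """An undirected join edge between two tables."""
    return frozenset((a, b))


@dataclass(frozen=True)
class Candidate:
    name: str
    edges: frozenset  # join edges of the join tree
    cols: tuple       # ((example column, database column), ...)


def implies_failure(failed, q):
    """A failed candidate that is a sub-join tree of q with the same mapping."""
    return failed.edges <= q.edges and failed.cols == q.cols


def simple_prune(candidates, n_rows, verify):
    failed = []  # (candidate, row) verifications that failed so far
    valid = []
    for q in sorted(candidates, key=lambda c: len(c.edges)):
        if any(implies_failure(fq, q) for fq, _ in failed):
            continue  # pruned without touching the database
        for r in range(n_rows):
            if not verify(q, r):
                failed.append((q, r))
                break
        else:
            valid.append(q.name)
    return valid


OWNER_COLS = (("A", "Employee.EmpName"), ("B", "Device.DevName"), ("C", "App.AppName"))
SALES_COLS = (("A", "Customer.CustName"), ("B", "Device.DevName"), ("C", "App.AppName"))
CQ1 = Candidate("CQ1", frozenset({edge("Sales", "Customer"), edge("Sales", "Device"),
                                  edge("Sales", "App")}), SALES_COLS)
CQ2 = Candidate("CQ2", frozenset({edge("Owner", "Employee"), edge("Owner", "Device"),
                                  edge("Owner", "App")}), OWNER_COLS)
# Assumed shape: CQ2 extended with one more join through ESR.
CQ5 = Candidate("CQ5", CQ2.edges | {edge("ESR", "Employee")}, OWNER_COLS)

calls = []


def fake_verify(q, r):
    """Stand-in for a database check: only (CQ2, second row) fails."""
    calls.append((q.name, r))
    return (q.name, r) != ("CQ2", 1)


valid = simple_prune([CQ5, CQ1, CQ2], n_rows=3, verify=fake_verify)
print(valid, calls)
```

In this run CQ5 is never verified: the recorded failure of CQ2 on the second row is enough to discard it.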

16 Observation
[Figure: example table and candidate join trees]
– Limited pruning!

17 Opportunity
[Figure: example table and candidate join trees]
– Evaluating a common sub-structure on a certain row may prune multiple invalid candidates!

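18 Filter
[Figure: two filters, F1 and F2, each drawn over the sub-join tree Owner, Employee, Device with projection columns A and B; one is shown with the example row (Mike, ThinkPad, Office) and the other with the example row (Mary, iPad).]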

19 Dependency Properties of Filters
– Filter-candidate dependency
– Inter-filter failure dependency
– Inter-filter success dependency
[Figure: filters F1 and F2 on the example row (Mary, iPad); one sub-join tree covers Owner, Employee, Device (columns A, B) and the other extends it with App (column C).]
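One way to read the three dependencies is as propagation rules over sub-join trees, sketched below in Python. This is an interpretation under stated assumptions, not the authors' formulation: each tree is modeled as a set of join edges and column bindings, and "is a sub-structure of" is plain set containment. The names `J_AB`, `J_ABC`, and `record_outcome` are hypothetical.

```python
# A tree is a frozenset of join edges ("Owner-Employee") and
# column bindings ("A:Employee.EmpName"); containment means sub-structure.
J_AB = frozenset({"Owner-Employee", "Owner-Device",
                  "A:Employee.EmpName", "B:Device.DevName"})
J_ABC = J_AB | {"Owner-App", "C:App.AppName"}
FILTERS = {"J_AB": J_AB, "J_ABC": J_ABC}

CANDIDATES = {
    "CQ1": frozenset({"Sales-Customer", "Sales-Device", "Sales-App",
                      "A:Customer.CustName", "B:Device.DevName", "C:App.AppName"}),
    "CQ2": J_ABC,
}


def record_outcome(name, row, success, filters, candidates, outcomes, pruned):
    """Store one filter evaluation and everything it implies."""
    tree = filters[name]
    outcomes[(name, row)] = success
    for other, other_tree in filters.items():
        if other == name:
            continue
        if not success and tree <= other_tree:
            # Inter-filter failure dependency: a larger filter fails too.
            outcomes[(other, row)] = False
        elif success and other_tree <= tree:
            # Inter-filter success dependency: a smaller filter succeeds too.
            outcomes[(other, row)] = True
    if not success:
        # Filter-candidate dependency: candidates containing the filter are invalid.
        pruned.update(cq for cq, cq_tree in candidates.items() if tree <= cq_tree)


outcomes, pruned = {}, set()
# Row 2 is (Mary, iPad): no employee named Mary owns an iPad in the sample data.
record_outcome("J_AB", 2, False, FILTERS, CANDIDATES, outcomes, pruned)
# Row 1 is (Mike, ThinkPad, Office): Mike Stone owns a ThinkPad X1 with Office 2013.
record_outcome("J_ABC", 1, True, FILTERS, CANDIDATES, outcomes, pruned)
print(outcomes, pruned)
```

A single failed evaluation of the small filter settles the larger filter on that row and prunes CQ2, which is the kind of saving that evaluating common sub-structures is meant to deliver.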

20 Adaptive Filter Selection
[Figure: candidate queries CQ2, CQ3, and CQ4 and four shared sub-join trees J1 to J4, drawn over ESR, App, Employee (columns A, C); Owner, Employee, Device (A, B); Owner, App, Device (B, C); and Owner, Employee, App (A, C). Filters (Ji, r) are shown for rows r = 1, 2, 3.]
– 5 evaluations!

21 Adaptive Filter Selection
[Figure: the same candidate queries CQ2, CQ3, CQ4 and filters (Ji, r), with a different choice of filters to evaluate.]
– 2 evaluations!

22 Filter Selection Problem

23 Roadmap
– Motivation
– Problem statement
– Solution
  – Candidate query generation
  – Candidate query verification: VerifyAll, SimplePrune, Filter
– Experimental results
– Conclusion

24 Experiment Settings
– Dataset: IMDB
– Example table generation
  – Parameters: #rows, #columns, sparsity, value length for non-empty cells
– Implementations
  – VerifyAll
  – SimplePrune
  – Filter
  – Weave [2]
– Measures
  – Number of verifications performed
  – Execution time
[2] L. Qian, M. J. Cafarella, and H. V. Jagadish. Sample-driven schema mapping. SIGMOD 2012.

25 Results on Various Example Tables
Vary #rows
– Filter performs 5X fewer verifications than VerifyAll and 2X fewer than SimplePrune
– Filter is robust to #rows, i.e., it requires a similar number of verifications
– Filter runs 4X faster than VerifyAll and 3X faster than SimplePrune

26 Comparison with Weave
– Filter requires 10X fewer verifications
– Filter runs 4X faster than Weave

27 Conclusion
– Develop a new search interface for discovering queries
– Address challenges in query discovery
  – Verify candidate queries efficiently
  – Filter selection problem
  – Greedy strategy

28 Thanks! Q&A

