7 Problem Statement. Input: an example table T. Output: project-join queries such that (valid) every row r in T is present in the query result, and (minimal) removing any edge or node from the join tree leads to an invalid query. [Figure: example table with cells Mike, Mary, ThinkPad, iPad, Office, Dropbox, Bob; two join trees, one labeled "minimal" and one labeled "not minimal", the latter including a Developer node.]
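The validity condition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout of the example table (empty cells for Mary's app and Bob's device), the token-based keyword match, and the sample query output are all hypothetical and not taken from the slides.

```python
def cell_matches(cell, value):
    """True if every keyword of the example cell occurs as a token in the value."""
    if not cell:  # an empty example cell imposes no constraint
        return True
    tokens = value.lower().split()
    return all(kw in tokens for kw in cell.lower().split())


def row_present(example_row, result_rows):
    """True if some result tuple matches every non-empty cell of the example row."""
    return any(
        all(cell_matches(c, v) for c, v in zip(example_row, result))
        for result in result_rows
    )


def is_valid(example_table, result_rows):
    """A query is valid iff every example row is present in its result."""
    return all(row_present(r, result_rows) for r in example_table)


# Assumed layout of the example table; "" marks an empty cell.
EXAMPLE_TABLE = [
    ("Mike", "ThinkPad", "Office"),
    ("Mary", "iPad", ""),
    ("Bob", "", "Dropbox"),
]

# Hypothetical output of some project-join query.
RESULT = [
    ("Mike Stone", "ThinkPad X1", "Office 2013"),
    ("Mary Lee", "iPad Air", "Evernote"),
    ("Bob Nash", "NBook Pro", "Dropbox"),
]

print(is_valid(EXAMPLE_TABLE, RESULT))      # all three example rows are covered
print(is_valid(EXAMPLE_TABLE, RESULT[:2]))  # the Bob row is missing
```

Minimality is a property of the join tree rather than of the result, so it is not covered by this check.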
8 Solution Overview. [Figure: system pipeline. Inputs: Example Table, Database Schema, Database Instance. Components: Candidate Query Generation (Candidate Projection Column Retrieval, Schema Graph Traversal), Candidate Query Verification, and an IR engine maintaining an inverted index on text columns (CI). Output: Result Queries.]
10 Candidate Query Generation: Candidate Projection Column Retrieval. For each column in the example table, find the candidate projection columns in the database that satisfy the column constraint: the database column must contain all the keywords in the input column. [Figure: example table with cells Mike, Mary, ThinkPad, iPad, Office, Dropbox, Bob.]
Input column | Candidate projection columns
A | Customer.CustName, Employee.EmpName
B | Device.DevName
C | App.AppName, ESR.Desc
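A minimal sketch of the column constraint, assuming a toy in-memory database and a simple whitespace tokenizer. The table contents, the row layout of the example table, and the function names are hypothetical; a real system would query the IR engine's inverted index instead of building one in Python.

```python
from collections import defaultdict


def build_column_index(db):
    """Map each keyword to the set of 'Table.Column' names whose values contain it."""
    index = defaultdict(set)
    for table, columns in db.items():
        for column, values in columns.items():
            for value in values:
                for token in value.lower().split():
                    index[token].add(f"{table}.{column}")
    return index


def candidate_projection_columns(example_table, index):
    """For each input column, keep the database columns containing ALL its keywords."""
    candidates = []
    for j in range(len(example_table[0])):
        keywords = [tok for row in example_table for tok in row[j].lower().split()]
        hits = [index.get(kw, set()) for kw in keywords]
        candidates.append(set.intersection(*hits) if hits else set())
    return candidates


# Hypothetical database instance; only the column names come from the slide.
DB = {
    "Customer": {"CustName": ["Mike Jones", "Mary Smith", "Bob Evans"]},
    "Employee": {"EmpName": ["Mike Stone", "Mary Lee", "Bob Nash"]},
    "Device": {"DevName": ["ThinkPad X1", "iPad Air", "NBook Pro"]},
    "App": {"AppName": ["Office 2013", "Dropbox", "Evernote"]},
    "ESR": {"Desc": ["Office crash", "Dropbox cannot sync"]},
}

# Assumed layout of the example table; "" marks an empty cell.
EXAMPLE_TABLE = [
    ("Mike", "ThinkPad", "Office"),
    ("Mary", "iPad", ""),
    ("Bob", "", "Dropbox"),
]

COLUMNS = candidate_projection_columns(EXAMPLE_TABLE, build_column_index(DB))
for name, cols in zip("ABC", COLUMNS):
    print(name, sorted(cols))
```

With this toy data the sketch reproduces the mapping in the table above: two candidates for A, one for B, and two for C.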
11 Candidate Query Generation: Candidate Query Enumeration. Follow the candidate network generation algorithm (Hristidis and Papakonstantinou, VLDB 2002). No join is required! Steps: generate join tree J; generate mapping φ; check minimality: every leaf node contains a column that is mapped by an input column. [Figure: candidate queries CQ1 to CQ5 drawn as join trees over Sales, Owner, Customer, Employee, Device, App, and ESR, with input columns A, B, C mapped to nodes; example table with cells Mike, Mary, ThinkPad, iPad, Office, Dropbox, Bob.]
Reference: V. Hristidis and Y. Papakonstantinou. Discover: Keyword search in relational databases. VLDB 2002.
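The minimality check on leaf nodes could look like the following sketch. The join-tree representation (a node list plus undirected edges) and the mapping format are assumptions made for illustration; the tree used as the example is the Owner/Employee/Device/App tree that appears in the verification queries on later slides.

```python
def is_minimal(nodes, edges, mapping):
    """Every leaf table must hold a column that some input column maps to.

    nodes:   iterable of table names in the join tree
    edges:   iterable of (table, table) pairs, undirected
    mapping: {input column: 'Table.Column'}
    """
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    mapped_tables = {col.split(".")[0] for col in mapping.values()}
    leaves = [n for n, d in degree.items() if d <= 1]
    return all(leaf in mapped_tables for leaf in leaves)


MAPPING = {"A": "Employee.EmpName", "B": "Device.DevName", "C": "App.AppName"}

TREE = (
    ["Owner", "Employee", "Device", "App"],
    [("Owner", "Employee"), ("Owner", "Device"), ("Owner", "App")],
)

# Same tree with an extra ESR leaf that no input column maps to.
TREE_WITH_UNMAPPED_LEAF = (
    TREE[0] + ["ESR"],
    TREE[1] + [("App", "ESR")],
)

print(is_minimal(*TREE, MAPPING))
print(is_minimal(*TREE_WITH_UNMAPPED_LEAF, MAPPING))
```

The second tree is not minimal because the ESR leaf could be removed without losing any projected column.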
13 Algorithm 1: VerifyAll. Iterate over candidate queries in the outer loop and rows of the example table ET in the inner loop (or vice versa), and verify whether the output of a candidate query CQ contains a row r. A candidate is valid iff its output contains all the rows in ET. Performing a (CQ, r)-verification is expensive, and VerifyAll is wasteful because most candidate queries are invalid. [Figure: example table with cells Mike, Mary, ThinkPad, iPad, Office, Dropbox, Bob.]
(CQ2, 1)-verification: SELECT TOP 1 * FROM Owner, Employee, Device, App WHERE Owner.EmpId = Employee.EmpId AND Owner.DevId = Device.DevId AND Owner.AppId = App.AppId AND CONTAINS(EmpName, 'Mike') AND CONTAINS(DevName, 'ThinkPad') AND CONTAINS(AppName, 'Office')
A non-empty result implies CQ2 satisfies row 1.
(CQ2, 2)-verification: SELECT TOP 1 * FROM Owner, Employee, Device, App WHERE Owner.EmpId = Employee.EmpId AND Owner.DevId = Device.DevId AND Owner.AppId = App.AppId AND CONTAINS(EmpName, 'Mary') AND CONTAINS(DevName, 'iPad')
An empty result implies CQ2 fails for row 2.
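A runnable sketch of VerifyAll against an in-memory SQLite database. Several things here are assumptions rather than the authors' setup: SQLite stands in for the original engine, `LIKE` replaces the full-text `CONTAINS` predicate, `LIMIT 1` replaces `TOP 1`, and the table contents and the dictionary layout of a candidate query are invented for the example.

```python
import sqlite3


def make_db():
    """Build a tiny hypothetical instance of the Owner/Employee/Device/App schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee(EmpId INTEGER, EmpName TEXT);
        CREATE TABLE Device(DevId INTEGER, DevName TEXT);
        CREATE TABLE App(AppId INTEGER, AppName TEXT);
        CREATE TABLE Owner(EmpId INTEGER, DevId INTEGER, AppId INTEGER);
        INSERT INTO Employee VALUES (1, 'Mike Stone'), (2, 'Mary Lee');
        INSERT INTO Device VALUES (1, 'ThinkPad X1'), (2, 'iPad Air');
        INSERT INTO App VALUES (1, 'Office 2013'), (2, 'Dropbox');
        INSERT INTO Owner VALUES (1, 1, 1), (2, 1, 2);
    """)
    return conn


# Candidate query: FROM list, join predicates, and one database column per input column.
CQ2 = {
    "from": "Owner, Employee, Device, App",
    "joins": ("Owner.EmpId = Employee.EmpId AND Owner.DevId = Device.DevId "
              "AND Owner.AppId = App.AppId"),
    "cols": ["Employee.EmpName", "Device.DevName", "App.AppName"],
}


def verify(conn, cq, row):
    """(CQ, r)-verification: does the candidate's output contain the example row?"""
    conditions, params = [cq["joins"]], []
    for column, cell in zip(cq["cols"], row):
        for keyword in cell.split():  # empty cells add no predicate
            conditions.append(f"{column} LIKE ?")
            params.append(f"%{keyword}%")
    sql = f"SELECT 1 FROM {cq['from']} WHERE {' AND '.join(conditions)} LIMIT 1"
    return conn.execute(sql, params).fetchone() is not None


def verify_all(conn, candidates, example_table):
    """Return the names of candidates whose output contains every example row."""
    return [name for name, cq in candidates.items()
            if all(verify(conn, cq, row) for row in example_table)]


ROW1 = ("Mike", "ThinkPad", "Office")
ROW2 = ("Mary", "iPad", "")
conn = make_db()
print(verify(conn, CQ2, ROW1))                      # non-empty result: row 1 satisfied
print(verify(conn, CQ2, ROW2))                      # empty result: row 2 fails
print(verify_all(conn, {"CQ2": CQ2}, [ROW1, ROW2]))
```

Note that `all(...)` stops at the first failing row, which already saves some verifications compared with checking every (CQ, r) pair.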
14 Opportunity of Pruning. Failure dependency: if (CQ2, 2) fails, then (CQ5, 2) fails as well, because the (CQ5, 2)-verification contains every table, join predicate, and keyword predicate of the (CQ2, 2)-verification. Verifying candidates with smaller join trees is therefore more beneficial! [Figure: example table with cells Mike, Mary, ThinkPad, iPad, Office, Dropbox, Bob.]
(CQ2, 2)-verification: SELECT TOP 1 * FROM Owner, Employee, Device, App WHERE Owner.EmpId = Employee.EmpId AND Owner.DevId = Device.DevId AND Owner.AppId = App.AppId AND CONTAINS(EmpName, 'Mary') AND CONTAINS(DevName, 'iPad')
(CQ5, 2)-verification: SELECT TOP 1 * FROM Owner, Employee, Device, App, ESR WHERE Owner.EmpId = Employee.EmpId AND Owner.DevId = Device.DevId AND Owner.AppId = App.AppId AND ESR.AppId = App.AppId AND CONTAINS(EmpName, 'Mary') AND CONTAINS(DevName, 'iPad')
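One way to test for a failure dependency is sketched here, assuming each candidate is described by its table set, its join edges, and one mapped database column per input column. The representation is hypothetical, and the mapping of input column C to ESR.Desc in CQ5 is an assumption; the slides only show that the two verifications for row 2 share the same keyword predicates.

```python
def implies_failure(failed_cq, other_cq, row):
    """True if a failed (failed_cq, row) check guarantees that (other_cq, row) fails.

    With inner joins, adding tables and predicates can only shrink the result, so
    the failure carries over when other_cq contains failed_cq's join tree and both
    map the row's non-empty cells to the same database columns.
    """
    non_empty = [i for i, cell in enumerate(row) if cell]
    same_mapping = all(failed_cq["cols"][i] == other_cq["cols"][i] for i in non_empty)
    return (failed_cq["tables"] <= other_cq["tables"]
            and failed_cq["edges"] <= other_cq["edges"]
            and same_mapping)


CQ2 = {
    "tables": {"Owner", "Employee", "Device", "App"},
    "edges": {("Owner", "Employee"), ("Owner", "Device"), ("Owner", "App")},
    "cols": ["Employee.EmpName", "Device.DevName", "App.AppName"],
}
CQ5 = {
    "tables": CQ2["tables"] | {"ESR"},
    "edges": CQ2["edges"] | {("App", "ESR")},
    "cols": ["Employee.EmpName", "Device.DevName", "ESR.Desc"],  # assumed mapping
}

ROW1 = ("Mike", "ThinkPad", "Office")
ROW2 = ("Mary", "iPad", "")

print(implies_failure(CQ2, CQ5, ROW2))  # row 2 constrains only columns A and B
print(implies_failure(CQ2, CQ5, ROW1))  # row 1 also constrains C, mapped differently
```

The dependency holds only in one direction: a failure of the larger CQ5 says nothing about CQ2.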
15 Algorithm 2: SimplePrune. Order the candidate queries by increasing join tree size. Keep a list of the CQ-row verifications performed so far that failed. Iterate over the ordered candidate queries in the outer loop and over rows in the inner loop. When verifying a candidate Q, check whether its failure is implied by a verification in the list. If so, prune Q immediately. Otherwise, verify Q for all the rows.
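The steps above can be sketched as follows. The candidate representation, the `implies_failure` test, and the `toy_verify` oracle (which simply declares that the Mary row fails) are hypothetical stand-ins; in practice `verify` would issue a SQL query per (CQ, row) pair.

```python
def implies_failure(failed_cq, other_cq, row):
    """A failed check carries over if other_cq extends failed_cq's join tree and
    both map the row's non-empty cells to the same database columns."""
    non_empty = [i for i, cell in enumerate(row) if cell]
    return (failed_cq["tables"] <= other_cq["tables"]
            and failed_cq["edges"] <= other_cq["edges"]
            and all(failed_cq["cols"][i] == other_cq["cols"][i] for i in non_empty))


def simple_prune(candidates, example_table, verify):
    """Return names of valid candidates, skipping those implied to fail."""
    ordered = sorted(candidates, key=lambda cq: len(cq["edges"]))  # small trees first
    failed = []  # (candidate, row index) pairs whose verification failed
    valid = []
    for cq in ordered:
        if any(implies_failure(f, cq, example_table[i]) for f, i in failed):
            continue  # pruned without issuing any query
        results = [verify(cq, row) for row in example_table]
        failed.extend((cq, i) for i, ok in enumerate(results) if not ok)
        if all(results):
            valid.append(cq["name"])
    return valid


# Assumed layout of the example table; "" marks an empty cell.
EXAMPLE_TABLE = [("Mike", "ThinkPad", "Office"), ("Mary", "iPad", ""), ("Bob", "", "Dropbox")]
CQ2 = {
    "name": "CQ2",
    "tables": {"Owner", "Employee", "Device", "App"},
    "edges": {("Owner", "Employee"), ("Owner", "Device"), ("Owner", "App")},
    "cols": ["Employee.EmpName", "Device.DevName", "App.AppName"],
}
CQ5 = {
    "name": "CQ5",
    "tables": CQ2["tables"] | {"ESR"},
    "edges": CQ2["edges"] | {("App", "ESR")},
    "cols": ["Employee.EmpName", "Device.DevName", "ESR.Desc"],  # assumed mapping
}

calls = []


def toy_verify(cq, row):
    """Hypothetical oracle: records the call and fails only for the Mary row."""
    calls.append((cq["name"], row))
    return row[0] != "Mary"


valid = simple_prune([CQ5, CQ2], EXAMPLE_TABLE, toy_verify)
print(valid, len(calls))
```

In this toy run CQ2 is verified on all three rows, its failure on row 2 is recorded, and CQ5 is pruned without a single verification.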
16 Observation. Limited pruning! [Figure: example table with cells Mike, Mary, ThinkPad, iPad, Office, Dropbox, Bob.]
17 Opportunity. Evaluating a common sub-structure on a certain row may prune multiple invalid candidates! [Figure: example table with cells Mike, Mary, ThinkPad, iPad, Office, Dropbox, Bob.]
22 Filter Selection Problem. Given the set of filters for all the candidate queries, select a set of filters with minimum total cost such that every candidate query is verified as valid or invalid after evaluating the selected filters. Cost of F_i: number of joins in the join tree of F_i. Problem complexity: NP-hard. Greedy algorithm: approx. ratio:
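The slide does not spell out the greedy algorithm, so the following is not the authors' method. It is the classic greedy heuristic for weighted set cover, swapped in to illustrate the cost/benefit trade-off, under a strong simplifying assumption: each filter is treated as deciding a fixed, known set of candidates regardless of its outcome. Filter names, costs, and coverage sets are hypothetical.

```python
def greedy_filter_selection(filters, candidates):
    """Pick filters by best (newly decided candidates / cost) ratio until all are decided.

    filters:    {filter name: (cost, set of candidate names it can decide)}
    candidates: iterable of candidate names
    """
    undecided = set(candidates)
    chosen = []

    def score(name):
        cost, covers = filters[name]
        gain = len(covers & undecided)
        if gain == 0:
            return 0.0
        return float("inf") if cost == 0 else gain / cost  # zero-join filters are free

    while undecided:
        remaining = [name for name in filters if name not in chosen]
        best = max(remaining, key=score, default=None)
        if best is None or score(best) == 0.0:
            raise ValueError("some candidates cannot be decided by any filter")
        chosen.append(best)
        undecided -= filters[best][1]
    return chosen


# Hypothetical filters: cost is the number of joins in the filter's join tree.
FILTERS = {
    "F1": (2, {"CQ2", "CQ5"}),  # shared sub-structure of CQ2 and CQ5
    "F2": (3, {"CQ2"}),
    "F3": (4, {"CQ5"}),
    "F4": (2, {"CQ1"}),
}

SELECTED = greedy_filter_selection(FILTERS, ["CQ1", "CQ2", "CQ5"])
print(SELECTED, sum(FILTERS[f][0] for f in SELECTED))
```

The shared filter F1 is picked first because it decides two candidates at the price of one evaluation, which mirrors the idea of evaluating common sub-structures.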
24 Experiment Settings. Dataset: IMDB. Example table generation parameters: #rows, #columns, sparsity, value length for non-empty cells. Implementations: VerifyAll, SimplePrune, Filter, Weave. Measures: number of verifications performed, execution time.
Reference: L. Qian, M. J. Cafarella, and H. V. Jagadish. Sample-driven schema mapping. SIGMOD 2012.
25 Results on Various Example Tables. Vary #rows. Filter performs 5X fewer verifications than VerifyAll and 2X fewer than SimplePrune. Filter is robust to #rows, i.e., it requires a similar number of verifications as #rows varies. Filter runs 4X faster than VerifyAll and 3X faster than SimplePrune.
26 Comparison with Weave. Filter requires 10X fewer verifications than Weave. Filter runs 4X faster than Weave.
27 Conclusion. Develop a new search interface for discovering queries. Address challenges in query discovery: verify candidate queries efficiently; filter selection problem; greedy strategy.