# Relaxing Join and Selection Queries

Rares Vernica, UC Irvine, USA. Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung.


Slide 2: Query Example

Query: `SELECT * FROM Jobs J, Candidates C WHERE J.Salary <= 95 AND J.Zipcode = C.Zipcode AND C.WorkExp >= 5;`

Jobs

| ID | Company | Zipcode | Salary |
| --- | --- | --- | --- |
| J1 | Broadcom | 92047 | 80 |
| J2 | Intel | 93652 | 95 |
| J3 | Microsoft | 82632 | 120 |
| J4 | IBM | 90391 | 130 |
| ... | ... | ... | ... |

Candidates

| ID | Zipcode | ExpSalary | WorkExp |
| --- | --- | --- | --- |
| C1 | 93652 | 120 | 3 |
| C2 | 92612 | 130 | 6 |
| C3 | 82632 | 100 | 5 |
| C4 | 90391 | 150 | 1 |
| ... | ... | ... | ... |
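
As a quick check of the example, the query can be run against the four visible rows of each table. This is a minimal sketch using Python's standard `sqlite3` module; the column types and the `run_query` helper are assumptions made for illustration, and the elided rows ("...") of the slide are not included.

```python
import sqlite3

# Visible sample rows copied from the Jobs and Candidates tables on the slide.
JOBS = [("J1", "Broadcom", "92047", 80), ("J2", "Intel", "93652", 95),
        ("J3", "Microsoft", "82632", 120), ("J4", "IBM", "90391", 130)]
CANDIDATES = [("C1", "93652", 120, 3), ("C2", "92612", 130, 6),
              ("C3", "82632", 100, 5), ("C4", "90391", 150, 1)]

QUERY = """
SELECT * FROM Jobs J, Candidates C
WHERE J.Salary <= 95
  AND J.Zipcode = C.Zipcode
  AND C.WorkExp >= 5;
"""


def run_query():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the slide shows only names and values.
    conn.execute("CREATE TABLE Jobs (ID TEXT, Company TEXT, Zipcode TEXT, Salary INTEGER)")
    conn.execute("CREATE TABLE Candidates (ID TEXT, Zipcode TEXT, ExpSalary INTEGER, WorkExp INTEGER)")
    conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?, ?)", JOBS)
    conn.executemany("INSERT INTO Candidates VALUES (?, ?, ?, ?)", CANDIDATES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(run_query())
```

On these rows the result is empty: only J1 and J2 satisfy the salary condition, J1 has no candidate in its zipcode, and J2's only match (C1) has 3 years of work experience.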

Slide 3: What if the query answer is empty?

Query: `SELECT * FROM Jobs J, Candidates C WHERE J.Salary <= 95 AND J.Zipcode = C.Zipcode AND C.WorkExp >= 5;`

Adjust the conditions:

- What conditions to adjust?
- How to adjust them?

Slide 4: Example Percentages of Empty Result Queries

- In a Customer Relationship Management (CRM) application developed by IBM: 18.07% (3,396 empty result queries in 18,793 queries)
- In a real estate application developed by IBM: 5.75%
- In a digital library application [JCM+00]: 10.53%
- In a bioinformatics application [RCP+98]: 38%

Source: Gang Luo (IBM T.J. Watson Research Center, USA), "Efficient Detection of Empty-Result Queries", VLDB 2006, p. 1015.

Slide 5: Observations

(Jobs and Candidates tables as on the Query Example slide, with the selections Salary <= 95 and WorkExp >= 5.)

Different ways to adjust the conditions:

- Select vs. Join
- How much to adjust each condition? Salary <= 100 vs. Salary <= 120
- Adjust join vs. adjust both selections
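
The effect of different adjustments can be tried on the four visible rows. The sketch below is illustrative only: the `answer` helper and its parameter names are hypothetical, and it relaxes just the two selection thresholds while keeping the join as an exact zipcode match.

```python
# Visible sample rows from the slide: id -> (zipcode, value).
JOBS = {"J1": ("92047", 80), "J2": ("93652", 95),
        "J3": ("82632", 120), "J4": ("90391", 130)}      # value = Salary
CANDIDATES = {"C1": ("93652", 3), "C2": ("92612", 6),
              "C3": ("82632", 5), "C4": ("90391", 1)}    # value = WorkExp


def answer(max_salary=95, min_work_exp=5):
    """Evaluate the query with (possibly relaxed) selection thresholds."""
    return [(j, c)
            for j, (j_zip, salary) in JOBS.items()
            for c, (c_zip, work_exp) in CANDIDATES.items()
            if salary <= max_salary and j_zip == c_zip and work_exp >= min_work_exp]


print(answer())                  # original query
print(answer(max_salary=100))    # small relaxation of Salary
print(answer(max_salary=120))    # larger relaxation of Salary
print(answer(min_work_exp=3))    # relaxing WorkExp instead
```

On these rows, Salary <= 100 still returns nothing, Salary <= 120 returns (J3, C3), and WorkExp >= 3 returns (J2, C1), so both the choice of condition and the amount of adjustment matter.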

Slide 6: Contributions

- Query relaxation framework for selections and joins
- Lattice-based approach for query relaxation
- Efficient relaxation algorithms

Slide 7: Overview

1. Motivation
2. Query Relaxation
3. Lattice-based Relaxation
4. Relaxation Algorithms
5. Variations
6. Experiments

Slide 8: Query Relaxation

- Top-k / Nearest neighbor
  - Weight for each condition
- Skyline
  - No weights are needed
  - Conditions are not considered equal
  - Return non-dominated points
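
The notion of non-dominated points can be sketched in a few lines. This is a naive quadratic version for illustration, not one of the algorithms from the talk; it assumes every dimension is a cost where smaller is better, and the sample points are made up.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Smaller values are treated as better (an assumption of this sketch).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up two-dimensional points, not data from the slides.
points = [(1, 5), (2, 2), (4, 1), (3, 3), (5, 5)]
print(skyline(points))  # [(1, 5), (2, 2), (4, 1)]
```

No weights appear anywhere: (3, 3) is dropped because (2, 2) is better in both dimensions, while (1, 5) and (4, 1) both survive because neither is better than the other in every dimension.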

Slide 9: Query Relaxation: Skyline

Reference: Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001


Slide 11: Lattice-based Relaxation

(Jobs and Candidates tables as on the Query Example slide, with the selections Salary <= 95 and WorkExp >= 5.)

Legend:

- R: select on Jobs
- J: join condition
- S: select on Candidates
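
The slide's figure is not preserved in this transcript. One plausible reading, stated here as an assumption, is that each lattice node names the subset of the conditions R, J, and S that is allowed to be relaxed. Under that reading the nodes can be enumerated level by level:

```python
from itertools import combinations

CONDITIONS = ("R", "J", "S")  # select on Jobs, join condition, select on Candidates


def lattice_levels(conditions):
    """Return the subsets of conditions grouped by how many are relaxed."""
    return [list(combinations(conditions, k)) for k in range(len(conditions) + 1)]


for level, nodes in enumerate(lattice_levels(CONDITIONS)):
    labels = ["".join(node) or "none" for node in nodes]
    print(level, labels)
```

With three conditions this gives eight nodes: the original query (nothing relaxed), three single-condition relaxations, three pairs, and RJS, where every condition may be relaxed.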


Slide 13: Relaxing Selection Conditions

(Jobs and Candidates tables as on the Query Example slide, with the selections Salary <= 95 and WorkExp >= 5.)

Algorithm (INCORRECT):

1. Compute Skyline on Jobs
2. Compute Skyline on Candidates
3. Join the Skylines

Result in the example: the join of the two Skylines is empty.
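
The failure can be reproduced on the four visible rows. This sketch assumes, for illustration, that the relaxation needed by a row is the amount by which it misses the threshold (`max(0, Salary - 95)` for Jobs, `max(0, 5 - WorkExp)` for Candidates); the helper names are hypothetical.

```python
# Visible sample rows from the slides: id -> (zipcode, value).
JOBS = {"J1": ("92047", 80), "J2": ("93652", 95),
        "J3": ("82632", 120), "J4": ("90391", 130)}      # value = Salary
CANDIDATES = {"C1": ("93652", 3), "C2": ("92612", 6),
              "C3": ("82632", 5), "C4": ("90391", 1)}    # value = WorkExp


def dominates(a, b):
    """Smaller relaxation is better: a dominates b if never worse, once better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_ids(relaxation):
    """Ids whose relaxation vector is not dominated by any other vector."""
    return [i for i, v in relaxation.items()
            if not any(dominates(w, v) for w in relaxation.values())]


# Assumed relaxation measure: distance past the selection threshold.
job_relax = {j: (max(0, salary - 95),) for j, (_, salary) in JOBS.items()}
cand_relax = {c: (max(0, 5 - exp),) for c, (_, exp) in CANDIDATES.items()}

job_sky = skyline_ids(job_relax)      # step 1: Skyline on Jobs
cand_sky = skyline_ids(cand_relax)    # step 2: Skyline on Candidates
joined = [(j, c) for j in job_sky for c in cand_sky
          if JOBS[j][0] == CANDIDATES[c][0]]   # step 3: join the Skylines

print(job_sky, cand_sky, joined)
```

The local Skylines keep only J1, J2 and C2, C3, none of which share a zipcode, so the join is empty even though joinable pairs such as (J2, C1) need only a small relaxation.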

Slide 14: Relaxing Selection Conditions

(Jobs and Candidates tables as on the Query Example slide, with the selections Salary <= 95 and WorkExp >= 5.)

Join First Algorithm:

1. Compute the join (disregarding the selections)
2. Compute Skyline on join results
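
The two steps of the Join First algorithm can be sketched on the four visible rows. As before, the relaxation measure (distance past each selection threshold) and the function names are assumptions for illustration, not the authors' implementation.

```python
# Visible sample rows from the slides: id -> (zipcode, value).
JOBS = {"J1": ("92047", 80), "J2": ("93652", 95),
        "J3": ("82632", 120), "J4": ("90391", 130)}      # value = Salary
CANDIDATES = {"C1": ("93652", 3), "C2": ("92612", 6),
              "C3": ("82632", 5), "C4": ("90391", 1)}    # value = WorkExp


def dominates(a, b):
    """Smaller relaxation is better: a dominates b if never worse, once better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def join_first(jobs, candidates, max_salary=95, min_work_exp=5):
    # Step 1: compute the join on Zipcode, disregarding the selections.
    joined = []
    for j, (j_zip, salary) in jobs.items():
        for c, (c_zip, exp) in candidates.items():
            if j_zip == c_zip:
                # Relaxation each selection would need to admit this pair.
                vec = (max(0, salary - max_salary), max(0, min_work_exp - exp))
                joined.append((j, c, vec))
    # Step 2: compute the Skyline over the relaxation vectors of the join results.
    return [e for e in joined
            if not any(dominates(other[2], e[2]) for other in joined)]


print(join_first(JOBS, CANDIDATES))
# [('J2', 'C1', (0, 2)), ('J3', 'C3', (25, 0))]
```

The pair (J4, C4) needs relaxations (35, 4) and is dominated by both other pairs, so it is left out; the two remaining pairs trade a salary relaxation against a work-experience relaxation.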

Slide 15: Relaxing Selection Condition Variations

- Pruning Join
  - Build the Skyline during the join
- Pruning Join+
  - Pruning Join
  - Build the local Skyline before the join
- Sorted Access Join
  - Fagin's Top-k: sort the columns on relaxation
  - Compute the join Skyline
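
The slide gives only one line per variation, so the following is one possible reading of "build the Skyline during the join" rather than the algorithm from the talk: each joined pair is checked against the current Skyline as soon as it is produced, so dominated pairs are never stored. The hash join, the relaxation measure, and all names are assumptions.

```python
from collections import defaultdict

# Visible sample rows from the slides: id -> (zipcode, value).
JOBS = {"J1": ("92047", 80), "J2": ("93652", 95),
        "J3": ("82632", 120), "J4": ("90391", 130)}      # value = Salary
CANDIDATES = {"C1": ("93652", 3), "C2": ("92612", 6),
              "C3": ("82632", 5), "C4": ("90391", 1)}    # value = WorkExp


def dominates(a, b):
    """Smaller relaxation is better: a dominates b if never worse, once better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pruning_join(jobs, candidates, max_salary=95, min_work_exp=5):
    # Build a hash table on the join attribute (Zipcode) of Candidates.
    by_zip = defaultdict(list)
    for c, (c_zip, exp) in candidates.items():
        by_zip[c_zip].append((c, exp))

    sky = []  # current Skyline: (job id, candidate id, relaxation vector)
    for j, (j_zip, salary) in jobs.items():
        for c, exp in by_zip.get(j_zip, []):
            vec = (max(0, salary - max_salary), max(0, min_work_exp - exp))
            if any(dominates(v, vec) for _, _, v in sky):
                continue  # prune: this pair can never be in the Skyline
            # The new pair may knock out pairs kept so far.
            sky = [e for e in sky if not dominates(vec, e[2])]
            sky.append((j, c, vec))
    return sky


print(pruning_join(JOBS, CANDIDATES))
```

On the sample rows this yields the same two pairs as computing the full join first, (J2, C1) and (J3, C3), but (J4, C4) is discarded as soon as it is generated.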

Slide 16: Relaxing all conditions

Multi-Dim.-Index-based-Relaxation Algorithm:

1. Traverse the index structure top-down
2. Form pairs of nodes or records
3. Build the Skyline

Figure labels: Skyline, Queue.


Slide 18: Variations

- Computing Top-k over Skyline
  - Weight to each condition
- Queries with multiple joins
- Conditions on nonnumeric attributes
  - Dominance checking function
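
Top-k over the Skyline can be sketched as ranking the Skyline entries by a weighted sum of their relaxations. The scoring function, the weights, and the `top_k` helper are assumptions for illustration; the two entries are the Skyline pairs obtained from the four visible sample rows when relaxation is measured as distance past each selection threshold.

```python
import heapq

# Skyline entries for the sample rows: (job id, candidate id, relaxation vector),
# where the vector is (Salary relaxation, WorkExp relaxation).
SKYLINE = [("J2", "C1", (0, 2)), ("J3", "C3", (25, 0))]


def top_k(skyline_entries, weights, k):
    """Return the k entries with the smallest weighted relaxation (assumed score)."""
    def score(entry):
        return sum(w * r for w, r in zip(weights, entry[2]))
    return heapq.nsmallest(k, skyline_entries, key=score)


# Hypothetical weights: one year of experience counts as 10 salary units.
print(top_k(SKYLINE, weights=(1.0, 10.0), k=1))  # [('J2', 'C1', (0, 2))]
# With experience weighted more heavily, the other pair ranks first.
print(top_k(SKYLINE, weights=(1.0, 20.0), k=1))  # [('J3', 'C3', (25, 0))]
```

The weights only order points that are already non-dominated, so changing them changes the ranking but never brings back a dominated pair.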


Slide 20: Experimental Setting

- Datasets
  - Real
    1. Internet Movie Database (IMDB): Movies (120k) and ActorInMovies (1.2m)
    2. Census-Income, UCI KDD Repository: Census (200k)
  - Synthetic: Independent, Correlated, and Anticorrelated
- Implementation
  - GNU C++
  - Spatial Index Library (R-tree)
  - Linux, AMD Opteron 240, 1GB RAM

Slide 21: Different algorithms, different behaviors (figure: IMDB Dataset)

Slide 22: Different datasets, different behaviors (figure panels: Correlated Dataset, Anticorrelated Dataset, Independent Dataset)

Slide 23: How big is the Skyline? (figure)

Slide 24: Relaxing join takes time (figure: self-join on Census Dataset)

Slide 25: Top-k over Skyline (figure: IMDB Dataset)

Slide 26: Related Work

- Muslea et al.
  - Alternate forms of conjunctive expressions
- Efficient Skyline algorithms
  - Selection queries
- Efficient Top-k algorithms
  - Require weights for conditions

Slide 27: Conclusions

- Query relaxation framework for selections and joins
- Lattice-based approach for query relaxation
- Efficient relaxation algorithms

Slide 28: Future Work

- Optimum use of the lattice structure
- Relax conditions on string attributes
- Algorithms applicable outside the databases

Questions?


Slide 31: Skyline vs. Top-k (figure)

Slide 32: Skyline vs. Top-k over Skyline (figure)
