Relaxing Join and Selection Queries

1 Relaxing Join and Selection Queries
Rares Vernica, UC Irvine, USA. Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung.

2 Query Example
SELECT * FROM Jobs J, Candidates C WHERE J.Salary <= 95 AND J.Zipcode = C.Zipcode AND C.WorkExp >= 5;
Motivating story: companies are very eager to find the people they want. We assume that closer zip codes mean closer areas.
Jobs
ID   Company     Zipcode   Salary
J1   Broadcom    92047     80
J2   Intel                 95
J3   Microsoft   82632
J4   IBM         90391
...
Candidates
ID   Zipcode   ExpSalary   WorkExp
C1   93652     120         3
C2   92612     130         6
C3             100         5
C4             150         1
...

3 What if the query answer is empty?
SELECT * FROM Jobs J, Candidates C WHERE J.Salary <= 95 AND J.Zipcode = C.Zipcode AND C.WorkExp >= 5;
Queries can return nothing, yet it is important to have results.
Automatically do relaxation: adjust the conditions.
Which conditions to adjust? How to adjust them?

4 Example Percentages of Empty Result Queries
In a Customer Relationship Management (CRM) application developed by IBM: 18.07% (3,396 empty-result queries out of 18,793 queries)
In a real estate application developed by IBM: 5.75%
In a digital library application [JCM+00]: 10.53%
In a bioinformatics application [RCP+98]: 38%
Source: Gang Luo (IBM T.J. Watson Research Center, USA), "Efficient Detection of Empty-Result Queries", VLDB 2006, p. 1015.

5 Observations
Different ways to adjust the conditions: select vs. join.
How much to adjust each condition? For example, Salary <= 100 vs. Salary <= 120.
Adjust the join vs. adjust both selections.
(Jobs and Candidates tables from the Query Example slide, annotated with the selections Salary <= 95 and WorkExp >= 5.)

6 Contributions
Query relaxation framework for selections and joins
Lattice-based approach for query relaxation
Efficient relaxation algorithms (efficiency is a big issue)

7 Overview
Motivation
Query Relaxation
Lattice-based Relaxation
Relaxation Algorithms
Variations
Experiments

8 Query Relaxation
Top-k / nearest neighbor: requires a weight for each condition.
Skyline: no weights are needed; conditions are not considered equal; returns the non-dominated points.
Skyline is not the only way to relax a query. It does not care which condition is more important, so we are not comparing apples with oranges.

9 Query Relaxation
Skyline: Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001
In this diagram we are not relaxing join conditions. Each point is a join pair.
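
To make "non-dominated points" concrete, here is a minimal Python sketch of a dominance test and a naive skyline computation. The function names `dominates` and `skyline` and the sample points are illustrative assumptions, not taken from the talk; each coordinate is assumed to be the amount by which one condition must be relaxed, so smaller is better.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive quadratic skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical relaxation amounts for five join pairs: (salary gap, experience gap).
points = [(0, 10), (5, 5), (10, 0), (6, 6), (5, 12)]
print(skyline(points))  # (6, 6) and (5, 12) are dominated by (5, 5)
```

No weights appear anywhere: two points that trade off one condition against the other are both kept.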

11 Lattice-based Relaxation
R: selection on Jobs
J: join condition
S: selection on Candidates
Sometimes we might not want to relax the join (e.g., when the join attribute is an ID).
Relaxation is done automatically by the system.
(Jobs and Candidates tables from the Query Example slide, annotated with the selections Salary <= 95 and WorkExp >= 5.)
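
One plausible reading of the lattice is that each node is the set of conditions allowed to be relaxed, ordered by set inclusion. The sketch below enumerates those nodes level by level. The function name `relaxation_lattice` and the `fixed` parameter are hypothetical, introduced only for illustration; the slide does not spell out the lattice construction.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def relaxation_lattice(
    conditions: Sequence[str], fixed: Sequence[str] = ()
) -> List[List[Tuple[str, ...]]]:
    """Group the subsets of relaxable conditions by how many conditions are relaxed.

    Conditions listed in `fixed` (for example a join on an ID) are never relaxed.
    """
    relaxable = [c for c in conditions if c not in fixed]
    return [list(combinations(relaxable, k)) for k in range(len(relaxable) + 1)]


levels = relaxation_lattice(["R", "J", "S"])
for k, level in enumerate(levels):
    print(k, level)  # level k lists the nodes that relax exactly k conditions

# Same lattice when the join must stay exact.
no_join = relaxation_lattice(["R", "J", "S"], fixed=["J"])
```

With all three conditions relaxable the lattice has 8 nodes; fixing J leaves 4.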

13 Relaxing Selection Conditions
INCORRECT algorithm:
Compute the Skyline on Jobs
Compute the Skyline on Candidates
Join the two Skylines
In the example, joining the two Skylines gives an empty join.
Skyline can be viewed as a relational algebra operator with various properties.
(Jobs and Candidates tables from the Query Example slide, annotated with the selections Salary <= 95 and WorkExp >= 5.)
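
A small counterexample shows why this order of operations fails. The rows below are hypothetical (they are not the values from the slide), and the helper names are assumptions made for this sketch. With one relaxed condition per table, each per-table skyline is simply the set of rows needing the least relaxation.

```python
# Hypothetical rows: (id, zipcode, value). value is Salary for jobs, WorkExp for candidates.
jobs = [("J1", 92047, 80), ("J2", 92612, 100)]
candidates = [("C1", 93652, 6), ("C2", 92612, 3)]


def salary_gap(job):
    """How far Salary <= 95 must be relaxed to admit this job."""
    return max(0, job[2] - 95)


def exp_gap(cand):
    """How far WorkExp >= 5 must be relaxed to admit this candidate."""
    return max(0, 5 - cand[2])


def one_dim_skyline(rows, gap):
    """With a single condition, the skyline is the set of rows with the smallest gap."""
    best = min(gap(r) for r in rows)
    return [r for r in rows if gap(r) == best]


def join(js, cs):
    """Equi-join on zipcode, returning (job id, candidate id) pairs."""
    return [(j[0], c[0]) for j in js for c in cs if j[1] == c[1]]


skyline_then_join = join(one_dim_skyline(jobs, salary_gap),
                         one_dim_skyline(candidates, exp_gap))
full_join = join(jobs, candidates)
print(skyline_then_join)  # [] : J1 and C1 are the per-table winners but do not join
print(full_join)          # [('J2', 'C2')] : the only real join pair was discarded
```

The per-table skylines throw away J2 and C2 before the join ever sees them, even though (J2, C2) is the only pair that can appear in any relaxed answer.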

14 Relaxing Selection Conditions
Join First algorithm:
Compute the join (disregarding the selections)
Compute the Skyline on the join results
(Jobs and Candidates tables from the Query Example slide, annotated with the selections Salary <= 95 and WorkExp >= 5.)
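
The two steps can be sketched as follows. This is a simplified in-memory version under stated assumptions: the rows are hypothetical, the dictionary keys (`zip`, `salary`, `exp`) and the function name `join_first` are invented for the example, and the amount of relaxation is measured as the numeric distance from the original bound.

```python
def dominates(a, b):
    """a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def join_first(jobs, candidates, max_salary=95, min_exp=5):
    # Step 1: equi-join on zipcode, ignoring both selection conditions.
    pairs = [(j, c) for j in jobs for c in candidates if j["zip"] == c["zip"]]
    # Relaxation needed by each pair: (salary overshoot, experience shortfall).
    scored = [
        ((max(0, j["salary"] - max_salary), max(0, min_exp - c["exp"])), j["id"], c["id"])
        for j, c in pairs
    ]
    # Step 2: keep the join pairs whose relaxation vector is not dominated.
    return [
        (jid, cid, gap)
        for gap, jid, cid in scored
        if not any(dominates(other, gap) for other, _, _ in scored)
    ]


# Hypothetical data, not the rows from the slide.
jobs = [
    {"id": "J1", "zip": 92612, "salary": 100},
    {"id": "J2", "zip": 92612, "salary": 120},
    {"id": "J3", "zip": 90391, "salary": 96},
]
candidates = [
    {"id": "C1", "zip": 92612, "exp": 3},
    {"id": "C2", "zip": 92612, "exp": 4},
    {"id": "C3", "zip": 90391, "exp": 1},
]
result = join_first(jobs, candidates)
print(result)
```

The cost is that the whole join is materialized before any pair is discarded.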

15 Relaxing Selection Conditions
Variations:
Pruning Join: build the Skyline during the join.
Pruning Join+: build the local Skyline before the join.
Sorted Access Join: as in Fagin’s Top-k, sort the columns on relaxation, then compute the join Skyline.
These are only the main ideas of the algorithms; for more details see the paper.
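
As a rough illustration of the Pruning Join idea (maintaining the skyline while join pairs are produced), here is a sketch that combines a hash join with an incrementally updated skyline. It is an assumption about how such pruning could look, not the algorithm from the paper; the function name `pruning_join`, the tuple layout, and the data are hypothetical.

```python
from collections import defaultdict


def dominates(a, b):
    """a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pruning_join(jobs, candidates, max_salary=95, min_exp=5):
    # Hash the candidates on the join attribute (zipcode).
    by_zip = defaultdict(list)
    for cand in candidates:
        by_zip[cand[1]].append(cand)

    current = []  # skyline so far; entries are (gap vector, job id, candidate id)
    for jid, jzip, salary in jobs:
        for cid, _, exp in by_zip.get(jzip, []):
            gap = (max(0, salary - max_salary), max(0, min_exp - exp))
            if any(dominates(g, gap) for g, _, _ in current):
                continue  # pruned: an earlier pair is at least as good everywhere
            # The new pair survives; evict the entries it dominates.
            current = [e for e in current if not dominates(gap, e[0])]
            current.append((gap, jid, cid))
    return current


# Hypothetical rows: (id, zipcode, Salary) and (id, zipcode, WorkExp).
jobs = [("J1", 92612, 120), ("J2", 92612, 100), ("J3", 90391, 96)]
candidates = [("C1", 92612, 3), ("C2", 92612, 4), ("C3", 90391, 1)]
result = pruning_join(jobs, candidates)
print(result)
```

Unlike the join-first approach, the full join result is never stored; only the current skyline is kept in memory.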

16 Relaxing all conditions
Multi-Dimensional-Index-based Relaxation algorithm:
Traverse the index structure top-down
Form pairs of nodes or records
Build the Skyline
Assumes an index exists, e.g., an R-tree; works with other types of multi-dimensional indices.
(Diagram labels: Queue with Enqueue and Dequeue, Children, Skyline.)
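
For reference, the result such an algorithm should produce can be stated with a brute-force baseline that does not use any index: score every pair of records on all three conditions and keep the non-dominated ones. This is not the index-based traversal described on the slide. The function name, the data, and the use of the absolute zip code difference as the join relaxation (following the earlier assumption that closer zip codes mean closer areas) are all assumptions for this sketch.

```python
def dominates(a, b):
    """a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def relax_all_bruteforce(jobs, candidates, max_salary=95, min_exp=5):
    scored = []
    for jid, jzip, salary in jobs:
        for cid, czip, exp in candidates:
            gap = (
                max(0, salary - max_salary),  # relaxation of the selection on Jobs
                abs(jzip - czip),             # relaxation of the join condition
                max(0, min_exp - exp),        # relaxation of the selection on Candidates
            )
            scored.append((gap, jid, cid))
    # Keep the pairs whose three-dimensional gap vector is not dominated.
    return [
        (jid, cid)
        for gap, jid, cid in scored
        if not any(dominates(other, gap) for other, _, _ in scored)
    ]


# Hypothetical rows: (id, zipcode, Salary) and (id, zipcode, WorkExp).
jobs = [("J1", 92047, 80), ("J2", 92612, 100)]
candidates = [("C1", 93652, 6), ("C2", 92612, 3), ("C3", 93000, 2)]
result = relax_all_bruteforce(jobs, candidates)
print(result)
```

Every pair of records is a candidate once the join may be relaxed, which is why a quadratic scan like this becomes expensive and an index-based traversal is attractive.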

18 Variations
Computing Top-k over Skyline: assign a weight to each condition
Queries with multiple joins
Conditions on nonnumeric attributes: use a dominance checking function
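
A minimal sketch of Top-k over Skyline, assuming the ranking is a weighted sum of the per-condition relaxation amounts (the slide only says that each condition gets a weight, so the scoring function is an assumption). The function name `top_k_over_skyline` and the sample points are hypothetical.

```python
import heapq


def dominates(a, b):
    """a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def top_k_over_skyline(points, weights, k):
    # First reduce to the skyline, then rank only the surviving points.
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    return heapq.nsmallest(k, sky, key=lambda p: sum(w * x for w, x in zip(weights, p)))


# Hypothetical relaxation vectors: (salary gap, experience gap).
points = [(0, 10), (5, 5), (10, 0), (6, 6)]
# Relaxing experience is assumed to cost twice as much as relaxing salary.
best_two = top_k_over_skyline(points, weights=(1.0, 2.0), k=2)
print(best_two)
```

The weights only order points that already survived the skyline step, so a dominated point such as (6, 6) can never be returned.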

20 Experimental Setting
Datasets
Real: Internet Movie Database (IMDB), with Movies (120k) and ActorInMovies (1.2m); Census-Income from the UCI KDD Repository, with Census (200k)
Synthetic: Independent, Correlated, and Anticorrelated
Implementation
GNU C++; Spatial Index Library (R-tree)
Linux, AMD Opteron 240, 1GB RAM

21 Different algorithms, different behaviors
We present just a few of our results; for more details see the paper.
(Figure: IMDB dataset.)

22 Different datasets, different behaviors
(Figure panels: Correlated dataset, Anticorrelated dataset, Independent dataset.)

23 How big is the Skyline?
Skyline size depends on cardinality, number of selections, and data size.

24 Relaxing join takes time
(Figure: self-join on the Census dataset.)

25 Top-k over Skyline
(Figure: IMDB dataset.)

26 Related Work
Muslea et al.: alternate forms of conjunctive expressions
Efficient Skyline algorithms: selection queries
Efficient Top-k algorithms: require weights for conditions
Muslea et al. deal primarily with expressibility issues without paying attention to the data management issues involved. We relax queries with both selection and join conditions. Other studies assume that the attributes and the ordering of their values are predetermined in a single table; our work requires us to compute the skyline dynamically for a set of tables that are to be joined and whose attribute values must be determined on the fly.

27 Conclusions
Query relaxation framework for selections and joins
Lattice-based approach for query relaxation
Efficient relaxation algorithms (efficiency is a big issue)

28 Future Work
Optimum use of the lattice structure
Relax conditions on string attributes
Algorithms applicable outside of databases

29 Questions?

31 Skyline vs. Top-k

32 Skyline vs. Top-k over Skyline