Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Richa Varshney.

Introduction
- A top-k join query returns an ordered set of join results, ranked according to some provided function.
- Searches are often done on multiple features.
- Each feature produces a different ranking for the query.
- The individual feature rankings are joined to produce a global ranking.

Example 1: Ranking in Multimedia Retrieval
[Figure: a query is matched against a video database on three features: color histogram, edge histogram, and texture.]

Example 2
    SELECT h.id, s.name
    FROM houses h, schools s
    WHERE h.location = s.location
    ORDER BY h.price + 10 * s.tuition
    STOP AFTER 4

Example 2 (Cont'd)

Houses (ID values missing from the transcript)
    Location        Price
    Lafayette       90,000
    W. Lafayette    110,000
    Indianapolis    111,000
    Kokomo          118,000
    Lafayette       125,000
    Kokomo          154,000
    …               …

Schools (ID and Tuition values missing from the transcript)
    Location
    Indianapolis
    W. Lafayette
    Lafayette
    Lafayette
    Indianapolis
    Indianapolis
    Kokomo
    Kokomo

Motivation
    SELECT A.1, B.2
    FROM A, B, C
    WHERE A.1 = B.1 AND B.2 = C.2
    ORDER BY (0.3*A.1 + 0.7*B.2)
    STOP AFTER 5;
Problems:
- Sorting is an expensive operation.
- Sorting is a blocking operator: it cannot emit any result until it has consumed its entire input.

Contribution
- Propose a new Rank-Join algorithm
- Analyze the I/O cost of the algorithm
- Implement the algorithm
- Propose a score-guided and adaptive join strategy
- Evaluate performance

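Ripple Join
Cartesian product L x R, explored one step at a time:
- Step 1: (L1, R1) = {(1,1,5), (1,3,5)}
- Step 2 adds:
    (L2, R2) = {(2,2,4), (2,1,4)}
    (L2, R1) = {(2,2,4), (1,3,5)}
    (L1, R2) = {(1,1,5), (2,1,4)}
[Figure: ripple-join grid with L and R as the two axes.]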
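The stepwise exploration above can be sketched in a few lines of Python. This is only an illustrative sketch of a square ripple join, assuming one new tuple is read from each input per step and that both inputs have the same length; the function name ripple_steps is made up for this example.

```python
def ripple_steps(left, right):
    """Yield, step by step, the new (l, r) pairs a square ripple join examines.

    Assumption: one tuple is read from each input per step, so the sketch
    stops at the length of the shorter input.
    """
    for n in range(min(len(left), len(right))):
        # The corner pair formed by the two newly read tuples.
        new_pairs = [(left[n], right[n])]
        # The new left tuple against every previously seen right tuple.
        new_pairs += [(left[n], right[j]) for j in range(n)]
        # Every previously seen left tuple against the new right tuple.
        new_pairs += [(left[i], right[n]) for i in range(n)]
        yield new_pairs


steps = list(ripple_steps(["L1", "L2", "L3"], ["R1", "R2", "R3"]))
for number, pairs in enumerate(steps, start=1):
    print(number, pairs)
```

After n steps the union of all yielded pairs covers the n x n corner of the Cartesian product, which is why any join condition can be applied on top of this enumeration.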

Variations of Ripple Join
- Rectangle
- Block
- Hash ripple join: all the sampled tuples are kept in hash tables in memory

Query Model: Top-k Join
- m relations R1, ..., Rm; each Ri has:
  - n attributes
  - a score attribute si (can be an expression over other attributes)
- A global score for a join result is computed as F(s1, ..., sm)
- A top-k join query returns an ordered set of join results according to some provided function that combines the orders on each input.
- An example template:
    SELECT some_attributes
    FROM R1, ..., Rm
    WHERE join_condition
    ORDER BY F(s1, ..., sm)
    STOP AFTER k
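To make the query semantics concrete, here is a small Python sketch of the conventional plan for two inputs: compute every join result, score it with F, and keep the k best. It is a baseline for comparison, not the rank-join method. The (id, A, B) tuple layout and the sample values are assumptions taken from the L and R example used later in the deck, and the function name is hypothetical.

```python
import heapq

# Assumed layout: (id, A, B), where A is the join attribute and B the score.
L = [(1, 1, 5), (2, 2, 4), (3, 2, 3), (4, 3, 2)]
R = [(1, 3, 5), (2, 1, 4), (3, 2, 3), (4, 2, 2)]


def naive_top_k_join(left, right, k, f=lambda s_l, s_r: s_l + s_r):
    """Join everything first, then pick the k highest global scores."""
    joined = [
        (f(l[2], r[2]), l[0], r[0])   # (global score, left id, right id)
        for l in left
        for r in right
        if l[1] == r[1]               # join condition: L.A = R.A
    ]
    # Nothing can be returned before the whole join has been materialized.
    return heapq.nlargest(k, joined)


top3 = naive_top_k_join(L, R, 3)
print([score for score, _, _ in top3])
```

The sketch touches every tuple of both inputs regardless of k, which is the cost a rank-aware join tries to avoid.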

Rank-Join Algorithm
1) Generate new valid join combinations.
2) Compute the score for each combination.
3) For each incoming input, calculate the threshold score:
   a) Combine the last seen score of that input with the top-ranked score of every other input in the query.
   b) Store the maximum of these values as T (threshold).
4) Store the top k results (those with the maximum combined scores) in a priority queue.
5) Halt when the lowest score among the k results in the queue is ≥ T.

Example
    Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 3
The threshold (T) is computed as Max {(Last L).B + (First R).B, (First L).B + (Last R).B}
(1) Get valid join combinations using any join algorithm (here, ripple join).
    Select (L1, R1) => no result

Example (Cont.)
(1) Get valid join combinations using any join algorithm.
    Select (L2, R2): (L2, R2), (L2, R1), (L1, R2) => (L1, R2)
(2) Compute the score (J) for the result.
    J1(L1, R2) => L.B + R.B = 5 + 4 = 9

Example (Cont.)
(3) Compute the threshold (T) as Max {(Last L).B + (First R).B, (First L).B + (Last R).B}
    Selected so far: (L1, R1), (L2, R2)
    T = Max (L2.B + R1.B, L1.B + R2.B) = Max (4 + 5, 5 + 4) = 9
(4) J1 = 9, T = 9, J1 >= T: report J1.
    Since we need the top 3 (k = 3), continue until there are 3 results and Min(J1, J2, ..., Jk) >= T.

Example (Cont.)
(1) Select (L3, R3): (L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3) => (L3, R3), (L2, R3)
(2) J2(L2, R3) = 4 + 3 = 7
    J3(L3, R3) = 3 + 3 = 6

Example (Cont.)
(3) Calculate T = Max {(Last L).B + (First R).B, (First L).B + (Last R).B}
    = Max {L3.B + R1.B, L1.B + R3.B} = Max (3 + 5, 5 + 3) = 8
(4) J1(L1, R2) = 9 (reported), J2(L2, R3) = 7, J3(L3, R3) = 6 (note: the J's are in descending order)
    Min (J) = 6 < T: continue.

Example (Cont.)
(1) Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4)
(2) J(L4, R1) = 7, J(L2, R4) = 6, J(L3, R4) = 5
(3) T = Max (L4.B + R1.B, L1.B + R4.B) = Max (7, 7) = 7
(4) J1(L1, R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7, J4(L3, R3) = 6, J5(L2, R4) = 6, J6(L3, R4) = 5
    Min (J1, J2, J3) = 7 >= T with k = 3: report the top 3 and halt.
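The worked example above can be replayed with a short Python sketch of a two-input rank join. It is a simplified illustration under stated assumptions rather than the paper's implementation: tuples use an assumed (id, A, B) layout with values inferred from the scores on the slides, both inputs are already sorted by B in descending order, one tuple is read from each input per step, and the combining function is the sum L.B + R.B. The function name rank_join is hypothetical.

```python
import heapq

# Assumed layout: (id, A, B); each list is sorted by B, highest first.
L = [(1, 1, 5), (2, 2, 4), (3, 2, 3), (4, 3, 2)]
R = [(1, 3, 5), (2, 1, 4), (3, 2, 3), (4, 2, 2)]


def rank_join(left, right, k):
    """Return up to k (left id, right id, score) results, best score first."""
    if not left or not right:
        return []
    seen_l, seen_r = [], []
    queue = []    # max-heap emulated with negated scores
    output = []
    for depth in range(max(len(left), len(right))):
        if depth < len(left):
            l = left[depth]
            seen_l.append(l)
            for r in seen_r:              # new left tuple vs. seen right tuples
                if l[1] == r[1]:
                    heapq.heappush(queue, (-(l[2] + r[2]), l[0], r[0]))
        if depth < len(right):
            r = right[depth]
            seen_r.append(r)
            for l in seen_l:              # seen left tuples vs. new right tuple
                if l[1] == r[1]:
                    heapq.heappush(queue, (-(l[2] + r[2]), l[0], r[0]))
        # T = Max{(Last L).B + (First R).B, (First L).B + (Last R).B}
        threshold = max(seen_l[-1][2] + seen_r[0][2],
                        seen_l[0][2] + seen_r[-1][2])
        # A queued result is safe to report once its score reaches T.
        while queue and len(output) < k and -queue[0][0] >= threshold:
            neg_score, l_id, r_id = heapq.heappop(queue)
            output.append((l_id, r_id, -neg_score))
        if len(output) == k:
            return output
    # Inputs exhausted: whatever is queued is final, so drain it in order.
    while queue and len(output) < k:
        neg_score, l_id, r_id = heapq.heappop(queue)
        output.append((l_id, r_id, -neg_score))
    return output


print(rank_join(L, R, 3))
```

On this data the sketch reports (L1, R2) with score 9 after the second step and the two results with score 7 after the fourth step, matching the trace on the slides.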

Hash Rank Join (HRJN) Operator
- Built on the idea of hash ripple join
- Initialized by specifying four parameters:
  - Two inputs (each can itself be an HRJN operator)
  - Join condition (a general equality condition; determines valid join results)
  - Combining function (monotone; computes global scores)
- Maintains the highest (first) and lowest (last selected) objects from each relation.
- Results are added to a priority queue

Hash Rank Join (HRJN) Operator: Problems
- Buffer problem
  - Cannot predict how many partial joins will result
- Local ranking problem

HRJN Solutions
- Use block ripple join to solve the local ranking problem (e.g., block size = 2)

HRJN Solutions (Cont.)
- HRJN*: score-guided join strategy
- How to select the next tuple (or block):
    T1 = f(L top, R bottom) and T2 = f(L bottom, R top), where f is the ranking function
    Case 1: T1 > T2, more inputs should be retrieved from R
    Case 2: T1 < T2, more inputs should be retrieved from L
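The two cases above amount to a small decision function. The Python sketch below is illustrative only: the function name and the string return values are made up, the default ranking function is assumed to be a sum, and the tie-breaking rule for T1 = T2 is an assumption because the slide does not specify one.

```python
def next_input(l_top, l_bottom, r_top, r_bottom, f=lambda x, y: x + y):
    """Pick which input a score-guided join should read from next.

    l_top / r_top: highest score seen so far on each input.
    l_bottom / r_bottom: last (lowest) score seen so far on each input.
    """
    t1 = f(l_top, r_bottom)
    t2 = f(l_bottom, r_top)
    if t1 > t2:
        return "R"   # Case 1: reading from R lowers R bottom, and with it T1
    if t1 < t2:
        return "L"   # Case 2: reading from L lowers L bottom, and with it T2
    return "L"       # Tie: arbitrary choice, an assumption of this sketch


print(next_input(l_top=5, l_bottom=3, r_top=5, r_bottom=4))
```

The intuition is that the threshold is the larger of T1 and T2, so reading from the input that drives the larger term is the quickest way to bring the threshold down.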

An Adaptive Join Strategy
- Use input availability as a guide instead of the aforementioned score-guided strategy
- If both inputs are available, choose the next input to process.
- Otherwise, the available input is processed.
- e.g., a mediator over Web-accessible sources and distributed multimedia repositories

Join Order
- When more than two tables are joined, the join order matters.
[Figure: join order example in which A and C have high similarity.]

Join Order Algorithm
- Rank-Join order heuristic
  - Get a ranked sample: the top S ranked list from each of L and R
  - Calculate the similarity using Footrule [formula not preserved in the transcript], where (i, j) is a valid join result that joins object i from L with object j from R
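Since the Footrule formula did not survive in the transcript, the Python sketch below shows one plausible reading, offered as an assumption rather than the authors' exact definition. Spearman's footrule sums the absolute differences between the rank positions of matched items; here the matched items are taken to be the valid join results (i, j), with ranks counted from 1 inside each top-S sample. The function name, the (id, A, B) tuple layout, and the sample values are all hypothetical.

```python
# Assumed layout: (id, A, B); each sample is already sorted by B, best first.
L = [(1, 1, 5), (2, 2, 4), (3, 2, 3), (4, 3, 2)]
R = [(1, 3, 5), (2, 1, 4), (3, 2, 3), (4, 2, 2)]


def footrule_join_distance(left_sample, right_sample):
    """Sum |rank of i in L - rank of j in R| over all joining pairs (i, j).

    A smaller value means the two rankings agree more on their join results.
    """
    total = 0
    for rank_l, l in enumerate(left_sample, start=1):
        for rank_r, r in enumerate(right_sample, start=1):
            if l[1] == r[1]:          # (i, j) is a valid join result
                total += abs(rank_l - rank_r)
    return total


print(footrule_join_distance(L, R))
```

Under this reading, inputs whose join partners sit at similar rank positions get a small distance, which is the kind of pair a rank-aware join order would want to combine early.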

Rank Join Order Heuristic

Performance Evaluation
- Changing the number of required answers: selectivity = 0.2% and m = 4
- Changing the join selectivity: m = 4 and k = 50
- Effect of pipelining: selectivity = 0.2% and k = 50
