Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington.



Introduction
- Searches are often done on multiple features.
- Each feature produces a different ranking for the query.
- The rankings on the different features must therefore be joined and aggregated.

Example
- Find a location for a house such that the combined cost of the house and 5 years of tuition at a nearby school is minimal.
- The exact location is not predefined in the query; the house and school features have to be analyzed for each location.

Motivation
- Current techniques decouple the join from the sorting (ranking) of results.
- Sorting is expensive and is a blocking operation.
- The cost is more apparent when the ranking features and the joining features are different.
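To make the decoupled approach concrete, here is a minimal Python sketch of a join-then-sort baseline for the house/school example. The function name `naive_top_k`, the tuple layout, and the sample data are hypothetical and only illustrate the idea; lower combined cost is better.

```python
from itertools import product

def naive_top_k(houses, schools, k):
    """Join-then-sort baseline: build every join result, then sort them all.

    houses:  list of (location, house_cost)       -- hypothetical layout
    schools: list of (location, five_year_tuition)
    """
    # Join phase: materialise every matching (house, school) pair.
    joined = [
        (loc_h, cost + tuition)
        for (loc_h, cost), (loc_s, tuition) in product(houses, schools)
        if loc_h == loc_s
    ]
    # Sort phase: blocking, because no result can be returned
    # before the whole join output has been produced and sorted.
    joined.sort(key=lambda row: row[1])
    return joined[:k]

houses = [("A", 300), ("B", 250), ("C", 400)]
schools = [("A", 50), ("B", 120), ("C", 10)]
print(naive_top_k(houses, schools, 2))
```

Every join result is computed even though only k of them are returned, which is the overhead the rank-join approach tries to avoid.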

Rank-Join Algorithm
1) Generate new valid join combinations.
2) Compute the score for each combination.
3) For each incoming input, compute a candidate threshold:
   a) Combine the last-seen feature value from that input with the top-ranked feature values of all other features in the query.
   b) Store the maximum of these candidate values as T (the threshold).
4) Store the top k results in a priority queue.
5) Halt when the lowest score among the top k results in the queue is ≥ T.
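The steps above can be sketched in Python for two inputs. This is a simplified illustration, not the paper's implementation: it assumes each input is a list of (join_key, score) pairs sorted by descending score, that the combined score is the sum of the two scores (higher is better), and that the inputs are read in strict alternation. The name `rank_join` and the sample data are hypothetical.

```python
import heapq

def rank_join(left, right, k):
    """Top-k join of two inputs sorted by descending score (sum scoring)."""
    if not left or not right:
        return []
    inputs = (left, right)
    seen = ({}, {})                      # per input: join_key -> scores seen
    pos = [0, 0]                         # next unread position per input
    top = (left[0][1], right[0][1])      # top-ranked score per input
    last = [left[0][1], right[0][1]]     # last-seen score per input
    results = []                         # min-heap holding the best k results
    turn = 0
    while pos[0] < len(left) or pos[1] < len(right):
        if pos[turn] >= len(inputs[turn]):
            turn = 1 - turn              # this input is exhausted
        key, score = inputs[turn][pos[turn]]
        pos[turn] += 1
        last[turn] = score
        seen[turn].setdefault(key, []).append(score)
        # Steps 1-2: new valid join combinations and their scores.
        for other_score in seen[1 - turn].get(key, []):
            heapq.heappush(results, (score + other_score, key))
            if len(results) > k:         # Step 4: keep only the top k
                heapq.heappop(results)
        # Step 3: threshold T bounds the score of any unseen combination.
        threshold = max(last[0] + top[1], top[0] + last[1])
        # Step 5: halt once the k-th best score reaches the threshold.
        if len(results) == k and results[0][0] >= threshold:
            break
        turn = 1 - turn
    return sorted(results, reverse=True)

left = [("a", 9), ("b", 8), ("c", 3)]
right = [("b", 10), ("a", 5), ("c", 4)]
print(rank_join(left, right, 1))
```

With k = 1 the loop stops after reading two tuples from each input, so the pair with key "c" is never examined.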

Optimality
- Rank-Join is instance optimal over all correct top-k join algorithms.
- This guarantees that the cost of Rank-Join is O(cost of any other algorithm).
- Mathematically: Cost(Rank-Join) ≤ c · Cost(any other algorithm) + c', where c is the optimality ratio and c, c' > 0.

Rank-Join Continued …
- The join strategy is crucial.
- Recommended: Ripple Join
  - Alternates between tuples of the input relations
  - Flexible in the way it sweeps out (rectangular, etc.)
  - Retains ordering in considering samples
- Variants of Rank-Join
  - Hash Rank Join (HRJN)
  - Block Ripple Join

Hash Rank Join (HRJN) Operator
- Built on the idea of the hash ripple join
  - Inputs are stored in two hash tables
- Maintains the highest (first) and lowest (last selected) objects from each relation.
- Results are added to a priority queue.
- Advantages:
  - Smaller space requirement
  - Can be pipelined

Hash Rank Join (HRJN) Operator: Problems
- Local Ranking Problem
  - Arises with three or more input streams
  - Larger queue sizes
  - More database accesses
- Buffer Problem
  - Cannot predict how many partial joins will result

HRJN Solutions?
- Block Ripple Joins
  - Do comparisons as blocks
- Score-Guided Strategy
  - If the thresholds are very different, this may be because one of the rankings is larger and its scores descend at a slower rate.
  - The operator can then take more inputs from the slower-changing ranking so that its threshold moves closer to the other thresholds.
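One way to picture the score-guided strategy for two inputs is the following Python sketch. It assumes sum scoring with descending scores, and the function name `choose_input` and its parameters are hypothetical. Each input has its own threshold (its last-seen score combined with the other input's top score), and the next tuple is read from the input whose threshold is currently higher, since only reading from that input can lower it.

```python
def choose_input(top_left, last_left, top_right, last_right):
    """Pick which input to read next: returns "left" or "right"."""
    # Bound on any combination that uses a not-yet-seen left tuple.
    t_left = last_left + top_right
    # Bound on any combination that uses a not-yet-seen right tuple.
    t_right = top_left + last_right
    # Read from the side that holds the overall threshold up.
    return "left" if t_left >= t_right else "right"

# The left scores have barely dropped (10 -> 9), the right ones a lot (10 -> 2).
print(choose_input(top_left=10, last_left=9, top_right=10, last_right=2))
```

Ties are broken in favour of the left input here; that choice is arbitrary.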

Optimal Join-Order
- Goal: read the smallest number of input records needed to get a correct ranking.
- There is no clear way of estimating the best order of joins.
- Heuristic: Footrule Distance
  - A simple measure of similarity between two rankings
- First join the most similar rankings.
  - This would quickly yield join results while accessing fewer records.
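As an illustration of the heuristic, the sketch below computes Spearman's footrule distance (the sum of absolute position differences) and picks the most similar pair of rankings. It assumes every ranking orders the same set of items; the names `footrule_distance`, `most_similar_pair`, and the sample rankings are hypothetical.

```python
from itertools import combinations

def footrule_distance(ranking_a, ranking_b):
    """Sum over all items of |position in a - position in b|."""
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must order the same items")
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(ranking_a))

def most_similar_pair(rankings):
    """Return the names of the two rankings with the smallest distance."""
    return min(
        combinations(sorted(rankings), 2),
        key=lambda pair: footrule_distance(rankings[pair[0]], rankings[pair[1]]),
    )

rankings = {
    "r1": ["a", "b", "c", "d"],
    "r2": ["a", "b", "d", "c"],
    "r3": ["d", "c", "b", "a"],
}
print(most_similar_pair(rankings))
```

Under the heuristic, the pair returned here would be joined first, and the remaining ranking would be joined with that result.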

Rank-Join Algorithm: Benefits
What can it do?
- Integrates well with query plans
- Produces results as fast as possible
- Provides performance guarantees
- Minimizes space requirements
- Offers a mechanism to determine the best join order for executing the query optimally
- Can be improved further if random access is available
- Can avoid on-the-fly duplicate elimination

References
- Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid, “Supporting top-k join queries in relational databases” (2004)
- Jing Chen, DBIR Spring 2005, CSE-UT Arlington, Spring2005/DBIR/slides/top-k_join.ppt