Query Specific Ranking


Query Specific Ranking CSE 6392 02/27/2006 Database Exploration

Content
- Comparison of the FA and TA algorithms
- Representing the ranking problem as a geometric problem
- Query-specific ranking

Comparison between the FA and TA algorithms
- TA is faster than FA.
- TA stops as soon as the score of the hypothetical tuple is less than the score of the k-th tuple in the top-k buffer.
- TA is a bounded-buffer algorithm: it maintains only a top-k buffer.
- FA maintains, for each sorted list, the set of all tuples read so far, and keeps reading until ‘k’ objects are common to all of these sets.

Comparison between FA and TA
- TA is eager: as soon as it reads a tuple, it looks up the tuple's remaining attribute values to compute its full score.
- FA calculates scores in 2 phases:
  - sort phase
  - scan phase
- Both TA and FA require the scoring function to be monotonic.

Why does TA work?
- The stopping condition for TA is: Score(hypothetical tuple) < Score(k-th tuple in top-k buffer)
- The hypothetical tuple combines the last value read from each sorted list.
- By the monotonic property, the score of any unseen tuple cannot exceed the score of the hypothetical tuple, so no unseen tuple can enter the top-k.
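The stopping condition can be illustrated with a minimal sketch of TA in Python. It is not the course's reference implementation. It assumes a higher score is better, that every object appears in every list, and that each list is sorted in descending order of its attribute. The name `threshold_algorithm` and the dictionary-based random access are illustrative choices.

```python
import heapq

def threshold_algorithm(sorted_lists, score, k):
    """Sketch of TA.

    sorted_lists: one list per attribute, holding (object_id, value) pairs
                  sorted by value in descending order (assumption).
    score:        monotonic function of a list of attribute values.
    Returns (top_k, depth): the k best (score, object_id) pairs and the
    number of rows read from each list before stopping.
    """
    # Random access is simulated with one dictionary per attribute.
    lookup = [dict(lst) for lst in sorted_lists]
    seen = set()
    top_k = []  # min-heap of (score, object_id), never more than k entries
    n = len(sorted_lists[0])
    for depth in range(n):
        last_values = []
        for lst in sorted_lists:
            obj, value = lst[depth]
            last_values.append(value)
            if obj not in seen:
                seen.add(obj)
                # Eagerly compute the full score of a newly seen object.
                full = score([table[obj] for table in lookup])
                if len(top_k) < k:
                    heapq.heappush(top_k, (full, obj))
                elif full > top_k[0][0]:
                    heapq.heapreplace(top_k, (full, obj))
        # Hypothetical tuple: the last value read from each list.
        threshold = score(last_values)
        if len(top_k) == k and threshold < top_k[0][0]:
            return sorted(top_k, reverse=True), depth + 1
    return sorted(top_k, reverse=True), n
```

With `score=sum`, the hypothetical tuple's score falls with each row read, so the loop ends as soon as it drops below the k-th buffered score.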

Closing points on TA and FA
- FA stops only when we get ‘k’ common objects/intersections in the sets of candidates.
- TA stops based on what the score of the hypothetical tuple implies about unseen tuples.
- Therefore, there is no way FA can stop earlier than TA.
- Hence, TA is instance optimal.
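For contrast, FA's stopping rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: a higher score is better, every object appears in every list, and each list is sorted in descending order. The name `fagin_algorithm` and the dictionary lookups are hypothetical.

```python
import heapq

def fagin_algorithm(sorted_lists, score, k):
    """Sketch of FA.

    sorted_lists: one list per attribute, holding (object_id, value) pairs
                  sorted by value in descending order (assumption).
    Returns (top_k, depth): the k best (score, object_id) pairs and the
    number of rows read from each list before stopping.
    """
    lookup = [dict(lst) for lst in sorted_lists]
    m = len(sorted_lists)
    n = len(sorted_lists[0])
    seen_in = {}   # object_id -> indices of the lists it has been seen in
    complete = 0   # number of objects seen in all m lists
    depth = 0
    # Phase 1: read the lists in parallel until k objects are common to all.
    while depth < n and complete < k:
        for i, lst in enumerate(sorted_lists):
            obj, _ = lst[depth]
            lists = seen_in.setdefault(obj, set())
            lists.add(i)
            if len(lists) == m:
                complete += 1
        depth += 1
    # Phase 2: compute the full score of every candidate seen so far.
    scored = [(score([table[obj] for table in lookup]), obj) for obj in seen_in]
    return heapq.nlargest(k, scored), depth
```

Unlike TA, this sketch keeps every object it has read as a candidate, so its buffer is not bounded by k.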

Query Specific Ranking
- The ranking functions we have discussed so far assume a total ordering on each attribute.
- E.g. total ordering of price:
  - high price is bad
  - low price is good
- In reality, this is not always true.

Query Specific Ranking
- Different people have a different ideal price in mind.
- E.g. for one person, an ideal restaurant has price = $20 and capacity = 100.
- In this case, the ranking function can be: Score(<p, c>) = 5*|20-p| + 10*|100-c|

Query Specific Ranking
- The above ranking function is more realistic than a ranking function based on total ordering.
- But the above ranking function is not monotonic in price and capacity.
- How can we find the top-k restaurants in this case without looking at the whole data set?

Solution
- Assume the data set is sorted on all the attributes of interest.
- First, create transformed attributes from the original attributes involved in the ranking function, such that the transformed attributes maintain the monotonic property.
- Second, simulate sorted access on the transformed attributes.

Transformed attributes
- Consider the restaurant example where: Score(<p, c>) = 5*|20-p| + 10*|100-c|
- Transformed attributes are:
  - ∆p = |20-p|, the difference between the price and the ideal price
  - ∆c = |100-c|, the difference between the capacity and the ideal capacity
- The score then becomes 5*∆p + 10*∆c, which is monotonic in ∆p and ∆c.
- Suppose tid1 = <$30, 120>, then <∆p, ∆c> = <10, 20>
- tid2 = <$15, 85>, then <∆p, ∆c> = <5, 15>
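The transformation for the restaurant example can be checked with a short Python sketch. The function names and constants are illustrative; only the ideal point (price = $20, capacity = 100) and the weights 5 and 10 come from the slides.

```python
IDEAL_PRICE = 20      # ideal price from the restaurant example
IDEAL_CAPACITY = 100  # ideal capacity from the restaurant example

def transform(price, capacity):
    """Map a <price, capacity> tuple to the transformed <delta_p, delta_c>."""
    return abs(IDEAL_PRICE - price), abs(IDEAL_CAPACITY - capacity)

def score(delta_p, delta_c):
    """Ranking function written over the transformed attributes."""
    return 5 * delta_p + 10 * delta_c

tid1 = transform(30, 120)  # (10, 20)
tid2 = transform(15, 85)   # (5, 15)
```

Because `score` only grows when `delta_p` or `delta_c` grows, it is monotonic in the transformed attributes even though the original function is not monotonic in price and capacity.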

Simulating sorted access
- Achieving monotonicity is just part of the problem.
- We also need sorted access on the transformed attributes (∆p and ∆c).
- Suppose the data is presorted on the ‘price’ attribute.
- Without re-sorting the whole dataset, we can go directly to the ‘sweet spot’ (i.e. price = $20 & capacity = 100) using a B+ tree index.
- From this point, do 2 walks in opposite directions and merge them to obtain ∆p (and likewise ∆c) in sorted order.
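The two-walk merge can be sketched in Python for the price attribute. This is only an illustration: a sorted Python list stands in for the leaf level of the B+ tree, `bisect` stands in for the index lookup, and the generator name `walk_from_sweet_spot` is hypothetical. It assumes ∆p should be produced in increasing order.

```python
import bisect

def walk_from_sweet_spot(sorted_prices, ideal):
    """Yield (delta_p, price) pairs in nondecreasing order of delta_p.

    sorted_prices: prices in ascending order (stand-in for B+ tree leaves).
    ideal:         the ideal price, i.e. the 'sweet spot'.
    """
    # Locate the sweet spot; 'right' walks up, 'left' walks down.
    right = bisect.bisect_left(sorted_prices, ideal)
    left = right - 1
    while left >= 0 or right < len(sorted_prices):
        d_left = ideal - sorted_prices[left] if left >= 0 else None
        d_right = (sorted_prices[right] - ideal
                   if right < len(sorted_prices) else None)
        # Merge step: emit whichever side is currently closer to the ideal.
        if d_right is None or (d_left is not None and d_left <= d_right):
            yield d_left, sorted_prices[left]
            left -= 1
        else:
            yield d_right, sorted_prices[right]
            right += 1

prices = [5, 15, 18, 20, 30, 45]
stream = list(walk_from_sweet_spot(prices, 20))
```

Because the result is a generator, a consumer such as TA can pull only as many entries as it needs and never touch the rest of the data.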

Adding Selection
- This explains how hard conditions are added to a ranking function and handled.
- E.g. look for restaurants in Arlington: location = “Arlington” is a hard condition.

Handling hard conditions
- The query will look like this:
  Select top[10]
  From restaurants
  Where location = “Arlington”
  Order by 5*abs(120 - price)
- How to solve this query?

Handling hard conditions
- The first method: do selection first, then do ranking.
- This is not the best method, for the following reasons:
  - If selection produces a big result, it defeats the purpose of doing ranking.
  - If selection produces a small result, then ranking it is overkill.
  - The raw data is presorted, and doing a selection first on this raw data destroys the order of tuples. TA requires data to be presorted.
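The select-then-rank method can be sketched in Python as a baseline. The record layout (dictionaries with `location` and `price` keys) and the function name are assumptions made for illustration. The sketch also assumes that the smallest value of 5*abs(120 - price) ranks first, as an ascending ORDER BY would.

```python
import heapq

def select_then_rank(restaurants, location, k):
    """Apply the hard condition first, then rank the survivors."""
    # Selection: scans every tuple and ignores any presorted order.
    selected = [r for r in restaurants if r["location"] == location]
    # Ranking: keep the k tuples with the smallest ranking expression.
    return heapq.nsmallest(
        k, selected, key=lambda r: 5 * abs(120 - r["price"])
    )

restaurants = [
    {"name": "A", "location": "Arlington", "price": 100},
    {"name": "B", "location": "Dallas", "price": 120},
    {"name": "C", "location": "Arlington", "price": 125},
    {"name": "D", "location": "Arlington", "price": 200},
]
top2 = select_then_rank(restaurants, "Arlington", 2)
```

The selection step reads the whole input no matter how small k is, which is the cost the slide objects to.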

Handling hard conditions
- The second method is to integrate selection as part of ranking.
- Score(<L, P, C>) = if L = “Arlington” then 5*|20-P| + 10*|100-C| else 0

Handling hard conditions
- Now we are no longer dealing with numeric values alone.
- Because the score tests location = “Arlington”, the ranking function now operates on categorical data as well as numeric data.
- How do we deal with ranking functions that involve categorical data?