Presented by Suresh Barukula 2011csz8090 1.  Top-k query processing means finding k- objects, that have highest overall grades.  A query in multimedia.

Presentation transcript:

Presented by Suresh Barukula (2011csz8090)

- Top-k query processing means finding the k objects that have the highest overall grades.
- A query in a multimedia database:
  * combines different graded attributes through an aggregation function;
  * computes an overall grade for each object using that aggregation function, so that the top-k objects can be returned.

In general, multimedia databases contain fuzzy data.
For example, suppose we want to retrieve all red objects. What can we say about a given object: is it red or not?
We cannot say outright whether it is red, but we can grade it by its amount of redness.
Attribute values are typically graded in [0,1].

- FA: Fagin's Algorithm
- TA: Threshold Algorithm
- TA_Z Algorithm
- NRA: No Random Access
- CA: Combined Algorithm

- N: number of objects
- m: number of attributes
- $x_i \in [0,1]$
- The database consists of m sorted lists $L_1, \dots, L_m$, each of length N. We may refer to $L_i$ as list i. Each entry of $L_i$ has the form $(R, x_i)$, where $x_i$ is the i-th field of object R. Each list $L_i$ is sorted in descending order by the $x_i$ value.

- Sorted access
- Random access
- The middleware cost is $s C_S + r C_R$, where s is the number of sorted accesses, r is the number of random accesses, $C_S$ is the cost of one sorted access, and $C_R$ is the cost of one random access.
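As a small illustration of the cost formula, the sketch below evaluates $s C_S + r C_R$ for given access counts. The function name `middleware_cost` and the unit costs in the usage line are assumptions chosen for illustration; the slides do not give concrete cost values.

```python
def middleware_cost(s: int, r: int, c_s: float, c_r: float) -> float:
    """Middleware cost s*C_S + r*C_R for s sorted and r random accesses."""
    return s * c_s + r * c_r

# Hypothetical unit costs: one random access is assumed to cost
# five times as much as one sorted access.
cost = middleware_cost(s=6, r=2, c_s=1.0, c_r=5.0)
print(cost)  # 16.0
```

When random accesses are much more expensive than sorted ones, an algorithm that saves sorted accesses by doing extra random accesses can still end up with a higher middleware cost.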

Example – Simple Database Model
Sorted L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6)
Sorted L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2)
[Diagram: a table with N rows (object IDs a, b, c, d) and attribute columns up to Attribute M]

Example – Simple Query
Find the top 2 (k = 2) objects for the following query executed on the middleware: A1 & A2 (e.g., color = red & shape = round).
Issuing A1 & A2 as a query to the middleware combines the grades of A1 and A2 by min(A1, A2).
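For reference, the query can be answered naively by grading every object and sorting. The sketch below does this for the example lists with min as the aggregation function. The name `naive_top_k` and the dictionary layout are assumptions for illustration; this is the brute-force baseline, not one of the algorithms from the slides.

```python
# Grades from the example database (object -> grade), one dict per attribute.
A1 = {"a": 0.9, "b": 0.8, "c": 0.72, "d": 0.6}
A2 = {"d": 0.9, "a": 0.85, "b": 0.7, "c": 0.2}

def naive_top_k(k, *attrs, agg=min):
    """Grade every object with the aggregation function, then keep the k best."""
    overall = {obj: agg(grades[obj] for grades in attrs) for obj in attrs[0]}
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)[:k]

result = naive_top_k(2, A1, A2)
print(result)  # [('a', 0.85), ('b', 0.7)]
```

This baseline reads every grade of every object, which is exactly the work that the algorithms in the following slides try to avoid.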

Example – Fagin's Algorithm
STEP 1: Read attributes from every sorted list. Stop when k objects have been seen in common from all lists.
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6)
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2)
[Table: columns ID, A1, A2, Min(A1, A2); rows for the seen objects a, d, b, c]

Example – Fagin's Algorithm
STEP 2: Use random access to find the missing grades.
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6)
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2)
[Table: columns ID, A1, A2, Min(A1, A2); rows for the seen objects a, d, b, c]

Example – Fagin's Algorithm
STEP 3: Compute the grades of the seen objects. Return the k highest graded objects.
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6)
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2)
[Table: columns ID, A1, A2, Min(A1, A2); rows for the seen objects a, d, b, c]
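The three steps of Fagin's Algorithm can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `fagin`, the in-memory list layout, and the use of a dictionary lookup to stand in for random access are all illustrative choices, and the lists are assumed to have equal length.

```python
def fagin(lists, k, agg=min):
    """lists: one list per attribute, each [(obj, grade), ...] sorted descending.
    Returns (top_k, sorted_accesses, random_accesses)."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]   # stands in for random access
    seen = {}                               # obj -> {list index: grade}
    depth = 0
    # Step 1: parallel sorted access until k objects were seen in every list.
    while sum(len(g) == m for g in seen.values()) < k and depth < len(lists[0]):
        for i, lst in enumerate(lists):
            obj, grade = lst[depth]
            seen.setdefault(obj, {})[i] = grade
        depth += 1
    sorted_accesses = depth * m
    # Step 2: random access for the grades that are still missing.
    random_accesses = 0
    for obj, grades in seen.items():
        for i in range(m):
            if i not in grades:
                grades[i] = lookup[i][obj]
                random_accesses += 1
    # Step 3: grade the seen objects and return the k best.
    overall = {obj: agg(g.values()) for obj, g in seen.items()}
    top = sorted(overall.items(), key=lambda item: item[1], reverse=True)[:k]
    return top, sorted_accesses, random_accesses

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
fa_result = fagin([L1, L2], k=2)
print(fa_result)  # ([('a', 0.85), ('b', 0.7)], 6, 2)
```

On the example lists, sorted access stops at depth 3 (a and b have then been seen in both lists), and two random accesses fetch the missing grades of c and d.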

Threshold Algorithm (TA)
- Read all grades of an object as soon as it is seen under sorted access.
- There is no need to wait until the lists give k common objects.
- Do sorted access (and the corresponding random accesses) until you have seen the top k answers.
- How do we know that the grades of the seen objects are higher than the grades of the unseen objects? Predict the maximum possible grade of the unseen objects: this is the threshold value.
[Diagram: lists L1 and L2 split into seen and possibly unseen entries, with threshold value T = min(0.72, 0.7) = 0.7]

Example – Threshold Algorithm
Step 1:
- Parallel sorted access to each list.
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6)
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2)
For each object seen:
- get all grades by random access;
- determine Min(A1, A2);
- if it is amongst the 2 highest seen, keep it in the buffer.
[Table: columns ID, A1, A2, Min(A1, A2); rows for the seen objects a, d]

Example – Threshold Algorithm
Step 2:
- Determine the threshold value based on the objects currently seen under sorted access: T = min(L1, L2).
L1: a: 0.9, b: 0.8, c: 0.72, d: 0.6
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2
T = min(0.9, 0.9) = 0.9
- If k objects have an overall grade ≥ the threshold value, stop; otherwise go to the next entry position in the sorted lists and repeat Step 1.
[Table: columns ID, A1, A2, Min(A1, A2); rows for the buffered objects a, d]

Example – Threshold Algorithm
Step 1 (again):
- Parallel sorted access to each list.
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6)
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2)
For each object seen:
- get all grades by random access;
- determine Min(A1, A2);
- if it is amongst the 2 highest seen, keep it in the buffer.
[Table: columns ID, A1, A2, Min(A1, A2); rows for the objects a, d, b]

Example – Threshold Algorithm
Step 2 (again):
- Determine the threshold value based on the objects currently seen: T = min(L1, L2).
L1: a: 0.9, b: 0.8, c: 0.72, d: 0.6
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2
T = min(0.8, 0.85) = 0.8
- If k objects have an overall grade ≥ the threshold value, stop; otherwise go to the next entry position in the sorted lists and repeat Step 1.
[Table: columns ID, A1, A2, Min(A1, A2); rows for the buffered objects a, b]

Example – Threshold Algorithm
Situation at the stopping condition:
L1: a: 0.9, b: 0.8, c: 0.72, d: 0.6
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2
T = min(0.72, 0.7) = 0.7
[Table: columns ID, A1, A2, Min(A1, A2); rows for the buffered objects a, b]
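The walkthrough above can be reproduced with a short Python sketch of TA. The function name `threshold_algorithm` and the dictionary lookup standing in for random access are assumptions for illustration. For brevity, the sketch remembers the grade of every object it has seen rather than keeping only a k-element buffer, so it does not demonstrate TA's bounded-buffer property.

```python
import heapq

def threshold_algorithm(lists, k, agg=min):
    """lists: one list per attribute, each [(obj, grade), ...] sorted descending.
    Returns (top_k, threshold at stop, depth reached)."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]   # stands in for random access
    overall = {}                            # obj -> overall grade
    top, threshold, depth = [], None, 0
    for depth in range(1, len(lists[0]) + 1):
        last_seen = []
        # Step 1: one sorted access per list, then random access for the rest.
        for lst in lists:
            obj, grade = lst[depth - 1]
            last_seen.append(grade)
            if obj not in overall:
                overall[obj] = agg(lookup[i][obj] for i in range(m))
        # Step 2: threshold = aggregate of the last grade seen in each list.
        threshold = agg(last_seen)
        top = heapq.nlargest(k, overall.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            break
    return top, threshold, depth

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
ta_result = threshold_algorithm([L1, L2], k=2)
print(ta_result)  # ([('a', 0.85), ('b', 0.7)], 0.7, 3)
```

The thresholds along the way are 0.9, 0.8, and 0.7, matching the slides; the run stops at depth 3 because both buffered grades (0.85 and 0.7) are at least 0.7.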

- The middleware cost of FA is the same no matter what the aggregation function is.
- TA stops at least as early as FA.
- TA may perform more random accesses than FA.
- TA requires only bounded buffers.
- TA can be stopped early (θ-approximation).

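Concept of instance optimality
- $\mathbf{A}$ = class of algorithms; $A \in \mathbf{A}$ represents an algorithm.
- $\mathbf{D}$ = class of legal inputs to the algorithms (databases); $D \in \mathbf{D}$ represents a database.
- $\mathrm{Cost}(A, D)$ = middleware cost when running algorithm A over database D.
Algorithm B is instance optimal over $\mathbf{A}$ and $\mathbf{D}$ if $B \in \mathbf{A}$ and
$\mathrm{Cost}(B, D) = O(\mathrm{Cost}(A, D))$ for every $A \in \mathbf{A}$ and $D \in \mathbf{D}$,
which means that there are constants $c$ and $c'$ such that
$\mathrm{Cost}(B, D) \le c \cdot \mathrm{Cost}(A, D) + c'$ for every $A \in \mathbf{A}$ and $D \in \mathbf{D}$.
The constant $c$ is the optimality ratio.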

- Theorem: If the aggregation function t is monotone, TA correctly finds the top k answers.
- Theorem: TA is instance optimal for every monotone aggregation function, over every database, provided that algorithms making wild guesses are excluded.

L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6)
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2)
L3: (b, 0.6) (a, 0.83) (d, 0.61) (c, 0.9)
L3 is not sorted, so it contributes the maximum grade 1 to the threshold:
T = min(0.72, 0.7, 1) = 0.7

- Can we determine the rank of an object without seeing all of its grades?
- The essence of the No Random Access (NRA) algorithm is to estimate the rank using the best and worst possible values of each object's overall grade.
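The bounding idea can be sketched as follows for the two-list example with min as the aggregation function. The function name `nra_bounds` is hypothetical, and the sketch makes two labeled assumptions: a missing grade is taken as 0 for the worst case, and as the last grade seen under sorted access in that list for the best case. It computes the bounds only and omits the rest of the algorithm, such as the stopping test.

```python
def nra_bounds(lists, depth, agg=min):
    """Worst and best possible overall grades after `depth` >= 1 rounds of
    sorted access, using no random access."""
    m = len(lists)
    bottoms = [lst[depth - 1][1] for lst in lists]  # last grade seen per list
    known = {}                                      # obj -> {list index: grade}
    for i, lst in enumerate(lists):
        for obj, grade in lst[:depth]:
            known.setdefault(obj, {})[i] = grade
    bounds = {}
    for obj, grades in known.items():
        # Worst case: every unknown grade is 0.
        worst = agg(grades.get(i, 0.0) for i in range(m))
        # Best case: every unknown grade equals the last grade seen in its list.
        best = agg(grades.get(i, bottoms[i]) for i in range(m))
        bounds[obj] = (worst, best)
    return bounds

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
bounds = nra_bounds([L1, L2], depth=2)
print(bounds)  # {'a': (0.85, 0.85), 'b': (0.0, 0.8), 'd': (0.0, 0.8)}
```

After two rounds, the worst possible grade of a (0.85) already exceeds the best possible grade of every other object (0.8, which also bounds the unseen object c), so a is certain to rank first even though not all grades have been read.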

- CA is a merge of TA and NRA.
- The idea of CA is to run NRA but, after every h steps, to perform a random access step.
- Both NRA and CA are instance optimal over all databases when the aggregation function is monotone.

- In this paper, the authors study a simple and elegant algorithm called TA.
- They also study variants of TA for settings in which some lists allow no sorted access, no random access is possible, and so on.
- They emphasize instance optimality and prove that their algorithms are instance optimal over all algorithms and all databases under normal assumptions.
- However, they do not consider the computational costs or the data structures required to implement the algorithms.
