Optimal Aggregation Algorithms for Middleware By Ronald Fagin, Amnon Lotem, and Moni Naor.

Overview
- Databases and data types
- Fagin’s Algorithm
- Threshold Algorithm
- Advantages

Multimedia vs String
- Early databases
- Modern and middleware databases
- “Fuzzy” attributes
- Querying a database returns graded pairs (x, g): an object x with its grade g

Naïve Algorithm
- Find the top 2 objects

Rank   R1         R2         R3
1      X1  1.0    X2  0.8    X4  0.8
2      X2  0.8    X3  0.7    X3  0.6
3      X3  0.5    X1  0.3    X1  0.2
4      X4  0.3    X4  0.2    X5  0.1
5      X5  0.1    X5  0.1    X2  0.0

- Overall score of each object (sum of its three grades), computed one object at a time:
  X1 1.5, X2 1.6, X3 1.8, X4 1.3, X5 0.3
- Sorted by score: X3 1.8, X2 1.6, X1 1.5, X4 1.3, X5 0.3
- Top-2 objects: X3 and X2
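The naïve computation can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the names `LISTS` and `naive_top_k` are made up here, and summation is assumed as the aggregation function because it reproduces the scores shown in the example.

```python
# The three graded lists from the example, each sorted by grade (descending).
LISTS = {
    "R1": [("X1", 1.0), ("X2", 0.8), ("X3", 0.5), ("X4", 0.3), ("X5", 0.1)],
    "R2": [("X2", 0.8), ("X3", 0.7), ("X1", 0.3), ("X4", 0.2), ("X5", 0.1)],
    "R3": [("X4", 0.8), ("X3", 0.6), ("X1", 0.2), ("X5", 0.1), ("X2", 0.0)],
}


def naive_top_k(lists, k, aggregate=sum):
    """Read every entry of every list, score every object, keep the best k.

    `aggregate` is assumed to be sum, matching the example scores.
    """
    grades = {}
    for entries in lists.values():
        for obj, grade in entries:
            grades.setdefault(obj, []).append(grade)
    # Round to hide floating-point noise such as 1.7999999999999998.
    scored = [(obj, round(aggregate(g), 10)) for obj, g in grades.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    print(naive_top_k(LISTS, 2))
```

Every object is buffered and every list entry is read, which is why the cost grows linearly with the size of the database.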

Fagin’s Algorithm
- Sequential access to all lists in parallel until k objects have been seen in every list
- Random access to fetch the missing grades of every object seen
- Compute the overall grade of each seen object and return the top k

Fagin’s Algorithm: example on the same lists R1, R2, R3 with k = 2
- Sequential access, row 1: X1 (R1), X2 (R2), X4 (R3)
- Sequential access, row 2: X2 (R1), X3 (R2), X3 (R3)
- Sequential access, row 3: X3 (R1), X1 (R2), X1 (R3)
- Since k = 2 and X1 and X3 have now been seen in all lists, sequential access stops
- Random accesses obtain the remaining grades of all seen objects: X1, X2, X3, X4
- Compute the score of every seen object and return the top k:
  X3 1.8, X2 1.6, X1 1.5, X4 1.3, so the top-2 objects are X3 and X2
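The two phases of Fagin’s Algorithm can be sketched as follows. This is an illustrative sketch under stated assumptions: `fagin_top_k` and `LISTS` are hypothetical names, summation is assumed as the aggregation function, and random access is simulated with a dictionary lookup per list.

```python
# The three graded lists from the example, each sorted by grade (descending).
LISTS = {
    "R1": [("X1", 1.0), ("X2", 0.8), ("X3", 0.5), ("X4", 0.3), ("X5", 0.1)],
    "R2": [("X2", 0.8), ("X3", 0.7), ("X1", 0.3), ("X4", 0.2), ("X5", 0.1)],
    "R3": [("X4", 0.8), ("X3", 0.6), ("X1", 0.2), ("X5", 0.1), ("X2", 0.0)],
}


def fagin_top_k(lists, k, aggregate=sum):
    """Return (top k, rows read, objects seen) for Fagin's Algorithm."""
    names = list(lists)
    # Simulated random access: grade lookup by object for every list.
    index = {name: dict(entries) for name, entries in lists.items()}
    seen_in = {}  # object -> names of lists where sequential access saw it
    depth = 0
    max_depth = max(len(entries) for entries in lists.values())

    # Phase 1: sequential access in parallel until k objects
    # have been seen in every list.
    while depth < max_depth:
        for name in names:
            if depth < len(lists[name]):
                obj, _ = lists[name][depth]
                seen_in.setdefault(obj, set()).add(name)
        depth += 1
        complete = sum(1 for s in seen_in.values() if len(s) == len(names))
        if complete >= k:
            break

    # Phase 2: random access for the grades of every seen object,
    # then score and sort. (For brevity all grades are looked up,
    # not only the missing ones.)
    scored = [
        (obj, round(aggregate(index[n].get(obj, 0.0) for n in names), 10))
        for obj in seen_in
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k], depth, sorted(seen_in)


if __name__ == "__main__":
    print(fagin_top_k(LISTS, 2))
```

On the example lists the sketch stops after three rows with X1, X2, X3 and X4 buffered, which illustrates why the buffer can grow large: every object touched during sequential access must be remembered until the end.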

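Threshold Algorithm
- Sequential access to all lists in parallel, one row at a time
- For every object seen, use random access to fetch its other grades and compute its score
- Maintain a list of the top k objects seen so far
- Define the threshold value τ as the aggregate of the grades read in the current row
- Continue until every object in the top-k list has a score >= τ
- Output the graded set of top-k objects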
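Written out as a formula, the threshold and the stopping rule look roughly like this. The notation is an assumption introduced here for illustration: t is the aggregation function, m is the number of lists, and \underline{x}_i is the grade of the last object read under sequential access in list i.

```latex
\tau = t(\underline{x}_1, \underline{x}_2, \ldots, \underline{x}_m),
\qquad
\text{halt as soon as at least } k \text{ seen objects } x \text{ satisfy } t(x) \ge \tau .
```

Because each list is sorted in descending order and t is assumed to be monotone, no object that has not yet been seen can score higher than τ, which is what makes it safe to stop.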

Threshold Algorithm: example on the same lists R1, R2, R3 with k = 2

Round 1
- Sequential access: X1 1.0 (R1), X2 0.8 (R2), X4 0.8 (R3)
- Set τ to be the aggregate of the scores seen in this access: τ = 1.0 + 0.8 + 0.8 = 2.6
- Random access and compute scores: X1 1.5, X2 1.6, X4 1.3
- Best two so far: X2 1.6, X1 1.5 (both below τ, so continue)

Round 2
- Sequential access: X2 0.8 (R1), X3 0.7 (R2), X3 0.6 (R3)
- Set τ to be the aggregate of the scores seen in this access: τ = 0.8 + 0.7 + 0.6 = 2.1
- Random access and compute scores: X3 1.8
- Best two so far: X3 1.8, X2 1.6 (both below τ, so continue)

Round 3
- Sequential access: X3 0.5 (R1), X1 0.3 (R2), X1 0.2 (R3)
- Set τ to be the aggregate of the scores seen in this access: τ = 0.5 + 0.3 + 0.2 = 1.0
- No new objects; best two remain X3 1.8, X2 1.6
- Stop when top-k >= τ: 1.8 >= 1.0 and 1.6 >= 1.0, so the output is X3 1.8, X2 1.6
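The rounds of the Threshold Algorithm can be reproduced with a short Python sketch. As before, this is an illustration under assumptions rather than the authors' code: `threshold_top_k` and `LISTS` are hypothetical names, summation is assumed as the aggregation function, every list is assumed to contain every object, and random access is simulated with dictionary lookups.

```python
# The three graded lists from the example, each sorted by grade (descending).
LISTS = {
    "R1": [("X1", 1.0), ("X2", 0.8), ("X3", 0.5), ("X4", 0.3), ("X5", 0.1)],
    "R2": [("X2", 0.8), ("X3", 0.7), ("X1", 0.3), ("X4", 0.2), ("X5", 0.1)],
    "R3": [("X4", 0.8), ("X3", 0.6), ("X1", 0.2), ("X5", 0.1), ("X2", 0.0)],
}


def threshold_top_k(lists, k, aggregate=sum):
    """Return (top k, list of tau values, one per round)."""
    names = list(lists)
    # Simulated random access: grade lookup by object for every list.
    index = {name: dict(entries) for name, entries in lists.items()}
    top = {}    # buffer holding at most k objects
    taus = []   # threshold value after each round, for inspection
    # Assumption: all lists have the same length.
    rows = min(len(entries) for entries in lists.values())

    for depth in range(rows):
        last_seen = []
        for name in names:
            # Sequential access: read the next row of this list.
            obj, grade = lists[name][depth]
            last_seen.append(grade)
            # Random access: fetch the object's grades in all lists.
            score = round(aggregate(index[n][obj] for n in names), 10)
            top[obj] = score
            if len(top) > k:
                # Evict the weakest object so the buffer stays at size k.
                del top[min(top, key=top.get)]
        # Threshold: aggregate of the grades read in this row.
        tau = round(aggregate(last_seen), 10)
        taus.append(tau)
        if len(top) == k and min(top.values()) >= tau:
            break

    ranked = sorted(top.items(), key=lambda pair: pair[1], reverse=True)
    return ranked, taus


if __name__ == "__main__":
    print(threshold_top_k(LISTS, 2))
```

The buffer never holds more than k objects. The price is that an evicted object, such as X1 in round 3, is scored again if sequential access meets it a second time.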

Comparison: Naïve Algorithm
- Buffer space required = number of objects
- Cost is linear in the size of the database, since every entry of every list is read
- Not efficient for large databases

Comparison: Fagin’s Algorithm
- Large buffer space required, since every object seen under sequential access is kept
- Random access is done at the end
- Optimal only under certain aggregate functions

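Comparison: The Threshold Algorithm
- Buffer space bounded by k
- Every object not yet seen has a score of at most τ
- Fewer sequential accesses required: it stops no later than Fagin’s Algorithm
- Instance optimal for every monotone aggregation function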

Sources
- Query.pdf
- earcher/files/us-fagin/jcss03.pdf