Evaluating Top-k Queries Over Web-Accessible Databases. Amelie Marian, Nicolas Bruno, Luis Gravano. Presented by: Archana and Muhammed, 03/02/2006.

Presentation transcript:


Overview
- Goal: process top-k queries efficiently
- User-specified attributes might be handled by external, autonomous sources with a variety of access interfaces
- The paper presents sequential and parallel query processing techniques

Introduction
- A web search engine query consists of a list of keywords; the engine responds with the top-k pages
- Users do not expect exact answers
- The answer is a ranking of the objects that best match the query
- A scoring function determines how well each object matches

Example
Finding the best available restaurants near a given address, taking rating and price into account:
- Rating => Zagat-Review website
- Price => New York Times's NYT-Review website
- Address (distance) => MapQuest website

Difference Between Multimedia Systems and Web Sources
- Web sources might only support random access
- Attribute access is faster in centralized multimedia systems
- Multimedia systems require local processing, whereas probes to web sources can be issued concurrently

Data and Query Models
- Tuples are ordered by how closely they match the given query
- Different attributes can be assigned different weights
- Sources:
  - S-Source: provides a list of objects in order of their scores, e.g. the rating provider website Zagat-Review
  - R-Source: provides the score of a given object via random access, e.g. MapQuest for providing distance
  - SR-Source: provides both kinds of access
- U(t): score upper bound for object t
- U_unseen: score upper bound of any object not yet retrieved
- E(t): expected score for object t
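The bounds U(t) and E(t) can be illustrated with a small sketch. It assumes a weighted-sum scoring function with attribute scores in [0, 1], and it assumes an expected score of 0.5 for every unprobed attribute; neither choice is stated on the slide, and the attribute names and weights are hypothetical.

```python
from typing import Dict

def upper_bound(known: Dict[str, float], weights: Dict[str, float]) -> float:
    """U(t): every attribute not yet probed is assumed to have the maximum score 1.0."""
    return sum(w * known.get(attr, 1.0) for attr, w in weights.items())

def expected_score(known: Dict[str, float], weights: Dict[str, float],
                   expected: Dict[str, float]) -> float:
    """E(t): every attribute not yet probed is replaced by its expected score."""
    return sum(w * known.get(attr, expected[attr]) for attr, w in weights.items())

# Hypothetical weights for the restaurant example (an assumption, not from the slides).
weights = {"rating": 0.5, "price": 0.3, "distance": 0.2}
# Assumed expected score of 0.5 for each unprobed attribute.
expected = {attr: 0.5 for attr in weights}

# Object t has been seen at the rating source only.
known_t = {"rating": 0.8}
u_t = upper_bound(known_t, weights)               # 0.5*0.8 + 0.3*1.0 + 0.2*1.0
e_t = expected_score(known_t, weights, expected)  # 0.5*0.8 + 0.3*0.5 + 0.2*0.5
```

Each probe replaces an assumed score by a real one, so U(t) can only stay the same or decrease as more attributes of t become known.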

Sequential Query Processing
- At most one probe (random or sorted access) is in progress at any time
  - A sorted-access probe returns the next unseen object from a source; this object might not have been probed at the other sources yet
  - A random-access probe takes an already seen object and a source that still needs to be probed, and returns the corresponding attribute score

TA Strategy: the TA_z Algorithm
- For each SR-source, the algorithm:
  - retrieves the next "best" object via sorted access
  - probes the unknown attribute scores for this object via random access
  - computes the final score for the object
  - keeps track, at any given time, of the top-k tuples with their scores
- Threshold value: U_unseen = ScoreComb(s_l(1), ..., s_l(n_SR), 1, ..., 1), where s_l(i) is the last score seen under sorted access at SR-source i and 1 is the maximum possible score for each attribute handled by an R-source
- Termination condition: k objects have been found and U_unseen is no larger than the scores of the current top-k objects
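The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes ScoreComb is a weighted sum, models each source as an in-memory dictionary from object to score, treats a missing score as 0.0, and requires at least one SR-source. The function name `ta_z` and the sample data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Scores = Dict[str, float]

def ta_z(sr_sources: List[Scores], r_sources: List[Scores],
         sr_weights: List[float], r_weights: List[float],
         k: int) -> List[Tuple[str, float]]:
    # Sorted access is simulated by sorting each SR-source by descending score.
    sorted_lists = [sorted(s.items(), key=lambda kv: kv[1], reverse=True)
                    for s in sr_sources]
    last_seen = [1.0] * len(sr_sources)  # s_l(i): last score seen at SR-source i
    final: Dict[str, float] = {}         # fully probed objects and their scores
    for depth in range(max(len(lst) for lst in sorted_lists)):
        for i, lst in enumerate(sorted_lists):
            if depth >= len(lst):
                last_seen[i] = 0.0       # source exhausted: nothing unseen remains there
                continue
            obj, score = lst[depth]
            last_seen[i] = score
            if obj not in final:
                # Random access to every source for the newly seen object.
                final[obj] = (
                    sum(w * s.get(obj, 0.0) for w, s in zip(sr_weights, sr_sources))
                    + sum(w * s.get(obj, 0.0) for w, s in zip(r_weights, r_sources)))
        # Threshold: last sorted scores for SR-sources, maximum score 1 for R-sources.
        u_unseen = sum(w * s for w, s in zip(sr_weights, last_seen)) + sum(r_weights)
        top = heapq.nlargest(k, final.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= u_unseen:
            return top
    return heapq.nlargest(k, final.items(), key=lambda kv: kv[1])

rating = {"a": 0.9, "b": 0.8, "c": 0.3}    # SR-source (sorted and random access)
distance = {"a": 0.2, "b": 0.9, "c": 0.5}  # R-source (random access only)
top1 = ta_z([rating], [distance], [0.5], [0.5], k=1)
top2 = ta_z([rating], [distance], [0.5], [0.5], k=2)
```

In this toy run the R-source contributes a constant 0.5 to the threshold, so the algorithm has to read all three objects before it can stop, which shows how R-sources weaken the threshold.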

TA_z Algorithm

Improvements Over TA: the TA_z-EP Algorithm
- The bounded-buffer assumption is removed, and no object is discarded until the algorithm returns
- Reason: the same object might be retrieved again by a different SR-source

Improvements Over TA (Contd.)
Two optimizations:
- Save random-access probes when an object cannot be part of the top-k answer (i.e. when its score upper bound is lower than the scores of the current top-k objects)
- Borrow from the processing of selection queries of the form p1 ^ ... ^ pn, where each predicate pi can be expensive to evaluate
  - The key idea is to order the evaluation of the predicates to minimize the expected execution time
  - The order is decided by Rank(pi) = (1 - selectivity(pi)) / cost-per-object(pi)
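A short sketch of the rank-based ordering follows. It assumes that predicates are independent and that they are evaluated in decreasing order of rank, so evaluation of an object stops at the first predicate that fails. The predicate names, selectivities, and costs are made up for illustration.

```python
from typing import List, NamedTuple

class Predicate(NamedTuple):
    name: str
    selectivity: float      # fraction of objects that satisfy the predicate
    cost_per_object: float  # cost of evaluating the predicate on one object

def rank(p: Predicate) -> float:
    """Rank(pi) = (1 - selectivity(pi)) / cost-per-object(pi)."""
    return (1.0 - p.selectivity) / p.cost_per_object

def order_predicates(preds: List[Predicate]) -> List[Predicate]:
    """Cheap, highly selective predicates come first."""
    return sorted(preds, key=rank, reverse=True)

def expected_cost(ordered: List[Predicate]) -> float:
    """Expected cost per object when evaluation stops at the first failing predicate."""
    total, pass_probability = 0.0, 1.0
    for p in ordered:
        total += pass_probability * p.cost_per_object
        pass_probability *= p.selectivity
    return total

preds = [Predicate("p1", 0.9, 1.0), Predicate("p2", 0.1, 2.0), Predicate("p3", 0.5, 10.0)]
ordered = order_predicates(preds)
cost_given_order = expected_cost(preds)     # order as written: p1, p2, p3
cost_ranked_order = expected_cost(ordered)  # order by decreasing rank
```

With these numbers the ranked order is p2, p1, p3, and the expected cost per object drops from about 3.7 to about 3.0.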

Improvements Over TA (Contd.)

Upper Strategy
- Upper allows more flexible probe schedules in which sorted and random accesses can be interleaved, even when some objects have only been partially probed
- When a probe completes, Upper decides whether:
  - to perform a sorted-access probe on a source to get new objects, or
  - to perform the "most promising" random-access probe on some object
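The decision between the two kinds of probe can be sketched roughly as below. This is a simplification under stated assumptions: the scoring function is a weighted sum, the object with the highest score upper bound is treated as the most promising one, and the source to probe is picked by the largest weight among the attributes still missing. The slides do not spell out the real source-selection rule, so `choose_next_probe` is a hypothetical stand-in.

```python
from typing import Dict, Optional, Tuple

def upper_bound(known: Dict[str, float], weights: Dict[str, float]) -> float:
    """U(t) under a weighted sum, with unprobed attributes assumed to score 1.0."""
    return sum(w * known.get(attr, 1.0) for attr, w in weights.items())

def choose_next_probe(candidates: Dict[str, Dict[str, float]],
                      weights: Dict[str, float],
                      u_unseen: float) -> Tuple[str, Optional[str], Optional[str]]:
    """Return ("sorted", None, None) or ("random", object, attribute_to_probe)."""
    # Only objects with at least one unknown attribute can still be probed.
    incomplete = {t: known for t, known in candidates.items()
                  if len(known) < len(weights)}
    if not incomplete:
        return ("sorted", None, None)
    best = max(incomplete, key=lambda t: upper_bound(incomplete[t], weights))
    # An unseen object could still beat every seen one: fetch new objects first.
    if u_unseen > upper_bound(incomplete[best], weights):
        return ("sorted", None, None)
    # Assumed heuristic: probe the missing attribute that carries the largest weight.
    missing = [attr for attr in weights if attr not in incomplete[best]]
    return ("random", best, max(missing, key=lambda attr: weights[attr]))

weights = {"rating": 0.5, "price": 0.3, "distance": 0.2}
candidates = {"a": {"rating": 0.9}, "b": {"rating": 0.4, "price": 0.9}}
probe_low_threshold = choose_next_probe(candidates, weights, u_unseen=0.8)
probe_high_threshold = choose_next_probe(candidates, weights, u_unseen=0.99)
```

Here U(a) = 0.95 and U(b) = 0.67, so with U_unseen = 0.8 the sketch probes object a, and with U_unseen = 0.99 it falls back to sorted access.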

Upper Strategy (Contd.)

Upper Strategy (Contd.)
- The selection of further probes again depends on the weight of each source and on the ranking function

Parallel Query Processing
- Sequential query processing is bound to take a long time
- Web databases exhibit high and variable latency
- The goal is to maximize source-access parallelism in order to minimize query processing time
- Source access constraints:
  - Sources may impose access restrictions and may vary in load and network capabilities
  - The number of parallel probes for source Di can be controlled

Adapting the TA Strategy
- pTA probes objects in parallel, in the order in which they are retrieved, while respecting the source access constraints
- Each object retrieved by sorted access is placed in a queue of discovered objects
- When a source Di becomes available, pTA chooses the object to probe at that source by selecting the first object in the queue that has not been probed at Di yet
- Optimization: do not probe objects whose final score cannot exceed that of the top-k objects already seen; such an object is put on the list of "discarded" objects
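The queue discipline described above can be sketched in a few lines. The data structures and the function names `discard_hopeless` and `next_object_for_source` are hypothetical; real probes, latencies, and per-source concurrency limits are left out.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set

def discard_hopeless(upper_bounds: Dict[str, float], kth_score: float,
                     discarded: Set[str]) -> None:
    """Mark objects whose score upper bound is below the current k-th best score."""
    for obj, bound in upper_bounds.items():
        if bound < kth_score:
            discarded.add(obj)

def next_object_for_source(discovered: Deque[str], probed: Dict[str, Set[str]],
                           source: str, discarded: Set[str]) -> Optional[str]:
    """First object in discovery order that is not discarded and not yet probed at source."""
    for obj in discovered:
        if obj in discarded:
            continue
        if source not in probed.get(obj, set()):
            return obj
    return None

discovered = deque(["a", "b", "c"])        # order of retrieval by sorted access
probed = {"a": {"D1"}, "b": set(), "c": set()}
discarded: Set[str] = set()
discard_hopeless({"a": 0.9, "b": 0.3, "c": 0.7}, kth_score=0.5, discarded=discarded)
for_d1 = next_object_for_source(discovered, probed, "D1", discarded)
for_d2 = next_object_for_source(discovered, probed, "D2", discarded)
```

Object b is skipped for every source once it is discarded, so D1 moves on to c while D2 still starts with a.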

pUpper Strategy
- Uses the SelectBestSubset function to retrieve a minimal set of sources that need to be probed for a given object, instead of a single source
- These multiple probes can proceed in parallel to speed up query execution
- When a random-access source Di becomes underutilized, pUpper identifies the object t with the highest score upper bound such that Di ∈ SelectBestSubset(t)

pUpper Strategy (Contd.)
- pUpper associates a queue with each source for random-access scheduling
- The queues are regularly updated by calls to the function GenerateQueues
- If source Di is available, pUpper checks Queue(Di):
  - If Queue(Di) is empty, all random-access queues are regenerated
  - Otherwise, the first object of Queue(Di) is probed
- Only one sorted-access request is issued at a time per SR-source Di
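The queue check can be sketched as follows. The slides do not say how GenerateQueues builds the queues, so it is passed in here as a caller-supplied function; the sketch also assumes that the first object of a freshly regenerated queue is probed right away. The source names and queue contents are made up.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional

Queues = Dict[str, Deque[str]]

def next_probe(source: str, queues: Queues,
               generate_queues: Callable[[], Queues]) -> Optional[str]:
    """Pick the object to probe at an available source, regenerating queues if needed."""
    if not queues.get(source):
        # Queue(Di) is empty: regenerate all random-access queues, not just this one.
        queues.clear()
        queues.update(generate_queues())
    queue = queues.get(source)
    return queue.popleft() if queue else None

def fake_generate_queues() -> Queues:
    # Stand-in for GenerateQueues; a real version would rank objects per source.
    return {"D1": deque(["b"]), "D2": deque(["c"])}

queues: Queues = {"D1": deque(["a"]), "D2": deque()}
first = next_probe("D1", queues, fake_generate_queues)   # served from the existing queue
second = next_probe("D2", queues, fake_generate_queues)  # triggers regeneration
```

Regenerating every queue at once keeps the per-source schedules consistent with one another, at the price of recomputing queues that were not empty.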

pUpper Strategy (Contd.)

Evaluation Setting
- Local sources: Uniform, Gaussian, Zipfian, Correlated, Mixed, Cover
- Real web-accessible sources: a mix of SR-sources and R-sources

Evaluation Results
Sequential Algorithms: Local Database

Evaluation Results (Contd.)
Sequential Algorithms: Web Database

Evaluation Results (Contd.)
Parallel Algorithms: Local Database

Evaluation Results (Contd.)
Parallel Algorithms: Web Database
- pUpper is faster than pTA
- pUpper carefully selects the probes for each object
- It considers probing time and source congestion to make probing choices at the object level
- This results in better use of parallelism and faster query processing
- Parallel probing significantly decreases query processing time

Conclusions
- Probe interleaving greatly reduces query execution time
- Object-level scheduling, as in Upper, is desirable when sources exhibit moderate to high random-access time
- pUpper minimizes query response time while respecting source access constraints
- pUpper, the fastest query processing technique, highlights:
  - the importance of parallelism in a web setting
  - the advantages of object-level probe scheduling for adapting to source congestion
- The approach in this paper exploits the source access constraints of the web very well
- The model can be extended to capture more expressive web interfaces

THANK YOU