1 Evaluating top-k Queries over Web-Accessible Databases Paper By: Amelie Marian, Nicolas Bruno, Luis Gravano Presented By Bhushan Chaudhari University.

Similar presentations
Answering Approximate Queries over Autonomous Web Databases Xiangfu Meng, Z. M. Ma, and Li Yan College of Information Science and Engineering, Northeastern.
Evaluating Window Joins over Unbounded Streams Author: Jaewoo Kang, Jeffrey F. Naughton, Stratis D. Viglas University of Wisconsin-Madison CS Dept. Presenter:
Heuristic Search techniques
Web Information Retrieval
Efficient IR-Style Keyword Search over Relational Databases Vagelis Hristidis University of California, San Diego Luis Gravano Columbia University Yannis.
Supporting top-k join queries in relational databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by Rebecca M. Atchley Thursday, April.
Cleaning Uncertain Data with Quality Guarantees Reynold Cheng, Jinchuan Chen, Xike Xie 2008 VLDB Presented by SHAO Yufeng.
A review on “Answering Relationship Queries on the Web” Bhushan Pendharkar ASU ID
Efficient Query Evaluation on Probabilistic Databases
6/15/20151 Top-k algorithms Finding k objects that have the highest overall grades.
Evaluating Window Joins Over Unbounded Streams By Nishant Mehta and Abhishek Kumar.
Circumventing Data Quality Problems Using Multiple Join Paths Yannis Kotidis, Athens University of Economics and Business Amélie Marian, Rutgers University.
Rank Aggregation. Rank Aggregation: Settings Multiple items – Web-pages, cars, apartments,…. Multiple scores for each item – By different reviewers, users,
1 Searching and Integrating Information on the Web Seminar 4: Ranking Queries and Data Privacy Professor Chen Li UC Irvine.
MAE 552 – Heuristic Optimization Lecture 26 April 1, 2002 Topic:Branch and Bound.
Aggregation Algorithms and Instance Optimality
Combining Fuzzy Information: an Overview Ronald Fagin Abdullah Mueen -- Slides by Abdullah Mueen.
Evaluating Top-k Queries over Web-Accessible Databases Nicolas Bruno Luis Gravano Amélie Marian Columbia University.
CS246 Ranked Queries. Junghoo "John" Cho (UCLA Computer Science)2 Traditional Database Query (Dept = “CS”) & (GPA > 3.5) Boolean semantics Clear boundary.
Minimal Probing: Supporting Expensive Predicates for Top-k Queries Kevin C. Chang Seung-won Hwang Univ. of Illinois at Urbana-Champaign.
“ The Initiative's focus is to dramatically advance the means to collect,store,and organize information in digital forms,and make it available for searching,retrieval,and.
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
Winter Semester 2003/2004Selected Topics in Web IR and Mining7-1 7 Top-k Queries on Web Sources and Structured Data 7.1 Top-k Queries over Autonomous Web.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
BLAST: A Case Study Lecture 25. BLAST: Introduction The Basic Local Alignment Search Tool, BLAST, is a fast approach to finding similar strings of characters.
« Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee » Proceedings of the 30th annual international ACM SIGIR, Amsterdam 2007) A.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Richa Varshney.
Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.
CS 533 Information Retrieval Systems.  Introduction  Connectivity Analysis  Kleinberg’s Algorithm  Problems Encountered  Improved Connectivity Analysis.
SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.
“Artificial Intelligence” in my research Seung-won Hwang Department of CSE POSTECH.
Distributed Spatio-Temporal Similarity Search Demetrios Zeinalipour-Yazti University of Cyprus Song Lin
1University of Texas at Arlington.  Introduction  Motivation  Requirements  Paper’s Contribution.  Related Work  Overview of Ripple Join  Rank.
Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.
All right reserved by Xuehua Shen 1 Optimal Aggregation Algorithms for Middleware Ronald Fagin, Amnon Lotem, Moni Naor (PODS01)
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington.
Decision Trees Binary output – easily extendible to multiple output classes. Takes a set of attributes for a given situation or object and outputs a yes/no.
Combining Fuzzy Information: An Overview Ronald Fagin.
Presented by Suresh Barukula 2011csz  Top-k query processing means finding k- objects, that have highest overall grades.  A query in multimedia.
Answering Top-k Queries Using Views Gautam Das (Univ. of Texas), Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto), Dimitris.
Searching Specification Documents R. Agrawal, R. Srikant. WWW-2002.
Search Engines WS 2009 / 2010 Prof. Dr. Hannah Bast Chair of Algorithms and Data Structures Department of Computer Science University of Freiburg Lecture.
CS4432: Database Systems II Query Processing- Part 2.
CSE 6392 – Data Exploration and Analysis in Relational Databases April 20, 2006.
Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.
Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.
Optimal Aggregation Algorithms for Middleware By Ronald Fagin, Amnon Lotem, and Moni Naor.
Query Processing CS 405G Introduction to Database Systems.
Top-k Query Processing Optimal aggregation algorithms for middleware Ronald Fagin, Amnon Lotem, and Moni Naor + Sushruth P. + Arjun Dasgupta.
Database Searching and Information Retrieval Presented by: Tushar Kumar.J Ritesh Bagga.
03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.
Chapter 15 Running Time Analysis. Topics Orders of Magnitude and Big-Oh Notation Running Time Analysis of Algorithms –Counting Statements –Evaluating.
Neighborhood - based Tag Prediction
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Database Management System
Rule Induction for Classification Using
Seung-won Hwang, Kevin Chen-Chuan Chang
Chapter 12: Query Processing
Top-k Query Processing
Preference Query Evaluation Over Expensive Attributes
Rank Aggregation.
Popular Ranking Algorithms
Structure and Content Scoring for XML
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Efficient Processing of Top-k Spatial Preference Queries
Query Specific Ranking
Relax and Adapt: Computing Top-k Matches to XPath Queries
Presentation transcript:

1 Evaluating top-k Queries over Web-Accessible Databases Paper By: Amelie Marian, Nicolas Bruno, Luis Gravano Presented By Bhushan Chaudhari University of Texas at Arlington

2 Overview: The emphasis is on the top-k results. Fagin et al. present algorithms, e.g., FA (Fagin's Algorithm) and TA (the Threshold Algorithm), that efficiently identify the top results in different ways. Here we discuss a larger scenario: web-accessible databases. Assumption: keywords typed into a search text box are mapped to the appropriate related modules (web-accessible databases). Probing web sources leads to long query response times. The approach tries to exploit the parallel access offered by the web.

3 Introduction: We do not expect exact answers from a search engine, but the closest possible matching tuples. Querying a general search engine differs from querying a dedicated one, e.g., Google vs. Amazon. The paper defines the problem using a restaurant example: "the problem of finding the nearest available restaurants given the current location, rating and price".

4 Approach: Think beyond relational databases, to web-accessible sources storing information such as restaurant ratings or maps. Rating => Zagat-Review website; Price => New York Times's NYT-Review website; Address => MapQuest website. In this scenario the databases are geographically and functionally different but are related "in some way". Assumptions: 1. The interfaces required for accessing the web sources are in place. 2. The dependency constraints between sources are handled.

5 Approach (continued..): The scenario can be compared with one involving several multimedia systems, although those systems are more closely connected. Here we try to use the intrinsic parallelism of the web: we issue probes to various sources in parallel to reduce the overall query processing time. Assumptions (as on slide 2): keywords typed into the search text box are routed to the appropriate related modules (web-accessible databases); probing web sources leads to long query response times; the approach tries to exploit the parallel access offered by the web.

6 Data and Query Models: Objects are ordered by how closely each tuple matches the given query, and different attributes can be assigned different weights. Sources: S-Source: provides a list of objects in order of their scores (sorted access), e.g., the rating website Zagat-Review. R-Source: provides the score of a given object on request (random access), e.g., MapQuest providing the distance. SR-Source: a source that provides both kinds of access. U(t): upper bound on the score of object t. U_unseen: upper bound on the score of any object not yet retrieved. E(t): expected score of t.
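To make U(t), U_unseen and E(t) concrete, here is a minimal Python sketch. It assumes a weighted-sum scoring function over attribute scores in [0, 1]; that an unknown score on a source with sorted access is bounded by the last score seen under sorted access on that source (1.0 if the source has no sorted access); and, for E(t), that unknown scores are uniformly distributed below that bound. All function names are hypothetical.

```python
from typing import Dict, List, Optional

def _bound(i: int, last_seen: List[Optional[float]]) -> float:
    # Largest possible unknown score on source i: the last score returned by
    # sorted access (scores arrive in decreasing order), or 1.0 if none.
    return last_seen[i] if last_seen[i] is not None else 1.0

def upper_bound(known: Dict[int, float], weights: List[float],
                last_seen: List[Optional[float]]) -> float:
    """U(t): known scores plus the largest possible value of each missing one."""
    return sum(w * known.get(i, _bound(i, last_seen)) for i, w in enumerate(weights))

def upper_unseen(weights: List[float], last_seen: List[Optional[float]]) -> float:
    """U_unseen: an object not yet retrieved can reach at most the bound everywhere."""
    return sum(w * _bound(i, last_seen) for i, w in enumerate(weights))

def expected_score(known: Dict[int, float], weights: List[float],
                   last_seen: List[Optional[float]]) -> float:
    """E(t): each missing score replaced by its mean under the uniform assumption."""
    return sum(w * known.get(i, _bound(i, last_seen) / 2) for i, w in enumerate(weights))
```

For example, with weights [0.5, 0.3, 0.2], last sorted-access scores [0.8, None, 0.6] and only t's first score known (0.9), this gives U(t) = 0.87, U_unseen = 0.82 and E(t) = 0.66.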

7 Query Model (continued..): Obtaining all the scores of candidate objects through sorted access alone (S-sources) can be expensive; therefore, the availability of SR-sources, which also allow random access, is important for this approach. Initially we assume that every source knows about every object. If a score cannot be obtained, it can be replaced with a default value, e.g., a newly opened restaurant might not yet be rated by the other review websites.

8 Sequential Query Processing Strategy: At each step the strategy either performs a sorted access, which returns a new (unseen) object that may not yet have been probed at the other sources, or picks an already-seen object together with a source on which to probe it via random access to obtain the corresponding score.

9 TA Strategy: Processes top-k queries over SR-sources. The algorithm retrieves the next "best" object via sorted access, probes all its unknown scores via random access, and computes the object's final score. At any given time it keeps track of the top-k objects found so far. The solution is reached when no unretrieved object can have a score higher than those of the current top-k objects.
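The TA loop described above can be sketched in Python as follows. This is a minimal illustration, assuming a weighted-sum score, in-memory lists standing in for the sources' sorted access, dictionaries standing in for random access, and that every object appears in every source. Sorted access is done in lockstep across sources, and for brevity the sketch keeps every final score instead of a bounded top-k buffer. The function name is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

def threshold_algorithm(sorted_lists: List[List[Tuple[str, float]]],
                        random_access: List[Dict[str, float]],
                        weights: List[float], k: int) -> List[Tuple[str, float]]:
    """Top-k objects by weighted sum over SR-sources."""
    final: Dict[str, float] = {}  # final score of every object seen so far
    depth = 0
    while depth < max(len(lst) for lst in sorted_lists):
        last_scores = []
        for lst in sorted_lists:
            if depth < len(lst):
                obj, score = lst[depth]          # next object via sorted access
                last_scores.append(score)
                if obj not in final:             # random access for all its scores
                    final[obj] = sum(w * ra[obj] for w, ra in zip(weights, random_access))
            else:
                last_scores.append(0.0)
        depth += 1
        # Threshold: best score any not-yet-retrieved object could still reach.
        threshold = sum(w * s for w, s in zip(weights, last_scores))
        top = heapq.nlargest(k, final.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                           # no unseen object can beat the top-k
    return heapq.nlargest(k, final.items(), key=lambda kv: kv[1])
```

With two equally weighted sources listing a, b, c as (0.9, 0.2), (0.8, 0.9) and (0.1, 0.7), the k = 1 query stops after the second round of sorted access and returns b with score 0.85.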

10

11 Improvements upon TA: The bounded-buffer assumption is removed, and no object is discarded until the algorithm returns, because the same object might be referenced again by a different SR-source. For selection queries of the form p1 ∧ p2 ∧ … ∧ pn, each predicate pi can be expensive to evaluate. The key idea is to order the evaluation of the predicates to minimize the expected execution time. Predicates are evaluated in decreasing order of Rank(pi) = (1 − selectivity(pi)) / cost-per-object(pi).
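The rank-based ordering can be checked with a small Python sketch. It assumes that selectivity(pi) is the fraction of objects passing pi, that predicates are independent, and that a conjunction is evaluated left to right with short-circuiting; the predicate names and numbers below are hypothetical. Under these assumptions, sorting by decreasing rank is the classic cost-optimal order, which the usage test verifies by brute force over all permutations.

```python
from itertools import permutations
from typing import List, Tuple

Predicate = Tuple[str, float, float]  # (name, selectivity, cost per object)

def rank(p: Predicate) -> float:
    _, selectivity, cost = p
    return (1 - selectivity) / cost

def order_predicates(preds: List[Predicate]) -> List[str]:
    """Evaluate cheap, highly selective predicates first (decreasing rank)."""
    return [p[0] for p in sorted(preds, key=rank, reverse=True)]

def expected_cost(order: List[Predicate]) -> float:
    """Expected per-object cost of evaluating the conjunction in this order."""
    cost, pass_prob = 0.0, 1.0
    for _, selectivity, c in order:
        cost += pass_prob * c      # pi runs only if every earlier predicate passed
        pass_prob *= selectivity
    return cost

preds = [("p1", 0.5, 1.0), ("p2", 0.1, 3.0), ("p3", 0.9, 0.05)]
print(order_predicates(preds))
```

Here p3 is barely selective but almost free, so it still goes first; p2 filters the most but is expensive, so it goes last.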

12 Improvements upon TA (Continued..): Let w1, w2, …, wn be the weights of sources D1, D2, …, Dn, and let e(Ri) be the expected score, at random-access source Ri, of a randomly picked object. Before Ri is probed for object t, U(t) assumes the maximum possible score of 1 for that attribute, so the expected decrease in U(t) after probing Ri for t is di = wi · (1 − e(Ri)). We sort the sources in decreasing order of their rank, where the rank of source Ri is Rank(Ri) = di / tR(Ri), with tR(Ri) the random-access time of Ri. Thus we favor fast sources whose probes might have a large impact on the final score of an object.
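A short Python sketch of the source ordering: it simply computes di = wi · (1 − e(Ri)) and Rank(Ri) = di / tR(Ri) for each source and sorts in decreasing rank. The source names, weights, expected scores and access times are hypothetical inputs; estimating e(Ri) and tR(Ri) is outside the sketch.

```python
from typing import List, NamedTuple

class RSource(NamedTuple):
    name: str
    weight: float          # w_i
    expected_score: float  # e(R_i)
    access_time: float     # t_R(R_i), random-access time

def source_rank(src: RSource) -> float:
    # d_i = w_i * (1 - e(R_i)): expected drop in U(t) after probing R_i
    d = src.weight * (1 - src.expected_score)
    return d / src.access_time

def probe_order(sources: List[RSource]) -> List[str]:
    """Sources sorted by decreasing Rank(R_i)."""
    return [s.name for s in sorted(sources, key=source_rank, reverse=True)]

sources = [RSource("R1", 0.5, 0.5, 2.0),
           RSource("R2", 0.3, 0.2, 0.5),
           RSource("R3", 0.2, 0.5, 1.0)]
print(probe_order(sources))
```

R1 has the largest weight, but R2 is four times faster and tends to return low scores, so probing it first is expected to shrink U(t) the most per unit of time.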

13

14 Upper Strategy: Upper allows more flexible probe scheduling, in which sorted and random accesses can be interleaved even when some objects have been only partially probed. When a probe completes, Upper decides whether to perform a sorted-access probe on a source to get new objects, or to perform the "most promising" random-access probe on some already-seen object.
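A simplified sequential sketch of this interleaving in Python. It assumes a single SR-source (index 0) plus R-sources, a weighted-sum score in [0, 1], and that every object appears in every source. At each step, whichever candidate has the highest upper bound (a seen object, or the "unseen" placeholder with U_unseen) decides the next probe: a sorted access if it is the placeholder, output if it is fully probed, otherwise a random access. The choice of which random access to perform is a stand-in using the slide 12 rank with an assumed constant expected score of 0.5, not the paper's exact selection procedure; the function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple

def upper(sr_list: List[Tuple[str, float]], r_scores: List[Dict[str, float]],
          weights: List[float], r_times: List[float], k: int,
          expected: float = 0.5) -> List[Tuple[str, float]]:
    """sr_list: (object, score) pairs in decreasing score order (sorted access).
    r_scores[i]: scores returned by random access on R-source i + 1."""
    known: Dict[str, Dict[int, float]] = {}  # object -> {source index: score}
    done: Set[str] = set()
    answers: List[Tuple[str, float]] = []
    pos, last = 0, 1.0

    def ub(obj: str) -> float:  # U(t): missing R-source scores bounded by 1.0
        s = known[obj]
        return sum(w * s.get(i, 1.0) for i, w in enumerate(weights))

    while len(answers) < k:
        # U_unseen; once the sorted list is exhausted, no unseen object remains.
        u_unseen = (weights[0] * last + sum(weights[1:])
                    if pos < len(sr_list) else float("-inf"))
        pending = [o for o in known if o not in done]
        best = max(pending, key=ub, default=None)
        if best is None or u_unseen > ub(best):
            if pos >= len(sr_list):
                break                          # no objects left at all
            obj, last = sr_list[pos]           # sorted access: fetch a new object
            pos += 1
            known[obj] = {0: last}
        elif len(known[best]) == len(weights):
            answers.append((best, ub(best)))   # fully probed and cannot be beaten
            done.add(best)
        else:
            # Random access on the missing source with the highest
            # w_i * (1 - e) / t_R rank (e is an assumed constant here).
            missing = [i for i in range(1, len(weights)) if i not in known[best]]
            i = max(missing, key=lambda j: weights[j] * (1 - expected) / r_times[j - 1])
            known[best][i] = r_scores[i - 1][best]
    return answers
```

Unlike TA, an object such as a that looks good under sorted access but scores poorly on its first random probe is simply left partially probed while other objects are explored.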

15 Upper Strategy (Continued..)

16 Upper Strategy (Continued..): The choice of the next random-access probe again depends on the weight of each source and on the ranking function.

17 Parallel Query Processing Strategy: Query processing over web sources is inherently slow, because web databases exhibit high and variable latency. The goal is to maximize source-access parallelism to minimize query processing time. Source access constraints: sources may restrict access, and their loads and network capabilities vary, so the number of parallel probes issued to each source Di can be controlled.
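One way to express a per-source cap on parallel probes is a semaphore per source. The Python asyncio sketch below is an illustration under assumptions: the source names D1 and D2, the limits, and the 10 ms sleep standing in for web latency are all hypothetical, and a real probe would issue an HTTP request and return a score instead of echoing its arguments.

```python
import asyncio
from typing import Dict, List, Tuple

async def probe(source: str, obj: str, sem: asyncio.Semaphore,
                active: Dict[str, int], peak: Dict[str, int]) -> Tuple[str, str]:
    async with sem:                      # wait for a free slot on this source
        active[source] += 1
        peak[source] = max(peak[source], active[source])
        await asyncio.sleep(0.01)        # stand-in for a slow web request
        active[source] -= 1
    return source, obj

async def run_probes(requests: List[Tuple[str, str]], limits: Dict[str, int]):
    """Issue all probes concurrently, at most limits[D] at a time per source D."""
    sems = {src: asyncio.Semaphore(n) for src, n in limits.items()}
    active = {src: 0 for src in limits}
    peak = {src: 0 for src in limits}
    results = await asyncio.gather(
        *(probe(src, obj, sems[src], active, peak) for src, obj in requests))
    return results, peak

if __name__ == "__main__":
    reqs = [("D1", f"r{i}") for i in range(5)] + [("D2", f"r{i}") for i in range(3)]
    results, peak = asyncio.run(run_probes(reqs, {"D1": 2, "D2": 1}))
    print(len(results), peak)
```

Probes to different sources still overlap freely; only probes to the same source queue behind its semaphore.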

18 Parallel Query Processing Strategy (continued): Adapting the TA strategy (pTA): when a source Di becomes available, pTA chooses which object to probe on that source. It can be optimized by not probing objects whose final score cannot exceed that of the top-k objects already seen; such an object is put on a "discarded" objects list. pUpper strategy: if object t is expected to be one of the top-k objects, all random accesses on sources for which t's attribute score is missing are considered; otherwise, only the fastest probes expected to discard t are considered.
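The two decisions on this slide can be sketched as small Python helpers. Assumptions: final_scores holds the final scores of fully probed objects, whether t is "expected to be in the top-k" is decided by the caller (for example from E(t)), and pUpper's "fastest probes expected to discard t" is simplified to the single fastest missing source. Function names and the access times in the example are hypothetical.

```python
from typing import Dict, List, Set

def pta_should_probe(upper_bound: float, final_scores: List[float], k: int) -> bool:
    """pTA pruning: skip t if U(t) cannot exceed the current k-th best final score."""
    if len(final_scores) < k:
        return True                       # fewer than k complete objects so far
    kth_best = sorted(final_scores, reverse=True)[k - 1]
    return upper_bound > kth_best         # otherwise t goes on the discarded list

def pupper_candidate_sources(missing: Set[str], expected_in_top_k: bool,
                             access_time: Dict[str, float]) -> List[str]:
    """pUpper: all missing sources for likely top-k objects; else only the fastest."""
    if expected_in_top_k:
        return sorted(missing)
    return [min(missing, key=lambda d: access_time[d])]

times = {"D2": 2.0, "D3": 0.5}
print(pta_should_probe(0.6, [0.9, 0.7], 2), pupper_candidate_sources({"D2", "D3"}, False, times))
```

For an object unlikely to make the top-k, a cheap probe is often enough to push U(t) below the k-th best score, so paying for slower sources would waste parallel capacity.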

19 Evaluation Settings: local sources; real web-accessible sources; a mix of SR- and R-sources.

20 Evaluation Results Sequential Algorithms – Local Database

21 Evaluation Results Sequential Algorithms – Web Database

22 Evaluation Results Parallel algorithms - Local Database

23 Evaluation Results Parallel Algorithms – Web Database: pUpper is faster than pTA. pUpper carefully selects the probes for each object, considering probing time and source congestion when making probing choices at the object level. This results in better use of parallelism and faster query processing.

24 Conclusion: Probe interleaving greatly improves query execution time. Upper is desirable when sources show moderate to high random-access times. The approach in this paper takes good advantage of the source-access constraints of the web. Extending this model to capture more expressive web interfaces is possible.

25 References: [1] Ronald Fagin, Amnon Lotem, Moni Naor. Optimal Aggregation Algorithms for Middleware. PODS 2001; [2] Nicolas Bruno, Luis Gravano, Amelie Marian. Evaluating Top-k Queries over Web-Accessible Databases. ICDE 2002 (compact version); [3] Nicolas Bruno, Luis Gravano, Amelie Marian. Evaluating Top-k Queries over Web-Accessible Databases. ACM 2004 (full version).