NRA: Top-k Query Processing Using No Random Access (Sequential Access Only)

Similar presentations
Topic 3 Top-K and Skyline Algorithms. 2 What is top-k processing? Find k items that best answer a users query –As a set, as a sorted list, or as a sorted.
Group Recommendation: Semantics and Efficiency
Web Information Retrieval
Efficient Processing of Top- k Queries in Uncertain Databases Ke Yi, AT&T Labs Feifei Li, Boston University Divesh Srivastava, AT&T Labs George Kollios,
Martin Theobald Max Planck Institute for Computer Science Stanford University Joint work with Ralf Schenkel, Gerhard Weikum TopX Efficient & Versatile.
Selinger Optimizer Lecture 10 October 15, 2009 Sam Madden.
Introduction, Views, Related Work, Preliminaries, Problems Discussed, Algorithm LPTA, View Selection Problem, Experimental Results.
Supporting top-k join queries in relational databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by Rebecca M. Atchley Thursday, April.
Best-Effort Top-k Query Processing Under Budgetary Constraints
School of Computer Science and Engineering Finding Top k Most Influential Spatial Facilities over Uncertain Objects Liming Zhan Ying Zhang Wenjie Zhang.
IRDM WS Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query Processing 3.3 Index Access Scheduling.
Top-k Query Evaluation with Probabilistic Guarantees By Martin Theobald, Gerald Weikum, Ralf Schenkel.
SUPPORTING TOP-K QUERIES IN RELATIONAL DATABASES. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, MARCH 2004 Sowmya Muniraju.
Max-Planck-Institut University of Patras NetCInS Lab Informatik KLEE: A Framework for Distributed Top-k Query Algorithms KLEE: A Framework for Distributed.
Sensitivity Analysis & Explanations for Robust Query Evaluation in Probabilistic Databases Bhargav Kanagal, Jian Li & Amol Deshpande.
6/15/20151 Top-k algorithms Finding k objects that have the highest overall grades.
Rank Aggregation. Rank Aggregation: Settings Multiple items – Web-pages, cars, apartments,…. Multiple scores for each item – By different reviewers, users,
1 Ranked Queries over sources with Boolean Query Interfaces without Ranking Support Vagelis Hristidis, Florida International University Yuheng Hu, Arizona.
Aggregation Algorithms and Instance Optimality
Top- K Query Evaluation with Probabilistic Guarantees Martin Theobald, Gerhard Weikum, Ralf Schenkel Presenter: Avinandan Sengupta.
CS246 Ranked Queries. Junghoo "John" Cho (UCLA Computer Science)2 Traditional Database Query (Dept = “CS”) & (GPA > 3.5) Boolean semantics Clear boundary.
Winter Semester 2003/2004Selected Topics in Web IR and Mining7-1 7 Top-k Queries on Web Sources and Structured Data 7.1 Top-k Queries over Autonomous Web.
DELIS Kickoff, March 18-19, Outline 1 Overriding Goals & Structure 2 Background on P2P 3 Background on Search Engines 4 P2P SE Architecture 5 Research.
VLDB ´04 Top-k Query Evaluation with Probabilistic Guarantees Martin Theobald Gerhard Weikum Ralf Schenkel Max-Planck Institute for Computer Science SaarbrückenGermany.
MPI Informatik 1/17 Oberseminar AG5 Result merging in a Peer-to-Peer Web Search Engine Supervisors: Speaker : Sergey Chernov Prof. Gerhard Weikum Christian.
The CompleteSearch Engine: Interactive, Efficient, and Towards IR&DB integration Holger Bast Max-Planck-Institut für Informatik Saarbrücken, Germany joint.
CIDR 20051/16 Integrating DB and IR Technologies: What is the Sound of One Hand Clapping? Surajit Chaudhuri (Microsoft Research) Raghu Ramakrishnan (U.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Term Weighting and Ranking Models Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.
TopX 2.0 at the INEX 2009 Ad-hoc and Efficiency tracks Martin Theobald Max Planck Institute Informatics Ralf Schenkel Saarland University Ablimit Aji Emory.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Ranking in DB Laks V.S. Lakshmanan Depf. of CS UBC.
Search A Basic Overview Debapriyo Majumdar Data Mining – Fall 2014 Indian Statistical Institute Kolkata October 20, 2014.
1/28 Efficient Top-k Queries for XML Information Retrieval Gerhard Weikum Joint work with Ralf Schenkel.
Winter Semester 2003/2004Selected Topics in Web IR and Mining6-1 6 Rank Aggregation and Top-k Queries 6.1 Fagin‘s Threshold Algorithm 6.2 Rank Aggregation.
Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.
University of Texas at Arlington. Introduction, Motivation, Requirements, Paper’s Contribution. Related Work, Overview of Ripple Join, Rank.
All right reserved by Xuehua Shen 1 Optimal Aggregation Algorithms for Middleware Ronald Fagin, Amnon Lotem, Moni Naor (PODS01)
IO-Top-k: Index-access Optimized Top-k Query Processing Debapriyo Majumdar Max-Planck-Institut für Informatik Saarbrücken, Germany Joint work with Holger.
Winter Semester 2003/2004Selected Topics in Web IR and Mining5-1 5 Index Pruning 5.1 Index-based Query Processing 5.2 Pruning with Combined Authority/Similarity.
+ Efficient network aware search in collaborative tagging Sihem Amer Yahia, Michael Benedikt, Laks V.S. Lakshmanan, Julia Stoyanovich Presented by: Ashish.
Combining Fuzzy Information: An Overview Ronald Fagin.
Presented by Suresh Barukula 2011csz. Top-k query processing means finding k objects that have the highest overall grades. A query in multimedia.
Answering Top-k Queries Using Views Gautam Das (Univ. of Texas), Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto), Dimitris.
INEX ‘05 INEX ‘05 Martin Theobald Ralf Schenkel Gerhard Weikum Max Planck Institute for Informatics Saarbrücken.
CSE 6392 – Data Exploration and Analysis in Relational Databases April 20, 2006.
Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.
Optimal Aggregation Algorithms for Middleware By Ronald Fagin, Amnon Lotem, and Moni Naor.
Top-k Query Processing Optimal aggregation algorithms for middleware Ronald Fagin, Amnon Lotem, and Moni Naor + Sushruth P. + Arjun Dasgupta.
Efficient Skyline Computation on Vertically Partitioned Datasets Dimitris Papadias, David Yang, Georgios Trimponias CSE Department, HKUST, Hong Kong.
03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.
A Unified Approach to Ranking in Probabilistic Databases Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA VLDB
Why Spectral Retrieval Works Holger Bast Max-Planck-Institut für Informatik (MPII) Saarbrücken, Germany joint work with Debapriyo Majumdar SIGIR 2005 in.
1 VLDB, Background What is important for the user.
Efficient and Self-tuning Incremental Query Expansions for Top-k Query Processing Martin Theobald Ralf Schenkel Gerhard Weikum Max-Planck Institute for.
Efficient Top-k Querying over Social-Tagging Networks Ralf Schenkel, Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Xavier Parreira,
Secretary Markets with Local Information
Neighborhood - based Tag Prediction
Max-Planck Institute for Informatics
Information Retrieval in Department 1
Top-k Query Processing
Spatio-temporal Pattern Queries
Introduction to Database Systems
Max Planck Institute for Informatics
Martin Theobald Max-Planck-Institut Informatik Stanford University
Rank Aggregation.
Laks V.S. Lakshmanan Depf. of CS UBC
Popular Ranking Algorithms
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Presentation transcript:

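NRA: top-k query processing using No Random Access (sequential access only)

Algorithm:
1) scan the index lists $L_1, \dots, L_m$ in parallel;
2) consider document $d_j$ at position $pos_i$ in $L_i$;
3) $E(d_j) := E(d_j) \cup \{i\}$; $high_i := s_i(q, d_j)$;
4) $bestscore(d_j) := aggr(x_1, \dots, x_m)$ with $x_i := s_i(q, d_j)$ for $i \in E(d_j)$ and $x_i := high_i$ for $i \notin E(d_j)$;
5) $worstscore(d_j) := aggr(x_1, \dots, x_m)$ with $x_i := s_i(q, d_j)$ for $i \in E(d_j)$ and $x_i := 0$ for $i \notin E(d_j)$;
6) top-k := the k docs with the largest worstscore;
7) $threshold := \max \{ bestscore(d) \mid d \notin \text{top-}k \}$;
8) if $\min_{d \in \text{top-}k} worstscore(d) \ge threshold$ then exit.

Here $E(d_j)$ is the set of lists in which $d_j$ has been seen so far, $high_i$ is the last score read from $L_i$ (an upper bound on every score not yet read from that list), and aggr is a monotone aggregation function such as sum. The maximum in step 7 also covers documents not yet seen in any list, whose bestscore is $aggr(high_1, \dots, high_m)$.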
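The eight steps above can be sketched in Python roughly as follows. This is a minimal, unoptimized sketch under stated assumptions: aggr is a plain sum, scores are non-negative, a document missing from a list scores 0 there, k is at least 1, and the stopping test runs once per round of parallel accesses rather than after every single access. The names `nra` and `IndexList` and the example data are hypothetical, not taken from the slides. Recomputing every bound after each round is exactly the bookkeeping overhead mentioned later in the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical alias: one index list of (doc_id, score), sorted by score descending.
IndexList = List[Tuple[str, float]]


def nra(lists: List[IndexList], k: int) -> Tuple[List[Tuple[str, float, float]], int]:
    """NRA sketch with aggr = sum. Returns (top-k as (doc, worstscore, bestscore), #SA)."""
    if k <= 0:
        return [], 0
    m = len(lists)
    high = [0.0] * m                         # high_i: last score read from list i
    seen: Dict[str, Dict[int, float]] = {}  # doc -> {i: s_i(q, doc)}; keys form E(doc)
    depth, num_sa = 0, 0
    top_k: List[str] = []

    def worstscore(d: str) -> float:
        return sum(seen[d].values())  # lists where d is unseen contribute 0

    def bestscore(d: str) -> float:
        # lists where d is unseen contribute high_i
        return sum(seen[d].get(i, high[i]) for i in range(m))

    while True:
        progressed = False
        for i, lst in enumerate(lists):  # one sorted access per list (round robin)
            if depth < len(lst):
                doc, score = lst[depth]
                num_sa += 1
                progressed = True
                seen.setdefault(doc, {})[i] = score  # E(d) := E(d) ∪ {i}
                high[i] = score                      # high_i := s_i(q, d)
            else:
                high[i] = 0.0  # exhausted list: no unread score can exceed 0
        depth += 1

        ranked = sorted(seen, key=worstscore, reverse=True)
        top_k, others = ranked[:k], ranked[k:]
        # Best score any document outside the top-k could still reach,
        # including never-seen documents (their bestscore is sum(high)).
        threshold = max([bestscore(d) for d in others] + [sum(high)])
        if len(top_k) == k and min(worstscore(d) for d in top_k) >= threshold:
            break
        if not progressed:  # every list exhausted: all scores are now exact
            break
    return [(d, worstscore(d), bestscore(d)) for d in top_k], num_sa


# Hypothetical example data (not from the slides).
example = [
    [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)],
    [("b", 0.9), ("a", 0.8), ("d", 0.2), ("c", 0.1)],
    [("a", 0.9), ("b", 0.7), ("c", 0.2), ("d", 0.1)],
]
top, num_sa = nra(example, k=2)
print([d for d, _, _ in top], num_sa)  # ['a', 'b'] 6
```

On this data the sketch stops after 6 of the 12 possible sorted accesses, because after two rounds the weaker winner's worstscore (2.4) already exceeds the best score any other document could reach (2.3).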

NRA example (top-2 over List 1, List 2, List 3), snapshot 1
Candidates [worstscore, bestscore]: item 83 [0.9, 2.1], item 17 [0.6, 2.1], item 25 [0.6, 2.1]
Min top-2 score: 0.6
Threshold (max bestscore of unseen items): 2.1
Pruning candidates: keep a candidate only while min top-2 < its bestscore
Stopping condition: threshold ≤ min top-2? No (2.1 > 0.6)

NRA example, snapshot 2
Candidates [worstscore, bestscore]: item 17 [1.3, 1.8], item 83 [0.9, 2.0], item 25 [0.6, 1.9], item 38 [0.6, 1.8], item 78 [0.5, 1.8]
Min top-2 score: 0.9
Threshold (max bestscore of unseen items): 1.8
Pruning candidates: keep a candidate only while min top-2 < its bestscore
Stopping condition: threshold ≤ min top-2? No (1.8 > 0.9)
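The bounds on the first two example slides can be checked with a short Python sketch. The figure holding the list contents did not survive, so the list prefixes below are an assumption: they were reconstructed from the bounds shown, assuming sum aggregation, and which list is "List 1", "List 2" or "List 3" is a guess (with sum aggregation, relabeling the lists changes no number). The helper `snapshot` is hypothetical. Later slides depend on deeper list entries that the transcript does not pin down, so they are not reproduced here.

```python
from typing import Dict, List, Tuple

IndexList = List[Tuple[int, float]]  # (item id, score), sorted by score descending


def snapshot(lists: List[IndexList], depth: int, k: int):
    """Bounds after `depth` (>= 1) rounds of sorted access, with aggr = sum.

    Returns ({item: (worstscore, bestscore)}, min top-k worstscore,
    threshold for unseen items), rounded to one decimal like the slides.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    m = len(lists)
    seen: Dict[int, Dict[int, float]] = {}
    high = [0.0] * m
    for i, lst in enumerate(lists):
        for item, score in lst[:depth]:
            seen.setdefault(item, {})[i] = score
        high[i] = lst[depth - 1][1] if depth <= len(lst) else 0.0
    bounds = {}
    for item, scores in seen.items():
        worst = sum(scores.values())
        best = worst + sum(high[i] for i in range(m) if i not in scores)
        bounds[item] = (round(worst, 1), round(best, 1))
    worsts = sorted((w for w, _ in bounds.values()), reverse=True)
    min_top_k = worsts[k - 1] if len(worsts) >= k else 0.0
    return bounds, min_top_k, round(sum(high), 1)


# Assumed list prefixes (first two positions only), reconstructed from the slides.
slide_lists = [
    [(25, 0.6), (78, 0.5)],  # "List 1" (assumed)
    [(17, 0.6), (38, 0.6)],  # "List 2" (assumed)
    [(83, 0.9), (17, 0.7)],  # "List 3" (assumed)
]
for depth in (1, 2):
    bounds, min_top2, threshold = snapshot(slide_lists, depth, k=2)
    print(depth, bounds, min_top2, threshold)
```

With these prefixes, depth 1 yields the bounds, min top-2 score and threshold of the first slide, and depth 2 those of the second.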

NRA example, snapshot 3
Candidates [worstscore, bestscore]: item 83 [1.3, 1.9], item 17 [1.3, 1.9], item 25 [0.6, 1.5], item 78 [0.5, 1.4]
Min top-2 score: 1.3
Threshold (max bestscore of unseen items): 1.3
Pruning candidates: keep a candidate only while min top-2 < its bestscore
Stopping condition: threshold ≤ min top-2? Yes (1.3 ≤ 1.3): no new item can get into the top-2, but extra candidates are still left in the queue
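The pruning and stopping rules on these slides can be applied directly to the bounds listed on the slide with min top-2 score 1.3 and threshold 1.3. A minimal Python sketch; `prune_and_check` is a hypothetical helper, it assumes k is at least 1, and it breaks ties in worstscore by input order (an assumption, since the slides do not say how ties are broken).

```python
from typing import Dict, List, Tuple

Bounds = Dict[int, Tuple[float, float]]  # item -> (worstscore, bestscore)


def prune_and_check(candidates: Bounds, k: int, threshold: float):
    """Apply the slides' pruning and stopping rules to one snapshot (k >= 1).

    Returns (top-k items, candidates still kept, no_new_items, can_stop).
    """
    # sorted() is stable, so equal worstscores keep their input order.
    ranked = sorted(candidates, key=lambda d: candidates[d][0], reverse=True)
    if len(ranked) < k:  # not enough candidates yet to fill the top-k
        return ranked, [], False, False
    top_k = ranked[:k]
    min_top_k = min(candidates[d][0] for d in top_k)
    # Keep a non-top-k candidate only while min top-k < its bestscore.
    kept = [d for d in ranked[k:] if candidates[d][1] > min_top_k]
    # No unseen item can enter the top-k once threshold <= min top-k.
    no_new_items = threshold <= min_top_k
    return top_k, kept, no_new_items, no_new_items and not kept


# Bounds copied from the slide with min top-2 score 1.3 and threshold 1.3.
slide3 = {83: (1.3, 1.9), 17: (1.3, 1.9), 25: (0.6, 1.5), 78: (0.5, 1.4)}
print(prune_and_check(slide3, k=2, threshold=1.3))
# ([83, 17], [25, 78], True, False)
```

The result matches the slide's remark: no new item can enter the top-2, yet items 25 and 78 still have bestscores above 1.3, so the algorithm cannot stop.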

NRA example, snapshot 4
Candidates [worstscore, bestscore]: item 83 [1.3, 1.9], item 25 [0.6, 1.4]
Min top-2 score: 1.3
Threshold (max bestscore of unseen items): 1.1
Pruning candidates: keep a candidate only while min top-2 < its bestscore
Stopping condition: threshold ≤ min top-2? Yes (1.1 ≤ 1.3): no new item can get into the top-2, but extra candidates are still left in the queue

NRA example, snapshot 5
Min top-2 score: 1.6
Threshold (max bestscore of unseen items): 0.8
Pruning candidates: keep a candidate only while min top-2 < its bestscore

NRA performs only sorted accesses (SA), i.e. no random access.
Random access (RA):
- looks up the actual (final) score of an item
- costlier than SA (100 to 100,000 times); $c_R/c_S :=$ (cost of RA)/(cost of SA)
- often very useful
CA (Combined Algorithm) (Fagin et al., 2001):
- one RA after every $c_R/c_S$ SAs
- so the total cost of SAs is roughly equal to the total cost of RAs
Measure of effectiveness (access cost): $\#SA + (c_R/c_S) \cdot \#RA$
Full-merge: compute the scores of all items, followed by a partial sort
- simple and efficient
- an important baseline for any top-k algorithm
Problems with NRA and CA:
- high bookkeeping overhead
- for "high" values of k, even the gain in access cost is not significant
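The full-merge baseline and the access-cost measure described above can be sketched in a few lines of Python. This assumes sum aggregation; the names `full_merge`, `access_cost` and the example data are hypothetical. Full-merge reads every list to the end using sorted accesses only, then uses `heapq.nlargest` as the partial sort; the last lines illustrate CA's balancing rule of one RA per $c_R/c_S$ SAs.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

IndexList = List[Tuple[str, float]]  # (doc_id, score), sorted by score descending


def access_cost(num_sa: int, num_ra: int, cr_over_cs: float) -> float:
    """Access cost = #SA + (cR/cS) * #RA."""
    return num_sa + cr_over_cs * num_ra


def full_merge(lists: List[IndexList], k: int) -> Tuple[List[Tuple[str, float]], int]:
    """Score every item by reading all lists to the end, then partially sort for the top-k."""
    totals: Dict[str, float] = defaultdict(float)
    num_sa = 0
    for lst in lists:
        for doc, score in lst:  # one sorted access per entry, no random access
            totals[doc] += score
            num_sa += 1
    top_k = heapq.nlargest(k, totals.items(), key=lambda item: item[1])
    return top_k, num_sa


# Hypothetical example data (not from the slides).
example = [
    [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)],
    [("b", 0.9), ("a", 0.8), ("d", 0.2), ("c", 0.1)],
    [("a", 0.9), ("b", 0.7), ("c", 0.2), ("d", 0.1)],
]
top, num_sa = full_merge(example, k=2)
print(top, access_cost(num_sa, 0, 100))

# CA's balancing rule: with cR/cS = 100, one RA per 100 SAs.
ratio, sa = 100, 1000
print(access_cost(sa, sa // ratio, ratio))  # 2000: SAs and RAs each contribute 1000
```

Full-merge always pays for every list entry, but it needs no per-candidate worstscore/bestscore bookkeeping, which is why it remains a strong baseline when k is large.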

References
- IO-Top-k: Index-access Optimized Top-k Query Processing. Debapriyo Majumdar, Max-Planck-Institut für Informatik, Saarbrücken, Germany; joint work with Holger Bast, Ralf Schenkel, Martin Theobald, Gerhard Weikum.
- Top-k Query Evaluation with Probabilistic Guarantees. Martin Theobald, Gerhard Weikum, Ralf Schenkel.