Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel,


Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data
Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel, Matthias Renz, Stefan Zankl and Andreas Zuefle
Ludwig-Maximilians-Universität München (LMU), Munich, Germany
http://www.dbs.ifi.lmu.de
{bernecker, emrich, kriegel, renz, zuefle}@dbs.ifi.lmu.de

Outline
Background: Uncertain Data Model; Reverse k-nearest neighbour queries; Reverse k-nearest neighbour queries on uncertain objects
Framework for Probabilistic RkNN Processing: Approximation; Spatial Filter; Probabilistic Filter; Verification
Evaluation + Summary

Background: Data model
Objects are described by a multi-dimensional probability distribution.
Object independence assumption.
Queries are answered according to possible worlds semantics.
Object PDFs can be spatially bounded.
Continuous or discrete representation.
(Figure: PDF of user ratings for "Life of Brian" over two uncertain attributes a and b; axis labels Action and Humor.)
Notes: Attributes can depend on each other. The mean is not a good representation.

Background: RkNN Queries
RkNN(q) = {o ∈ DB | q ∈ kNN(o)}
What is it good for? Market segmentation, outlier detection, incremental algorithms, ...
Notes: data mining -> market segmentation, outlier detection; incremental -> continuous nearest neighbour.
(Figure: objects o1 to o7 and query q.)
R1NN(q) = {o7}
R2NN(q) = {o7, o5, o4}
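The set definition above can be checked with a brute-force sketch on certain (non-uncertain) points. The coordinates below are hypothetical and are not the layout from the slide; the function names `knn` and `rknn` and the sentinel key are assumptions made for illustration only.

```python
import math

QUERY_KEY = "__query__"  # hypothetical sentinel name for the query point


def knn(points, name, k):
    """Names of the k nearest neighbours of `name` among all other points."""
    origin = points[name]
    others = [n for n in points if n != name]
    others.sort(key=lambda n: math.dist(origin, points[n]))
    return set(others[:k])


def rknn(db, q, k):
    """RkNN(q): every o in DB that has q among its k nearest neighbours."""
    candidates = dict(db)
    candidates[QUERY_KEY] = q
    return {o for o in db if QUERY_KEY in knn(candidates, o, k)}


# Hypothetical points on a line, chosen so that no distances tie.
db = {"o1": (0.0, 0.0), "o2": (4.0, 0.0), "o3": (5.0, 0.0), "o4": (9.0, 0.0)}
q = (6.5, 0.0)
r1 = rknn(db, q, 1)
r2 = rknn(db, q, 2)
```

With these points, only o4 has q as its nearest neighbour, while o2, o3 and o4 all have q among their two nearest neighbours. The sketch costs a full sort per object and is only meant to make the definition concrete.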

Background: PRkNN Queries
"Is O' R1NN of Q?"
(Figure: uncertain objects O1, O2, O' and query Q.)
Note: The query object may be uncertain as well!

Background: PRkNN Queries
"Is O' R1NN of Q?" => In some worlds it is.
(Figure: uncertain objects O1, O2, O' and query Q.)

Background: PRkNN Queries
"Is O' R1NN of Q?" => In other worlds it is not.
(Figure: uncertain objects O1, O2, O' and query Q.)

Background: Definition of Probabilistic RkNN
PRkNN(Q, τ) = {O ∈ DB | P(O ∈ RkNN(Q)) ≥ τ} = {O ∈ DB | P(Q ∈ kNN(O)) ≥ τ}
Example: P(Q ∈ 1NN(O')) = 21/24, e.g. O' ∈ PR1NN(Q, 0.5)
(Figure: uncertain objects O1, O2, O' and query Q.)
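To make the possible-worlds reading of this definition concrete, here is a minimal sketch that enumerates every world of a few discrete uncertain objects. It assumes each object is a list of (point, probability) instances and that objects are independent; the data and the names `prob_q_in_knn` and `prknn` are hypothetical, and the example does not reproduce the 21/24 value from the slide.

```python
import math
from itertools import product


def prob_q_in_knn(q, o_prime, others, k):
    """P(Q in kNN(O')) by summing the weights of all qualifying worlds.

    Every argument object is a list of (point, probability) instances whose
    probabilities sum to 1; objects are assumed mutually independent.
    """
    total = 0.0
    for world in product(q, o_prime, *others):
        q_pt, o_pt = world[0][0], world[1][0]
        weight = math.prod(p for _, p in world)
        d_q = math.dist(o_pt, q_pt)
        # Q is a kNN of O' iff fewer than k other objects are closer to O'.
        closer = sum(1 for pt, _ in world[2:] if math.dist(o_pt, pt) < d_q)
        if closer < k:
            total += weight
    return total


def prknn(q, db, k, tau):
    """Names of all objects O with P(Q in kNN(O)) >= tau."""
    result = set()
    for name, obj in db.items():
        others = [o for n, o in db.items() if n != name]
        if prob_q_in_knn(q, obj, others, k) >= tau:
            result.add(name)
    return result


# Hypothetical 1-D example: a certain query and two uncertain objects.
q = [((0.0,), 1.0)]
db = {
    "O'": [((2.0,), 0.5), ((4.0,), 0.5)],
    "O1": [((3.0,), 0.5), ((10.0,), 0.5)],
}
p = prob_q_in_knn(q, db["O'"], [db["O1"]], k=1)
answer = prknn(q, db, k=1, tau=0.5)
```

In this toy data O' has Q as its nearest neighbour in exactly the two worlds where O1 sits at 10, so the probability is 0.5 and O' is in PR1NN(Q, 0.5), while O1 never qualifies. The number of worlds grows exponentially with the number of objects, which is why enumeration is only usable on tiny inputs.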

Framework for PRkNN query processing
Approximation (Indexing): Simplification of spatial-probabilistic keys.
Spatial Filter: Filter objects according to simple spatial keys.
Probabilistic Filter: Derive lower/upper bounds of the qualification probability (by means of simple spatial-probabilistic keys). Filter objects according to lower/upper probability bounds.
Verification: Computation of the exact probability (very expensive). Monte-Carlo sampling (many samples required).
Modularization: Comparison of different algorithms.

Approximation: R*-Tree for indexing objects (global index)
(Figure: indexed objects and query Q.)

Approximation: AR*-Tree for indexing instances (local index)
(Figure: tree whose entries carry probability values: 0.3, 0.15, 1.0, 0.15, 0.15, 0.25, 0.15, 0.1, 0.1, 0.2, 0.45.)

Spatial Filter: Pruning based on rectangular approximations only [1].
Task: Find k objects O ∈ DB\O' which are closer to O' than to Q.
(Figure: O, Q and B, with three regions labelled as follows.)
For any O' in this region, O is closer than Q.
For any O' intersecting this region, Q may possibly be closer than O.
For any O' in this region, O is not closer than Q.
[1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal Pruning of MBRs. SIGMOD Conference 2010: 39-50
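The three regions can be illustrated with the classic MinDist/MaxDist test on rectangles. This is a deliberately simpler, more conservative stand-in and not the optimal criterion of [1]: it treats the two distances independently and therefore leaves more cases undecided. Rectangles are assumed to be lists of (low, high) intervals per dimension, and all names are hypothetical.

```python
import math


def min_dist(a, b):
    """Smallest possible distance between a point of rectangle a and one of b."""
    gaps = (max(alo - bhi, blo - ahi, 0.0) for (alo, ahi), (blo, bhi) in zip(a, b))
    return math.sqrt(sum(g * g for g in gaps))


def max_dist(a, b):
    """Largest possible distance between a point of rectangle a and one of b."""
    spans = (max(ahi - blo, bhi - alo) for (alo, ahi), (blo, bhi) in zip(a, b))
    return math.sqrt(sum(s * s for s in spans))


def classify(o, q, o_prime):
    """Decide, for all instances at once, whether O is closer to O' than Q."""
    if max_dist(o, o_prime) < min_dist(q, o_prime):
        return "closer"        # O is closer to O' than Q in every world
    if min_dist(o, o_prime) >= max_dist(q, o_prime):
        return "not closer"    # O is never closer to O' than Q
    return "undecided"         # needs the probabilistic filter


# Hypothetical rectangles in 2-D.
o_prime = [(2.0, 3.0), (0.0, 1.0)]
near = [(0.0, 1.0), (0.0, 1.0)]
far = [(10.0, 11.0), (0.0, 1.0)]
wide_o = [(0.0, 4.0), (0.0, 1.0)]
wide_q = [(1.0, 5.0), (0.0, 1.0)]
```

Here `classify(near, far, o_prime)` is decided spatially, while the two overlapping wide rectangles cannot be separated and would be passed on to the next filter step.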

Probabilistic Filter: Probability of O to be closer to O' than Q?
(Figure: O, Q, O' and B.)

Probabilistic Filter: Probability of O to be closer to O' than Q?
"O is closer to O' than Q with at least x% probability"
(Figure: O, Q and O'.)

Probabilistic Filter: Probability of O to be closer to O' than Q?
"O is closer to O' than Q with at most x% probability"
(Figure: O, Q and O'.)

Probabilistic Filter: How many objects O ∈ DB are closer to O' than Q?
Exemplary statements:
O1 is closer to O' with at least 20% and at most 50%.
O2 is closer to O' with at least 60% and at most 80%.
Correctly deriving these bounds is not trivial (see paper).
Consider the following uncertain generating function, with one factor per object:
x-term: probability of the object to be closer to O' than Q
z-term: probability of the object to be further from O' than Q
y-term: uncertainty
=> (0.2x + 0.3y + 0.5z) * (0.6x + 0.2y + 0.2z)
Expansion yields 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²
Notes: When splitting, certain rules must be observed.
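The expansion on this slide can be reproduced with a short sketch. It assumes each object contributes one factor lb*x + (ub - lb)*y + (1 - ub)*z, where lb and ub are the lower and upper bounds of the probability that the object is closer to O' than Q; the dictionary representation and the names `expand` and `count_bounds` are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def expand(bounds):
    """Multiply out one factor lb*x + (ub - lb)*y + (1 - ub)*z per object.

    Returns {(x_exponent, y_exponent): coefficient}; the z exponent is the
    number of objects minus the other two exponents.
    """
    poly = {(0, 0): 1.0}
    for lb, ub in bounds:
        nxt = defaultdict(float)
        for (a, b), c in poly.items():
            nxt[(a + 1, b)] += c * lb         # x: certainly closer
            nxt[(a, b + 1)] += c * (ub - lb)  # y: undecided
            nxt[(a, b)] += c * (1.0 - ub)     # z: certainly further away
        poly = dict(nxt)
    return poly


def count_bounds(poly, i):
    """Lower and upper bound of P(exactly i objects are closer to O' than Q)."""
    lower = sum(c for (a, b), c in poly.items() if a == i and b == 0)
    upper = sum(c for (a, b), c in poly.items() if a <= i <= a + b)
    return lower, upper


# The two objects from the slide: O1 in [20%, 50%], O2 in [60%, 80%].
poly = expand([(0.2, 0.5), (0.6, 0.8)])
per_count = [count_bounds(poly, i) for i in range(3)]
```

Up to floating-point rounding, `poly` holds the six coefficients of the expansion above, and `per_count` gives the intervals [0.10, 0.32], [0.34, 0.78] and [0.12, 0.40] for zero, one and two closer objects. The lower bound counts only worlds without undecided terms; the upper bound adds every term whose undecided objects could still bring the count to i.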

Probabilistic Filter: 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²
(Figure: bar chart; x-axis: number of objects O ∈ DB that are closer to O' than Q; y-axis: probability, with ticks at 20 %, 40 %, 60 % and 80 %.)


Probabilistic Filter: Example PRkNN queries
PR1NN(Q, 50%) => O' is not part of the result
PR2NN(Q, 40%) => O' is part of the result
PR2NN(Q, 80%) => O' has to be further investigated
(Figure: two charts. Left: probability over the exact number of objects O ∈ DB that are closer to O' than Q. Right, labelled cdf: probability over the maximum number of objects O ∈ DB that are closer to O' than Q.)
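The three example decisions can be traced back to the expanded generating function 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y². The sketch below hard-codes those coefficients and assumes that O' qualifies for a PRkNN query when fewer than k objects are closer to O' than Q; the function names and the three outcome labels are hypothetical.

```python
# Coefficients of the expansion, keyed by (exponent of x, exponent of y);
# the exponent of z is whatever remains of the two objects.
COEFFS = {
    (2, 0): 0.12,  # x^2
    (1, 0): 0.34,  # xz
    (0, 0): 0.10,  # z^2
    (1, 1): 0.22,  # xy
    (0, 1): 0.16,  # yz
    (0, 2): 0.06,  # y^2
}


def bounds_fewer_than(k, coeffs=COEFFS):
    """Bounds on P(fewer than k objects are closer to O' than Q)."""
    # Lower bound: even if every undecided object is closer, the count stays below k.
    lower = sum(c for (a, b), c in coeffs.items() if a + b < k)
    # Upper bound: the certainly-closer objects alone stay below k.
    upper = sum(c for (a, b), c in coeffs.items() if a < k)
    return lower, upper


def decide(k, tau):
    """Classify O' for PRkNN(Q, tau) using only the probability bounds."""
    lower, upper = bounds_fewer_than(k)
    if lower >= tau:
        return "result"
    if upper < tau:
        return "pruned"
    return "verify"


decisions = [decide(1, 0.5), decide(2, 0.4), decide(2, 0.8)]
```

For k = 1 the bounds are roughly [0.10, 0.32], so a 50% threshold prunes O'. For k = 2 they are roughly [0.60, 0.88], which accepts O' at 40% and leaves the 80% case to the verification step.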


Verification: Options for Verification
Consideration of all possible worlds (exponential)
Adapting probabilistic nearest neighbour ranking [2] on instance level of objects (polynomial)
Monte-Carlo based (linear in the number of samples)
[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
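The Monte-Carlo option can be sketched as follows: draw one instance per object, test whether Q is among the k nearest neighbours of O' in that sampled world, and average over many draws. The object representation (lists of (point, probability) instances), the fixed seed and all names are assumptions for illustration; the slides do not specify the sampling procedure.

```python
import math
import random


def sample_point(obj, rng):
    """Draw one instance of a discrete uncertain object."""
    points = [pt for pt, _ in obj]
    weights = [p for _, p in obj]
    return rng.choices(points, weights=weights, k=1)[0]


def estimate_prob(q, o_prime, others, k, n_samples, seed=0):
    """Monte-Carlo estimate of P(Q in kNN(O')); cost is linear in n_samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        q_pt = sample_point(q, rng)
        o_pt = sample_point(o_prime, rng)
        d_q = math.dist(o_pt, q_pt)
        closer = sum(
            1 for obj in others if math.dist(o_pt, sample_point(obj, rng)) < d_q
        )
        if closer < k:
            hits += 1
    return hits / n_samples


# Hypothetical 1-D data for which the exact probability is 0.5.
q = [((0.0,), 1.0)]
o_prime = [((2.0,), 0.5), ((4.0,), 0.5)]
o1 = [((3.0,), 0.5), ((10.0,), 0.5)]
estimate = estimate_prob(q, o_prime, [o1], k=1, n_samples=20000)
```

The estimate only approximates the exact value, and its standard error shrinks with the square root of the sample count, which matches the remark that many samples are required.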

Evaluation: Spatial Filter

Evaluation: Probabilistic Filter

Evaluation: Comparison to other algorithms

Summary
Framework for PRkNN query processing
Deriving probabilistic pruning bounds for single objects
Accumulating these bounds using uncertain generating functions
Cost model for choosing the optimal value for tree depth
Comparison to existing algorithms for PRNN processing

Thanks! Questions?

Dependency on k

Problem of dependency
(Figure: O', Q, O1 and O2.)