Chapter 4: Probabilistic Query Answering (2)

Probabilistic Data Management

Objectives
In this chapter, you will learn the definition of, and the query processing techniques for, one probabilistic query type: the Probabilistic Group Nearest Neighbor (PGNN) query.
Reference: X. Lian and L. Chen, "Probabilistic Group Nearest Neighbor Queries in Uncertain Databases," IEEE Trans. on Knowledge and Data Eng. (TKDE), vol. 20, no. 6, pp. 809-824, June 2008.

Recall: Probabilistic Query Types
Queries over an uncertain/probabilistic database include:
Probabilistic spatial queries
- Probabilistic range query
- Probabilistic k-nearest neighbor query
- Probabilistic group nearest neighbor (PGNN) query
- Probabilistic reverse k-nearest neighbor query
- Probabilistic spatial join / similarity join
Probabilistic preference queries
- Probabilistic top-k query (or ranked query)
- Probabilistic skyline query
- Probabilistic reverse skyline query

Probabilistic Group Nearest Neighbor Queries in Uncertain Databases
IEEE Trans. on Knowledge and Data Engineering (TKDE), 2008

Group Nearest Neighbor Query
Group Nearest Neighbor (GNN) Search [ICDE04]: Given a database D and a set Q of query objects $q_1, q_2, \ldots, q_n$, a GNN query retrieves an object $o \in D$ that has the smallest summed distance to Q, i.e., the object that minimizes $\sum_{i=1}^{n} dist(q_i, o)$. In the slide's figure, with three query points, the task is to find a data object o that minimizes $\sum_{i=1}^{3} dist(q_i, o)$.
D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, "Group Nearest Neighbor Queries," Proc. 20th Int'l Conf. Data Eng. (ICDE), 2004.

An Example of GNN Query
A city has many restaurants, and three people want to have lunch together at one of them. Problem: find the restaurant in the city that minimizes the total distance to these three people.
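The GNN definition and the restaurant example can be illustrated with a brute-force sketch. It assumes precise 2-D points and the SUM aggregate; the coordinates below are hypothetical, chosen only to make the example concrete.

```python
import math


def gnn(data, queries):
    """Brute-force GNN (SUM aggregate): return the data point whose summed
    Euclidean distance to all query points is smallest."""
    return min(data, key=lambda o: sum(math.dist(q, o) for q in queries))


# Hypothetical restaurants and three people, as in the lunch example.
restaurants = [(0.0, 0.0), (5.0, 5.0), (2.0, 1.0), (9.0, 0.0)]
people = [(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)]
print(gnn(restaurants, people))  # (2.0, 1.0)
```

This scan costs O(|D| · n) distance computations; index-based GNN algorithms such as the ICDE'04 work above exist to avoid touching every object.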

Other GNN Applications
- Image retrieval: find an image in the database that is similar to a group of user-specified query images
- Geographic information systems
- Mobile computing applications

Data Uncertainty
In many applications, real-world data are inherently uncertain and imprecise:
- Image databases: noise in image features
- Location-based services (LBS): measurement errors (GPS, RFID, etc.); trajectory data of mobile users are deliberately blurred to preserve privacy
- Sensor networks: sensory data inherently contain noise due to environmental factors, network latency, or fluctuations in battery power
- …

Data Uncertainty (cont'd)
[Figure: in a traditional database, an object o is a precise point; in an uncertain database, o lies somewhere within its uncertainty region UR(o).]

GNN in Uncertain Databases
Probabilistic Group Nearest Neighbor (PGNN) Query in Uncertain Databases [TKDE08]

Motivation Example of PGNN
In mixed-reality games (such as Counter-Strike (CS)), three players may want to find a moving enemy to attack that minimizes their total travelling distance. The position of each enemy is uncertain (due to its movement or to network delay).

Motivation Example of PGNN (cont'd)
When a forest has several places on fire, a group of at least 3 firefighters can collaborate to put out one of them. The positions of the fires are uncertain. A PGNN query can be issued to find the places on fire that this group of firefighters can reach as early as possible.

Contributions
- The proposal of the probabilistic group nearest neighbor (PGNN) query in uncertain databases
- Two effective pruning methods: spatial pruning and probabilistic pruning
- An efficient PGNN query processing approach
- Variants of the PGNN query

Introduction
Uncertain databases:
- Uncertain objects are represented by uncertainty regions rather than precise points
- Distances between uncertain objects are random variables instead of fixed values
Query processing over uncertain data:
- Re-define traditional query types over precise data
- Consider the unique characteristic (i.e., uncertainty) of uncertain data
- Guarantee the accuracy of query answers

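Definition of the PGNN Problem
Probabilistic Group Nearest Neighbor (PGNN): Given an uncertain database D, a set of n query points $Q = \{q_1, q_2, \ldots, q_n\}$, and a user-specified probability threshold $\alpha \in (0, 1]$, a PGNN query retrieves the set of uncertain objects $o \in D$ that are expected to be the GNN of query set Q with probability greater than α, that is,
$$P_{PGNN}(o) = \int_{o' \in UR(o)} pdf_o(o') \cdot \prod_{p \in D \setminus \{o\}} \Pr\{adist(p, Q) \ge adist(o', Q)\} \, do' > \alpha,$$
where $pdf_o$ is the probability density function of o within UR(o), objects are independent of each other, and the aggregate distance adist(o, Q) is defined by a monotonically increasing function
$$adist(o, Q) = f(dist(o, q_1), dist(o, q_2), \ldots, dist(o, q_n)),$$
with f an aggregate function such as SUM, MIN, or MAX, and dist(·, ·) the Euclidean distance.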
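To make the definition concrete, the sketch below estimates each object's probability of being the GNN by sampling possible worlds. This is Monte Carlo sampling, a different technique from the numerical integration the slides mention, and it assumes independent objects, uniform densities over hypersphere uncertainty regions given as (center, radius), and f = SUM. All function names are hypothetical.

```python
import math
import random


def sample_in_ball(center, radius, rng):
    """Draw one point uniformly at random from a d-dimensional ball."""
    d = len(center)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]           # random direction
    norm = math.sqrt(sum(x * x for x in g)) or 1.0
    scale = radius * rng.random() ** (1.0 / d)            # uniform-volume radius
    return tuple(c + scale * x / norm for c, x in zip(center, g))


def adist_sum(point, queries):
    """Aggregate distance with f = SUM."""
    return sum(math.dist(point, q) for q in queries)


def pgnn_monte_carlo(objects, queries, alpha, worlds=10000, seed=0):
    """Estimate each object's GNN probability by sampling possible worlds.

    objects: dict mapping an id to (center, radius) of its uncertainty region.
    Returns (ids whose estimate exceeds alpha, dict of all estimates).
    """
    rng = random.Random(seed)
    wins = dict.fromkeys(objects, 0)
    for _ in range(worlds):
        # Instantiate every object independently; the GNN of this world wins.
        winner = min(objects, key=lambda oid: adist_sum(
            sample_in_ball(*objects[oid], rng), queries))
        wins[winner] += 1
    probs = {oid: w / worlds for oid, w in wins.items()}
    return {oid for oid, p in probs.items() if p > alpha}, probs


objects = {"A": ((2.0, 1.0), 0.5), "B": ((5.0, 5.0), 0.5), "C": ((2.5, 1.5), 1.0)}
queries = [(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)]
answers, probs = pgnn_monte_carlo(objects, queries, alpha=0.1)
print(answers, probs)
```

Object B can never win here, because even its closest possible position is farther from Q (in summed distance) than A's farthest position, so its estimate is exactly 0.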

Computation of PGNN Answers
Straightforward approach (linear scan): for each uncertain object o ∈ D, sequentially scan the entire database, compute the complex probability integral (via a numerical method), and output o if its probability is greater than α. Time complexity: O(|D|^2).
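A minimal sketch of the linear scan, under the assumption (not made in the slides) that each uncertain object is stored as a finite set of weighted samples whose weights sum to 1. The integral then becomes a sum, so the probability can be computed exactly. Objects are assumed independent, f = SUM, and the function names are hypothetical.

```python
import math


def adist_sum(point, queries):
    """Aggregate distance with f = SUM."""
    return sum(math.dist(point, q) for q in queries)


def pgnn_probability(oid, db, queries):
    """GNN probability of object `oid` when each object is a list of
    (point, weight) samples whose weights sum to 1, and objects are independent."""
    total = 0.0
    for point, w in db[oid]:
        a = adist_sum(point, queries)
        prod = 1.0
        for pid, samples in db.items():          # scan the whole database
            if pid != oid:
                # Pr{adist(p, Q) >= adist(o', Q)}; ties count in o's favour.
                prod *= sum(wp for pp, wp in samples if adist_sum(pp, queries) >= a)
        total += w * prod
    return total


def pgnn_linear_scan(db, queries, alpha):
    """Linear scan: test every object against the threshold alpha."""
    return {oid for oid in db if pgnn_probability(oid, db, queries) > alpha}


db = {
    "A": [((2.0, 1.0), 0.5), ((2.0, 2.0), 0.5)],
    "B": [((5.0, 5.0), 1.0)],
    "C": [((2.5, 0.5), 0.5), ((0.0, 0.0), 0.5)],
}
queries = [(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)]
print(pgnn_linear_scan(db, queries, alpha=0.5))  # {'A'}
```

Each object's probability requires a pass over all other objects, which is where the O(|D|^2) cost comes from (times s^2 distance evaluations for s samples per object). In this example A, B, and C have probabilities 0.75, 0, and 0.25.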

Two Pruning Techniques
- Spatial pruning
- Probabilistic pruning

Spatial Pruning
Basic idea: compute lower/upper bounds of the aggregate distance adist(o, Q) from each uncertain object o to query set Q at low cost, and use these bounds to filter out false alarms. The smallest distance upper bound among the objects is selected as the threshold; objects whose lower bound exceeds this threshold are pruned, and the remaining objects are candidates.
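The spatial pruning idea can be sketched as follows, assuming hypersphere uncertainty regions (center, radius) and f = SUM. The per-object bounds come from widening each point-to-center distance by the radius; these are valid bounds, not the exact extremes of adist. Function names are hypothetical.

```python
import math


def sum_bounds(center, radius, queries):
    """Lower/upper bounds of SUM adist over a hypersphere uncertainty region."""
    lb = sum(max(0.0, math.dist(q, center) - radius) for q in queries)
    ub = sum(math.dist(q, center) + radius for q in queries)
    return lb, ub


def spatial_prune(objects, queries):
    """Keep objects whose LB does not exceed the smallest UB (the threshold)."""
    bounds = {oid: sum_bounds(c, r, queries) for oid, (c, r) in objects.items()}
    threshold = min(ub for _, ub in bounds.values())
    # An object with LB > threshold is always farther than the object that
    # owns the threshold, so its GNN probability is 0.
    return {oid for oid, (lb, _) in bounds.items() if lb <= threshold}


objects = {"A": ((2.0, 1.0), 0.5), "B": ((5.0, 5.0), 0.5), "C": ((2.5, 1.5), 1.0)}
queries = [(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)]
print(sorted(spatial_prune(objects, queries)))  # ['A', 'C']
```

A strict comparison (LB > threshold) is used for pruning so that an object can never be pruned by its own upper bound.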

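Spatial Pruning (cont'd)
Recall from the PGNN definition that an object o is an answer only if its probability of being the GNN of Q is greater than α. In fact, the spatial pruning method discards only those objects whose probability of being the GNN equals 0: if the lower bound of adist(o, Q) exceeds the upper bound of adist(p, Q) for some other object p, then p is always closer to Q than o.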

Derivation of Distance Bounds
Probabilistic minimum bounding method (PMBM): bound all query points with an MBR, then compute the minimum/maximum distances between the query MBR and each uncertain object.
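One way to realize PMBM-style bounds is sketched below, assuming f = SUM and hypersphere uncertainty regions. Because every query point lies inside the query MBR, each dist(q_i, o) lies between the minimum and maximum distance from that MBR to the sphere, and summing n such terms bounds adist(o, Q). The helper names are hypothetical.

```python
import math


def mbr(points):
    """Minimum bounding rectangle (lo, hi) of a set of points."""
    d = len(points[0])
    return ([min(p[i] for p in points) for i in range(d)],
            [max(p[i] for p in points) for i in range(d)])


def mindist_point_box(x, box):
    lo, hi = box
    return math.sqrt(sum(max(l - xi, 0.0, xi - h) ** 2
                         for xi, l, h in zip(x, lo, hi)))


def maxdist_point_box(x, box):
    lo, hi = box
    return math.sqrt(sum(max(abs(xi - l), abs(xi - h)) ** 2
                         for xi, l, h in zip(x, lo, hi)))


def pmbm_sum_bounds(center, radius, queries):
    """PMBM-style bounds on SUM adist for a hypersphere object (center, radius)."""
    box = mbr(queries)
    n = len(queries)
    lb = n * max(0.0, mindist_point_box(center, box) - radius)
    ub = n * (maxdist_point_box(center, box) + radius)
    return lb, ub


queries = [(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)]
print(pmbm_sum_bounds((5.0, 5.0), 0.5, queries))
```

The bounds are looser than per-point bounds, but they need only one box computation per object regardless of n.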

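Derivation of Distance Bounds (cont'd)
Probabilistic single point method (PSPM): select the geometric centroid of the query points as a representative, and compute the minimum/maximum distances via the triangle inequality. For an uncertain object o whose uncertainty region has center $C_o$ and radius $r_o$ (in the formula below, $q_3$ plays the role of the representative point):
$$|dist(q_1, q_3) - dist(q_3, C_o)| - r_o \le dist(q_1, o) \le dist(q_1, q_3) + dist(q_3, C_o) + r_o$$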
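A sketch of PSPM-style bounds for f = SUM, assuming the representative c is the centroid of Q and the uncertainty region is a hypersphere. The distances dist(q_i, c) depend only on Q and are computed once, so each object costs a single distance computation, dist(c, C_o). The class and method names are hypothetical.

```python
import math


def centroid(points):
    n, d = len(points), len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(d))


class PSPM:
    """PSPM-style bounds on SUM adist using one representative point c."""

    def __init__(self, queries):
        self.c = centroid(queries)
        # dist(q_i, c) depends only on Q, so it is computed once per query.
        self.dq = [math.dist(q, self.c) for q in queries]

    def sum_bounds(self, center, radius):
        dc = math.dist(self.c, center)  # the only distance computed per object
        # Triangle inequality: |dist(q_i, c) - dist(c, C_o)| <= dist(q_i, C_o)
        # <= dist(q_i, c) + dist(c, C_o); then widen by r_o for the sphere.
        lb = sum(max(0.0, abs(d - dc) - radius) for d in self.dq)
        ub = sum(d + dc + radius for d in self.dq)
        return lb, ub


pspm = PSPM([(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)])
print(pspm.c, pspm.sum_bounds((5.0, 5.0), 0.5))
```

The lower bound is clamped at 0 for each query point, since a distance can never be negative even when the triangle-inequality term is.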

Probabilistic Pruning
Intuition of probabilistic pruning: prune those data objects whose expected PGNN probability (the LHS of the inequality in the PGNN definition) is smaller than or equal to α.

Probabilistic Pruning (cont'd)
(1−β)-hypersphere: for any uncertain object p, we can pre-compute a hypersphere, denoted $p_{1-\beta}$, within its uncertainty region UR(p), such that p resides in $p_{1-\beta}$ with probability (1−β), where β ∈ [0, α].
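How the (1−β)-hypersphere is obtained depends on the object's distribution. As a sketch, assume p is uniformly distributed in a d-dimensional ball of radius r and the (1−β)-hypersphere is concentric with it. Volume scales as radius^d, so the inner radius is r·(1−β)^(1/d). The functions are hypothetical; the second one checks the formula empirically by rejection sampling from the bounding cube.

```python
import math
import random


def one_minus_beta_radius(radius, beta, d):
    """Radius of the concentric (1-beta)-hypersphere, assuming p is uniform
    in a d-dimensional ball of the given radius."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    return radius * (1.0 - beta) ** (1.0 / d)


def empirical_mass(radius, inner, d, trials=50000, seed=1):
    """Fraction of uniform points in the ball (centered at the origin) that
    fall inside the concentric ball of radius `inner`."""
    rng = random.Random(seed)
    inside = hits = 0
    while inside < trials:
        x = [rng.uniform(-radius, radius) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in x))
        if norm <= radius:          # rejection: keep only points in the ball
            inside += 1
            hits += norm <= inner
    return hits / inside


print(round(one_minus_beta_radius(1.0, 0.75, 2), 6))  # 0.5
```

For non-uniform densities (e.g., Gaussian), the radius would instead come from that distribution's radial CDF.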

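Probabilistic Pruning (cont'd)
Use the (1−β)-hypersphere to obtain lower/upper bounds of the distance $adist(p_{1-\beta}, Q)$, say $[LB\_adist(p_{1-\beta}, Q), UB\_adist(p_{1-\beta}, Q)]$. Any object o can be safely pruned if it holds that $UB\_adist(p_{1-\beta}, Q) < LB\_adist(o, Q)$. The reason: with probability 1−β, p lies in its (1−β)-hypersphere and is then strictly closer to Q than o, so the probability that o is the GNN is at most β ≤ α.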
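The pruning rule can be sketched as follows, assuming f = SUM and hypersphere regions given as (center, radius). The example is chosen so that p's full uncertainty region cannot prune o (spatial pruning fails), while p's smaller (1−β)-hypersphere can. All names and coordinates are hypothetical.

```python
import math


def sum_bounds(center, radius, queries):
    """Lower/upper bounds of SUM adist over a hypersphere."""
    lb = sum(max(0.0, math.dist(q, center) - radius) for q in queries)
    ub = sum(math.dist(q, center) + radius for q in queries)
    return lb, ub


def probabilistic_prune(o_sphere, p_inner_sphere, queries):
    """True if o can be pruned using p's (1-beta)-hypersphere.

    With probability 1 - beta, p lies in p_inner_sphere; whenever it does,
    UB_adist(p_{1-beta}, Q) < LB_adist(o, Q) makes p strictly closer than o,
    so o's GNN probability is at most beta <= alpha.
    """
    _, ub_p = sum_bounds(*p_inner_sphere, queries)
    lb_o, _ = sum_bounds(*o_sphere, queries)
    return ub_p < lb_o


queries = [(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)]
o = ((2.0, 2.5), 0.2)
print(probabilistic_prune(o, ((2.0, 1.0), 0.5), queries))  # False: p's full region
print(probabilistic_prune(o, ((2.0, 1.0), 0.1), queries))  # True: (1-beta)-sphere
```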

PGNN Query Processing
Construct an R-tree over the uncertainty regions of the data objects: nodes are recursively bounded by minimum bounding rectangles (MBRs) until a single node (the root) is obtained.

PGNN Query Processing (cont'd)
Pruning intermediate nodes: define lower/upper bounds of the aggregate distances between nodes and the query points, using either PMBM (bounding the query points with an MBR) or PSPM (using the centroid of the query points and the triangle inequality). An intermediate node e can be pruned if it holds that $LB\_adist(e, Q) \ge UB\_adist(o, Q)$ for some candidate o.
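A sketch of PMBM-style node pruning for f = SUM. Every query point lies in the query MBR and every object under node e lies in e's MBR, so each distance is at least the minimum distance between the two boxes, and adist is at least n times that. The helper names and the candidate's upper bound used below are hypothetical.

```python
import math


def mbr(points):
    d = len(points[0])
    return ([min(p[i] for p in points) for i in range(d)],
            [max(p[i] for p in points) for i in range(d)])


def box_mindist(a, b):
    """Minimum Euclidean distance between two axis-aligned boxes."""
    (alo, ahi), (blo, bhi) = a, b
    return math.sqrt(sum(max(0.0, bl - ah, al - bh) ** 2
                         for al, ah, bl, bh in zip(alo, ahi, blo, bhi)))


def can_prune_node(node_box, queries, candidate_ub):
    """Prune node e if n * mindist(MBR(Q), MBR(e)) >= UB_adist(o, Q)."""
    lb_node = len(queries) * box_mindist(mbr(queries), node_box)
    return lb_node >= candidate_ub


queries = [(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)]
ub_a = 2 * math.sqrt(2) + 2 + 3 * 0.5   # UB_adist of a candidate at (2, 1), r = 0.5
print(can_prune_node(([6.0, 6.0], [8.0, 8.0]), queries, ub_a))  # True
print(can_prune_node(([2.0, 2.0], [4.0, 4.0]), queries, ub_a))  # False
```

A node whose box overlaps the query MBR gets a lower bound of 0 and can never be pruned this way.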

PGNN Query Procedure
- Traverse the R-tree nodes in a best-first manner, maintaining a min-heap H keyed by the lower bound of the aggregate distance from each node/object to query set Q.
- Whenever we encounter a node/object, compute its lower/upper bounds of aggregate distance and apply spatial pruning to filter out false alarms.
- For uncertain objects that cannot be pruned by spatial pruning, apply probabilistic pruning.
- After the tree traversal, refine the obtained candidate set by calculating the actual PGNN probability of each candidate.
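The traversal and spatial-pruning part of this procedure can be sketched as below. It assumes f = SUM, hypersphere uncertainty regions, and a tiny hand-built tree of hypothetical `Obj`/`Node` classes standing in for an R-tree; probabilistic pruning and the final refinement step are omitted.

```python
import heapq
import itertools
import math


class Obj:
    """Leaf entry: uncertain object with a hypersphere uncertainty region."""
    def __init__(self, oid, center, radius):
        self.oid, self.center, self.radius = oid, center, radius
        self.box = ([c - radius for c in center], [c + radius for c in center])


class Node:
    """Internal entry whose MBR encloses the MBRs of its children."""
    def __init__(self, children):
        self.children = children
        d = len(children[0].box[0])
        self.box = ([min(ch.box[0][i] for ch in children) for i in range(d)],
                    [max(ch.box[1][i] for ch in children) for i in range(d)])


def mindist_point_box(x, box):
    lo, hi = box
    return math.sqrt(sum(max(l - xi, 0.0, xi - h) ** 2 for xi, l, h in zip(x, lo, hi)))


def lb_sum(entry, queries):
    """Lower bound of SUM adist for an object (its sphere) or a node (its MBR)."""
    if isinstance(entry, Obj):
        return sum(max(0.0, math.dist(q, entry.center) - entry.radius) for q in queries)
    return sum(mindist_point_box(q, entry.box) for q in queries)


def ub_sum(obj, queries):
    return sum(math.dist(q, obj.center) + obj.radius for q in queries)


def pgnn_candidates(root, queries):
    """Best-first traversal with spatial pruning; returns candidate ids."""
    tie = itertools.count()  # tie-breaker so the heap never compares entries
    heap = [(lb_sum(root, queries), next(tie), root)]
    threshold = math.inf     # smallest UB_adist among objects seen so far
    candidates = []
    while heap:
        lb, _, entry = heapq.heappop(heap)
        if lb > threshold:
            break            # keys never decrease, so all remaining entries are pruned
        if isinstance(entry, Obj):
            candidates.append((lb, entry.oid))
            threshold = min(threshold, ub_sum(entry, queries))
        else:
            for child in entry.children:
                heapq.heappush(heap, (lb_sum(child, queries), next(tie), child))
    # The threshold may have shrunk after a candidate was accepted.
    return [oid for lb, oid in candidates if lb <= threshold]


A, B = Obj("A", (2.0, 1.0), 0.5), Obj("B", (5.0, 5.0), 0.5)
C, D = Obj("C", (2.5, 1.5), 1.0), Obj("D", (8.0, 8.0), 0.5)
root = Node([Node([A, C]), Node([B, D])])
Q = [(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)]
print(sorted(pgnn_candidates(root, Q)))  # ['A', 'C']
```

Early termination is safe because a child's box lies inside its parent's box, so a child's lower bound is never smaller than its parent's; here the node holding B and D is discarded without visiting its objects.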

Variants of PGNN Query
PGNN query with uncertain query objects: re-define the lower/upper bounds of the aggregate distances.

Variants of PGNN Query (cont'd)
- PGNN with different aggregate functions: SUM, MIN, MAX, AVG, weighted SUM, etc.
- k-PGNN query: retrieve k uncertain objects that are expected to be closest to query set Q with probability greater than α
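The aggregate-function variants differ only in the choice of f in adist(o, Q). A minimal sketch for a precise location o (the `agg` keys and the `weights` parameter are hypothetical names) is:

```python
import math


def adist(o, queries, agg="sum", weights=None):
    """Aggregate distance adist(o, Q) = f(dist(o, q_1), ..., dist(o, q_n)).

    Each f below is monotonically non-decreasing in every distance
    (weighted SUM requires non-negative weights).
    """
    ds = [math.dist(o, q) for q in queries]
    if agg == "sum":
        return sum(ds)
    if agg == "min":
        return min(ds)
    if agg == "max":
        return max(ds)
    if agg == "avg":
        return sum(ds) / len(ds)
    if agg == "wsum":
        if weights is None or any(w < 0 for w in weights):
            raise ValueError("wsum needs non-negative weights")
        return sum(w * d for w, d in zip(weights, ds))
    raise ValueError(f"unknown aggregate: {agg}")


queries = [(3.0, 4.0), (0.0, 1.0)]
for agg in ("sum", "min", "max", "avg"):
    print(agg, adist((0.0, 0.0), queries, agg))
print("wsum", adist((0.0, 0.0), queries, "wsum", weights=[1.0, 2.0]))
```

Monotonicity is what lets the per-point distance bounds from PMBM or PSPM be pushed through f to obtain bounds on adist.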

Variants of PGNN Query (cont'd)
Any other variants of PGNN? PGNN on road networks? …

Experimental Evaluation
Experimental settings:
- Synthetic data sets: generate the center location $C_o$ of each uncertain object o in the data space $[0, 1000]^d$, and produce a radius $r_o \in [r_{min}, r_{max}]$ for its uncertainty region UR(o)
- Four types of data sets: lUrU, lUrG, lSrU, lSrG
Measures:
- Wall clock time (the filtering time of index traversal)
- Speed-up ratio (the total time cost of the linear scan method divided by that of the proposed approach)
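A hedged sketch of a synthetic data generator in the spirit of these settings. The slides do not spell out the data-set names; here "l" is assumed to denote the center-location distribution (U = uniform, S = skewed) and "r" the radius distribution (U = uniform, G = Gaussian). The skew transform and Gaussian parameters are hypothetical stand-ins, not the paper's generator.

```python
import random

KINDS = {"lUrU", "lUrG", "lSrU", "lSrG"}


def generate(n, d, r_min, r_max, kind="lUrU", space=1000.0, seed=0):
    """Return n (center, radius) pairs describing hypersphere uncertainty regions.

    Assumed naming: l = center location (U uniform, S skewed), r = radius
    (U uniform, G Gaussian). The skew and Gaussian parameters are hypothetical.
    """
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {sorted(KINDS)}")
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if kind[1] == "U":
            center = tuple(rng.uniform(0.0, space) for _ in range(d))
        else:
            # Skewed stand-in: cubing a uniform variate crowds points near 0.
            center = tuple(space * rng.random() ** 3 for _ in range(d))
        if kind[3] == "U":
            radius = rng.uniform(r_min, r_max)
        else:
            mu, sigma = (r_min + r_max) / 2, (r_max - r_min) / 6
            radius = min(r_max, max(r_min, rng.gauss(mu, sigma)))  # clip to range
        data.append((center, radius))
    return data


sample = generate(5, 3, 5.0, 10.0, kind="lSrG")
for center, radius in sample:
    print([round(c, 1) for c in center], round(radius, 2))
```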

Query Performance vs. β
[Figure: query performance vs. β; the upper part shows the time cost of spatial pruning and the lower part the time cost of probabilistic pruning. Settings: data size |D| = 30K, dimensionality d = 3, number of query points n = 4, probability threshold α = 1, SUM aggregate function.]

Scalability Test on Data Size |D|
[Figure: query performance vs. data size |D|. Settings: dimensionality d = 3, number of query points n = 4, probability threshold α = 1, SUM aggregate function.]

Summary
- We formulated the probabilistic group nearest neighbor (PGNN) query in the context of uncertain databases.
- We proposed two effective pruning methods, spatial pruning and probabilistic pruning, to reduce the search space of PGNN queries; both can be seamlessly integrated into an efficient query procedure.
- We further discussed several variants of the PGNN query.
- We demonstrated through extensive experiments the efficiency and effectiveness of the proposed pruning methods and query processing approach, in terms of wall clock time and speed-up ratio compared with linear scan.