Probabilistic Data Management


1 Probabilistic Data Management
Chapter 5: Probabilistic Query Answering (3)

2 Objectives In this chapter, you will:
Learn the definition and query processing techniques of one probabilistic query type: the probabilistic reverse nearest neighbor query

3 Recall: Probabilistic Query Types
Probabilistic Spatial Query
Uncertain/probabilistic database
Probabilistic range query
Probabilistic k-nearest neighbor query
Probabilistic group nearest neighbor (PGNN) query
Probabilistic reverse k-nearest neighbor query
Probabilistic spatial join / similarity join
Probabilistic top-k query (or ranked query)
Probabilistic skyline query
Probabilistic reverse skyline query
Probabilistic Preference Query

4 Probabilistic Reverse Nearest Neighbor Queries in Uncertain Databases
Very Large Data Bases Journal (VLDBJ), 2009

5 Outline
Introduction
Related Work
Problem Definition
PRNN Query Processing
Experimental Evaluation
Summary

6 Reverse Nearest Neighbor Query (RNN)
Rescue tasks in oceans: In an emergency, a ship asks its nearest ship for help. A rescue ship therefore needs to monitor those ships that have it as their nearest neighbor. In other words, the rescue ship needs to obtain its reverse nearest neighbors (RNNs).

7 Introduction Reverse Nearest Neighbor Query (RNN)
Given a database D and a query object q, an RNN query retrieves those data objects o ∈ D that have q as their nearest neighbor. [Figure: query point q and data points o1 to o5]
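The RNN definition can be checked directly with a brute-force scan. The sketch below is only an illustration under assumed inputs (2D points as tuples, Euclidean distance, ties counted in favor of q); the function name `rnn_brute_force` and the sample coordinates are hypothetical and do not come from the slides.

```python
import math

def rnn_brute_force(points, q):
    """Return indices of points that have q as their nearest neighbor."""
    result = []
    for i, o in enumerate(points):
        d_q = math.dist(o, q)
        # o is an RNN of q if no other data point is strictly closer to o than q is.
        if all(math.dist(o, p) >= d_q for j, p in enumerate(points) if j != i):
            result.append(i)
    return result

# Hypothetical data: two points near q and a tight pair far away.
points = [(1.0, 0.0), (5.0, 0.0), (5.5, 0.0), (0.0, 2.0)]
q = (0.0, 0.0)
print(rnn_brute_force(points, q))  # [0, 3]
```

The two far points are each other's nearest neighbors, so neither is an RNN of q. Every point is compared against every other point, which is the quadratic cost that index-based methods try to avoid.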

8 RNN Processing on Certain Data Points
TPL Approach [VLDB'04] [Figure: query point q, an RNN candidate, data points o1 to o5, and the pruning region]

9 RNN Processing on Certain Data Points
TPL Approach [VLDB'04] [Figure: query point q, two RNN candidates, data points o1 to o5, and the pruning region]
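The pruning region in the figure can be illustrated with a simplified, point-level sketch. It assumes the usual perpendicular-bisector argument: a point p that is closer to an already found candidate o than to q cannot have q as its nearest neighbor. The actual TPL method operates on an R-tree and also trims index nodes; the functions `tpl_filter` and `tpl_rnn` below are hypothetical names for a filter step and a refinement step over a plain list of points.

```python
import math

def pruned_by_candidate(p, o, q):
    """True if p lies strictly on o's side of the perpendicular bisector of (o, q)."""
    return math.dist(p, o) < math.dist(p, q)

def tpl_filter(points, q):
    """Filter step: visit points by distance to q and keep those no candidate prunes."""
    candidates = []
    for p in sorted(points, key=lambda x: math.dist(x, q)):
        if not any(pruned_by_candidate(p, o, q) for o in candidates):
            candidates.append(p)
    return candidates

def tpl_rnn(points, q):
    """Refinement step: a candidate is a true RNN if no other point is closer to it than q."""
    return [c for c in tpl_filter(points, q)
            if all(math.dist(c, p) >= math.dist(c, q) for p in points if p != c)]

# Hypothetical data, not taken from the slides.
points = [(1.0, 0.0), (5.0, 0.0), (5.5, 0.0), (0.0, 2.0)]
q = (0.0, 0.0)
print(tpl_rnn(points, q))  # [(1.0, 0.0), (0.0, 2.0)]
```

The filter never causes false dismissals, because a pruned point always has some data point closer to it than q. It can keep false hits, which is why the refinement step is still needed.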

10 Probabilistic Reverse Nearest Neighbor Query (PRNN)
Due to the limited accuracy of positioning devices (e.g., GPS) or the movement of the ships, the reported positions of ships are imprecise. Therefore, it is important to answer RNN queries over uncertain data effectively and efficiently.

11 Another Application of PRNN
Mixed-reality game: Each player tends to shoot his/her nearest neighbor. A query player needs to monitor those players (RNNs) who have himself/herself as their nearest neighbor. Because players move, their positions can be imprecise and uncertain, so the RNN query is conducted on uncertain objects.

12 RNN Queries in Uncertain Databases

13 PRNN Definition Probabilistic Reverse Nearest Neighbor (PRNN) Queries
Given an uncertain database D, a query object q, and a probabilistic threshold α ∈ (0, 1], a PRNN query retrieves those uncertain objects o ∈ D that are RNNs of q with probability PPRNN(q, o) greater than or equal to α, that is, PPRNN(q, o) ≥ α. [Equation not preserved in this transcript: the expression for PPRNN(q, o)] Here r1 and r2 are the minimum and maximum distances from q to o, respectively.

14 A Straightforward Method
For every uncertain object o in the database:
Sequentially scan all the objects in the database.
Calculate the PRNN probability, PPRNN(q, o), that o is an RNN of q.
If PPRNN(q, o) is greater than or equal to the probabilistic threshold α, then o is an answer; otherwise, o is discarded.
Analysis:
Complexity: O(N^2), where N is the database size.
The computation of the probability PPRNN(q, o) is very costly.
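The straightforward method can be sketched as a linear scan around a probability estimator. The code below is not the computation used in the paper: it assumes each uncertain object is a 2D disc given as (center, radius) with a uniform distribution inside it, and it replaces the exact probability by a Monte Carlo estimate over sampled possible worlds. The names `sample_in_disc`, `prnn_probability`, and `linear_scan_prnn`, and the sample data, are hypothetical.

```python
import math
import random

def sample_in_disc(center, radius, rng):
    """Draw a point uniformly from a 2D disc (the assumed uncertainty region)."""
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))

def prnn_probability(objects, i, q, n_samples=2000, seed=0):
    """Monte Carlo estimate of the probability that object i has q as nearest neighbor."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # One possible world: sample a position for every uncertain object.
        world = [sample_in_disc(c, r, rng) for c, r in objects]
        d_q = math.dist(world[i], q)
        if all(math.dist(world[i], p) >= d_q for j, p in enumerate(world) if j != i):
            hits += 1
    return hits / n_samples

def linear_scan_prnn(objects, q, alpha, n_samples=2000):
    """For every object, scan all other objects and keep it if its estimate reaches alpha."""
    return [i for i in range(len(objects))
            if prnn_probability(objects, i, q, n_samples) >= alpha]

# Hypothetical objects: (center, radius).
objects = [((2.0, 0.0), 0.5), ((4.0, 0.0), 0.5), ((-1.0, 0.0), 0.1)]
q = (0.0, 0.0)
print(linear_scan_prnn(objects, q, alpha=0.8))  # [2]
```

In this example the first object sits roughly halfway between q and the second object, so its estimate is close to 0.5 and it fails a threshold of 0.8. Each estimate touches all N objects in every sample, so the scan costs on the order of N^2 distance computations per sample, which matches the complexity concern raised above.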

15 Pruning Techniques
Geometric Pruning (GP)
GP0 method: The object distribution in the uncertainty region can be either known or unknown. Prune those data objects that definitely cannot be RNNs of q.
GPβ method (β ∈ (0, 1]): The object distribution in the uncertainty region is known and pre-computation is allowed. Prune those objects whose PRNN probability is smaller than β.

16 Heuristics of GP0 Method
Data objects always reside within their uncertainty regions. [Figure: conservative pruning region (CPR)]

17 Heuristics of GP0 Method (cont.)
No false dismissals are introduced with the hypersphere approximation. [Figure: candidate o]

18 Conditions of GP0 Method
Pruning conditions:
dist(P, q) - dist(P, Co) > ro
mindist(P, D) ≥ rp
In other words, if object p is fully contained in the pruning region CPR'(q, o), then p can be safely pruned.

19 Heuristics of GPβ Method (β ∈ (0, 1])
GPβ prunes those objects whose PRNN probability is smaller than β (< α). [Figure: an object p that can be pruned by GPβ, and candidate o]

20 Refinement Phase
After applying the geometric pruning methods, we obtain a candidate set. For each candidate o, we retrieve those uncertain objects p' that intersect with PR and compute the probability that o is an RNN of q.

21 PRNN Query Processing
Maintain a multidimensional index structure over the uncertain database // indexing phase
For each PRNN query:
Apply geometric pruning methods during the index traversal // pruning phase
Refine candidates and return the answer set // refinement phase

22 PRNN Query Processing Index uncertain data with an R-tree

23 PRNN Query Procedure
Traverse the R-tree index while maintaining a minimum heap, keyed by the minimum distance from the query point to each node.
For each node/object Ni encountered, check whether Ni can be pruned by the GP methods.
If it cannot, either further check the children of Ni (if Ni is a node) or add Ni to the PRNN candidate set Scand (if Ni is an object).
After the index traversal, refine the candidates in Scand by calculating their actual PRNN probabilities.
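The traversal described on this slide follows a best-first pattern that can be sketched without a full R-tree library. The code below assumes a hand-built tree of bounding boxes and takes the pruning test as a caller-supplied function `can_prune`, a hypothetical stand-in for the GP methods, which are not reproduced here. The `Node` class and all names are illustrative.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                                      # lower corner of the bounding box
    hi: tuple                                      # upper corner of the bounding box
    children: list = field(default_factory=list)   # empty list means "this is an object"
    name: str = ""

def mindist(q, node):
    """Minimum distance from point q to the node's bounding box."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, node.lo, node.hi)))

def prnn_candidates(root, q, can_prune):
    """Best-first traversal that returns the unpruned objects (the candidate set)."""
    counter = itertools.count()   # tie-breaker so the heap never compares Node objects
    heap = [(mindist(q, root), next(counter), root)]
    s_cand = []
    while heap:
        _, _, node = heapq.heappop(heap)
        if can_prune(node, s_cand):
            continue              # discard the node together with its subtree
        if node.children:
            for child in node.children:
                heapq.heappush(heap, (mindist(q, child), next(counter), child))
        else:
            s_cand.append(node)   # an object that survived pruning
    return s_cand

# Hypothetical two-level tree.
a = Node((1, 1), (2, 2), name="a")
c = Node((4, 0), (5, 1), name="c")
b = Node((8, 8), (9, 9), name="b")
n1 = Node((1, 0), (5, 2), [a, c], "N1")
n2 = Node((8, 8), (9, 9), [b], "N2")
root = Node((1, 0), (9, 9), [n1, n2], "root")

no_pruning = lambda node, cands: False
print([n.name for n in prnn_candidates(root, (0, 0), no_pruning)])  # ['a', 'c', 'b']
```

Passing the current candidate set to `can_prune` mirrors the idea that pruning regions are derived from the candidates found so far. The refinement phase would then compute the actual PRNN probability of each returned candidate.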

24 PRNN Query Processing (cont'd)

25 Experimental Evaluation
Experimental Settings
Real data sets: LB, MG, TCB, and CAR
Synthetic data sets: generate the center location Co of each uncertain object o in a data space [0, 1,000]^d, and produce a radius ro ∈ [rmin, rmax] for the uncertainty region UR(o)
Four types of data sets: lUrU, lUrG, lSrU, and lSrG
Competitors:
Linear scan (worse than ours by 5-9 orders of magnitude)
Naïve pruning (pruning condition: given a PRNN candidate o, a node/object e can be pruned if maxdist(o, e) < mindist(q, e))
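The naïve pruning condition used as a competitor can be written out for the object case. The sketch below assumes that both o and e are hyperspheres given as (center, radius) and that q is a precise point; handling index nodes would need bounding-box distances instead. The helper names are hypothetical.

```python
import math

def maxdist(c1, r1, c2, r2):
    """Largest possible distance between points of two hyperspheres."""
    return math.dist(c1, c2) + r1 + r2

def mindist_to_point(q, c, r):
    """Smallest possible distance from point q to a hypersphere."""
    return max(0.0, math.dist(q, c) - r)

def naive_prune(o, e, q):
    """True if e is closer to candidate o than to q in every possible world."""
    (co, ro), (ce, re) = o, e
    return maxdist(co, ro, ce, re) < mindist_to_point(q, ce, re)

# Hypothetical objects.
q = (0.0, 0.0)
o = ((8.0, 0.0), 1.0)
print(naive_prune(o, ((10.0, 0.0), 0.5), q))  # True: 3.5 < 9.5
print(naive_prune(o, ((3.0, 0.0), 0.5), q))   # False: 6.5 is not below 2.5
```

When the condition holds, e always has o closer than q, so q can never be the nearest neighbor of e and e can be discarded without computing its probability.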

26 Performance vs. β
Data size N = 100K, dimensionality d = 3, radius range [rmin, rmax] = [0, 5], and probabilistic threshold α = 1

27 Summary
We formulate the problem of probabilistic queries over uncertain databases.
We propose effective pruning methods to reduce the search space of probabilistic queries.
We integrate the pruning methods into an efficient query procedure.
We verify the efficiency of our proposed approaches through extensive experiments.

