Chapter 4: Probabilistic Query Answering (2)


1 Probabilistic Data Management
Chapter 4: Probabilistic Query Answering (2)

2 Objectives
In this chapter, you will learn the definition and query processing techniques of one probabilistic query type, the probabilistic group nearest neighbor query.
Reference: X. Lian and L. Chen, "Probabilistic Group Nearest Neighbor Queries in Uncertain Databases," IEEE Trans. on Knowledge and Data Eng. (TKDE), vol. 20, no. 6, June 2008.

3 Recall: Probabilistic Query Types
All of the following query types are posed over an uncertain/probabilistic database.
Probabilistic spatial queries: probabilistic range query; probabilistic k-nearest neighbor query; probabilistic group nearest neighbor (PGNN) query; probabilistic reverse k-nearest neighbor query; probabilistic spatial join / similarity join.
Probabilistic preference queries: probabilistic top-k query (or ranked query); probabilistic skyline query; probabilistic reverse skyline query.

4 Probabilistic Group Nearest Neighbor Queries in Uncertain Databases
In IEEE Trans. on Knowledge and Data Engineering (TKDE), 2008

5 Group Nearest Neighbor Query
Group Nearest Neighbor (GNN) Search [ICDE04]: given a database D and a set Q of query objects $q_1, q_2, \ldots, q_n$, a GNN query retrieves an object $o \in D$ that has the smallest summed distance to Q, i.e., an object that minimizes $\sum_{i=1}^{n} dist(q_i, o)$.
Example with three query points: find a data object o that minimizes $\sum_{i=1}^{3} dist(q_i, o)$.
[ICDE04] D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, "Group Nearest Neighbor Queries," Proc. 20th Int'l Conf. Data Eng. (ICDE), 2004.
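The GNN definition above can be illustrated with a brute-force scan over precise points. This is only a minimal sketch: the function name `gnn` and the sample coordinates are made up for illustration, and Euclidean distance is assumed.

```python
import math

def gnn(database, queries):
    """Return the point in `database` with the smallest summed distance to `queries`."""
    def summed_distance(o):
        return sum(math.dist(q, o) for q in queries)
    return min(database, key=summed_distance)

# Hypothetical data: three candidate locations and three query points.
restaurants = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
people = [(4.0, 4.0), (6.0, 4.0), (5.0, 7.0)]
best = gnn(restaurants, people)
print(best)
```

The scan costs one distance computation per (object, query point) pair, which is the baseline that index-based methods try to beat.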

6 An Example of GNN Query
In a city, there are many restaurants, and three people want to have lunch together at one of them.
Problem: find the restaurant in the city that minimizes the total distance to these three people.

7 Other GNN Applications
Image retrieval: find an image in the database that is similar to a group of user-specified query images.
Geographic information systems.
Mobile computing applications.

8 Data Uncertainty
In many applications, real-world data are uncertain and imprecise.
Image databases: noise in image features.
Location-based services (LBS): measurement errors (GPS, RFID, etc.); trajectory data of mobile users are blurred to preserve privacy.
Sensor networks: sensory data inherently contain noise due to environmental factors, network latency, or fluctuations in battery power.

9 Data Uncertainty (cont.)
[Figure: in a traditional database, object o is a precise point; in an uncertain database, object o is represented by its uncertainty region UR(o).]

10 GNN in Uncertain Databases
Probabilistic Group Nearest Neighbor (PGNN) Query in Uncertain Database [TKDE08]

11 Motivation Example of PGNN
In a mixed-reality game (such as Counter-Strike (CS)), three players may want to attack the moving enemy that minimizes their total travelling distance.
The position of each enemy is uncertain (due to movement or network delay).

12 Motivation Example of PGNN (cont'd)
When several places in a forest are on fire, a group of at least three firefighters can collaborate to put out one of them.
The positions of the fires are uncertain.
A PGNN query can be issued to find the fires that this group of firefighters can reach as early as possible.

13 Contributions
The proposal of the probabilistic group nearest neighbor (PGNN) query in uncertain databases.
Two effective pruning methods: spatial pruning and probabilistic pruning.
An efficient PGNN query processing approach.
Variants of the PGNN query.

14 Introduction
Uncertain database: uncertain objects are represented by uncertainty regions rather than precise points; distances between uncertain objects are random variables instead of fixed values.
Query processing over uncertain data: re-define traditional query types that were designed for precise data; consider the unique characteristic (i.e., uncertainty) of uncertain data; guarantee the accuracy of query answers.

15 Definition of PGNN Problem
Probabilistic Group Nearest Neighbor (PGNN): given an uncertain database D, a set of n query points $Q = \{q_1, q_2, \ldots, q_n\}$, and a user-specified probability threshold $\alpha \in (0, 1]$, a PGNN query retrieves the set of uncertain objects $o \in D$ that are expected to be the GNN of query set Q with probability greater than $\alpha$, that is,
$\Pr\{adist(o, Q) \le adist(p, Q) \text{ for all } p \in D \setminus \{o\}\} > \alpha$,
where adist(o, Q) is defined as a monotonically increasing function of the individual distances, $adist(o, Q) = f(dist(o, q_1), dist(o, q_2), \ldots, dist(o, q_n))$. Here f is an aggregate function such as SUM, MIN, or MAX, and dist is the Euclidean distance.
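As a small illustration of the aggregate distance adist(o, Q), the sketch below evaluates it for one fixed position of an object. The helper name `adist` and the `AGGREGATES` table are hypothetical; only SUM, MIN, and MAX over Euclidean distances come from the definition.

```python
import math

# Aggregate functions named in the definition, mapped to Python built-ins.
AGGREGATES = {"SUM": sum, "MIN": min, "MAX": max}

def adist(o, queries, aggregate="SUM"):
    """Aggregate distance from one concrete position `o` to every query point."""
    distances = [math.dist(o, q) for q in queries]
    return AGGREGATES[aggregate](distances)

queries = [(3.0, 4.0), (6.0, 8.0)]
origin = (0.0, 0.0)
print(adist(origin, queries, "SUM"), adist(origin, queries, "MIN"), adist(origin, queries, "MAX"))
```

For an uncertain object, `o` is a random position inside its uncertainty region, so this value becomes a random variable rather than a single number.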

16 Computation of PGNN Answers
Straightforward approach: linear scan. For each uncertain object $o \in D$: sequentially scan the entire database; compute the complex probability integration (via a numerical method); output object o if its probability is greater than $\alpha$.
Time complexity: $O(|D|^2)$.
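The linear scan can be sketched as follows. Note the substitutions: instead of numerical integration, this sketch estimates each object's probability by Monte Carlo sampling, and it assumes every object is uniformly distributed in a ball given as `(center, radius)`. Both choices, and all function names, are assumptions made for illustration, with SUM as the aggregate.

```python
import math
import random

def sample_in_ball(center, radius, rng):
    """Draw a point uniformly from the ball around `center` (rejection sampling)."""
    while True:
        offset = [rng.uniform(-radius, radius) for _ in center]
        if sum(x * x for x in offset) <= radius * radius:
            return tuple(c + x for c, x in zip(center, offset))

def pgnn_linear_scan(objects, queries, alpha, trials=2000, seed=0):
    """Return (index, estimated probability) for objects whose probability exceeds alpha."""
    rng = random.Random(seed)
    wins = [0] * len(objects)
    for _ in range(trials):
        # One possible world: sample a position for every object, then find the GNN.
        dists = [sum(math.dist(sample_in_ball(c, r, rng), q) for q in queries)
                 for c, r in objects]
        wins[dists.index(min(dists))] += 1
    probs = [w / trials for w in wins]
    return [(i, p) for i, p in enumerate(probs) if p > alpha]

# Hypothetical data: the second object is far from the query set and never wins.
objects = [((0.0, 0.0), 1.0), ((100.0, 100.0), 1.0)]
queries = [(0.0, 0.0), (1.0, 0.0)]
answers = pgnn_linear_scan(objects, queries, alpha=0.5, trials=200)
print(answers)
```

Every trial touches every object, so the cost grows with both the database size and the number of samples, which is why the pruning methods aim to shrink the candidate set first.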

17 Two Pruning Techniques
Spatial Pruning Probabilistic Pruning

18 Spatial Pruning
Basic idea: compute lower/upper bounds of the aggregate distance adist(o, Q) from each uncertain object o to the query set Q at a low cost, then use these bounds to filter out false alarms.
Select the smallest distance upper bound as the threshold; objects whose lower bound does not exceed this threshold remain candidates.
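A minimal sketch of this filtering step, assuming the lower/upper bounds of adist(o, Q) have already been computed for every object. The function name `spatial_prune` and the dictionary layout are hypothetical.

```python
def spatial_prune(bounds):
    """bounds maps object id -> (lower bound, upper bound) of adist(o, Q).

    Returns the ids that survive spatial pruning.
    """
    # The smallest upper bound acts as the pruning threshold.
    threshold = min(ub for _, ub in bounds.values())
    # An object whose lower bound exceeds the threshold can never be the GNN.
    return [oid for oid, (lb, _) in bounds.items() if lb <= threshold]

bounds = {"o1": (1.0, 3.0), "o2": (2.0, 5.0), "o3": (4.0, 6.0)}
survivors = spatial_prune(bounds)
print(survivors)
```

Here `o3` is dropped because even its best case (4.0) is worse than the worst case of `o1` (3.0).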

19 Spatial Pruning (cont'd)
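Recall from the PGNN definition that an object is an answer only if its probability of being the GNN of Q exceeds the probability threshold. In fact, the spatial pruning method discards those objects whose expected probability of being the GNN equals 0.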

20 Derivation of Distance Bounds
Probabilistic minimum bounding method (PMBM): bound all query points with an MBR, then compute the minimum/maximum distances between the query MBR and each uncertain object.
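One way to turn the PMBM idea into bounds for the SUM aggregate is sketched below. It assumes a spherical uncertainty region with a given center and radius, and the function names are hypothetical. Because every query point lies inside the query MBR, each individual distance is bracketed by the minimum and maximum distance between the MBR and the region.

```python
import math

def query_mbr(queries):
    """Minimum bounding rectangle of the query points as (lows, highs)."""
    lows = [min(coords) for coords in zip(*queries)]
    highs = [max(coords) for coords in zip(*queries)]
    return lows, highs

def pmbm_sum_bounds(center, radius, queries):
    """Lower/upper bounds of the SUM aggregate distance via the query MBR."""
    lows, highs = query_mbr(queries)
    # Smallest and largest distance from the region's center to the MBR.
    near = math.sqrt(sum(max(lo - c, 0.0, c - hi) ** 2
                         for c, lo, hi in zip(center, lows, highs)))
    far = math.sqrt(sum(max(abs(c - lo), abs(c - hi)) ** 2
                        for c, lo, hi in zip(center, lows, highs)))
    n = len(queries)
    # Widen by the radius to cover every possible position of the object.
    return n * max(0.0, near - radius), n * (far + radius)

lb, ub = pmbm_sum_bounds((5.0, 0.0), 1.0, [(0.0, 0.0), (2.0, 2.0)])
print(lb, ub)
```

The bounds need only one MBR computation per query set and one pair of distances per object, at the price of being looser than per-point bounds.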

21 Derivation of Distance Bounds (cont'd)
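Probabilistic single point method (PSPM): select the geometric centroid of the query points as the representative, then compute the minimum/maximum distances via the triangle inequality. For example, with $q_3$ playing the role of the representative point and an object o whose uncertainty region has center $C_o$ and radius $r_o$:
$|dist(q_1, q_3) - dist(q_3, C_o)| - r_o \le dist(q_1, o) \le dist(q_1, q_3) + dist(q_3, C_o) + r_o$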
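The triangle-inequality bounds can be sketched for the SUM aggregate as follows. The spherical region `(center, radius)`, the use of the arithmetic mean as the centroid, and the function names are assumptions for illustration.

```python
import math

def centroid(queries):
    """Arithmetic mean of the query points, used as the representative."""
    n = len(queries)
    return tuple(sum(coords) / n for coords in zip(*queries))

def pspm_sum_bounds(center, radius, queries):
    """Lower/upper bounds of the SUM aggregate distance via one representative point."""
    rep = centroid(queries)
    d_rep_c = math.dist(rep, center)      # one distance computation per object
    lower = upper = 0.0
    for q in queries:
        d_q_rep = math.dist(q, rep)       # depends only on Q, so it could be cached
        lower += max(0.0, abs(d_q_rep - d_rep_c) - radius)
        upper += d_q_rep + d_rep_c + radius
    return lower, upper

bounds = pspm_sum_bounds((5.0, 0.0), 1.0, [(0.0, 0.0), (2.0, 0.0)])
print(bounds)
```

The appeal of this design is that the distances from query points to the representative are shared across all objects, so each object costs a single new distance computation.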

22 Probabilistic Pruning
Intuition of probabilistic pruning: prune those data objects whose expected PGNN probability (the LHS of the inequality in the PGNN definition) is smaller than or equal to the probability threshold $\alpha$.

23 Probabilistic Pruning (cont'd)
$(1-\beta)$-hypersphere: for any uncertain object p, we can pre-compute a hypersphere, namely $p_{1-\beta}$, within its uncertainty region UR(p), such that object p resides in $p_{1-\beta}$ with probability $(1-\beta)$, where $\beta \in [0, \alpha]$.

24 Probabilistic Pruning (cont'd)
Use the $(1-\beta)$-hypersphere to obtain lower/upper bounds of the distance $adist(p_{1-\beta}, Q)$, say $[LB\_adist(p_{1-\beta}, Q), UB\_adist(p_{1-\beta}, Q)]$. Any object o can be safely pruned if it holds that $UB\_adist(p_{1-\beta}, Q) < LB\_adist(o, Q)$.
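The pruning test can be sketched under one extra assumption that the slides do not state: each object is uniformly distributed in a ball, so a concentric ball holding probability mass (1 - beta) has radius r * (1 - beta)^(1/d). The function names, the simple per-point bounds, and the SUM aggregate are illustrative choices.

```python
import math

def shrunken_radius(radius, beta, dims):
    """Radius of the concentric ball holding probability 1 - beta (uniform assumption)."""
    return radius * (1.0 - beta) ** (1.0 / dims)

def sum_bounds(center, radius, queries):
    """Lower/upper bounds of the SUM aggregate distance for a ball-shaped region."""
    dists = [math.dist(center, q) for q in queries]
    return (sum(max(0.0, d - radius) for d in dists),
            sum(d + radius for d in dists))

def can_prune(o, p, queries, beta):
    """True if o = (center, radius) is pruned by the (1 - beta)-hypersphere of p."""
    (o_center, o_radius), (p_center, p_radius) = o, p
    p_small = shrunken_radius(p_radius, beta, len(p_center))
    _, ub_p = sum_bounds(p_center, p_small, queries)
    lb_o, _ = sum_bounds(o_center, o_radius, queries)
    return ub_p < lb_o

queries = [(0.0, 0.0)]
p = ((3.0, 0.0), 2.0)
o = ((7.0, 0.0), 2.5)
print(can_prune(o, p, queries, beta=0.0), can_prune(o, p, queries, beta=0.75))
```

With beta = 0 the test reduces to spatial pruning and fails here, while the smaller hypersphere obtained with beta = 0.75 gives a tighter upper bound for p and lets o be pruned.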

25 PGNN Query Processing
Construct an R-tree over the uncertainty regions of the data objects. Nodes are recursively bounded by minimum bounding rectangles (MBRs) until a single node (the root) is obtained.

26 PGNN Query Processing (cont'd)
Pruning intermediate nodes: define lower/upper bounds of the aggregate distances between nodes and query points, using either PMBM (bounding the query points with an MBR) or PSPM (using the centroid of the query points and the triangle inequality).
An intermediate node e can be pruned if it holds that $LB\_adist(e, Q) \ge UB\_adist(o, Q)$ for a candidate o.

27 PGNN Query Procedure
Traverse the nodes of the R-tree in a best-first manner by maintaining a minimum heap H, keyed by the lower bound of the distance from each node/object to the query set Q.
Every time we encounter a node/object, compute its lower/upper bounds of aggregate distance and apply spatial pruning to filter out false alarms.
For uncertain objects that cannot be pruned by spatial pruning, apply probabilistic pruning.
After the tree traversal, refine the obtained candidate set by calculating the actual PGNN probability.
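The traversal can be sketched on a toy hierarchy. This is a simplified illustration, not the procedure from the paper: nodes are plain dictionaries rather than a real R-tree, objects are balls with a center and radius, the aggregate is SUM, and only the spatial pruning step is shown (probabilistic pruning and refinement are left out). All names are hypothetical.

```python
import heapq
import itertools
import math

def point_to_mbr(q, lows, highs):
    """Minimum distance from point q to the rectangle [lows, highs]."""
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, lows, highs)))

def lower_bound(entry, queries):
    if "children" in entry:                       # intermediate node
        return sum(point_to_mbr(q, *entry["mbr"]) for q in queries)
    return sum(max(0.0, math.dist(entry["center"], q) - entry["radius"])
               for q in queries)

def upper_bound(obj, queries):
    return sum(math.dist(obj["center"], q) + obj["radius"] for q in queries)

def pgnn_candidates(root, queries):
    """Best-first traversal that keeps only objects surviving spatial pruning."""
    tie = itertools.count()                       # tie-breaker so dicts are never compared
    heap = [(lower_bound(root, queries), next(tie), root)]
    best_ub = math.inf
    candidates = []
    while heap:
        lb, _, entry = heapq.heappop(heap)
        if lb > best_ub:
            break                                 # every remaining entry is farther still
        if "children" in entry:
            for child in entry["children"]:
                heapq.heappush(heap, (lower_bound(child, queries), next(tie), child))
        else:
            best_ub = min(best_ub, upper_bound(entry, queries))
            candidates.append((lb, entry["id"]))
    # The threshold may have tightened after an object was accepted.
    return [oid for lb, oid in candidates if lb <= best_ub]

leaf_a = {"mbr": ((0.0, 0.0), (3.0, 3.0)), "children": [
    {"id": 1, "center": (1.0, 1.0), "radius": 1.0},
    {"id": 2, "center": (2.0, 2.0), "radius": 1.0}]}
leaf_b = {"mbr": ((50.0, 50.0), (60.0, 60.0)), "children": [
    {"id": 3, "center": (55.0, 55.0), "radius": 5.0}]}
root = {"mbr": ((0.0, 0.0), (60.0, 60.0)), "children": [leaf_a, leaf_b]}
found = pgnn_candidates(root, [(0.0, 0.0), (2.0, 0.0)])
print(found)
```

In this example the far-away node `leaf_b` is never expanded, because its lower bound already exceeds the smallest upper bound found among the nearby objects.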

28 Variants of PGNN Query
PGNN query with uncertain query objects: re-define the lower/upper bounds of aggregate distances.

29 Variants of PGNN Query (cont'd)
PGNN with different aggregate functions: SUM, MIN, MAX, AVG, weighted SUM, etc.
k-PGNN query: retrieve k uncertain objects that are expected to be closest to a query set Q with probability greater than $\alpha$.

30 Variants of PGNN Query (cont'd)
Any other variants of PGNN? PGNN on road networks?

31 Experimental Evaluation
Experimental settings.
Synthetic data sets: generate the center location $C_o$ of each uncertain object o in a data space $[0, 1000]^d$; produce a radius $r_o \in [r_{min}, r_{max}]$ for its uncertainty region UR(o). Four types of data sets: lUrU, lUrG, lSrU, lSrG.
Measures: wall clock time (the filtering time of index traversal) and speed-up ratio (total time cost compared with that of the linear scan method).
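A generator in the spirit of these settings might look like the sketch below. It covers only one case, assuming that both the center locations and the radii are drawn uniformly; the slides do not spell out the distributions behind the four data set names, and the function name and parameter values are hypothetical.

```python
import random

def generate_uncertain_objects(n, dims, r_min, r_max, seed=0):
    """Generate n ball-shaped uncertain objects as (center, radius) pairs."""
    rng = random.Random(seed)
    objects = []
    for _ in range(n):
        # Center drawn uniformly from the data space [0, 1000]^dims (assumption).
        center = tuple(rng.uniform(0.0, 1000.0) for _ in range(dims))
        # Radius drawn uniformly from [r_min, r_max] (assumption).
        radius = rng.uniform(r_min, r_max)
        objects.append((center, radius))
    return objects

data = generate_uncertain_objects(n=100, dims=3, r_min=0.0, r_max=5.0)
print(len(data), data[0])
```

Fixing the seed keeps a run reproducible, which helps when comparing the filtering time of different pruning settings on the same data.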

32 Query Performance vs. $\beta$
[Figure: the time cost of spatial pruning (upper part) and the time cost of probabilistic pruning (lower part).]
Settings: data size |D| = 30K, dimensionality d = 3, number of query points n = 4, probability threshold $\alpha$ = 1, SUM aggregate function.

33 Scalability Test on Data Size |D|
Settings: dimensionality d = 3, number of query points n = 4, probability threshold $\alpha$ = 1, SUM aggregate function.

34 Summary
We formulated the probabilistic group nearest neighbor (PGNN) query in the context of uncertain databases.
We proposed two effective pruning methods, spatial and probabilistic pruning, which reduce the search space of the PGNN query and can be seamlessly integrated into an efficient query procedure.
We further discussed some variants of the PGNN query.
We demonstrated through extensive experiments the efficiency and effectiveness of the proposed pruning methods and query processing approaches, in terms of wall clock time and speed-up ratio compared with linear scan.

