Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel,


1 Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data
Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel, Matthias Renz, Stefan Zankl and Andreas Zuefle Ludwig-Maximilians-Universität München (LMU) Munich, Germany {bernecker, emrich, kriegel, renz,

2 Outline
- Background
  - Uncertain data model
  - Reverse k-nearest neighbour queries
  - Reverse k-nearest neighbour queries on uncertain objects
- Framework for Probabilistic RkNN Processing
  - Approximation
  - Spatial Filter
  - Probabilistic Filter
  - Verification
- Evaluation + Summary

3 Objects are described by a multi-dimensional probability distribution
- Object independence assumption
- Queries are answered according to possible-worlds semantics
- Object PDFs can be spatially bounded
- Continuous or discrete representation
[Figure: user ratings for "Life of Brian" as an uncertain object X: PDF of X over uncertain attribute a (Action) and uncertain attribute b (Humor)]
Speaker notes: Attributes can be dependent on each other. The mean is not a good representation.

4 $RkNN(q) = \{o \in DB \mid q \in kNN(o)\}$
What is it good for?
- Market segmentation
- Outlier detection
- Incremental algorithms
[Figure: objects o1, ..., o7 and query q; $R1NN(q) = \{o_7\}$, $R2NN(q) = \{o_7, o_5, o_4\}$]
Speaker notes: Data mining -> market segmentation, outlier detection; incremental algorithms -> continuous nearest neighbour queries.
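To make the definition concrete, here is a minimal brute-force sketch for certain (non-uncertain) points, assuming Euclidean distance and counting ties in favour of q. The function `rknn` and the example coordinates are hypothetical and not taken from the figure. It tests the definition directly: o qualifies iff fewer than k other database points are strictly closer to o than q.

```python
import math


def rknn(db, q, k):
    """Brute-force reverse k-nearest-neighbour query on certain points.

    o is a result iff q is among the k nearest neighbours of o, i.e. iff
    fewer than k other database points are strictly closer to o than q
    (ties are counted in favour of q).
    """
    result = []
    for i, o in enumerate(db):
        d_q = math.dist(o, q)
        closer = sum(1 for j, p in enumerate(db)
                     if j != i and math.dist(o, p) < d_q)
        if closer < k:
            result.append(o)
    return result


# Hypothetical 2-D points (not the coordinates of the figure).
db = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
q = (2.0, 0.0)
print(rknn(db, q, 1))  # [(1.0, 0.0)]
print(rknn(db, q, 2))  # [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
```

The points (5.0, 5.0) and (6.0, 5.0) belong to R2NN(q) although q is far away from them, which shows that RkNN(q) is not simply the kNN set of q.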

5 "Is O' R1NN of Q?"
[Figure: uncertain objects O1, O2, O' and query Q]
Note: The query object may be uncertain as well!

6 "Is O' R1NN of Q?" => In some worlds it is.
[Figure: one possible world of O1, O2, O' and Q]

7 "Is O' R1NN of Q?" => In other worlds it is not.
[Figure: another possible world of O1, O2, O' and Q]

8 Definition of Probabilistic RkNN
$PRkNN(Q, \tau) = \{O \in DB \mid P(O \in RkNN(Q)) \ge \tau\} = \{O \in DB \mid P(Q \in kNN(O)) \ge \tau\}$
Example: $P(Q \in 1NN(O')) = 21/24 \ge 0.5$, so $O' \in PR1NN(Q, 0.5)$
[Figure: uncertain objects O1, O2, O' and query Q]
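The definition can be evaluated literally by enumerating all possible worlds. The following sketch assumes discrete uncertain objects given as lists of (point, probability) instances, mutual independence and Euclidean distance; `prob_q_in_knn`, `in_prknn` and the toy objects are hypothetical. Its cost grows exponentially with the number of objects.

```python
import itertools
import math


def prob_q_in_knn(o_prime, q, others, k):
    """P(Q in kNN(O')) by enumerating every possible world.

    Each uncertain object is a list of (point, probability) instances and
    all objects are assumed independent. The cost is exponential in the
    number of objects.
    """
    total = 0.0
    for world in itertools.product(o_prime, q, *others):
        p_world = math.prod(p for _, p in world)
        op, qp = world[0][0], world[1][0]
        d_q = math.dist(op, qp)
        # Number of other objects strictly closer to O' than Q in this world.
        closer = sum(1 for pt, _ in world[2:] if math.dist(op, pt) < d_q)
        if closer < k:
            total += p_world
    return total


def in_prknn(o_prime, q, others, k, tau):
    """Membership test for PRkNN(Q, tau)."""
    return prob_q_in_knn(o_prime, q, others, k) >= tau


# Hypothetical toy objects on a line (2-D coordinates with y = 0).
o_prime = [((0.0, 0.0), 1.0)]
q = [((2.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]
o1 = [((3.0, 0.0), 0.5), ((1.0, 0.0), 0.5)]
print(prob_q_in_knn(o_prime, q, [o1], 1))  # 0.25
print(in_prknn(o_prime, q, [o1], 1, 0.5))  # False
```

Only one of the four worlds (Q at 2, O1 at 3) has no object closer to O' than Q, hence the probability 0.25.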

9 Framework for PRkNN query processing
- Approximation (indexing): simplification of spatial-probabilistic keys
- Spatial filter: filter objects according to simple spatial keys
- Probabilistic filter:
  - derive lower/upper bounds of the qualification probability (by means of simple spatial-probabilistic keys)
  - filter objects according to the lower/upper probability bounds
- Verification:
  - computation of the exact probability (very expensive)
  - Monte-Carlo sampling (many samples required)
- Modularization: comparison of different algorithms

10 R*-Tree for indexing objects (global index)
[Figure: R*-tree over the uncertain objects, with query Q]

11 AR*-Tree for indexing instances (local index)
[Figure: AR*-tree over the instances of one object; entries are annotated with aggregated probabilities 0.3, 0.15, 1.0, 0.15, 0.15, 0.25, 0.15, 0.1, 0.1, 0.2, 0.45]
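A rough sketch of the idea behind the aggregated local index, assuming that each node stores its MBR and the total probability of all instances below it. The class `Node`, the function `aggregate` and the grouping of the figure's values into three leaves (0.15 + 0.15, 0.15 + 0.1, 0.15 + 0.1 + 0.2) are our assumptions, chosen because they reproduce the values 0.3, 0.25, 0.45 and 1.0 shown in the figure; the coordinates are invented.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical aggregated R*-tree node.

    Leaves hold (point, probability) instances; inner nodes hold children.
    aggregate() fills in the MBR and the total probability of the subtree.
    """
    children: list = field(default_factory=list)
    instances: list = field(default_factory=list)
    mbr: Optional[tuple] = None
    prob: float = 0.0


def aggregate(node):
    """Compute MBR and aggregated probability bottom-up; returns node."""
    if node.instances:
        points = [pt for pt, _ in node.instances]
        node.prob = math.fsum(p for _, p in node.instances)
    else:
        for child in node.children:
            aggregate(child)
        points = [corner for child in node.children for corner in child.mbr]
        node.prob = math.fsum(child.prob for child in node.children)
    dims = range(len(points[0]))
    lower = tuple(min(pt[d] for pt in points) for d in dims)
    upper = tuple(max(pt[d] for pt in points) for d in dims)
    node.mbr = (lower, upper)
    return node


# Assumed grouping of the figure's probabilities; coordinates are invented.
root = aggregate(Node(children=[
    Node(instances=[((0, 0), 0.15), ((1, 1), 0.15)]),
    Node(instances=[((4, 0), 0.15), ((5, 2), 0.1)]),
    Node(instances=[((2, 4), 0.15), ((3, 5), 0.1), ((4, 4), 0.2)]),
]))
print([round(c.prob, 2) for c in root.children], round(root.prob, 2))
# [0.3, 0.25, 0.45] 1.0
```

Storing the probability mass per node lets a filter reason about a whole subtree of instances through a single MBR and a single number.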

12 Pruning based on rectangular approximations only [1].
[1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal Pruning of MBRs. SIGMOD Conference 2010: 39-50
Task: find k objects $O \in DB \setminus O'$ that are closer to O' than Q.
[Figure: rectangles O and Q (label B) dividing the space into three regions]
- For any O' in this region, O is closer than Q.
- For any O' intersecting this region, Q may possibly be closer than O.
- For any O' in this region, O is not closer than Q.
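The decision whether O is closer to every point of O' than Q can be made on MBRs alone. A minimal sketch, assuming Euclidean distance and axis-parallel rectangles given as (lower corner, upper corner): per dimension, the squared MaxDist to O minus the squared MinDist to Q is a convex function of the position in O', so it suffices to check the lower and upper bound of O' in that dimension. We believe this corresponds to the decision criterion of [1], but `dominates` and `spatial_filter` are hypothetical helpers, not code from the paper.

```python
def dominates(a, b, r):
    """True iff every point of rectangle a is strictly closer (Euclidean)
    to every point of rectangle r than every point of rectangle b.

    Rectangles are (lower_corner, upper_corner) tuples. Per dimension,
    MaxDist(a_i, x)^2 - MinDist(b_i, x)^2 is convex in x, so its maximum
    over r's extent is reached at r's lower or upper bound.
    """
    total = 0.0
    for a_lo, a_hi, b_lo, b_hi, r_lo, r_hi in zip(*a, *b, *r):
        worst = float("-inf")
        for x in (r_lo, r_hi):
            max_a = max(abs(x - a_lo), abs(x - a_hi))
            min_b = max(b_lo - x, 0.0, x - b_hi)
            worst = max(worst, max_a ** 2 - min_b ** 2)
        total += worst
    return total < 0


def spatial_filter(o_prime, q, others, k):
    """Hypothetical spatial filter for one candidate O' of an RkNN query."""
    # Objects that are closer to O' than Q in every possible world.
    certain = sum(1 for o in others if dominates(o, q, o_prime))
    if certain >= k:
        return "drop"
    # Objects for which Q is not certainly closer to O'.
    possible = sum(1 for o in others if not dominates(q, o, o_prime))
    if possible < k:
        return "hit"
    return "candidate"


o_prime = ((0, 0), (1, 1))
q = ((10, 0), (11, 1))
others = [((2, 0), (3, 1)), ((20, 20), (21, 21))]
print(spatial_filter(o_prime, q, others, 1))  # drop
print(spatial_filter(o_prime, q, others, 2))  # hit
```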

13 Probability that O is closer to O' than Q?
[Figure: uncertain objects O, Q and O' (label B)]

14 Probability that O is closer to O' than Q?
"O is closer to O' than Q with at least x% probability"
[Figure: uncertain objects O, Q and O']

15 Probability that O is closer to O' than Q?
"O is closer to O' than Q with at most x% probability"
[Figure: uncertain objects O, Q and O']
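With O and Q partitioned into smaller MBRs that carry aggregated probabilities, a domination test on rectangles yields lower and upper bounds of this kind. The sketch below assumes that O' is restricted to one rectangle r, that O and Q are independent, and that partitions are given as (MBR, probability) pairs; `dominates` and `probability_bounds` are hypothetical helpers. The lower bound sums the partition pairs in which O's part dominates Q's part; the upper bound subtracts the pairs in which Q's part dominates O's part.

```python
def dominates(a, b, r):
    """True iff every point of rectangle a is strictly closer (Euclidean)
    to every point of rectangle r than every point of rectangle b."""
    total = 0.0
    for a_lo, a_hi, b_lo, b_hi, r_lo, r_hi in zip(*a, *b, *r):
        # Per dimension, the worst case lies at r's lower or upper bound.
        total += max(
            max(abs(x - a_lo), abs(x - a_hi)) ** 2
            - max(b_lo - x, 0.0, x - b_hi) ** 2
            for x in (r_lo, r_hi))
    return total < 0


def probability_bounds(o_parts, q_parts, r):
    """Lower/upper bound of P(O closer to O' than Q) for O' inside r.

    o_parts, q_parts: lists of (mbr, probability) partitions of O and Q,
    assumed independent.
    """
    lower = sum(po * pq for mo, po in o_parts for mq, pq in q_parts
                if dominates(mo, mq, r))
    q_wins = sum(po * pq for mo, po in o_parts for mq, pq in q_parts
                 if dominates(mq, mo, r))
    return lower, 1.0 - q_wins


r = ((0, 0), (1, 1))  # region of O'
o_parts = [(((2, 0), (3, 1)), 0.5),
           (((9, 0), (12, 1)), 0.3),
           (((20, 20), (21, 21)), 0.2)]
q_parts = [(((10, 0), (11, 1)), 1.0)]
lo, hi = probability_bounds(o_parts, q_parts, r)
print(f"at least {lo:.0%}, at most {hi:.0%}")  # at least 50%, at most 80%
```

The middle partition of O overlaps Q's MBR in the first dimension, so neither side dominates and its 30% remains undecided.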

16 How many objects $O \in DB$ are closer to O' than Q?
Exemplary statements:
- O1 is closer to O' than Q with at least 20% and at most 50% probability
- O2 is closer to O' than Q with at least 60% and at most 80% probability
Correctly deriving these bounds is not trivial (see paper).
Consider the following uncertain generating function:
- x-term: probability of the object to be closer to O' than Q
- z-term: probability of the object to be further from O' than Q
- y-term: uncertainty (probability mass for which neither case can be decided)
$(0.2x + 0.3y + 0.5z) \cdot (0.6x + 0.2y + 0.2z)$
Expansion yields $0.12x^2 + 0.34xz + 0.1z^2 + 0.22xy + 0.16yz + 0.06y^2$
A term $c \cdot x^i y^j z^l$ means: with probability c, i objects are certainly closer to O' than Q, l objects are certainly not, and j objects are undecided.
Speaker notes: When splitting, certain rules have to be observed: one term per object.
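The expansion can be reproduced mechanically. A sketch, assuming each object contributes the factor $lo \cdot x + (hi - lo) \cdot y + (1 - hi) \cdot z$ for its probability bounds [lo, hi] and that these events are independent; the dictionary encoding and the function `expand_ugf` are our own choices. A key (i, j) stands for $x^i y^j z^{n-i-j}$.

```python
from collections import defaultdict


def expand_ugf(bounds):
    """Expand the product over objects of lo*x + (hi-lo)*y + (1-hi)*z.

    bounds: one (lower, upper) bound per object for the probability that
    the object is closer to O' than Q. Returns {(i, j): coefficient} for
    the term x^i y^j z^(n-i-j): i objects certainly closer, j undecided.
    """
    poly = {(0, 0): 1.0}
    for lo, hi in bounds:
        nxt = defaultdict(float)
        for (i, j), c in poly.items():
            nxt[(i + 1, j)] += c * lo          # x-term: certainly closer
            nxt[(i, j + 1)] += c * (hi - lo)   # y-term: undecided
            nxt[(i, j)] += c * (1.0 - hi)      # z-term: certainly not closer
        poly = dict(nxt)
    return poly


# O1: at least 20%, at most 50%; O2: at least 60%, at most 80%.
poly = expand_ugf([(0.2, 0.5), (0.6, 0.8)])
for (i, j), c in sorted(poly.items()):
    print(f"x^{i} y^{j} z^{2 - i - j}: {c:.2f}")
```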

17 $0.12x^2 + 0.34xz + 0.1z^2 + 0.22xy + 0.16yz + 0.06y^2$
[Figure: probability (20% to 80%) against the number of objects $O \in DB$ that are closer to O' than Q (1, 2)]


23 Example PRkNN queries
[Figure: two cdf plots of probability against the number of objects $O \in DB$ that are closer to O' than Q (1, 2): one for the exact number and one for the maximum number]
- PR1NN(Q, 50%) => O' is not part of the result
- PR2NN(Q, 40%) => O' is part of the result
- PR2NN(Q, 80%) => O' has to be investigated further
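From the expanded generating function, bounds on $P(O' \in RkNN(Q))$, i.e. on the probability that fewer than k objects are closer to O' than Q, follow directly: a term $x^i y^j z^l$ certainly has fewer than k closer objects if i + j < k, and possibly has if i < k. The sketch below hard-codes the coefficients from slide 16; `rknn_bounds` and `classify` are hypothetical helpers. It reproduces the three example decisions: for k = 1 the upper bound is 0.32 < 0.5; for k = 2 the bounds are 0.6 and 0.88.

```python
# Coefficients of the expanded generating function from slide 16;
# key (i, j) stands for the term x^i y^j z^(2 - i - j).
poly = {(2, 0): 0.12, (1, 0): 0.34, (0, 0): 0.10,
        (1, 1): 0.22, (0, 1): 0.16, (0, 2): 0.06}


def rknn_bounds(poly, k):
    """Bounds on P(fewer than k objects are closer to O' than Q)."""
    # Even if all undecided objects are closer, fewer than k are: i + j < k.
    lower = sum(c for (i, j), c in poly.items() if i + j < k)
    # Counting only the certainly closer objects, fewer than k are: i < k.
    upper = sum(c for (i, j), c in poly.items() if i < k)
    return lower, upper


def classify(poly, k, tau):
    """Hypothetical probabilistic filter decision for PRkNN(Q, tau)."""
    lower, upper = rknn_bounds(poly, k)
    if upper < tau:
        return "not part of the result"
    if lower >= tau:
        return "part of the result"
    return "has to be investigated further"


for k, tau in [(1, 0.5), (2, 0.4), (2, 0.8)]:
    print(f"PR{k}NN(Q, {tau:.0%}):", classify(poly, k, tau))
```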


27 Options for Verification
- Consideration of all possible worlds (exponential)
- Adapting probabilistic nearest neighbour ranking [2] to the instance level of objects (polynomial)
- Monte-Carlo based (linear in the number of samples)
[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): (2009)
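The Monte-Carlo option can be sketched as follows, assuming discrete uncertain objects given as lists of (point, probability) instances, mutual independence and Euclidean distance; `mc_prob_q_in_knn` and the toy objects are hypothetical. Each sample draws one instance per object, i.e. one possible world, so the cost is linear in the number of samples (times the number of objects). For the toy data, enumerating the four possible worlds gives exactly 0.25.

```python
import math
import random


def mc_prob_q_in_knn(o_prime, q, others, k, samples=10_000, seed=0):
    """Monte-Carlo estimate of P(Q in kNN(O')).

    Objects are lists of (point, probability) instances, assumed independent.
    Each sample draws one instance per object (one possible world).
    """
    rng = random.Random(seed)

    def draw(obj):
        points, probs = zip(*obj)
        return rng.choices(points, weights=probs)[0]

    hits = 0
    for _ in range(samples):
        op, qp = draw(o_prime), draw(q)
        d_q = math.dist(op, qp)
        # Count objects strictly closer to O' than Q in the sampled world.
        closer = sum(1 for o in others if math.dist(op, draw(o)) < d_q)
        if closer < k:
            hits += 1
    return hits / samples


o_prime = [((0.0, 0.0), 1.0)]
q = [((2.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]
o1 = [((3.0, 0.0), 0.5), ((1.0, 0.0), 0.5)]
print(mc_prob_q_in_knn(o_prime, q, [o1], 1))  # close to 0.25
```

The standard error shrinks with the square root of the number of samples, which is why many samples are needed for thresholds close to the true probability.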

28 Spatial Filter

29 Probabilistic Filter

30 Comparison to other algorithms

31 Framework for PRkNN query processing
- Deriving probabilistic pruning bounds for single objects
- Accumulating these bounds using uncertain generating functions
- Cost model for choosing the optimal tree depth
- Comparison to existing algorithms for PRNN processing

32 Thanks! Questions?

33 Dependency on k

34 Problem of dependency
[Figure: uncertain objects O', Q, O1 and O2]

