1 A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong Kong University of Science and Technology Clear Water Bay, Kowloon Hong Kong, China {xlian, leichen}@cse.ust.hk VLDB 2011 @ Seattle

2 Motivation Example
Forest monitoring application
[Figure: forest; sensory data]

3 Motivation Example (cont'd)
Samples s_i collected from sensor node n_i

4 Motivation Example (cont'd)
Sensory data are uncertain and imprecise
[Figure: uncertainty regions]

5 Motivation Example (cont'd)
3 monitoring areas
[Figure: forest]

6 Motivation Example (cont'd)
3 monitoring areas
[Figure: forest; spatially close sensors vs. sensors far away]

7 Locally Correlated Sensory Data
Efficient Query Answering on Locally Correlated Uncertain Data
[Figure: Area 1, Area 2, Area 3]

8 Nearest Neighbor Queries on Locally Correlated Uncertain Data

9 Outline
• Introduction
• Model for Locally Correlated Uncertain Data
• Problem Definition
• Query Answering on Uncertain Data With Local Correlations
• Experimental Evaluation
• Conclusions

10 Introduction
Uncertain data are pervasive in real applications:
• Sensor networks
• RFID networks
• Location-based services
• Data integration
Existing works often assume independence among uncertain objects, but in practice uncertain objects exhibit correlations, in particular local correlations.

11 Data Model for Local Correlations
Data model:
• The set of uncertain objects consists of several locally correlated partitions (LCPs)
• Uncertain objects within each LCP are correlated with each other
• Uncertain objects from distinct LCPs are independent of each other

12 Data Model for Local Correlations (cont'd)
Bayesian network:
• Each vertex corresponds to a random variable
• Each vertex is associated with a conditional probability table (CPT)

13 Data Model for Local Correlations (cont'd)
The joint probability of variables:
• Join tuples in CPTs and multiply conditional probabilities
• Variable elimination
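To make the joint-probability step concrete, here is a minimal Python sketch for a hypothetical two-variable Bayesian network A -> B with binary values. The variable names, the CPT numbers, and the helper functions are illustrative assumptions and do not come from the slides. The marginal is computed by brute-force summation over A, which gives the same result that variable elimination reaches more efficiently on larger networks.

```python
from itertools import product

# Hypothetical network A -> B; all probabilities are made-up illustration values.
# CPT of A (no parents): P(A = a)
cpt_a = {0: 0.5, 1: 0.5}
# CPT of B given A: P(B = b | A = a), keyed by (b, a)
cpt_b_given_a = {(0, 0): 0.75, (1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.75}


def joint(a, b):
    """Joint probability P(A=a, B=b) = P(A=a) * P(B=b | A=a)."""
    return cpt_a[a] * cpt_b_given_a[(b, a)]


def marginal_b(b):
    """Marginal P(B=b), obtained by summing A out of the joint."""
    return sum(joint(a, b) for a in cpt_a)


# Joining the CPT tuples and multiplying yields the full joint table.
joint_table = {(a, b): joint(a, b) for a, b in product(cpt_a, (0, 1))}
```

The joint table has one entry per combination of values, so its size grows exponentially with the number of variables. That growth is why elimination orders matter in practice.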

14 Definition of LC-PNN Query
Probabilistic Nearest Neighbor Query on Uncertain and Locally Correlated Data (LC-PNN)
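The formal definition is not preserved in this transcript, so the following Python sketch only illustrates the general idea of a probabilistic nearest neighbor query under correlation. It assumes discrete samples and an explicitly enumerated joint distribution ("possible worlds"), and it returns objects whose nearest-neighbor probability is at least a threshold. The function names, the `threshold` parameter, the `>=` comparison, and the data are all hypothetical. Enumerating every world is a brute-force stand-in, not the method proposed in the talk.

```python
import math


def nn_probabilities(q, worlds):
    """Probability that each object is the nearest neighbor of q.

    worlds: list of (probability, {object_id: (x, y)}) pairs whose
    probabilities sum to 1. Correlation is captured by the joint worlds.
    """
    probs = {}
    for p, positions in worlds:
        # In each possible world every object has one fixed position.
        nearest = min(positions, key=lambda o: math.dist(q, positions[o]))
        probs[nearest] = probs.get(nearest, 0.0) + p
    return probs


def lc_pnn_bruteforce(q, worlds, threshold):
    """Objects whose nearest-neighbor probability reaches the threshold."""
    return {o for o, p in nn_probabilities(q, worlds).items() if p >= threshold}


# Hypothetical data: o1 and o2 are correlated (their samples co-occur).
worlds = [
    (0.75, {"o1": (1.0, 0.0), "o2": (2.0, 0.0)}),
    (0.25, {"o1": (3.0, 0.0), "o2": (2.5, 0.0)}),
]
answers = lc_pnn_bruteforce((0.0, 0.0), worlds, threshold=0.5)
```

Because the worlds are joint outcomes, the sketch needs no independence assumption. The price is a number of worlds that grows exponentially with the number of correlated objects.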

15 Challenges & Solutions
Challenges:
• The straightforward method of a linear scan is costly
• Computing the integration is expensive
• Data correlations must be dealt with
Filtering methods:
• Index pruning
• Candidate filtering with pre-computations

16 Index Pruning
Basic idea:
• Let best_so_far be the smallest maximum distance from query point q to any uncertain object seen so far
• Then, any object or index node e with mindist(q, e) > best_so_far can be safely pruned
[Figure: best_so_far]
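The pruning rule can be sketched in Python as follows, assuming each uncertainty region is approximated by an axis-aligned rectangle `(xlo, ylo, xhi, yhi)`. The rectangle representation, the function names, and the sample regions are assumptions made for illustration. The sketch scans a flat collection, whereas the slides apply the rule while traversing an index.

```python
import math


def mindist(q, rect):
    """Smallest Euclidean distance from point q to the rectangle."""
    xlo, ylo, xhi, yhi = rect
    dx = max(xlo - q[0], 0.0, q[0] - xhi)
    dy = max(ylo - q[1], 0.0, q[1] - yhi)
    return math.hypot(dx, dy)


def maxdist(q, rect):
    """Largest Euclidean distance from q to any point of the rectangle."""
    xlo, ylo, xhi, yhi = rect
    dx = max(abs(q[0] - xlo), abs(q[0] - xhi))
    dy = max(abs(q[1] - ylo), abs(q[1] - yhi))
    return math.hypot(dx, dy)


def prune(q, regions):
    """Keep only objects whose mindist does not exceed best_so_far."""
    # best_so_far: the smallest maximum distance over all objects.
    best_so_far = min(maxdist(q, r) for r in regions.values())
    return {o for o, r in regions.items() if mindist(q, r) <= best_so_far}


# Hypothetical uncertainty regions around a query at the origin.
regions = {"a": (0, 0, 3, 4), "b": (4, 0, 5, 1), "c": (6, 0, 7, 1)}
survivors = prune((0, 0), regions)
```

Object `c` is dropped because even its closest point lies farther from the query than the farthest point of `a`. It can therefore never be the nearest neighbor.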

17 Candidate Filtering with Pre-Computations
Basic idea:
• Obtain an upper bound, UB_Pr_LC-PNN(q, o_i), of the LC-PNN probability
• Object o_i can be safely pruned if UB_Pr_LC-PNN(q, o_i) is below the probability threshold of the query
How do we obtain the probability upper bound? It is derived from the formula of the LC-PNN probability, using pivots.

18 Derivation of Probability Upper Bound
[Figure: pivot piv_s5]

19 Range [min, max] of the Parameter
• Let min = [formula not captured] and max = [formula not captured]
• If the online value of the parameter is smaller than min, then JP_o(s_5) = 1
• If the online value of the parameter is greater than max, then JP_o(s_5) = 0
• Thus, we do not need to store pre-computations for parameter values outside the range [min, max]

20 Candidate Positions of Pivots
[Figure: sample s_5 and pivot piv_s5]

21 Selection of Pivot Positions
We provide a cost model that formalizes the filtering and refinement costs, and use it to obtain a good parameter value that achieves low query cost.

22 LC-PNN Query Procedure
• Index uncertain objects containing LCPs in an R-tree based index
• For an LC-PNN query:
  - While traversing the index, apply the index pruning and candidate filtering methods to remove false alarms
  - Refine the candidates and return the true query answers
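The filter-and-refine pattern behind this procedure can be sketched in Python as below. The functions `upper_bound` and `exact_probability` are hypothetical placeholders for the pivot-based bound and the exact LC-PNN probability, neither of which is reproduced here. The threshold name, the `>=` comparison, and the numbers are likewise assumptions. The index traversal is replaced by a plain loop over candidates.

```python
def filter_and_refine(candidates, upper_bound, exact_probability, threshold):
    """Return (answers, refined_count) for a threshold query.

    upper_bound(obj) must never underestimate exact_probability(obj);
    otherwise the filter could discard true answers.
    """
    answers = []
    refined = 0
    for obj in candidates:
        # Filtering: a cheap upper bound below the threshold rules obj out.
        if upper_bound(obj) < threshold:
            continue
        # Refinement: only surviving candidates pay for the exact computation.
        refined += 1
        if exact_probability(obj) >= threshold:
            answers.append(obj)
    return answers, refined


# Hypothetical precomputed values standing in for the real computations.
ub = {"o1": 0.9, "o2": 0.5, "o3": 0.1}
exact = {"o1": 0.7, "o2": 0.2, "o3": 0.1}
answers, refined = filter_and_refine(ub, ub.get, exact.get, threshold=0.3)
```

Here `o3` is removed by the filter alone, while `o2` survives the filter but fails refinement. The second case is the kind of false alarm that the refinement step removes.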

23 Experimental Evaluation
Data sets:
• Real data: California road network
• Synthetic data: lUeU, lUeG, lSeU, and lSeG
  - Generate center locations of LCPs with a Uniform or Skew distribution
  - Produce extent lengths of LCPs with a Uniform or Gaussian distribution
  - Within LCPs, randomly generate locally correlated uncertain objects with Bayesian networks
Competitor:
• Basic method [Cheng et al., SIGMOD 2003], which assumes that uncertain objects are independent
Measures:
• Wall clock time
• Speed-up ratio

24 LC-PNN Performance
[Figure: performance plotted against a parameter whose symbol was not captured]
Settings: extent length of LCP = [1, 3], data size N = 150K, average number of uncertain objects in an LCP = 5

25 Conclusions
• We proposed the problem of queries over locally correlated uncertain data, in particular the LC-PNN query, which is important in real applications
• We designed the index pruning method and, based on a proposed cost model, presented the candidate filtering method via offline pre-computations w.r.t. pivots
• We provided efficient query processing techniques to answer LC-PNN queries on locally correlated uncertain data, and discussed applying the same framework to other types of queries

26 Thank you! Q/A

