A*-tree: A Structure for Storage and Modeling of Uncertain Multidimensional Arrays Presented by: ZHANG Xiaofei March 2, 2011.


1 A*-tree: A Structure for Storage and Modeling of Uncertain Multidimensional Arrays Presented by: ZHANG Xiaofei March 2, 2011

2 Outline Motivation Modeling correlated uncertainty Construction of A*-tree Analysis of A*-tree Query processing Experiments

3 Outline Motivation Modeling correlated uncertainty Construction of A*-tree Analysis of A*-tree Query processing Experiments

4 Motivation Multidimensional arrays – Well suited to scientific and engineering applications – Logically equivalent to relational tables with dimension columns D1, D2 and attribute columns A1, A2, …, An. A cell of a multidimensional array: (A1, A2, …, Ak, D1, D2, …, Dd)

5 Motivation (Cont’d) Uncertain data – Inevitable – Two categories

6 Motivation (Cont’d) Correlated uncertain data – Example: geographically distributed sensors [Table: sensor readings with columns ID, X, Y, Z, T., H., P.] More application examples can be found in router network traffic analysis, quantization of images or sound, etc.

7 Outline Motivation Modeling correlated uncertainty Construction of A*-tree Analysis of A*-tree Query processing Experiments

8 Modeling Correlated Uncertainty PGM: Probabilistic Graphical Model – Bayesian network Limitations: 1) Requires prior knowledge and initial probabilities 2) Significant computational cost (NP-hard)

9 Modeling Correlated Uncertainty (Cont’d) PGM: Probabilistic Graphical Model – Markov Random Fields: a graphical model in which a set of random variables has a Markov property described by an undirected graph. Pros: can represent cyclic dependencies. Cons: cannot represent induced dependencies; NP-hard to compute

10 Modeling Correlated Uncertainty (Cont’d) Considering the locality of correlation – E.g., a 2-dimensional array

11 Outline Motivation Modeling correlated uncertainty Construction of A*-tree Analysis of A*-tree Query processing Experiments

12 Construction of A*-tree Basic A*-structure 1) k-ary tree: k = 2^d, where d is the number of correlated dimensions 2) Each leaf contains the joint distribution of the four neighboring cells it maps to 3) The joint distribution at each internal node is recursively defined
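The shape of the basic structure can be sketched in a few lines. This is only an illustration of the k = 2^d fan-out, assuming a square array whose side length is a power of two; the function names `fanout` and `tree_height` are hypothetical and do not come from the presentation.

```python
def fanout(d: int) -> int:
    """Children per internal node of a k-ary tree with k = 2^d."""
    return 2 ** d


def tree_height(side: int) -> int:
    """Number of halvings needed to reduce a side of length `side` to 1.

    Assumes `side` is a power of two (an assumption of this sketch).
    """
    if side < 1 or side & (side - 1):
        raise ValueError("side must be a power of two")
    return side.bit_length() - 1


if __name__ == "__main__":
    print(fanout(2))          # 2 correlated dimensions give a 4-ary tree
    print(tree_height(1024))  # halvings for a side of length 1024
```

For d = 2 the fan-out is 4, which agrees with the four neighboring cells per leaf on the slide.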

13 Construction of A*-tree (Cont’d) Joint distribution at a node over the cells X1, X2, X3, X4: Y = (X1 + X2 + X3 + X4) / 4, Xi = Y (1 + Fi). Fi has range k; the distribution table has r entries; l bits are used to represent each probability
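The relation between Y and the factors Fi can be checked on concrete numbers. The sketch below treats the Xi as plain sample values rather than distributions, which is a simplification of this example; `encode_block` and `decode_block` are hypothetical names.

```python
def encode_block(cells):
    """Return (Y, [F_i]) where Y is the mean and X_i = Y * (1 + F_i)."""
    y = sum(cells) / len(cells)
    if y == 0:
        raise ValueError("factors are undefined when the mean is zero")
    factors = [x / y - 1 for x in cells]
    return y, factors


def decode_block(y, factors):
    """Recover the cell values from Y and the factors F_i."""
    return [y * (1 + f) for f in factors]


if __name__ == "__main__":
    y, factors = encode_block([10.0, 12.0, 8.0, 10.0])
    print(y, factors)
    print(decode_block(y, factors))
```

Because Y is the mean of the Xi, the factors always sum to zero (up to floating-point error), so one of them is determined by the others.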

14 Construction of A*-tree (Cont’d) Extension of A*-tree – Uneven dimension sizes: a dimension of size 2k+1 is partitioned into k and k+1. The shorter dimension stops partitioning first, while partitioning of the longer dimension continues
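One partition step under this extension might look as follows. This is a sketch under assumptions: blocks are given as half-open index ranges per dimension, and a dimension stops splitting once its length reaches 1 (the slide does not state the exact stopping threshold). The names `split` and `partition` are hypothetical.

```python
from itertools import product


def split(lo, hi):
    """Split [lo, hi) in two; a length of 2k+1 yields parts of k and k+1."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid, hi)


def partition(ranges):
    """Split a block along every dimension that can still be split.

    `ranges` holds one (lo, hi) pair per dimension. Dimensions of length 1
    are kept whole, so a shorter dimension stops splitting before a longer one.
    """
    options = []
    for lo, hi in ranges:
        if hi - lo > 1:
            options.append(split(lo, hi))
        else:
            options.append(((lo, hi),))
    return [list(combo) for combo in product(*options)]


if __name__ == "__main__":
    print(split(0, 5))
    print(partition([(0, 1), (0, 5)]))
```

A block whose first dimension is already exhausted yields two children instead of four, which is how the fan-out shrinks once the shorter dimension stops.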

15 Construction of A*-tree (Cont’d) Extension of A*-tree – Basic uncertainty blocks of arbitrary shapes: each cell is intuitively the basic uncertainty block, but this granularity may be too fine. The initial identification of uncertainty blocks is specified by the user and the application

16 Outline Motivation Modeling correlated uncertainty Construction of A*-tree Analysis of A*-tree Query processing Experiments

17 Analysis of A*-tree Natural mapping from A*-tree to Bayesian Network

18 Analysis of A*-tree (Cont’d) How the A*-tree model expresses neighboring correlation – From the perspective of any random query, the average level at which cell correlation is encoded is low (efficient inference and accurate modeling)

19 Analysis of A*-tree (Cont’d) Neighboring cells and clustering distance – Definition

20 Analysis of A*-tree (Cont’d) Neighboring cells and clustering distance

21 Analysis of A*-tree (Cont’d) CD (Clustering Distance) – For any query that may return q pairs of neighboring cells: expected average CD, e.g., for a 1024×1024 array, h = 10, so E(avgCD) ≈ 1.01

22 Analysis of A*-tree (Cont’d) Accuracy vs. Efficiency – Double “flip” – Polynomial time scan O(d*n) – Consider basic uncertainty block

23 Outline Motivation Modeling correlated uncertainty Construction of A*-tree Analysis of A*-tree Query processing Experiments

24 Query Processing Monte Carlo based query processing – Sampling Q: SELECT AVG(brightness) FROM space_image WHERE Dis(x, y, z, 322, 108, 251) < 50
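A Monte Carlo estimate of a query like Q can be sketched generically. The code below assumes a caller-supplied `sample_world` function that draws one joint sample of all cell values; in the A*-tree setting those draws would come from the tree's joint distributions, which this sketch does not implement. The toy sampler, its coordinates, and its values are invented purely for illustration.

```python
import math
import random


def monte_carlo_avg(sample_world, center, radius, n_samples=1000, seed=0):
    """Estimate AVG over cells within `radius` of `center` by repeated sampling.

    `sample_world(rng)` must return a dict mapping coordinates to values
    for one sampled possible world (hypothetical interface).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        world = sample_world(rng)
        hits = [v for coord, v in world.items()
                if math.dist(coord, center) < radius]
        if hits:
            estimates.append(sum(hits) / len(hits))
    return sum(estimates) / len(estimates) if estimates else None


def toy_world(rng):
    """Toy sampler: a shared noise term makes the cells correlated."""
    shared = rng.gauss(0.0, 1.0)
    return {
        (322, 108, 251): 100.0 + shared,
        (330, 110, 250): 102.0 + shared,
        (900, 900, 900): 500.0 + shared,  # outside the query radius
    }


if __name__ == "__main__":
    print(monte_carlo_avg(toy_world, (322, 108, 251), 50))
```

Keeping the sampler behind a single function means the estimator itself does not need to know how the correlation is modeled.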

25 Query Processing (Cont’d) Compared with MRF – MRF requires sequential rounds of sampling – Each sampled node is computed from all the nodes

26 Query Processing (Cont’d) Other queries – COUNT, AVG and SUM: minimum set cover; built-in cell-count function; effective query answering
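One assumed reading of the set-cover idea is that a query range is covered by as few tree nodes as possible, and the per-node cell counts and averages are then combined. The sketch below shows this in one dimension for an array whose length is a power of two; `cover`, `count_cells`, `sum_from_nodes`, and the `node_mean` callback are hypothetical names, not the presentation's API.

```python
def cover(lo, hi, node_lo, node_hi):
    """Cover [lo, hi) with the largest aligned nodes inside [node_lo, node_hi)."""
    if hi <= node_lo or node_hi <= lo:
        return []                      # node lies outside the query
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]    # node lies fully inside the query
    mid = (node_lo + node_hi) // 2
    return cover(lo, hi, node_lo, mid) + cover(lo, hi, mid, node_hi)


def count_cells(nodes):
    """COUNT: total number of cells under the covering nodes."""
    return sum(hi - lo for lo, hi in nodes)


def sum_from_nodes(nodes, node_mean):
    """SUM: each node contributes its average times its cell count."""
    return sum(node_mean(n) * (n[1] - n[0]) for n in nodes)


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    mean = lambda n: sum(values[n[0]:n[1]]) / (n[1] - n[0])
    nodes = cover(2, 7, 0, 8)
    print(nodes, count_cells(nodes), sum_from_nodes(nodes, mean))
```

AVG then follows as SUM divided by COUNT, so all three aggregates only need the covering nodes rather than every individual cell.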

27 Outline Motivation Modeling correlated uncertainty Construction of A*-tree Analysis of A*-tree Query processing Experiments

28 Data set description Evaluations – Accuracy of modeling the underlying joint distribution – Execution time – Aggregate query – Space cost [Table: data set with columns ID, X, Y, T, Temperature]

29 Experiments (Cont’d) Accuracy

30 Experiments (Cont’d) Accuracy

31 Experiments (Cont’d) Execution time

32 Experiments (Cont’d) Aggregate query and space cost

33 Thank you! Q&A

