Published by Christian Nichols. Modified over 3 years ago.

1
Copyright ©2004 Carlos Guestrin www.cs.cmu.edu/~guestrin VLDB 2004
Efficient Data Acquisition in Sensor Networks
Presented by Kedar Bellare (slides adapted from Carlos Guestrin)

2
Papers
- A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, W. Hong. "Model-Driven Data Acquisition in Sensor Networks." In the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, August 2004.
- A. Deshpande, C. Guestrin, W. Hong, S. Madden. "Exploiting Correlated Attributes in Acquisitional Query Processing." In the 21st International Conference on Data Engineering (ICDE 2005), Tokyo, Japan, April 2005.

3
Model-driven Data Acquisition in Sensor Networks
Amol Deshpande (1,4), Carlos Guestrin (4,2), Sam Madden (4,3), Joe Hellerstein (1,4), Wei Hong (4)
(1) UC Berkeley, (2) Carnegie Mellon University, (3) MIT, (4) Intel Research - Berkeley

4
Analogy: Sensor net as a database
[Diagram: SQL-style query -> TinyDB; distribute query, then collect query answer or data at every time step]
- Declarative interface: sensor nets are not just for PhDs; decreases deployment time
- Data aggregation: can reduce communication

5
Limitations of existing approach
[Diagram: SQL-style query -> TinyDB; distribute query, then collect data at every time step; a new query restarts the process]
- The process is redone every time the query changes
- Query distribution: every node must receive the query (even when only an approximate answer is needed)
- Data collection: every node must wake up at every time step
- Data loss is ignored
- No quality guarantees
- Data inefficient: correlations are ignored

6
Sensor net data is correlated
- Spatial-temporal correlation
- Inter-attribute correlation
- Data is not i.i.d., so missing data shouldn't be ignored
- Observing one sensor gives information about other sensors (and about future values)
- Observing one attribute gives information about other attributes

7
Model-driven data acquisition: overview
[Diagram: SQL-style query with desired confidence -> probabilistic model -> data gathering plan -> condition on new observations -> posterior belief at time t, reused for a new query]
Strengths of model-based data acquisition:
- Observe fewer attributes
- Exploit correlations
- Reuse information between queries
- Directly deal with missing data
- Answer more complex (probabilistic) queries

8
Benefits of Statistical Models
- More robust interpretation of sensor net readings
  - Account for biases in spatial sampling
  - Identify faulty sensors
  - Extrapolate missing values of sensors
- More efficient data acquisition
  - Fewer attributes to observe
  - Reuse of information between queries
  - Exploit correlations: acquire data only when the model cannot answer the query with acceptable confidence
- More complex queries
  - Probabilistic queries

9
Issues introduced by Models
- Optimization problem: given a query and a model, choose the data acquisition plan that best refines the answer
- Two dependencies: the statistical benefit of acquiring a reading AND the system costs
- Any non-trivial statistical model can capture the first dependency
  - Improving model-driven estimates for nearby nodes
- Connectivity of the wireless sensor network affects the second dependency

10
Probabilistic models and queries
User's perspective:
- Query: SELECT nodeId, temp ± 0.5°C, conf(.95) FROM sensors WHERE nodeId in {1..8}
- System selects and observes a subset of nodes. Observed nodes: {3,6,8}
- Query result:

Node    1      2      3      4      5      6      7      8
Temp.   17.3   18.1   17.4   16.1   19.2   21.3   17.5   16.3
Conf.   98%    95%    100%   99%    95%    100%   98%    100%

11
Probabilistic models - Illustration
- Node 0: interface between the user and the sensor net
- No need to query the entire network
- The model chooses to observe voltage even though the query is on temperature

12
Probabilistic models (Contd.)
Why did the model choose to observe voltage instead of temperature?
- Correlation in value
- Cost differential

13
Probabilistic models and queries
- Joint distribution P(X_1,…,X_n), learned from historical data
- Probabilistic query. Example: value of X_2 ± ε with prob. > 1-δ
- Prob. below 1-δ? Observe attributes. Example: observe X_1 = 18 and use P(X_2 | X_1 = 18)
- Higher prob., could answer the query
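The conditioning step on this slide can be sketched for the simplest case, a bivariate Gaussian over X_1 and X_2. This is only an illustration: the means, variances, covariance and the error bound `EPS` below are hypothetical values, not taken from the slides, and the function names are chosen here for readability.

```python
import math

def condition_on_x1(mu1, mu2, var1, var2, cov12, x1_observed):
    """Mean and variance of X2 given X1 = x1_observed (bivariate Gaussian)."""
    posterior_mean = mu2 + (cov12 / var1) * (x1_observed - mu1)
    posterior_var = var2 - (cov12 ** 2) / var1
    return posterior_mean, posterior_var

def confidence_within(var, eps):
    """P(|X - E[X]| <= eps) for a Gaussian X with the given variance."""
    return math.erf(eps / math.sqrt(2.0 * var))

# Hypothetical model parameters (assumed for illustration only).
MU1, MU2, VAR1, VAR2, COV12 = 20.0, 20.0, 4.0, 4.0, 3.6
EPS = 0.5

# Confidence in "X2 within EPS of its mean" before any observation.
prior_conf = confidence_within(VAR2, EPS)

# Observe X1 = 18 and condition the model on it.
post_mean, post_var = condition_on_x1(MU1, MU2, VAR1, VAR2, COV12, 18.0)
post_conf = confidence_within(post_var, EPS)

print(f"prior confidence: {prior_conf:.3f}")
print(f"posterior mean: {post_mean:.2f}, posterior confidence: {post_conf:.3f}")
```

Because X_1 and X_2 are strongly correlated in this toy model, observing X_1 shrinks the variance of X_2 and raises the confidence of the value query without reading X_2 itself.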

14
Dynamic models: filtering
- Joint distribution at time t, learned from historical data
- Observe attributes. Example: observe X_1 = 18
- Condition on the observations at time t
- Fewer observations are needed in future queries
- Example: Kalman filter

15
Kalman Filtering
- A transition model is maintained for each hour of the day, indexed by mod(t,24)
- The transition model describes the evolution of the system from time t to time t+1
- Compute the predicted distribution at time t+1 using simple marginalization
- Next, obtain the posterior distribution by conditioning on the observations, including those at time t+1
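The predict-then-condition cycle can be sketched with a scalar Kalman filter. This is a simplified illustration, not the system's implementation: it assumes a one-dimensional linear-Gaussian state, a noisy reading with variance `r`, and a hypothetical table `TRANSITIONS` of per-hour parameters indexed by t mod 24.

```python
def predict(mean, var, a, q):
    """Marginalize out the state at time t, assuming x_{t+1} = a * x_t + noise,
    with noise ~ N(0, q)."""
    return a * mean, a * a * var + q

def update(mean, var, reading, r):
    """Condition the predicted belief on a reading with noise variance r."""
    gain = var / (var + r)
    return mean + gain * (reading - mean), (1.0 - gain) * var

# Hypothetical per-hour transition parameters (a, q), indexed by t mod 24.
TRANSITIONS = {hour: (1.0, 0.5) for hour in range(24)}

def step(mean, var, t, reading=None, r=0.5):
    """Advance the belief by one time step; condition only if a reading exists."""
    a, q = TRANSITIONS[t % 24]
    mean, var = predict(mean, var, a, q)
    if reading is not None:
        mean, var = update(mean, var, reading, r)
    return mean, var

# No observation at t=0: the variance grows.
after_predict = step(20.0, 1.0, t=0)
# One observation at t=1: the variance shrinks again.
after_update = step(*after_predict, t=1, reading=21.0)
print(after_predict, after_update)
```

Steps without a reading only widen the belief, which is what lets the planner skip observations while the model is still confident enough.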

16
Kalman Filtering (Contd.)
- The transition model is learned by first computing the joint density over the attributes at times t and t+1
- The conditioning rule is then used to compute the transition model
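For jointly Gaussian variables the conditioning rule has a standard closed form. As a sketch, with block notation chosen here for illustration (the symbols are not taken from the slides), a joint density over the states at times t and t+1 yields the transition model as follows:

```latex
\begin{aligned}
p(x^{t}, x^{t+1}) &= \mathcal{N}\!\left(
  \begin{bmatrix}\mu_{t}\\ \mu_{t+1}\end{bmatrix},
  \begin{bmatrix}\Sigma_{t,t} & \Sigma_{t,t+1}\\ \Sigma_{t+1,t} & \Sigma_{t+1,t+1}\end{bmatrix}
\right),\\
p(x^{t+1}\mid x^{t}) &= \mathcal{N}\!\left(
  \mu_{t+1} + \Sigma_{t+1,t}\,\Sigma_{t,t}^{-1}\,(x^{t}-\mu_{t}),\;
  \Sigma_{t+1,t+1} - \Sigma_{t+1,t}\,\Sigma_{t,t}^{-1}\,\Sigma_{t,t+1}
\right).
\end{aligned}
```

The conditional mean is linear in x^t and the conditional covariance does not depend on x^t, which is why the resulting transition model fits the Kalman filter framework.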

17
Supported queries
- Value query: X_i ± ε with prob. at least 1-δ
- SELECT and range query: X_i ∈ [a,b] with prob. at least 1-δ
  - Which sensors have temperature greater than 25°C?
- Aggregation: average ± ε of a subset of attributes with prob. > 1-δ
  - Combine aggregation and selection
  - What is the probability that more than 10 sensors have temperature greater than 25°C?
- Queries require solutions to integrals
  - Many queries are computed in closed form
  - Some require numerical integration or sampling

18
Probabilistic queries
- Range queries
  - First marginalize the multivariate Gaussian, which is done by dropping the entries being marginalized
  - Compute the confidence of the query using the error function
  - If the confidence is too low, make an observation to improve it
  - Conditioning a Gaussian on the values of some attributes gives another Gaussian
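A minimal sketch of the two range-query steps, marginalizing by dropping entries and then computing the confidence with the error function. The three-attribute mean vector and covariance matrix are hypothetical values assumed for illustration.

```python
import math

def marginalize(mean, cov, keep):
    """Marginal of a multivariate Gaussian over the indices in `keep`:
    drop the other entries of the mean vector and covariance matrix."""
    sub_mean = [mean[i] for i in keep]
    sub_cov = [[cov[i][j] for j in keep] for i in keep]
    return sub_mean, sub_cov

def range_probability(mu, var, a, b):
    """P(a <= X <= b) for X ~ N(mu, var), via the error function."""
    scale = math.sqrt(2.0 * var)
    return 0.5 * (math.erf((b - mu) / scale) - math.erf((a - mu) / scale))

# Hypothetical model over three attributes (assumed for illustration).
mean = [20.0, 22.0, 18.0]
cov = [[1.0, 0.5, 0.2],
       [0.5, 2.0, 0.3],
       [0.2, 0.3, 1.5]]

# Range query on attribute 0: is X_0 in [19, 21]?
(mu0,), ((var0,),) = marginalize(mean, cov, keep=[0])
p_in_range = range_probability(mu0, var0, 19.0, 21.0)

# The answer is reported with the confidence of the more likely outcome.
confidence = max(p_in_range, 1.0 - p_in_range)
print(f"P(in range) = {p_in_range:.4f}, confidence = {confidence:.4f}")
```

If `confidence` falls below the level requested in the query, the system would acquire an observation and repeat the computation on the conditioned Gaussian.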

19
Probabilistic queries (Contd.)
- Value query
  - Easy to estimate: the posterior mean can be obtained directly from the mean vector conditioned on the observed values o
  - Also determine the confidence interval for the given error bound, and observe attributes if needed

20
Probabilistic queries (Contd.)
- Average aggregates
  - If we are interested in the average over a set of attributes A, define the random variable Y = (1/|A|) Σ_{i∈A} X_i
  - The pdf of Y is given by P(Y = y | o) = ∫ P(x | o) 1[(1/|A|) Σ_{i∈A} x_i = y] dx, where 1[·] is the indicator function
  - Once P(Y = y | o) is defined, simply pose a value query for the random variable Y
- Other complex aggregate queries can be answered similarly by constructing new random variables
- The pdf of a sum of Gaussians is a Gaussian, where:
  - the mean is the mean of the expected values
  - the variance is a weighted sum of the variances of the X_i plus the covariances of X_i and X_j
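For Gaussian models the average query has a closed form, sketched below. The model parameters and the error bound `EPS` are hypothetical values assumed for illustration; the average over a subset of size k has mean equal to the average of the means and variance equal to the sum of all pairwise covariances (variances included) divided by k squared.

```python
import math

def average_distribution(mean, cov, subset):
    """Mean and variance of Y = (1/k) * sum of X_i over the subset,
    for jointly Gaussian X with the given mean vector and covariance matrix."""
    k = len(subset)
    avg_mean = sum(mean[i] for i in subset) / k
    # Sum over all pairs (i, j) covers both variances and covariances.
    avg_var = sum(cov[i][j] for i in subset for j in subset) / (k * k)
    return avg_mean, avg_var

# Hypothetical model over three attributes (assumed for illustration).
mean = [20.0, 22.0, 18.0]
cov = [[1.0, 0.5, 0.2],
       [0.5, 2.0, 0.3],
       [0.2, 0.3, 1.5]]

avg_mean, avg_var = average_distribution(mean, cov, subset=[0, 1])

# Treat Y like any other attribute: a value query with error bound EPS.
EPS = 0.5
avg_conf = math.erf(EPS / math.sqrt(2.0 * avg_var))
print(f"average = {avg_mean:.2f} ± {EPS} with confidence {avg_conf:.3f}")
```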

21
Model-driven data acquisition: overview
[Diagram: SQL-style query with desired confidence -> probabilistic model -> data gathering plan -> condition on new observations -> posterior belief at time t]
- What sensors do we observe?
- How do we collect observations?

22
Acquisition costs
- Attributes have different acquisition costs
- Exploit correlation through the probabilistic model
- Must consider networking cost
[Figure: network of nodes 1-6, annotated "cheaper?"]

23
Network model and plan format
- Assume a known (quasi-static) network topology
- Define the traversal using a (1.5-approximate) TSP
- C_t(S) is the expected cost of the TSP (lossy communication)
- Cost of collecting a subset S of sensor values: C(S) = C_a(S) + C_t(S)
- Goal: find a subset S that is sufficient to answer the query at minimum cost C(S)
[Figure: network of nodes 1-12]

24
Choosing observation plan
- Is a subset S sufficient? Query: X_i ∈ [a,b] with prob. > 1-δ
- If we observe S = s: R_i(s) = max{ P(X_i ∈ [a,b] | s), 1 - P(X_i ∈ [a,b] | s) }
- The value of S is unknown before observing: R_i(S) = ∫ P(s) R_i(s) ds
- Optimization problem: minimize C(S) over subsets S, subject to R_i(S) ≥ 1-δ
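The integral for R_i(S) can be approximated by sampling, as in the following sketch. It assumes a bivariate Gaussian in which S is a single observable attribute and X is the queried attribute; all parameter values, the sample count and the function names are hypothetical choices made for illustration, and Monte Carlo is used here in place of any closed-form or numerical integration the system may employ.

```python
import math
import random

def range_probability(mu, var, a, b):
    """P(a <= X <= b) for X ~ N(mu, var)."""
    scale = math.sqrt(2.0 * var)
    return 0.5 * (math.erf((b - mu) / scale) - math.erf((a - mu) / scale))

def expected_benefit(mu_s, mu_x, var_s, var_x, cov_sx, a, b,
                     samples=5000, seed=0):
    """Monte Carlo estimate of R(S): the integral of P(s) * R(s) over s,
    where R(s) = max(P(X in [a,b] | s), 1 - P(X in [a,b] | s))."""
    rng = random.Random(seed)
    post_var = var_x - cov_sx ** 2 / var_s   # does not depend on s
    total = 0.0
    for _ in range(samples):
        s = rng.gauss(mu_s, math.sqrt(var_s))            # draw s ~ P(s)
        post_mean = mu_x + (cov_sx / var_s) * (s - mu_s)  # condition on S = s
        p = range_probability(post_mean, post_var, a, b)
        total += max(p, 1.0 - p)
    return total / samples

# Hypothetical parameters (assumed for illustration).
MU_S, MU_X, VAR_S, VAR_X, COV_SX = 20.0, 20.0, 4.0, 4.0, 3.6
A, B = 19.0, 21.0

p_prior = range_probability(MU_X, VAR_X, A, B)
benefit_without = max(p_prior, 1.0 - p_prior)   # confidence with no observation
benefit_with = expected_benefit(MU_S, MU_X, VAR_S, VAR_X, COV_SX, A, B)
print(f"R without observing S: {benefit_without:.3f}")
print(f"R(S) estimate:         {benefit_with:.3f}")
```

Comparing `benefit_with` against the target 1-δ is the sufficiency test for the subset; the planner then weighs that benefit against the cost C(S).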

25
Observation Plan
- The general optimization problem is NP-hard
- Two algorithms:
  - Exhaustive search: exponential
  - Greedy search:
    - Begin with an empty observation plan
    - Compute the benefit R and cost C of adding each candidate attribute
    - If the desired confidence is reached, choose the attribute with minimum cost
    - Else add the attribute with the maximum benefit/cost ratio
    - Repeat until the desired confidence is reached
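The greedy search can be sketched as follows. This is one reading of the steps on the slide, not the authors' code: `benefit` and `cost` are caller-supplied functions over a candidate plan, the ratio is taken as the plan's total benefit over its total cost (an assumption), and the toy gains and costs are invented for the example.

```python
def greedy_plan(candidates, benefit, cost, target):
    """Greedily build an observation plan until benefit(plan) >= target.

    benefit(plan) and cost(plan) take a list of attributes; cost must be > 0
    for any non-empty plan.
    """
    plan = []
    remaining = list(candidates)
    while benefit(plan) < target and remaining:
        scored = [(attr, benefit(plan + [attr]), cost(plan + [attr]))
                  for attr in remaining]
        sufficient = [entry for entry in scored if entry[1] >= target]
        if sufficient:
            # Some attribute reaches the target: take the cheapest such one.
            chosen = min(sufficient, key=lambda entry: entry[2])
        else:
            # Otherwise take the best benefit/cost ratio.
            chosen = max(scored, key=lambda entry: entry[1] / entry[2])
        plan.append(chosen[0])
        remaining.remove(chosen[0])
    return plan

# Hypothetical toy model: additive confidence gains and per-attribute costs.
GAINS = {"voltage": 0.2, "temp": 0.3, "light": 0.1}
COSTS = {"voltage": 1.0, "temp": 10.0, "light": 2.0}

def toy_benefit(plan):
    return min(1.0, 0.6 + sum(GAINS[attr] for attr in plan))

def toy_cost(plan):
    return sum(COSTS[attr] for attr in plan)

plan = greedy_plan(GAINS, toy_benefit, toy_cost, target=0.95)
print(plan, toy_cost(plan))
```

In this toy run the cheap voltage reading is added first because of its ratio, and the expensive temperature reading is added only because no cheaper attribute reaches the target. Greedy search gives no optimality guarantee, which is the trade-off against the exponential exhaustive search.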

26
BBQ system
[Diagram: SQL-style query with desired confidence -> probabilistic model -> data gathering plan -> condition on new observations -> posterior belief at time t]
- Queries: value, range, average
- Probabilistic model: multivariate Gaussians, learned from historical data
- Conditioning on new observations: equivalent to a Kalman filter; simple matrix operations
- Data gathering plan: exhaustive or greedy search; factor 1.5 TSP approximation

27
Exploiting correlated attributes
- Extension of a single plan to a conditional plan
- Useful when:
  - the cost of acquisition is non-negligible
  - correlations exist between one or more attributes
  - queries are multi-predicate range queries
- Query evaluation can become cheaper by observing additional attributes
  - If the additional attributes are low-cost
  - A tuple can be rejected with high confidence without expensive acquisition: substantial performance gains

28
Conditional Plans

29
Conditional Plans (Contd.)
- Simple binary decision trees
- Each interior node n_j specifies a binary conditioning predicate (which depends on only a single attribute value)
- Choose the conditional plan with minimum expected cost

30
Cost of Conditional Plans
- Optimal plan
- Traversal cost
- Expected plan cost
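The expected cost of a conditional plan can be sketched as a recursion over a binary decision tree. This is an illustrative model only: the `PlanNode` structure, the branch probabilities and the attribute costs are hypothetical, and the sketch counts acquisition cost alone, charging each attribute at most once per root-to-leaf path.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlanNode:
    """Interior node: acquire `attribute`, test a binary predicate on it.
    A child of None means the tuple is decided (accepted or rejected)."""
    attribute: str
    p_true: float                        # probability the predicate holds here
    if_true: Optional["PlanNode"] = None
    if_false: Optional["PlanNode"] = None

def expected_cost(node, costs, acquired=frozenset()):
    """Expected acquisition cost of the subtree rooted at `node`."""
    if node is None:
        return 0.0
    # An attribute already read earlier on this path costs nothing more.
    pay = 0.0 if node.attribute in acquired else costs[node.attribute]
    seen = acquired | {node.attribute}
    return (pay
            + node.p_true * expected_cost(node.if_true, costs, seen)
            + (1.0 - node.p_true) * expected_cost(node.if_false, costs, seen))

# Hypothetical acquisition costs (assumed for illustration).
COSTS = {"voltage": 1.0, "temp": 10.0}

# Plan A: always acquire the expensive attribute.
plan_a = PlanNode("temp", p_true=0.5)

# Plan B: test cheap voltage first; with probability 0.7 the tuple is
# rejected immediately, otherwise temperature is acquired as well.
plan_b = PlanNode("voltage", p_true=0.7, if_false=PlanNode("temp", p_true=0.5))

cost_a = expected_cost(plan_a, COSTS)
cost_b = expected_cost(plan_b, COSTS)
print(cost_a, cost_b)
```

Plan B pays for the cheap attribute on every tuple but avoids the expensive one most of the time, which is the effect conditional plans exploit.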

31
Issues in Conditional Plans
- Need to estimate P(T_j | t)
  - The naïve method is to scan the historical data for each computation, which is expensive
- Cost model
  - Only acquisition cost is taken into account
  - Transmission cost and the size of the plan (which must fit in RAM) should be added to the cost model
  - The authors focus only on limiting plan sizes

32
Architecture

33
Optimal Conditional Plan
- The problem is hard
  - Even if we are given the conditional probabilities (by an oracle), the complexity is #P-hard (reduction from 3-SAT)
  - Even if we try to optimize our plan with respect to a set D of d tuples, the problem is NP-complete (reduction from the complexity of binary decision trees)
- Exhaustive search
  - Depth-first search
  - With caching and pruning
- Also heuristic solutions using greedy search

34
Example: Intel Berkeley Lab deployment

35
Experimental results
- Redwood trees and Intel Lab datasets
- Learned models from data
  - Static model
  - Dynamic model: Kalman filter, time-indexed transition probabilities
- Evaluated on a wide range of queries

36
Cost versus Confidence level

37
Obtaining approximate values
Query: true temperature value ± epsilon with confidence 95%

38
Approximate range queries
Query: temperature in [T_1, T_2] with confidence 95%

39
Comparison to other methods

40
Intel Lab traversals

41
BBQ system
[Diagram: SQL-style query with desired confidence -> probabilistic model -> data gathering plan -> condition on new observations -> posterior belief at time t]
- Queries: value, range, average
- Probabilistic model: multivariate Gaussians, learned from historical data
- Conditioning on new observations: equivalent to a Kalman filter; simple matrix operations
- Data gathering plan: exhaustive or greedy search; factor 1.5 TSP approximation
Extensions:
- More complex queries
- Other probabilistic models
- More advanced planning
- Outlier detection
- Dynamic networks
- Continuous queries
- …

42
Conclusions
- Model-driven data acquisition
  - Observe fewer attributes
  - Exploit correlations
  - Reuse information between queries
  - Directly deal with missing data
  - Answer more complex (probabilistic) queries
- Basis for future sensor network systems

43
Discussion Questions
- What other models apart from multivariate Gaussians can be used? If other models are used, will their solutions be in closed form?
- Model-driven techniques are suitable only if the test data is the same as the training data. Will the solution be adaptable if the test region is different from the training region?
- The optimization problem is hard and expensive to compute even with heuristics. Will it work for real-time data analysis?
- Outlier detection is not supported for model-driven acquisition. Is there any way to do it for model-based sensor networks?
- If the confidence needed on the query is generally low, might some nodes never be queried at all?
