
1 Near-Optimal Sensor Placements in Gaussian Processes. Carlos Guestrin, Andreas Krause, Ajit Singh (Carnegie Mellon University)

2 Sensor placement applications: monitoring of spatial phenomena (temperature, precipitation, drilling oil wells, ...); also active learning, experimental design, ... The results presented today are not limited to two dimensions. Examples: precipitation data from the Pacific NW, temperature data from a sensor network.

3 Deploying sensors. This deployment uses evenly distributed sensors, but what are the optimal placements? That is, we want to solve a combinatorial (non-myopic) optimization. Chicken-and-egg problem: with no data we need assumptions about the distribution, and without knowing the distribution we don't know where to place the sensors. Considered in computer science (cf. [Hochbaum & Maass '85]) and in spatial statistics (cf. [Cressie '91]).

4 Strong assumption – sensing radius. Each node predicts the values of positions within some radius, so placement becomes a covering problem. The problem is NP-complete, but there are good algorithms with PTAS (1+ε)-approximation guarantees [Hochbaum & Maass '85]. Unfortunately, this approach is usually not useful: the assumption is wrong on real data! For example…

5 Spatial correlation. Precipitation data from the Pacific NW shows non-local, non-circular correlations, with complex positive and negative correlations.

6 Complex, noisy correlations. The sensing “region” is complex and uneven; in fact, we have noisy correlations rather than a sensing region.

7 Combining multiple sources of information. Individually, sensors are bad predictors; combined, their information is more reliable. How do we combine information? This is the focus of spatial statistics.

8 Gaussian process (GP) – intuition. Axes: x – position, y – temperature. A GP is non-parametric, represents uncertainty, and supports complex correlation functions (kernels). The plot shows the uncertainty after observations are made: we are more sure near the observations and less sure far from them.

9 Gaussian processes. A kernel (covariance) function K(x, x') defines the GP. Prediction after observing a set of sensors A: conditioning yields a posterior mean temperature and a posterior variance at every remaining location.
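
As a concrete illustration of this conditioning step, here is a minimal GP-posterior sketch in Python with a squared-exponential kernel; the kernel choice, length-scale, noise level, and data are assumptions for illustration, not the ones used in the talk.

```python
# Minimal sketch of GP conditioning (standard formulas, not the authors' code).
import numpy as np

def kernel(X1, X2, lengthscale=1.0, var=1.0):
    """Squared-exponential covariance between two sets of 1-D locations."""
    d = X1[:, None] - X2[None, :]
    return var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X_obs, y_obs, X_star, noise=1e-2):
    """Posterior mean and variance at X_star given noisy observations at X_obs."""
    K_aa = kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_sa = kernel(X_star, X_obs)
    K_ss = kernel(X_star, X_star)
    L = np.linalg.cholesky(K_aa)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_sa @ alpha
    v = np.linalg.solve(L, K_sa.T)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# Example: condition on three temperature readings (made-up data).
X_obs = np.array([0.0, 1.0, 3.0])
y_obs = np.array([20.0, 21.5, 19.0])
X_star = np.linspace(0, 4, 9)
mean, var = gp_posterior(X_obs, y_obs, X_star)
```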

10 Gaussian processes for sensor placement. Goal: find the sensor placement with the least uncertainty after observations (illustrated by the posterior mean temperature and posterior variance). The problem is still NP-complete, so we need an approximation.

11 Non-myopic placements. Consider myopically selecting the most uncertain location A_1, then the most uncertain location given A_1, ..., then the most uncertain location given A_1 ... A_{k-1}. This can be seen as an attempt to non-myopically maximize H(A_1) + H(A_2 | {A_1}) + ... + H(A_k | {A_1 ... A_{k-1}}), which is exactly the joint entropy H(A) = H({A_1 ... A_k}).

12 Entropy criterion (cf. [Cressie '91]). A ← ∅; for i = 1 to k, add to A the location X_i with the highest uncertainty given the current set A (entropy: “X is different”). Temperature data placements and uncertainty (entropy) plot: entropy places sensors along the borders. Problems: the entropy criterion “wastes” information, as observed by [O'Hagan '78]; it is indirect (it doesn't consider the sensing region); and it gives no formal non-myopic guarantees.
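
A minimal sketch of this greedy entropy loop (illustration only, not the authors' code): for a GP, H(X | A) is a monotone function of the posterior variance of X given A, so the loop simply picks the location with the largest posterior variance. The kernel matrix K and noise level are assumed inputs.

```python
# Greedy entropy criterion: repeatedly add the most uncertain location.
import numpy as np

def greedy_entropy_placement(K, k, noise=1e-2):
    """K: n x n covariance matrix over candidate locations; returns k indices."""
    n = K.shape[0]
    A = []
    for _ in range(k):
        best, best_var = None, -np.inf
        for x in range(n):
            if x in A:
                continue
            if A:
                K_aa = K[np.ix_(A, A)] + noise * np.eye(len(A))
                K_xa = K[x, A]
                var = K[x, x] - K_xa @ np.linalg.solve(K_aa, K_xa)
            else:
                var = K[x, x]
            if var > best_var:
                best, best_var = x, var
        A.append(best)
    return A
```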

13 Proposed objective function: mutual information. Given the locations of interest V, find locations A ⊆ V maximizing the mutual information I(A; V \ A), i.e., the uncertainty of the uninstrumented locations before sensing minus their uncertainty after sensing. Intuitive greedy rule: add the location X with high uncertainty given A (X is different) and low uncertainty given the rest (X is informative). An intuitive criterion – locations that are both different and informative – and we give formal non-myopic guarantees. Temperature data placements: entropy vs. mutual information.
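
A minimal sketch of the greedy mutual-information rule (illustration only, not the authors' code): for Gaussians, H(X|A) − H(X|B) = ½ log(σ²_{X|A} / σ²_{X|B}), so maximizing the difference of entropies is equivalent to maximizing the ratio of posterior variances. K and the noise level are assumed inputs.

```python
# Greedy rule: pick X maximizing H(X | A) - H(X | V \ (A + {X})).
import numpy as np

def cond_var(K, x, S, noise=1e-2):
    """Posterior variance of location x given the locations in S."""
    if not S:
        return K[x, x]
    K_ss = K[np.ix_(S, S)] + noise * np.eye(len(S))
    K_xs = K[x, S]
    return K[x, x] - K_xs @ np.linalg.solve(K_ss, K_xs)

def greedy_mi_placement(K, k, noise=1e-2):
    """K: covariance over all candidate locations V; returns k chosen indices."""
    n = K.shape[0]
    A = []
    for _ in range(k):
        rest = [i for i in range(n) if i not in A]
        def gain(x):
            others = [i for i in rest if i != x]
            # Variance ratio = exp(2 * (H(x|A) - H(x|rest))), so same argmax.
            return cond_var(K, x, A, noise) / cond_var(K, x, others, noise)
        best = max((x for x in rest), key=gain)
        A.append(best)
    return A
```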

14 An important observation. (Diagram: sensor locations T_1 ... T_5.) Selecting T_1 tells us something about T_2 and T_5; selecting T_3 tells us something about T_2 and T_4; now adding T_2 would not help much. In many cases, new information is worth less if we already know more (diminishing returns)!

15 Submodular set functions. Submodular set functions are a natural formalism for this idea: f(A ∪ {X}) − f(A) ≥ f(B ∪ {X}) − f(B) for A ⊆ B. Maximization of submodular functions is NP-hard, but…

16 How can we leverage submodularity? Theorem [Nemhauser et al. '78]: for monotone submodular functions, the greedy algorithm guarantees a (1 − 1/e) OPT approximation, i.e., ~63% of optimal.
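
A generic greedy sketch of the kind this theorem analyzes (illustration only); the coverage function and sets in the example are made up, chosen because coverage is monotone and submodular.

```python
# Greedy maximization of a set function f over a ground set, picking k elements.
# The (1 - 1/e) guarantee of Nemhauser et al. holds when f is monotone submodular.
def greedy_maximize(f, ground_set, k):
    """Greedily pick k elements, each time adding the one with the largest gain."""
    A = set()
    for _ in range(k):
        best = max((x for x in ground_set if x not in A),
                   key=lambda x: f(A | {x}) - f(A))
        A.add(best)
    return A

# Example with a simple coverage function (monotone and submodular):
# f(A) = number of distinct items covered by the chosen sets.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
coverage = lambda A: len(set().union(*(sets[s] for s in A)) if A else set())
print(greedy_maximize(coverage, sets.keys(), 2))  # e.g. {'a', 'c'}
```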

18 Mutual information and submodularity. Mutual information, F(A) = I(A; V\A), is submodular, so we should be able to use Nemhauser et al. But mutual information is not monotone! Initially, adding a sensor increases MI; later, adding sensors decreases it: F(∅) = I(∅; V) = 0, F(V) = I(V; ∅) = 0, and F(A) ≥ 0 in between (plot: mutual information vs. number of sensors, from A = ∅ to A = V). So even though MI is submodular, we can't apply Nemhauser et al. Or can we…

19 Approximate monotonicity of mutual information. If H(X|A) − H(X|V\A) ≥ 0, then MI is monotonic; but we can have H(X|A) << H(X|V\A), so MI is not monotonic. Solution: add a grid Z of unobservable locations. If H(X|A) − H(X|Z ∪ V\A) ≥ 0, then MI is monotonic, and for a sufficiently fine Z we get H(X|A) > H(X|Z ∪ V\A) − ε, so MI is approximately monotonic.

20 Theorem: mutual information sensor placement. The greedy MI algorithm provides a constant-factor approximation: placing k sensors, for all ε > 0, the result of our algorithm is within the constant factor (1 − 1/e) of the optimal non-myopic solution (up to a kε term). Approximate monotonicity holds for a sufficiently fine discretization – poly(1/ε, k, σ, L, M), where σ is the sensor noise, L the Lipschitz constant of the kernels, and M = max_X K(X,X).
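
Written out, my reading of the bound on this slide (the kε term comes from approximate monotonicity; treat the exact form as a paraphrase of the paper's statement):

```latex
% Greedy mutual-information placement guarantee (paraphrased).
\[
  \mathrm{MI}(A_{\mathrm{greedy}})
  \;\ge\;
  \Bigl(1 - \tfrac{1}{e}\Bigr)\bigl(\mathrm{OPT} - k\varepsilon\bigr)
  \qquad \text{for all } \varepsilon > 0,
\]
% provided the discretization size is poly(1/epsilon, k, sigma, L, M).
```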

21 Different costs for different placements. Theorem 1: a constant-factor approximation of the optimal locations when selecting k sensors. Theorem 2 (cost-sensitive placements): in practice, different locations may have different costs (a corridor versus inside a wall); given a budget B to spend on placing sensors, we still get a constant-factor approximation with the same constant (1 − 1/e), via an algorithm slightly more complicated than the greedy one [Sviridenko / Krause, Guestrin].
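
A simplified cost-benefit greedy sketch for the budgeted case (illustration only): the full (1 − 1/e) result of Sviridenko and of Krause & Guestrin additionally enumerates small seed sets, which is omitted here; f, cost, and budget are assumed inputs.

```python
# Cost-benefit greedy: add the element with the best marginal gain per unit cost
# while staying within the budget.
def budgeted_greedy(f, cost, ground_set, budget):
    A, spent = set(), 0.0
    while True:
        candidates = [x for x in ground_set
                      if x not in A and spent + cost[x] <= budget]
        if not candidates:
            return A
        best = max(candidates, key=lambda x: (f(A | {x}) - f(A)) / cost[x])
        if f(A | {best}) - f(A) <= 0:
            return A
        A.add(best)
        spent += cost[best]
```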

22 Deployment results. A model learned from 54 sensors provides the “true” temperature prediction and “true” temperature variance. We used the initial deployment to select 22 new sensors and learned a new GP on test data using just these sensors; the figures compare the posterior mean and posterior variance under the entropy and mutual information criteria. Mutual information yields 3 times less variance than the entropy criterion.

23 Comparing to other heuristics (plot: mutual information, higher is better). Compared methods: the greedy algorithm we analyze, random placements, and pairwise exchange (PE), which starts with some placement and swaps locations while the solution improves. Our bound also enables a posteriori analysis for any heuristic: assume an algorithm TUAFSPGP gives results that are 10% better than those obtained from the greedy algorithm; then we immediately know TUAFSPGP is within 70% of optimum!
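
The arithmetic behind the 70% figure (a sketch, ignoring the kε discretization term): greedy is within (1 − 1/e) ≈ 0.632 of optimum, so

```latex
\[
  \mathrm{MI}(A_{\mathrm{TUAFSPGP}})
  \;\ge\; 1.1 \cdot \mathrm{MI}(A_{\mathrm{greedy}})
  \;\ge\; 1.1 \cdot \Bigl(1 - \tfrac{1}{e}\Bigr)\,\mathrm{OPT}
  \;\approx\; 0.70\,\mathrm{OPT}.
\]
```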

24 Precipitation data (plot, higher is better): entropy criterion vs. mutual information.

25 Computing the greedy rule. At each iteration, for each candidate position i ∈ {1,…,N}, we must compute a conditional variance, which requires inverting an N×N matrix – about O(N³). The total running time for k sensors is O(kN⁴): polynomial, but very slow in practice. Idea: exploit sparsity in the kernel matrix.

26 Local kernels. The covariance matrix may have many zeros: each sensor location is correlated with only a small number of other locations. Exploiting locality: if each location is correlated with at most d others, a sparse representation and a priority queue trick reduce the complexity from O(kN⁴) to only about O(N log N). Usually, though, the matrix is only almost sparse…
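
One way to realize the priority-queue trick is lazy greedy evaluation, sketched below (illustration only; the paper's version also exploits the sparse local-kernel structure, which is not shown). Because marginal gains can only shrink as the placement grows, a stale gain that still tops the heap after being recomputed is guaranteed to be the best choice.

```python
# Lazy greedy selection with a priority queue (valid for submodular objectives).
import heapq

def lazy_greedy(gain, ground_set, k):
    """gain(x, A) returns the marginal benefit of adding x to the current set A."""
    A = []
    # Max-heap via negated gains; entries are (neg_gain, element, round_computed).
    heap = [(-gain(x, A), x, 0) for x in ground_set]
    heapq.heapify(heap)
    for step in range(1, k + 1):
        while True:
            neg_g, x, rnd = heapq.heappop(heap)
            if rnd == step:          # gain is up to date for this round: take it
                A.append(x)
                break
            # Stale entry: recompute its gain against the current A and push back.
            heapq.heappush(heap, (-gain(x, A), x, step))
    return A
```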

27 Approximately local kernels. The covariance matrix may have many elements close to zero (e.g., with a Gaussian kernel), yet not be sparse. What if we set them to zero? We get a sparse matrix and an approximate solution. Theorem: truncating small entries has a small effect on solution quality – if |K(x,y)| ≤ ε, set it to 0; then the quality of the placements is only O(ε) worse.
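
A minimal truncation sketch (illustration only): zero out entries with |K(x,y)| ≤ ε and store the result sparsely; the grid, kernel, and ε below are made-up examples.

```python
# Truncate near-zero kernel entries to obtain a sparse covariance matrix.
import numpy as np
from scipy import sparse

def truncate_kernel(K, eps):
    """Return a sparse copy of K with entries of magnitude <= eps set to zero."""
    K_trunc = np.where(np.abs(K) > eps, K, 0.0)
    return sparse.csr_matrix(K_trunc)

# Example: a Gaussian kernel on a 1-D grid is dense but nearly sparse.
x = np.linspace(0, 10, 200)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
K_sparse = truncate_kernel(K, eps=1e-3)
print(K_sparse.nnz / K.size)  # fraction of entries kept after truncation
```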

28 Effect of truncated kernels on solution – rain data. (Plots: improvement in running time; effect on solution quality; higher is better.) About 3 times faster, with minimal effect on solution quality.

29 Summary. A mutual information criterion for sensor placement in general GPs; efficient algorithms with strong approximation guarantees, (1 − 1/e) OPT − ε; exploiting local structure improves efficiency; superior prediction accuracy for several real-world problems; related ideas in discrete settings were presented at UAI and IJCAI this year. An effective algorithm for sensor placement and experimental design, and a basis for active learning.

30 A note on maximizing entropy. Entropy is submodular [Ko et al. '95], but… A function F is monotonic iff adding X cannot hurt: F(A ∪ X) ≥ F(A). Remark: entropy in GPs is not monotonic (not even approximately): H(A ∪ X) − H(A) = H(X|A), and as the discretization becomes finer, H(X|A) → −∞. So the Nemhauser et al. analysis for submodular functions is not directly applicable to entropy.

31 How do we predict temperatures at unsensed locations? (Plot: temperature vs. position.) Interpolation? It overfits. And what about far-away points?

32 How do we predict temperatures at unsensed locations? (Plot: x – position, y – temperature.) Regression: y = a + bx + cx² + dx³. Few parameters mean less overfitting, but the regression function has no notion of uncertainty! How sure are we about the prediction? (We should be less sure far from the data and more sure near it.)
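
For contrast with the GP approach, a cubic-regression sketch matching y = a + bx + cx² + dx³ (illustration only, with made-up data): it returns point predictions but no measure of how sure we are.

```python
# Cubic polynomial regression: few parameters, but no uncertainty estimate.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # sensor positions (made-up data)
y = np.array([20.0, 21.5, 22.0, 19.5, 18.0])   # measured temperatures (made-up data)

coeffs = np.polyfit(x, y, deg=3)               # coefficients, highest degree first
predict = np.poly1d(coeffs)

print(predict(2.5))   # a single number: no error bars, no "how sure are we?"
```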

