
1 Expectation Propagation for Graphical Models Yuan (Alan) Qi Joint work with Tom Minka

2 Motivation Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics. Inference techniques on graphical models often sacrifice accuracy for efficiency, or vice versa. We need a new method that better balances this trade-off.

3 Motivation [Figure: efficiency-accuracy plane; current techniques trade one for the other, while we want both.]

4 Outline
Background
Expectation Propagation (EP) on dynamic systems
–Poisson tracking
–Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

5 Outline
Background
Expectation Propagation (EP) on dynamic systems
–Poisson tracking
–Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions

6 Graphical Models
                              Directed                        Undirected
Generative                    Bayesian networks               Boltzmann machines
Conditional (Discriminative)  Maximum entropy Markov models   Conditional random fields
[Figure: a two-node example graph (x1, x2 with observations y1, y2) for each of the four model classes.]

7 Inference on Graphical Models
Bayesian inference techniques:
–Belief propagation (BP): Kalman filtering/smoothing, forward-backward algorithm
–Monte Carlo: particle filters/smoothers, MCMC
Loopy BP: typically efficient, but not accurate. Monte Carlo: accurate, but often not efficient.

8 Efficiency vs. Accuracy [Figure: efficiency-accuracy plane; BP is efficient but inaccurate, MC is accurate but inefficient, and EP (marked with a question mark) targets both.]

9 Expectation Propagation in a Nutshell Approximate a probability distribution by a product of simpler parametric terms: $p(\mathbf{x}) = \prod_a f_a(\mathbf{x}) \approx q(\mathbf{x}) = \prod_a \tilde{f}_a(\mathbf{x})$. Each approximation term $\tilde{f}_a(\mathbf{x})$ lives in an exponential family (e.g. Gaussian).

10 Update Term Approximation Iterate the fixed-point equation by moment matching: $\tilde{f}_a^{\,\text{new}}(\mathbf{x}) \propto \operatorname{proj}\big[f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x})\big] \,/\, q^{\setminus a}(\mathbf{x})$, where the leave-one-out approximation is $q^{\setminus a}(\mathbf{x}) \propto q(\mathbf{x}) / \tilde{f}_a(\mathbf{x})$ and $\operatorname{proj}[\cdot]$ projects onto the approximating family by matching moments.
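
To make the update concrete, here is a minimal one-dimensional Python sketch with a Gaussian approximating family; the grid-based moment computation and the logistic example factor are illustrative assumptions, not anything from the slides.

```python
import numpy as np

# One EP term update in 1D with a Gaussian q, working in natural
# parameters (precision, precision * mean). f_a is the exact factor.
def ep_term_update(f_a, q_prec, q_pm, t_prec, t_pm):
    cav_prec, cav_pm = q_prec - t_prec, q_pm - t_pm   # leave-one-out: q / f~_a
    cav_m, cav_v = cav_pm / cav_prec, 1.0 / cav_prec
    # Moment matching: mean/variance of the tilted dist. f_a(x) * cavity(x).
    x = cav_m + np.sqrt(cav_v) * np.linspace(-8, 8, 4001)
    tilted = f_a(x) * np.exp(-(x - cav_m) ** 2 / (2 * cav_v))
    Z = np.trapz(tilted, x)
    m = np.trapz(x * tilted, x) / Z
    v = np.trapz((x - m) ** 2 * tilted, x) / Z
    # New term approximation = matched Gaussian divided by the cavity.
    return (1 / v - cav_prec, m / v - cav_pm), (m, v)

# E.g. one update of a logistic factor under q = N(0, 1), old term = uniform:
term, moments = ep_term_update(lambda x: 1 / (1 + np.exp(-3 * x)),
                               q_prec=1.0, q_pm=0.0, t_prec=0.0, t_pm=0.0)
print("new term (natural params):", term, " matched moments:", moments)
```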

11 Outline
Background
Expectation Propagation (EP) on dynamic systems
–Poisson tracking
–Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions

12 EP on Dynamic Systems [The model map from slide 6 again; dynamic systems fall under directed generative models, i.e. Bayesian networks.]

13 Object Tracking Guess the position of an object given noisy observations. [Figure: noisy observations scattered around the object.]

14 Bayesian Network (random walk) E.g. we want the distribution of the x's given the y's. [Figure: chain x1 → x2 → … → xT, with an observation yt hanging off each state xt.]

15 Approximation Factorized and Gaussian in x (up to proportionality): $q(\mathbf{x}) \propto \prod_t q(x_t)$, with each $q(x_t)$ Gaussian.

16 Message Interpretation $q(x_t)$ = (forward msg) × (observation msg) × (backward msg). [Figure: node xt with observation yt; arrows mark the forward, backward, and observation messages.]

17 EP on Dynamic Systems
Filtering: t = 1, …, T
–Incorporate the forward message
–Initialize the observation message
Smoothing: t = T, …, 1
–Incorporate the backward message
–Compute the leave-one-out approximation by dividing out the old observation message
–Re-approximate the new observation message
Re-filtering: t = 1, …, T
–Incorporate the forward and observation messages
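
For the simplest case, a 1D linear-Gaussian random walk, the observation messages are exact and this schedule reduces to the Kalman smoother after one forward and one backward pass; the sketch below shows the message bookkeeping under that assumption, with illustrative noise levels.

```python
import numpy as np

# Messages as (precision, precision * mean); products of Gaussians add here.
Q, R, T = 0.1, 0.5, 50                        # process/observation noise, length
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0, np.sqrt(Q), T))   # latent random walk
y = x + rng.normal(0, np.sqrt(R), T)          # noisy observations

fwd = np.zeros((T, 2)); bwd = np.zeros((T, 2))
fwd[0] = (1e-2, 0.0)                          # weak prior on x_1
obs = np.stack([np.full(T, 1 / R), y / R], 1) # exact Gaussian obs. messages

def propagate(prec, pm):                      # push a Gaussian through the walk
    m, v = pm / prec, 1 / prec
    return np.array([1 / (v + Q), m / (v + Q)])

for t in range(T - 1):                        # filtering: forward messages
    fwd[t + 1] = propagate(*(fwd[t] + obs[t]))
for t in range(T - 1, 0, -1):                 # smoothing: backward messages
    bwd[t - 1] = propagate(*(bwd[t] + obs[t]))

post = fwd + obs + bwd                        # slide 16: fwd x obs x bwd
print("posterior means:", np.round(post[:5, 1] / post[:5, 0], 2))
print("true states:    ", np.round(x[:5], 2))
```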

18 Extension of EP Instead of matching moments, use any method for approximate filtering. –Examples: extended Kalman filter, statistical linearization, unscented filter All of these methods can be interpreted as finding linear/Gaussian approximations to the original terms.

19 Example: Poisson Tracking $y_t$ is an integer-valued Poisson variate with mean $\exp(x_t)$.

20 Poisson Tracking Model A Gaussian random walk on the log-intensity with Poisson observations: $x_t \mid x_{t-1} \sim \mathcal{N}(x_{t-1}, \sigma_x^2)$, $y_t \mid x_t \sim \text{Poisson}(e^{x_t})$.
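
A short simulation makes the model concrete; the walk variance of 0.01 is an illustrative assumption, not a value given on the slide.

```python
import numpy as np

# Simulate the Poisson tracking model: a latent log-intensity random walk
# x_t and integer counts y_t ~ Poisson(exp(x_t)).
rng = np.random.default_rng(1)
T, var_x = 100, 0.01                             # assumed walk variance
x = np.cumsum(rng.normal(0, np.sqrt(var_x), T))  # latent log-intensity
y = rng.poisson(np.exp(x))                       # integer-valued observations
print(y[:10])
```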

21 Approximate Observation Message The observation message is not Gaussian, and the moments of x are not analytic. Two approaches: –Gauss-Hermite quadrature for the moments –Statistical linearization instead of moment matching Both work well.
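
A hedged sketch of the first approach: Gauss-Hermite quadrature for the moments of the tilted distribution Poisson(y | e^x) · N(x; m, v). The node count and the test values are arbitrary choices.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.stats import poisson

def tilted_moments(y, m, v, n_nodes=30):
    z, w = hermgauss(n_nodes)            # nodes/weights for ∫ e^{-z^2} g(z) dz
    x = m + np.sqrt(2 * v) * z           # change of variables to N(m, v)
    lik = poisson.pmf(y, np.exp(x))      # non-Gaussian observation term
    Z = np.sum(w * lik)                  # normalizer up to 1/sqrt(pi); cancels
    mean = np.sum(w * lik * x) / Z       # matched mean
    var = np.sum(w * lik * x * x) / Z - mean ** 2   # matched variance
    return mean, var

print(tilted_moments(y=3, m=1.0, v=0.5))
```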

22 EP Accuracy Improves Significantly in Only a Few Iterations

23 Approximate vs. Exact Posterior

24 EP vs. Monte Carlo: Accuracy [Figure: two panels comparing the estimated posterior mean and variance.]

25 Accuracy/Efficiency Tradeoff

26 EP for Digital Wireless Communication Signal detection problem. The transmitted signal $s_t$ is varied in amplitude and phase to encode each symbol. [Figure: symbol constellation in the complex (Re/Im) plane.]

27 Binary Symbols, Gaussian Noise Symbols are 1 and –1 (in the complex plane). Received signal $y_t = s_t + \text{Gaussian noise}$. Optimal detection is easy.

28 Fading Channel The channel systematically changes the amplitude and phase of the signal: $y_t = x_t s_t + \text{noise}$, where the channel coefficient $x_t$ changes over time.

29 Benchmark: Differential Detection Classical technique. Uses the previous observation to estimate the channel state. Binary symbols only.
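
A toy sketch of differential detection; the differential encoding, the slowly drifting phase channel, and the noise level are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

# Differential detection: the previous received sample serves as the channel
# reference, so no explicit channel estimate is needed.
rng = np.random.default_rng(4)
T = 1000
bits = rng.choice([-1.0, 1.0], T)                 # information bits
s = np.concatenate(([1.0], np.cumprod(bits)))     # differentially encoded symbols
h = np.exp(1j * np.cumsum(rng.normal(0, 0.05, T + 1)))  # drifting phase, |h| = 1
noise = 0.1 * (rng.normal(size=T + 1) + 1j * rng.normal(size=T + 1))
y = h * s + noise                                 # received signal
bits_hat = np.sign(np.real(y[1:] * np.conj(y[:-1])))  # compare successive samples
print("bit error rate:", np.mean(bits_hat != bits))
```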

30 Bayesian Network for Signal Detection [Figure: chain of channel states x1 → x2 → … → xT, with each observation yt depending on the channel state xt and the transmitted symbol st.]

31 On-line EP: Joint Signal Detection and Channel Estimation Iterate over a sliding window of the most recent observations; observations before the window act as a prior for the current estimation.

32 Computational Complexity
Expectation propagation: O(nLd^2)
Stochastic mixture of Kalman filters: O(LMd^2)
Rao-Blackwellised particle smoothers: O(LMNd^2)
n: number of EP iterations (typically 4 or 5)
d: dimension of the parameter vector
L: smoothing window length
M: number of samples in filtering
N: number of samples in smoothing
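
Plugging illustrative sizes into these formulas shows the scale of the gap; only n ≈ 4-5 comes from the slide, the other values below are hypothetical.

```python
# Plug illustrative sizes into the slide's complexity formulas.
n, d, L, M, N = 5, 4, 10, 100, 50
print("EP                  O(nLd^2): ", n * L * d**2)      # 800
print("Kalman mixture      O(LMd^2): ", L * M * d**2)      # 16,000
print("RB part. smoothers  O(LMNd^2):", L * M * N * d**2)  # 800,000
```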

33 Experimental Results EP outperforms particle smoothers in efficiency with comparable accuracy. (Chen, Wang, Liu 2000)

34 Bayesian Networks for Adaptive Decoding [Figure: chain of channel states x1, …, xT with observations y1, …, yT, now driven by information bits e1, …, eT.] The information bits e_t are coded by a convolutional error-correcting encoder.

35 EP Outperforms Viterbi Decoding

36 Outline
Background
Expectation Propagation (EP) on dynamic systems
–Poisson tracking
–Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions

37 EP on Boltzmann Machines [The model map from slide 6 again, now turning to undirected generative models, i.e. Boltzmann machines.]

38 Inference on Grids Problem: estimate the marginal distributions of the variables indexed by the nodes in a loopy graph, e.g., p(x_i), i = 1, …, 16. [Figure: 4×4 grid of nodes x1, …, x16.]

39 Boltzmann Machines The joint distribution is a product of pair potentials: $p(\mathbf{x}) \propto \prod_{(i,j)} f_{ij}(x_i, x_j)$. We want to approximate it by a simpler, tree-structured distribution $q(\mathbf{x})$.
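
For intuition, the marginals of a tiny Boltzmann machine can be computed exactly by enumeration; the 4-node size and random potentials below are illustrative, and the 2^n cost of this enumeration is exactly what tree-structured approximations avoid.

```python
import itertools
import numpy as np

# p(x) ∝ exp(sum_{i<j} W_ij x_i x_j + sum_i b_i x_i), x_i in {-1, +1}.
rng = np.random.default_rng(2)
n = 4
W = np.triu(rng.normal(0, 0.5, (n, n)), 1)     # pairwise potentials
b = rng.normal(0, 0.1, n)                      # biases

states = np.array(list(itertools.product([-1, 1], repeat=n)))
logp = np.einsum('si,ij,sj->s', states, W, states) + states @ b
p = np.exp(logp); p /= p.sum()                 # exact joint over 2^n states
marginals = [p[states[:, i] == 1].sum() for i in range(n)]
print("P(x_i = +1):", np.round(marginals, 3))
```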

40 BP vs. EP [Figure: side-by-side schematic of the BP and EP approximation structures.]

41 Junction Tree Representation [Figure: the loopy graph p(x), its tree approximation q(x), and the corresponding junction tree.]

42 Approximating an Edge by a Tree Each potential $f_a$ in p is projected onto the tree structure of q. Correlations are not lost, but projected onto the tree.

43 Moment Matching Match the single-node and pairwise marginals of the tilted distribution and the tree approximation. This reduces to exact inference on single loops –Use cutset conditioning.

44 Local Propagation
Original EP: globally propagate evidence to the whole tree
–Problem: computationally expensive
Exploit the junction tree representation: only locally propagate evidence within the minimal subtree that is directly connected to the off-tree edge
–Reduces computational complexity
–Saves memory

45 [Figure: junction trees over the cliques x1x2, x1x3, x1x4, x3x4, x3x5, x3x6, x5x7; global propagation updates the whole tree after each edge approximation, while local propagation updates only the minimal subtree connected to the off-tree edge.]

46 4-node Graph TreeEP = the proposed method, BP = loopy belief propagation, GBP = generalized belief propagation on triangles, MF = mean-field, TreeVB = variational tree.

47 Fully-connected graphs Results are averaged over 10 graphs with randomly generated potentials. TreeEP performs as well as or better than all other methods in both accuracy and efficiency!

48 8x8 grids, 10 trials
Method          FLOPS       Error
Exact           30,000      0
TreeEP          300,000     0.149
BP/double-loop  15,500,000  0.358
GBP             17,500,000  0.003

49 TreeEP versus BP and GBP TreeEP is always more accurate than BP and is often faster. TreeEP is much more efficient than GBP and more accurate on some problems. TreeEP converges more often than BP and GBP.

50 Outline
Background
Expectation Propagation (EP) on dynamic systems
–Poisson tracking
–Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions

51 EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency. [Figure: efficiency-accuracy plane with EP in the high-accuracy, high-efficiency corner.]

52 Future Work
EP is applicable to a wide range of applications.
EP is sensitive to the choice of approximation:
–How to choose an approximation family (e.g. tree structure)?
–More flexible approximations: mixtures of EP?
–Error bounds?

53 Future Work [The model map from slide 6 again.]

54 End

55 EP versus BP The EP approximation is in a restricted family, e.g. Gaussian. The EP approximation does not have to be factorized. EP applies to many more problems –e.g. mixtures of discrete and continuous variables.

56 EP versus Monte Carlo Monte Carlo is general but expensive. EP exploits the underlying simplicity of the problem if it exists. Monte Carlo is still needed for complex problems (e.g. large isolated peaks). The trick is to know what problem you have.

57 (Loopy) Belief Propagation Specialize to factorized approximations: $q(\mathbf{x}) = \prod_i q_i(x_i)$. Minimizing the KL-divergence then amounts to matching the marginals of the (partially factorized) tilted distribution and the (fully factorized) approximation –the resulting updates are the familiar BP “messages”.
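
A minimal sum-product sketch on a single loop shows what these messages look like in practice; the 4-node cycle and its random potentials are illustrative assumptions.

```python
import numpy as np

vals = np.array([-1.0, 1.0])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]        # a single 4-node loop
rng = np.random.default_rng(3)
W = {e: rng.normal(0, 0.5) for e in edges}      # pairwise strengths
b = rng.normal(0, 0.1, 4)                       # node biases

msg = {(i, j): np.ones(2) for (a, c) in edges for (i, j) in ((a, c), (c, a))}
for _ in range(50):                             # message-passing sweeps
    new = {}
    for (i, j) in msg:
        w = W.get((i, j), W.get((j, i)))
        pot = np.exp(w * np.outer(vals, vals))  # pairwise potential f_ij
        inc = np.exp(b[i] * vals)               # local term at node i
        for (k, l) in msg:
            if l == i and k != j:               # product of other incoming msgs
                inc = inc * msg[(k, i)]
        m = pot.T @ inc                         # marginalize x_i out
        new[(i, j)] = m / m.sum()
    msg = new

for i in range(4):                              # beliefs = local * incoming
    bel = np.exp(b[i] * vals)
    for (k, l) in msg:
        if l == i:
            bel = bel * msg[(k, i)]
    print(f"q_{i}(x):", np.round(bel / bel.sum(), 3))
```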

58 Limitation of BP If the dynamics or measurements are not linear and Gaussian, the complexity of the posterior increases with the number of measurements, i.e. the BP equations are not “closed” –Beliefs need not stay within a given family (e.g. Gaussian, or any other exponential family).

59 Approximate filtering Compute a Gaussian belief which approximates the true posterior: $q(x_t) \approx p(x_t \mid y_{1:t})$. E.g. extended Kalman filter, statistical linearization, unscented filter, assumed-density filter.

60 EP perspective Approximate filtering is equivalent to replacing the true measurement/dynamics equations with linear/Gaussian equations, since a linear/Gaussian term combined with a Gaussian belief implies a Gaussian updated belief.

61 EP perspective EKF, UKF, and ADF are all algorithms for turning nonlinear, non-Gaussian terms into linear, Gaussian ones.

62 Terminology Filtering: p(x_t | y_{1:t}). Smoothing: p(x_t | y_{1:t+L}) where L > 0. On-line: old data is discarded (fixed memory). Off-line: old data is re-used (unbounded memory).

63 Kalman filtering / Belief propagation
Prediction: $p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$
Measurement: $p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$
Smoothing: $p(x_t \mid y_{1:T}) = \int p(x_t \mid x_{t+1}, y_{1:t})\, p(x_{t+1} \mid y_{1:T})\, dx_{t+1}$
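
A scalar Kalman step implementing the prediction and measurement equations above, under generic linear-Gaussian model assumptions (the parameter values are arbitrary):

```python
# Dynamics x_t = a x_{t-1} + N(0, q); observation y_t = c x_t + N(0, r).
def kalman_step(m, v, y, a=1.0, q=0.1, c=1.0, r=0.5):
    m_pred, v_pred = a * m, a * a * v + q        # prediction
    k = v_pred * c / (c * c * v_pred + r)        # Kalman gain
    return m_pred + k * (y - c * m_pred), (1 - k * c) * v_pred  # measurement

print(kalman_step(0.0, 1.0, y=0.8))
```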

