Extending Expectation Propagation for Graphical Models


1 Extending Expectation Propagation for Graphical Models
Yuan (Alan) Qi Joint work with Tom Minka

2 Motivation
Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics.
Inference techniques on graphical models often sacrifice efficiency for accuracy, or sacrifice accuracy for efficiency.
We need a method that better balances the trade-off between accuracy and efficiency.

3 Motivation
[sketch: error vs. computational time, placing current techniques along a trade-off curve and marking the desired low-error, low-time region as "what we want"]

4 Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

5 Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

6 Graphical Models
Directed (Bayesian networks) vs. undirected (Markov networks).
[figures: a directed and an undirected example graph over x1, x2, y1, y2]

7 Inference on Graphical Models
Bayesian inference techniques:
  - Belief propagation (BP): Kalman filtering/smoothing, the forward-backward algorithm
  - Monte Carlo: particle filters/smoothers, MCMC
Loopy BP is typically efficient, but not accurate on general loopy graphs.
Monte Carlo is accurate, but often not efficient.

8 Expectation Propagation in a Nutshell
Approximate a probability distribution by simpler parametric terms:
For directed graphs: $p(x) = \prod_i p(x_i \mid \mathrm{pa}_i) \approx \prod_i \tilde{t}_i(x)$
For undirected graphs: $p(x) \propto \prod_a f_a(x) \approx \prod_a \tilde{f}_a(x)$
Each approximation term lives in an exponential family (e.g., Gaussian).

9 EP in a Nutshell
The approximate term minimizes the following KL divergence by moment matching:
$\tilde{t}_i^{\mathrm{new}} = \arg\min_{\tilde{t}_i} \mathrm{KL}\!\left( t_i(x)\, q^{\setminus i}(x) \,\middle\|\, \tilde{t}_i(x)\, q^{\setminus i}(x) \right)$
where the leave-one-out approximation is $q^{\setminus i}(x) = q(x) / \tilde{t}_i(x)$.
A numerical sketch of one such update follows.
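To make the update concrete, here is a minimal sketch of one EP site refinement for a one-dimensional Gaussian family. The logistic factor and the grid-based moment computation are illustrative assumptions, not the talk's model:

```python
import numpy as np

def divide_gaussians(m1, v1, m2, v2):
    # Division in natural parameters: precisions and precision-means subtract.
    prec = 1.0 / v1 - 1.0 / v2
    return (m1 / v1 - m2 / v2) / prec, 1.0 / prec

# Current posterior approximation q(x) and one Gaussian site approximation.
m_q, v_q = 0.5, 2.0     # q(x) = N(0.5, 2.0)
m_s, v_s = 0.0, 10.0    # current (weak) site term

# 1. Leave-one-out approximation: q^{\i}(x) = q(x) / site(x).
m_cav, v_cav = divide_gaussians(m_q, v_q, m_s, v_s)

# 2. Moment-match the tilted distribution t_i(x) * q^{\i}(x) on a grid
#    (t_i here is a logistic factor, chosen purely for illustration).
x = np.linspace(-15.0, 15.0, 20001)
tilted = np.exp(-0.5 * (x - m_cav) ** 2 / v_cav) / (1.0 + np.exp(-x))
Z = tilted.sum()
m_new = (x * tilted).sum() / Z
v_new = ((x - m_new) ** 2 * tilted).sum() / Z

# 3. Recover the refined site: site_new(x) = q_new(x) / q^{\i}(x).
m_s, v_s = divide_gaussians(m_new, v_new, m_cav, v_cav)
m_q, v_q = m_new, v_new
print("updated q:", m_q, v_q, " updated site:", m_s, v_s)
```

Iterating this update over all sites until the parameters stop changing is plain EP; the grid integration in step 2 is exactly the piece that becomes expensive or intractable for richer factors.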

10 Limitations of Plain EP
Can be difficult or expensive to analytically compute the needed moments in order to minimize the desired KL divergence.
Can be expensive to compute and maintain a valid approximation distribution q(x), which is coherent under marginalization.
Tree-structured q(x): $q(x) = \prod_{(i,j) \in T} q(x_i, x_j) \,/\, \prod_i q(x_i)^{d_i - 1}$, where $d_i$ is the degree of node $i$ in the tree $T$.

11 Three Extensions
1. Instead of choosing the approximate term to minimize the KL divergence above, use other criteria.
2. Use numerical approximation to compute moments: quadrature or Monte Carlo.
3. Allow the tree-structured q(x) to be non-coherent during the iterations; it only needs to be coherent at the end.

12 Efficiency vs. Accuracy
[sketch: error vs. computational time, with loopy BP (factorized EP) fast but inaccurate, Monte Carlo accurate but slow, and extended EP aiming for the region between them]

13 Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

14 Object Tracking
Guess the position of an object given noisy observations.
[figure: noisy observations scattered around an object's trajectory]

15 Bayesian Network
[chain graph: x1 → x2 → … → xT, each state xt emitting an observation yt]
e.g., a random walk: $x_t = x_{t-1} + \nu_t$
We want the distribution of the x's given the y's: $p(x_{1:T} \mid y_{1:T})$.

16 Approximation
Factorized and Gaussian in x: $q(x) = \prod_t q(x_t)$, with each $q(x_t)$ Gaussian.

17 Message Interpretation
$q(x_t)$ = (forward msg) × (observation msg) × (backward msg)
[figure: node xt receiving a forward message from x(t-1), a backward message from x(t+1), and an observation message from yt]

18 EP on Dynamic Systems
Filtering: t = 1, …, T
  - Incorporate the forward message
  - Initialize the observation message
Smoothing: t = T, …, 1
  - Incorporate the backward message
  - Compute the leave-one-out approximation by dividing out the old observation message
  - Re-approximate the new observation message
Re-filtering: t = 1, …, T
  - Incorporate the forward and observation messages
A sketch of this schedule on a linear-Gaussian model follows.

19 Extensions of EP
Instead of matching moments, use any method for approximate filtering, e.g., statistical linearization, the unscented Kalman filter (UKF), or a mixture of Kalman filters.
This turns any deterministic filtering method into a smoothing method!
All of these methods can be interpreted as finding linear/Gaussian approximations to the original terms.
Use quadrature or Monte Carlo for the term approximations.

20 Example: Poisson Tracking
$y_t$ is an integer-valued Poisson variate whose mean is a positive function of the hidden state $x_t$.

21 Poisson Tracking Model
[model equations shown as a figure on the original slide]

22 Extension of EP: Approximate Observation Message
The observation message is not Gaussian, and the moments of x are not analytic.
Two approaches:
  - Gauss-Hermite quadrature for the moments
  - Statistical linearization instead of moment matching (turning the unscented Kalman filter into a smoothing method)
Both work well. A quadrature sketch follows.

23 Approximate vs. Exact Posterior
[plot: approximate and exact posteriors p(xT | y1:T) as functions of xT]

24 Extended EP vs. Monte Carlo: Accuracy
[plots: estimated posterior mean and variance for extended EP and Monte Carlo]

25 Accuracy/Efficiency Tradeoff
[plot shown on the original slide]

26 EP for Digital Wireless Communication
The signal detection problem: the transmitted signal s_t varies in amplitude and phase to encode each symbol.
[figure: symbol constellation in the complex (Re/Im) plane]

27 Binary Symbols, Gaussian Noise
The symbols are 1 and -1 (in the complex plane).
Received signal: $y_t = s_t + \nu_t$, with Gaussian noise $\nu_t$.
Optimal detection is easy: pick the symbol closest to $y_t$.

28 Fading Channel
The channel systematically changes the amplitude and phase of the signal: $y_t = x_t s_t + \nu_t$, where the fading coefficient $x_t$ changes over time.

29 Benchmark: Differential Detection
A classical technique: use the previous observation to estimate the channel state.
Works for binary symbols only. A simulation sketch follows.

30 Bayesian Network for Signal Detection
[chain graph: channel states x1 → x2 → … → xT; each observation yt depends on xt and the symbol st]

31 Extended-EP Joint Signal Detection and Channel Estimation
Turn a mixture of Kalman filters into a smoothing method.
Smooth over the last L observations; observations before the window act as a prior for the current estimate.

32 Computational Complexity
Expectation propagation: O(nLd^2)
Stochastic mixture of Kalman filters: O(LMd^2)
Rao-Blackwellised particle smoothers: O(LMNd^2)
n: number of EP iterations (typically 4 or 5)
d: dimension of the parameter vector
L: smoothing window length
M: number of samples in filtering (often larger than 500)
N: number of samples in smoothing (larger than 50)
EP is about 5,000 times faster than Rao-Blackwellised particle smoothers.

33 Experimental Results
(Setup following Chen, Wang, and Liu, 2000.)
[plots: error rate vs. signal-to-noise ratio]
EP outperforms particle smoothers in efficiency with comparable accuracy.

34 Bayesian Networks for Adaptive Decoding
[chain graph: states x1 → x2 → … → xT with observations y1, …, yT]
The information bits e_t are coded by a convolutional error-correcting encoder.

35 EP Outperforms Viterbi Decoding
[plot: error rate vs. signal-to-noise ratio]

36 Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

37 Inference on Loopy Graphs
[figure: a 4×4 grid of nodes x1 through x16]
Problem: estimate the marginal distributions of the variables indexed by the nodes of a loopy graph, e.g., p(x_i), i = 1, …, 16.

38 4-node Loopy Graph
The joint distribution is a product of pairwise potentials, one for each edge: $p(x) \propto \prod_{(i,j)} f_{ij}(x_i, x_j)$.
We want to approximate it by a simpler distribution $q(x) \approx p(x)$.
A brute-force baseline for this tiny graph follows.
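For a graph this small, the exact marginals can be computed by brute-force enumeration, which is what any approximation is measured against. A minimal sketch with binary variables and random edge potentials (the potentials are placeholders):

```python
import numpy as np

# 4-node loop x1-x2-x3-x4-x1, binary variables, random edge potentials.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
F = {e: rng.uniform(0.1, 1.0, (2, 2)) for e in edges}

# Exact joint by enumeration: p(x) ∝ product of f_ij(x_i, x_j) over edges.
p = np.ones((2, 2, 2, 2))
for x in np.ndindex(2, 2, 2, 2):
    p[x] = np.prod([F[(i, j)][x[i], x[j]] for (i, j) in edges])
p /= p.sum()

# Exact single-node marginals: sum out all the other variables.
for i in range(4):
    print("p(x%d):" % (i + 1), p.sum(axis=tuple(k for k in range(4) if k != i)))
```

Enumeration costs 2^n and is hopeless beyond toy graphs; that cost is exactly what TreeEP's tree-structured q(x) is meant to avoid.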

39 BP vs. TreeEP
[figures: the structure of the approximating distribution, fully factorized for BP vs. a spanning tree for TreeEP]

40 Junction Tree Representation
[figures: p(x) and its approximation q(x), each shown with the corresponding junction tree]

41 Two Kinds of Edges
On-tree edges, e.g., (x1, x4): exactly incorporated into the junction tree.
Off-tree edges, e.g., (x1, x2): approximated by projecting them onto the tree structure.

42 KL Minimization
KL minimization = moment matching: match the single and pairwise marginals of p and q.
Incorporating one off-tree edge reduces to exact inference on a single loop, using cutset conditioning (a sketch follows).

43 Matching Marginals on Graph
[junction tree figures, before and after each step]
(1) Incorporate edge (x3, x4).
(2) Incorporate edge (x6, x7).

44 Drawbacks of Global Propagation
Updates all the cliques even when incorporating only one off-tree edge: computationally expensive.
Stores each off-tree data message as a whole tree: requires a large amount of memory.

45 Solution: Local Propagation
Allow q(x) to be non-coherent during the iterations; it only needs to be coherent at the end.
Exploit the junction tree representation: only locally propagate information within the minimal loop (subtree) that is directly connected to the off-tree edge.
This reduces the computational complexity and saves memory.

46 (1) Incorporate edge (x3, x4)
(2) Propagate evidence
(3) Incorporate edge (x6, x7)
[junction tree figures for each step]
On this simple graph, local propagation runs roughly 2 times faster and uses half the memory to store messages compared with plain EP.

47 New Interpretation of TreeEP
Marries EP with the junction tree algorithm.
Can perform efficiently over hypertrees and hypernodes.

48 4-node Graph
TreeEP = the proposed method
GBP = generalized belief propagation on triangles
TreeVB = variational tree
BP = loopy belief propagation = factorized EP
MF = mean field

49 Fully-connected Graphs
Results are averaged over 10 graphs with randomly generated potentials.
TreeEP performs as well as or better than all other methods in both accuracy and efficiency!

50 8x8 grids, 10 trials
Method            FLOPS         Error
Exact             30,000        0
TreeEP            300,000       0.149
BP/double-loop    15,500,000    0.358
GBP               17,500,000    0.003

51 TreeEP versus BP and GBP
TreeEP is always more accurate than BP and is often faster.
TreeEP is much more efficient than GBP and more accurate on some problems.
TreeEP converges more often than BP and GBP.

52 Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
  - Poisson tracking
  - Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

53 Conclusions
We extended EP on graphical models:
  - Instead of minimizing KL divergence, use other sensible criteria to generate messages; this effectively turns any deterministic filtering method into a smoothing method.
  - Use quadrature to approximate messages.
  - Use local propagation to save computation and memory in tree-structured EP.

54 Conclusions
[sketch: error vs. computational time, with extended EP below the curve of state-of-the-art techniques]
Extended EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency.

55 Future Work
More extensions of EP:
  - How to choose a sensible approximation family (e.g., which tree structure)
  - More flexible approximations: mixtures of EP? Error bounds?
  - Bayesian conditional random fields
More real-world applications.

56 End Contact information:

