Published by Leah Johnston. Modified over 3 years ago.

1
Introduction to Graphical Models Brookes Vision Lab Reading Group

2
Graphical Models
Build a complex system from simpler parts.
The system should be consistent.
Parts are combined using probability.
Undirected – Markov random fields
Directed – Bayesian networks

3
Overview
Representation
Inference
Linear Gaussian Models
Approximate inference
Learning

4
Representation
Causality: sprinkler causes wet grass

5
Conditional Independence
A node is independent of its ancestors given its parents.
P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)
           = P(C) P(S|C) P(R|C) P(W|S,R)
Space required for n binary nodes
  – O(2^n) without factorization
  – O(n 2^k) with factorization, k = maximum fan-in
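The space saving can be checked by counting free parameters. This is a minimal sketch: the parent sets are read off the factorization above, and the helper names are hypothetical.

```python
# Parents of each node, read off P(C) P(S|C) P(R|C) P(W|S,R).
parents = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}

def full_joint_params(n_nodes):
    """Free parameters in an unfactorized joint over n binary nodes."""
    return 2 ** n_nodes - 1

def factored_params(parents):
    """One free parameter per node for each setting of its parents."""
    return sum(2 ** len(p) for p in parents.values())

n = len(parents)
k = max(len(p) for p in parents.values())   # maximum fan-in
n_full = full_joint_params(n)
n_factored = factored_params(parents)
bound = n * 2 ** k                          # the O(n 2^k) bound
```

For this four-node network the gap is small, but the factored count grows linearly in n for fixed fan-in while the full table grows exponentially.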

6
Inference
Pr(S=1|W=1) = Pr(S=1,W=1)/Pr(W=1) = 0.2781/0.6471 = 0.430
Pr(R=1|W=1) = Pr(R=1,W=1)/Pr(W=1) = 0.4581/0.6471 = 0.708
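These posteriors can be reproduced by brute-force enumeration of the joint. The slides do not list the conditional probability tables, so the values below are an assumption: they are the ones commonly used with this sprinkler example, chosen here because they reproduce the numbers above.

```python
from itertools import product

# Assumed CPTs (not given on the slide); 1 = true, 0 = false.
P_C = {1: 0.5, 0: 0.5}
P_S1_given_C = {0: 0.5, 1: 0.1}             # Pr(S=1 | C)
P_R1_given_C = {0: 0.2, 1: 0.8}             # Pr(R=1 | C)
P_W1_given_SR = {(0, 0): 0.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.99}

def bern(p_true, value):
    """Probability of a binary value given Pr(value = 1)."""
    return p_true if value == 1 else 1.0 - p_true

def joint(c, s, r, w):
    """P(C) P(S|C) P(R|C) P(W|S,R) for one full assignment."""
    return (P_C[c] * bern(P_S1_given_C[c], s) * bern(P_R1_given_C[c], r)
            * bern(P_W1_given_SR[(s, r)], w))

def prob(**fixed):
    """Sum the joint over all assignments consistent with `fixed`."""
    total = 0.0
    for c, s, r, w in product([0, 1], repeat=4):
        assignment = {"C": c, "S": s, "R": r, "W": w}
        if all(assignment[name] == v for name, v in fixed.items()):
            total += joint(c, s, r, w)
    return total

p_w = prob(W=1)
p_s_given_w = prob(S=1, W=1) / p_w
p_r_given_w = prob(R=1, W=1) / p_w
```

Enumeration costs O(2^n) and is only practical for tiny networks.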

7
Explaining Away
S and R compete to explain W=1.
S and R are conditionally dependent given W.
Pr(S=1|R=1,W=1) = 0.1945
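The effect shows up numerically: once rain is known, the sprinkler becomes less likely. The sketch below builds the joint as a tensor. The conditional probability tables are assumed (the usual values for this example), not taken from the slides.

```python
import numpy as np

# Assumed CPTs; along every axis index 0 = false, 1 = true.
p_c = np.array([0.5, 0.5])
p_s_c = np.array([[0.5, 0.5], [0.9, 0.1]])       # p_s_c[c, s]
p_r_c = np.array([[0.8, 0.2], [0.2, 0.8]])       # p_r_c[c, r]
p_w_sr = np.zeros((2, 2, 2))                     # p_w_sr[s, r, w]
p_w_sr[:, :, 1] = [[0.0, 0.9], [0.9, 0.99]]
p_w_sr[:, :, 0] = 1.0 - p_w_sr[:, :, 1]

# joint[c, s, r, w] = P(C) P(S|C) P(R|C) P(W|S,R)
joint = np.einsum("c,cs,cr,srw->csrw", p_c, p_s_c, p_r_c, p_w_sr)

p_srw = joint.sum(axis=0)                        # marginalize out C
p_s1_given_w1 = p_srw[1, :, 1].sum() / p_srw[:, :, 1].sum()
p_s1_given_r1_w1 = p_srw[1, 1, 1] / p_srw[:, 1, 1].sum()
```

Under these assumed tables, observing R=1 in addition to W=1 lowers the posterior probability of S=1.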

8
Inference where

9
Inference
Variable elimination
  – Choosing the optimal ordering is NP-hard
  – Greedy methods work well
Computing several marginals
  – Dynamic programming avoids redundant computation
Sound familiar?
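A chain is the simplest case where elimination and reuse of intermediate results can be seen. The sketch below uses a hypothetical chain X1 -> X2 -> ... -> Xn with random tables. On a chain the left-to-right ordering is the obvious choice, whereas general graphs need the ordering heuristics mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_states = 6, 3

# Hypothetical chain with random probability tables.
prior = rng.dirichlet(np.ones(n_states))
# trans[t][i, j] = P(X_{t+2} = j | X_{t+1} = i), for t = 0 .. n_nodes - 2
trans = [rng.dirichlet(np.ones(n_states), size=n_states)
         for _ in range(n_nodes - 1)]

def all_marginals(prior, trans):
    """Eliminate variables left to right, keeping each intermediate
    result so every marginal reuses the previous one."""
    marginals = [prior]
    for T in trans:
        marginals.append(marginals[-1] @ T)      # sum_i p(i) T[i, j]
    return marginals

def last_marginal_brute_force(prior, trans):
    """Same quantity via the full joint table (exponential in n)."""
    joint = prior
    for T in trans:
        joint = joint[..., :, None] * T          # add one axis per node
    return joint.sum(axis=tuple(range(joint.ndim - 1)))
```

The first function costs O(n s^2) for s states per node, while the second builds a table with s^n entries.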

10
Bayes Balls for Conditional Independence

11
A Unifying (Re)View
Basic Model: Linear Gaussian Model (LGM)
Continuous-State LGM: FA, SPCA, PCA, LDS
Discrete-State LGM: Mixture of Gaussians, VQ, HMM

12
Basic Model
State of a system is a k-vector x (unobserved)
Output of a system is a p-vector y (observed)
Often k << p
Basic model
  – x_{t+1} = A x_t + w
  – y_t = C x_t + v
A is the k x k transition matrix
C is the p x k observation matrix
w ~ N(0, Q), v ~ N(0, R)
Noise processes are essential
Zero mean w.l.o.g.
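The generative process can be sketched as a short simulation. The initial-state distribution N(mu1, Q1) and all numeric values below are illustrative assumptions. The function name is hypothetical.

```python
import numpy as np

def simulate_lgm(A, C, Q, R, mu1, Q1, T, rng):
    """Draw states x (T x k) and outputs y (T x p) from
    x_{t+1} = A x_t + w,  y_t = C x_t + v."""
    k, p = A.shape[0], C.shape[0]
    x = np.zeros((T, k))
    y = np.zeros((T, p))
    x[0] = rng.multivariate_normal(mu1, Q1)      # assumed initial state
    for t in range(T):
        y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)
        if t + 1 < T:
            x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(k), Q)
    return x, y

rng = np.random.default_rng(0)
k, p = 2, 5                                      # k smaller than p
A = 0.95 * np.array([[0.99, -0.10], [0.10, 0.99]])   # illustrative dynamics
C = rng.standard_normal((p, k))
x, y = simulate_lgm(A, C, np.eye(k), 0.1 * np.eye(p),
                    np.zeros(k), np.eye(k), T=50, rng=rng)
```

Only `y` would be available in practice, since `x` is the hidden state.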

13
Degeneracy in Basic Model
Structure in Q can be moved to A and C, so w.l.o.g. Q = I
R cannot be restricted as y_t are observed
Components of x can be reordered arbitrarily; ordering is based on norms of columns of C
x_1 ~ N(µ_1, Q_1)
A and C are assumed to have rank k
Q, R, Q_1 are assumed to be full rank
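One way to see why Q = I loses no generality is to whiten the state noise with a change of variables x' = M x. The sketch below assumes a full-rank Q and uses its eigendecomposition. The function name and test matrices are hypothetical.

```python
import numpy as np

def whiten_state_noise(A, C, Q):
    """Return (A2, C2, M) describing an equivalent model with identity
    state noise. With Q = E diag(lam) E^T and M = diag(lam)^(-1/2) E^T,
    the new state is x' = M x, so A2 = M A M^-1 and C2 = C M^-1."""
    lam, E = np.linalg.eigh(Q)                   # requires full-rank Q
    M = np.diag(lam ** -0.5) @ E.T
    M_inv = E @ np.diag(lam ** 0.5)
    return M @ A @ M_inv, C @ M_inv, M

rng = np.random.default_rng(1)
k, p = 3, 4
A = rng.standard_normal((k, k))
C = rng.standard_normal((p, k))
B = rng.standard_normal((k, k))
Q = B @ B.T + k * np.eye(k)                      # arbitrary full-rank covariance
A2, C2, M = whiten_state_noise(A, C, Q)
```

The transformed noise M w has covariance M Q M^T = I, and the outputs are unchanged because C2 x' = C x. The initial state transforms the same way, to mean M µ_1 and covariance M Q_1 M^T.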

14
Probability Computation
P(x_{t+1} | x_t) = N(A x_t, Q; x_{t+1})
P(y_t | x_t) = N(C x_t, R; y_t)
P({x_1,..,x_T}, {y_1,..,y_T}) = P(x_1) ∏_{t=1}^{T-1} P(x_{t+1} | x_t) ∏_{t=1}^{T} P(y_t | x_t)
Negative log probability
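Taking the negative log of the joint turns the product into a sum of Gaussian terms. The sketch below evaluates that sum directly. It assumes x_1 ~ N(mu1, Q1) as the initial-state distribution, and the function names are hypothetical.

```python
import numpy as np

def neg_log_gauss(x, mean, cov):
    """-log N(mean, cov; x) for a single vector x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return 0.5 * (quad + logdet + len(x) * np.log(2 * np.pi))

def neg_log_joint(x, y, A, C, Q, R, mu1, Q1):
    """-log P({x_1..x_T}, {y_1..y_T}) for states x (T x k)
    and outputs y (T x p)."""
    T = x.shape[0]
    total = neg_log_gauss(x[0], mu1, Q1)         # initial state
    for t in range(T - 1):
        total += neg_log_gauss(x[t + 1], A @ x[t], Q)   # transitions
    for t in range(T):
        total += neg_log_gauss(y[t], C @ x[t], R)       # observations
    return total
```

Each term is a quadratic form plus a log-determinant, so the whole expression is quadratic in the states and outputs.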

15
Inference
Given model parameters {A, C, Q, R, µ_1, Q_1}
Given observations y
What can be inferred about hidden states x?
Total likelihood
Filtering: P(x(t) | {y(1),...,y(t)})
Smoothing: P(x(t) | {y(1),...,y(T)})
Partial smoothing: P(x(t) | {y(1),...,y(t+t')})
Partial prediction: P(x(t) | {y(1),...,y(t-t')})
These are intermediate values of recursive methods for computing total likelihood.
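For the linear-Gaussian model, filtering and the total likelihood come out of the same forward recursion, the Kalman filter. The sketch below is a minimal version under the parameters listed above. The function name and array layout are assumptions.

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu1, Q1):
    """Return filtered means (T x k), covariances (T x k x k) and
    log P(y_1..y_T) for outputs y (T x p)."""
    T, p = y.shape
    k = A.shape[0]
    means = np.zeros((T, k))
    covs = np.zeros((T, k, k))
    loglik = 0.0
    m_pred, P_pred = mu1, Q1                     # P(x_1) before seeing y_1
    for t in range(T):
        # Predictive distribution of y_t given y_1..y_{t-1}
        S = C @ P_pred @ C.T + R
        resid = y[t] - C @ m_pred
        _, logdet = np.linalg.slogdet(S)
        loglik -= 0.5 * (resid @ np.linalg.solve(S, resid)
                         + logdet + p * np.log(2 * np.pi))
        # Measurement update: P(x_t | y_1..y_t)
        K = np.linalg.solve(S, C @ P_pred).T     # gain P_pred C^T S^-1
        means[t] = m_pred + K @ resid
        covs[t] = P_pred - K @ C @ P_pred
        # Time update: P(x_{t+1} | y_1..y_t)
        m_pred = A @ means[t]
        P_pred = A @ covs[t] @ A.T + Q
    return means, covs, loglik
```

The log-likelihood accumulates one predictive term per time step, which is the sense in which the filtered quantities are intermediate values of the likelihood computation.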

16
Learning
Unknown parameters {A, C, Q, R, µ_1, Q_1}
Given observations y
Log-likelihood
F(Q, θ) – free energy

17
EM algorithm
Alternate between maximizing F(Q, θ) w.r.t. Q and θ.
F = L at the beginning of the M-step
E-step does not change θ
Therefore, likelihood does not decrease.
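The slide formulas were lost in extraction. The standard form of the bound behind this argument is sketched below. Here Q denotes a distribution over the hidden states x (not the state-noise covariance), and L is the log-likelihood.

```latex
\begin{align}
\mathcal{L}(\theta) &= \log P(y \mid \theta) = \log \int P(x, y \mid \theta)\, dx \\
  &\ge \int Q(x) \log \frac{P(x, y \mid \theta)}{Q(x)}\, dx = F(Q, \theta) \\
F(Q, \theta) &= \mathcal{L}(\theta) - \mathrm{KL}\big(Q(x) \,\|\, P(x \mid y, \theta)\big) \\
\text{E-step:}\quad Q_{k+1} &= \arg\max_{Q} F(Q, \theta_k) = P(x \mid y, \theta_k) \\
\text{M-step:}\quad \theta_{k+1} &= \arg\max_{\theta} F(Q_{k+1}, \theta) \\
\mathcal{L}(\theta_{k+1}) &\ge F(Q_{k+1}, \theta_{k+1}) \ge F(Q_{k+1}, \theta_k) = \mathcal{L}(\theta_k)
\end{align}
```

The last line is the monotonicity claim: the E-step closes the gap between F and L, and the M-step can only raise F.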

18
Continuous-State LGM
Static Data Modeling
  – No temporal dependence
  – Factor analysis
  – SPCA
  – PCA
Time-series Modeling
  – Time ordering of data crucial
  – LDS (Kalman filter models)

19
Static Data Modelling
A = 0
x = w
y = C x + v
x_1 ~ N(0, Q)
y ~ N(0, C Q C' + R)
Degeneracy in model
Learning: EM
  – R restricted
Inference
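Inference in the static model is ordinary Gaussian conditioning: x and y are jointly Gaussian, so the posterior over x given y has a closed form. This is a minimal sketch under the definitions above. The function name is hypothetical.

```python
import numpy as np

def static_posterior(y, C, Q, R):
    """Posterior mean and covariance of x given y for
    x ~ N(0, Q), y = C x + v, v ~ N(0, R)."""
    S = C @ Q @ C.T + R                          # marginal covariance of y
    beta = np.linalg.solve(S, C @ Q).T           # Q C^T S^-1
    return beta @ y, Q - beta @ C @ Q
```

The posterior covariance does not depend on the observed y, only on the model parameters.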

20
Factor Analysis
Restrict R to be diagonal; Q = I
x – factors
C – factor loading matrix
R – uniqueness
Learning – EM, quasi-Newton optimization
Inference

21
SPCA
R = εI
ε – global noise level
Columns of C span the principal subspace.
Learning – EM algorithm
Inference

22
PCA
R = lim_{ε→0} εI
Learning
  – Diagonalize sample covariance of data
  – Leading k eigenvalues and eigenvectors define C
  – EM determines leading eigenvectors without diagonalization
Inference
  – Noise becomes infinitesimal
  – Posterior collapses to a single point
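Both learning routes can be sketched side by side. The EM version below is the zero-noise iteration, written under the assumption that data points are stored as columns. It recovers a basis for the principal subspace rather than the individual eigenvectors. Function names are hypothetical.

```python
import numpy as np

def pca_eig(Y, k):
    """Leading k eigenvalues and eigenvectors of the sample covariance.
    Y is p x N with one data point per column."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    cov = Yc @ Yc.T / Yc.shape[1]
    vals, vecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]

def pca_em(Y, k, n_iter=200, seed=0):
    """EM in the zero-noise limit. Returns a p x k matrix whose columns
    span the principal subspace; no covariance matrix is diagonalized."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C = np.random.default_rng(seed).standard_normal((Y.shape[0], k))
    for _ in range(n_iter):
        X = np.linalg.solve(C.T @ C, C.T @ Yc)   # E-step: project the data
        C = Yc @ X.T @ np.linalg.inv(X @ X.T)    # M-step: refit C
    return C
```

Each EM iteration only involves k x k inverses and products with the data, which can be cheaper than a full p x p eigendecomposition when k is much smaller than p.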

23
Linear Dynamical Systems
Inference – Kalman filter
Smoothing – RTS recursions
Learning – EM algorithm
  – C known – Shumway and Stoffer, 1982
  – All unknown – Ghahramani and Hinton, 1995
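The RTS (Rauch-Tung-Striebel) recursions run backward over the filtered estimates. The sketch below assumes the filtered means and covariances are already available from a forward Kalman pass. The function name and array layout are assumptions.

```python
import numpy as np

def rts_smoother(means, covs, A, Q):
    """Backward pass: turn filtered P(x_t | y_1..y_t) into smoothed
    P(x_t | y_1..y_T). `means` is T x k, `covs` is T x k x k."""
    T = means.shape[0]
    s_means, s_covs = means.copy(), covs.copy()
    for t in range(T - 2, -1, -1):
        P_pred = A @ covs[t] @ A.T + Q           # cov of x_{t+1} given y_1..y_t
        J = np.linalg.solve(P_pred, A @ covs[t]).T   # covs[t] A^T P_pred^-1
        s_means[t] = means[t] + J @ (s_means[t + 1] - A @ means[t])
        s_covs[t] = covs[t] + J @ (s_covs[t + 1] - P_pred) @ J.T
    return s_means, s_covs
```

At the final time step the smoothed and filtered estimates coincide, so the recursion starts from the last filtered values.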

24
Discrete-State LGM
x_{t+1} = WTA[A x_t + w]
y_t = C x_t + v
x_1 = WTA[N(µ_1, Q_1)]
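WTA is the winner-take-all nonlinearity: it maps a vector to the unit vector marking its largest entry, so the state is always one of e_1..e_k. This is a minimal sketch of the generative process. The function names are hypothetical.

```python
import numpy as np

def wta(v):
    """Winner-take-all: unit vector with a 1 at the largest entry of v."""
    e = np.zeros_like(v, dtype=float)
    e[np.argmax(v)] = 1.0
    return e

def simulate_discrete_lgm(A, C, Q, R, mu1, Q1, T, rng):
    """Draw one-hot states x (T x k) and outputs y (T x p)."""
    k, p = A.shape[0], C.shape[0]
    x = np.zeros((T, k))
    y = np.zeros((T, p))
    x[0] = wta(rng.multivariate_normal(mu1, Q1))
    for t in range(T):
        y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)
        if t + 1 < T:
            x[t + 1] = wta(A @ x[t] + rng.multivariate_normal(np.zeros(k), Q))
    return x, y
```

Because x_t is a unit vector, C x_t simply selects one column of C as the mean of y_t.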

25
Discrete-State LGM
Static Data Modeling
  – Mixture of Gaussians
  – VQ
Time-series Modeling
  – HMM

26
Static Data Modelling
A = 0
x = WTA[w]
w ~ N(µ, Q)
y = C x + v
π_j = P(x = e_j)
Nonzero µ for nonuniform π_j
y ~ N(C_j, R) given x = e_j
C_j – jth column of C

27
Mixture of Gaussians
Mixing coefficients of clusters – π_j
Means – columns C_j
Variance – R
Learning: EM (corresponds to ML competitive learning)
Inference
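EM for this model alternates between computing responsibilities P(x = e_j | y) and refitting the mixing coefficients, means and the shared covariance R. The sketch below is one way to write it, assuming data points stored as rows. The initialization and the function name are illustrative choices.

```python
import numpy as np

def mog_em(Y, k, n_iter=50, seed=0):
    """EM for a mixture of k Gaussians sharing one covariance R.
    Y is N x p. Returns (pi, C, R, log-likelihood per iteration),
    where the columns of C (p x k) are the cluster means."""
    N, p = Y.shape
    rng = np.random.default_rng(seed)
    C = Y[rng.choice(N, size=k, replace=False)].T.copy()  # init from data
    R = np.cov(Y.T) + 1e-6 * np.eye(p)
    pi = np.full(k, 1.0 / k)
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities P(x = e_j | y_n)
        diff = Y[:, None, :] - C.T[None, :, :]            # N x k x p
        maha = np.einsum("nkp,pq,nkq->nk", diff, np.linalg.inv(R), diff)
        _, logdet = np.linalg.slogdet(R)
        log_joint = np.log(pi) - 0.5 * (maha + logdet + p * np.log(2 * np.pi))
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])
        history.append(log_norm.sum())                    # log-likelihood
        # M-step: refit mixing coefficients, means, shared covariance
        Nk = resp.sum(axis=0)
        pi = Nk / N
        C = (Y.T @ resp) / Nk
        diff = Y[:, None, :] - C.T[None, :, :]
        R = np.einsum("nk,nkp,nkq->pq", resp, diff, diff) / N
    return pi, C, R, np.array(history)
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a useful sanity check on any EM implementation.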

28
Vector Quantization
Observation noise becomes infinitesimal
Inference problem solved by 1NN rule
  – Euclidean distance when R is a multiple of the identity
  – Mahalanobis distance for unscaled R
Posterior collapses to closest cluster
Learning with EM = batch version of k-means
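With infinitesimal noise the soft responsibilities become hard assignments, and the EM updates reduce to batch k-means. This is a minimal sketch with Euclidean distance. The function names and the initialization from random data points are assumptions.

```python
import numpy as np

def nearest_codeword(Y, C):
    """1-nearest-neighbour rule with Euclidean distance.
    Y is N x p, C is p x k (columns are cluster centres)."""
    d2 = ((Y[:, None, :] - C.T[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(Y, k, n_iter=20, seed=0):
    """Batch k-means: hard-assignment analogue of the EM updates.
    Y must be a float array."""
    rng = np.random.default_rng(seed)
    C = Y[rng.choice(len(Y), size=k, replace=False)].T.copy()
    for _ in range(n_iter):
        labels = nearest_codeword(Y, C)          # hard "E-step"
        for j in range(k):                       # "M-step": cluster means
            if np.any(labels == j):
                C[:, j] = Y[labels == j].mean(axis=0)
    return C, nearest_codeword(Y, C)
```

Clusters that receive no points keep their previous centre, which is one simple way to handle empty clusters.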

29
Time-series modelling

30
HMM
Transition matrix T
T_{i,j} = P(x_{t+1} = e_j | x_t = e_i)
For every T, there exist A and Q
Filtering: forward recursions
Smoothing: forward-backward algorithm
Learning: EM (called Baum-Welch re-estimation)
MAP state sequences – Viterbi
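The forward recursion and the Viterbi algorithm share the same structure, with a sum replaced by a max. The sketch below takes the per-step observation likelihoods as an input table `lik`. In the model above these would be Gaussian densities N(C_j, R; y_t), but that step is left out here. Function names are hypothetical.

```python
import numpy as np

def hmm_forward(T_mat, pi1, lik):
    """Scaled forward recursion.
    T_mat[i, j] = P(x_{t+1} = e_j | x_t = e_i); pi1 is the initial
    state distribution; lik[t, j] = P(y_t | x_t = e_j).
    Returns filtered posteriors (T x k) and log P(y_1..y_T)."""
    n_steps, k = lik.shape
    alpha = np.zeros((n_steps, k))
    loglik = 0.0
    pred = pi1
    for t in range(n_steps):
        a = pred * lik[t]
        norm = a.sum()
        alpha[t] = a / norm                      # P(x_t | y_1..y_t)
        loglik += np.log(norm)
        pred = alpha[t] @ T_mat                  # P(x_{t+1} | y_1..y_t)
    return alpha, loglik

def viterbi(T_mat, pi1, lik):
    """Most probable state sequence by max-product recursion."""
    n_steps, k = lik.shape
    log_T = np.log(T_mat)
    delta = np.log(pi1) + np.log(lik[0])
    back = np.zeros((n_steps, k), dtype=int)
    for t in range(1, n_steps):
        scores = delta[:, None] + log_T          # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(lik[t])
    path = [int(delta.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Zero entries in `T_mat`, `pi1` or `lik` produce `-inf` in log space, which NumPy reports with a warning but which the max still handles.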
