
1
Introduction to Graphical Models Brookes Vision Lab Reading Group

2
Graphical Models
Build a complex system from simpler parts.
The system should be consistent.
Parts are combined using probability.
Undirected – Markov random fields
Directed – Bayesian networks

3
Overview
Representation
Inference
Linear Gaussian models
Approximate inference
Learning

4
Representation
Causality: the sprinkler causes wet grass.

5
Conditional Independence
A node is independent of its ancestors given its parents:
P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)
           = P(C) P(S|C) P(R|C) P(W|S,R)
Space required for n binary nodes:
– O(2^n) without factorization
– O(n·2^k) with factorization, k = maximum fan-in

6
Inference
Pr(S=1 | W=1) = Pr(S=1, W=1) / Pr(W=1) = 0.2781 / 0.6471 = 0.430
Pr(R=1 | W=1) = Pr(R=1, W=1) / Pr(W=1) = 0.4581 / 0.6471 = 0.708

7
Explaining Away
S and R compete to explain W=1, so S and R are conditionally dependent given W:
Pr(S=1 | R=1, W=1) = 0.1945, much less than Pr(S=1 | W=1) = 0.430
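The numbers on the last two slides can be reproduced by brute-force enumeration over the factored joint. A minimal sketch, assuming the standard conditional probability tables that yield exactly these values (P(C=1)=0.5; P(S=1|C)=0.5, 0.1; P(R=1|C)=0.2, 0.8; P(W=1|S,R)=0, 0.9, 0.9, 0.99):

```python
import itertools

# CPTs for the sprinkler network (assumed standard values; they reproduce the slides' numbers).
# C = cloudy, S = sprinkler, R = rain, W = wet grass.
P_C1 = 0.5
P_S_given_C = {0: 0.5, 1: 0.1}          # P(S=1 | C=c)
P_R_given_C = {0: 0.2, 1: 0.8}          # P(R=1 | C=c)
P_W_given_SR = {(0, 0): 0.0, (0, 1): 0.9,
                (1, 0): 0.9, (1, 1): 0.99}  # P(W=1 | S=s, R=r)

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R) -- the factorized form."""
    p = P_C1 if c else 1 - P_C1
    p *= P_S_given_C[c] if s else 1 - P_S_given_C[c]
    p *= P_R_given_C[c] if r else 1 - P_R_given_C[c]
    p *= P_W_given_SR[(s, r)] if w else 1 - P_W_given_SR[(s, r)]
    return p

def prob(fixed):
    """Marginal probability of a partial assignment, by enumerating all nodes."""
    total = 0.0
    for c, s, r, w in itertools.product([0, 1], repeat=4):
        assign = {'C': c, 'S': s, 'R': r, 'W': w}
        if all(assign[k] == v for k, v in fixed.items()):
            total += joint(c, s, r, w)
    return total

p_w = prob({'W': 1})                                  # 0.6471
p_s_given_w = prob({'S': 1, 'W': 1}) / p_w            # 0.2781/0.6471 = 0.430
p_r_given_w = prob({'R': 1, 'W': 1}) / p_w            # 0.4581/0.6471 = 0.708
# Explaining away: once R=1 is known, S=1 becomes much less likely.
p_s_given_rw = prob({'S': 1, 'R': 1, 'W': 1}) / prob({'R': 1, 'W': 1})  # 0.1945
```

Note how conditioning on R=1 drops the posterior on S from 0.430 to 0.1945: rain already explains the wet grass.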

8
Inference
Pr(W=1) = Σ_{c,s,r} Pr(C=c, S=s, R=r, W=1),
where the sum marginalizes out the unobserved variables.

9
Inference
Variable elimination
Choosing an optimal elimination ordering is NP-hard; greedy methods work well.
Computing several marginals: dynamic programming avoids redundant computation.
Sound familiar?
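Variable elimination can be illustrated on the sprinkler network: summing out C first produces a small intermediate factor over (S, R), so the query never touches the full joint. A sketch, using the CPT values that reproduce the slides' numbers:

```python
# CPTs (assumed standard sprinkler values, as on the inference slides).
P_C1 = 0.5
PS = {0: 0.5, 1: 0.1}    # P(S=1 | C=c)
PR = {0: 0.2, 1: 0.8}    # P(R=1 | C=c)
PW = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}  # P(W=1 | s, r)

# Eliminate C: g(s, r) = sum_c P(c) P(s|c) P(r|c).
g = {}
for s in (0, 1):
    for r in (0, 1):
        g[(s, r)] = sum(
            (P_C1 if c else 1 - P_C1)
            * (PS[c] if s else 1 - PS[c])
            * (PR[c] if r else 1 - PR[c])
            for c in (0, 1))

# P(W=1) = sum_{s,r} g(s, r) P(W=1 | s, r): only factors over (S, R) remain.
p_w1 = sum(g[(s, r)] * PW[(s, r)] for s in (0, 1) for r in (0, 1))  # 0.6471
```

The intermediate factor g has 4 entries instead of the 16 entries of the joint; on larger networks this gap is what makes elimination worthwhile.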

10
Bayes Balls for Conditional Independence

11
A Unifying (Re)View
The basic Linear Gaussian Model (LGM) specializes to:
Continuous-state LGM – FA, SPCA, PCA, LDS
Discrete-state LGM – mixture of Gaussians, VQ, HMM

12
Basic Model
The state of the system is a k-vector x (unobserved); the output is a p-vector y (observed). Often k << p.
Basic model:
x_{t+1} = A x_t + w_t
y_t = C x_t + v_t
A is the k x k transition matrix; C is the p x k observation matrix.
The noise processes w_t ~ N(0, Q) and v_t ~ N(0, R) are essential; they are zero-mean w.l.o.g.
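The basic model is easy to simulate; this is a minimal sketch with illustrative parameters (the particular A, C, Q, R below are not from the slides, and A is chosen stable so the state stays bounded):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, T = 2, 4, 100                      # state dim, output dim, sequence length

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # k x k transition matrix (illustrative, stable)
C = rng.standard_normal((p, k))          # p x k observation matrix
Q = np.eye(k)                            # state noise covariance (I w.l.o.g., see next slide)
R = 0.1 * np.eye(p)                      # observation noise covariance

x = np.zeros((T, k))
y = np.zeros((T, p))
x[0] = rng.multivariate_normal(np.zeros(k), Q)                  # x_1 ~ N(mu_1, Q_1)
for t in range(T):
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)   # y_t = C x_t + v_t
    if t + 1 < T:
        x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(k), Q)  # x_{t+1} = A x_t + w_t
```

Only y would be available to a learner; x is the hidden trajectory.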

13
Degeneracy in Basic Model
Structure in Q can be moved into A and C, so Q = I w.l.o.g.
R cannot be restricted this way, since the y_t are observed.
Components of x can be reordered arbitrarily; an ordering is fixed by the norms of the columns of C.
x_1 ~ N(µ_1, Q_1)
A and C are assumed to have rank k; Q, R, Q_1 are assumed full rank.

14
Probability Computation
P(x_{t+1} | x_t) = N(x_{t+1}; A x_t, Q)
P(y_t | x_t) = N(y_t; C x_t, R)
P({x_1,...,x_T}, {y_1,...,y_T}) = P(x_1) Π_{t=1}^{T-1} P(x_{t+1} | x_t) Π_{t=1}^{T} P(y_t | x_t)
The negative log probability is therefore a sum of Mahalanobis-distance terms:
-log P = ½ Σ_{t=1}^{T-1} (x_{t+1} - A x_t)' Q^{-1} (x_{t+1} - A x_t) + ½ Σ_{t=1}^{T} (y_t - C x_t)' R^{-1} (y_t - C x_t) + ½ (x_1 - µ_1)' Q_1^{-1} (x_1 - µ_1) + const.

15
Inference
Given the model parameters {A, C, Q, R, µ_1, Q_1} and observations y, what can be inferred about the hidden states x?
Total likelihood
Filtering: P(x_t | y_1,...,y_t)
Smoothing: P(x_t | y_1,...,y_T)
Partial smoothing: P(x_t | y_1,...,y_{t+t'})
Partial prediction: P(x_t | y_1,...,y_{t-t'})
These arise as intermediate values of the recursive methods for computing the total likelihood.

16
Learning
The parameters {A, C, Q, R, µ_1, Q_1} are unknown; the observations y are given.
Maximize the log-likelihood L(Θ) via the free energy F(Q, Θ), a lower bound on L, where Q here is a distribution over the hidden states.

17
EM Algorithm
Alternate between maximizing F(Q, Θ) w.r.t. Q (E-step) and w.r.t. Θ (M-step).
F = L at the start of each M-step, and the E-step does not change Θ; therefore the likelihood never decreases.

18
Continuous-State LGM
Static data modeling – no temporal dependence: factor analysis, SPCA, PCA
Time-series modeling – time ordering of the data is crucial: LDS (Kalman filter models)

19
Static Data Modelling
A = 0, so x = w and y = C x + v
With x ~ N(0, Q), the marginal is y ~ N(0, C Q C' + R).
Degeneracy in the model: for learning with EM, R must be restricted.
Inference: the posterior is Gaussian, P(x | y) = N(βy, Q - βCQ) with β = QC'(CQC' + R)^{-1}.

20
Factor Analysis
Q = I; R is restricted to be diagonal.
x – the factors; C – the factor loading matrix; R – the uniquenesses
Learning: EM or quasi-Newton optimization
Inference: posterior over the factors given y
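The EM updates for factor analysis can be sketched as follows. With Q = I the posterior is q(x | y) = N(βy, I − βC), β = C'(CC' + R)^{-1}; the M-step re-estimates C and the diagonal R. The synthetic data, initialization, and iteration count below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, k = 1000, 5, 2

# Synthetic data from a true FA model: y = C x + v, v with diagonal covariance.
C_true = rng.standard_normal((p, k))
X = rng.standard_normal((N, k))
Y = X @ C_true.T + rng.standard_normal((N, p)) * np.sqrt([0.1, 0.2, 0.3, 0.1, 0.2])
Y -= Y.mean(axis=0)
S = Y.T @ Y / N                          # sample covariance

def loglik(C, R):
    """Log-likelihood of zero-mean Gaussian data with covariance C C' + diag(R)."""
    Sigma = C @ C.T + np.diag(R)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * N * (p * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(Sigma, S)))

C = rng.standard_normal((p, k))          # random initial loadings
R = np.ones(p)                           # diagonal uniquenesses, stored as a vector
ll0 = loglik(C, R)
for _ in range(50):
    # E-step: posterior q(x | y) = N(beta y, I - beta C).
    beta = C.T @ np.linalg.inv(C @ C.T + np.diag(R))   # k x p
    Ex = Y @ beta.T                                     # E[x | y_n] for each datum
    Exx = N * (np.eye(k) - beta @ C) + Ex.T @ Ex        # sum_n E[x x' | y_n]
    # M-step: re-estimate loadings and diagonal noise.
    C = (Y.T @ Ex) @ np.linalg.inv(Exx)
    R = np.diag(S - C @ (Ex.T @ Y) / N)
ll1 = loglik(C, R)
```

As the EM slide promises, the likelihood never decreases, so ll1 ≥ ll0.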

21
SPCA
R = εI, where ε is a global noise level.
The columns of C span the principal subspace.
Learning: EM algorithm
Inference: posterior over x given y

22
PCA
R = lim_{ε→0} εI
Learning:
– Diagonalize the sample covariance of the data
– The leading k eigenvalues and eigenvectors define C
– EM finds the leading eigenvectors without diagonalization
Inference:
– As the noise becomes infinitesimal, the posterior collapses to a single point
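The diagonalization recipe above, sketched on synthetic data (the data-generating step is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, k = 500, 5, 2

# Synthetic data with most variance in a 2-D subspace.
Y = rng.standard_normal((N, k)) @ rng.standard_normal((k, p)) * 3
Y += 0.1 * rng.standard_normal((N, p))
Y -= Y.mean(axis=0)

# PCA by diagonalizing the sample covariance.
cov = Y.T @ Y / N
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
C = eigvecs[:, order[:k]]                   # leading k eigenvectors define C
Z = Y @ C                                   # zero-noise posterior: a single point per y
```

Because eigh returns orthonormal eigenvectors, the columns of C are an orthonormal basis for the principal subspace.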

23
Linear Dynamical Systems
Inference (filtering) – Kalman filter
Smoothing – RTS recursions
Learning – EM algorithm
– C known: Shumway and Stoffer, 1982
– All parameters unknown: Ghahramani and Hinton, 1995
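The Kalman filter can be sketched as the standard predict/update recursions; the toy one-dimensional system at the end is illustrative:

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu1, Q1):
    """Filtering recursions: mean of P(x_t | y_1..y_t) for each t."""
    T, k = len(y), len(mu1)
    mu, V = mu1.copy(), Q1.copy()          # predicted mean/cov of x_1
    means = np.zeros((T, k))
    for t in range(T):
        # Measurement update: condition on y_t.
        S = C @ V @ C.T + R                # innovation covariance
        K = V @ C.T @ np.linalg.inv(S)     # Kalman gain
        mu = mu + K @ (y[t] - C @ mu)
        V = V - K @ C @ V
        means[t] = mu
        # Time update: predict x_{t+1}.
        mu = A @ mu
        V = A @ V @ A.T + Q
    return means

# Toy example: noisy observations of a nearly constant scalar state.
A = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[1.0]])
y = np.ones((50, 1))
means = kalman_filter(y, A, C, Q, R, np.zeros(1), np.eye(1))
```

With constant observations of 1.0, the filtered mean converges towards 1 as evidence accumulates.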

24
Discrete-State LGM
x_{t+1} = WTA[A x_t + w_t]
y_t = C x_t + v_t
x_1 = WTA[N(µ_1, Q_1)]
WTA is the winner-take-all nonlinearity, which maps its argument to the unit vector e_j of its largest component.

25
Discrete-State LGM
Static data modeling – mixture of Gaussians, VQ
Time-series modeling – HMM

26
Static Data Modelling
A = 0, so x = WTA[w] with w ~ N(µ, Q)
y = C x + v
π_j = P(x = e_j); a nonzero µ gives nonuniform π_j
Given x = e_j: y ~ N(C_j, R), where C_j is the jth column of C

27
Mixture of Gaussians
Mixing coefficient of cluster j – π_j
Cluster means – the columns C_j; cluster variance – R
Learning: EM (corresponds to maximum-likelihood competitive learning)
Inference: posterior responsibility of each cluster for y
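EM for a mixture of Gaussians, as a sketch. For brevity this assumes a spherical (isotropic) variance per cluster rather than a general R; the two well-separated synthetic clusters and the seeding are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two well-separated 2-D clusters.
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(5, 0.5, (100, 2))])

J, N, d = 2, *X.shape
pi = np.full(J, 1 / J)        # mixing coefficients pi_j
mu = X[[0, -1]].copy()        # cluster means (columns C_j); one seed per visible cluster
var = np.ones(J)              # spherical variance per cluster (assumption for brevity)

for _ in range(50):
    # E-step: responsibilities P(cluster j | x_n).
    d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)      # squared distances, N x J
    logp = np.log(pi) - 0.5 * d2 / var - 0.5 * d * np.log(var)
    logp -= logp.max(axis=1, keepdims=True)             # for numerical stability
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing coefficients, means, and variances.
    Nj = resp.sum(axis=0)
    pi = Nj / N
    mu = (resp.T @ X) / Nj[:, None]
    var = (resp * d2).sum(axis=0) / (d * Nj)
```

The E-step is "soft" competitive learning: each cluster claims a fractional responsibility for every point.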

28
Vector Quantization
As the observation noise becomes infinitesimal, the inference problem is solved by the 1-NN rule:
– Euclidean distance for R ∝ I
– Mahalanobis distance for general (unscaled) R
The posterior collapses onto the closest cluster.
Learning with EM = a batch version of k-means.
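The zero-noise limit above, sketched as batch k-means: a hard 1-NN E-step followed by a mean-update M-step (synthetic data and seeding are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (100, 2)),
               rng.normal(4, 0.3, (100, 2))])

centers = X[[0, -1]].copy()    # one seed from each visible cluster
for _ in range(20):
    # "E-step": hard 1-NN assignment (Euclidean distance, i.e. R proportional to I).
    d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    # "M-step": each center moves to the mean of its assigned points.
    centers = np.array([X[assign == j].mean(axis=0) for j in range(2)])
```

Compared with the mixture-of-Gaussians E-step, the posterior here has collapsed: each point belongs entirely to its closest cluster.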

29
Time-series modelling

30
HMM
Transition matrix T, with T_ij = P(x_{t+1} = e_j | x_t = e_i)
For every T there exist a corresponding A and Q.
Filtering: forward recursions
Smoothing: forward–backward algorithm
Learning: EM (Baum–Welch re-estimation)
MAP state sequences: Viterbi algorithm
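The forward recursions for the total likelihood can be sketched directly; the 2-state transition and emission matrices below are illustrative numbers, not from the slides:

```python
import numpy as np

def forward(pi, T, B, obs):
    """Forward recursions: alpha_t(i) = P(y_1..y_t, x_t = e_i).
    pi: initial state probs, T: transition matrix, B[i, o]: emission probs."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ T) * B[:, o]    # propagate, then weight by the emission
    return alpha.sum()                   # total likelihood P(y_1..y_T)

# Tiny 2-state, 2-symbol example (illustrative numbers).
pi = np.array([0.6, 0.4])
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lik = forward(pi, T, B, [0, 1, 0])
```

Each step reuses the previous alpha vector, which is exactly the dynamic programming that avoids redundant computation over the exponentially many state sequences.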
