A Unifying Review of Linear Gaussian Models


A Unifying Review of Linear Gaussian Models
Summary Presentation, 2/15/10 – Dae Il Kim
Graduate Student, Department of Computer Science
Advisor: Erik Sudderth, Ph.D.

Overview
- Introduce the basic model: the discrete-time linear dynamical system (Kalman filter)
- Some nice properties of Gaussian distributions
- Graphical model: static model (Factor Analysis, PCA, SPCA)
- Learning & inference: static model
- Graphical model: Gaussian mixtures & vector quantization
- Learning & inference: GMMs & vector quantization
- Graphical model: discrete-state dynamic model (HMMs)
- Independent Component Analysis
- Conclusion

The Basic Model
The basic model is a discrete-time linear dynamical system (Kalman filter) with additive Gaussian noise; the generative equations are written out below.
- A = k x k state transition matrix
- C = p x k observation / generative matrix
Variations of this model produce: Factor Analysis, Principal Component Analysis, Mixtures of Gaussians, Vector Quantization, Independent Component Analysis, and Hidden Markov Models.
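For reference, the standard form of this generative model (using the A and C defined above, and the noise covariances Q and R that appear on later slides) is:

    x_{t+1} = A x_t + w_t,    w_t ~ N(0, Q)    (state evolution)
    y_t     = C x_t + v_t,    v_t ~ N(0, R)    (observation)

Each variant in the list above is obtained by constraining A, C, Q, R and/or the form of the hidden state.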

Nice Properties of Gaussians
- Conditional independence / Markov property
- Inference in these models (see the identities sketched below)
- Learning via Expectation Maximization (EM)
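The key identity behind tractable inference (a standard fact, stated here for reference rather than taken from the slide): if x and y are jointly Gaussian,

    [x; y] ~ N( [a; b], [[S_xx, S_xy]; [S_yx, S_yy]] )

then every marginal and conditional is again Gaussian, for example

    x | y ~ N( a + S_xy S_yy^{-1} (y - b),  S_xx - S_xy S_yy^{-1} S_yx )

This closure under marginalization and conditioning is what keeps filtering, smoothing, and the E-step of EM in closed form for all the models that follow.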

Graphical Model for Static Models
Generative model with additive Gaussian noise (a sampling sketch follows below).
Intuition: white noise generates a spherical ball (Q = I) of density in the k-dimensional latent space. This ball is stretched and rotated into the p-dimensional observation space by the matrix C, where it looks like a k-dimensional pancake. That pancake is then convolved with the covariance density of v, described by R, to give the final model for y.
- Factor Analysis: Q = I and R diagonal
- SPCA: Q = I and R = αI
- PCA: Q = I and R = lim_{ε→0} εI
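A minimal Python sketch of this static generative process (my own illustration, not code from the presentation; the dimensions and noise levels are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    k, p = 2, 5                                  # latent and observed dimensions

    C = rng.standard_normal((p, k))              # observation / generative matrix
    R = np.diag(rng.uniform(0.1, 0.5, size=p))   # diagonal R -> factor analysis

    # x ~ N(0, I): the spherical ball of density in latent space
    x = rng.standard_normal(k)
    # y = C x + v, v ~ N(0, R): stretch/rotate by C, then add observation noise
    y = C @ x + rng.multivariate_normal(np.zeros(p), R)

    # SPCA would instead use R = alpha * np.eye(p); PCA is the limit alpha -> 0.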

Example of the generative process for PCA
[Figure after Bishop (2006): a 1-dimensional latent variable z is mapped into a 2-dimensional observation space x, illustrating the latent density, the conditional density in observation space, and the resulting marginal distribution p(x).]

Learning & Inference: Static Models
Analytically integrating over the joint, we obtain the marginal distribution of y. We can then calculate the posterior using Bayes' rule; the posterior is again Gaussian, with a gain matrix β (a standard form is sketched below).
Note: filtering and smoothing reduce to the same problem in the static model, since the time dependence is gone. We want the posterior P(x|y) over a single hidden state given a single observation. Inference can be performed simply by a linear matrix projection, and the result is also Gaussian.
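For reference, in the standard derivation for the static model (x ~ N(0, Q), y = C x + v, v ~ N(0, R)) these quantities take the form (my reconstruction; the symbols follow the slides):

    p(y)     = N( 0, C Q C^T + R )
    p(x | y) = N( β y,  Q - β C Q ),    where  β = Q C^T (C Q C^T + R)^{-1}

With Q = I, as assumed for factor analysis, SPCA, and PCA, this simplifies to β = C^T (C C^T + R)^{-1} with posterior covariance I - β C.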

Graphical Model: Gaussian Mixture Models & Vector Quantization
Generative model with additive Gaussian noise and a winner-take-all (WTA) nonlinearity on the state (a sampling sketch follows below).
Note: each state x is generated independently according to a fixed discrete probability histogram controlled by the mean and covariance of w.
WTA[x] = a new vector with unity in the position of the largest coordinate of the input and zeros in all other positions, e.g. [0 0 1].
This model becomes a vector quantization model in the zero-noise limit R = lim_{ε→0} εI (the same limit that turns SPCA into PCA).
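A small Python sketch of this generative view (my own illustration; it makes explicit that the columns of C act as the cluster means):

    import numpy as np

    rng = np.random.default_rng(1)
    k, p = 3, 2                                  # number of clusters, observed dimension

    mu = rng.standard_normal(k)                  # mean of the state noise w
    Q = np.eye(k)                                # covariance of w
    C = rng.standard_normal((p, k))              # column j of C is the mean of cluster j
    R = 0.05 * np.eye(p)                         # observation noise

    def wta(v):
        """Winner-take-all: unit vector with a 1 at the largest coordinate."""
        e = np.zeros_like(v)
        e[np.argmax(v)] = 1.0
        return e

    x = wta(rng.multivariate_normal(mu, Q))      # discrete state, e.g. [0, 0, 1]
    y = C @ x + rng.multivariate_normal(np.zeros(p), R)   # Gaussian around that column of C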

Learning & Inference: GMMs & Quantization
Computing the likelihood of the data is straightforward. π_j is the probability assigned by the Gaussian N(μ, Q) to the region of k-space in which the jth coordinate is larger than all the others. Calculating the posterior responsibility of each cluster is analogous to the E-step in this model.
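A hedged sketch of that responsibility computation in Python (the standard GMM E-step; the mixing weights pi, cluster means, and shared covariance R are assumed to come from the generative view above):

    import numpy as np
    from scipy.stats import multivariate_normal

    def responsibilities(Y, pi, means, R):
        """E-step: posterior probability of each cluster j for every data point in Y (n x p)."""
        n, k = Y.shape[0], len(pi)
        resp = np.zeros((n, k))
        for j in range(k):
            resp[:, j] = pi[j] * multivariate_normal.pdf(Y, mean=means[j], cov=R)
        resp /= resp.sum(axis=1, keepdims=True)  # normalize over clusters
        return resp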

Gaussian Mixture Models
Joint distribution p(y, x) and marginal distribution p(y) (written out below).
π_j is, as before, the probability assigned by the Gaussian N(μ, Q) to the region of k-space in which the jth coordinate is larger than all the others.
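Writing c_j for the jth column of C, a reasonable reconstruction of these distributions under the WTA generative model above is:

    p(y | x = e_j) = N( y; c_j, R )
    p(y)           = Σ_j π_j N( y; c_j, R ),    π_j = Pr[ WTA(N(μ, Q)) = e_j ]

i.e. the familiar mixture-of-Gaussians likelihood, with mixing weights determined by μ and Q.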

Graphical Model: Discrete-State Dynamic Models
Generative model with additive Gaussian noise (a sampling sketch follows below).
Intuition: as before, any point in state-space is surrounded by a ball (or ovoid) of density defined by Q, which is stretched by C into a pancake in observation space and convolved with the observation noise covariance R. However, unlike the static case, where the ball was centered at the origin of state-space, the center of the ball now shifts from time step to time step. This flow is governed by the eigenvalues and eigenvectors of the matrix A. Once we move to a new point, we center the ball on that point, pick a new state, flow to the next point, and apply noise again.
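A short Python sketch of the dynamic generative loop (my own illustration; applying the WTA nonlinearity to the state, as in the mixture model above, gives the discrete-state / HMM case, while dropping it gives the ordinary Kalman-filter model):

    import numpy as np

    rng = np.random.default_rng(2)
    k, p, T = 3, 2, 50

    A = 0.9 * np.eye(k)                          # state transition (assumed stable)
    C = rng.standard_normal((p, k))
    Q, R = np.eye(k), 0.1 * np.eye(p)

    def wta(v):
        e = np.zeros_like(v)
        e[np.argmax(v)] = 1.0
        return e

    x = np.zeros(k)
    Y = np.zeros((T, p))
    for t in range(T):
        # flow by A, add state noise, quantize (WTA) for the discrete-state case
        x = wta(A @ x + rng.multivariate_normal(np.zeros(k), Q))
        # emit a noisy observation around C x
        Y[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)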

Independent Component Analysis
ICA can be seen either as a linear generative model with non-Gaussian priors on the hidden variables, or as a nonlinear generative model with Gaussian priors on the hidden variables, where g(.) is a general nonlinearity that is invertible and differentiable.
The posterior density p(x|y) is a delta function at x = C^{-1} y. The ICA algorithm can therefore be defined by learning the unmixing or recognition weights W rather than the generative mixing weights C. Note that any generative nonlinearity g(.) results in a non-Gaussian prior p(x), which in turn results in a nonlinear f(x) in the maximum likelihood rule. The gradient learning rule that increases the likelihood is sketched below.
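One standard form of that rule (the natural/relative-gradient maximum-likelihood update; the notation here is my reconstruction and may differ from the slide's) is:

    ΔW ∝ ( I + f(x̂) x̂^T ) W,    with  x̂ = W y  and  f(x) = ∂ log p(x) / ∂x  (applied elementwise)

For a super-Gaussian prior such as p(x_i) ∝ 1/cosh(x_i), f(x) = -tanh(x), giving the familiar update ΔW ∝ (I - tanh(x̂) x̂^T) W.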

Conclusion
Many more potential models!