Latent Variable Models Christopher M. Bishop

1. Density Modeling
A standard approach: parametric models with a number of adaptive parameters, most commonly the Gaussian distribution, fitted by maximizing the log-likelihood.
Limitations:
- Too flexible: a full-covariance Gaussian has an excessive number of parameters in high dimensions.
- Not flexible enough: a single Gaussian is only uni-modal.
This motivates mixture models and latent variable models.
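As a concrete illustration of the standard parametric approach, here is a minimal sketch (not from the slides) that fits a multivariate Gaussian by maximum likelihood and evaluates the log-likelihood; the data array T and function names are hypothetical.

```python
import numpy as np

def fit_gaussian_ml(T):
    """Maximum-likelihood fit of a multivariate Gaussian to data T (N x d)."""
    mu = T.mean(axis=0)                    # ML estimate of the mean
    diff = T - mu
    Sigma = diff.T @ diff / T.shape[0]     # ML estimate of the covariance (1/N normalization)
    return mu, Sigma

def gaussian_log_likelihood(T, mu, Sigma):
    """Total log-likelihood of the data under N(mu, Sigma)."""
    N, d = T.shape
    diff = T - mu
    _, logdet = np.linalg.slogdet(Sigma)
    mahal = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)
    return -0.5 * (N * (d * np.log(2 * np.pi) + logdet) + mahal.sum())

# Synthetic example data (hypothetical)
T = np.random.randn(500, 3) @ np.diag([1.0, 2.0, 0.5]) + np.array([1.0, -2.0, 0.0])
mu, Sigma = fit_gaussian_ml(T)
print(gaussian_log_likelihood(T, mu, Sigma))
```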

1.1. Latent Variables
The number of parameters in a d-dimensional normal distribution: Σ requires d(d+1)/2 and μ requires d, so the total grows as d².
Assuming a diagonal covariance matrix reduces Σ to d parameters, but this treats the components of t as statistically independent.
Latent variables allow the number of degrees of freedom to be controlled while still capturing correlations between the data variables.
Goal: to express the distribution p(t) of the observed variables t = (t_1, …, t_d) in terms of a smaller number of latent variables x = (x_1, …, x_q), where q < d.
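To make the parameter counting above concrete, the small sketch below tallies free parameters for a full-covariance Gaussian, a diagonal-covariance Gaussian, and the raw (W, μ, Ψ) parameterisation of a q-dimensional factor analysis model; it counts parameters directly and does not apply the rotational-redundancy correction used in the degrees-of-freedom formula later in the slides.

```python
def gaussian_param_counts(d, q):
    """Free parameters of a d-dimensional Gaussian under different covariance models."""
    full = d + d * (d + 1) // 2        # mean (d) + full covariance d(d+1)/2
    diagonal = d + d                   # mean (d) + diagonal covariance (d)
    factor_analysis = d + d * q + d    # mean (d) + loadings W (d*q) + diagonal noise Psi (d)
    return full, diagonal, factor_analysis

for d in (10, 100):
    print(d, gaussian_param_counts(d, q=3))
```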

Cont ’ d  Joint distribution of p(t,x)  Bayesian network express the factorization

Cont ’ d  Express p(t|x) in terms of mapping from latent variables to data variables.  The definition of latent variables model is completed by specifying distribution p(u), mapping y(x;w), marginal distributino p(x).  The desired model for distribution p(t), but it is intractable in almost case.  Factor analysis: One of the simplest latent variable models

Cont ’ d  W,  : adaptive parameters  p(x): chosen to be N(0,I)  u: chosen to be zero mean Gaussian with a diagonal covariance matrix .  Then P(t) is Gaussian, with mean  and covariance matrix  +WW T.  Degree of freedom: (d+1)(q+1)-q(q+1)/2  Can capture the dominant correlations between the data variables

1.2. Mixture Distributions
A single Gaussian is uni-modal; this limitation is overcome by a mixture of M simpler parametric distributions, p(t) = Σ_i π_i p(t|i).
- p(t|i): usually a normal distribution with its own μ_i, Σ_i.
- π_i: mixing coefficients, which act as prior probabilities for the values of the label i.
Introduce an indicator variable z_ni specifying which component generated data point t_n.
Posterior probabilities: R_ni is the expectation of z_ni given the data.
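Assuming Gaussian components, the responsibilities R_ni follow from Bayes' theorem: R_ni = π_i N(t_n | μ_i, Σ_i) / Σ_j π_j N(t_n | μ_j, Σ_j). A minimal sketch of this E-step computation (SciPy assumed available; argument names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(T, pis, mus, Sigmas):
    """E-step: R[n, i] = posterior probability that component i generated data point t_n."""
    N, M = T.shape[0], len(pis)
    R = np.zeros((N, M))
    for i in range(M):
        R[:, i] = pis[i] * multivariate_normal.pdf(T, mean=mus[i], cov=Sigmas[i])
    R /= R.sum(axis=1, keepdims=True)  # normalize over components (Bayes' theorem)
    return R
```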

Cont'd
The mixture parameters can be fitted with the EM algorithm.
Mixtures of latent variable models: the Bayesian network representation of a mixture of latent variable models shows that, given the values of i and x, the variables t_1, …, t_d are conditionally independent.
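For a plain Gaussian mixture, the corresponding M-step re-estimates the parameters from the responsibilities computed in the previous sketch; alternating the two steps gives the EM algorithm. This is a minimal illustration of EM for a Gaussian mixture, not the mixture-of-latent-variable-model updates from the slides.

```python
import numpy as np

def m_step(T, R):
    """M-step: re-estimate mixing coefficients, means and covariances from responsibilities R."""
    N, d = T.shape
    Nk = R.sum(axis=0)                       # effective number of points per component
    pis = Nk / N                             # new mixing coefficients
    mus = (R.T @ T) / Nk[:, None]            # new component means
    Sigmas = []
    for i in range(R.shape[1]):
        diff = T - mus[i]
        Sigmas.append((R[:, i, None] * diff).T @ diff / Nk[i])  # new covariances
    return pis, mus, Sigmas
```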

2. Probabilistic Principal Component Analysis
Summary of conventional PCA:
- q principal axes v_j, j ∈ {1, …, q}, where the v_j are the q dominant eigenvectors of the sample covariance matrix.
- q principal components: the projections of a mean-centred data vector onto the principal axes.
- Reconstruction vector: the data vector re-expressed using only those q components.
Disadvantage: the absence of a probability density model and an associated likelihood measure.
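A short sketch of the conventional PCA procedure summarized above: principal axes from the eigendecomposition of the sample covariance matrix, projection onto the q principal components, and reconstruction (data array T and function name are illustrative).

```python
import numpy as np

def pca(T, q):
    """Conventional PCA for data T (N x d): axes, components and reconstructions."""
    t_bar = T.mean(axis=0)
    S = np.cov(T - t_bar, rowvar=False)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :q]                # q dominant eigenvectors = principal axes v_j
    components = (T - t_bar) @ V               # q principal components of each data point
    reconstruction = components @ V.T + t_bar  # reconstruction vectors
    return V, components, reconstruction
```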

2.1. Relationship to Latent Variables