Part 2: Unsupervised Learning

Name: Part 2: Unsupervised Learning
Uploaded: 2017-07-13T18:07:59+00:00
Duration: PTM18S2
Channel: Ethan Lynch
Description: Part 2: Unsupervised Learning

Part 2: Unsupervised Learning
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Christopher M. Bishop Microsoft Research Cambridge ECCV 2004, Prague

Overview of Part 2 Mixture models EM Variational Inference
Bayesian model complexity Continuous latent variables Machine Learning Techniques for Computer Vision (ECCV 2004)

The Gaussian Distribution
Multivariate Gaussian Maximum likelihood covariance mean Machine Learning Techniques for Computer Vision (ECCV 2004)

Gaussian Mixtures Linear super-position of Gaussians
Normalization and positivity require Machine Learning Techniques for Computer Vision (ECCV 2004)

Example: Mixture of 3 Gaussians
Machine Learning Techniques for Computer Vision (ECCV 2004)

Maximum Likelihood for the GMM
Log likelihood function Sum over components appears inside the log no closed form ML solution Machine Learning Techniques for Computer Vision (ECCV 2004)

EM Algorithm – Informal Derivation

M step equations Machine Learning Techniques for Computer Vision (ECCV 2004)

E step equation Machine Learning Techniques for Computer Vision (ECCV 2004)

Can interpret the mixing coefficients as prior probabilities Corresponding posterior probabilities (responsibilities) Machine Learning Techniques for Computer Vision (ECCV 2004)

Time between eruptions (minutes)
Old Faithful Data Set Time between eruptions (minutes) Duration of eruption (minutes) Machine Learning Techniques for Computer Vision (ECCV 2004)

Machine Learning Techniques for Computer Vision (ECCV 2004)

Latent Variable View of EM
To sample from a Gaussian mixture: first pick one of the components with probability then draw a sample from that component repeat these two steps for each new data point Machine Learning Techniques for Computer Vision (ECCV 2004)

Goal: given a data set, find Suppose we knew the colours maximum likelihood would involve fitting each component to the corresponding cluster Problem: the colours are latent (hidden) variables Machine Learning Techniques for Computer Vision (ECCV 2004)

Incomplete and Complete Data

Latent Variable Viewpoint

Latent Variable Viewpoint
Binary latent variables describing which component generated each data point Conditional distribution of observed variable Prior distribution of latent variables Marginalizing over the latent variables we obtain Machine Learning Techniques for Computer Vision (ECCV 2004)

Graphical Representation of GMM

Suppose we knew the values for the latent variables maximize the complete-data log likelihood trivial closed-form solution: fit each component to the corresponding set of data points We don’t know the values of the latent variables however, for given parameter values we can compute the expected values of the latent variables Machine Learning Techniques for Computer Vision (ECCV 2004)

Posterior Probabilities (colour coded)

Over-fitting in Gaussian Mixture Models
Infinities in likelihood function when a component ‘collapses’ onto a data point: with Also, maximum likelihood cannot determine the number K of components Machine Learning Techniques for Computer Vision (ECCV 2004)

Cross Validation Can select model complexity using an independent validation data set If data is scarce use cross-validation: partition data into S subsets train on S-1 subsets test on remainder repeat and average Disadvantages computationally expensive can only determine one or two complexity parameters Machine Learning Techniques for Computer Vision (ECCV 2004)

Bayesian Mixture of Gaussians
Parameters and latent variables appear on equal footing Conjugate priors Machine Learning Techniques for Computer Vision (ECCV 2004)

Data Set Size Problem 1: learn the function for from 100 (slightly) noisy examples data set is computationally small but statistically large Problem 2: learn to recognize 1,000 everyday objects from 5,000,000 natural images data set is computationally large but statistically small Bayesian inference computationally more demanding than ML or MAP (but see discussion of Gaussian mixtures later) significant benefit for statistically small data sets Machine Learning Techniques for Computer Vision (ECCV 2004)

Variational Inference
Exact Bayesian inference intractable Markov chain Monte Carlo computationally expensive issues of convergence Variational Inference broadly applicable deterministic approximation let denote all latent variables and parameters approximate true posterior using a simpler distribution minimize Kullback-Leibler divergence Machine Learning Techniques for Computer Vision (ECCV 2004)

General View of Variational Inference
For arbitrary where Maximizing over would give the true posterior this is intractable by definition Machine Learning Techniques for Computer Vision (ECCV 2004)

Variational Lower Bound

Factorized Approximation
Goal: choose a family of q distributions which are: sufficiently flexible to give good approximation sufficiently simple to remain tractable Here we consider factorized distributions No further assumptions are required! Optimal solution for one factor, keeping the remainder fixed coupled solutions so initialize then cyclically update message passing view (Winn and Bishop, 2004) Machine Learning Techniques for Computer Vision (ECCV 2004)

Lower Bound Can also be evaluated Useful for maths/code verification
Also useful for model comparison: Machine Learning Techniques for Computer Vision (ECCV 2004)

Illustration: Univariate Gaussian
Likelihood function Conjugate prior Factorized variational distribution Machine Learning Techniques for Computer Vision (ECCV 2004)

Initial Configuration

After Updating Machine Learning Techniques for Computer Vision (ECCV 2004)

Converged Solution Machine Learning Techniques for Computer Vision (ECCV 2004)

Variational Mixture of Gaussians
Assume factorized posterior distribution No other approximations needed! Machine Learning Techniques for Computer Vision (ECCV 2004)

Variational Equations for GMM

Lower Bound for GMM Machine Learning Techniques for Computer Vision (ECCV 2004)

VIBES Bishop, Spiegelhalter and Winn (2002)

ML Limit If instead we choose we recover the maximum likelihood EM algorithm Machine Learning Techniques for Computer Vision (ECCV 2004)

Bound vs. K for Old Faithful Data

Bayesian Model Complexity

Sparse Bayes for Gaussian Mixture
Corduneanu and Bishop (2001) Start with large value of K treat mixing coefficients as parameters maximize marginal likelihood prunes out excess components Machine Learning Techniques for Computer Vision (ECCV 2004)

Summary: Variational Gaussian Mixtures
Simple modification of maximum likelihood EM code Small computational overhead compared to EM No singularities Automatic model order selection Machine Learning Techniques for Computer Vision (ECCV 2004)

Continuous Latent Variables
Conventional PCA data covariance matrix eigenvector decomposition Minimizes sum-of-squares projection not a probabilistic model how should we choose L ? Machine Learning Techniques for Computer Vision (ECCV 2004)

Probabilistic PCA Tipping and Bishop (1998)
L dimensional continuous latent space D dimensional data space PCA factor analysis Machine Learning Techniques for Computer Vision (ECCV 2004)

Probabilistic PCA Marginal distribution Advantages exact ML solution
computationally efficient EM algorithm captures dominant correlations with few parameters mixtures of PPCA Bayesian PCA building block for more complex models Machine Learning Techniques for Computer Vision (ECCV 2004)

EM for PCA Machine Learning Techniques for Computer Vision (ECCV 2004)

Bayesian PCA Bishop (1998) Gaussian prior over columns of
Automatic relevance determination (ARD) ML PCA Bayesian PCA Machine Learning Techniques for Computer Vision (ECCV 2004)

Non-linear Manifolds Example: images of a rigid object

Bayesian Mixture of BPCA Models

Flexible Sprites Jojic and Frey (2001)
Automatic decomposition of video sequence into background model ordered set of masks (one per object per frame) foreground model (one per object per frame) Machine Learning Techniques for Computer Vision (ECCV 2004)

Transformed Component Analysis
Generative model Now include transformations (translations) Extend to L layers Inference intractable so use variational framework Machine Learning Techniques for Computer Vision (ECCV 2004)

Bayesian Constellation Model
Li, Fergus and Perona (2003) Object recognition from small training sets Variational treatment of fully Bayesian model Machine Learning Techniques for Computer Vision (ECCV 2004)

Bayesian Constellation Model

Summary of Part 2 Discrete and continuous latent variables
EM algorithm Build complex models from simple components represented graphically incorporates prior knowledge Variational inference Bayesian model comparison Machine Learning Techniques for Computer Vision (ECCV 2004)

Part 2: Unsupervised Learning

Similar presentations

Presentation on theme: "Part 2: Unsupervised Learning"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Part 2: Unsupervised Learning

Similar presentations

Presentation on theme: "Part 2: Unsupervised Learning"— Presentation transcript:

Similar presentations

About project

Feedback