1
Non-Linear Probabilistic PCA with Gaussian Process Latent Variable Models
Neil Lawrence Machine Learning Group Department of Computer Science University of Sheffield, U.K.
2
Overview
Principal Component Analysis
Latent variable model
Probabilistic derivations
Gaussian Process Latent Variable Models
Optimisation
Sparse algorithm
Results
Visualisation
Inverse kinematics
3
Influences
Density Networks – MacKay 1995
Generative Topographic Mapping – Bishop, Svensén and Williams 1997
Probabilistic PCA – Tipping and Bishop 1998
4
Notation
q – dimension of latent space.
d – dimension of data space.
N – number of data points.
Y – centred data, X – latent variables, W – mapping matrix, W ∈ ℝ^{d×q}.
a(i) is the vector from the i-th row of A; ai is the vector from the i-th column of A.
5
Reading Notation
X and Y are design matrices.
Covariance given by N⁻¹YᵀY.
Inner product matrix given by YYᵀ.
6
PCA – Probabilistic Interpretation
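The equations on this slide were images and did not survive extraction; what follows is the standard PPCA formulation of Tipping and Bishop, which the slide presumably showed. A Gaussian prior is placed on the latent variables and they are marginalised out:

$$ p(\mathbf{y}_{(n)} \mid \mathbf{x}_{(n)}, \mathbf{W}) = \mathcal{N}\!\left(\mathbf{y}_{(n)} \mid \mathbf{W}\mathbf{x}_{(n)},\, \beta^{-1}\mathbf{I}\right), \qquad p(\mathbf{x}_{(n)}) = \mathcal{N}\!\left(\mathbf{x}_{(n)} \mid \mathbf{0},\, \mathbf{I}\right) $$

$$ p(\mathbf{y}_{(n)} \mid \mathbf{W}) = \int p(\mathbf{y}_{(n)} \mid \mathbf{x}_{(n)}, \mathbf{W})\, p(\mathbf{x}_{(n)})\, d\mathbf{x}_{(n)} = \mathcal{N}\!\left(\mathbf{y}_{(n)} \mid \mathbf{0},\, \mathbf{W}\mathbf{W}^{\top} + \beta^{-1}\mathbf{I}\right). $$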
7
PCA – Probabilistic Interpretation II
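Again the slide's equations were lost; in the dual formulation (as in Lawrence's GPLVM paper) the prior is instead placed on the rows of the mapping W, which is then marginalised, leaving a likelihood conditioned on the latent positions X:

$$ p(\mathbf{w}_{(i)}) = \mathcal{N}\!\left(\mathbf{w}_{(i)} \mid \mathbf{0},\, \mathbf{I}\right), \qquad p(\mathbf{Y} \mid \mathbf{X}) = \prod_{j=1}^{d} \mathcal{N}\!\left(\mathbf{y}_j \mid \mathbf{0},\, \mathbf{X}\mathbf{X}^{\top} + \beta^{-1}\mathbf{I}\right), $$

where y_j denotes the j-th column of Y.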
8
Equivalence of PPCA Formulations
Solution for PPCA I: W spans the principal eigenvectors of the covariance N⁻¹YᵀY.
Solution for PPCA II: X spans the principal eigenvectors of the inner-product matrix YYᵀ.
Equivalence is from the singular value decomposition of Y, which links the two eigenproblems.
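In symbols (a schematic reconstruction; the slide's exact expressions were lost):

$$ \mathbf{W} = \mathbf{V}_q \mathbf{L} \mathbf{R}, \qquad \mathbf{X} = \mathbf{U}_q \mathbf{L}' \mathbf{R}, $$

where V_q holds the first q eigenvectors of N⁻¹YᵀY, U_q the first q eigenvectors of YYᵀ, L and L′ are diagonal scalings determined by the eigenvalues and β, and R is an arbitrary orthogonal q×q matrix. The SVD Y = UΣVᵀ ties the two eigenproblems together, giving the equivalence.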
9
Gaussian Processes
A Gaussian Process (GP) likelihood is of the form shown below, where K is the covariance function or kernel. If we select the linear kernel, we see that PPCA is a product of GPs, one per data dimension.
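The slide's formulas were lost to extraction; a reconstruction consistent with the notation above (parameter names α and β are an assumption):

$$ p(\mathbf{Y} \mid \mathbf{X}) = \prod_{j=1}^{d} \frac{1}{(2\pi)^{N/2} |\mathbf{K}|^{1/2}} \exp\!\left( -\frac{1}{2}\, \mathbf{y}_j^{\top} \mathbf{K}^{-1} \mathbf{y}_j \right), \qquad \mathbf{K} = \alpha\, \mathbf{X}\mathbf{X}^{\top} + \beta^{-1}\mathbf{I}. $$

With this linear kernel the likelihood is exactly the dual PPCA marginal from the previous slides.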
10
Probabilistic PCA is a GPLVM
Log-likelihood:
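The log-likelihood itself was an image on the slide; it is the standard multi-output GP log-likelihood used in the GPLVM paper:

$$ L = -\frac{dN}{2} \ln 2\pi - \frac{d}{2} \ln |\mathbf{K}| - \frac{1}{2} \mathrm{tr}\!\left( \mathbf{K}^{-1} \mathbf{Y}\mathbf{Y}^{\top} \right). $$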
11
Non-linear Kernel
Instead of the linear kernel function, use an RBF kernel function.
This leads to non-linear GPs.
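A common form of the RBF kernel with a white-noise term, consistent with the kernel parameters optimised later in the talk (the parameter names here are an assumption):

$$ k(\mathbf{x}_i, \mathbf{x}_j) = \alpha \exp\!\left( -\frac{\gamma}{2} \|\mathbf{x}_i - \mathbf{x}_j\|^2 \right) + \beta^{-1} \delta_{ij}. $$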
12
Aside: Kernel PCA
Maximum likelihood ≡ minimising a Kullback-Leibler divergence.
PCA – minimise the KL divergence between Gaussians.
Ky non-linear – kernel PCA.
Kx non-linear – GPLVM.
Both non-linear – ?
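A sketch of the divergence in question, with Ky the data-derived covariance and Kx the latent/kernel covariance (the direction shown is an assumption):

$$ \mathrm{KL}\!\left( \mathcal{N}(\mathbf{0}, \mathbf{K}_y) \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{K}_x) \right) = \frac{1}{2} \ln \frac{|\mathbf{K}_x|}{|\mathbf{K}_y|} + \frac{1}{2} \mathrm{tr}\!\left( \mathbf{K}_x^{-1} \mathbf{K}_y \right) - \frac{N}{2}. $$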
13
Correlation Matching
Minimisation of the KL divergence matches the dominant correlations of the two covariance matrices.
KPCA (with RBF kernel): Kx global, Ky local.
GPLVM (with RBF kernel): Kx local, Ky global.
14
Interpretation
Kernels are similarity measures: they express correlations between points.
GPLVM and KPCA try to match them.
cf. MDS and principal co-ordinate analysis.
15
Pros & Cons of GPLVM
Pros:
Probabilistic.
Missing data straightforward.
Can sample from model given X.
Different noise models can be handled.
Kernel parameters can be optimised.
Cons:
Speed of optimisation.
Optimisation is non-convex.
16
GPLVM Optimisation
Gradient-based optimisation wrt X and the kernel parameters, using scaled conjugate gradients (SCG); a code sketch follows below.
Penalised maximum likelihood, with penalty −tr(XᵀX).
Example data-set: oil flow data.
Three phases of flow (stratified, annular, homogeneous).
Twelve measurement probes.
1000 data points; we sub-sampled to 100 data points.
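To make the optimisation concrete, here is a minimal sketch of the penalised GPLVM objective. Assumptions, none of which are the author's implementation: an RBF-plus-white kernel with fixed parameters (the slides also optimise them), SciPy's conjugate-gradient routine with numerical gradients standing in for SCG, and a ½tr(XᵀX) penalty from a spherical Gaussian prior on X.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, alpha=1.0, gamma=1.0, beta_inv=1e-2):
    # K_ij = alpha * exp(-gamma/2 * ||x_i - x_j||^2) + beta^{-1} * delta_ij
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    return alpha * np.exp(-0.5 * gamma * np.maximum(sq, 0.0)) + beta_inv * np.eye(len(X))

def penalised_nll(x_flat, Y, q):
    # Negative of: L = -(dN/2) ln 2pi - (d/2) ln|K| - (1/2) tr(K^{-1} Y Y^T),
    # plus the penalty term from the prior on X.
    N, d = Y.shape
    X = x_flat.reshape(N, q)
    K = rbf_kernel(X)
    _, logdet = np.linalg.slogdet(K)
    nll = 0.5 * d * N * np.log(2.0 * np.pi) + 0.5 * d * logdet \
          + 0.5 * np.sum(Y * np.linalg.solve(K, Y))
    return nll + 0.5 * np.sum(X**2)  # Gaussian prior on X as the penalty

# Toy stand-in for the 100-point, 12-dimensional oil subset.
rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 12))
Y -= Y.mean(0)                                         # centre the data
X0 = np.linalg.svd(Y, full_matrices=False)[0][:, :2]   # PCA-style initialisation
res = minimize(penalised_nll, X0.ravel(), args=(Y, 2),
               method="CG", options={"maxiter": 50})
X_latent = res.x.reshape(100, 2)                       # learned 2-D latent positions
```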
17
SCG GPLVM Oil Results 2-D Manifold in 12-D space (shading is variance).
18
Efficient GPLVM Optimisation
Optimising the q×N matrix X is slow.
There are correlations between data-points.
19
‘Sparsification’
Let XI be a sub-set of X, for a well-chosen active set I with |I| ≪ N.
For n ∉ I, optimise each q-dimensional x(n) independently.
20
Algorithm
We selected the active set according to the IVM scheme. Each iteration (a structural sketch follows below):
Select the active set.
Optimise the kernel parameters.
For all n ∉ I, optimise x(n).
For small data-sets, optimise XI.
Repeat.
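A structural sketch of one sweep of this loop, under heavy assumptions: the IVM selection is replaced by a random choice (the real scheme scores points by information gain), the kernel-parameter step is omitted, and `point_nll` is a hypothetical callback that scores one latent point against the active set.

```python
import numpy as np

def sparse_gplvm_sweep(Y, X, point_nll, n_active=20, rng=None):
    """One sweep of the active-set optimisation sketched on the slide.

    point_nll(x_n, n, X_active, Y) -> float (hypothetical signature):
    negative log-likelihood contribution of latent point x_n given the
    active set.
    """
    rng = rng or np.random.default_rng()
    N, q = X.shape
    # 1. Select the active set I (random stand-in for the IVM scheme).
    I = rng.choice(N, size=n_active, replace=False)
    # 2. Optimise kernel parameters on (X[I], Y[I]) -- omitted here.
    # 3. For all n not in I, optimise the q-dimensional x_n independently.
    for n in np.setdiff1d(np.arange(N), I):
        # Crude stand-in for a gradient step: keep the best of a few
        # local perturbations of x_n.
        candidates = X[n] + 0.1 * rng.standard_normal((8, q))
        scores = [point_nll(c, n, X[I], Y) for c in candidates]
        X[n] = candidates[int(np.argmin(scores))]
    return X, I   # 4. Repeat until converged.
```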
21
Some Results
Oil data.
Digits data: digits 0 to 4 from the USPS data; 600 of each digit randomly selected; 16x16 greyscale images.
Face data: a video of Brendan Frey’s face; 1965 frames, time info removed; 20x28 greyscale images.
Twos data: Cedar CD-ROM digits; 700 examples of 8x8 twos; binary images.
22
Oil Data
Full training set this time. Used the RBF kernel for the GPLVM.
GTM used a 15x15 grid and 16 nodes, trying to mimic the experiments in the original paper.
23
Oil Data: GTM vs. GPLVM (RBF).
24
Different Kernels: RBF Kernel vs. MLP Kernel. Log likelihood: 1.12 × 10⁴
25
Digits: RBF Kernel vs. MLP Kernel. Log likelihood: −4.49 × 10⁵
0 – red, 1 – blue, 2 – green, 3 – mauve, 4 – yellow.
26
Digits Data Image from
28
Demos Fantasy Digits. Fantasy Brendans.
29
Twos Data
So far – Gaussian noise. Can use a binomial likelihood instead:
Probit noise model (sketched below).
Use ADF approximation; can easily extend to EP.
Practical consequences: about d times slower, needs d times more storage.
Twos data modelled as Gaussian and as binomial.
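A sketch of a probit noise model for binary pixels (the exact parameterisation on the slide is unknown): with targets y ∈ {−1, +1} and latent function value f,

$$ p(y \mid f) = \Phi(y f), $$

where Φ is the standard Gaussian CDF. This likelihood is non-Gaussian, which is why a Gaussian approximation such as ADF (or EP) is needed.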
30
Twos Results
31
Inverse Kinematics
Style-Based Inverse Kinematics: Keith Grochow, Steve L. Martin, Aaron Hertzmann, Zoran Popović. ACM Trans. on Graphics (Proc. SIGGRAPH 2004). To appear.
Learn a GPLVM on motion capture data.
Use the GPLVM as a ‘soft style constraint’ in combination with hard kinematic constraints.
32
Video styleik.mov
33
Why GPLVM in IK? My thoughts:
GPLVM is probabilistic (soft constraints).
GPLVM can capture non-linearities in the data.
Inverse kinematics can be viewed as a missing-value problem, and the GPLVM handles missing values well.
Grochow et al. also mix styles by mixing GPLVMs.
34
Rant to the ML Community
Source code for the GPLVM has been available since Jun ’03.
Grochow et al. downloaded it after NIPS acceptance and submitted to SIGGRAPH in Jan ’04.
On-line source code is good!
35
Conclusions
The GPLVM is a probabilistic non-linear PCA:
Can sample from it and evaluate likelihoods.
Missing data is no problem.
Optimisation of X is the difficult part; we presented a sparse optimisation algorithm.
The model has been ‘proven’ in a real application.
Put your source code on-line!