Slide 1: Advanced Machine Learning & Perception. Instructor: Tony Jebara, Columbia University.

Slide 2: Topic 12, Manifold Learning (Unsupervised). Beyond Principal Components Analysis (PCA):
Multidimensional Scaling (MDS)
Generative Topographic Map (GTM)
Locally Linear Embedding (LLE)
Convex Invariance Learning (CoIL)
Kernel PCA (KPCA)

Slide 3: Manifolds. Data is often embedded in a lower-dimensional space; consider an image of a face being translated from left to right. How can we capture the true coordinates of the data on the manifold (the embedding space) and represent them compactly? This is an open problem with many possible approaches:
PCA: linear manifold.
MDS: compute inter-point distances, then find 2D points with the same distances.
LLE: mimic local neighborhoods using low-dimensional vectors.
GTM: fit a grid of Gaussians to the data via a nonlinear warp.
CoIL: linear model after nonlinear normalization/invariance of the data.
KPCA: linear in Hilbert space (kernels).

Slide 4: Principal Components Analysis. If we have the eigenvectors, the mean, and the coefficients, a data point is reconstructed as x ~ mu + sum_i c_i v_i. Getting the eigenvectors (i.e. approximating the covariance): diagonalize Sigma ~ (1/N) sum_n (x_n - mu)(x_n - mu)^T = V Lambda V^T. The eigenvectors are orthonormal: v_i^T v_j = delta_ij. In the coordinates of the v's the Gaussian is diagonal, cov = Lambda = diag(lambda_1, ..., lambda_D). All eigenvalues are non-negative, and higher eigenvalues mean higher variance, so use those eigenvectors first. To compute the coefficients: c_i = v_i^T (x - mu).
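
(A minimal numpy sketch of these PCA steps; the function and variable names are mine, not from the slides.)

```python
import numpy as np

def pca(X, d):
    """PCA sketch: X is (N, D) data, d is the number of components to keep."""
    mu = X.mean(axis=0)                        # mean
    Xc = X - mu                                # centered data
    cov = Xc.T @ Xc / len(X)                   # approximate covariance
    lam, V = np.linalg.eigh(cov)               # eigenvalues (ascending), orthonormal eigenvectors
    order = np.argsort(lam)[::-1][:d]          # highest variance first
    V, lam = V[:, order], lam[order]
    coeffs = Xc @ V                            # coefficients c_i = v_i^T (x - mu)
    X_hat = mu + coeffs @ V.T                  # reconstruction from mean + coefficients
    return mu, V, lam, coeffs, X_hat
```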

Slide 5: Multidimensional Scaling (MDS). Idea: capture only the distances between the points X in the original space, and construct another set of low-dimensional (e.g. 2D) points Y having the same distances. A dissimilarity d(x,y) is a function of two objects x and y such that d(x,y) >= 0, d(x,y) = d(y,x), and d(x,x) = 0. A metric also has to satisfy the triangle inequality: d(x,z) <= d(x,y) + d(y,z). Standard example: the Euclidean l2 metric, d(x,y) = ||x - y|| = sqrt(sum_i (x_i - y_i)^2). Assume that for N objects we compute an N x N dissimilarity matrix Delta that tells us how far apart they are.
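
(As an illustration, not from the slides: computing the N x N Euclidean dissimilarity matrix Delta for a set of points stored as rows of X.)

```python
import numpy as np

def euclidean_dissimilarity(X):
    """N x N matrix of pairwise Euclidean (l2) distances between rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)    # squared distances
    return np.sqrt(np.maximum(d2, 0.0))                 # clamp tiny negatives from round-off

# The result is non-negative, symmetric, and has a zero diagonal;
# the l2 metric also satisfies the triangle inequality.
```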

Slide 6: Multidimensional Scaling. Given the dissimilarity Delta between the original X points under the original metric d(), find Y points whose dissimilarity D under another metric d'() is similar to Delta. We want to find Y's that minimize some measure of the difference between D and Delta, e.g.:
Least squares stress: Stress(Y) = sum_{i<j} (D_ij - Delta_ij)^2.
Invariant stress: the least-squares stress normalized (e.g. by sum_{i<j} D_ij^2) so that it is invariant to the overall scale of the embedding.
Sammon mapping: sum_{i<j} (D_ij - Delta_ij)^2 / Delta_ij, which emphasizes small (local) distances.
Strain: the quadratic criterion of classical MDS, defined on inner products rather than on distances.
Some of these criteria are global, some are local; they are typically minimized by gradient descent on the Y's.
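
(A hedged sketch of metric MDS by gradient descent on the least-squares stress; Delta is the given dissimilarity matrix, and the step size, iteration count, and initialization scale are arbitrary choices of mine.)

```python
import numpy as np

def mds_gradient_descent(Delta, dim=2, steps=2000, lr=1e-3, seed=0):
    """Minimize sum_ij (D_ij - Delta_ij)^2 over low-D points Y by gradient descent."""
    rng = np.random.default_rng(seed)
    N = Delta.shape[0]
    Y = rng.normal(scale=1e-2, size=(N, dim))       # random low-D initialization
    for _ in range(steps):
        diff = Y[:, None, :] - Y[None, :, :]        # pairwise differences y_i - y_j
        D = np.sqrt((diff**2).sum(-1) + 1e-12)      # current embedded distances D_ij
        coef = 2.0 * (D - Delta) / D                # d(stress)/dD_ij scaled by 1/D_ij
        np.fill_diagonal(coef, 0.0)
        grad = (coef[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad                              # gradient step on the Y's
    return Y
```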

Slide 7: MDS Example, 3D to 2D. We have city-to-city distances; the cities lie on the surface of a sphere (the Earth) in 3D space. The reconstructed 2D points on a plane capture the essential properties (though what happens near the poles?).

Slide 8: MDS Example, Multi-D to 2D. A more elaborate example: we have a correlation matrix between crimes, which are objects of arbitrary dimensionality. Hack: convert the correlations to dissimilarities and show the reconstructed Y points.

Slide 9: Locally Linear Embedding. Instead of distances, look at the neighborhood of each point, and preserve how each point is reconstructed from its neighbors in the low-dimensional space:
1) Find the K nearest neighbors of each point.
2) Describe each neighborhood by the best weights on the neighbors for reconstructing the point.
3) Find the best low-dimensional vectors that still have the same weights.
Why? Because the manifold is locally approximately linear, and the reconstruction weights are invariant to rotation, scaling, and translation of the neighborhood, so they carry over to the low-dimensional coordinates.

Slide 10: Locally Linear Embedding, finding the W's (a convex combination of weights on the neighbors). For each point x_i, minimize the reconstruction error ||x_i - sum_j W_ij x_j||^2 subject to sum_j W_ij = 1, where j runs over the K neighbors. With a Lagrange multiplier for the constraint:
1) Take the derivative and set it to 0.
2) Solve the resulting linear system C w = lambda 1, where C_jk = (x_i - x_j)^T (x_i - x_k) is the local Gram matrix of the neighborhood.
3) Find lambda from the sum-to-one constraint.
4) Find w (equivalently: solve C w = 1, then rescale w so its entries sum to 1).
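
(A minimal sketch of this weight-finding step for a single point; the small regularizer on C is a standard trick rather than something stated on the slide, useful when the number of neighbors exceeds the input dimension.)

```python
import numpy as np

def lle_weights(x_i, neighbors, reg=1e-3):
    """Best sum-to-one weights reconstructing x_i from its K neighbors (rows of `neighbors`)."""
    Z = x_i - neighbors                       # (K, D): x_i - x_j for each neighbor j
    C = Z @ Z.T                               # local Gram matrix C_jk = (x_i - x_j).(x_i - x_k)
    C += reg * np.trace(C) * np.eye(len(C))   # regularize in case C is singular
    w = np.linalg.solve(C, np.ones(len(C)))   # solve C w = 1
    return w / w.sum()                        # rescale so the weights sum to 1
```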

Slide 11: Locally Linear Embedding, finding the Y's (new low-dimensional points that agree with the W's). With the weights held fixed, minimize Phi(Y) = sum_i ||y_i - sum_j W_ij y_j||^2, subject to centering and unit-covariance constraints on the y's to rule out degenerate solutions. This cost is a quadratic form in the matrix M = (I - W)^T (I - W). Solve for Y as the bottom d+1 eigenvectors of M, discard the constant eigenvector (eigenvalue 0), and plot the remaining Y values.
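
(A sketch of this embedding step, assuming the per-point weights have been assembled into an N x N matrix W with W_ij = 0 for non-neighbors.)

```python
import numpy as np

def lle_embedding(W, d=2):
    """Low-D coordinates Y from the weight matrix W via the bottom eigenvectors of M."""
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)        # M = (I - W)^T (I - W)
    lam, V = np.linalg.eigh(M)     # eigenvalues in ascending order
    # bottom d+1 eigenvectors; drop the constant one with eigenvalue ~ 0
    return V[:, 1:d + 1]
```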

Slide 12: LLE Examples. The original X data are raw images; the dots are the reconstructed two-dimensional Y points.

Slide 13: LLEs. Top = PCA embedding, bottom = LLE embedding.

Slide 14: Generative Topographic Map. A principled alternative to the Kohonen map. It forms a generative model of the manifold, so we can sample from it, etc. Find a nonlinear mapping y() from a 2D grid of Gaussians into the data space, and pick the parameters W of the mapping such that the mapped Gaussians in data space maximize the likelihood of the observed data. There are two spaces: the data space t (what we previously called the X's) and the hidden latent space x (what we previously called the Y's). The mapping goes from latent space to observed space.

Slide 15: GTM as a Grid of Gaussians. We choose priors and conditionals for all variables of interest. Assume Gaussian noise on the y() mapping: p(t | x, W, beta) = N(t; y(x; W), beta^-1 I). Assume the prior on the latent variables is a grid model, equally spaced in latent space: p(x) = (1/M) sum_m delta(x - x_m). We can now write out the full likelihood by integrating the conditional over this prior.
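
(A sketch of these modelling choices; the grid size and latent range are assumptions of mine, not values from the slides.)

```python
import numpy as np

def latent_grid(side=10):
    """Prior: M = side*side latent points x_m, equally spaced on a grid in [-1, 1]^2,
    each carrying prior mass 1/M."""
    g = np.linspace(-1.0, 1.0, side)
    xx, yy = np.meshgrid(g, g)
    return np.column_stack([xx.ravel(), yy.ravel()])    # (M, 2)

def log_gaussian_noise(t, y_m, beta):
    """Conditional log p(t | x_m, W, beta) = log N(t; y_m, beta^-1 I)
    for one mapped grid point y_m = y(x_m; W)."""
    D = t.shape[0]
    return 0.5 * D * np.log(beta / (2 * np.pi)) - 0.5 * beta * np.sum((t - y_m)**2)
```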

Slide 16: GTM Distribution Model. Integrating over the delta functions turns the integral into a summation: p(t | W, beta) = (1/M) sum_m N(t; y(x_m; W), beta^-1 I), so the log-likelihood of the data is sum_n log [ (1/M) sum_m N(t_n; y(x_m; W), beta^-1 I) ]. Note the log of a sum: we need to apply EM to maximize it. Also, use the following parametric form of the mapping, linear in the basis functions: y(x; W) = W phi(x), where phi(x) is a fixed set of basis functions. The slide shows examples of manifolds for randomly chosen W mappings. Typically, we are given the data and want to find the maximum likelihood mapping W for it…
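
(A sketch of this log-sum likelihood and the EM responsibilities (E-step), under the linear-in-basis mapping; the radial basis functions, their width, and the array shapes are assumptions of mine.)

```python
import numpy as np
from scipy.special import logsumexp

def rbf_basis(latent, centers, width=0.3):
    """phi(x): Gaussian radial basis functions evaluated at each latent grid point."""
    d2 = ((latent[:, None, :] - centers[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2.0 * width**2))               # (M, n_basis)

def gtm_log_likelihood(T, Phi, W, beta):
    """Log-likelihood sum_n log (1/M) sum_m N(t_n; W^T phi(x_m), beta^-1 I),
    plus the responsibilities R_nm that EM uses to re-estimate W and beta."""
    Y = Phi @ W                                          # mapped grid points y(x_m; W), (M, D)
    N, D = T.shape
    M = Y.shape[0]
    d2 = ((T[:, None, :] - Y[None, :, :])**2).sum(-1)               # (N, M) squared distances
    log_p = 0.5 * D * np.log(beta / (2 * np.pi)) - 0.5 * beta * d2  # log N(t_n; y_m, beta^-1 I)
    resp = np.exp(log_p - logsumexp(log_p, axis=1, keepdims=True))  # E-step responsibilities
    return np.sum(logsumexp(log_p, axis=1) - np.log(M)), resp
```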

Slide 17: GTM Examples. Recover the nonlinear manifold by warping the grid with the W parameters. Synthetic example: left = initialized, right = converged. Real example: oil data with 3 classes; left = GTM, right = PCA.

