Advanced Machine Learning & Perception
Instructor: Tony Jebara, Columbia University

Topic 12: Manifold Learning (Unsupervised)
Beyond Principal Components Analysis (PCA):
- Multidimensional Scaling (MDS)
- Generative Topographic Map (GTM)
- Locally Linear Embedding (LLE)
- Convex Invariance Learning (CoIL)
- Kernel PCA (KPCA)

Manifolds
- Data is often embedded in a lower-dimensional space. Consider an image of a face being translated from left to right.
- How can we capture the true coordinates of the data on the manifold (or embedding space) and represent them compactly?
- Open problem with many possible approaches:
  - PCA: linear manifold
  - MDS: compute inter-point distances, then find 2D data with the same distances
  - LLE: mimic local neighborhoods using low-dimensional vectors
  - GTM: fit a grid of Gaussians to the data via a nonlinear warp
  - CoIL: linear model after a nonlinear normalization/invariance of the data
  - KPCA: linear model in Hilbert space (kernels)

Principal Components Analysis
- Reconstruct the data given the eigenvectors, the mean, and the coefficients.
- Get the eigenvectors by approximating the covariance.
- The eigenvectors are orthonormal.
- In the coordinates of the eigenvectors v, the Gaussian is diagonal: cov = Λ.
- All eigenvalues are non-negative.
- Higher eigenvalues correspond to higher variance, so use those directions first.
- Compute the coefficients by projecting onto the eigenvectors.
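The formulas this slide refers to appeared as images in the original deck; a standard reconstruction of the PCA equations (notation assumed) is:

\[
\begin{aligned}
x &\approx \mu + \sum_{i=1}^{d} c_i v_i && \text{(reconstruction from mean, eigenvectors, coefficients)}\\
\Sigma &\approx \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu)(x_n-\mu)^\top = \sum_{i} \lambda_i\, v_i v_i^\top && \text{(eigendecomposition of the covariance)}\\
v_i^\top v_j &= \delta_{ij}, \qquad \lambda_i \ge 0 && \text{(orthonormal eigenvectors, non-negative eigenvalues)}\\
c_i &= v_i^\top (x - \mu) && \text{(coefficients by projection)}
\end{aligned}
\]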

Multidimensional Scaling (MDS)
- Idea: capture only the distances between the points X in the original space, then construct another set of low-dimensional (or 2D) points Y having the same distances.
- A dissimilarity d(x,y) is a function of two objects x and y that is non-negative, zero for identical objects, and symmetric; a metric must also satisfy the triangle inequality.
- Standard example: the Euclidean l2 metric.
- For N objects, we compute a dissimilarity matrix Δ that tells us how far apart they are.
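Written out (standard definitions; the slide's own equations were images):

\[
d(x,y) \ge 0, \qquad d(x,x) = 0, \qquad d(x,y) = d(y,x), \qquad \text{metric: } d(x,z) \le d(x,y) + d(y,z)
\]
\[
\text{Euclidean } \ell_2\text{: } d(x,y) = \|x-y\|_2 = \Big(\sum_k (x_k-y_k)^2\Big)^{1/2}, \qquad \Delta_{ij} = d(x_i, x_j)
\]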

Multidimensional Scaling
- Given the dissimilarity Δ between the original X points under the original metric d(), find Y points whose dissimilarity D under another metric d'() is similar to Δ.
- We want Y's that minimize some measure of the difference between D and Δ, e.g. least-squares stress, invariant stress, the Sammon mapping, or strain.
- Some criteria are global, some are local; minimize them by gradient descent.
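The stress formulas were images in the original slides; commonly used forms (the exact normalizations on the slide may differ) are:

\[
\begin{aligned}
\text{Least-squares stress:}\quad & \sum_{i<j}\big(D_{ij}-\Delta_{ij}\big)^2\\
\text{Invariant stress:}\quad & \frac{\sum_{i<j}(D_{ij}-\Delta_{ij})^2}{\sum_{i<j} D_{ij}^2}\\
\text{Sammon mapping:}\quad & \frac{1}{\sum_{i<j}\Delta_{ij}}\sum_{i<j}\frac{(D_{ij}-\Delta_{ij})^2}{\Delta_{ij}}\\
\text{Strain (classical MDS):}\quad & \Big\| -\tfrac{1}{2}J\,\Delta^{(2)}J - Y^\top Y \Big\|_F^2, \qquad J = I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^\top
\end{aligned}
\]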

MDS Example: 3D to 2D
- We have distances from cities to cities; these lie on the surface of a sphere (the Earth) in 3D space.
- The reconstructed 2D points on a plane capture the essential properties (poles?).
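As a concrete illustration of the classical (strain-minimizing) variant, here is a minimal numpy sketch; it is not from the lecture, and the function and variable names are my own:

import numpy as np

def classical_mds(D, k=2):
    # Embed an N x N symmetric distance matrix D into k dimensions (classical MDS).
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances (Gram matrix)
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]           # keep the top-k components
    L = np.maximum(eigvals[idx], 0.0)             # clip small negative eigenvalues (noise/curvature)
    return eigvecs[:, idx] * np.sqrt(L)           # N x k embedding Y

# Toy usage: points on a circle are recovered up to rotation/reflection.
theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, k=2)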

MDS Example: Multi-D to 2D
- A more elaborate example: we have a correlation matrix between crimes, which live in a space of arbitrary dimensionality.
- Hack: convert the correlations to dissimilarities and show the reconstructed Y points.
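The slide does not spell out the conversion; two common choices (assumed here, not taken from the slide), both mapping high correlation to low dissimilarity, are:

\[
\Delta_{ij} = 1 - \rho_{ij} \qquad \text{or} \qquad \Delta_{ij} = \sqrt{2\,(1-\rho_{ij})}
\]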

Locally Linear Embedding
- Instead of distances, look at the neighborhood of each point; preserve the reconstruction of each point from its neighbors in the low-dimensional space. Why?
- Find the K nearest neighbors of each point.
- Describe each neighborhood by the best weights on the neighbors for reconstructing the point.
- Find the best low-dimensional vectors that still have the same weights.
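In equations (the standard Roweis & Saul formulation; the slide's own figures are not reproduced here), the reconstruction weights minimize

\[
\varepsilon(W) = \sum_i \Big\| x_i - \sum_j W_{ij}\, x_j \Big\|^2
\quad \text{s.t.} \quad \sum_j W_{ij} = 1, \qquad W_{ij} = 0 \text{ if } x_j \text{ is not one of the } K \text{ neighbors of } x_i.
\]
The sum-to-one constraint makes the weights invariant to rotations, rescalings, and translations of each neighborhood, which is why they transfer to the low-dimensional space.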

Locally Linear Embedding
Finding the W's (a convex combination of weights on the neighbors):
1) Take the derivative and set it to 0.
2) Solve the linear system.
3) Find the Lagrange multiplier λ.
4) Find w.
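A standard reconstruction of the per-point derivation behind these steps (η_1, …, η_K denote the K neighbors of x_i; the slide's exact notation may differ):

\[
\begin{aligned}
&\min_{w}\ \Big\|x_i - \sum_{j=1}^{K} w_j\,\eta_j\Big\|^2 = w^\top C\, w
\quad \text{s.t.}\ \ \mathbf{1}^\top w = 1,
\qquad C_{jk} = (x_i-\eta_j)^\top(x_i-\eta_k)\\
&\nabla_w\Big[w^\top C w - \lambda(\mathbf{1}^\top w - 1)\Big] = 0
\;\;\Rightarrow\;\; C w = \tfrac{\lambda}{2}\mathbf{1}
\;\;\Rightarrow\;\; w = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^\top C^{-1}\mathbf{1}}
\end{aligned}
\]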

Locally Linear Embedding
Finding the Y's (new low-dimensional points that agree with the W's):
- Solve for Y as the bottom d+1 eigenvectors of M (the very bottom, constant eigenvector is discarded).
- Plot the Y values.
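The elided cost and matrix are, in the standard formulation:

\[
\Phi(Y) = \sum_i \Big\| y_i - \sum_j W_{ij}\, y_j \Big\|^2 = \mathrm{tr}\big(Y^\top M\, Y\big),
\qquad M = (I - W)^\top (I - W),
\]
minimized subject to centering and unit-covariance constraints on Y, which is what leads to the bottom eigenvectors of M.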

LLE Examples
- The original X data are raw images.
- The dots are the reconstructed two-dimensional Y points.
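Putting the three steps together, a minimal numpy sketch of LLE (not from the lecture; the names, neighbor count, and regularization constant are my own choices):

import numpy as np

def lle(X, n_neighbors=10, d=2, reg=1e-3):
    # Locally Linear Embedding: neighbors -> reconstruction weights W -> embedding Y.
    N = X.shape[0]

    # Step 1: K nearest neighbors of each point (excluding the point itself).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nbrs = np.argsort(dists, axis=1)[:, :n_neighbors]

    # Step 2: weights that best reconstruct each point from its neighbors (summing to one).
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[nbrs[i]] - X[i]                                  # neighbors centered on x_i
        C = Z @ Z.T                                            # local Gram matrix (K x K)
        C += reg * np.trace(C) * np.eye(n_neighbors)           # regularize when K > input dimension
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()                            # enforce the sum-to-one constraint

    # Step 3: bottom eigenvectors of M = (I - W)^T (I - W), dropping the constant one.
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:d + 1]                                 # N x d embedding Y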

LLEs
- Top: PCA.
- Bottom: LLE.

Generative Topographic Map
- A principled alternative to the Kohonen map. Forms a generative model of the manifold: we can sample from it, etc.
- Find a nonlinear mapping y() from a 2D grid of Gaussians.
- Pick the parameters W of the mapping such that the mapped Gaussians in data space maximize the likelihood of the observed data.
- There are two spaces: the data space t (the X's in the earlier notation) and the hidden latent space x (the earlier Y's). The mapping goes from the latent space to the observed space.

GTM as a Grid of Gaussians
- We choose priors and conditionals for all the variables of interest.
- Assume Gaussian noise on the y() mapping.
- Assume the prior latent variables form a grid, equally spaced in latent space.
- We can now write out the full likelihood.
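A standard reconstruction of the model these choices define (the slide's equations were images; notation follows the usual GTM formulation):

\[
p(t \mid x, W, \beta) = \mathcal{N}\!\big(t \,;\, y(x; W),\, \beta^{-1} I\big),
\qquad
p(x) = \frac{1}{K}\sum_{k=1}^{K}\delta(x - x_k),
\]
where the x_k are the K grid points in latent space.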

GTM Distribution Model
- Integrating over the delta functions turns the integral into a summation.
- Note the log-sum: we need EM to maximize the likelihood.
- Use a parametric form of the mapping that is linear in the basis functions.
- The slide shows examples of the manifolds produced by randomly chosen W mappings.
- Typically we are given the data and want to find the maximum-likelihood mapping W for it…
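In standard GTM notation (assumed here), the mapping and the likelihood being maximized are:

\[
y(x; W) = W\,\phi(x),
\qquad
\mathcal{L}(W, \beta) = \sum_{n=1}^{N} \ln\!\left[\frac{1}{K}\sum_{k=1}^{K}\mathcal{N}\!\big(t_n \,;\, W\phi(x_k),\, \beta^{-1} I\big)\right],
\]
where φ(·) is a fixed set of basis functions (typically radial basis functions on the latent grid). The sum inside the logarithm is what makes direct maximization hard and motivates EM.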

GTM Examples
- Recover the nonlinear manifold by warping the grid with the W parameters.
- Synthetic example: left = initialized, right = converged.
- Real example: oil data, 3 classes; left = GTM, right = PCA.