Learning a Kernel Matrix for Nonlinear Dimensionality Reduction. Kilian Q. Weinberger, Fei Sha, and Lawrence K. Saul (ICML'04). Department of Computer and Information Science, University of Pennsylvania.

The Big Picture: Given high-dimensional data sampled from a low-dimensional manifold, how do we compute a faithful embedding?

Outline: Part I: Kernel PCA. Part II: Manifold Learning. Part III: Algorithm. Part IV: Experimental Results.

Part I. Kernel PCA

Problem: Input: high-dimensional points $x_1, \dots, x_N \in \mathbb{R}^D$. Output: low-dimensional points $y_1, \dots, y_N \in \mathbb{R}^d$ with $d \ll D$. Embedding: nearby points remain nearby, distant points remain distant. Also estimate the intrinsic dimensionality $d$.

Subspaces (figure): linear subspaces, e.g. a plane in $D=3$ ($d=2$) and a line in $D=2$ ($d=1$).

Principal Component Analysis: project the data into the subspace of maximum variance, $\max_{\|w\|=1} \sum_i (w^\top (x_i - \bar{x}))^2$. This can be solved as the eigenvalue problem $C w = \lambda w$, where $C$ is the covariance matrix of the centered inputs.
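As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of PCA as an eigenvalue problem; the variable names are illustrative:

```python
import numpy as np

def pca(X, d):
    """Project the N x D data matrix X onto its d directions of maximum variance."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = Xc.T @ Xc / len(Xc)                  # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :d]              # top-d eigenvectors (principal directions)
    return Xc @ W                            # N x d projection

# Example: points in D=3 that lie near a 2-d subspace.
X = np.random.randn(500, 2) @ np.random.randn(2, 3) + 0.01 * np.random.randn(500, 3)
Y = pca(X, d=2)
```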

Using the kernel trick: do PCA in a higher-dimensional feature space $\Phi(x)$. The feature space can be defined implicitly through the kernel matrix $K_{ij} = \Phi(x_i) \cdot \Phi(x_j)$.
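A minimal sketch of kernel PCA given a precomputed kernel matrix, assuming the standard centering-in-feature-space recipe (names are illustrative; the same routine reappears as Step 3 of the algorithm below):

```python
import numpy as np

def kernel_pca(K, d):
    """Embed N points into d dimensions from an N x N kernel (dot-product) matrix K."""
    N = len(K)
    H = np.eye(N) - np.ones((N, N)) / N                  # centering matrix
    Kc = H @ K @ H                                       # kernel of centered feature vectors
    eigvals, eigvecs = np.linalg.eigh(Kc)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending
    # Rows of the result are the embedded coordinates; the eigenvalue
    # spectrum indicates how many dimensions carry significant variance.
    return eigvecs[:, :d] * np.sqrt(np.maximum(eigvals[:d], 0.0))
```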

Common Kernels: linear $k(x, x') = x \cdot x'$, Gaussian $k(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$, and polynomial $k(x, x') = (1 + x \cdot x')^p$. They do very well for classification. How about manifold learning?

Linear Kernel (figure)

Gaussian Kernels (figure)

Gaussian Kernels: the feature vectors span as many dimensions as the number of spheres of radius $\sigma$ needed to enclose the input vectors.

Polynomial Kernels (figure)

Part II. Manifold Learning via Semidefinite Programming

Local Isometry: a smooth, invertible mapping that preserves distances and looks locally like a rotation plus translation.

Neighborhood graph: connect each point to its k nearest neighbors. This discretizes the manifold.
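A sketch of the neighborhood-graph construction with scikit-learn (assumed here for convenience; only k comes from the slide). The graph should be connected for the unfolding step below to be meaningful:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_graph(X, k=4):
    """Boolean N x N adjacency matrix connecting each point to its k nearest neighbors."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)  # +1: a point is its own nearest neighbor
    N = len(X)
    A = np.zeros((N, N), dtype=bool)
    for i, neighbors in enumerate(idx):
        A[i, neighbors[1:]] = True           # skip the point itself
    return A | A.T                           # symmetrize the relation
```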

Preserve local distances. Approximation of local isometry: for every pair of neighboring points $i, j$ (neighborhood indicator $\eta_{ij} = 1$), constrain $\|y_i - y_j\|^2 = \|x_i - x_j\|^2$.

Objective Function? Goal: find a minimum-rank kernel matrix. Problem: computationally hard. Heuristic: maximize the pairwise distances.

Objective Function? (Cont'd) What happens if we maximize the pairwise distances?
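The answer, via a standard identity (not quoted from the slides): for centered outputs, the total pairwise squared distance is proportional to the trace of the Gram matrix $K_{ij} = y_i \cdot y_j$, which is why the semidefinite program below maximizes $\operatorname{tr}(K)$:

```latex
\sum_{ij} \|y_i - y_j\|^2
  = 2N \sum_i \|y_i\|^2 - 2 \Big\| \sum_i y_i \Big\|^2
  = 2N \,\operatorname{tr}(K)
  \qquad \text{when } \textstyle\sum_i y_i = 0 .
```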

Semidefinite Programming Problem: maximize $\operatorname{tr}(K)$ (unfold the manifold) subject to: $K_{ii} + K_{jj} - 2K_{ij} = \|x_i - x_j\|^2$ for all neighboring pairs (preserve local neighborhoods); $\sum_{ij} K_{ij} = 0$ (center the output); $K \succeq 0$ (positive semidefinite).
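To make the constraints concrete, here is a small sketch of this SDP using the generic modeling tool CVXPY (my assumption for illustration, not the solver used by the authors); `A` is a neighbor adjacency matrix such as the one built in the sketch above:

```python
import numpy as np
import cvxpy as cp

def sde_kernel(X, A):
    """Learn the SDE/MVU kernel matrix K for inputs X (N x D) and neighbor adjacency A (N x N, bool)."""
    N = len(X)
    K = cp.Variable((N, N), PSD=True)                 # K is positive semidefinite
    constraints = [cp.sum(K) == 0]                    # center the output
    for i in range(N):
        for j in range(i + 1, N):
            if A[i, j]:                               # preserve local distances
                d_ij = np.sum((X[i] - X[j]) ** 2)
                constraints.append(K[i, i] + K[j, j] - 2 * K[i, j] == d_ij)
    prob = cp.Problem(cp.Maximize(cp.trace(K)), constraints)   # unfold: maximize variance
    prob.solve()
    return K.value
```

For large N, generic solvers like this become impractical, which is one reason the exact solution is limited to a few thousand points (see the conclusion).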

Part III. Semidefinite Embedding in three easy steps (also known as "Maximum Variance Unfolding" [Sun, Boyd, Xiao, Diaconis])

Step 1: k-nearest neighbors. Compute the nearest neighbors and the Gram matrix of each neighborhood.

Step 2: Semidefinite programming. Compute the centered, locally isometric dot-product matrix with maximal trace.

Step 3: Kernel PCA. Estimate d from the eigenvalue spectrum; the top eigenvectors give the embedding.
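Putting the three steps together, assuming the hypothetical knn_graph, sde_kernel, and kernel_pca sketches from above are in scope:

```python
import numpy as np

def sde_embed(X, k=4, d=2):
    """Semidefinite Embedding sketch: neighbors -> SDP -> kernel PCA."""
    A = knn_graph(X, k)                      # Step 1: k-nearest-neighbor graph
    K = sde_kernel(X, A)                     # Step 2: locally isometric kernel with maximal trace
    spectrum = np.linalg.eigvalsh(K)[::-1]   # Step 3: eigenvalue spectrum estimates d
    return spectrum, kernel_pca(K, d)        # top eigenvectors give the embedding
```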

Part IV. Experimental Results

Trefoil Knot: N=539, k=4, D=3, d=2.

Teapot (full rotation): N=400, k=4, D=23028, d=2.

Teapot (half rotation): N=200, k=4, D=23028, d=2.

Faces: N=1000, k=4, D=540, d=2.

Twos vs. Threes: N=953, k=3, D=256, d=2.

Part V. Supervised Experimental Results

Large Margin Classification: the SDE kernel used in an SVM. Task: binary digit classification. Input: USPS data set. Training/testing set: 810/90. Neighborhood size: k=4.
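A sketch of plugging a precomputed kernel matrix into an SVM via scikit-learn's precomputed-kernel interface (the exact experimental protocol on the slide, including how test points are handled, is simplified here):

```python
import numpy as np
from sklearn.svm import SVC

def svm_with_precomputed_kernel(K, y, train_idx, test_idx):
    """Train and evaluate an SVM given a full precomputed kernel matrix K and labels y."""
    K_train = K[np.ix_(train_idx, train_idx)]    # (n_train, n_train) kernel between training points
    K_test = K[np.ix_(test_idx, train_idx)]      # rows: test points, columns: training points
    clf = SVC(kernel="precomputed").fit(K_train, y[train_idx])
    return clf.score(K_test, y[test_idx])        # test accuracy
```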

SVM Kernel (figure): SDE is not well suited for SVMs.

SVM Kernel (cont'd): nonlinear vs. linear decision boundaries (figures). Unfolding does not necessarily help classification; reducing the dimensionality is counter-intuitive, and the classifier then needs a linear decision boundary on the unfolded manifold.

Part VI. Conclusion

Previous Work: Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML'04].

Previous Work (Isomap): Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML'04]. The Isomap matrix is not necessarily positive semidefinite (figures: SDE vs. Isomap).

Previous Work (LLE): Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML'04]. The LLE eigenvalues do not reveal the true dimensionality (figures: SDE vs. LLE).

Conclusion: Semidefinite Embedding (SDE): (+) extends kernel PCA to do manifold learning; (+) uses semidefinite programming; (+) has a guaranteed unique solution; (-) not well suited for support vector machines; (-) exact solution (so far) limited to N=2000.

Semidefinite Programming Problem: maximize $\operatorname{tr}(K)$ (unfold the manifold) subject to: preserve local neighborhoods, center the output, and $K \succeq 0$ (positive semidefinite).

Semidefinite Programming Problem with slack: the same program (preserve local neighborhoods, unfold the manifold, center the output, $K \succeq 0$), but slack variables are introduced so the local-distance constraints only need to hold approximately.
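One common way to introduce slack, shown as a modification of the earlier CVXPY sketch; this penalized relaxation is an illustrative assumption, not necessarily the exact formulation from the talk:

```python
import numpy as np
import cvxpy as cp

def sde_kernel_with_slack(X, A, nu=1.0):
    """Like sde_kernel, but local distances only need to hold approximately."""
    N = len(X)
    K = cp.Variable((N, N), PSD=True)
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N) if A[i, j]]
    xi = cp.Variable(len(pairs), nonneg=True)                  # one slack variable per neighbor pair
    constraints = [cp.sum(K) == 0]                             # center the output
    for p, (i, j) in enumerate(pairs):
        d_ij = np.sum((X[i] - X[j]) ** 2)
        constraints.append(cp.abs(K[i, i] + K[j, j] - 2 * K[i, j] - d_ij) <= xi[p])
    objective = cp.Maximize(cp.trace(K) - nu * cp.sum(xi))     # penalize constraint violations
    cp.Problem(objective, constraints).solve()
    return K.value
```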

Swiss Roll: N=800, k=4, D=3, d=2.

Applications: visualization of data, natural language processing.

Trefoil Knot: N=539, k=4, D=3, d=2. Comparison of RBF, polynomial, and SDE kernels (figures).

Motivation: similar vectorized pictures lie on a nonlinear manifold; linear methods don't work here.