GRASP
Learning a Kernel Matrix for Nonlinear Dimensionality Reduction
Kilian Q. Weinberger, Fei Sha, and Lawrence K. Saul
ICML'04, Department of Computer and Information Science
The Big Picture: Given high-dimensional data sampled from a low-dimensional manifold, how do we compute a faithful embedding?
Outline: Part I: kernel PCA. Part II: Manifold Learning. Part III: Algorithm. Part IV: Experimental Results.
Part I. kernel PCA
Problem: Given inputs x_1, …, x_N in R^D sampled from a low-dimensional manifold, compute an embedding y_1, …, y_N in R^d with d ≪ D, such that nearby points remain nearby and distant points remain distant. Also estimate the intrinsic dimensionality d.
Subspaces (illustrations: D=3 → d=2, D=2 → d=1)
Principal Component Analysis: Project the data onto the subspace of maximum variance. This can be solved as an eigenvalue problem on the covariance matrix: C v = λ v, with C = (1/N) Σ_i x_i x_iᵀ for centered data.
Using the Kernel Trick: Do PCA in a higher-dimensional feature space Φ(x). The feature space can be defined implicitly through the kernel matrix K_ij = Φ(x_i) · Φ(x_j).
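The kernel PCA step on a precomputed kernel matrix can be sketched as follows (a minimal illustration; the function name and the sqrt-eigenvalue scaling convention are my own, not from the talk):

```python
import numpy as np

def kernel_pca(K, d):
    """Embed N points into d dimensions from an (N, N) kernel matrix K.

    Returns coordinates given by the top-d eigenvectors of the
    centered kernel matrix, scaled by the square roots of the eigenvalues.
    """
    N = K.shape[0]
    # Center the kernel matrix so the implicit feature vectors have zero mean.
    H = np.eye(N) - np.ones((N, N)) / N
    Kc = H @ K @ H
    # eigh returns eigenvalues in ascending order; take the top d.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:d]
    # Clip tiny negative eigenvalues caused by round-off before the sqrt.
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))
```

With a linear kernel K = X Xᵀ this reduces to ordinary PCA, so for data lying on a line a one-dimensional embedding preserves all pairwise distances.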
Common Kernels: linear, Gaussian, polynomial. They do very well for classification; how about manifold learning?
Linear Kernel: k(x, x') = x · x'
Gaussian Kernels: k(x, x') = exp(−‖x − x'‖² / 2σ²)
Gaussian Kernels: The feature vectors span roughly as many dimensions as the number of spheres of radius σ needed to enclose the input vectors.
Polynomial Kernels: k(x, x') = (x · x' + c)^p
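The three kernels above can be computed as follows (a sketch; the parameter names `sigma`, `p`, and `c` are illustrative, and the polynomial form with an additive constant `c` is one common convention):

```python
import numpy as np

def linear_kernel(X):
    """K_ij = x_i . x_j"""
    return X @ X.T

def gaussian_kernel(X, sigma=1.0):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))"""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def polynomial_kernel(X, p=2, c=1.0):
    """K_ij = (x_i . x_j + c)^p"""
    return (X @ X.T + c) ** p
```

Note the Gaussian kernel always has ones on its diagonal, since each point is at distance zero from itself.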
Part II. Manifold Learning via Semidefinite Programming
Local Isometry: A smooth, invertible mapping that preserves distances and looks locally like a rotation plus translation.
Neighborhood Graph: Connect each point to its k nearest neighbors; this discretizes the manifold.
Preserve Local Distances: Approximate local isometry with the constraint that output distances equal input distances for every pair of neighbors, where the neighborhood indicator η_ij marks whether points i and j are neighbors.
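In kernel form (with K_ij = y_i · y_j for the outputs), the local-isometry approximation on this slide can be written as:

```latex
% For every pair of neighbors (\eta_{ij} = 1), the output distance
% must equal the input distance; in terms of the output kernel matrix K:
\|y_i - y_j\|^2 = K_{ii} - 2K_{ij} + K_{jj} = \|x_i - x_j\|^2
  \quad \text{whenever } \eta_{ij} = 1 .
```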
Objective Function? Goal: find the kernel matrix of minimum rank. Problem: computationally hard. Heuristic: maximize the pairwise distances.
Objective Function? (Cont'd) What happens if we maximize the pairwise distances?
Semidefinite Programming Problem: Maximize trace(K) (unfold the manifold) subject to: K_ii − 2K_ij + K_jj = ‖x_i − x_j‖² for all neighbors (preserve local neighborhoods); Σ_ij K_ij = 0 (center the output); K ⪰ 0 (positive semidefinite).
Part III. Semidefinite Embedding in three easy steps (also known as "Maximum Variance Unfolding" [Sun, Boyd, Xiao, Diaconis])
Step 1: k-Nearest Neighbors. Compute the nearest neighbors and the Gram matrix for each neighborhood.
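The neighbor-finding part of this step can be sketched as follows (a brute-force illustration; the function name is my own, and real implementations would use a spatial index for large N):

```python
import numpy as np

def knn_graph(X, k):
    """Return the symmetric k-nearest-neighbor pairs (i, j) with i < j."""
    # All pairwise squared distances; exclude self-matches via the diagonal.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    pairs = set()
    for i in range(len(X)):
        for j in np.argsort(d2[i])[:k]:        # k closest points to x_i
            pairs.add((int(min(i, j)), int(max(i, j))))
    return sorted(pairs)
```

The pairs are symmetrized (i is connected to j if either is among the other's k nearest neighbors), which keeps the constraint list for the SDP free of duplicates.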
Step 2: Semidefinite Programming. Compute the centered, locally isometric dot-product matrix with maximal trace.
Step 3: kernel PCA. Estimate d from the eigenvalue spectrum; the top d eigenvectors give the embedding.
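Estimating d from the eigenvalue spectrum can be sketched as below; the 95% variance threshold is an illustrative heuristic of mine, not a criterion stated in the talk:

```python
import numpy as np

def estimate_dim(K, threshold=0.95):
    """Estimate intrinsic dimensionality from the learned kernel matrix K:
    the smallest d whose top-d eigenvalues capture `threshold` of the trace."""
    eigvals = np.sort(np.linalg.eigvalsh(K))[::-1]   # descending order
    eigvals = np.maximum(eigvals, 0.0)               # clip round-off negatives
    cum = np.cumsum(eigvals) / eigvals.sum()         # cumulative variance ratio
    return int(np.searchsorted(cum, threshold) + 1)
```

Because SDE unfolds the manifold, the learned K tends to be nearly low-rank, so the spectrum drops sharply after the first d eigenvalues.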
Part IV. Experimental Results
Trefoil Knot: N=539, k=4, D=3, d=2
Teapot (full rotation): N=400, k=4, D=23028, d=2
Teapot (half rotation): N=200, k=4, D=23028, d=2
Faces: N=1000, k=4, D=540, d=2
Twos vs. Threes: N=953, k=3, D=256, d=2
Part V. Supervised Experimental Results
Large Margin Classification: SDE kernel used in an SVM. Task: binary digit classification. Input: USPS data set. Training/testing split: 810/90. Neighborhood size: k=4.
SVM Kernel: SDE is not well suited for SVMs.
SVM Kernel (cont'd): [figure: nonlinear decision boundary in input space vs. linear decision boundary after unfolding] Unfolding does not necessarily help classification: it requires a linear decision boundary on the manifold, and reducing the dimensionality is counter-intuitive for classification.
Part VI. Conclusion
Previous Work: Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML'04]
Previous Work (Isomap): Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML'04]. Unlike SDE, the Isomap matrix is not necessarily positive semidefinite.
Previous Work (LLE): Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML'04]. Unlike SDE, the LLE eigenvalues do not reveal the true dimensionality.
Conclusion: Semidefinite Embedding (SDE) (+) extends kernel PCA to do manifold learning; (+) uses semidefinite programming; (+) has a guaranteed unique solution; (−) is not well suited for support vector machines; (−) exact solution (so far) limited to N=2000.
Semidefinite Programming Problem (backup): Maximize trace(K) subject to: preserve local neighborhoods; unfold the manifold; center the output; K positive semidefinite. Variation: introduce slack variables so the local distance constraints only need to hold approximately.
Swiss Roll: N=800, k=4, D=3, d=2
Applications: visualization of data; natural language processing.
Trefoil Knot (kernel comparison: RBF, polynomial, SDE): N=539, k=4, D=3, d=2
Motivation: Similar vectorized pictures lie on a nonlinear manifold; linear methods don't work here.