Spectral Methods for Dimensionality Reduction

Spectral Methods for Dimensionality Reduction
ECOE580

Introduction
- How can we find low-dimensional structure hidden in high-dimensional data?
- Spectral methods can recover non-linear, low-dimensional sub-manifolds.
- They are computationally tractable: shortest-path problems, least-squares problems (LSE), semidefinite programming (SDP), etc.

Inputs & Outputs
- Given high-dimensional data X = (x1 x2 ... xn), with xi ∈ R^d.
- Compute n corresponding outputs y1, ..., yn with yi ∈ R^m.
- The mapping should be faithful: nearby inputs are mapped to nearby outputs, with m << d.
- Assume the inputs are centered at the origin: Σi xi = 0.
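
A minimal NumPy sketch of this setup (the data, n, d, and the variable names are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical high-dimensional inputs: n samples x_i in R^d.
rng = np.random.default_rng(0)
n, d = 500, 50
X = rng.normal(size=(n, d))

# Center the inputs at the origin so that sum_i x_i = 0,
# as assumed by all of the spectral methods below.
X = X - X.mean(axis=0)
assert np.allclose(X.sum(axis=0), 0.0)
```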

Spectral Methods
- Compute the top or bottom eigenvectors of specially constructed matrices.
- Linear methods.
- Graph-based methods: nearest-neighbor relations and edge weighting.
- Kernel methods.

Linear Methods: PCA
- Preserves the covariance structure of the data.
- Input patterns are projected onto the m-dimensional subspace that minimizes the reconstruction error Σi || xi − Σj (xi · ej) ej ||², which is equivalent to the subspace of maximum variance, spanned by the top m eigenvectors of the covariance matrix C = (1/n) Σi xi xiᵀ.

PCA
- The output coordinates are yij = xi · ej, where ej is the j-th top eigenvector of the covariance matrix.
- The subspace captures most of the significant variance of the data.
- A prominent gap in the eigenvalue spectrum indicates a natural cut-off dimensionality m.
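
A sketch of PCA as described above, assuming a centered n x d data matrix X (NumPy only; the function name and interface are illustrative, not from the slides):

```python
import numpy as np

def pca(X, m):
    """Project centered data X (n x d) onto its top-m principal subspace."""
    n = X.shape[0]
    C = X.T @ X / n                       # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # sort descending
    E = eigvecs[:, order[:m]]             # top-m eigenvectors e_1..e_m
    Y = X @ E                             # y_ij = x_i . e_j
    return Y, eigvals[order]
```

A prominent gap in the returned eigenvalue spectrum suggests where to cut off m.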

Metric Multidimensional Scaling (MDS)
- Uses the inner products between inputs: the Gram matrix Gij = xi · xj.
- The minimum-error embedding is obtained from the spectral decomposition of the Gram matrix.
- The output coordinates are yij = √λj · eji, where λj and ej are the top eigenvalues and eigenvectors of G and eji is the i-th component of ej.

Metric Multidimensional Scaling
- Motivated by preserving pairwise distances.
- Assuming the inputs are centered at the origin, the Gram matrix G can be written in terms of the squared-distance matrix S: G = −(1/2) H S H, where Sij = ||xi − xj||² and H = I − (1/n) 1 1ᵀ is the centering matrix.

Metric Multidimensional Scaling
- Yields the same outputs as PCA.
- The distance metric can be generalized to non-linear metrics.
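
A sketch of metric MDS built directly on the double-centering relation above (NumPy; illustrative only):

```python
import numpy as np

def classical_mds(S, m):
    """Embed in R^m from an n x n matrix of squared distances S_ij = ||x_i - x_j||^2."""
    n = S.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * H @ S @ H                     # Gram matrix of the centered inputs
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:m]    # top-m eigenpairs
    lam = np.maximum(eigvals[order], 0.0)    # clip small negative eigenvalues
    return eigvecs[:, order] * np.sqrt(lam)  # y_ij = sqrt(lambda_j) * e_ji
```

With Euclidean squared distances this recovers the PCA embedding; plugging in other (non-linear) squared distances gives the generalizations discussed next.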

Graph-Based Methods
- If the data set is highly non-linear, linear methods fail.
- Construct a sparse graph: nodes are input patterns, edges are neighborhood relations.
- Construct matrices from these graphs that capture the underlying manifold structure.

Graph-Based Methods
- Polynomial-time algorithms.
- Use shortest paths (Isomap), least-squares problems (LLE, Laplacian eigenmaps), or SDP (maximum variance unfolding).
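
All three graph-based methods below start from the same kind of sparse k-nearest-neighbor graph; a sketch using scikit-learn (the data and the choice k = 10 are hypothetical):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))   # hypothetical centered inputs
k = 10

# Sparse k-NN graph: nodes are input patterns, edges are neighborhood relations.
W_dist = kneighbors_graph(X, n_neighbors=k, mode="distance")      # edge weights = distances (Isomap)
W_conn = kneighbors_graph(X, n_neighbors=k, mode="connectivity")  # 0/1 edges (LLE, Laplacian eigenmaps)

# Symmetrize when an undirected graph is needed.
W_undirected = W_conn.maximum(W_conn.T)
```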

Isomap
- Preserves the pairwise distances between inputs as measured along the sub-manifold from which they are sampled.
- A variant of MDS that uses geodesic distances instead of Euclidean distances.

Isomap
- Geodesic distance: length of the shortest path through the neighborhood graph.
- Algorithm: connect each input to its k nearest neighbors; compute the pairwise distance matrix P between all nodes via all-pairs shortest paths.

Isomap
- Apply MDS to P and keep the top m eigenvalues and eigenvectors.
- Euclidean distances between outputs approximate geodesic distances between inputs.
- Formal guarantee of convergence when the data set has no holes (i.e., the underlying parameter space is convex).
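
A compact Isomap sketch following the steps above (SciPy/scikit-learn; parameter values are illustrative, and the k-NN graph is assumed to be connected):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap(X, k=10, m=2):
    """Isomap sketch: k-NN graph -> geodesic distances -> metric MDS."""
    # 1. Connect each input to its k nearest neighbors (edge weights = Euclidean distances).
    graph = kneighbors_graph(X, n_neighbors=k, mode="distance")
    # 2. All-pairs shortest paths approximate geodesic distances along the manifold.
    P = shortest_path(graph, method="D", directed=False)  # Dijkstra
    # 3. Metric MDS on the squared geodesic distances.
    n = P.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * H @ (P ** 2) @ H
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:m]
    lam = np.maximum(eigvals[order], 0.0)
    return eigvecs[:, order] * np.sqrt(lam)
```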

Maximum Variance Unfolding (MVU)
- Preserves the distances and angles between nearby inputs.
- Constructs a Gram matrix over the inputs.
- Unfolds the data by pulling the input patterns apart as far as possible.
- The final transformation is locally a rotation plus translation.

MVU
- Compute the k nearest neighbors of each input.
- Define an indicator matrix with nij = 1 when inputs i and j are neighbors or are common neighbors of some other input.
- The distance and angle constraints require ||yi − yj||² = ||xi − xj||² whenever nij = 1.
- Unfold the input patterns by maximizing the variance of the outputs, Σi ||yi||², subject to these constraints and the centering constraint Σi yi = 0.

MVU
- The above optimization can be solved as a semidefinite program (SDP) over the output Gram matrix Kij = yi · yj; the embedding is then read off from the top m eigenvectors of K.
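
A simplified sketch of the MVU semidefinite program (written here with cvxpy and scikit-learn, which is an assumption, not the solver used in the original work); it constrains only direct neighbor pairs rather than all common-neighbor pairs, and is practical only for small n:

```python
import numpy as np
import cvxpy as cp
from sklearn.neighbors import NearestNeighbors

def mvu(X, k=5, m=2):
    """MVU sketch: maximize output variance subject to local isometry constraints."""
    n = X.shape[0]
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]

    K = cp.Variable((n, n), PSD=True)        # output Gram matrix K_ij = y_i . y_j
    constraints = [cp.sum(K) == 0]           # centering: sum_i y_i = 0
    for i in range(n):
        for j in idx[i]:
            d2 = float(np.sum((X[i] - X[j]) ** 2))
            # Preserve local distances: ||y_i - y_j||^2 = ||x_i - x_j||^2.
            constraints.append(K[i, i] + K[j, j] - 2 * K[i, j] == d2)

    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()

    # Read off the embedding from the top-m eigenvectors of K.
    eigvals, eigvecs = np.linalg.eigh(K.value)
    order = np.argsort(eigvals)[::-1][:m]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```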

Locally Linear Embedding (LLE)
- Preserves the local linear structure of nearby inputs.
- Instead of the top m eigenvectors of a dense Gram matrix, it uses the bottom m eigenvectors of a sparse matrix.

LLE
- Compute the k nearest neighbors of each input.
- Construct a directed graph whose edges indicate nearest-neighbor relations.
- Assign weights Wij to the edges; each input and its k nearest neighbors are viewed as a small, locally linear patch.
- The weights are computed by reconstructing each input from its k nearest neighbors, minimizing Σi || xi − Σj Wij xj ||².

LLE
- Wij = 0 if inputs i and j are not neighbors; the weights for every input sum to 1 (Σj Wij = 1).
- The sparse matrix W encodes the local geometric properties of the data.
- The same linear relations should hold for the outputs: minimize Σi || yi − Σj Wij yj ||², subject to the constraints that the outputs are centered (Σi yi = 0) and have unit covariance ((1/n) Σi yi yiᵀ = I).

LLE
- Minimizing the output reconstruction error is equivalent to computing the bottom m eigenvectors of the sparse matrix (I − W)ᵀ(I − W), discarding the bottom (constant) eigenvector.
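
An LLE sketch combining the two stages above (NumPy/scikit-learn; the regularization constant is a common practical addition, not something stated on the slides):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lle(X, k=10, m=2, reg=1e-3):
    """LLE sketch: reconstruction weights W, then bottom eigenvectors of (I - W)^T (I - W)."""
    n = X.shape[0]
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]

    # 1. Reconstruction weights: one small least-squares problem per input,
    #    with the weights constrained to sum to 1.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                  # neighbors expressed relative to x_i
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(k)    # regularize for numerical stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, idx[i]] = w / w.sum()            # sum_j W_ij = 1

    # 2. Bottom eigenvectors of M = (I - W)^T (I - W); discard the constant eigenvector.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:m + 1]
```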

Laplacian Eigenmaps
- Preserves proximity relations: maps nearby inputs to nearby outputs.
- Similar to LLE: compute the k nearest neighbors, construct an undirected graph, and assign positive edge weights Wij (uniform weights or weights that decay with distance).

LE
- Let D denote the diagonal degree matrix with elements Dii = Σj Wij.
- Obtain the outputs by minimizing Σij Wij ||yi − yj||², where nearness is measured by W, subject to the constraint Yᵀ D Y = I.

LE
- The minimization problem is solved by finding the bottom m eigenvectors of the generalized eigenvalue problem (D − W) v = λ D v, discarding the bottom (constant) eigenvector.
- The matrices are sparse, so the algorithm scales to larger data sets.
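
A Laplacian eigenmaps sketch with Gaussian (decaying) edge weights; a dense solver is used here for brevity, whereas the slides' point is that sparse solvers make this scale (the bandwidth sigma is an illustrative choice):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps(X, k=10, m=2, sigma=1.0):
    """LE sketch: k-NN graph with decaying weights, then bottom generalized eigenvectors."""
    # Undirected k-NN graph with Gaussian edge weights.
    dist = kneighbors_graph(X, n_neighbors=k, mode="distance").toarray()
    dist = np.maximum(dist, dist.T)                       # symmetrize
    W = np.where(dist > 0, np.exp(-dist ** 2 / (2 * sigma ** 2)), 0.0)

    D = np.diag(W.sum(axis=1))                            # degree matrix, D_ii = sum_j W_ij
    L = D - W                                             # graph Laplacian

    # Bottom eigenvectors of the generalized problem (D - W) v = lambda D v;
    # the very first (constant) eigenvector is discarded.
    eigvals, eigvecs = eigh(L, D)
    return eigvecs[:, 1:m + 1]
```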

Kernel Functions
- Let Φ be a mapping from R^d into a dot-product feature space H.
- PCA can then be carried out in H using only the kernel matrix Kij = Φ(xi) · Φ(xj), i.e., by diagonalizing K instead of the covariance matrix (kernel PCA).
- Kernel PCA often uses non-linear kernels such as polynomial kernels and Gaussian kernels.
- However, these generic kernels are not well suited for manifold learning.
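
A kernel PCA sketch with the generic polynomial and Gaussian kernels mentioned above (NumPy; the kernel parameters are illustrative), making the slides' point concrete: the spectral computation is on the centered kernel matrix rather than on the covariance matrix:

```python
import numpy as np

def kernel_pca(X, m=2, kernel="gaussian", sigma=1.0, degree=3):
    """Kernel PCA sketch: diagonalize the centered kernel matrix K_ij = Phi(x_i) . Phi(x_j)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    if kernel == "gaussian":
        K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma ** 2))
    else:                                        # polynomial kernel
        K = (1.0 + X @ X.T) ** degree

    # Center the kernel matrix in feature space.
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H

    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:m]
    lam = np.maximum(eigvals[order], 0.0)
    return eigvecs[:, order] * np.sqrt(lam)      # top-m kernel principal components
```

As the slide notes, these generic kernels do not adapt to the manifold; the graph-based methods above can instead be viewed as constructing data-dependent kernel matrices.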