Spectral Methods for Dimensionality Reduction (ECOE 580)
Introduction
- How can we find low-dimensional structure in high-dimensional data?
- Spectral methods
- Non-linear low-dimensional submanifolds
- Computationally tractable: shortest-path problems, least squares (LSE), semidefinite programming (SDP), etc.
Inputs & outputs
- Given high-dimensional data X = (x_1, x_2, ..., x_n), x_i in R^d
- Compute n corresponding outputs y_i in R^m
- Faithful mapping: nearby inputs are mapped to nearby outputs
- m << d
- Assume the inputs are centered at the origin: sum_i x_i = 0
Spectral Methods
- Top or bottom eigenvectors of specially constructed matrices
- Linear methods
- Graph-based methods: nearest-neighbor relations, edge weighting
- Kernel methods
Linear Methods: PCA
- Preserves the covariance structure of the data
- Input patterns are projected onto an m-dimensional subspace by minimizing the reconstruction error
  E = (1/n) sum_i || x_i - sum_{j=1..m} (x_i · e_j) e_j ||^2
- This is equivalent to projecting onto the subspace of maximum variance
PCA
- The basis vectors e_j are the top m eigenvectors of the covariance matrix C = (1/n) sum_i x_i x_i^T
- The output pattern: y_ij = x_i · e_j
- The subspace captures most of the data's variance
- A prominent gap in the eigenvalue spectrum indicates a natural cut-off dimension m
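The PCA steps above can be sketched in a few lines of numpy; the function name and the toy data set are my own illustration, not from the slides:

```python
import numpy as np

def pca(X, m):
    """Project the centered inputs onto the top-m eigenvectors of the
    covariance matrix C = (1/n) sum_i x_i x_i^T."""
    X = X - X.mean(axis=0)              # center the inputs at the origin
    C = X.T @ X / len(X)                # d x d covariance matrix
    evals, evecs = np.linalg.eigh(C)    # eigenvalues in ascending order
    E = evecs[:, ::-1][:, :m]           # top-m eigenvectors e_1, ..., e_m
    return X @ E                        # outputs y_ij = x_i . e_j

# Anisotropic 2-D cloud: almost all variance lies along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * np.array([5.0, 0.5])
Y = pca(X, 1)
```

The large gap between the two eigenvalues of C (roughly 25 vs. 0.25 here) is exactly the kind of spectral gap that suggests m = 1.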
Metric Multidimensional Scaling
- Uses the inner products between different inputs: the Gram matrix G_ij = x_i · x_j
- The minimum-error embedding is obtained from the spectral decomposition of the Gram matrix
- The output pattern: y_ij = sqrt(lambda_j) e_ji, where lambda_j and e_j are the top m eigenvalues and eigenvectors of G
Metric Multidimensional Scaling
- Motivated by preserving pairwise distances
- Assuming the inputs are centered at the origin, the Gram matrix G can be written in terms of the matrix of squared distances S, where S_ij = ||x_i - x_j||^2:
  G_ij = -1/2 ( S_ij - (1/n) sum_k S_ik - (1/n) sum_k S_kj + (1/n^2) sum_{kl} S_kl )
Metric Multidimensional Scaling
- Yields the same outputs as PCA
- The distance metric can be generalized to nonlinear metrics
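Classical MDS can be sketched directly from the double-centering formula above; the helper name and the synthetic data are assumptions for illustration:

```python
import numpy as np

def mds(S, m):
    """Classical metric MDS: recover the Gram matrix G from the squared-distance
    matrix S via double centering, then embed with its top-m eigenvectors."""
    n = len(S)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * H @ S @ H                         # G_ij = x_i . x_j for centered inputs
    evals, evecs = np.linalg.eigh(G)
    evals, evecs = evals[::-1][:m], evecs[:, ::-1][:, :m]
    return evecs * np.sqrt(np.maximum(evals, 0.0))   # y_ij = sqrt(lambda_j) e_ji

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
S = np.square(X[:, None] - X[None]).sum(-1)      # squared pairwise distances
Y = mds(S, 3)
```

With m equal to the true dimension, the output distances reproduce the input distances exactly, up to a rigid rotation.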
Graph-Based Methods
- If the data set is highly nonlinear, linear methods fail
- Construct a sparse graph: nodes are input patterns, edges are neighborhood relations
- Construct matrices from these graphs to capture the underlying low-dimensional structure
Graph-Based Methods
- Polynomial-time algorithms
- Use shortest paths, least squares (LSE), or semidefinite programming (SDP)
IsoMap
- Preserves the pairwise distances between inputs as measured along the submanifold from which they are sampled
- A variant of MDS that uses geodesic distance instead of Euclidean distance
IsoMap
- Geodesic distance: shortest path through the neighborhood graph
- Algorithm:
  1. Connect each input to its k nearest neighbors (k-NN)
  2. Compute the matrix P of pairwise distances between all nodes by finding all-to-all shortest paths
IsoMap
  3. Apply MDS to P, keeping the top m eigenvalues and eigenvectors
- The Euclidean distances of the outputs approximate the geodesic distances of the inputs
- Formal guarantee of convergence when the data set has no holes (is convex)
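The three IsoMap steps can be sketched as follows; Floyd-Warshall is one simple choice for the all-pairs shortest-path step, and the semicircle test data is my own example:

```python
import numpy as np

def isomap(X, k, m):
    """IsoMap sketch: build a k-NN graph, estimate geodesic distances with
    all-pairs shortest paths (Floyd-Warshall), then apply classical MDS."""
    n = len(X)
    D = np.sqrt(np.square(X[:, None] - X[None]).sum(-1))   # Euclidean distances
    P = np.full((n, n), np.inf)
    np.fill_diagonal(P, 0.0)
    for i in range(n):                           # connect each node to its k-NN
        for j in np.argsort(D[i])[1:k + 1]:
            P[i, j] = P[j, i] = D[i, j]
    for v in range(n):                           # Floyd-Warshall shortest paths
        P = np.minimum(P, P[:, v, None] + P[None, v, :])
    H = np.eye(n) - np.ones((n, n)) / n          # MDS on squared geodesic distances
    G = -0.5 * H @ (P ** 2) @ H
    evals, evecs = np.linalg.eigh(G)
    return evecs[:, ::-1][:, :m] * np.sqrt(np.maximum(evals[::-1][:m], 0.0))

t = np.linspace(0.0, np.pi, 40)
X = np.c_[np.cos(t), np.sin(t)]                  # points along a semicircular arc
Y = isomap(X, k=2, m=1)
```

The arc is a one-dimensional manifold curled into the plane; the 1-D IsoMap output recovers the ordering along the arc, which plain MDS on Euclidean distances would distort.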
Maximum Variance Unfolding
- Preserves the distances and angles between nearby inputs
- Constructs a Gram matrix of the outputs
- Unfolds the data by pulling the input patterns apart as far as possible
- The final transformation is a local rotation and translation
MVU
- Compute the k-NN of each input
- Indicator matrix: eta_ij = 1 when inputs i and j are neighbors, or are both in the k-NN set of some other input
- Distance (and hence angle) constraint when eta_ij = 1:
  ||y_i - y_j||^2 = ||x_i - x_j||^2
- Unfold the input patterns by maximizing the variance of the outputs:
  maximize sum_i ||y_i||^2 subject to the constraints above and sum_i y_i = 0
MVU
- The above optimization can be solved as a semidefinite program (SDP) over the output Gram matrix K_ij = y_i · y_j, which must be positive semidefinite
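As a rough illustration of the objective and constraints (not of the SDP itself), here is a toy stand-in that does gradient ascent on the output variance with a soft quadratic penalty on the neighbor-distance constraints; all names, step sizes, and the test data are my own assumptions, and a real implementation would hand the problem to an SDP solver:

```python
import numpy as np

def mvu_sketch(X, k, steps=2000, lr=0.01, penalty=10.0):
    """Toy stand-in for the MVU optimization: maximize sum_i ||y_i||^2 while
    softly enforcing ||y_i - y_j||^2 = ||x_i - x_j||^2 on neighbor pairs."""
    n = len(X)
    D2 = np.square(X[:, None] - X[None]).sum(-1)
    N = np.zeros((n, n), dtype=bool)             # neighbor indicator eta_ij
    for i in range(n):
        N[i, np.argsort(D2[i])[1:k + 1]] = True
    N |= N.T
    Y = X.copy()
    for _ in range(steps):
        Y = Y - Y.mean(0)                        # keep the outputs centered
        d2 = np.square(Y[:, None] - Y[None]).sum(-1)
        viol = np.where(N, d2 - D2, 0.0)         # distance-constraint violations
        grad = 2 * Y - 8 * penalty * (viol[:, :, None] * (Y[:, None] - Y[None])).sum(1)
        Y = Y + lr * grad / n                    # ascend variance, repair violations
    return Y - Y.mean(0)

t = np.linspace(0.0, np.pi, 20)
X = np.c_[np.cos(t), np.sin(t)]                  # curved 1-D manifold in the plane
Y = mvu_sketch(X, k=2)
```

Pulling the points apart while pinning local distances flattens the arc; in the exact SDP formulation the eigenvectors of the learned Gram matrix then give the low-dimensional outputs, as in MDS.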
Locally Linear Embedding
- Preserves the local linear structure of nearby inputs
- Instead of the top m eigenvectors of a dense Gram matrix, it uses the bottom m eigenvectors of a sparse matrix
LLE
- Compute the k-NN of each input
- Construct a directed graph whose edges indicate nearest-neighbor relations
- Assign weights W_ij to the edges (each input and its k-NN are viewed as a small linear patch)
- The weights are computed by reconstructing each input x_i from its k-NN:
  minimize sum_i || x_i - sum_j W_ij x_j ||^2
LLE
- W_ij = 0 if inputs i and j are not in a k-NN relationship
- The weights for every input sum to one: sum_j W_ij = 1
- The sparse matrix W encodes the local properties of the data
- The same linear relations should hold for the outputs; minimize
  Phi(Y) = sum_i || y_i - sum_j W_ij y_j ||^2
- subject to: the outputs have unit covariance and are centered at the origin
LLE
- Minimizing Phi(Y) is equivalent to computing the bottom m eigenvectors of the sparse matrix (I - W)^T (I - W), discarding the bottommost (constant) eigenvector with eigenvalue zero
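The two LLE stages, local reconstruction weights followed by the sparse eigenproblem, can be sketched as below; the regularization of the local Gram matrix and the parabola test data are my own choices:

```python
import numpy as np

def lle(X, k, m, reg=1e-3):
    """LLE sketch: reconstruction weights from each input's k-NN, then the
    bottom eigenvectors of (I - W)^T (I - W), skipping the constant one."""
    n = len(X)
    D2 = np.square(X[:, None] - X[None]).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]
        Z = X[nbrs] - X[i]                       # local patch centered at x_i
        C = Z @ Z.T + reg * np.trace(Z @ Z.T) * np.eye(k)   # regularized local Gram
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()                 # weights sum to one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    evals, evecs = np.linalg.eigh(M)
    return evecs[:, 1:m + 1]                     # skip the constant bottom eigenvector

t = np.linspace(0.0, 1.0, 30)
X = np.c_[t, t ** 2]                             # points on a parabola (a 1-D manifold)
Y = lle(X, k=4, m=1)
```

Because the rows of W sum to one, the all-ones vector is always a zero-eigenvalue eigenvector of M, which is why the bottommost eigenvector is discarded.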
Laplacian Eigenmaps
- Preserves proximity relations: nearby inputs are mapped to nearby outputs
- Similar to LLE:
  1. Compute the k-NN of each input
  2. Construct an undirected graph
  3. Assign positive weights W_ij (uniform or decaying with distance)
LE
- Let D denote the diagonal degree matrix with elements D_ii = sum_j W_ij
- Obtain the outputs by minimizing
  Psi(Y) = sum_{ij} W_ij || y_i - y_j ||^2
  subject to a normalization constraint (Y^T D Y = I) that rules out the trivial solution
- Nearness is measured by W: heavily weighted pairs are pulled close together
LE
- The minimization can be solved by finding the bottom m eigenvectors of the generalized eigenproblem (D - W) v = lambda D v, discarding the constant bottom eigenvector
- The matrices are sparse, so the algorithm scales to large data sets
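A dense sketch of the procedure (a scalable implementation would use sparse matrices); the Gaussian weighting, the symmetric normalization used to solve the generalized eigenproblem, and the test curve are my own choices:

```python
import numpy as np

def laplacian_eigenmaps(X, k, m, sigma=1.0):
    """Laplacian eigenmaps sketch: Gaussian-weighted k-NN graph, then bottom
    generalized eigenvectors of (D - W) v = lambda D v."""
    n = len(X)
    D2 = np.square(X[:, None] - X[None]).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(D2[i])[1:k + 1]:
            w = np.exp(-D2[i, j] / (2 * sigma ** 2))    # decaying weight
            W[i, j] = W[j, i] = w                       # undirected graph
    deg = W.sum(1)                               # degrees D_ii = sum_j W_ij
    L = np.diag(deg) - W                         # graph Laplacian
    # solve the generalized problem via the symmetric form D^{-1/2} L D^{-1/2}
    Dinv = np.diag(1.0 / np.sqrt(deg))
    evals, evecs = np.linalg.eigh(Dinv @ L @ Dinv)
    return (Dinv @ evecs)[:, 1:m + 1]            # skip the constant eigenvector

t = np.linspace(0.0, 1.0, 30)
X = np.c_[t, np.sin(2 * t)]                      # smooth 1-D curve in the plane
Y = laplacian_eigenmaps(X, k=3, m=1)
```

For a chain-like neighborhood graph, the first nontrivial eigenvector varies monotonically along the chain, so the 1-D output orders the points along the curve.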
Kernel Functions
- Let Phi be a mapping from R^d to a dot-product feature space
- PCA can then be written entirely in terms of the kernel K(x_i, x_j) = Phi(x_i) · Phi(x_j) (kernel PCA)
- Kernel PCA often uses nonlinear kernels: polynomial kernels, Gaussian kernels
- However, these generic kernels are not well suited for manifold learning
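Kernel PCA with a Gaussian kernel can be sketched as below: the feature map Phi is never computed, only the centered kernel matrix is eigendecomposed. The function name, gamma value, and test data are my own illustration:

```python
import numpy as np

def kernel_pca(X, m, gamma=1.0):
    """Kernel PCA sketch: eigendecompose the doubly centered Gaussian kernel
    matrix instead of the covariance matrix, as in classical MDS."""
    n = len(X)
    D2 = np.square(X[:, None] - X[None]).sum(-1)
    K = np.exp(-gamma * D2)                      # Gaussian kernel K(x_i, x_j)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H                               # center the features implicitly
    evals, evecs = np.linalg.eigh(Kc)
    evals, evecs = evals[::-1][:m], evecs[:, ::-1][:, :m]
    return evecs * np.sqrt(np.maximum(evals, 0.0))

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
Y = kernel_pca(X, 2)
```

Note the structural similarity to MDS: kernel PCA generalizes the Gram matrix, which is why graph-based methods such as IsoMap, MVU, and LLE can themselves be viewed as kernel PCA with data-dependent kernels.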