Graph Laplacian Regularization for Large-Scale Semidefinite Programming. Kilian Weinberger et al., NIPS 2006. Presented by Aggeliki Tsoli.

Introduction
- Problem: discover low-dimensional representations of high-dimensional data; in many cases, local proximity measurements are also available (e.g. computer vision, sensor localization).
- Current approach: semidefinite programs (SDPs), a form of convex optimization. Disadvantage: does not scale well to large inputs.
- Paper contribution: a method for solving very large problems of this type, via SDPs that are much smaller and faster than those previously studied.

Sensor localization
- Determine the 2D positions of the sensors from estimates of the local distances between neighboring sensors.
- Sensors i, j are neighbors iff they are sufficiently close to estimate their pairwise distance via limited-range radio transmission.
- Input: n sensors; d_ij, the estimated local distance between neighboring sensors i, j.
- Output: x_1, x_2, ..., x_n in R^2, the planar coordinates of the sensors.

Work so far
- Minimize a sum-of-squares loss function, eq. (1).
- Centering constraint, eq. (2), assuming no sensor location is known in advance.
- The optimization in (1) is non-convex: it is likely to get trapped in local minima! (A reconstruction of eqs. (1) and (2) is sketched below.)
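The equation images for (1) and (2) did not survive in this transcript. Based on the definitions above (x_i the planar sensor positions, d_ij the estimated local distances, i ~ j ranging over neighboring pairs), a plausible reconstruction is:

    \min_{x_1,\dots,x_n \in \mathbb{R}^2} \; \sum_{i \sim j} \bigl( \|x_i - x_j\|^2 - d_{ij}^2 \bigr)^2        (1)

    \sum_{i=1}^{n} x_i = 0                                                                                     (2)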

Convex optimization
- Convex function: a real-valued function f defined on a convex domain C such that for any two points x, y in C and any t in [0,1], f(t x + (1-t) y) <= t f(x) + (1-t) f(y).
- Convex optimization: minimization of a convex function over a convex set; any local minimum is also a global minimum, so local search cannot get trapped.

Convex relaxation
- Define the n x n inner-product matrix X with X_ij = x_i · x_j.
- Obtain a convex optimization, eq. (3) (sketched below), by relaxing the constraint that the sensor locations x_i lie in the plane R^2.
- The vectors x_i then lie in a subspace of dimension equal to the rank of the solution X.
- Project the x_i onto their 2D subspace of maximum variance to obtain planar coordinates.
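Eq. (3) is not reproduced in the transcript. Using ||x_i - x_j||^2 = X_ii - 2 X_ij + X_jj, a plausible form consistent with the description above is:

    \min_{X} \; \sum_{i \sim j} \bigl( X_{ii} - 2X_{ij} + X_{jj} - d_{ij}^2 \bigr)^2
    \text{subject to} \quad \sum_{ij} X_{ij} = 0, \quad X \succeq 0                                            (3)

The positive semidefiniteness constraint X \succeq 0 is what makes X a valid inner-product (Gram) matrix, and both the objective and the constraints are convex in the entries of X.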

Maximum Variance Unfolding (MVU)
- The higher the rank of X, the greater the information loss after projection.
- Add an extra term to the loss function to favor solutions with high variance, i.e. high trace, giving eq. (4) (sketched below).
- Trace of a square matrix X, tr(X): the sum of the elements on X's main diagonal.
- A parameter ν > 0 balances the trade-off between maximizing variance and preserving local distances (maximum variance unfolding, MVU).
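Eq. (4) is also missing from the transcript. Hedging on the exact placement of the trade-off parameter ν (the slide only states that it balances the two terms), one plausible form of the MVU objective is:

    \min_{X \succeq 0,\; \sum_{ij} X_{ij} = 0} \; \sum_{i \sim j} \bigl( X_{ii} - 2X_{ij} + X_{jj} - d_{ij}^2 \bigr)^2 \;-\; \nu \, \mathrm{tr}(X)        (4)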

Matrix factorization (1/2)
- G: the neighborhood graph defined by the sensor network.
- Assume the sensor locations define a function over the nodes of G.
- Functions on a graph can be approximated using eigenvectors of the graph's Laplacian matrix as basis functions (spectral graph theory).
- Graph Laplacian L (unweighted case): L_ii = degree of node i, L_ij = -1 if (i, j) is an edge, 0 otherwise; its eigenvectors, ordered by eigenvalue, are ordered by smoothness.
- Approximate the sensor locations using the m bottom eigenvectors of the Laplacian of G: x_i ≈ Σ_{α=1..m} Q_iα y_α.
- Q: n x m matrix whose columns are the m bottom eigenvectors of the Laplacian (precomputed; see the sketch below).
- y_α: unknown m x 1 vectors, α = 1, ..., m.
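A minimal sketch of precomputing the basis Q, assuming SciPy is available. The function name, the edge-list input format, and the choice of the unnormalized Laplacian are illustrative assumptions, not the authors' code:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.csgraph import laplacian
    from scipy.sparse.linalg import eigsh

    def bottom_laplacian_eigenvectors(edges, n, m=10):
        """Return Q: the m smoothest (bottom) eigenvectors of the graph Laplacian,
        excluding the constant eigenvector (assumes a connected graph)."""
        rows, cols = zip(*edges)
        # Symmetric 0/1 adjacency matrix of the neighborhood graph G.
        W = sp.coo_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n)).tocsr()
        W = ((W + W.T) > 0).astype(float)
        L = laplacian(W)                      # unnormalized Laplacian L = D - W
        # m+1 smallest eigenvalues; drop the constant eigenvector so that the
        # factorization X = Q Y Q^T automatically satisfies the centering constraint.
        vals, vecs = eigsh(L, k=m + 1, which='SM')
        order = np.argsort(vals)
        return vecs[:, order[1:]]             # n x m matrix Q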

Matrix factorization (2/2)
- Define the m x m inner-product matrix Y with Y_αβ = y_α · y_β.
- Factorize X as X ≈ Q Y Q^T.
- This yields an equivalent but much smaller optimization, eq. (5) (sketched below):
  - tr(Y) = tr(X), since the columns of Q are mutually orthonormal eigenvectors;
  - Q Y Q^T satisfies the centering constraint, because the uniform eigenvector is not included in Q.
- Instead of the n x n matrix X, the optimization is solved for the much smaller m x m matrix Y!
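Substituting X ≈ Q Y Q^T into (4) and using tr(Q Y Q^T) = tr(Y), a plausible reconstruction of eq. (5), with the same hedging as above, is:

    \min_{Y \succeq 0} \; \sum_{i \sim j} \Bigl( (QYQ^\top)_{ii} - 2 (QYQ^\top)_{ij} + (QYQ^\top)_{jj} - d_{ij}^2 \Bigr)^2 \;-\; \nu \, \mathrm{tr}(Y)        (5)

The centering constraint can be dropped because every column of Q is orthogonal to the uniform (constant) eigenvector, so the rows and columns of Q Y Q^T sum to zero automatically.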

Formulation as an SDP (1/2)
- Approach for large input problems: cast the required optimization as an SDP over small matrices with few constraints.
- Rewrite the previous formulation as an SDP in standard form, eq. (6) (a sketch follows below):
  - vec(Y) in R^(m^2): the vector obtained by concatenating all the columns of Y;
  - A in R^(m^2 x m^2): a positive semidefinite matrix collecting all the quadratic coefficients of the objective function;
  - b in R^(m^2): a vector collecting all the linear coefficients of the objective function;
  - ℓ: a lower bound on the quadratic piece of the objective function.
- Use the Schur complement lemma to express this bound as a linear matrix inequality.
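Eq. (6) is not reproduced in the transcript. Writing the quadratic piece of (5) as vec(Y)^T A vec(Y) and the linear piece as b^T vec(Y), a plausible standard form using the Schur complement trick described above is:

    \min_{Y,\, \ell} \; \ell \;-\; b^\top \mathrm{vec}(Y)
    \text{subject to} \quad Y \succeq 0, \qquad
    \begin{bmatrix} I & A^{1/2} \mathrm{vec}(Y) \\ \bigl(A^{1/2} \mathrm{vec}(Y)\bigr)^{\top} & \ell \end{bmatrix} \succeq 0        (6)

By the Schur complement, the linear matrix inequality holds exactly when ℓ >= vec(Y)^T A vec(Y), so minimizing ℓ recovers the quadratic piece of the objective.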

Formulation as an SDP (2/2)
- Unknown variables: the m(m+1)/2 distinct elements of Y and the scalar ℓ.
- Constraints: a positive semidefiniteness constraint on Y and a linear matrix inequality of size m^2 x m^2.
- The complexity of the SDP does not depend on the number of nodes (n) or edges in the network!

Gradient-based improvement
- Optional two-step refinement:
  1. Starting from the m-dimensional solution of eq. (6), use conjugate gradient methods to maximize the objective function in eq. (4).
  2. Project the result of step 1 into the plane R^2 and use conjugate gradient methods to minimize the loss function in eq. (1) (a sketch of this step follows below).
- Conjugate gradient method: an iterative method for minimizing a quadratic function whose Hessian matrix (the matrix of second partial derivatives) is positive definite; nonlinear variants handle general smooth objectives such as (1) and (4).
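A minimal sketch of the second refinement step, assuming SciPy and the reconstruction of eq. (1) above. The function name, the argument layout, and the use of scipy.optimize.minimize with method='CG' as the nonlinear conjugate gradient routine are illustrative choices, not the authors' implementation:

    import numpy as np
    from scipy.optimize import minimize

    def refine_planar_positions(x0, edges, dists):
        """Polish 2D sensor coordinates by nonlinear conjugate gradients on the
        sum-of-squares loss of eq. (1).
        x0    : (n, 2) array of initial coordinates, e.g. the projected SDP solution
        edges : list of neighboring pairs (i, j)
        dists : estimated local distances d_ij, aligned with `edges`"""
        n = x0.shape[0]
        i_idx = np.array([i for i, _ in edges])
        j_idx = np.array([j for _, j in edges])
        d_sq = np.asarray(dists) ** 2

        def loss(flat_x):
            x = flat_x.reshape(n, 2)
            sq_dist = np.sum((x[i_idx] - x[j_idx]) ** 2, axis=1)
            return np.sum((sq_dist - d_sq) ** 2)

        # 'CG' is SciPy's nonlinear conjugate gradient method; the gradient is
        # approximated numerically here to keep the sketch short.
        res = minimize(loss, x0.ravel(), method='CG')
        return res.x.reshape(n, 2)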

Results (1/3)
- n = 1055 largest cities in the continental US.
- Local distances measured to up to 18 neighbors within radius r = 0.09.
- Local measurements corrupted by 10% Gaussian noise relative to the true local distance.
- m = 10 bottom eigenvectors of the graph Laplacian.
- [Figures: result from the SDP (about 4 s) and result after conjugate gradient descent.]

Results (2/3)
- n = 20,000 uniformly sampled points inside the unit square.
- Local distances measured to up to 20 other nodes within radius r = 0.06.
- m = 10 bottom eigenvectors of the graph Laplacian.
- 19 s to construct and solve the SDP; 52 s for 100 iterations of conjugate gradient descent.

Results (3/3)
- [Figures: loss function of eq. (1) vs. number of eigenvectors, and computation time vs. number of eigenvectors.]
- "Sweet spot" around m ≈ 10 eigenvectors.

FastMVU on robotics
- Control of a robot using sparse user input, e.g. a 2D mouse position.
- Robot localization: the robot's location is inferred from the high-dimensional description of its state in terms of sensorimotor input.

Conclusion
- An approach for inferring low-dimensional representations from local distance constraints using maximum variance unfolding (MVU).
- Uses a matrix factorization computed from the bottom eigenvectors of the graph Laplacian.
- Local search methods can refine the solution.
- Suitable for large inputs: the size of the resulting SDP does not depend on the number of nodes or edges in the network.