Dimensionality Reduction Part 2: Nonlinear Methods

Dimensionality Reduction Part 2: Nonlinear Methods (Comp 790-090, Spring 2007)

Why Dimensionality Reduction?
Two approaches to reducing the number of features:
- Feature selection: select the most salient features by some criterion
- Feature extraction: obtain a reduced set of features by a transformation of all features
Data visualization and exploratory data analysis also require dimension reduction, usually down to 2D or 3D.

Deficiencies of Linear Methods
- Data may not be best summarized by a linear combination of features
- Example: PCA cannot discover the 1D structure of a helix

Intuition: how does your brain store these pictures?

Brain Representation

Brain Representation
Every pixel? Or perceptually meaningful structure?
- Up-down pose
- Left-right pose
- Lighting direction
So your brain successfully reduced the high-dimensional inputs to an intrinsically 3-dimensional manifold!

Manifold Learning
Discover low-dimensional representations (smooth manifolds) for data in high dimensions: the observed data X is modeled as arising from a latent variable Y.
- Linear approaches (PCA, MDS)
- Non-linear approaches (Isomap, LLE, others)

Linear Approach: PCA
PCA finds linear subspace projections of the input data.

Linear Methods for Dimensionality Reduction
- PCA: rotate the data so that the principal axes lie in the directions of maximum variance
- MDS: find coordinates that best preserve pairwise distances

Motivation
Linear dimensionality reduction doesn't always work:
- The data violates the underlying "linear" assumptions
- The data is not accurately modeled by "affine" combinations of measurements
- The structure of the data, while apparent, is not simple
In the end, linear methods do nothing more than "globally transform" (rotate, translate, and scale) all of the data; sometimes what's needed is to "unwrap" the data first.

Stopgap Remedies
Local PCA:
- Compute PCA models for small overlapping item neighborhoods
- Requires a clustering preprocess
- Fast and simple, but yields no global parameterization
Neural networks:
- Assume a solution of a given dimension
- Use relaxation methods to deform the given solution into a better fit
- The relaxation step is modeled as "layers" in a network, where properties of future iterations are computed from the current structure
- Many successes, but a bit of an art

Why Linear Modeling Fails
Suppose that your sample data lies on some low-dimensional surface embedded within the high-dimensional measurement space.
- Linear models allow ALL affine combinations
- Often, certain combinations are atypical of the actual data
- Recognizing this becomes harder as dimensionality increases

What Does PCA Really Model?
Principal Component Analysis assumptions:
- Mean-centered distribution: what if the mean itself is atypical?
- Eigenvectors of the covariance: basis vectors aligned with successive directions of greatest variance
- A classic second-order statistical model: the distribution is characterized by its mean and covariance (Gaussian hyperellipsoids)
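To make this concrete, here is a minimal numpy sketch of what PCA computes under those assumptions (mean-center, eigendecompose the covariance, project onto the leading eigenvectors); the data matrix and target dimension below are illustrative placeholders, not part of the original slides.

```python
import numpy as np

def pca(X, d):
    """Project rows of X (n_samples x n_features) onto the top-d principal axes."""
    mean = X.mean(axis=0)                  # the model's estimate of "typical"
    Xc = X - mean                          # mean-centered distribution
    cov = np.cov(Xc, rowvar=False)         # second-order statistics only
    evals, evecs = np.linalg.eigh(cov)     # eigenvectors of the covariance
    order = np.argsort(evals)[::-1]        # sort by decreasing variance
    basis = evecs[:, order[:d]]            # directions of greatest variance
    return Xc @ basis, mean, basis

# Example: 500 noisy points near a 2-D plane in R^10
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(500, 10))
Y, mean, basis = pca(X, d=2)
```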

Non-Linear Dimensionality Reduction
Non-linear manifold learning: instead of preserving global pairwise distances, non-linear dimensionality reduction tries to preserve only the geometric properties of local neighborhoods.
- Discover a lower-dimensional "embedding" manifold
- Find a parameterization over that manifold (a linear parameter space)
- A "projection" maps from the original M-D space to the d-D embedding space; "reprojection" (also called elevating or lifting) maps back

Nonlinear DimRedux Steps
- Discover a low-dimensional embedding manifold
- Find a parameterization over the manifold
- Project the data into the parameter space
- Analyze, interpolate, and compress in the embedding space; orient the parameter space (by a linear transformation) to align its axes with salient features, since linear (affine) combinations are valid there
- For interpolation and compression, use "lifting" to estimate the original M-D data

Nonlinear Methods
- Locally Linear Embedding [Roweis & Saul 2000]
- Isomap [Tenenbaum, de Silva & Langford 2000]
These two papers ignited the field:
- Principled (asymptotically, as the amount of data goes to infinity, they have been proven to recover the true manifold under suitable conditions)
- Widely applied
- Hotly contested

Nonlinear Approaches: Isomap (Joshua Tenenbaum, Vin de Silva, John Langford, 2000)
1. Construct the neighbourhood graph G
2. For each pair of points in G, compute the shortest-path distance: the geodesic distance
3. Apply classical MDS to the geodesic distances
(Key contrast: Euclidean distance vs. geodesic distance.)

Sample points with Swiss Roll Altogether there are 20,000 points in the “Swiss roll” data set. We sample 1000 out of 20,000.

Construct the neighborhood graph G using K-nearest neighbors (K=7). DG is the 1000 x 1000 matrix of Euclidean distances between neighboring points (figure A).

Compute all-pairs shortest paths in G. Now DG is the 1000 x 1000 matrix of approximate geodesic distances between arbitrary pairs of points along the manifold (figure B). A sketch of these two steps follows.
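A minimal sketch of the graph construction and geodesic computation; the Swiss-roll sampling, the K value, and the variable names mirror the walkthrough above but are otherwise illustrative.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components, shortest_path

# Sample 1000 points from a Swiss roll (a stand-in for the slides' subsample of 20,000)
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# K-nearest-neighbor graph with Euclidean edge weights (K=7, as in the slides)
G = kneighbors_graph(X, n_neighbors=7, mode="distance")
G = G.maximum(G.T)  # symmetrize so neighborhoods are mutual

# The graph must be connected for geodesic distances to be defined everywhere
n_components, _ = connected_components(G, directed=False)
assert n_components == 1, "increase K until the graph is connected"

# All-pairs shortest paths: a 1000 x 1000 matrix of approximate geodesic distances
DG = shortest_path(G, method="D", directed=False)   # method="D" = Dijkstra
```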

Use MDS to embed the graph in R^d: find a d-dimensional Euclidean embedding Y (figure C) that best preserves the pairwise geodesic distances.
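Classical MDS takes only a few lines of numpy: double-center the squared distance matrix and keep the top-d eigenvectors. This is a sketch, with DG standing for the geodesic distance matrix computed above.

```python
import numpy as np

def classical_mds(D, d=2):
    """Embed a symmetric distance matrix D into R^d by classical MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:d]        # d largest eigenvalues
    L = np.sqrt(np.maximum(evals[order], 0))   # clip tiny negatives from noise
    return evecs[:, order] * L                 # n x d embedding

# Y = classical_mds(DG, d=2)   # DG: geodesic distance matrix from the previous sketch
```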

Isomap key observation: on a manifold, distances are measured using geodesic distances rather than Euclidean distances. Two points may have a small Euclidean distance but a large geodesic distance.

Problem: How to Get Geodesics
Without knowledge of the manifold it is difficult to compute the geodesic distance between points, and it can be difficult even if you do know the manifold.
Solution: use a discrete approximation, applying a graph algorithm to approximate the geodesic distances.

Dijkstra's Algorithm
An efficient solution to the shortest-path problem: running it from every source yields all-pairs shortest paths. It is a greedy, priority-queue-driven generalization of breadth-first search.
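For reference, a minimal from-scratch Dijkstra over an adjacency list, illustrating the greedy frontier expansion described above; the graph representation and names are illustrative.

```python
import heapq

def dijkstra(adj, source):
    """Shortest path lengths from `source` in a weighted graph.
    `adj` maps each node to a list of (neighbor, edge_weight) pairs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]                    # (distance-so-far, node)
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)            # greedily take the closest frontier node
        if u in visited:
            continue
        visited.add(u)
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# All-pairs geodesic approximation: run Dijkstra from every node
# geodesics = {u: dijkstra(adj, u) for u in adj}
```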

Isomap Algorithm
1. Compute a neighborhood for each point: either k nearest neighbors or an ε-ball; neighborhoods must be symmetric
2. Test that the resulting graph is fully connected; if not, increase either K or ε
3. Calculate the pairwise Euclidean distances within each neighborhood
4. Use Dijkstra's algorithm to compute the shortest path from each point to all non-neighboring points
5. Run MDS on the resulting distance matrix
(An off-the-shelf version is sketched below.)
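The whole pipeline is also available off the shelf; for example, scikit-learn's Isomap composes essentially the same steps. The parameter values below mirror the Swiss-roll example and are illustrative.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1000, random_state=0)

# k-NN graph -> shortest-path geodesics -> classical MDS, in one estimator
embedding = Isomap(n_neighbors=7, n_components=2)
Y = embedding.fit_transform(X)          # 1000 x 2 embedding of the Swiss roll
```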

Isomap Results Find a 2D embedding of the 3D S-curve (also shown for LLE) Isomap does a good job of preserving metric structure (not surprising) The affine structure is also well preserved

Residual Fitting Error

Neighborhood Graph

More Isomap Results

More Isomap Results

Isomap Failures Isomap also has problems on closed manifolds of arbitrary topology

Non-Linear Example: A Data-Driven Reflectance Model (Matusik et al., SIGGRAPH 2003)
A Bidirectional Reflectance Distribution Function (BRDF) gives the ratio of the radiance reflected in a particular outgoing direction to the incident irradiance from a particular incoming direction. Here we consider isotropic BRDFs.

Measurement Modeling Bidirectional Reflectance Distribution Functions (BRDFs)

Measurement
A "fast" BRDF measurement device inspired by Marschner [1998].

Measurement 20-80 million reflectance measurements per material Each tabulated BRDF entails 90x90x180x3=4,374,000 measurement bins

Rendering from Tabulated BRDFs
Even without further analysis, the tabulated BRDFs are immediately useful. Renderings were made with Henrik Wann Jensen's Dali renderer (materials shown: nickel, hematite, gold paint, pink felt).

BRDFs as Vectors in High-Dimensional Space
Each tabulated BRDF is unrolled into a vector in a 90 x 90 x 180 x 3 = 4,374,000-dimensional space.

Linear Analysis (PCA)
Find an optimal "linear basis" for the data set. About 45 components are needed to reduce the residual to under the measurement error. (Figure: eigenvalue magnitude vs. dimension, and reconstructions using the mean and 5, 10, 20, 30, 45, 60, and all components.)
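A sketch of how such a cutoff might be chosen from a PCA of the unrolled BRDF matrix; the data array, its shape, and the error threshold are placeholders (the measured data itself is not reproduced here).

```python
import numpy as np

# Hypothetical stack of unrolled BRDF vectors (rows = materials).  Real tabulated
# BRDFs have 90*90*180*3 = 4,374,000 bins; a small random stand-in keeps this runnable.
rng = np.random.default_rng(0)
n_materials, n_bins = 100, 10_000
brdfs = np.abs(rng.normal(size=(n_materials, n_bins)))

mean = brdfs.mean(axis=0)
centered = brdfs - mean
# Economy SVD: with far fewer materials than bins there are at most n_materials components
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of variance left unexplained after keeping the first k components
residual = 1.0 - np.cumsum(S ** 2) / np.sum(S ** 2)

measurement_error = 0.01                     # placeholder threshold
k = int(np.argmax(residual <= measurement_error)) + 1
print(f"{k} components reduce the residual below the threshold")
```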

Problems with Linear Subspace Modeling
- Large number of basis vectors (45)
- Some linear combinations yield invalid or unlikely BRDFs, both outside and even inside the convex hull of the data

Results of Non-Linear Manifold Learning
At 15 dimensions the reconstruction error is less than 1%, a parameter count similar to analytical models. (Figure: error vs. dimensionality.)

Non-Linear Advantages
- 15-dimensional parameter space
- More robust than the linear model
- More extrapolations are plausible (compare the linear-model and non-linear-model extrapolations)

Non-Linear Model Results

Non-Linear Model Results

Non-Linear Model Results

Representing Physical Processes Steel Oxidation

Locally Linear Embedding: First Insight
Locally, at a fine enough scale, everything looks linear.

Locally Linear Embedding: First Insight
Find the affine combination of the "neighborhood" about a point that best approximates it.

Finding a Good Neighborhood
This is the remaining "art" of nonlinear methods. Common choices (see the sketch below):
- ε-ball: find all items that lie within an epsilon ball of the target item, as measured under some metric. Best if the density of items is high and every point has a sufficient number of neighbors.
- K-nearest neighbors: find the K closest neighbors of a point under some metric. Guarantees that all items are similarly represented, but limits the local dimension to K-1.
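Both neighborhood rules are available in, for example, scikit-learn; a brief sketch (the data, K, and radius values are placeholders):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # placeholder point set

nn = NearestNeighbors().fit(X)

# K-nearest neighbors: every point gets exactly K neighbors
dist_k, idx_k = nn.kneighbors(X, n_neighbors=7)

# epsilon-ball: neighbor counts vary with local density (may be empty in sparse regions)
dist_eps, idx_eps = nn.radius_neighbors(X, radius=0.5)
```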

Characteristics of a Manifold
A manifold M embedded in R^n is covered by local coordinate patches: a point z on M has a coordinate x in a low-dimensional space such as R^2. Locally, each patch is (approximately) linear. The key question: how do we combine all the local patches together?

LLE: Intuition
- Assumption: the manifold is approximately "linear" when viewed locally, that is, in a small neighborhood, so the approximation error e(W) can be made small
- The locality of each neighborhood is enforced by the constraint W_ij = 0 if z_j is not a neighbor of z_i
- A good projection should preserve this local geometric property as much as possible

LLE: Intuition
We expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold, so each point can be written as a linear combination of its neighbors, with the weights chosen to minimize the reconstruction error.

LLE: Intuition
- The weights that minimize the reconstruction errors are invariant to rotation, rescaling, and translation of the data points; invariance to translation is enforced by the constraint that the weights sum to one
- The weights therefore characterize the intrinsic geometric properties of each neighborhood
- The same weights that reconstruct the data points in D dimensions should reconstruct them on the manifold in d dimensions, so local geometry is preserved

LLE: Intuition
Low-dimensional embedding: use the same weights from the original space (the i-th row of W reconstructs point i from its neighbors).

Locally Linear Embedding (LLE)
- Assumption: the manifold is approximately "linear" when viewed locally, that is, in a small neighborhood, so the approximation error e(W) can be made small
- Meaning of W: a linear representation of every data point by its neighbors; this is an intrinsic geometric property of the manifold
- A good projection should preserve this geometric property as much as possible

Constrained Least-Squares Problem
Compute the optimal weights for each point individually: minimize || x_i - sum_j W_ij x_j ||^2 subject to sum_j W_ij = 1, with W_ij restricted to the neighbors of x_i (zero for all non-neighbors of x_i).
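A minimal sketch of this constrained least-squares step: solve the local Gram system C w = 1 and normalize, with a small regularizer for stability (variable names and the regularization constant are illustrative).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lle_weights(X, k=10, reg=1e-3):
    """Reconstruction weights W (n x n): each row reconstructs one point from its k neighbors."""
    n = X.shape[0]
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    neighbors = idx[:, 1:]                         # drop the point itself
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]                 # neighbors shifted to the origin
        C = Z @ Z.T                                # local k x k Gram matrix
        C += reg * np.trace(C) * np.eye(k)         # regularize (can be ill-conditioned)
        w = np.linalg.solve(C, np.ones(k))         # solve C w = 1
        W[i, neighbors[i]] = w / w.sum()           # enforce the sum-to-one constraint
    return W

# W = lle_weights(X, k=10)   # X: n x D data matrix
```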

Finding a Map to a Lower-Dimensional Space
y_i in R^d is the projected vector for x_i. Using the same weights computed above, the geometric property is best preserved if the embedding error Phi(Y) = sum_i || y_i - sum_j W_ij y_j ||^2 is small. Y is given by the eigenvectors of the lowest d non-zero eigenvalues of the matrix M = (I - W)^T (I - W).
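A sketch of this eigenvector step: build M = (I - W)^T (I - W), take the eigenvectors of its smallest eigenvalues, and discard the constant eigenvector associated with the zero eigenvalue.

```python
import numpy as np

def lle_embedding(W, d=2):
    """Low-dimensional coordinates Y (n x d) from the LLE weight matrix W."""
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    evals, evecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    # evecs[:, 0] is the constant vector (eigenvalue ~0, translation invariance): skip it
    return evecs[:, 1:d + 1]

# Putting the two steps together on a data matrix X:
# W = lle_weights(X, k=10)
# Y = lle_embedding(W, d=2)
```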

Find Weights
Rewriting the per-point problems as one matrix equation over all x: with X the M x N data matrix (columns are points) and W the unknown N x N weight matrix, we want the W that minimizes || X - X W^T ||^2 and satisfies the sum-to-one constraint on each row. This ends up as a constrained least-squares problem.

Find Linear Embedding Space
Now that we have the weight matrix W (N x N, with X being M x N), find the embedding that minimizes || Y - Y W^T ||^2 = || Y (I - W)^T ||^2. This amounts to finding the (near-)null space of (I - W), a classic problem: run an SVD of (I - W) and take the singular vectors associated with the smallest d non-trivial singular values (the very smallest singular value is zero and represents the system's invariance to translation, so its vector is discarded).

Numerical Issues
- Numerical problems can arise in computing LLEs
- The local covariance (Gram) matrix that arises in computing the weight matrix W can be ill-conditioned; regularize by adding a small multiple of the identity to it
- Finding small singular (eigen)values is not as well conditioned as finding large ones: the small ones are subject to numerical precision errors and can get mixed together
- Good (but slow) solvers exist; you have to use them

Results
The resulting parameter vector y_i gives the coordinates associated with the item x_i. The d-th embedding coordinate is formed from the singular vector associated with the d-th smallest non-trivial singular value of (I - W).

Reprojection
Often, for data analysis, a parameterization is enough, but for interpolation and compression we might want to map points from the parameter space back to the "original" space. There is no perfect solution, but a few approximations work (one is sketched below):
- Delaunay-triangulate the points in the embedding space, find the simplex that the desired parameter setting falls into, compute its barycentric coordinates, and use them as weights on the corresponding original points
- Interpolate using a radially symmetric kernel centered about the desired parameter setting
This works, but the mappings might not be one-to-one.
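A sketch of the Delaunay/barycentric option using scipy, assuming a 2-D embedding; the function name and arguments are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

def lift(y_query, Y, X_orig):
    """Map a point y_query in the 2-D parameter space back to the original space
    by barycentric interpolation over a Delaunay triangulation of the embedding Y."""
    tri = Delaunay(Y)                              # triangulate the embedded points
    s = int(tri.find_simplex(y_query))
    if s == -1:
        raise ValueError("query lies outside the triangulated parameter region")
    T, r = tri.transform[s, :2], tri.transform[s, 2]
    b = T @ (y_query - r)                          # first two barycentric coordinates
    bary = np.append(b, 1.0 - b.sum())             # weights sum to one
    verts = tri.simplices[s]                       # indices of the enclosing triangle
    return bary @ X_orig[verts]                    # affine combination of original points

# x_hat = lift(np.array([0.1, 0.2]), Y, X)   # Y: n x 2 embedding, X: n x D originals
```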

LLE Example
- 3-D S-curve manifold with points color-coded; compute a 2-D embedding
- The local affine structure is well maintained
- The metric structure is okay locally, but can drift slowly over the domain (this causes the embedding to taper)

More LLE Examples

More LLE Examples

LLE Failures
- Does not work on closed manifolds
- Cannot recognize topology

Summary
Non-linear dimensionality reduction methods:
- These methods are considerably more powerful, and more temperamental, than linear methods
- Applications of these methods are a hot area of research
Comparisons:
- LLE is generally faster, but more brittle, than Isomap
- Isomap tends to work better on smaller data sets (i.e., less dense sampling)
- Isomap tends to be less sensitive to noise (perturbation of the input vectors)
Issues:
- Neither method handles closed manifolds and topological variations well