1 LING 696B: MDS and non-linear methods of dimension reduction.


1 LING 696B: MDS and non-linear methods of dimension reduction

2 Big picture so far. Blob/pizza/pancake-shaped data --> Gaussian distributions. Clustering with blobs. Linear dimension reduction. What if the data are not blob-shaped? Can we still reduce the dimension? Can we still perform clustering?

3 Dimension reduction with PCA: decomposition of the covariance matrix. If only the first few eigenvalues are significant, we can ignore the rest, e.g. keep 2-D coordinates of X.
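(A minimal sketch of this reduction, not from the slides; numpy is assumed, and X is an N x d data matrix with one observation per row.)

import numpy as np

def pca_reduce(X, k=2):
    Xc = X - X.mean(axis=0)                # center the data
    C = np.cov(Xc, rowvar=False)           # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigendecomposition, ascending order
    order = np.argsort(eigvals)[::-1]      # re-sort eigenvalues in descending order
    W = eigvecs[:, order[:k]]              # keep only the k leading eigenvectors
    return Xc @ W                          # the k-D coordinates of X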

4 Success of reduction = blob-likeness of data. [Figure: pancake-shaped data in 3D, with principal axes a1 and a2.]

5 Example: articulatory data. Story and Titze: extracting composite articulatory control parameters from area functions using PCA. PCA can be a "preprocessor" for K-means.

6 Can neural nets do dimension reduction? Yes, but most architectures can be seen as implementations of some variant of linear projection. [Diagram: network with input X, hidden layer with weights W, output X, and a context/time-delay layer.] Can an Elman-style network discover segments?

7 Metric multidimensional scaling. Input data: "distance" between stimuli. The aim is to recover some psychological space for the stimuli. Dimension reduction is also achieved through an appropriate matrix decomposition.

8 Calculating metric MDS. Data: distance matrix D with entries D_ij = ||x_i - x_j||^2. We need to calculate X from D. Gram matrix: G = X X^T (N x N), with entries G_ij = <x_i, x_j> = x_i x_j^T (unknown). Main point: if the distance is Euclidean and X is centered, then the Gram matrix can be computed from the distance matrix: G is obtained from D by subtracting the column mean, then the row mean (homework).
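(A sketch of that computation, my reconstruction rather than the homework solution; D is assumed to hold the squared Euclidean distances D_ij.)

import numpy as np

def gram_from_distances(D):
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N    # centering matrix: subtracts the mean
    return -0.5 * H @ D @ H                # equals X X^T when D is squared Euclidean and X is centered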

9 Calculating metric MDS. Get X from the Gram matrix: decompose G. Dimension reduction: only a few eigenvalues d_i are significant, the rest are small (similar to PCA), e.g. d = 2.
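(Continuing the sketch above, a hedged reconstruction of the decomposition step: keep only the significant eigenvalues d_i.)

import numpy as np

def mds_embed(G, k=2):
    eigvals, eigvecs = np.linalg.eigh(G)       # G = U diag(d) U^T
    order = np.argsort(eigvals)[::-1][:k]      # the k largest eigenvalues
    d = np.maximum(eigvals[order], 0)          # clip tiny negative values from numerical noise
    return eigvecs[:, order] * np.sqrt(d)      # X = U diag(sqrt(d)), an N x k configuration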

10 Calculating metric MDS. Now we don't want a rotation, but X itself (different from PCA). There are infinitely many solutions: for any rotation matrix R, (XR)(XR)^T = X R R^T X^T = X X^T. Same problem as with factor analysis: x = C z + v = (C R)(R^T z) + v. The recovered X has to be the psychological X.

11 MDS and PCA. Both are linear dimension reduction. Euclidean distance --> identical solutions for dimension reduction: X^T X (covariance matrix) and X X^T (Gram matrix) have the same eigenvalues (homework) (see summary). MDS can be applied even if we don't know whether the distance is Euclidean (non-metric MDS). MDS needs to diagonalize large matrices when N is large.
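(A quick numerical check of that homework claim, illustrative only; random data with N = 100, d = 5.)

import numpy as np

X = np.random.randn(100, 5)
ev_cov  = np.linalg.eigvalsh(X.T @ X)                          # 5 eigenvalues
ev_gram = np.linalg.eigvalsh(X @ X.T)                          # 100 eigenvalues, all but 5 essentially zero
print(np.allclose(np.sort(ev_cov), np.sort(ev_gram)[-5:]))     # True: the nonzero spectra match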

12 Going beyond the (linear + blob) combination. Looking for a non-Gaussian image of the data under linear projections (last week): Linear Discriminant Analysis, Independent Component Analysis. Looking for non-linear projections that may find blobs (today): Isomap, spectral clustering.

13 Why non-linear dimension reduction? Linear methods are all based on the Gaussian assumption; Gaussians are closed under linear transformations. Yet lots of data do not look like blobs. In high dimensions, geometric intuition breaks down, and it is hard to see what a distribution "looks like".

14 Non-linear dimension reduction. Data sampled from a "manifold" structure. Manifold: a "surface" that locally looks Euclidean; each small piece looks like a piece of Euclidean space (pictures from L. Saul). No rotation or linear projection can produce this "interesting" structure.

15 The generic dimension reduction problem. Dimension reduction = finding a lower-dimensional embedding of the manifold. Sensory data = an embedding in an ambient measurement space (d large). Goal: an embedding in a lower-dimensional space (for visualization: d < 4). Ideally, d = intrinsic dimension (~ cognition?).

16 The need for non-linear transformations. Why won't directly applying MDS work? A twisted structure may change the ordering (see demo).

17 Embedding needs to preserve global structure Cutting the data into blobs?

18 Embedding needs to preserve global structure. Cutting the data into blobs? No concept of global structure; can't tell the intrinsic dimension.

19 What does it mean to preserve global structure? This is hard to quantify, but we can at least look for an embedding that preserves some properties of the global structure, e.g. one that preserves distance. Example: the distance between two points on earth; the actual calculation depends on what we think the shape of the earth is.

20 Global structure through distance. Geodesic distance: the distance between two points along the manifold, d(A,B) = min{ length(curve(A-->B)) }, where curve(A-->B) lies on the manifold. No shortcuts! A "global distance".

21 Global structure through undirected graphs. In practice there is no manifold; we only work with data points. But enough data points can always approximate the surface when they are "dense". Think of the data as connected by "rigid bars". Desired embedding: "stretch" the dataset as far as allowed by the bars, like making a map.

22 Isomap (Tenenbaum et al.). Idea: approximate geodesic distance by making small, local connections; dynamic programming through the neighborhood graph.

23 Isomap (Tenenbaum et al.): the algorithm. (1) Compute the neighborhood graph (by K-nearest neighbors). (2) Calculate the pairwise distance d(i,j) by the shortest path between points i and j (and also cut out outliers). (3) Run metric MDS on the distance matrix D and extract the leading eigenvectors. Key: maintain the geodesic distance rather than the ambient distance.
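(A sketch of these three steps, assuming scikit-learn and scipy helpers; K and the output dimension are free choices, and disconnected graphs are not handled.)

import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, K=10, out_dim=2):
    A = kneighbors_graph(X, K, mode='distance')     # step 1: K-nearest-neighbor graph
    D = shortest_path(A, directed=False)            # step 2: geodesic distances via shortest paths
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    G = -0.5 * H @ (D ** 2) @ H                     # step 3: metric MDS on the geodesic distances
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:out_dim]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))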

24 The effect of neighborhood size in Isomap. What happens if K is small? What happens if K is large? What happens if K = N? Should K be fixed? We have assumed a uniform distribution on the manifold (see demo).

25 Nice properties of Isomap. It implicitly defines a non-linear projection of the original data (through geodesic distance + MDS) so that: Euclidean distance in the new space = geodesic distance in the old space (compare to kernel methods, later). No local maxima: it is another eigenvalue problem. Theoretical guarantees (footnotes 18, 19). Only needs one choice: the neighborhood size K.

26 Problems with Isomap. What if the data have holes? Things with holes cannot be massaged into a convex set. When the data consist of disjoint parts, we don't want to maintain the distance between the different parts; we need to solve a clustering problem. Does it make sense to keep this distance? How can we stretch two parts at a time? How do we stretch a circle?

27 Spectral clustering. K-means/Gaussian mixture/PCA clustering only work for blobs. Clustering non-blob data: image segmentation in computer vision (example from Kobus Barnard).

28 Spectral = graph structure. Rather than working directly with the data points, work with the graph constructed from them. Isomap: distance calculated from the neighborhood graph. Spectral clustering: find a layout of the graph that separates the clusters.

29 Undirected graphs. Backbone of the graph: a set of nodes V = {1, ..., N} and a set of edges E = {e_ij}. Unweighted graphs: pairs of nodes are either connected or not connected; weighted graphs: the edges carry weights. A lot of problems can be formulated as graph problems, e.g. Google, OT.

30 Seeing the graph structure through a matrix. Fix an ordering of the nodes (1, ..., N). Let the edge from j to k correspond to a matrix entry A(j,k) or W(j,k): A(j,k) = 0/1 for an unweighted graph, W(j,k) = weight for a weighted graph. The Laplacian (D - A, where D is the diagonal degree matrix) is another useful matrix.
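(For concreteness, the matrices for a small hypothetical weighted graph with 3 nodes and 2 edges; this is not the graph drawn on the slide.)

import numpy as np

W = np.array([[0., 2., 0.],     # W(j,k): edge weights, 0 = no edge
              [2., 0., 1.],
              [0., 1., 0.]])
A = (W > 0).astype(float)       # A(j,k): 0/1 adjacency for the unweighted view
D = np.diag(W.sum(axis=1))      # diagonal degree matrix
L = D - W                       # graph Laplacian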

31 Spectrum of a graph. A lot of questions related to graphs can be answered through their matrices. Examples: the chance of a random walk going through a particular node (Google); the time needed for a random walk to reach equilibrium (Manhattan project); approximate solutions to intractable problems, e.g. a layout of the graph that separates less-connected parts (clustering).

32 Clustering as a graph partitioning problem. Normalized-cut problem: split the graph into two parts A and B, so that each part is not too small and the edges being cut don't carry too much weight. [Figure: the objective compares the weights on edges from A to B against the weights on edges within A.]
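(A sketch of evaluating that objective for a candidate split, following the Shi & Malik form Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V); A_idx and B_idx are hypothetical index lists, W a weight matrix as above.)

import numpy as np

def ncut(W, A_idx, B_idx):
    cut     = W[np.ix_(A_idx, B_idx)].sum()   # weights on edges going from A to B
    assoc_A = W[A_idx, :].sum()               # weights on all edges touching A
    assoc_B = W[B_idx, :].sum()
    return cut / assoc_A + cut / assoc_B      # small when the cut is light and neither part is tiny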

33 Normalized cut through spectral embedding. The exact solution of normalized cut is NP-hard (explodes for large graphs). A "soft" version is solvable: look for coordinates x_1, ..., x_N for the nodes that minimize Sum_ij W(i,j) (x_i - x_j)^2, so that strongly connected nodes stay nearby and weakly connected nodes stay far away. Such coordinates are provided by eigenvectors of the adjacency/Laplacian matrix (recall MDS) -- the spectral embedding.

34 Belkin and Niyogi, and others: the spectral clustering algorithm. Construct a graph by connecting each data point with its neighbors. Compute the Laplacian matrix L. Use the spectral embedding (bottom eigenvectors of L) to represent the data, and run K-means. What is the free parameter here?
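(A sketch of this pipeline, assuming scikit-learn; the neighborhood size K and the number of clusters are the free parameters the slide asks about.)

import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

def spectral_cluster(X, K=10, n_clusters=2):
    A = kneighbors_graph(X, K, mode='connectivity').toarray()
    W = ((A + A.T) > 0).astype(float)              # symmetrized neighborhood graph
    L = np.diag(W.sum(axis=1)) - W                 # Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)           # eigenvectors sorted by ascending eigenvalue
    embedding = eigvecs[:, 1:n_clusters + 1]       # bottom non-trivial eigenvectors = spectral embedding
    return KMeans(n_clusters=n_clusters).fit_predict(embedding)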

35 The effect of neighborhood size in constructing a graph. The neighborhood can be specified with a radius, or with a neighborhood size K. Same problem as in Isomap: we don't want to connect everyone (then the graph is complete -- little structure), and we don't want to connect too few (then the graph is too sparse -- not robust to holes/shortcuts/outliers). This is a delicate matter (see demo).

36 Distributional clustering of words in Belkin and Niyogi Feature vector: word counts from the previous and following 300 words

37 Speech clustering in Belkin and Niyogi Feature vector: spectrogram (256)

38 Summary of graph-based methods. When the geometry of the data is unknown, it seems reasonable to work with a graph derived from the data. Dimension reduction: find a low-dimensional representation of the graph. Clustering: use a spectral embedding of the graph to separate components. Constructing the graph requires heuristic parameters for the neighborhood size (choice of K).

39 Computation of linear and non-linear reduction. All involve diagonalization of matrices. PCA: covariance matrix (dense). MDS: Gram matrix derived from Euclidean distance (dense). Isomap: Gram matrix derived from geodesic distance (dense). Spectral clustering: weight matrix derived from data (sparse). Many variants do not have this nice property.

40 Questions: How often do manifolds arise in perception/cognition? What is the right metric for calculating local distance in the ambient space? Do people utilize manifold structure in different perceptual domains? (And what does this tell us about K?) Vowel manifolds? (crazy experiment)

41 Last word I’m not sure this experiment will work. But can people just learn arbitrary manifold structures? Are there constraints on the structure that people can learn?
