CSC321: Extra Lecture (not on the exam)
Non-linear dimensionality reduction
Geoffrey Hinton

Linear methods of reducing dimensionality

PCA finds the directions that have the most variance.
– By representing where each datapoint is along these axes, we minimize the squared reconstruction error.
– Linear autoencoders are equivalent to PCA.

Multi-Dimensional Scaling (MDS) arranges the low-dimensional points so as to minimize the discrepancy between the pairwise distances in the original space and the pairwise distances in the low-D space: it minimizes $\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2$, where $d_{ij}$ is the high-D distance and $\hat{d}_{ij}$ is the low-D distance. MDS is equivalent to PCA (if we normalize the data right).
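A minimal numpy sketch of that equivalence (the toy data and variable names are illustrative, and "MDS" here means classical MDS, not the iterative stress-minimizing variant): PCA of the centred data and classical MDS on its Euclidean distances give the same low-D coordinates, up to a sign flip of each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))     # toy data: 100 points in 10-D
Xc = X - X.mean(axis=0)            # centre the data

# PCA: project onto the top-2 directions of most variance (right singular vectors).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_embedding = Xc @ Vt[:2].T

# Classical MDS: double-centre the squared Euclidean distances, then take the
# top eigenvectors scaled by the square roots of their eigenvalues.
D2 = np.square(Xc[:, None, :] - Xc[None, :, :]).sum(-1)
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J
eigval, eigvec = np.linalg.eigh(B)
order = np.argsort(eigval)[::-1][:2]
mds_embedding = eigvec[:, order] * np.sqrt(eigval[order])

# The two embeddings agree up to a sign flip of each axis.
print(np.allclose(np.abs(pca_embedding), np.abs(mds_embedding), atol=1e-6))
```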

Other non-linear methods of reducing dimensionality

Non-linear autoencoders with extra layers are much more powerful than PCA, but they are slow to optimize and they get different, locally optimal solutions each time.

Multi-Dimensional Scaling can be made non-linear by putting more importance on the small distances. An extreme version is the Sammon mapping, which divides each squared error by the high-D distance: $E = \sum_{i<j} \frac{(d_{ij} - \hat{d}_{ij})^2}{d_{ij}}$, where $d_{ij}$ is the high-D distance and $\hat{d}_{ij}$ is the low-D distance.

Non-linear MDS is also slow to optimize and also gets stuck in different local optima each time.
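A small sketch of the Sammon cost (the helper names and the toy data are mine, not from the slides); because each term is divided by the high-D distance, mismatches on small distances dominate.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.square(diff).sum(-1))

def sammon_stress(X_high, Y_low, eps=1e-12):
    """Sammon mapping cost: squared distance mismatches, each weighted by
    1 / d_ij so that the small high-D distances matter most."""
    d = pairwise_dist(X_high)
    dhat = pairwise_dist(Y_low)
    iu = np.triu_indices_from(d, k=1)            # count each pair once
    d, dhat = d[iu], dhat[iu]
    return np.sum((d - dhat) ** 2 / (d + eps))

# Toy usage: score a random 2-D layout of 50 points that live in 10-D.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
Y = rng.normal(size=(50, 2))
print(sammon_stress(X, Y))
```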

Linear methods cannot interpolate properly between the leftmost and rightmost images in each row, because proper intermediate images are NOT simply averages of the images at the two ends. The method shown here does not interpolate properly either, because it can only use examples from the training set: it cannot create new images.

IsoMap: Local MDS without local optima

Instead of only modeling local distances, we can try to measure the distances along the manifold and then model these intrinsic distances.
– The main problem is to find a robust way of measuring distances along the manifold.
– If we can measure manifold distances, the global optimisation is easy: it's just global MDS (i.e. PCA).

[Figure: a 1-D manifold curled up in 2-D; if we measure distances along the manifold, d(1,6) > d(1,4).]

How Isomap measures intrinsic distances

Connect each datapoint to its K nearest neighbors in the high-dimensional space and put the true Euclidean distance on each of these links. Then approximate the manifold distance between any pair of points as the length of the shortest path in this graph.

[Figure: the shortest path through the neighborhood graph between two points A and B.]
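A sketch of that recipe (assuming scikit-learn and SciPy are available; the `isomap` helper and the spiral-shaped toy data are illustrative, not the reference implementation of Tenenbaum et al.):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=10, n_components=2):
    """Isomap sketch: K-NN graph -> graph shortest paths -> classical MDS."""
    # 1. Link each point to its K nearest neighbors, weighted by Euclidean distance.
    knn = kneighbors_graph(X, n_neighbors, mode='distance')
    # 2. Manifold (geodesic) distance = shortest path through the graph.
    G = shortest_path(knn, method='D', directed=False)
    # 3. Classical MDS on the geodesic distances.
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1][:n_components]
    return eigvec[:, order] * np.sqrt(np.maximum(eigval[order], 0.0))

# Toy usage: unroll a noisy spiral (a 1-D manifold embedded in 3-D).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 3 * np.pi, 400))
X = np.c_[t * np.cos(t), t * np.sin(t), rng.normal(scale=0.05, size=t.shape)]
Y = isomap(X, n_neighbors=8, n_components=1)   # should vary monotonically with t
```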

Using Isomap to discover the intrinsic manifold in a set of face images

A probabilistic version of local MDS

It is more important to get local distances right than non-local ones, but getting infinitesimal distances right is not infinitely important.
– All the small distances are about equally important to model correctly.
– Stochastic neighbor embedding (SNE) has a probabilistic way of deciding if a pairwise distance is "local".

Computing affinities between datapoints

Each high-dimensional point, i, has a conditional probability of picking each other point, j, as its neighbor. The conditional distribution over neighbors is based on the high-dimensional pairwise distances:

$$p_{j|i} = \frac{\exp(-d_{ij}^2)}{\sum_{k \neq i} \exp(-d_{ik}^2)}$$

i.e. the probability of picking j given that you start at i, where $d_{ij}$ is the (scaled) distance between i and j in the high-D space.
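A minimal sketch of these conditional affinities (a single global bandwidth `sigma` is assumed here for simplicity; SNE proper chooses a per-point $\sigma_i$, e.g. by a perplexity search, which is omitted):

```python
import numpy as np

def conditional_affinities(X, sigma=1.0):
    """p(j | i): probability that high-D point i picks j as its neighbor,
    a Gaussian function of the pairwise distance, normalised over all j != i."""
    D2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)   # squared distances
    logits = -D2 / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point never picks itself
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)      # each row sums to 1

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # toy high-D data
P_cond = conditional_affinities(X)
```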

Turning conditional probabilities into pairwise probabilities

To get a symmetric affinity between i and j, we sum the two conditional probabilities and divide by the number of points (points are not allowed to choose themselves):

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$

This is the joint probability of picking the pair i, j, and it ensures that all the pairwise affinities sum to 1, so they can be treated as probabilities.
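Continuing the sketch above (the helper name is mine), the symmetrization is one line:

```python
import numpy as np

def joint_affinities(P_cond):
    """p_ij = (p(j|i) + p(i|j)) / (2n).  Each of the n conditional
    distributions sums to 1, so the resulting p_ij sum to 1 over all pairs."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

# With P_cond from the previous sketch:
# P = joint_affinities(P_cond)      # P.sum() == 1.0 up to rounding
```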

Throwing away the raw data

The pairwise affinities are all we are going to use for deciding on the low-dimensional representations of the datapoints. There are many possible ways to define the affinity of a pair of high-dimensional data-points.
– Other types of affinity may work better.
– Temporal adjacency is especially easy to use.

Evaluating an arrangement of the data in a low-dimensional space

Give each data-point a location in the low-dimensional space.
– Define low-dimensional probabilities symmetrically.
– Evaluate the representation by seeing how well the low-D probabilities model the high-D affinities.

[Figure: the corresponding points i, j, k laid out in the low-D space.]
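Using the usual symmetric-SNE definition, $q_{ij} = \exp(-\|y_i - y_j\|^2) / \sum_{k \neq l} \exp(-\|y_k - y_l\|^2)$, the low-D probabilities can be sketched as follows (the random initial layout is illustrative):

```python
import numpy as np

def low_d_affinities(Y):
    """q_ij: symmetric pairwise probabilities in the low-D space, a Gaussian
    function of the embedding distances, normalised over all pairs
    (no per-point bandwidth is needed here)."""
    D2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    W = np.exp(-D2)
    np.fill_diagonal(W, 0.0)                 # a point is not paired with itself
    return W / W.sum()

rng = np.random.default_rng(0)
Y = rng.normal(scale=1e-2, size=(200, 2))    # small random initial 2-D layout
Q = low_d_affinities(Y)
```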

The cost function for a low-dimensional representation

The cost is the mismatch between the two distributions, $C = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$ (the Kullback-Leibler divergence between P and Q).

For points where $p_{ij}$ is large and $q_{ij}$ is small, we lose a lot.
– Nearby points in high-D really want to be nearby in low-D.

For points where $q_{ij}$ is large and $p_{ij}$ is small, we lose a little, because we waste some of the probability mass in the Q distribution.
– Widely separated points in high-D have a mild preference for not being too close in low-D.
– But it doesn't cost much to make the manifold bend back on itself.
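A minimal sketch of that cost (with P and Q from the previous sketches; `eps` only guards against taking the log of zero):

```python
import numpy as np

def sne_cost(P, Q, eps=1e-12):
    """KL(P || Q) summed over pairs: a large p_ij with a small q_ij is very
    expensive; a large q_ij with a small p_ij only wastes a little mass."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log((P[mask] + eps) / (Q[mask] + eps))))

# Example: sne_cost(P, Q) with the affinities computed earlier.
```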

The forces acting on the low-dimensional points

Points are pulled towards each other if the p's are bigger than the q's, and repelled if the q's are bigger than the p's:

$$\frac{\partial C}{\partial y_i} \propto \sum_j (p_{ij} - q_{ij})(y_i - y_j)$$

– It's equivalent to having springs whose stiffnesses are set dynamically: the stiffness of the spring between i and j is $(p_{ij} - q_{ij})$ and its extension is $(y_i - y_j)$.
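A sketch of that gradient (written without the constant factor, which the learning rate absorbs anyway):

```python
import numpy as np

def sne_gradient(P, Q, Y):
    """Spring forces on the low-D points: each pair pulls together where
    p_ij > q_ij and pushes apart where q_ij > p_ij.  The 'stiffness' is
    (p_ij - q_ij) and the 'extension' is (y_i - y_j)."""
    stiffness = P - Q                               # shape (n, n)
    extension = Y[:, None, :] - Y[None, :, :]       # shape (n, n, d): y_i - y_j
    return (stiffness[:, :, None] * extension).sum(axis=1)

# One gradient-descent step on the layout (the step size is illustrative):
# Y = Y - step_size * sne_gradient(P, Q, Y)
```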

[Figure, from the SNE paper: unsupervised SNE embedding of the digits 0-4. Not all the data is displayed.]