Manifold Learning JAMES MCQUEEN – UW DEPARTMENT OF STATISTICS

About Me
• 4th year PhD student working with Marina Meila
• Work in machine learning with a focus on manifold learning
• Worked at Amazon on the Personalization Analytics team on A/B testing
• Built the online product recommendation system at the startup Modesens.com
• Consulting actuary at Towers Watson
• Undergrad at UW in Economics/Math

Machine Learning
• Data is everywhere
• Data is being recorded at an unprecedented rate
• How do we find the useful information among the noise?

Supervised Learning
• The model takes input features and produces an output, which is evaluated against a known target.
• Examples:
  • Regression (e.g. predicting the market value of houses)
  • Classification (e.g. click prediction)
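
As a minimal illustration of this setup (my own sketch, not from the slides), here is the supervised pipeline in scikit-learn: features and targets are used to fit a model, and predictions are evaluated against held-out targets. The synthetic "housing" data is invented purely for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Toy regression data: 200 houses, 3 features, synthetic prices.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = 300 * X[:, 0] - 20 * X[:, 1] + 50 * X[:, 2] + rng.randn(200)

# Fit on training data (input -> model), then evaluate predictions against the target.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```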

Unsupervised Learning
• The model takes input features and produces an output, but there is no target to evaluate against.
• Examples:
  • Clustering (e.g. topic models, hidden Markov models)
  • Anomaly detection (e.g. searching for outliers)
  • Dimension reduction
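
A minimal clustering sketch (again my own example): the data are unlabeled, so there is no target to compare against, and "evaluation" is indirect.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: 300 points drawn from 3 blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Clustering assigns each point to a group without target labels;
# quality is judged indirectly (e.g. inertia, silhouette score).
labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)
print(labels[:10])
```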

Dimension Reduction
• A transformation from features X to a new, smaller set of features carrying the same "information"; the number of features is the dimension D.
• This is not lossless compression.
• Scale of the problem: a 100x100 pixel image = 10,000 features.
• Uses: a pre-processing tool, and data visualization.
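
To make the "10,000 features" point concrete, a small sketch (mine, not from the slides) of how an image becomes a point in a high-dimensional feature space:

```python
import numpy as np

# A 100x100 greyscale image is a single point in a 10,000-dimensional feature space.
image = np.random.rand(100, 100)
x = image.reshape(-1)   # flatten the pixels into one feature vector
print(x.shape)          # (10000,)
```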

Linear Dimension Reduction
• The nature of the transformation F: X -> Y
• Principal Component Analysis:
  • Rotates the data to a new coordinate system.
  • Projects the data onto the coordinates with the greatest variance.
• PCA is a linear map from X to Y.
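
A minimal PCA sketch with scikit-learn (my own example, with invented data): fit the rotation on the data, then project onto the top-variance directions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Correlated 3-D data that mostly varies along one direction.
rng = np.random.RandomState(0)
X = rng.randn(500, 1) @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.randn(500, 3)

# PCA learns an orthogonal rotation and keeps the directions of greatest variance.
pca = PCA(n_components=2)
Y = pca.fit_transform(X)   # linear map: X (500, 3) -> Y (500, 2)
print(pca.explained_variance_ratio_)
```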

Examples: PCA
• MNIST: 40,000 (28x28) greyscale images of handwritten digits 0-9.
• Projection of the digit data onto the first two principal components.
• Components are ordered by variance: the first 85 (of 784) account for 90% of the total variation.
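
The slide uses MNIST; as a lightweight stand-in, this sketch (my own) uses scikit-learn's smaller 8x8 digits dataset to show the same computations: projection onto the first two components, and the number of components needed to reach 90% of the variance.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Stand-in for MNIST: 1797 images of 8x8 digits (64 features instead of 784).
X, y = load_digits(return_X_y=True)

pca = PCA().fit(X)
Y2 = pca.transform(X)[:, :2]   # projection onto the first two principal components

# How many components are needed to explain 90% of the total variance?
cumvar = np.cumsum(pca.explained_variance_ratio_)
print("components for 90% variance:", np.searchsorted(cumvar, 0.90) + 1)
```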

Non-linear Dimension Reduction
• What if the mapping F: X -> Y is not a linear mapping?
• It is hard to optimize over "all non-linear functions".
• What are we even trying to do? This leads to many heuristics.
• When would this even work? Answering that requires the introduction of manifolds.

What's a Manifold Anyway?
• Regular surfaces: "a two-dimensional object living in a three-dimensional world."
• We don't have to restrict ourselves to 2 and 3 dimensions.
• A manifold extends the regular surface to "a d-dimensional object living in a D-dimensional world."
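
A concrete example (mine, not from the slides): the "swiss roll" is a 2-dimensional surface (d = 2) embedded in 3-dimensional ambient space (D = 3), and scikit-learn can sample points from it.

```python
from sklearn.datasets import make_swiss_roll

# A rolled-up 2-D sheet sampled in 3-D ambient space.
X, t = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)
print(X.shape)   # (2000, 3): D = 3 ambient dimensions, intrinsic dimension d = 2
```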

Why Should I Care?
• AKA: does my data really lie on (or near) a manifold?
• Some examples of where this might be true.
• Sometimes linear dimension reduction doesn't work.

So How Do We Do It?
• There are a lot of different heuristics:
  • Isomap: preserve pairwise graph distances calculated using a sparse neighborhood graph.
  • Locally Linear Embedding: preserve the local regression weights that reconstruct each point from its neighbors in the original data.
  • Local Tangent Space Alignment: estimate the tangent space at each point and add a constraint that allows for a global alignment step.
  • Spectral Embedding: take the eigenvectors associated with the smallest nonzero eigenvalues of the graph Laplacian matrix.
• They all result in a different embedding (see the scikit-learn sketch below).
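
All four heuristics have implementations in scikit-learn's manifold module; this sketch (my own) runs them on the swiss roll from the earlier example so the differing embeddings can be compared.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding

X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

embeddings = {
    "Isomap":   Isomap(n_neighbors=10, n_components=2),
    "LLE":      LocallyLinearEmbedding(n_neighbors=10, n_components=2),
    "LTSA":     LocallyLinearEmbedding(n_neighbors=10, n_components=2, method="ltsa"),
    "Spectral": SpectralEmbedding(n_neighbors=10, n_components=2),
}

# Each method produces a different 2-D embedding of the same data.
for name, est in embeddings.items():
    Y = est.fit_transform(X)
    print(name, Y.shape)
```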

Example Manifold Learning Results

Challenges of Manifold Learning
• Each method is based on a heuristic; there is no agreement on which to use.
• In almost all cases there is significant distortion, even when an isometric embedding is possible.
• How do you measure distortion?
• Potentially this is due to insufficient data: the so-called "curse of dimensionality".
• Many manifold learning algorithms either do not scale well (fundamentally, in their design) or have not been implemented in a scalable fashion.
• Problem: you need lots of data for good estimates, but the methods don't scale to large data sets.

Distortion & the Riemannian Metric
• The "Riemannian metric" is used to measure geometric quantities on a manifold.
• An embedding creates a "push-forward" metric when we transform a data set.
• If we use this metric, then we achieve isometry.
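
A sketch of the underlying idea in formulas (my own summary of the standard definition, not taken from the slides): if f maps the manifold into the embedding space, the push-forward metric h is defined so that inner products of tangent vectors are preserved, which is exactly the isometry requirement.

```latex
% Push-forward metric (sketch): f : (M, g) -> R^m is the embedding and
% df_p its differential at a point p. The push-forward metric h satisfies
\[
  h_{f(p)}\bigl(\mathrm{d}f_p(u),\,\mathrm{d}f_p(v)\bigr) \;=\; g_p(u, v)
  \qquad \text{for all } u, v \in T_pM .
\]
% Carrying h along with the embedding makes f an isometry by construction:
% lengths and angles computed with h in embedding coordinates agree with
% those computed with g on the original manifold.
```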

Scaling Manifold Learning
• Most manifold learning algorithms are not implemented to scale.
• Scikit-learn has most of the algorithms implemented, but its primary objective is usability.
• Bottlenecks:
  • Neighborhood graphs
  • Eigendecomposition
• Solutions:
  • Approximate nearest neighbor graphs (FLANN)
  • Eigensolvers for sparse symmetric positive definite problems (LOBPCG)
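
To make the bottlenecks concrete, here is a sketch (my own, using only standard NumPy/SciPy/scikit-learn tools, not megaman code) of the spectral-embedding pipeline: a sparse k-nearest-neighbor graph, its Laplacian, and LOBPCG to find a few of the smallest eigenvectors without a dense eigendecomposition.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import lobpcg
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=5000, noise=0.05, random_state=0)

# Bottleneck 1: the neighborhood graph (kept sparse and symmetrized).
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity", include_self=False)
A = 0.5 * (A + A.T)

# Bottleneck 2: the eigendecomposition. LOBPCG finds a few of the smallest
# eigenpairs of the sparse Laplacian without ever forming a dense matrix.
L = laplacian(A, normed=True)
rng = np.random.RandomState(0)
guess = rng.rand(L.shape[0], 4)                 # random initial block of vectors
vals, vecs = lobpcg(L, guess, largest=False, tol=1e-5, maxiter=500)
print(np.sort(vals))                            # smallest eigenvalues, near 0
```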

Megaman
• Scalable manifold learning in Python.
• Harnesses Cython, FLANN, LOBPCG & preconditioning to scale up manifold learning algorithms.
• Smart with memory and designed with research in mind.
• Github:
• Website/Documentation:
• Question: how many of you are familiar with the character Megaman?
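
Below is a rough, hypothetical sketch of what using the package might look like. Megaman follows a scikit-learn-style API, but the module paths, class names, and parameters shown here are assumptions rather than text from the documentation; consult the project's docs for the actual interface.

```python
# Hypothetical usage sketch -- module paths, class names, and parameters are
# assumptions based on megaman's scikit-learn-style API, not verified.
import numpy as np
from megaman.geometry import Geometry               # assumed module path
from megaman.embedding import SpectralEmbedding     # assumed module path

X = np.random.rand(10000, 50)

# Geometry object: holds the (approximate) neighborhood graph and Laplacian.
geom = Geometry(adjacency_method='cyflann',          # assumed FLANN-backed option
                adjacency_kwds={'n_neighbors': 15})  # assumed parameter name

# The embedding estimator reuses the precomputed geometry and a sparse eigensolver.
emb = SpectralEmbedding(n_components=2, eigen_solver='lobpcg', geom=geom)
Y = emb.fit_transform(X)
print(Y.shape)
```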

Megaman Results
• A sample embedding of 3 million words & phrases in 300 dimensions, taken using Word2Vec on Google News.
• A sample of 675,000 observed spectra in 3,750 dimensions; colors indicate the strength of Hydrogen-alpha emissions (common in emission nebulae).

Opportunity for Undergraduates
• Do you have a background in Python programming and/or C++?
• Work with a team developing state-of-the-art manifold learning algorithms in Python.
• Scikit-learn extension.
• Send me an email.

Questions?