Pattern Recognition and Machine Learning Dimensionality Reduction Simone Calderara Dipartimento di Ingegneria «Enzo Ferrari», Università di Modena e Reggio Emilia

Multimedia DBs: many multimedia applications require efficient indexing in high dimensions (time series, images, videos, etc.). Answering similarity queries in high dimensions is a difficult problem due to the "curse of dimensionality". A solution is to use dimensionality reduction. The main idea: reduce the dimensionality of the space, i.e. project the d-dimensional points into a k-dimensional space so that k << d and distances are preserved as well as possible, then solve the problem in low dimensions.

Multi-Dimensional Scaling (MDS): map the items into a k-dimensional space trying to minimize the stress. Steepest-descent algorithm: start with an initial assignment, then minimize the stress by moving the points. But the running time is O(N^2), and O(N) to add a new item.
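
A minimal numpy sketch of this idea, assuming a precomputed pairwise distance matrix D; the function name and learning-rate values are illustrative, not from the slides.

```python
# Metric MDS by gradient descent on the raw stress sum_{i<j} (D_ij - ||y_i - y_j||)^2.
import numpy as np

def mds_stress_descent(D, k=2, lr=0.01, n_iter=500, seed=0):
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(n, k))      # random initial assignment
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]     # pairwise differences y_i - y_j
        dist = np.linalg.norm(diff, axis=-1)     # embedded distances
        np.fill_diagonal(dist, 1.0)              # avoid division by zero (diff is 0 there anyway)
        # Direction proportional to the stress gradient for each point.
        grad = (((dist - D) / dist)[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad                           # move points to reduce the stress
    return Y

# Example usage: reproduce distances of random 10-D points in 2-D.
X = np.random.default_rng(1).normal(size=(50, 10))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = mds_stress_descent(D)
```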

Embeddings: given a metric distance matrix D, embed the objects in a k-dimensional vector space using a mapping F such that D(i,j) is close to D'(F(i),F(j)). Two types of mapping according to the distances (embedding): isometric mapping, D'(F(i),F(j)) = D(i,j); contractive mapping, D'(F(i),F(j)) <= D(i,j); where D' is some Lp measure. Two types of embedding according to the warping technique: linear -> data points are projected by a linear transformation (PCA); non-linear -> data points are projected non-linearly (Laplacian Eigenmaps, ISOMAP, t-SNE).

PCA algorithm: 1. X ← create the N x d data matrix, with one row vector x_n per data point. 2. Subtract the mean x̄ from each row vector x_n in X. 3. Σ ← covariance matrix of X. 4. Find the eigenvectors and eigenvalues of Σ. 5. PCs ← the M eigenvectors with the largest eigenvalues.
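
A short numpy sketch of these five steps; the function name and return values are illustrative.

```python
import numpy as np

def pca(X, M=2):
    """X: N x d data matrix, one row per data point. Returns projected data, PCs, eigenvalues."""
    X_centered = X - X.mean(axis=0)              # step 2: subtract the mean from each row
    Sigma = np.cov(X_centered, rowvar=False)     # step 3: d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)     # step 4: eigh, since Sigma is symmetric
    order = np.argsort(eigvals)[::-1]            # sort by decreasing eigenvalue
    pcs = eigvecs[:, order[:M]]                  # step 5: the M top principal components
    return X_centered @ pcs, pcs, eigvals[order]
```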

Geometric rationale of PCA: the objective of PCA is to rigidly rotate the axes of this p-dimensional space to new positions (principal axes) with the following properties: they are ordered so that principal axis 1 has the highest variance, axis 2 the next highest variance, ..., and axis p the lowest variance; the covariance between each pair of principal axes is zero (the principal axes are uncorrelated).

How many components? Check the distribution of the eigenvalues and take enough eigenvectors to cover 80-90% of the variance.
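
A tiny sketch of this rule, assuming `eigvals` are the sorted eigenvalues returned by the pca() sketch above.

```python
import numpy as np

explained = eigvals / eigvals.sum()              # fraction of variance per component
cumulative = np.cumsum(explained)
M = int(np.searchsorted(cumulative, 0.90)) + 1   # smallest M covering 90% of the variance
```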

Example: sensor networks, with sensors deployed in the Intel Berkeley Lab.

(Figure: pairwise link quality vs. distance between a pair of sensors.)

PCA in action: given the 54 x 54 matrix of pairwise link qualities, do PCA and project down to the 2 principal dimensions; PCA discovers the map of the lab.

Problems and limitations of PCA: what if the data are very high-dimensional? E.g., images (d >= 10^4). Problem: the covariance matrix Σ has size d^2; d = 10^4 -> |Σ| = 10^8 entries. Solution: Singular Value Decomposition (SVD)! Efficient algorithms are available, and some implementations find just the top N eigenvectors.
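
A minimal sketch of PCA via SVD of the data matrix, which avoids ever forming the d x d covariance matrix; names are illustrative.

```python
import numpy as np

def pca_svd(X, M=2):
    X_centered = X - X.mean(axis=0)
    # Thin SVD of the N x d data matrix: X = U S Vt.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    pcs = Vt[:M].T                        # top-M principal directions (d x M)
    eigvals = (S ** 2) / (len(X) - 1)     # corresponding eigenvalues of the covariance
    return X_centered @ pcs, pcs, eigvals[:M]
```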

Laplacian Eigenmaps

Manifold A manifold is a topological space which is locally Euclidean. In general, any object which is nearly "flat" on small scales is a manifold. Examples of 1-D manifolds include a line, a circle, and two separate circles. A manifold is an abstract mathematical space in which every point has a neighborhood which resembles Euclidean space, but in which the global structure may be more complicated

Embedding: an embedding is a representation of a topological object (manifold, graph, field, etc.) in a certain space in such a way that its connectivity or algebraic properties are preserved. Examples: the integers, the rationals, and the reals (each naturally embedded in the next).

Manifold and dimensionality reduction (1): a manifold M is a generalized "subspace" in R^n; points in a local region on a manifold can be indexed by a subset of R^k (k << n). (Figure: a manifold M in R^n; x in R^2 is the coordinate of a point z on M.)

Manifold and dimensionality reduction (2): if there is a global indexing scheme for M, then for a data point y in R^n we can take the closest point z on M, and the coordinate x of z gives a reduced-dimension representation of y. (Figure: y in R^n, its closest point z on M, and the coordinate x of z in R^2.)

Geodesic distance (1): a geodesic is the shortest curve on a manifold that connects two points on the manifold. Example: on a sphere, geodesics are great circles (not small circles). Geodesic distance: the length of the geodesic.

Geodesic distance (2): the Euclidean distance need not be a good measure between two points on a manifold; the length of the geodesic is more appropriate.
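
A minimal sketch, in the spirit of ISOMAP, of approximating geodesic distances by shortest paths on a k-nearest-neighbour graph; function name and neighbour count are illustrative.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def approx_geodesic_distances(X, n_neighbors=10):
    # Sparse graph whose edge weights are Euclidean distances between neighbours.
    G = kneighbors_graph(X, n_neighbors=n_neighbors, mode='distance')
    # Shortest-path lengths along the graph approximate geodesic distances.
    return shortest_path(G, method='D', directed=False)
```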

LLE and Laplacian Eigenmap: the graph-based algorithms have 3 basic steps. 1. Find the K nearest neighbors. 2. Estimate local properties of the manifold by looking at the neighborhoods found in step 1. 3. Find a global embedding that preserves the properties found in step 2.
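
A short sketch of step 1 (steps 2 and 3 are method-specific: LLE reconstruction weights vs. Laplacian eigenvectors); the helper name is illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_indices(X, K=10):
    nn = NearestNeighbors(n_neighbors=K + 1).fit(X)   # +1: each point is its own closest neighbour
    _, idx = nn.kneighbors(X)
    return idx[:, 1:]                                  # drop the self-neighbour column
```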

Laplacian of a graph (1): let G(V,E) be an undirected graph without self-loops. The un-normalized Laplacian of the graph is L = D − A, where A is the adjacency matrix and D is the diagonal degree matrix. (Figure: an example graph with nodes 1, 2, 3, 4 and its adjacency matrix A.)

Laplacian Eigenmap (1): consider x_i ∈ M, where M is a manifold embedded in R^l. Find y_1, ..., y_n in R^m such that y_i represents x_i (m << l).

Laplacian Eigenmap (2): construct the adjacency graph to approximate the manifold, then compute its Laplacian L = D − A. (Figure: a 4-node example graph with its Laplacian matrix.)
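
A tiny illustration of L = D − A on a hypothetical 4-node graph (not necessarily the graph shown on the original slide).

```python
import numpy as np

A = np.array([[0, 1, 1, 1],      # adjacency matrix of the example graph
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]])
D = np.diag(A.sum(axis=1))       # diagonal degree matrix
L = D - A                        # un-normalized graph Laplacian
```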

Laplacian Eigenmap (3): there are two variations for the weight (adjacency) matrix A: simple-minded (A_ij = 1 if i and j are connected, 0 otherwise) or heat kernel (with t real): A_ij = exp(−||x_i − x_j||² / t).

Laplacian Eigenmap, N-dimensional case: now we consider the more general problem of embedding the graph into m-dimensional Euclidean space. Let Y be an n x m matrix whose rows y_1, ..., y_n are the embedded points. Minimizing the corresponding Dirichlet energy gives argmin_Y trace(Y' L Y) with Y'Y = I, and the solutions are the first m eigenvectors of L (those with the smallest eigenvalues), taken as the columns of Y.
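
A compact sketch following the slide's formulation (0/1 neighbourhood weights, ordinary eigenproblem with Y'Y = I); a heat-kernel weight matrix could be substituted, and names and parameter values are illustrative.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmap(X, m=2, n_neighbors=10):
    # Symmetric 0/1 adjacency of the k-nearest-neighbour graph.
    A = kneighbors_graph(X, n_neighbors=n_neighbors, mode='connectivity').toarray()
    A = np.maximum(A, A.T)
    L = np.diag(A.sum(axis=1)) - A        # un-normalized Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue 0), keep the next m as coordinates.
    return eigvecs[:, 1:m + 1]
```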

Applications: we can apply manifold learning to pattern recognition (faces, handwriting, etc.). Recently, ISOMAP and Laplacian Eigenmaps have been used to initialize the human body model. (Figure: handwritten digit visualization.)

Considerations. PROS: the Laplacian Eigenmap provides a computationally efficient approach to non-linear dimensionality reduction with locality-preserving properties. BUT: the Laplacian Eigenmap only attempts to approximate or preserve neighborhood information. If you need GLOBAL consistency, look at the ISOMAP method.

t-SNE

t-SNE: map points from the high-dimensional space (x) to a low-dimensional space (y) preserving the distribution of the points; the density distribution around each single point is preserved by the objective.

Stochastic Neighbour Embedding: the similarity between data points is converted to probabilities in the high-dimensional space x, and a corresponding similarity distribution is defined in the low-dimensional space y. OBJECTIVE: make the two distributions as close as possible, i.e. minimize the Kullback-Leibler divergence between them.
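
A rough sketch of the two similarity distributions and the KL objective, using the standard t-SNE definitions (Gaussian similarities in x, Student-t similarities in y). For simplicity it assumes a single fixed sigma for all points, whereas the real algorithm tunes sigma_i per point via the perplexity; function names are illustrative.

```python
import numpy as np

def pairwise_sq_dists(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return (diff ** 2).sum(-1)

def high_dim_P(X, sigma=1.0):
    d2 = pairwise_sq_dists(X)
    P = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P = P / P.sum(axis=1, keepdims=True)       # conditional p_{j|i}
    return (P + P.T) / (2 * len(X))            # symmetrized joint p_{ij}

def low_dim_Q(Y):
    num = 1.0 / (1.0 + pairwise_sq_dists(Y))   # Student-t kernel in the map space
    np.fill_diagonal(num, 0.0)
    return num / num.sum(), num                # joint q_{ij} and unnormalized kernel

def kl_divergence(P, Q, eps=1e-12):
    return np.sum(P * np.log((P + eps) / (Q + eps)))
```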

Gradient-descent solution: solve the problem pointwise by taking the gradient of C w.r.t. the points, i = 1...n. Start with random y_i, i = 1...n, with n the number of points. Move y_i using ∂C/∂y_i. Do it for all points in Y, using momentum. Gradient derivation: L.J.P. van der Maaten and G.E. Hinton, Visualizing High-Dimensional Data Using t-SNE, Journal of Machine Learning Research 9, https://lvdmaaten.github.io/publications/papers/JMLR_2008.pdf
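
A hedged sketch of the gradient-with-momentum update, reusing high_dim_P and low_dim_Q from the previous block; the gradient formula is the one derived in the paper linked above, while learning rate and momentum values are just illustrative.

```python
import numpy as np

def tsne_descent(X, n_iter=500, lr=100.0, momentum=0.8, seed=0):
    rng = np.random.default_rng(seed)
    P = high_dim_P(X)
    Y = rng.normal(scale=1e-4, size=(len(X), 2))   # random initial map points
    velocity = np.zeros_like(Y)
    for _ in range(n_iter):
        Q, kernel = low_dim_Q(Y)
        diff = Y[:, None, :] - Y[None, :, :]
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (1 + ||y_i - y_j||^2)^-1 * (y_i - y_j)
        grad = 4.0 * (((P - Q) * kernel)[:, :, None] * diff).sum(axis=1)
        velocity = momentum * velocity - lr * grad  # momentum update
        Y += velocity
    return Y
```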

Gradient interpretation: the update is similar to an N-body problem, with attractive and repulsive forces acting between the map points.

Example: Netflix movies. More examples can be found at https://lvdmaaten.github.io/tsne/

t-SNE code and additional resources: t-SNE is currently the most popular embedding visualization method and is available in most ML packages, including scikit-learn. Code and implementations for different languages: https://lvdmaaten.github.io/tsne/ Sigma (set through the perplexity) is crucial; a good example of how it affects the mapping: https://distill.pub/2016/misread-tsne/ Different t-SNE variants: symmetric, Barnes-Hut (BH), and random-tree based.
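
A minimal usage sketch with the scikit-learn implementation mentioned above; the data and parameter values here are placeholders.

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(500, 50))   # placeholder high-dimensional data
Y = TSNE(n_components=2, perplexity=30, init='pca', random_state=0).fit_transform(X)
# Y is a 500 x 2 array of map points ready for a scatter plot.
```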