Project 11: Determining the Intrinsic Dimensionality of a Distribution
Okke Formsma, Nicolas Roussis and Per Løwenborg

Outline
– About the project
– What is intrinsic dimensionality?
– How can we assess the ID?
  – PCA
  – Neural Network
  – Dimensionality Estimators
– Experimental Results

Why did we choose this project? We wanted to learn more about developing and experimenting with algorithms for analyzing high-dimensional data. We also want to see how we can implement this in an application.

Papers
– N. Kambhatla, T. Leen, "Dimension Reduction by Local Principal Component Analysis"
– J. Bruske and G. Sommer, "Intrinsic Dimensionality Estimation with Optimally Topology Preserving Maps"
– P. Verveer, R. Duin, "An Evaluation of Intrinsic Dimensionality Estimators"

How does dimensionality reduction influence our lives? Compressing images, audio and video; reducing noise; editing; reconstruction.

This is an image going through different steps of a reconstruction.

Intrinsic Dimensionality: the number of 'free' parameters needed to generate a pattern. Example: f(x) = -x² => 1-dimensional; f(x,y) = -x² => still 1-dimensional, since y is not a free parameter.

LOCAL PRINCIPAL COMPONENT ANALYSIS

Principal Component Analysis (PCA) is the classic technique for linear dimension reduction. It is a vector space transformation which reduces multidimensional data sets to lower dimensions for analysis. It is a way of identifying patterns in data and expressing the data so as to highlight their similarities and differences.

Advantages of PCA: Since patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data. Once you have found these patterns, you can compress the data (by reducing the number of dimensions) without much loss of information.
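To make this concrete, here is a minimal numpy sketch of PCA via the eigendecomposition of the covariance matrix (the toy data and component count are illustrative, not taken from the project):

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)        # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # largest eigenvalues first
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components, eigvals[order]

# Toy data: 2-D points that are essentially 1-D plus a little noise
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 3 * t + 0.05 * rng.normal(size=(200, 1))])
Z, spectrum = pca(X, n_components=1)
print(spectrum)   # one large eigenvalue, one near zero
```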

Example

Problems with PCA: data might be uncorrelated yet still structured, but PCA relies only on second-order statistics (correlation), so it sometimes fails to find the most compact description of the data.

Problems with PCA

First eigenvector

Second eigenvector

A better solution?

Local eigenvector

Local eigenvectors

Another problem

Is this the principal eigenvector?

Or do we need more than one?

Choose

The answer depends on your application: low resolution vs. high resolution.

Challenges: How to partition the space? How many partitions should we use? How many dimensions should we retain?

How to partition the space? Vector quantization using the Lloyd algorithm:
– Partition the space into k sets
– Repeat until convergence:
  – Calculate the centroid of each set
  – Associate each point with the nearest centroid
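A minimal numpy sketch of this Lloyd / vector quantization loop (assuming Euclidean distance and, for simplicity, that no set becomes empty):

```python
import numpy as np

def lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd algorithm partitioning the rows of X into k sets."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly assign points to sets
    labels = rng.integers(k, size=len(X))
    for _ in range(n_iter):
        # Step 2: calculate the centroid of each set
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 3: associate each point with the nearest centroid
        new_labels = np.argmin(
            ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1)
        if np.array_equal(new_labels, labels):   # converged
            break
        labels = new_labels
    return labels, centroids
```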

Lloyd Algorithm, Step 1: randomly assign the points to two sets

Lloyd Algorithm, Step 2: calculate the centroids

Lloyd Algorithm, Step 3: associate each point with the nearest centroid

Lloyd Algorithm, Step 2 (repeated): calculate the centroids

Lloyd Algorithm, Step 3 (repeated): associate each point with the nearest centroid

Lloyd Algorithm: result after 2 iterations

How many partitions should we use? Bruske & Sommer: "just try them all". For k = 1 to k ≤ dimension(set):
1. Subdivide the space into k regions
2. Perform PCA on each region
3. Retain the significant eigenvalues per region
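A sketch of steps 1–3 for a single k, reusing the lloyd() helper from the earlier snippet; the simple "fraction of the largest eigenvalue" cutoff here is only a stand-in for the significance criteria discussed on the following slides:

```python
import numpy as np

def local_pca_dimensions(X, k, alpha=0.05):
    """For each of k regions, count eigenvalues above alpha * largest eigenvalue."""
    labels, _ = lloyd(X, k)                      # step 1: subdivide the space
    dims = []
    for j in range(k):
        region = X[labels == j]
        if len(region) < 2:                      # too few points for a covariance
            continue
        cov = np.cov(region - region.mean(axis=0), rowvar=False)   # step 2: PCA
        eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
        dims.append(int(np.sum(eigvals >= alpha * eigvals[0])))    # step 3: retain
    return dims                                  # per-region dimension estimates
```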

Which eigenvalues are significant? This depends on: the intrinsic dimensionality, the curvature of the (hyper-)surface, and noise.

Which eigenvalues are significant? Discussed in class: largest-n. In the papers: cutoff after normalization (Bruske & Sommer) and a statistical method (Verveer & Duin).

Which eigenvalues are significant? Cutoff after normalization (Bruske & Sommer), where µ_x is the x-th eigenvalue, with α = 5, 10 or 20.
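The formula itself did not survive the transcript. Assuming the criterion normalizes each local eigenvalue by the largest eigenvalue µ_1 of that region (my reading of Bruske & Sommer, not taken verbatim from the slide), it would read:

$$\mu_x \ \text{is significant} \iff \frac{\mu_x}{\mu_1} \ge \frac{\alpha}{100}, \qquad \alpha \in \{5, 10, 20\}.$$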

Which eigenvalues are significant? Statistical method (Verveer & Duin): calculate the reconstruction error on the data if the lowest eigenvalue is dropped, and decide whether this error is statistically significant.

Error distances for local PCA (Kambhatla and Leen): 1) Euclidean distance; 2) reconstruction distance.
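The formulas behind these two distances were lost with the slide graphics. Writing r_i for the centroid of region i and e_1^(i), ..., e_m^(i) for its m leading local eigenvectors (notation assumed here, following Kambhatla & Leen's construction), the two distances are:

$$d_{\mathrm{Euc}}(x, i) = \lVert x - r_i \rVert^2, \qquad d_{\mathrm{rec}}(x, i) = \lVert x - r_i \rVert^2 - \sum_{j=1}^{m} \big( e_j^{(i)\top} (x - r_i) \big)^2,$$

i.e. the reconstruction distance is the squared distance from x to the local m-dimensional PCA hyperplane of region i.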

Results: a one-dimensional space embedded in 256 × 256 = 65,536 dimensions; 180 images of a rotating cylinder; ID = 1.
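The cylinder images themselves are not reproduced here; a small synthetic analogue, assuming a bar rotating about the image centre so that a single angle is the only free parameter, could be generated like this:

```python
import numpy as np

def rotating_bar_dataset(n_images=180, size=32):
    """n_images images of a bar rotating about the centre: one free parameter."""
    ys, xs = np.mgrid[:size, :size] - (size - 1) / 2.0
    images = []
    for angle in np.linspace(0, np.pi, n_images, endpoint=False):
        # Distance of each pixel from a line through the centre at this angle
        dist = np.abs(xs * np.sin(angle) - ys * np.cos(angle))
        images.append((dist < 1.5).astype(float).ravel())
    return np.array(images)          # shape (n_images, size*size)

X = rotating_bar_dataset()
print(X.shape)                       # (180, 1024): high ambient dimension, ID = 1
```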

Results

NEURAL NETWORK PCA

Basic Computational Element: the Neuron. Inputs/outputs, synaptic weights, activation function.

3-Layer Autoassociators: N input, N output and M < N hidden neurons. This model has drawbacks: the optimal solution remains the PCA projection, so nothing is gained over linear PCA.

5-Layer Autoassociators: neural network approximators for principal surfaces using five layers of neurons. A global, non-linear dimension reduction technique. Nonlinear PCA has been successfully implemented with these networks for image and speech dimension reduction and for obtaining concise representations of color.

The third layer carries the dimension-reduced representation and has width M < N. Linear activation functions are used in this representation layer. The networks are trained to minimize an MSE criterion, and act as approximators of principal surfaces.
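A minimal PyTorch sketch of such a 5-layer autoassociator; the layer widths and training details are illustrative assumptions, not taken from the papers:

```python
import torch
import torch.nn as nn

N, M = 64, 2   # input width and bottleneck width (hypothetical sizes)

# 5 layers of neurons: input -> nonlinear mapping -> linear bottleneck ->
# nonlinear demapping -> reconstruction
autoassociator = nn.Sequential(
    nn.Linear(N, 32), nn.Tanh(),   # mapping layer
    nn.Linear(32, M),              # representation layer (linear, width M < N)
    nn.Linear(M, 32), nn.Tanh(),   # demapping layer
    nn.Linear(32, N),              # reconstruction layer
)

optimizer = torch.optim.Adam(autoassociator.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(batch):             # batch: tensor of shape (batch_size, N)
    optimizer.zero_grad()
    loss = loss_fn(autoassociator(batch), batch)   # reconstruct the input
    loss.backward()
    optimizer.step()
    return loss.item()
```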

Locally Linear Approach to nonlinear dimension reduction (local PCA algorithm): much faster to train than five-layer autoassociators, and it provides superior solutions. Like the 5-layer autoassociators, this algorithm attempts to minimize the MSE between the original data and its reconstruction from a low-dimensional representation (the reconstruction error).

5-layer Auto-associators vs. Local PCA (VQPCA): 5-layer auto-associators are difficult to train; VQPCA trains faster (and can be accelerated further using tree-structured or multistage VQ). 5-layer auto-associators are prone to getting trapped in poor local optima. VQPCA is slower for encoding new data but much faster for decoding.

5-layer Auto-associators vs. Local PCA (VQPCA): the results of the first paper indicate that VQPCA is not suitable for real-time applications (e.g. videoconferencing) where we need very fast encoding. For decoding only (e.g. image retrieval from databases), VQPCA is a good choice: accurate and fast.

Estimating the dimensionality: two algorithms are proposed.
– Local eigenvalue algorithm: based on the local eigenvalues of the covariance matrix in small regions of feature space.
– k-nearest-neighbor algorithm: based on the distribution of the distances from an arbitrary data vector to a number (k) of its neighbors.
Both work, but not always!
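As an illustration of the nearest-neighbor idea, here is a simple scaling-based estimator (not the exact algorithm from Verveer & Duin): for intrinsic dimension d, the mean distance to the k-th nearest neighbor grows roughly like k^(1/d), so the slope of log(mean distance) versus log(k) estimates 1/d:

```python
import numpy as np

def knn_dimension_estimate(X, k_max=10):
    """Estimate intrinsic dimension from how the mean k-NN distance scales with k."""
    # Pairwise distances (fine for small data sets)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    dist.sort(axis=1)                        # column 0 is the zero self-distance
    ks = np.arange(1, k_max + 1)
    mean_rk = dist[:, 1:k_max + 1].mean(axis=0)
    slope = np.polyfit(np.log(ks), np.log(mean_rk), 1)[0]   # slope ~ 1/d
    return 1.0 / slope                       # usually a non-integer estimate

# Example: a slightly noisy circle embedded in 10-D (intrinsic dimension 1)
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 500)
X = np.zeros((500, 10))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.001 * rng.normal(size=X.shape)
print(knn_dimension_estimate(X))             # typically close to 1
```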

Comparison of Algorithms
– Both algorithms assume that the vectors in the data set are uniformly distributed in a small region of an N-dimensional surface.
– The accuracy of the nearest-neighbor (NN) algorithm increases for k > 2.
– Real data sets contain noise; the NN algorithm is less sensitive to noise than the local eigenvalue estimator (LEE).
– The NN algorithm needs fewer vectors than LEE.
– The NN algorithm generally underestimates the intrinsic dimensionality for sets with high dimensionality (it usually returns a non-integer dimension estimate).
– The NN algorithm has problems at the borders (edge effect); LEE does not suffer from this problem.
– The NN algorithm is much faster than LEE.
– For very small data sets with high dimensionality, the intrinsic dimensionality cannot be found.

The End