Project 11: Determining the Intrinsic Dimensionality of a Distribution
Okke Formsma, Nicolas Roussis and Per Løwenborg

Outline
About the project
What is intrinsic dimensionality?
How can we assess the ID?
– PCA
– Neural Network
– Nearest Neighbour
Experimental Results

Why did we choose this project?
We wanted to learn more about developing and experimenting with algorithms for analyzing high-dimensional data, and to see how we can implement this in a program.

Papers
N. Kambhatla and T. Leen, "Dimension Reduction by Local Principal Component Analysis"
J. Bruske and G. Sommer, "Intrinsic Dimensionality Estimation with Optimally Topology Preserving Maps"
P. Verveer and R. Duin, "An Evaluation of Intrinsic Dimensionality Estimators"

How does dimensionality reduction influence our lives?
Compressing images, audio and video
Reducing noise
Editing
Reconstruction

This is an image going through different steps of a reconstruction.

Intrinsic Dimensionality
The number of 'free' parameters needed to generate a pattern.
Ex: f(x) = -x² => 1-dimensional
f(x, y) = -x² => still 1-dimensional (y is not actually used)
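As a minimal illustrative sketch (a toy example, not part of the project data), the following Python snippet generates points that live in a 2-dimensional space but are produced by a single free parameter, so their intrinsic dimensionality is 1:

import numpy as np

# One free parameter x generates every point, so the intrinsic dimensionality
# of this data set is 1 even though it is embedded in a 2-D space.
x = np.linspace(-1.0, 1.0, 200)        # the single 'free' parameter
data = np.column_stack([x, -x**2])     # points (x, f(x)) in 2-D
print(data.shape)                      # (200, 2): ambient dimension 2, intrinsic dimension 1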

PRINCIPAL COMPONENT ANALYSIS

Principal Component Analysis (PCA)
The classic technique for linear dimension reduction. It is a vector space transformation which reduces multidimensional data sets to lower dimensions for analysis. It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences.

Advantages of PCA
Since patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data. Once you have found these patterns, you can compress the data (by reducing the number of dimensions) without much loss of information.
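A minimal sketch of how such a PCA projection can be computed in NumPy (illustrative only; it assumes the data is stored row-wise in an array X, and the project does not prescribe a particular implementation):

import numpy as np

def pca(X, n_components):
    """Project the rows of X onto their n_components principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # center the data
    cov = np.cov(Xc, rowvar=False)             # covariance (second-order statistics)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigendecomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Xc @ eigvecs[:, :n_components], eigvals

# Example: 200 noisy 3-D points that vary mainly along one direction
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t, 0.5 * t]) + 0.01 * rng.normal(size=(200, 3))
scores, eigvals = pca(X, n_components=1)
print(eigvals)   # one large eigenvalue, two near zero -> roughly 1 intrinsic dimension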

Example

Problems with PCA
The data might be uncorrelated, yet still highly dependent: PCA relies only on second-order statistics (correlation), so it sometimes fails to find the most compact description of the data.
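A small sketch of this failure mode (an illustrative example; the half-circle data is an assumption, not from the slides): points on a curve are intrinsically 1-dimensional, yet global PCA sees two clearly nonzero eigenvalues because correlation alone cannot capture the curvature.

import numpy as np

# 300 points on a half-circle: generated by one parameter theta, so intrinsically
# 1-D, but not compressible onto a single global linear direction.
theta = np.linspace(0.0, np.pi, 300)
X = np.column_stack([np.cos(theta), np.sin(theta)])
cov = np.cov(X - X.mean(axis=0), rowvar=False)
print(np.linalg.eigvalsh(cov))   # roughly [0.095, 0.5]: both eigenvalues clearly nonzero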

Problems with PCA

First eigenvector

Second eigenvector

A better solution?

Local eigenvector

Local eigenvectors

Another problem

Is this the principal eigenvector?

Or do we need more than one?

Choose

The answer depends on your application (low resolution vs. high resolution).

Challenges
How to partition the space?
How many partitions should we use?
How many dimensions should we retain?

How to partition the space?
Vector Quantization: the Lloyd algorithm
Partition the space into k sets
Repeat until convergence:
– Calculate the centroid of each set
– Associate each point with the nearest centroid

Lloyd Algorithm walkthrough (two sets):
Step 1: randomly assign points to Set 1 and Set 2
Step 2: calculate the centroids
Step 3: associate each point with the nearest centroid
Step 2 (repeated): calculate the centroids
Step 3 (repeated): associate each point with the nearest centroid
Result after 2 iterations
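A minimal NumPy sketch of this procedure (illustrative only; the initialization, convergence check and the assumption that no set becomes empty are not from the slides):

import numpy as np

def lloyd(X, k, n_iter=100, seed=0):
    """Partition the rows of X into k sets with the Lloyd algorithm."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Associate each point with the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recalculate the centroid of each set (assumes no set is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # converged
            break
        centroids = new_centroids
    return labels, centroids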

How many partitions should we use?
Bruske & Sommer: "just try them all"
For k = 1 to k ≤ dimension(set):
– Subdivide the space into k regions
– Perform PCA on each region
– Retain the significant eigenvalues per region

Which eigenvalues are significant?
Depends on:
– Intrinsic dimensionality
– Curvature of the surface
– Noise

Which eigenvalues are significant?
Discussed in class: largest-n
In the papers:
– Cutoff after normalization (Bruske & Sommer)
– Statistical method (Verveer & Duin)

Which eigenvalues are significant?
Cutoff after normalization: eigenvalue µ_x (the xth largest eigenvalue) counts as significant if µ_x / µ_1 > α%, with α = 5, 10 or 20.
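A small sketch of this cutoff rule as read above (normalize by the largest eigenvalue and keep those above α percent; the function name and exact form are assumptions, not taken from the paper):

import numpy as np

def count_significant(eigvals, alpha=10.0):
    """Count eigenvalues larger than alpha percent of the largest eigenvalue."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending
    normalized = eigvals / eigvals[0]                           # mu_x / mu_1
    return int(np.sum(normalized > alpha / 100.0))

# Example: the spectrum of a roughly 2-dimensional local region
print(count_significant([4.8, 3.1, 0.12, 0.05], alpha=10))      # -> 2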

Which eigenvalues are significant?
Statistical method (Verveer & Duin):
– Calculate the error rate on the reconstructed data if the lowest eigenvalue is dropped
– Decide whether this error rate is significant

Results
One-dimensional space, embedded in 256*256 = 65,536 dimensions
180 images of a rotating cylinder
ID = 1

Results

NEURAL NETWORK PCA

Basic Computational Element – Neuron
Inputs/outputs, synaptic weights, activation function

3-Layer Autoassociators
N input, N output and M < N hidden neurons. A drawback of this model: the optimal solution remains the PCA projection.

5-Layer Autoassociators
– Neural network approximators for principal surfaces, using five layers of neurons.
– A global, non-linear dimension reduction technique.
– Nonlinear PCA with these networks has been successfully applied to image and speech dimension reduction and to obtaining concise representations of color.

The third layer carries the dimension-reduced representation and has width M < N. Linear activation functions are used for this representation layer. The networks are trained to minimize the MSE criterion, and act as approximators of principal surfaces.
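A minimal PyTorch sketch of such a 5-layer autoassociator (illustrative only; the layer widths, tanh activations and training details are assumptions, not taken from the papers):

import torch
import torch.nn as nn

N, H, M = 64, 32, 2     # input width, mapping-layer width, bottleneck width (M < N)

autoassociator = nn.Sequential(
    nn.Linear(N, H), nn.Tanh(),   # layer 2: nonlinear mapping
    nn.Linear(H, M),              # layer 3: linear representation layer (the reduced code)
    nn.Linear(M, H), nn.Tanh(),   # layer 4: nonlinear demapping
    nn.Linear(H, N),              # layer 5: reconstruction of the input
)

X = torch.randn(500, N)                                   # stand-in data
optimizer = torch.optim.Adam(autoassociator.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                                    # trained to minimize MSE

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(autoassociator(X), X)                  # reconstruct the input itself
    loss.backward()
    optimizer.step()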

Locally Linear Approach to Nonlinear Dimension Reduction (VQPCA Algorithm)
Much faster to train than five-layer autoassociators, and provides superior solutions. Like the 5-layer autoassociators, this algorithm attempts to minimize the MSE between the original data and its reconstruction from a low-dimensional representation (the reconstruction error).

Two steps in the algorithm:
1) Partition the data space by VQ (clustering).
2) Perform local PCA about each cluster center.
VQPCA is essentially a local PCA applied to each cluster.

We can use two kinds of distance measures in VQPCA:
1) Euclidean distance
2) Reconstruction distance
(The slide illustrates this with an example for a 1-D local PCA.)
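A compact, self-contained sketch of the local-PCA part of VQPCA and of a reconstruction distance (illustrative only; the function names and exact formulation are assumptions, not the papers' own code):

import numpy as np

def local_pca(X, d):
    """Return (mean, top-d eigenvectors) of the rows of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return mean, eigvecs[:, np.argsort(eigvals)[::-1][:d]]

def vqpca_reconstruct(X, labels, d):
    """Fit a d-dimensional local PCA per cluster and reconstruct every point."""
    Xhat = np.empty_like(X, dtype=float)
    for j in np.unique(labels):
        idx = labels == j
        mean, W = local_pca(X[idx], d)
        codes = (X[idx] - mean) @ W       # encode: project onto the local basis
        Xhat[idx] = codes @ W.T + mean    # decode: map back to the original space
    return Xhat

def reconstruction_distance(X, mean, W):
    """Squared error left after projecting X onto one cluster's d local directions."""
    resid = (X - mean) - ((X - mean) @ W) @ W.T
    return np.sum(resid**2, axis=1)

The overall reconstruction error is then the mean squared difference between X and vqpca_reconstruct(X, labels, d); assigning points to clusters by reconstruction_distance instead of Euclidean distance targets that same criterion directly.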

5-Layer Autoassociators vs. VQPCA
– 5-layer autoassociators are difficult to train.
– Training is faster with the VQPCA algorithm (VQ can be accelerated using tree-structured or multistage VQ).
– 5-layer autoassociators are prone to getting trapped in poor local optima.
– VQPCA is slower for encoding new data, but much faster for decoding.