
CS 476: Networks of Neural Computation
WK9 – Principal Component Analysis
Dr. Stathis Kasderidis
Dept. of Computer Science, University of Crete
Spring Semester, 2009

Contents
Introduction to Principal Component Analysis
Generalised Hebbian Algorithm
Adaptive Principal Components Extraction
Kernel Principal Components Analysis
Conclusions

Principal Component Analysis
The PCA method is a statistical method for Feature Selection and Dimensionality Reduction. Feature Selection is a process whereby a data space is transformed into a feature space. In principle, both spaces have the same dimensionality. In the PCA method, however, the transformation is designed in such a way that the data set is represented by a reduced number of "effective" features while retaining most of the intrinsic information contained in the data; in other words, the data set undergoes a dimensionality reduction.

Principal Component Analysis-1
Suppose that we have a vector x of dimension m and we wish to transmit it using l numbers, where l < m. If we simply truncate the vector x, we cause a mean-square error equal to the sum of the variances of the elements eliminated from x. So we ask: does there exist an invertible linear transformation T such that the truncation of Tx is optimum in the mean-squared sense? Clearly, the transformation T should have the property that some of the components of Tx have low variance. Principal Component Analysis maximises the rate

Principal Component Analysis-2
of decrease of variance and is therefore the right choice. Before we present neural network, Hebbian-based algorithms that do this, we first present the statistical analysis of the problem. Let X be an m-dimensional random vector representing the environment of interest. We assume that the vector X has zero mean:
E[X] = 0
where E is the statistical expectation operator. If X does not have zero mean, we first subtract the mean from X before proceeding with the rest of the analysis.

Principal Component Analysis-3
Let q denote a unit vector, also of dimension m, onto which the vector X is to be projected. This projection is defined by the inner product of the vectors X and q:
A = X^T q = q^T X
subject to the constraint:
||q|| = (q^T q)^{1/2} = 1
The projection A is a random variable with a mean and variance related to the statistics of the vector X. Assuming that X has zero mean, we can calculate the mean value of the projection A:
E[A] = q^T E[X] = 0

Principal Component Analysis-4
The variance of A is therefore the same as its mean-square value, and so we can write:
σ² = E[A²] = E[(q^T X)(X^T q)] = q^T E[X X^T] q = q^T R q
The m-by-m matrix R is the correlation matrix of the random vector X, formally defined as the expectation of the outer product of the vector X with itself:
R = E[X X^T]
We observe that the matrix R is symmetric, which means that:
R^T = R
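As an illustration (not part of the original slides), the correlation matrix R and the variance of a projection can be estimated from samples as follows; the data and the unit vector q below are made-up examples, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up zero-mean data: N samples of an m-dimensional random vector X
N, m = 1000, 3
X = rng.multivariate_normal(mean=np.zeros(m),
                            cov=[[4.0, 1.0, 0.0],
                                 [1.0, 2.0, 0.5],
                                 [0.0, 0.5, 1.0]],
                            size=N)
X -= X.mean(axis=0)                      # enforce E[X] = 0

# Sample estimate of the correlation matrix R = E[X X^T]
R = (X.T @ X) / N

# Variance of the projection A = q^T X for a unit vector q
q = np.array([1.0, 1.0, 0.0])
q /= np.linalg.norm(q)                   # ||q|| = 1
var_probe = q @ R @ q                    # sigma^2 = q^T R q
var_empirical = np.var(X @ q)            # variance of the projected samples
print(var_probe, var_empirical)          # the two estimates agree closely
```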

Principal Component Analysis-5
From this property it follows that for any m-by-1 vectors a and b we have:
a^T R b = b^T R a
From the above we see that the variance σ² of A is a function of the unit vector q; we can therefore write:
ψ(q) = σ² = q^T R q
We can thus think of ψ(q) as a variance probe. To find the extremal values of the variance of A we must find the vectors q which are the extremal points of ψ(q),

Principal Component Analysis-6
subject to the constraint of unit length. If q is a vector such that ψ(q) has an extreme value, then for any small perturbation δq of the unit vector q we find that, to first order in δq:
ψ(q + δq) = ψ(q)
Now, from the definition of the variance probe we have:
ψ(q + δq) = (q + δq)^T R (q + δq) = q^T R q + 2(δq)^T R q + (δq)^T R δq
where in the previous line we have made use of the symmetry of the matrix R.

Principal Component Analysis-7
Ignoring the second-order term (δq)^T R δq and invoking the definition of ψ(q), we may write:
ψ(q + δq) = q^T R q + 2(δq)^T R q = ψ(q) + 2(δq)^T R q
The above relation implies that:
(δq)^T R q = 0
Note that not just any perturbation δq of q is admissible; rather, we restrict ourselves to those perturbations for which the Euclidean norm of the perturbed vector q + δq remains equal to unity:
||q + δq|| = 1
or, equivalently:
(q + δq)^T (q + δq) = 1

Principal Component Analysis-8
Taking into account that q is already a vector of unit length, this means, again to first order in δq, that:
(δq)^T q = 0
That is, the perturbation δq must be orthogonal to q, and therefore only a small change in the direction of q is permitted. Combining the previous two equations, we can now write:
(δq)^T R q − λ (δq)^T q = 0, i.e. (δq)^T (R q − λ q) = 0
where λ is a scaling factor with the same dimensions as the entries of R. We can now write:

Principal Component Analysis-9
R q = λ q
This means that q is an eigenvector and λ is an eigenvalue of R. The matrix R has real eigenvalues, because it is symmetric, and they are non-negative, because R is positive semidefinite. Let the eigenvalues of the matrix R be denoted by λ_i and the corresponding eigenvectors by q_i, where the eigenvalues are arranged in decreasing order:
λ_1 > λ_2 > … > λ_m, so that λ_1 = λ_max.

Principal Component Analysis-10
We can then write the matrix R in terms of its eigenvalues and eigenvectors as:
R = Σ_{i=1}^{m} λ_i q_i q_i^T
Combining the previous results, we see that the variance probes are the same as the eigenvalues:
ψ(q_j) = λ_j, for j = 1, 2, …, m
To summarise the previous analysis, we have two important results:
The eigenvectors of the correlation matrix R pertaining to the zero-mean random vector X define the unit vectors q_j, representing the principal directions along which the variance probes ψ(q_j) have their extreme values;
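A small numerical check (an illustrative sketch, not from the slides): the eigendecomposition of a correlation matrix R reproduces R as Σ λ_i q_i q_i^T, and the variance probe evaluated at each eigenvector equals the corresponding eigenvalue. The matrix R below is a made-up symmetric example.

```python
import numpy as np

# Made-up correlation matrix (symmetric, positive semidefinite)
R = np.array([[4.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # sort into decreasing order
lam = eigvals[order]                         # lambda_1 >= lambda_2 >= ... >= lambda_m
Q = eigvecs[:, order]                        # columns are the eigenvectors q_j

# Spectral decomposition: R = sum_i lambda_i q_i q_i^T
R_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(len(lam)))
assert np.allclose(R, R_rebuilt)

# Variance probe at each eigenvector equals the eigenvalue: psi(q_j) = lambda_j
for j in range(len(lam)):
    q_j = Q[:, j]
    assert np.isclose(q_j @ R @ q_j, lam[j])
```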

Principal Component Analysis-11
The associated eigenvalues define the extremal values of the variance probes. We now want to investigate the representation of a data vector x which is a realisation of the random vector X. With m eigenvectors q_j we have m possible projection directions. The projections of x onto the eigenvectors are given by:
a_j = q_j^T x = x^T q_j, j = 1, 2, …, m
The numbers a_j are called the principal components. To reconstruct the original vector x from the projections, we combine all projections into

Principal Component Analysis-12
a single vector:
a = [a_1, a_2, …, a_m]^T = [x^T q_1, x^T q_2, …, x^T q_m]^T = Q^T x
where Q is the matrix whose columns are the eigenvectors of R. From the above we see that:
x = Q a
This is nothing more than a coordinate

Principal Component Analysis-13
transformation from the input space, of the vector x, to the feature space, of the vector a. From the perspective of pattern recognition, the usefulness of the PCA method is that it provides an effective technique for dimensionality reduction. In particular, we may reduce the number of features needed for effective data representation by discarding those linear combinations in the previous formula that have small variances and retaining only the terms that have large variances. Let λ_1, λ_2, …, λ_l denote the l largest eigenvalues of R. We may then approximate the vector x by

Principal Component Analysis-14
truncating the previous expansion (x = Q a = Σ_{j=1}^{m} a_j q_j) to its first l terms:
x̂ = Σ_{j=1}^{l} a_j q_j, l ≤ m
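To tie the pieces together, here is a minimal batch-PCA sketch (not from the slides) following the derivation above: estimate R, take its l leading eigenvectors, project a data vector to obtain the principal components a_j, and reconstruct the approximation x̂. The data and the choice l = 2 are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up zero-mean data: N realisations of an m-dimensional random vector
N, m, l = 500, 5, 2
A_mix = rng.normal(size=(m, 2))              # data concentrated near a 2-D subspace
X = rng.normal(size=(N, 2)) @ A_mix.T + 0.05 * rng.normal(size=(N, m))
X -= X.mean(axis=0)

R = (X.T @ X) / N                            # correlation matrix R = E[X X^T]
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
Q_l = eigvecs[:, order[:l]]                  # the l leading eigenvectors (columns)

x = X[0]                                     # one data vector
a = Q_l.T @ x                                # principal components a_j = q_j^T x
x_hat = Q_l @ a                              # truncated reconstruction x_hat
print(np.linalg.norm(x - x_hat))             # small: most variance lies in l directions
```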

Generalised Hebbian Algorithm
We now present a neural network method which solves the PCA problem. It belongs to the so-called class of re-estimation algorithms for PCA. The network which solves the problem is a single-layer feedforward network, described below:

Generalised Hebbian Algorithm-1
For the feedforward network we make two structural assumptions:
Each neuron in the output layer of the network is linear;
The network has m inputs and l outputs, both of which are specified. Moreover, the network has fewer outputs than inputs (i.e. l < m).
It can be shown that under these assumptions, and by using a special form of Hebbian learning, the network truly learns to compute the principal components at its output nodes. The GHA can be summarised as follows:

Generalised Hebbian Algorithm-2
1. Initialise the synaptic weights of the network, w_ji, to small random values at time n=1. Assign a small positive value to the learning-rate parameter η;
2. For n=1, j=1,2,…,l and i=1,2,…,m, compute:
y_j(n) = Σ_{i=1}^{m} w_ji(n) x_i(n)
Δw_ji(n) = η [ y_j(n) x_i(n) − y_j(n) Σ_{k=1}^{j} w_ki(n) y_k(n) ]
where x_i(n) is the ith component of the m-by-1 input vector x(n) and l is the desired number of principal components;
3. Increment n by 1, go to step 2, and continue until the synaptic weights w_ji reach their steady-state

Generalised Hebbian Algorithm-3
values. For large n, the weight w_ji of neuron j converges to the ith component of the eigenvector associated with the jth eigenvalue of the correlation matrix of the input vector x(n), and the variances of the outputs of neurons 1 through l converge to the corresponding eigenvalues of the correlation matrix, in decreasing order from λ_1 down to λ_l.
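An illustrative implementation sketch of the GHA update above (assuming the standard Sanger-rule form given in step 2; the data, learning rate, and number of epochs are made-up choices):

```python
import numpy as np

def gha(X, l, eta=0.001, epochs=100, seed=0):
    """Generalised Hebbian Algorithm: learn the first l principal directions.

    X is an (N, m) array of zero-mean data vectors; returns an (l, m) weight
    matrix W whose rows converge (approximately) to the leading eigenvectors
    of the correlation matrix of the data.
    """
    rng = np.random.default_rng(seed)
    N, m = X.shape
    W = 0.01 * rng.normal(size=(l, m))       # small random initial weights w_ji
    for _ in range(epochs):
        for x in X:
            y = W @ x                        # outputs y_j(n) of the linear neurons
            # Sanger's rule: dw_ji = eta * ( y_j x_i - y_j * sum_{k<=j} w_ki y_k )
            lower = np.tril(np.outer(y, y))  # y_j * y_k for k <= j
            W += eta * (np.outer(y, x) - lower @ W)
    return W

# Made-up zero-mean data for a quick check against batch PCA
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(500, 5))
X -= X.mean(axis=0)
W = gha(X, l=2)
print(np.linalg.norm(W, axis=1))             # rows approach unit length
```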

Adaptive Principal Components Extraction
Another algorithm for extracting the principal components is the adaptive principal components extraction (APEX) algorithm. This network uses both feedforward and feedback connections. The algorithm is iterative in nature: given the first (j−1) principal components, the jth one can be computed easily. This algorithm belongs to the class of decorrelating algorithms. The network that implements the algorithm is described next:

Adaptive Principal Components Extraction-1
The network structure is defined as follows:
Each neuron (in the output layer) is assumed to be linear;
Feedforward connections exist from the input nodes to each of the neurons 1, 2, …, j, with j < m. The feedforward connections operate with a Hebbian rule. They are

Adaptive Principal Components Extraction-2
excitatory and therefore provide amplification. These connections are represented by the vector w_j(n).
Lateral connections exist from the individual outputs of neurons 1, 2, …, j−1 to neuron j of the output layer, thereby applying feedback to the network. These connections are represented by the vector a_j(n). The lateral connections operate with an anti-Hebbian learning rule, which has the effect of making them inhibitory.
The algorithm is summarised as follows:
1. Initialise the feedforward weight vector w_j and the feedback weight vector a_j to small random values at time n=1, where j=1,2,…,m. Assign a small

Adaptive Principal Components Extraction-3
positive value to the learning-rate parameter η;
2. Set j=1, and for n=1,2,…, compute:
y_1(n) = w_1^T(n) x(n)
w_1(n+1) = w_1(n) + η [ y_1(n) x(n) − y_1²(n) w_1(n) ]
where x(n) is the input vector. For large n we have w_1(n) → q_1, where q_1 is the eigenvector associated with the largest eigenvalue λ_1 of the correlation matrix of x(n);
3. Set j=2, and for n=1,2,…, compute:
y_{j−1}(n) = [y_1(n), y_2(n), …, y_{j−1}(n)]^T
y_j(n) = w_j^T(n) x(n) + a_j^T(n) y_{j−1}(n)
w_j(n+1) = w_j(n) + η [ y_j(n) x(n) − y_j²(n) w_j(n) ]
a_j(n+1) = a_j(n) − η [ y_j(n) y_{j−1}(n) + y_j²(n) a_j(n) ]

Adaptive Principal Components Extraction-4
4. Increment j by 1, go to step 3, and continue until j=m, where m is the desired number of principal components. (Note that j=1 corresponds to the eigenvector associated with the largest eigenvalue, which is taken care of in step 2.) For large n we have w_j(n) → q_j and a_j(n) → 0, where q_j is the eigenvector associated with the jth eigenvalue of the correlation matrix of x(n).
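A sketch of the APEX iteration as summarised above (assuming the update equations given in steps 2-3; the data, learning rate, and iteration counts are illustrative choices, not prescriptions):

```python
import numpy as np

def apex(X, num_components, eta=0.002, passes=50, seed=0):
    """APEX: extract principal components one neuron at a time.

    X is an (N, m) array of zero-mean data; returns a (num_components, m)
    matrix whose rows approximate the leading eigenvectors of the
    correlation matrix of the data.
    """
    rng = np.random.default_rng(seed)
    N, m = X.shape
    W = []                                       # converged vectors w_1, ..., w_{j-1}
    for j in range(num_components):
        w = 0.01 * rng.normal(size=m)            # feedforward weights of neuron j
        a = 0.01 * rng.normal(size=j)            # lateral weights from neurons 1..j-1
        for _ in range(passes):
            for x in X:
                y_prev = np.array([wk @ x for wk in W])      # outputs of earlier neurons
                y = w @ x + (a @ y_prev if j > 0 else 0.0)   # output of neuron j
                w = w + eta * (y * x - y * y * w)            # Hebbian (Oja-like) update
                if j > 0:
                    a = a - eta * (y * y_prev + y * y * a)   # anti-Hebbian update
        W.append(w / np.linalg.norm(w))          # normalise for comparison with eigenvectors
    return np.array(W)

# Made-up zero-mean data for a quick check
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 4))
X -= X.mean(axis=0)
W = apex(X, num_components=2)
print(W)                                         # rows approximate q_1, q_2 (up to sign)
```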

Kernel Principal Components Analysis
A final algorithm, which uses kernels (more on these in the SVM lecture), is given below. We simply summarise the algorithm. It can be considered a non-linear PCA method: we first project the input space into a feature space using a non-linear transform φ(x), and then perform a linear PCA analysis in the feature space. This differs from the previous methods, which compute a linear transformation between the input and the feature spaces. Summary of the kernel PCA method:
1. Given the training examples {x_i}, i = 1, …, N, compute

Kernel Principal Components Analysis-1
the N-by-N kernel matrix K = {K(x_i, x_j)}, where:
K(x_i, x_j) = φ^T(x_i) φ(x_j)
2. Solve the eigenvalue problem:
K a = λ a
where λ is an eigenvalue of the kernel matrix K and a is the associated eigenvector;
3. Normalise the eigenvectors so computed by requiring that:
a_k^T a_k = 1/λ_k, k = 1, 2, …, p
where λ_p is the smallest nonzero eigenvalue of the matrix K, assuming that the eigenvalues are arranged in decreasing order;

Kernel Principal Components Analysis-2
4. For the extraction of the principal components of a test point x, compute the projections:
ỹ_k = Σ_{j=1}^{N} a_{k,j} K(x_j, x), k = 1, 2, …, p
where a_{k,j} is the jth element of the eigenvector a_k.
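A minimal sketch of these four steps, assuming a Gaussian (RBF) kernel and made-up data. One practical detail not in the summary is centring in feature space; the sketch centres the training kernel matrix but, for brevity, not the test-point kernel vector.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_pca(X, num_components, gamma=1.0):
    """Kernel PCA following the four steps in the summary above."""
    N = X.shape[0]
    # Step 1: N-by-N kernel matrix K = {K(x_i, x_j)}
    K = np.array([[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X])
    # Practical detail not in the summary: centre K in feature space
    ones = np.full((N, N), 1.0 / N)
    K = K - ones @ K - K @ ones + ones @ K @ ones
    # Step 2: solve the eigenvalue problem K a = lambda a
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1][:num_components]
    lam, A = eigvals[order], eigvecs[:, order]
    # Step 3: normalise so that a_k^T a_k = 1 / lambda_k
    A = A / np.sqrt(lam)
    return A, lam

def project(x, X_train, A, gamma=1.0):
    # Step 4: y_k = sum_j a_{k,j} K(x_j, x) for each retained component k
    k_vec = np.array([rbf_kernel(xj, x, gamma) for xj in X_train])
    return A.T @ k_vec

# Made-up data: points on a noisy circle (a nonlinear structure)
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=100)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(100, 2))
A, lam = kernel_pca(X, num_components=2, gamma=2.0)
print(project(X[0], X, A, gamma=2.0))
```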

Conclusions
Typically we use PCA methods for dimensionality reduction as a pre-processing step before we apply other methods, for example in a pattern recognition problem.
There are batch and adaptive numerical methods for the calculation of the PCA. An example of the first class is the Singular Value Decomposition (SVD) method, while the GHA algorithm is an example of an adaptive method.
PCA is also used for revealing clusters in high-dimensional spaces, since such clusters are difficult to visualise otherwise.
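As an aside on the batch route mentioned above, a short sketch (with made-up data) of obtaining the principal directions via the SVD of the centred data matrix, whose right singular vectors coincide with the eigenvectors of the correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(300, 6))
X -= X.mean(axis=0)                                       # centre the data

# Batch PCA via SVD: X = U S V^T; the columns of V are the principal directions
U, S, Vt = np.linalg.svd(X, full_matrices=False)
eigvals_from_svd = (S ** 2) / X.shape[0]                  # eigenvalues of R = X^T X / N

# Cross-check against the eigendecomposition of the correlation matrix
R = (X.T @ X) / X.shape[0]
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
assert np.allclose(eigvals_from_svd, eigvals)
```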