
1 Principal Component Analysis CMPUT 466/551 Nilanjan Ray

2 Overview Principal component analysis (PCA) is a way to reduce data dimensionality. PCA projects high-dimensional data to a lower dimension. PCA projects the data in the least-squares sense: it captures the big (principal) variability in the data and ignores the small variability.

3 PCA: An Intuitive Approach Let us say we have data points x_i, i = 1…N, in p dimensions (p is large). If we want to represent the data set by a single point x_0, a natural choice is the sample mean, x_0 = m = (1/N) Σ_i x_i. Can we justify this choice mathematically? It turns out that if you minimize the squared-error criterion J_0(x_0) = Σ_i ||x_0 - x_i||², you get exactly this solution, viz., the sample mean. Source: Chapter 3 of [DHS]
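A quick numerical sanity check of this claim (a minimal NumPy sketch; the data and the perturbation are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # N = 100 points in p = 5 dimensions

m = X.mean(axis=0)                     # sample mean

def J0(x0):
    return np.sum((X - x0) ** 2)       # squared-error criterion J_0

# J_0 at the sample mean is no larger than at a perturbed point
assert J0(m) <= J0(m + 0.1 * rng.normal(size=5))
```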

4 PCA: An Intuitive Approach… Representing the data set x_i, i = 1…N, by its mean alone is quite uninformative. So let's try to represent the data by a straight line of the form x = m + a e. This is the equation of a straight line that passes through m; e is a unit vector along the line, and the signed distance of a point x from m along the line is a. The training points projected onto this straight line would then be m + a_i e, i = 1…N.

5 PCA: An Intuitive Approach… Let's now determine the a_i's by minimizing the squared error J_1(a_1, …, a_N, e) = Σ_i ||(m + a_i e) - x_i||². Partially differentiating with respect to a_i and setting the derivative to zero, we get a_i = e^T (x_i - m). Plugging this expression for a_i back into J_1 we get J_1(e) = -e^T S e + Σ_i ||x_i - m||², where S = Σ_i (x_i - m)(x_i - m)^T is called the scatter matrix.

6 PCA: An Intuitive Approach… So minimizing J_1 is equivalent to maximizing e^T S e, subject to the constraint that e is a unit vector, e^T e = 1. Use the Lagrange multiplier method to form the objective function u = e^T S e - λ(e^T e - 1). Differentiating with respect to e and setting the result to zero yields the equation S e = λ e. The solution is that e is the eigenvector of S corresponding to the largest eigenvalue.
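As an illustration of slides 4-6, here is a minimal NumPy sketch that centers some synthetic data, forms the scatter matrix, takes the leading eigenvector, and projects the points onto the resulting line (the data and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])   # N x p data, anisotropic

m = X.mean(axis=0)
Xc = X - m                                   # centered data points
S = Xc.T @ Xc                                # scatter matrix, p x p

evals, evecs = np.linalg.eigh(S)             # eigenvalues in ascending order
e = evecs[:, -1]                             # eigenvector of the largest eigenvalue
a = Xc @ e                                   # signed distances a_i = e^T (x_i - m)
line_points = m + np.outer(a, e)             # projections of x_i onto the line
```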

7 PCA: An Intuitive Approach… The preceding analysis can be extended in the following way. Instead of projecting the data points onto a straight line, we may now want to project them onto a d-dimensional plane of the form x = m + a_1 e_1 + … + a_d e_d, where d is much smaller than the original dimension p. In this case one can form the objective function J_d = Σ_i ||(m + Σ_{k=1}^d a_{ik} e_k) - x_i||². It can also be shown that the minimizing vectors e_1, e_2, …, e_d are the d eigenvectors corresponding to the d largest eigenvalues of the scatter matrix.
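The d-dimensional case looks essentially the same in code; a self-contained sketch (the function name is my own):

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X onto the d-dimensional principal plane."""
    m = X.mean(axis=0)
    Xc = X - m
    S = Xc.T @ Xc                            # scatter matrix
    evals, evecs = np.linalg.eigh(S)         # ascending eigenvalues
    E = evecs[:, ::-1][:, :d]                # top-d eigenvectors e_1 .. e_d
    A = Xc @ E                               # coefficients a_ik = e_k^T (x_i - m)
    return m + A @ E.T                       # points on the d-dimensional plane
```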

8 PCA: Visually Data points are represented in a rotated orthogonal coordinate system: the origin is the mean of the data points and the axes are provided by the eigenvectors.

9 Computation of PCA In practice we compute PCA via the SVD (singular value decomposition). Form the centered data matrix X = [x_1 - m, x_2 - m, …, x_N - m] (p × N). Compute its SVD, X = U D V^T, where U and V are orthogonal matrices and D is a diagonal matrix.

10 Computation of PCA… Note that the scatter matrix can be written as S = X X^T = U D V^T V D U^T = U D² U^T. So the eigenvectors of S are the columns of U and the eigenvalues are the diagonal elements of D². Take only a few significant eigenvalue-eigenvector pairs, d << p; the new reduced-dimension representation of x_i becomes U_d^T (x_i - m), where U_d holds the first d columns of U.
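A sketch of this SVD-based computation, with the data points stored as columns as on the slide (function and variable names are mine):

```python
import numpy as np

def pca_svd(X_cols, d):
    """X_cols: p x N matrix whose columns are the data points."""
    m = X_cols.mean(axis=1, keepdims=True)
    Xc = X_cols - m                                   # centered data matrix X
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False) # X = U D V^T
    # eigenvectors of S are the columns of U; eigenvalues are s**2
    Ud = U[:, :d]                                     # first d principal directions
    Y = Ud.T @ Xc                                     # d x N reduced representation
    return Y, Ud, s[:d] ** 2
```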

11 Computation of PCA… Sometimes we are given only a few high-dimensional data points, i.e., p >> N. In such cases compute the SVD of the much smaller N × p matrix X^T: X^T = V D U^T, so that transposing gives back X = U D V^T. Then proceed as before, choosing only d < N significant eigenvalues for the data representation.
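The same computation for the p >> N case, mirroring the slide by decomposing the small N × p matrix X^T (note that NumPy's svd with full_matrices=False is already economical, so this is mainly for illustration):

```python
import numpy as np

def pca_svd_wide(X_cols, d):
    """PCA when p >> N: decompose the small N x p matrix X^T instead of X."""
    m = X_cols.mean(axis=1, keepdims=True)
    Xc = X_cols - m                                        # p x N centered data
    V, s, Ut = np.linalg.svd(Xc.T, full_matrices=False)    # X^T = V D U^T
    U = Ut.T                                               # columns are eigenvectors of S
    Ud = U[:, :d]                                          # d < N significant directions
    return Ud.T @ Xc, Ud, s[:d] ** 2
```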

12 PCA: A Gaussian Viewpoint The data can be modeled as a multivariate Gaussian whose covariance matrix Σ is estimated from the scatter matrix as (1/N)S; the u's and λ's are respectively the eigenvectors and eigenvalues of S. If p is large, then we need an even larger number of data points to estimate the covariance matrix. So, when only a limited number of training data points is available, the estimate of the covariance matrix goes quite wrong. This is known as the curse of dimensionality in this context. To combat the curse of dimensionality, we discard the smaller eigenvalues and are content with the truncated estimate Σ ≈ (1/N) Σ_{k=1}^d λ_k u_k u_k^T.
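A sketch of this truncated covariance estimate (the 1/N scaling follows the slide; many texts use 1/(N-1) instead):

```python
import numpy as np

def truncated_covariance(X, d):
    """Rank-d covariance estimate keeping only the d largest eigenvalues of S."""
    N = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc                                 # scatter matrix
    lam, U = np.linalg.eigh(S)                    # ascending eigenvalues
    lam, U = lam[::-1][:d], U[:, ::-1][:, :d]     # keep the d largest eigenpairs
    return (U * lam) @ U.T / N                    # (1/N) * sum_k lam_k u_k u_k^T
```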

13 PCA Examples An image compression example and a novelty detection example.

14 Kernel PCA The assumption behind PCA is that the data points x are multivariate Gaussian. Often this assumption does not hold. However, it may still be possible that a transformation Φ(x) of the data is Gaussian; then we can perform PCA in the space of Φ(x). Kernel PCA performs this PCA; however, because of the "kernel trick," it never computes the mapping Φ(x) explicitly!

15 KPCA: Basic Idea

16 Kernel PCA Formulation We need the following fact. Let v be an eigenvector of the scatter matrix S = Σ_i x_i x_i^T. Then v belongs to the linear space spanned by the data points x_i, i = 1, 2, …, N. Proof: from S v = λ v we get v = (1/λ) Σ_i (x_i^T v) x_i, which is a linear combination of the x_i.

17 Kernel PCA Formulation… Let C be the scatter matrix of the centered mapping Φ(x): C = Σ_i Φ(x_i) Φ(x_i)^T. Let w be an eigenvector of C; then, by the fact above, w can be written as a linear combination w = Σ_i α_i Φ(x_i). Also, we have the eigen equation C w = λ w. Combining, we get Σ_j Φ(x_j) Φ(x_j)^T (Σ_i α_i Φ(x_i)) = λ Σ_i α_i Φ(x_i).

18 Kernel PCA Formulation… Taking the inner product of both sides with each Φ(x_k) turns this into an equation involving only K_ij = Φ(x_i)^T Φ(x_j) = k(x_i, x_j), the kernel or Gram matrix: K² α = λ K α, which is satisfied whenever K α = λ α.

19 Kernel PCA Formulation… From the eigen equation K α = λ α and the fact that the eigenvector w is normalized to 1, we obtain 1 = w^T w = Σ_{i,j} α_i α_j Φ(x_i)^T Φ(x_j) = α^T K α = λ α^T α, so the coefficient vector α must be rescaled to have length 1/√λ.

20 KPCA Algorithm Step 1: Compute the Gram matrix K_ij = k(x_i, x_j), i, j = 1…N. Step 2: Compute (eigenvalue, eigenvector) pairs of K: (λ_l, α^l), l = 1…N. Step 3: Normalize the eigenvectors: α^l ← α^l / √λ_l. Thus, an eigenvector w^l of C is now represented as w^l = Σ_i α^l_i Φ(x_i). To project a test feature Φ(x) onto w^l we need to compute only (w^l)^T Φ(x) = Σ_i α^l_i k(x_i, x). So, we never need Φ explicitly.
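A minimal sketch of these three steps (the RBF kernel and the helper names are my choices, and the Gram matrix is assumed to be already centered; centering is covered on the next slide):

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * d2)

def kpca_fit(X, d, gamma=1.0):
    K = rbf_kernel(X, X, gamma)                     # Step 1: Gram matrix (assumed centered)
    lam, alpha = np.linalg.eigh(K)                  # Step 2: eigenpairs, ascending order
    lam, alpha = lam[::-1][:d], alpha[:, ::-1][:, :d]   # keep top-d (assumed > 0)
    alpha = alpha / np.sqrt(lam)                    # Step 3: normalize so that w^T w = 1
    return alpha, lam

def kpca_project(X_train, alpha, X_test, gamma=1.0):
    # (w^l)^T Phi(x) = sum_i alpha^l_i k(x_i, x): only kernel evaluations, never Phi
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```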

21 Feature Map Centering So far we assumed that the feature map Φ(x) is centered for the data points x_1, …, x_N. Actually, this centering can be done on the Gram matrix without ever explicitly computing the feature map Φ(x): K̃ = K - 1_N K - K 1_N + 1_N K 1_N is the kernel matrix for centered features, where 1_N is the N × N matrix with all entries equal to 1/N. A similar expression exists for projecting test features onto the feature eigenspace. Scholkopf, Smola, Muller, "Nonlinear component analysis as a kernel eigenvalue problem," Technical Report #44, Max Planck Institute, 1996.
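A sketch of this centering in NumPy; the test-feature version is included for completeness in the standard form, although the slide does not spell it out:

```python
import numpy as np

def center_gram(K):
    """Center the N x N training Gram matrix: K~ = K - 1N K - K 1N + 1N K 1N."""
    N = K.shape[0]
    one_N = np.full((N, N), 1.0 / N)
    return K - one_N @ K - K @ one_N + one_N @ K @ one_N

def center_gram_test(K_test, K_train):
    """Center an M x N test-vs-train kernel matrix consistently with the training centering.
    (This form is standard but is my addition; the slide only states that it exists.)"""
    N = K_train.shape[0]
    one_N = np.full((N, N), 1.0 / N)
    one_MN = np.full((K_test.shape[0], N), 1.0 / N)
    return K_test - one_MN @ K_train - K_test @ one_N + one_MN @ K_train @ one_N
```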

22 KPCA: USPS Digit Recognition Kernel function: a polynomial kernel of degree d (degree 1 corresponds to linear PCA). Classifier: linear SVM with kernel principal components as features. N = 3000 training examples, each a 16-by-16 pixel image. Scholkopf, Smola, Muller, "Nonlinear component analysis as a kernel eigenvalue problem," Technical Report #44, Max Planck Institute, 1996.

