
1 Introduction to Kernel Principal Component Analysis (PCA). Mohammed Nasser, Dept. of Statistics, RU, Bangladesh. Email: mnasser.ru@gmail.com

2 Contents: Basics of PCA; Application of PCA in Face Recognition; Some Terms in PCA; Motivation for KPCA; Basics of KPCA; Applications of KPCA.

3 High-dimensional Data: gene expression, face images, handwritten digits.

4 Why Feature Reduction? Most machine learning and data mining techniques may not be effective for high-dimensional data – the Curse of Dimensionality: query accuracy and efficiency degrade rapidly as the dimension increases. The intrinsic dimension may be small – for example, the number of genes responsible for a certain type of disease may be small.

5 Why Reduce Dimensionality? 1. Reduces time complexity: less computation. 2. Reduces space complexity: fewer parameters. 3. Saves the cost of observing the feature. 4. Simpler models are more robust on small datasets. 5. More interpretable; simpler explanation. 6. Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions.

6 Feature reduction algorithms Unsupervised –Latent Semantic Indexing (LSI): truncated SVD –Independent Component Analysis (ICA) –Principal Component Analysis (PCA) –Canonical Correlation Analysis (CCA) Supervised –Linear Discriminant Analysis (LDA) Semi-supervised –Research topic

7 Algebraic derivation of PCs Main steps for computing PCs – Form the covariance matrix S. – Compute its eigenvectors: S u_i = λ_i u_i, with λ_1 ≥ λ_2 ≥ ... – Use the first d eigenvectors u_1, ..., u_d to form the d PCs. – The transformation G is given by G = [u_1, u_2, ..., u_d].
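A minimal NumPy sketch of these steps (illustrative only; the function name `pca_components` and the toy data are our own, not from the slides):

```python
import numpy as np

def pca_components(X, d):
    """Top-d principal directions of X (rows = samples, columns = features)."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # center the data
    S = np.cov(Xc, rowvar=False)           # covariance matrix S
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues in descending order
    G = eigvecs[:, order[:d]]              # first d eigenvectors form G
    return G, mean

# Usage: project toy data onto its first 2 principal components
X = np.random.randn(100, 5)
G, mean = pca_components(X, d=2)
scores = (X - mean) @ G                    # 100 x 2 matrix of PC scores
```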

8 Optimality property of PCA [Figure: original data, dimension reduction, and reconstruction.]

9 Optimality property of PCA Main theoretical result: the matrix G consisting of the first d eigenvectors of the covariance matrix S solves the following min problem: minimize Σ_i || x_i − G Gᵀ x_i ||² over G with Gᵀ G = I_d (the reconstruction error). That is, the PCA projection minimizes the reconstruction error among all linear projections of size d.

10 Dimensionality Reduction One approach to deal with high dimensional data is by reducing their dimensionality. Project high dimensional data onto a lower dimensional sub-space using linear or non-linear transformations.

11 Dimensionality Reduction Linear transformations are simple to compute and tractable. Classical – linear – approaches: Principal Component Analysis (PCA), Fisher Discriminant Analysis (FDA), Singular Value Decomposition (SVD), Factor Analysis (FA), Canonical Correlation Analysis (CCA). The reduction has the form y = W x, with y of size k × 1, W of size k × d, and x of size d × 1 (k << d).

12 Principal Component Analysis (PCA) Each dimensionality reduction technique finds an appropriate transformation by satisfying certain criteria (e.g., information loss, data discrimination, etc.) The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the dataset.

13 Principal Component Analysis (PCA) Find a basis in a low dimensional sub-space: approximate vectors by projecting them onto a low dimensional sub-space. (1) Original space representation: x = a_1 v_1 + a_2 v_2 + ... + a_N v_N, where v_1, ..., v_N is an orthonormal basis of R^N. (2) Lower-dimensional sub-space representation: x̂ = b_1 u_1 + b_2 u_2 + ... + b_K u_K, where u_1, ..., u_K is an orthonormal basis of a K-dimensional sub-space. Note: if K = N, then x̂ = x.

14 Principal Component Analysis (PCA) Example (K=N):

15 Principal Component Analysis (PCA) Methodology – Suppose x_1, x_2, ..., x_M are N × 1 vectors. Step 1: compute the mean x̄ = (1/M) Σ_i x_i. Step 2: subtract the mean: Φ_i = x_i − x̄. Step 3: form the N × M matrix A = [Φ_1 Φ_2 ... Φ_M] and compute the covariance matrix C = (1/M) A Aᵀ.

16 Principal Component Analysis (PCA) Methodology – cont. Step 4: compute the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_N and eigenvectors u_1, ..., u_N of C. Step 5: keep only the first K eigenvectors and approximate each centered vector by x − x̄ ≈ Σ_{i=1}^{K} b_i u_i, where b_i = u_iᵀ (x − x̄).

17 Principal Component Analysis (PCA) Linear transformation implied by PCA – The linear transformation R^N → R^K that performs the dimensionality reduction is y = Uᵀ (x − x̄), where U = [u_1 ... u_K] is the N × K matrix of the top K eigenvectors.
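As a small sketch of that transformation (assuming the eigenvector matrix `U_K` and mean `x_bar` were computed as on the previous slides):

```python
import numpy as np

def reduce_dimension(x, U_K, x_bar):
    """Map x from R^N to R^K: y = U_K^T (x - x_bar).

    U_K  : (N, K) matrix whose columns are the top-K eigenvectors.
    x_bar: (N,) mean vector of the training data.
    """
    return U_K.T @ (x - x_bar)
```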

18 Principal Component Analysis (PCA) How many principal components to keep? – To choose K, a common criterion is to require that the first K eigenvalues capture most of the variance: Σ_{i=1}^{K} λ_i / Σ_{i=1}^{N} λ_i > T, for a threshold T such as 0.9 or 0.95. Unfortunately, for some data sets, meeting this requirement needs K almost equal to N; that is, no effective data reduction is possible.
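A minimal sketch of that criterion (the 0.95 default threshold is our own illustrative choice):

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest K whose first K eigenvalues explain `threshold` of total variance."""
    eigvals = np.sort(eigvals)[::-1]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratio, threshold) + 1)
```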

19 Principal Component Analysis (PCA) [Figure: eigenvalue spectrum (scree plot), λ_i plotted against the component index up to λ_N, with the chosen K marking the cut-off.]

20 Principal Component Analysis (PCA) Standardization – The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume. – We should always standardize the data prior to using PCA. – A common standardization method is to transform all the data to have zero mean and unit standard deviation: x' = (x − μ) / σ for each variable.
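A one-function sketch of that standardization (column-wise z-scores; the use of the sample standard deviation is our assumption):

```python
import numpy as np

def standardize(X):
    """Column-wise z-scores: (x - mean) / std, for X of shape (n_samples, n_features)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```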

21 CS 479/679 Pattern Recognition – Spring 2006 Dimensionality Reduction Using PCA/LDA Chapter 3 (Duda et al.) – Section 3.8 Case Studies: Face Recognition Using Dimensionality Reduction M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, 3(1), pp. 71-86, 1991. D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), pp. 831-836, 1996. A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.

22 Principal Component Analysis (PCA) Face Recognition –The simplest approach is to think of it as a template matching problem –Problems arise when performing recognition in a high-dimensional space. –Significant improvements can be achieved by first mapping the data into a lower dimensionality space. –How to find this lower-dimensional space?

23 Principal Component Analysis (PCA) Main idea behind eigenfaces [Figure: the average face computed from the training images.]

24 Principal Component Analysis (PCA) Computation of the eigenfaces

25 Principal Component Analysis (PCA) Computation of the eigenfaces – cont.

26 Principal Component Analysis (PCA) Computation of the eigenfaces – cont. Note that each eigenface u_i is normalized to unit length.

27 Principal Component Analysis (PCA) Computation of the eigenfaces – cont.
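The equations for this computation did not survive the transcript; the sketch below follows the standard eigenfaces recipe (Turk & Pentland), which eigendecomposes the small M × M matrix AᵀA instead of the N × N covariance (function names are our own):

```python
import numpy as np

def eigenfaces(faces, k):
    """Top-k eigenfaces from `faces`, an (M, N) matrix with one vectorized image per row.

    If A^T A v = mu v for the small M x M matrix, then u = A v is an eigenvector
    of A A^T (and hence of the covariance) with the same eigenvalue.
    """
    mean_face = faces.mean(axis=0)
    A = (faces - mean_face).T                  # N x M matrix of centered faces
    small = A.T @ A                            # M x M instead of N x N
    eigvals, V = np.linalg.eigh(small)
    order = np.argsort(eigvals)[::-1][:k]
    U = A @ V[:, order]                        # map back to image space
    U /= np.linalg.norm(U, axis=0)             # normalize each eigenface u_i
    return U, mean_face
```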

28 Principal Component Analysis (PCA) Representing faces in this basis

29 Principal Component Analysis (PCA) Representing faces in this basis – cont.

30 Principal Component Analysis (PCA) Face Recognition Using Eigenfaces

31 Principal Component Analysis (PCA) Face Recognition Using Eigenfaces – cont. – The distance e_r is called the distance within the face space (difs). – Comment: we can use the common Euclidean distance to compute e_r; however, it has been reported that the Mahalanobis distance performs better: e_r² = Σ_{k=1}^{K} (1/λ_k) (w_k − w_k^{(i)})², where w and w^{(i)} are the eigenface weight vectors of the new face and of the i-th stored face.
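A hedged sketch of the two distance options (weight vectors and eigenvalues are assumed to come from the eigenface projection step; function names are ours):

```python
import numpy as np

def euclidean_difs(w_new, w_train):
    """Euclidean distance in face space between a new weight vector and each stored one."""
    return np.linalg.norm(w_train - w_new, axis=1)

def mahalanobis_difs(w_new, w_train, eigvals):
    """Mahalanobis-style distance: each component is down-weighted by its eigenvalue."""
    diff = w_train - w_new
    return np.sqrt(np.sum(diff**2 / eigvals, axis=1))
```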

32 Principal Component Analysis (PCA) Face Detection Using Eigenfaces

33 Principal Component Analysis (PCA) Face Detection Using Eigenfaces – cont.

34 Principal Components Analysis So, principal components are given by: b_1 = u_11 x_1 + u_12 x_2 + ... + u_1N x_N; b_2 = u_21 x_1 + u_22 x_2 + ... + u_2N x_N; ...; b_N = u_N1 x_1 + u_N2 x_2 + ... + u_NN x_N. The x_j's are standardized if the correlation matrix is used (mean 0.0, SD 1.0). Score of the i-th unit on the j-th principal component: b_ij = u_j1 x_i1 + u_j2 x_i2 + ... + u_jN x_iN.

35 PCA Scores [Figure: scatter of units on the original axes x_i1, x_i2, with the principal-component scores b_i,1, b_i,2 along the rotated axes.]

36 Principal Components Analysis Amount of variance accounted for by: 1st principal component: λ_1, the 1st eigenvalue; 2nd principal component: λ_2, the 2nd eigenvalue; ... with λ_1 > λ_2 > λ_3 > λ_4 > ... The average λ_j = 1 when the correlation matrix is used.

37 Principal Components Analysis: Eigenvalues [Figure: data scatter with the first principal axis U_1 and eigenvalues λ_1, λ_2 indicating the variance along each principal axis.]

38 PCA: Terminology The jth principal component is the jth eigenvector of the correlation/covariance matrix. Coefficients u_jk are elements of the eigenvectors and relate the original variables (standardized if using the correlation matrix) to the components. Scores are the values of units on the components (produced using the coefficients). The amount of variance accounted for by a component is given by its eigenvalue λ_j; the proportion of variance accounted for is λ_j / Σ λ_j. The loading of the kth original variable on the jth component is u_jk √λ_j, the correlation between variable and component.
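A small sketch of the loading formula (assuming `eigvecs` holds the eigenvectors as columns and `eigvals` the matching eigenvalues):

```python
import numpy as np

def loadings(eigvecs, eigvals):
    """Loading of variable k on component j: u_jk * sqrt(lambda_j).

    eigvecs: (n_variables, n_components), column j = jth eigenvector.
    eigvals: (n_components,) eigenvalues in the same order.
    """
    return eigvecs * np.sqrt(eigvals)   # sqrt(lambda_j) scales column j
```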

39 Principal Components Analysis Covariance Matrix: –Variables must be in same units –Emphasizes variables with most variance –Mean eigenvalue ≠1.0 –Useful in morphometrics, a few other cases Correlation Matrix: –Variables are standardized (mean 0.0, SD 1.0) –Variables can be in different units –All variables have same impact on analysis –Mean eigenvalue = 1.0
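A quick numerical check of the mean-eigenvalue point (the toy data and variable scales are our own):

```python
import numpy as np

X = np.random.randn(200, 4) * np.array([1.0, 10.0, 0.1, 5.0])  # variables on very different scales
R = np.corrcoef(X, rowvar=False)       # correlation matrix of the 4 variables
eigvals = np.linalg.eigvalsh(R)
print(eigvals.mean())                  # approximately 1.0: mean eigenvalue of a correlation matrix
```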

40 PCA: Potential Problems Lack of Independence –NO PROBLEM Lack of Normality –Normality desirable but not essential Lack of Precision –Precision desirable but not essential Many Zeroes in Data Matrix –Problem (use Correspondence Analysis)

41 Principal Component Analysis (PCA) PCA and classification (cont’d)

42 Motivation

43 ??????? Motivation

44 Linear projections will not detect the pattern.

45 Limitations of linear PCA λ_1 = λ_2 = λ_3 = 1/3

46 Nonlinear PCA Three popular methods are available: 1) Neural-network based PCA (E. Oja, 1982); 2) Method of Principal Curves (T. J. Hastie and W. Stuetzle, 1989); 3) Kernel based PCA (B. Schölkopf, A. Smola, and K. Müller, 1998).

47 [Figure: comparison of PCA and NPCA projections.]

48 Kernel PCA: The main idea

49 A Useful Theorem for Hilbert spaces Let H be a Hilbert space and x_1, ..., x_n ∈ H. Let M = span{x_1, ..., x_n}, and let u, v ∈ M. Then ⟨u, x_i⟩ = ⟨v, x_i⟩ for i = 1, ..., n implies u = v. Proof: try it yourself.

50 Kernel methods in PCA Linear PCA solves C v = λ v (1), where C = (1/l) Σ_{i=1}^{l} x_i x_iᵀ is the covariance matrix for the centered data X. For solutions v lying in the span of the centered data, (1) and (2) are equivalent conditions, where (2) is ⟨x_i, C v⟩ = λ ⟨x_i, v⟩ for i = 1, ..., l.

51 Kernel methods in PCA Now let us suppose a (possibly nonlinear) map Φ : R^N → F into a feature space F. In Kernel PCA we do the PCA in feature space, i.e., on Φ(x_1), ..., Φ(x_l) – remember about centering! (*) Possibly F is a very high dimensional space.

52 Kernel Methods in PCA Again, all solutions V with λ ≠ 0 lie in the space generated by Φ(x_1), ..., Φ(x_l). It has two useful consequences: 1) we may expand V = Σ_{i=1}^{l} α_i Φ(x_i); 2) we may instead solve the set of equations ⟨Φ(x_i), C V⟩ = λ ⟨Φ(x_i), V⟩, i = 1, ..., l, where C is now the covariance operator in feature space.

53 Kernel Methods in PCA Defining an l × l kernel matrix K by K_ij = ⟨Φ(x_i), Φ(x_j)⟩ = k(x_i, x_j), and using the result (1) in (2), we get K² α = l λ K α (3). But we need not solve (3): it can be shown easily that the following simpler system gives us the solutions that are interesting to us: K α = l λ α.

54 Kernel Methods in PCA Compute the eigenvalue problem for the kernel matrix, K α = l λ α. The solutions (λ_k, α^k) further need to be normalized by imposing ⟨V^k, V^k⟩ = 1, which translates into l λ_k ⟨α^k, α^k⟩ = 1. If x is our new observation, its feature value is Φ(x), and the kth principal score will be ⟨V^k, Φ(x)⟩ = Σ_{i=1}^{l} α_i^k k(x_i, x).
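A minimal sketch of this step (it assumes the kernel matrix and the new point's kernel values are already centered, as the next two slides describe; names are ours):

```python
import numpy as np

def kpca_scores(K_centered, k_new_centered, n_components):
    """Principal scores of a new point in kernel PCA.

    K_centered     : (l, l) centered kernel matrix of the training data.
    k_new_centered : (l,) centered kernel values for the new point x.
    """
    lams, alphas = np.linalg.eigh(K_centered)
    order = np.argsort(lams)[::-1][:n_components]   # keep leading (assumed positive) eigenvalues
    lams, alphas = lams[order], alphas[:, order]
    alphas = alphas / np.sqrt(lams)                 # rescale so each feature-space direction has unit norm
    return k_new_centered @ alphas                  # kth entry = kth principal score
```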

55 Kernel Methods in PCA Data centering: in feature space we replace Φ(x_i) by Φ̃(x_i) = Φ(x_i) − (1/l) Σ_{j=1}^{l} Φ(x_j). Hence, the kernel for the transformed space is k̃(x_i, x_j) = ⟨Φ̃(x_i), Φ̃(x_j)⟩.

56 Kernel Methods in PCA Expressed as an operation on the kernel matrix, this can be rewritten as K̃ = K − (1/l) j jᵀ K − (1/l) K j jᵀ + (1/l²) (jᵀ K j) j jᵀ, where j is the all-1s vector.
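In code the centering is a few matrix products (a sketch; `K` is the l × l kernel matrix):

```python
import numpy as np

def center_kernel(K):
    """K~ = K - 1_l K - K 1_l + 1_l K 1_l, with 1_l the l x l matrix whose entries are all 1/l."""
    l = K.shape[0]
    one_l = np.full((l, l), 1.0 / l)
    return K - one_l @ K - K @ one_l + one_l @ K @ one_l
```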

57 [Figure: Linear PCA vs. Kernel PCA. Kernel PCA captures the nonlinear structure of the data.]

58 [Figure: a second example of Linear PCA vs. Kernel PCA. Again, kernel PCA captures the nonlinear structure of the data.]

59 Algorithm Input: data X = {x_1, x_2, ..., x_l} in n-dimensional space. Process: compute the kernel matrix K_ij = k(x_i, x_j), i, j = 1, ..., l; center it to obtain K̃ (as on the previous slide); solve the eigenvalue problem for K̃ and normalize the eigenvectors α^k. Output: transformed data, i.e., the k-dimensional vector of projections of a (new) data point x onto this subspace, with components Σ_{i=1}^{l} α_i^k k̃(x_i, x).
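Putting the algorithm together, a minimal end-to-end sketch; the RBF kernel, the gamma value, and the concentric-circles toy data are our own illustrative choices, not prescribed by the slides:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2) for all pairs of rows of X and Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA projections of the training data X itself."""
    K = rbf_kernel(X, X, gamma)
    l = K.shape[0]
    one_l = np.full((l, l), 1.0 / l)
    Kc = K - one_l @ K - K @ one_l + one_l @ K @ one_l      # centering in feature space
    lams, alphas = np.linalg.eigh(Kc)
    order = np.argsort(lams)[::-1][:n_components]
    alphas = alphas[:, order] / np.sqrt(lams[order])         # unit-norm feature-space directions
    return Kc @ alphas                                       # (l, n_components) principal scores

# Usage: two concentric circles, a data set whose structure linear PCA cannot capture
theta = np.linspace(0, 2 * np.pi, 100)
circle = np.c_[np.cos(theta), np.sin(theta)]
X = np.vstack([circle, 3 * circle])
Z = kernel_pca(X, n_components=2, gamma=1.0)
```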

60 References I. T. Jolliffe (2002), Principal Component Analysis. B. Schölkopf, A. Smola, and K.-R. Müller (1998), Kernel Principal Component Analysis. B. Schölkopf and A. J. Smola (2002), Learning with Kernels. Christopher J. C. Burges (2005), Geometric Methods for Feature Extraction and Dimensional Reduction.

