Presentation is loading. Please wait.

Presentation is loading. Please wait.

1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006 Principal component analysis (PCA) is a technique that is useful for the compression and classification.

Similar presentations


Presentation on theme: "1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006 Principal component analysis (PCA) is a technique that is useful for the compression and classification."— Presentation transcript:

1 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Principal component analysis (PCA) is a technique that is useful for the compression and classification of data. The purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set, that nonetheless retains most of the sample's information. 6.Principal Component Analysis

2 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis By information we mean the variation present in the sample, given by the correlations between the original variables. The new variables, called principal components (PCs), are uncorrelated, and are ordered by the fraction of the total information each retains.

3 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis Principal component direction of maximum variance in the input space principal eigenvector of the covariance matrix Goal: Relate these two definitions

4 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis Variance A random variable fluctuating about its mean value Average of the square of the fluctuations

5 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis Covariance Pair of random variables, each fluctuating about its mean value. Average of product of fluctuations

6 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

7 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis Covariance matrix N random variables NxN symmetric matrix Diagonal elements are variances

8 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis eigenvectors with k largest eigenvalues Now you can calculate them, but what do they mean? Principal components

9 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis Covariance to variance From the covariance, the variance of any projection can be calculated. w : unit vector

10 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis Maximizing parallel variance Principal eigenvector of C (the one with the largest eigenvalue)

11 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis Geometric picture of principal components the 1st PC z 1 is a minimum distance fit to a line in X space the 2nd PC z 2 is a minimum distance fit to a line in the plane perpendicular to the 1st PC

12 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis Then…

13 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

14 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

15 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

16 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

17 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

18 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

19 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

20 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

21 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

22 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

23 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

24 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

25 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

26 1er. Escuela Red ProTIC - Tandil, de Abril, Principal Component Analysis

27 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Algebraic definition of PCs Given a sample of n observations on a vector of p variables x = (x 1,x 2,….,x p ) define the first principal component of the sample by the linear transformation λ where the vector is chosen such thatis maximum 6.Principal Component Analysis

28 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Likewise, define the k th PC of the sample by the linear transformation where the vector is chosen such that is maximum subject to 6.Principal Component Analysis

29 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 To find first note that is the covariance matrix for the variables 6.Principal Component Analysis

30 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 To find maximize subject to Let λ be a Lagrange multiplier by differentiating… then maximize is an eigenvector of corresponding to eigenvalue therefore 6.Principal Component Analysis

31 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 We have maximized Algebraic derivation of So is the largest eigenvalue of The first PC retains the greatest amount of variation in the sample. 6.Principal Component Analysis

32 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 To find vector maximize then let λ and φ be Lagrange multipliers, and maximize subject to First note that 6.Principal Component Analysis

33 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 We find that is also an eigenvector of whose eigenvalue is the second largest. In general The k th largest eigenvalue of is the variance of the k th PC. The k th PC retains the k th greatest fraction of the variation in the sample. 6.Principal Component Analysis

34 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Algebraic formulation of PCA Given a sample of n observations define a vector of p PCs according to on a vector of p variables whose k th column is the k th eigenvector of where is an orthogonal p x p matrix 6.Principal Component Analysis

35 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Then is the covariance matrix of the PCs, being diagonal with elements 6.Principal Component Analysis

36 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 In general it is useful to define standardized variables by If the are each measured about their sample mean then the covariance matrix of will be equal to the correlation matrix of and the PCs will be dimensionless 6.Principal Component Analysis

37 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Practical computation of PCs Given a sample of n observations on a vector of p variables (each measured about its sample mean) compute the covariance matrix where is the n x p matrix whose i th row is the i th observation 6.Principal Component Analysis

38 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Then, compute the n x p matrix whose i th row is the PC score for the i th observation: Write to decompose each observation into PCs: 6.Principal Component Analysis

39 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 Usage of PCA: Data compression Because the k th PC retains the k th greatest fraction of the variation, we can approximate each observation by truncating the sum at the first m < p PCs: 6.Principal Component Analysis

40 1er. Escuela Red ProTIC - Tandil, de Abril, 2006 This reduces the dimensionality of the data from p to m < p by approximating where is the n x m portion of and is the p x m portion of 6.Principal Component Analysis


Download ppt "1er. Escuela Red ProTIC - Tandil, 18-28 de Abril, 2006 Principal component analysis (PCA) is a technique that is useful for the compression and classification."

Similar presentations


Ads by Google