Principal Component Analysis (PCA)


1 Principal Component Analysis (PCA)
Presented by Aycan YALÇIN

2 Outline of the Presentation
Introduction
Objectives of PCA
Terminology
Algorithm
Applications
Conclusion

3 Introduction

4 Introduction
Problem:
Analysis of multivariate data plays a key role in data analysis.
A multidimensional hyperspace is often difficult to visualize.
Goal: represent the data in a manner that facilitates the analysis.

5 Introduction (cont’d)
Objectives of unsupervised learning methods:
Reduce dimensionality
Score all observations
Cluster similar observations together
Well-known linear transformation methods: PCA, Factor Analysis, Projection Pursuit, etc.

6 Introduction (cont’d)
Benefits of dimensionality reduction:
The computational overhead of the subsequent processing stages is reduced.
Noise may be reduced.
A projection into a subspace of very low dimension is useful for visualizing the data.

7 Objectives of PCA

8 Objectives of PCA
Principal Component Analysis is a technique used to:
Reduce the dimensionality of the data set
Identify new, meaningful underlying variables
Lose minimum information, by finding the directions in which a cloud of data points is stretched most

9 Objectives of PCA (cont’d)
PCA, or the Karhunen-Loève transform, summarizes the variation in a set of (possibly) correlated attributes as a smaller set of uncorrelated components (principal components). These uncorrelated variables are linear combinations of the original variables.
The objective of PCA is to reduce dimensionality by extracting the smallest number of components that account for most of the variation in the original multivariate data, and so to summarize the data with little loss of information.

10 Terminology

11 Terminology
Variance
Covariance
Eigenvectors & Eigenvalues
Principal Components

12 Terminology (Variance)
Standard deviation: a measure of the average spread of the points about the mean
Variance: the standard deviation squared
Both are one-dimensional measures; they describe a single attribute at a time.
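A minimal sketch (assuming NumPy) of computing these one-dimensional measures for a made-up sample:

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])

# Sample variance: average squared deviation from the mean
# (ddof=1 gives the usual n-1 denominator for sample data).
variance = x.var(ddof=1)

# Standard deviation is the square root of the variance.
std_dev = x.std(ddof=1)

print(variance, std_dev)
```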

13 Terminology (Covariance)
Covariance measures how two dimensions vary from the mean with respect to each other:
cov(X,Y) > 0: the dimensions increase together
cov(X,Y) < 0: as one increases, the other decreases
cov(X,Y) = 0: the dimensions are uncorrelated (no linear relationship)

14 Terminology (Covariance Matrix)
The covariance matrix contains the covariance values between all possible pairs of dimensions. For three dimensions (x, y, z):

C = [ cov(x,x)  cov(x,y)  cov(x,z)
      cov(y,x)  cov(y,y)  cov(y,z)
      cov(z,x)  cov(z,y)  cov(z,z) ]

The matrix is always symmetric, and cov(x,x) is the variance of component x.
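A small sketch (assuming NumPy) that builds such a matrix for three made-up dimensions; np.cov expects each row to be one variable:

```python
import numpy as np

# Toy data: three dimensions (x, y, z), ten observations each.
rng = np.random.default_rng(0)
x = rng.normal(size=10)
y = 2.0 * x + rng.normal(scale=0.5, size=10)   # correlated with x
z = rng.normal(size=10)                        # roughly independent of x

# np.cov treats each row as one variable and returns the 3x3 covariance matrix.
C = np.cov(np.vstack([x, y, z]))

print(C)                        # 3x3 covariance matrix
print(np.allclose(C, C.T))      # True: the matrix is symmetric
print(C[0, 0], x.var(ddof=1))   # diagonal entries are the variances
```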

15 Terminology (Eigenvalues & Eigenvectors)
Eigenvalues measure the amount of variation explained by each PC (largest for the first PC, smaller for the subsequent PCs).
An eigenvalue > 1 indicates that the PC accounts for more variance than any single one of the original variables in standardized data. This is commonly used as a cutoff point for deciding which PCs are retained.
Eigenvectors provide the weights used to compute the uncorrelated PCs. These vectors give the directions in which the data cloud is stretched most.
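A minimal sketch (assuming NumPy, with made-up data) of the eigenvalue-greater-than-one cutoff on standardized data, i.e. on the correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))    # 100 observations, 5 hypothetical attributes
X[:, 1] += X[:, 0]               # introduce some correlation

# Standardizing the data amounts to working with the correlation matrix.
R = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]   # sorted largest first

retained = eigenvalues > 1.0                # "eigenvalue > 1" cutoff
print(eigenvalues)
print("PCs retained:", retained.sum())
```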

16 Terminology (Eigenvalues & Eigenvectors)
Vectors x that have the same direction as Ax are called eigenvectors of A (where A is an n x n matrix). In the equation Ax = λx, λ is called an eigenvalue of A.
Ax = λx  ⇔  (A − λI)x = 0
How to calculate x and λ:
Calculate det(A − λI); this yields a polynomial of degree n.
Determine the roots of det(A − λI) = 0; the roots are the eigenvalues λ.
Solve (A − λI)x = 0 for each λ to obtain the eigenvectors x.
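A short sketch (assuming NumPy) that finds the eigenvalues and eigenvectors of a small symmetric matrix and checks the defining equation Ax = λx:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are the
# corresponding (unit-length) eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, x in zip(eigenvalues, eigenvectors.T):
    # Verify the defining property A x = lambda x for each pair.
    print(lam, np.allclose(A @ x, lam * x))
```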

17 Terminology (Principal Component)
The extracted uncorrelated components are called principal components(PC) Estimated from the eigenvectors of the covariance or correlation matrix of the original variables. The projections of the data on the eigenvectors Extracted by linear transformations of the original variables so that the first few PC’s contain most of the variations in the original dataset.

18 Algorithm

19 Algorithm
We look for axes that minimise the projection error and maximise the variance after projection.
The goal is to map n-dimensional vectors to m-dimensional vectors, with m < n.
Example: transform from 2 dimensions to 1 dimension.
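To make the 2-to-1 example concrete, a small sketch (assuming NumPy, with made-up data) comparing how much variance each candidate axis keeps after projection:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
points = np.column_stack([x, 0.8 * x + rng.normal(scale=0.3, size=200)])  # correlated 2-D cloud
centered = points - points.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
best_axis = eigenvectors[:, np.argmax(eigenvalues)]    # direction of maximum variance
worst_axis = eigenvectors[:, np.argmin(eigenvalues)]

# Variance kept by each 2 -> 1 projection: the principal axis keeps far more.
print((centered @ best_axis).var(ddof=1), (centered @ worst_axis).var(ddof=1))
```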

20 Algorithm (cont’d)
Preserve as much of the variance as possible.
[Figure: rotating the axes keeps the direction with more information (variance); projecting onto fewer axes drops the direction with less information.]

21 Algorithm (cont’d)
Data is a matrix in which rows are observations (values) and columns are attributes (dimensions).
First, center the data by subtracting the mean in each dimension: DataAdjust[i][j] = Data[i][j] − mean_j, where i is the observation, j is the dimension, and the mean is taken over all m observations.
Then calculate the covariance matrix of DataAdjust.
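A minimal sketch (assuming NumPy, with a made-up data matrix) of these two steps, centering the data and forming the covariance matrix:

```python
import numpy as np

# Toy data matrix: rows are observations, columns are attributes (dimensions).
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0],
                 [2.3, 2.7]])

# Center each dimension by subtracting its mean over all observations.
data_adjust = data - data.mean(axis=0)

# Covariance matrix of the centered data (rowvar=False: columns are variables).
cov_matrix = np.cov(data_adjust, rowvar=False)
print(cov_matrix)
```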

22 Algorithm (cont’d)
Calculate the eigenvalues λ and eigenvectors x of the covariance matrix.
The eigenvalues λ_j are used to calculate the percentage of total variance V_j explained by each component j: V_j = 100 · λ_j / (λ_1 + λ_2 + ... + λ_n).
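A sketch (assuming NumPy, with an illustrative covariance matrix) of the eigen-decomposition and the percentage of total variance per component:

```python
import numpy as np

# Covariance matrix from the previous step (illustrative values).
cov_matrix = np.array([[0.62, 0.60],
                       [0.60, 0.72]])

# eigh is appropriate for symmetric matrices; eigenvalues come back ascending.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Percentage of total variance per component j: V_j = 100 * lambda_j / sum(lambda).
percent_variance = 100.0 * eigenvalues / eigenvalues.sum()
print(percent_variance)
```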

23 Algorithm (cont’d)
Choose components – form the feature vector:
Eigenvalues λ and eigenvectors x are sorted in descending order of λ.
The component with the highest λ is the first principal component.
FeatureVector = (x_1, ..., x_n), where each x_i is a column-oriented eigenvector; it contains the chosen components.
Derive the new dataset:
Transpose FeatureVector and DataAdjust, then FinalData = RowFeatureVector × RowDataAdjust.
FinalData is the original data expressed in terms of the chosen components; it has the eigenvectors as coordinate axes.
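A self-contained sketch of these steps (assuming NumPy and the same toy data as above):

```python
import numpy as np

# Toy 2-D data: rows are observations, columns are attributes.
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                 [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
data_adjust = data - data.mean(axis=0)                   # centered data
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data_adjust, rowvar=False))

# Sort eigenvalues (and matching eigenvectors) in descending order;
# the eigenvector with the largest eigenvalue is the first principal component.
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]

# FeatureVector: keep the top-k eigenvectors as columns (here k = 1).
k = 1
feature_vector = eigenvectors[:, :k]

# FinalData = RowFeatureVector x RowDataAdjust
# (rows of the result are the chosen components, columns are observations).
final_data = feature_vector.T @ data_adjust.T
print(final_data.shape)   # (1, 6): the data expressed along the chosen component
```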

24 Algorithm (cont’d)
Retrieving the old data (e.g. in data compression):
RetrievedRowData = (RowFeatureVector^T × FinalData) + OriginalMean
This yields the original data (approximately, if components were dropped) using only the chosen components.
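A self-contained sketch (assuming NumPy and the same toy data) of compressing to one component and retrieving an approximation of the original data:

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                 [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
original_mean = data.mean(axis=0)
data_adjust = data - original_mean

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data_adjust, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
row_feature_vector = eigenvectors[:, order][:, :1].T      # keep only the first PC

final_data = row_feature_vector @ data_adjust.T           # compressed representation

# RetrievedRowData = (RowFeatureVector^T x FinalData) + OriginalMean
retrieved = (row_feature_vector.T @ final_data).T + original_mean
print(np.round(retrieved - data, 3))   # small residuals: information lost by dropping a PC
```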

25 Algorithm (cont’d)
Estimating the number of PCs – the scree test:
Plotting the eigenvalues against the corresponding PCs produces a scree plot that illustrates the rate of change in the magnitude of the eigenvalues. The rate of decline tends to be fast at first and then levels off.
The "elbow", the point at which the curve bends, is considered to indicate the maximum number of PCs to extract. One PC fewer than the number at the elbow might be appropriate if you are concerned about getting an overly defined solution.
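A minimal sketch (assuming NumPy and Matplotlib, with made-up data) of drawing a scree plot:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
X[:, 1] += X[:, 0]            # add some correlation so the first eigenvalues dominate
X[:, 2] += 0.5 * X[:, 0]

cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending order

# Scree plot: eigenvalue magnitude against component index; look for the "elbow".
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Principal component")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```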

26 Applications

27 Applications
Example applications:
Computer vision: representation, pattern identification, image compression, face recognition
Gene expression analysis: determining a core set of conditions for useful gene comparison
Handwritten character recognition
Data compression, etc.

28 Conclusion

29 Conclusion
PCA can be useful when there is a high degree of correlation present among the attributes.
When a data set consists of several clusters, the principal axes found by PCA usually pick projections with good separation; PCA provides an effective basis for feature extraction in this case.
For data compression, PCA offers a useful self-organized learning procedure.

30 Conclusion (cont’d)
Shortcomings of PCA:
PCA requires diagonalising the covariance matrix C (dimension n x n), which is expensive if n is large.
PCA only finds linear sub-spaces.
It works best if the individual components are Gaussian-distributed (ICA, for example, does not rely on such a distribution).
PCA does not say how many target dimensions to use.

31 Questions?

