Principal Component Analysis Zelin Jia Shengbin Lin 10/20/2015.


1 Principal Component Analysis Zelin Jia Shengbin Lin 10/20/2015

2 What is PCA? An orthogonal transformation. It converts correlated variables into artificial, uncorrelated variables (principal components). The resulting vectors form an orthogonal basis set. A tool in exploratory data analysis. https://en.wikipedia.org/wiki/Principal_component_analysis

3 Why use PCA? Reduce the dimensionality of the data. Compress the data. Prepare the data for further analysis using other techniques. Understand your data better by interpreting the loadings and by graphing the derived variables. http://psych.colorado.edu/wiki/lib/exe/fetch.php?media=labs:learnr:emily_-_principal_components_analysis_in_r:pca_how_to.pdf Dr. Peter Westfall

4 How PCA works
1. PCA begins with the covariance matrix: $\mathrm{Cov}(X) = X^T X / (n - 1)$, where $X$ is the column-centered data matrix with $n$ rows.
2. Calculate the eigenvectors and eigenvalues of the covariance matrix.
3. This gives sets of eigenvectors $z_i$ and eigenvalues $\lambda_i$ (constraint: $z_i^T z_i = 1$).
4. Arrange the eigenvectors in decreasing order of their eigenvalues.
5. Pick the leading eigenvectors and multiply the data matrix $X$ by them to get the PC matrix.
https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca
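The five steps can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the data are randomly generated (not the deck's data), the names `Xc`, `Z`, `scores` and `k` are made up here, and the eigenvectors are applied to the centered data rather than the raw data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 25 observations of 3 correlated variables.
latent = rng.normal(size=(25, 1))
X = latent @ np.array([[1.0, 0.8, 0.5]]) + 0.3 * rng.normal(size=(25, 3))

# Step 1: center the columns and form the covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (Xc.shape[0] - 1)

# Steps 2-3: eigh handles symmetric matrices and returns unit-length
# eigenvectors as columns, with eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: reorder so the largest eigenvalue comes first.
order = np.argsort(eigvals)[::-1]
eigvals, Z = eigvals[order], eigvecs[:, order]

# Step 5: keep the first k eigenvectors and project the data onto them.
k = 2
scores = Xc @ Z[:, :k]
```

The sample variance of each column of `scores` equals the corresponding eigenvalue, which is why sorting by eigenvalue sorts the components by variance.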

5 Example of how PCA works (in R) A sample financial data set with 8 variables and 25 observations. Perform PCA on this data and reduce the number of variables from 8 to something more manageable. https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca

6 Simulating PCA on uncorrelated data and highly correlated data (in R) PCA works better on more highly correlated data, because greater dimension reduction is achievable. Provided by Dr. Peter Westfall
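The slide's simulation was done in R and is not reproduced in the transcript. The following NumPy sketch is an assumed stand-in, not the original code: it generates 8 uncorrelated variables and 8 variables that share one common factor, then compares how much of the total variance the first component captures. The helper `first_pc_share` and the noise level 0.2 are choices made for this illustration.

```python
import numpy as np

def first_pc_share(X):
    """Fraction of total variance captured by the first principal component."""
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))  # ascending order
    return eigvals[-1] / eigvals.sum()

rng = np.random.default_rng(1)
n, p = 500, 8

# Case 1: eight independent standard normal variables.
uncorrelated = rng.normal(size=(n, p))

# Case 2: eight variables driven by one shared factor plus small noise.
common = rng.normal(size=(n, 1))
correlated = common + 0.2 * rng.normal(size=(n, p))

share_uncorrelated = first_pc_share(uncorrelated)
share_correlated = first_pc_share(correlated)
```

With uncorrelated inputs the first component carries only slightly more than its 1/8 share, so little can be discarded. With highly correlated inputs one component carries most of the variance.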

7 PCA standardization Why: without standardization, a variable measured in small numbers, even if it is the more important one, is overwhelmed by variables measured in larger numbers in what it contributes to the covariance. https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca
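A small sketch of the scale effect, assuming two made-up variables where one is recorded in units about 1000 times larger than the other. The names `small`, `large` and `first_loading` are hypothetical and exist only for this example.

```python
import numpy as np

def first_loading(X):
    """Unit-length eigenvector of the covariance matrix with the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return eigvecs[:, -1]

rng = np.random.default_rng(2)
n = 200
small = rng.normal(size=n)                          # values of order 1
large = 1000 * (0.5 * small + rng.normal(size=n))   # values of order 1000
X = np.column_stack([small, large])

# PCA on the raw data: the large-scale variable dominates the first PC.
raw_loading = first_loading(X)

# PCA on standardized data (mean 0, standard deviation 1 per column).
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
standardized_loading = first_loading(Xs)
```

On the raw data the first loading vector points almost entirely along `large`. After standardization both variables get equal weight, which for two variables is always the case because the analysis is then run on a 2 × 2 correlation matrix.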

8 Properties of PCs The number of principal components is less than or equal to the number of original variables. The first principal component has the largest possible variance. Each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. https://en.wikipedia.org/wiki/Principal_component_analysis

9 What is SVD? Applied_Regression_Analysis_A_Research_Tool.pdf

10 Relationship between SVD and PCA From the SVD we have $X = U L^{1/2} Z^T$, and therefore $W = XZ = U L^{1/2}$. If $X$ is an $n \times p$ matrix of observations on $p$ variables, each column of $W$ is a new variable defined as a linear transformation of the original variables. Applied_Regression_Analysis_A_Research_Tool.pdf
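The identity $W = XZ = U L^{1/2}$ can be checked numerically. The sketch below assumes a randomly generated, column-centered matrix and uses `numpy.linalg.svd`, whose singular values `s` play the role of the diagonal of $L^{1/2}$. The variable names mirror the slide's symbols but are otherwise arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 25 observations on 4 variables, then column-centered.
X = rng.normal(size=(25, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = U @ diag(s) @ Zt, with singular values in descending order.
U, s, Zt = np.linalg.svd(Xc, full_matrices=False)
Z = Zt.T

# Principal component scores computed two ways.
W_projection = Xc @ Z   # W = X Z
W_from_svd = U * s      # W = U L^{1/2}; scales column j of U by s[j]

# The squared singular values are the eigenvalues of Xc^T Xc.
eigvals = np.linalg.eigvalsh(Xc.T @ Xc)[::-1]
```

Both routes give the same score matrix because $Z^T Z = I$, so multiplying the SVD on the right by $Z$ cancels $Z^T$.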

11 EFA vs PCA EFA: EFA provides a model to explain why the data look the way they do. PCA: PCA is not a model that explains how the data look. There is no model at all. Provided by Dr. Peter Westfall

12 EFA vs PCA http://www.gac-usp.com.br/resources/use_of_exploratory_factor_analysis_park_dailey.pdf

13 EFA vs PCA EFA: in EFA one postulates that there is a smaller set of unobserved (latent) variables or constructs underlying the variables actually observed or measured (this is commonly done to assess validity) PCA: in PCA one is simply trying to mathematically derive a relatively small number of variables to use to convey as much of the information in the observed/measured variables as possible http://www.gac-usp.com.br/resources/use_of_exploratory_factor_analysis_park_dailey.pdf

14 Application of PCA Data visualization Image compression

15 Data visualization If a multivariate dataset is visualized as a set of coordinates in a high-dimensional data space (1 axis per variable), PCA can supply the user with a lower-dimensional picture. https://en.wikipedia.org/wiki/Principal_component_analysis

16 Using PCA for image compression The PCA formulation may be used as a digital image compression algorithm with a low level of loss. http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1679-45082012000200004
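A rough sketch of the idea, not the method of the cited paper: treat a grayscale image as a matrix whose rows are observations, keep only the first `k` principal components, and reconstruct. The 64 × 64 "image" below is synthetic and `k = 5` is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic 64 x 64 grayscale image: smooth patterns plus mild noise.
t = np.linspace(0, 1, 64)
image = (np.outer(np.sin(2 * np.pi * t), np.cos(2 * np.pi * t))
         + 0.5 * np.outer(t, t)
         + 0.01 * rng.normal(size=(64, 64)))

# Center the columns and get the principal directions from the SVD.
mean = image.mean(axis=0)
centered = image - mean
U, s, Zt = np.linalg.svd(centered, full_matrices=False)

# Keep k components: store the scores, the k directions, and the mean.
k = 5
scores = centered @ Zt[:k].T
approx = scores @ Zt[:k] + mean

rel_error = np.linalg.norm(image - approx) / np.linalg.norm(image)
stored_values = scores.size + Zt[:k].size + mean.size
```

Here `stored_values` is much smaller than `image.size`, and `rel_error` measures what is lost. Real photographs are less regular than this synthetic example, so they generally need more components for the same error.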

17 princomp vs prcomp For prcomp: "The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy." For princomp: "The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use svd on x, as is done in prcomp." http://stats.stackexchange.com/questions/20101/what-is-the-difference-between-r-functions-prcomp-and-princomp
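The two routes can be mimicked in NumPy. This is an analogy under assumptions, not the R source code: the SVD route divides by n - 1, while the eigen route below uses divisor n, which is how R's documentation describes the default covariance in princomp. The data are random and the variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical data: 25 observations on 3 variables.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(25, 3)) @ mixing
n = X.shape[0]
Xc = X - X.mean(axis=0)

# SVD route (prcomp-like): component standard deviations from the
# singular values of the centered data, using divisor n - 1.
s = np.linalg.svd(Xc, compute_uv=False)
sdev_svd = s / np.sqrt(n - 1)

# Eigen route (princomp-like): eigenvalues of the covariance matrix
# computed with divisor n, sorted into descending order.
eigvals = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]
sdev_eig = np.sqrt(eigvals)

# The two sets of standard deviations differ only by sqrt(n / (n - 1)).
ratio = sdev_svd / sdev_eig
```

Apart from that constant factor the results agree. The SVD route avoids forming `Xc.T @ Xc` explicitly, which is the usual reason it is considered more accurate numerically.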

18 Thanks!

