
Stat240: Principal Component Analysis (PCA)

Open/closed book examination data

> scores = as.matrix(read.table("http://www1.maths.leeds.ac.uk/~charles/mva-data/openclosedbook.dat", head=T))
> colnames(scores)
[1] "MC" "VC" "LO" "NO" "SO"
> pairs(scores)

Sample Variance-Covariance

> cov.scores = cov(scores)
> round(cov.scores, 2)        # 5 x 5 covariance matrix of MC, VC, LO, NO, SO (entries not recoverable from the slide)
> eigen.value = eigen(cov.scores)$values
> round(eigen.value, 2)       # eigenvalues = variances of the principal components
> eigen.vec = eigen(cov.scores)$vectors
> round(eigen.vec, 2)         # columns [,1]..[,5] are the loadings
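As a minimal, self-contained sketch of the decomposition above (on simulated data, since the exam-score matrix itself is not reproduced here): the eigenvalues of the sample covariance matrix are exactly the variances of the PC scores, and the eigenvectors are the loadings.

```r
# Simulated stand-in for the scores matrix (the real data are not shown above).
set.seed(1)
X <- matrix(rnorm(100 * 3), ncol = 3)
X[, 2] <- X[, 1] + 0.5 * X[, 2]                  # induce some correlation

S <- cov(X)                                      # sample covariance matrix
e <- eigen(S)
e$values                                         # variances of PC1..PC3, in decreasing order
e$vectors                                        # loadings: column j defines PC j

# PC scores = centered data projected onto the eigenvectors;
# their sample variances recover the eigenvalues.
pc.scores <- scale(X, center = TRUE, scale = FALSE) %*% e$vectors
all.equal(as.numeric(diag(cov(pc.scores))), e$values)   # TRUE
```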

Principal Components

PC1 through PC5 are the linear combinations of MC, VC, LO, NO and SO whose coefficients are the corresponding columns of eigen.vec (the numeric loadings on the slide are not recoverable here).

Scree plot

> plot(1:5, eigen.value, xlab="i", ylab="variance", main="scree plot", type="b")
> round(cumsum(eigen.value)/sum(eigen.value), 3)   # cumulative proportion of variance explained

"princomp"

R has a built-in function to conduct PCA:

> help(princomp)
> obj = princomp(scores)
> plot(obj, type="lines")
> biplot(obj)
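A small sketch (simulated data) of how princomp relates to the manual eigen-decomposition: princomp divides by n rather than n - 1 when forming the covariance matrix, so its component variances equal eigen(cov(X))$values scaled by (n - 1)/n.

```r
set.seed(1)
X <- matrix(rnorm(50 * 4), ncol = 4)
obj <- princomp(X)

n <- nrow(X)
manual <- eigen(cov(X))$values * (n - 1) / n    # rescale for princomp's divisor n
all.equal(as.numeric(obj$sdev^2), manual)       # TRUE

summary(obj)                  # std. deviations and cumulative proportion of variance
# plot(obj, type = "lines")   # scree plot, as on the slide
# biplot(obj)                 # observations and loadings in one display
```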

PCA in checking the MVN assumption

Examine the normality of the PCs, especially the first two:
- Histograms and q-q plots of individual PC scores
- Bivariate plots of pairs of PCs
- Checking for outliers
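A sketch of this check on simulated data: under multivariate normality every linear combination of the variables is normal, so q-q plots of the leading PC scores should be roughly straight lines.

```r
set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)    # simulated multivariate normal data
pc <- princomp(X)$scores

par(mfrow = c(1, 3))
qqnorm(pc[, 1], main = "PC1"); qqline(pc[, 1])   # should be near-linear under MVN
qqnorm(pc[, 2], main = "PC2"); qqline(pc[, 2])
plot(pc[, 1], pc[, 2], xlab = "PC1", ylab = "PC2")  # bivariate plot; isolated
                                                     # points flag potential outliers
```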

PCA in regression

Data: Y (n x 1), X (n x p). PCA is useful when we want to regress Y on a large number of correlated independent variables (X):
- Reduce dimension
- Handle collinearity

One transforms X into its principal components and regresses Y on a subset of them. How should the principal components be chosen?

PCA in regression (continued)

A common misconception is to retain only the PCs with the largest variances:
- PCs with large variances often do explain the dependent variable well,
- but PCs with small variances may also have predictive value.
- PCs should therefore be selected by their correlation with Y, not by their variance alone.
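The point can be illustrated with simulated data in which Y is driven by the smallest-variance PC: variance-based selection would discard exactly the component that predicts Y, while correlation-based selection finds it.

```r
set.seed(1)
X <- matrix(rnorm(100 * 4), ncol = 4)
X[, 1] <- 5 * X[, 1]                    # give variable 1 (hence PC1) a large variance
pc <- princomp(X)
Z <- pc$scores                          # principal component scores

Y <- Z[, 4] + rnorm(100, sd = 0.1)      # Y depends on the smallest-variance PC

round(pc$sdev^2, 2)                     # PC4 has the smallest variance ...
round(as.numeric(cor(Z, Y)), 2)         # ... but by far the largest |cor| with Y
summary(lm(Y ~ Z[, 4]))$r.squared       # the "small" PC predicts Y well
```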

Factor Analysis (FA)

PCA vs FA

Both attempt data reduction:
- PCA leads to principal components: each of PC1, ..., PC4 is a linear combination of the observed variables X1, ..., X4.
- FA leads to factors: the observed X1, ..., X4 are modeled as linear combinations of a smaller number of latent factors F1, F2, F3.

FA in R

The function is "factanal". Example:

v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
obj <- factanal(m1, factors=2)
obj <- factanal(covmat=cov(m1), factors=2)
plot(obj$loadings, type="n")
text(obj$loadings, labels=c("v1","v2","v3","v4","v5","v6"))

The default estimation method is maximum likelihood (MLE); the default rotation used by "factanal" is varimax.

Example: Examination Scores

p = 6 subjects: Gaelic, English, History, Arithmetic, Algebra, Geometry
n = 220 male students
R = (the 6 x 6 correlation matrix on the slide is not recoverable here)

Factor Rotation

Motivation: obtain loadings that are easier to interpret.

Varimax criterion: choose the rotation that maximizes the total variance of the squared (scaled) loadings.
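A minimal sketch using stats::varimax on a hand-made loadings matrix (the numbers are illustrative, not from the lecture data): the rotation concentrates each variable's squared loading on one factor, producing "simpler" structure.

```r
# Illustrative 2-factor loadings for 4 variables (not from the slides).
L <- matrix(c(0.7,  0.7, 0.7,  0.7,
              0.5, -0.5, 0.5, -0.5), ncol = 2)

vm <- varimax(L)        # maximizes the variance of the squared (scaled) loadings
vm$loadings             # rotated loadings: each row dominated by one factor
vm$rotmat               # the 2 x 2 rotation matrix applied to L

crossprod(vm$rotmat)    # approximately the identity: the rotation is orthogonal
```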