Exploratory Factor Analysis


Exploratory Factor Analysis STA431: Spring 2017 This is cut down from the 2015 EFA slides to save time. I eliminated material like communality that was preparation for understanding the output of proc factor. Put the 2015 material in the text. In the text, use R’s psych package. See last slide for copyright information.

Factor Analysis: The Measurement Model

D observable, F latent:

\mathbf{D}_i = \boldsymbol{\Lambda}\mathbf{F}_i + \mathbf{e}_i

(Path diagram: common factors F1 and F2, each with arrows to the observed variables D1, …, D8.)

Example with 2 factors and 8 observed variables

\mathbf{D}_i = \boldsymbol{\Lambda}\mathbf{F}_i + \mathbf{e}_i

\left( \begin{array}{c} D_{i,1} \\ D_{i,2} \\ D_{i,3} \\ D_{i,4} \\ D_{i,5} \\ D_{i,6} \\ D_{i,7} \\ D_{i,8} \end{array} \right) = \left( \begin{array}{cc} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \\ \lambda_{31} & \lambda_{32} \\ \lambda_{41} & \lambda_{42} \\ \lambda_{51} & \lambda_{52} \\ \lambda_{61} & \lambda_{62} \\ \lambda_{71} & \lambda_{72} \\ \lambda_{81} & \lambda_{82} \end{array} \right) \left(\begin{array}{c} F_{i,1} \\ F_{i,2} \end{array}\right) + \left(\begin{array}{c} e_{i,1} \\ e_{i,2} \\ e_{i,3} \\ e_{i,4} \\ e_{i,5} \\ e_{i,6} \\ e_{i,7} \\ e_{i,8} \end{array}\right)

The lambda values are called factor loadings.

Terminology The lambda values are called factor loadings. F1 and F2 are sometimes called common factors, because they influence all the observed variables. Error terms e1, …, e8 are sometimes called unique factors, because each one influences only a single observed variable.

\begin{eqnarray*} D_{i,1} &=& \lambda_{11} F_{i,1} + \lambda_{12} F_{i,2} + e_{i,1} \\ D_{i,2} &=& \lambda_{21} F_{i,1} + \lambda_{22} F_{i,2} + e_{i,2} \mbox{ ~etc.} \end{eqnarray*}

Factor Analysis can be Exploratory: The goal is to describe and summarize the data by explaining a large number of observed variables in terms of a smaller number of latent variables (factors). The factors are the reason the observable variables have the correlations they do. Confirmatory: Statistical estimation and testing as usual.

Unconstrained (Exploratory) Factor Analysis Of course one can have lots of factors: for example, the 16PF. HW: Does this pass the test of the parameter count rule? Arrows go from all factors to all observed variables. Massively non-identifiable. This is reasonable, has been going on for around 70-100 years, and is completely DOOMED TO FAILURE as a method of statistical estimation.

Calculate the covariance matrix

\begin{eqnarray*} \mathbf{D}_i &=& \boldsymbol{\Lambda}\mathbf{F}_i+\mathbf{e}_i \\ cov(\mathbf{F}_i) &=& \boldsymbol{\Phi} \\ cov(\mathbf{e}_i) &=& \boldsymbol{\Omega} \end{eqnarray*}

with $\mathbf{F}_i$ and $\mathbf{e}_i$ independent (multivariate normal). Then

cov(\mathbf{D}_i) = \boldsymbol{\Sigma} = \boldsymbol{\Lambda\Phi\Lambda}^\top + \boldsymbol{\Omega}
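The covariance identity above can be checked numerically. This is a minimal sketch, assuming numpy; the loading and error-variance values are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 2-factor, 8-variable model (numbers made up for illustration).
Lambda = rng.uniform(0.3, 0.9, size=(8, 2))    # factor loadings
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])                   # cov(F)
Omega = np.diag(rng.uniform(0.2, 0.6, size=8)) # cov(e), diagonal

# Model-implied covariance: Sigma = Lambda Phi Lambda' + Omega
Sigma = Lambda @ Phi @ Lambda.T + Omega

# Simulate D = Lambda F + e and compare the sample covariance to Sigma.
n = 200_000
F = rng.multivariate_normal(np.zeros(2), Phi, size=n)
e = rng.multivariate_normal(np.zeros(8), Omega, size=n)
D = F @ Lambda.T + e

S = np.cov(D, rowvar=False)
print(np.max(np.abs(S - Sigma)))  # largest discrepancy; shrinks as n grows
```

With n this large the sample covariance matrix agrees with the model-implied Sigma to about two decimal places.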

A Re-parameterization

\boldsymbol{\Sigma} = \boldsymbol{\Lambda\Phi\Lambda}^\top + \boldsymbol{\Omega}

Square root matrix: $\boldsymbol{\Phi} = \mathbf{SS} = \mathbf{SS}^\top$, so

\begin{eqnarray*} \boldsymbol{\Lambda\Phi\Lambda}^\top &=& \boldsymbol{\Lambda}\mathbf{SS}^\top\boldsymbol{\Lambda}^\top \\ &=& (\boldsymbol{\Lambda}\mathbf{S}) \mathbf{I} (\mathbf{S}^\top\boldsymbol{\Lambda}^\top) \\ &=& (\boldsymbol{\Lambda}\mathbf{S}) \mathbf{I} (\boldsymbol{\Lambda}\mathbf{S})^\top \\ &=& \boldsymbol{\Lambda}_2 \mathbf{I} \boldsymbol{\Lambda}_2^\top \end{eqnarray*}

Parameters are not identifiable Two distinct (Lambda, Phi, Omega) triples give the same Sigma, and hence the same distribution of the data (under normality). Actually, there are infinitely many. Let Q be an arbitrary covariance matrix for F. Recall

\boldsymbol{\Sigma} = \boldsymbol{\Lambda\Phi\Lambda}^\top + \boldsymbol{\Omega} \mbox{ ~and~ } \boldsymbol{\Lambda\Phi\Lambda}^\top = \boldsymbol{\Lambda}_2 \mathbf{I} \boldsymbol{\Lambda}_2^\top

\begin{eqnarray*} \boldsymbol{\Lambda}_2 \mathbf{I} \boldsymbol{\Lambda}_2^\top &=& \boldsymbol{\Lambda}_2 \mathbf{Q}^{-\frac{1}{2}} \mathbf{Q} \mathbf{Q}^{-\frac{1}{2}} \boldsymbol{\Lambda}_2^\top \\ &=& (\boldsymbol{\Lambda}_2 \mathbf{Q}^{-\frac{1}{2}}) \mathbf{Q} (\mathbf{Q}^{-\frac{1}{2}\top} \boldsymbol{\Lambda}_2^\top) \\ &=& (\boldsymbol{\Lambda}_2 \mathbf{Q}^{-\frac{1}{2}}) \mathbf{Q} (\boldsymbol{\Lambda}_2 \mathbf{Q}^{-\frac{1}{2}})^\top \\ &=& \boldsymbol{\Lambda}_3 \mathbf{Q} \boldsymbol{\Lambda}_3^\top \end{eqnarray*}
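The algebra above can be verified directly: absorbing an arbitrary factor covariance Q into the loadings leaves Sigma unchanged. A numpy sketch, with an arbitrary made-up Q:

```python
import numpy as np

rng = np.random.default_rng(0)

def sym_sqrt_inv(Q):
    """Symmetric inverse square root Q^{-1/2} via eigendecomposition."""
    vals, vecs = np.linalg.eigh(Q)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# Start from the standardized parameterization: cov(F) = I.
Lambda2 = rng.uniform(0.3, 0.9, size=(8, 2))
Omega = np.diag(rng.uniform(0.2, 0.6, size=8))
Sigma = Lambda2 @ Lambda2.T + Omega

# Pick an arbitrary covariance matrix Q for F and absorb it into the loadings:
# Lambda3 = Lambda2 Q^{-1/2}, so Lambda3 Q Lambda3' = Lambda2 Lambda2'.
Q = np.array([[2.0, 0.7],
              [0.7, 1.5]])
Lambda3 = Lambda2 @ sym_sqrt_inv(Q)

Sigma_alt = Lambda3 @ Q @ Lambda3.T + Omega
print(np.allclose(Sigma, Sigma_alt))  # True: different parameters, same Sigma
```

Different (Lambda, Phi) pairs, identical Sigma: the data cannot distinguish them.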

Restrict the model Set Phi = the identity, so cov(F) = I. All the factors are standardized, as well as independent. Justify this on the grounds of simplicity. Say the factors are “orthogonal” (at right angles, uncorrelated).

\boldsymbol{\Lambda\Phi\Lambda}^\top = \boldsymbol{\Lambda}_2 \mathbf{I} \boldsymbol{\Lambda}_2^\top

Another Source of Non-identifiability Let R be an orthogonal (rotation) matrix, so $\mathbf{RR}^\top = \mathbf{I}$.

\begin{eqnarray*} \boldsymbol{\Sigma} &=& \boldsymbol{\Lambda}\boldsymbol{\Lambda}^\top + \boldsymbol{\Omega} \\ &=& \boldsymbol{\Lambda}\mathbf{RR}^\top\boldsymbol{\Lambda}^\top + \boldsymbol{\Omega} \\ &=& (\boldsymbol{\Lambda}\mathbf{R}) (\mathbf{R}^\top\boldsymbol{\Lambda}^\top) + \boldsymbol{\Omega} \\ &=& (\boldsymbol{\Lambda}\mathbf{R}) (\boldsymbol{\Lambda}\mathbf{R})^\top + \boldsymbol{\Omega} \\ &=& \boldsymbol{\Lambda}_2\boldsymbol{\Lambda}_2^\top + \boldsymbol{\Omega} \end{eqnarray*}

Infinitely many rotation matrices produce the same Sigma.
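The rotation argument is easy to see numerically: post-multiplying the loadings by any 2x2 rotation matrix leaves Sigma untouched. A sketch with an arbitrary angle (numpy assumed; loadings made up):

```python
import numpy as np

rng = np.random.default_rng(1)

Lambda = rng.uniform(0.3, 0.9, size=(8, 2))
Omega = np.diag(rng.uniform(0.2, 0.6, size=8))
Sigma = Lambda @ Lambda.T + Omega

# Any 2x2 rotation matrix R satisfies R R' = I.
theta = 0.7  # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

Lambda2 = Lambda @ R
Sigma_rot = Lambda2 @ Lambda2.T + Omega
print(np.allclose(Sigma, Sigma_rot))  # True, for every rotation angle
```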

A Solution Place some restrictions on the factor loadings, so that the only rotation matrix that preserves the restrictions is the identity matrix. For example, λij = 0 for j > i. There are other sets of restrictions that work. Generally, they result in a set of factor loadings that are impossible to interpret. Don’t worry about it. Estimate the loadings by maximum likelihood. Other methods are possible but are used much less than in the past. All (orthogonal) rotations result in the same value of the likelihood function (the maximum is not unique). Rotate the factors (that is, post-multiply the estimated loadings by a rotation matrix) so as to achieve a simple pattern that is easy to interpret. The result is often satisfying, but has no necessary connection to reality.
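The rotation-to-a-simple-pattern step is usually done with a criterion such as varimax. Here is a sketch of Kaiser's varimax rotation in numpy (the standard SVD-based iteration; the simple-structure loading matrix is made up for illustration). Starting from a loading matrix scrambled by an arbitrary rotation, varimax recovers something close to the simple pattern, up to sign and column order:

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Kaiser's varimax rotation, via the standard SVD-based iteration."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag(np.sum(Lr ** 2, axis=0))))
        R = u @ vt  # orthogonal, so L R (L R)' = L L' is preserved
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R

# A loading matrix with simple structure, scrambled by an arbitrary rotation.
L_simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0], [0.75, 0.0],
                     [0.0, 0.8], [0.0, 0.7], [0.0, 0.6], [0.0, 0.75]])
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_scrambled = L_simple @ R

L_rotated = varimax(L_scrambled)
print(np.round(L_rotated, 2))  # close to the simple pattern, up to sign/order
```

Because the rotation matrix is orthogonal, Lambda Lambda' (and hence Sigma and the likelihood) is unchanged; only the interpretability of the loadings improves.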

Consulting advice When a non-statistician claims to have done a “factor analysis,” ask what kind. Usually it was a principal components analysis. Principal components are linear combinations of the observed variables. They come from the observed variables by direct calculation. In true factor analysis, it’s the observed variables that arise from the factors. So principal components analysis is kind of like backwards factor analysis, though the spirit is similar. Most factor analysis software (SAS, SPSS, etc.) does principal components analysis by default.
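The "direct calculation" point can be made concrete: principal components are computed from the observed data with no latent-variable model at all. A numpy sketch (the data-generating matrix is arbitrary, just to give the variables some correlation):

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.standard_normal((500, 8)) @ rng.uniform(-1, 1, (8, 8))  # arbitrary data

# Principal components are direct linear combinations of the observed variables.
S = np.cov(D, rowvar=False)
vals, vecs = np.linalg.eigh(S)      # eigenvectors of the sample covariance
order = np.argsort(vals)[::-1]
W = vecs[:, order]                  # weight matrix, largest-variance axes first
PC = (D - D.mean(axis=0)) @ W       # computed from D: no model, no estimation
print(PC.var(axis=0, ddof=1).round(2))  # component variances = sorted eigenvalues
```

Contrast with factor analysis, where the observed variables are modeled as arising from the factors and the loadings must be estimated.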

Copyright Information This slide show was prepared by Jerry Brunner, Department of Statistics, University of Toronto. It is licensed under a Creative Commons Attribution - ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These Powerpoint slides are available from the course website: http://www.utstat.toronto.edu/~brunner/oldclass/431s17