Principal Components & Common Factoring An Introduction


Principal Components & Common Factoring -- An Introduction
- exploratory & confirmatory factoring
- meaning & application of “principal components”
- basic steps in a PC analysis
- PC extraction process
- # PCs determination
- PC rotation & interpretation
- factoring items vs. factoring scales
- selecting and “accepting” data sets
- “World View” of PC vs. CF
- choosing between PC and CF

Exploratory vs. Confirmatory Factoring

Exploratory Factoring -- when we do not have RH: (research hypotheses) about the number of factors or which variables load on which factors. We will “explore” the factor structure of the variables, consider multiple alternative solutions, and arrive at a post hoc solution.

Weak Confirmatory Factoring -- when we have RH: about the # of factors and factor memberships. We will “test” the proposed weak a priori factor structure.

Strong Confirmatory Factoring -- when we have RH: about the relative strength of contribution to factors by variables. We will “test” the proposed strong a priori factor structure.

Meaning of “Principal Components”

“Component” analyses are those that are based on the “full” correlation matrix (1.00s in the diagonal) -- yes, there are other kinds; more later.

“Principal” analyses are those for which each successive factor...
- accounts for the maximum available variance
- is orthogonal to (uncorrelated with, independent of) all prior factors
- the full solution (as many factors as variables) accounts for all the variance

Applications of PC analysis

Components analysis is a kind of “data reduction”:
- start with an inter-related set of “measured variables”
- identify a smaller set of “composite variables” that can be constructed from the “measured variables” and that carry as much of their information as possible

A “full components solution”...
- has as many PCs as variables
- accounts for 100% of the variables’ variance
- each variable has a final communality of 1.00 -- all of its variance is accounted for by the full set of PCs

A “truncated components solution”...
- has fewer PCs than variables
- accounts for <100% of the variables’ variance
- each variable has a communality < 1.00 -- not all of its variance is accounted for by the PCs
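The full-vs-truncated contrast can be checked numerically. A minimal sketch, using the illustrative 4-variable correlation matrix from the extraction example in these slides: eigendecompose R, form loadings, and compare communalities (row sums of squared loadings) for the full solution against a solution that keeps only the first PC.

```python
import numpy as np

# Illustrative 4-variable correlation matrix (from the extraction example)
R = np.array([[1.0, 0.7, 0.3, 0.3],
              [0.7, 1.0, 0.3, 0.3],
              [0.3, 0.3, 1.0, 0.5],
              [0.3, 0.3, 0.5, 1.0]])

vals, vecs = np.linalg.eigh(R)            # eigh returns ascending order
order = np.argsort(vals)[::-1]            # sort descending by eigenvalue
vals, vecs = vals[order], vecs[:, order]

loadings = vecs * np.sqrt(vals)           # structure (loading) matrix

full_h2 = (loadings ** 2).sum(axis=1)          # full solution: communalities = 1.00
trunc_h2 = (loadings[:, :1] ** 2).sum(axis=1)  # keep 1 PC: communalities < 1.00

print(np.round(full_h2, 3))
print(np.round(trunc_h2, 3))
```

The full solution reproduces every variable completely; truncating leaves part of each variable’s variance unaccounted for.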

The basic steps of a PC analysis
- Compute the correlation matrix
- Extract a full components solution
- Determine the number of components to “keep”
  - total variance accounted for
  - variable communalities
- “Rotate” the components and “interpret” (name) them
  - structure weights > |.3|-|.4| define which variables “load”
- Compute “component scores”
- “Apply” the components solution
  - theoretically -- understand the meaning of the data reduction
  - statistically -- use the component scores in other analyses
  - interpretability
  - replicability
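The steps above can be sketched end-to-end. This is a minimal illustration on simulated data (the data and the induced inter-correlations are made up), using a plain eigendecomposition rather than a stats package:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))              # 200 cases x 6 measured variables (simulated)
X[:, 1] += X[:, 0]
X[:, 2] += X[:, 0]                         # make some variables inter-related

# 1. Compute the correlation matrix
R = np.corrcoef(X, rowvar=False)

# 2. Extract a full components solution
vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# 3. Decide how many components to keep (here: eigenvalue > 1)
k = int(np.sum(vals > 1.0))

# 4. Loadings for the kept components (rotation would follow here)
loadings = vecs[:, :k] * np.sqrt(vals[:k])

# 5. Component scores for the kept components (from standardized variables)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scores = Z @ vecs[:, :k]

print(k, loadings.shape, scores.shape)
```

In practice a package (SPSS, R, scikit-learn) does these steps; the sketch only makes the sequence concrete.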

PC Factor Extraction

Extraction is the process of forming PCs as linear combinations of the measured variables:

PC1 = b11X1 + b21X2 + … + bk1Xk
PC2 = b12X1 + b22X2 + … + bk2Xk
…
PCf = b1fX1 + b2fX2 + … + bkfXk

Here’s the thing to remember: we usually perform factor analyses to “find out how many groups of related variables there are” … however … the mathematical goal of extraction is to “reproduce the variables’ variance, efficiently”.

PC Factor Extraction, cont.

Consider the correlation matrix R below. Obviously there are 2 kinds of information among these 4 variables: X1 & X2, and X3 & X4.

        X1    X2    X3    X4
X1     1.0
X2      .7   1.0
X3      .3    .3   1.0
X4      .3    .3    .5   1.0

It looks like the PCs should be formed as:

PC1 = b11X1 + b21X2  -- capturing the information in X1 & X2
PC2 = b32X3 + b42X4  -- capturing the information in X3 & X4

But remember, PC extraction isn’t trying to “group variables” -- it is trying to “reproduce variance”. Notice that there are “cross correlations” between the “groups” of variables!!

PC Factor Extraction, cont.

So, because of the cross correlations, in order to maximize the variance reproduced, PC1 will be formed more like...

PC1 = .5X1 + .5X2 + .4X3 + .4X4

Notice that all the variables contribute to defining PC1, with slightly higher loadings for X1 & X2.

Because PC1 didn’t focus on the X1 & X2 variable group or the X3 & X4 variable group, there will still be variance to account for in both, and PC2 will be formed, probably something like…

PC2 = .3X1 + .3X2 - .4X3 - .4X4

Notice that all the variables contribute to defining PC2, with slightly higher loadings for X3 & X4.
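You can verify this behavior by actually extracting the PCs from the R shown earlier. A sketch with numpy (the exact weights come from the eigendecomposition, not the rounded illustrative values on the slide):

```python
import numpy as np

R = np.array([[1.0, 0.7, 0.3, 0.3],
              [0.7, 1.0, 0.3, 0.3],
              [0.3, 0.3, 1.0, 0.5],
              [0.3, 0.3, 0.5, 1.0]])

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]         # largest eigenvalue first
vals, vecs = vals[order], vecs[:, order]

pc1 = vecs[:, 0] * np.sign(vecs[0, 0])  # fix arbitrary sign for readability
pc2 = vecs[:, 1] * np.sign(vecs[0, 1])

print(np.round(pc1, 2))  # all four weights share a sign; X1 & X2 slightly larger
print(np.round(pc2, 2))  # X1/X2 vs. X3/X4 get opposite signs
```

PC1 draws on all four variables (weighting X1 & X2 a bit more), and PC2 contrasts the two variable groups, exactly the pattern the slide describes.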

PC Factor Extraction, cont.

While this set of PCs will account for much of the variables’ variance, it doesn’t provide a very satisfactory interpretation:
- PC1 has all 4 variables loading on it
- PC2 has all 4 variables loading on it, and 2 of them have negative weights, even though all the variables are positively correlated with each other

The goal here was to point out what extraction does (maximize variance accounted for) and what it doesn’t do (find groups of variables).

Determining the Number of PCs

Determining the number of PCs is arguably the most important decision in the analysis…
- rotation, interpretation and use of the PCs are all influenced by how many PCs are “kept” for those processes
- there are many different procedures available -- none are guaranteed to work!!

Probably the best approach to determining the # of PCs: remember that this is an exploratory factoring -- that means you don’t have decent RH: about the number of factors. So … explore … consider different “reasonable” # of PCs and “try them out” -- rotate, interpret &/or try out the resulting factor scores from each, and then decide.

To get started we’ll use the SPSS “standard” of λ > 1.00.
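The λ > 1.00 rule (SPSS’s default, often called the Kaiser criterion) just counts eigenvalues of the correlation matrix greater than 1. A sketch, reusing the illustrative R from the extraction example:

```python
import numpy as np

R = np.array([[1.0, 0.7, 0.3, 0.3],
              [0.7, 1.0, 0.3, 0.3],
              [0.3, 0.3, 1.0, 0.5],
              [0.3, 0.3, 0.5, 1.0]])

vals = np.sort(np.linalg.eigvalsh(R))[::-1]  # eigenvalues, largest first
n_keep = int(np.sum(vals > 1.0))             # lambda > 1.00 rule

print(np.round(vals, 3))
print(n_keep)
```

Note what happens here: the second eigenvalue lands just below 1.00, so the rule keeps only one component even though the matrix clearly contains two kinds of information -- a concrete illustration of why no single rule is guaranteed to work.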

Rotation -- finding “groups” in the variables

Factor Rotations: changing the “viewing angle” or “head tilt” of the factor space makes the groupings
- visible in the graph
- apparent in the structure matrix

Unrotated Structure          Rotated Structure
      PC1    PC2                   PC1    PC2
V1     .7     .5             V1     .7    -.1
V2     .6     .6             V2     .7     .1
V3     .6    -.5             V3     .1     .5
V4     .7    -.6             V4     .2     .6

[The slide’s graph plots V1-V4 in PC1-PC2 space, with rotated axes PC1’ and PC2’ passing through the two variable groups.]
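The rotation itself is an orthogonal transformation chosen to approximate “simple structure”. A minimal varimax sketch (a standard textbook algorithm, not something specified in these slides) applied to the unrotated structure above:

```python
import numpy as np

def varimax(L, n_iter=50, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix L (variables x factors)."""
    p, k = L.shape
    Rot = np.eye(k)
    d = 0.0
    for _ in range(n_iter):
        Lr = L @ Rot
        # SVD-based update of the rotation matrix (standard varimax iteration)
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        Rot = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ Rot

unrotated = np.array([[.7,  .5],
                      [.6,  .6],
                      [.6, -.5],
                      [.7, -.6]])

rotated = varimax(unrotated)
print(np.round(rotated, 2))  # each variable now loads mainly on one factor
```

After rotation each variable has one large and one small loading (up to arbitrary column signs), which is what makes the groupings “apparent in the structure matrix”.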

Interpretation -- naming “groups” in the variables

We usually interpret factors using the rotated solution. Factors are named for the variables correlated with them. The usual “cutoffs” are +/- .3 to .4, so… a variable that shares at least 9-16% of its variance with a factor is used to name that factor. Variables may “load” on none, 1, or 2+ factors.

Rotated Structure
      PC1    PC2
V1     .7    -.1
V2     .7     .1
V3     .1     .5
V4     .2     .6

This rotated structure is easy: PC1 is V1 & V2, PC2 is V3 & V4. It is seldom this easy!?!?!

“Kinds” of Factors

General Factor
- all or “almost all” variables load
- there is a dominant underlying theme among the set of variables which can be represented with a single composite variable

Group Factor
- some subset of the variables load
- there is an identifiable sub-theme in the variables that must be represented with a specific subset of the variables
- “smaller” vs. “larger” group factors (# vars & % variance)

Unique Factor
- a single variable loads

“Kinds” of Variables

Univocal variable -- loads on a single factor
Multivocal variable -- loads on 2+ factors
Nonvocal variable -- doesn’t load on any factor

You should notice a pattern here…

A higher “cutoff” (e.g., .40) tends to produce:
- fewer variables loading on a given factor
- less likelihood of a general factor
- fewer multivocal variables
- more nonvocal variables

A lower “cutoff” (e.g., .30) tends to produce:
- more variables loading on a given factor
- more likelihood of a general factor
- more multivocal variables
- fewer nonvocal variables

Factoring items vs. factoring scales

Items are often factored as part of the process of scale development -- to check whether the items “go together” as the scale’s author intended.

Scales (composites of items) are factored to…
- examine the construct validity of “new” scales
- test “theory” about what constructs are interrelated

Remember, the reason we have scales is that individual items are typically unreliable and have limited validity.

Factoring items vs. factoring scales, cont.

The limited reliability and validity of items means that they will be measured with less precision, and so their intercorrelations from any one sample will be “fraught with error”. Since factoring starts with R, factoring of items is likely to yield spurious solutions -- replication of item-level factoring is very important!!

Consider for a moment… is the issue really “items vs. scales”? No -- it is really the reliability and validity of the “things being factored”, and scales have these properties more than scale items do.

Selecting Variables for a Factor Analysis

The variables in the analysis determine the analysis results.
- this has been true in every model we’ve looked at (remember how the inclusion of covariate and/or interaction terms has radically changed some results we’ve seen)
- this is very true of factor analysis, because the goal is to find “sets of variables”

Variable sets for factoring come in two “kinds”:
- when the researcher has “hand-selected” each variable
- when the researcher selects a “closed set” of variables (e.g., the sub-scales of a standard inventory, the items of an interview, or the elements of data in a “medical chart”)

Selecting Variables for a Factor Analysis, cont.

Sometimes a researcher has access to a data set that someone else has collected -- an “opportunistic data set”. While this can be a real money/time saver, be sure to recognize the possible limitations:
- be sure the sample represents a population you care about
- carefully consider the variables that “aren’t included” and the possible effects their absence has on the resulting factors
- this is especially true if the data set was chosen to be “efficient” -- variables chosen to cover several domains
- you should plan to replicate any results obtained from opportunistic data

Selecting the Sample for a Factor Analysis

How many? Keep in mind that R (the correlation matrix), and so the factor solution, is the same no matter how many cases are used -- so the point is the representativeness and stability of the correlations.

Advice about the subject/variable ratio varies pretty dramatically:
- 5-10 cases per variable
- 300 cases minimum (maybe + # per item)

Consider that the standard error of r is approximately 1 / √(N-3):

n=50     r +/- .146
n=100    r +/- .101
n=200    r +/- .07
n=300    r +/- .058
n=500    r +/- .045
n=1000   r +/- .031
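The table can be reproduced directly. A sketch computing the approximate standard error 1/√(N-3) that the slide tabulates:

```python
import math

# Approximate standard error of a correlation for each sample size
for n in (50, 100, 200, 300, 500, 1000):
    se = 1 / math.sqrt(n - 3)
    print(f"n={n:<5d} r +/- {se:.3f}")
```

Quadrupling the sample (n=50 to n=200) only halves the standard error, which is why stability of the correlations, not just a minimum N, is the real concern.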

Selecting the Sample for a Factor Analysis, cont.

Who? Sometimes the need to increase our sample size leads us to “acts of desperation”, i.e., taking anybody. Be sure your sample represents a single “homogeneous” population. Consider that one interesting research question is whether different populations or sub-populations have different factor structures.

World View of PC Analyses

PC analysis is based on a very simple “world view”:
- we measure variables
- the goal of factoring is data reduction: determine the # of kinds of information in the variables, and build a PC for each
- R holds the relationships between the variables
- PCs are composite variables computed from linear combinations of the measured variables

World View of CF Analyses

CF is based on a somewhat more complicated and “causal” world view:
- any domain (e.g., intelligence, personality) has some set of “latent constructs”
- a person’s “values” on these “latent constructs” cause their scores on any measured variable(s)
- any variable has two parts:
  - a “common part” -- caused by the values of the “latent constructs”
  - a “unique part” -- not related to any latent construct (“error”)

World View of CF Analyses, cont.
- the goal of factoring is to reveal the number and identity of these “latent constructs”
- R must be “adjusted” to represent the relationships between the portions of the variables that are produced by the “latent constructs” -- i.e., to represent the correlations between the “common” parts of the variables
- CFs are linear combinations of the “common” parts of the measured variables that capture the underlying “constructs”

Example of CF world view

“latent constructs”: IQ, Math Ability, Reading Skill, Social Skills

“measures”:
- adding, subtraction, multiplication
- vocabulary, reading speed, reading comprehension
- politeness, listening skills, sharing skills

Each measure is “produced” by a weighted combination of the latent constructs, plus something unique to that measure . . .

adding      = .5*IQ + .8*Math + 0*Reading + 0*Social + Ua
subtraction = .5*IQ + .8*Math + 0*Reading + 0*Social + Us
vocabulary  = .5*IQ + 0*Math + .8*Reading + 0*Social + Uv
politeness  = .4*IQ + 0*Math + 0*Reading + .8*Social + Up
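This generative “world view” is easy to simulate. A sketch (latent scores and unique parts drawn as standard normals are my assumption; the loadings are the slide’s illustrative values) showing that two measures sharing latent causes end up correlated even though the latent constructs are never observed:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# latent construct scores (unobserved in real data)
iq, math_ab = rng.normal(size=(2, n))

# measures = weighted latent constructs + unique part (slide's loadings)
adding      = .5 * iq + .8 * math_ab + rng.normal(scale=.5, size=n)
subtraction = .5 * iq + .8 * math_ab + rng.normal(scale=.5, size=n)

r = np.corrcoef(adding, subtraction)[0, 1]
print(round(r, 2))  # substantial correlation induced purely by shared latent causes
```

The unique parts (Ua, Us) are uncorrelated with everything, so all of the observed correlation comes through the “common parts” -- which is exactly what common factoring tries to model.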

Example of CF world view, cont.

When we factor these, we might find something like:

                   CF1    CF2    CF3    CF4
adding             .4     .6
subtraction        .4     .6
multiplication     .4     .6
vocabulary         .4            .6
reading speed      .4            .6
reading comp       .4            .6
politeness         .3                   .6
listening skills   .3                   .6
sharing skills     .3                   .6

Name each “latent construct” that was revealed by this analysis.

Principal Axis Analysis

“Principal” again refers to the extraction process: each successive factor is orthogonal and accounts for the maximum available covariance among the variables.

“Axis” tells us that the factors are extracted from a “reduced” correlation matrix:
- diagonals < 1.00
- diagonals = the estimated “communality” of each variable, reflecting that not all of the variance of that variable is “produced” by the set of “latent variables”

So, factors extracted from the “reduced” R will reveal the latent variables.
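One common way to build the “reduced” R (an assumption on my part -- the slides don’t specify the communality estimator) is to replace the diagonal 1.00s with each variable’s squared multiple correlation (SMC) with the other variables, then extract factors from that matrix:

```python
import numpy as np

R = np.array([[1.0, 0.7, 0.3, 0.3],
              [0.7, 1.0, 0.3, 0.3],
              [0.3, 0.3, 1.0, 0.5],
              [0.3, 0.3, 0.5, 1.0]])

# SMC communality estimates: 1 - 1/diag(R^-1)
smc = 1 - 1 / np.diag(np.linalg.inv(R))

R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)       # diagonals now < 1.00

vals, vecs = np.linalg.eigh(R_reduced)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

print(np.round(smc, 3))                # estimated communalities
print(np.round(vals, 3))               # factors come from the reduced R
```

In full PAF the communality estimates are then refined iteratively (re-estimate from the loadings, re-factor), but this first pass shows what “reduced R” means.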

Which model to choose -- PC or PAF?

Traditionally...

PC is used for “psychometric” purposes:
- reduction of collinear predictor sets
- examination of the structure of “scoring systems”
- consideration of scales and sub-scales
- works with the full R, because composites will be computed from original variable scores, not “common parts”

CF is used for “theoretical” purposes:
- identification of “underlying constructs”
- number and identity of “basic elements of behavior”
- the basis for “latent class” analyses of many kinds (both measurement & structural models)
- works with the reduced R, because it holds the “meaningful” part of the variables and their interrelationships

The researcher selects the procedure based on their purpose for the factor analysis!!