Presentation on theme: "Warning: Very Mathematical!"— Presentation transcript:
1 Factor Analysis
Warning: Very Mathematical!
2 Motivating Example: McMaster’s Family Assessment Device
Consider the McMaster’s Family Assessment Device. It consists of 60 questions, all on a 4-point ordinal scale. On some items a high score indicates good family functioning, while on others it indicates lower functioning. Many of the questions clearly have things in common and collectively measure different aspects of family functioning.
3 Motivating Example: McMaster’s Family Assessment Device (Q1-Q12)
The positive family traits (even-numbered items) are positively correlated among themselves, as are the negative family traits (odd-numbered items), while the two groups are negatively correlated with each other, as expected.
4 Motivating Example: McMaster’s Family Assessment Device
But what do a subject’s responses across all 12 items truly measure? We could simply add the items together, after reversing the scaling on the items that relate to the negative aspects of family functioning and family member interactions. After doing this, a low total score would indicate a “good” family and a high total score would indicate a “bad” family. But perhaps there are different aspects of family functioning that the survey is measuring.
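The reverse-then-sum scoring just described can be sketched as follows. This is a minimal illustration only: it assumes responses are coded 1 (Strongly Agree) to 4 (Strongly Disagree), that the odd-numbered items are the negatively worded ones, and it uses a made-up response vector. The function name `fad_total_score` is hypothetical.

```python
import numpy as np

def fad_total_score(responses, scale_max=4):
    """Total score after reverse-coding the odd-numbered (negative) items.

    responses: array of shape (n_subjects, 12), coded 1..scale_max.
    Assumption: items 1, 3, 5, ... are the negatively worded items.
    """
    scored = np.asarray(responses).copy()
    # Columns 0, 2, 4, ... hold items 1, 3, 5, ...; map 1<->4 and 2<->3.
    scored[:, 0::2] = (scale_max + 1) - scored[:, 0::2]
    return scored.sum(axis=1)

# Made-up subject: strongly disagrees (4) with every negative item and
# strongly agrees (1) with every positive item.
healthy = np.array([[4, 1] * 6])
print(fad_total_score(healthy))  # lowest possible total, 12
```

A subject answering the opposite way on every item would receive the highest possible total, 48.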
5 Factor Analysis
Factor analysis is used to illuminate the underlying dimensionality of a set of measures. Some of these questions may cluster together to measure such things as communication, collaboration, closeness, or commitment. This is the idea behind factor analysis.
6 Factor Analysis
- Data reduction tool.
- Removes redundancy or duplication from a set of correlated variables.
- Represents correlated variables with a smaller set of “derived” variables. These derived variables may measure some underlying features of the respondents.
- Factors are formed that are relatively independent of one another.
- Two types of “variables”: latent variables (factors) and observed variables (items on the survey).
7 Applications of Factor Analysis
1. Identification of Underlying Factors:
- clusters variables into homogeneous sets
- creates new variables (i.e. factors)
- allows us to gain insight into categories
2. Screening of Variables:
- identifies groupings to allow us to select one variable to represent many
- useful in regression and other subsequent analyses
8 Applications of Factor Analysis
3. Summary:
- flexibility in being able to extract few or many factors
4. Sampling of variables:
- helps select a small group of representative yet uncorrelated variables from a larger set to solve a practical problem
5. Clustering of objects:
- “inverse” factor analysis; we will see an example of this when examining a car model perception survey.
9 “Perhaps the most widely used (and misused) multivariate statistic is factor analysis. Few statisticians are neutral about this technique. Proponents feel that factor analysis is the greatest invention since the double bed, while its detractors feel it is a useless procedure that can be used to support nearly any desired interpretation of the data. The truth, as is usually the case, lies somewhere in between. Used properly, factor analysis can yield much useful information; when applied blindly, without regard for its limitations, it is about as useful and informative as Tarot cards. In particular, factor analysis can be used to explore the data for patterns, confirm our hypotheses, or reduce the many variables to a more manageable number.”
-- Norman & Streiner, PDQ Statistics
10 One Factor Model
Classical Test Theory idea.
Ideal: $X_j = F + e_j$ for $j = 1, \dots, p$, with $\mathrm{var}(e_j) = \mathrm{var}(e_k)$ for $j \ne k$.
Reality: $X_j = \lambda_j F + e_j$ for $j = 1, \dots, p$, with $\mathrm{var}(e_j) \ne \mathrm{var}(e_k)$ for $j \ne k$ (unequal “sensitivity” to change in the factor).
11 Key Concepts
- $F$ is a latent (i.e. unobserved, underlying) variable called a factor.
- The $X$’s are the observed variables.
- $e_j$ is the measurement error for $X_j$.
- $\lambda_j$ is the “loading” on factor $F$ for $X_j$.
12 Optional Assumptions
We will make these to simplify our discussions:
- The $X$’s are standardized prior to beginning a factor analysis, i.e. converted to z-scores.
- $F$ is also standardized, that is, the standard deviation of $F$ is 1 and the mean is 0.
13 Some math associated with the ONE factor model
- $\lambda_j^2$ is also called the “communality” of $X_j$ in the one factor case (standard notation for communality: $h_j^2$).
- For standardized $X_j$, $\mathrm{Corr}(F, X_j) = \lambda_j$.
- For standardized variables, the proportion of variability in $X_j$ explained by $F$ is $\lambda_j^2$ (like an $R^2$ in regression).
- If $X_j$ is N(0,1) or at least standardized, then $\lambda_j$ is equivalent to the slope in a regression of $X_j$ on $F$ and to the correlation between $F$ and $X_j$.
- Interpretation of $\lambda_j$: standardized regression coefficient (regression), path coefficient (path analysis), factor loading (factor analysis).
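A short derivation may help show where these facts come from. It is a sketch that assumes, beyond the standardization already stated, that $F$ and $e_j$ are uncorrelated (an assumption the slides do not spell out):

```latex
\begin{aligned}
\mathrm{Var}(X_j) &= \lambda_j^2\,\mathrm{Var}(F) + \mathrm{Var}(e_j)
                   = \lambda_j^2 + \mathrm{Var}(e_j) = 1, \\
\mathrm{Corr}(F, X_j) &= \mathrm{Cov}(F,\ \lambda_j F + e_j)
                       = \lambda_j\,\mathrm{Var}(F) = \lambda_j .
\end{aligned}
```

The first line is why $\lambda_j^2$ reads as a proportion of variance: the remainder, $1 - \lambda_j^2$, is the error variance.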
14 Some more math associated with the ONE factor model
- $\mathrm{Corr}(X_j, X_k) = \lambda_j \lambda_k$
- Note that the correlation between $X_j$ and $X_k$ is completely determined by the common factor $F$.
- Factor loadings ($\lambda_j$) are equivalent to the correlations between factors and variables when only a SINGLE common factor is involved.
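These one-factor identities can be checked by simulation. The sketch below uses made-up loadings and assumes normally distributed $F$ and errors, with error variances $1 - \lambda_j^2$ chosen so that each $X_j$ is standardized.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
lam = np.array([0.9, 0.7, 0.5])           # hypothetical loadings

F = rng.standard_normal(n)                 # standardized common factor
# Error variances 1 - lam^2 keep each X_j at unit variance.
E = rng.standard_normal((n, lam.size)) * np.sqrt(1 - lam**2)
X = F[:, None] * lam + E                   # X_j = lam_j * F + e_j

corr_FX = np.array([np.corrcoef(F, X[:, j])[0, 1] for j in range(lam.size)])
corr_XX = np.corrcoef(X, rowvar=False)

print(np.round(corr_FX, 2))        # close to the loadings lam
print(np.round(corr_XX[0, 1], 2))  # close to lam[0] * lam[1] = 0.63
```

The sample correlations only approximate the population values, so small deviations in the last decimal place are expected.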
15 Example: McMaster’s FAD
For the 12 items of McMaster’s FAD used in this survey, the one factor solution essentially reverses the scales of the items measuring unhealthy family dynamics (-) and then “adds” up the items to get an overall score. The loadings are not all equal in magnitude, so the solution gives more weight to some items than others.
16 Example: McMaster’s FAD
The communalities ($h_j^2$) are simply the squares of the factor loadings ($\lambda_j^2$). For example, the communality for item 1 ($X_1$) is $h_1^2 = \lambda_1^2$, etc. This is one of the lowest communalities. We can also see that item 12 ($X_{12}$) has the highest communality.
[Table columns: $\lambda_j$’s, $h_j^2$’s]
17 Steps in Exploratory Factor Analysis (EFA)
(1) Collect data: choose relevant variables.
(2) Extract initial factors (via principal components).
(3) Choose number of factors to retain.
(4) Choose estimation method, estimate model.
(5) Rotate and interpret factors.
(6) (a) Decide if changes need to be made (e.g. drop item(s), include item(s)). (b) Repeat steps (4)-(5).
(7) Construct scales or factor scores and potentially use them in further analysis. For example, we might use the factor scores as predictors in a regression model for some response of interest.
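18 Data Matrix
Factor analysis is totally dependent on correlations between variables. Factor analysis summarizes correlation structure.
[Figure panels: Data Matrix (observations O1, ..., On by variables X1, ..., Xp); Correlation Matrix (X1, ..., Xp by X1, ..., Xp); Factor Matrix (X1, ..., Xp by F1, ..., Fm), whose entries, the $\lambda_{ij}$’s, are the factor loadings]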
19 Frailty Variables
13 tests yielding a numeric response:
- Speed of fast walk (+)
- Speed of usual walk (+)
- Time to do chair stands (-)
- Arm circumference (+)
- Body mass index (+)
- Tricep skinfold thickness (+)
- Shoulder rotation (+)
- Upper extremity strength (+)
- Pinch strength (+)
- Grip strength (+)
- Knee extension (+)
- Hip extension (+)
- Time to do Pegboard test (-)
21 One Factor Frailty Solution
[Table: Variable | Loadings, with rows arm_circ, skinfld, fastwalk, gripstr, pinchstr, upextstr, kneeext, hipext, shldrrot, pegbrd, bmi, uslwalk, chrstand]
These numbers represent the correlations between the common factor, F, and the input variables. Clearly, estimating F is part of the process.
22 More than One Factor
m factor orthogonal model. ORTHOGONAL = INDEPENDENT, meaning the underlying factors $F_1, \dots, F_m$ are uncorrelated.
m factors, p observed variables:
$X_1 = \lambda_{11}F_1 + \lambda_{12}F_2 + \dots + \lambda_{1m}F_m + e_1$
$X_2 = \lambda_{21}F_1 + \lambda_{22}F_2 + \dots + \lambda_{2m}F_m + e_2$
$\vdots$
$X_p = \lambda_{p1}F_1 + \lambda_{p2}F_2 + \dots + \lambda_{pm}F_m + e_p$
23 More than One Factor
Same general assumptions as the one factor model.
- $\mathrm{Corr}(F_s, X_j) = \lambda_{js}$
Plus:
- $\mathrm{Corr}(F_s, F_r) = 0$ for all $s \ne r$ (i.e. orthogonal)
- this is forced independence
- simplifies covariance/correlation structure
- $\mathrm{Corr}(X_i, X_j) = \lambda_{i1}\lambda_{j1} + \lambda_{i2}\lambda_{j2} + \lambda_{i3}\lambda_{j3} + \dots$
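The last identity says the whole correlation matrix is determined by the loadings. A small numerical sketch, using a made-up loading matrix for p = 4 variables and m = 2 orthogonal factors (the variable name `L` is only illustrative):

```python
import numpy as np

# Hypothetical loadings: rows are variables, columns are factors.
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])

# Off-diagonal entries of L @ L.T are sum_s lam_is * lam_js.
off_diag = L @ L.T

# For standardized X's the diagonal must be 1, so the part of each
# variance not reproduced by the factors is placed on the diagonal.
R_implied = off_diag + np.diag(1 - np.diag(off_diag))

print(np.round(R_implied, 2))
```

For instance, the implied correlation between the first two variables is 0.8 × 0.7 + 0.1 × 0.2 = 0.58.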
24 Factor Matrix
$\begin{pmatrix} \lambda_{11} & \cdots & \lambda_{1m} \\ \vdots & \ddots & \vdots \\ \lambda_{p1} & \cdots & \lambda_{pm} \end{pmatrix}$
- Columns represent derived factors.
- Rows represent input variables.
- Loadings represent the degree to which each of the variables “correlates” with each of the factors.
- Loadings range from -1 to 1.
- Inspection of factor loadings reveals the extent to which each of the variables contributes to the meaning of each of the factors.
- High loadings provide meaning and interpretation of factors (~ regression coefficients).
Ex: Car Rating Survey
25 Frailty Variables
- Speed of fast walk (+)
- Speed of usual walk (+)
- Time to do chair stands (-)
- Arm circumference (+)
- Body mass index (+)
- Tricep skinfold thickness (+)
- Shoulder rotation (+)
- Upper extremity strength (+)
- Pinch strength (+)
- Grip strength (+)
- Knee extension (+)
- Hip extension (+)
- Time to do Pegboard test (-)
26 Frailty Example
Factors: Hand strength, Leg strength, Size, Speed.
[Table: Variable | Loadings on Hand strength, Leg strength, Size, Speed | Uniqueness, with rows arm_circ, skinfld, fastwalk, gripstr, pinchstr, upextstr, kneeext, hipext, shldrrot, pegbrd, bmi, uslwalk, chrstand]
27 Communalities
The communality of $X_j$ is the proportion of the variance of $X_j$ explained by the m common factors:
$\mathrm{Comm}(X_j) = h_j^2 = \sum_{s=1}^{m} \lambda_{js}^2$
Recall the one factor model: what was the interpretation of $\lambda_j^2$?
In other words, the communality can be thought of as the sum of squared multiple-correlation coefficients between $X_j$ and the factors.
Uniqueness($X_j$) = 1 - Comm($X_j$)
28 Communality of $X_j$
“Common” part of variance:
- squared correlation between $X_j$ and the part of $X_j$ due to the underlying factors, assuming $X_j$ is standardized.
- $\mathrm{Var}(X_j)$ = “communality” + “uniqueness”
- For standardized $X_j$: 1 = “communality” + “uniqueness”
- Thus, Uniqueness = 1 – Communality
- Can think of Uniqueness = $\mathrm{Var}(e_j)$
If $X_j$ is informative, communality is high. If $X_j$ is not informative, uniqueness is high.
Intuitively: variables with high communality share more in common with the rest of the variables.
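Communality and uniqueness follow directly from a loading matrix. The sketch below uses made-up loadings and item names; the 0.50 cutoff is the uniqueness rule of thumb that appears later in the deck for the frailty example, applied here purely as an illustration.

```python
import numpy as np

items = ["X1", "X2", "X3", "X4"]          # hypothetical item names
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.3, 0.4],
              [0.2, 0.6]])                # hypothetical loadings, m = 2

communality = (L**2).sum(axis=1)          # sum of squared loadings per item
uniqueness = 1 - communality              # standardized X: 1 = comm + uniq

for name, h2, u in zip(items, communality, uniqueness):
    flag = "  <- candidate to drop" if u > 0.50 else ""
    print(f"{name}: communality={h2:.2f}, uniqueness={u:.2f}{flag}")
```

With these numbers the last two items have uniqueness above 0.50, so they would be the ones flagged for a closer look.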
29 How many factors?
Intuitively: the number of uncorrelated constructs that are jointly measured by the X’s.
- Factor analysis is only useful if the number of factors is less than the number of X’s. (Goal: “data reduction”)
- Requires decent correlation structure in the X’s.
Identifiability: Is there enough information in the data to estimate all of the parameters in the factor analysis?
- May be constrained to a certain number of factors.
- Generally we like to have 10 observations per X in order to estimate the underlying factors.
30 Choosing Number of Factors
Use “principal components” to help decide:
- type of factor analysis (PCA)
- number of factors is equivalent to number of variables
- each factor is a weighted combination of the input variables: $F_1 = a_1X_1 + a_2X_2 + \dots$
Recall: for a factor analysis, generally, $X_1 = a_1F_1 + a_2F_2 + \dots$
31 Estimating Principal Components
The first PC is the linear combination with maximum variance:
$PC_1 = F_1 = a_{11}X_1 + a_{12}X_2 + \dots + a_{1p}X_p$
That is, it finds $\mathbf{a}_1 = (a_{11}, a_{12}, \dots, a_{1p})$ to maximize
$\mathrm{Var}(F_1) = \mathrm{Var}(a_{11}X_1 + \dots + a_{1p}X_p) = \mathrm{Var}(\mathbf{a}_1^T X)$
constrained such that $\sum_{j=1}^{p} a_{1j}^2 = 1$.
First PC: the linear combination $\mathbf{a}_1^T X$ that maximizes $\mathrm{Var}(\mathbf{a}_1^T X)$ such that $\sum_{j=1}^{p} a_{1j}^2 = 1$.
Second PC: $PC_2 = F_2 = a_{21}X_1 + a_{22}X_2 + \dots + a_{2p}X_p$, the linear combination that maximizes $\mathrm{Var}(\mathbf{a}_2^T X)$ such that $\sum_{j=1}^{p} a_{2j}^2 = 1$ AND $\mathrm{Corr}(F_1, F_2) = 0$.
And so on...
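In practice the weight vectors $\mathbf{a}_1, \mathbf{a}_2, \dots$ are the eigenvectors of the correlation matrix. The sketch below illustrates this on made-up data (the data-generating step is arbitrary and only there to produce correlated columns); it checks the unit-length constraint, the variance of the first PC, and the zero correlation between the first two PCs.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up correlated data: 500 observations on p = 4 variables.
Z = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)    # convert to z-scores

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)                # returned in ascending order
order = np.argsort(eigvals)[::-1]                   # re-sort, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

a1, a2 = eigvecs[:, 0], eigvecs[:, 1]               # weights of PC1 and PC2
pc1, pc2 = Z @ a1, Z @ a2

print(np.isclose(a1 @ a1, 1.0))                     # weights have unit length
print(np.isclose(pc1.var(ddof=1), eigvals[0]))      # Var(PC1) = largest eigenvalue
print(abs(np.corrcoef(pc1, pc2)[0, 1]) < 1e-8)      # PC1 and PC2 uncorrelated
```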
32 Eigenvalues
We use eigenvalues to select how many factors to retain. We usually consider the eigenvalues from a principal components analysis (PCA).
Two interpretations:
- eigenvalue ≈ equivalent number of variables which the factor represents
- eigenvalue ≈ amount of “variance” in the data described by the factor
Rules to go by:
- number of eigenvalues > 1
- scree plot
- % variance explained
- comprehensibility
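The first and third rules are easy to compute from a correlation matrix. This sketch uses a made-up 4 × 4 correlation matrix with two blocks of related variables; the function name `n_factors_eigen_rule` is hypothetical.

```python
import numpy as np

def n_factors_eigen_rule(R):
    """Return (count of eigenvalues > 1, eigenvalues, cumulative proportion)."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int((eigvals > 1).sum()), eigvals, cumulative

# Hypothetical correlation matrix: variables 1-2 and 3-4 form two clusters.
R = np.array([[1.0, 0.6, 0.1, 0.1],
              [0.6, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.5],
              [0.1, 0.1, 0.5, 1.0]])

m, eigvals, cumulative = n_factors_eigen_rule(R)
print(m)                        # number of eigenvalues greater than 1
print(np.round(eigvals, 3))     # the eigenvalues sum to p = 4
print(np.round(cumulative, 3))  # cumulative proportion of variance
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 means the component accounts for more variance than any single standardized variable does.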
33 Frailty Example (p = 13)
PCA = principal components; all p = 13 components retained.
[Table columns: Component, Eigenvalue, Difference, Proportion, Cumulative]
36 At this stage...
- Don’t worry about interpretation of factors!
- Main concern: whether a smaller number of factors can account for variability.
- Researcher (i.e. YOU) needs to: provide the number of common factors to be extracted OR provide an objective criterion for choosing the number of factors (e.g. scree plot, % variability, etc.)
37 Rotation
- In principal components, the first factor describes most of the variability.
- After choosing the number of factors to retain, we want to spread variability more evenly among factors.
- To do this we “rotate” factors: redefine factors such that loadings on various factors tend to be very high (-1 or 1) or very low (0).
- Intuitively, this makes sharper distinctions in the meanings of the factors.
- We use “factor analysis” for rotation, NOT principal components!
42 Unique Solution?
The factor analysis solution is NOT unique! More than one solution will yield the same “result.” We will understand this better by the end of the lecture...
43 Rotation
- Uses “ambiguity” or non-uniqueness of solution to make interpretation simpler.
- Where does ambiguity come in? The unrotated solution is based on the idea that each factor tries to maximize variance explained, conditional on previous factors. What if we take that away? Then there is not one “best” solution.
- All solutions are relatively the same.
- Goal is simple structure.
- Most construct validation assumes simple (typically rotated) structure.
- Rotation does NOT improve fit, just interpretability!
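The non-uniqueness can be seen numerically: multiplying the loadings by any orthogonal matrix leaves the fitted correlations and the communalities unchanged. A minimal sketch with made-up loadings and an arbitrary 30-degree rotation:

```python
import numpy as np

L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])          # hypothetical unrotated loadings

theta = np.deg2rad(30)               # arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_rot = L @ T                        # rotated loadings

# Same reproduced correlations, hence the same fit.
print(np.allclose(L @ L.T, L_rot @ L_rot.T))
# Same communalities, hence the same uniquenesses.
print(np.allclose((L**2).sum(axis=1), (L_rot**2).sum(axis=1)))
```

Both checks print True, which is the sense in which rotation changes interpretability but not fit.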
45 Orthogonal vs. Oblique Rotation
Orthogonal: factors are independent.
- varimax: maximize squared loading variance across variables (sum over factors)
- quartimax: maximize squared loading variance across factors (sum over variables)
- Intuition: from the previous picture, there is a right angle between axes.
- Note: “Uniquenesses” remain the same!
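A compact varimax sketch is shown below. It uses a widely circulated SVD-based iteration, offered here as an illustration rather than as the routine any particular package (JMP included) actually runs; the function names and the test loadings are made up.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonally rotate loadings L (p x m) toward the varimax criterion."""
    p, m = L.shape
    T = np.eye(m)                       # accumulated rotation matrix
    total = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient-like target matrix for the varimax criterion.
        G = L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt                      # orthogonal polar factor of G
        new_total = s.sum()
        if new_total < total * (1 + tol):
            break                       # no meaningful improvement
        total = new_total
    return L @ T, T

def varimax_criterion(L):
    """Sum over factors of the variance of squared loadings across variables."""
    return (L**2).var(axis=0).sum()

# Made-up simple structure, deliberately blurred by a 30-degree rotation.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]])
theta = np.deg2rad(30)
T0 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
L_unrot = simple @ T0

L_rot, T = varimax(L_unrot)
print(np.round(L_rot, 2))
print(varimax_criterion(L_unrot), varimax_criterion(L_rot))
```

The rotated loadings should be close to the simple structure again, possibly with columns swapped or signs flipped, since those do not affect the criterion.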
46 Orthogonal vs. Oblique Rotation
Oblique: factors not independent. Change in “angle.”
- oblimin: minimize squared loading covariance between factors.
- promax: simplify orthogonal rotation by making small loadings even closer to zero.
- lots of others!
- Intuition: from the previous picture, the angle between axes is not necessarily a right angle.
- Note: “Uniquenesses” remain the same!
49 Which to use?
- Choice is generally not critical.
- Interpretation with orthogonal is “simple” because factors are independent and loadings are correlations.
- Structure may appear simpler in oblique, but correlation of factors can be difficult to reconcile (deal with interactions, etc.)
- Theory? Are the conceptual meanings of the factors associated?
Oblique:
- Loading is no longer interpretable as the correlation between variable and factor.
- 2 matrices: pattern matrix (loadings) and structure matrix (correlations)
In JMP
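The relationship between the two matrices in the oblique case can be sketched numerically. Under the usual assumption of standardized variables and factors, the structure matrix equals the pattern matrix times the factor correlation matrix; the numbers and the names `pattern` and `Phi` below are made up for illustration.

```python
import numpy as np

pattern = np.array([[0.8, 0.0],
                    [0.7, 0.1],
                    [0.0, 0.9],
                    [0.1, 0.6]])      # hypothetical pattern loadings
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])          # hypothetical factor correlations

# Corr(X_j, F_s) under an oblique model: pattern loadings mixed by Phi.
structure = pattern @ Phi
print(np.round(structure, 2))

# With uncorrelated factors (Phi = I) the two matrices coincide,
# which is why loadings are correlations in the orthogonal case.
print(np.allclose(pattern @ np.eye(2), pattern))
```

Note that the first variable has a pattern loading of 0 on the second factor but a correlation of 0.32 with it, purely because the factors are correlated.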
50 Steps in Exploratory Factor Analysis (EFA)
(1) Collect data: choose relevant variables.
(2) Extract initial factors (via principal components).
(3) Choose number of factors to retain.
(4) Choose estimation method, estimate model.
(5) Rotate and interpret.
(6) (a) Decide if changes need to be made (e.g. drop item(s), include item(s)). (b) Repeat (4)-(5).
(7) Construct scales and use in further analysis.
51 Drop variables with Uniqueness > 0.50 in 5 factor model
PCA on arm_circ, skinfld, fastwalk, gripstr, pinchstr, kneeext, bmi, uslwalk (n = 782).
Principal Components; 8 components retained.
[Table columns: Component, Eigenvalue, Difference, Proportion, Cumulative]
54 Methods for Extracting Factors
- Principal Components (PC)
- Principal Factor Method
- Iterated Principal Factor / Least Squares
- Maximum Likelihood (ML)
Most common: ML and PC (both in JMP)
55 Principal Factor Analysis
Simplified explanation. Steps:
1. Get initial estimates of communalities, using either squared multiple correlations or the highest absolute correlation in each row.
2. Take the correlation matrix and replace the diagonal elements by the communalities. We call this the “adjusted” correlation matrix.
3. Apply principal components analysis.
56 Principal Factor Analysis
Obtain the correlation matrix, then replace the 1’s (variances) on the diagonal with estimates of the communalities:
$\begin{pmatrix} 1 & r_{12} & \cdots & r_{17} \\ r_{21} & 1 & \cdots & r_{27} \\ \vdots & \vdots & \ddots & \vdots \\ r_{71} & r_{72} & \cdots & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} h_1^2 & r_{12} & \cdots & r_{17} \\ r_{21} & h_2^2 & \cdots & r_{27} \\ \vdots & \vdots & \ddots & \vdots \\ r_{71} & r_{72} & \cdots & h_7^2 \end{pmatrix}$
Apply principal components to the “adjusted” correlation matrix and use the results.
57 Iterative Principal Factor / Least Squares
1. Perform Principal Factor as described above.
2. Instead of stopping after principal components, re-estimate communalities based on loadings.
3. Repeat until convergence.
Better than without iterating!
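The principal factor steps and the iteration above can be sketched as follows. This is a minimal illustration, not the routine any particular package runs: it assumes squared multiple correlations as the starting communalities, and the function name `principal_factor`, its defaults, and the test loadings are made up.

```python
import numpy as np

def principal_factor(R, m, n_iter=200, tol=1e-6):
    """Iterated principal factor extraction of m factors from correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations.
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_adj = R.copy()
        np.fill_diagonal(R_adj, h2)            # "adjusted" correlation matrix
        vals, vecs = np.linalg.eigh(R_adj)     # principal components of R_adj
        idx = np.argsort(vals)[::-1][:m]       # keep the m largest
        L = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
        h2_new = (L**2).sum(axis=1)            # re-estimate communalities
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return L, h2

# Correlation matrix implied by a made-up one-factor model.
lam = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(lam, lam) + np.diag(1 - lam**2)

L, h2 = principal_factor(R, m=1)
print(np.round(np.abs(L[:, 0]), 3))   # close to lam (sign is arbitrary)
print(np.round(h2, 3))                # close to lam**2
```

Setting `n_iter=1` gives the non-iterated principal factor solution, which for this example underestimates the loadings because the starting communalities are too small.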
58 Iterated Principal Factors / Least Squares
Standard least squares approach.
Minimize:
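One common way to write a least-squares criterion for the orthogonal m factor model is given below. It is offered as an assumption about what is being minimized, built from the identity $\mathrm{Corr}(X_i, X_j) = \sum_s \lambda_{is}\lambda_{js}$ stated earlier, with $r_{ij}$ denoting the observed correlations:

```latex
\min_{\lambda}\ \sum_{i<j} \Bigl( r_{ij} - \sum_{s=1}^{m} \lambda_{is}\lambda_{js} \Bigr)^{2}
```

That is, the loadings are chosen so that the correlations implied by the model are as close as possible to the observed ones.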
59 Maximum Likelihood Method
- Assume F’s are normal.
- Use the likelihood function.
- Maximize the likelihood over the parameters.
- Iterative procedure.
Notes:
- Normality matters!
- Estimates can get “stuck” at a boundary (e.g. communality of 0 or 1).
- Must rotate for an interpretable solution.
60 Choice of Method
The methods give different results because they:
- use different procedures
- use different restrictions
- make different assumptions
Benefit of ML:
- Can get statistics which allow you to compare factor analytic models.
- But “requires” the normality assumption!
61 Which Method Should You Use?
- Statisticians: use PC and ML.
- Social sciences: LS and Principal Factor.
Recommendations:
- Try both and look at the results; choose the one “you” like.
- If the X’s are mostly Likert scale ordinal items, then unless there are lots of survey items I would recommend the PC approach. Also, I prefer orthogonal rotation over oblique rotation.
- If the X’s are approximately normally distributed, I would recommend the ML method.
Other statistical packages have more options!
62 Factor Scores and Scales
- Each object (e.g. each subject) gets a factor score for each factor.
- Old data vs. new data.
- The factors themselves are “new” variables.
- A “subject’s” score is a weighted combination of scores on the input variables.
- These weights are NOT the factor loadings!
- Loadings and weights are determined simultaneously so that there is no correlation between resulting factors.
- We won’t bother here with the mathematics...
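To make the distinction between weights and loadings concrete, here is a sketch using the regression method for score weights, $W = R^{-1}\Lambda$. The choice of the regression method is an assumption (it is one common option, not necessarily what JMP uses), and the loadings and z-scores are made up.

```python
import numpy as np

L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])                        # hypothetical loadings
R = L @ L.T + np.diag(1 - (L**2).sum(axis=1))     # implied correlation matrix

# Regression-method score weights: solve R W = L rather than inverting R.
W = np.linalg.solve(R, L)

z = np.array([1.0, 0.5, -0.2, 0.3])   # one made-up subject's z-scores
scores = z @ W                         # estimated factor scores for that subject

print(np.round(W, 3))
print(np.round(scores, 3))
print(np.allclose(W, L))               # False: the weights are not the loadings
```

The weights differ from the loadings because they adjust for the correlations among the input variables.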
64 Interpretation
“Naming” of factors: the ability to name factors is one measure of success of a factor analysis.
Wrong interpretation: factors represent separate groups of people.
Right interpretation: each factor represents a continuum along which people vary (and dimensions are orthogonal if orthogonal rotation is used).
65 Exploratory versus Confirmatory
Exploratory Factor Analysis (EFA):
- summarize data
- describe correlation structure between variables
- generate hypotheses
Confirmatory Factor Analysis (CFA):
- testing consistency with a preconceived theory
- should hypothesize a priori at least the number of factors
- other considerations
66 Criticisms of Factor Analysis
- Labels of factors can be arbitrary or lack scientific basis.
- Derived factors are often very obvious (defense: but we get a quantification).
- “Garbage in, garbage out” (really a criticism of the input variables; factor analysis reorganizes the input matrix).
- Too many steps that could affect results.
- Too complicated.
- Correlation matrix is often a poor measure of association of the input variables.
- Voodoo Magic Hocus-Pocus Stuff
68 Example 1: MBA Car Survey
In January 1998, 303 MBA students were asked about their evaluations of and preferences for 10 different automobiles. The automobiles, listed in order of presentation in the survey, were BMW 328i, Ford Explorer, Infiniti J30, Jeep Grand Cherokee, Lexus ES 300, Chrysler Town & Country, Mercedes C280, Saab 9000, Porsche Boxster, and Volvo V90. Each student rated all 10 cars. To retain independence of the observations, one car was randomly selected for each student, resulting in n = 303 evaluations, each done by a different person. The students rated each car on 16 attributes (X’s). The first eight items on the survey asked students to assess to what extent the following words described the car: Exciting, Dependable, Luxurious, Outdoorsy, Powerful, Stylish, Comfortable, and Rugged. Responses were ordinal: 5 = Extremely descriptive, …, 1 = Not at all descriptive.
THIS COMPLETES STEP (1): Collect Data - Choose Relevant Variables
69 Example 1: MBA Car Survey
For the next eight items, students were asked to use the ordinal scheme 5 = Strongly Agree, …, 1 = Strongly Disagree to respond to the statements listed below:
- “This car is fun to drive”
- “This car is safe”
- “This car is a high-performance car”
- “This car is a family car”
- “This car is versatile”
- “This car is sporty”
- “This car is a high-status car”
- “This car is practical”
70 Example 1: MBA Car Survey
Thus we have 16 items (X’s) to use when conducting a factor analysis for the survey results.
- X1 = Exciting
- X2 = Dependable
- X3 = Luxurious
- X4 = Outdoorsy
- X5 = Powerful
- X6 = Stylish
- X7 = Comfortable
- X8 = Rugged
- X9 = Fun
- X10 = Safe
- X11 = Performance
- X12 = Family
- X13 = Versatile
- X14 = Sports
- X15 = Status
- X16 = Practical
71 Example 1: MBA Car Survey
Start by looking at a correlation matrix and the corresponding scatterplot matrix. Which variable “groupings” or “clusters” do you see based on the correlations?
- Exciting, Powerful, Fun, Performance, Sports, Status?
- Dependable, Safe, Comfortable?
- Rugged, Outdoorsy, Versatile, Sports?
- Family, Practical, Versatile, Safe?
Notice some variables appear in more than one “group” or “cluster”. What about negative correlations?
72 Example 1: MBA Car Survey
Step (2): Extract initial factors (via PCA)
Before we look at interpretation we need to decide how many factors we wish to retain/estimate.
- Using the Eigenvalue > 1 rule we would keep m = 3 factors.
- The Scree Plot suggests m = 3 or 4.
- In order to get 90% of the variation explained we need m = 9, which is too many!
73 Example 1: MBA Car Survey
Step (3): Choose number of factors to retain
Even though in factor analysis we don’t usually interpret the unrotated principal components, we will briefly look at them here. I usually use the eigenvalue rule, so m = 3 is my preliminary choice.
74 Example 1: MBA Car Survey
Steps (4) & (5): Choose estimation method, estimate model, rotate factors, and interpret.
76 Example 1: MBA Car Survey
Steps (4) & (5): Choose estimation method, estimate model, rotate factors, and interpret.
The uniqueness of Dependability (Uniqueness = 1 – Communality) exceeds the rule-of-thumb cutoff for item exclusion. We could consider deleting this variable from our survey. Dependability may be a characteristic of all “types” of vehicles, so it may not align with any one specific car or type of car in this survey.
77 Example 1: MBA Car Survey
How would you name these factors?
Factor 1 = ????
Factor 2 = ????
Factor 3 = ????
78 Example 1: MBA Car Survey
(6) (a) Decide if changes need to be made (e.g. drop item(s), include item(s)). (b) Repeat (4)-(5).
(7) Construct scales and use in further analysis.
Maximum Likelihood (ML) factor analysis with Quartimax rotation, with Dependability removed from the survey results.
To construct “scales” we can save the factor values or scores (Fi’s) and use them for subsequent analyses. On the following slides we look at some different examples of what we can do with the factor scores in terms of “further analyses”.
79 Example 1: MBA Car Survey
Here we can see that the factors are uncorrelated and that the variables Exciting, Luxurious, Powerful, Stylish, and Fun are fairly strongly correlated with the first factor F1. The scatterplots of individual survey items do not show much, as all items are on a 5-point Likert scale. In the next slide we average the factor scores by car model and construct a scatterplot matrix of the average factors with points labeled by car model.
81 Example 2: McMaster’s FAD
Two underlying factors (m = 2) are suggested based on the eigenvalues. The two-factor analysis gives loadings that contrast the good family traits with the bad family traits measured by the survey.
82 Example 2: McMaster’s FAD
Each question is answered on the scale 1 = Strongly Agree, 2 = Agree, 3 = Disagree, 4 = Strongly Disagree.
1. Planning family activities is difficult because we misunderstand each other.
2. In times of crisis we can turn to each other for support.
3. We cannot talk to each other about the sadness we feel.
4. Individuals are accepted for what they are.
5. We avoid discussing our fears and concerns.
6. We can express feelings to each other.
7. There are lots of bad feelings in the family.
8. We feel accepted for what we are.
9. Making decisions is a problem in our family.
10. We are able to make decisions about how to solve problems.
11. We do not get along well with each other.
12. We confide in each other.