Presentation on theme: "Applied Structural Equation Modeling for Dummies, by Dummies February 22, 2013 Indiana University, Bloomington Joseph J. Sudano, Jr., PhD Center for."— Presentation transcript:
1 Applied Structural Equation Modeling for Dummies, by Dummies February 22, 2013 Indiana University, BloomingtonJoseph J. Sudano, Jr., PhDCenter for Health Care Research and PolicyCase Western Reserve University at The MetroHealth SystemAdam T. Perzynski, PhD
6 Thanks So Much!! Acknowledgements: Bill Pridemore PhD Adam Perzynski PhDDavid W. Baker MDRandy Cebul MDFred Wolinsky PhDNo conflicts of interest (but I wish there were some major financial ones!)
7 Presentation Outline Conceptual overview. What is SEM? Basic idea underpinning SEMMajor applicationsShared characteristics among SEM techniquesTerms, nomenclature, symbols, vocabularyBasic SEM exampleSample size, other issues and model fitSoftware and texts
8 What Is Structural Equation Modeling? SEM: very general, very powerful multivariate technique.Specialized versions of other analysis methods.Major applications of SEM:Causal modeling or path analysis.Confirmatory factor analysis (CFA).Second order factor analysis.Covariance structure models.Correlation structure models.SEM is a very general and very powerful multivariate technique that includes specialized versions of other analysis methods as special cases. I’m going to assume that you have some basic understanding of statistical concepts such as standard deviation, variance, covariance and correlations, and analytic techniques such as ordinary least squares, because SEM is basically a specialized version or versions of other analysis methods that you are probably aware of.Causal modeling or path analysis, which hypothesizes causal relationships among variables and test the causal models with linear equation systems. These models can involve either observed variables, latent variables, or both.Confirmatory factor analysis, an extension of factor analysis in which specific hypotheses about the structure of the factor loadings and intercorrelations are tested.Second order factor analysis, an extension of factor analysis in which the correlation matrix of the common factors is itself factor analyzed to provide second order factors. Classic example is the SF-36.Covariance structure models, which hypothesizes that a covariance matrix has a particular form. For example, you can test the hypothesis that a set of variables all have equal variances with this procedure. Classic example is testing for measurement invariance across language groups or sex in personality inventories or subjective measures of health and wellbeing.Correlation structure models, which hypothesizes that a correlation matrix has a particular form. A classic example is the hypothesis that the correlation matrix has the structure of a circumplex (or circular ordering), for example circumplex models of personality and emotions or the circumplex model of marital and family systems.
9 Advantages of SEM Compared to Multiple Regression More flexible modelingUses CFA to correct for measurement errorAttractive graphical modeling interfaceTesting models overall vs. individual coefficientsMore flexible assumptions (particularly allowing interpretation in the face of multicollinearity).Uses confirmatory factor analysis (CFA) to reduce measurement error by having multiple indicators per latent variable.Most software packages have these nifty graphical interfaces that can save models in gif tif and other formats to insert into documents.SEM allows one to test entire models and to test them overall, versus focusing on individual coefficients.
10 What are it’s Advantages? Test models with multiple dependent variablesAbility to model mediating variablesAbility to model error termsOther advantages include the ability to test models with multiple dependent variables and the ability to model mediating variables and error terms as well.
11 What are it’s Advantages? Test coefficients across multiple between-subjects groupsAbility to handle difficult dataLongitudinal with auto-correlated errorMulti-level dataNon-normal dataIncomplete dataTest coefficients across multiple between subjects groups.Simple Example: Is the effect of education on income the same for blacks and whites?Ability to handle difficult data:Longitudinal with auto-correlated errorMulti-level dataNon-normal dataIncomplete data
12 Shared Characteristics of SEM Methods SEM is a prioriThink in terms of models and hypothesesForces the investigator to provide lots of informationwhich variables affect othersdirectionality of effect
13 Shared Characteristics of SEM Methods SEM allows distinctions between observed and latent variablesBasic statistic in SEM in the covarianceNot just for non-experimental dataView many standard statistical procedures as special cases of SEMStatistical significance less important than for more standard techniques
14 Terms, Nomenclature, Symbols, and Vocabulary (Not Necessarily in That Order) Variance = s2Standard deviation = sCorrelation = rCovariance = sXY = COV(X,Y)Disturbance = DX Y DMeasurement error = e or EA X EBut before we move on to some examples we need to get acquainted and re-acquainted with some terms and symbols, the vocabulary of SEM if you will.We have all the basic statistical notions and terms like variance, standard deviation and correlation, but of particular importance to SEM is the notion of covariance and the covariance matrix.Covariance is used in most of the statistical packages as the basic data for structural equation modeling.There is a distinction usually made in SEM regarding residual variances for variables in equations, where disturbance or D is the variance in Y unexplained by a variable X presumed to affect it. Measurement error is slightly different. First X is an observed variable that is presumed to measure A, a latent variable; E is variance in X unexplained by A.
15 Terms, Nomenclature, Symbols, and Vocabulary Experimental researchindependent and dependent variables.Non-experimental researchpredictor and criterion variablesIn their most basic form, experimental and non-experimental studies concern the relationship between two variables. In experimental studies we usually use the terms independent and dependent, where the independent var is manipulated by the researcher and the effect on the dependent variable is observed. In non-experimental studies, many researchers tend to use the terms predictor and criterion instead of (respectively) independent and dependent although I use them all with almost no distinction between the two.Somewhat different names for SEM...we typically use the terms observed or manifest indicators indicated by a square or rectangle and latent variables or factors indicated by a circle or ovalObserved (or manifest)Latent (or factors)
16 Terms, Nomenclature, Symbols, and Vocabulary ExogenousEndogenousDirect effectsReciprocal effectsCorrelation or covariance“of external origin”Outside the model“of internal origin”Inside the model
17 Terms, Nomenclature, Symbols, and Vocabulary Measurement modelThat part of a SEM model dealing with latent variables and indicators.Structural modelContrasted with the aboveSet of exogenous and endogenous variables in the model with arrows and disturbance termsThe measurement model is that part (possibly all) of a SEM model which deals with the latent variables and their indicators. A pure measurement model is a confirmatory factor analysis (CFA) model in which there is unmeasured covariance (two-headed arrows) between each possible pair of latent variables, there are straight arrows from the latent variables to their respective indicators, there are straight arrows from the error and disturbance terms to their respective variables, but there are no direct effects (straight arrows) connecting the latent variables. The measurement model is evaluated like any other SEM model, using fit indices or measures. Some folks suggest there is no point in proceeding to the structural model until one is satisfied the measurement model is valid.The structural model may be contrasted with the measurement model. It is the set of exogenous and endogenous variables in the model, together with the direct effects (straight arrows) connecting them, and the disturbance and error terms for these variables (reflecting the effects of unmeasured variables not in the model)
18 Measurement Model: Confirmatory Factor Analysis Observed or manifest variablesGHQHostilityHopelessnessSelf-rated healthPsychosocialhealthD1e4e3e2e1The measurement model is that part (possibly all) of a SEM model which deals with the latent variables and their indicators. A pure measurement model is a confirmatory factor analysis (CFA) model in which there is unmeasured covariance (two-headed arrows) between each possible pair of latent variables, there are straight arrows from the latent variables to their respective indicators, there are straight arrows from the error and disturbance terms to their respective variables, but there are no direct effects (straight arrows) connecting the latent variables. The measurement model is evaluated like any other SEM model, using goodness of fit measures. There is no point in proceeding to the structural model until one is satisfied the measurement model is valid.Latent construct or factorSingh-Manoux, Clark and Marmot Multiple measures of socio-economic position and psychosocial health: proximal and distal measures.
19 Structural Model with Additional Variables Observed or manifest variablesGHQHostilityHopelessnessIncomeOccupationEducationSelf-rated healthPsychosocialhealthD1e4e3e2e1In traditional regression models we might bundle the 4 observed variables over on the right side into a composite variable “psychosocial health”…we would not have the ability to indicate how educ et al were related to one another...and we would assume perfect measurement of psychosocial health…in SEM we don’t do that.The structural model may be contrasted with the measurement model. It is the set of exogenous and endogenous variables in the model, together with the direct effects (straight arrows) connecting them, and the disturbance and error terms for these variables (reflecting the effects of unmeasured variables not in the model)Latent construct or factorSingh-Manoux, Clark and Marmot Multiple measures of socio-economic position and psychosocial health: proximal and distal measures.
20 Causal Modeling or Path Analysis and Confirmatory Factor Analysis EducationHostilitye1a= direct effectb+c=indirectHopelessnesse2PsychosocialhealthcIncomeGHQe3D1D3Here’s an example of direct and indirect effects, and how the graphical expression of multiple regression in the previous slide now gives way to a more web-like “causal model” where temporal relationships are implied, and the notion of mediation or mediating variables is put forth using a combination of path analysis and confirmatory factor analysis.Self-rated healthe4OccupationD2Singh-Manoux, Clark and Marmot Multiple measures of socio-economic position and psychosocial health: proximal and distal measures.
21 What Sample Size is Enough for SEM? The same as for regression*More is pretty much always betterSome fit indexes are sensitive to small samples*Unless you do things that are fancy!
22 What’s a Good Model? Fit measures: Chi-square test CFI (Comparative Fit Index)RMSE (Root Mean Square Error)TLI (Tucker Lewis Index)GFI (Goodness of Fit Index)And many, many, many moreIFI, NFI, AIC, CIAC, BIC, BCC
23 How Many Indicators Do I Need? That depends…How many do you have? (e.g., secondary data analysis)A prior concernsScale development standardsSubject burdenMore is often NOT better
24 Software LISREL 9.1 from SSI (Scientific Software International) IBM’s SPSS AmosEQS (Multivariate Software)Mplus (Linda and Bengt Muthen)CALIS (module from SAS)Stata’s new sem moduleR (lavaan and sem modules)
27 Texts (and a reference) Barbara M. Byrne (2012): Structural Equation Modeling with Mplus, Routledge PressShe also has an earlier work using AmosRex Kline (2010): Principles and Practice of Structural Equation Modeling, Guilford PressNiels Blunch (2012): Introduction to Structural Equation Modeling Using IBM SPSS Statistics and Amos, Sage PublicationsJames L. Arbuckle (2012): IBM SPSS Amos 21 User’s Guide, IBM Corporation (free from the Web)Rick H. Hoyle (2012): Handbook of Structural Equation Modeling, Guilford PressGreat fit index site: