Structural Equation Models An Overview

As with any regression model, structural equation models are causal:
- X → Y (X, the exogenous variable, causes Y, the endogenous variable)
- A more complex variant would involve simultaneous causation (X causes Y and Y causes X at the same time)
- As with any regression model, the model is expressed in the form of equations: Y = b X + e

SEM models usually involve continuous variables, or at least quantitative variables that are conceptually continuous:
- dummy variables can be handled, but only in a very limited way
- a regression model is a simple form of structural equation model
- a factor analysis model is a form of structural equation model too
- more complex SEMs put together features of both
- with the ability to simultaneously estimate parameters in multiple groups, SEMs can also subsume ANOVA models

Structural Equation Models
Models can be expressed as path diagrams. The path diagram for a regression model with X2 dependent and X1 independent is X1 → X2. Actually, we need to add the error term (e1 → X2) to the model to make the diagram complete.

The model parameters in this simple model are:
- b1, more familiar as the regression coefficient connecting X1 with X2
- the estimated variance of the error term
- the estimated variance of X1, which in this case is the same as the observed variance of X1

There are 3 empirical pieces of information from which we can estimate these 3 parameters:
- the variance of X1
- the variance of X2
- the covariance of X1 & X2
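The three parameters can be solved directly from the three observed moments. The following is a minimal sketch; the function name and the numeric moments are hypothetical, chosen only for illustration.

```python
def simple_regression_from_moments(var_x1, var_x2, cov_x1x2):
    """Solve the model X2 = b1*X1 + e1 from three observed moments.

    Three knowns (two variances, one covariance) give three unknowns:
    b1, the variance of X1, and the variance of the error term e1.
    """
    b1 = cov_x1x2 / var_x1                  # regression coefficient
    var_e1 = var_x2 - b1 ** 2 * var_x1      # residual (error) variance
    return {"b1": b1, "var_X1": var_x1, "var_e1": var_e1}


# Hypothetical moments, for illustration only
params = simple_regression_from_moments(var_x1=4.0, var_x2=9.0, cov_x1x2=3.0)
print(params)
```

Because there are exactly as many parameters as pieces of information, the model reproduces the observed variances and covariance exactly.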

The equation in this model is: X2 = b1 X1 + e1

No intercept?
- Structural equation models generally involve mean-centered variables, so there is no intercept in the equations
- only in more complicated “mean models” will we worry about the intercept
- we will cover mean/moment models later in the course
- in most regression models, the intercept is of less interest than the slope parameters (e.g., we want to know that as a person’s age increases by 1 year, he/she will watch 3 more minutes of television, but we don’t care so much that the “expected” amount of TV viewing at age 0 is 30 minutes [likely incorrect anyway])

A more complex model

Equations:
X4 = b1*X1 + b2*X2 + e4
X5 = b3*X1 + b4*X2 + b5*X3 + e5

This model assumes that the path from X3 to X4 is 0.

The previous model assumed that the covariance between e4 and e5 is 0. This model relaxes that assumption. The same applies to the correlations (covariances) among X1, X2 and X3.

In this model:
- X1, X2 and X3 are exogenous (independent)
- X4 and X5 are endogenous (dependent)
- The error terms, e4 and e5, are technically exogenous too.

This model has 3 equations.
- X2 and X3 are endogenous, but in path analysis terms we think of them as intervening variables. (Psychologists tend to use the term “mediators” instead of, or in addition to, “intervening variables”.)
- If the variables are all standardized, the effect of X1 on X4 (total effect) is: (b1*b2) + (b3*b4)

This model is similar to the previous one, but the path involving b5 is added:
- Testable assumption: b5 = 0 (test of this model vs. the previous model)
- b5 represents the direct effect of X1 on X4
- b1*b2 and b3*b4 are indirect effects
- Total effect = (b1*b2) + (b3*b4) + b5
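The decomposition into direct, indirect and total effects is simple arithmetic on the path coefficients. The sketch below assumes the labelling b1: X1 → X2, b2: X2 → X4, b3: X1 → X3, b4: X3 → X4, b5: X1 → X4; this labelling is inferred from the products above because the diagram is not reproduced here, and the coefficient values are hypothetical.

```python
def decompose_effects(b1, b2, b3, b4, b5=0.0):
    """Direct, indirect and total effects of X1 on X4.

    Assumed path labels (the diagram is not available):
    b1: X1->X2, b2: X2->X4, b3: X1->X3, b4: X3->X4, b5: X1->X4.
    """
    indirect_via_x2 = b1 * b2
    indirect_via_x3 = b3 * b4
    direct = b5
    total = indirect_via_x2 + indirect_via_x3 + direct
    return {
        "indirect_via_X2": indirect_via_x2,
        "indirect_via_X3": indirect_via_x3,
        "direct": direct,
        "total": total,
    }


# Hypothetical standardized coefficients
effects = decompose_effects(b1=0.5, b2=0.4, b3=0.3, b4=0.2, b5=0.1)
print(effects)
```

Setting b5 to 0 (the default) gives the total effect for the model without the direct path.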

Model parameters:
- b1, b2, b3, b4, b5
- Also a type of model parameter: all variances and covariances among exogenous variables
- Here, X1 is exogenous but X2, X3 and X4 are not. e2, e3 and e4 are exogenous.

There are 4 observed variables: X1, X2, X3, X4.
- Let S be the matrix of observed covariances among these variables.
- The empirical covariance matrix, S, has 10 distinct elements (4 variances and 6 covariances among the X-variables).
- The reproduced covariance matrix (Σ) is an estimate of S based on the model parameters. It can be calculated from the model parameters.
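One way to see how Σ follows from the parameters is the standard result for path models: Σ = (I - B)^-1 Ψ (I - B)^-T, where B holds the path coefficients and Ψ the variances and covariances of the exogenous variables. The sketch below is an illustration under assumptions: the path labelling (b1: X1 → X2, b2: X2 → X4, b3: X1 → X3, b4: X3 → X4, b5: X1 → X4) is inferred because the diagram is missing, and all numeric values are hypothetical.

```python
import numpy as np

# Hypothetical parameter values (not estimates from any data set)
b1, b2, b3, b4, b5 = 0.5, 0.4, 0.3, 0.2, 0.1
var_x1, var_e2, var_e3, var_e4 = 1.0, 0.75, 0.91, 0.70

# B[i, j] = coefficient of variable j in the equation for variable i,
# with variables ordered X1, X2, X3, X4 (assumed path labelling).
B = np.zeros((4, 4))
B[1, 0] = b1   # X1 -> X2
B[2, 0] = b3   # X1 -> X3
B[3, 0] = b5   # X1 -> X4 (direct effect)
B[3, 1] = b2   # X2 -> X4
B[3, 2] = b4   # X3 -> X4

# Variances of the exogenous variables: X1 and the errors e2, e3, e4
psi = np.diag([var_x1, var_e2, var_e3, var_e4])

# Reproduced (model-implied) covariance matrix
inv = np.linalg.inv(np.eye(4) - B)
sigma = inv @ psi @ inv.T

# Number of distinct elements in a 4 x 4 covariance matrix
n_distinct = len(np.triu_indices(4)[0])

print(np.round(sigma, 3))
print("distinct elements:", n_distinct)
```

With var(X1) = 1, the implied covariance between X1 and X4 equals the total effect (b1*b2) + (b3*b4) + b5. Estimation amounts to choosing parameter values that bring Σ as close as possible to S.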

A non-recursive model

We usually deal with recursive models, but non-recursive models can be handled too. (Not all of them, though: the model shown here is under-identified, which means its parameters are not uniquely estimable.)

Manifest and Latent variables
In this course, we concentrate on Structural Equation Models involving LATENT VARIABLES.

Properties of latent variables:
- Latent variables are not directly measured
- LVs can be said to represent underlying “constructs”
- LVs have some relationship (hopefully linear) with indicators (manifest variables)
- The relationship rarely involves perfect correlation

Synonyms:
- Latent variable: construct, unobserved variable, factor
- Manifest variable: indicator, item, observed variable

(An error term is, technically, a type of latent variable.)

Fundamental insight that motivates much of what is done in the LV SEM world: we can rarely measure without error.

Related:
- Measurement error is serious stuff (major consequences for parameter estimation)
- There are many different sources of measurement error and these are generally not random
- Bad enough if it’s random, but non-random measurement error biases parameter estimates obtained by “conventional” means
- Obtaining multiple measures (multiple indicators) helps (think of it as “triangulation”)

Multiple measurement

Example: how happy a child is. Possible measures:
- child care worker #1 rates the child
- child care worker #2 rates the child
- child asked to show how happy by piling building blocks
- video tape number of times child smiles

Each of these measures is fallible (indeed, can be totally wrong in particular cases), though we expect the measurements to be correlated.

LATENT VS. MANIFEST VARIABLES: DIAGRAMMING

Latent1 is a “latent variable”: it is not directly measured. In factor analysis, this would be a “factor”.

Diagrammatically:
- circle = latent variable
- square = manifest variable

(Error terms are sometimes shown enclosed in a circle, sometimes just labeled but not enclosed by a circle.)

In factor analysis, Latent1 would be a “factor” with 4 indicators. The model has four measurement equations:
X1 = b1*Latent1 + e1
X2 = b2*Latent1 + e2
X3 = b3*Latent1 + e3
X4 = b4*Latent1 + e4

A model with 2 latent variables:
- In this model, the 2 latent variables are correlated; this is indicated by the curved line with a “double headed” arrow.
- In factor analysis, this would be a two-factor model.
- This model has 6 equations.

Structural Equation Models (Confirmatory Factor Analysis)
Equations:
X1 = b1*Latent1 + e1
X2 = b2*Latent1 + e2
X3 = b3*Latent1 + e3
X4 = b4*Latent2 + e4
X5 = b5*Latent2 + e5
X6 = b6*Latent2 + e6

There is a correlation between X4 and X1, but it is expressed through the parameters b4, b1 and the covariance between Latent1 and Latent2.

The previous model is an example of simple structure. It is possible to add parameters (in this case, Latent2 → X3). The equation becomes:
X3 = b3*Latent1 + b7*Latent2 + e3
In factor analysis, we’d call item X3 “factorially complex”.

This model has 6 manifest variables (X1 through X6). The covariance matrix S represents the empirically observed covariances among these 6 variables.

This model has 8 exogenous variables: e1, e2, e3, e4, e5, e6, Latent1 and Latent2.

We may model covariances among exogenous variables (curved arrow) but not among endogenous variables. [Why? Algebraically, we can always express the latter as a function of the former + regression coefficients.]

Model parameters in this model:
- 6 regression coefficients (b1 through b6)
- Variances and covariances among the exogenous variables (variances of e1, e2, e3, e4, e5, e6, variance of Latent1, variance of Latent2 AND the covariance between Latent1 and Latent2)
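For a factor model like this one, the reproduced covariance matrix can be written as Σ = Λ Φ Λ' + Θ, where Λ holds the regression coefficients (loadings), Φ the variances and covariance of the latent variables, and Θ the error variances. The sketch below uses hypothetical values throughout, with the latent variances set to 1 purely for convenience.

```python
import numpy as np

# Loadings b1..b6 (hypothetical): X1-X3 load on Latent1, X4-X6 on Latent2
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.5],
    [0.0, 0.4],
])

# Variances of Latent1 and Latent2 (diagonal) and their covariance (off-diagonal)
phi = np.array([
    [1.0, 0.5],
    [0.5, 1.0],
])

# Error variances of e1..e6, assumed mutually uncorrelated
theta = np.diag([0.36, 0.51, 0.64, 0.19, 0.75, 0.84])

# Model-implied covariance matrix of X1..X6
sigma = loadings @ phi @ loadings.T + theta

# Covariance between X1 and X4: b1 * cov(Latent1, Latent2) * b4
cov_x1_x4 = sigma[0, 3]
print(round(cov_x1_x4, 4))
```

The X1, X4 entry illustrates the earlier point: no arrow connects the two indicators directly, yet they covary through b1, b4 and the covariance between the latent variables.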

Manifest variable variances and covariances
The “building blocks” of structural equation models
- As is the case with regression models, we can estimate most SEM models without the raw data: we just need the variances and covariances** and sometimes the means

** well, at least until we get to models for non-normal data or models for missing data!

Models discussed here are primarily for continuous variables (X-variables and Y-variables).
- Latent variables are conceptually continuous.
- Models are based on covariances of observed variables: COV(X,Y) = Σ (Xi)(Yi) / (N-1), where Xi and Yi are the mean-centred values of X and Y (X minus the mean of X, Y minus the mean of Y)

In regression: b* = (COVxx)^-1 COVxy, where b* is the vector of b’s without the intercept, COVxx is the covariance matrix of the X-variables, and COVxy is the vector of covariances between the X-variables and Y.
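As a check on the idea that covariances are enough, the following sketch computes the covariance matrix from mean-centred data and then recovers the regression slopes from the covariances alone. The data are simulated with arbitrary, hypothetical coefficients; nothing here comes from a real data set.

```python
import numpy as np

# Simulated data (hypothetical coefficients 1.5 and -0.5, intercept 2.0)
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# Covariance matrix from mean-centred values: sum(Xi * Yi) / (N - 1)
data = np.column_stack([X, y])
centered = data - data.mean(axis=0)
S = centered.T @ centered / (n - 1)

# Slopes from covariances only: b* = inverse(COVxx) @ COVxy
S_xx = S[:2, :2]
s_xy = S[:2, 2]
b_star = np.linalg.solve(S_xx, s_xy)

# Ordinary least squares on the raw data, with an intercept, for comparison
design = np.column_stack([np.ones(n), X])
ols = np.linalg.lstsq(design, y, rcond=None)[0]

print("slopes from covariances:", np.round(b_star, 4))
print("OLS slopes from raw data:", np.round(ols[1:], 4))
```

The two sets of slopes agree; only the intercept is lost when working from covariances.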

What we lose when we work with covariances:
1. Means and intercepts (not serious: we can easily bring these back in later)
2. Think about OLS assumptions (discuss):
- Non-linearities (some are readily transformable, no problem(!), but some are not)
- Interactions (a type of non-linearity)
- Residuals (detection of outliers, etc.)
- Form of distribution (skewed? kurtotic?)

Measurement Error, and its relationship to SEM models
Regular regression assumes that X1 and X2 are measured without error. Here, X1 and X2 are imperfect indicators of L1 and L2 respectively.

- Imagine X1 correlated .80 with L1 and X2 correlated .80 with L2.
- If the real correlation between L1 and L2 is .50, the observed correlation between X1 and X2 will only be .50 x .64 = .32 (where .64 = .80 x .80).
- This is sometimes referred to as attenuation.
- SEM MODELS WITH LATENT VARIABLES CORRECT FOR ATTENUATION
- The price: we usually need 3 indicators per latent variable to solve the equations (can sometimes get away with 2)
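The attenuation arithmetic, and its reversal, can be written out as follows. This is a sketch for the two-variable case only, assuming random measurement error; the function names are made up for illustration.

```python
def observed_correlation(true_r, r_x1_l1, r_x2_l2):
    """Observed correlation between X1 and X2, given the true correlation
    between L1 and L2 and each indicator's correlation with its latent
    variable (assumes measurement errors are random and uncorrelated)."""
    return r_x1_l1 * true_r * r_x2_l2


def disattenuated_correlation(observed_r, r_x1_l1, r_x2_l2):
    """Reverse the attenuation: recover the latent correlation."""
    return observed_r / (r_x1_l1 * r_x2_l2)


r_obs = observed_correlation(true_r=0.50, r_x1_l1=0.80, r_x2_l2=0.80)
r_true = disattenuated_correlation(r_obs, 0.80, 0.80)
print(round(r_obs, 2), round(r_true, 2))
```

A latent variable model performs this kind of correction implicitly, estimating the indicator-to-latent relationships from the multiple indicators rather than assuming them.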

- Sadly, in more complex models with multiple LVs, parameter coefficients aren’t just downward biased
- It could be that a coefficient is actually higher than it should be (“all bets are off”)
- We need models that will adjust for measurement error (!), which is what SEM models will do for us

Models with Causal Relationships among Latent Variables
Extension involving causal relationships among LVs:
- Factor analysis: Latent1 and Latent2 are both exogenous
- Here: Latent1 is exogenous, Latent2 is endogenous
- Error term for Latent2: d2

Equations:

1. Measurement equations:
X1 = 1*Latent1 + e1
X2 = b2*Latent1 + e2
X3 = b3*Latent1 + e3
X4 = b4*Latent2 + e4
X5 = b5*Latent2 + e5
X6 = 1*Latent2 + e6

2. Structural equations among latent variables:
Latent2 = b1*Latent1 + d2
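A quick count of information versus free parameters for these equations is sketched below. It assumes that all error terms are mutually uncorrelated and that the loadings written as 1 are fixed rather than estimated; the helper name is hypothetical.

```python
def degrees_of_freedom(n_manifest, n_free_parameters):
    """Distinct variances and covariances minus free parameters."""
    n_moments = n_manifest * (n_manifest + 1) // 2
    return n_moments - n_free_parameters


# Free parameters implied by the equations above
# (assumption: no covariances among e1..e6 or between these and d2)
free_parameters = {
    "loadings": 4,          # b2, b3, b4, b5 (loadings of X1 and X6 fixed to 1)
    "error_variances": 6,   # variances of e1..e6
    "structural_paths": 1,  # b1: Latent1 -> Latent2
    "var_Latent1": 1,       # variance of the exogenous latent variable
    "var_d2": 1,            # variance of the error term of Latent2
}

n_free = sum(free_parameters.values())
df = degrees_of_freedom(n_manifest=6, n_free_parameters=n_free)
print("free parameters:", n_free, "degrees of freedom:", df)
```

Fixing one loading per latent variable to 1 gives each latent variable a scale; without such a constraint the loadings and the latent variances could not be estimated separately.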

Special Cases

SEM models are ideally suited for models:
- where all of the variables are perfectly normally distributed (and, by implication, conceptually continuous),
- where we have multiple indicators for each variable,
- where relationships are all linear.

What about situations where this is not the case?

We will spend a lot of time in the course discussing the “limits” and how these are dealt with. The following is a very cursory and simplified summary.

1. What if I don’t have multiple indicators for all of my variables?
- Single-indicator variables can be included in models, but we must make stronger assumptions about error (e.g., “measured without error”, or assume a given % of error and further assume it is random)
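One common way to “assume a given % of error” for a single indicator is to fix its error variance at (1 - reliability) times its observed variance. The sketch below illustrates that convention; the function name and the numbers are hypothetical, and the reliability value is something the researcher must justify, not something the model estimates.

```python
def fixed_error_variance(observed_variance, assumed_reliability):
    """Error variance to fix for a single-indicator latent variable.

    assumed_reliability is the proportion of observed variance treated as
    true-score variance; 1.0 corresponds to "measured without error".
    """
    if not 0.0 < assumed_reliability <= 1.0:
        raise ValueError("assumed_reliability must be in (0, 1]")
    return (1.0 - assumed_reliability) * observed_variance


# Hypothetical indicator with variance 4.0 and an assumed 15% random error
error_var = fixed_error_variance(observed_variance=4.0, assumed_reliability=0.85)
no_error = fixed_error_variance(observed_variance=4.0, assumed_reliability=1.0)
print(round(error_var, 2), no_error)
```

The error variance is then entered as a fixed value rather than a free parameter, so conclusions depend on how defensible the assumed reliability is.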

2. Can I use dummy variables?
- As totally exogenous variables, yes (interestingly, texts tend not to provide examples, discuss interpretation issues, etc.)
- As endogenous variables, generally no **

3. What if my variables are measured on 4-point or 5-point scales instead of being continuously distributed?
- There is a variety of approaches to dealing with “coarsely categorized” data, provided the variables included in the model are conceptually continuous

** though we will discuss latent class and “mixture” models late in the course

4. What about interaction models?
- Though not impossible, these are extremely difficult
- Exception: where one of the X-variables involved in the interaction is categorical and the data can be “grouped” (e.g., interaction between country and education with dependent variable religiosity: could model this as a “multiple group” problem, with Group 1 = USA, Group 2 = Britain, etc.)

5. I have a model with an N of Can I run an SEM model on it?
- Generally, no. For virtually all SEM models, the minimum N is in the range. Larger sample sizes may be required for non-normal data models.

6. A quantitative methodologist in my department told me not to even think about SEM models because they assume perfectly normally distributed data and in real life we rarely see this.
- This critique is “old” and predates the development of new approaches to deal with non-normality
- SEM models are fairly robust to departures from normality anyway

7. A colleague told me that LISREL represents the absolute height of abstracted empiricism. The method gives us a false sense of security around the precision of estimates when we’d be far better off with “rough and dirty” estimates from a simple set of OLS equations.
- Interestingly, LISREL is implicitly realist and not empiricist in epistemological orientation; technically, an empiricist would say, “if you can’t measure it, it doesn’t exist”, and latent variables are by definition variables that you can’t measure (directly).
- The fact that parameter estimates may have wide-ranging sources of imperfection should not prevent us from seeking to reduce bias as much as possible. Clearly, an unbiased estimate is better than a biased estimate.
- Whether the researcher chooses to present estimates as “highly precise” or otherwise is a different issue.

8. The problem with LISREL is that it is too easy to mess up without us knowing that our model is based on incorrect assumptions.
- This is not a reason to abandon the technique, but rather a reason to learn how to use it properly.
- We will spend time in class discussing the problem of the estimation of models that make no sense (with appropriate examples from the literature!)

A few words about SEM software
- Generally expensive (typically $700US for academic versions)
- Sometimes available as part of site licenses:
  - Somewhat restricted SEM software is built into SAS as the CALIS procedure
  - Some university campus site licenses for SPSS contain the AMOS “module” (but many do not)

The Software for SEM models
In most cases, a covariance matrix must be generated. Usually, an SEM program will do this, but sometimes it is necessary to generate the matrix with other software, such as SAS (PROC CORR) or SPSS (Correlations). Even if the program does this internally, it is the “first step”.
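When other software reports only correlations and standard deviations, the covariance matrix can be rebuilt from them, since each covariance is the correlation multiplied by the two standard deviations. The sketch below uses NumPy rather than SAS or SPSS, and the function name and input values are hypothetical.

```python
import numpy as np


def correlations_to_covariances(corr, sds):
    """Rebuild a covariance matrix from a correlation matrix and a vector
    of standard deviations: cov[i, j] = corr[i, j] * sd[i] * sd[j]."""
    corr = np.asarray(corr, dtype=float)
    sds = np.asarray(sds, dtype=float)
    return corr * np.outer(sds, sds)


# Hypothetical summary statistics for two variables
R = [[1.0, 0.5],
     [0.5, 1.0]]
sd = [2.0, 3.0]

S = correlations_to_covariances(R, sd)
print(S)
```

With raw data in hand, `np.cov(data, rowvar=False)` produces the same kind of matrix directly, using the N-1 denominator.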

- SAS: PROC CALIS
- SPSS: no built-in program, but AMOS is sold as an “add on”. AMOS can read SPSS files.
- LISREL can read files of many types, including SPSS and SAS.
- Other programs: EQS, MPlus

The Software for SEM models: AMOS
- AMOS works with a graphic interface. Draw the model of interest, insert variable names connected to an SPSS dataset, then “attach” this dataset.
- Intuitively appealing
- Limitation: a nightmare with very large models, which clutter the screen and are hard to follow

The Software for SEM models: The SAS CALIS procedure
- Strong programming similarities with EQS
- Some programming similarities with the “SIMPLIS” version of LISREL
- Basically, we need to:
  1. Write out equations a) linking manifest to latent variables and b) linking latent variables to other latent variables
  2. Identify exogenous variable variances and covariances as parameters

The Software for SEM models: The LISREL program
- LISREL’s basic programming form is matrix
- A bit more difficult to get used to, but very powerful once mastered
- LISREL also has a scalar (equation-based) facility called SIMPLIS
- This course makes more use of LISREL than other software (though in the first week we will use AMOS, which is a good learning tool)

The Software for SEM models: EQS
- EQS’s basic programming form is scalar
- Some matrix-style specification possible
- Basic form: write out equations, specify variances and covariances of exogenous variables
- An option in this course (will be discussed, briefly, if there is class interest)
- Program most commonly used in Psychology

The Software for SEM models: Other Software
- MPlus (nice generalizations to latent class, mixture models, etc.); we will try to present some MPlus examples in the class
- Mx (free distribution): matrix form, user interface more difficult
- EZPath

Last slide

Tomorrow’s class:
- Translating diagrams to equations and vice versa
- Working with AMOS
- Specifying model parameters
- Covariance algebra for SEM models (scalar form)
