Structural Equation Models: The General Case STA431: Spring 2013.

Slides:



Advertisements
Similar presentations
STA305 Spring 2014 This started with excerpts from STA2101f13
Advertisements

SEM PURPOSE Model phenomena from observed or theoretical stances
Structural Equation Modeling Using Mplus Chongming Yang Research Support Center FHSS College.
General Structural Equation (LISREL) Models
Structural Equation Modeling
Brief introduction on Logistic Regression
Statistical Techniques I EXST7005 Multiple Regression.
Structural Equation Modeling: An Overview P. Paxton.
Introduction to Regression with Measurement Error STA431: Spring 2015.
Exploratory Factor Analysis
Linear Regression and Binary Variables The independent variable does not necessarily need to be continuous. If the independent variable is binary (e.g.,
Instrumental Variables Estimation and Two Stage Least Square
Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.
Logistic Regression STA302 F 2014 See last slide for copyright information 1.
Structural Equation Modeling
Regression Analysis. Unscheduled Maintenance Issue: l 36 flight squadrons l Each experiences unscheduled maintenance actions (UMAs) l UMAs costs $1000.
Chapter 4 Multiple Regression.
The Simple Regression Model
Structural Equation Modeling
1 Simple Linear Regression Chapter Introduction In this chapter we examine the relationship among interval variables via a mathematical equation.
Linear and generalised linear models
Linear and generalised linear models
Basics of regression analysis
Linear and generalised linear models Purpose of linear models Least-squares solution for linear models Analysis of diagnostics Exponential family and generalised.
Introduction to Regression with Measurement Error STA431: Spring 2013.
Correlation and Regression Analysis
G Lect 31 G Lecture 3 SEM Model notation Review of mediation Estimating SEM models Moderation.
Structural Equation Modeling Intro to SEM Psy 524 Ainsworth.
Review for Final Exam Some important themes from Chapters 9-11 Final exam covers these chapters, but implicitly tests the entire course, because we use.
Structural Equation Modeling Continued: Lecture 2 Psy 524 Ainsworth.
Structural Equation Modeling 3 Psy 524 Andrew Ainsworth.
Normal Linear Model STA211/442 Fall Suggested Reading Faraway’s Linear Models with R, just start reading. Davison’s Statistical Models, Chapter.
Double Measurement Regression STA431: Spring 2013.
Maximum Likelihood See Davison Ch. 4 for background and a more thorough discussion. Sometimes.
Introduction to Regression with Measurement Error STA302: Fall/Winter 2013.
BPS - 3rd Ed. Chapter 211 Inference for Regression.
What is the MPC?. Learning Objectives 1.Use linear regression to establish the relationship between two variables 2.Show that the line is the line of.
Slide 10.1 Structural Equation Models MathematicalMarketing Chapter 10 Structural Equation Models In This Chapter We Will Cover The theme of this chapter.
Correlation is a statistical technique that describes the degree of relationship between two variables when you have bivariate data. A bivariate distribution.
Factorial ANOVA More than one categorical explanatory variable STA305 Spring 2014.
SEM: Basics Byrne Chapter 1 Tabachnick SEM
Logistic Regression STA2101/442 F 2014 See last slide for copyright information.
Random Regressors and Moment Based Estimation Prepared by Vera Tabakova, East Carolina University.
By: Amani Albraikan.  Pearson r  Spearman rho  Linearity  Range restrictions  Outliers  Beware of spurious correlations….take care in interpretation.
LECTURE 14 TUESDAY, 13 OCTOBER STA 291 Fall
Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.
Confirmatory Factor Analysis Part Two STA431: Spring 2013.
Categorical Independent Variables STA302 Fall 2013.
SEM: Basics Byrne Chapter 1 Tabachnick SEM
STA302: Regression Analysis. Statistics Objective: To draw reasonable conclusions from noisy numerical data Entry point: Study relationships between variables.
SEM Basics 2 Byrne Chapter 2 Kline pg 7-15, 50-51, ,
ANOVA, Regression and Multiple Regression March
Categorical Independent Variables STA302 Fall 2015.
STA302: Regression Analysis. Statistics Objective: To draw reasonable conclusions from noisy numerical data Entry point: Study relationships between variables.
Lecturer’s desk Physics- atmospheric Sciences (PAS) - Room 201 s c r e e n Row A Row B Row C Row D Row E Row F Row G Row H Row A
The Sample Variation Method A way to select sample size This slide show is a free open source document. See the last slide for copyright information.
Logistic Regression For a binary response variable: 1=Yes, 0=No This slide show is a free open source document. See the last slide for copyright information.
MathematicalMarketing Slide 5.1 OLS Chapter 5: Ordinary Least Square Regression We will be discussing  The Linear Regression Model  Estimation of the.
BPS - 5th Ed. Chapter 231 Inference for Regression.
AP PHYSICS 1 SUMMER PACKET Table of Contents 1.What is Physics? 2.Scientific Method 3.Mathematics and Physics 4.Standards of Measurement 5.Metric System.
Stats Methods at IC Lecture 3: Regression.
Exploratory Factor Analysis
Logistic Regression STA2101/442 F 2017
Structural Equation Models: The General Case
Interactions and Factorial ANOVA
Logistic Regression Part One
Exploratory Factor Analysis. Factor Analysis: The Measurement Model D1D1 D8D8 D7D7 D6D6 D5D5 D4D4 D3D3 D2D2 F1F1 F2F2.
Structural Equation Modeling
Presentation transcript:

Structural Equation Models: The General Case STA431: Spring 2013

An Extension of Multiple Regression More than one regression-like equation Includes latent variables Variables can be explanatory in one equation and response in another Modest changes in notation Vocabulary Path diagrams No intercepts, all expected values zero Serious modeling (compared to ordinary statistical models) Parameter identifiability

Variables can be response in one equation and explanatory in another Variables (IQ = Intelligence Quotient): – X 1 = Mother’s adult IQ – X 2 = Father’s adult IQ – Y 1 = Person’s adult IQ – Y 2 = Child’s IQ in Grade 8 Of course all these variables are measured with error. We will lose the intercepts very soon.

Modest changes in notation Regression coefficients are now called gamma instead of beta Betas are used for links between Y variables Intercepts are alphas but they will soon disappear. Especially when model equations are written in scalar form, we feel free to drop the subscript i; implicitly, everything is independent and identically distributed for i = 1, …, n.

Strange Vocabulary Variables can be Latent or Manifest. – Manifest means observable – All error terms are latent Variables can be Exogenous or Endogenous – Exogenous variables appear only on the right side of the = sign. Think “X” for explanatory variable. All error terms are exogenous – Endogenous variables appear on the left of at least one = sign. Think “end” of an arrow pointing from exogenous to endogenous Betas link endogenous variables to other endogenous variables.

Path diagrams

Path Diagram Rules Latent variables are enclosed by ovals. Observable (manifest) variables are enclosed by rectangles. Error terms are not enclosed – Sometimes the arrows from the error terms seem to come from nowhere. The symbol for the error term does not appear in the path diagram. – Sometimes there are no arrows for the error terms at all. It is just assumed that such an arrow points to each endogenous variable. Straight, single-headed arrows point from each variable on the right side of an equation to the endogenous variable on the left side. – Sometimes the coefficient is written on the arrow, but sometimes it is not. A curved, double-headed arrow between two variables (always exogenous variables) means they have a non-zero covariance. – Sometimes the symbol for the covariance is written on the curved arrow, but sometimes it is not.

Causal Modeling (cause and effect) The arrows deliberately imply that if A  B, we are saying A contributes to B, or partly causes it. There may be other contributing variables. All the ones that are unknown are lumped together in the error term. It is a leap of faith to assume that these unknown variables are independent of the variables in the model. This same leap of faith is made in ordinary regression. Usually, we must live with it or go home.

But Correlation is not the same as causation! ABA B A B C Young smokers who buy contraband cigarettes tend to smoke more.

Confounding variable: A variable that contributes to both the explanatory variable and the response variable, causing a misleading relationship between them. A B C

Mozart Effect Babies who listen to classical music tend to do better in school later on. Does this mean parents should play classical music for their babies? Please comment. (What is one possible confounding variable?)

Experimental vs. Observational studies Observational: explanatory variable, response variable just observed and recorded Experimental: Cases randomly assigned to values of explanatory variable Only a true experimental study can establish a causal connection between explanatory variable and response variable

Structural equation models are mostly applied to observational data The correlation-causation issue is a logical problem, and no statistical technique can make it go away. So you (or the scientists you are helping) have to be able to defend the what-causes-what aspects of the model on other grounds. Parents’ IQ contributes to your IQ and your IQ contributes to your kid’s IQ. This is reasonable. It certainly does not go in the opposite direction.

Models of Cause and Effect This is about the interpretation (and use) of structural equation models. Strictly speaking it is not a statistical issue and you don’t have to think this way. However, … If you object to modeling cause and effect, structural equation modelers will challenge you. They will point out that regression models are structural equation models. Why do you put some variables on the left of the equals sign and not others? – You want to predict them. – It makes more sense that they are caused by the explanatory variables, compared to the other way around. If you want pure prediction, use standard tools. But if you want to discuss why a regression coefficient is positive or negative, you are assuming the explanatory variables in some way contribute to the response variable.

Serious Modeling Once you accept that model equations are statements about what contributes to what, you realize that structural equation models represent a rough theory of the data, with some parts (the parameter values) unknown. They are somewhere between ordinary statistical models, which are like one-size-fits-all clothing, and true scientific models, which are like tailor made clothing. So they are very flexible and potentially valuable. It is good to combine what the data can tell you with what you already know. But structural equation models can require a lot of input and careful thought to construct. In this course, we will get by mostly on common sense. In general, the parameters of the most reasonable model need not be identifiable. It depends upon the form of the data as well as on the model. Identifiability needs to be checked. Frequently, this can be done by inspection.

Example: Halo Effects in Real Estate

Losing the intercepts and expected values Mostly, the intercepts and expected values are not identifiable anyway, as in multiple regression with measurement error. We have a chance to identify a function of the parameter vector – the parameters that appear in the covariance matrix Σ = V(D). Re-parameterize. The new parameter vector is the set of parameters in Σ, and also μ = E(D). Estimate μ with x-bar, forget it, and concentrate on inference for the parameters in Σ. To make calculation of the covariance matrix easier, write the model equations with zero expected values and no intercepts. The answer is also correct for non- zero intercepts and expected values, by the centering rule.

From this point on the models have no means and no intercepts. Now more examples

Multiple Regression X1X1 X2X2 Y

Regression with measurement error

A Path Model with Measurement Error

A Factor Analysis Model X1X1 X2X2 X3X3 X4X4 X5X5 General Intelligence e1e1 e2e2 e3e3 e4e4 e5e5

A Longitudinal Model P1P1 M1M1 P2P2 P3P3 P4P4 M2M2 M3M3 M4M4

Estimation and Testing as Before X Y1Y1 Y2Y2

Distribution of the data

Maximum Likelihood Minimize the “Objective Function”

Tests Z tests for H 0 : Parameter = 0 are produced by default “Chi-square” = (n-1) * Final value of objective function is the standard test for goodness of fit. Multiply by n instead of n-1 to get a true likelihood ratio test. Consider two nested models. One is more constrained (restricted) than the other. Then n * the difference in final objective functions is the large-sample likelihood ratio test, df = number of (linear) restrictions on the parameter. Other tests (for example Wald tests) are possible too.

A General Two-Stage Model

More Details

Recall the example

Observable variables in the latent variable model (fairly common) These present no problem Let P(e j =0) = 1, so Var(e j ) = 0 And Cov(e i,e j )=0 because if P(e j =0) = 1 So in the covariance matrix Ω=V(e), just set ω ij = ω ji = 0, i=1,…,k

What should you be able to do? Given a path diagram, write the model equations and say which exogenous variables are correlated with each other. Given the model equations and information about which exogenous variables are correlated with each other, draw the path diagram. Given either piece of information, write the model in matrix form and say what all the matrices are. Calculate model covariance matrices Check identifiability

Recall the notation

For the latent variable model, calculate Φ = V(F) So,

For the measurement model, calculate Σ = V(D)

Two-stage Proofs of Identifiability Show the parameters of the measurement model (Λ, Φ, Ω) can be recovered from Σ = V(D). Show the parameters of the latent variable model (β, Γ, Φ 11, Ψ) can be recovered from Φ = V(F). This means all the parameters can be recovered from Σ. Break a big problem into two smaller ones. Develop rules for checking identifiability at each stage.

Copyright Information This slide show was prepared by Jerry Brunner, Department of Statistics, University of Toronto. It is licensed under a Creative Commons Attribution - ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These Powerpoint slides are available from the course website: