Soc 3306a: Path Analysis
Using Multiple Regression and Path Analysis to Model Causality
Causality Criteria: association (correlation); non-spuriousness; time order; theory (implied).
Causation: Correlational data alone cannot establish causation, but evidence for it can be found in: 1. the strength of the partial relationships (the bivariate relationship does not disappear when controlling for another variable); 2. the assumed time order (derived from theory).
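Point 1 can be checked with a first-order partial correlation. The sketch below assumes the standard textbook formula (it is not stated on the slide) and uses hypothetical correlations. When the x-y correlation is fully accounted for by z, the partial correlation drops to zero, which suggests a spurious relationship. When z is unrelated to both, the partial equals the bivariate correlation.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical correlations chosen so that z fully explains the x-y
# association: the partial correlation is (essentially) zero.
spurious = partial_correlation(r_xy=0.30, r_xz=0.60, r_yz=0.50)

# Hypothetical case where z is unrelated to x and y: the partial
# correlation is the same as the bivariate one.
unchanged = partial_correlation(r_xy=0.30, r_xz=0.0, r_yz=0.0)

print(f"spurious case:  {spurious:.3f}")
print(f"unchanged case: {unchanged:.3f}")
```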
Path Analysis: Path analysis can be used to test causal hypotheses through a series of bivariate and multivariate regressions. Note that you are only finding evidence for causality, not proving it. The standardized coefficients (the beta weights) can be used to determine the strengths of the direct and indirect relationships in a multivariate model. The central question: is the variability in the DV stochastic (chance), or can it be explained by systematic components (correctly specified IVs)?
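Here is a minimal sketch, with hypothetical toy data, of how a beta weight relates to an unstandardized slope. Multiplying b by s_x / s_y gives the same value as the slope obtained after converting both variables to z-scores. The function names are illustrative and not taken from any statistics package.

```python
import statistics


def ols_slope(x, y):
    """Least-squares slope of y regressed on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical toy data: years of education and income in $1000s.
educ = [10, 12, 12, 14, 16, 18]
income = [28, 35, 31, 42, 50, 61]

b = ols_slope(educ, income)  # unstandardized slope
beta = b * statistics.stdev(educ) / statistics.stdev(income)  # beta weight
beta_from_z = ols_slope(z_scores(educ), z_scores(income))  # same value

print(f"b = {b:.3f}, beta = {beta:.3f}, beta from z-scores = {beta_from_z:.3f}")
```

With a single predictor, the beta weight also equals the Pearson correlation. With several predictors, each beta is a partial (controlled) standardized effect.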
STEP 1: Specify a model derived from theory and a set of hypotheses. Example: the model predicts that the variation in the dependent variable SEI can be explained by four independent variables: SEX, EDUC, INCOME, and AGE. In other words, it hypothesizes a causal relationship to explain SEI.
[Figure: Hypothetical Model for SEI. Path diagram linking SEX, AGE, EDUC, and INC to SEI, with the variables grouped into exogenous and endogenous variables.]
STEP 2: Test the bivariate correlations to determine which relationships are statistically significant. The initial correlation matrix showed that SEX was not significantly associated with any of the other variables except INCOME, and that relationship was very weak and negative, so SEX was dropped from the model. Note: bivariate scatterplots showed that all relationships were linear, and histograms and skewness statistics were within normal limits.
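The significance test behind Step 2 can be sketched as follows. This assumes the usual t test for a Pearson correlation with n - 2 degrees of freedom. The data are hypothetical toy values, not the course data. In practice SPSS reports the significance level directly. By hand, t would be compared against a critical value, which is about 1.96 for large samples (two-tailed, α = .05).

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (df = n - 2) for testing H0: population correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical toy data, not the SEI data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f} on {len(x) - 2} df")
```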
[Figure: Revised Hypothetical Model for SEI. Path diagram linking AGE, EDUC, and INC to SEI, with the variables grouped into exogenous and endogenous variables.]
Figure 1: Revised Bivariate Correlations. Examine the correlations between SEI and the IVs. There is a moderately strong, positive relationship between SEI and EDUC, a weak-to-moderate relationship with INCOME, and a very weak, non-significant one with AGE. Look also at the correlations among the IVs: strong correlations between IVs (> .700) can indicate multicollinearity. No problems were observed in this model.
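The multicollinearity screen described above amounts to scanning the off-diagonal correlations among the IVs for absolute values above .700. The sketch below uses a hypothetical correlation matrix, because the actual Figure 1 values are not reproduced in this transcript. The function name is illustrative.

```python
from itertools import combinations


def flag_multicollinearity(corr, names, threshold=0.700):
    """Return the IV pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        if abs(corr[i][j]) > threshold:
            flagged.append((names[i], names[j], corr[i][j]))
    return flagged


# Hypothetical correlation matrix among the IVs (not the Figure 1 values).
names = ["AGE", "EDUC", "INC"]
corr = [
    [1.00, -0.10, 0.20],
    [-0.10, 1.00, 0.30],
    [0.20, 0.30, 1.00],
]
print(flag_multicollinearity(corr, names))  # -> [] (no pair exceeds .700)
```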
STEP 3: Find the Path Coefficients. The path coefficients for the direct and indirect relationships are the standardized slopes, or beta weights. To find them, a series of multiple regression models is estimated, one for each endogenous variable.
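For reference, these are the standard relationships between the beta weights, the correlations, and R² in a regression with k predictors. They are textbook results, not taken from the slides. R_xx is the correlation matrix among the IVs, r_xy is the vector of IV-DV correlations, and b_j is the unstandardized slope of predictor j.

```latex
\boldsymbol{\beta} = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{xy},
\qquad
R^2 = \mathbf{r}_{xy}^{\top}\boldsymbol{\beta} = \sum_{j=1}^{k} \beta_j\, r_{x_j y},
\qquad
\beta_j = b_j\,\frac{s_{x_j}}{s_y}
```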
Testing of Models
Model 1: SEI = AGE + EDUC + INC + e (e = error, or unexplained variance)
Model 2: INC = AGE + EDUC + e
Model 3: EDUC = AGE + e
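Each endogenous variable gets its own regression, so all three models can be estimated from a single correlation matrix. The sketch below assumes standardized variables and solves R_xx β = r_xy with plain Gaussian elimination. It uses a hypothetical correlation matrix and illustrates the mechanics only; it does not reproduce the SEI results. The helper names `solve` and `path_coefficients` are illustrative.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def path_coefficients(corr, names, dv, ivs):
    """Beta weights and R^2 for regressing dv on ivs, given a correlation matrix."""
    idx = {name: i for i, name in enumerate(names)}
    r_xx = [[corr[idx[a]][idx[b]] for b in ivs] for a in ivs]
    r_xy = [corr[idx[a]][idx[dv]] for a in ivs]
    betas = solve(r_xx, r_xy)
    r_squared = sum(beta * r for beta, r in zip(betas, r_xy))
    return dict(zip(ivs, betas)), r_squared


# Hypothetical correlation matrix for illustration only (not the course data).
names = ["AGE", "EDUC", "INC", "SEI"]
corr = [
    [1.00, -0.10, 0.15, 0.02],
    [-0.10, 1.00, 0.25, 0.60],
    [0.15, 0.25, 1.00, 0.35],
    [0.02, 0.60, 0.35, 1.00],
]

# The three recursive models from the slide.
models = [
    ("SEI", ["AGE", "EDUC", "INC"]),  # Model 1
    ("INC", ["AGE", "EDUC"]),         # Model 2
    ("EDUC", ["AGE"]),                # Model 3
]
for dv, ivs in models:
    betas, r2 = path_coefficients(corr, names, dv, ivs)
    print(dv, {k: round(v, 3) for k, v in betas.items()}, "R^2 =", round(r2, 3))
```

In SPSS, the same betas appear in the "Standardized Coefficients" column when each model is run on the raw data.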
Figure 1: Model 1. This is a full multiple regression model that regresses SEI on all IVs. Examine the scatterplots for linearity and homoscedasticity. Interpret the model: is it significant? Interpret R (the multiple correlation coefficient) and adjusted R² (the coefficient of determination, adjusted for the number of IVs). Interpret the slopes, the betas, and their significance. Check the partial correlations. Add the betas to the model diagram.
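Adjusted R² corrects R² for the number of IVs relative to the sample size. The sketch below uses the standard formula. The sample size for the SEI data is not given on the slides, so the numbers in the example are hypothetical.

```python
import math


def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n cases."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


def multiple_r(r_squared):
    """Multiple correlation coefficient R, the square root of R^2."""
    return math.sqrt(r_squared)


# Hypothetical values: R^2 = .40 from 3 IVs and 1000 cases.
example = adjusted_r_squared(0.40, n=1000, k=3)
print(f"R = {multiple_r(0.40):.3f}, adjusted R^2 = {example:.3f}")
```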
Figure 2: Model 2. Next, calculate the remaining relationships (betas) in the model: regress INC on EDUC and AGE, and add the betas to the path diagram.
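With two predictors, the beta weights for Model 2 have a closed form in terms of the three bivariate correlations. This is a standard result, shown here for illustration:

```latex
\beta_{\text{EDUC}} = \frac{r_{\text{INC,EDUC}} - r_{\text{INC,AGE}}\, r_{\text{EDUC,AGE}}}{1 - r_{\text{EDUC,AGE}}^{2}},
\qquad
\beta_{\text{AGE}} = \frac{r_{\text{INC,AGE}} - r_{\text{INC,EDUC}}\, r_{\text{EDUC,AGE}}}{1 - r_{\text{EDUC,AGE}}^{2}}
```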
Figure 3: Model 3. Regress EDUC on AGE and, again, add the beta to the path diagram.
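With a single predictor, the standardized slope reduces to the bivariate correlation. The AGE->EDUC path can therefore be read directly from the correlation matrix (standardized variables assumed):

```latex
\text{EDUC} = \beta_{\text{AGE}}\,\text{AGE} + e,
\qquad
\beta_{\text{AGE}} = r_{\text{EDUC,AGE}},
\qquad
R^2 = r_{\text{EDUC,AGE}}^{2}
```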
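[Figure: Causal Model for SEI. Path diagram of AGE, EDUC, INC, and SEI, grouped into exogenous and endogenous variables. Standardized path coefficients: EDUC->SEI = .561***, EDUC->INC = .226***, INC->SEI = .175***; the three paths from AGE are labeled .049 (ns), .182***, and -.071**.]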
Causal Effects of EDUC and INC. Causal effect of EDUC: indirect, EDUC->INC->SEI = .226 x .175 = .040; direct, EDUC->SEI = .561; total causal effect = indirect + direct = .040 + .561 = .601. Causal effect of INC: direct, INC->SEI = .175; total causal effect = .175 (INC has no indirect path to SEI).
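The decomposition above is simple arithmetic on the path coefficients. An indirect effect is the product of the coefficients along the path, and the total effect is the sum of the direct and indirect effects. Here is a sketch using the three coefficients given on this slide:

```python
# Standardized path coefficients for the EDUC and INC paths (from the slide).
paths = {
    ("EDUC", "INC"): 0.226,
    ("INC", "SEI"): 0.175,
    ("EDUC", "SEI"): 0.561,
}

# Indirect effect: multiply the coefficients along EDUC->INC->SEI.
indirect_educ = paths[("EDUC", "INC")] * paths[("INC", "SEI")]
direct_educ = paths[("EDUC", "SEI")]
total_educ = direct_educ + indirect_educ

# INC reaches SEI only directly, so its total effect is its direct effect.
total_inc = paths[("INC", "SEI")]

print(f"EDUC: indirect = {indirect_educ:.3f}, direct = {direct_educ:.3f}, total = {total_educ:.3f}")
print(f"INC:  total = {total_inc:.3f}")
```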
Issues Related to Path Analysis: Path analysis is very sensitive to model specification. Failing to include relevant causal variables, or including irrelevant ones, can substantially affect the path coefficients (example: the inclusion of AGE in the model above). Build your model one variable at a time: in SPSS, enter the IVs in successive blocks and request R² change under Statistics. Test for a significant change in R² at each step, until new additions no longer significantly increase the model's explanatory value. However, this will not solve the problem of irrelevant IVs (i.e., when your model is overidentified).
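The test for a significant increase in R² when a block of IVs is added is usually an F test on the R² change (SPSS labels it F Change). The sketch below uses the standard formula. The R² values and sample size in the example are hypothetical.

```python
def f_change(r2_reduced, r2_full, k_reduced, k_full, n):
    """F statistic for the increase in R^2 from a reduced to a full model.

    Degrees of freedom: (k_full - k_reduced, n - k_full - 1).
    """
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)


# Hypothetical: adding a third IV raises R^2 from .36 to .40 with 500 cases.
F = f_change(r2_reduced=0.36, r2_full=0.40, k_reduced=2, k_full=3, n=500)
print(f"F change = {F:.2f} on (1, 496) df")
```

Compare the result with the critical F for (df1, df2). Keep adding blocks only while this test remains significant.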
SEM (Structural Equation Modeling): To avoid overidentification, the best strategy is to also examine alternative explanatory models. One technique for doing this is structural equation modeling (SEM). It uses specialized software (e.g., SPSS’s AMOS program) and can test several models simultaneously. Although we will not cover SEM in class, it is something to keep in mind for future model building.
Comment on SEI Model (above): The model shown above had an adjusted R² of .396. Overall, INC, EDUC, and AGE explained 39.6% of the variation in SEI. However, the unexplained variance (error, the stochastic component) was 1 - .396 = .604, so 60.4% of the variation in SEI remains unexplained. Furthermore, the total causal effect of AGE was only .038. Specification error: this model is underidentified. Could we drop AGE and consider other important IVs (e.g., CLASS, OCCUPATIONAL PRESTIGE)? See Figure 4: Revised Model Using CLASS.
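The .038 figure can be reproduced by tracing every path from AGE to SEI and summing the products of the coefficients along each one. As transcribed, the diagram does not pair its three AGE coefficients with specific paths. This sketch therefore assumes AGE->EDUC = -.071, AGE->INC = .182, and AGE->SEI = .049, an assignment that reproduces the slide's .038. The recursive tracing works only for recursive (acyclic) models like this one.

```python
# Standardized path coefficients. ASSUMPTION: the pairing of the three AGE
# coefficients with the AGE->EDUC, AGE->INC, and AGE->SEI paths is inferred,
# not stated in the transcript; the other three are given on the slides.
paths = {
    ("AGE", "EDUC"): -0.071,
    ("AGE", "INC"): 0.182,
    ("AGE", "SEI"): 0.049,
    ("EDUC", "INC"): 0.226,
    ("EDUC", "SEI"): 0.561,
    ("INC", "SEI"): 0.175,
}


def total_effect(source, target, paths):
    """Sum, over every directed path from source to target, the product of its coefficients."""
    if source == target:
        return 1.0
    return sum(
        coef * total_effect(head, target, paths)
        for (tail, head), coef in paths.items()
        if tail == source
    )


age_total = total_effect("AGE", "SEI", paths)
unexplained = 1 - 0.396  # 1 - adjusted R^2 of Model 1

print(f"total effect of AGE on SEI = {age_total:.3f}")
print(f"unexplained variance = {unexplained:.3f}")
```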