EPSY 651: Structural Equation Modeling I. Where does SEM fit in Quantitative Methodology? Draws on three traditions in mathematics and science: Psychology.

Presentation transcript:

EPSY 651: Structural Equation Modeling I

Where does SEM fit in Quantitative Methodology?
Draws on three traditions in mathematics and science:
–Psychology (Spearman, Kelley, Thurstone, Cronbach, etc.)
–Sociology (Wright)
–Agriculture and statistics (Pearson, Fisher, Neyman, Rao, etc.)
Largely due to Jöreskog in the 1960s & 1970s.
Map below shows its positioning.

MANIFEST MODELING
–Classical statistics within the parametric tradition
–Canonical analysis subsumes most methods as special cases

LATENT MODELING
–Psychological concept of “FACTOR” is central to latent modeling: unobserved directly but “indicated” through observed variables
–Emphasis on error as individual differences as well as problem of observation (measurement) rather than “lack of fit” conception in manifest modeling

STRUCTURAL EQUATION MODELING PURPOSES
MODEL real world phenomena in social sciences with respect to
–POPULATIONS
–ECOLOGIES
–TIME

SEM PROCEDURE
FOCUS ON DECOMPOSITION OF COVARIANCE MATRIX:
$\Sigma_{xy} = \Sigma(\mu_x, \mu_y, \sigma^2_x, \sigma^2_y, \sigma_{xy}) + \Sigma(e_x, e_y, e_{xy})$
$x = \cdots + \cdots$
$y = By + \Gamma x + e$

TESTING in SEM
–SEM tests A PRIORI (theoretically specified) MODELS
–SEM has potential to consider model revisions
–SEM is not necessarily good for exploratory modeling

SEM COMPARISONS
–SEM can COMPARE Ecologies or Populations for identical models, or
–Simultaneously compare multiple groups or ecologies with each having unique models
–Statistical testing is available for all parts of all models as well as overall model fit

CORRELATION

Karl Pearson (excerpted from E. S. Pearson, Karl Pearson: An Appreciation of Some Aspects of His Life and Work, Cambridge University Press, 1938).

Pearson Correlation
$r_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n-1)}{s_x s_y} = \dfrac{s_{xy}}{s_x s_y} = \dfrac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n-1}$
= COVARIANCE / [SD(x) SD(y)]
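The Pearson formula can be checked numerically. The sketch below is a minimal illustration, not part of the original slides; the function names and the five data pairs are hypothetical. It computes r both as covariance over the product of standard deviations and as the sum of z-score products over n - 1, and the two routes should agree.

```python
import math

def covariance(x, y):
    """Sample covariance s_xy with the n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """r = s_xy / (s_x * s_y)."""
    s_x = math.sqrt(covariance(x, x))
    s_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)

def pearson_r_from_z(x, y):
    """r as the sum of z-score products divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_x = math.sqrt(covariance(x, x))
    s_y = math.sqrt(covariance(y, y))
    z_products = [((a - mean_x) / s_x) * ((b - mean_y) / s_y) for a, b in zip(x, y)]
    return sum(z_products) / (n - 1)

# Hypothetical data, chosen only to illustrate the arithmetic.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = covariance(x, y)          # unstandardized: 1.5
r = pearson_r(x, y)              # standardized: about .775
r_z = pearson_r_from_z(x, y)
print(round(s_xy, 3), round(r, 3), round(r_z, 3))
```

The covariance keeps the original units of x and y, which is why the slides call it an "unstandardized correlation".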

COVARIANCE
–DEFINED AS CO-VARIATION: $COV_{xy} = s_{xy}$
–“UNSTANDARDIZED CORRELATION”
–Distribution is statistically workable
–Basis of Structural Equation Modeling (SEM) is constructing models for covariances of variables

[Figure 3.4: Path model representation of correlation between SAT Math scores and Calculus Grades. Path SAT Math → Calc Grade: correlation .364 (covariance 40). Path error → Calc Grade: $\sqrt{1 - r^2} = .932$ (.955), where $s_e$ = standard deviation of errors.]

Path Models
–path coefficient: standardized coefficient next to arrow, covariance in parentheses
–error coefficient: the correlation between the errors, or discrepancies between observed and predicted Calc Grade scores, and the observed Calc Grade scores
–Predicted(Calc Grade) = SAT-Math + .5
–errors are sometimes called disturbances
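As a quick check on the path model, the error coefficient can be recovered from the path coefficient alone. This is a minimal sketch assuming the standardized two-variable case, where the error path equals the square root of 1 - r²; the variable names are illustrative, and r = .364 is the SAT Math to Calc Grade coefficient shown in Figure 3.4.

```python
import math

r = 0.364  # standardized path coefficient from Figure 3.4

explained = r ** 2                      # proportion of Calc Grade variance explained
error_path = math.sqrt(1 - r ** 2)      # standardized error (disturbance) coefficient

print(round(explained, 3), round(error_path, 3))
```

The squared path coefficient and the squared error coefficient sum to 1, which is the standardized variance of the outcome.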

[Figure 3.2: Path model representations of correlation (panels a, b, c, each relating X and Y; panel c includes an error term e)]

BIVARIATE DATA
2 VARIABLES
QUESTION: DO THEY COVARY?
–IF SO, HOW DO WE INTERPRET?
–IF NOT, IS THERE A THIRD INTERVENING (MEDIATING) VARIABLE OR EXOGENOUS VARIABLE THAT SUPPRESSES THE RELATIONSHIP, OR MODERATES THE RELATIONSHIP?

IDEALIZED SCATTERPLOT: POSITIVE RELATIONSHIP
[Figure: Y plotted against X with prediction line]

IDEALIZED SCATTERPLOT: NEGATIVE RELATIONSHIP
[Figure: Y plotted against X with prediction line and 95% confidence interval around the prediction]

IDEALIZED SCATTERPLOT: NO RELATIONSHIP
[Figure: Y plotted against X with prediction line]

SUPPRESSED SCATTERPLOT: NO APPARENT RELATIONSHIP
[Figure: Y plotted against X with separate prediction lines for MALES and FEMALES]

MODERATION AND SUPPRESSION IN A SCATTERPLOT: NO APPARENT RELATIONSHIP
[Figure: Y plotted against X with separate prediction lines for MALES and FEMALES]

IDEALIZED SCATTERPLOT: POSITIVE CURVILINEAR RELATIONSHIP
[Figure: Y plotted against X with linear prediction line and quadratic prediction line]

Hypotheses about Correlations

–One sample tests for Pearson r
–Two sample tests for Pearson r
–Multisample test for Pearson r
Assumptions: normality of x, y being correlated

One Sample Test for Pearson r
Null hypothesis: $\rho = 0$, Alternate: $\rho \neq 0$
test statistic: $t = r / [(1 - r^2)/(n-2)]^{1/2}$ with degrees of freedom = n-2
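The t statistic is simple enough to compute directly. The sketch below is illustrative only; the function name is hypothetical, and the inputs r = .784 and n = 76 are the values reported in the kindergarten reading example. It returns t and its degrees of freedom and leaves the p-value lookup to a t table or a statistics package.

```python
import math

def t_for_r(r, n):
    """t statistic and degrees of freedom for H0: rho = 0."""
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)
    return t, df

# Values from the kindergarten reading example (r = .784, N = 76).
t_stat, df = t_for_r(0.784, 76)
print(round(t_stat, 2), df)
```

With 74 degrees of freedom, a t this large is well beyond the usual critical values, which agrees with the significance level SPSS reports for that correlation.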

One Sample Test for Pearson r ex.
Descriptive Statistics for Kindergarteners on a Reading Test (from SPSS): Mean, Std. Deviation, and N for Naming letters and Overall reading

Correlations
                                    Naming letters   Overall reading
Naming letters    Pearson r         1.000            .784**
                  Sig. (1-tailed)   .                .000
                  N                 76               76
Overall reading   Pearson r         .784**           1.000
                  Sig. (1-tailed)   .000             .
                  N                 76               76
** Correlation is significant at the 0.01 level (1-tailed).

One Sample Test for Pearson r
Null hypothesis: $\rho = c$, Alternate: $\rho \neq c$
test statistic: $z = (Z_r - Z_c) / [1/(n-3)]^{1/2}$
where z = normal statistic, $Z_r$ = Fisher Z transform

Fisher’s Z transform
$Z_r = \tanh^{-1} r = (1/2) \ln[(1 + r)/(1 - r)]$
This creates a new variable with mean $Z_\rho$ and SD $\sqrt{1/(n-3)}$ which is normally distributed

Non-null r example
Null: $\rho$(girls) = .784, Alternate: $\rho$(girls) $\neq$ .784
Data: r = .845, n = 35
$Z_\rho$(girls = .784) = 1.055, $Z_r$(girls = .845) = 1.238
$z = (1.238 - 1.055) / [1/(35-3)]^{1/2} = .183 / (1/5.657) = 1.035$, nonsig.
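The non-null test can be reproduced with the Fisher Z transform, which Python exposes as `math.atanh`. This is a minimal sketch with a hypothetical function name, using the girls' values from the example (null value .784, r = .845, n = 35).

```python
import math

def z_test_r_against_value(r, rho_null, n):
    """Normal z for H0: rho = rho_null, using the Fisher Z transform."""
    z_r = math.atanh(r)           # Fisher Z of the sample correlation
    z_c = math.atanh(rho_null)    # Fisher Z of the hypothesized value
    se = math.sqrt(1 / (n - 3))   # SD of Fisher Z
    return (z_r - z_c) / se

z_girls = z_test_r_against_value(r=0.845, rho_null=0.784, n=35)
print(round(math.atanh(0.784), 3), round(math.atanh(0.845), 3), round(z_girls, 2))
```

Carried at full precision, z comes out at about 1.03. The slide's 1.035 reflects rounding of the intermediate values, and the conclusion (nonsignificant) is unchanged.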

Two Sample Test for Difference in Pearson r’s
Null hypothesis: $\rho_1 = \rho_2$
Alternate hypothesis: $\rho_1 \neq \rho_2$
test statistic: $z = (Z_{r_1} - Z_{r_2}) / [1/(n_1 - 3) + 1/(n_2 - 3)]^{1/2}$
where z = normal statistic

Example
Null hypothesis: $\rho_{girls} = \rho_{boys}$
Alternate hypothesis: $\rho_{girls} \neq \rho_{boys}$
Data: $r_{girls}$ = .845, $r_{boys}$ = .717, $n_{girls}$ = 35, $n_{boys}$ = 41
test statistic: $z = [Z(.845) - Z(.717)] / [1/(35-3) + 1/(41-3)]^{1/2} = (1.238 - .901) / [1/32 + 1/38]^{1/2} = .337 / .240 = 1.405$, nonsig.
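The two-sample comparison follows the same pattern. The sketch below uses a hypothetical function name and the girls' and boys' values from the example. It is only an illustration of the arithmetic, not a full testing routine.

```python
import math

def z_test_two_correlations(r1, n1, r2, n2):
    """Normal z for H0: rho1 = rho2 from two independent samples."""
    difference = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return difference / se

z_diff = z_test_two_correlations(r1=0.845, n1=35, r2=0.717, n2=41)
print(round(z_diff, 2))
```

The result is about 1.40, below the two-tailed .05 critical value of 1.96, which matches the slide's "nonsig." conclusion.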

Multisample test for Pearson r
Three or more samples:
Null hypothesis: $\rho_1 = \rho_2 = \rho_3$ etc.
Alternate hypothesis: some $\rho_i \neq \rho_j$
Test statistic: $\chi^2 = \sum w_i Z_i^2 - w_{.} Z_w^2$, which is chi-square distributed with #groups - 1 degrees of freedom, and $w_i = n_i - 3$, $w_{.} = \sum w_i$, and $Z_w = \sum w_i Z_i / w_{.}$
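A short sketch of the multisample statistic follows. The function name is hypothetical, and the third group (r = .60, n = 50) is invented purely for illustration; only the girls' and boys' values come from the slides. With exactly two groups the chi-square should equal the square of the two-sample z, which gives a convenient sanity check.

```python
import math

def multisample_r_test(rs, ns):
    """Chi-square test that several independent correlations are equal."""
    zs = [math.atanh(r) for r in rs]     # Fisher Z for each group
    ws = [n - 3 for n in ns]             # weights w_i = n_i - 3
    w_total = sum(ws)                    # w. = sum of weights
    z_w = sum(w * z for w, z in zip(ws, zs)) / w_total   # weighted mean Z
    chi2 = sum(w * z * z for w, z in zip(ws, zs)) - w_total * z_w ** 2
    df = len(rs) - 1
    return chi2, df

# Two groups from the slides: girls and boys.
chi2_two, df_two = multisample_r_test([0.845, 0.717], [35, 41])

# Adding a hypothetical third group (not from the slides).
chi2_three, df_three = multisample_r_test([0.845, 0.717, 0.60], [35, 41, 50])

print(round(chi2_two, 3), df_two, df_three)
```

The resulting chi-square is compared against the chi-square distribution with the returned degrees of freedom.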

Example Multisample test for Pearson r Nonsig.

Multiple Group Models of Correlation
SEM approach models several groups with either the SAME or Different correlations.
[Figure: X ↔ y path diagrams for boys and for girls, with $\rho_{xy} = a$]

Multigroup SEM
–SEM Analysis produces chi-square test of goodness of fit (lack of fit) for the hypothesis about ALL groups at once
–Other indices: Comparative Fit Index (CFI), Normed Fit Index (NFI), Root Mean Square Error of Approximation (RMSEA)
–CFI, NFI > .95 means good fit
–RMSEA < .06 means good fit

Multigroup SEM
–SEM assumes large sample size, multinormality of all variables
–Robust as long as skewness and kurtosis are less than ±3, sample size is probably > 100 per group (200 is better), or few parameters are being estimated (sample size as low as 70 per group may be OK with good distribution characteristics)

Multiple regression analysis

The test of the overall hypothesis that y is unrelated to all predictors, equivalent to
$H_0: \rho^2_{y \cdot 123\ldots} = 0$
$H_1: \rho^2_{y \cdot 123\ldots} \neq 0$
is tested by
$F = [R^2_{y \cdot 123\ldots} / p] / [(1 - R^2_{y \cdot 123\ldots}) / (n - p - 1)]$
$F = [SS_{reg} / p] / [SS_e / (n - p - 1)]$
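The overall F can be computed from R² alone. In this minimal sketch the function name and the sample size n = 100 are assumptions made for illustration; R² = .60 with three predictors echoes the depression example, but the slides do not state its n.

```python
def overall_f(r_squared, n, p):
    """F statistic and degrees of freedom for H0: population R^2 = 0."""
    df_reg = p
    df_res = n - p - 1
    f = (r_squared / df_reg) / ((1 - r_squared) / df_res)
    return f, df_reg, df_res

# R^2 = .60 with 3 predictors; n = 100 is a hypothetical sample size.
f_stat, df_reg, df_res = overall_f(r_squared=0.60, n=100, p=3)
print(round(f_stat, 2), df_reg, df_res)
```

Because $R^2 = SS_{reg} / SS_y$, this form and the sums-of-squares form give the same F.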

Multiple regression analysis
SOURCE          df       Sum of Squares   Mean Square        F
x1, x2, …       p        SS_reg           SS_reg / p         (SS_reg / p) / (SS_e / (n-p-1))
e (residual)    n-p-1    SS_e             SS_e / (n-p-1)
total           n-1      SS_y             SS_y / (n-1)

Multiple regression analysis predicting Depression from LOCUS OF CONTROL, SELF-ESTEEM, SELF-RELIANCE

[Fig. 8.4: Venn diagram for multiple regression with two predictors and one outcome measure. Regions: SS x1, SS x2, SS reg, SSy, SSe]

[Fig. 8.5: Type I contributions. Regions: Type I SS x1, Type III SS x2, SSy, SSe]

[Fig. 8.6: Type III unique contributions. Regions: Type III SS x1, Type III SS x2, SSy, SSe]

Multiple Regression ANOVA table
SOURCE    df     Sum of Squares   Mean Square      F (Type I)
Model     2      SS_reg           SS_reg / 2       (SS_reg / 2) / (SS_e / (n-3))
x1        1      SS_x1            SS_x1 / 1        (SS_x1 / 1) / (SS_e / (n-3))
x2        1      SS_x2·x1         SS_x2·x1 / 1     (SS_x2·x1 / 1) / (SS_e / (n-3))
e         n-3    SS_e             SS_e / (n-3)
total     n-1    SS_y             SS_y / (n-1)

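PATH DIAGRAM FOR REGRESSION
[Figure: X1 → Y with $\beta$ = .5, X2 → Y with $\beta$ = .6, X1 ↔ X2 with r = .4, error e → Y]
$R^2 = [.74^2 + .8^2 - 2(.74)(.8)(.4)] / (1 - .4^2) = .85$, where .74 and .8 are the correlations of Y with X1 and X2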
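The numbers in the path diagram can be traced with a few lines of arithmetic. This sketch assumes standardized variables and uses illustrative variable names: it derives the correlations of Y with each predictor from the two betas (.5 and .6) and the predictor correlation (.4), then computes R² two ways.

```python
import math

beta1, beta2 = 0.5, 0.6   # standardized path coefficients from the diagram
r12 = 0.4                 # correlation between X1 and X2

# Correlation of Y with each predictor: direct path plus path through the other predictor.
r_y1 = beta1 + beta2 * r12
r_y2 = beta2 + beta1 * r12

# R^2 as the sum of beta times correlation.
r2_from_betas = beta1 * r_y1 + beta2 * r_y2

# R^2 from the correlations alone (two-predictor formula).
r2_from_rs = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r12) / (1 - r12 ** 2)

error_path = math.sqrt(1 - r2_from_betas)   # standardized error coefficient
print(round(r_y1, 2), round(r_y2, 2), round(r2_from_betas, 2), round(error_path, 3))
```

Both routes give R² = .85, so the error path into Y is the square root of .15.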

[Path diagram for Depression: LOC. CON., SELF-EST, SELF-REL → DEPRESSION, $R^2 = .60$, error path e = $\sqrt{.4}$]

Shrinkage R²
Different definitions: ask which is being used:
–What is population value for a sample R²? $R^2_s = 1 - (1 - R^2)(n-1)/(n-k-1)$
–What is the cross-validation from sample to sample? $R^2_{sc} = 1 - (1 - R^2)(n+k)/(n-k)$
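Both shrinkage formulas are one-liners, and comparing them on the same inputs shows how much more the cross-validation estimate shrinks. The function names and the inputs (R² = .60, n = 100, k = 3) are hypothetical, chosen only to illustrate the formulas.

```python
def shrunken_r2_population(r2, n, k):
    """Estimate of the population R^2 from a sample R^2 (adjusted R^2)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def shrunken_r2_cross_validation(r2, n, k):
    """Estimate of R^2 expected when the equation is applied to a new sample."""
    return 1 - (1 - r2) * (n + k) / (n - k)

# Hypothetical inputs: sample R^2 = .60, n = 100 cases, k = 3 predictors.
r2_pop = shrunken_r2_population(0.60, 100, 3)
r2_cv = shrunken_r2_cross_validation(0.60, 100, 3)
print(round(r2_pop, 4), round(r2_cv, 4))
```

Both values fall below the sample R², and the cross-validation estimate is the smaller of the two.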

Estimation Methods
Types of Estimation:
–Ordinary Least Squares (OLS): minimize sum of squared errors around the prediction line
–Generalized Least Squares: a regression technique that is used when the error terms from an ordinary least squares regression display non-random patterns such as autocorrelation or heteroskedasticity
–Maximum Likelihood

Maximum Likelihood Estimation
There is nothing visual about the maximum likelihood method, but it is a powerful method and, at least for large samples, very precise.
Maximum likelihood estimation begins with writing a mathematical expression known as the Likelihood Function of the sample data. Loosely speaking, the likelihood of a set of data is the probability of obtaining that particular set of data, given the chosen probability distribution model. This expression contains the unknown model parameters. The values of these parameters that maximize the sample likelihood are known as the Maximum Likelihood Estimates or MLE's. Maximum likelihood estimation is a totally analytic maximization procedure.
MLE's and Likelihood Functions generally have very desirable large sample properties:
–they become unbiased minimum variance estimators as the sample size increases
–they have approximate normal distributions and approximate sample variances that can be calculated and used to generate confidence bounds
–likelihood functions can be used to test hypotheses about models and parameters
With small samples, MLE's may not be very precise and may even generate a line that lies above or below the data points.
There are only two drawbacks to MLE's, but they are important ones:
–With small numbers of failures (less than 5, and sometimes less than 10 is small), MLE's can be heavily biased and the large sample optimality properties do not apply
–Calculating MLE's often requires specialized software for solving complex non-linear equations. This is less of a problem as time goes by, as more statistical packages are upgrading to contain MLE analysis capability every year.
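A small worked case may make the likelihood idea concrete. The sketch below assumes a normal distribution model, for which the MLEs have a closed form, and uses five hypothetical scores; none of the names or numbers come from the slides. It also shows the small-sample bias mentioned above: the MLE of the variance divides by n rather than n - 1.

```python
import math

def normal_log_likelihood(data, mu, var):
    """Log of the likelihood of the data under a normal model with mean mu and variance var."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * var) - squared_error / (2 * var)

def normal_mle(data):
    """Closed-form maximum likelihood estimates of the normal mean and variance."""
    n = len(data)
    mu_hat = sum(data) / n
    var_hat = sum((x - mu_hat) ** 2 for x in data) / n   # divides by n, not n - 1
    return mu_hat, var_hat

scores = [4.0, 6.0, 5.0, 7.0, 8.0]   # hypothetical sample
mu_hat, var_hat = normal_mle(scores)
unbiased_var = var_hat * len(scores) / (len(scores) - 1)

best = normal_log_likelihood(scores, mu_hat, var_hat)
# Any other parameter values give a lower log-likelihood.
worse_mean = normal_log_likelihood(scores, mu_hat + 0.5, var_hat)
worse_var = normal_log_likelihood(scores, mu_hat, unbiased_var)
print(mu_hat, var_hat, unbiased_var, best > worse_mean, best > worse_var)
```

For models without a closed form, such as most structural equation models, software finds the maximum by iterative numerical search instead.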

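Outliers
–Leverage (for a single predictor): $L_i = 1/n + (X_i - M_x)^2 / \sum x^2$, where $\sum x^2$ is the sum of squared deviations of X from its mean $M_x$ (min = 1/n, max = 1). Values larger than 1/n by a large amount should be of concern.
–Cook’s $D_i = \sum (\hat{Y} - \hat{Y}_{(i)})^2 / [(k+1) MS_{res}]$: the difference between predicted Y with and without case i, summed over all cases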
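Both diagnostics can be computed by hand for a single predictor. The following sketch uses hypothetical data with one extreme X value and hypothetical function names. Cook's D is obtained literally as defined, by refitting the line without each case and summing the squared changes in the predicted values.

```python
def fit_line(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
    return mean_y - slope * mean_x, slope

def leverage(x):
    """L_i = 1/n + (X_i - M_x)^2 / sum of squared deviations of X."""
    n = len(x)
    mean_x = sum(x) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / ss_x for xi in x]

def cooks_distance(x, y):
    """Cook's D for each case, by deleting the case and refitting."""
    n, k = len(x), 1
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    ms_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - k - 1)
    distances = []
    for i in range(n):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum((fi - (c0 + c1 * xi)) ** 2 for fi, xi in zip(fitted, x))
        distances.append(shift / ((k + 1) * ms_res))
    return distances

# Hypothetical data: the last case has an extreme X value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 9.0]

lev = leverage(x)
cook = cooks_distance(x, y)
print([round(v, 3) for v in lev])
print([round(v, 3) for v in cook])
```

In a one-predictor regression the leverages sum to 2 (the number of estimated coefficients), so a single case with leverage near 1 accounts for about half of the total.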

Outliers
–In SPSS under SAVE options, COOKs and Leverage Values are options you can select
–Result is new variables in your SPSS data set with the values for each case given
–You can sort on either one to investigate the largest values for each
–You can delete the cases with largest values and recompute the regression to see if it changed

[Figure: SPSS data view with columns t12, t13, t14 and the saved variables COO_1 and LEV_1]