2 SEM Basics Structural equation modeling (SEM) is a statistical technique for testing and estimating causal relationships, first proposed in 1921 by the American geneticist Dr. Sewall Green Wright.
3 SEM Basics SEM without latent variables is called Path Analysis. SEM is a confirmatory analysis procedure although sometimes it can also be used as an exploratory analysis tool. SEM is a set of usually inter-related linear regression equations.
4 A simple example: Path Diagrams & Equations for Eating Disorder Directional arrows indicate cause and effect. Every variable with an incoming arrow leads to a regression equation. Our regression equation system is as follows:
BI = b1 AM + b2 SW + E1
SW = b3 AM + b4 BI + E2
DT = b5 BI + b6 SW + E3
RD = b7 DT + E4
5 SEM Programs The most popular software packages for SEM are: PROC CALIS (and PROC TCALIS) in SAS; LISREL (Karl Gustav Jöreskog & Dag Sörbom); EQS (Peter Bentler, UCLA); AMOS. For this example, we will use PROC CALIS. It takes our linear equations (previous slide) and estimates the parameters for the model. Then it evaluates the goodness of fit of the model.
6 Dr. Karl Gustav Jöreskog (Sweden) & Dr. Peter M. Bentler (USA) Widely viewed as leaders in SEM development in our times
7 SAS Code
/* SAS correlation procedure. COV requests the covariance matrix,       */
/* NOCORR suppresses Pearson correlations, and OUTP= specifies the      */
/* output dataset with type covariance matrix.                          */
proc corr cov nocorr data=eddata outp=edcova(type=cov);
run;

/* PROC CALIS: use COV (covariance matrix) rather than CORR. */
proc calis cov mod data=edcova;
   /* Give the linear equations describing the system. */
   Lineqs
      bi = b1 am + b2 sw + E1,
      sw = b3 am + b4 bi + E2,
      dt = b5 bi + b6 sw + E3,
      rd = b7 dt + E4 ;
   /* Give variances of exogenous variables: here, the error variances. */
   Std E1-E4 = The1-The4 ;
   /* Covariance between the error terms of BI and SW. */
   Cov E1 E2 = Ps1;
Run;
We must set the error equal to a parameter value; otherwise it is assumed to be 0. BI and SW are correlated, so we must estimate the covariance of their error terms.
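To make concrete what PROC CALIS fits, the following Python sketch (Python is used here only for illustration; it is not part of the SAS workflow) builds the covariance matrix implied by the four LINEQS equations for a given set of parameter values. It assumes the standard path-analysis identity Cov(v) = (I - B)^-1 Psi (I - B)^-T. The function name `implied_covariance` and all parameter values are hypothetical and chosen only so the example runs. Estimation then amounts to choosing parameters so this implied matrix is close to the sample covariance matrix.

```python
import numpy as np

# Variable order used throughout this sketch: am, bi, sw, dt, rd.
NAMES = ["am", "bi", "sw", "dt", "rd"]


def implied_covariance(b, var_am, err_var, cov_e1_e2):
    """Model-implied covariance matrix for the path model in the LINEQS statement.

    b         : 7 path coefficients (b1..b7)
    var_am    : variance of the exogenous variable am
    err_var   : 4 error variances (The1..The4)
    cov_e1_e2 : covariance between E1 and E2 (Ps1)
    """
    b1, b2, b3, b4, b5, b6, b7 = b
    B = np.zeros((5, 5))
    B[1, 0], B[1, 2] = b1, b2   # bi = b1 am + b2 sw + E1
    B[2, 0], B[2, 1] = b3, b4   # sw = b3 am + b4 bi + E2
    B[3, 1], B[3, 2] = b5, b6   # dt = b5 bi + b6 sw + E3
    B[4, 3] = b7                # rd = b7 dt + E4

    # Covariance of the exogenous part: var(am) and the four error terms.
    Psi = np.diag([var_am, *err_var]).astype(float)
    Psi[1, 2] = Psi[2, 1] = cov_e1_e2

    # v = B v + e  =>  v = (I - B)^{-1} e  =>  Cov(v) = A Psi A^T
    A = np.linalg.inv(np.eye(5) - B)
    return A @ Psi @ A.T


# Hypothetical parameter values, not estimates from the eating-disorder data.
sigma = implied_covariance(
    b=[0.3, 0.2, -0.1, 0.4, -0.5, -0.3, 0.8],
    var_am=1.0,
    err_var=[1.0, 1.0, 1.0, 1.0],
    cov_e1_e2=0.2,
)
print(NAMES)
print(np.round(sigma, 3))
```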
8 Results of SAS Analysis
Variables: Age of Menstruation (AM), Body Image (BI), Adolescent Self Worth (SW), Drive for Thinness (DT), Risk for Disorder (RD).
SAS gives us parameter estimates, standard errors, and t-values for each path included in the model. We use a t-test to determine which paths are significant. In addition, we can calculate the confidence intervals.
[Path diagram labels, in the order they appear: (-4.02, .04); (-.77, .52); (-1.33, .79); .8292 (.72, .94); (-.27, .06); (-2.02, 1.92); .3341 (.04, .63)]
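The t-value and confidence interval for a single path can be sketched as follows. This is a minimal Python illustration, assuming a normal (Wald-type) approximation, which may differ from the exact method behind the intervals on the slide. The function name `wald_summary` and the numbers passed to it are hypothetical, not values from the SAS output.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Return (t_value, lower, upper) for one path coefficient.

    Assumes the estimate is approximately normal, so the interval is
    estimate +/- z * std_error with z the normal quantile (about 1.96 for 95%).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    t_value = estimate / std_error
    return t_value, estimate - z * std_error, estimate + z * std_error


# Hypothetical estimate and standard error.
t_value, lower, upper = wald_summary(0.50, 0.10)

# A path is called significant when its interval does not contain 0.
significant = not (lower <= 0.0 <= upper)
print(f"t = {t_value:.2f}, 95% CI = ({lower:.3f}, {upper:.3f}), significant: {significant}")
```

Read this way, an interval such as (.72, .94) excludes 0 and marks a significant path, while an interval such as (-.77, .52) does not.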
9 Goodness of Fit After reporting the parameter estimates, SAS reports many different measures of fit, so we can evaluate the model in any way we choose. The more measures we use to evaluate our model, the better. A good fit does not necessarily mean a perfect model: we can still have unnecessary variables or be missing important ones. By convention, a model is "good" if: the GFI is above .90 (or above .95 under a stricter cutoff); the chi-square value is small, with a large p-value; and the RMSEA estimate is close to zero. SAS Output: [fit summary table]
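As an illustration of one of these measures, the RMSEA point estimate can be computed from the model chi-square, its degrees of freedom, and the sample size. The Python sketch below assumes the common formula sqrt(max(chi2 - df, 0) / (df (N - 1))); some programs divide by N instead of N - 1, so the value may differ slightly from what a given package reports. The function name and the input values are hypothetical.

```python
import math


def rmsea(chi_square, df, n_obs):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    if df <= 0 or n_obs <= 1:
        raise ValueError("df must be positive and n_obs must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n_obs - 1)))


# Hypothetical fit results, not taken from the SAS output.
print(rmsea(chi_square=10.0, df=5, n_obs=101))  # chi-square well above df
print(rmsea(chi_square=3.0, df=5, n_obs=101))   # chi-square below df
```

When the chi-square is no larger than its degrees of freedom, the estimate is exactly zero, which matches the convention that a small chi-square and an RMSEA close to zero go together.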
10 Useful Websites Google and Wikipedia do a good job of finding and summarizing many topics, including SEM. Type "structural equation modeling" into Google and you will see the SEM wiki site listed as the first item. Looking at the recommended sites towards the end of the SEM wiki page, you will find further useful links such as:
1. A good website for SEM lecture notes
2. LISREL
3. EQS
4. MPLUS
5. GLLAMM
6. SEM AFNI (brain functional pathway analysis)
7. SAS Proc TCALIS: HTML/default/viewer.htm#statug_tcalis_sect087.htm
8. The UCLA SAS Web
Part II: PCNA and Bootstrap Resampling 1. Partial Correlation Network Analysis
12 PCNA: Generating a Path Diagram When there is no hypothesized diagram for a SEM analysis, we can generate a path diagram using partial correlation network analysis. In 2006, Marrelec discussed the concept of detecting an underlying connectivity network in data, and the methods for analysis. He noted the importance of detecting such a network without the hypothesized relationships that SEM requires. In 2007, Marrelec et al. published a work recommending the use of Partial Correlation Network Analysis (PCNA) in conjunction with SEM. Partial correlation analysis is a technique that allows us to investigate the relationship between two variables free of influence from other variables. Consider two variables, X and Y. We want to know the correlation of X and Y while controlling for Z. The most intuitive way to understand partial correlation is to consider two regressions: regress X on Z and Y on Z, then correlate the two sets of residuals.
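The two-regression view of partial correlation can be sketched in Python as follows. This is a minimal illustration on simulated data: the variables, coefficients, and helper names (`residuals_on`, `partial_corr_xyz`) are assumptions made for the example, not part of the original analysis. Here X and Y are related only through Z, so their plain correlation is strong while their partial correlation is near zero.

```python
import numpy as np


def residuals_on(z, v):
    """Residuals from the simple linear regression of v on z (with intercept)."""
    slope, intercept = np.polyfit(z, v, deg=1)
    return v - (slope * z + intercept)


def partial_corr_xyz(x, y, z):
    """Correlation of X and Y after removing the linear influence of Z from each."""
    return np.corrcoef(residuals_on(z, x), residuals_on(z, y))[0, 1]


# Simulated data: X and Y both depend on Z but not on each other.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = 2.0 * z + rng.normal(size=500)
y = -1.5 * z + rng.normal(size=500)

plain = np.corrcoef(x, y)[0, 1]
partial = partial_corr_xyz(x, y, z)
print("plain correlation  :", round(plain, 3))
print("partial correlation:", round(partial, 3))
```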
13 Our PCNA Bootstrap Methodology We have N variables, and we want to know which pairs have significant relationships when controlling for all other variables in the system. Additionally, we want to know which pairs' relationships are changed by, for example, the disease state of the measured tissue. For each pair of variables, i and j, we regress the two variables individually on all other variables in the system and calculate the corresponding residuals. This creates two residual variables, representing the original variables free of the influence of all other variables in the system. The correlation of these residuals is the partial correlation of the variables. However, this is just one number, so we cannot incorporate the influence of covariates into the significance test of this value. This is why we use a bootstrapping procedure.
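The pairwise procedure described above can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the function name `partial_correlation_matrix` is hypothetical, the regressions are ordinary least squares with an intercept, and the data are simulated.

```python
import numpy as np


def partial_correlation_matrix(data):
    """Partial correlation of every pair of columns, controlling for all other columns.

    data : array of shape (subjects, N variables)
    """
    n_subjects, n_vars = data.shape
    pcorr = np.eye(n_vars)
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            others = [k for k in range(n_vars) if k not in (i, j)]
            # Design matrix: intercept plus every variable except i and j.
            design = np.column_stack([np.ones(n_subjects), data[:, others]])
            # Regress variables i and j on the design matrix in one call.
            coef, *_ = np.linalg.lstsq(design, data[:, [i, j]], rcond=None)
            resid = data[:, [i, j]] - design @ coef
            # The correlation of the two residual series is the partial correlation.
            r = np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]
            pcorr[i, j] = pcorr[j, i] = r
    return pcorr


# Simulated data with some shared structure between columns.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 4))
mixing = np.array([[1.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 1.0, 0.7],
                   [0.0, 0.0, 0.0, 1.0]])
data = base @ mixing
pcorr = partial_correlation_matrix(data)
print(np.round(pcorr, 3))
```

The same matrix can also be obtained from the inverse of the covariance matrix, which is a convenient cross-check on the regression-based version.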
Part II: PCNA and Bootstrap Resampling 2. Bootstrap Resampling
15 Bootstrap Resampling The idea behind bootstrapping is resampling with replacement. We have our original sample of m subjects (1, 2, 3, …, m). Select one of them at random, and then replace it before randomly selecting the next. Repeat this m times. Now we have a sample of m subjects consisting of subjects from the original sample. However, some subjects may be repeated, and some subjects from the original sample may not be present in our resample. Use each resample to calculate the partial correlation. After n resamples, we have a population of n estimates for each pair of variables. If we perform this analysis with 1000 resamples on our two datasets individually, we will have 1000 estimates of partial correlation for the normal tissue and 1000 estimates for the diseased tissue.
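The resampling step can be sketched in Python as follows. This is a minimal illustration: the function name `bootstrap_estimates`, the simulated data, and the stand-in statistic (a plain correlation between two columns) are assumptions made for the example. In the methodology above, the statistic would be the partial correlation of a given pair of variables.

```python
import numpy as np


def bootstrap_estimates(data, statistic, n_resamples=1000, seed=None):
    """Apply `statistic` to n_resamples bootstrap resamples of the rows of `data`."""
    rng = np.random.default_rng(seed)
    m = data.shape[0]
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        # Draw m subject indices with replacement: some repeat, some are left out.
        idx = rng.integers(0, m, size=m)
        estimates[b] = statistic(data[idx])
    return estimates


def corr01(sample):
    """Stand-in statistic: correlation between the first two columns."""
    return np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]


# Simulated sample of m = 50 subjects with two related variables.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)
data = np.column_stack([x, y])

estimates = bootstrap_estimates(data, corr01, n_resamples=1000, seed=42)
print("number of estimates:", estimates.size)
print("mean of estimates  :", round(estimates.mean(), 3))
```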
16 Bootstrap Resampling The results we must evaluate are two lists of partial correlations (those for the normal tissue, and those for the diseased tissue). We will let the significance of the relationships in the normal dataset represent the general significance of partial correlation among variables in the system. We can create a difference variable to estimate the difference of the partial correlation between the normal tissue and diseased tissue. The significance of the differences represents the influence of disease on the partial correlation between variables.

Normal    Diseased    Difference
V1        W1          V1 - W1
V2        W2          V2 - W2
V3        W3          V3 - W3
…         …           …

Sort the normal and difference variables. If 0 is contained in the middle 95% of the observations, then we would say the relationship or the influence of disease is not significant for this pair of variables. (This is called the percentile method.)
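The percentile method can be sketched in Python as below. The helper names and the simulated arrays are hypothetical stand-ins for the 1000 bootstrap estimates per tissue type, and the use of `np.percentile` with its default interpolation is an assumption. The original analysis may have taken the cut points directly from the sorted list.

```python
import numpy as np


def percentile_interval(values, level=0.95):
    """Lower and upper bounds of the middle `level` share of the values."""
    tail = (1.0 - level) / 2.0 * 100.0
    lower, upper = np.percentile(values, [tail, 100.0 - tail])
    return lower, upper


def zero_excluded(values, level=0.95):
    """True when 0 lies outside the middle `level` share, i.e. the effect is significant."""
    lower, upper = percentile_interval(values, level)
    return not (lower <= 0.0 <= upper)


# Simulated stand-ins for bootstrap partial correlations of one pair of variables.
rng = np.random.default_rng(1)
normal = rng.normal(0.40, 0.05, size=1000)
diseased = rng.normal(0.38, 0.05, size=1000)
difference = normal - diseased  # V_k - W_k for each resample k

print("relationship significant in normal tissue:", zero_excluded(normal))
print("influence of disease significant         :", zero_excluded(difference))
```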
17 Results The results of the PCNA bootstrap in the brain data example (four datasets; covariates: drug, group) are shown at the left. No arrows! At this point, we would ask the collaborating researcher for input on the directionality of each path. For paths not easily determined, we can implement one path in each direction. The result would be a hypothesized relationship that can be verified using structural equation modeling with an independent data set.