1 Model Evaluation and Selection
2 Example Objective: Demonstrate how to evaluate a single model and how to compare alternative models.
3 Evaluating the Sufficiency of a Single Model (follow-up to example of Mediation Test) When this model is run, a variety of measures of model fit will be generated. A question of importance is, "Is the fit of the model sufficiently good to yield reliable results?" The alternative model is one in which there is also an arrow from s_age to tcov. In other words, does fire severity explain the effect of stand age on cover, or is there another pathway of influence independent of fire severity?
4 Finding Measures of Model Fit in Amos I It is always good to check the section of the output called Notes for Model. Here we can see that a minimum was achieved, along with the full p-value for the chi-square. A p-value greater than 0.05 suggests that we could accept this model (it indicates no major deviations between data and model). The model chi-square is the most commonly used measure of absolute model fit.
5 Further Considerations of Model Chi-square It is well known that model p-values are not always the best way to decide if a model is adequate (in an absolute sense) or the best model (in a relative sense). This is a complex topic and one that lacks complete consensus. What is generally agreed upon is:
(1) Chi-squares automatically increase with increasing sample size, and p-values reflect increasing power for detecting deviations.
(2) P-values for model chi-squares are pretty useful when sample sizes are less than 200, especially for models that do not include latent variables possessing multiple indicators.
(3) It is recommended that folks look at multiple measures.
6 Further Considerations (cont.) One useful way to evaluate model adequacy is to see if the addition of pathways causes the model chi-square to drop by more than 3.84 units. This is the single-degree-of-freedom chi-square test. If adding a path reduces the chi-square by less than 3.84, it implies that the added path is not strongly supported by the data. In the current example, the chi-square is 3.243, which tells us that adding a path from s_age to tcov could reduce the model chi-square by at most that amount. This further indicates that our model could be considered to be adequate.
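The single-degree-of-freedom test described on this slide can be checked with a few lines of Python. This is only a sketch: it uses the two values quoted in the slide (3.84 and 3.243) and the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(x / 2)). The names `chi2_sf_1df`, `max_drop`, and `path_supported` are illustrative and do not come from Amos.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, chi-square is a squared standard normal, so
    # P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


CRITICAL_1DF = 3.84   # critical value quoted in the slide (alpha = 0.05)
model_chi2 = 3.243    # model chi-square quoted in the slide

# The chi-square cannot fall below 0, so the largest possible drop from
# adding the s_age -> tcov path is the current model chi-square itself.
max_drop = model_chi2
path_supported = max_drop > CRITICAL_1DF
p_value = chi2_sf_1df(max_drop)

print("added path supported:", path_supported)
print("p-value for the largest possible drop:", round(p_value, 3))
```

Because the largest possible drop is below 3.84, the corresponding p-value stays above 0.05, which matches the slide's reading that the added path is not strongly supported.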
7 Finding Measures of Model Fit in Amos II Cmin means minimum chi-square. Model Fit tab gives us several measures to consider.
8 continued Clicking on labels gives additional info.
9 continued RMSEA indicates close fit, and also that a value of 0 (perfect fit) cannot be ruled out. Saturating our model would reduce the AIC of our model (the default model) by less than the minimum recommended AIC difference of 2.0, which suggests the two models are indistinguishable. BUT, AIC is often not a reliable measure.
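The AIC comparison on this slide can be sketched numerically. The sketch rests on assumptions that are not stated in the slides: that AIC is computed as chi-square plus twice the number of estimated parameters, that the saturated model has a chi-square of 0, and that saturating the model adds exactly one path. The parameter count `q_default` is a hypothetical placeholder; the difference does not depend on its value.

```python
def aic(chi2: float, n_params: int) -> float:
    """AIC in the chi-square-based form assumed here: chi-square + 2 * parameters."""
    return chi2 + 2 * n_params


model_chi2 = 3.243            # default-model chi-square quoted in the slides
saturated_chi2 = 0.0          # assumption: a saturated model fits the data exactly
q_default = 5                 # hypothetical parameter count for the default model
q_saturated = q_default + 1   # assumption: saturating adds exactly one path

# Positive difference means the default model has the larger AIC.
aic_diff = aic(model_chi2, q_default) - aic(saturated_chi2, q_saturated)
indistinguishable = abs(aic_diff) < 2.0

print("AIC(default) - AIC(saturated):", round(aic_diff, 3))
print("difference below 2.0:", indistinguishable)
```

Under these assumptions the difference reduces to the chi-square drop minus 2 for the one added parameter, which is why it stays under the 2.0 threshold.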
10 continued some more The CAIC (consistent AIC) is generally viewed to be a better measure than AIC. Here we see that the default model value is more than 2.0 units smaller than that of the saturated model, supporting the conclusion that our model is adequate.
11 and still some more The BIC (Bayesian Information Criterion) is one of the more popular measures at the moment. In this case, the saturated model BIC is greater than the default model BIC by less than the 2.0 difference recommended for picking among models. This index tells us that while the evidence is better for the default model, the saturated model can't be ruled out.
12 and even still some more The Hoelter index relates back to our model chi-square and its p-values. It tells us that at a sample size of 106, we would have enough power to detect an additional path from s_age to tcov with a p-value less than 0.05. A larger sample would be required to obtain a p-value less than 0.01.
13 AIC difference criteria
AIC diff    support for equivalency of models
0-2         substantial
4-7         weak
> 10        none
Burnham, K.P. and Anderson, D.R. Model Selection and Multimodel Inference (second edition). Springer Verlag, p. 70.
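The AIC criteria on this slide can be written as a small lookup. This is a sketch only: the function name `aic_equivalency_support` is hypothetical, and because the slide gives no label for differences between 2 and 4 or between 7 and 10, the function returns None for those ranges rather than guessing.

```python
from typing import Optional


def aic_equivalency_support(aic_diff: float) -> Optional[str]:
    """Map an AIC difference to the slide's 'support for equivalency' label."""
    d = abs(aic_diff)
    if d <= 2:
        return "substantial"
    if 4 <= d <= 7:
        return "weak"
    if d > 10:
        return "none"
    # The slide lists no label for the remaining ranges.
    return None


for diff in (1.2, 5.0, 12.0, 3.0):
    print(diff, "->", aic_equivalency_support(diff))
```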
14 BIC difference criteria
BIC diff    support for difference between models
0-2         weak
2-6         positive
6-10        strong
> 10        very strong
Raftery, A.E. Sociological Methodology 25, p. 70.
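The BIC criteria on this slide can be expressed the same way. The function name `bic_difference_support` is hypothetical, and since the slide's bands share their endpoints (2, 6, and 10), assigning each shared endpoint to the lower band is an assumption made here for illustration.

```python
def bic_difference_support(bic_diff: float) -> str:
    """Map a BIC difference to the slide's 'support for difference' label."""
    d = abs(bic_diff)
    # Assumption: a value on a shared boundary falls in the lower band.
    if d <= 2:
        return "weak"
    if d <= 6:
        return "positive"
    if d <= 10:
        return "strong"
    return "very strong"


for diff in (1.0, 4.0, 8.0, 15.0):
    print(diff, "->", bic_difference_support(diff))
```

A difference under 2.0, as in the BIC comparison of the default and saturated models, lands in the "weak" band, consistent with the conclusion that the saturated model cannot be ruled out.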
15 What do we conclude in this case? Given the data we have available, we could justify (in my view) omitting the pathway from s_age to tcov. However, we must recognize that this is an approximation of the truth. If we had more samples, would they lead us to decide that we needed to include a path from s_age to tcov? Without the additional samples we don't really know. Comparing the path coefficients for the two models would allow us to decide the scientific consequences of our model choice.
16 What is the SEM perspective on model selection? In SEM we use our scientific knowledge to guide our decisions, and this applies especially to model selection. Do we believe it serves our scientific purposes to omit the path from s_age to tcov? We certainly can present the results for the path in the following fashion if we think it merits discussion.
[Figure: path diagram with the variables s_age, fidx, and tcov and their error terms; the labels shown include "ns" and -0.35.]
17 Final thought "Statistical tests are aids to (hopefully wise) judgement, not two-valued logical declarations of truth or falsity". Abelson, RP (1995) Statistics as Principled Argument. Lawrence Erlbaum Associates, Hillsdale, NJ, USA