Comparing overall goodness of fit across models


1 Comparing overall goodness of fit across models
Jane E. Miller, PhD
Comparing overall fit across models is an additional perspective to looking at the statistical significance of the βs on the individual independent variables. It addresses the question of whether, collectively, one or more additional variables add to the overall fit of a multivariate model.
The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.

2 Overview
- Review: Statistical significance of
  - Individual coefficients
  - Model goodness of fit (GOF)
- GOF statistics
  - To compare fit of nested models
  - To compare fit of non-nested models
  - Which to use for OLS and for logit models
- Presenting results of GOF tests

Much of what I will discuss today is behind-the-scenes computation that you will conduct but not describe step by step in the text. At the end of this podcast, I give a few guidelines on how to present the results of those tests as you write up a paper about your analysis. I also refer to the pertinent chapters in The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition, for more detail.

3 Review: Statistical significance of individual coefficients
Inferential statistics for individual coefficients (βs) in a multivariate regression model provide the information to test whether each β is statistically significantly different from zero. That test assesses the contribution of the corresponding independent variable to explaining variation in the dependent variable, taking into account the other independent variables in the model.

4 Goodness-of-fit (GOF) statistics
- –2 log likelihood statistic
- Akaike Information Criterion (AIC)
- Bayesian Information Criterion (BIC), also known as the Schwarz Criterion (SC) or Schwarz Bayesian Information Criterion (SBIC)

Most GOF statistics are part of standard output from a multivariate regression model. Other GOF statistics can be
- Requested as an option to the regression command
- Manually calculated from standard output

If you see the acronyms “SC” or “SBIC” on your output, they refer to the Bayesian Information Criterion.

5 Review: Model Goodness of Fit
To test whether the model with a particular set of independent variables (IVs) in a multivariate specification fits better than the null model (with intercept only, no IVs), compare the GOF statistic for that model against the critical value for the
- Pertinent number of degrees of freedom
- Type of test statistic

E.g., evaluate how well that set of IVs collectively explains variation in the dependent variable (DV).

6 Difference in goodness of fit across models
To test whether additional or different variables yield a statistically significant improvement in model fit:
- Estimate a series of models using a consistent sample
- Calculate
  - The difference in the GOF statistic across models
  - The difference in the number of degrees of freedom for those models
- Compare to the critical value for the test statistic with the pertinent number of degrees of freedom

7 Example: Nested model specifications
| Independent variables | Model I | Model II | Model III |
|---|---|---|---|
| Infant traits: race and gender | X | X | X |
| SES: low income, < HS, teen mother | | X | X |
| Maternal smoking | | | X |

Like nesting Matryoshka dolls, nested statistical models (also known as hierarchical multiple regression) can be thought of as fitting within one another. Starting with the smallest model (fewest independent variables), a series of nested models successively includes more independent variables while keeping those from the preceding models. This grid shows a series of three nested models, moving left to right: models I, II, and III. Model I includes only the infant traits race and gender, as shown by the “X” in the column for model I and the row for infant traits. Model II adds three SES variables (low income, < HS, and teen mother) to model I, as shown by the “X” in the rows for infant traits and SES in the column for model II. These models are nested because model II adds a new block of variables (SES) and retains all others from model I, and model III adds smoking to model II. Hierarchical models are not to be confused with hierarchical data (as used in multilevel models).

8 Example: Non-nested model specifications
| Independent variables | Model I | Model II | Model III |
|---|---|---|---|
| Infant traits: race and gender | X | X | X |
| SES: low income, < HS, teen mother | | X | |
| Maternal smoking | | | X |

Models II and III are not nested because model III adds maternal smoking but drops the SES variables. Both models II and III are nested with model I. A model that dropped one variable (e.g., mother’s age) and added another (e.g., smoking) to model II would not be nested with either model II or model III.
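One way to make the nesting idea concrete is to treat each model as a set of independent variables: a smaller model is nested within a larger one when its variables are a proper subset of the larger model's. The sketch below assumes the non-nested specification on this slide (model III keeps infant traits, adds smoking, and drops SES); the variable labels and the `is_nested` helper are illustrative names, not part of the original slides.

```python
# Blocks of independent variables, labeled as on the slide.
INFANT_TRAITS = {"race", "gender"}
SES = {"low income", "< HS", "teen mother"}
SMOKING = {"maternal smoking"}

# Non-nested example: model III adds smoking but drops the SES block.
models = {
    "I": INFANT_TRAITS,
    "II": INFANT_TRAITS | SES,
    "III": INFANT_TRAITS | SMOKING,
}


def is_nested(smaller, larger):
    """True if every IV in `smaller` also appears in `larger`, which has more IVs."""
    return smaller < larger  # proper-subset test on Python sets


for a, b in [("I", "II"), ("I", "III"), ("II", "III"), ("III", "II")]:
    print(f"Model {a} nested within model {b}: {is_nested(models[a], models[b])}")
```

Under these assumptions, model I is nested within both II and III, while II and III are not nested in either direction.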

9 Other examples of non-nested model specifications
- Alternative baseline hazards specifications, e.g.,
  - Exponential
  - Weibull
  - Gompertz
- Different HLM specifications, e.g.,
  - Unconditional means
  - Fixed effects
  - Random effects
- Different interaction specifications

10 Which GOF statistics to use
- The Akaike Information Criterion, usually referred to as “AIC”, and the Bayesian Information Criterion (“BIC”) can be used to assess best fit when comparing across either
  - Nested models
  - Non-nested models
- The F-statistic and –2 log likelihood statistic can only be used to compare nested models

11 F-statistic and –2 log likelihood statistic
Now let’s take a look at some examples comparing model GOF using the F-statistic and the –2 log likelihood statistic.

12 Example GOF statistics from nested OLS models of birth weight
| | Model I: Infant traits only | Model II: Infant traits & SES | Model III: Infant traits, SES & smoking |
|---|---|---|---|
| F-statistic | 102.49 | 81.39 | 94.08 |
| Degrees of freedom (df) | 3 | 8 | 9 |
| Bayesian Information Criterion (BIC) | −275.2 | −557.1 | −728.4 |

Here is a grid showing the information needed to compare fit of three OLS models of birth weight. There is one column for each of the three models being compared: a model with infant traits only as the predictor variables (model I), a model with infant traits and SES (model II), and finally a model that controls for infant traits, SES, and maternal smoking (model III). In the rows are the F-statistic, the number of degrees of freedom (the number of independent variables in the model), and the BIC.

13 Using the F-statistic to test difference in GOF
For model I vs. model II:
- The difference in F is 102.49 − 81.39 = 21.10
- The difference in degrees of freedom is 8 – 3 = 5
- For the F distribution with
  - 5 degrees of freedom (df) for the numerator, based on the difference in number of IVs between models I and II
  - ∞ degrees of freedom for the denominator, based on the number of cases used to estimate the models (for the F-statistic, > 40 df is generally treated as ∞ df)
  - p = 0.01
- The critical value is 9.02 (see a table of F-statistics)

One way to compare overall fit of OLS models is to calculate the difference in F-statistics for two nested models. As a textbook on statistics will tell you, in order to conduct an F-test we need to know the number of df for the numerator, the number of df for the denominator, and the p-value of interest (in our case, p < 0.01). We can then look up the critical value of the F distribution based on those df and the p-value.

14 Testing GOF with F-statistic, cont.
- The difference in F between models I and II exceeds the critical value: 21.10 > 9.02
- Model II added socioeconomic characteristics (age, education, income) to model I
- So we conclude that, collectively, the socioeconomic characteristics improve the overall fit of the birth weight model at p < 0.01
- This test is an additional perspective to looking at the statistical significance of the βs on the individual age, education, and income variables

We then compare the calculated value of the test statistic against the critical value for that type of statistic and number of degrees of freedom.
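The comparison of models I and II can be scripted as a rough sketch that mirrors the arithmetic on these slides. The F-statistics, degrees of freedom, and critical value below are copied from the slides as given rather than recomputed; the `compare_fit` helper and the dictionary layout are illustrative choices, not something the slides prescribe.

```python
# F-statistics and degrees of freedom from the table of nested OLS models.
models = {
    "I": {"f_stat": 102.49, "df": 3},
    "II": {"f_stat": 81.39, "df": 8},
}

# Critical value quoted on the slide for 5 and infinite df at p = 0.01.
# It is used here as stated in the slides, not looked up independently.
CRITICAL_VALUE = 9.02


def compare_fit(smaller, larger, critical_value):
    """Return (difference in F, difference in df, whether the difference exceeds the critical value)."""
    diff_f = round(smaller["f_stat"] - larger["f_stat"], 2)
    diff_df = larger["df"] - smaller["df"]
    return diff_f, diff_df, diff_f > critical_value


diff_f, diff_df, improved = compare_fit(models["I"], models["II"], CRITICAL_VALUE)
print(f"Difference in F: {diff_f:.2f} with {diff_df} df")
print("Exceeds critical value" if improved else "Does not exceed critical value")
```

Keeping the critical value as an explicit input makes it easy to swap in the value for a different p-value or number of degrees of freedom.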

15 Testing GOF for logit models
To compare fit across a series of nested logistic regression models, use the –2 log likelihood statistics. The logic is analogous to that for the F-statistic:
- Calculate
  - The difference in model GOF
  - The difference in number of degrees of freedom (df)
- Compare to the critical value with the pertinent number of degrees of freedom
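The same steps can be sketched for logit models. The –2 log likelihood values in this example are hypothetical, chosen only to illustrate the calculation, and the `lr_test` helper is an illustrative name. The sketch assumes that the difference in –2 log likelihood is compared against the chi-square distribution, using standard table values at p = 0.01.

```python
# Chi-square critical values at p = 0.01, keyed by degrees of freedom
# (standard table values, rounded to two decimals).
CHI2_CRITICAL_P01 = {1: 6.63, 2: 9.21, 3: 11.34, 4: 13.28, 5: 15.09}


def lr_test(neg2ll_smaller, df_smaller, neg2ll_larger, df_larger):
    """Compare two nested logit models using their -2 log likelihood statistics."""
    diff = neg2ll_smaller - neg2ll_larger   # difference in model GOF
    diff_df = df_larger - df_smaller        # difference in number of df
    return diff, diff_df, diff > CHI2_CRITICAL_P01[diff_df]


# Hypothetical -2 log likelihood values for a smaller and a larger nested model.
diff, diff_df, improved = lr_test(6300.0, 3, 6250.0, 8)
print(f"Difference in -2 log L: {diff:.1f} with {diff_df} df; improved fit: {improved}")
```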

16 Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)
Two other GOF statistics that are often reported are the…

17 AIC and BIC correct for the number of IVs in the model
- BIC and AIC statistics correct for the fact that models with many IVs are likely to have larger log likelihood or R² statistics than models with fewer IVs
- For two models that explain similar proportions of the overall variance in the DV, the preferred model is the one with fewer independent variables
- AIC and BIC reward parsimony, meaning that they penalize models that have more independent variables
- The model with the smallest value of BIC is considered the best-fitting model; in some cases this will be the most negative BIC

18 Formula for Akaike Information Criterion (AIC)
For OLS models:
$\mathrm{AIC}_k = N \times \ln(\mathrm{SSE}_k / N) + 2(p_k + 1)$
where $\mathrm{SSE}_k$ = error sum of squares for model k, $p_k$ = number of independent variables in model k, and $N$ = sample size.

For logit models:
$\mathrm{AIC}_k = -2 \log \mathrm{likelihood}_k + 2 p_k$

AIC can be requested as an option to the regression command, or manually calculated from standard regression output.
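For readers who want to calculate AIC by hand from standard output, the two formulas on this slide translate directly into code. This is a minimal sketch: the function names are illustrative and the input values are hypothetical, not taken from the birth weight models.

```python
import math


def aic_ols(sse, n, p):
    """AIC for an OLS model: N * ln(SSE / N) + 2 * (p + 1)."""
    return n * math.log(sse / n) + 2 * (p + 1)


def aic_logit(neg2_log_likelihood, p):
    """AIC for a logit model: -2 log likelihood + 2 * p."""
    return neg2_log_likelihood + 2 * p


# Hypothetical inputs, for illustration only.
print(aic_ols(sse=2500.0, n=1000, p=3))
print(aic_logit(neg2_log_likelihood=6000.0, p=9))
```

As a later slide cautions, software packages use slightly different formulas, so values computed this way may not match a given program's output exactly.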

19 Example: Using AIC to assess GOF for a logit model
| Logit model of low birth weight | AIC | Degrees of freedom |
|---|---|---|
| Model with controls for infant traits, SES and smoking | 6,150.43 | 9 |
| Null model (intercept only, no covariates) | 6,379.90 | |

- AIC for the specification with controls for infant traits, SES, and maternal smoking is less than the AIC for the null model: 6,150.43 < 6,379.90
- Thus inclusion of those IVs improves the overall fit of the model

As an example, here is a little table reporting just the AIC and degrees of freedom for a logit model of low birth weight with controls for infant traits, SES, and smoking (first row) compared to a null model with intercept but no independent variables (bottom row).

20 Formula for Bayesian Information Criterion (BIC)
The BIC, also known as the Schwarz Criterion (SC) or Schwarz Bayesian Information Criterion, corrects for the fact that models with more IVs and those based on large sample sizes often have larger R².

For OLS models:
$\mathrm{BIC}_k = N \times \ln(1 - R_k^2) + p_k \times \ln(N)$
where $N$ = sample size, $R_k^2$ = R² for model k, and $p_k$ = number of independent variables in model k.

For logit models:
$\mathrm{BIC}_k = -L_k^2 + p_k \times \ln(N)$
where $L_k^2$ = the likelihood ratio χ² for model k.
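The OLS version of the formula is easy to compute from R², the sample size, and the number of independent variables. The sketch below uses hypothetical values to show how the penalty term works: two models with nearly identical R² but different numbers of IVs. The function name and inputs are illustrative assumptions.

```python
import math


def bic_ols(r_squared, n, p):
    """BIC for an OLS model: N * ln(1 - R^2) + p * ln(N)."""
    return n * math.log(1 - r_squared) + p * math.log(n)


# Hypothetical models: similar R-squared, but the second uses five more IVs.
bic_small = bic_ols(r_squared=0.100, n=1000, p=3)
bic_large = bic_ols(r_squared=0.101, n=1000, p=8)

print(f"BIC, 3 IVs: {bic_small:.1f}")
print(f"BIC, 8 IVs: {bic_large:.1f}")
print("Preferred:", "3-IV model" if bic_small < bic_large else "8-IV model")
```

With these inputs the tiny gain in R² does not offset the penalty for five additional variables, so the more parsimonious model has the smaller BIC.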

21 Example: Using BIC to test difference in GOF
OLS models of birth weight in grams

| | Model I: Infant traits only | Model II: Infant traits & SES | Model III: Infant traits, SES & smoking |
|---|---|---|---|
| Bayesian Information Criterion (BIC) | −275.2 | −557.1 | −728.4 |

- BIC for model III < BIC for model II < BIC for model I: −728 < −557 < −275
- The model with the smallest value of BIC is considered the best-fitting model
- Thus the best-fitting model is the model that controls for infant traits, SES, and smoking

Here is a mini-table showing the BIC for the series of nested OLS models predicting birth weight in grams that we have been tracing through this example.
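Picking the best-fitting model from a set of BIC values amounts to finding the minimum. A small sketch using the three BIC values reported on this slide (the dictionary layout is an illustrative choice):

```python
# BIC values for the three nested OLS models of birth weight, from the slide.
bic = {"I": -275.2, "II": -557.1, "III": -728.4}

# Rank models from smallest (best-fitting) to largest BIC.
ranking = sorted(bic, key=bic.get)
best = ranking[0]

print("Ranking, best first:", ranking)
print(f"Best-fitting model: {best} (BIC = {bic[best]})")
```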

22 Note about formulas for AIC and BIC
A caveat: different textbooks and software programs use slightly different formulas to calculate AIC and BIC.
- Some formulas correct AIC for sample size (AICc), others do not
- Some formulas use weighted Ns, others unweighted Ns

Check the manual for the formula used to calculate AIC and BIC in the specific software and procedure used to estimate your models. These differences in formulas do not affect interpretation of AIC and BIC for comparing models within your own analyses, because such comparisons are across models using a consistent formula.

23 Tables to present information needed for GOF tests across models
For each multivariate model, present
- GOF statistic(s), labeled with the name of the statistic, e.g., F-statistic or BIC
- Degrees of freedom

See chapters 5 and 11 of The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition, for guidelines and examples of multivariate tables.

Since this series of podcasts is linked to a book on writing about multivariate analysis, I will close this lecture by discussing what you should present about results of goodness-of-fit tests. Use a combination of tables and prose to present the information needed for GOF comparison of models.

24 Prose to present results of differences in overall fit across models
- Introduce the substantive reason behind the GOF test, given your
  - Research question
  - Progression of models
- Report and interpret results of the comparison in GOF across models
  - The difference in the test statistic
  - Accompanying difference in degrees of freedom
- State the conclusions you draw from that test about specification of your model

On the next few slides I will show and explain poor and better descriptions of the conclusions based on the results of an F-test comparing fit across models.

25 Poor presentation: Results of GOF test across models
“The difference in F for model I vs. model II is 102.49 − 81.39 = 21.10 (table 15.3). The difference in degrees of freedom between those models is 8 – 3 = 5. For the F distribution with 5 degrees of freedom (df) for the numerator (based on the difference in the number of independent variables between models I and II) and ∞ degrees of freedom for the denominator (based on the number of cases used to estimate the models) and p = 0.01 the critical value is 9.02. So we conclude that model II fits better than model I.”

This version includes far too much explanation of how to conduct the comparison of GOF statistics. Do that work behind the scenes and report the results. It also explains the conclusion of the GOF comparison of models without explaining the purpose of that test in the context of the topic under study. In other words, it leaves the conclusion generic, without mentioning
- The IVs and DV
- The purpose of the series of nested models
- What the change in GOF means substantively for specification of the model

26 Better presentation: Results of GOF test across models
“The difference in model GOF between models I and II (F-statistic = 21.10 with 5 and ∞ degrees of freedom; table 15.3) demonstrates that collectively the socioeconomic characteristics improve the overall fit of the birth weight model at p < 0.01 compared to a model with infant traits only.”

This version names
- The dependent variable (birth weight)
- The independent variables (infant traits, socioeconomic characteristics)
- The table in which the GOF statistics for each model can be found
- What the better fit of model II suggests about the preferred model specification

Here is “better” prose describing the purpose and conclusions of the test of the difference in GOF across models. Note that it refers to a table where the GOF statistics and degrees of freedom for each of the models can be found.

27 Summary
- Differences in model goodness of fit (GOF) statistics can test whether additional or different variables yield a statistically significant improvement in overall model fit
  - F-statistics and –2 log likelihood statistics can only be used to compare nested models
  - AIC and BIC can be used to compare either nested or non-nested models
- Present results of GOF comparison
  - Use a combination of tables and prose
  - Describe conclusions, not process
  - Relate to topic at hand

28 Suggested resources
- Cohen, Jacob, Patricia Cohen, Stephen G. West, and Leona S. Aiken. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd Edition. Florence, KY: Routledge.
- Miller, J. E. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. University of Chicago Press, chapters 5 and 15.
- Treiman, Donald J. Quantitative Data Analysis: Doing Social Research to Test Ideas. San Francisco: Jossey-Bass.

29 Suggested online resources
Podcast on testing whether a multivariate specification can be simplified

30 Suggested practice exercises
Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
- Question #8 in the problem set for chapter 15
- Suggested course extensions for chapter 15
- “Reviewing” exercise #2
- “Applying statistics and writing” exercises #1, 2, and 5
- “Revising” exercise #2
- Suggested course extensions for chapter 16

31 Contact information Jane E. Miller, PhD Online materials available at The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.

