12 Two key frequencies in frequentist statistics
- Frequency definition of probability
- Frequency of error in a decision rule
13 Null hypothesis tests with Fisherian P-values
- Single model only
- P-value = probability of a discrepancy at least as great as that observed, by chance alone
- Not terribly useful for model selection
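A minimal Monte Carlo sketch of this definition; the null model, test statistic, and observed value below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: test statistic is the sample mean;
# the null model is N(0, 1) with n = 30 observations.
n, n_sim = 30, 100_000
observed = 0.45  # assumed observed sample mean

# Simulate the sampling distribution of the statistic under the null.
null_means = rng.normal(0.0, 1.0, size=(n_sim, n)).mean(axis=1)

# Fisherian P-value: probability of a discrepancy at least as
# great as the one observed, under the single null model.
p_value = np.mean(np.abs(null_means) >= abs(observed))
print(f"P = {p_value:.4f}")
```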
14 Neyman-Pearson tests
- Two models: the null model is tested along a maximally sensitive axis.
- Binary response: accept the null or reject the null.
- Size of the test (α) describes the frequency of rejecting the null in error.
- α is not about the data; it is about the test.
- You support your decision because you made it with a reliable procedure.
- N-P tests tell you very little about the relative support for alternative models.
15 Decisions vs. conclusions
- Decision-based inference is reasonable within a regulatory framework.
- It is not so appropriate for science.
- John Tukey (1960) advocated seeking to reach conclusions, not making decisions:
  - Accumulate evidence until a conclusion is very strongly supported.
  - Treat it as true.
  - Revise it if new evidence contradicts it.
16 All are tools for aiding scientific thought
- In a conclusions framework, multiple statistical metrics are not “incompatible”.
- All are tools for aiding scientific thought.
17 Statistical Evidence
- A data-based estimate of the relative distance between two models and “truth”.
18 Common Evidence Functions
- Likelihood ratios
- Differences in information criteria
- Others are available, e.g., log(jackknife prediction likelihood ratio)
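As a concrete illustration, both of the common evidence functions can be computed from maximized log-likelihoods; the data and the two candidate models below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=1.5, size=200)  # hypothetical data
n = x.size

# Model 1: exponential(lambda); MLE is lambda = 1/mean.
lam = 1.0 / x.mean()
ll_exp = np.sum(np.log(lam) - lam * x)
k_exp = 1

# Model 2: lognormal(mu, sigma); MLEs come from the log-data.
mu, sigma = np.log(x).mean(), np.log(x).std()
ll_lnorm = np.sum(-np.log(x * sigma * np.sqrt(2 * np.pi))
                  - (np.log(x) - mu) ** 2 / (2 * sigma ** 2))
k_lnorm = 2

# Evidence functions:
log_LR = ll_lnorm - ll_exp  # log likelihood ratio
# Delta AIC = AIC(exponential) - AIC(lognormal); positive favors lognormal.
delta_AIC = (-2 * ll_exp + 2 * k_exp) - (-2 * ll_lnorm + 2 * k_lnorm)
print(log_LR, delta_AIC)
```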
19 Model Adequacy (Bruce Lindsay)
- The discrepancy of a model from truth.
- Truth is represented by an empirical distribution function.
- A model is “adequate” if the estimated discrepancy is less than some arbitrary but meaningful level.
20 Model Adequacy and Goodness of Fit
- An estimation framework rather than a testing framework.
- Confidence intervals rather than tests.
- Rejection of the “true model formalism”.
21 Model Adequacy, Goodness of Fit, and Evidence
- Adequacy does not explicitly compare models, but there is an implicit comparison.
- Model adequacy is interpretable as a bound on the strength of evidence for any better model.
- This unifies model adequacy and evidence in a common framework.
22 Model adequacy interpreted as a bound on evidence for a possibly better model
[Diagram: the empirical distribution (“truth”), Model 1, and a potentially better model, with the model adequacy measure and the evidence measure shown as distances between them.]
23 “Goodness of fit” is a misnomer
- These are badness-of-fit measures and goodness-of-fit tests.
- Comparison of a model to a nonparametric estimate of the true distribution:
  - G² statistic
  - Hellinger distance
  - Pearson χ²
  - Neyman χ²
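A sketch of the four discrepancy measures, computed against hypothetical observed counts and model-expected counts:

```python
import numpy as np

# Hypothetical cell counts: observed vs. expected under a fitted model.
obs = np.array([18, 30, 27, 15, 10], dtype=float)
exp = np.array([20, 28, 24, 18, 10], dtype=float)
p, q = obs / obs.sum(), exp / exp.sum()  # empirical and model proportions

G2 = 2 * np.sum(obs * np.log(obs / exp))   # G² (likelihood-ratio) statistic
pearson = np.sum((obs - exp) ** 2 / exp)   # Pearson chi-square
neyman = np.sum((obs - exp) ** 2 / obs)    # Neyman chi-square (observed counts in denominator)
# Hellinger distance between the two discrete distributions:
hellinger = np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

print(G2, pearson, neyman, hellinger)
```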
24 Points of interest
- Badness of fit is the scope for improvement.
- Evidence for one model relative to another is the difference of their badness-of-fit values.
25 ΔIC estimates differences of Kullback-Leibler discrepancies
- ΔIC = log(likelihood ratio) when the numbers of parameters are equal.
- The complexity penalty is a bias correction that adjusts for the increase in apparent precision as the number of parameters increases.
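Under the usual −2 ln L scaling of AIC (an assumption about the convention used here), the algebra for equal parameter counts is

$$\Delta\mathrm{AIC} = \bigl(-2\ln L_1 + 2k_1\bigr) - \bigl(-2\ln L_2 + 2k_2\bigr) = 2\ln\frac{L_2}{L_1} \quad \text{when } k_1 = k_2,$$

i.e., ΔIC reduces to the log likelihood ratio, up to the factor of 2 carried by the convention.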
27 Which information criterion?
- AIC? AICc? SIC/BIC?
- Don’t use plain AIC.
- Between the others the choice is often nearly a wash: 5.9 of one versus 6.1 of the other.
28 What is the sample size for the complexity penalty?
- Mark/recapture models are based on multinomial likelihoods.
- The observational unit is a capture history, not a session.
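The answer matters because n enters the small-sample correction of AICc directly:

$$\mathrm{AIC}_c = -2\ln L + 2k + \frac{2k(k+1)}{n - k - 1}.$$

Counting sessions rather than capture histories as n changes the penalty, and hence the selection.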
29 To Q or not to Q?
- IC-based model selection assumes a good model is in the set.
- Over-dispersion is common in mark/recapture data, due to a lack of independence among observations; when it is present, you don’t have a good model in the set.
- Parameter estimate bias is generally not influenced, but the fit will appear too good!
- Model selection will then choose more highly parameterized models than is appropriate.
30 Quasi-likelihood approach
- χ² goodness-of-fit test for the most general model.
- If H₀ is rejected, estimate the variance inflation factor: ĉ = χ²/df.
- Correct the fit component of the IC and redo the selection (see the formula below).
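In symbols, with the quasi-likelihood form of the criterion (QAIC, in Burnham & Anderson's notation):

$$\hat c = \frac{\chi^2}{\mathrm{df}}, \qquad \mathrm{QAIC} = -\frac{2\ln L}{\hat c} + 2k.$$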
32 Problems with the quasi-likelihood correction
- ĉ is essentially a variance estimate, and variance estimates are unstable without a lot of data.
- ln L/ĉ is a ratio statistic, and ratio statistics are highly unstable if the uncertainty in the denominator is not trivial.
- Unlike AICc, the bias correction is estimated, and estimating a bias correction inflates variance!
33 Fixes
- Explicitly include a random component in the model, then redo model selection.
- Use a bootstrapped median ĉ (sketched below).
- Model selection with jackknifed prediction likelihood.
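A minimal sketch of the bootstrapped-median ĉ idea; the Poisson fit to over-dispersed counts is a toy stand-in for the most general mark/recapture model, not the real procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical over-dispersed counts (negative binomial, so var > mean).
data = rng.negative_binomial(n=2, p=0.3, size=80)

def chat(x):
    # Toy c-hat: Pearson chi-square / df for a fitted Poisson model
    # (MLE = sample mean; df approximated as n - 1 after estimation).
    mu = x.mean()
    chi2 = np.sum((x - mu) ** 2 / mu)
    return chi2 / (x.size - 1)

# Bootstrapped median c-hat: resample the observational units,
# recompute c-hat each time, and take the median, which is far
# more stable than a single ratio estimate.
boot = [chat(rng.choice(data, size=data.size, replace=True))
        for _ in range(1000)]
print(f"single c-hat = {chat(data):.2f}, "
      f"bootstrapped median = {np.median(boot):.2f}")
```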
34 Large or small model sets?
- Problem: model selection bias. When the number of models is large relative to the data size, some models will fit well just by chance.
- Small: Burnham & Anderson strongly advocate small model sets representing well-thought-out science; large model sets = “data dredging”.
- Large: the science may not be mature, and small model sets risk missing important factors.
35 Model Selection from Many Candidates
- Taper (2004): SIC(x) = −2 ln(L) + (ln(n) + x)·k
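A direct transcription of the criterion as a function (model fitting itself is out of scope here):

```python
import numpy as np

def sic_x(log_lik: float, k: int, n: int, x: float) -> float:
    # Taper (2004) SIC(x): the usual ln(n) BIC-style penalty plus an
    # extra per-parameter charge x, guarding against selection bias
    # when the candidate set is very large.
    return -2.0 * log_lik + (np.log(n) + x) * k

print(sic_x(log_lik=-123.4, k=5, n=50, x=2.0))  # hypothetical values
```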
36 Performance of SIC(x) with a small data set
- N = 50, true covariates = 10, spurious covariates = 30, all models of order ≤ 20: ~10^14 candidate models.
37 Chen & Chen (2009)
- M = subset size, P = number of possible terms.
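This appears to reference Chen & Chen's extended Bayesian information criterion; if so, its usual form (stated here as background, not as the slide's formula) is

$$\mathrm{EBIC}_\gamma = -2\ln L + M\ln n + 2\gamma \ln\binom{P}{M}, \qquad 0 \le \gamma \le 1,$$

which, like SIC(x), adds an extra charge that grows with the number of candidate models of size M.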
38 Explicit Tradeoff
- Small model sets: allow exploration of fine structure and small effects; risk missing unanticipated large effects.
- Large model sets: will catch unknown large effects; will miss fine structure.
- Large or small is a principled choice that data analysts should make based on their background knowledge and needs.
39 Akaike Weights & Model Averaging
- Beware, there be dragons here!
40 Akaike Weights
- “Relative likelihood of model i given the data and model set”
- “Weight of evidence that model i is most appropriate given the data and model set”
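For reference, the standard definition behind these quotes, with hypothetical AIC values:

```python
import numpy as np

def akaike_weights(aic):
    # Standard Akaike weights: w_i proportional to exp(-delta_i / 2),
    # where delta_i = AIC_i - min(AIC) over the model set.
    delta = np.asarray(aic) - np.min(aic)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

print(akaike_weights([210.3, 212.1, 215.8]))  # hypothetical AIC values
```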
41 Model Averaging
- “Conditional” variance: conditional on the selected model.
- “Unconditional” variance: actually conditional on the entire model set.
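The estimator usually meant by the “unconditional” variance (the Buckland et al. form popularized by Burnham & Anderson, quoted here as background) is

$$\hat{\bar\theta} = \sum_i w_i\,\hat\theta_i, \qquad \widehat{\mathrm{var}}\bigl(\hat{\bar\theta}\bigr) = \Bigl[\sum_i w_i \sqrt{\widehat{\mathrm{var}}\bigl(\hat\theta_i \mid M_i\bigr) + \bigl(\hat\theta_i - \hat{\bar\theta}\bigr)^2}\,\Bigr]^2,$$

which averages over the model set M_1, …, M_R and is therefore conditional on that set.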
42 A good impulse with huge problems
- I do not recommend Akaike weights.
- I do not recommend model averaging in this fashion.
- The importance of good models is diminished by adding bad models.
- The location of the average is influenced by adding redundant models.
43 Model Redundancy
- Model space is not filled uniformly; models tend to be developed in highly redundant clusters.
- Some points in model space allow few models; some allow many.
44 Redundant models do not add much information
[Two panels plotting model adequacy against model dimension.]
45 A more reasonable approach
- Bootstrap the data.
- Fit the model set and select the best model.
- Estimate the derived parameter θ from the best model.
- Accumulate the θ estimates.
- Repeat (within time constraints).
- Report the mean or median θ with percentile confidence intervals.
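An end-to-end toy sketch of this procedure; polynomial regression and the derived parameter θ = E[y | x = 1] stand in for the real model set and parameter of interest:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: y = f(x) + noise.
x = rng.uniform(-2, 2, size=60)
y = 1.0 + 0.8 * x - 0.4 * x**2 + rng.normal(0, 0.5, size=60)

def fit_poly_aic(x, y, degree):
    # Least-squares polynomial fit; AIC under a Gaussian error model
    # (constants dropped). k counts the coefficients plus the error variance.
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    n, k = y.size, degree + 2
    return n * np.log(np.mean(resid**2)) + 2 * k, coefs

def theta_hat(x, y, degrees=(0, 1, 2, 3)):
    # Select the best model by AIC, then estimate theta from it.
    best = min((fit_poly_aic(x, y, d) for d in degrees), key=lambda t: t[0])
    return np.polyval(best[1], 1.0)

# Bootstrap the WHOLE procedure, selection step included.
thetas = []
for _ in range(500):  # "repeat within time constraints"
    idx = rng.integers(0, x.size, size=x.size)
    thetas.append(theta_hat(x[idx], y[idx]))

lo, med, hi = np.percentile(thetas, [2.5, 50, 97.5])
print(f"theta = {med:.2f}  (95% percentile CI: {lo:.2f}, {hi:.2f})")
```

Because the selection step is inside the bootstrap loop, the percentile interval reflects model selection uncertainty as well as sampling noise, which is exactly what the single-model “conditional” variance ignores.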