Problems with infinite solutions in logistic regression (presentation transcript)
1 Problems with infinite solutions in logistic regression
Ian White, MRC Biostatistics Unit, Cambridge
UK Stata Users' Group, London, 12th September 2006
h:\stats\boundary
2 Introduction
I teach logistic regression for the analysis of case-control studies to Epidemiology Master's students, using Stata.
I stress how to work out degrees of freedom: e.g. if E has 2 levels and M has 4 levels then you get 3 d.f. for testing the E*M interaction (worked out below).
Our practical uses data on 244 cases of leprosy and 1027 controls:
- previous BCG vaccination is the exposure of interest
- level of schooling is a possible effect modifier
- in what follows I'm ignoring other confounders
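The 3 d.f. quoted here follow from the usual rule for an interaction between two factors, worked out for this example:
$\text{d.f.} = (\text{levels of } E - 1) \times (\text{levels of } M - 1) = (2-1) \times (4-1) = 3$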
3 Leprosy data

-> tabulation of d (outcome: 0=control, 1=case)

 0=control, |
     1=case |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,027       80.80       80.80
          1 |        244       19.20      100.00
------------+-----------------------------------
      Total |      1,271      100.00

-> tabulation of bcg (exposure)

   BCG scar |      Freq.     Percent        Cum.
------------+-----------------------------------
     Absent |
    Present |

-> tabulation of school (possible effect modifier)

  Schooling |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |
          1 |
          2 |
          3 |

lep-bdy.do
4 Main effects model

. xi: logistic d i.bcg i.school
i.bcg             _Ibcg_0-1           (naturally coded; _Ibcg_0 omitted)
i.school          _Ischool_0-3        (naturally coded; _Ischool_0 omitted)

Logistic regression                               Number of obs   =       1271
                                                  LR chi2(4)      =      97.50
                                                  Prob > chi2     =
Log likelihood =                                  Pseudo R2       =

           d | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Ibcg_1 |
  _Ischool_1 |
  _Ischool_2 |
  _Ischool_3 |

. estimates store main
5 Interaction model

. xi: logistic d i.bcg*i.school
i.bcg             _Ibcg_0-1           (naturally coded; _Ibcg_0 omitted)
i.school          _Ischool_0-3        (naturally coded; _Ischool_0 omitted)
i.bcg*i.school    _IbcgXsch_#_#       (coded as above)

Logistic regression                               Number of obs   =       1271
                                                  LR chi2(7)      =
                                                  Prob > chi2     =
Log likelihood =                                  Pseudo R2       =

           d | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Ibcg_1 |
  _Ischool_1 |
  _Ischool_2 |
  _Ischool_3 |   (odds ratio and confidence limits of order 1e-07: the estimate has gone to the boundary)
_IbcgXsch_~1 |
_IbcgXsch_~2 |
_IbcgXsch_~3 |

Note: 17 failures and 0 successes completely determined.

. estimates store inter
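As an aside (not on the original slide): in Stata 11 and later the xi: prefix is unnecessary, because factor-variable notation fits the same interaction model directly. A minimal sketch:
. logistic d i.bcg##i.school
. estimates store inter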
6 The problem

. table bcg school, by(d)

--------------------------------------------
0=control, |
1=case     |
and BCG    |            Schooling
scar       |      0       1       2       3
-----------+--------------------------------
0          |
    Absent |
   Present |
           |
1          |
    Absent |
   Present |
--------------------------------------------

The problem is an empty cell: among the cases (d=1) there are no observations with BCG scar absent and schooling level 3, although that combination does occur among the controls (cf. slide 9, where these 17 observations are dropped).
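An equivalent way to display the empty cell among the cases only (a sketch, not part of the original output):
. tab bcg school if d==1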
7 LR test

. xi: logistic d i.bcg i.school
                                                  LR chi2(4)      =      97.50
Log likelihood =
. estimates store main

. xi: logistic d i.bcg*i.school
                                                  LR chi2(7)      =
Log likelihood =
. estimates store inter

. lrtest main inter
Likelihood-ratio test                             LR chi2(2)  =
(Assumption: main nested in inter)                Prob > chi2 =
8 What is Stata doing? (guess)
Recognises the information matrix is singular
Hence reduces model df by 1
In other situations Stata drops observations:
- if a single variable perfectly predicts success/failure
- this happens if the problematic cell doesn't occur in a reference category
- then Stata refuses to perform lrtest, but we can force it to do so
- Stata still gets df=2; can use the df(3) option (see the sketch after slide 9)
9
. gen bcgrev = 1 - bcg
. xi: logistic d i.bcgrev*i.school
i.bcgrev          _Ibcgrev_0-1        (naturally coded; _Ibcgrev_0 omitted)
i.school          _Ischool_0-3        (naturally coded; _Ischool_0 omitted)
i.bcg~v*i.sch~l   _IbcgXsch_#_#       (coded as above)
note: _IbcgXsch_1_3 != 0 predicts failure perfectly
      _IbcgXsch_1_3 dropped and 17 obs not used

Logistic regression                               Number of obs   =       1254
                                                  LR chi2(6)      =
                                                  Prob > chi2     =
Log likelihood =                                  Pseudo R2       =

           d | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Ibcgrev_1 |
  _Ischool_1 |
  _Ischool_2 |
  _Ischool_3 |
_IbcgXsch_~1 |
_IbcgXsch_~2 |

. est store interrev

. lrtest interrev main
observations differ: 1254 vs. 1271
r(498);

. lrtest interrev main, force
Likelihood-ratio test                             LR chi2(2)  =
(Assumption: main nested in interrev)             Prob > chi2 =
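Slide 8 noted that the degrees of freedom can also be overridden. A minimal sketch, assuming the estimates main and interrev stored above:
. lrtest interrev main, force df(3)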
10 What’s right?Zero cell suggests small sample so asymptotic c2 distribution may be inappropriate for LRTtrue in this case: have a bcg*school category with only 1 observationbut I’m going to demonstrate the same problem in hypothetical example with expected cell counts > 3 but a zero observed cell countCould combine or drop cells to get rid of zeroesbut the cell with zeroes may carry informationProblems with testing boundary values are well knowne.g. LRT for testing zero variance component isn’t c21but here the point estimate, not the null value, is on the boundary
11 Example to explain why the LRT makes some sense

. tab x y, chi2 exact

           |           y
         x |         0          1 |     Total
-----------+----------------------+----------
         0 |                      |
         1 |                      |
-----------+----------------------+----------
     Total |                      |

          Pearson chi2(1) =              Pr = 0.035
           Fisher's exact =
   1-sided Fisher's exact =

main2.log
12 Model: logit P(y=1|x) = a + bx
Difference in log lik = 3.4
LRT = 6.8 on 0 df?
[Data: 2x2 table of x by y, not reproduced]
See main2.do
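The LRT value on this slide is just twice the difference in maximised log likelihoods:
$\mathrm{LRT} = 2(\hat\ell_1 - \hat\ell_0) = 2 \times 3.4 = 6.8$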
13 Example to explore correct df, using Pearson / Fisher as gold standard

. tab x y, chi2 exact

           |           y
         x |         0          1 |     Total
-----------+----------------------+----------
         1 |                      |
         2 |                      |
         3 |                      |
-----------+----------------------+----------
     Total |                      |

          Pearson chi2(2) =              Pr = 0.018
           Fisher's exact =

Main3.do

All expected counts ≥ 3
Don't want to drop or merge category 1 - it contains the evidence for association!
14
. xi: logistic y i.x
i.x               _Ix_1-3             (naturally coded; _Ix_1 omitted)

Logistic regression                               Number of obs   =
                                                  LR chi2(2)      =
                                                  Prob > chi2     =
Log likelihood =                                  Pseudo R2       =

           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _Ix_2 |   (odds ratio and confidence limits of order 1e+09: the estimate is effectively infinite)
       _Ix_3 |

Note: 6 failures and 0 successes completely determined.
. est store x

. xi: logistic y
                                                  LR chi2(0)      =
                                                  Prob > chi2     =
Log likelihood =                                  Pseudo R2       =
. est store null
15 LRT

. xi: logistic y i.x
Log likelihood = -11.457255
. est store x
. est store null

. lrtest x null
Likelihood-ratio test                             LR chi2(1)  =
(Assumption: null nested in x)                    Prob > chi2 =
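To compare the two candidate reference distributions by hand, the statistic can be recomputed from the stored results. A minimal sketch, assuming the estimates x and null stored above:
. estimates restore x
. scalar ll1 = e(ll)
. estimates restore null
. scalar ll0 = e(ll)
. scalar lr = 2*(ll1 - ll0)
. display chi2tail(1, lr)    // p-value referring the statistic to chi2 with 1 df
. display chi2tail(2, lr)    // p-value referring the statistic to chi2 with 2 df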
16 Comparison of tests

           |           y
         x |         0          1 |     Total
-----------+----------------------+----------
         1 |                      |
         2 |                      |
         3 |                      |
-----------+----------------------+----------
     Total |                      |

   Pearson chi2(2):    P = 0.018
   Fisher's exact:     P = 0.029
   LR chi2(1):         P =
   (using 2 df:        P =      )

Clearly the LRT isn't great. But 1 df is even worse than 2 df.
17 Note
In this example, we could use Pearson / Fisher as a gold standard. We can't do this in more complex examples (e.g. adjusting for several covariates).
18 My proposal for Stata
lrtest appears to adjust the df for infinite parameter estimates: it should not
Model df should be incremented to allow for any variables dropped because they perfectly predict success/failure
No need to increment the log likelihood, as it is 0 for the cases dropped (see the check below)
Can the ad hoc handling of zeroes by xi: logistic be improved?
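The second point can be checked directly in Stata. A sketch, assuming the estimates inter and interrev stored on the earlier slides: the fit that keeps the 17 perfectly predicted observations and the fit that drops them should report essentially the same log likelihood, because each dropped observation contributes log(1) = 0.
. estimates restore inter
. display e(ll)
. estimates restore interrev
. display e(ll)    // should agree with the value above to numerical tolerance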
19 Conclusions for statisticians
Must remember the χ2 approximation is still poor for these LRTs
- typically anti-conservative? (Kuss, 2002)
Performance of the LRT can be improved by using penalised likelihood (Firth, 1993; Bull, 2006) - like a mildly informative prior
- worth using routinely?
Gold standard: Bayes or exact logistic regression (LogXact)?
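For the penalised-likelihood route, one option in Stata is the user-written firthlogit command from SSC (a sketch, not part of the original slides); exact logistic regression is also available as the built-in exlogistic command in later Stata releases:
. ssc install firthlogit
. xi: firthlogit d i.bcg*i.school    // Firth-penalised fit of the interaction model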
21 Output for example with 2-level x

. logit y x
Log likelihood =

           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |

. estimates store x

. logit y
Log likelihood =

           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |

. estimates store null

. lrtest x null
df(unrestricted) = df(restricted) = 1
r(498);

. lrtest x null, force df(1)
Likelihood-ratio test                             LR chi2(1)  =
(Assumption: null nested in x)                    Prob > chi2 =

main2.log