# Problems with infinite solutions in logistic regression


Ian White, MRC Biostatistics Unit, Cambridge, UK. Stata Users' Group, London, 12th September 2006 (`h:\stats\boundary`)

## Introduction

- I teach logistic regression for the analysis of case-control studies to Epidemiology Master's students, using Stata.
- I stress how to work out degrees of freedom: e.g. if E has 2 levels and M has 4 levels, then testing the E*M interaction uses 3 d.f.
- Our practical uses data on 244 cases of leprosy and 1027 controls:
  - previous BCG vaccination is the exposure of interest;
  - level of schooling is a possible effect modifier;
  - in what follows I'm ignoring other confounders.
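The d.f. rule from the introduction can be written out as a small Python sketch (Python rather than Stata; the function names are hypothetical, not from the talk). It counts the non-intercept parameters for two fully crossed categorical factors. Assuming every bcg × school cell is estimable, it reproduces the LR chi2(4) and chi2(7) that Stata reports for the leprosy main effects and interaction models.

```python
# Illustrative sketch; interaction_df and model_df are hypothetical helpers.

def interaction_df(levels_e: int, levels_m: int) -> int:
    """D.f. for testing the E*M interaction between two categorical factors."""
    return (levels_e - 1) * (levels_m - 1)


def model_df(levels_e: int, levels_m: int, interaction: bool) -> int:
    """Number of non-intercept parameters, i.e. the d.f. of the model LR chi2."""
    main_effects = (levels_e - 1) + (levels_m - 1)
    return main_effects + (interaction_df(levels_e, levels_m) if interaction else 0)


# E = BCG scar (2 levels), M = schooling (4 levels)
print(interaction_df(2, 4))                          # 3
print(model_df(2, 4, False), model_df(2, 4, True))   # 4 7
```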

## Leprosy data

Tabulations (`lep-bdy.do`) of:
- `d`, the outcome: 0 = control (1027), 1 = case (244), total 1271;
- `bcg`, the exposure of interest: BCG scar absent / present;
- `school`, the possible effect modifier: schooling level 0, 1, 2 or 3.

## Main effects model

`. xi: logistic d i.bcg i.school`

`i.bcg` and `i.school` are naturally coded, with `_Ibcg_0` and `_Ischool_0` omitted as reference categories. The output is a logistic regression with LR chi2(4), reporting odds ratios, standard errors, z, P>|z| and 95% confidence intervals for `_Ibcg_1`, `_Ischool_1`, `_Ischool_2` and `_Ischool_3`.

`. estimates store main`

## Interaction model

`. xi: logistic d i.bcg*i.school`

As before, `_Ibcg_0` and `_Ischool_0` are omitted, and `i.bcg*i.school` adds the interaction terms `_IbcgXsch_#_#` (coded as above). The logistic regression has LR chi2(7). The estimates for `_Ischool_3` and `_IbcgXsch_~3` are printed in exponent notation, and Stata adds:

`Note: 17 failures and 0 successes completely determined.`

`. estimates store inter`

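## The problem

`. table bcg school, by(d)` cross-tabulates BCG scar (absent / present) by schooling, separately for controls (d = 0) and cases (d = 1). The problem cell is BCG absent with schooling level 3. It contains 17 controls and no cases, which is what the note "17 failures and 0 successes completely determined" refers to.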
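Here is why one empty cell matters. The interaction model has 8 parameters for the 8 bcg × school cells, so it is saturated in the covariate pattern. Any cell containing only controls (or only cases) therefore has a fitted log odds of −∞ (or +∞). The Python sketch below flags such cells; `completely_determined` is a hypothetical helper. Only the (BCG absent, schooling 3) cell with 17 controls and 0 cases follows the Stata note; the other counts are made up.

```python
# Illustrative sketch; completely_determined is a hypothetical helper.

def completely_determined(cases, controls):
    """Covariate patterns in which every subject is a control, or every one a case.

    cases, controls: dicts mapping a pattern such as (bcg, school) to a count.
    Returns {pattern: (controls, cases)} for the perfectly predicted patterns.
    """
    flagged = {}
    for pattern in sorted(cases.keys() | controls.keys()):
        n_cases = cases.get(pattern, 0)
        n_controls = controls.get(pattern, 0)
        if n_cases + n_controls > 0 and (n_cases == 0 or n_controls == 0):
            flagged[pattern] = (n_controls, n_cases)
    return flagged


# Made-up counts keyed by (bcg, school), except the (0, 3) cell:
# BCG absent, schooling level 3, 17 controls and 0 cases, as in the Stata note.
cases = {(0, 0): 20, (0, 1): 20, (0, 2): 10, (0, 3): 0,
         (1, 0): 30, (1, 1): 40, (1, 2): 30, (1, 3): 20}
controls = {(0, 0): 80, (0, 1): 90, (0, 2): 50, (0, 3): 17,
            (1, 0): 120, (1, 1): 200, (1, 2): 150, (1, 3): 100}

flagged = completely_determined(cases, controls)
print(flagged)                                  # {(0, 3): (17, 0)}
print(sum(sum(v) for v in flagged.values()))    # 17 observations completely determined
```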

## LR test

`. xi: logistic d i.bcg i.school` gives LR chi2(4) = 97.50; `. estimates store main`

`. xi: logistic d i.bcg*i.school` gives LR chi2(7); `. estimates store inter`

`. lrtest main inter` reports a likelihood-ratio test with LR chi2(2) (Assumption: main nested in inter), although the bcg*school interaction has 3 d.f.
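The following is a stdlib-only Python sketch of the LR test, assuming integer d.f. `chi2_sf` (a hypothetical helper) uses the closed-form series for the χ² upper tail. `lr_test` returns 2 × (difference in log likelihoods) together with its p-value. The log likelihoods are made up because the transcript lost the real ones. The loop shows how the p-value changes between the 2 d.f. Stata uses and the 3 d.f. implied by the design.

```python
import math

# Illustrative sketch; chi2_sf and lr_test are hypothetical helpers.


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) for X ~ chi-squared with a positive integer d.f."""
    if df < 1:
        raise ValueError("d.f. must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even d.f.: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # odd d.f.: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    term, s = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        s += term
    return math.erfc(math.sqrt(x / 2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * s


def lr_test(ll_restricted: float, ll_unrestricted: float, df: int):
    """LR statistic 2*(ll_unrestricted - ll_restricted) and its chi-squared p-value."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return stat, chi2_sf(stat, df)


# Hypothetical log likelihoods for the main effects and interaction models
ll_main, ll_inter = -600.0, -596.0
for df in (2, 3):   # the d.f. Stata uses vs the d.f. the design implies
    stat, p = lr_test(ll_main, ll_inter, df)
    print(df, stat, round(p, 4))
```

For a fixed statistic, more d.f. always gives a larger p-value. Under-counting d.f. therefore makes the test anti-conservative.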

## What is Stata doing? (guess)

- It recognises that the information matrix is singular, and hence reduces the model d.f. by 1.
- In other situations, Stata drops observations if a single variable perfectly predicts success/failure:
  - this happens if the problematic cell doesn't occur in a reference category;
  - Stata then refuses to perform `lrtest`, but we can force it to do so;
  - Stata still gets df = 2; we can use the `df(3)` option.

`. gen bcgrev=1-bcg`

`. xi: logistic d i.bcgrev*i.school`

Here `_Ibcgrev_0` and `_Ischool_0` are omitted and the interaction terms are `_IbcgXsch_#_#`. Stata reports `note: _IbcgXsch_1_3 != 0 predicts failure perfectly` and `_IbcgXsch_1_3 dropped and 17 obs not used`. The model is then fitted with LR chi2(6), with odds ratios for `_Ibcgrev_1`, `_Ischool_1`, `_Ischool_2`, `_Ischool_3`, `_IbcgXsch_~1` and `_IbcgXsch_~2`.

`. est store interrev`

`. lrtest interrev main` fails with `observations differ: 1254 vs. 1271` (`r(498);`).

`. lrtest interrev main, force` gives LR chi2(2) (Assumption: main nested in interrev).

## What's right?

- A zero cell suggests a small sample, so the asymptotic χ² distribution may be inappropriate for the LRT.
  - This is true in this case: one bcg*school category has only 1 observation.
  - But I'm going to demonstrate the same problem in a hypothetical example with expected cell counts > 3 but a zero observed cell count.
- We could combine or drop cells to get rid of zeroes, but the cell with zeroes may carry information.
- Problems with testing boundary values are well known: e.g. the LRT for testing a zero variance component isn't χ²₁.
  - But here the point estimate, not the null value, is on the boundary.
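For contrast, consider the textbook boundary case. When the null value of a single variance component is zero, the LRT is commonly referred to a 50:50 mixture of χ²₀ (a point mass at 0) and χ²₁, which halves the naive χ²₁ p-value. The following is a minimal Python sketch under that standard assumption; the helper names are hypothetical. As the slide says, it does not carry over to the situation here, where the estimate rather than the null value is on the boundary.

```python
import math

# Illustrative sketch; chi2_1_sf and mixture_pvalue are hypothetical helpers.


def chi2_1_sf(x: float) -> float:
    """P(X > x) for X ~ chi-squared on 1 d.f."""
    return 1.0 if x <= 0 else math.erfc(math.sqrt(x / 2))


def mixture_pvalue(stat: float) -> float:
    """p-value under a 50:50 mixture of chi2_0 (point mass at 0) and chi2_1."""
    return 1.0 if stat <= 0 else 0.5 * chi2_1_sf(stat)


stat = 3.84
print(chi2_1_sf(stat), mixture_pvalue(stat))   # the mixture p-value is half the naive one
```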

## Example to explain why LRT makes some sense

`. tab x y, chi2 exact` (`main2.log`) cross-tabulates binary x against y. Pearson chi2(1) gives Pr = 0.035, and Fisher's exact test (2-sided and 1-sided) is also reported.

Model: logit P(y=1|x) = a + bx

- Difference in log likelihood = 3.4, so LRT = 2 × 3.4 = 6.8; but on 0 d.f.?
- See `main2.do`.
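The LRT in the 2-level example can be computed directly from the 2 × 2 table. The logistic model with binary x is saturated, so its LRT against the intercept-only model is the G² statistic 2 Σ O log(O/E), with zero cells contributing 0. The estimate of b is infinite, yet the maximised log likelihood, and hence the statistic, is finite. The following Python sketch uses a made-up table, not the `main2.do` data; `g2_statistic` is a hypothetical helper. It also refers the slide's LRT of 6.8 to 1 d.f., one for the extra parameter b in the larger model.

```python
import math

# Illustrative sketch; g2_statistic is a hypothetical helper.


def g2_statistic(table):
    """LR (G^2) statistic for an r x c table: 2 * sum O*log(O/E), with 0*log(0) = 0.

    With rows = levels of x and columns = y = 0/1, this equals the LRT of the
    logistic model with x (saturated in x) against the intercept-only model.
    """
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:   # zero cells contribute 0, not log(0)
                g2 += obs * math.log(obs * n / (row_tot[i] * col_tot[j]))
    return 2 * g2


# Made-up 2 x 2 table (rows x = 0, 1; columns y = 0, 1) with a zero cell
table = [[6, 4],
         [10, 0]]
stat = g2_statistic(table)
p_1df = math.erfc(math.sqrt(stat / 2))   # chi-squared upper tail on 1 d.f.
print(round(stat, 3), round(p_1df, 4))

# The statistic quoted on the slide, referred to 1 d.f. rather than 0
print(math.erfc(math.sqrt(6.8 / 2)))
```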

## Example to explore correct df using Pearson / Fisher as gold standard

`. tab x y, chi2 exact` (`Main3.do`) cross-tabulates x (categories 1, 2, 3) against y. Pearson chi2(2) gives Pr = 0.018, and Fisher's exact test is also reported.

- All expected counts are ≥ 3.
- We don't want to drop or merge category 1: it contains the evidence for association!
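The following Python sketch checks the two facts on this slide using a made-up 3 × 2 table, not the `Main3.do` data; the helper names are hypothetical. Every expected count passes the ≥ 3 rule of thumb, yet category 1 has a zero cell. Because `i.x` gives one parameter per category, a zero in reference category 1 sends its log odds to −∞ and both `_Ix_2` and `_Ix_3` to +∞.

```python
# Illustrative sketch; expected_counts and rows_with_zero are hypothetical helpers.

def expected_counts(table):
    """Expected counts under no association: row total * column total / n."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def rows_with_zero(table):
    """Categories of x (row indices) with a zero cell: fitted log odds is +/- infinity."""
    return [i for i, row in enumerate(table) if 0 in row]


# Made-up 3 x 2 table: rows x = 1, 2, 3; columns y = 0, 1.
# Category 1 has 6 failures and 0 successes.
table = [[6, 0],
         [5, 7],
         [1, 5]]
expected = expected_counts(table)
print(expected)                                  # [[3.0, 3.0], [6.0, 6.0], [3.0, 3.0]]
print(min(min(r) for r in expected) >= 3)        # True
print(rows_with_zero(table))                     # [0], i.e. category x = 1
```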

`. xi: logistic y i.x` (naturally coded, `_Ix_1` omitted) gives a logistic regression with LR chi2(2). The odds ratios for `_Ix_2` and `_Ix_3` are printed in exponent notation, and Stata notes `6 failures and 0 successes completely determined.`

`. est store x`

`. xi: logistic y` fits the null model, LR chi2(0); `. est store null`

## LRT

`. xi: logistic y i.x` gives log likelihood = -11.457255; `. est store x`

With the null model stored as `null`, `. lrtest x null` reports LR chi2(1) (Assumption: null nested in x), i.e. 1 d.f. rather than 2.

## Clearly LRT isn't great. But 1 df is even worse than 2 df

Comparison of tests for the x by y table:
- Pearson chi2(2): P = 0.018
- Fisher's exact: P = 0.029
- LR chi2(1), and the same statistic referred to 2 d.f.
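The comparison can be mimicked in Python on a made-up 3 × 2 table with all expected counts ≥ 3 and a zero cell in category 1. This is not the slide's data, so the numbers differ. `pearson_and_g2` (a hypothetical helper) computes Pearson X² and the LR statistic G². The p-values are then taken on 2 d.f. and on the 1 d.f. that `lrtest` uses. For this table the LR p-value is already below Pearson's on 2 d.f., and 1 d.f. makes it smaller still.

```python
import math

# Illustrative sketch; pearson_and_g2 and chi2_sf_1_or_2 are hypothetical helpers.


def pearson_and_g2(table):
    """Pearson X^2 and LR G^2 for an r x c table (0*log(0) taken as 0)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    x2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            e = row_tot[i] * col_tot[j] / n
            x2 += (obs - e) ** 2 / e
            if obs > 0:
                g2 += 2 * obs * math.log(obs / e)
    return x2, g2


def chi2_sf_1_or_2(x, df):
    """Chi-squared upper tail for 1 or 2 d.f. (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only 1 or 2 d.f. handled here")


# Made-up 3 x 2 table (rows x = 1, 2, 3; columns y = 0, 1); category 1 is 6 failures, 0 successes
table = [[6, 0], [5, 7], [1, 5]]
x2, g2 = pearson_and_g2(table)
print("Pearson, 2 d.f.:", chi2_sf_1_or_2(x2, 2))
print("LR, 2 d.f.:     ", chi2_sf_1_or_2(g2, 2))
print("LR, 1 d.f.:     ", chi2_sf_1_or_2(g2, 1))   # the d.f. lrtest effectively uses
```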

## Note

In this example we could use Pearson / Fisher as the gold standard. We can't do this in more complex examples (e.g. when adjusting for several covariates).

## My proposal for Stata

- `lrtest` appears to adjust d.f. for infinite parameter estimates: it should not.
- Model d.f. should be incremented to allow for any variables dropped because they perfectly predict success/failure.
  - We don't need to adjust the log likelihood, as the dropped cases contribute 0 to it.
- Can the ad hoc handling of zeroes by `xi: logistic` be improved?
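Two points of the proposal are illustrated in this small Python sketch; the function names are hypothetical. First, the log-likelihood contribution of perfectly predicted failures, n log(1 − expit(η)), tends to 0 as the linear predictor η → −∞, so dropping those observations leaves the log likelihood unchanged. Second, the d.f. correction simply adds back one d.f. per dropped or infinite term. The count of 17 failures mirrors the leprosy example.

```python
import math

# Illustrative sketch; loglik_all_failures and corrected_df are hypothetical helpers.


def loglik_all_failures(n_failures: int, eta: float) -> float:
    """Log likelihood of n failures sharing linear predictor eta: n * log(1 - expit(eta)).

    Uses log(1 - expit(eta)) = -log1p(exp(eta)), accurate for very negative eta
    (exp() would overflow for eta above about 709, which is not needed here).
    """
    return -n_failures * math.log1p(math.exp(eta))


# 17 failures, 0 successes: as eta heads to -infinity, their contribution tends to 0
for eta in (-5, -10, -20, -40):
    print(eta, loglik_all_failures(17, eta))


def corrected_df(df_reported: int, n_dropped_or_infinite: int) -> int:
    """Proposed d.f.: add back one d.f. per term dropped or estimated as infinite."""
    return df_reported + n_dropped_or_infinite


print(corrected_df(2, 1))   # 3, the d.f. expected for the bcg*school interaction
```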

## Conclusions for statisticians

- We must remember that the χ² approximation is still poor for these LRTs: typically anti-conservative? (Kuss, 2002)
- Performance of the LRT can be improved by using penalised likelihood (Firth, 1993; Bull, 2006), which acts like a mildly informative prior: worth using routinely?
- Gold standard: Bayes or exact logistic regression (LogXact)?
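The following is a minimal pure-Python sketch of Firth's penalised likelihood for one covariate and grouped data (x, n, successes). It uses the commonly described modified-score iteration with hat values h, in which each group contributes y − nπ + h(½ − π). It is an illustration under stated assumptions (a single covariate, a crude step cap, no step-halving), not Stata's or any package's implementation; `firth_logit` is a hypothetical name. For a 2 × 2 table the model is saturated and every hat value is 1. The estimate then reduces to adding ½ to each cell, which gives a finite log odds ratio even with a zero cell; the usage lines check this.

```python
import math

# Illustrative sketch; expit and firth_logit are hypothetical helpers.


def expit(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def firth_logit(groups, max_iter=200, tol=1e-10, max_step=5.0):
    """Firth-penalised fit of logit P(y = 1 | x) = a + b*x.

    groups: list of (x, n, y): n subjects with covariate value x, y of them successes.
    Returns (a, b). Scoring-type iteration on the modified score, with a crude cap
    on the step size but no step-halving, so convergence is not guaranteed.
    """
    a = b = 0.0
    for _ in range(max_iter):
        fitted = []
        i00 = i01 = i11 = 0.0          # Fisher information for (a, b)
        for x, n, _y in groups:
            p = expit(a + b * x)
            w = n * p * (1 - p)
            fitted.append((p, w))
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det   # inverse information
        u0 = u1 = 0.0                  # Firth-modified score
        for (x, n, y), (p, w) in zip(groups, fitted):
            h = w * (v00 + 2 * v01 * x + v11 * x * x)      # hat value of this group
            r = y - n * p + h * (0.5 - p)
            u0 += r
            u1 += r * x
        da = v00 * u0 + v01 * u1
        db = v01 * u0 + v11 * u1
        step = max(abs(da), abs(db))
        if step > max_step:
            da, db = da * max_step / step, db * max_step / step
        a, b = a + da, b + db
        if step < tol:
            break
    return a, b


# Made-up 2 x 2 data: x = 0 has 4 successes out of 10, x = 1 has 0 out of 10,
# so ordinary maximum likelihood gives b = -infinity.
a, b = firth_logit([(0, 10, 4), (1, 10, 0)])
haldane = math.log((0.5 * 6.5) / (10.5 * 4.5))   # log odds ratio with 0.5 added to each cell
print(round(b, 6), round(haldane, 6))
```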

The end

## Output for example with 2-level x (`main2.log`)

`. logit y x` (coefficients for x and `_cons`); `. estimates store x`

`. logit y`; `. estimates store null`

`. lrtest x null` fails: `df(unrestricted) = df(restricted) = 1` (`r(498);`)

`. lrtest x null, force df(1)` gives LR chi2(1) (Assumption: null nested in x).