Logistic Regression with “Grouped” Data Lobster Survival by Size in a Tethering Experiment Source: E.B. Wilkinson, J.H. Grabowski, G.D. Sherwood, P.O. Yund (2015). "Influence of Predator Identity on the Strength of Predator Avoidance Response in Lobsters," Journal of Experimental Biology and Ecology, Vol. 465, pp. 107-112.
Data Description Experiment involved 159 juvenile lobsters in a Saco Bay, Maine Outcome: Whether or not lobster survived predator attack in Tethering Experiment Predictor Variable: Length of Carapace (mm). Lobsters grouped in m = 11 groups of width of 3mm (27 to 57 by 3)
Models Data: (Yi,ni) i=1,…,m Distribution: Binomial at each Size Level (Xi) Link Function: Logit: log(pi/(1-pi)) is a linear function of Predictor (Size Level) 3 Possible Linear Predictors: log(pi/(1-pi)) = a (Mean is same for all Sizes, No association) log(pi/(1-pi)) = a + bXi (Linearly related to size) log(pi/(1-pi)) = b1Z1 +…+ bm-1Zm-1 + bmZm Zi = 1 if Size Level i, 0 o.w. This allows for m distinct logits, without a linear trend in size (aka Saturated model)
Probability Distribution & Likelihood Function - I
Maximum Likelihood Estimation – Model 1
Maximum Likelihood Estimation – Model 3
Model 2 – R Output glm(formula = lob.y ~ size, family = binomial("logit")) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -7.89597 1.38501 -5.701 1.19e-08 *** size 0.19586 0.03415 5.735 9.77e-09 ***
Evaluating the log-likelihood for Different Models
Deviance and Likelihood Ratio Tests -2*(log Likelihood of Model – log Likelihood of Saturated Model) Degrees of Freedom = # of Parameters in Saturated Model - # in model When comparing a Complete and Reduced Model, take difference between the Deviances (Reduced – Full) Under the null hypothesis, the statistic will be chi-square with degrees of freedom = difference in degrees of freedoms for the 2 Deviances (number of restrictions under null hypothesis) Deviance can be used to test goodness-of-fit of model.
Deviance and Likelihood Ratio Tests
Pearson Chi-Square Test for Goodness-of-Fit
Pearson Chi-Square Test for Goodness-of-Fit Even though some of the group sample sizes are small, and some Expected cell Counts are below 5, it is clear that Model 2 provides a Good Fit to the data
Residuals
Residuals
Computational Approach for ML Estimator
Estimated Variance-Covariance for ML Estimator
ML Estimate, Variance, Standard Errors > mod2 <- glm(lob.y ~ size, family=binomial("logit")) > summary(mod2) Call: glm(formula = lob.y ~ size, family = binomial("logit")) Deviance Residuals: Min 1Q Median 3Q Max -1.12729 -0.43534 0.04841 0.29938 1.02995 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -7.89597 1.38501 -5.701 1.19e-08 *** size 0.19586 0.03415 5.735 9.77e-09 *** Null deviance: 52.1054 on 10 degrees of freedom Residual deviance: 4.5623 on 9 degrees of freedom AIC: 32.24 logLik(mod2) 'log Lik.' -14.11992 (df=2)