# Multinomial Logistic Regression

David F. Staples

Outline
- Review of Logistic Regression
  - BCS Example
- Extension to Multiple Response Groups
  - Nominal Categories
  - Ordinal Categories
- Model Fitting & Interpretation
  - Shallow Lake Trophic Status

Logistic Regression

Based on a Binomial Random Variable: $Y \in \{0, 1\}$
- $\mathrm{Prob}(Y = 1) = p$
- $\mathrm{Prob}(Y = 0) = 1 - p$

$$p(x) = P(Y_i = 1 \mid X_i) = \frac{e^{X\beta}}{1 + e^{X\beta}}, \quad \text{where } X\beta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k.$$

A logit transformation is used to linearize $p(x)$:

$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k = X\beta$$

The left-hand side is the Log Odds of 'Success', so the β's give the additive effect of the X's on the Log Odds.
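The two directions of the transformation above can be sketched in a few lines of R. This is only an illustration: the helper names `inv_logit` and `logit` and the values in `eta` are made up for the example and are not from the talk.

```r
# Inverse logit: maps a linear predictor (log odds) to a probability.
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

# Logit: maps a probability to its log odds.
logit <- function(p) log(p / (1 - p))

eta <- c(-2, 0, 2)     # hypothetical values of the linear predictor
p <- inv_logit(eta)    # probabilities strictly between 0 and 1

# Base R provides the same pair as plogis() and qlogis().
all.equal(p, plogis(eta))
all.equal(logit(p), eta)
```

A log odds of 0 corresponds to p = 0.5, which is the property used later to find the patch area at which presence and absence are equally likely.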

Logistic Regression Example

Model p as a function of Macrophyte Patch Area: `glm(BCS ~ Patch_area, family = binomial)`

| | Estimate | SE | z | Pr(>\|z\|) |
|---|---|---|---|---|
| Intercept | -2.433e+00 | 5.108e-01 | -4.764 | 1.9e-06 |
| Patch_area | 1.765e-04 | 4.725e-05 | 3.736 | 0.0001 |

Dichotomous Variable is the Presence/Absence of BCS:
- Y = 1 if BCS Present
- Y = 0 if BCS Absent
- p = Prob(BCS Present)
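For readers who want to reproduce the shape of this output, here is a minimal sketch on simulated data. The data are hypothetical stand-ins generated inside the example, not the BCS survey data from the talk, and the sample size, area range, and generating coefficients are assumptions chosen only to resemble the slide.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; NOT the BCS data from the talk).
Patch_area <- runif(200, min = 0, max = 30000)
BCS <- rbinom(200, size = 1, prob = plogis(-2.4 + 1.8e-04 * Patch_area))

# Same model call as on the slide.
fit <- glm(BCS ~ Patch_area, family = binomial)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|).
tab <- summary(fit)$coefficients
tab
```

The estimates will differ from the slide because the data are simulated; only the layout of the table matches.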

Interpreting Logistic Regression

For the fitted model `glm(BCS ~ Patch_area, family = binomial)`, the effect of Patch Area on P(BCS) depends on:
- Non-Linear Transformation
- Value of Intercept
- Value of Other Variables

Interpreting Logistic Regression

For the average patch area (8374), the log odds would be:

$$-2.433 + 0.0001765 \times 8374 = -0.955$$

Exponentiate to get the Odds of Success:

$$\frac{p}{1 - p} = \exp(-0.955) = 0.38$$

Solving for p gives Prob(BCS Present | Area = 8374) = 0.38 / (1 + 0.38) = 0.28.
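The arithmetic on this slide can be checked directly in R. The coefficients and the average patch area are taken from the slide; the variable names are chosen for the example.

```r
b0 <- -2.433       # intercept from the glm output
b1 <- 1.765e-04    # Patch_area slope from the glm output
area <- 8374       # average patch area used on the slide

log_odds <- b0 + b1 * area    # about -0.955
odds <- exp(log_odds)         # about 0.38
p <- odds / (1 + odds)        # about 0.28

round(c(log_odds = log_odds, odds = odds, p = p), 3)
```

The last two steps are equivalent to calling `plogis(log_odds)`.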

Interpreting Logistic Regression

When p = 0.5, the log odds equal 0, so -2.433 + 0.0001765 × Area = 0. Thus, the patch area for p = 0.50 is 2.433 / 0.0001765 = 13784.7.
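The same inversion works for any target probability, not only 0.5. A small sketch, where `area_for_p` is a hypothetical helper name and the coefficients are those from the slide:

```r
b0 <- -2.433       # intercept from the glm output
b1 <- 1.765e-04    # Patch_area slope from the glm output

# Patch area at which the fitted probability equals a target p:
# solve b0 + b1 * area = log(p / (1 - p)) for area.
area_for_p <- function(p) (qlogis(p) - b0) / b1

area_50 <- area_for_p(0.5)    # qlogis(0.5) is 0, so this is -b0 / b1
area_50                       # about 13784.7
plogis(b0 + b1 * area_50)     # recovers 0.5
```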

Multinomial Logistic Regression
- Logistic Regression with > 2 response categories
- Model Probabilities Relative to 'Reference' Category
- Response May be Nominal or Ordinal

[Figure panels: Nominal, Ordinal]
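Modelling probabilities relative to a reference category is usually written as a set of baseline-category logits. The form below is the standard textbook one and is offered as a sketch, assuming category 1 is the reference and J is the number of categories; the subscripted notation $\beta_j$ is an assumption, not taken from the slides.

```latex
\log\frac{P(Y = j \mid x)}{P(Y = 1 \mid x)}
  = \beta_{0j} + \beta_{1j} x_1 + \dots + \beta_{kj} x_k
  = X\beta_j, \qquad j = 2, \dots, J

P(Y = 1 \mid x) = \frac{1}{1 + \sum_{h=2}^{J} e^{X\beta_h}}, \qquad
P(Y = j \mid x) = \frac{e^{X\beta_j}}{1 + \sum_{h=2}^{J} e^{X\beta_h}}
```

With J = 2 this reduces to the binary logistic model, since the single logit is then the log odds of 'success'.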

Shallow Lake Trophic Status

3 Categories Defining Lake State:
- Y = 1 if Lake Clear
- Y = 2 if Lake Shifting States
- Y = 3 if Lake Turbid

Nominal (un-ordered) Multinomial Logistic

    library(nnet)
    multinom(StateNom ~ TP)

Coefficients:

| Category | (Int) | TP |
|---|---|---|
| 2 | -2.47 | 0.012 |
| 3 | -1.89 | 0.014 |

Std. Errors:

| Category | (Int) | TP |
|---|---|---|
| 2 | 0.549 | 0.004 |
| 3 | 0.447 | 0.004 |

Residual Deviance: 113.8345, AIC: 121.8345

For TP = 50:
- p(Shifting) is about 16% of p(Clear)
- p(Turbid) is about 30% of p(Clear)
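The ratios quoted for TP = 50 come from exponentiating each category's linear predictor, and the same quantities also give the category probabilities themselves. A sketch using the rounded coefficients from the output, assuming Clear (Y = 1) is the reference category; the object names are chosen for the example.

```r
# Coefficients from the multinom() output; Clear (Y = 1) is the reference.
coefs <- rbind(Shifting = c(-2.47, 0.012),
               Turbid   = c(-1.89, 0.014))
colnames(coefs) <- c("Int", "TP")

tp <- 50

# Linear predictors: log of p(category) / p(Clear).
eta <- coefs[, "Int"] + coefs[, "TP"] * tp
rel <- exp(eta)    # ratios relative to p(Clear)

# Convert ratios to probabilities; the reference category has ratio 1.
probs <- c(Clear = 1, rel) / (1 + sum(rel))

round(rel, 2)
round(probs, 3)
```

Because the coefficients are rounded to two or three digits here, the ratios come out close to, but not exactly at, the 16% and 30% quoted above.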

Nominal (un-ordered) Multinomial Logistic

[Figure: Odds of Shifting State vs. Clear State]
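A curve of this kind can be drawn from the fitted coefficients alone. The sketch below is not the original figure: it assumes the rounded Shifting-vs-Clear coefficients (-2.47 and 0.012) from the `multinom()` output and a TP range of 0 to 600, and `odds_shift_vs_clear` is a hypothetical helper name.

```r
# Odds of Shifting vs. Clear as a function of TP,
# using the rounded multinom() coefficients.
odds_shift_vs_clear <- function(tp) exp(-2.47 + 0.012 * tp)

tp <- seq(0, 600, by = 50)
odds <- odds_shift_vs_clear(tp)

# Each additional unit of TP multiplies the odds by exp(0.012).
exp(0.012)

# TP at which Shifting and Clear are equally likely (odds = 1).
2.47 / 0.012

plot(tp, odds, type = "l",
     xlab = "TP", ylab = "Odds of Shifting vs. Clear")
```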

Ordinal Multinomial Logistic a.k.a. Proportional Odds Model

3 Ordered Status Categories:
- Y = 1 if lake clear
- Y = 2 if lake shifting states
- Y = 3 if lake turbid

Ordinal Multinomial Logistic a.k.a. Proportional Odds Model

3 Ordered Status Categories:
- Y = 1 if lake clear
- Y = 2 if lake shifting states
- Y = 3 if lake turbid

Assume Same Slope => Fewer Parameters

    library(MASS)
    StateOrd = as.ordered(StateNom)
    polr(StateOrd ~ TP)

Coefficients:

| | Value | SE | t value |
|---|---|---|---|
| TP | 0.009 | 0.002 | 3.81 |

Intercepts:

| | Value | SE | t value |
|---|---|---|---|
| 1\|2 | 1.103 | 0.342 | 3.22 |
| 2\|3 | 1.889 | 0.397 | 4.76 |

Residual Deviance: 118.99, AIC: 124.9897
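To see how one slope and two intercepts produce three probabilities, the category probabilities can be computed by hand. This sketch relies on the parameterization documented for `polr()` in MASS, logit P(Y ≤ j) = ζ_j − β·TP, and uses the rounded estimates from the output; `ord_probs` is a hypothetical helper name.

```r
beta <- 0.009                              # TP coefficient from polr()
zeta <- c("1|2" = 1.103, "2|3" = 1.889)    # intercepts (cut points)

ord_probs <- function(tp) {
  # Cumulative probabilities P(Y <= 1) and P(Y <= 2) at this TP.
  cum <- plogis(zeta - beta * tp)
  c(Clear    = cum[[1]],
    Shifting = cum[[2]] - cum[[1]],
    Turbid   = 1 - cum[[2]])
}

round(ord_probs(50), 3)
round(ord_probs(300), 3)
```

Because both cumulative logits share the same slope, raising TP shifts probability away from Clear and toward Turbid.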

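    library(MASS)
    m2 = polr(StateOrd ~ TP)                            # proportional odds fit
    newd = data.frame(TP = seq(0, 600))                 # TP grid for prediction
    prd = predict(m2, newdata = newd, type = 'probs')   # one probability column per state
    matplot(newd$TP, prd)                               # plot the three curves against TP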

Nominal/Ordinal Comparison

Nominal (un-ordered) Multinomial Logistic

For J = 3 Categories defining lake state:
- Y = 1 if lake clear
- Y = 2 if lake shifting states
- Y = 3 if lake turbid

    library(nnet)
    multinom(StateNom ~ TP)

Coefficients:

| Category | (Intercept) | TP |
|---|---|---|
| 2 | -2.469517 | 0.01248172 |
| 3 | -1.891459 | 0.01384079 |

Std. Errors:

| Category | (Intercept) | TP |
|---|---|---|
| 2 | 0.5486044 | 0.004183882 |
| 3 | 0.4465049 | 0.003932610 |

Residual Deviance: 113.8345, AIC: 121.8345

Ordinal Multinomial Logistic a.k.a. Proportional Odds Model

For J = 3 Categories defining lake state (State 2 is Intermediate between 1 & 3):
- Y = 1 if lake clear
- Y = 2 if lake shifting states
- Y = 3 if lake turbid

    library(MASS)
    StateOrd = as.ordered(StateNom)
    polr(StateOrd ~ TP, Hess = TRUE)

Coefficients:

| | Value | SE | t value |
|---|---|---|---|
| TP | 0.0086 | 0.0023 | 3.8085 |

Intercepts:

| | Value | SE | t value |
|---|---|---|---|
| 1\|2 | 1.1028 | 0.3417 | 3.2277 |
| 2\|3 | 1.8889 | 0.3968 | 4.7605 |

Residual Deviance: 118.9897, AIC: 124.9897
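The two fits can be lined up by parameter count and AIC. The sketch below assumes AIC is computed as residual deviance plus twice the number of estimated parameters, which the reported numbers are consistent with; the variable names are chosen for the example.

```r
# Residual deviances and parameter counts read from the two outputs.
dev_nom <- 113.8345; k_nom <- 4    # 2 intercepts + 2 TP slopes
dev_ord <- 118.9897; k_ord <- 3    # 2 cut points + 1 common TP slope

aic_nom <- dev_nom + 2 * k_nom     # 121.8345, as reported
aic_ord <- dev_ord + 2 * k_ord     # 124.9897, as reported

aic_ord - aic_nom                  # AIC difference between the two fits
```

The ordinal model saves one parameter by assuming a common slope, at the cost of a higher residual deviance on these data.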