Presentation is loading. Please wait.

Presentation is loading. Please wait.

Logistic regression (when you have a binary response variable)

Similar presentations


Presentation on theme: "Logistic regression (when you have a binary response variable)"— Presentation transcript:

1 Logistic regression (when you have a binary response variable)

2 Overview A logistic regression model Interpretation of slope coefficients Estimation of slope coefficients Tests and confidence intervals for slope coefficients

3 Space shuttle example n = 24 space shuttle launches prior to the Challenger disaster on 27 January 1986 Response y is a (binary) indicator variable –y = 1 if any O-ring failures during launch –y = 0 if no O-ring failures during launch Predictor x 1 is launch temperature, in degrees Fahrenheit

4 The space shuttle example

5 A logistic regression model

6 Recall the simple linear regression function

7 What is the mean of a binary response? If there are 20% smokers and 80% non-smokers, and Y i = 1, if smoker and 0, if non-smoker, then: The mean of the binary response equals the probability (20%) that a randomly selected person is a smoker.

8 What is the mean of a binary response? If p i = P (Y i = 1) and 1 – p i = P (Y i = 0), then: In general, the mean of a binary response equals the probability (p i ) that the first event happens.

9 A linear regression function for a binary response for Y i = 0, 1 If the simple linear regression function is: Then, the mean response … … is the probability that Y i = 1 when the level of the predictor variable is x i.

10 The space shuttle example

11 (Simple) logistic regression function

12

13

14 The space shuttle example

15 Alternative formulation of (simple) logistic regression function (algebra) The “logit” = log of the odds

16 The space shuttle example

17 Interpretation of the slope coefficients

18 What are the odds (of success)? If 20% are smokers and 80% are non-smokers: “The odds are 4 to 1.” For every 4 non-smokers, there is 1 smoker. The odds for success is the ratio of the probability of success to the probability of failure.

19 What are the odds (of success)? If p i = P (Y i = 1) = probability of success, and 1 – p i = P (Y i = 0) = probability of failure, then:

20 Incidentally… If 20% are smokers and 80% are non-smokers: “The odds are 1 to 4.” For every 1 smoker, there are 4 non-smokers.

21 Odds ratio MALE: 20% smokers and 80% non-smokers: FEMALE: 40% smokers and 60% non-smokers: The odds that a male is a nonsmoker is 2.67 times the odds that a female is a nonsmoker.

22 Odds ratio Group 1 Group 2 The odds ratio

23 The space shuttle example Predicted odds: Predicted odds at x 1 = 55 degrees: Predicted odds at x 1 = 56 degrees:

24 The space shuttle example Predicted odds ratio for x 1 = 56 relative to x 1 = 55:

25 The space shuttle example Link Function: Logit Response Information Variable Value Count failure 1 7 (Event) 0 17 Total 24 Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant 10.875 5.703 1.91 0.057 temp -0.17132 0.08344 -2.05 0.040 0.84 0.72 0.99

26 Interpretation of the slope coefficient In general, the antilog of the coefficient (“e to the coefficient”) estimates the ratio of the odds at x+1 to the odds at x.

27 The space shuttle example Predicted odds: Predicted odds at x 1 = 55 degrees: Predicted odds at x 1 = 80 degrees:

28 The space shuttle example Predicted odds ratio for x 1 = 55 relative to x 1 = 80: The odds of O-ring failure at 55 degrees Fahrenheit is 70 times the odds of O-ring failure at 80 degrees Fahrenheit!

29 Interpretation of the slope coefficient The ratio of the odds at x 1 = A relative to the odds at x 1 = B (for fixed values of other x’s) is:

30 Estimation of the logistic regression coefficients

31 Maximum likelihood estimation Choose as estimates of the parameters the values that assign the highest probability to (“maximize the likelihood of”) the observed outcome.

32 Suppose For first observation, Y 1 = 1 and x 1 = 53: … for second observation, Y 2 = 1 and x 2 = 56:… and for 24th observation, Y 24 = 0 and x 24 = 81:

33 If α = 10 and β = -0.15, what is probability of observed outcome? The likelihood of the observed outcome is:

34 Maximum likelihood estimation Choose as estimates of the parameters the values that assign the highest probability to (“maximize the likelihood of”) the observed outcome.

35 Suppose For first observation, Y 1 = 1 and x 1 = 53: … for second observation, Y 2 = 1 and x 2 = 56: … and for 24th observation, Y 24 = 0 and x 24 = 81:

36 If α = 10.8 and β = -0.17, what is probability of observed outcome? The likelihood of the observed outcome is:

37 The space shuttle example Link Function: Logit Response Information Variable Value Count failure 1 7 (Event) 0 17 Total 24 Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant 10.875 5.703 1.91 0.057 temp -0.17132 0.08344 -2.05 0.040 0.84 0.72 0.99

38 Properties of MLEs If a model is correct and the sample size is large enough: –MLEs are essentially unbiased. (Good!) –Formulas exist for estimating the standard errors of the estimators. (Good!) –The estimators are about as precise as any nearly unbiased estimators. (Good!) –MLEs are approximately normally distributed.

39 Tests and confidence intervals for single coefficients

40 Inference for β j Test statistic: follows approximate standard normal distribution. Confidence interval:

41 The space shuttle example Link Function: Logit Response Information Variable Value Count failure 1 7 (Event) 0 17 Total 24 Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant 10.875 5.703 1.91 0.057 temp -0.17132 0.08344 -2.05 0.040 0.84 0.72 0.99

42 The space shuttle example There is sufficient evidence, at the α = 0.05 level, to conclude that temperature is related to the probability of O-ring failure. For every 1-degree increase in temperature, the odds ratio of O-ring failure to O-ring success is estimated to be 0.84 (95% CI is 0.72 to 0.99).

43 Survival in the Donner Party In 1846, Donner and Reed families traveled from Illinois to California by covered wagon. Group became stranded in eastern Sierra Nevada mountains when hit by heavy snow. 40 of 87 members (45 adults over age 15) died from famine and exposure. Are females better able to withstand harsh conditions than are males?

44 Survival in the Donner Party

45 Link Function: Logit Response Information Variable Value Count STATUS SURVIVED 20 (Event) DIED 25 Total 45 Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant 1.633 1.110 1.47 0.141 AGE -0.07820 0.03729 -2.10 0.036 0.92 0.86 0.99 Gender 1.5973 0.7555 2.11 0.034 4.94 1.12 21.72

46 Binary logistic regression in Minitab Select Stat >> Regression >> Binary Logistic Regression… In box labeled Response, specify your binary response variable. In box labeled Model, specify all of the predictors. Select OK.


Download ppt "Logistic regression (when you have a binary response variable)"

Similar presentations


Ads by Google