April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F.


1 April 4 Logistic Regression
– Lee, Chapter 9
– Cody and Smith, 9:F

2 HRT Use and Polyps

               Case (Polyps)   Control (No Polyps)   Total
HRT Use              72               175             247
No HRT Use          102               114             216
Total               174               289             463

RO (relative odds) of HRT use, case v. control:
$RO = \frac{72/102}{175/114} = 0.46$

$\chi^2 = \frac{463\,(72 \cdot 114 - 175 \cdot 102)^2}{(174)(289)(247)(216)} = 16.04$
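The arithmetic on this slide can be checked with a few lines of Python (used here in place of the SAS from later slides). This is a minimal sketch: the variable names are illustrative, and the chi-square is the usual uncorrected 2x2 formula, which reproduces the 16.04 on the slide.

```python
# 2x2 table from the slide: rows = exposure, columns = case/control status.
a, b = 72, 175    # HRT use:    cases, controls
c, d = 102, 114   # no HRT use: cases, controls
n = a + b + c + d

# Relative odds (odds ratio) of HRT use, cases versus controls.
odds_ratio = (a / c) / (b / d)

# Pearson chi-square for a 2x2 table, without continuity correction:
# n * (ad - bc)^2 divided by the product of the four marginal totals.
chi_square = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

print(f"RO = {odds_ratio:.2f}")
print(f"chi-square = {chi_square:.2f}")
```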

3 Inference for binary data
Relative risk, odds ratios, and 2x2 tables are limited:
– Can't adjust for many confounders
– Limited to categorical predictors
– Can't look at multiple variables simultaneously
Logistic regression can:
– Adjust for many confounders
– Study continuous predictors
– Model interactions

4 Linear regression model
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
– Y = dependent variable
– $X_i$ = independent variables
– Y is continuous, normally distributed
– Model the mean response (Y) based on the predictors
– $\beta_0$ is the mean of Y when all Xs are 0
– $\beta_i$ is the increase in the mean of Y for a 1-unit increase in $X_i$

5 New regression model?
$Y\ ? = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
– Y = binary outcome (0 or 1)
– $X_i$ = independent variables
Would like to use this type of model for a binary outcome variable

6 Draw a line ?

7 What if you had multiple observations at each score (or you grouped scores)?

Score    Proportion Dying
< 10     1/10 = 0.10
11-20    4/15 = 0.27
21-30    5/15 = 0.33
31-40    8/16 = 0.50

[Plot: the four proportions, each marked with *]

8 Possibilities for Y
$Y\ ? = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
– Y = probability of Y = 1 (Problem: Y is bound between 0 and 1)
– Y = odds of Y = 1
– Y = log(odds of Y = 1): has good properties

9 Probability, Odds, Log Odds

$\pi$     Odds $\pi/(1-\pi)$    Log(Odds)
0.01      0.01                  -4.60
0.10      0.11                  -2.20
0.20      0.25                  -1.38
0.30      0.43                  -0.85
0.40      0.67                  -0.41
0.50      1.00                   0.00
0.60      1.50                   0.41
0.70      2.33                   0.85
0.80      4.00                   1.38
0.90      9.00                   2.20
0.99     99.00                   4.60

– $\pi$: bound between 0 and 1
– Odds: extreme values
– Log(Odds): less extreme values, and symmetric about $\pi$ = 0.5
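The table above can be regenerated directly from the definitions. The sketch below uses Python rather than SAS, and the function names `odds` and `log_odds` are illustrative. The last printed digit may differ from the slide in a row or two because of rounding.

```python
import math

def odds(p):
    """Odds corresponding to probability p (0 < p < 1)."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural log of the odds, also called the logit of p."""
    return math.log(odds(p))

probabilities = [0.01, 0.10, 0.20, 0.30, 0.40, 0.50,
                 0.60, 0.70, 0.80, 0.90, 0.99]

print(f"{'p':>5} {'odds':>7} {'log odds':>9}")
for p in probabilities:
    print(f"{p:5.2f} {odds(p):7.2f} {log_odds(p):9.2f}")
```

The symmetry noted on the slide shows up as `log_odds(p) == -log_odds(1 - p)`, up to floating-point error.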

10

11 Nearly a straight line for middle values of P

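12 Logistic regression equation
Model the log odds of the outcome as a linear function of one or more variables
$X_i$ = predictors, independent variables
The model is: $\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$, where $\pi$ is the probability that Y = 1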

13 A Little Math
The natural LOG and exponential (EXP) functions are inverse functions of each other:

LOG(a) = b            EXP(b) = a
LOG(1) = 0            EXP(0) = 1
LOG(.5) = -0.693      EXP(-.693) = .5
LOG(1.5) = .405       EXP(.405) = 1.5

The LOG results (left column) will be logistic regression betas; the EXP results (right column) will be the odds ratios
Note: Calculators and Excel use LN for natural logarithm

14 A Little Math
LOG function
– Takes values in (0, +infinity) to values in (-infinity, +infinity)
EXP function
– Takes values in (-infinity, +infinity) to values in (0, +infinity)

15 A Little Math
Properties of LOG function
– log(a*b) = log(a) + log(b)
– log(a/b) = log(a) - log(b)
Properties of EXP function
– exp(a+b) = exp(a) * exp(b)
– exp(a-b) = exp(a) / exp(b)
Differences in log odds correspond to odds ratios

16 (ODDS)

17 [Figure annotations]
– These will be typical betas from the logistic regression model
– These will be the odds ratios

18 Logistic regression – single binary covariate
We need to use a dummy variable to code for men and women:
– x = 1 for women, 0 for men
What do the betas mean? What is the odds ratio, women versus men?
The model is: $\log(\text{odds}) = \beta_0 + \beta_1 x$

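19 Odds for Men and Women
For men (x = 0): $\log(\text{odds}) = \beta_0$, so $\text{odds} = e^{\beta_0}$
For women (x = 1): $\log(\text{odds}) = \beta_0 + \beta_1$, so $\text{odds} = e^{\beta_0 + \beta_1}$
After some algebra, the odds ratio is equal to: $OR = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$
$\beta_1$ is the difference in log odds between women and men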

20 Example - risk of CVD for men vs. women
$\log(\text{odds}) = \beta_0 + \beta_1 x = -2.5504 - 1.0527x$
For females: log(odds) = -2.5504 - 1.0527(1) = -3.6031
For males: log(odds) = -2.5504 - 1.0527(0) = -2.5504
Dif = -1.0527
$\exp(\beta_1)$ = odds ratio for women vs. men
Here, $\exp(\beta_1)$ = exp(-1.0527) = 0.35
Women have 65% lower odds of the outcome than men (OR < 1)
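A short Python sketch of this example (Python stands in for SAS here, and the names are illustrative) confirms that the ratio of the two groups' odds equals exp of the coefficient. It also converts each group's odds back to a probability, which the slide does not do, to show why the "65% lower" statement is strictly about odds.

```python
import math

b0, b1 = -2.5504, -1.0527   # fitted coefficients from the slide; x = 1 for women, 0 for men

def log_odds(x):
    """Log odds of the outcome for dummy variable x."""
    return b0 + b1 * x

odds_men = math.exp(log_odds(0))
odds_women = math.exp(log_odds(1))
odds_ratio = odds_women / odds_men          # same as exp(b1)

# Convert odds back to probabilities: p = odds / (1 + odds).
prob_men = odds_men / (1 + odds_men)
prob_women = odds_women / (1 + odds_women)

print(f"odds ratio (women vs. men) = {odds_ratio:.2f}")
print(f"P(outcome | men)   = {prob_men:.4f}")
print(f"P(outcome | women) = {prob_women:.4f}")
print(f"risk ratio         = {prob_women / prob_men:.2f}")
```

Because the outcome is fairly rare in both groups, the risk ratio comes out close to the odds ratio, but the two are not the same quantity.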

21 Note
– Odds ratio from a 2 x 2 table
– EXP($\beta$) from logistic regression for a binary risk factor
These will be equal
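The equality can be illustrated with the HRT and polyps counts from slide 2. The sketch below is in Python rather than SAS and does not run a fitting routine. It relies on a standard fact that the slides do not state: with a single binary predictor, the maximum likelihood fit reproduces the observed log odds in each group, so the coefficients can be written down directly. Variable names are illustrative.

```python
import math

# Counts from the HRT and polyps table on slide 2.
cases_hrt, controls_hrt = 72, 175
cases_no_hrt, controls_no_hrt = 102, 114

# Odds ratio straight from the 2x2 table (cross-product ratio).
or_table = (cases_hrt * controls_no_hrt) / (controls_hrt * cases_no_hrt)

# Logistic model log(odds of being a case) = b0 + b1*x, with x = 1 for HRT use.
# Assumption: for one binary predictor the fitted log odds equal the observed ones.
b0 = math.log(cases_no_hrt / controls_no_hrt)   # log odds when x = 0
b1 = math.log(cases_hrt / controls_hrt) - b0    # difference in log odds

print(f"OR from table = {or_table:.4f}")
print(f"exp(b1)       = {math.exp(b1):.4f}")
```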

22 Multiple logistic regression model
$\log(\text{odds}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
– log(odds) = logarithm of the odds for the outcome, the dependent variable
– $X_i$ = predictors, independent variables
– $\beta_i$ = log(OR) associated with either
  – exposure (for categorical predictors), or
  – a 1-unit increase in the predictor (for continuous predictors)
– The OR is adjusted for the other variables in the model

23 Interpretation of coefficients - continuous predictors
Example - effect of age on risk of death in 10 years
log(odds) = -8.2784 + 0.1026*age
$\beta_0$ = -8.2784, $\beta_1$ = 0.1026
$\exp(\beta_1)$ = exp(0.1026) = 1.108
– A one year increase in age is associated with an odds ratio of death of 1.108 (assuming this holds for any 2 consecutive ages)
– This is an increase in the odds of approximately 11% (= 1.108 - 1)

24 Interpretation of coefficients - continuous predictors
What about a 5 year increase in age?
Multiply the coefficient by the change you want to look at:
$\exp(5\beta_1)$ = exp(5*0.1026) = 1.67
– A five year increase in age is associated with an odds ratio of death of 1.67
– This is an increase in the odds of 67%
Note: $\exp(5\beta_1)$ does not equal $5\exp(\beta_1)$
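The two age calculations, and the closing note, can be checked numerically. This is a small Python sketch (not SAS), and the helper name `odds_ratio` is illustrative.

```python
import math

b1 = 0.1026   # log odds ratio per 1-year increase in age, from the slide

def odds_ratio(coef, change=1.0):
    """Odds ratio for a given change in a continuous predictor."""
    return math.exp(coef * change)

or_1yr = odds_ratio(b1)       # 1-year increase
or_5yr = odds_ratio(b1, 5)    # 5-year increase

print(f"1-year OR = {or_1yr:.3f}")
print(f"5-year OR = {or_5yr:.2f}")

# exp(5*b1) equals exp(b1) raised to the 5th power, not 5 * exp(b1).
print(f"exp(b1)**5 = {or_1yr ** 5:.2f}, 5*exp(b1) = {5 * or_1yr:.2f}")
```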

25 Parameter Estimation
How do we come up with estimates for $\beta_i$?
– Can't use least squares since the outcome is not continuous
– Use Maximum Likelihood Estimation (MLE)

26 Maximum Likelihood Estimation
Choose parameter estimates that maximize the probability of observing the data you observed.
Example: estimating a proportion $\pi$
– Observe 7/10 have the characteristic
– p = 0.70 is the estimate of $\pi$
– p = 0.70 is the MLE of $\pi$ (Why?)
– Which value of $\pi$ maximizes the probability of getting 7 of 10?
– Answer: 0.70

27 MLE Simple Example
Wish to estimate a proportion $\pi$
Sample n = 2
– Observe 1 of 2 have the characteristic
– $L = \pi(1 - \pi)$
– What value of $\pi$ maximizes L?
– Answer: $\pi$ = 0.5, which is p = 1/2
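Both proportion examples can be checked by brute force: evaluate the binomial likelihood on a grid of candidate values and keep the one that gives the largest likelihood. The sketch below is in Python rather than SAS. The function names and the grid resolution of 0.001 are illustrative choices, and a grid search is used only because it mirrors the question on the slides. Real software solves the maximization analytically or iteratively.

```python
import math

def binomial_likelihood(pi, successes, n):
    """Probability of observing `successes` out of n when the true proportion is pi."""
    return math.comb(n, successes) * pi ** successes * (1 - pi) ** (n - successes)

def grid_mle(successes, n, steps=1000):
    """Value of pi on an evenly spaced grid over [0, 1] that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda pi: binomial_likelihood(pi, successes, n))

print("7 of 10:", grid_mle(7, 10))
print("1 of 2: ", grid_mle(1, 2))
```

The binomial coefficient does not depend on the proportion, so dropping it changes the height of the likelihood but not the location of its maximum.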

28 Fitted regression line
Curve based on: $\pi = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$
– $\beta_0$ affects location
– $\beta_1$ affects curvature

29 Inference for multiple logistic regression
Collect data, choose a model, estimate $\beta_0$ and the $\beta_i$s
Describe the odds ratios, $\exp(\beta_i)$, in statistical terms
– How confident are we of our estimate?
– Is the odds ratio different from one only due to chance?
Not interested in inference for $\beta_0$ (related to the overall probability of the outcome)

30 Confidence Intervals for logistic regression coefficients
General form of 95% CI: Estimate ± 1.96*SE
– $\beta_i$ estimate, provided by SAS
– SE is complicated, provided by SAS
  – Related to the variability of our data and the sample size

31 95% Confidence Intervals for the odds ratio
– Based on transforming the 95% confidence interval for the parameter estimates
– Supplied automatically by SAS
– Look to see if the interval contains 1
– If it does not: "We have a statistically significant association between the predictor and the outcome controlling for all other covariates"
– Equivalent to a hypothesis test: reject Ho: OR = 1 at alpha = 0.05 when 1 is not in the interval

32 Hypothesis test for individual logistic regression coefficient
Null and alternative hypotheses
– Ho: $\beta_i = 0$, Ha: $\beta_i \neq 0$
Test statistic: $\chi^2 = (\beta_i / SE)^2$, supplied by SAS
p-values are supplied by SAS
If p < 0.05, "there is a statistically significant association between the predictor and outcome variable controlling for all other covariates" at alpha = 0.05
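The quantities on the last three slides (confidence interval for the coefficient, transformed interval for the odds ratio, Wald chi-square and its p-value) all follow from an estimate and its standard error. The Python sketch below (used instead of SAS) applies them to the `apache` estimate 0.2034 and SE 0.0605 reported in the SAS output later in the deck. The function name `wald_summary` is illustrative, and the p-value uses the 1-degree-of-freedom chi-square tail written in terms of `math.erfc`.

```python
import math

def wald_summary(estimate, se, z=1.96):
    """95% CI, odds ratio with CI, Wald chi-square and p-value for one coefficient."""
    lower, upper = estimate - z * se, estimate + z * se
    chi_square = (estimate / se) ** 2
    # Upper tail of a chi-square with 1 df: P(Z^2 > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return {
        "beta_ci": (lower, upper),
        "odds_ratio": math.exp(estimate),
        "or_ci": (math.exp(lower), math.exp(upper)),
        "chi_square": chi_square,
        "p_value": p_value,
    }

result = wald_summary(0.2034, 0.0605)
for name, value in result.items():
    print(name, value)
```

Because the inputs are the rounded values printed by SAS, the chi-square computed here can differ from the SAS figure in the later decimals.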

33 PROC LOGISTIC
    PROC LOGISTIC DATA = dataset;
      MODEL outcome = list of x variables;
    RUN;
CLASS statement allows for categorical variables with many groups (>2)

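34
    DATA temp;
      INPUT apache death @@;   /* @@ reads several observations from each data line */
      /* Recode so that death = 1 is the lower ordered value (xdeath = 1), */
      /* which is the level PROC LOGISTIC models by default.               */
      xdeath = 2;
      IF death = 1 THEN xdeath = 1;
      DATALINES;
    0 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0
    14 0 15 0 16 0 17 1 18 1 19 0 20 0 21 1 22 1 23 0 24 1 25 1 26 1
    27 0 28 1 29 1 30 1 31 1 32 1 33 1 34 1 35 1 36 1 37 1 38 1 41 0
    ;
    PROC LOGISTIC DATA=temp;
      MODEL xdeath = apache;
    RUN;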

35 The LOGISTIC Procedure

Model Information
Data Set                     WORK.TEMP
Response Variable            xdeath
Number of Response Levels    2
Number of Observations       39
Model                        binary logit
Optimization Technique       Fisher's scoring

Response Profile
Ordered Value    xdeath    Total Frequency
1                1         18
2                2         21

Probability modeled is xdeath=1.

36 The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     -4.3861     1.3687            10.2686            0.0014
apache       1      0.2034     0.0605            11.3093            0.0008

Odds Ratio Estimates
Effect    Point Estimate    95% Wald Confidence Limits
apache    1.226             1.089    1.380

Point estimate = EXP(0.2034); confidence limits = EXP(0.2034 - 1.96*.0605) and EXP(0.2034 + 1.96*.0605)
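To see where these estimates come from, the sketch below refits the same 39 observations in plain Python using Newton-Raphson, which for a logistic model takes the same steps as the Fisher scoring named in the SAS output. It is an illustration under stated assumptions, not a reproduction of SAS internals. The function names, the zero starting values, and the fixed 25 iterations are all choices made here. The result should land close to the SAS estimates (about -4.39 and 0.203), with standard errors taken from the inverse of the information matrix.

```python
import math

# apache/death pairs exactly as listed in the DATALINES block on slide 34.
raw = """
0 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0
14 0 15 0 16 0 17 1 18 1 19 0 20 0 21 1 22 1 23 0 24 1 25 1 26 1
27 0 28 1 29 1 30 1 31 1 32 1 33 1 34 1 35 1 36 1 37 1 38 1 41 0
"""
values = [int(v) for v in raw.split()]
apache, death = values[0::2], values[1::2]

def score_and_information(b0, b1, x, y):
    """Gradient of the log likelihood and the 2x2 information matrix entries."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))   # fitted P(death = 1)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of log(odds) = b0 + b1*x; returns estimates and SEs."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det    # step = inverse(information) * score
        b1 += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11) = score_and_information(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)

b0, b1, se0, se1 = fit_logistic(apache, death)
print(f"Intercept: {b0:.4f} (SE {se0:.4f})")
print(f"apache:    {b1:.4f} (SE {se1:.4f})")
print(f"Odds ratio per APACHE point: {math.exp(b1):.3f}")
```

Modeling `death` = 1 directly is equivalent to the `xdeath` recoding in the SAS step, since both make death the modeled level.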

37 TOMHS – bpstudy SAS dataset
Variable CLINICAL (1 = yes, 0 = no) indicates whether the patient had a CVD event
Run logistic regression separately for age and gender to determine if:
– Age is related to CVD
  – What is the odds ratio associated with a 1 year increase in age?
  – What is the odds ratio associated with a 5 year increase in age?
– Gender is related to CVD
  – What is the odds ratio of CVD (women versus men)?
Run logistic regression for age and gender together
Note: Download the dataset from the web page or use the dataset on SATURN

