Logistic Regression Biostatistics 510 March 15, 2007 Vanessa Perez.

1 Logistic Regression Biostatistics 510 March 15, 2007 Vanessa Perez

2 Logistic regression. The most important model for categorical response (y_i) data. The response may be categorical with 2 levels (binary: 0 and 1) or with ≥ 3 levels (nominal or ordinal). Predictor variables (x_i) can take on any form: binary, categorical, and/or continuous.

3 Logistic Regression Curve. [Figure: logistic regression curve; vertical axis: probability (0.0 to 1.0); horizontal axis: x (1 to 21).]

4 Logit Transformation. Logistic regression models transformed probabilities called logits: logit(p_i) = log(p_i / (1 − p_i)), where i indexes all cases (observations), p_i is the probability that the event (a sale, for example) occurs in the i-th case, and log is the natural log (to the base e).
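
As a small illustration of the logit transformation, here is a minimal Python sketch (not part of the original SAS material). The function names `logit` and `inv_logit` are chosen here for illustration only.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of the logit: maps a log odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# A probability of 0.5 corresponds to odds of 1 and therefore a logit of 0.
half = logit(0.5)

# Round trip: transforming and back-transforming recovers the probability.
round_trip = inv_logit(logit(0.8))
```

Probabilities above 0.5 give positive logits and probabilities below 0.5 give negative logits, so the logit maps the interval (0, 1) onto the whole real line.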

5 Assumption. [Figure; labels: p_i and (p_i).]

6 Logistic regression model with a single continuous predictor. logit(p_i) = log(odds) = β_0 + β_1 X_1, where logit(p_i) is the logit transformation of the probability of the event, β_0 is the intercept of the regression line, and β_1 is the slope of the regression line.

7 LOGISTIC and GENMOD procedures for a single continuous predictor
PROC LOGISTIC DATA=dataset ;
  MODEL response=predictor / ;
  OUTPUT OUT=SAS-data-set keyword=name ;
RUN;
PROC GENMOD DATA=dataset ;
  MAKE 'OBSTATS' OUT=SAS-data-set ;
  MODEL response=predictors ;
RUN;

8 Descending option in proc logistic and proc genmod. The descending option in SAS sorts the levels of your response variable from highest to lowest (by default, SAS models the probability of the lower category). In the binary response setting, we code the event of interest as ‘1’ and use the descending option to model the probability P(Y = 1 | X = x). In our SAS example, we’ll see what happens when this option is not used.
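
Modelling the lower category instead of the event means modelling P(Y = 0 | x) = 1 − P(Y = 1 | x), and because logit(1 − p) = −logit(p), every coefficient simply changes sign. The Python sketch below illustrates this numerically; it is not SAS output, and the coefficient values are hypothetical.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for the model of P(Y = 1 | x).
beta0, beta1 = -3.0, 0.5

def p_event(x):
    """P(Y = 1 | x) under the hypothetical model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

x = 4.0
logit_event = logit(p_event(x))           # log odds of Y = 1: beta0 + beta1 * x
logit_nonevent = logit(1.0 - p_event(x))  # log odds of Y = 0: -(beta0 + beta1 * x)
```

Here the linear predictor at x = 4 is −1, so the log odds of the event are −1 and the log odds of the non-event are +1. A fit without the descending option would therefore report the same estimates with opposite signs.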

9 Interpretation of a single continuous parameter. The sign (±) of β determines whether the log odds of y increase or decrease for every 1-unit increase in x. If β > 0, the log odds of y increase for every 1-unit increase in x. If β < 0, the log odds of y decrease for every 1-unit increase in x. If β = 0, there is no linear relationship between the log odds and x.

10 Parameter interpretation (ctd). Exponentiating both sides of the logit link function, we get the following: p_i / (1 − p_i) = odds = exp(β_0 + β_1 X_1) = e^(β_0) e^(β_1 X_1). The odds change multiplicatively by e^β for every 1-unit increase in x. Whether this factor is greater than 1 or less than 1 depends on whether β > 0 or β < 0. The odds at X = x + 1 are e^β times the odds at X = x. Therefore, e^β is an odds ratio!
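
The claim that the odds at x + 1 are e^β times the odds at x, whatever the starting value of x, can be checked numerically. This is a minimal Python sketch with hypothetical coefficient values, not an analysis of real data.

```python
import math

# Hypothetical intercept and slope on the log odds scale.
beta0, beta1 = -3.0, 0.5

def odds(x):
    """Odds of the event at predictor value x: exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)

# Ratio of odds for a 1-unit increase, evaluated at several starting values.
ratios = [odds(x + 1.0) / odds(x) for x in (0.0, 2.0, 10.0)]

# The odds ratio implied by the slope alone.
odds_ratio = math.exp(beta1)
```

Every entry of `ratios` equals `odds_ratio` (about 1.65 for a slope of 0.5), which is why the intercept drops out of the interpretation of e^β.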

11 Logistic regression model with a single categorical (≥ 2 levels) predictor. logit(p_i) = log(odds) = β_0 + β_k X_k, where logit(p_i) is the logit transformation of the probability of the event, β_0 is the intercept of the regression line, and β_k is the difference between the logits for category k vs. the reference category.

12 LOGISTIC and GENMOD procedures for a single categorical predictor
PROC LOGISTIC DATA=dataset ;
  CLASS variables ;
  MODEL response=predictors ;
  OUTPUT OUT=SAS-data-set keyword=name ;
RUN;
PROC GENMOD DATA=dataset ;
  CLASS variables ;
  MAKE 'OBSTATS' OUT=SAS-data-set ;
  MODEL response=predictors ;
RUN;

13 Class statement in proc logistic. SAS will create dummy variables for a categorical variable if you tell it to. We need to specify dummy coding by using the param = ref option in the class statement; we can also specify the comparison group by using the ref = option after the variable name. Using class automatically generates a test of significance for all parameters associated with the class variable (table of Type 3 tests); if you use dummy variables instead (more on this soon), you will not automatically get an “overall” test for that variable. We will see this more clearly in the SAS examples.

14 Reference category. Each factor has as many parameters as categories, but one is redundant, so we need to specify a reference category. This is a similar concept to what you just learned for simple linear regression.

15 Interpretation of a single categorical parameter. If your reference group is level 0, then the coefficient β_k represents the difference in the log odds between level k of your variable and level 0. Therefore, e^(β_k) is an odds ratio for category k vs. the reference category of x.
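
With a single categorical predictor, the fitted log odds in each category equal the observed log odds, so the interpretation above can be worked through by hand from a table of counts. The Python sketch below does this for a two-level predictor; the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical counts: (events, non-events) for each level of x.
# Level 0 is the reference category.
counts = {0: (20, 80), 1: (40, 60)}

def log_odds(events, non_events):
    """Observed log odds of the event within one category."""
    return math.log(events / non_events)

beta0 = log_odds(*counts[0])          # log odds in the reference level
beta1 = log_odds(*counts[1]) - beta0  # difference in log odds, level 1 vs. level 0
odds_ratio = math.exp(beta1)          # equals (40/60) / (20/80)
```

The odds are 0.25 in level 0 and about 0.67 in level 1, so the odds ratio for level 1 vs. the reference is about 2.67, which is exactly e raised to the coefficient.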

16 Creating your own dummy variables and not using the class statement. An equivalent model uses dummy variables (that you create), which accounts for redundancy by not including a dummy variable for your reference category. The choice of reference category is arbitrary. Remember, this method will not produce an “overall” test of significance for that variable.
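
The idea of leaving out the dummy variable for the reference category can be sketched in a few lines of Python. This is only an illustration of the coding scheme, not how SAS builds its design matrix; the helper `make_dummies` and the variable `dose` are hypothetical.

```python
def make_dummies(values, reference):
    """Return {level: list of 0/1 indicators} for every level except the reference."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError("reference level not found in data")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

# Hypothetical three-level predictor; "low" is chosen as the reference category.
dose = ["low", "high", "medium", "low", "high"]
dummies = make_dummies(dose, reference="low")
```

A predictor with three levels yields two dummy variables. Observations in the reference category have 0 on both, so their log odds are given by the intercept alone.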

17 Hypothesis testing. Significance tests focus on a test of H_0: β = 0 vs. H_a: β ≠ 0. The Wald, likelihood ratio (LR), and score tests are used (we’ll focus on the Wald method). The Wald CI is easily obtained; the score and LR CIs are obtained numerically. For Wald, the 95% CI (on the log odds scale) is β̂ ± 1.96 × SE(β̂), where β̂ is the estimate of β and SE(β̂) is its standard error.

18 95% CI for parameter. Similarly, the Wald 95% CI for the odds ratio is obtained by exponentiation. The following yields the lower and upper 95% confidence limits: exp(β̂ − 1.96 × SE(β̂)) and exp(β̂ + 1.96 × SE(β̂)). Here 1.96 corresponds to z_(0.05/2), where z ~ N(0, 1).
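
A worked example of the Wald interval on both scales may help. The Python sketch below uses a hypothetical estimate and standard error; in practice these would be read from the SAS parameter estimates table.

```python
import math

beta_hat = 0.5  # hypothetical estimate of beta (log odds scale)
se = 0.2        # hypothetical standard error of the estimate
z = 1.96        # standard normal quantile for a two-sided 95% interval

# Wald 95% CI on the log odds scale: estimate +/- 1.96 * SE.
ci_log_odds = (beta_hat - z * se, beta_hat + z * se)

# Exponentiate the limits to get the 95% CI for the odds ratio.
ci_odds_ratio = (math.exp(ci_log_odds[0]), math.exp(ci_log_odds[1]))
odds_ratio = math.exp(beta_hat)
```

With these numbers the log odds interval is (0.108, 0.892) and the odds ratio interval is roughly (1.11, 2.44). The interval for the odds ratio excludes 1 exactly when the interval for β excludes 0, and it is not symmetric around the estimated odds ratio.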

19 Hypothesis testing (ctd). The Wald statistic for the test of H_0: β = β_0 is W = ((β̂ − β_0) / SE(β̂))². Under H_0, the test statistic is asymptotically chi-square distributed with 1 df (at α = 0.05, the critical value is 3.84).
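
The Wald test can be sketched in Python using only the standard library. The function name `wald_test` and the input values are hypothetical. The p-value uses the fact that a chi-square variable with 1 df is the square of a standard normal variable, so P(chi-square > w) = erfc(sqrt(w / 2)).

```python
import math

def wald_test(beta_hat, se, beta_null=0.0):
    """Wald chi-square statistic (1 df) and its asymptotic p-value."""
    w = ((beta_hat - beta_null) / se) ** 2
    # Upper tail of chi-square with 1 df, via the standard normal distribution.
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Hypothetical estimate and standard error, testing H_0: beta = 0.
w, p_value = wald_test(beta_hat=0.5, se=0.2)
```

Here W is 6.25, which exceeds the critical value of 3.84, so H_0: β = 0 would be rejected at α = 0.05 (the p-value is roughly 0.012).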


