1. Simple Logistic Regression
An introduction to PROC FREQ and PROC LOGISTIC
2. Introduction to Logistic Regression
Logistic regression is used when the outcome variable of interest is categorical rather than continuous. Examples include death vs. no death, recovery vs. no recovery, and obese vs. not obese. All of the examples you will see in this class have binary outcomes, meaning there are only two possible outcomes.
Simple logistic regression has only one predictor variable. When that predictor is also binary, the model reproduces a quantity you may already know under a different name: the odds ratio.
3. Simple Logistic Regression: An Example
Imagine you are interested in investigating whether there is a relationship between race and party identification. Race (Black or White) is the independent variable, and party identification (Democrat or Republican) is the dependent variable. Consider the following table.
Example from Agresti, A. Categorical Data Analysis, 2nd ed.
4. Race x Party Identification

         Democrat   Republican
Black       103         11
White       341        405
5. The odds ratio (OR) of being a Democrat for Blacks vs. Whites is:
OR = (103/11)/(341/405) = (103 × 405)/(341 × 11) = 11.12
Blacks have 11.12 times the odds of being a Democrat compared with Whites.
The odds ratio of being a Republican for Blacks vs. Whites is:
OR = (11/103)/(405/341) = (11 × 341)/(405 × 103) = 0.09
Blacks have 91% (1 - 0.09) lower odds of being a Republican than Whites.
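The hand calculation can be checked with a few lines of Python. This is only an illustrative sketch: the function name `odds_ratio` and the variable names are assumptions made for this example, and the counts come from the Race x Party Identification table.

```python
# Cell counts from the Race x Party Identification table.
black_dem, black_rep = 103, 11
white_dem, white_rep = 341, 405


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table [[a, b], [c, d]]: (a/b) / (c/d)."""
    return (a * d) / (b * c)


# Odds of being a Democrat, Blacks vs. Whites.
or_democrat = odds_ratio(black_dem, black_rep, white_dem, white_rep)
# Odds of being a Republican, Blacks vs. Whites (columns swapped).
or_republican = odds_ratio(black_rep, black_dem, white_rep, white_dem)

print(f"OR, Democrat, Black vs. White:   {or_democrat:.2f}")
print(f"OR, Republican, Black vs. White: {or_republican:.2f}")
```

Swapping the two columns inverts the odds ratio, which is why the two results multiply to 1.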
6. Odds Ratios in SAS
Copy the following code into SAS:
7. Odds Ratios with PROC FREQ
There are two ways to get odds ratios in SAS when there is one predictor and one outcome variable. The first is with PROC FREQ. Type the following code into SAS:
8. Notes about the SAS code
The WEIGHT statement tells SAS that each data line stands for as many observations as the value of the variable you specify. When you have a table you want to enter into SAS, it is often easier to use a "count" variable than to list each subject individually. Because the data set has 860 observations, we would have to type out 860 separate data lines if we did not use the "count" variable and the "weight count" statement.
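To see what a count variable saves, the short Python sketch below expands the four table cells into one record per subject. It illustrates the idea only and is not SAS code; the single-letter codes B/W and D/R and the names `cells` and `subjects` are assumptions made for this example.

```python
# Aggregated form: one record per table cell, with a count.
cells = [
    ("B", "D", 103),  # Black Democrats
    ("B", "R", 11),   # Black Republicans
    ("W", "D", 341),  # White Democrats
    ("W", "R", 405),  # White Republicans
]

# Expanded form: one (race, party) record per subject.
subjects = [(race, party) for race, party, count in cells for _ in range(count)]

print(len(cells), "records with counts vs.", len(subjects), "individual records")
```

Both forms describe the same 860 people; the aggregated form simply needs four lines instead of 860.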
9. TABLES tells SAS to construct a table with the two specified variables (in this case, race and party).
The chisq option requests all Chi-Square statistics.
The relrisk option gives you estimates of the odds ratio and relative risks for the two columns.
11. Reading the Table
Each cell has four numbers: count, percent, row %, and column %.
There are 103 Black Democrats, which is 11.98% of the total sample.
90.35% of Blacks are Democrats.
23.20% of Democrats are Black. Compare this to 2.64% of Republicans who are Black.
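The four numbers in each cell follow directly from the counts. The sketch below recomputes them in Python from the Race x Party Identification table; the helper name `cell_summary` and the dictionary layout are assumptions made for this example.

```python
# Counts keyed by (race, party), taken from the 2x2 table.
table = {
    ("Black", "Democrat"): 103,
    ("Black", "Republican"): 11,
    ("White", "Democrat"): 341,
    ("White", "Republican"): 405,
}

total = sum(table.values())


def cell_summary(race, party):
    """Return (count, percent of total, row percent, column percent)."""
    count = table[(race, party)]
    row_total = sum(n for (r, _), n in table.items() if r == race)
    col_total = sum(n for (_, p), n in table.items() if p == party)
    return (
        count,
        100 * count / total,      # share of the whole sample
        100 * count / row_total,  # share within the race (row)
        100 * count / col_total,  # share within the party (column)
    )


for key in table:
    count, pct, row_pct, col_pct = cell_summary(*key)
    print(f"{key}: n={count}, {pct:.2f}% of total, "
          f"row {row_pct:.2f}%, column {col_pct:.2f}%")
```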
12. Interpreting the Chi-Square Statistic
The Chi-Square (Χ²) test statistic tests the null hypothesis that two variables are independent versus the alternative that they are not independent (that is, related).
Ho: race and party identification are independent
Ha: race and party identification are associated
Χ² = 78.91 (1 degree of freedom), p-value < 0.0001
Reject Ho. Conclude that race and party identification are associated.
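For readers who want to see where the Pearson Chi-Square statistic comes from, here is a minimal Python sketch that computes it from the observed and expected counts. It uses only the standard library; the variable names are assumptions, and the p-value formula relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal.

```python
import math

# Observed counts: rows are Black, White; columns are Democrat, Republican.
observed = [[103, 11], [341, 405]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / n under independence.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# Upper-tail p-value for 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3g}")
```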
14. Interpreting the Odds Ratio
You can find the OR in the SAS output under "Case-Control (Odds Ratio)."
The odds ratio is 11.12 with a 95% confidence interval of [5.87, 21.05]. Because this C.I. does not contain 1, we know that the OR is statistically significant.
Blacks have 11.12 times the odds of being Democratic compared with Whites.
15. A note about the PROC FREQ table
Notice the way the table is set up in SAS:

         Dem   Rep
Black    103    11
White    341   405

When calculating the OR in PROC FREQ, SAS will alphabetize the table, and this affects the OR it will calculate. SAS is calculating the odds of being a Democrat for Blacks versus Whites (or the odds of being Black for Democrats versus Republicans). If you wanted the odds of being Democratic for Whites versus Blacks, you would have to either calculate this by hand or use PROC LOGISTIC.
16. Odds Ratio with PROC LOGISTIC
To simplify our data set, we will change our variables to have values of 1 and 0, rather than B/W and D/R. If someone is Black, s/he will have a value of "1" for the variable "race2." Whites will have a value of "0." If someone is a Democrat, s/he will have a value of "1" for "party2." Republicans will have a value of "0." Type the following code into SAS, which creates a new data set called "partyid2":
17. PROC LOGISTIC
Once you have created the new data set, run a regression analysis on the data using PROC LOGISTIC (notice the format is similar to that of linear regression, with the model statement y = x):
"Descending" tells SAS to model the probability that "party2" = 1 (Democratic). If you did not include the descending option, SAS would model the probability that "party2" = 0 (Republican). All subsequent interpretations will be in terms of the odds of being Democratic, not Republican.
19. Interpreting the Output
From PROC LOGISTIC, we now have an equation for our log(odds):
Log(odds) = β0 + β1x
Log(odds) = -0.1720 + 2.4088x
where x = 1 if the person is Black and x = 0 if the person is White.
20. Calculating the Odds Ratio
Suppose we wanted to know the odds of being a Democrat for Blacks vs. Whites.
The log(odds) of being Democratic for Blacks is: β0 + β1(1) = β0 + β1
The log(odds) of being Democratic for Whites is: β0 + β1(0) = β0
To calculate the OR, take the log(odds) for Blacks minus the log(odds) for Whites:
(β0 + β1) - β0 = β1
Then exponentiate this value:
exp(β1) = exp(2.4088) = 11.12
This is the same OR calculated earlier using PROC FREQ. In addition, it is given to you in the PROC LOGISTIC output under "Odds Ratio Estimates" with the 95% C.I.
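With a single binary predictor, the fitted coefficients can be recovered directly from the 2x2 table: β0 is the log(odds) of being Democratic among Whites (x = 0), and β1 is the log of the odds ratio. The Python sketch below uses that closed form rather than an iterative fit; it is an illustration under that assumption, not a replacement for PROC LOGISTIC, and the variable names are made up for this example.

```python
import math

# Counts from the Race x Party Identification table.
black_dem, black_rep = 103, 11
white_dem, white_rep = 341, 405

# Intercept: log(odds) of Democrat when x = 0 (White).
b0 = math.log(white_dem / white_rep)
# Slope: log(odds) for Blacks minus log(odds) for Whites.
b1 = math.log(black_dem / black_rep) - b0

or_black_vs_white = math.exp(b1)   # OR of being Democratic, Blacks vs. Whites
or_white_vs_black = math.exp(-b1)  # the same comparison, reversed

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"exp(b1)  = {or_black_vs_white:.2f}")
print(f"exp(-b1) = {or_white_vs_black:.2f}")
```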
21. Calculating the OR, cont.
Suppose we wanted to know the odds of being a Democrat for Whites vs. Blacks.
To calculate the OR, take the log(odds) for Whites minus the log(odds) for Blacks:
β0 - (β0 + β1) = -β1
Then exponentiate this value:
exp(-β1) = exp(-2.4088) = 0.09
Whites have 91% (1 - 0.09) lower odds of being Democratic than Blacks.
22. Significance Testing
Testing the significance of a parameter estimate can be done by constructing a confidence interval around that parameter estimate.
If the C.I. for a parameter estimate (that is, a log(OR)) contains 0, the variable is not significantly associated with the outcome.
If the C.I. for an OR contains 1, the variable is not significantly associated with the outcome.
23. The Wald Chi-Square statistic tests whether the parameter estimate equals zero, that is, Ho: β1 = 0 vs. Ha: β1 ≠ 0.
From the output, we see that the p-value of this test is < 0.0001, so we reject Ho and conclude that race is significantly related to party identification.
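The Wald Chi-Square is the squared ratio of the estimate to its standard error. As a rough check, the Python sketch below computes it from the table, assuming the standard error of a log odds ratio from a 2x2 table is the square root of the sum of the reciprocal cell counts; the variable names are assumptions made for this example.

```python
import math

# Cell counts: Black Dem, Black Rep, White Dem, White Rep.
a, b, c, d = 103, 11, 341, 405

b1 = math.log((a * d) / (b * c))           # log odds ratio
se_b1 = math.sqrt(1/a + 1/b + 1/c + 1/d)   # its standard error

wald = (b1 / se_b1) ** 2                   # Wald chi-square, 1 df
p_value = math.erfc(math.sqrt(wald / 2))   # upper-tail p-value, 1 df

print(f"b1 = {b1:.4f}, se = {se_b1:.4f}")
print(f"Wald chi-square = {wald:.2f}, p-value = {p_value:.3g}")
```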
24. Confidence Interval Construction
Confidence interval construction is similar to what you have seen for linear regression, except that it is now on the natural log scale:
95% C.I. for β1 = β1 +/- 1.96*se(β1)
= 2.4088 +/- 1.96*(0.3256)
= [1.77, 3.05]. This C.I. does not contain 0.
exp([1.77, 3.05]) = [5.875, 21.05]. This C.I. does not contain 1.
Notice that [5.875, 21.05] is also the 95% C.I. for the OR given in the SAS output.
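The same interval can be reproduced in Python from the estimate 2.4088 and the standard error 0.3256 quoted in the slide. This is a minimal sketch; the variable names are assumptions made for this example.

```python
import math

b1 = 2.4088      # estimate of beta1 from the slide
se_b1 = 0.3256   # standard error of beta1 from the slide
z = 1.96         # normal critical value for a 95% interval

# Interval on the log(odds) scale.
lower = b1 - z * se_b1
upper = b1 + z * se_b1

# Exponentiate the endpoints to get the interval for the odds ratio.
or_lower = math.exp(lower)
or_upper = math.exp(upper)

print(f"95% C.I. for beta1: [{lower:.2f}, {upper:.2f}]")
print(f"95% C.I. for OR:    [{or_lower:.3f}, {or_upper:.2f}]")
```

The interval for β1 excludes 0 exactly when the interval for the OR excludes 1, because exp(0) = 1.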
25. Calculating the Probability
If you were asked to calculate the probability that someone is a Democrat, given that the person is Black, you would use the following formula:
Π (probability) = exp(log(odds))/[1 + exp(log(odds))]
For a Black person, log(odds) = β0 + β1 = -0.1720 + 2.4088 = 2.2368, so
Π = exp(2.2368)/[1 + exp(2.2368)] = 0.9035
A Black person has a 90.35% chance of being a Democrat.
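The conversion from log(odds) to probability is easy to script. The Python sketch below applies the formula to both groups, using β1 = 2.4088 from the slides and an intercept of -0.1720, which is the log of the odds of being Democratic among Whites (341/405); the function name `probability` is an assumption made for this example.

```python
import math

b0 = -0.1720  # log(341/405): log(odds) of Democrat for Whites
b1 = 2.4088   # log odds ratio, Blacks vs. Whites


def probability(log_odds):
    """Convert log(odds) to a probability: exp(L) / (1 + exp(L))."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)


p_black = probability(b0 + b1 * 1)  # x = 1 for Black
p_white = probability(b0 + b1 * 0)  # x = 0 for White

print(f"P(Democrat | Black) = {p_black:.4f}")
print(f"P(Democrat | White) = {p_white:.4f}")
```

These match the row percentages in the table, 103/114 for Blacks and 341/746 for Whites, up to rounding of the coefficients.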
26. Summary
This has been an introduction to calculating odds ratios in PROC FREQ and PROC LOGISTIC. The next section will introduce you to multiple predictors in logistic regression, including interactions.