Logistic Regression In logistic regression the outcome variable is binary, and the purpose of the analysis is to assess the effects of multiple explanatory variables, which can be numeric and/or categorical, on the outcome variable.
Requirements for Logistic Regression The following need to be specified: 1) An outcome variable with two possible categorical outcomes (1 = success; 0 = failure). 2) A way to estimate the probability P of the outcome variable. 3) A way of linking the outcome variable to the explanatory variables. 4) A way of estimating the coefficients of the regression equation, as well as their confidence intervals. 5) A way to test the goodness of fit of the regression model.
Measuring the Probability of Outcome The probability of the outcome is expressed through the odds of occurrence of an event. If P is the probability of an event, then (1 − P) is the probability of it not occurring. Odds of success = P / (1 − P)
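The conversion between probability and odds can be sketched in a few lines of Python. The function names below are illustrative only and do not come from the slides.

```python
def odds_from_probability(p: float) -> float:
    """Convert a probability P (0 <= P < 1) into odds P / (1 - P)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Convert odds back into a probability: P = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Print the odds for a few example probabilities.
    for p in (0.2, 0.5, 0.8):
        print(p, odds_from_probability(p))
```

A probability of 0.5 corresponds to odds of 1 (an even chance), and probabilities above 0.5 give odds greater than 1.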
The Logistic Regression
The joint effect of all explanatory variables put together on the odds is
$\text{Odds} = \frac{P}{1-P} = e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}$
Taking the logarithms of both sides
$\log\left\{\frac{P}{1-P}\right\} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
$\operatorname{logit} P = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
The coefficients β1, β2, …, βp are estimated by maximum likelihood: they are the values under which the observed outcomes are most probable, rather than the values that minimise a sum of squared distances as in linear regression.
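As a rough illustration of how the logit equation links the explanatory variables to the probability P, the sketch below evaluates the linear predictor and inverts the logit. The coefficient values in the example are made up for illustration and are not estimates from any study; the function names are likewise assumptions.

```python
import math


def linear_predictor(alpha: float, betas: list[float], xs: list[float]) -> float:
    """Return alpha + beta_1*x_1 + ... + beta_p*x_p (the logit of P)."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return alpha + sum(b * x for b, x in zip(betas, xs))


def logit(p: float) -> float:
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def probability_from_logit(value: float) -> float:
    """Invert the logit: P = 1 / (1 + exp(-logit)), computed stably."""
    if value >= 0.0:
        return 1.0 / (1.0 + math.exp(-value))
    e = math.exp(value)  # avoids overflow for very negative logits
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical coefficients and covariate values, for illustration only.
    eta = linear_predictor(alpha=-1.0, betas=[0.5, 1.6], xs=[2.0, 1.0])
    print("logit P =", eta)
    print("P =", probability_from_logit(eta))
```

Whatever value the linear predictor takes, the resulting P always lies between 0 and 1, which is why the logit is a convenient link for a binary outcome.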
The Logistic Regression
$\operatorname{logit} P = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
α represents the overall (baseline) disease risk, expressed as log odds.
β1 represents the amount by which the disease risk, expressed as log odds, is altered by a unit change in X1.
β2 is the amount by which the disease risk, expressed as log odds, is altered by a unit change in X2, and so on.
What changes is the log odds. The odds themselves are multiplied by $e^{\beta}$. If β = 1.6, the odds are multiplied by $e^{1.6} = 4.95$.
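The arithmetic behind the β = 1.6 example can be checked with a short sketch. The helper name odds_ratio and the optional delta_x argument are illustrative assumptions, not part of the slides.

```python
import math


def odds_ratio(beta: float, delta_x: float = 1.0) -> float:
    """Factor by which the odds are multiplied when X changes by delta_x."""
    return math.exp(beta * delta_x)


if __name__ == "__main__":
    # The coefficient from the slide, for a one-unit change in X.
    print(round(odds_ratio(1.6), 2))
```

A coefficient of 0 gives an odds ratio of 1 (no effect), and a change of two units in X multiplies the odds by the square of the one-unit odds ratio.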
Analysis in Logistic Regression - 1 The study to be analysed is about the use of the radioisotope thallium while the subject is made to exercise. 100 subjects underwent both thallium exercise and cardiac catheterisation. Some were on propranolol. Whether the heart rate rose above 85% of maximum, the E.C.G. and the occurrence of pain during exercise were recorded.
Interpreting the Computer Printout
Logistic Regression Table
                                        Odds     95% CI
Predictor   Coef   SE Coef   Z   P     Ratio   Lower   Upper
Constant
Stn
Propran
HrtRate
IscExr
Sex
PnExr
Log-Likelihood =
Test that all slopes are zero: G = , DF = 6, P-Value =
Goodness-of-Fit Tests
Method            Chi-Square   DF   P
Pearson
Deviance
Hosmer-Lemeshow
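To read a row of the logistic regression table, it may help to see how the Z, P, Odds Ratio and 95% CI columns are commonly derived from Coef and SE Coef. The sketch below assumes the usual Wald approach (Z = Coef / SE Coef, a two-sided normal P-value, and a confidence interval of exp(Coef ± 1.96 × SE Coef)). The input numbers are hypothetical and are not taken from the printout.

```python
import math


def wald_summary(coef: float, se: float, z_crit: float = 1.96) -> dict[str, float]:
    """Derive Z, two-sided P, odds ratio and its CI from a coefficient and its SE."""
    if se <= 0.0:
        raise ValueError("standard error must be positive")
    z = coef / se
    # Two-sided P-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "z": z,
        "p_value": p_value,
        "odds_ratio": math.exp(coef),
        "lower": math.exp(coef - z_crit * se),
        "upper": math.exp(coef + z_crit * se),
    }


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, for illustration only.
    for name, value in wald_summary(coef=1.6, se=0.5).items():
        print(name, round(value, 4))
```

A predictor whose 95% CI for the odds ratio excludes 1 has a P-value below 0.05 on this approach.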
Interpreting the Computer Printout - 2
Table of Observed and Expected Frequencies: (See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)
Group
Value   Total
1       Obs
        Exp
        Obs
        Exp
Pairs        Number   Percent   Summary Measures
Concordant            %         Somers' D               0.51
Discordant            %         Goodman-Kruskal Gamma   0.52
Ties                  %         Kendall's Tau-a         0.23
Total                 %
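The concordant and discordant pair counts and the three summary measures can be illustrated with a small sketch. It assumes the commonly used definitions: every subject with the event is paired with every subject without it; a pair is concordant when the subject with the event has the higher predicted probability; Somers' D = (C − D) / total pairs; Gamma = (C − D) / (C + D); Tau-a = (C − D) / (n(n − 1)/2). The data are toy values, not the study data.

```python
def association_measures(outcomes: list[int], probabilities: list[float]) -> dict[str, float]:
    """Count concordant/discordant/tied pairs and derive the summary measures."""
    events = [p for y, p in zip(outcomes, probabilities) if y == 1]
    non_events = [p for y, p in zip(outcomes, probabilities) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = discordant = ties = 0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1
            elif p_event < p_non_event:
                discordant += 1
            else:
                ties += 1
    if concordant + discordant == 0:
        raise ValueError("all pairs are tied; Gamma is undefined")

    n = len(outcomes)
    diff = concordant - discordant
    return {
        "concordant": concordant,
        "discordant": discordant,
        "ties": ties,
        "somers_d": diff / (len(events) * len(non_events)),
        "gamma": diff / (concordant + discordant),
        "kendall_tau_a": diff / (n * (n - 1) / 2),
    }


if __name__ == "__main__":
    # Toy outcomes and predicted probabilities, for illustration only.
    print(association_measures([1, 1, 1, 0, 0], [0.9, 0.7, 0.4, 0.5, 0.2]))
```

Values of these measures closer to 1 indicate that the model separates subjects with and without the event more cleanly.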
Regression Diagnostics In logistic regression the residual for a subject is the observed outcome minus the estimated probability, so for a subject in whom the event occurred Residual = 1 − Estimated probability. Residuals for each subject are calculated, standardised and plotted against the estimated probability. Eight diagnostic plots are available, four dealing with residuals and four with leverage. These plots are demonstrated in the slides that follow.
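A minimal sketch of the residual calculation follows. It uses the Pearson residual as one common way of standardising, which is an assumption here: statistical packages may additionally adjust for leverage, and the slides do not say which form was used.

```python
import math


def raw_residual(y: int, p_hat: float) -> float:
    """Observed outcome (1 or 0) minus the estimated probability."""
    return y - p_hat


def pearson_residual(y: int, p_hat: float) -> float:
    """Raw residual divided by the binomial standard deviation sqrt(p(1 - p))."""
    if not 0.0 < p_hat < 1.0:
        raise ValueError("p_hat must be strictly between 0 and 1")
    return (y - p_hat) / math.sqrt(p_hat * (1.0 - p_hat))


if __name__ == "__main__":
    # Toy subjects: (observed outcome, estimated probability).
    for y, p_hat in [(1, 0.8), (0, 0.8), (1, 0.1)]:
        print(y, p_hat, raw_residual(y, p_hat), pearson_residual(y, p_hat))
```

A subject whose outcome was very unlikely under the model, such as an event with an estimated probability of 0.1, produces a large residual and stands out in the plots.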
Diagnostic plots for residuals
Diagnostic plots for leverage