# Assessing Binary Outcomes: Logistic Regression

Peter T. Donnan, Professor of Epidemiology and Biostatistics. Statistics for Health Research.


Objectives of Session

- Understand what is meant by a binary outcome
- Understand how analyses of binary outcomes are implemented in a logistic regression model
- Understand when a logistic model is appropriate
- Be able to implement the model in SPSS
- Interpret logistic model output

Binary Outcome

Extremely common in health research:

- Dead / Alive
- Hospitalisation (Yes / No)
- Diagnosis of diabetes (Yes / No)
- Met target, e.g. total cholesterol < 5.0 mmol/l (Yes / No)

N.B. Any coding can be used, such as 1 / 2, but 0 / 1 is mathematically easier.

How is the relationship formulated?

For a linear model the simplest equation is:

$$y = a + bx + e$$

where y is the outcome; a is the intercept; b is the slope related to x, the explanatory variable; and e is the error term or random 'noise'.

Can we fit y as a probability in the range 0 to 1?

Not quite!

- y as a continuous variable can take any value from -∞ to +∞
- The outcome is the probability of an event, Π (or p), on the scale 0 to 1
- Certain transformations of p can give the required scale
- Probit is a normal transformation of p, but the results are not easy to interpret

The logit transformation:

$$y = \log\left(\frac{p}{1-p}\right)$$

We can now fit p as a probability in the range 0 to 1, and y in the range -∞ to +∞. The logit transformation works!
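The logit transformation can be illustrated with a minimal Python sketch. The function names `logit` and `inverse_logit` are chosen here for illustration and are not part of the original slides.

```python
import math

def logit(p: float) -> float:
    """Map a probability in (0, 1) onto the whole real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(y: float) -> float:
    """Map any real number back to a probability in (0, 1)."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    ey = math.exp(y)  # avoids overflow for large negative y
    return ey / (1.0 + ey)

# Probabilities near 0 give large negative values, near 1 large positive ones
for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(f"p = {p:.2f}  logit(p) = {logit(p):+.3f}")
```

Applying `inverse_logit` to the output of `logit` returns the original probability, which is why a model fitted on the unbounded scale can always be converted back to a probability.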

Logistic Regression Model

$$\log\left(\frac{p}{1-p}\right) = a + bx$$

This has very useful properties:

- The term p/(1-p) is called the 'odds' of an event. Note: this is not the same as the probability of an event, p
- If x is binary, coded 0/1, then exp(b) is the ODDS RATIO for the outcome in those coded 1 relative to those coded 0, e.g. the odds of death in men (1) vs. women (0)

Logistic Regression Model

Consider the LDL data. It has two binary outcomes:

1. LDL target achieved
2. Chol target achieved

For example, consider gender as a predictor: Male = 1 and Female = 2.

For a binary x we can express results as odds ratios (available in crosstabs).

| Gender | LDL target achieved: No | LDL target achieved: Yes | Odds of yes |
|---|---|---|---|
| Male | 140 | 563 | 563/140 = 4.02 |
| Female | 149 | 531 | 531/149 = 3.56 |

Odds ratio (female cf. male) = 3.56 / 4.02 = 0.886

N.B. Odds are different from probability: for men, p = 563/(140+563) = 0.80 or 80%.
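The arithmetic for the 2 x 2 table can be checked with a few lines of Python. This is only a sketch using the counts quoted on the slide; the variable names are illustrative.

```python
# Counts from the 2 x 2 table: LDL target achieved, by gender
male_no, male_yes = 140, 563
female_no, female_yes = 149, 531

odds_male = male_yes / male_no        # odds of achieving the target, men
odds_female = female_yes / female_no  # odds of achieving the target, women

# Odds ratio for women relative to men
odds_ratio_f_vs_m = odds_female / odds_male

# Probability is a different quantity: successes over all men
prob_male = male_yes / (male_no + male_yes)

print(f"Odds (male)    = {odds_male:.2f}")
print(f"Odds (female)  = {odds_female:.2f}")
print(f"OR (F vs M)    = {odds_ratio_f_vs_m:.3f}")
print(f"P(yes | male)  = {prob_male:.2f}")
```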

Odds ratio from Crosstabs

Obtain odds ratios for 2 x 2 tables from crosstabs by selecting the option 'risk'.

Results from Crosstabs

Odds ratios for achieving LDL target in females vs. males. N.B. the OR given for female vs. male = 0.886.

Fit Logistic Regression Model

- Dependent is the binary outcome: LDL target met (Yes = 1, No = 0)
- Independent: Gender (1 = M, 2 = F)
- Should get the same result as crosstabs
- Select Analyze / Regression / Binary Logistic
- Select the option of 95% CI for exp(b)

Regression / Binary logistic…..

Odds ratio from logistic model results for a binary predictor

EXP(B) = odds ratio, F vs. M. Note that the OR for men vs. women = 1/0.886 = 1.13.
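To see why the logistic model reproduces the crosstabs odds ratio, the following sketch fits logit(p) = a + b x to the grouped LDL counts by Newton-Raphson in plain Python. It is an illustration only, not SPSS's implementation; the function name `fit_logistic` and the fixed iteration count are assumptions made for this example.

```python
import math

# Grouped data: (gender code, number meeting target, number not meeting target)
# Gender is coded 1 = male, 2 = female, as in the slides.
groups = [(1, 563, 140), (2, 531, 149)]

def fit_logistic(groups, iterations=25):
    """Fit logit(p) = a + b*x by Newton-Raphson on grouped binary data."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g) and information matrix (h) at the current a, b
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, yes, no in groups:
            n = yes + no
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = n * p * (1.0 - p)
            g_a += yes - n * p
            g_b += (yes - n * p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        # Newton step: solve the 2 x 2 system h * delta = g
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

a, b = fit_logistic(groups)
or_f_vs_m = math.exp(b)      # odds ratio for a 1 unit increase in x: F vs. M
or_m_vs_f = 1.0 / or_f_vs_m  # reversing the comparison inverts the odds ratio

print(f"b = {b:.4f}, exp(b) = {or_f_vs_m:.3f}, 1/exp(b) = {or_m_vs_f:.2f}")
```

Because gender is coded 1 and 2, a one-unit increase in x moves from male to female, so exp(b) is the female vs. male odds ratio.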

Fit Logistic Regression Model – continuous predictor

- Dependent is the binary outcome: LDL target met
- Independent: continuous predictor, Adherence
- B represents the change in the log odds for a 1 unit increase in adherence, so exp(B) is the ODDS RATIO for a 1 unit increase
- B x 10 represents the change in the log odds for a 10 unit increase in adherence, so exp(10 x B) is the ODDS RATIO for a 10 unit increase

Odds ratio from logistic model results for a continuous predictor

EXP(B) = odds ratio for a 1% increase in adherence. The OR for a 10% increase is exp(10 x 0.010) = 1.105, i.e. a 10.5% increase in the odds of meeting the LDL target for each 10% increase in adherence.
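The rescaling of the odds ratio can be verified directly. A minimal sketch, using the coefficient 0.010 quoted above; the variable names are illustrative.

```python
import math

b = 0.010  # coefficient for adherence (per 1% increase), as quoted on the slide

or_per_1 = math.exp(b)        # odds ratio for a 1% increase in adherence
or_per_10 = math.exp(10 * b)  # odds ratio for a 10% increase in adherence

print(f"OR per 1%  = {or_per_1:.3f}")
print(f"OR per 10% = {or_per_10:.3f}")

# Multiplying the coefficient by 10 is the same as raising the OR to the power 10
assert abs(or_per_10 - or_per_1 ** 10) < 1e-12
```

The coefficient is scaled before exponentiating; multiplying the odds ratio itself by 10 would be incorrect.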

Fit Logistic Regression Model – categorical predictor

- Dependent is the binary outcome: LDL target met
- Independent: APOE genotype (1 – 6)
- Choose a reference category; in this case the worst outcome is genotype 6, so choose 6 to give ORs > 1
- exp(B) represents the OR for each category relative to the reference category
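One common way a categorical predictor enters the model is indicator (dummy) coding, with one 0/1 variable per non-reference category. The sketch below assumes that coding and uses a hypothetical helper name, `indicator_code`; it is not taken from the slides.

```python
def indicator_code(value, categories=(1, 2, 3, 4, 5, 6), reference=6):
    """Return 0/1 indicators for every category except the reference.

    The reference category is represented by all zeros, so each fitted
    coefficient compares one category against the reference.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1 if value == c else 0 for c in categories if c != reference]

print(indicator_code(2))  # genotype 2: only the second indicator is switched on
print(indicator_code(6))  # reference genotype 6: all indicators are zero
```

With six genotypes and genotype 6 as reference there are five indicators, hence five coefficients and five odds ratios, each relative to genotype 6.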

Regression / Binary logistic….. Choose Categorical

Odds ratios from logistic model results for a categorical predictor

EXP(B) = odds ratio for APOE (2) vs. APOE (6): OR = 4.381 (95% CI 1.742, 11.021)
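The reported interval is consistent with a confidence interval built on the log-odds scale and then exponentiated. The sketch below assumes a Wald interval, exp(B ± 1.96 x SE); the standard error is not given on the slide, so it is back-calculated here from the quoted limits purely for illustration.

```python
import math

# Values quoted on the slide
or_hat, lower, upper = 4.381, 1.742, 11.021

b = math.log(or_hat)  # coefficient B on the log-odds scale

# Assumption: Wald interval, so the limits are symmetric around B on the log scale
se = (math.log(upper) - math.log(lower)) / (2 * 1.96)

# Rebuild the interval from B and the back-calculated SE
rebuilt = (math.exp(b - 1.96 * se), math.exp(b + 1.96 * se))

print(f"B = {b:.3f}, implied SE = {se:.3f}")
print(f"Rebuilt 95% CI = ({rebuilt[0]:.3f}, {rebuilt[1]:.3f})")
```

The interval is symmetric around B but not around the odds ratio, which is why the upper limit lies much further from 4.381 than the lower limit does.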

Epidemiological Designs

- The logistic model is common in epidemiological research
- In case-control designs, cases are coded 1 and controls 0, and this is used as the dependent variable
- In a cohort study the outcome (e.g. death) is used as the binary outcome in the logistic model
- Note that in a cohort study exp(b) is still an odds ratio; it approximates the Relative Risk (RR) only when the outcome is rare

Definition: Clinical Prediction Rule

- Clinical tool that quantifies the contribution of:
  - History
  - Examination
  - Diagnostic tests
- Stratify patients according to probability of having the target disorder
- Outcome can be in terms of diagnosis, prognosis, referral or treatment

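Thresholds for decision making

[Figure: derived probability of disease on a scale from 0% to 100%, divided by two thresholds. Above the diagnosis / test threshold: treatment. Between the two thresholds: further diagnostic testing. Below the test / reassurance threshold: reassurance.]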
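The two-threshold idea can be expressed as a small decision function. This is a sketch only: the threshold values 0.10 and 0.60 are hypothetical placeholders, not values from the slides, and in practice they would depend on the condition and the costs of testing and treatment.

```python
def decide(probability,
           test_reassurance_threshold=0.10,  # hypothetical value
           diagnosis_test_threshold=0.60):   # hypothetical value
    """Map a derived probability of disease to one of three actions."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= diagnosis_test_threshold:
        return "treatment"
    if probability >= test_reassurance_threshold:
        return "further diagnostic testing"
    return "reassurance"

for p in (0.03, 0.25, 0.80):
    print(f"p = {p:.2f}: {decide(p)}")
```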

Ottawa ankle rule

Risk Stratification: Kaiser-Permanente Pyramid

Identify high risk through 'risk stratification' and intervene through case management at highest risk.

Framingham Risk Algorithm

Prediction of risk: Cardiovascular (Framingham). Example: 55 yr-old woman, 15-20% 5 yr risk.

Increasing appearance of “prediction models” in literature (ISI Web of Knowledge v3)

Stages of development and assessment of a CPR

- Step 1, Derivation: identification of factors with predictive power
- Step 2, Validation: evidence of reproducible accuracy. Application of a rule in similar clinical settings and population, or better still, in multiple clinical settings and different populations with varying prevalence and outcomes of disease
- Step 3, Impact Analysis: evidence that the rule changes physician behaviour and improves patient outcomes and/or reduces costs

Study designs shown on the slide: cross-sectional or cohort; randomized controlled trial; cross-sectional or cohort.

How to derive a CPR?

1. Toss a coin to make the decision?
2. Individual opinion and experience?
3. Huddle of wise ones: Delphi technique to reach consensus?
4. Statistical prediction models!

Regression Models for prediction

- In all of these models we combine a set of factors:
  - Usually between 2 and 20 predictors
  - Occam's razor suggests smaller is better
- Fit a multiple regression model
- Extract probabilities of outcome or diagnosis
- Create CPR

Regression Models for prediction

- Linear if outcome is continuous
- Binary outcomes:
  - Logistic regression model
  - Survival models: Cox PH, Weibull, log logistic, etc.
- Ordinal or nominal outcomes:
  - Ordinal logistic regression

The logit transformation:

$$y = \log\left(\frac{p}{1-p}\right)$$

We can now fit p as a probability in the range 0 to 1, and y in the range -∞ to +∞.

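Statistical prediction models

Logistic regression model:

$$\log\left(\frac{p}{1-p}\right) = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$$

where p is the probability of the event, and the effects of the factors (x) increase or decrease the risk of this event.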

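Derivation of probability of events

Logistic regression model: call the linear predictor (LP) a linear function of the predictors x_1, x_2, x_3, etc.:

$$\mathrm{LP} = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$$

Then:

$$\log\left(\frac{p}{1-p}\right) = \mathrm{LP}$$

Take exp of both sides:

$$\frac{p}{1-p} = \exp(\mathrm{LP})$$

Then rearrange:

$$p = \frac{\exp(\mathrm{LP})}{1 + \exp(\mathrm{LP})}$$

Or:

$$p = \frac{1}{1 + \exp(-\mathrm{LP})}$$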
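The derivation of an event probability from the linear predictor can be sketched in Python. The intercept, coefficients and patient values below are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any fitted model in the slides.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Return p = 1 / (1 + exp(-LP)) for one individual.

    LP is the linear predictor: intercept + sum of coefficient * value.
    """
    lp = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical model, for illustration only
intercept = -2.0
coefficients = [0.03, 0.7]  # e.g. per year of age, and for a 0/1 indicator
patient = [60, 1]           # a 60-year-old with the indicator present

p = predicted_probability(intercept, coefficients, patient)
odds = p / (1.0 - p)

print(f"Predicted probability = {p:.3f}, odds = {odds:.3f}")
```

Here the linear predictor is -2.0 + 0.03 x 60 + 0.7 x 1 = 0.5, so the odds equal exp(0.5) and the probability follows from rearranging, exactly as in the derivation.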

Example: PEONY model to predict risk of emergency admission to hospital over the next year

- Risk stratification based on derived probabilities
- Now implemented in NHS Tayside as part of Virtual Wards management of LTC
- PEONY II model developed – watch this space!

Donnan et al., Arch Int Med 2008

Other binary models

- The logistic model is only applicable when the length of follow-up is the same for each individual, e.g. 5-yr follow-up of a cohort
- For binary outcomes where censoring occurs, i.e. people leave the cohort through death or migration, the length of follow-up varies and survival models such as the Cox Proportional Hazards model are needed

Summary

- Logistic model easily fitted in SPSS
- Clear link with ODDS RATIOS
- Common model for case-control and cohort studies, as well as development of clinical prediction models

General References

- Campbell MJ, Machin D. Medical Statistics. A commonsense approach. 3rd ed. Wiley, New York, 1999.
- Hosmer DW, Lemeshow S. Applied logistic regression. John Wiley & Sons, New Jersey, 2000.
- Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991.
- Armitage P, Berry G. Statistical Methods in Medical Research. 3rd ed. Oxford: Blackwell Scientific, 1994.
- Agresti A. An Introduction to Categorical Data Analysis. Wiley, New York, 1996.

Practical: Fit Multiple Logistic Regression Model

- Dependent is the binary outcome: LDL target met (Yes = 1, No = 0)
- Independent: Gender (1 = M, 2 = F); add APOE, adherence, etc.
- Remember: select Analyze / Regression / Binary Logistic
- Select the option of 95% CI for exp(b)

3) Screening for variables to eliminate

- Consider screening procedures to eliminate a number of the variables under consideration
- Test each variable separately
- Variables with p > 0.3 would have to be very strong confounders to become significant on adjustment in a multiple regression, so they could be discarded
- Hosmer-Lemeshow criteria
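The screening step amounts to a simple filter on univariable p-values. In this sketch the variable names and p-values are hypothetical, invented only to show the rule; the p-values themselves would come from fitting one single-predictor model per candidate variable.

```python
# Hypothetical univariable p-values, for illustration only
univariable_p = {"age": 0.45, "smoker": 0.02, "bmi": 0.001, "region": 0.32}

def screen(p_values, cutoff=0.3):
    """Split candidate variables into those kept and those discarded."""
    keep = sorted(name for name, p in p_values.items() if p <= cutoff)
    drop = sorted(name for name, p in p_values.items() if p > cutoff)
    return keep, drop

keep, drop = screen(univariable_p)
print("Carry forward to the multiple model:", keep)
print("Candidates to discard:", drop)
```

The cutoff is deliberately generous, so a variable is only dropped when there is very little evidence of any association; clinically important factors can still be added back by judgement.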

4) A mixture of automatic procedures and self selection

- Use automatic procedures as a guide
- Compare stepwise and backward elimination
- Think about what factors are important
- Add 'important' factors
- Do not blindly follow statistical significance

Remember Occam's Razor

'Entia non sunt multiplicanda praeter necessitatem'

'Entities must not be multiplied beyond necessity'

William of Ockham, 14th-century friar and logician, 1288-1347
