Logistic Regression III: Advanced topics
Conditional Logistic Regression for Matched Data



Recall: Matching
Matching can control for extraneous sources of variability and increase the power of a statistical test.
Match M controls to each case based on potential confounders, such as age and gender.
If the data are matched, you must account for the matching in the statistical analysis!

Recall: Agresti example, diabetes and MI
Match each MI case to a control without MI, based on age and gender. Ask about history of diabetes to find out whether diabetes increases the risk of MI.

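Matched pairs are cross-classified in a 2x2 table by the MI case's diabetes status (Diabetes / No diabetes) and the matched MI control's diabetes status (Diabetes / No diabetes).
odds (“favors” case | discordant pair) = (number of pairs in which only the case has diabetes) / (number of pairs in which only the control has diabetes)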

Conditional Logistic Regression

The Conditional Likelihood: each discordant stratum (rather than each individual) gets 1 term in the likelihood.
Note: the marginal probability of disease may differ in each age-gender stratum, but we assume that the (multiplicative) increase in the odds of disease due to exposure is constant across strata.
For each stratum, we add to the likelihood the CONDITIONAL probability that the case got the disease and the control did not, given that we have a case-control pair.
The numerator is the probability (as a function of exposures) that the case gets the disease and the control does not.
The denominator is the probability that the case gets the disease and the control does not OR that the control (with all her exposures) gets the disease and the case (with all her exposures) does not.
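The verbal description above can be written out as a formula. This is a sketch under assumed notation that does not appear on the slides: pair i has exposures x for its case and control, a stratum-specific intercept \alpha_i, and a common log odds ratio \beta, with P(D | x) = e^{\alpha_i + \beta x} / (1 + e^{\alpha_i + \beta x}). The intercept cancels within each pair, which is why the baseline odds can differ across strata.

```latex
\[
L(\beta) = \prod_{i}
\frac{P(D \mid x_{i,\mathrm{case}})\, P(\bar{D} \mid x_{i,\mathrm{control}})}
     {P(D \mid x_{i,\mathrm{case}})\, P(\bar{D} \mid x_{i,\mathrm{control}})
      + P(D \mid x_{i,\mathrm{control}})\, P(\bar{D} \mid x_{i,\mathrm{case}})}
= \prod_{i}
\frac{e^{\beta x_{i,\mathrm{case}}}}
     {e^{\beta x_{i,\mathrm{case}}} + e^{\beta x_{i,\mathrm{control}}}}
\]
```

A concordant pair (same exposure for case and control) contributes a constant factor of 1/2, so only the discordant pairs carry information about \beta.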

Recall probability terms:

[Four 2x2 tables: Diabetes / No diabetes by Case (MI) / Control]

The conditional likelihood is the product of these stratum terms. Within each age-gender stratum, the case and the control share the same baseline odds of disease, but these baseline odds may differ across strata.
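For 1:1 matching with a single yes/no exposure, the conditional likelihood can be maximized by hand. The sketch below assumes that setting and uses made-up discordant-pair counts (they are not the counts from the Agresti example); the function and variable names are illustrative only.

```python
import math


def pair_log_likelihood(beta, n_case_only, n_control_only):
    """Conditional log-likelihood for 1:1 matched pairs, one binary exposure.

    Concordant pairs each contribute a constant factor of 1/2 and are dropped.
    """
    # Probability that, within a discordant pair, the exposed member is the case.
    p_case_exposed = math.exp(beta) / (1.0 + math.exp(beta))
    return (n_case_only * math.log(p_case_exposed)
            + n_control_only * math.log(1.0 - p_case_exposed))


def conditional_mle(n_case_only, n_control_only):
    """Closed-form maximizer: the log of the ratio of discordant-pair counts."""
    return math.log(n_case_only / n_control_only)


# Hypothetical counts, for illustration only.
n_case_only = 30      # pairs where only the case is exposed
n_control_only = 15   # pairs where only the control is exposed

beta_hat = conditional_mle(n_case_only, n_control_only)
odds_ratio = math.exp(beta_hat)   # about 2.0 for these made-up counts
print(beta_hat, odds_ratio)
```

With these counts the estimated odds ratio is simply 30/15; the log-likelihood function is included so the closed form can be checked against nearby values of beta.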

Conditional Logistic Regression

Example: MI and diabetes

Conditional Logistic Regression

In SAS…
proc logistic data = YourData;
  model MI (event = "Yes") = diabetes;
  strata PairID;
run;

Example: Prenatal ultrasound examinations and risk of childhood leukemia: case-control study. BMJ 2000;320:
Could there be an association between exposure to ultrasound in utero and an increased risk of childhood malignancies?
Previous studies have found no association, but they have had poor statistical power to detect one.
Swedish researchers performed a nationwide population-based case-control study using prospectively assembled data on prenatal exposure to ultrasound.

535 cases: all children born and diagnosed as having myeloid leukemia between 1973 and 1989 in Swedish registers of birth, cancer, and causes of death.
535 matched controls: 1 control was randomly selected for each case from the Swedish Birth Registry, matched by sex and year and month of birth.

Matched pairs are cross-classified by the leukemia case's exposure (Ultrasound / No ultrasound) and the matched control's exposure (Ultrasound / No ultrasound).
But this type of analysis is limited to a single dichotomous exposure…

Used conditional logistic regression to look at dose-response with number of ultrasounds.
Results:
No ultrasounds (reference): OR = 1.0
1-2 ultrasounds: OR = 0.91
>=3 ultrasounds: OR = 0.64
Conclusion: no evidence of a positive association between prenatal ultrasound and childhood leukemia; if anything, there is evidence of an inverse association (which could be explained by the reasons for frequent ultrasound).

Extension: 1:M matching
Each term in the likelihood represents a stratum of 1+M individuals.
More complicated likelihood expression!
Just as easy to implement in SAS, as we’ll see Wednesday…
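For reference, the 1:M likelihood can be sketched as follows. The notation is assumed rather than taken from the slides: in stratum i, index j = 0 denotes the case and j = 1, ..., M the matched controls, each with exposure x_{ij}, and \beta is the common log odds ratio.

```latex
\[
L(\beta) = \prod_{i}
\frac{e^{\beta x_{i0}}}{\sum_{j=0}^{M} e^{\beta x_{ij}}}
\]
```

Setting M = 1 gives back the matched-pair likelihood, where each term compares the case with a single control.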

Ordinal Logistic Regression

What if your outcome variable has more than two levels? For ordinal outcomes, use ordinal logistic regression:
*Relies on the cumulative logit
*Models the predicted probability of multiple outcomes
*Also known as the “proportional odds model”

Ordinal Variable Example: Likert Scale
1 = strongly disagree
2 = disagree
3 = neutral
4 = agree
5 = strongly agree
Cumulative outcomes:
*strongly agree vs. the rest
*agree or strongly agree vs. neutral or negative
*agree, strongly agree, or neutral vs. negative
*the rest vs. strongly disagree
Ordinal logistic regression gives you a way to model these cumulative outcomes all at once!

Ordinal Variable Example: Continuous variable measured crudely
1 = breastfed >=6 months
2 = breastfed 4-5 months
3 = breastfed 2-3 months
4 = breastfed <2 months
The outcome variable, breastfeeding, was only measured at limited time points, so it may not be best modeled as a continuous variable in linear regression. Use ordinal logistic!

Another example, 3 levels (from my data on runners):
1 = eumenorrhea (normal menses) (66.6%)
2 = oligomenorrhea (mild irregularity) (24.6%)
3 = amenorrhea (severe irregularity) (8.6%)
Most “severe” outcome: amenorrhea (level 3).
More inclusive definition of a “positive” outcome: any irregularity (levels 2 and 3).

Cumulative logit, 3 groups (2 potential “positive” outcomes)
In words: the log odds of having amenorrhea (versus everything else), and the log odds of having any irregularity (versus normal).
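In symbols, the two cumulative logits can be sketched as below. The coding is an assumption matching the three-level example: Y = 1 for eumenorrhea, Y = 2 for oligomenorrhea, Y = 3 for amenorrhea.

```latex
\[
\operatorname{logit} P(Y = 3) = \ln\frac{P(Y = 3)}{P(Y = 1) + P(Y = 2)}
\qquad
\operatorname{logit} P(Y \ge 2) = \ln\frac{P(Y = 2) + P(Y = 3)}{P(Y = 1)}
\]
```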

Corresponding logistic model (no predictors)
The intercept-only model (two intercepts!):
Log odds (amenorrhea) = α_amen
Log odds (any irregularity) = α_(amen or oligo)

Fitted model:
Logit of amenorrhea: 8.6% of my sample has amenorrhea. Odds = 8.6/91.4 = .094; Ln(.094) = -2.36
Logit of any irregularity: 33.2% has any irregularity (24.6% + 8.6%). Odds = 33.2/66.8 = .497 (about 1/2); Ln(.497) = -.70
Fitted models are:
Log odds (amenorrhea) = -2.36
Log odds (any irregularity) = -0.70
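The two intercepts of the no-predictor model are just the logits of the observed cumulative proportions, so the arithmetic can be checked directly. A minimal sketch using only the percentages quoted for the runners' data; the helper name `logit` is ours.

```python
import math


def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


p_amenorrhea = 0.086                 # 8.6% amenorrhea
p_any_irregularity = 0.246 + 0.086   # oligomenorrhea + amenorrhea

alpha_amen = logit(p_amenorrhea)
alpha_any = logit(p_any_irregularity)

print(round(alpha_amen, 2))   # intercept for amenorrhea vs. the rest
print(round(alpha_any, 2))    # intercept for any irregularity vs. normal
```

Rounded to two decimals these come out near -2.36 and -0.70.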

Logistic model with predictors:
Log odds (amenorrhea) = α_amen + β1*X1 + β2*X2
Log odds (any irregularity) = α_(amen or oligo) + β1*X1 + β2*X2
Note: different intercepts but shared betas (shared slopes)!

Odds ratio interpretation (a):

Odds ratio interpretation (b):

Odds ratio interpretation:
Interpretation of the betas: e^β = adjusted odds ratio.
For every 1-unit increase in X, e^β is the multiplicative increase in the odds of any menstrual irregularity compared with none, and it is also the multiplicative increase in the odds of amenorrhea compared with the other two categories (adjusted for any other predictors in the model).
Note: proportional odds assumption! The odds ratios are the same across different levels of the outcome.
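Why the same odds ratio applies to both outcomes can be seen in one line. This sketch assumes a single predictor X and writes \alpha_k for whichever intercept is in play (amenorrhea, or any irregularity); the subscript k is our notation.

```latex
\[
\frac{\operatorname{odds}(Y \ge k \mid X = x + 1)}{\operatorname{odds}(Y \ge k \mid X = x)}
= \frac{e^{\alpha_k + \beta (x + 1)}}{e^{\alpha_k + \beta x}}
= e^{\beta}
\]
```

The intercept cancels, so the ratio does not depend on k. That is exactly the proportional odds assumption.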

Example predictor, EDI-A: Score on the anorexia subscale of the eating disorder inventory (EDI-A)

Cumulative logit plot (4 bins)
Plot annotations:
*The intercept for any irregularity (the log odds of any irregularity where EDI-A=0)
*The intercept for amenorrhea (the log odds of amenorrhea where EDI-A=0)
*These lines should be linear and parallel (equal slopes, one beta!)
*The slopes represent the increase in the log odds of either outcome for every 1-unit increase in EDI-A score.

Fitted model with EDI-A:
Analysis of Maximum Likelihood Estimates (columns: Parameter, DF, Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq)
Intercept: Pr > ChiSq <.0001
Intercept: Pr > ChiSq <.0001
EDIA: Pr > ChiSq <.0001
Log odds (amen) = α_amen + β*EDI-A
Log odds (any irregularity) = α_(amen or oligo) + β*EDI-A

Fitted Model: Predicted logit at every level of EDI-A

Compare actual data and fitted model:

Fitted model with EDI-A:
Odds Ratio Estimates (columns: Effect, Point Estimate, 95% Wald Confidence Limits)
Effect: EDIA
For every 1-unit increase in EDI-A score, there’s a 13% increase in the odds of being amenorrheic versus the other two categories and a 13% increase in the odds of being amenorrheic or oligomenorrheic versus normal.

Predictions: Log odds (outcome)= *EDIA-1 The model predicts that a woman with an EDI-A score of 15 would have:

Predictions:
Amenorrhea: predicted probability = 19%
Any irregularity: predicted logit = .4281, predicted probability = 60.5%
[Plot annotation: 50% probability line]
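Going from a predicted logit to a predicted probability uses the inverse logit, p = 1 / (1 + e^(-logit)). The sketch below applies it to the logit of .4281 quoted for any irregularity. It also back-solves an approximate logit from the reported 19% for amenorrhea; that value is our reconstruction from a rounded percentage, not a figure from the slides.

```python
import math


def inverse_logit(logit_value):
    """Convert a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-logit_value))


def logit(p):
    """Convert a probability to log odds."""
    return math.log(p / (1.0 - p))


# Any irregularity at EDI-A = 15: predicted logit quoted on the slide.
p_any_irregularity = inverse_logit(0.4281)
print(round(100 * p_any_irregularity, 1))   # percent, about 60.5

# Amenorrhea at EDI-A = 15: only the probability (19%) is quoted, so this
# logit is an approximation derived from that rounded value.
approx_logit_amenorrhea = logit(0.19)
print(round(approx_logit_amenorrhea, 2))
```

A logit above 0 corresponds to a probability above 50%, which is why any irregularity (logit .4281) sits above the 50% line while amenorrhea (negative logit) sits below it.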

Advantages & disadvantages
Ordinal logistic is better than running separate logistic models for different outcomes (e.g., one model for amenorrhea, one model for any irregularity) because of the improvement in statistical power!
Ordinal logistic prevents you from having to arbitrarily turn an ordinal variable into a binary variable!
But it does require that you meet the proportional odds assumption…