BIOST 536 Lecture 4 – Logistic regression: estimation and confounding

Presentation transcript:

BIOST 536 Lecture 4 1 Lecture 4 – Logistic regression: estimation and confounding Linear model

BIOST 536 Lecture 4 2 Logistic regression estimation
Modeling a binary outcome
- We model P(Y=1|X=x), not a continuous Y
- A linear model for the probability has problems: the left-hand side is constrained to 0 ≤ Pr ≤ 1, but the β's can take any value in (-∞,∞), so the linear predictor can fall outside [0,1] and the model does not do well
- Instead assume that P(Y=1|X=x) depends on X only through the linear combination Z = β0 + β1 X1 + … + βp Xp
- Need to link Z to P(Y=1|Z=z), so we use a logistic function
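The link between the linear combination and the probability can be sketched in a few lines of Python. This is only an illustration: the coefficient values `beta0` and `beta1` are hypothetical and do not come from the lecture.

```python
import math

def logistic(z):
    """Map a linear predictor z in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the logistic function: the log odds of p."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -2.0, 0.8

for x in (-10, 0, 10):
    z = beta0 + beta1 * x          # linear combination, unbounded
    print(x, z, logistic(z))       # probability always stays inside (0, 1)
```

However extreme the linear predictor becomes, the logistic function returns a value strictly between 0 and 1, which is the property the linear probability model lacks.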

BIOST 536 Lecture 4 3 Logistic function
- Symmetric: we could model either Pr(Y=1) or Pr(Y=0) and the covariate effects would be the same in magnitude (the coefficients only change sign)
- Epidemiologists prefer the logistic model since the OR is easily derived
- Mathematically convenient form
- Maximum likelihood equations have a simple form
- Need to solve these equations by iteration
- Iteration rarely goes awry if the data are not too sparse
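One common way to solve the maximum likelihood equations by iteration is Newton-Raphson. The sketch below assumes a single covariate and grouped data; the function name `fit_logistic_newton` and the counts are hypothetical, and the lecture does not say which algorithm Stata uses internally.

```python
import math

def fit_logistic_newton(groups, n_iter=25, tol=1e-10):
    """Fit logit P(Y=1|x) = b0 + b1*x by Newton-Raphson.

    groups: list of (x, cases, total) tuples.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        u0 = u1 = 0.0              # score vector (first derivatives)
        i00 = i01 = i11 = 0.0      # information matrix (minus second derivatives)
        for x, cases, total in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = cases - total * p
            w = total * p * (1.0 - p)
            u0 += resid
            u1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Newton step: solve the 2x2 system (information) * delta = score.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: 40 cases among 100 exposed, 20 cases among 100 unexposed.
b0_hat, b1_hat = fit_logistic_newton([(1, 40, 100), (0, 20, 100)])
print(b0_hat, b1_hat, math.exp(b1_hat))  # exp(b1_hat) is the estimated odds ratio
```

With sparse data (for example a group with zero cases) the information matrix can become nearly singular and the estimates drift toward infinity, which is the situation the slide warns about.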

BIOST 536 Lecture 4 4 Link functions
- The logistic function is very close to the probit model (both symmetric)
- The probit model is used more in classical diagnostic testing and ROC analysis
- We call “logistic” and “probit” link functions: they link P(Y|X) to X through the linear combination
- Other link functions include the “complementary log-log” (asymmetric)
- Some general regression programs in Stata expect you to specify the link function
- We will assume a logistic link function here, but the specification can be tested in Stata using the linktest command
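The three links mentioned above can be compared numerically. The sketch below writes each inverse link with the standard library only; the rescaling constant 1.702 used to compare logistic and probit is a commonly quoted approximation and is an assumption here, not something stated in the lecture.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def inv_probit(z):
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_cloglog(z):
    return 1.0 - math.exp(-math.exp(z))

# Symmetry check: a symmetric link satisfies F(-z) = 1 - F(z).
z = 1.2
sym_logit = inv_logit(-z) + inv_logit(z)        # equals 1 for a symmetric link
sym_probit = inv_probit(-z) + inv_probit(z)     # equals 1 for a symmetric link
sym_cloglog = inv_cloglog(-z) + inv_cloglog(z)  # differs from 1 (asymmetric)

# Closeness of logistic and probit after rescaling the logistic by about 1.7.
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(inv_logit(1.702 * g) - inv_probit(g)) for g in grid)
print(sym_logit, sym_probit, sym_cloglog, max_gap)
```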

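BIOST 536 Lecture 4 5 Simple estimation example
2×2 table of counts: columns Exposed (X=1), Unexposed (X=0), Total; rows Case (Y=1), Control (Y=0), Total
Adopt a logistic model, so logit P(Y=1|X=x) = β0 + β1 x
- For the unexposed, the probability of being a case is exp(β0) / (1 + exp(β0))
- For the unexposed, the probability of being a control is 1 / (1 + exp(β0))
- For the exposed, the probability of being a case is exp(β0 + β1) / (1 + exp(β0 + β1))
- For the exposed, the probability of being a control is 1 / (1 + exp(β0 + β1))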

BIOST 536 Lecture 4 6

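BIOST 536 Lecture 4 7
- The log likelihood is a well-behaved surface that is a function of the betas
- The maximum log likelihood is achieved at the values of the betas for which the fitted probabilities equal the observed proportions
- For the unexposed, the estimated probability of being a case is just the proportion of cases in the unexposed group
- For the exposed, the estimated probability of being a case is just the proportion of cases in the exposed group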

BIOST 536 Lecture 4 8
Now do this under the null hypothesis that exposure does not make a difference, i.e. the exposure coefficient is zero
- Then the likelihood depends only on the intercept
- Then the log likelihood is a function of the intercept alone
- The maximum log likelihood is achieved at the intercept equal to the log odds of being a case in the whole sample
- Ignoring exposure, the estimated probability of being a case is therefore the overall proportion of cases
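For a single binary exposure the maximum can be written down without iterating, which makes the two previous slides easy to check numerically. In this sketch the counts, the names `b0`/`b1` for the intercept and exposure coefficient, and the cell labels `a, b, c, d` are all hypothetical choices made for illustration.

```python
import math

def log_lik(b0, b1, table):
    """Binomial log likelihood for logit P(Y=1|X=x) = b0 + b1*x.

    table = (a, b, c, d): exposed cases, unexposed cases,
                          exposed controls, unexposed controls.
    """
    a, b, c, d = table
    p1 = 1.0 / (1.0 + math.exp(-(b0 + b1)))   # P(case | exposed)
    p0 = 1.0 / (1.0 + math.exp(-b0))          # P(case | unexposed)
    return (a * math.log(p1) + c * math.log(1.0 - p1)
            + b * math.log(p0) + d * math.log(1.0 - p0))

table = (40, 20, 60, 80)   # hypothetical counts
a, b, c, d = table

# Full model: closed-form maximum likelihood estimates.
b0_hat = math.log(b / d)                  # log odds of being a case, unexposed
b1_hat = math.log((a * d) / (b * c))      # log odds ratio
p0_hat = 1.0 / (1.0 + math.exp(-b0_hat))  # equals b / (b + d)
p1_hat = 1.0 / (1.0 + math.exp(-(b0_hat + b1_hat)))  # equals a / (a + c)

# Null model (b1 = 0): the intercept is the log odds of being a case overall.
b0_null = math.log((a + b) / (c + d))
p_null = 1.0 / (1.0 + math.exp(-b0_null))  # equals (a + b) / n

ll_full = log_lik(b0_hat, b1_hat, table)
ll_null = log_lik(b0_null, 0.0, table)
print(p0_hat, p1_hat, p_null, ll_full, ll_null)
```

The full model can never have a lower maximized log likelihood than the null model, because the null model is the special case with the exposure coefficient fixed at zero.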

BIOST 536 Lecture 4 9 Tests comparing nested models
Want to decide if the more complex model is significantly better than the simpler model
Possible tests comparing nested models (the complex model includes all covariates of the simpler model):
1. Likelihood ratio test: direct comparison of the difference in log-likelihoods
   - Preferred test
   - Does not change with reparametrization
2. Score test: test computed at the null hypothesis values
   - Very similar to LR test
   - Sometimes can be computed when the LR test cannot
   - Many of our common tests are score tests
3. Wald test: depends on the normality of the distribution of the estimates
   - Can change with reparametrization
   - P-value given in Stata output for individual variables
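For a 2×2 table all three statistics have simple forms, so they can be compared side by side. The sketch below uses hypothetical counts; it relies on the standard results that the score test of no exposure effect equals the uncorrected Pearson chi-square and that the Wald standard error of the log odds ratio is the square root of the summed reciprocal cell counts.

```python
import math

# Hypothetical counts: exposed cases, unexposed cases,
# exposed controls, unexposed controls.
a, b, c, d = 40, 20, 60, 80
n = a + b + c + d

def binom_ll(cases, controls):
    """Maximized binomial log likelihood for one group."""
    p = cases / (cases + controls)
    return cases * math.log(p) + controls * math.log(1.0 - p)

# 1. Likelihood ratio: twice the difference in maximized log likelihoods.
ll_full = binom_ll(a, c) + binom_ll(b, d)
ll_null = binom_ll(a + b, c + d)
lr_stat = 2.0 * (ll_full - ll_null)

# 2. Score test: evaluated at the null, equals the Pearson chi-square.
score_stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# 3. Wald test: squared ratio of the log odds ratio to its standard error.
log_or = math.log((a * d) / (b * c))
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
wald_stat = (log_or / se_log_or) ** 2

def chi2_1df_pvalue(x):
    """Upper tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

for name, stat in (("LR", lr_stat), ("score", score_stat), ("Wald", wald_stat)):
    print(name, stat, chi2_1df_pvalue(stat))
```

The three statistics are close but not identical, which is typical when the sample is reasonably large; they can diverge more with sparse data.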

BIOST 536 Lecture 4 10

BIOST 536 Lecture 4 11

BIOST 536 Lecture 4 12 Summary about estimation
- Maximum likelihood estimation is preferred for binary outcome data
- The log likelihood depends on the betas, which in turn depend on which covariates are in the model
- In some cases the beta estimates are related to familiar values, but usually we have to iterate to get the estimates
- The difference in log likelihoods can be used to test one model against another when the models are nested
- Odds ratios turn out to be related to the logistic regression coefficients

BIOST 536 Lecture 4 13 Example
- Study of the identification of domestic violence (DV) in a medical setting (PI: Robert Thompson, MD; Co-PI: Fred Rivara, MD)
- Clinics were randomized to be either intervention clinics (2 clinics) or control clinics (3 clinics)
- Intervention clinics received training in DV detection and some support services; clinic enrollees received materials
Questions:
1. Did the intervention improve detection?
2. Did it improve the rate of physicians asking about DV?

BIOST 536 Lecture 4 14

BIOST 536 Lecture 4 15

BIOST 536 Lecture 4 16

BIOST 536 Lecture 4 17

BIOST 536 Lecture 4 18

BIOST 536 Lecture 4 19

BIOST 536 Lecture 4 20 Confounding
- A confounding variable is related to both disease and exposure occurrence, and it distorts the observed relationship between exposure and disease
- We can adjust for the confounder explicitly by modeling or implicitly through stratification
Two necessary relationships:
1. Confounder must be related to exposure in the data
2. Confounder must be independently related to disease in the population
Example 1: Suppose (1) Age → exposure to menopausal estrogens and (2) Age → endometrial cancer. Should we control for age?
Example 2: (1) menopausal estrogens → endometrial hyperplasia and (2) endometrial hyperplasia → endometrial cancer. Do we want to control for endometrial hyperplasia in studying the association between exposure to menopausal estrogens and endometrial cancer?

BIOST 536 Lecture 4 21 Failure to account for confounding can increase or decrease the odds ratio Hypothetical data from Breslow & Day, Volume 1
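A small numerical illustration of this point follows. The counts are invented for this sketch and are not the Breslow & Day data shown on the slide; they are chosen so that the odds ratio is 2 within each confounder level while the confounder is associated with both exposure and outcome.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls."""
    return (a * d) / (b * c)

# Hypothetical strata defined by a binary confounder.
strata = {
    "confounder=0": (10, 10, 90, 180),
    "confounder=1": (80, 20, 120, 60),
}

# Odds ratio within each confounder level.
stratum_ors = {level: odds_ratio(*cells) for level, cells in strata.items()}

# Crude odds ratio: collapse the strata cell by cell, ignoring the confounder.
pooled = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*pooled)

print(stratum_ors)   # both strata give an odds ratio of 2
print(crude_or)      # the collapsed table gives a larger odds ratio
```

Here ignoring the confounder inflates the odds ratio; with a different pattern of association between confounder and exposure the crude estimate could just as well be pulled toward or below 1.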

BIOST 536 Lecture 4 22 Observational cohort study

BIOST 536 Lecture 4 23

BIOST 536 Lecture 4 24

BIOST 536 Lecture 4 25 If this model is correct, would the Mantel-Haenszel approach give an alternative estimate of the odds ratio?
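As a point of comparison for the question above, the Mantel-Haenszel estimator combines the stratum-specific 2×2 tables into a single summary odds ratio, on the assumption that the odds ratio is common to all strata. The sketch below uses hypothetical counts, not the data on the slides.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio.

    strata: list of (a, b, c, d) with a = exposed cases, b = unexposed cases,
            c = exposed controls, d = unexposed controls.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical strata, each with a within-stratum odds ratio of 2.
strata = [(10, 10, 90, 180), (80, 20, 120, 60)]
mh_or = mantel_haenszel_or(strata)
print(mh_or)
```

When every stratum has the same odds ratio, as in these invented counts, the Mantel-Haenszel summary reproduces it; a logistic model with exposure and confounder terms but no interaction targets the same common odds ratio through maximum likelihood.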

BIOST 536 Lecture 4 26
- Prefer the initial parametrization because we may want to test whether the interaction coefficient is 0 (no interaction) or whether the exposure and interaction coefficients are both 0 (no association of exposure with outcome for either confounder level)
- The α’s give us the baseline levels for the two confounder levels

BIOST 536 Lecture 4 27 Case-control study

BIOST 536 Lecture 4 28 1st part depends on the sampling and 2nd part on exposure and confounders

BIOST 536 Lecture 4 29 Case-control log odds ratio is the difference between two logits (exposed versus unexposed)
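The decomposition into a sampling part and an exposure part can be checked numerically. In this sketch the population coefficients `beta0`, `beta1` and the sampling fractions `pi_case`, `pi_control` are hypothetical, and a single binary exposure is used without a confounder to keep the example short.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical population model: logit P(Y=1|X=x) = beta0 + beta1*x.
beta0, beta1 = -3.0, 0.7

# Hypothetical sampling fractions for cases and controls.
pi_case, pi_control = 0.9, 0.05

def p_case_given_sampled(x):
    """P(Y=1 | X=x, subject was sampled into the case-control study)."""
    p = logistic(beta0 + beta1 * x)
    return pi_case * p / (pi_case * p + pi_control * (1.0 - p))

# 1st part: depends only on the sampling fractions.
offset = math.log(pi_case / pi_control)

logit_exposed = logit(p_case_given_sampled(1))
logit_unexposed = logit(p_case_given_sampled(0))

# The difference of the two logits removes the sampling part.
log_or_case_control = logit_exposed - logit_unexposed

# Absolute risk is recoverable only if the sampling fractions are known.
risk_unexposed = logistic(logit_unexposed - offset)
print(log_or_case_control, beta1, risk_unexposed)
```

The sampling offset shifts both logits by the same amount, so it cancels in the difference; without knowing the sampling fractions the shift cannot be undone and the absolute risk stays unidentified.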

BIOST 536 Lecture 4 30

BIOST 536 Lecture 4 31 Example of sampling proportions

BIOST 536 Lecture 4 32 Summary about confounding
- We can control for confounding by stratification or modeling
- For a cohort sample with a binary exposure, binary confounder, and a binary outcome, the probability model is a logistic model in the exposure and the confounder
- For a case-control sample with a binary exposure, binary confounder, and a binary outcome, the probability model is the same logistic model plus a term that depends on the sampling fractions, but the sampling fractions may be unknown
- The odds ratios can be estimated from either cohort or case-control studies, but absolute risk probabilities can be estimated only from a cohort study unless the sampling probabilities are known