Logistic Regression Biostatistics 510 March 15, 2007 Vanessa Perez.



Logistic regression
The most important model for categorical response ($y_i$) data. The response can be categorical with 2 levels (binary: 0 and 1) or with ≥ 3 levels (nominal or ordinal). Predictor variables ($x_i$) can take any form: binary, categorical, and/or continuous.

[Figure: Logistic regression curve. Horizontal axis: x; vertical axis: probability.]

Logit Transformation
Logistic regression models transformed probabilities called logits:
$\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right)$
where
i indexes all cases (observations),
$p_i$ is the probability that the event (a sale, for example) occurs in the i-th case,
log is the natural log (to the base e).
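As a quick numeric illustration of the logit transformation, here is a minimal sketch in Python rather than SAS. It is not part of the original slides, and the function names `logit` and `inv_logit` are chosen here for illustration. The logit maps a probability in (0, 1) onto the whole real line, and its inverse maps the result back to a probability.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1 (natural log)."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of the logit: maps any real z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p = {p:.1f}  logit = {z:+.4f}  back-transformed = {inv_logit(z):.1f}")
```

A probability of 0.5 corresponds to a logit of 0, and probabilities that are symmetric about 0.5 (such as 0.1 and 0.9) give logits of equal size and opposite sign.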

[Figure: Assumption. Recoverable plot labels: $p_i$ and a function of $p_i$, presumably $\operatorname{logit}(p_i)$.]

Logistic regression model with a single continuous predictor
$\operatorname{logit}(p_i) = \log(\text{odds}) = \beta_0 + \beta_1 X_1$
where
$\operatorname{logit}(p_i)$ is the logit transformation of the probability of the event,
$\beta_0$ is the intercept of the regression line,
$\beta_1$ is the slope of the regression line.
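To see how a line on the logit scale turns into an S-shaped curve on the probability scale, the sketch below (Python, not from the slides) inverts the model for a few values of x. The coefficients are hypothetical, chosen only for illustration, and `predicted_probability` is a name introduced here.

```python
import math

def predicted_probability(beta0, beta1, x):
    """P(event) implied by logit(p) = beta0 + beta1 * x."""
    linear_predictor = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -3.0, 0.5

for x in range(0, 13, 3):
    print(f"x = {x:2d}  p = {predicted_probability(beta0, beta1, x):.3f}")
```

With these assumed values the probability crosses 0.5 where the linear predictor is zero, that is, at x = -beta0 / beta1 = 6.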

LOGISTIC and GENMOD procedures for a single continuous predictor
PROC LOGISTIC DATA=dataset;
  MODEL response=predictor / options;
  OUTPUT OUT=SAS-data-set keyword=name;
RUN;
PROC GENMOD DATA=dataset;
  MAKE 'OBSTATS' OUT=SAS-data-set;
  MODEL response=predictors;
RUN;

Descending option in proc logistic and proc genmod
The descending option in SAS sorts the levels of your response variable from highest to lowest (by default, SAS models the probability of the lower category). In the binary response setting, we code the event of interest as 1 and use the descending option to model the probability P(Y = 1 | X = x). In our SAS example, we'll see what happens when this option is not used.
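Why the choice of modeled category matters can be checked with a little arithmetic. The sketch below is in Python rather than SAS and uses a hypothetical probability. It shows that the log odds of the non-event are the negative of the log odds of the event, because logit(1 - p) = -logit(p).

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

p_event = 0.8               # hypothetical P(Y = 1 | X = x)
p_nonevent = 1.0 - p_event  # P(Y = 0 | X = x)

print(logit(p_event))     # log odds of Y = 1
print(logit(p_nonevent))  # same magnitude, opposite sign
```

For a binary response, modeling the lower category instead of the event of interest therefore reverses the sign of every coefficient and inverts the odds ratios, while the fit itself is unchanged.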

Interpretation of a single continuous parameter
The sign (±) of β determines whether the log odds of y increase or decrease for every 1-unit increase in x. If β > 0, the log odds of y increase for every 1-unit increase in x. If β < 0, the log odds of y decrease for every 1-unit increase in x. If β = 0, there is no linear relationship between the log odds and x.

Parameter interpretation (ctd.)
Exponentiating both sides of the logit link function, we get:
$\frac{p_i}{1 - p_i} = \text{odds} = \exp(\beta_0 + \beta_1 X_1) = e^{\beta_0} e^{\beta_1 X_1}$
The odds change multiplicatively by $e^{\beta}$ for every 1-unit increase in x. Whether this factor is greater than or less than 1 depends on whether β > 0 or β < 0. The odds at X = x + 1 are $e^{\beta}$ times the odds at X = x. Therefore, $e^{\beta}$ is an odds ratio!
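The claim that the exponentiated slope is an odds ratio can be verified numerically. The following is a minimal Python sketch with hypothetical coefficients (not taken from the slides): the ratio of the odds at x + 1 to the odds at x is the same at every x and equals the exponentiated slope.

```python
import math

beta0, beta1 = -3.0, 0.5  # hypothetical coefficients

def odds(x):
    """Odds of the event at X = x under logit(p) = beta0 + beta1 * x."""
    return math.exp(beta0 + beta1 * x)

for x in (0, 1, 5):
    ratio = odds(x + 1) / odds(x)
    print(f"x = {x}: odds(x+1)/odds(x) = {ratio:.4f}")

print(f"exp(beta1) = {math.exp(beta1):.4f}")
```

The ratio does not depend on the starting value of x, which is why a single odds ratio summarizes the effect of a continuous predictor in this model.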

Logistic regression model with a single categorical (≥ 2 levels) predictor
$\operatorname{logit}(p_i) = \log(\text{odds}) = \beta_0 + \beta_k X_k$
where
$\operatorname{logit}(p_i)$ is the logit transformation of the probability of the event,
$\beta_0$ is the intercept of the regression line,
$\beta_k$ is the difference between the logits for category k vs. the reference category.

LOGISTIC and GENMOD procedures for a single categorical predictor
PROC LOGISTIC DATA=dataset;
  CLASS variables;
  MODEL response=predictors;
  OUTPUT OUT=SAS-data-set keyword=name;
RUN;
PROC GENMOD DATA=dataset;
  CLASS variables;
  MAKE 'OBSTATS' OUT=SAS-data-set;
  MODEL response=predictors;
RUN;

Class statement in proc logistic
SAS will create dummy variables for a categorical variable if you tell it to. We need to specify dummy coding by using the param=ref option in the class statement; we can also specify the comparison group by using the ref= option after the variable name. Using class automatically generates a test of significance for all parameters associated with the class variable (table of Type 3 tests); if you use dummy variables instead (more on this soon), you will not automatically get an "overall" test for that variable. We will see this more clearly in the SAS examples.

Reference category
Each factor has as many parameters as categories, but one is redundant, so we need to specify a reference category. This is similar to what you just learned for simple linear regression.

Interpretation of a single categorical parameter
If your reference group is level 0, then the coefficient $\beta_k$ represents the difference in the log odds between level k of your variable and level 0. Therefore, $e^{\beta_k}$ is an odds ratio for category k vs. the reference category of x.
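A small worked example may make the categorical interpretation concrete. The Python sketch below uses hypothetical counts, not data from the course. With a single categorical predictor the model is saturated, so the maximum likelihood estimates reproduce the observed log odds in each group and the coefficients can be computed directly from the table.

```python
import math

# Hypothetical counts: (events, non-events) for each level of the predictor.
reference = (20, 80)  # level 0 (reference category)
level_k = (40, 60)    # level k

odds_ref = reference[0] / reference[1]
odds_k = level_k[0] / level_k[1]

beta0 = math.log(odds_ref)            # intercept: log odds in the reference group
beta_k = math.log(odds_k / odds_ref)  # difference in log odds, level k vs. reference

print(f"beta0 = {beta0:.4f}")
print(f"beta_k = {beta_k:.4f}")
print(f"odds ratio = exp(beta_k) = {math.exp(beta_k):.4f}")
```

The exponentiated coefficient matches the familiar cross-product ratio of the 2x2 table, here (40 × 80) / (60 × 20).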

Creating your own dummy variables and not using the class statement
An equivalent model uses dummy variables (that you create); it accounts for the redundancy by not including a dummy variable for your reference category. The choice of reference category is arbitrary. Remember, this method will not produce an "overall" test of significance for that variable.
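The dummy coding described here can be sketched in a few lines. The example is in Python rather than a SAS data step, and both the helper `make_dummies` and the three-level variable are hypothetical. Each non-reference level gets its own 0/1 indicator, and the reference level gets none.

```python
def make_dummies(values, reference):
    """Return {level: 0/1 indicator list} for every level except the reference."""
    levels = sorted(set(values))
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

# Hypothetical three-level predictor.
group = ["low", "medium", "high", "low", "high"]
dummies = make_dummies(group, reference="low")

for level, column in dummies.items():
    print(level, column)
```

An observation in the reference category has zeros in every indicator column, so its log odds are given by the intercept alone.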

Hypothesis testing
Significance tests focus on a test of $H_0: \beta = 0$ vs. $H_a: \beta \neq 0$. The Wald, likelihood ratio (LR), and score tests are used (we'll focus on the Wald method). The Wald CI is easily obtained; score and LR CIs are obtained numerically. For Wald, the 95% CI (on the log odds scale) is
$\hat{\beta} \pm 1.96 \, SE(\hat{\beta})$

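95% CI for parameter
Similarly, the Wald 95% CI for the odds ratio is obtained by exponentiation. The following yields the lower and upper 95% confidence limits:
$\left( e^{\hat{\beta} - 1.96 \, SE(\hat{\beta})}, \; e^{\hat{\beta} + 1.96 \, SE(\hat{\beta})} \right)$
1.96 corresponds to $z_{0.05/2}$, where z ~ N(0, 1).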
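The Wald interval is simple enough to compute by hand from an estimate and its standard error. The Python sketch below uses hypothetical values of the kind one would read off a fitted model's output. It forms the interval on the log odds scale as the estimate plus or minus 1.96 standard errors, then exponentiates the limits to get the interval for the odds ratio.

```python
import math

# Hypothetical estimate and standard error, as read from a fitted model's output.
beta_hat = 0.50
se = 0.20
z = 1.96  # standard normal critical value for a two-sided 95% interval

lower, upper = beta_hat - z * se, beta_hat + z * se
print(f"95% CI for beta: ({lower:.3f}, {upper:.3f})")
print(f"odds ratio: {math.exp(beta_hat):.3f}")
print(f"95% CI for odds ratio: ({math.exp(lower):.3f}, {math.exp(upper):.3f})")
```

The interval is symmetric around the estimate on the log odds scale but not around the odds ratio, since exponentiation stretches the upper limit more than the lower one.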

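Hypothesis testing (ctd.)
The Wald statistic for the test of $H_0: \beta = \beta_0$ (here $\beta_0$ denotes the hypothesized value, not the intercept) is
$W = \left( \frac{\hat{\beta} - \beta_0}{SE(\hat{\beta})} \right)^2$
Under $H_0$, the test statistic is asymptotically chi-square with 1 df (at α = 0.05, the critical value is 3.84).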
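The Wald test can likewise be reproduced from an estimate and its standard error. This is a minimal Python sketch with hypothetical inputs, and `wald_test` is a name introduced here. The statistic is the squared ratio of the estimate (minus its hypothesized value) to its standard error, and for 1 df the chi-square tail probability can be written with the complementary error function.

```python
import math

def wald_test(beta_hat, se, beta_null=0.0):
    """Wald chi-square statistic (1 df) and its p-value."""
    w = ((beta_hat - beta_null) / se) ** 2
    # For 1 df, P(chi-square > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

w, p = wald_test(beta_hat=0.50, se=0.20)  # hypothetical values
print(f"Wald chi-square = {w:.2f}, p = {p:.4f}")
print("reject H0 at alpha = 0.05" if w > 3.84 else "fail to reject H0")
```

Comparing the statistic with 3.84 is equivalent to comparing the unsquared ratio with ±1.96, since 1.96 squared is approximately 3.84.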
