April 4: Logistic Regression
– Lee, Chapter 9
– Cody and Smith, 9:F

HRT Use and Polyps

                 Case (Polyps)   Control (No Polyps)   Total
HRT use                72               102              174
No HRT use            175               114              289
Total                 247               216              463

OR for HRT use (case vs. control) = (72/175) / (102/114) = 0.46

$\chi^2 = \frac{463\,(72 \cdot 114 - 102 \cdot 175)^2}{(174)(289)(247)(216)}$
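
The arithmetic on this slide can be checked with a short Python sketch. It assumes the cell counts 72 and 102 (HRT users who are cases and controls) and 175 and 114 (non-users who are cases and controls); these are reconstructed so that they agree with the margins 174, 289, 247, 216, the total N = 463, and the odds ratio of 0.46. The function names are illustrative only.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table laid out as
                  case   control
    exposed         a       b
    unexposed       c       d
    """
    return (a / c) / (b / d)  # odds of exposure in cases / odds in controls


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# HRT use (exposure) by polyps (case) vs. no polyps (control)
a, b, c, d = 72, 102, 175, 114
print(round(odds_ratio(a, b, c, d), 2))      # 0.46
print(round(chi_square_2x2(a, b, c, d), 2))  # 16.04
```

Because the odds ratio is symmetric, the same 0.46 results whether you compare the odds of HRT use in cases versus controls or the odds of polyps in HRT users versus non-users.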

Inference for binary data
Relative risk, odds ratios, and 2 x 2 tables are limited:
– Can't adjust for many confounders
– Limited to categorical predictors
– Can't look at multiple variables simultaneously
Logistic regression can:
– Adjust for many confounders
– Study continuous predictors
– Model interactions

Linear regression model
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
– Y = dependent variable; $X_i$ = independent variables
– Y is continuous and normally distributed
– The model describes the mean response of Y based on the predictors
– $\beta_0$ is the mean of Y when all Xs are 0
– $\beta_i$ is the increase in the mean of Y for a 1-unit increase in $X_i$, holding the other Xs fixed

New regression model?
$Y \overset{?}{=} \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
– Y = binary outcome (0 or 1); $X_i$ = independent variables
– We would like to use this type of model for a binary outcome variable

Draw a line?

What if you had multiple observations at each score (or you grouped scores)?

Score     Proportion dying
< 10      1/10 = 0.10
…         …/15
…         …/15
…         8/16 = 0.50

Possibilities for Y
$Y \overset{?}{=} \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
– Y = probability that Y = 1 (problem: bounded between 0 and 1)
– Y = odds that Y = 1
– Y = log(odds that Y = 1): has good properties

Probability, odds, log odds
– Probability $\pi$: bounded by 0 and 1
– $\text{Odds}(\pi) = \pi / (1 - \pi)$: extreme values
– Log(odds): less extreme values, and symmetric about $\pi = 0.5$

Nearly a straight line for middle values of P
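
A small Python sketch tabulating the three scales for a few probabilities; the list of probabilities is arbitrary. It shows that the odds run from 0 to +infinity and are lopsided, while the log odds are symmetric about π = 0.5 (the log odds of 0.3 and 0.7 differ only in sign).

```python
import math


def odds(p):
    """Odds = p / (1 - p); ranges from 0 to +infinity."""
    return p / (1 - p)


def log_odds(p):
    """Log odds (logit) = log(p / (1 - p)); ranges from -infinity to +infinity."""
    return math.log(odds(p))


for p in (0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
    print(f"p={p:<5} odds={odds(p):8.3f} log odds={log_odds(p):7.3f}")
```

For instance, π = 0.99 gives odds of 99 while π = 0.01 gives odds of about 0.0101, but their log odds are +4.595 and −4.595.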

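Logistic regression equation
Model the log odds of the outcome as a linear function of one or more variables, where $X_i$ = predictors, independent variables. The model is:
$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
where $\pi$ is the probability that Y = 1.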
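
Solving the model for π gives π = 1 / (1 + exp(−(β0 + β1X1 + … + βpXp))). A minimal Python sketch of that back-transformation; the coefficient values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math


def inverse_logit(eta):
    """Convert a log odds (the linear predictor) back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(betas, xs):
    """betas = [b0, b1, ..., bp]; xs = [x1, ..., xp]."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return inverse_logit(eta)


# Hypothetical coefficients, for illustration only: -2.0 + 0.5 * 4.0 = 0
print(predicted_probability([-2.0, 0.5], [4.0]))  # 0.5
```

Whatever value the linear predictor takes, the result always lies strictly between 0 and 1, which is what makes the log-odds scale a workable target for a linear model.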

A little math
The natural LOG and exponential (EXP) functions are inverses of each other:
– LOG(a) = b ⟺ EXP(b) = a
– LOG(1) = 0 ⟺ EXP(0) = 1
– LOG(0.5) = −0.693 ⟺ EXP(−0.693) = 0.5
– LOG(1.5) = 0.405 ⟺ EXP(0.405) = 1.5
The LOG values will be the logistic regression betas; the EXP values will be the odds ratios.
Note: calculators and Excel use LN for the natural logarithm.

A little math
– LOG function: takes values in (0, +∞) to (−∞, +∞)
– EXP function: takes values in (−∞, +∞) to (0, +∞)

A little math
Properties of the LOG function:
– log(a · b) = log(a) + log(b)
– log(a / b) = log(a) − log(b)
Properties of the EXP function:
– exp(a + b) = exp(a) · exp(b)
– exp(a − b) = exp(a) / exp(b)
So differences in log odds become odds ratios after exponentiating.
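
These identities can be checked numerically. A minimal Python sketch using the standard `math` module, whose `log` is the natural logarithm; the two log-odds values in the second part are hypothetical.

```python
import math

# LOG and EXP undo each other
print(round(math.log(0.5), 3))    # -0.693
print(round(math.log(1.5), 3))    # 0.405
print(round(math.exp(0.405), 2))  # 1.5

# A difference in log odds becomes a ratio of odds after exponentiating
log_odds_a, log_odds_b = 0.8, -0.3  # hypothetical log odds for two groups
diff = log_odds_a - log_odds_b
ratio = math.exp(log_odds_a) / math.exp(log_odds_b)
print(math.isclose(math.exp(diff), ratio))  # True
```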

[Table: log(odds) values, which will be typical betas from the logistic regression model, alongside their exponentials, which will be the odds ratios]

Logistic regression – single binary covariate
We need a dummy variable to code for men and women: x = 1 for women, 0 for men.
– What do the betas mean?
– What is the odds ratio, women versus men?
The model is:
$\log(\text{odds}) = \beta_0 + \beta_1 x$

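Odds for men and women
For men (x = 0): $\text{odds} = e^{\beta_0}$
For women (x = 1): $\text{odds} = e^{\beta_0 + \beta_1}$
After some algebra, the odds ratio is equal to:
$OR = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$
$\beta_1$ is the difference in log odds between women and men.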
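
A quick numeric check of this algebra in Python: whatever value β0 takes, the women-to-men odds ratio comes out as exp(β1). All coefficient values here are hypothetical.

```python
import math


def odds_from_model(b0, b1, x):
    """Odds implied by log(odds) = b0 + b1*x."""
    return math.exp(b0 + b1 * x)


b1 = -1.05  # hypothetical coefficient: x = 1 for women, 0 for men
for b0 in (-2.0, 0.0, 1.5):  # hypothetical intercepts; b0 cancels in the ratio
    odds_women = odds_from_model(b0, b1, 1)
    odds_men = odds_from_model(b0, b1, 0)
    print(b0, odds_women / odds_men, math.exp(b1))
```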

Example – risk of CVD for men vs. women
$\log(\text{odds}) = \beta_0 + \beta_1 x$ = … + (…)x
– For females: log(odds) = … + (…)(1) = …
– For males: log(odds) = … + (…)(0) = …
$\exp(\beta_1)$ = odds ratio for women vs. men. Here, $\exp(\beta_1)$ = exp(…) = 0.35.
Women have 65% lower odds of the outcome than men (OR = 0.35 < 1).
Difference in log odds (women − men) = …

Note: the odds ratio from a 2 x 2 table and EXP(β) from a logistic regression with a single binary risk factor will be equal.
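
One way to see this equality is to fit the model by maximum likelihood and compare. The Python sketch below is a hand-rolled Newton-Raphson fit on grouped binomial data (for the logit link this coincides with Fisher scoring), not SAS's implementation. It assumes HRT and polyps counts reconstructed from the earlier slide: among HRT users, 72 cases out of 174; among non-users, 175 cases out of 289.

```python
import math


def fit_logistic_grouped(groups, iterations=25):
    """Newton-Raphson MLE for log(odds) = b0 + b1*x on grouped binomial data.

    groups: list of (x, events, trials) tuples. Returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (gradient of the log likelihood) and information matrix
        g0 = g1 = 0.0
        i00 = i01 = i11 = 0.0
        for x, y, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Solve the 2 x 2 system  information * step = score
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
    return b0, b1


# Outcome = polyps (case); x = 1 for HRT use, 0 for no HRT use
groups = [(1, 72, 72 + 102), (0, 175, 175 + 114)]
b0, b1 = fit_logistic_grouped(groups)
print(round(math.exp(b1), 2))  # 0.46, same as the 2 x 2 table odds ratio
```

The fitted intercept also has a direct meaning here: exp(b0) is the odds of being a case among non-users, 175/114.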

Multiple logistic regression model
$\log(\text{odds}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
– log(odds) = logarithm of the odds for the outcome (dependent variable)
– $X_i$ = predictors, independent variables
– $\beta_i$ = log(OR) associated with either exposure (for categorical predictors) or a 1-unit increase in the predictor (for continuous predictors)
– The OR is adjusted for the other variables in the model

Interpretation of coefficients – continuous predictors
Example: effect of age on risk of death within 10 years
$\log(\text{odds}) = \beta_0 + 0.1026 \cdot \text{age}$, with $\beta_0$ = … and $\beta_1$ = 0.1026
$\exp(\beta_1) = \exp(0.1026) = 1.108$
– A one-year increase in age is associated with an odds ratio of death of 1.108 (this assumes the same OR holds for any two consecutive ages)
– This is an increase in the odds of approximately 11% (= 1.108 − 1)

Interpretation of coefficients – continuous predictors
What about a 5-year increase in age? Multiply the coefficient by the change you want to look at:
$\exp(5\beta_1) = \exp(5 \times 0.1026) = 1.67$
– A five-year increase in age is associated with an odds ratio of death of 1.67
– This is a 67% increase in the odds
Note: $\exp(5\beta_1)$ does not equal $5\exp(\beta_1)$
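
The same calculations in Python, using the age coefficient 0.1026 from the example. The last two lines show why exp(5β1) is not 5·exp(β1): odds ratios for successive one-year steps multiply rather than add.

```python
import math

beta_age = 0.1026  # coefficient for age from the example

or_1yr = math.exp(beta_age)      # odds ratio for a 1-year increase
or_5yr = math.exp(5 * beta_age)  # odds ratio for a 5-year increase

print(round(or_1yr, 3))       # 1.108
print(round(or_5yr, 2))       # 1.67
print(round(or_1yr ** 5, 2))  # 1.67: five 1-year ORs multiplied together
print(round(5 * or_1yr, 2))   # 5.54: NOT the 5-year odds ratio
```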

Parameter estimation
How do we come up with estimates for $\beta_i$? We can't use least squares, since the outcome is binary rather than continuous. Instead, use maximum likelihood estimation (MLE).

Maximum likelihood estimation
Choose the parameter estimates that maximize the probability of observing the data you observed.
Example: estimating a proportion $\pi$
– Observe that 7 of 10 have the characteristic
– p = 0.70 is the estimate of $\pi$
– p = 0.70 is the MLE of $\pi$. Why?
– Which value of $\pi$ maximizes the probability of getting 7 of 10?
– Answer: 0.70

MLE simple example
Wish to estimate a proportion $\pi$ from a sample of n = 2
– Observe that 1 of 2 have the characteristic
– $L = \pi(1 - \pi)$ (up to a constant factor)
– What value of $\pi$ maximizes L?
– Answer: $\pi$ = 0.5, which is p = 1/2
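
A brute-force Python sketch of the idea (Python 3.8+ for `math.comb`): evaluate the binomial likelihood over a grid of candidate values of π and keep the largest. The grid search is only for illustration, since logistic regression software uses iterative methods instead, and the binomial coefficient does not change where the maximum is.

```python
from math import comb


def binomial_likelihood(pi, k, n):
    """Probability of observing k of n with the characteristic, given pi."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)


def mle_by_grid(k, n, steps=1000):
    """Return the grid value of pi in [0, 1] with the largest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda pi: binomial_likelihood(pi, k, n))


print(mle_by_grid(7, 10))  # 0.7
print(mle_by_grid(1, 2))   # 0.5
```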

Fitted regression line
The curve is determined by:
– $\beta_0$, which affects its location
– $\beta_1$, which affects its steepness (curvature)
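
A Python sketch of how the two coefficients shape the curve, with hypothetical values. The fitted probability equals 0.5 where β0 + β1x = 0, that is at x = −β0/β1, so changing β0 slides the curve along the x axis, while a larger β1 makes the rise around that point steeper.

```python
import math


def logistic_curve(b0, b1, x):
    """Fitted probability for log(odds) = b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def midpoint(b0, b1):
    """x value at which the fitted probability equals 0.5."""
    return -b0 / b1


# Hypothetical (b0, b1) pairs: change b0 only, then change b1 only
for b0, b1 in [(-3.0, 0.2), (-6.0, 0.2), (-3.0, 0.4)]:
    m = midpoint(b0, b1)
    below = logistic_curve(b0, b1, m - 5)  # 5 units left of the midpoint
    above = logistic_curve(b0, b1, m + 5)  # 5 units right of the midpoint
    print(f"b0={b0}, b1={b1}: midpoint {m:.1f}, pi {below:.3f} -> {above:.3f}")
```

The first two pairs give the same S shape centred at 15 and at 30; doubling β1 makes the probability climb from 0.119 to 0.881 over the same 10-unit window instead of from 0.269 to 0.731.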

Inference for multiple logistic regression
Collect data, choose a model, and estimate $\beta_0$ and the $\beta_i$'s. Then describe the odds ratios, $\exp(\beta_i)$, in statistical terms:
– How confident are we in our estimate?
– Could the difference between the odds ratio and one be due to chance?
We are not interested in inference for $\beta_0$ (it relates to the overall probability of the outcome).

Confidence intervals for logistic regression coefficients
General form of a 95% CI: estimate ± 1.96 × SE
– $b_i$ estimate: provided by SAS
– SE: complicated to compute, provided by SAS; related to the variability of the data and the sample size

95% confidence intervals for the odds ratio
– Based on transforming (exponentiating) the endpoints of the 95% CI for the parameter estimate
– Supplied automatically by SAS
– Look to see whether the interval contains 1. If it does not: "We have a statistically significant association between the predictor and the outcome, controlling for all other covariates."
– Equivalent to a hypothesis test: reject Ho: OR = 1 at alpha = 0.05 based on whether or not 1 is in the interval
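
A Python sketch of the transformation, plugging in the apache estimate (0.2034) and standard error (0.0605) that appear in the PROC LOGISTIC output at the end of these notes: build the interval on the log-odds scale, then exponentiate both ends.

```python
import math

estimate = 0.2034  # apache coefficient (log odds ratio per 1-point increase)
se = 0.0605        # its standard error
z = 1.96           # multiplier for a 95% interval

# 95% CI on the log-odds scale, then exponentiate each piece
lower_beta, upper_beta = estimate - z * se, estimate + z * se
or_point = math.exp(estimate)
ci = (math.exp(lower_beta), math.exp(upper_beta))

print(round(or_point, 3), tuple(round(v, 3) for v in ci))  # 1.226 (1.089, 1.38)
print("interval contains 1:", ci[0] <= 1 <= ci[1])          # False
```

Because the interval excludes 1, this corresponds to rejecting Ho: OR = 1 at alpha = 0.05. Note that the interval is not symmetric around 1.226, since exponentiation stretches the upper end more than the lower end.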

Hypothesis test for an individual logistic regression coefficient
– Null and alternative hypotheses: Ho: $\beta_i = 0$, Ha: $\beta_i \neq 0$
– Test statistic: $\chi^2 = (\beta_i / SE)^2$ (Wald chi-square, 1 df), supplied by SAS
– p-values are supplied by SAS
– If p < 0.05: "There is a statistically significant association between the predictor and the outcome variable, controlling for all other covariates," at alpha = 0.05
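
The Wald statistic and its p-value can be reproduced in Python from the apache estimate and standard error in the PROC LOGISTIC output. The sketch relies on the fact that a 1-df chi-square tail probability equals the two-sided standard normal tail, which `math.erfc` gives without any extra libraries.

```python
import math


def wald_test(estimate, se):
    """Wald chi-square (1 df) and p-value for Ho: beta = 0."""
    z = estimate / se
    chi_square = z ** 2
    # P(chi-square with 1 df > z^2) = P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return chi_square, p_value


chi_sq, p = wald_test(0.2034, 0.0605)  # apache estimate and SE
print(round(chi_sq, 2), p < 0.05)      # 11.3 True
```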

PROC LOGISTIC
  PROC LOGISTIC DATA = dataset;
    MODEL outcome = list of x variables;
  RUN;
The CLASS statement allows for categorical variables with many groups (> 2).

DATA temp;
  INPUT apache death;
  /* PROC LOGISTIC models the lowest response value by default: deaths = 1 */
  xdeath = 2;
  IF death = 1 THEN xdeath = 1;
DATALINES;
;
PROC LOGISTIC DATA=temp;
  MODEL xdeath = apache;
RUN;

The LOGISTIC Procedure

Model Information
  Data Set                    WORK.TEMP
  Response Variable           xdeath
  Number of Response Levels   2
  Number of Observations      39
  Model                       binary logit
  Optimization Technique      Fisher's scoring

Response Profile
  Ordered Value   xdeath   Total Frequency
  1               1        …
  2               2        …

Probability modeled is xdeath=1.

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates
  Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept   1    …          …                …                 …
  apache      1    0.2034     0.0605           …                 …

Odds Ratio Estimates
  Effect   Point Estimate   95% Wald Confidence Limits
  apache   EXP(0.2034)      EXP(0.2034 − 1.96 × 0.0605)   EXP(0.2034 + 1.96 × 0.0605)

TOMHS – bpstudy SAS dataset
Variable CLINICAL (1 = yes, 0 = no) indicates whether the patient had a CVD event.
Run logistic regression separately for age and gender to determine whether:
– Age is related to CVD
  – What is the odds ratio associated with a 1-year increase in age?
  – What is the odds ratio associated with a 5-year increase in age?
– Gender is related to CVD
  – What is the odds ratio of CVD (women versus men)?
Run logistic regression for age and gender together.
Note: download the dataset from the web page or use the dataset on SATURN.