Logistic Regression


Logistic Regression

Outline
Review of simple and multiple regression
Simple logistic regression
 - The logistic function
 - Interpretation of coefficients: continuous predictor (X); dichotomous categorical predictor (X); categorical predictor with three or more levels (X)
Multiple logistic regression
Examples

Simple Linear Regression
Model the mean of a numeric response Y as a function of a single predictor X, i.e. E(Y|X) = β0 + β1 f(X). Here f(X) is any function of X, e.g.
f(X) = X gives E(Y|X) = β0 + β1 X (a line)
f(X) = ln(X) gives E(Y|X) = β0 + β1 ln(X) (a curve)
The key is that E(Y|X) is linear in the parameters β0 and β1, but not necessarily in X.
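As a concrete sketch of "linear in the parameters, curved in X", here is a tiny hand-rolled least-squares fit on made-up data (the data, the helper function, and the coefficient values 2 and 3 are all invented for illustration, not from the lecture):

```python
import math

def fit_line(x, y):
    """Closed-form simple least squares: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1

# Hypothetical data lying exactly on E(Y|X) = 2 + 3*ln(X)
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 + 3.0 * math.log(x) for x in xs]

# Fitting Y against ln(X) is still ordinary linear regression,
# because the model is linear in b0 and b1.
b0, b1 = fit_line([math.log(x) for x in xs], ys)
```

Because the transformed model is linear in its parameters, the same least-squares machinery applies unchanged.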

Simple Linear Regression
β̂0 = estimated intercept: the estimated value of Y at x = 0. This is interpretable only if x = 0 is a value of particular interest.
β̂1 = estimated slope: the estimated change in the mean of Y for every one-unit increase in X. This is always interpretable.

Multiple Linear Regression
We model the mean of a numeric response as a linear combination of the predictors themselves, or of functions of the predictors:
E(Y|X) = β0 + β1 X1 + β2 X2 + … + βp Xp, where the terms in the model are the p predictors themselves, or
E(Y|X) = β0 + β1 f1(X) + β2 f2(X) + … + βk fk(X), where the terms are k different functions of the p predictors.

Multiple Linear Regression
For the classic multiple regression model
E(Y|X) = β0 + β1 X1 + β2 X2 + … + βp Xp,
the regression coefficients β̂i represent the estimated change in the mean of the response Y associated with a one-unit change in Xi while the other predictors are held constant. They measure the association between Y and Xi adjusted for the other predictors in the model.
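A minimal sketch of fitting such a model by least squares, using hypothetical noiseless data (the coefficient values 1, 2, and -0.5 are invented for illustration):

```python
import numpy as np

# Hypothetical data generated from E(Y|X) = 1 + 2*X1 - 0.5*X2, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

# Design matrix with an intercept column; lstsq solves for (b0, b1, b2).
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
```

With noiseless data the fit recovers the generating coefficients exactly; each slope is the change in E(Y|X) per unit change in that predictor, holding the other fixed.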

Generalized Linear Models
A family of regression models:
  Response          Model type
  Continuous        Linear regression
  Counts            Poisson regression
  Survival times    Cox model
  Binomial          Logistic regression
Uses:
 - Control for potentially confounding factors
 - Model building, risk prediction

Logistic Regression
Models the relationship between a set of explanatory variables Xi, which may be:
 - dichotomous (yes/no, smoker/nonsmoker, …)
 - categorical (social class, race, …)
 - continuous (age, weight, gestational age, …)
and a dichotomous categorical response variable Y, e.g. Success/Failure, Remission/No Remission, Survived/Died, CHD/No CHD, Low Birth Weight/Normal Birth Weight, etc.

Logistic Regression
Example: Coronary Heart Disease (CD) and Age. In this study, sampled individuals were examined for signs of CD (present = 1 / absent = 0), and the potential relationship between this outcome and their age (in years) was considered. [A portion of the raw data for the 100 subjects who participated in the study was shown here.]

Logistic Regression
How can we analyze these data? One option is a non-pooled t-test: the mean age of the individuals with some signs of coronary heart disease is … years vs. … years for individuals without signs (t = 5.95, p < .0001).

Logistic Regression
Simple linear regression? A smooth regression estimate? The smooth regression estimate is "S-shaped", but what does the estimated mean value represent? Answer: P(CD | Age)!

Logistic Regression We can group individuals into age classes and look at the percentage/proportion showing signs of coronary heart disease. Notice the “S-shape” to the estimated proportions vs. age.

Logistic Function
[Figure: P("Success" | X) plotted against X, showing the S-shaped logistic curve.]

Logit Transformation
The logistic regression model is given by
P(Y = 1 | X) = e^(β0 + β1X) / (1 + e^(β0 + β1X)),
which is equivalent to
ln[ P(Y = 1 | X) / (1 − P(Y = 1 | X)) ] = β0 + β1X.
This is called the logit transformation.
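The transformation and its inverse can be sketched in a few lines; the values of β0, β1, and x below are hypothetical:

```python
import math

def logistic(b0, b1, x):
    """P(Y=1|X=x) = e^(b0+b1*x) / (1 + e^(b0+b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def logit(p):
    """ln(p / (1-p)): maps a probability back to the linear predictor."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and predictor value
b0, b1, x = -5.0, 0.11, 40.0
p = logistic(b0, b1, x)   # a probability strictly between 0 and 1
```

Applying the logit to the fitted probability recovers the linear predictor β0 + β1x exactly, which is why the model is "linear on the logit scale".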

Dichotomous Predictor
Consider a dichotomous predictor (X) which represents the presence of risk (1 = present, 0 = absent). Then
odds(Y = 1 | X = 1) = e^(β0 + β1) and odds(Y = 1 | X = 0) = e^(β0).
Therefore the odds ratio (OR) is
OR = e^(β0 + β1) / e^(β0) = e^(β1).

Dichotomous Predictor
Therefore, for the odds ratio associated with risk presence we have OR = e^(β1). Taking the natural logarithm, ln(OR) = β1; thus the estimated regression coefficient associated with a 0-1 coded dichotomous predictor is the natural log of the OR associated with risk presence!
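A quick numeric check of this identity, using a hypothetical 2x2 table (the cell counts are invented for illustration; with a single 0-1 predictor, the fitted slope equals the log of the OR computed directly from the table):

```python
import math

# Hypothetical 2x2 counts
a, b = 30, 70    # risk present: diseased, not diseased
c, d = 15, 85    # risk absent:  diseased, not diseased

or_hat = (a / b) / (c / d)   # odds ratio from the table
beta1 = math.log(or_hat)     # slope under 0-1 coding is ln(OR)
```

Exponentiating the coefficient returns the odds ratio, which is exactly how logistic regression output is interpreted for a 0-1 coded predictor.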

Logit is Directly Related to Odds
The logistic model can be written
ln[ p / (1 − p) ] = β0 + β1X.
This implies that the odds for success can be expressed as
p / (1 − p) = e^(β0 + β1X) = e^(β0) e^(β1X).
This relationship is the key to interpreting the coefficients in a logistic regression model!

Dichotomous Predictor (+1/-1 coding)
Consider a dichotomous predictor (X) which represents the presence of risk (+1 = present, −1 = absent). Then
odds(Y = 1 | X = +1) = e^(β0 + β1) and odds(Y = 1 | X = −1) = e^(β0 − β1).
Therefore the odds ratio (OR) is
OR = e^(β0 + β1) / e^(β0 − β1) = e^(2β1).

Dichotomous Predictor
Therefore, for the odds ratio associated with risk presence we have OR = e^(2β1). Taking the natural logarithm, ln(OR) = 2β1; thus twice the estimated regression coefficient associated with a +1/−1 coded dichotomous predictor is the natural log of the OR associated with risk presence!
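A small numeric comparison of the two codings, using an illustrative OR of 3.37 (the value from the age-at-first-pregnancy example): under 0-1 coding the slope is ln(OR), while under +1/−1 coding it is ln(OR)/2, so both codings imply the same odds ratio.

```python
import math

or_true = 3.37                      # illustrative odds ratio

beta_01 = math.log(or_true)         # slope under 0-1 coding
beta_pm = math.log(or_true) / 2.0   # slope under +1/-1 coding

# OR recovered from each coding:
or_from_01 = math.exp(beta_01)      # e^beta
or_from_pm = math.exp(2 * beta_pm)  # e^(2*beta)
```

This is why the coding convention matters when reading software output: the same data yield a coefficient half as large under +1/−1 coding.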

Example: Age at 1st Pregnancy and Cervical Cancer
Use Fit Model with Y = Disease Status and X = Risk Factor Status. When the response Y is a dichotomous categorical variable, the Personality box will automatically change to Nominal Logistic, i.e. logistic regression will be used. Remember that when a dichotomous categorical predictor is used, JMP uses +1/−1 coding. If you want, you can code the predictor as 0-1 and treat it as numeric.

Example: Age at 1st Pregnancy and Cervical Cancer
Thus the estimated odds ratio is OR = 3.37: women whose first pregnancy is at or before age 25 have 3.37 times the odds of developing cervical cancer compared to women whose first pregnancy occurs after age 25.

Example: Age at 1st Pregnancy and Cervical Cancer
[JMP output: the estimated odds ratio for disease associated with risk presence, OR = 3.37.]

Example: Smoking and Low Birth Weight
Use Fit Model with Y = Low Birth Weight (Low, Norm) and X = Smoking Status (Cig, NoCig). We estimate that women who smoke during pregnancy have 1.95 times higher odds of having a child with low birth weight than women who do not smoke cigarettes during pregnancy.

Example: Smoking and Low Birth Weight
Find a 95% CI for the OR:
1st: find a 95% CI for β1, giving (LCL, UCL).
2nd: because JMP uses +1/−1 coding, compute the CI for the OR as (e^(2·LCL), e^(2·UCL)).
We estimate that the odds of having a low-birth-weight infant are between 1.86 and 2.05 times higher for smokers than non-smokers, with 95% confidence.
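A sketch of that calculation, using a hypothetical slope estimate and standard error (the values of beta1 and se below are invented for illustration; in practice they would come from the fitted model output):

```python
import math

# Hypothetical coefficient estimate and standard error under +1/-1 coding
beta1, se = 0.334, 0.0125
z = 1.96                                    # 95% normal critical value

lcl_b, ucl_b = beta1 - z * se, beta1 + z * se   # CI for beta1
lcl_or, ucl_or = math.exp(2 * lcl_b), math.exp(2 * ucl_b)  # CI for the OR
point_or = math.exp(2 * beta1)                  # point estimate of the OR
```

The interval is built on the coefficient scale (where normality is a better approximation) and then transformed, so it is not symmetric around the point estimate of the OR.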

Example: Smoking and Low Birth Weight
We might want to adjust for other potential confounding factors in our analysis of the risk associated with smoking during pregnancy. This is accomplished by simply adding these covariates to our model, giving a multiple logistic regression model. Before looking at some multiple logistic regression examples, we need to look at how continuous predictors and categorical variables with 3 or more levels are handled in these models, and how the associated ORs are calculated.

Example 2: Signs of CD and Age
Fit Model with Y = CD (CD if signs present, No otherwise) and X = Age (years). Consider the risk associated with a c-year increase in age.

Example 2: Signs of CD and Age
For example, consider a 10-year increase in age and find the associated OR for showing signs of CD, i.e. c = 10:
OR = e^(cβ1) = e^(10 × 0.111) ≈ 3.03.
Thus we estimate that the odds of exhibiting signs of CD increase roughly threefold for each 10 years of age. Similar calculations can be done for other increments as well. For example, for a c = 1 year increase,
OR = e^(β1) = e^(0.111) ≈ 1.12, or about a 12% increase in odds per year.
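This arithmetic can be checked directly (β1 = 0.111 as given above; note that compounding the one-year OR over ten years reproduces the ten-year OR):

```python
import math

beta1 = 0.111                     # slope from the CD-and-age example

or_10yr = math.exp(10 * beta1)    # OR for a ten-year increase in age
or_1yr = math.exp(beta1)          # OR for a one-year increase in age
```

Because ORs multiply on the exponential scale, or_1yr ** 10 equals or_10yr; the per-year increase compounds rather than adds.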

Example 2: Signs of CD and Age
Can we assume that the increase in risk associated with a c-unit increase is constant throughout one's life? Is the increase in going from 20 to 30 years of age the same as going from 50 to 60 years? If that assumption is not reasonable, then one must be careful when discussing the risk associated with a continuous predictor.

Example 3: Race and Low Birth Weight
Calculate the odds for low birth weight (Low vs. Norm) for each race:
White infants (the reference group, missing in the parameter list): odds = 0.0805
Black infants: odds = 0.167
Other infants: odds = 0.102
OR for Blacks vs. Whites = 0.167/0.0805 = 2.07
OR for Others vs. Whites = 0.102/0.0805 = 1.27
OR for Blacks vs. Others = 0.167/0.102 = 1.637
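These ratios are straightforward to verify: each pairwise OR is just a ratio of group odds (the odds values are taken from the slide).

```python
# Group odds for low birth weight, as given on the slide
odds = {"White": 0.0805, "Black": 0.167, "Other": 0.102}

# Each pairwise odds ratio is the ratio of the two group odds
or_black_white = odds["Black"] / odds["White"]
or_other_white = odds["Other"] / odds["White"]
or_black_other = odds["Black"] / odds["Other"]
```

Note that the third OR is the ratio of the first two, which is why a model with one reference group determines all pairwise comparisons.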

Example 3: Race and Low Birth Weight
Finding these directly from the estimated parameters is cumbersome. JMP will compute the odds ratio for each possible comparison, along with the reciprocals in case those are of interest as well. The Odds Ratio column gives the odds of Low for Level 1 vs. Level 2; the Reciprocal column gives the odds of Low for Level 2 vs. Level 1. The ratios greater than 1 are the easiest to interpret here, as they represent increased risk.

Putting it all together
Now that we have seen how to interpret each of the variable types in a logistic model, we can consider multiple logistic regression models with all of these variable types included. We can then look at the risk associated with certain factors, adjusted for the other covariates included in the model.

Example 4: Smoking and Low Birth Weight
Consider again the risk associated with smoking, but this time adjusting for the potential confounding effects of the education level and age of the mother and father, the race of the child, the total number of prior pregnancies, the number of children born alive that are now dead, and the gestational age of the infant. Several terms are not statistically significant, and we could consider using backwards elimination to simplify the model.

Example 4: Smoking and Low Birth Weight
None of the mother- and father-related covariates entered into the final model. Adjusting for the included covariates, we find that smoking is statistically significant (p < .0001), and the odds ratio for low birth weight associated with smoking during pregnancy is … . Odds ratios for the other factors in the model can be computed as well, all of which can be prefaced by the "adjusting for …" statement.

Summary
In logistic regression the response (Y) is a dichotomous categorical variable. Exponentiating the parameter estimates gives the odds ratios associated with the variables in the model, and these odds ratios are adjusted for the other variables in the model. One can also calculate P(Y|X) if that is of interest, e.g. given the demographics of a mother, what is the estimated probability of her having a child with low birth weight?