Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 13: Multiple, Logistic and Proportional Hazards Regression.

Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 13: Multiple, Logistic and Proportional Hazards Regression Marshall University Genomics Core Facility

Multiple Regression

In linear regression, we had one independent variable and one dependent (outcome) variable.
- In lab experiments this is fairly common: the investigator manipulates the value of one variable and keeps everything else the same.
In some lab experiments, and in most observational studies, there is more than one independent variable.
- Multiple regression is used for these scenarios.
- "Multiple regression" really refers to a collection of different techniques.

Aims of Multiple Regression

Quantifying the effect of one variable of interest while adjusting for the effects of other variables:
- Very common in observational studies, where the other variables change outside the control of the investigator.
- These other variables are often called covariates.
Creating an equation which is useful for predicting the value of the outcome variable given the values of the various independent variables:
- For example, predict the probability of cancer recurrence after surgery alone given characteristics of the tumor (grade, stage, etc.) and of the patient (age, height, weight, etc.).
- Such a prediction might be used to decide whether or not to use chemotherapy in addition to surgery.
Developing a scientific understanding of the impact of several variables on the outcome.

Types of Multiple Regression

We will look at the following types of multiple regression (there are many others):
- Multiple Linear Regression: the dependent variable is a linear function of the independent variables.
- Logistic Regression: the outcome variable is binary (dichotomous, or categorical with two possible outcomes); the log odds ratio of the outcome is modeled as a function of the independent variables.
- Proportional Hazards Regression: used when the outcome is the elapsed time to a non-recurring event; effectively used to compute the effect of independent variables on a survival curve.

Multiple Linear Regression

Multiple linear regression finds the linear equation which best predicts an outcome variable, Y, from multiple independent variables X1, X2, …, Xk.

Example (from Motulsky): Lead Exposure and Kidney Function
- Staessen et al. (1992) investigated the relationship between lead concentration in the blood and kidney function, with kidney function measured by creatinine clearance.
- This was an observational study of 965 men.
- A naive approach would be to measure lead concentration and creatinine clearance and analyze just those two variables.
- However, kidney function is known to decrease with age, and lead accumulates in the blood over time. Age is therefore a confounding variable and must be accounted for.

Multiple Regression Model

The model Staessen et al. used was

Yi = β0 + β1·Xi,1 + β2·Xi,2 + β3·Xi,3 + β4·Xi,4 + β5·Xi,5 + εi

where the variables are:
- Yi: creatinine clearance of subject i
- Xi,1: log(serum lead) of subject i
- Xi,2: age of subject i
- Xi,3: body mass of subject i
- Xi,4: log(GGT) of subject i (a measure of liver function)
- Xi,5: 1 if subject i had previously taken diuretics, 0 otherwise
- εi: random scatter

Multiple Regression Parameters

The β values in the equation are the parameters of the model. They:
- do not vary from data point to data point,
- are values associated with the population, and
- will be estimated from the data.
Note that one of the variables (Xi,5) is categorical, and we use a "dummy variable" in its place.

What multiple regression does

Multiple linear regression finds values for the parameters that make the model predict the actual data as well as possible.
- Estimates for β0, …, β5 are usually denoted b0, …, b5.
- Software performing the regression will report the best estimate for each parameter, a confidence interval and p-value for each estimate, and an R² value for the model.
- The null hypothesis for each p-value is that the corresponding variable provides no information to the model, i.e. that the parameter is zero.
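As a rough illustration of what regression software does internally, the least-squares parameter estimates can be found by solving the normal equations. This is a minimal pure-Python sketch on a made-up dataset (it is not the Staessen et al. data, and real software also reports standard errors, confidence intervals, and p-values, which are omitted here):

```python
# Minimal sketch of multiple linear regression (ordinary least squares)
# via the normal equations, on a small hypothetical dataset.

def ols_fit(X, y):
    """Solve (X'X) b = X'y by Gauss-Jordan elimination.
    X is a list of predictor rows; a leading 1 is added for the intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Build X'X and X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    A = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic data generated exactly from y = 2 + 3*x1 - 1*x2,
# so the fit should recover those coefficients.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
b0, b1, b2 = ols_fit(X, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # 2.0 3.0 -1.0
```

Because the toy outcome was built with no random scatter, the estimates match the true parameters exactly; with real data the b values only approximate the population β values.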

Interpreting the Coefficients

The coefficients can be interpreted in a similar way to the slope estimate in simple linear regression: each represents the change in the dependent variable for a one unit increase in the corresponding independent variable, keeping all the other independent variables fixed.

In the example, b1 (the estimate for log(lead concentration)) was -9.5 ml/min, with a 95% CI of [-18.1, -0.9]. This means that for every one unit increase in log(lead concentration), creatinine clearance decreased by 9.5 ml/min on average, if all other variables were kept fixed.

Statistical Significance of the Coefficients

A one unit increase in log(lead concentration) means a 10-fold increase in lead concentration. So the average decrease in creatinine clearance corresponding to a 10-fold increase in lead concentration was 9.5 ml/min, and the 95% confidence interval for the decrease was 0.9 ml/min to 18.1 ml/min.
- Since the 95% CI does not contain 0, the p-value for this coefficient must be less than 0.05. This is the p-value for the null hypothesis that the coefficient is zero.
- Alternatively, think of this as a comparison of models: compare the full model (including this variable) to the model without this variable.

Interpreting coefficients for "dummy variables"

One of the variables in the model was really a binary variable: has the subject previously taken diuretics? This was coded as 0 for no and 1 for yes.
The estimate for the coefficient for this variable was -8.8 ml/min.
- An increase of one unit in this variable results in a decrease in creatinine clearance of 8.8 ml/min, on average.
- Since the only possible values are 0 and 1, this means that participants who had previously taken diuretics had an average creatinine clearance 8.8 ml/min lower than those who had not, if all other variables are held equal.
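A small sketch of how a 0/1 dummy variable enters the fitted equation. The -8.8 ml/min coefficient is the lecture's value, but the intercept and age coefficient below are hypothetical, chosen only for illustration:

```python
# Toy fitted equation with a dummy variable for prior diuretic use.
# b_diur = -8.8 is from the lecture; b0 and b_age are hypothetical.
def predicted_clearance(age, took_diuretics, b0=120.0, b_age=-0.5, b_diur=-8.8):
    """clearance = b0 + b_age*age + b_diur*dummy, dummy coded 0/1."""
    return b0 + b_age * age + b_diur * (1 if took_diuretics else 0)

# Two subjects identical except for the dummy variable:
no_diur = predicted_clearance(60, False)
yes_diur = predicted_clearance(60, True)
print(round(no_diur - yes_diur, 1))  # 8.8 -- the coefficient is the group difference
```

Because the dummy can only be 0 or 1, the coefficient is exactly the adjusted difference between the two groups, holding the other predictors fixed.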

Interpreting the R² value for the model

Multiple linear regression reports an R² value.
- For our example, R² is 0.27. This means that 27% of the variation in creatinine clearance is accounted for by the model; the remaining 73% is due to random scatter, or is associated with variables not included in the model.
Unlike simple linear regression, we cannot plot a graph of the model. One approach to visualizing the model is to plot the predicted outcome variable from the model against the actual measured value.
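R² itself is straightforward to compute from the actual and predicted values: it is one minus the ratio of residual to total sum of squares. A sketch with made-up numbers (not the lead-study data):

```python
# R^2 = 1 - SS_res / SS_tot: the fraction of outcome variance
# accounted for by the model's predictions. Data are hypothetical.
def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    ss_tot = sum((y - mean_y) ** 2 for y in actual)          # total variation
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained
    return 1 - ss_res / ss_tot

actual    = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]  # hypothetical model output
print(round(r_squared(actual, predicted), 2))  # 0.8
```

The actual-versus-predicted scatter plot mentioned above is the graphical counterpart of this number: the tighter the points hug the diagonal, the higher R².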

Multiple Linear Regression Plot (figure not reproduced)

Variable Selection

The authors of the article collected much more data, and stated that the other variables did not improve the fit of the model.
- Adding additional parameters will almost always increase the R² value.
- Use the sum-of-squares F test explained earlier to test whether there really is an improvement in the model.
- Beware of overfitting (explained later).

Logistic Regression

Logistic regression is used when the outcome variable is binary, i.e. categorical with two possible outcomes.
The general idea is to build a multiple linear model with the outcome variable being the log of the odds ratio:
- We build a model predicting the log of the odds of one of the two outcomes from the independent variables.
- The parameters describe the change in the log odds when the corresponding variable changes by one unit.
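The idea above can be sketched directly: the linear predictor gives the log odds, and the logistic function converts the log odds back into a probability. All coefficients here are hypothetical, for illustration only:

```python
import math

# Logistic model sketch: log odds is linear in the predictors;
# probability = 1 / (1 + exp(-log_odds)). Coefficients are hypothetical.
def probability(x, betas):
    """betas[0] is the intercept; betas[1:] pair with the predictors in x."""
    log_odds = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical model: intercept -4.0, smoking dummy 1.2, BMI 0.07.
p = probability([1.0, 25.0], betas=[-4.0, 1.2, 0.07])  # a smoker with BMI 25
print(round(p, 3))  # 0.259
```

Note the model is still "linear", just on the log-odds scale; on the probability scale the relationship is S-shaped.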

Logistic Regression Example

We performed chart reviews on 99 post-menopausal women, and ran a logistic regression for an outcome of diabetes, with age at menopause, smoking status, and BMI as independent variables.

Logistic Regression Results (output tables not reproduced)

Interpreting Logistic Regression Results

The "Model Summary" box describes how well the model fits the data.
- The -2 log likelihood is computed from the likelihood of our observed data given the model. Since a likelihood must be between 0 and 1, this value is always positive, and a small value means a better fit. (Our data do not fit the model well.)
- R² cannot be calculated in the same way for logistic regression. The remaining two values give two alternative approaches, and the interpretation for these is similar to a regular R². Again, our data do not fit the model well.
The "Classification Table" describes the accuracy of using the model as a predictor: use the independent variables to compute the predicted odds for each subject, and predict the class based on the most likely outcome.
- Note that adding more variables will always improve this apparent accuracy; predictive accuracy should really be tested on an independent data set.

Interpreting the Logistic Regression Parameters

The "Variables in the Equation" box gives the parameter estimates, 95% CIs, and p-values.
- The parameter for Smoking is about 1.20 (the natural logarithm of 3.335). A one-unit increase in the smoking variable results in an increase of about 1.20 in the log odds ratio; since the logs here are natural logs, the increase in odds ratio is e^β = 3.335-fold.
- Smoking is a dummy variable, so a smoker has about 3.3 times the odds of becoming diabetic compared to a non-smoker.
- The parameter for BMI is 0.072; e^0.072 = 1.075, so an increase of one unit in BMI results in a 1.075-fold increase in the odds ratio of being diabetic.
- The p-values and 95% CIs show that the parameter for smoking is statistically significant.
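The coefficient-to-odds-ratio conversion above is just exponentiation. The 3.335 and 1.075 values are those quoted in the lecture; everything else is arithmetic on them:

```python
import math

# Converting logistic-regression coefficients to odds ratios: OR = e^beta.
# 3.335 (smoking) and 1.075 (BMI, from beta = 0.072) are the lecture's values;
# the smoking coefficient is back-calculated as ln(3.335).
beta_smoking = math.log(3.335)
print(round(beta_smoking, 3))            # 1.205 -- coefficient on the log-odds scale
print(round(math.exp(0.072), 3))         # 1.075 -- per-unit odds ratio for BMI
# Coefficients add on the log-odds scale, so odds ratios multiply:
# a 5-unit BMI increase multiplies the odds by exp(0.072 * 5) = 1.075**5.
print(round(math.exp(0.072 * 5), 3))     # 1.433
```

This multiplicative behavior is why logistic-regression results are usually reported as odds ratios rather than raw coefficients.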

Mathematical Model for Logistic Regression

The mathematical setup for logistic regression is:

log(ORi) = β0 + β1·Xi,1 + … + βk·Xi,k

where ORi is the odds ratio for subject i and Xi,j is the value of variable j for subject i.

For our model, writing S for the smoking variable and B for BMI, the estimates give:

log(OR) = β0 + βS·S + βB·B
OR = e^(β0 + βS·S + βB·B) = e^β0 · e^(βS·S) · e^(βB·B) = e^β0 · 3.335^S · 1.075^B

Proportional Hazards Regression

Proportional hazards regression is used when the outcome is the elapsed time to a non-recurring event, i.e. the same basic scenario as for survival analysis.
- We previously compared the survival rates of two groups using the Mantel-Cox test.
- That comparison computed the hazard ratio between the two groups.

Proportional Hazards extends the Mantel-Cox test

In proportional hazards regression, we estimate the effect of multiple factors on the hazard ratio. This can be used to correct the hazard ratio for confounding variables.
Short et al. (2012) compared survival curves for two different treatments of COPD:
- They computed a "crude" hazard ratio using Mantel-Cox, and then a hazard ratio corrected for covariates (confounding variables).
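Under the Cox model the hazard is h(t | x) = h0(t) · exp(β1·x1 + … + βk·xk), so the hazard ratio between two covariate patterns depends only on the coefficients, not on time or on the baseline hazard h0(t). A sketch of that ratio with hypothetical coefficients (these are not Short et al.'s values):

```python
import math

# Cox proportional-hazards sketch: h(t|x) = h0(t) * exp(sum(b_j * x_j)).
# The baseline hazard h0(t) cancels in any hazard ratio, so the ratio
# needs only the linear predictors. Coefficients below are hypothetical.
def hazard_ratio(x_a, x_b, betas):
    lp_a = sum(b * x for b, x in zip(betas, x_a))
    lp_b = sum(b * x for b, x in zip(betas, x_b))
    return math.exp(lp_a - lp_b)

# Treatment dummy plus an age covariate; same age, different treatment,
# giving the covariate-adjusted hazard ratio for treatment:
hr = hazard_ratio([1, 65], [0, 65], betas=[-0.40, 0.02])
print(round(hr, 2))  # 0.67 -- exp(-0.40): treatment lowers the hazard
```

This is the sense in which the hazards are "proportional": the ratio between the two groups is the same constant at every time point.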

Summary

Multiple linear regression fits a dependent variable as a linear model of multiple independent variables.
- It provides parameter estimates for each independent variable, along with confidence intervals and p-values.
- The null hypothesis for each p-value is that the variable doesn't contribute to the model.
- It is used for finding the effect of a variable while correcting for confounding variables.
Logistic regression is used when the dependent variable is binary.
- It models the log odds ratio as a linear function of the independent variables.
- The parameters are the increase in log odds ratio per unit increase in the corresponding independent variable.