Regression: Checking the Model Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.



Objectives of session
Recognise the need to check fit of the model
Carry out checks of assumptions in SPSS for simple linear regression
Understand predictive model
Understand residuals

How is the fitted line obtained?
Use the method of least squares (LS)
Seek to minimise the squared vertical differences between each point and the fitted line
This results in parameter estimates, or regression coefficients, of slope (b) and intercept (a): y = a + bx
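The least squares estimates have a closed form for a single predictor, which can be sketched outside SPSS. The following is a minimal illustration, assuming NumPy is available; the function name `fit_line` and the five data points are made up for the example and are not from the LDL study.

```python
import numpy as np

def fit_line(x, y):
    """Return (a, b) minimising the sum of squared vertical distances
    between the points and the line y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation in x
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    a = y_mean - b * x_mean
    return a, b

# Made-up illustrative data (NOT the LDL data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
a, b = fit_line(x, y)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

The same estimates should come out of any standard routine, for example `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second.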

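Consider the fitted line y = a + bx
[Figure: fitted line with the explanatory variable (x) on the horizontal axis and the dependent variable (y) on the vertical axis; a is labelled on the plot]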

Consider the regression of minimum LDL cholesterol achieved on age
Select Regression, then Linear…
Dependent (y): Min LDL achieved
Independent (x): Age_Base

Output from SPSS linear regression
N.B. The slope for age may look very small, but it represents the DECREASE in LDL achieved for each one-unit increase in age, i.e. ONE year.
[Table: Coefficients. Columns: Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig. Rows: (Constant), Age at baseline. Dependent variable: Min LDL achieved.]

Output from SPSS linear regression
H0: slope b = 0
Test t = slope/se, where se = 0.002, with p < 0.001, so the slope is statistically significant
Predicted LDL = a + b × Age, using the fitted intercept a and slope b
[Table: Coefficients, as on the previous slide. Dependent variable: Min LDL achieved.]
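The test of H0: slope = 0 divides the estimated slope by its standard error. A minimal sketch of that calculation follows, assuming NumPy; the function name `slope_t_statistic` and the simulated data are illustrative assumptions, and the p-value uses a large-sample normal approximation rather than the t distribution with n - 2 degrees of freedom that SPSS reports.

```python
import numpy as np
from statistics import NormalDist

def slope_t_statistic(x, y):
    """Return (slope, standard error of slope, t = slope / se)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    a = y.mean() - b * x.mean()
    residuals = y - (a + b * x)
    # Residual variance uses n - 2 because two parameters were estimated
    s2 = np.sum(residuals ** 2) / (n - 2)
    se_b = np.sqrt(s2 / sxx)
    return b, se_b, b / se_b

# Simulated data (NOT the LDL data): a weak negative trend plus noise
rng = np.random.default_rng(0)
x = rng.uniform(40, 80, 400)
y = 5.0 - 0.03 * x + rng.normal(0, 1.0, 400)
b, se_b, t_stat = slope_t_statistic(x, y)
# Large-sample approximation to the two-sided p-value (assumption: n is large)
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
print(f"slope = {b:.4f}, se = {se_b:.4f}, t = {t_stat:.2f}, approx p = {p_approx:.4g}")
```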

Prediction Equation from linear regression
Predicted LDL achieved = a + b × Age, with slope b = -0.008
So for a man aged 65 the predicted LDL achieved = a - 0.008 × 65
[Table: Age against Predicted Min LDL]
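The prediction equation is easy to wrap in a function. In the sketch below only the slope of -0.008 comes from the slide; the intercept is not recoverable from this transcript, so `HYPOTHETICAL_INTERCEPT` is a placeholder chosen purely so the code runs, and the function name `predict_min_ldl` is likewise an assumption.

```python
def predict_min_ldl(age, intercept, slope=-0.008):
    """Predicted minimum LDL achieved from a fitted line: intercept + slope * age."""
    return intercept + slope * age

# Placeholder only: NOT the fitted intercept from the SPSS output
HYPOTHETICAL_INTERCEPT = 2.0

for age in (45, 55, 65, 75):
    print(age, round(predict_min_ldl(age, HYPOTHETICAL_INTERCEPT), 3))

# The difference between two ages depends only on the slope, not the intercept
ten_year_change = predict_min_ldl(75, 0.0) - predict_min_ldl(65, 0.0)
print("change in predicted LDL over 10 years of age:", round(ten_year_change, 3))
```

Whatever the true intercept is, a ten-year difference in age shifts the prediction by ten times the slope, which is why a slope that looks very small can still matter.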

Assumptions of Regression
1. Relationship is linear
2. Outcome variable, and hence residuals or error terms, are approx. Normally distributed

Use Graphs and Scatterplot to obtain the Lowess line of fit

1. Create the scatterplot, then double-click it to enter the chart editor
2. Choose the icon ‘Add fit line at total’
3. Then select the type of fit, such as Lowess

Linear assumption: fitted Lowess smoothed line
The Lowess smoothed line (red) gives a good eyeball check of the linear assumption, by comparison with the fitted straight line (green)
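To show what a Lowess-type line does, here is a simplified local linear smoother with tricube weights. It is a sketch under stated assumptions: it omits the robustness iterations of full Lowess, it is not the algorithm SPSS uses internally, and the name `simple_lowess`, the `frac` parameter and the simulated data are all illustrative.

```python
import numpy as np

def simple_lowess(x, y, frac=0.5):
    """Local linear smoother with tricube weights (no robustness iterations)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # points used in each local fit
    design = np.column_stack([np.ones(n), x])
    smoothed = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest point (guarded against zero)
        h = max(np.sort(dist)[k - 1], np.finfo(float).eps)
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, 1.0) ** 3  # tricube weights
        sw = np.sqrt(w)
        # Weighted least squares for a local line around x[i]
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        smoothed[i] = coef[0] + coef[1] * x[i]
    return smoothed

# Simulated data with a truly linear trend (NOT the LDL data)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(40, 80, 150))
y = 5.0 - 0.03 * x + rng.normal(0, 0.3, 150)
slope, intercept = np.polyfit(x, y, 1)
gap = np.max(np.abs(simple_lowess(x, y) - (intercept + slope * x)))
print(f"largest gap between smoother and straight line: {gap:.3f}")
```

When the relationship really is linear, the smoothed curve should stay close to the straight line; a systematic bend away from it suggests the linear assumption is doubtful.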

Definition of a residual
A residual is the difference between the actual value and the predicted value (fitted line), i.e. the unexplained variation:
r_i = y_i - E(y_i)
or
r_i = y_i - (a + b x_i)
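The definition translates directly into code. The sketch below, assuming NumPy and using simulated data rather than the LDL data, computes each residual as observed minus fitted and shows that least squares residuals average to zero.

```python
import numpy as np

# Simulated data (NOT the LDL data)
rng = np.random.default_rng(2)
x = rng.uniform(40, 80, 100)
y = 5.0 - 0.03 * x + rng.normal(0, 0.5, 100)

b, a = np.polyfit(x, y, 1)   # polyfit returns slope first, then intercept
fitted = a + b * x           # E(y_i) under the fitted line
residuals = y - fitted       # r_i = y_i - (a + b * x_i)

print(f"mean of residuals: {residuals.mean():.2e}")
print(f"largest absolute residual: {np.abs(residuals).max():.3f}")
```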

Residuals

To assess the residuals in SPSS linear regression, select Plots…
Plot the normalised (standardised) residual against the normalised (standardised) predicted value of LDL
Select histogram of residuals and normal probability plot

In SPSS linear regression, select Statistics…
Select confidence intervals for regression coefficients
Select Model fit
Select Durbin-Watson for serial correlation, and Casewise diagnostics for identification of outliers

Output: Scatterplot of residuals vs. predicted
Note:
1) Mean of residuals = 0
2) Most of the data lie within ±3 SDs of the mean

Assumptions of Regression
1. Relationship is linear
2. Outcome variable, and hence residuals or error terms, are approx. Normally distributed

Output: Histogram of standardised residuals
Plot of residuals with normal curve superimposed

Output: Cumulative probability plot
Look for deviation from the diagonal line to indicate non-normality
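The points on a cumulative probability (P-P) plot can be computed by hand. This sketch assumes NumPy and the standard library; the plotting position (i - 0.5)/n is one common choice and may differ from the SPSS default, and the function name `pp_plot_points` and both samples are illustrative.

```python
import numpy as np
from statistics import NormalDist

def pp_plot_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    r = np.sort(np.asarray(residuals, dtype=float))
    z = (r - r.mean()) / r.std(ddof=1)           # standardise the residuals
    n = len(z)
    observed = (np.arange(1, n + 1) - 0.5) / n   # empirical cumulative probability
    expected = np.array([NormalDist().cdf(v) for v in z])  # normal cumulative probability
    return observed, expected

rng = np.random.default_rng(4)
samples = {
    "normal": rng.normal(0, 1, 500),       # should hug the diagonal
    "skewed": rng.exponential(1.0, 500),   # should bow away from it
}
deviation = {}
for name, sample in samples.items():
    obs, exp = pp_plot_points(sample)
    deviation[name] = float(np.max(np.abs(obs - exp)))
    print(f"{name}: largest distance from the diagonal = {deviation[name]:.3f}")
```

Plotting `expected` against `observed` gives the P-P plot; the largest distance from the diagonal is a crude numerical summary of what the eye looks for.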

Output: Description of residuals
Descriptive statistics for residuals
Subjects with standardised residuals > 3: worth investigation?
[Table: Casewise Diagnostics. Columns: Case Number, Std. Residual, Min LDL, Predicted, Residual. Dependent variable: Min LDL achieved.]
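A casewise listing of this kind can be approximated as follows. The sketch assumes NumPy and standardises each residual by the standard error of the estimate; the function name `casewise_outliers`, the threshold argument and the data (with one outlier planted at case 15) are made up for illustration.

```python
import numpy as np

def casewise_outliers(x, y, threshold=3.0):
    """List cases whose standardised residual exceeds the threshold in absolute value.

    Each row is (case index, std. residual, observed, predicted, residual)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)
    predicted = a + b * x
    residual = y - predicted
    s = np.sqrt(np.sum(residual ** 2) / (len(x) - 2))  # std. error of the estimate
    std_residual = residual / s
    flagged = np.flatnonzero(np.abs(std_residual) > threshold)
    return [(int(i), float(std_residual[i]), float(y[i]),
             float(predicted[i]), float(residual[i])) for i in flagged]

# Made-up data (NOT the LDL data) with one planted outlier at case 15
x = np.arange(30.0)
y = 1.0 + 0.5 * x + 0.2 * np.sin(np.arange(30))
y[15] += 5.0

rows = casewise_outliers(x, y)
for case, z, observed, predicted, residual in rows:
    print(f"case {case}: std. residual {z:.2f}, observed {observed:.2f}, "
          f"predicted {predicted:.2f}, residual {residual:.2f}")
```

A flagged case is a prompt to check the record, not an automatic reason to delete it.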

Output: Model fit and serial correlation
R: correlation between min LDL achieved and age at baseline
R²: % variation explained, here 1.5%, not particularly high
Durbin-Watson test: the statistic should be approximately 2 if there is no serial correlation of residuals
[Table: Model Summary. Columns: R, R Square, Adjusted R Square, Std. Error of the Estimate, Durbin-Watson. Predictors: (Constant), Age at baseline.]
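Both summary statistics are short formulas on the residuals. The sketch below assumes NumPy; the function names and the simulated data are illustrative, and the data are generated with independent errors so the Durbin-Watson statistic should come out near 2.

```python
import numpy as np

def r_squared(y, fitted):
    """Proportion of the variation in y explained by the fitted values."""
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Simulated data (NOT the LDL data): weak trend, independent errors
rng = np.random.default_rng(5)
x = rng.uniform(40, 80, 500)
y = 5.0 - 0.03 * x + rng.normal(0, 1.0, 500)
b, a = np.polyfit(x, y, 1)
fitted = a + b * x
r2 = r_squared(y, fitted)
dw = durbin_watson(y - fitted)
print(f"R squared = {r2:.3f}, Durbin-Watson = {dw:.2f}")
```

Values of the Durbin-Watson statistic well below 2 point to positive serial correlation, and values well above 2 to negative serial correlation. The statistic is only meaningful when the cases have a natural order, such as time.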

Summary
After fitting any regression model, check assumptions:
Functional form: linearity is the default, but often not the best fit; consider quadratic…
Check residuals for approx. normality
Check residuals for outliers (> 3 SDs)
All accomplished within SPSS

Practical on Model Checking
Read in ‘LDL Data.sav’
1) Fit an age squared term in the min LDL model and check the fit of the model compared to the linear fit (Hint: use Transform/Compute to create the age squared term, and fit age and age²)
2) Fit separate linear regressions of min Chol achieved with predictors of:
a) baseline Chol
b) APOE_lin
c) adherence
Check assumptions and interpret results
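The first exercise amounts to adding a squared column and comparing the fit. A minimal sketch follows, assuming NumPy; it uses simulated data because ‘LDL Data.sav’ is not available here, and the names `fit_and_r2`, `age` and `min_ldl` are hypothetical stand-ins for the SPSS variables.

```python
import numpy as np

def fit_and_r2(columns, y):
    """Least squares fit of y on an intercept plus the given columns.

    Returns (coefficients with the intercept first, R squared)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Simulated stand-in for the practical data (NOT the LDL data)
rng = np.random.default_rng(3)
age = rng.uniform(40, 80, 300)
min_ldl = 3.0 - 0.01 * age + rng.normal(0, 0.5, 300)

age_sq = age ** 2   # the equivalent of Transform/Compute in SPSS
coef_lin, r2_lin = fit_and_r2([age], min_ldl)
coef_quad, r2_quad = fit_and_r2([age, age_sq], min_ldl)
print(f"linear R squared = {r2_lin:.4f}, quadratic R squared = {r2_quad:.4f}")
```

R squared can never fall when a term is added, so a small rise on its own does not show that the quadratic model is better; the significance of the age² coefficient and the residual plots are the more useful checks.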