Multiple Linear Regression


Multiple Linear Regression Linear regression with two or more predictor variables

Introduction Often in linear regression, you want to investigate the relationship between an outcome and more than one predictor variable. In this case, your model will contain more than one independent variable. It is also often important to investigate possible interactions between two or more independent variables.
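In general, a multiple linear regression model with k predictors takes the form (standard notation, not shown on the original slides):

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2)
```

Each coefficient \(\beta_j\) describes the change in the mean of Y for a one-unit increase in \(X_j\), holding the other predictors fixed.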

Consider the following situation: The file air.txt contains a subsample of data from a study of the effect of air pollution on lung function. The variables measured were age, gender, height, weight, forced vital capacity (FVC), and forced expiratory volume in 1 second (FEV1). FVC is the total volume of air in liters which an individual can expel, regardless of how long it takes. FEV1 is the volume of air expelled during the first second when an individual has been told to breathe in deeply and then expel as much air as possible. (Dunn and Clark (1987), Applied Statistics: Analysis of Variance and Regression, p. 354.)

Input the file air.txt into SAS with the following code (adjusting the location of the file as necessary):

DATA air;
  INFILE 'C:\air.txt' dlm = ' ' firstobs = 2;
  INPUT sex age height weight fvc fev1;
  height_age = height*age;
RUN;

The assignment statement creates a new variable, height_age, which represents the interaction between height and age.
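To confirm the data were read in correctly, you can print the first few observations before any analysis (a quick check; OBS= is a standard SAS data set option):

```sas
PROC PRINT DATA = air (OBS = 5);
RUN;
```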

Exploring the Data We are interested in what factors may predict FVC. In order to explore this before analyzing the data, create two plots: one of FVC vs. height; the other of FVC vs. age:

PROC GPLOT DATA = air;
  PLOT fvc * height;
  PLOT fvc * age;
RUN;

Plot of FVC * Height

Plot of FVC * Age

It appears a linear relationship is justified between FVC and height, although it is unclear whether a linear relationship exists between FVC and age. Create a multiple linear regression model using both height and age to predict FVC:

PROC REG DATA = air;
  MODEL fvc = height age;
RUN;
QUIT;

Multiple Regression Output

Interpreting Output The multiple regression equation is: Yhat = -6.67 + 0.18(height) - 0.03(age) The R-Square value is interpreted the same way as in simple linear regression: 67% of the variance in FVC is explained by height and age in the model. Because the model includes more than one predictor variable, you may want to consider using the adjusted R-Square (Adj R-Sq) value instead of R-Square when interpreting the amount of variance explained by the independent variables, since R-Square never decreases when predictors are added.
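The adjusted R-Square penalizes model complexity. With n observations and p predictors, it is computed as (standard formula, not given on the slide):

```latex
R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n-1}{n-p-1}
```

Unlike R-Square, this quantity can decrease when a predictor that adds little explanatory value is included.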

Overall F-test To test whether all of the variables taken together significantly predict the outcome variable (FVC), use the overall F-test: Ho: β1 = β2 = 0 vs. Ha: at least one β ≠ 0. The test statistic (F* = 36.96) is found under F Value, and the associated p-value (<0.001) is found under Pr > F. Because the p-value is less than 0.05, we reject the null hypothesis and conclude that, taken together, height and age are significantly related to FVC.
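The overall F statistic compares the regression mean square to the error mean square. With p predictors and n observations (standard definition, matching the Analysis of Variance table in the output):

```latex
F^* = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n-p-1)}
```

Under the null hypothesis that all slope coefficients are zero, this statistic follows an F distribution with p and n - p - 1 degrees of freedom.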

Testing Significance of One Variable To test the significance of an individual variable in predicting FVC, use the test statistic (t Value) and associated p-value for that particular variable (Pr > |t|). For example, the test of whether height is significantly related to FVC [Ho: β1 = 0 vs. Ha: β1 ≠ 0] has t* = 8.15, p < 0.0001. Reject the null hypothesis and conclude that height is significantly related to FVC, after adjusting for age.
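Each t statistic in the Parameter Estimates table is the estimated coefficient divided by its standard error (standard definition):

```latex
t^* = \frac{b_1}{SE(b_1)}
```

Under the null hypothesis, this follows a t distribution with n - p - 1 degrees of freedom, so each test assesses a variable's contribution given the other predictors in the model.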

Testing for an Interaction Because we have more than one predictor variable, it is important to consider whether they interact in some way. To test whether the interaction between height and age is significant, create another model in SAS that contains both the main effects of height and age as well as the interaction term you created:

PROC REG DATA = air;
  MODEL fvc = height age height_age;
RUN;
QUIT;
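As an aside, PROC GLM accepts interaction terms directly in the MODEL statement, so the manually created product variable is not needed there (an alternative not shown in the original; PROC REG itself does not support the * syntax in MODEL statements):

```sas
PROC GLM DATA = air;
  MODEL fvc = height age height*age;
RUN;
QUIT;
```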

Output with Interaction Term

Is the interaction significant? Notice that the p-value for the interaction is 0.39, which is greater than 0.05. Therefore, the interaction between age and height is not significant, and we do not need to include it in the model. Additionally, notice that the R-Square is 0.679, indicating that 68% of the variability in FVC is explained by height, age, and height_age. This is not much larger than the R-Square from the model with just height and age, which is another good indicator that the interaction term is not necessary. The final model needs to include only height and age predicting FVC.

Conclusions Multiple linear regression in SAS is very similar to simple linear regression. The major differences are that more variables are added to the MODEL statement and that interaction terms need to be considered. Use the same options as before for creating confidence intervals in SAS (clb, cli, clm) and for identifying outliers (r) and influential points (influence).
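Putting these options together on the air data might look like the following (a sketch; all of these are documented MODEL-statement options in PROC REG):

```sas
PROC REG DATA = air;
  * clb: confidence limits for the regression coefficients;
  * cli / clm: prediction intervals and mean-response intervals;
  * r: residual analysis; influence: influence diagnostics;
  MODEL fvc = height age / clb cli clm r influence;
RUN;
QUIT;
```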

Linear Regression is used with continuous outcome variables. If the outcome variable of interest is categorical, logistic regression analysis is used. The next tutorial is an introduction to logistic regression.