28. Multiple regression The Practice of Statistics in the Life Sciences Second Edition.

Objectives (PSLS Chapter 28) Multiple regression
- The multiple linear regression model
- Indicator variables
- Two parallel regression lines
- Interaction
- Inference for multiple linear regression

The multiple linear regression model
In previous chapters we examined a simple linear regression model expressing a response variable y as a linear function of one explanatory variable x. In the population, this model has the form

y = α + βx

We now examine multiple linear regression models, in which the response variable y is a linear combination of k explanatory variables. In the population, this model takes the form

y = β0 + β1x1 + β2x2 + … + βkxk

The parameters can be estimated from sample data, giving the fitted equation

ŷ = b0 + b1x1 + b2x2 + … + bkxk
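As a minimal sketch of how the coefficients b0, b1, …, bk are obtained, the least-squares fit can be computed directly with NumPy (the data below are made up for illustration; nothing here comes from the slides' examples):

```python
import numpy as np

# Hypothetical data: y depends linearly on two explanatory variables.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 0.5, n)

# Design matrix: a leading column of 1s gives the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimates (b0, b1, b2) of (beta0, beta1, beta2).
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print(b)
```

The estimates land close to the true values (2.0, 1.5, −0.8) used to generate the data, which is the sample-to-population link the slide describes.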

Assumptions
- The mean response μy has a linear relationship with the k explanatory variables taken together.
- The y responses are independent of each other.
- For any set of fixed values of the k explanatory variables, the response y varies Normally.
- The standard deviation σ of y is the same for all values of the explanatory variables. In inference, the value of σ is unknown.

Indicator variables
The multiple regression model can accommodate categorical explanatory variables by coding them in a binary (0, 1) mode. In particular, we can compare individuals from different groups (independent SRSs in an observational study, or randomized groups in an experiment) by using an indicator variable. To compare two groups, we simply create an indicator variable Ind such that
- Ind = 0 for individuals in one group, and
- Ind = 1 for individuals in the other group.
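Coding a two-level categorical variable this way is a one-liner; a minimal sketch with hypothetical group labels:

```python
import numpy as np

# Hypothetical two-level categorical variable.
group = np.array(["control", "treated", "treated", "control", "treated"])

# Indicator: 1 for one level, 0 for the other.
ind = (group == "treated").astype(int)
print(ind)  # -> [0 1 1 0 1]
```

The resulting 0/1 column can then be used like any quantitative explanatory variable in the design matrix.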

Two parallel regression lines
When plotting the linear regression pattern of y as a function of x for two groups, we sometimes find that the two groups have roughly parallel simple regression lines. In such instances, we can model the data using a single multiple linear regression model with two parallel regression lines, using the quantitative variable x1 and an indicator variable Indx2 for the groups:

y = β0 + β1x1 + β2Indx2

- β1 is the slope for both lines
- β0 is the intercept for the Indx2 = 0 line
- (β0 + β2) is the intercept for the Indx2 = 1 line

[Figure: two parallel fitted lines, one for Indx2 = 0 and one for Indx2 = 1, separated vertically by β2]
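A minimal sketch of the parallel-lines model, with made-up data (the variable names and true coefficients are assumptions for illustration, not the slides' fruit-fly values):

```python
import numpy as np

# Hypothetical data: two groups sharing one slope but differing in intercept.
rng = np.random.default_rng(1)
n = 40
x1 = rng.uniform(0.6, 1.0, n)      # quantitative predictor
ind = np.repeat([0, 1], n // 2)    # indicator: 0 = group A, 1 = group B
y = 10 + 30 * x1 - 5 * ind + rng.normal(0, 1, n)

# One model, y = b0 + b1*x1 + b2*Ind, fits both lines at once.
X = np.column_stack([np.ones(n), x1, ind])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Both fitted lines share slope b1; intercepts are b0 and b0 + b2.
print(f"group-0 line: {b0:.2f} + {b1:.2f}x")
print(f"group-1 line: {b0 + b2:.2f} + {b1:.2f}x")
```

Fitting one model instead of two separate simple regressions forces the common slope and lets b2 estimate the constant vertical gap between the groups.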

Male fruit flies were randomly assigned to either reproduce (IndReprod = 1) or not (IndReprod = 0). Their thorax length and longevity were recorded. Two separate simple linear regression models have noticeably similar slopes, so a single multiple regression model with an indicator variable gives two parallel lines:

ŷ = … x1 − 23.55 Indx2

Interaction
When plotting the linear regression pattern of y as a function of x for two groups, we may find two non-parallel simple regression lines. We can model such data with a single multiple linear regression model using a quantitative variable x1, an indicator variable Indx2 for the groups, and an interaction term x1Indx2:

y = β0 + β1x1 + β2Indx2 + β3x1Indx2

Each line has its own slope and intercept.
- β1 is the slope for the Indx2 = 0 line
- (β1 + β3) is the slope for the Indx2 = 1 line

[Figure: two non-parallel fitted lines, one for Indx2 = 0 and one for Indx2 = 1]
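The interaction term is just the product column x1*Ind added to the design matrix; a minimal sketch with made-up data (the true coefficients below are assumptions for illustration):

```python
import numpy as np

# Hypothetical data: slope 0.2 for group 0, slope 0.2 + 0.1 for group 1.
rng = np.random.default_rng(2)
n = 60
x1 = rng.uniform(3, 19, n)
ind = np.repeat([0, 1], n // 2)
y = 1.0 + 0.2 * x1 - 0.7 * ind + 0.1 * x1 * ind + rng.normal(0, 0.4, n)

# Design matrix includes the product column x1 * ind as the interaction term.
X = np.column_stack([np.ones(n), x1, ind, x1 * ind])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# b[1] is the slope for Ind = 0; b[1] + b[3] is the slope for Ind = 1.
print("slope for Ind=0:", b[1], "  slope for Ind=1:", b[1] + b[3])
```

A clearly nonzero estimate for b[3] is exactly the non-parallelism the slide describes: the effect of x1 on y differs between the two groups.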

Note that an interaction term can be computed between any two variables (not just between a quantitative variable and an indicator variable). An interaction effect between the variables x1 and x2 means that the relationship between the mean response μy and the explanatory variable x1 is different for varying values of the explanatory variable x2. When comparing two groups (x2 is an indicator variable), this means that the two regression lines will not be parallel.

A random sample of children was taken, and their lung capacity (forced expiratory volume, or FEV) was plotted as a function of their age and sex (IndSex = 0 for female and IndSex = 1 for male).

Using an interaction term to take into account the non-parallel lines, software gives the following multiple regression model:

ŷ = … x1 − 0.7314 Indx2 + … x1Indx2

Inference for multiple regression
We first want to run an overall test. We use an ANOVA F test to test:

H0: β1 = 0 and β2 = 0 … and βk = 0
Ha: H0 is not true (at least one coefficient is not equal to 0)

The squared multiple correlation coefficient R2 is given by the ANOVA output as

R2 = SSM / SST

(the model sum of squares divided by the total sum of squares) and indicates how much of the variability in the response variable y can be explained by the specific model tested. A higher R2 indicates a better model.
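The sums of squares, R2, and the F statistic can be computed by hand from a fitted model; a minimal sketch with made-up data (the p-value lookup is omitted to keep the example NumPy-only):

```python
import numpy as np

# Hypothetical data with k = 2 explanatory variables.
rng = np.random.default_rng(3)
n, k = 50, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ b

sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
sse = np.sum((y - yhat) ** 2)       # error (residual) sum of squares
ssm = sst - sse                     # model sum of squares

r2 = ssm / sst                               # squared multiple correlation
f_stat = (ssm / k) / (sse / (n - k - 1))     # ANOVA F on k and n-k-1 df
print(f"R^2 = {r2:.3f}, F = {f_stat:.1f} on {k} and {n - k - 1} df")
```

A large F (compared to the F distribution with k and n − k − 1 degrees of freedom) leads to rejecting H0 that all k coefficients are zero.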

Estimating the regression coefficients
If the ANOVA is significant, we can run individual t tests on each regression coefficient:

H0: βi = 0 in this specific model
Ha: βi ≠ 0 in this specific model

using the test statistic

t = bi / SEbi

which follows the t distribution with n − k − 1 degrees of freedom when H0 is true.

We can also compute individual level-C confidence intervals for each of the k regression coefficients in the specific model:

bi ± t* SEbi

where t* is the critical value for a t distribution with n − k − 1 degrees of freedom.
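A minimal sketch of where the standard errors SEbi come from, with made-up data; as a simplification, t* ≈ 2 stands in for the exact critical value at n − k − 1 degrees of freedom:

```python
import numpy as np

# Hypothetical data with k = 2 explanatory variables.
rng = np.random.default_rng(4)
n, k = 50, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b

s2 = np.sum(resid ** 2) / (n - k - 1)    # estimate of sigma^2
cov_b = s2 * np.linalg.inv(X.T @ X)      # covariance matrix of the b_i
se = np.sqrt(np.diag(cov_b))             # standard errors SE(b_i)

t_stats = b / se                         # t = b_i / SE(b_i)
ci = np.column_stack([b - 2 * se, b + 2 * se])   # rough 95% CI, t* ~ 2
print(t_stats)
print(ci)
```

In practice, statistical software reports these t statistics, their p-values, and exact-t* confidence intervals directly in the coefficients table.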

The ANOVA test is significant, indicating that at least one regression coefficient is not zero. R2 = 0.81, so this is a very good model that explains 81% of the variation in longevity of male fruit flies in the lab. The individual t tests are all significant, indicating that in this model the regression coefficients are significantly different from zero. The confidence intervals give a range of likely values for these parameters. Because this is a model with two parallel lines, we can conclude that reproducing male fruit flies live between 19 and 28 days less on average than those that do not reproduce, when thorax length is taken into account. (SPSS output)

The ANOVA test is significant, indicating that at least one regression coefficient is not zero. R2 = 0.67, so this is a good model that explains 67% of the variation in FEV among children. The individual t tests are all significant, indicating that in this model the regression coefficients are significantly different from zero. Because this is a model with a significant interaction effect, we conclude that both age and sex influence FEV in children, but that the effect of age on FEV is different for males and for females. The scatterplot indicates that the effect of age is more pronounced for males.

Checking the conditions for inference
The best way to check the conditions for inference is to examine graphically the scatterplot(s) of y as a function of each xi, and the residuals (y − ŷ) from the multiple regression model. Look for:
- Linear trends in the scatterplot(s)
- Normality of the residuals (histogram of residuals)
- Constant σ for all combinations of the xi's (residual plot with no particular pattern and approximately equal vertical spread)
- Independence of observations (check the study design or a plot of the residuals sorted by order of data acquisition)
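Alongside the plots, a couple of quick numerical checks on the residuals can be sketched as follows (made-up data; the split-spread comparison is an informal stand-in for reading a residual plot, not a formal test):

```python
import numpy as np

# Hypothetical data satisfying the regression conditions.
rng = np.random.default_rng(5)
n = 100
x1 = rng.uniform(0, 10, n)
y = 3 + 2 * x1 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b

# With an intercept in the model, least-squares residuals average to zero.
# Compare residual spread in the lower and upper halves of x1: a large
# imbalance would suggest non-constant sigma.
lo = resid[x1 < np.median(x1)].std()
hi = resid[x1 >= np.median(x1)].std()
print(f"mean residual = {resid.mean():.2e}, spread ratio = {hi / lo:.2f}")
```

A spread ratio far from 1, or residuals that trend with a predictor, would signal that the constant-σ or linearity conditions are in doubt.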