1
**Topic 12: Multiple Linear Regression**

2
**Outline Multiple Regression**

- Data and notation
- Model
- Inference
- Recall notes from Topic 3 for simple linear regression

3
**Data for Multiple Regression**

- $Y_i$ is the response variable
- $X_{i1}, X_{i2}, \ldots, X_{i,p-1}$ are $p-1$ explanatory (or predictor) variables
- Cases are denoted by $i = 1$ to $n$

4
**Multiple Regression Model**

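$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + e_i$

- $Y_i$ is the value of the response variable for the $i$th case
- $\beta_0$ is the intercept
- $\beta_1, \beta_2, \ldots, \beta_{p-1}$ are the regression coefficients for the explanatory variables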

5
**Multiple Regression Model**

- $X_{i,k}$ is the value of the $k$th explanatory variable for the $i$th case
- $e_i$ are independent Normally distributed random errors with mean 0 and variance $\sigma^2$

6
**Multiple Regression Parameters**

- $\beta_0$ is the intercept
- $\beta_1, \beta_2, \ldots, \beta_{p-1}$ are the regression coefficients for the explanatory variables
- $\sigma^2$ is the variance of the error term

7
**Interesting special cases**

- Polynomial of order $p-1$: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_{p-1} X_i^{p-1} + e_i$
- X's can be indicator or dummy variables taking the values 0 and 1 (or any other two distinct numbers)
- Interactions between explanatory variables (represented as the product of explanatory variables)

8
**Interesting special cases**

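Consider the model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + e_i$

If $X_2$ is a dummy variable:

- When $X_2 = 0$: $Y_i = \beta_0 + \beta_1 X_{i1} + e_i$
- When $X_2 = 1$: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 + \beta_3 X_{i1} + e_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_{i1} + e_i$

This models two different regression lines at the same time.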
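The two-lines interpretation can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the coefficient values and the data are made up for illustration and do not come from the slides. The response is generated without noise so that least squares recovers the coefficients exactly.

```python
import numpy as np

# Hypothetical coefficients (beta0, beta1, beta2, beta3), chosen only for illustration
beta = np.array([1.0, 2.0, 3.0, -0.5])

x1 = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
x2 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # dummy variable

# Design matrix columns: intercept, X1, X2, interaction X1*X2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
y = X @ beta  # noiseless response, so the fit reproduces beta

b, *_ = np.linalg.lstsq(X, y, rcond=None)

# (intercept, slope) of the line for each value of the dummy variable
line_group0 = (b[0], b[1])                # X2 = 0: beta0, beta1
line_group1 = (b[0] + b[2], b[1] + b[3])  # X2 = 1: beta0 + beta2, beta1 + beta3
print(line_group0, line_group1)
```

With these made-up values, the second group's line has intercept 1 + 3 and slope 2 − 0.5, which is what the single fitted model reports through `b[0] + b[2]` and `b[1] + b[3]`.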

9
**Model in Matrix Form**
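The formula for this slide is not reproduced in the transcript. A standard way to write the model from the earlier slides in matrix form, consistent with the row vector $X_i' = (1, X_{i1}, \ldots, X_{i,p-1})$ and the statement $Y \sim N(X\beta, \sigma^2 I)$ used later in the deck, is sketched below; the layout is an assumption, not the original slide.

```latex
\[
Y = X\beta + e, \qquad e \sim N(0, \sigma^2 I)
\]
\[
\underset{n \times 1}{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad
\underset{n \times p}{X} = \begin{pmatrix}
1 & X_{11} & \cdots & X_{1,p-1} \\
\vdots & \vdots & & \vdots \\
1 & X_{n1} & \cdots & X_{n,p-1}
\end{pmatrix}, \quad
\underset{p \times 1}{\beta} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_{p-1} \end{pmatrix}, \quad
\underset{n \times 1}{e} = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}
\]
```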

10
**Least Squares**
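The slide body is missing from the transcript. As a hedged reconstruction in the deck's notation, the least squares criterion and the normal equations it leads to are usually written as:

```latex
\[
Q(b) = (Y - Xb)'(Y - Xb)
     = \sum_{i=1}^{n} \bigl( Y_i - b_0 - b_1 X_{i1} - \cdots - b_{p-1} X_{i,p-1} \bigr)^2
\]
\[
\frac{\partial Q}{\partial b} = -2X'(Y - Xb) = 0
\quad \Longrightarrow \quad
X'X\,b = X'Y
\]
```

When $X'X$ is invertible, solving the normal equations gives $b = (X'X)^{-1}X'Y$, the expression that appears on the "Distribution of b" slide.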

11
**Least Squares Solution**

Fitted (predicted) values
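The solution and the fitted values can be computed directly. This is a minimal sketch, assuming NumPy, the closed form $b = (X'X)^{-1}X'Y$ stated on the "Distribution of b" slide, and the usual hat matrix $H = X(X'X)^{-1}X'$ so that $\hat{Y} = Xb = HY$. The data, seed, and coefficient values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # made-up data, for illustration only
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Least squares solution b = (X'X)^{-1} X'Y, obtained by solving the normal equations
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Hat matrix H = X (X'X)^{-1} X' and fitted values Yhat = Xb
H = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = X @ b

print(b)
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; for ill-conditioned design matrices `np.linalg.lstsq` is generally the safer choice.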

12
**Residuals**
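The formula for this slide did not survive extraction. Assuming the hat matrix $H = X(X'X)^{-1}X'$ used on the next slide, the residual vector is usually written as follows. Note that $e$ here denotes the residuals, not the error terms $e_i$ of the model statement.

```latex
\[
e = Y - \hat{Y} = Y - Xb = (I - H)Y, \qquad H = X(X'X)^{-1}X'
\]
```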

13
**Covariance Matrix of residuals**

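- $\mathrm{Cov}(e) = \sigma^2 (I-H)(I-H)' = \sigma^2 (I-H)$, where $e$ is the vector of residuals
- $\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii})$
- $h_{ii} = X_i'(X'X)^{-1}X_i$, where $X_i' = (1, X_{i1}, \ldots, X_{i,p-1})$
- Residuals are usually correlated: $\mathrm{Cov}(e_i, e_j) = -\sigma^2 h_{ij}$ for $i \ne j$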
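These identities rest on $I - H$ being symmetric and idempotent, which is easy to confirm numerically. The sketch below assumes NumPy and $H = X(X'X)^{-1}X'$; the design matrix and the value of $\sigma^2$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # made-up design matrix
n, p = 10, 3
sigma2 = 4.0  # hypothetical error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T
M = np.eye(n) - H  # I - H

# sigma^2 (I-H)(I-H)' collapses to sigma^2 (I-H) because I-H is symmetric and idempotent
cov_resid = sigma2 * M @ M.T

# Leverage of the first case computed from its row of X: h_ii = X_i'(X'X)^{-1}X_i
h_00 = X[0] @ XtX_inv @ X[0]

print(np.allclose(cov_resid, sigma2 * M))
```

The diagonal of `cov_resid` gives $\sigma^2(1 - h_{ii})$ and the off-diagonal entries give $-\sigma^2 h_{ij}$, so residuals of high-leverage cases have smaller variance.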

14
**Estimation of σ**
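The slide body is missing. A hedged reconstruction, assuming the error degrees of freedom $n - p$ that appear in the F distribution later in the deck:

```latex
\[
s^2 = \frac{(Y - Xb)'(Y - Xb)}{n - p} = \frac{SSE}{n - p} = MSE,
\qquad s = \sqrt{s^2}
\]
```

Here $s^2$ estimates $\sigma^2$ and $s$ estimates $\sigma$.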

15
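**Distribution of b**

- $b = (X'X)^{-1}X'Y$
- Since $Y \sim N(X\beta, \sigma^2 I)$:
  - $E(b) = ((X'X)^{-1}X')X\beta = \beta$
  - $\mathrm{Cov}(b) = \sigma^2 ((X'X)^{-1}X')((X'X)^{-1}X')' = \sigma^2 (X'X)^{-1}$
- $\sigma^2 (X'X)^{-1}$ is estimated by $s^2 (X'X)^{-1}$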
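The estimated covariance matrix yields standard errors for the coefficients. This is a minimal sketch, assuming NumPy and $s^2 = SSE/(n-p)$; the data, seed, and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)  # made-up data
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y            # b = (X'X)^{-1} X'Y

resid = Y - X @ b
s2 = resid @ resid / (n - p)     # s^2 = SSE / (n - p), the estimate of sigma^2

cov_b_hat = s2 * XtX_inv         # estimated Cov(b) = s^2 (X'X)^{-1}
se_b = np.sqrt(np.diag(cov_b_hat))  # standard errors of b0, ..., b_{p-1}

print(b, se_b)
```

The square roots of the diagonal entries are the standard errors that regression software reports next to each coefficient.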

16
**ANOVA Table**

Sources of variation are

- Model (SAS) or Regression (KNNL)
- Error (SAS, KNNL) or Residual
- Total

SS and df add as before

- SSM + SSE = SSTO
- dfM + dfE = dfTotal

17
**Sums of Squares**
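The formulas for this slide are missing from the transcript. The standard definitions, given here as an assumed reconstruction using the SSM, SSE, and SSTO names from the ANOVA table, are:

```latex
\[
SSM = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad
SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
\]
```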

18
**Degrees of Freedom**
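The slide body is missing. The values below are an assumed reconstruction, chosen to agree with the $F(p-1, n-p)$ distribution stated on the ANOVA F test slide:

```latex
\[
df_M = p - 1, \qquad df_E = n - p, \qquad df_{Total} = n - 1
\]
\[
df_M + df_E = (p - 1) + (n - p) = n - 1 = df_{Total}
\]
```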

19
**Mean Squares**

20
**Mean Squares**
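The formulas for the mean squares slides are not in the transcript. Each mean square is a sum of squares divided by its degrees of freedom; the expanded forms below are a hedged reconstruction that assumes $df_M = p-1$, $df_E = n-p$, and $df_{Total} = n-1$.

```latex
\[
MSM = \frac{SSM}{df_M} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{p - 1}, \qquad
MSE = \frac{SSE}{df_E} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - p}, \qquad
MST = \frac{SSTO}{df_{Total}} = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}
\]
```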

21
**ANOVA Table**

| Source | SS | df | MS | F |
|--------|------|---------|-----|---------|
| Model | SSM | dfM | MSM | MSM/MSE |
| Error | SSE | dfE | MSE | |
| Total | SSTO | dfTotal | MST | |

22
**ANOVA F test**

- $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$
- $H_a$: $\beta_k \ne 0$ for at least one $k = 1, \ldots, p-1$
- Under $H_0$, $F \sim F(p-1, n-p)$
- Reject $H_0$ if F is large; use the P-value

23
**P-value of F test**

The P-value for the F significance test tells us one of the following:

- There is no evidence to conclude that any of our explanatory variables can help us to model the response variable using this kind of model (P ≥ .05)
- One or more of the explanatory variables in our model is potentially useful for predicting the response variable in a linear model (P < .05)

24
**$R^2$**

- The squared multiple regression correlation ($R^2$) gives the proportion of variation in the response variable explained by all the explanatory variables
- It is usually expressed as a percent
- It is sometimes called the coefficient of multiple determination (KNNL p 226)

25
**$R^2$**

- $R^2 = SSM/SSTO$: the proportion of variation explained
- $R^2 = 1 - (SSE/SSTO)$: 1 minus the proportion not explained
- The F test can be expressed in terms of $R^2$: $F = \dfrac{R^2/(p-1)}{(1-R^2)/(n-p)}$
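The link between F and $R^2$ can be verified numerically. This is a minimal sketch, assuming NumPy and the usual definitions $SSM = \sum(\hat{Y}_i - \bar{Y})^2$, $SSE = \sum(Y_i - \hat{Y}_i)^2$, and $SSTO = \sum(Y_i - \bar{Y})^2$. The data, seed, and coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(2)  # made-up data
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, 0.0, -1.5]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b

# Sums of squares; SSM + SSE = SSTO because the model includes an intercept
SSM = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
SSTO = np.sum((Y - Y.mean()) ** 2)

F = (SSM / (p - 1)) / (SSE / (n - p))            # MSM / MSE
R2 = SSM / SSTO
F_from_R2 = (R2 / (p - 1)) / ((1 - R2) / (n - p))

print(F, F_from_R2, R2)
```

The two F values agree because dividing the numerator and denominator of MSM/MSE by SSTO turns SSM into $R^2$ and SSE into $1 - R^2$.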

26
**Background Reading**

We went over KNNL
