# Topic 12: Multiple Linear Regression


Outline
Multiple regression: data and notation, the model, and inference. Recall the notes from Topic 3 for simple linear regression.

Data for Multiple Regression
$Y_i$ is the response variable. $X_{i1}, X_{i2}, \dots, X_{i,p-1}$ are the $p-1$ explanatory (or predictor) variables. Cases are denoted by $i = 1, \dots, n$.

Multiple Regression Model
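$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_{p-1} X_{i,p-1} + e_i$, where $Y_i$ is the value of the response variable for the $i$th case, $\beta_0$ is the intercept, and $\beta_1, \beta_2, \dots, \beta_{p-1}$ are the regression coefficients for the explanatory variables.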

Multiple Regression Model
$X_{i,k}$ is the value of the $k$th explanatory variable for the $i$th case. The $e_i$ are independent, Normally distributed random errors with mean 0 and variance $\sigma^2$.
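To make the model concrete, here is a minimal simulation sketch with two explanatory variables. The coefficient values, the error standard deviation, the sample size, and the ranges of the predictors are all made up for illustration; none of them come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200                              # number of cases (illustrative)
beta = np.array([1.0, 2.0, -0.5])    # beta_0, beta_1, beta_2 (illustrative values)
sigma = 0.3                          # error standard deviation (illustrative)

# Two explanatory variables, so p = 3 parameters including the intercept.
X1 = rng.uniform(0, 10, size=n)
X2 = rng.uniform(0, 5, size=n)

# Independent Normal errors with mean 0 and variance sigma^2.
e = rng.normal(0.0, sigma, size=n)

# Y_i = beta_0 + beta_1 * X_i1 + beta_2 * X_i2 + e_i
Y = beta[0] + beta[1] * X1 + beta[2] * X2 + e
```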

Multiple Regression Parameters
$\beta_0$ is the intercept. $\beta_1, \beta_2, \dots, \beta_{p-1}$ are the regression coefficients for the explanatory variables. $\sigma^2$ is the variance of the error term.

Interesting special cases
Polynomial of order $p-1$: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_{p-1} X_i^{p-1} + e_i$. The $X$'s can be indicator (dummy) variables taking the values 0 and 1 (or any other two distinct numbers). Interactions between explanatory variables are represented as products of explanatory variables.
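All three special cases amount to choosing the columns of the design matrix. The sketch below builds each kind of column with NumPy; the data values and the group labels "a" and "b" are invented for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # one quantitative predictor (made-up values)
group = np.array(["a", "b", "a", "b", "b"])   # a two-level categorical predictor (made-up)

# Polynomial of order p-1 = 3: columns 1, x, x^2, x^3.
X_poly = np.vander(x, N=4, increasing=True)

# Indicator (dummy) variable: 1 for group "b", 0 for group "a".
d = (group == "b").astype(float)

# Interaction: the product of two explanatory variables.
x_by_d = x * d
```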

Interesting special cases
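Consider the model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + e_i$. If $X_2$ is a dummy variable, then $Y_i = \beta_0 + \beta_1 X_{i1} + e_i$ when $X_2 = 0$, and $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 + \beta_3 X_{i1} + e_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_{i1} + e_i$ when $X_2 = 1$. The model therefore describes two different regression lines at the same time.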
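The two-lines interpretation can be checked numerically. The sketch below uses noise-free responses and made-up coefficients, so the least squares fit recovers the coefficients exactly and the two group lines can be read off directly.

```python
import numpy as np

# Illustrative coefficients: beta_0, beta_1, beta_2, beta_3.
beta = np.array([1.0, 2.0, 3.0, -1.0])

x1 = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
x2 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])   # dummy variable

# Columns: intercept, X1, X2, and the interaction X1 * X2.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
Y = X @ beta   # noise-free responses, so the fit reproduces beta

b, *_ = np.linalg.lstsq(X, Y, rcond=None)

# (intercept, slope) of the line for each group.
line_group0 = (b[0], b[1])                  # X2 = 0
line_group1 = (b[0] + b[2], b[1] + b[3])    # X2 = 1
```

With these values the $X_2 = 0$ group follows the line with intercept 1 and slope 2, and the $X_2 = 1$ group follows the line with intercept 4 and slope 1.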

Model in Matrix Form
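The equations on this slide are not preserved in the transcript. Assuming the usual layout, with an $n \times p$ design matrix $X$ whose $i$th row is $(1, X_{i1}, \dots, X_{i,p-1})$, the model would read:

$$
\underset{n \times 1}{Y} = \underset{n \times p}{X}\;\underset{p \times 1}{\beta} + \underset{n \times 1}{e},
\qquad e \sim N(0, \sigma^2 I),
\qquad \text{so that } Y \sim N(X\beta, \sigma^2 I).
$$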

Least Squares
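The slide's formulas are missing from the transcript. The standard least squares criterion, written in the notation used elsewhere in these slides, is to choose $b$ to minimize

$$
Q(b) = (Y - Xb)'(Y - Xb) = \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_{i1} - \dots - b_{p-1} X_{i,p-1} \right)^2 ,
$$

which, after setting the derivative with respect to $b$ to zero, gives the normal equations $X'Xb = X'Y$.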

Least Squares Solution
Fitted (predicted) values
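The formulas for this slide are not in the transcript. A minimal sketch, assuming the solution $b = (X'X)^{-1}X'Y$ that appears later in these slides and the usual hat matrix $H = X(X'X)^{-1}X'$, is shown below. The data are simulated with made-up coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3

# Design matrix with a leading column of ones (simulated predictors).
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y      # least squares solution b = (X'X)^{-1} X'Y
H = X @ XtX_inv @ X.T      # hat matrix
Y_hat = X @ b              # fitted (predicted) values, equal to H @ Y
```

The explicit inverse mirrors the textbook formula. In practice a solver such as `np.linalg.lstsq` is usually preferred for numerical stability.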

Residuals
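The slide's formula is missing here. The standard definition, consistent with the covariance result on the following slide, would be as follows, where $e$ now denotes the vector of residuals rather than the model errors:

$$
e = Y - \hat{Y} = Y - Xb = (I - H)Y, \qquad H = X(X'X)^{-1}X'.
$$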

Covariance Matrix of residuals
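For the residual vector $e$: $\mathrm{Cov}(e) = \sigma^2 (I-H)(I-H)' = \sigma^2 (I-H)$, so $\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii})$, where $h_{ii} = X_i'(X'X)^{-1}X_i$ and $X_i' = (1, X_{i1}, \dots, X_{i,p-1})$. Residuals are usually correlated: $\mathrm{Cov}(e_i, e_j) = -\sigma^2 h_{ij}$ for $i \ne j$.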
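The simplification $(I-H)(I-H)' = I-H$ relies on $I-H$ being symmetric and idempotent. The sketch below checks this numerically on a simulated design matrix. The dimensions and the value of $\sigma^2$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
M = np.eye(n) - H                      # I - H

leverages = np.diag(H)                 # the h_ii values
sigma2 = 1.0                           # illustrative error variance
cov_resid = sigma2 * M                 # covariance matrix of the residuals

symmetric = np.allclose(M, M.T)
idempotent = np.allclose(M @ M, M)
```

The leverages $h_{ii}$ sum to $p$, the trace of $H$.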

Estimation of σ
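The estimator on this slide is not preserved in the transcript. The usual estimator, consistent with the $s^2$ and MSE notation used on later slides, would be:

$$
s^2 = \frac{(Y - Xb)'(Y - Xb)}{n - p} = \frac{\mathrm{SSE}}{n - p} = \mathrm{MSE}, \qquad s = \sqrt{s^2}.
$$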

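Distribution of b
$b = (X'X)^{-1}X'Y$. Since $Y \sim N(X\beta, \sigma^2 I)$ and $b$ is a linear function of $Y$, $b$ is also Normally distributed, with $E(b) = ((X'X)^{-1}X')X\beta = \beta$ and $\mathrm{Cov}(b) = \sigma^2 ((X'X)^{-1}X')((X'X)^{-1}X')' = \sigma^2 (X'X)^{-1}$. $\sigma^2 (X'X)^{-1}$ is estimated by $s^2 (X'X)^{-1}$.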
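As a sketch of how the estimated covariance matrix $s^2 (X'X)^{-1}$ is used, the code below computes standard errors of the coefficients as the square roots of its diagonal. The data are simulated with made-up coefficients, and $s^2$ is taken to be SSE divided by $n - p$, which is the usual estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=1.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y

resid = Y - X @ b
s2 = resid @ resid / (n - p)      # estimate of sigma^2 (SSE / (n - p))

cov_b = s2 * XtX_inv              # estimated covariance matrix of b
se_b = np.sqrt(np.diag(cov_b))    # standard errors of b_0, ..., b_{p-1}
```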

ANOVA Table
Sources of variation are Model (SAS) or Regression (KNNL); Error (SAS, KNNL) or Residual; and Total. SS and df add as before: SSM + SSE = SSTO and dfM + dfE = dfTotal.

Sums of Squares
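The definitions on this slide are missing from the transcript. The standard ones, using the SSM, SSE, and SSTO labels from the ANOVA table, are:

$$
\mathrm{SSM} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad
\mathrm{SSTO} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 .
$$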

Degrees of Freedom
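The slide's content is not preserved. With $p$ regression parameters (including the intercept) and $n$ cases, the usual degrees of freedom, which match the $F(p-1, n-p)$ reference distribution used for the F test, are:

$$
\mathrm{df}_M = p - 1, \qquad \mathrm{df}_E = n - p, \qquad \mathrm{df}_{Total} = n - 1 .
$$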

Mean Squares

Mean Squares
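The formulas for the mean squares slides are not in the transcript. Following the ANOVA table's notation, each mean square is presumably the sum of squares divided by its degrees of freedom:

$$
\mathrm{MSM} = \frac{\mathrm{SSM}}{\mathrm{df}_M}, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{\mathrm{df}_E}, \qquad
\mathrm{MST} = \frac{\mathrm{SSTO}}{\mathrm{df}_{Total}} .
$$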

ANOVA Table

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Model | SSM | dfM | MSM | MSM/MSE |
| Error | SSE | dfE | MSE | |
| Total | SSTO | dfTotal | MST | |
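The entries of the table can be computed directly from a fitted model. The following is a minimal sketch on simulated data, assuming the standard definitions of the sums of squares and degrees of freedom (dfM = p - 1, dfE = n - p, dfTotal = n - 1). The coefficients and sample size are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, -1.5]) + rng.normal(scale=1.0, size=n)

b, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b

# Sums of squares.
SSM = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
SSTO = np.sum((Y - Y.mean()) ** 2)

# Degrees of freedom and mean squares.
dfM, dfE, dfTotal = p - 1, n - p, n - 1
MSM, MSE, MST = SSM / dfM, SSE / dfE, SSTO / dfTotal

F = MSM / MSE   # F statistic for the Model row
```

The additivity SSM + SSE = SSTO holds here because the model includes an intercept.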

ANOVA F test
$H_0: \beta_1 = \beta_2 = \dots = \beta_{p-1} = 0$. $H_a: \beta_k \ne 0$ for at least one $k = 1, \dots, p-1$. Under $H_0$, $F \sim F(p-1, n-p)$. Reject $H_0$ if $F$ is large; use the P-value.

P-value of F test
At the .05 significance level, the P-value for the F significance test tells us one of the following. If P > .05, there is no evidence to conclude that any of our explanatory variables can help us to model the response variable using this kind of model. If P ≤ .05, one or more of the explanatory variables in our model is potentially useful for predicting the response variable in a linear model.

$R^2$
The squared multiple regression correlation ($R^2$) gives the proportion of variation in the response variable explained by all the explanatory variables. It is usually expressed as a percent. It is sometimes called the coefficient of multiple determination (KNNL p 226).

$R^2$
$R^2 = \mathrm{SSM}/\mathrm{SSTO}$, the proportion of variation explained. Equivalently, $R^2 = 1 - \mathrm{SSE}/\mathrm{SSTO}$, which is 1 minus the proportion not explained. The F test can be expressed in terms of $R^2$: $F = [R^2/(p-1)] / [(1-R^2)/(n-p)]$.
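The step from the ANOVA form of $F$ to the $R^2$ form is short. Assuming dfM = $p-1$ and dfE = $n-p$, as implied by the $F(p-1, n-p)$ reference distribution, divide the numerator and the denominator by SSTO:

$$
F = \frac{\mathrm{MSM}}{\mathrm{MSE}}
  = \frac{\mathrm{SSM}/(p-1)}{\mathrm{SSE}/(n-p)}
  = \frac{(\mathrm{SSM}/\mathrm{SSTO})/(p-1)}{(\mathrm{SSE}/\mathrm{SSTO})/(n-p)}
  = \frac{R^2/(p-1)}{(1-R^2)/(n-p)} .
$$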

Background Reading We went over KNNL