1 Chapter 3 Multiple Linear Regression. Ray-Bing Chen, Institute of Statistics, National University of Kaohsiung.
2 3.1 Multiple Regression Models Multiple regression model: involves more than one regressor variable. Example: the yield in pounds from a conversion process depends on the temperature and the catalyst concentration.
4 The response y may be related to k regressor or predictor variables by the multiple linear regression model y = β0 + β1x1 + β2x2 + … + βkxk + ε. The parameter βj represents the expected change in the response y per unit change in xj when all of the remaining regressor variables xi (i ≠ j) are held constant.
5 Multiple linear regression models are often used as empirical models or approximating functions (the true model is unknown). The cubic model: y = β0 + β1x + β2x² + β3x³ + ε. The model with interaction effects: y = β0 + β1x1 + β2x2 + β12x1x2 + ε. Any regression model that is linear in the parameters is a linear regression model, regardless of the shape of the surface that it generates.
9 3.2 Estimation of the Model Parameters 3.2.1 Least-Squares Estimation of the Regression Coefficients n observations (n > k). Assume: –The error term ε satisfies E(ε) = 0 and Var(ε) = σ². –The errors are uncorrelated. –The regressor variables x1, …, xk are fixed.
10 The sample regression model: y = Xβ + ε. The least-squares function: S(β) = Σ εi² = (y − Xβ)'(y − Xβ). The normal equations: X'X β̂ = X'y, so the least-squares estimator is β̂ = (X'X)⁻¹X'y.
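The normal equations above can be solved directly with NumPy. A minimal sketch on synthetic data (the variable names and the data-generating coefficients are illustrative, not from the text):

```python
import numpy as np

# Illustrative sketch: solve the normal equations (X'X) beta_hat = X'y
# on synthetic data with an intercept and two regressors.
rng = np.random.default_rng(0)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # columns: 1, x1, x2
beta_true = np.array([2.0, 1.5, -0.5])                      # made-up true coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=n)           # small noise

# Least-squares estimate: solve (X'X) beta_hat = X'y rather than inverting X'X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true
```

In practice `np.linalg.lstsq` (QR/SVD based) is numerically preferable to forming X'X, but the explicit normal equations mirror the derivation on the slide.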
13 The fitted model corresponding to the levels of the regressor variables x: ŷ = Xβ̂ = X(X'X)⁻¹X'y = Hy. The hat matrix H = X(X'X)⁻¹X' is an idempotent matrix and a symmetric matrix, i.e. H² = H and H' = H; H is an orthogonal projection matrix. Residuals: e = y − ŷ = (I − H)y.
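The stated properties of the hat matrix can be verified numerically. A small sketch on arbitrary synthetic data:

```python
import numpy as np

# Sketch: H = X (X'X)^{-1} X' is symmetric and idempotent, gives fitted
# values y_hat = H y, and residuals e = (I - H) y orthogonal to col(X).
rng = np.random.default_rng(1)
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y
e = y - y_hat

print(np.allclose(H @ H, H))    # idempotent: H^2 = H
print(np.allclose(H, H.T))      # symmetric: H' = H
print(np.allclose(X.T @ e, 0))  # residuals orthogonal to the estimation space
```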
14 Example 3.1 The Delivery Time Data –y: the delivery time –x1: the number of cases of product stocked –x2: the distance walked by the route driver –Consider the model y = β0 + β1x1 + β2x2 + ε
17 3.2.2 A Geometrical Interpretation of Least Squares y = (y1, …, yn)' is the vector of observations. X contains p (p = k + 1) column vectors (n × 1), i.e. X = (1, x1, …, xk). The column space of X is called the estimation space. Any point in the estimation space is of the form Xβ. Least squares minimizes the squared distance S(β) = (y − Xβ)'(y − Xβ), i.e. it projects y orthogonally onto the estimation space.
19 3.2.3 Properties of the Least-Squares Estimators Unbiased estimator: E(β̂) = β. Covariance matrix: Var(β̂) = σ²(X'X)⁻¹; let C = (X'X)⁻¹. By the Gauss-Markov theorem, the LSE is the best linear unbiased estimator. LSE = MLE under the normality assumption.
20 3.2.4 Estimation of σ² Residual sum of squares: SS_Res = e'e = y'y − β̂'X'y. Degrees of freedom: n − p. The unbiased estimator of σ² is the residual mean square MS_Res = SS_Res/(n − p).
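A quick numerical sketch of the residual mean square as an estimator of σ², using synthetic data with a known (made-up) noise level:

```python
import numpy as np

# Sketch: MS_Res = SS_Res / (n - p) estimates sigma^2.
# sigma = 0.5 is an illustrative choice, so sigma^2 = 0.25.
rng = np.random.default_rng(2)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
sigma = 0.5
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=sigma, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat
ss_res = e @ e
ms_res = ss_res / (n - p)  # residual mean square, unbiased for sigma^2
print(ms_res)              # should be near 0.25 at this sample size
```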
21 Example 3.2 The Delivery Time Data Both estimates are in a sense correct, but they depend heavily on the choice of model. The model with the smaller residual mean square is generally preferable.
22 3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression For simple linear regression, the scatter diagram is an important tool in analyzing the relationship between y and x. However, it may not be useful in multiple regression. –Example: y = 8 − 5x1 + 12x2 –The y vs. x1 plot does not exhibit any apparent relationship between y and x1. –The y vs. x2 plot indicates a linear relationship with slope 8.
24 In this case, constructing scatter diagrams of y vs. xj (j = 1, 2, …, k) can be misleading. If there is only one (or a few) dominant regressors, or if the regressors operate nearly independently, the matrix of scatterplots is most useful.
25 3.2.6 Maximum-Likelihood Estimation The model is y = Xβ + ε, ε ~ N(0, σ²I). The likelihood function: L(β, σ²) = (2πσ²)^(−n/2) exp{−(y − Xβ)'(y − Xβ)/(2σ²)}. Maximizing the log-likelihood over β gives the least-squares estimator β̂; the MLE of σ² is σ̂² = (y − Xβ̂)'(y − Xβ̂)/n.
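A short sketch contrasting the MLE of σ² (divisor n) with the unbiased residual mean square (divisor n − p), on illustrative synthetic data:

```python
import numpy as np

# Sketch: under normal errors the MLE of beta equals the LSE, while the
# MLE of sigma^2 divides SS_Res by n instead of n - p (biased downward).
rng = np.random.default_rng(3)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
ss_res = np.sum((y - X @ beta_hat) ** 2)

sigma2_mle = ss_res / n             # maximum-likelihood estimate
sigma2_unbiased = ss_res / (n - p)  # residual mean square
print(sigma2_mle < sigma2_unbiased) # True: the MLE is always the smaller one
```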
26 3.3 Hypothesis Testing in Multiple Linear Regression Questions: –What is the overall adequacy of the model? –Which specific regressors seem important? Assume the errors are independent and follow a normal distribution with mean 0 and variance σ².
27 3.3.1 Test for Significance of Regression Determine if there is a linear relationship between y and the xj, j = 1, 2, …, k. The hypotheses are H0: β1 = β2 = … = βk = 0 vs. H1: βj ≠ 0 for at least one j. ANOVA identity: SS_T = SS_R + SS_Res. Under H0, SS_R/σ² ~ χ²(k), SS_Res/σ² ~ χ²(n − k − 1), and SS_R and SS_Res are independent, so F0 = (SS_R/k)/(SS_Res/(n − k − 1)) ~ F(k, n − k − 1).
28 Under H1, F0 follows a noncentral F distribution with k and n − k − 1 degrees of freedom and a noncentrality parameter λ that depends on the nonzero βj.
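The overall F test can be sketched in a few lines of NumPy; the data and coefficients below are illustrative, and comparing F0 against the F(k, n − k − 1) critical value is left to a table or a stats library:

```python
import numpy as np

# Sketch: ANOVA identity SS_T = SS_R + SS_Res and the overall F statistic
# F0 = (SS_R / k) / (SS_Res / (n - k - 1)).
rng = np.random.default_rng(4)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
ss_t = np.sum((y - y.mean()) ** 2)      # total sum of squares
ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
ss_r = ss_t - ss_res                    # regression sum of squares

f0 = (ss_r / k) / (ss_res / (n - k - 1))
print(f0)  # compare with the upper critical value of F(k, n - k - 1)
```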
32 R² and Adjusted R² –R² always increases when a regressor is added to the model, regardless of the contribution of that variable. –The adjusted R²: R²_adj = 1 − (SS_Res/(n − p))/(SS_T/(n − 1)). –The adjusted R² will only increase on adding a variable to the model if the addition of the variable reduces the residual mean square.
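The contrast between R² and adjusted R² is easy to demonstrate by adding a pure-noise regressor to a fitted model (everything below is an illustrative construction, not data from the text):

```python
import numpy as np

# Sketch: R^2 never decreases when a regressor is added, even a useless one;
# adjusted R^2 = 1 - (SS_Res/(n-p)) / (SS_T/(n-1)) penalizes the extra term.
rng = np.random.default_rng(5)
n = 30
x1 = rng.normal(size=n)
noise = rng.normal(size=n)          # regressor unrelated to y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def fit_r2(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_t = np.sum((y - y.mean()) ** 2)
    n, p = X.shape
    r2 = 1 - ss_res / ss_t
    adj = 1 - (ss_res / (n - p)) / (ss_t / (n - 1))
    return r2, adj

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])
r2_s, adj_s = fit_r2(X_small, y)
r2_b, adj_b = fit_r2(X_big, y)
print(r2_b >= r2_s)  # True: R^2 cannot go down with an added regressor
```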
33 3.3.2 Tests on Individual Regression Coefficients For an individual regression coefficient: –H0: βj = 0 vs. H1: βj ≠ 0 –Let Cjj be the j-th diagonal element of (X'X)⁻¹. The test statistic: t0 = β̂j / √(σ̂²Cjj), which follows t(n − p) under H0. –This is a partial or marginal test because the estimate of βj depends on all of the other regressor variables. –This is a test of the contribution of xj given the other regressors in the model.
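The marginal t statistic can be sketched as follows; the data are synthetic, with one coefficient deliberately set to zero so that its t statistic should be small:

```python
import numpy as np

# Sketch: t0_j = beta_hat_j / sqrt(MS_Res * C_jj), with C_jj the j-th
# diagonal element of (X'X)^{-1}. True beta = (1, 2, 0), so the last
# coefficient is null by construction.
rng = np.random.default_rng(6)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
ms_res = np.sum((y - X @ beta_hat) ** 2) / (n - p)

se = np.sqrt(ms_res * np.diag(XtX_inv))  # standard error of each coefficient
t0 = beta_hat / se
print(t0)  # compare each |t0_j| with the t(n - p) critical value
```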
36 For the full model, the regression sum of squares is SS_R(β), with p degrees of freedom. Under the null hypothesis, the regression sum of squares for the reduced model is SS_R(β1), with p − r degrees of freedom. The regression sum of squares due to β2 given β1 is SS_R(β2|β1) = SS_R(β) − SS_R(β1). This is called the extra sum of squares due to β2, and its degrees of freedom are p − (p − r) = r. The test statistic: F0 = (SS_R(β2|β1)/r)/MS_Res.
37 If β2 ≠ 0, F0 follows a noncentral F distribution. Multicollinearity: when the columns of X1 and X2 are highly collinear, this test has essentially no power; it has maximal power when X1 and X2 are orthogonal to one another. Partial F test: given the regressors in X1, measure the contribution of the regressors in X2.
38 Consider y = β0 + β1x1 + β2x2 + β3x3 + ε. SS_R(β1|β0, β2, β3), SS_R(β2|β0, β1, β3), and SS_R(β3|β0, β1, β2) are single-degree-of-freedom sums of squares. SS_R(βj|β0, …, βj−1, βj+1, …, βk): the contribution of xj as if it were the last variable added to the model. This F test is equivalent to the t test. SS_T = SS_R(β1, β2, β3|β0) + SS_Res, and SS_R(β1, β2, β3|β0) = SS_R(β1|β0) + SS_R(β2|β1, β0) + SS_R(β3|β1, β2, β0).
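The sequential decomposition of SS_R(β1, β2, β3|β0) can be checked by fitting the nested models in order and differencing their regression sums of squares (data and coefficients below are illustrative):

```python
import numpy as np

# Sketch: SS_R(b1,b2,b3 | b0) = SS_R(b1|b0) + SS_R(b2|b1,b0) + SS_R(b3|b1,b2,b0),
# computed by fitting nested models: intercept only, then +x1, +x2, +x3.
rng = np.random.default_rng(7)
n = 40
x = rng.normal(size=(n, 3))
y = 1.0 + x @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

def ss_reg(X, y):
    # regression sum of squares about the mean for a model with intercept
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta
    return np.sum((y_hat - y.mean()) ** 2)

ones = np.ones((n, 1))
models = [np.column_stack([ones, x[:, :j]]) for j in range(4)]  # nested fits
ssr = [ss_reg(M, y) for M in models]        # ssr[0] = 0 (intercept only)

extra = [ssr[j] - ssr[j - 1] for j in range(1, 4)]  # one-df extra sums of squares
print(np.isclose(sum(extra), ssr[3]))       # the decomposition adds up
```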
42 3.3.4 Testing the General Linear Hypothesis Let T be an m × p matrix with rank(T) = r. Full model: y = Xβ + ε. Reduced model: y = Zγ + ε, where Z is an n × (p − r) matrix and γ is a (p − r) × 1 vector. The difference SS_H = SS_Res(RM) − SS_Res(FM), with r degrees of freedom, is called the sum of squares due to the hypothesis H0: Tβ = 0.
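A sketch of the general-linear-hypothesis test for one illustrative choice of T, namely H0: β1 = β2 (so r = 1 and the reduced model uses the single regressor x1 + x2 with a common coefficient); data are synthetic and generated with H0 true:

```python
import numpy as np

# Sketch: SS_H = SS_Res(reduced) - SS_Res(full), r = rank(T) df, and
# F0 = (SS_H / r) / (SS_Res(full) / (n - p)).
rng = np.random.default_rng(8)
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)  # beta1 = beta2, H0 holds

def ss_res(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ beta) ** 2)

X_full = np.column_stack([np.ones(n), x1, x2])  # full model, p = 3
Z_red = np.column_stack([np.ones(n), x1 + x2])  # reduced model under beta1 = beta2

ss_h = ss_res(Z_red, y) - ss_res(X_full, y)     # r = 1 degree of freedom
f0 = (ss_h / 1) / (ss_res(X_full, y) / (n - 3))
print(f0)  # compare with F(1, n - 3); tends to be small when H0 is true
```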