3.3 Hypothesis Testing in Multiple Linear Regression


3.3 Hypothesis Testing in Multiple Linear Regression Questions: What is the overall adequacy of the model? Which specific regressors seem important? Assume the errors are independent and follow a normal distribution with mean 0 and variance σ².

3.3.1 Test for Significance of Regression Determine whether there is a linear relationship between y and the regressors xj, j = 1, 2, …, k. The hypotheses are H0: β1 = β2 = … = βk = 0 versus H1: βj ≠ 0 for at least one j. ANOVA identity: SST = SSR + SSRes. Under H0, SSR/σ² ~ χ² with k degrees of freedom, SSRes/σ² ~ χ² with n − k − 1 degrees of freedom, and SSR and SSRes are independent, so the test statistic is F0 = MSR/MSRes = (SSR/k)/(SSRes/(n − k − 1)); reject H0 if F0 > Fα,k,n−k−1.
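As an illustrative sketch (not from the original slides), the overall F test can be computed directly from a least-squares fit in Python; the design matrix X (with a leading column of ones), the response y, and all names below are hypothetical placeholders.

```python
import numpy as np
from scipy import stats

def overall_f_test(X, y):
    """Overall significance test H0: beta_1 = ... = beta_k = 0.

    X is assumed to be n x p with a leading column of ones (p = k + 1).
    """
    n, p = X.shape
    k = p - 1
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimate
    ss_res = np.sum((y - X @ beta_hat) ** 2)       # SS_Res
    ss_t = np.sum((y - y.mean()) ** 2)             # SS_T (corrected total)
    ss_r = ss_t - ss_res                           # SS_R
    f0 = (ss_r / k) / (ss_res / (n - k - 1))       # F0 = MS_R / MS_Res
    p_value = stats.f.sf(f0, k, n - k - 1)
    return f0, p_value
```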

Under H1, F0 follows a noncentral F distribution with k and n − k − 1 degrees of freedom and a noncentrality parameter λ = β*′Xc′Xcβ*/σ², where β* = (β1, …, βk)′ and Xc is the centered regressor matrix; the larger the nonzero coefficients, the greater the power of the test.

ANOVA table:
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F0
Regression          | SSR            | k                  | MSR         | MSR/MSRes
Residual            | SSRes          | n − k − 1          | MSRes       |
Total               | SST            | n − 1              |             |

Example 3.3 The Delivery Time Data

R² and Adjusted R² R² always increases when a regressor is added to the model, regardless of how little that variable contributes. The adjusted R², R²adj = 1 − (SSRes/(n − p))/(SST/(n − 1)), increases on adding a variable to the model only if the addition reduces the residual mean square.
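A minimal sketch of R² and adjusted R², assuming hypothetical ss_res, ss_t, n, and p values such as those computed in the overall F test sketch above.

```python
def r_squared(ss_res, ss_t, n, p):
    """Plain and adjusted R^2 for a model with p parameters (intercept included)."""
    r2 = 1.0 - ss_res / ss_t
    r2_adj = 1.0 - (ss_res / (n - p)) / (ss_t / (n - 1))  # increases only if MS_Res drops
    return r2, r2_adj
```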

3.3.2 Tests on Individual Regression Coefficients For an individual regression coefficient the hypotheses are H0: βj = 0 vs. H1: βj ≠ 0. Let Cjj be the j-th diagonal element of (X′X)⁻¹. The test statistic is t0 = β̂j / sqrt(σ̂²Cjj) = β̂j / se(β̂j), which follows a t distribution with n − k − 1 degrees of freedom under H0; reject H0 if |t0| > tα/2,n−k−1. This is a partial or marginal test because the estimate of βj depends on all of the other regressors in the model; it tests the contribution of xj given the other regressors in the model.
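A sketch of the marginal t test on a single coefficient, again with hypothetical X, y, and column index j; Cjj is read off the diagonal of (X′X)⁻¹ exactly as in the slide.

```python
import numpy as np
from scipy import stats

def coef_t_test(X, y, j):
    """Marginal test H0: beta_j = 0 given the other regressors in the model."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    ms_res = np.sum((y - X @ beta_hat) ** 2) / (n - p)   # estimate of sigma^2
    se_j = np.sqrt(ms_res * XtX_inv[j, j])               # sqrt(sigma_hat^2 * C_jj)
    t0 = beta_hat[j] / se_j
    p_value = 2 * stats.t.sf(abs(t0), n - p)
    return t0, p_value
```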

Example 3.4 The Delivery Time Data

The subset of regressors: partition the coefficient vector as β = (β1′, β2′)′, where β1 is (p − r) × 1 and β2 is r × 1, so that y = Xβ + ε = X1β1 + X2β2 + ε. We wish to test H0: β2 = 0 versus H1: β2 ≠ 0.

For the full model, the regression sum of squares is SSR(β) = β̂′X′y, with p degrees of freedom, and MSRes = SSRes/(n − p). Under the null hypothesis the reduced model is y = X1β1 + ε, and its regression sum of squares is SSR(β1) = β̂1′X1′y, with p − r degrees of freedom. The regression sum of squares due to β2 given that β1 is already in the model is SSR(β2 | β1) = SSR(β) − SSR(β1). This is called the extra sum of squares due to β2, and its degrees of freedom are p − (p − r) = r. The test statistic is F0 = [SSR(β2 | β1)/r] / MSRes; reject H0 if F0 > Fα,r,n−p.

If β2 ≠ 0, F0 follows a noncentral F distribution with a noncentrality parameter that depends on β2. Multicollinearity: when X2 is nearly linearly dependent on X1 the noncentrality parameter is close to zero and this test has very little power. The test has maximal power when X1 and X2 are orthogonal to one another. Partial F test: given the regressors in X1, it measures the contribution of the regressors in X2.

Consider y = β0 + β1x1 + β2x2 + β3x3 + ε. SSR(β1 | β0, β2, β3), SSR(β2 | β0, β1, β3), and SSR(β3 | β0, β1, β2) are single-degree-of-freedom sums of squares. SSR(βj | β0, …, βj−1, βj+1, …, βk) measures the contribution of xj as if it were the last variable added to the model; this F test is equivalent to the t test. SST = SSR(β1, β2, β3 | β0) + SSRes and SSR(β1, β2, β3 | β0) = SSR(β1 | β0) + SSR(β2 | β1, β0) + SSR(β3 | β1, β2, β0).
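A sketch of the extra-sum-of-squares (partial F) test; the argument cols2 listing the columns of X that form X2 is a hypothetical convention, and both the full and reduced models are fit by ordinary least squares.

```python
import numpy as np
from scipy import stats

def partial_f_test(X, y, cols2):
    """Test H0: beta_2 = 0 for the regressors whose column indices are in cols2."""
    n, p = X.shape
    r = len(cols2)

    def ss_res(M):
        b = np.linalg.lstsq(M, y, rcond=None)[0]
        return np.sum((y - M @ b) ** 2)

    ss_res_full = ss_res(X)
    X1 = np.delete(X, cols2, axis=1)              # reduced model without X2
    ss_extra = ss_res(X1) - ss_res_full           # SS_R(beta_2 | beta_1)
    f0 = (ss_extra / r) / (ss_res_full / (n - p))
    p_value = stats.f.sf(f0, r, n - p)
    return f0, p_value
```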

Example 3.5 Delivery Time Data

3.3.3 Special Case of Orthogonal Columns in X Model: y = Xβ + ε = X1β1 + X2β2 + ε. Orthogonal: X1′X2 = 0. Since the normal equations (X′X)β̂ = X′y then separate, β̂1 = (X1′X1)⁻¹X1′y and β̂2 = (X2′X2)⁻¹X2′y, so β̂1 is the same whether or not X2 is in the model, and SSR(β2 | β1) = SSR(β2).

3.3.4 Testing the General Linear Hypothesis H0: Tβ = 0, where T is an m × p matrix of constants with rank(T) = r. Full model: y = Xβ + ε, with residual sum of squares SSRes(FM) on n − p degrees of freedom. Reduced model (obtained by imposing H0): y = Zγ + ε, where Z is an n × (p − r) matrix and γ is a (p − r) × 1 vector, with residual sum of squares SSRes(RM). The difference SSH = SSRes(RM) − SSRes(FM), with r degrees of freedom, is called the sum of squares due to the hypothesis H0: Tβ = 0.

The test statistic is F0 = (SSH/r) / (SSRes(FM)/(n − p)); reject H0 if F0 > Fα,r,n−p.

Another form: H0: Tβ = c vs. H1: Tβ ≠ c. Then F0 = (Tβ̂ − c)′[T(X′X)⁻¹T′]⁻¹(Tβ̂ − c) / (r·MSRes), which is again compared to Fα,r,n−p.
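A sketch of the general linear hypothesis test in the Tβ = c form above; T, c, and the other names are hypothetical inputs.

```python
import numpy as np
from scipy import stats

def general_linear_hypothesis(X, y, T, c):
    """Test H0: T beta = c, where T is r x p with full row rank."""
    n, p = X.shape
    r = T.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    ms_res = np.sum((y - X @ beta_hat) ** 2) / (n - p)
    d = T @ beta_hat - c
    f0 = d @ np.linalg.solve(T @ XtX_inv @ T.T, d) / (r * ms_res)  # quadratic form / (r * MS_Res)
    p_value = stats.f.sf(f0, r, n - p)
    return f0, p_value
```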

3.4 Confidence Intervals in Multiple Regression 3.4.1 Confidence Intervals on the Regression Coefficients Under the normality assumption, (β̂j − βj)/sqrt(σ̂²Cjj) follows a t distribution with n − p degrees of freedom, so a 100(1 − α)% confidence interval on βj is β̂j ± tα/2,n−p · sqrt(σ̂²Cjj) = β̂j ± tα/2,n−p · se(β̂j).

3.4.2 Confidence Interval Estimation of the Mean Response A confidence interval on the mean response at a particular point x0 = (1, x01, …, x0k)′. The unbiased estimator of E(y | x0) is ŷ0 = x0′β̂, with Var(ŷ0) = σ²x0′(X′X)⁻¹x0, so a 100(1 − α)% confidence interval on the mean response is ŷ0 ± tα/2,n−p · sqrt(σ̂²x0′(X′X)⁻¹x0).
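A sketch of the confidence interval on the mean response at a point x0 (which must include the leading 1); the names and the default α are illustrative.

```python
import numpy as np
from scipy import stats

def mean_response_ci(X, y, x0, alpha=0.05):
    """100(1 - alpha)% CI on E(y | x0); x0 must include the leading 1."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    ms_res = np.sum((y - X @ beta_hat) ** 2) / (n - p)
    y0_hat = x0 @ beta_hat
    half_width = stats.t.ppf(1 - alpha / 2, n - p) * np.sqrt(ms_res * x0 @ XtX_inv @ x0)
    return y0_hat - half_width, y0_hat + half_width
```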

Example 3.9 The Delivery Time Data

3.4.3 Simultaneous Confidence Intervals on Regression Coefficients A 100(1 − α)% joint confidence region for all the coefficients is the set of β satisfying (β̂ − β)′X′X(β̂ − β)/(p·MSRes) ≤ Fα,p,n−p, an elliptically shaped region.

Example 3.10 The Rocket Propellant Data

Another approach: construct intervals of the form β̂j ± Δ·se(β̂j), where Δ is chosen so that a specified probability that all intervals are simultaneously correct is obtained. Bonferroni method: Δ = tα/(2p),n−p. Scheffé S-method: Δ = (2Fα,p,n−p)^(1/2). Maximum modulus t procedure: Δ = uα,p,n−2, the upper α tail point of the distribution of the maximum absolute value of two independent Student t random variables, each based on n − 2 degrees of freedom.

Example 3.11 The Rocket Propellant Data Find 90% joint C.I. for β0 and β1 by constructing a 95% C.I. for each parameter.

The maximum modulus t intervals are shorter than the Bonferroni intervals. The confidence ellipse is always a more efficient procedure than the Bonferroni method because the volume of the ellipse is always less than the volume of the region covered by the Bonferroni intervals, but the Bonferroni intervals are easier to construct. Length of the confidence intervals: maximum modulus t < Bonferroni method < Scheffé S-method.
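A sketch of Bonferroni-style joint intervals on all p coefficients, using the multiplier tα/(2p),n−p from the slide; the names and the default α are illustrative.

```python
import numpy as np
from scipy import stats

def bonferroni_intervals(X, y, alpha=0.10):
    """Joint 100(1 - alpha)% Bonferroni intervals on all p coefficients."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    ms_res = np.sum((y - X @ beta_hat) ** 2) / (n - p)
    delta = stats.t.ppf(1 - alpha / (2 * p), n - p)     # Bonferroni multiplier
    se = np.sqrt(ms_res * np.diag(XtX_inv))
    return np.column_stack((beta_hat - delta * se, beta_hat + delta * se))
```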

3.5 Prediction of New Observations A 100(1 − α)% prediction interval for a future observation y0 at the point x0 is ŷ0 ± tα/2,n−p · sqrt(σ̂²(1 + x0′(X′X)⁻¹x0)); it is wider than the corresponding confidence interval on the mean response because it also accounts for the error in the new observation.

3.6 Hidden Extrapolation in Multiple Regression Be careful about extrapolating beyond the region containing the original observations! The rectangle formed by the ranges of the individual regressors is NOT the joint data region. Regressor variable hull (RVH): the convex hull of the original n data points. Interpolation: x0 ∈ RVH. Extrapolation: x0 ∉ RVH.

The diagonal elements hii of the hat matrix H = X(X′X)⁻¹X′ are useful in detecting hidden extrapolation. Let hmax be the maximum of the hii; the point xi with the largest value of hii lies on the boundary of the RVH, and {x | x′(X′X)⁻¹x ≤ hmax} is an ellipsoid enclosing all points inside the RVH. Let h00 = x0′(X′X)⁻¹x0. If h00 ≤ hmax, x0 is inside or on the boundary of this ellipsoid; if h00 > hmax, x0 is outside it and the prediction involves hidden extrapolation.
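A sketch of the hat-diagonal check for hidden extrapolation, comparing h00 with hmax as described above; X and the candidate point x0 are placeholders.

```python
import numpy as np

def is_hidden_extrapolation(X, x0):
    """Return True if x0 lies outside the ellipsoid {x : x'(X'X)^-1 x <= h_max}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # diagonal of the hat matrix
    h00 = x0 @ XtX_inv @ x0
    return h00 > h.max()
```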

MCE: minimum covering ellipsoid (Weisberg, 1985).

3.7 Standardized Regression Coefficients It is difficult to compare regression coefficients directly because their magnitudes depend on the units of the regressors. Unit normal scaling: standardize each regressor and the response as for a normal r.v., zij = (xij − x̄j)/sj and yi* = (yi − ȳ)/sy, where sj and sy are the sample standard deviations.

New model: yi* = b1zi1 + b2zi2 + … + bkzik + εi. There is no intercept. The least-squares estimator of b is b̂ = (Z′Z)⁻¹Z′y*.

Unit length scaling: wij = (xij − x̄j)/Sjj^(1/2) and yi0 = (yi − ȳ)/SST^(1/2), where Sjj = Σi(xij − x̄j)², so each scaled regressor has mean 0 and unit length.

New model: yi0 = b1wi1 + b2wi2 + … + bkwik + εi. The least-squares estimator is b̂ = (W′W)⁻¹W′y0, where W′W is the correlation matrix of the regressors.

It does not matter which scaling we use! They both produce the same set of dimensionless regression coefficients.
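A sketch of unit length scaling and the resulting dimensionless coefficients, assuming X holds only the k regressor columns (no column of ones); all names are illustrative.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Standardized (dimensionless) coefficients via unit length scaling."""
    Xc = X - X.mean(axis=0)
    W = Xc / np.sqrt(np.sum(Xc ** 2, axis=0))         # unit-length regressor columns
    y0 = (y - y.mean()) / np.sqrt(np.sum((y - y.mean()) ** 2))
    b_hat = np.linalg.solve(W.T @ W, W.T @ y0)        # no-intercept fit; W'W is the correlation matrix
    return b_hat
```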

3.8 Multicollinearity A serious problem: multicollinearity, or near-linear dependence among the regression variables. The regressors are the columns of X, so an exact linear dependence would result in a singular X′X.

Unit length scaling: with the regressors scaled to unit length, X′X becomes W′W, the correlation matrix of the regressors.

Soft drink data: the off-diagonal elements of W′W are usually called the simple correlations between the regressors.

Variance inflation factors (VIFs): the main diagonal elements of the inverse of the correlation matrix (the (W′W)⁻¹ above). From the two cases above: soft drink data, VIF1 = VIF2 = 3.12; Figure 3.12, VIF1 = VIF2 = 1. VIFj = 1/(1 − Rj²), where Rj² is the coefficient of multiple determination obtained from regressing xj on the other regressor variables. If xj is nearly linearly dependent on some of the other regressors, then Rj² ≈ 1 and VIFj will be large. VIFs greater than 10 signal serious multicollinearity problems.
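A sketch of variance inflation factors computed as the diagonal of (W′W)⁻¹, with W the unit-length-scaled regressor matrix; X (regressor columns only, no column of ones) is a placeholder.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of (W'W)^-1."""
    Xc = X - X.mean(axis=0)
    W = Xc / np.sqrt(np.sum(Xc ** 2, axis=0))     # unit length scaling
    return np.diag(np.linalg.inv(W.T @ W))        # values > 10 suggest serious multicollinearity
```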

Figure 3.13 (a): the fitted plane is unstable and very sensitive to relatively small changes in the data points. Figure 3.13 (b): orthogonal regressors give a stable fit.

3.9 Why Do Regression Coefficients Have the Wrong Sign? Possible reasons for a wrong sign: (1) the range of some of the regressors is too small; (2) important regressors have not been included in the model; (3) multicollinearity is present; (4) computational errors have been made.

For reason 1: in simple linear regression, for example, Var(β̂1) = σ²/Σ(xi − x̄)², so when the spread of the x values is small this variance is large and the estimated coefficient can easily take the wrong sign by chance.

Although it is possible to decrease the variance of the regression coefficients by increasing the range of the x's, it may not be desirable to spread the levels of the regressors out too far: the true response function may be nonlinear over a wider range, and a wide spread may be impractical or impossible. For reason 2: if an important regressor is omitted from the model, the coefficients of the regressors that remain are biased, and this bias can be large enough to produce the wrong sign.

For reason 3: multicollinearity inflates the variances of the coefficients, and this increases the probability that one or more regression coefficients will have the wrong sign. For reason 4: different computer programs handle round-off or truncation problems in different ways, and some programs are more effective than others in this regard.