Chapter 4 Multiple Regression

4.1 Introduction
In this chapter we consider the model with several explanatory variables,
y_i = α + β1 x1i + β2 x2i + ⋯ + βk xki + u_i
The errors u_i are again due to measurement errors in y and errors in the specification of the relationship between y and the x's. We make the same assumptions about u_i that we made in Chapter 3. These are:

4.1 Introduction
E(u_i) = 0 and V(u_i) = σ² for all i.
u_i and u_j are independent for all i ≠ j.
u_i and x_ji are independent for all i and j (the errors are independent of the explanatory variables).
u_i are normally distributed for all i.

4.1 Introduction
There are no linear dependencies among the explanatory variables; that is, none of the explanatory variables can be expressed as an exact linear function of the others. (This assumption will be relaxed in Chapter 7.) Also, it will be assumed that y is a continuous variable. (The case where it is observed as a dummy variable or as a truncated variable will be discussed in Chapter 8.)

4.2 A Model with Two Explanatory Variables
Consider the model
y_i = α + β1 x1i + β2 x2i + u_i    (4.1)
The assumptions we have made about the error term u imply that
E(u) = 0,  cov(x1, u) = 0,  cov(x2, u) = 0

4.2 A Model with Two Explanatory Variables
Let α̂, β̂1, and β̂2 be the estimators of α, β1, and β2, respectively. The sample counterpart of u_i is the residual
û_i = y_i − α̂ − β̂1 x1i − β̂2 x2i
The three equations to determine α̂, β̂1, and β̂2 are obtained by replacing the population assumptions by their sample counterparts:

4.2 A Model with Two Explanatory Variables
(1/n) Σ û_i = 0        (counterpart of E(u) = 0)
(1/n) Σ x1i û_i = 0    (counterpart of cov(x1, u) = 0)
(1/n) Σ x2i û_i = 0    (counterpart of cov(x2, u) = 0)

4.2 A Model with Two Explanatory Variables
The Least Squares Method
The least squares method says that we should choose the estimators α̂, β̂1, β̂2 of α, β1, β2 so as to minimize
Q = Σ (y_i − α̂ − β̂1 x1i − β̂2 x2i)²
Differentiate Q with respect to α̂, β̂1, and β̂2 and equate the derivatives to zero.

4.2 A Model with Two Explanatory Variables
We get
Σ (y_i − α̂ − β̂1 x1i − β̂2 x2i) = 0
Σ x1i (y_i − α̂ − β̂1 x1i − β̂2 x2i) = 0
Σ x2i (y_i − α̂ − β̂1 x1i − β̂2 x2i) = 0
These are the same three equations as the sample counterparts above. The first equation gives
α̂ = ȳ − β̂1 x̄1 − β̂2 x̄2    (4.5)
Substituting this into the other two equations leaves two equations in β̂1 and β̂2.
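
These three equations are linear in α̂, β̂1, and β̂2 and can also be solved directly in matrix form as (X′X)b = X′y. A minimal numerical sketch (the data values are made up for illustration and are reused in the later sketches):

```python
import numpy as np

# Made-up illustrative data: y and two explanatory variables x1, x2
y  = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 20.0, 22.0, 21.0])
x1 = np.array([ 2.0,  3.0,  4.0,  4.0,  6.0,  7.0,  8.0,  8.0])
x2 = np.array([ 1.0,  1.0,  2.0,  3.0,  3.0,  4.0,  4.0,  5.0])

# The three normal equations in matrix form: (X'X) b = X'y
X = np.column_stack([np.ones_like(y), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)   # b = (alpha_hat, beta1_hat, beta2_hat)
print(b)
```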

4.2 A Model with Two Explanatory Variables
We can simplify these equations by the use of the following notation. Let us define
S11 = Σ (x1i − x̄1)²    S22 = Σ (x2i − x̄2)²    S12 = Σ (x1i − x̄1)(x2i − x̄2)
S1y = Σ (x1i − x̄1)(y_i − ȳ)    S2y = Σ (x2i − x̄2)(y_i − ȳ)    Syy = Σ (y_i − ȳ)²
The two equations for β̂1 and β̂2 can then be written as
β̂1 S11 + β̂2 S12 = S1y    (4.7)
β̂1 S12 + β̂2 S22 = S2y    (4.8)

4.2 A Model with Two Explanatory Variables
Now we can solve these two equations to get β̂1 and β̂2. We get
β̂1 = (S1y S22 − S2y S12) / Δ
β̂2 = (S2y S11 − S1y S12) / Δ
where Δ = S11 S22 − S12². Once we obtain β̂1 and β̂2, we get α̂ from equation (4.5). We have
α̂ = ȳ − β̂1 x̄1 − β̂2 x̄2

4.2 A Model with Two Explanatory Variables
Thus the computational procedure is as follows:
1. Obtain all the means: ȳ, x̄1, x̄2.
2. Obtain all the sums of squares and sums of products: Σ x1i², Σ x2i², Σ x1i x2i, Σ x1i y_i, and so on.
3. Obtain S11, S12, S22, S1y, S2y, and Syy.
4. Solve equations (4.7) and (4.8) to get β̂1 and β̂2.
5. Substitute these in equation (4.5) to get α̂.
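
A minimal Python sketch of this five-step procedure, using the same made-up data as in the matrix sketch above; it produces the same coefficients, since both solve the same normal equations.

```python
import numpy as np

# Same made-up data as before
y  = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 20.0, 22.0, 21.0])
x1 = np.array([ 2.0,  3.0,  4.0,  4.0,  6.0,  7.0,  8.0,  8.0])
x2 = np.array([ 1.0,  1.0,  2.0,  3.0,  3.0,  4.0,  4.0,  5.0])

# Step 1: the means
y_bar, x1_bar, x2_bar = y.mean(), x1.mean(), x2.mean()

# Steps 2-3: sums of squares and cross products in deviation form
S11 = np.sum((x1 - x1_bar) ** 2)
S22 = np.sum((x2 - x2_bar) ** 2)
S12 = np.sum((x1 - x1_bar) * (x2 - x2_bar))
S1y = np.sum((x1 - x1_bar) * (y - y_bar))
S2y = np.sum((x2 - x2_bar) * (y - y_bar))
Syy = np.sum((y - y_bar) ** 2)

# Step 4: solve equations (4.7) and (4.8)
delta = S11 * S22 - S12 ** 2
beta1_hat = (S1y * S22 - S2y * S12) / delta
beta2_hat = (S2y * S11 - S1y * S12) / delta

# Step 5: recover the intercept from equation (4.5)
alpha_hat = y_bar - beta1_hat * x1_bar - beta2_hat * x2_bar
print(alpha_hat, beta1_hat, beta2_hat)
```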

4.2 A Model with Two Explanatory Variables
Two results about these estimators are used repeatedly below.
Result 1: α̂, β̂1, and β̂2 are unbiased estimators of α, β1, and β2.
Result 2: the sampling variances and covariance of the slope estimators are
V(β̂1) = σ² S22 / Δ = σ² / [S11 (1 − r12²)]
V(β̂2) = σ² S11 / Δ = σ² / [S22 (1 − r12²)]
cov(β̂1, β̂2) = −σ² S12 / Δ
where r12² = S12² / (S11 S22) is the squared correlation between x1 and x2.

4.2 A Model with Two Explanatory Variables
If σ̂² = Σ û_i² / (n − 3), then σ̂² is an unbiased estimator for σ². If we substitute σ̂² for σ² in the expressions in result 2, we get the estimated variances and covariances. The square roots of the estimated variances are called the standard errors (denoted SE). Then (α̂ − α)/SE(α̂), (β̂1 − β1)/SE(β̂1), and (β̂2 − β2)/SE(β̂2) each has a t-distribution with (n − 3) d.f. An example.
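
A minimal sketch of these calculations on the same made-up data (this is not the worked example referred to above, just an illustration of the formulas):

```python
import numpy as np

# Same made-up data as before
y  = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 20.0, 22.0, 21.0])
x1 = np.array([ 2.0,  3.0,  4.0,  4.0,  6.0,  7.0,  8.0,  8.0])
x2 = np.array([ 1.0,  1.0,  2.0,  3.0,  3.0,  4.0,  4.0,  5.0])
n = len(y)

# OLS estimates via the deviation sums of squares, as in the text
d1, d2, dy = x1 - x1.mean(), x2 - x2.mean(), y - y.mean()
S11, S22, S12 = np.sum(d1 ** 2), np.sum(d2 ** 2), np.sum(d1 * d2)
S1y, S2y = np.sum(d1 * dy), np.sum(d2 * dy)
delta = S11 * S22 - S12 ** 2
b1 = (S1y * S22 - S2y * S12) / delta
b2 = (S2y * S11 - S1y * S12) / delta
a = y.mean() - b1 * x1.mean() - b2 * x2.mean()

# Unbiased estimate of sigma^2: residual sum of squares over (n - 3)
u_hat = y - (a + b1 * x1 + b2 * x2)
sigma2_hat = np.sum(u_hat ** 2) / (n - 3)

# Estimated variances from result 2, standard errors, and t-ratios
se_b1 = np.sqrt(sigma2_hat * S22 / delta)
se_b2 = np.sqrt(sigma2_hat * S11 / delta)
print(se_b1, se_b2)
print(b1 / se_b1, b2 / se_b2)   # compare with a t distribution with n - 3 d.f.
```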

4.2 A Model with Two Explanatory Variables
Note that the higher the value of r12² (other things staying the same), the higher the variances of β̂1 and β̂2. If r12² is very high, we cannot estimate β1 and β2 with much precision.

4.2 A Model with Two Explanatory Variables
In the case of simple regression we also defined the following:
residual sum of squares = Σ û_i² = Syy − β̂ Sxy
explained sum of squares = β̂ Sxy

4.2 A Model with Two Explanatory Variables
The analogous expressions in multiple regression are
residual sum of squares = Σ û_i² = Syy − β̂1 S1y − β̂2 S2y
explained sum of squares = β̂1 S1y + β̂2 S2y
The proportion of the variance of y explained jointly by x1 and x2 is
R²_y.12 = (β̂1 S1y + β̂2 S2y) / Syy

4.2 A Model with Two Explanatory Variables
R²_y.12 is called the coefficient of multiple determination and its positive square root R_y.12 is called the multiple correlation coefficient. The first subscript is the explained variable. The subscripts after the dot are the explanatory variables. To avoid cumbersome notation we have written 1 and 2 instead of x1 and x2. Since it is only the x's that have numerical subscripts, there is no confusion in this notation.
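
A short sketch computing R²_y.12 on the same made-up data. Here R² is obtained as 1 − RSS/Syy, which equals the explained-sum-of-squares ratio given above:

```python
import numpy as np

# Same made-up data as before
y  = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 20.0, 22.0, 21.0])
x1 = np.array([ 2.0,  3.0,  4.0,  4.0,  6.0,  7.0,  8.0,  8.0])
x2 = np.array([ 1.0,  1.0,  2.0,  3.0,  3.0,  4.0,  4.0,  5.0])

# Fit y on a constant, x1 and x2 by least squares
X = np.column_stack([np.ones_like(y), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef

Syy = np.sum((y - y.mean()) ** 2)    # total sum of squares
RSS = np.sum((y - y_hat) ** 2)       # residual sum of squares
R2_y12 = 1.0 - RSS / Syy             # coefficient of multiple determination
R_y12 = np.sqrt(R2_y12)              # multiple correlation coefficient
print(R2_y12, R_y12)
```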

4.5 Partial Correlations and Multiple Correlation
If we have the explained variable y and three explanatory variables x1, x2, x3, and r²_y1, r²_y2, r²_y3 are the squares of the simple correlations between y and x1, x2, x3, respectively, then r²_y1, r²_y2, and r²_y3 measure the proportion of the variance in y that x1 alone, x2 alone, or x3 alone explains. On the other hand, R²_y.123 measures the proportion of the variance of y that x1, x2, x3 together explain. What is the relationship between the simple and multiple correlations?

4.5 Partial Correlations and Multiple Correlation
We would also like to measure something else. For instance, how much does x2 explain after x1 is included in the regression equation? How much does x3 explain after x1 and x2 are included? These are measured by the partial coefficients of determination r²_y2.1 and r²_y3.12, respectively. The variables after the dot are the variables already included.

4.5 Partial Correlations and Multiple Correlation
With three explanatory variables we have the following partial correlations:
r_y1.2, r_y1.3, r_y2.1, r_y2.3, r_y3.1, r_y3.2
These are called partial correlations of the first order. We also have three partial correlation coefficients of the second order:
r_y1.23, r_y2.13, r_y3.12
The variables after the dot are always the variables already included in the regression equation.

4.5 Partial Correlations and Multiple Correlation
The order of a partial correlation coefficient depends on the number of variables after the dot. The usual convention is to denote simple and partial correlations by a small r and multiple correlations by a capital R. For instance, R²_y.12, R²_y.13, R²_y.23, and R²_y.123 are all coefficients of multiple determination (their positive square roots are multiple correlation coefficients).
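
A standard way to compute these partial correlations numerically is to residualize: regress y and the candidate variable on the variables already included, then correlate the two sets of residuals. A minimal sketch (the data below are randomly generated and purely illustrative):

```python
import numpy as np

def residuals(v, controls):
    """Residuals from regressing v on a constant and the control variables."""
    X = np.column_stack([np.ones_like(v)] + list(controls))
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ coef

def partial_corr(y, x, controls):
    """Partial correlation of y and x, given the control variables."""
    return np.corrcoef(residuals(y, controls), residuals(x, controls))[0, 1]

# Made-up data with three explanatory variables
rng = np.random.default_rng(0)
x1, x2, x3 = rng.normal(size=(3, 50))
y = 1.0 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=50)

print(partial_corr(y, x2, [x1]))        # r_y2.1  (first order)
print(partial_corr(y, x3, [x1, x2]))    # r_y3.12 (second order)
```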

4.5 Partial Correlations and Multiple Correlation
Partial correlations are very important in deciding whether or not to include more explanatory variables. For instance, suppose that we have two explanatory variables x1 and x2, and r²_y2 is very high, say 0.95, but r²_y2.1 is very low, say 0.01. What this means is that if x2 alone is used to explain y, it can do a good job.

4.5 Partial Correlations and Multiple Correlation
But after x1 is included, x2 does not help any more in explaining y; that is, x1 has done the job of x2. In this case there is no use including x2. In fact, we can have a situation where, for instance, r²_y1 and r²_y2 are both very high, but r²_y1.2 and r²_y2.1 are both very low.

4.5 Partial Correlations and Multiple Correlation In this case each variable is highly correlated with y but the partial correlations are both very low. This is called multicollinearity and we will discuss this problem later in Chapter 7. In this example we can use x1 only or x2 only or some combination of the two as an explanatory variable.

4.5 Partial Correlations and Multiple Correlation For instance, suppose that x1 is the amount of skilled labor, x2 the amount of unskilled labor, and y the output. What the partial correlation coefficients suggest is that the separation of total labor into two components -- skilled and unskilled -- does not help us much in explaining output. So we might as well use x1 + x2 or total labor as the explanatory variable.
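
A small simulated illustration of this point (all numbers made up): x1 and x2 move closely together, each is highly correlated with y on its own, but the partial correlations are both small, and x1 + x2 alone explains y about as well as the pair.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Skilled and unskilled labor that move closely together (near-collinear)
total = rng.normal(10.0, 2.0, size=n)
x1 = 0.6 * total + 0.05 * rng.normal(size=n)           # skilled labor
x2 = 0.4 * total + 0.05 * rng.normal(size=n)           # unskilled labor
y = 2.0 + 1.0 * (x1 + x2) + 0.5 * rng.normal(size=n)   # output depends on total labor

def simple_r2(y, x):
    return np.corrcoef(y, x)[0, 1] ** 2

def partial_r2(y, x, control):
    # residualize y and x on a constant and the control, then correlate
    X = np.column_stack([np.ones_like(control), control])
    ry = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    rx = x - X @ np.linalg.lstsq(X, x, rcond=None)[0]
    return np.corrcoef(ry, rx)[0, 1] ** 2

print(simple_r2(y, x1), simple_r2(y, x2))              # both high
print(partial_r2(y, x1, x2), partial_r2(y, x2, x1))    # both much lower
print(simple_r2(y, x1 + x2))                           # total labor alone does about as well
```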

Assignment
Using the data from the teacher's web site, calculate the following three types of correlation:
multiple correlation
simple correlation
partial correlation

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables Until now we have assumed that the multiple regression equation we are estimating includes all the relevant explanatory variables. In practice, this is rarely the case. Sometimes some relevant variables are not included due to oversight or lack of measurements. At other times some irrelevant variables are included. What we would like to know is how our inferences change when these problems are present.

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables
Let us first consider the omission of relevant variables. Suppose that the true equation is
y = β1 x1 + β2 x2 + u    (4.15)
(for simplicity there is no constant term). Instead, we omit x2 and estimate the equation
y = β1 x1 + v
This will be referred to as the "misspecified model." The estimate of β1 we get is
β̂1 = Σ x1 y / Σ x1²

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables
Substituting the expression for y from equation (4.15) in this, we get
β̂1 = β1 + β2 (Σ x1 x2 / Σ x1²) + (Σ x1 u / Σ x1²)
Since E(Σ x1 u) = 0, we get
E(β̂1) = β1 + β2 b21
where b21 = Σ x1 x2 / Σ x1² is the regression coefficient from a regression of x2 on x1.

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables
Thus β̂1 is a biased estimator for β1 and the bias is given by
bias = β2 b21 = (coefficient of the excluded variable) × (regression coefficient in a regression of the excluded variable on the included variable)
If we denote the estimator for β1 from equation (4.15) by β̂1*, the variance of β̂1* is given by
V(β̂1*) = σ² / [Σ x1² (1 − r12²)]
where r12 is the correlation between x1 and x2.

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables
On the other hand,
V(β̂1) = σ² / Σ x1²
Thus β̂1 is a biased estimator but has a smaller variance than β̂1*. In fact, the variance would be considerably smaller if r12² is high. However, the estimated standard error need not be smaller for β̂1 than for β̂1*.

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables
This is because σ̂², the estimated variance of the error, can be higher in the misspecified model. It is given by the residual sum of squares divided by the degrees of freedom, and this can be higher (or lower) for the misspecified model.
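
A small Monte Carlo sketch of the omission result (all numbers illustrative): the estimate from the misspecified regression centers on β1 + β2·b21 rather than on β1.

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, beta2 = 1.0, 0.5
n, reps = 100, 2000

# Regressors held fixed across replications; no constant term, as in the text
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)           # x2 is correlated with x1
b21 = np.sum(x1 * x2) / np.sum(x1 ** 2)      # regression coefficient of x2 on x1

estimates = np.empty(reps)
for r in range(reps):
    u = rng.normal(size=n)
    y = beta1 * x1 + beta2 * x2 + u                   # true model: x2 is relevant
    estimates[r] = np.sum(x1 * y) / np.sum(x1 ** 2)   # misspecified: x2 omitted

print("average estimate of beta1:", estimates.mean())
print("beta1 + beta2 * b21:      ", beta1 + beta2 * b21)
```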

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables
Consider now the case of the inclusion of irrelevant variables. Suppose that the true equation is
y = β1 x1 + u
but we estimate the equation
y = β1 x1 + β2 x2 + v

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables
The least squares estimators β̂1 and β̂2 from the misspecified equation are given by
β̂1 = (S1y S22 − S2y S12) / Δ    β̂2 = (S2y S11 − S1y S12) / Δ
where S1y = Σ x1 y, S11 = Σ x1², and so on, and Δ = S11 S22 − S12². Since y = β1 x1 + u, we have
E(S1y) = β1 S11 and E(S2y) = β1 S12
Hence we get
E(β̂1) = β1 and E(β̂2) = 0
Thus we get unbiased estimates for both the parameters.

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables This result, coupled with the earlier results regarding the bias introduced by the omission of relevant variables might lead us to believe that it is better to include variables (when in doubt) rather than exclude them. However, this is not so, because though the inclusion of irrelevant variables has no effect on the bias of the estimator, it does affect the variances.

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables
The variance of β̂1*, the estimator of β1 from the correct equation, is given by
V(β̂1*) = σ² / Σ x1²
On the other hand, from the misspecified equation we have
V(β̂1) = σ² / [Σ x1² (1 − r12²)]
where r12 is the correlation between x1 and x2.

4.9 Omission of Relevant Variables and Inclusion of Irrelevant Variables
Thus V(β̂1) ≥ V(β̂1*) unless r12 = 0. Hence we will be getting unbiased but inefficient estimates by including the irrelevant variable. An example: omit or include variables.
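
A companion Monte Carlo sketch for the inclusion case (again with made-up numbers, not the example referred to above): when the irrelevant x2 is included, the estimate of β1 stays centered on β1, but its sampling variance is larger by roughly the factor 1/(1 − r12²).

```python
import numpy as np

rng = np.random.default_rng(3)
beta1 = 1.0
n, reps = 100, 2000

# x2 is irrelevant for y but correlated with x1
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
r12 = np.corrcoef(x1, x2)[0, 1]
X = np.column_stack([x1, x2])

b_short = np.empty(reps)   # correct equation: y on x1 only
b_long = np.empty(reps)    # misspecified equation: y on x1 and x2
for r in range(reps):
    y = beta1 * x1 + rng.normal(size=n)                  # true model: only x1 matters
    b_short[r] = np.sum(x1 * y) / np.sum(x1 ** 2)
    b_long[r] = np.linalg.lstsq(X, y, rcond=None)[0][0]

print("means:         ", b_short.mean(), b_long.mean())  # both near beta1
print("variance ratio:", b_long.var() / b_short.var())   # approx. 1 / (1 - r12**2)
print("1/(1 - r12^2): ", 1.0 / (1.0 - r12 ** 2))
```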