
Multiple Regression

In the previous section, we examined simple regression, which has just one independent variable on the right side of the equation. In this section, we consider multiple regression, in which there are two or more independent variables on the right side of the equation.

The true and estimated relations compare as follows.

Simple regression, true relation: $Y_i = \alpha + \beta X_i + \varepsilon_i$; estimated relation: $Y_i = a + b X_i + e_i$.

Multiple regression, true relation: $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$; estimated relation: $Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + e_i$.

The number of X's (independent variables) will be denoted k. We are estimating k + 1 parameters: the k $\beta$'s and the constant $\alpha$.

We make assumptions similar to the ones we used in simple regression: the Y values are independent of each other; the conditional distributions of Y given the X's are normal; and the conditional standard deviations of Y given the X's are equal for all values of the X's.

We continue to use OLS (ordinary least squares). Multiple regression is much more difficult to do with a hand calculator than simple regression, but computer programs perform it easily and quickly.
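For instance, here is a minimal sketch in Python of fitting a multiple regression by OLS with the statsmodels package; the data below are made up purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: n = 30 observations, k = 2 independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                    # columns X1 and X2
y = 5 + 1.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=30)

X_design = sm.add_constant(X)                   # prepend a column of 1s for the constant a
model = sm.OLS(y, X_design).fit()               # ordinary least squares
print(model.params)                             # estimates a, b1, b2
print(model.summary())                          # std. errors, t statistics, F, R^2, etc.
```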

As in simple regression, we have the fitted values $\hat{Y}_i = a + b_1 X_{1i} + \dots + b_k X_{ki}$, the residuals $e_i = Y_i - \hat{Y}_i$, and the sums of squares $\mathrm{SST} = \sum (Y_i - \bar{Y})^2$, $\mathrm{SSR} = \sum (\hat{Y}_i - \bar{Y})^2$, and $\mathrm{SSE} = \sum e_i^2$, which satisfy $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$.

The standard error of the regression, or standard error of the estimate, is $s_e = \sqrt{\mathrm{SSE}/(n - k - 1)}$. In simple regression, there was only one X, so k was 1 and our denominator was (n − 2). Here the denominator is generalized to (n − k − 1).

The regression ANOVA table is now:

Source of variation   Sum of squares   Degrees of freedom   Mean square
Regression            SSR              k                    MSR = SSR/k
Error                 SSE              n − k − 1            MSE = SSE/(n − k − 1)
Total                 SST              n − 1                MST = SST/(n − 1)

The hypotheses for testing the overall significance of the regression are
$H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$ (all the slope coefficients are zero)
$H_1$: at least one of the $\beta$'s is not zero.
The statistic for the test is $F = \mathrm{MSR}/\mathrm{MSE}$, which has an F distribution with k and n − k − 1 degrees of freedom when $H_0$ is true.
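As a sketch of this arithmetic in Python (all the numbers below are assumed for illustration):

```python
from scipy import stats

k, n = 2, 30                                # assumed: 2 independent variables, 30 observations
SSR, SSE = 25000.0, 8000.0                  # assumed sums of squares
MSR = SSR / k                               # mean square for regression
MSE = SSE / (n - k - 1)                     # mean square for error
F = MSR / MSE                               # statistic for H0: all slope coefficients are 0
F_crit = stats.f.ppf(0.99, k, n - k - 1)    # 1% critical value, k and n-k-1 dof
print(F, F_crit, F > F_crit)                # reject H0 if F exceeds the critical value
```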

We can also test whether a particular coefficient $\beta_j$ is zero (or any other specified value $\beta_{j0}$), using the t statistic $t = (b_j - \beta_{j0})/s_{b_j}$, which has n − k − 1 degrees of freedom. The calculation of $s_{b_j}$ is very messy, but $s_{b_j}$ is always given on computer output. We can do one-tailed and two-tailed tests.
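A sketch of the t test, again with assumed numbers:

```python
from scipy import stats

n, k = 30, 2                             # assumed sample size and number of X's
b_j, s_bj = 4.5, 1.2                     # assumed coefficient estimate and its standard error
t = (b_j - 0) / s_bj                     # statistic for H0: beta_j = 0
t_crit = stats.t.ppf(0.975, n - k - 1)   # two-tailed 5% critical value, n-k-1 dof
print(t, t_crit, abs(t) > t_crit)        # reject H0 if |t| exceeds the critical value
```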

Coefficient of determination: $R^2 = \mathrm{SSR}/\mathrm{SST} = 1 - \mathrm{SSE}/\mathrm{SST}$. $R^2$ adjusted or corrected for degrees of freedom: $\bar{R}^2 = 1 - \dfrac{\mathrm{SSE}/(n - k - 1)}{\mathrm{SST}/(n - 1)}$.
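These formulas, together with the standard error of the estimate from the earlier slide, are easy to compute once the sums of squares are known (the values below are assumed for illustration):

```python
import math

n, k = 30, 2                                          # assumed
SST, SSE = 33000.0, 8000.0                            # assumed sums of squares
SSR = SST - SSE
R2 = SSR / SST                                        # coefficient of determination
R2_adj = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))    # adjusted for degrees of freedom
s_e = math.sqrt(SSE / (n - k - 1))                    # standard error of the estimate
print(R2, R2_adj, s_e)
```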

Dummy Variables

Dummy variables enable us to explore the effects of qualitative rather than quantitative factors. Side note: cross-sectional data provide information on a number of households, individuals, firms, etc. at a particular point in time; time-series data give information on a particular household, firm, etc. at various points in time. Suppose, for example, we have cross-sectional data on income. Dummy variables can give us an understanding of how race, gender, and residence in an urban area affect income. If we have time-series data on expenditures, dummy variables can tell us about seasonal effects.

To capture the effects of a factor that has m categories, you need m − 1 dummy variables. Here are some examples. Gender: You are examining SAT scores. Since there are 2 gender categories, you need 1 gender variable to capture the effect of gender. If you include a variable that is 1 for male observations and 0 for females, the coefficient on that variable tells how male scores compare to female scores. In this case, female is the reference category. Race: You are examining salaries and you have data for 4 races: white, black, Asian, and Native American. You only need 3 dummy variables. You might define a variable that is 1 for blacks and 0 for non-blacks, a 2nd variable that is 1 for Asians and 0 for non-Asians, and a 3rd variable that is 1 for Native Americans and 0 for non-Native Americans. Then white would be the reference category, and the coefficients of the 3 race variables would tell how salaries for those groups compare to salaries for whites.
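A sketch of building the race dummies in Python with pandas (the data frame here is hypothetical):

```python
import pandas as pd

# Hypothetical cross-sectional data with a 4-category race variable.
df = pd.DataFrame({"race": ["white", "black", "asian", "native_american", "white"]})

# m = 4 categories -> m - 1 = 3 dummy variables, with white as the
# reference category (it gets no dummy of its own).
for category in ["black", "asian", "native_american"]:
    df[category] = (df["race"] == category).astype(int)
print(df)
```

pandas' get_dummies function with drop_first=True automates this, though it drops the alphabetically first category as the reference rather than letting you choose.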

Coefficient interpretation example: You have estimated the regression

$\widehat{SALARY} = a + 1.0\,EDUC + 2.0\,EXP - 5.0\,FEMALE$

where SALARY is measured in thousands of dollars, EDUC and EXP are education and experience, each measured in years, FEMALE is a dummy variable equal to 1 for females and 0 for males, and a is the estimated constant. The coefficients of the variables would be interpreted as follows. If there are two people with the same experience and gender, and one has 1 more unit of education (in this case, a year), that person would be expected to have a salary that is 1.0 units higher (in this case, 1.0 thousand dollars higher). If there are two people with the same education and gender, and one has 1 more year of experience, that person would be expected to have a salary that is 2.0 thousand dollars higher. If there are two people with the same education and experience, and one is male and one is female, the female is expected to have a salary that is 5.0 thousand dollars less.

Consider 4 people with the following characteristics: a table listing each person's education, experience, FEMALE dummy value, and the salary predicted by the estimated equation. Comparing the rows illustrates the coefficient interpretations. If two people have the same experience and gender, the one who has one more year of education would be expected to earn 1.0 thousand dollars more. If two people have the same education and gender, the one who has one more year of experience would be expected to earn 2.0 thousand dollars more. If two people have the same education and experience, the female would be expected to earn 5.0 thousand dollars less than the male.

Suppose you have regression results based on quarterly data for a particular household, where SPENDING and INCOME are in thousands of dollars. WINTER equals 1 if the quarter is winter and 0 if it is fall, spring, or summer; SPRING is 1 if the quarter is spring and 0 otherwise; SUMMER is 1 if the quarter is summer and 0 otherwise; so fall is the reference category. Suppose household income is 10 thousand dollars for all 4 quarters of a particular year. In the fall, spending would be expected to be 17 thousand dollars. In the spring, spending would be expected to be 2.0 thousand dollars higher than in the fall, or 19 thousand dollars. In the winter, spending would be expected to be 3.0 thousand dollars higher than in the fall, or 20 thousand dollars. In the summer, spending would be expected to be 1.0 thousand dollars less than in the fall, or 16 thousand dollars.
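A sketch of this seasonal arithmetic; the intercept and income coefficient below are hypothetical values chosen only so that fall spending at income 10 comes out to 17, as above:

```python
a, b = 7.0, 1.0    # hypothetical intercept and income coefficient (a + 10*b = 17)
income = 10.0

# Seasonal shifts relative to fall, the reference category.
seasonal = {"fall": 0.0, "spring": 2.0, "winter": 3.0, "summer": -1.0}
for season, shift in seasonal.items():
    print(season, a + b * income + shift)   # 17, 19, 20, 16 thousand dollars
```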

Example: You have run a regression with 30 observations. The dependent variable, WGT, is weight measured in pounds. The independent variables are HGT, height measured in inches, and a dummy variable, MALE, which is 1 if the person is male and 0 if the person is female. The results are as shown below. Answer the questions that follow.

variable    estimated coefficient    estimated std. error
CONSTANT    …                        …
HGT         …                        …
MALE        …                        …

source of variation   sum of squares   degrees of freedom   mean square
regression            25,…             2                    12,…
error                 8,…              27                   …
total                 33,…             29

1. Interpret the HGT coefficient. If there are 2 people of the same gender and one is an inch taller than the other, the taller one is expected to weigh more, by an amount equal to the HGT coefficient (in pounds).

2. Interpret the MALE coefficient. If there are 2 people of the same height, and one is male and one is female, the male is expected to weigh more, by an amount equal to the MALE coefficient (in pounds).

3. Calculate and interpret the coefficient of determination R². Also calculate the adjusted R². $R^2 = \mathrm{SSR}/\mathrm{SST} \approx 0.75$: about 75% of the variation in weight is explained by the regression on height and gender. The adjusted value is $\bar{R}^2 = 1 - \dfrac{\mathrm{SSE}/27}{\mathrm{SST}/29} = 1 - (1 - R^2)\,\dfrac{29}{27} \approx 0.73$.

4. Test at the 5% level whether the HGT coefficient is greater than zero. (Note that this is the alternative hypothesis, $H_1$: $\beta_{HGT} > 0$.) From our t table, we see that for 27 degrees of freedom and a one-tailed 5% critical region, our critical value is 1.703. Since the value of our statistic is 3.97, which lies in the critical region, we reject $H_0$ and accept $H_1$: the HGT coefficient is greater than zero.

5. Test at the 1% level whether the MALE coefficient is different from zero. (Note that this is the alternative hypothesis, $H_1$: $\beta_{MALE} \neq 0$.) From our t table, we see that for 27 degrees of freedom and a two-tailed 1% critical region, our critical values are −2.771 and 2.771. Since the value of our statistic is 2.89, which lies in the upper critical region, we reject $H_0$ and accept $H_1$: the MALE coefficient is different from zero.

6. Test the overall significance of the regression at the 1% level. From our F table, we see that for 2 and 27 degrees of freedom and a 1% critical region, our critical value is 5.49. Since the value of our statistic is 40.02, we reject $H_0$ and accept $H_1$: at least one of the slope coefficients is not zero.

Multicollinearity Problem

Multicollinearity arises when the independent variables (the X's) are highly correlated with each other. It is then not possible to separate the effects of these variables on the dependent variable Y. The slope coefficient estimates will tend to be unreliable, and often are not significantly different from zero. The simplest solution is to delete one of the correlated variables.
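A sketch of checking for multicollinearity in Python, using the pairwise correlation and the variance inflation factor (VIF) from statsmodels; the data are simulated so that the two X's are nearly collinear:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
mother = rng.normal(12, 2, size=100)              # mother's education (years)
father = mother + rng.normal(0, 0.5, size=100)    # nearly collinear with mother's
X = sm.add_constant(np.column_stack([mother, father]))

print(np.corrcoef(mother, father)[0, 1])          # pairwise correlation, near 1
for j in (1, 2):                                  # skip the constant column
    print(variance_inflation_factor(X, j))        # a large VIF (say > 10) signals trouble
```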

Example: You are exploring the factors influencing the number of children that a couple has. You have included as X's the mother's education and the father's education. You find that neither appears to be statistically significantly different from zero. This may occur because the two education variables are highly correlated. One option is to include the education of only one parent. Alternatively, you could replace the two education variables with a single variable, such as the average or the total education of the parents.

Problem of Autocorrelation or Serial Correlation

This is a problem that may arise in time-series data, but generally not in cross-sectional data. It occurs when successive observations of the dependent variable Y are not independent of each other. For example, if you are examining the weight of a particular person over time and that weight is particularly high in one period, it is likely to be high in the next period as well. The residuals therefore tend to be correlated among themselves (autocorrelated) rather than independent.

You can test for autocorrelation using the Durbin-Watson statistic $d = \dfrac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$. The Durbin-Watson statistic d is always between 0 and 4. When there is extreme negative autocorrelation, d will be near 4. When there is extreme positive autocorrelation, d will be near 0. When there is no autocorrelation problem, d will be near 2. In many statistical packages you can request that the Durbin-Watson statistic be provided as output. You can then look up critical values in a table to determine whether you have an autocorrelation problem.
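The statistic itself is a one-liner; here is a sketch with simulated residuals (a random walk, which is strongly positively autocorrelated, should give d near 0):

```python
import numpy as np

def durbin_watson(residuals):
    # d = sum_{t=2..n} (e_t - e_{t-1})^2  /  sum_{t=1..n} e_t^2
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(2)
e = np.cumsum(rng.normal(size=50))   # a random walk: strongly positively autocorrelated
print(durbin_watson(e))              # expect a value near 0
```

statsmodels also provides this statistic as statsmodels.stats.stattools.durbin_watson.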

The Durbin-Watson table provides two numbers, $d_L$ and $d_U$, corresponding to the number n of observations and the number k of explanatory variables (X's). Your textbook provides one-tailed values, so you can test for positive autocorrelation or for negative autocorrelation, but not for both at the same time. The null hypothesis is that there is no autocorrelation. The regions of the [0, 4] scale are interpreted as follows: from 0 to $d_L$, positive autocorrelation; from $d_L$ to $d_U$, inconclusive; from $d_U$ to $4 - d_U$, no autocorrelation problem; from $4 - d_U$ to $4 - d_L$, inconclusive; and from $4 - d_L$ to 4, negative autocorrelation.
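The decision rule can be written directly from these regions; a small sketch, with $d_L$ and $d_U$ to be taken from the table for your n and k:

```python
def dw_decision(d, dL, dU):
    # Classify a Durbin-Watson statistic d on the [0, 4] scale.
    if d < dL:
        return "positive autocorrelation"
    if d < dU:
        return "inconclusive"
    if d <= 4 - dU:
        return "no autocorrelation problem"
    if d <= 4 - dL:
        return "inconclusive"
    return "negative autocorrelation"

print(dw_decision(0.7, 0.83, 1.52))   # hypothetical d and table values -> positive autocorrelation
```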

Example: You have run a time-series regression with 25 observations and 4 independent variables and computed the Durbin-Watson statistic d. Test at the 1% level whether you have a positive autocorrelation problem. The Durbin-Watson table indicates that for 25 observations and 4 independent variables, $d_L = 0.83$, with a corresponding $d_U$. The computed d falls below $d_L = 0.83$, so you reject $H_0$: no autocorrelation and accept $H_1$: there is a positive autocorrelation problem. There are techniques for handling autocorrelation problems, but they are beyond the scope of this course.