8.1 Ch. 8 Multiple Regression (cont'd)
Topics:
F-tests: allow us to test joint hypotheses (tests involving one or more β coefficients).
Model specification:
1) What variables to include in the model: what happens when we omit a relevant variable, and what happens when we include an irrelevant variable?
2) What functional form to use?
Multicollinearity: what happens when some of the independent variables have a high degree of correlation with each other.
We will SKIP sections 8.5 and 8.6.2.

8.2 F-tests
Previously we conducted hypothesis tests on individual β coefficients using a t-test.
The new approach is the F-test: it is based on a comparison of the sum of squared residuals under the assumption that the null hypothesis is true, and then under the assumption that it is false. It is more general than the t-test because we can use it to test several coefficients jointly.
Unrestricted model: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \beta_4 x_{t4} + e_t$
Restricted model is something like: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$ or $y_t = \beta_1 + \beta_2 x_{t2} + e_t$

8.3 Types of Hypotheses that can be Tested with an F-Test
A. One of the β's is zero. When we remove an independent variable from the model, we are restricting its coefficient to be zero.
Unrestricted: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \beta_4 x_{t4} + e_t$
Restricted: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$
$H_0: \beta_4 = 0$; $H_1: \beta_4 \neq 0$
We already know how to conduct this test using a t-test; however, we could also test it with an F-test. Both tests should come to the same conclusion regarding $H_0$.

8.4 B. A proper subset of the slope coefficients are restricted to be zero:
Unrestricted: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \beta_4 x_{t4} + e_t$
Restricted: $y_t = \beta_1 + \beta_2 x_{t2} + e_t$
$H_0: \beta_3 = \beta_4 = 0$; $H_1$: at least one of $\beta_3, \beta_4$ is non-zero

8.5 C. All of the slope coefficients are restricted to be zero:
Unrestricted: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \beta_4 x_{t4} + e_t$
Restricted: $y_t = \beta_1 + e_t$
$H_0: \beta_2 = \beta_3 = \beta_4 = 0$; $H_1$: at least one of $\beta_2, \beta_3, \beta_4$ is non-zero
We call this a test of overall model significance. If we fail to reject $H_0$, our model has explained nothing; if we reject $H_0$, our model has explained something.

8.6 Let $SSE_R$ be the sum of squared residuals from the restricted model.
Let $SSE_U$ be the sum of squared residuals from the unrestricted model.
Let $J$ be the number of restrictions placed on the unrestricted model in constructing the restricted model.
Let $T$ be the number of observations in the data set.
Let $k$ be the number of right-hand-side variables, plus one for the intercept, in the unrestricted model.
Recall from Chapter 7 that the sum of squared residuals (SSE) for the model with fewer independent variables is always greater than or equal to the sum of squared residuals for the model with more independent variables.
The F-statistic has two degrees-of-freedom parameters: $J$ in the numerator and $T - k$ in the denominator.

8.7 Critical F: use the table on page 391 (5%) or page 392 (1%).
[Figure: F density with the 5% rejection region to the right of the critical value $F_c$.]
Suppose $J = 1$, $T = 30$, and $k = 3$. The critical F at the 5% level of significance is $F_c = 4.21$ (see page 391), meaning $P(F > 4.21) = 0.05$.
We calculate our F-statistic using this formula:
$F = \dfrac{(SSE_R - SSE_U)/J}{SSE_U/(T-k)}$
If $F > F_c$, we reject the null hypothesis $H_0$; if $F < F_c$, we fail to reject $H_0$. Note: F can never be negative.
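To make the decision rule concrete, here is a minimal Python sketch. The SSE values are invented for illustration; $J$, $T$, and $k$ match the example above, and scipy's F distribution supplies the critical value.

```python
from scipy import stats

SSE_R, SSE_U = 45.0, 38.0   # hypothetical restricted / unrestricted SSEs
J, T, k = 1, 30, 3          # restrictions, observations, unrestricted parameters

F = ((SSE_R - SSE_U) / J) / (SSE_U / (T - k))   # the F-statistic
Fc = stats.f.ppf(0.95, J, T - k)                # 5% critical value from F(J, T-k)

print(f"F = {F:.2f}, critical F = {Fc:.2f}")    # Fc is 4.21 for F(1, 27)
if F > Fc:
    print("Reject H0")
else:
    print("Fail to reject H0")
```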

8.8 Airline Cost Function: double-log model. See page 197, equation 8.14.
[SAS output: The REG Procedure, Model: MODEL1, Dependent Variable: lvc. The Analysis of Variance table supplies $SSE_U$ (the error sum of squares) and SST (the corrected total), along with Root MSE, R-Square, and Adj R-Sq. Parameter estimates are reported for Intercept, ly, lk, lpl, lpm, lpf, and lstage; all are significant at the 0.0001 level except lpm, whose p-value is not preserved in the transcript.]

8.9 Jointly test a proper subset of the slope coefficients:
$H_0: \beta_4 = \beta_5 = \beta_6 = 0$; $H_1$: at least one of $\beta_4, \beta_5, \beta_6$ is non-zero
Conduct the test.
[SAS output: The REG Procedure, Model: MODEL2, Dependent Variable: lvc. This restricted model drops lpl, lpm, and lpf, keeping Intercept, ly, lk, and lstage; its error sum of squares is $SSE_R$.]
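A sketch of how this joint test could be run in Python with statsmodels. The variable names follow the slides, but the data are simulated stand-ins (all coefficients invented), not the textbook's airline data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
T = 200
df = pd.DataFrame(rng.normal(size=(T, 6)),
                  columns=["ly", "lk", "lpl", "lpm", "lpf", "lstage"])
df["lvc"] = (0.8 * df["ly"] + 0.3 * df["lk"] + 0.2 * df["lpl"]
             + 0.1 * df["lpm"] + 0.15 * df["lpf"] - 0.1 * df["lstage"]
             + rng.normal(size=T))

unrestricted = smf.ols("lvc ~ ly + lk + lpl + lpm + lpf + lstage", data=df).fit()
restricted = smf.ols("lvc ~ ly + lk + lstage", data=df).fit()

# compare_f_test computes F = ((SSE_R - SSE_U)/J) / (SSE_U/(T - k))
f_stat, p_value, J = unrestricted.compare_f_test(restricted)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}, J = {J}")
```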

8.10 Test a single slope coefficient:
$H_0: \beta_5 = 0$; $H_1: \beta_5 \neq 0$
Conduct the test.
[SAS output: Dependent Variable: lvc. This restricted model drops lpm, keeping Intercept, ly, lk, lpl, lpf, and lstage; its error sum of squares is $SSE_R$.]
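Continuing the simulated sketch above: for a single restriction the F-test and the t-test agree, with $F = t^2$. This reuses the df and unrestricted fit from the previous block.

```python
no_lpm = smf.ols("lvc ~ ly + lk + lpl + lpf + lstage", data=df).fit()  # drops lpm
f_stat, p_value, J = unrestricted.compare_f_test(no_lpm)

t_stat = unrestricted.tvalues["lpm"]
print(f"F = {f_stat:.3f}  t^2 = {t_stat ** 2:.3f}")   # equal up to rounding
print(f"F p = {p_value:.4f}  t p = {unrestricted.pvalues['lpm']:.4f}")  # same
```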

8.11 Jointly test all of the slope coefficients:
$H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = 0$; $H_1$: at least one of $\beta_2, \ldots, \beta_7$ is non-zero.
To conduct the test that all the slope coefficients are zero, we do not estimate a restricted version of the model, because the restricted model has no independent variables on the right-hand side. The restricted model explains none of the variation in the dependent variable: its regression sum of squares ($SSR_R$) is 0, meaning the unexplained portion is everything, so $SSE_R = SST$. (SST is the same for the unrestricted and restricted models.)
Conduct the test.
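statsmodels reports this overall-significance F automatically. The sketch below (again reusing the simulated fit from the earlier block) reproduces it by hand from $SSE_R = SST$, confirming the identity described above.

```python
T = unrestricted.nobs
J = unrestricted.df_model          # number of slope coefficients (k - 1)
k = J + 1
SST = unrestricted.centered_tss    # SSE_R for the intercept-only model
SSE_U = unrestricted.ssr

F = ((SST - SSE_U) / J) / (SSE_U / (T - k))
print(F, unrestricted.fvalue)      # same number
```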

8.12 Additional Hypothesis Tests
EXAMPLE: $tr_t = \beta_1 + \beta_2 p_t + \beta_3 a_t + \beta_4 a_t^2 + e_t$
This model suggests that the effect of advertising ($a_t$) on total revenues ($tr_t$) is nonlinear; specifically, it is quadratic.
1) If we want to test the hypothesis that advertising has any effect on total revenues, we would test $H_0: \beta_3 = \beta_4 = 0$; $H_1$: at least one is non-zero. We would conduct the test using an F-test.
2) If we want to test (instead of assuming) that the effect of advertising on total revenues is quadratic, as opposed to linear, we would test $H_0: \beta_4 = 0$; $H_1: \beta_4 \neq 0$. We could conduct this test using the F-test or a simple t-test (the t-test is easier because we estimate only one model instead of two).
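A sketch of both tests on simulated data (the column names tr, p, a and every number are invented for illustration). Building an explicit squared column keeps the restricted-model comparison simple.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 78
adv = pd.DataFrame({"p": rng.uniform(2, 3, n), "a": rng.uniform(0, 3, n)})
adv["asq"] = adv["a"] ** 2
adv["tr"] = (50 + 10 * adv["p"] + 4 * adv["a"] - 0.9 * adv["asq"]
             + rng.normal(size=n))

full = smf.ols("tr ~ p + a + asq", data=adv).fit()

# 1) Any effect of advertising?  H0: beta3 = beta4 = 0  (joint F-test)
no_adv = smf.ols("tr ~ p", data=adv).fit()
f_stat, p_val, J = full.compare_f_test(no_adv)
print(f"F = {f_stat:.2f}, p = {p_val:.4g} (J = {J})")

# 2) Quadratic rather than linear?  H0: beta4 = 0  (t-test: one model suffices)
print(f"t = {full.tvalues['asq']:.2f}, p = {full.pvalues['asq']:.4g}")
```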

8.13 Model Specification
1) Functional form (Chapter 6).
2) Omitted variables: the exclusion of a variable that belongs in the model. Is there a problem?
True model: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$
The model we estimate: $y_t = \beta_1 + \beta_2 x_{t2} + e_t$
Aside from not being able to get an estimate of $\beta_3$, is there any problem with getting an estimate of $\beta_2$?
We use Formula A, the simple-regression estimator: $b_2 = \dfrac{\sum (x_{t2} - \bar{x}_2)(y_t - \bar{y})}{\sum (x_{t2} - \bar{x}_2)^2}$
We should have used Formula B, the multiple-regression estimator of $\beta_2$ from the model that includes $x_{t3}$: $b_2 = \dfrac{\sum x_{t2}^* y_t^* \sum x_{t3}^{*2} - \sum x_{t3}^* y_t^* \sum x_{t2}^* x_{t3}^*}{\sum x_{t2}^{*2} \sum x_{t3}^{*2} - \left(\sum x_{t2}^* x_{t3}^*\right)^2}$, where starred variables are deviations from their sample means.

8.14 It can be shown that $E(b_2) \neq \beta_2$, meaning that using Formula A (the bivariate formula for least squares) to estimate $\beta_2$ results in a biased estimate when the true model is a multiple regression (Formula B should have been used). In Ch. 4 we derived $E(b_2)$; here it is:
$E(b_2) = \beta_2 + \beta_3 \dfrac{\widehat{Cov}(x_2, x_3)}{\widehat{Var}(x_2)}$

8.15 Bias
Recap: when $b_2$ is calculated using Formula A (which assumes that $x_2$ is the only independent variable) while the true model is that $y_t$ is determined by $x_2$ and $x_3$, least squares will be biased: $E(b_2) \neq \beta_2$.
So not only do we fail to get an estimate of $\beta_3$ (the effect of $x_3$ on $y$); our estimate of $\beta_2$ (the effect of $x_2$ on $y$) is biased.
Recall that Assumption 5 implies that the independent variables in the regression model are uncorrelated with the error term. When we omit an independent variable, it is "thrown" into the error term. If the omitted variable is correlated with the included independent variables, Assumption 5 is violated and least squares is no longer an unbiased estimator.
However, if $x_2$ and $x_3$ are uncorrelated, $b_2$ is unbiased.
In general, the signs of $\beta_3$ and $Cov(x_2, x_3)$ determine the direction of the bias. A simulation is sketched below.
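The bias formula can be checked by simulation. In this sketch (all numbers invented), $\beta_2 = 2$, $\beta_3 = 5$, and $Cov(x_2, x_3)/Var(x_2) = 0.6$, so the short regression's slope should center near $2 + 5(0.6) = 5$.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, beta2, beta3 = 10_000, 2.0, 5.0

x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)          # Cov(x2, x3)/Var(x2) = 0.6
y = 1.0 + beta2 * x2 + beta3 * x3 + rng.normal(size=n)

short = sm.OLS(y, sm.add_constant(x2)).fit()                        # omits x3
full = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()

print(short.params[1])   # near 2 + 5 * 0.6 = 5.0: badly biased
print(full.params[1])    # near 2.0: unbiased
```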

8.16 Example of Omitted Variable Bias
The model we estimate: [equation not preserved in the transcript]
True model: [equation not preserved in the transcript]
Our estimated model using annual data for the U.S. economy: [estimates not preserved in the transcript]
A corrected model: [estimates not preserved in the transcript]

8.17 3) Inclusion of Irrelevant Variables
This error is not nearly as severe as omitting a relevant variable.
True model: $y_t = \beta_1 + \beta_2 x_{t2} + e_t$
The model we estimate: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$
In truth $\beta_3 = 0$, so our estimate $b_3$ should not be statistically different from zero. The only problem is that $Var(b_2)$ will be larger than it should be, so results may appear less significant. Remove $x_3$ from the model and we should see a decrease in $se(b_2)$. See the sketch below.
The formula we do use: $Var(b_2) = \dfrac{\sigma^2}{(1 - r_{23}^2)\sum (x_{t2} - \bar{x}_2)^2}$
The formula we should use: $Var(b_2) = \dfrac{\sigma^2}{\sum (x_{t2} - \bar{x}_2)^2}$
where $r_{23}$ is the sample correlation between $x_2$ and $x_3$.
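A companion sketch for the irrelevant-variable case (again with invented numbers): $b_2$ stays unbiased either way, but $se(b_2)$ shrinks once the irrelevant, collinear $x_3$ is dropped.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + 0.3 * rng.normal(size=n)   # irrelevant but collinear with x2
y = 1.0 + 2.0 * x2 + rng.normal(size=n)    # true beta3 = 0

with_x3 = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
without_x3 = sm.OLS(y, sm.add_constant(x2)).fit()

print(with_x3.bse[1], without_x3.bse[1])   # se(b2) falls when x3 is dropped
```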

8.18 Multicollinearity
Economic data usually come from an uncontrolled experiment, and many economic variables move together in systematic ways. Such variables are said to be collinear, and the problem is labeled collinearity, or multicollinearity when several variables are involved.
Consider a production relationship: certain factors of production, such as labor and capital, are used in relatively fixed proportions. Proportionate relationships between variables are the very sort of systematic relationships that epitomize "collinearity."
A related problem exists when the values of an explanatory variable do not vary or change much within the sample of data. When an explanatory variable exhibits little variation, it is difficult to isolate its impact.
We almost always have some multicollinearity; it is a matter of degree.

8.19 The Statistical Consequences of Collinearity
Whenever there are one or more exact linear relationships among the explanatory variables, we have exact (perfect) multicollinearity: least squares is not defined, and we cannot identify the separate effects.
When nearly exact linear dependencies (high correlations) among the X's exist, the variances of the least squares estimators may be large. The least squares estimator will then lack precision, producing small t-statistics (insignificant results) despite a possibly high $R^2$ or F-value indicating "significant" explanatory power of the model as a whole. Remember the Venn diagrams.

8.20 Identifying and Mitigating Collinearity
One simple way to detect collinear relationships is to use sample correlation coefficients. A rule of thumb: $|r_{ij}| > 0.8$ or $0.9$ indicates a strong linear association and a potentially harmful collinear relationship.
A second simple and effective procedure is to estimate so-called "auxiliary regressions," where the left-hand-side variable is one of the explanatory variables and the right-hand-side variables are all the remaining explanatory variables. If the $R^2$ from this artificial model is high (above 0.80), a large portion of the variation in that explanatory variable is explained by variation in the other explanatory variables, and multicollinearity is a problem. Both checks are sketched below.
One solution is to obtain more data. We may also add structure to the problem by introducing nonsample information in the form of restrictions on the parameters (e.g., drop some of the variables, meaning set their parameters to zero).
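Both detection tools in Python (variable names and numbers invented, mimicking the labor/capital example above): a sample correlation matrix, and an auxiliary regression whose $R^2$ also yields the variance inflation factor.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 100
labor = rng.normal(10.0, 2.0, n)
capital = 0.8 * labor + rng.normal(0.0, 0.5, n)   # near-fixed proportions
X = pd.DataFrame({"labor": labor, "capital": capital})

print(X.corr())   # rule of thumb: |r| > 0.8 flags a potentially harmful pair

# Auxiliary regression: one explanatory variable on all the others
aux = sm.OLS(X["labor"], sm.add_constant(X[["capital"]])).fit()
print(aux.rsquared)                # above 0.80 => collinearity is a problem
print(1.0 / (1.0 - aux.rsquared))  # the corresponding variance inflation factor
```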