Multiple Regression Models


Multiple Regression Models: Lecture 14

Today's plan
- How to read the estimated coefficients
- Functional form
- Testing the explanatory power of the model
- Adjustment to R²

Reading coefficients
- With a bivariate model we could easily determine how a change in X affects Y
- With a multivariate model, determining how a change in X2 affects Y is more complicated: you must hold X1 constant to isolate the effect of a change in X2 on Y
- For this reason we call the slope coefficients in a multivariate regression the partial regression coefficients

Reading coefficients: Example
- Back to our earnings and education example from L11.xls
- For our estimated multivariate regression equation, the expectation of Y is:
  E(Y) = 4.135 + 0.057 X1 + 0.023 X2
- If we hold age constant at 30, the expectation of Y becomes:
  E(Y) = 4.135 + 0.057 X1 + 0.023 (30) = 4.825 + 0.057 X1
- What we're doing here is looking at the relationship between education and earnings for 30-year-olds
- This can also be done for any other age, e.g. 50-year-olds:
  E(Y) = 4.135 + 0.057 X1 + 0.023 (50) = 5.285 + 0.057 X1
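The calculation above can be checked with a few lines of Python. The helper below is hypothetical, built only from the coefficients quoted on the slide (X1 = years of education, X2 = age); it is not part of the original lecture materials.

```python
def expected_earnings(educ, age, b0=4.135, b1=0.057, b2=0.023):
    """Fitted value E(Y) = b0 + b1*X1 + b2*X2, using the slide's estimates."""
    return b0 + b1 * educ + b2 * age

# Holding age constant shifts only the intercept; the education slope
# (the partial regression coefficient 0.057) is unchanged.
for age in (30, 50):
    print(age, expected_earnings(educ=12, age=age))
```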

Functional form
- Our example on earnings and years of education has some economic theory in its foundation, but it is basically an 'ad hoc' specification: we know we want to test the relationship between earnings and years of schooling
- Let's look at another example that is built directly on economic theory: the Cobb-Douglas production function
  Y = A L^α K^β
- If we want to test for constant returns to scale, the constraint is α + β = 1

Functional form (2)
- We can get the equation into a form we can estimate by taking logs:
  ln Y = ln A + α ln L + β ln K
- This is called the log-linear form, since all the variables are in logs
- The model is now linear in the parameters, so we can use least squares to estimate it
- The log-linear form gives us estimated coefficients that are elasticities: the estimates of α and β are the elasticities of output with respect to labor and capital
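As a sketch of what this estimation looks like in practice, the Python below fits the log-linear form by least squares on simulated data. The data and the "true" values 0.7 and 0.3 are made up for illustration; they are not the L14-1.xls figures.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.uniform(10, 100, 50)                                  # labor input
K = rng.uniform(10, 100, 50)                                  # capital input
Y = 2.0 * L**0.7 * K**0.3 * np.exp(rng.normal(0, 0.05, 50))   # output with noise

# Design matrix for ln Y = ln A + alpha*ln L + beta*ln K
X = np.column_stack([np.ones_like(L), np.log(L), np.log(K)])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
lnA, alpha, beta = coef
print(alpha, beta, alpha + beta)   # elasticities; CRS if the sum is near 1
```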

Example with longitudinal data
- L14-1.xls is on the web; it contains information on companies in the UK private sector (data from DATASTREAM; the US analogue is COMPUSTAT)
- Note that this is a longitudinal data set: we are analyzing the same agents (the companies) over time
- I have calculated the true output elasticity with respect to labor for a 100% change in labor, and again for a 10% change in labor
- Note that the larger the increase in the independent variable, the further the approximation is from the estimated coefficient

Example with longitudinal data (2)
- If we want to calculate the true proportional change in output from a proportional change in labor, we need to calculate:
  ΔY/Y = (1 + ΔL/L)^α − 1
- If we want to estimate the Cobb-Douglas production function, we use the partial slope coefficients
- For the two-regressor case the partial slope coefficients can be calculated as:
  b1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
  b2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
  (lower-case letters denote deviations from means)
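A small sketch of the approximation point, assuming a labor coefficient of α = 0.7 (a made-up value): the exact proportional change (1 + ΔL/L)^α − 1 and the linear approximation α·ΔL/L diverge as the change grows.

```python
alpha = 0.7  # assumed labor coefficient (illustrative, not from L14-1.xls)

for pct in (0.10, 1.00):              # a 10% and a 100% increase in labor
    exact = (1 + pct) ** alpha - 1    # true proportional change in Y
    approx = alpha * pct              # coefficient (linear) approximation
    print(f"dL/L={pct:.0%}: exact={exact:.4f}, approx={approx:.4f}")
```

For the 10% change the two numbers are close (0.0690 vs 0.0700); for the 100% change they are far apart (0.6245 vs 0.7000), which is the pattern the slide describes.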

Example with longitudinal data (3)
- Adding our estimates together, we find the sum α̂ + β̂
- Later on we'll test the constraint that α + β = 1

Phillips Curve
- The Phillips Curve is an example of ad-hoc variable inclusion
  [Figure: the Phillips Curve, with wage inflation (W) on the vertical axis and unemployment (Un) on the horizontal axis]
- The equation representing this relationship between unemployment and wage inflation takes a reciprocal form such as:
  W = β1 + β2 (1/Un)

Phillips Curve (2)
- With an ad-hoc specification we don't know what other variables are relevant
- We need to make informed guesses, guided by what we know of economic theory

The story so far
- Functional form
- Omitted variable bias
- Types of data:
  - Cross section: earnings and education
  - Panel/longitudinal: Cobb-Douglas
  - Time series: Phillips Curve

Variation in multivariate models
- Let our model be Y = a + b1 X1 + b2 X2 + e
- We still want to calculate the variances of the estimated coefficients, var(b1) and var(b2), and their standard errors
- The next slides show how to calculate these values

Variation in multivariate models (2)
- It still holds that the variance of the regression line is estimated by σ̂² = Σê² / (n − k), where k is the number of estimated parameters (here k = 3)
- It also still holds that the standard errors of the coefficients are the square roots of their variances
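A minimal sketch of these formulas in Python, assuming the usual matrix form of OLS; nothing here is specific to the lecture's data.

```python
import numpy as np

def residual_variance(y, X):
    """OLS residual variance s2 = RSS/(n-k) and coefficient covariance matrix."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    n, k = X.shape                        # k counts all estimated parameters
    s2 = resid @ resid / (n - k)          # variance of the regression line
    cov_b = s2 * np.linalg.inv(X.T @ X)   # var(b_j) on the diagonal
    return s2, cov_b
```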

Test statistics in multivariate models
- We will start with the sum of squares identity: Total = Explained + Residual, or TSS = ESS + RSS
- But the composition of the ESS will be different; with two regressors the sum of squares identity looks like this:
  Σy² = b1 Σx1y + b2 Σx2y + Σê²   (lower-case letters are deviations from means)
- As you add more independent variables to the model, more terms get added to the ESS

Test statistics in multivariate models (2)
- Our R² is: R² = ESS / TSS
- Now let's look back to an example from an earlier lecture: the returns to earnings of education (b1) and age (b2)
- We'll calculate the test statistics and consider model problems
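The decomposition and R² can be verified numerically; the sketch below uses simulated data, not the earnings example.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
x2 = rng.normal(size=40)
y = 1 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=40)

X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
ess = np.sum((fitted - y.mean()) ** 2)   # explained sum of squares
rss = np.sum((y - fitted) ** 2)          # residual sum of squares
print(np.isclose(tss, ess + rss))        # the identity holds
print("R^2 =", ess / tss)
```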

Test statistics in multivariate models (3)
- On an exam you may be asked to estimate the regression line, given a matrix of products and cross-products
- You will also be given the summary values needed for the calculations

Test statistics in multivariate models (4)
- The regression line we calculated earlier is:
  Ŷ = 4.135 + 0.057 X1 + 0.023 X2
- We can start our calculations with the residual variance σ̂² = Σê² / (n − k)
- Taking the square root, we find the root mean square error σ̂

Test statistics in multivariate models (5)
- We can then calculate var(b1), the variance of the education coefficient
- Taking the square root gives us its standard error, se(b1)

Test statistics in multivariate models (6)
- We can then calculate var(b2), the variance of the age coefficient
- Taking the square root gives its standard error, se(b2)

Hypothesis test on education
- We can also form a null hypothesis: H0: β1 = 0 against H1: β1 ≠ 0
- The t-ratio is calculated as t = b1 / se(b1)
- For a significance level of 5% we have a table t value of t(α/2, 33) = 2.035
- Since |t| < t(α/2, 33), we accept the null hypothesis
- Recall that the purpose of the test was to examine whether or not education has an effect on earnings. Can we accept this given what we know about economics?
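A sketch of this test in Python. The coefficient 0.057 comes from the slides, but the standard error (0.035) is a hypothetical value chosen only so the example reproduces the slide's conclusion; the actual estimate is not in the transcript.

```python
from scipy import stats

b1, se_b1 = 0.057, 0.035   # se_b1 is a placeholder, not the lecture's value
t_ratio = b1 / se_b1       # tests H0: beta1 = 0

t_crit = stats.t.ppf(0.975, df=33)   # two-sided 5% level, 33 df (~2.035)
print(abs(t_ratio) > t_crit)         # False here -> fail to reject H0
```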

Hypothesis test on age
- We construct another hypothesis test: H0: β2 = 0 against H1: β2 ≠ 0
- The t-ratio is calculated as t = b2 / se(b2)
- For a significance level of 5% we have a table t value of t(α/2, 33) = 2.035
- Since |t| > t(α/2, 33), we reject the null hypothesis

Looking at R²
- Our R² is roughly 0.2, which is rather low
- This means that the regression equation doesn't explain the variation in Y well: it explains only about 1/5 of the variation in Y

Looking at R² (2)
- What should we do about the form of our estimated equation when years of education is shown to be statistically insignificant at our chosen significance level?
- We chose a 5% significance level for our test, but we might have been able to reject the null at a different significance level
- Remember: with hypothesis tests we want to limit type I errors, where we falsely reject a true null

Testing explanatory power
- What if we examined the regression equation as a whole?
- To do so, we look at this null hypothesis: H0: β1 = β2 = 0
- This says that neither of the independent variables has any explanatory power
- To test this, we will use an F-test

Testing explanatory power (2)
- The F statistic that we're looking at can be found in the LINEST output
- The F-test comes from the ANOVA table for the multivariate case, which looks like this:

  Source      Sum of squares   df      Mean square
  Explained   ESS              k − 1   ESS / (k − 1)
  Residual    RSS              n − k   RSS / (n − k)
  Total       TSS              n − 1

Testing explanatory power (3)
- The F statistic will look like:
  F = [ESS / (k − 1)] / [RSS / (n − k)]
- Using the F table, you choose a significance level and use the degrees of freedom in the numerator and denominator, e.g. F(0.05, 2, 33)
- In the F table, the first row gives the df in the numerator, the first column gives the df in the denominator, and the second column gives the significance level
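A sketch of the joint test using the R² form of the F statistic. R² = 0.20 and n = 36 are rough values inferred from the surrounding slides (33 residual df with k = 3 parameters implies n = 36), so the computed F (about 4.1) differs slightly from the slide's F* of 3.81.

```python
from scipy import stats

R2, n, k = 0.20, 36, 3                        # rough values from the slides
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))     # F statistic in R^2 form

F_crit_5 = stats.f.ppf(0.95, dfn=2, dfd=33)   # ~3.29, the 5% critical value
F_crit_1 = stats.f.ppf(0.99, dfn=2, dfd=33)   # ~5.3, the 1% critical value
print(F > F_crit_5, F > F_crit_1)             # reject at 5%, not at 1%
```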

Testing explanatory power (4)
- If our calculated F statistic is greater than (to the right of) our F table value, we reject the null
- If our calculated F statistic is less than (to the left of) our F table value, we accept the null
  [Figure: F distribution with the table value marking the boundary: accept H0 to its left, reject H0 to its right]

Testing explanatory power (5)
- Looking at the F table, we find that there is no row for exactly 33 df, so we have to approximate using 30 df instead
- Our approximated F table value is F(0.05, 2, 33) ≈ 3.29
- We reject the null because F > F(0.05, 2, 33)
- Had we picked a 1% significance level, our F table value would be F(0.01, 2, 33) ≈ 5.27, and we would have accepted the null because F < F(0.01, 2, 33)

Testing explanatory power (6)
- In summary, we're more likely to reject the null at a greater significance level
- In this case, we rejected at the 5% significance level and accepted at the 1% level
  [Figure: F distribution with the calculated F* = 3.81 lying between the 5% critical value (3.29) and the 1% critical value (5.27)]

Testing explanatory power (7)
- The t-test suggests that we should remove years of education from our regression
- An F-test on the joint hypothesis rejects the null, but the result is weak: at a lower significance level (1%), we would have accepted the null
- In this instance, we want to keep the years of education variable in the equation because of what we know of economic theory
- What to do? Conclude that the economic theory is weak, obtain more data, and try again!

Adjustment to R²
- The more variables added to a regression, the higher R² will be
- R² is important, but it isn't the sole criterion for judging a model's explanatory power
- Adjusted R² adjusts for the loss in degrees of freedom associated with adding independent variables to the regression

Adjustment to R² (2)
- Adjusted R² is written as:
  Adj R² = 1 − (1 − R²) · (n − 1) / (n − k)
- n: sample size
- k: number of parameters in the regression
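As a quick check of the formula, using the rough values from this lecture (R² = 0.2, n = 36, k = 3):

```python
def adjusted_r2(r2, n, k):
    """Adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(adjusted_r2(0.20, 36, 3))   # ~0.1515, below the unadjusted 0.20
```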

What's next
- Restricted least squares and the Cobb-Douglas production function
- Including qualitative indicators in the regression equation (e.g. race, gender, marital status)