1Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x 2 +...  k x k + u 7. Specification and Data Problems.

Slides:



Advertisements
Similar presentations
Economics 20 - Prof. Anderson
Advertisements

Multiple Regression Analysis
Welcome to Econ 420 Applied Regression Analysis
1 Javier Aparicio División de Estudios Políticos, CIDE Otoño Regresión.
Random Assignment Experiments
Economics 20 - Prof. Anderson1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 7. Specification and Data Problems.
3.2 OLS Fitted Values and Residuals -after obtaining OLS estimates, we can then obtain fitted or predicted values for y: -given our actual and predicted.
Multiple Regression Analysis: Specification And Data Issues
Choosing a Functional Form
1Prof. Dr. Rainer Stachuletz Limited Dependent Variables P(y = 1|x) = G(  0 + x  ) y* =  0 + x  + u, y = max(0,y*)
1 Lecture 2: ANOVA, Prediction, Assumptions and Properties Graduate School Social Science Statistics II Gwilym Pryce
Linear Regression.
Economics Prof. Buckles1 Time Series Data y t =  0 +  1 x t  k x tk + u t 1. Basic Analysis.
Economics 20 - Prof. Anderson1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 6. Heteroskedasticity.
Prof. Dr. Rainer Stachuletz
Econ Prof. Buckles1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 1. Estimation.
1Prof. Dr. Rainer Stachuletz Simultaneous Equations y 1 =  1 y 2 +  1 z 1 + u 1 y 2 =  2 y 1 +  2 z 2 + u 2.
1Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 6. Heteroskedasticity.
Econ Prof. Buckles1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 4. Further Issues.
Multiple Regression Analysis
4. Multiple Regression Analysis: Estimation -Most econometric regressions are motivated by a question -ie: Do Canadian Heritage commercials have a positive.
Multiple Regression Analysis
1Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 4. Further Issues.
1Prof. Dr. Rainer Stachuletz Fixed Effects Estimation When there is an observed fixed effect, an alternative to first differences is fixed effects estimation.
Economics 20 - Prof. Anderson1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 2. Inference.
FIN357 Li1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 1. Estimation.
Multicollinearity Omitted Variables Bias is a problem when the omitted variable is an explanator of Y and correlated with X1 Including the omitted variable.
1 Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 3. Asymptotic Properties.
Economics 20 - Prof. Anderson
The Simple Regression Model
EC Prof. Buckles1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 2. Inference.
Empirical Estimation Review EconS 451: Lecture # 8 Describe in general terms what we are attempting to solve with empirical estimation. Understand why.
1Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 5. Dummy Variables.
1Prof. Dr. Rainer Stachuletz Time Series Data y t =  0 +  1 x t  k x tk + u t 1. Basic Analysis.
Economics Prof. Buckles
Chapter 15: Model Building
1Prof. Dr. Rainer Stachuletz Panel Data Methods y it =  0 +  1 x it  k x itk + u it.
So are how the computer determines the size of the intercept and the slope respectively in an OLS regression The OLS equations give a nice, clear intuitive.
3. Multiple Regression Analysis: Estimation -Although bivariate linear regressions are sometimes useful, they are often unrealistic -SLR.4, that all factors.
Nonlinear Regression Functions
Assessing Studies Based on Multiple Regression
Copyright © 2014 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
7.1 Multiple Regression More than one explanatory/independent variable This makes a slight change to the interpretation of the coefficients This changes.
Specification Error I.
TWO-VARIABLEREGRESSION ANALYSIS: SOME BASIC IDEAS In this chapter:
1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u.
2.4 Units of Measurement and Functional Form -Two important econometric issues are: 1) Changing measurement -When does scaling variables have an effect.
Multiple Linear Regression ● For k>1 number of explanatory variables. e.g.: – Exam grades as function of time devoted to study, as well as SAT scores.
1 Javier Aparicio División de Estudios Políticos, CIDE Primavera Regresión.
Chapter 13: Limited Dependent Vars. Zongyi ZHANG College of Economics and Business Administration.
1 Javier Aparicio División de Estudios Políticos, CIDE Primavera Regresión.
7.4 DV’s and Groups Often it is desirous to know if two different groups follow the same or different regression functions -One way to test this is to.
Chapter 4 The Classical Model Copyright © 2011 Pearson Addison-Wesley. All rights reserved. Slides by Niels-Hugo Blunch Washington and Lee University.
1 Prof. Dr. Rainer Stachuletz Summary and Conclusions Carrying Out an Empirical Project.
Chap 6 Further Inference in the Multiple Regression Model
1 Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 1. Estimation.
1Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 2. Inference.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Multiple Regression Model Building Statistics for Managers.
Economics 20 - Prof. Anderson1 Time Series Data y t =  0 +  1 x t  k x tk + u t 1. Basic Analysis.
AUTOCORRELATION 1 Assumption B.5 states that the values of the disturbance term in the observations in the sample are generated independently of each other.
4-1 MGMG 522 : Session #4 Choosing the Independent Variables and a Functional Form (Ch. 6 & 7)
Multiple Regression Analysis: Further Issues
More on Specification and Data Issues
Multiple Regression Analysis
More on Specification and Data Issues
Multiple Regression Analysis: Further Issues
Some issues in multivariate regression
Tutorial 1: Misspecification
Chapter 7: The Normality Assumption and Inference with OLS
More on Specification and Data Issues
Presentation transcript:

1Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 7. Specification and Data Problems

2Prof. Dr. Rainer Stachuletz Functional Form We’ve seen that a linear regression can really fit nonlinear relationships Can use logs on RHS, LHS or both Can use quadratic forms of x’s Can use interactions of x’s How do we know if we’ve gotten the right functional form for our model?

3Prof. Dr. Rainer Stachuletz Functional Form (continued) First, use economic theory to guide you Think about the interpretation Does it make more sense for x to affect y in percentage (use logs) or absolute terms? Does it make more sense for the derivative of x 1 to vary with x 1 (quadratic) or with x 2 (interactions) or to be fixed?

4Prof. Dr. Rainer Stachuletz Functional Form (continued) We already know how to test joint exclusion restrictions to see if higher order terms or interactions belong in the model It can be tedious to add and test extra terms, plus may find a square term matters when really using logs would be even better A test of functional form is Ramsey’s regression specification error test (RESET)

5Prof. Dr. Rainer Stachuletz Ramsey’s RESET RESET relies on a trick similar to the special form of the White test Instead of adding functions of the x’s directly, we add and test functions of ŷ So, estimate y =  0 +  1 x 1 + … +  k x k +  1 ŷ 2 +  1 ŷ 3 +error and test H 0 :  1 = 0,  2 = 0 using F~F 2,n-k-3 or LM~χ 2 2

6Prof. Dr. Rainer Stachuletz Nonnested Alternative Tests If the models have the same dependent variables, but nonnested x’s could still just make a giant model with the x’s from both and test joint exclusion restrictions that lead to one model or the other An alternative, the Davidson-MacKinnon test, uses ŷ from one model as regressor in the second model and tests for significance

7Prof. Dr. Rainer Stachuletz Nonnested Alternatives (cont) More difficult if one model uses y and the other uses ln(y) Can follow same basic logic and transform predicted ln(y) to get ŷ for the second step In any case, Davidson-MacKinnon test may reject neither or both models rather than clearly preferring one specification

8Prof. Dr. Rainer Stachuletz Proxy Variables What if model is misspecified because no data is available on an important x variable? It may be possible to avoid omitted variable bias by using a proxy variable A proxy variable must be related to the unobservable variable – for example: x 3 * =  0 +  3 x 3 + v 3, where * implies unobserved Now suppose we just substitute x 3 for x 3 *

9Prof. Dr. Rainer Stachuletz Proxy Variables (continued) What do we need for for this solution to give us consistent estimates of  1 and  2 ? E(x 3 * | x 1, x 2, x 3 ) = E(x 3 * | x 3 ) =  0 +  3 x 3 That is, u is uncorrelated with x 1, x 2 and x 3 * and v 3 is uncorrelated with x 1, x 2 and x 3 So really running y = (  0 +  3  0 ) +  1 x 1 +  2 x 2 +  3  3 x 3 + (u +  3 v 3 ) and have just redefined intercept, error term x 3 coefficient

10Prof. Dr. Rainer Stachuletz Proxy Variables (continued) Without out assumptions, can end up with biased estimates Say x 3 * =  0 +  1 x 1 +  2 x 2 +  3 x 3 + v 3 Then really running y = (  0 +  3  0 ) + (  1 +  3  1 ) x 1 + (  2 +  3  2 ) x 2 +  3  3 x 3 + (u +  3 v 3 ) Bias will depend on signs of  3 and  j This bias may still be smaller than omitted variable bias, though

11Prof. Dr. Rainer Stachuletz Lagged Dependent Variables What if there are unobserved variables, and you can’t find reasonable proxy variables? May be possible to include a lagged dependent variable to account for omitted variables that contribute to both past and current levels of y Obviously, you must think past and current y are related for this to make sense

12Prof. Dr. Rainer Stachuletz Measurement Error Sometimes we have the variable we want, but we think it is measured with error Examples: A survey asks how many hours did you work over the last year, or how many weeks you used child care when your child was young Measurement error in y different from measurement error in x

13Prof. Dr. Rainer Stachuletz Measurement Error in a Dependent Variable Define measurement error as e 0 = y – y* Thus, really estimating y =  0 +  1 x 1 + …+  k x k + u + e 0 When will OLS produce unbiased results? If e 0 and x j, u are uncorrelated is unbiased If E(e 0 ) ≠ 0 then  0 will be biased, though While unbiased, larger variances than with no measurement error

14Prof. Dr. Rainer Stachuletz Measurement Error in an Explanatory Variable Define measurement error as e 1 = x 1 – x 1 * Assume E(e 1 ) = 0, E(y| x 1 *, x 1 ) = E(y| x 1 *) Really estimating y =  0 +  1 x 1 + (u –  1 e 1 ) The effect of measurement error on OLS estimates depends on our assumption about the correlation between e 1 and x 1 Suppose Cov(x 1, e 1 ) = 0 OLS remains unbiased, variances larger

15Prof. Dr. Rainer Stachuletz Measurement Error in an Explanatory Variable (cont) Suppose Cov(x 1 *, e 1 ) = 0, known as the classical errors-in-variables assumption, then Cov(x 1, e 1 ) = E(x 1 e 1 ) = E(x 1 * e 1 ) + E(e 1 2 ) = 0 +  e 2 x 1 is correlated with the error so estimate is biased

16Prof. Dr. Rainer Stachuletz Measurement Error in an Explanatory Variable (cont) Notice that the multiplicative error is just Var(x 1 *)/Var(x 1 ) Since Var(x 1 *)/Var(x 1 ) < 1, the estimate is biased toward zero – called attenuation bias It’s more complicated with a multiple regression, but can still expect attenuation bias with classical errors in variables

17Prof. Dr. Rainer Stachuletz Missing Data – Is it a Problem? If any observation is missing data on one of the variables in the model, it can’t be used If data is missing at random, using a sample restricted to observations with no missing values will be fine A problem can arise if the data is missing systematically – say high income individuals refuse to provide income data

18Prof. Dr. Rainer Stachuletz Nonrandom Samples If the sample is chosen on the basis of an x variable, then estimates are unbiased If the sample is chosen on the basis of the y variable, then we have sample selection bias Sample selection can be more subtle Say looking at wages for workers – since people choose to work this isn’t the same as wage offers

19Prof. Dr. Rainer Stachuletz Outliers Sometimes an individual observation can be very different from the others, and can have a large effect on the outcome Sometimes this outlier will simply be do to errors in data entry – one reason why looking at summary statistics is important Sometimes the observation will just truly be very different from the others

20Prof. Dr. Rainer Stachuletz Outliers (continued) Not unreasonable to fix observations where it’s clear there was just an extra zero entered or left off, etc. Not unreasonable to drop observations that appear to be extreme outliers, although readers may prefer to see estimates with and without the outliers Can use Stata to investigate outliers