Quantitative Methods: Heteroskedasticity

Heteroskedasticity OLS assumes homoskedastic error terms. In OLS, the data are homoskedastic if the error term has constant variance. If the variance of the error terms is not constant, the error terms are related to some variable (or set of variables), or to case number. The data are then heteroskedastic.

Heteroskedasticity Example (from Wikipedia, I confess; it has relevant graphs which are easily pasted!) Note: as X increases, the variance of the error term increases (the "goodness of fit" gets worse).
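
The graph itself does not survive in this transcript, but the pattern it showed is easy to reproduce. A minimal sketch in Python (numpy/matplotlib; all variable names here are illustrative) that simulates data whose error variance grows with X:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    n = 200
    x = rng.uniform(1, 10, n)

    # The error standard deviation grows with x, so the spread of points
    # around the regression line widens as x increases (heteroskedasticity).
    e = rng.normal(0, 0.5 * x)
    y = 2.0 + 1.5 * x + e

    plt.scatter(x, y, s=10)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("Error variance increasing with x")
    plt.show()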

Heteroskedasticity As you can see from the graph, the "b" (the parameter estimate: the estimated slope, or effect of X on Y) will not necessarily change. However, heteroskedasticity changes the standard errors of the b's, making us more or less confident in our slope estimates than we should be.

Heteroskedasticity Note that whether one is more confident or less confident depends in large part on the distribution of the data. If there is relatively poor goodness of fit near the mean of X, where most of the data points tend to be, then it is likely that you will be less confident in your slope estimates than you would be otherwise. If the data fit the line relatively well near the mean of X, then it is likely that you will be more confident in your slope estimates than you would be otherwise.

Heteroskedasticity: why? Learning? Either your coders learn (in which case you have measurement error), or your cases actually learn. For example, if you are predicting wages with experience, it is likely that the variance is reduced among those with more experience.

Heteroskedasticity: why? Scope of choice: some subsets of your data may have more discretion. So, if you want to predict saving behavior with wealth, wealthier individuals might show greater variance in their behavior.

Heteroskedasticity Heteroskedasticity is very common in pooled data, which makes sense: some phenomenon (e.g., voting) may be more predictable in some states than in others.

Heteroskedasticity But note that what looks like heteroskedasticity could actually be measurement error (improving or deteriorating, thus causing differences in goodness of fit), or a specification issue (you have failed to control for something which might account for how predictable your dependent variable is across different subsets of the data).

Heteroskedasticity Tests The tests for heteroskedasticity tend to incorporate the same basic idea: figuring out, through an auxiliary regression analysis, whether the independent variables (or case number, or some combination of independent variables) have a significant relationship to the goodness of fit of the model.

Heteroskedasticity Tests In other words, all of the tests seek to answer the question: Does my model fit the data better in some places than in others? Is the goodness of fit significantly better at low values of some independent variable X? Or at high values? Or in the mid-range of X? Or in some subsets of data?

Heteroskedasticity Tests Also note that no single test is definitive—in part because, as observed in class, there could be problems with the auxiliary regressions themselves. We’ll examine just a few tests, to give you the basic idea.

Heteroskedasticity Tests The first thing you could do is simply examine your data in a scatterplot. Of course, it is time-consuming to examine all the possible ways in which your data could be heteroskedastic (that is, relative to each X, to combinations of X's, to case number, to variables that aren't in the model such as the pooling unit, etc.).
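
For a quick visual check, one can plot the OLS residuals against the fitted values (or against each X); a fan shape that widens or narrows is the classic symptom. A sketch using statsmodels, continuing with the simulated x and y from the sketch above:

    import statsmodels.api as sm
    import matplotlib.pyplot as plt

    X = sm.add_constant(x)          # x, y from the simulation sketch above
    fit = sm.OLS(y, X).fit()

    # Plot residuals against fitted values; a widening fan suggests
    # the error variance grows with the predicted level of y.
    plt.scatter(fit.fittedvalues, fit.resid, s=10)
    plt.axhline(0, color="gray")
    plt.xlabel("fitted values")
    plt.ylabel("residuals")
    plt.show()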

Heteroskedasticity Tests Another test is the Goldfeld-Quandt. The Goldfeld-Quandt essentially asks you to compare the goodness of fit of two areas of your data. Disadvantages: you need to have pre-selected an X that you think is correlated with the variance of the error term, and G-Q assumes a monotonic relationship between X and the variance of the error term. That is, it will only work to diagnose heteroskedasticity where the goodness of fit at low levels of X is different from the goodness of fit at high levels of X (as in the graph above). It won't work to diagnose heteroskedasticity where the goodness of fit in the mid-range of X is different from the goodness of fit at both the low end and the high end of X.

Heteroskedasticity Tests Goldfeld-Quandt test: steps. First, order the n cases by the X that you think is correlated with the error variance. Then, drop a section of c cases out of the middle (one-fifth is a reasonable number). Then, run separate regressions on the upper and lower samples. You will then be able to compare the "goodness of fit" between the two subsets of your data.

Heteroskedasticity Tests Obtain the residual sum of squares from each regression (ESS-1 and ESS-2). Then, calculate GQ = (ESS-1 / df-1) / (ESS-2 / df-2), which has an F distribution with (df-1, df-2) degrees of freedom.

Heteroskedasticity Tests The numerator represents the residual "mean square" from the first regression, that is, ESS-1 / df. The df (degrees of freedom) are n-k-1, where "n" is the number of cases in that first subset of data, and k is the number of independent variables (the 1 is for the intercept estimate).

Heteroskedasticity Tests The denominator represents the residual "mean square" from the second regression, that is, ESS-2 / df. The df (degrees of freedom) are n-k-1, where "n" is the number of cases in that second subset of data, and k is the number of independent variables (the 1 is for the intercept estimate).

Heteroskedasticity Tests Note that the F test is useful in comparing the goodness of fit of two sets of data. How would we know if the goodness of fit was significantly different across the two subsets of data? By comparing them (as in the ratio above), we can see if one goodness of fit is significantly better than the other (accounting for degrees of freedom: sample size, number of variables, etc.). In other words, if GQ is significantly greater or less than 1, then "ESS-1 / df" is significantly greater or less than "ESS-2 / df"; that is, we have evidence of heteroskedasticity.
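
statsmodels provides this test as het_goldfeldquandt; a minimal sketch, again using the simulated x and y from above (the choice of sort column and the drop fraction are up to you):

    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_goldfeldquandt

    X = sm.add_constant(x)

    # Sort the cases on the column of X suspected of driving the error
    # variance (idx=1 here), drop the middle 20% of cases, run the two
    # sub-regressions, and return the F statistic and its p-value.
    gq_stat, p_value, ordering = het_goldfeldquandt(y, X, idx=1, drop=0.2)
    print(gq_stat, p_value)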

Heteroskedasticity Tests A second test is the Glejser test. Perform the regression analysis and save the residuals. Regress the absolute value of the residuals on possible sources of heteroskedasticity. A significant coefficient indicates heteroskedasticity.

Heteroskedasticity Tests Glejser test This makes sense conceptually—you are testing to see if one of your independent variables is significantly related to the variance of your residuals.
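
statsmodels has no canned Glejser routine, but the auxiliary regression is short to write by hand; a sketch, continuing with the simulated data from above:

    import numpy as np
    import statsmodels.api as sm

    # Fit the original model and save the residuals.
    fit = sm.OLS(y, sm.add_constant(x)).fit()

    # Regress |residuals| on the suspected source of heteroskedasticity;
    # a significant slope coefficient indicates heteroskedasticity.
    aux = sm.OLS(np.abs(fit.resid), sm.add_constant(x)).fit()
    print(aux.params, aux.pvalues)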

Heteroskedasticity Tests White's Test Regress the squared residuals (as the dependent variable) on... all the X variables, all the cross products (i.e., possible interactions) of the X variables, and all squared values of the X variables. Calculate an "LM test statistic," which equals n * R2 from that auxiliary regression. The LM test statistic has a chi-squared distribution, with degrees of freedom equal to the number of regressors in the auxiliary regression (excluding the constant).

Heteroskedasticity Tests White's Test The advantage of White's test is that it does not assume a monotonic relationship between any one X and the variance of the error terms; the inclusion of the interactions and squared terms allows some non-linearity in that relationship. And it tests for heteroskedasticity in the entire model; you do not have to choose a particular X to examine. However, if you have many variables, the number of possible interactions, plus the squared variables, plus the original variables, can be quite high!
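
statsmodels' het_white constructs this auxiliary regression (levels, squares, and cross products) for you; a minimal sketch, again on the simulated data:

    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_white

    X = sm.add_constant(x)
    fit = sm.OLS(y, X).fit()

    # Returns the LM statistic (n * R-squared from the auxiliary
    # regression), its chi-squared p-value, and an F-test variant.
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(fit.resid, X)
    print(lm_stat, lm_pvalue)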

Heteroskedasticity Solutions GLS / Weighted Least Squares In a perfect world, we would actually know what form of heteroskedasticity to expect, and we would then use "weighted least squares." WLS essentially transforms the entire equation by dividing every part of it by the square root of whatever one thinks the error variance is proportional to. In other words, if one thinks the variance of the error terms is proportional to X1², then one divides every element of the equation (intercept, each bX, residual) by X1.

Heteroskedasticity Solutions GLS / Weighted Least Squares In this way, one creates a transformed equation where the variance of the error term is now constant (because you've "weighted" it appropriately). Note, however, that since the equation has been "transformed," the parameter estimates are different from those in the non-transformed version: in the example above, for b2, you have the effect of X2/X1 on Y, not the effect of X2 on Y. So, you need to think about that when you are interpreting your results.
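
A sketch of the weighting idea with statsmodels, under the illustrative assumption that the error variance is proportional to x² (statsmodels expects weights proportional to the inverse of the error variance):

    import statsmodels.api as sm

    # Assume Var(e_i) is proportional to x_i**2, so weight each case
    # by 1 / x_i**2; statsmodels applies the square-root transformation
    # internally when fitting.
    wls_fit = sm.WLS(y, sm.add_constant(x), weights=1.0 / x**2).fit()
    print(wls_fit.params, wls_fit.bse)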

Heteroskedasticity Solutions However... we almost never know the precise form that we expect heteroskedasticity to take. So, in general, we ask the software package to give us White's heteroskedasticity-consistent variances and standard errors (White's robust standard errors). (Alternatively, and less commonly, Newey-West standard errors are similar, and also account for autocorrelation.) (For those of you who have dealt with clustering: the basic idea here is somewhat similar, except that in clustering you identify a variable that you believe your data are "clustered on." When I have repeated states in a database, that is, multiple cases from California, etc., I might want to cluster on state (or, if I have repeated legislators, I could cluster on legislator, etc.). In general, it's a recognition that the error terms will be related within those repeated observations: the goodness of fit within the observations from California will be better than the goodness of fit across the observations from all states.)
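
In statsmodels, White's robust standard errors are requested through the cov_type argument (HC0 is White's original estimator; HC1 through HC3 are small-sample adjustments), and clustered standard errors through cov_type="cluster". A sketch on the simulated data; the state grouping variable is hypothetical:

    import statsmodels.api as sm

    X = sm.add_constant(x)

    # White's heteroskedasticity-consistent standard errors: the slope
    # estimates match plain OLS, only the standard errors change.
    robust_fit = sm.OLS(y, X).fit(cov_type="HC1")
    print(robust_fit.bse)

    # Clustered standard errors, given a `state` array of group labels
    # (illustrative; any repeated-observation identifier works):
    # clustered_fit = sm.OLS(y, X).fit(cov_type="cluster",
    #                                  cov_kwds={"groups": state})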