Problems in Applying the Linear Regression Model Appendix 4A

The assumptions of the linear regression model do not always hold in the real world. We now examine the resulting statistical problems, which are the central focus of the sub-field of economics called econometrics:
1. Autocorrelation
2. Heteroscedasticity
3. Specification and measurement error
4. Multicollinearity
5. Simultaneous equation relationships and the identification problem
6. Nonlinearities

1. Autocorrelation (also known as serial correlation)
Problem: Coefficients are unbiased, but t-values are unreliable.
Symptoms: A scatter of the error terms shows a pattern, or the Durbin-Watson statistic is far from 2.
Cures: Find more variables that explain these patterns, or take first differences of the data: ΔQ = a + b•ΔP.
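The Durbin-Watson symptom and the first-difference cure can be sketched in a few lines. This is only an illustration: the two residual series are made up, and the function names are not from the text.

```python
# Illustrative sketch; the residual series below are invented for the example.

def durbin_watson(residuals):
    """Sum of squared changes in the residuals divided by the sum of squared residuals.
    Values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def first_differences(series):
    """Replace each observation with its change from the previous period."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

# Residuals that drift steadily upward: positive autocorrelation, DW well below 2.
trending = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
# Residuals that flip sign every period: negative autocorrelation, DW well above 2.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(dw_trending, dw_alternating)
```

To apply the cure, one would pass both the Q series and the P series through first_differences and rerun the regression on the differenced data.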

Scatter of Error Terms: Positive Autocorrelation (Figure 4A.1, page 171)
[Figure: scatter of error terms, Y against X, with points labeled 1 to 8.]

2. Heteroscedasticity
Problem: Coefficients are unbiased, but t-values are unreliable.
Symptoms: Different sub-samples have different variances; a scatter of the error terms shows increasing or decreasing dispersion.
Cures: Transform the data, e.g., into logs, or take averages of each sub-sample and use weighted least squares.
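A rough way to check the symptom is to compare the spread of the residuals in two sub-samples, and the weighted least squares cure can be written out for a one-variable regression. The sketch below assumes made-up residuals and treats the weights as something the analyst supplies (for example, the inverse of each group's error variance); it is not the textbook's procedure.

```python
# Illustrative sketch; residuals are invented, and the helper names are hypothetical.

def mean_square(residuals):
    """Average squared residual, a simple measure of dispersion."""
    return sum(e ** 2 for e in residuals) / len(residuals)

def variance_ratio(low_group, high_group):
    """Ratio of dispersion in one sub-sample to another; far from 1 hints at heteroscedasticity."""
    return mean_square(high_group) / mean_square(low_group)

def weighted_least_squares(x, y, w):
    """Fit y = a + b*x, giving observation i the weight w[i]. Returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

# Residuals for a low-AGE and a high-AGE sub-sample (made-up numbers).
low_age = [0.5, -0.4, 0.3, -0.6]
high_age = [2.0, -2.5, 3.0, -1.8]
ratio = variance_ratio(low_age, high_age)
print(ratio)
```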

Scatter of Error Terms: Heteroscedasticity
[Figure: scatter of Height against AGE.]
Alternative specification: log Ht = a + b•AGE

3. Specification & Measurement Error
In baseball, estimating Salary = a + b (Strike Outs) gives a positive b. Why? The omitted variable is the number of Hits.
Estimating Salary = c + d (Strike Outs) + e (Hits) gives a negative d and a positive e.

Specification & Measurement Error
Problem: Coefficients are biased, and can even have the wrong sign, as in the baseball example. Adding more observations will not cure this bias.
Symptoms: The results don't make economic sense.
Cure: Think through the relationships and find the variables missing from the specification. See if the new specification improves the fit (higher R²) and makes economic sense.
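The sign flip in the baseball example can be reproduced with a small constructed data set. Everything below is hypothetical: the six players, the units, and the "true" salary rule (10 + 2 per hit, minus 1 per strike out) are invented so that the arithmetic is easy to follow.

```python
# Illustrative sketch of omitted-variable bias; all numbers are invented.

def cross_dev(x, y):
    """Sum of products of deviations from the means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y))

# Players who bat more often collect both more hits and more strike outs.
hits = [10, 20, 30, 40, 50, 60]
strike_outs = [6, 9, 16, 19, 26, 29]
# Assumed true relationship: hits raise salary, strike outs lower it.
salary = [10 + 2 * h - 1 * s for h, s in zip(hits, strike_outs)]

S_ss = cross_dev(strike_outs, strike_outs)
S_hh = cross_dev(hits, hits)
S_sh = cross_dev(strike_outs, hits)
S_sy = cross_dev(strike_outs, salary)
S_hy = cross_dev(hits, salary)

# Short regression: Salary = a + b (Strike Outs), with Hits omitted.
b_short = S_sy / S_ss

# Long regression: Salary = c + d (Strike Outs) + e (Hits), solved from the normal equations.
det = S_ss * S_hh - S_sh ** 2
d_long = (S_hh * S_sy - S_sh * S_hy) / det
e_long = (S_ss * S_hy - S_sh * S_sy) / det

print(b_short, d_long, e_long)
```

In this construction the short regression credits strike outs with the effect of the hits they travel with, so b comes out positive even though the assumed true effect is negative.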

4. Multicollinearity
Sometimes independent variables aren't independent.
Example: Let Q = eggs sold, and estimate Q = a + b Pd + c Pg, where Pd is the price of a dozen eggs and Pg is the price of a gross of eggs.
Regression output: Q = 22 - 7.8 Pd - 0.9 Pg, with t-values of 1.2 for Pd and 1.45 for Pg.
R-square = .87, N = 100 observations.
Notice that R-square is 87%, but neither coefficient is statistically significant.

Multicollinearity
Problem: Coefficients are unbiased, but the t-values are small and often insignificant.
Symptoms: High R-squares but low t-values.
Cures: Drop a variable; usually the remaining variable becomes significant. Do nothing if forecasting, since the added R-square of more variables is worthwhile.
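One way to see why the egg example misbehaves is to measure how closely the two prices move together. The sketch below uses made-up prices in which a gross costs roughly twelve times a dozen, and reports the correlation together with the variance inflation factor, a common diagnostic that the slides themselves do not mention.

```python
# Illustrative sketch; the price series are invented for the example.
import math

def correlation(x, y):
    """Pearson correlation coefficient between two series."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

price_dozen = [1.0, 1.2, 1.5, 1.8, 2.0]        # Pd
price_gross = [11.9, 14.5, 17.9, 21.7, 23.9]   # Pg, about 12 times Pd

r = correlation(price_dozen, price_gross)
# With two regressors, the variance inflation factor is 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)
print(r, vif)
```

A correlation this close to 1 means the regression has almost no independent variation with which to separate the effect of Pd from that of Pg, which is why dropping one of them usually restores significance.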

5. Identification Problem and the Simultaneity Problem
Problem: Coefficients are biased.
Symptom: Independent variables are known to be part of a system of equations.
Cure: Use as many independent variables as possible.

Graphical Explanation of the Identification Problem
Suppose we estimate the following demand curve: Q = a + b P. Suppose supply varies and demand is fixed. All points then lie on the demand curve, and the demand curve is said to be identified.
[Figure: supply curves S1, S2, and S3 crossing a single Demand curve; quantity on the horizontal axis.]

Suppose instead that SUPPLY is Fixed
Let demand shift while supply does not change. All points are then on the supply curve, and we say that the supply curve is identified.
[Figure: demand curves D1, D2, and D3 crossing a single Supply curve; quantity on the horizontal axis.]

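When both Supply and Demand Vary
Often both supply and demand vary. Equilibrium points are in the shaded region. A regression of Q = a + b P will be neither a demand nor a supply curve.
[Figure: supply curves S1 and S2 and demand curves D1 and D2, with a shaded region of equilibrium points; quantity on the horizontal axis.]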

Simultaneous Systems
Demand is Qd = a + b P + c Y + ε1
Supply is Qs = d + e P + f W + ε2
where P is price, Y is income, W is the wage rate, and ε1 and ε2 are the error terms of the two equations.
Notice that P appears in both the demand and the supply function. P is "endogenously" determined by both demand and supply.
The simultaneity problem is that price is not independent, as it is determined by the whole system.
The cure for this problem is usually to have as many independent variables as possible in the demand regression to make demand act like it is "fixed".
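The three graphical cases can be checked numerically. The sketch below solves the demand and supply equations for the market-clearing price and quantity, leaving out the error terms, and then regresses Q on P. The coefficient values, the income levels, and the wage levels are all hypothetical choices made for illustration.

```python
# Illustrative sketch; coefficient values and shifter levels are invented.

def equilibrium(Y, W, a=100.0, b=-2.0, c=1.0, d=10.0, e=3.0, f=-1.0):
    """Solve Qd = a + b*P + c*Y and Qs = d + e*P + f*W for the price where Qd = Qs."""
    P = (a - d + c * Y - f * W) / (e - b)
    Q = a + b * P + c * Y
    return P, Q

def ols_slope(x, y):
    """Slope of a simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
    sxx = sum((u - xbar) ** 2 for u in x)
    return sxy / sxx

def slope_of_q_on_p(points):
    prices = [p for p, q in points]
    quantities = [q for p, q in points]
    return ols_slope(prices, quantities)

incomes = [40.0, 50.0, 60.0]   # demand shifter Y
wages = [10.0, 20.0, 30.0]     # supply shifter W

# Only supply shifts (income held fixed): the points trace out the demand curve.
slope_supply_shifts = slope_of_q_on_p([equilibrium(50.0, W) for W in wages])
# Only demand shifts (wage held fixed): the points trace out the supply curve.
slope_demand_shifts = slope_of_q_on_p([equilibrium(Y, 20.0) for Y in incomes])
# Both shift: the fitted slope is neither the demand slope nor the supply slope.
slope_both = slope_of_q_on_p([equilibrium(Y, W) for Y in incomes for W in wages])

print(slope_supply_shifts, slope_demand_shifts, slope_both)
```

With these assumed values the first regression recovers the demand slope b, the second recovers the supply slope e, and the third lands in between.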

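6. Nonlinear Forms
Semi-logarithmic transformations: Sometimes taking the logarithm of the dependent variable or of an independent variable improves the R².
Examples are:
log Y = α + β·X. Here, Y grows exponentially in X at rate β; that is, roughly 100·β percent growth per period when β is small.
Y = α + β·log X. Here, the β·log X term doubles each time X is squared, so (when β is positive) Y rises by less and less for each additional unit of X.
E.g., Ln Y = .01 + .05X, in which Y grows by about 5 percent per unit of X.
[Figure: Y plotted against X.]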
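The growth-rate reading of the semi-log form can be verified with the slide's own example, Ln Y = .01 + .05X. The sketch below generates Y exactly from that equation (no error term, an assumption made to keep the check clean), regresses Ln Y on X, and converts the slope into a growth rate per period.

```python
# Illustrative sketch; Y is generated without noise from the slide's example equation.
import math

def ols_line(x, y):
    """Fit y = intercept + slope*x and return (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - slope * xbar, slope

X = list(range(11))
Y = [math.exp(0.01 + 0.05 * x) for x in X]

# Regress the natural log of Y on X.
alpha, beta = ols_line(X, [math.log(y) for y in Y])

# Exact proportional growth in Y for a one-unit increase in X.
growth_per_period = math.exp(beta) - 1
print(alpha, beta, growth_per_period)
```

The exact growth rate is slightly above the slope, which is why the slope is only an approximation to the percentage growth and works best when it is small.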

Reciprocal Transformations
The relationship between variables may be inverse. Sometimes taking the reciprocal of a variable improves the fit of the regression, as in the example: Y = α + β·(1/X).
As X increases, Y declines toward α at a slowing rate if β is positive, and rises toward α at a slowing rate if β is negative.
E.g., Y = 500 + 2 (1/X)
[Figure: Y plotted against X.]

Polynomials
Quadratic, cubic, and higher degree polynomial relationships are common in business and economics.
Profit and revenue are often modeled as cubic functions of output.
Average cost is a quadratic function, as it is U-shaped.
Total cost is a cubic function, as it is S-shaped.
TC = α·Q + β·Q² + γ·Q³ is a cubic total cost function.
If higher order polynomials improve the R-square, then the added complexity may be worth it.
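The link between the cubic total cost function and the U-shaped average cost curve can be worked out directly. The derivation below assumes the three coefficients are written α, β, and γ, and that output Q is positive; it is a sketch of the standard algebra rather than something stated on the slide.

```latex
% Assumed notation: TC = \alpha Q + \beta Q^2 + \gamma Q^3, with Q > 0.
\begin{align*}
AC &= \frac{TC}{Q} = \alpha + \beta Q + \gamma Q^{2}
   && \text{(quadratic in } Q\text{)} \\
\frac{dAC}{dQ} &= \beta + 2\gamma Q = 0
   \quad\Longrightarrow\quad Q^{*} = -\frac{\beta}{2\gamma} \\
MC &= \frac{dTC}{dQ} = \alpha + 2\beta Q + 3\gamma Q^{2} \\
AC(Q^{*}) &= MC(Q^{*}) = \alpha - \frac{\beta^{2}}{4\gamma}
\end{align*}
```

Average cost is U-shaped with its minimum at a positive output only when γ is positive and β is negative, and at that minimum marginal cost equals average cost.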

Multiplicative or Double Log
With the double log form, the coefficients are elasticities.
Q = A • P^b • Y^c • Ps^d is the multiplicative functional form.
So: Ln Q = a + b•Ln P + c•Ln Y + d•Ln Ps, where a = Ln A.
Transform all variables into natural logs. This is called the double log, since logs are on both the left and the right hand sides.
Ln and Log are used interchangeably. We use only natural logs.

Soft Drink Case, pp. 167-168: a cross section of 50 states
Linear Specification
Cans = 515 - 242 Price + 1.19 Income + 2.91 Temp

Predictor   Coef      StDev    T       P
Constant    514.8     113.2    4.55    0.000
Price       -241.80   43.65    -5.54   0.000
Income      1.195     1.688    0.71    0.483
Temp        2.9136    0.7071   4.12    0.000

R-Sq = 69.8%   R-Sq(adj) = 67.7%
The price elasticity in Wyoming is (ΔQ/ΔP)(P/Q) = -241.8 (2.31/102) = -5.476
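The Wyoming calculation generalizes to any state: in a linear specification the elasticity is the slope times the ratio of price to quantity, so it changes from one observation to the next. A minimal sketch, using only the figures quoted above (the function name is hypothetical):

```python
# Illustrative sketch using the slope, price, and quantity quoted for Wyoming.

def point_elasticity(slope, price, quantity):
    """Elasticity at a point on a linear demand curve: (dQ/dP) * (P/Q)."""
    return slope * price / quantity

wyoming = point_elasticity(slope=-241.8, price=2.31, quantity=102)
print(round(wyoming, 3))
```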

Double Log Soft Drink Case
Ln Cans = 2.47 - 3.17 Ln Price + 0.202 Ln Income + 1.12 Ln Temp

Predictor   Coef      StDev    T       P
Constant    2.466     1.385    1.78    0.082
Ln Price    -3.1695   0.6485   -4.89   0.000
Ln Income   0.2020    0.1834   1.10    0.277
Ln Temp     1.1196    0.2611   4.29    0.000

R-Sq = 67.4%   R-Sq(adj) = 65.1%
Characterize the demand for soft drinks in the US. Are soft drinks inelastic? Are they luxuries? Which specification fits the data better?
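As a starting point for these questions, the reported coefficients can be read directly as elasticities, and each t-value is just the coefficient divided by its standard deviation. The sketch below uses the usual textbook conventions (demand is elastic when the absolute price elasticity exceeds 1, a good is a luxury when its income elasticity exceeds 1); the variable names are hypothetical.

```python
# Illustrative sketch using the coefficients and standard deviations reported above.

double_log = {
    "Ln Price": (-3.1695, 0.6485),
    "Ln Income": (0.2020, 0.1834),
    "Ln Temp": (1.1196, 0.2611),
}

# t-ratio = coefficient / standard deviation of the coefficient.
t_ratios = {name: coef / sd for name, (coef, sd) in double_log.items()}

price_elasticity = double_log["Ln Price"][0]
income_elasticity = double_log["Ln Income"][0]

# Conventional cutoffs, applied to the point estimates only.
demand_is_elastic = abs(price_elasticity) > 1
looks_like_luxury = income_elasticity > 1

for name, t in t_ratios.items():
    print(name, round(t, 2))
print(demand_is_elastic, looks_like_luxury)
```

The income coefficient has a small t-ratio, so any conclusion about income elasticity from this regression should be treated with caution.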