Instrumental Variables: 2-Stage and 3-Stage Least Squares Regression of a Linear System of Equations. 2009 LPGA Performance Statistics and Prize Winnings.

Presentation transcript:

Instrumental Variables: 2-Stage and 3-Stage Least Squares Regression of a Linear System of Equations
2009 LPGA Performance Statistics and Prize Winnings
S.J. Callan and J.M. Thomas (2007). “Modeling the Determinants of a Professional Golfer’s Tournament Earnings,” Journal of Sports Economics, Vol. 8, No. 4, pp

Data Description
Prize winnings and performance statistics for n = 146 professional women (LPGA) golfers for the 2009 season.
Exogenous performance variables:
- Average driving distance
- Percentage of fairways reached on the drive
- Percentage of greens reached in regulation
- Percentage of sand saves (in the hole in 2 shots from close traps)
- Average putts per hole on greens reached in regulation
- Numbers of events, events completed, and rounds
Endogenous result variables (used as both dependent and independent variables):
- Average score per round
- Average rank (percentile in tournaments)
- Log(prize winnings)

Variables in Systems of Equations
- Endogenous variables: jointly dependent (response) variables that are determined within the system. They can also appear as predictor variables in other equations.
- Exogenous variables: independent variables that do not depend on the endogenous variables.
- Predetermined variables: exogenous variables and lagged endogenous variables.
- Instrumental variables: predetermined variables used to predict endogenous variables in first-stage regressions. The predicted values are then used in place of the endogenous predictors in the system of equations.

System of Equations (Callan and Thomas, 2007)
1. Average Score (per 18 holes) is related to the golfer's skills and experience (number of rounds played).
2. Average Rank (transformed to a percentile) in tournaments is related to average score and the number of events she competed in.
3. Season Earnings is related to average rank and the number of tournaments she completed.
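Written out, the three relationships above form a recursive system. The following is a sketch only: the variable names follow the SAS program later in the deck, while the coefficient symbols (α, β, γ) and error terms (ε) are notation assumed here for illustration.

```latex
\begin{align*}
\text{SCORE}_i &= \alpha_0 + \alpha_1\,\text{DRIVE}_i + \alpha_2\,\text{FAIRWAY}_i
                + \alpha_3\,\text{GREEN}_i + \alpha_4\,\text{GIRPUTTS}_i
                + \alpha_5\,\text{SANDSV}_i + \alpha_6\,\text{ROUNDS}_i + \varepsilon_{1i} \\
\text{RANK}_i  &= \beta_0 + \beta_1\,\text{SCORE}_i + \beta_2\,\text{EVENTS}_i + \varepsilon_{2i} \\
\ln(\text{PRIZE}_i) &= \gamma_0 + \gamma_1\,\text{RANK}_i + \gamma_2\,\text{COMPLETE}_i + \varepsilon_{3i}
\end{align*}
```

SCORE appears as a response in the first equation and as a predictor in the second, and RANK plays the same dual role in the second and third equations.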

Potential Problems with Endogenous Predictors
When endogenous variables are included as predictors, they can be correlated with the error term of that equation, particularly when omitted variables are related to the outcome. This causes ordinary least squares (OLS) estimates to be biased and inconsistent.
- In equation 2, SCORE may be correlated with the error term in the absence of a variable measuring average course difficulty (Callan and Thomas, p. 402).
- In equation 3, RANK may be correlated with the error term in the absence of a variable measuring the golfer's human capital investment, such as diet and concentration level (Callan and Thomas, p. 402).
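The bias described above can be illustrated with a small simulation. This is a minimal sketch on made-up data, not the LPGA data: the true slope of 2.0, the variable names, and the single-instrument setup are all assumptions chosen for illustration. An omitted variable `w` drives both the predictor and the response, so the OLS slope drifts away from the true value while the simple instrumental-variable slope stays close to it.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, beta=2.0, seed=1):
    """Return (OLS slope, IV slope) for one simulated data set (hypothetical setup)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: affects x, unrelated to the error
    w = [rng.gauss(0, 1) for _ in range(n)]  # omitted variable: affects both x and y
    x = [zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]         # endogenous predictor
    y = [beta * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]  # error term contains w
    ols_slope = cov(x, y) / cov(x, x)  # inconsistent: x is correlated with the error
    iv_slope = cov(z, y) / cov(z, x)   # consistent: z is uncorrelated with the error
    return ols_slope, iv_slope

ols_slope, iv_slope = simulate()
print(f"true slope 2.0, OLS {ols_slope:.3f}, IV {iv_slope:.3f}")
```

In this setup the OLS slope converges to about 2.33 rather than 2.0, because the covariance between `x` and the omitted `w` is 1 and the variance of `x` is 3.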

Model Building Process
1. Regress all endogenous variables (Score, Rank, and ln(Prize)) on all exogenous variables.
2. Obtain the predicted values for each endogenous variable from the regressions in step 1.
3. In the system of equations, replace any "right hand side" endogenous predictors with their fitted values from step 2.
4. Note that software (e.g. SAS and STATA) will fit all the regressions in step 1, even for a variable that never appears as a predictor (ln(Prize) in this example).
5. Running the two stages by hand in this way gives the correct coefficient estimates, but the second-stage ANOVA table and standard errors are not correct.
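The steps above can be sketched by hand on simulated data. This is an illustrative sketch, not the analysis from the slides: the data are generated rather than read from the LPGA file, the coefficients (-12 on score, 0.3 on events) are made up, and `solve`, `ols`, and `predict` are hypothetical helpers written here so the example needs only the standard library.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given predictor columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(columns, coef):
    """Fitted values from an intercept-first coefficient list."""
    n = len(columns[0])
    return [coef[0] + sum(b * col[i] for b, col in zip(coef[1:], columns)) for i in range(n)]

def simulate_two_stage(n=5000, seed=7):
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]       # exogenous
    events = [rng.gauss(0, 1) for _ in range(n)]      # exogenous
    difficulty = [rng.gauss(0, 1) for _ in range(n)]  # omitted variable, never observed
    score = [-s + d + rng.gauss(0, 1) for s, d in zip(skill, difficulty)]  # centered score
    rank = [50 - 12 * sc + 0.3 * ev + 5 * d + rng.gauss(0, 1)
            for sc, ev, d in zip(score, events, difficulty)]
    # Steps 1 and 2: regress the endogenous predictor on all exogenous variables, keep fitted values.
    stage1 = ols([skill, events], score)
    score_hat = predict([skill, events], stage1)
    # Step 3: replace score with its fitted values in the rank equation.
    two_stage = ols([score_hat, events], rank)
    naive = ols([score, events], rank)  # plain OLS for comparison
    return naive, two_stage

naive_ols, two_stage = simulate_two_stage()
print("OLS slope on score:  ", round(naive_ols[1], 2))
print("2SLS slope on score: ", round(two_stage[1], 2))
```

The residual variance reported by the second regression is based on `score_hat` rather than `score`, which is why the standard errors from a by-hand second stage should not be used.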

First Stage Regressions for Score and Rank
The fitted (predicted) values for SCORE will be used in equation 2 in place of SCORE, and the fitted values for RANK will be used in equation 3 in place of RANK. Equation 1 has no right hand side endogenous variables.

Equation 1) - SCORE is related to SKILLS and experience
All variables except average driving distance are significant. All else equal:
- Average SCORE decreases as percent fairways hit increases (a 10% increase in fairways hit corresponds to a 0.19 decrease in SCORE)
- Average SCORE decreases by 1.36 with a 10% increase in greens in regulation
- Average SCORE decreases by 0.16 with a 10% increase in sand saves
- Average SCORE increases by 1.32 with a 0.1 increase in putts per green-in-regulation hole
- Average SCORE decreases by 0.08 for a 10-round increase in rounds played

Equation 2) - Rank is related to SCORE and Events
Rank (as a percentile, with 100 meaning the golfer won every tournament she played in) is:
- Negatively associated with predicted SCORE (decreases by 12.5 with a unit increase in average SCORE)
- Positively associated with number of Events (increases by 0.28 with a unit increase in the number of EVENTS played)
Note: The estimated coefficients are correct, but the standard errors, t-tests, and Analysis of Variance are incorrect (see slide 11).

Equation 3) - ln(Prize) is related to Rank and Completed Events
Prize Winnings (in log form):
- Increase with (predicted) Rank. A 10% increase in Rank (percentile) increases ln(Prize) by 0.56
- Increase with Completed Events. For each tournament completed, ln(Prize) increases by
Note: The estimated coefficients are correct, but the standard errors, t-tests, and Analysis of Variance are incorrect (see slide 11).
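Because the response is on the log scale, an additive change in ln(Prize) corresponds to a multiplicative change in prize winnings. The short sketch below converts the 0.56 figure quoted above, assuming it refers to a 10-point increase in percentile rank with completed events held fixed; the variable names are illustrative.

```python
import math

# Change in ln(Prize) for a 10-point increase in Rank, as quoted on the slide.
delta_ln_prize = 0.56

# On the original scale, prize winnings are multiplied by exp(delta).
multiplier = math.exp(delta_ln_prize)
percent_change = 100 * (multiplier - 1)

print(f"multiplier: {multiplier:.3f}, percent change: {percent_change:.1f}%")
```

Under that assumption, a 10-point improvement in rank corresponds to prize winnings roughly 75% higher.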

Matrix Approach: Models w/ Endogenous Predictors
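The slide's formulas are not preserved in this transcript. As a hedged sketch, the standard textbook matrix form of the 2SLS estimator for a single equation is given below. The notation is assumed here: Z is the matrix of all instruments (including a column of ones), X is the equation's predictor matrix including the endogenous predictor, and Y is its response.

```latex
\begin{align*}
P_Z &= Z (Z'Z)^{-1} Z'
  && \text{(projection onto the instruments)} \\
\hat{X} &= P_Z X
  && \text{(first-stage fitted values; exogenous columns of } X \text{ are unchanged)} \\
\hat{\beta}_{2SLS} &= (\hat{X}'\hat{X})^{-1} \hat{X}' Y
  = (X' P_Z X)^{-1} X' P_Z Y
\end{align*}
```

The two expressions for the estimator agree because P_Z is symmetric and idempotent.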

Model 2 – Rank = f(Score, Events)

Model 3: ln(Prize) = f(Rank,Completed)

Robust Estimate of Variance of 2SLS Estimator Exact same method for equation 3
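The variance formula from this slide is also missing from the transcript. One common textbook correction, which may differ in detail from the slide's version, computes the residuals from the actual predictors X rather than the fitted X̂ used in the second stage. The symbols below (Z, X, X̂, Y, n, p) are assumed notation, with p the number of columns of X.

```latex
\begin{align*}
e &= Y - X \hat{\beta}_{2SLS}
  && \text{(uses the observed } X \text{, not } \hat{X}\text{)} \\
s^2 &= \frac{e'e}{n - p} \\
\widehat{V}\{\hat{\beta}_{2SLS}\} &= s^2 (\hat{X}'\hat{X})^{-1}
  = s^2 (X' P_Z X)^{-1}
\end{align*}
```

A second-stage OLS fit run by hand instead bases its error variance on Y − X̂β̂, which is the reason its standard errors and t-tests are not valid. Some texts divide by n rather than n − p.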

Results for Model 2: Rank = f(Score, Events)

Results for Model 3: ln(Prize) = f(Rank,Completed)

3-Stage Least Squares
- Extension of 2-Stage Least Squares that allows for a covariance structure among the equations in the system.
- Errors (residuals) from 2SLS are obtained and used to estimate the within-individual (golfer) variance-covariance structure among the equations.
- The response vector is stacked: the n responses from model 1 are stacked over the n responses from model 2, which are stacked over the n responses from model 3.
- The X matrices are "blocked" out diagonally, with 0 matrices off the block diagonal.
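The stacking described above can be written compactly. This is a sketch of the standard 3SLS estimator under assumed notation: Y_j and X_j are the response and predictor matrix of equation j, X̂_j = P_Z X_j are the first-stage fitted predictors, e_j are the 2SLS residuals, and Σ̂ is the estimated 3×3 covariance matrix of the errors across equations (its divisor varies by software).

```latex
\begin{align*}
Y &= \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}, \qquad
\hat{X} = \begin{bmatrix} \hat{X}_1 & 0 & 0 \\ 0 & \hat{X}_2 & 0 \\ 0 & 0 & \hat{X}_3 \end{bmatrix}, \qquad
\hat{\sigma}_{jk} = \frac{e_j' e_k}{n}, \quad \hat{\Sigma} = [\hat{\sigma}_{jk}] \\
\hat{\beta}_{3SLS} &= \left[ \hat{X}' (\hat{\Sigma}^{-1} \otimes I_n) \hat{X} \right]^{-1}
                      \hat{X}' (\hat{\Sigma}^{-1} \otimes I_n) Y \\
\widehat{V}\{\hat{\beta}_{3SLS}\} &= \left[ \hat{X}' (\hat{\Sigma}^{-1} \otimes I_n) \hat{X} \right]^{-1}
\end{align*}
```

When Σ̂ is diagonal, the weighting has no effect on the coefficients and the 3SLS estimates reduce to the equation-by-equation 2SLS estimates.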

Model Description - I

Model Description - II

Estimation Results EQ1 EQ2 EQ3

SAS Program
/* Read the 2009 LPGA data and compute log prize winnings */
data lpga2009;
  infile 'lpga2009.dat';
  input golfer drive fairway green putts sandsv prize lnprize events
        girputts complete aveposrank rounds strokes;
  lnprize1=log(prize);
run;

/* 2SLS: all exogenous variables serve as instruments */
proc syslin 2sls out=regout;
  instruments drive fairway green girputts sandsv rounds events complete;
  strokes: model strokes = drive fairway green girputts sandsv rounds;
  output residual=e1;
  rank: model aveposrank = strokes events;
  output residual=e2;
  prize: model lnprize1 = aveposrank complete;
  output residual=e3;
run;

/* 3SLS: same system, with cross-equation error covariance estimated */
proc syslin 3sls data=lpga2009 itprint out=regout3;
  instruments drive fairway green girputts sandsv rounds events complete;
  strokes: model strokes = drive fairway green girputts sandsv rounds / xpx;
  output residual=e1;
  rank: model aveposrank = strokes events / xpx;
  output residual=e2;
  prize: model lnprize1 = aveposrank complete / xpx;
  output residual=e3;
run;

STATA Program
* Read the data and compute log prize winnings
insheet using lpga_2009_meq.csv
generate lnprize=ln(prize)

* 2SLS fit of the three-equation system
reg3 (avestrokes=drive fairway green sandsvpct girputtshole rounds) ///
     (averagepospct=avestrokes events) ///
     (lnprize=averagepospct completed), 2sls

* 3SLS fit of the same system
reg3 (avestrokes=drive fairway green sandsvpct girputtshole rounds) ///
     (averagepospct=avestrokes events) ///
     (lnprize=averagepospct completed), 3sls