Econometrics I Professor William Greene Stern School of Business

Presentation transcript:

Econometrics I Professor William Greene Stern School of Business Department of Economics

Econometrics I Part 4 – Partial Regression and Correlation

Frisch-Waugh (1933) Theorem The Most Powerful Theorem in Econometrics Context: Model contains two sets of variables: X = [(1, time) : (other variables)] = [X1 X2]. Regression model: y = X1β1 + X2β2 + ε (population) = X1b1 + X2b2 + e (sample). Problem: Algebraic expression for the second set of least squares coefficients, b2.

Partitioned Solution Method of solution (Why did F&W care? In 1933, matrix computation was not trivial!) Direct manipulation of normal equations produces

Partitioned Solution Direct manipulation of the normal equations produces b2 = (X2'X2)-1X2'(y - X1b1). What is this? The regression of (y - X1b1) on X2. If we knew b1, this would be the solution for b2. Important result (perhaps not fundamental). Note the result if X2'X1 = 0. Useful in theory? Probably. Likely in practice? Not at all.
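
A minimal numpy check of this identity on synthetic data (illustrative only; not the course data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # constant and trend
X2 = rng.normal(size=(n, 2))                                   # "other variables"
y = X1 @ [1.0, 0.5] + X2 @ [2.0, -1.0] + rng.normal(size=n)

X = np.hstack([X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ y)                          # full least squares
b1, b2 = b[:2], b[2:]

# b2 is the regression of (y - X1 b1) on X2
b2_check = np.linalg.solve(X2.T @ X2, X2.T @ (y - X1 @ b1))
print(np.allclose(b2, b2_check))                               # True
```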

Partitioned Inverse Use of the partitioned inverse result produces a fundamental result: What is the southeast element in the inverse of the moment matrix?

Partitioned Inverse The algebraic result is: [X'X]-1(2,2) = {X2'X2 - X2'X1(X1'X1)-1X1'X2}-1 = [X2'(I - X1(X1'X1)-1X1')X2]-1 = [X2'M1X2]-1. Note the appearance of an "M" matrix. How do we interpret this result? Note the implication for the case in which X1 is a single variable. (Theorem, p. 34) Note the implication for the case in which X1 is the constant term. (p. 35)
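
A minimal numpy check of the partitioned-inverse result, using synthetic data (illustrative names only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = np.column_stack([np.ones(n), np.arange(n, dtype=float)])   # constant and trend
X2 = rng.normal(size=(n, 3))
X = np.hstack([X1, X2])

M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)          # residual-maker for X1
southeast = np.linalg.inv(X.T @ X)[2:, 2:]                      # (2,2) block of (X'X)^-1
print(np.allclose(southeast, np.linalg.inv(X2.T @ M1 @ X2)))    # True
```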

Frisch-Waugh Result Continuing the algebraic manipulation: b2 = [X2'M1X2]-1[X2'M1y]. This is Frisch and Waugh's famous result, the "double residual regression." How do we interpret this? A regression of residuals on residuals. "We get the same result whether we (1) detrend the other variables by using the residuals from a regression of them on a constant and a time trend and use the detrended data in the regression, or (2) just include a constant and a time trend in the regression and do not detrend the data." "Detrend the data" means compute the residuals from the regressions of the variables on a constant and a time trend.
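
An illustrative numpy sketch of the double residual regression, with a constant and a time trend as X1 (synthetic data, not the gasoline example used later in these slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
t = np.arange(n, dtype=float)
X1 = np.column_stack([np.ones(n), t])                    # constant + trend
X2 = np.column_stack([rng.normal(size=n) + 0.02 * t,     # trending regressors
                      rng.normal(size=n) - 0.01 * t])
y = 3.0 + 0.05 * t + X2 @ [1.5, -0.8] + rng.normal(size=n)

def resid(A, Z):
    """Residuals from regressing each column of Z on A, i.e. M_A Z."""
    return Z - A @ np.linalg.lstsq(A, Z, rcond=None)[0]

# (1) detrend y and X2, then regress detrended y on detrended X2
b2_fw = np.linalg.lstsq(resid(X1, X2), resid(X1, y), rcond=None)[0]

# (2) one regression of y on [X1, X2]; read off the X2 coefficients
b2_full = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0][2:]

print(np.allclose(b2_fw, b2_full))                       # True
```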

Important Implications Isolating a single coefficient in a regression (Corollary 3.2.1, p. 34). The double residual regression: regression of residuals on residuals, 'partialling' out the effect of the other variables. It is not necessary to 'partial' the other Xs out of y because M1 is idempotent. (This is a very useful result; i.e., X2'M1'M1y = X2'M1y.) (Orthogonal regression) Suppose X1 and X2 are orthogonal; X1'X2 = 0. What is M1X2?
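
A small numpy sketch, with synthetic data, of the claim that y need not be residualized:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120
X1 = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
X2 = rng.normal(size=(n, 2))
y = X1 @ [1.0, 0.02] + X2 @ [0.7, -0.3] + rng.normal(size=n)

M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
X2s = M1 @ X2                                            # residualized X2

# Regressing the *raw* y on residualized X2 gives the same coefficients as
# regressing the residualized y, because X2'M1'M1y = X2'M1y (M1 idempotent).
b_raw = np.linalg.lstsq(X2s, y, rcond=None)[0]
b_res = np.linalg.lstsq(X2s, M1 @ y, rcond=None)[0]
print(np.allclose(b_raw, b_res))                         # True
```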

Applying Frisch-Waugh Using gasoline data from Notes 3. X = [1, year, PG, Y], y = G as before. Full least squares regression of y on X.

Detrending the Variables - Pg

Regression of Detrended G on Detrended Pg and Detrended Y

Prob(Bank Account Ownership) = F(RACE, INCOME, EDUCATION, …, 30+ other variables)

Partial Regression Important terms in this context: partialing out the effect of X1; netting out the effect; "partial regression coefficients." To continue belaboring the point: note the interpretation of partial regression as "net of the effect of …". Now, follow this through for the case in which X1 is just a constant term, a column of ones. What are the residuals in a regression on a constant? What is M1? Note that this produces the result that we can do linear regression on data in mean deviation form. 'Partial regression coefficients' are the same as 'multiple regression coefficients.' This follows from the Frisch-Waugh theorem.
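
A quick numerical illustration of the mean-deviation result, in numpy with synthetic data (names are illustrative, not from the course data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
X = rng.normal(size=(n, 3))
y = 2.0 + X @ [1.0, -0.5, 0.25] + rng.normal(size=n)

# Slopes from the regression that includes the constant term
slopes_full = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0][1:]

# Slopes from the regression on data in mean-deviation form (M1 subtracts means)
slopes_dev = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]

print(np.allclose(slopes_full, slopes_dev))              # True
```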

Does Signature Explain (log)Price of Monet Paintings after Controlling for (log)Size? (Scatter plot comparing signed and unsigned paintings.)


Partial Correlation Working definition: correlation between sets of residuals. Some results on computation: based on the M matrices. Some important considerations: partial correlations and coefficients can have signs and magnitudes that differ greatly from gross correlations and simple regression coefficients. Compare the simple (gross) correlation of G and PG with the partial correlation, net of the time effect. (Could you have predicted the negative partial correlation?)
CALC;list;Cor(g,pg)$ Result = .7696572
CALC;list;cor(gstar,pgstar)$ Result = -.6589938
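
The CALC commands above are NLOGIT/LIMDEP syntax. As a rough analogue, here is a numpy sketch of the same comparison on synthetic trending series (illustrative data, not the gasoline series G and PG):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 36
t = np.arange(n, dtype=float)
e = rng.normal(size=n)
g  = 0.2 * t - e + 0.3 * rng.normal(size=n)   # both series trend upward...
pg = 0.2 * t + e + 0.3 * rng.normal(size=n)   # ...but move oppositely around the trend

def detrend(z, t):
    """Residuals from regressing z on a constant and a time trend."""
    X1 = np.column_stack([np.ones(len(t)), t])
    return z - X1 @ np.linalg.lstsq(X1, z, rcond=None)[0]

print(np.corrcoef(g, pg)[0, 1])                          # gross correlation: positive here
print(np.corrcoef(detrend(g, t), detrend(pg, t))[0, 1])  # partial correlation: negative here
```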

Partial Correlation

A Useful Result The squared partial correlation of an x in X with y is t²/(t² + degrees of freedom), where t is the t-ratio on x in the full regression. We will define the 't-ratio' and 'degrees of freedom' later. Note how the t-ratio enters the result.

Adding Variables to a Model What is the effect of adding PN, PD, PS?

Partial Correlation The squared partial correlation can be computed as a difference in R²s. For PS in the example above, R² without PS = .9861 and R² with PS = .9907: (.9907 - .9861) / (1 - .9861) = .3309. Equivalently, using the t-ratio on PS: 3.922² / (3.922² + (36 - 5)) = .3314 (the difference is rounding).
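
Both routes to the squared partial correlation can be checked in a few lines. This is a numpy sketch on synthetic data; the .9861, .9907, and 3.922 figures above belong to the gasoline example, not to this code:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 36
X_other = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # constant + 3 regressors
ps = rng.normal(size=n)                                            # the added variable
y = X_other @ [1.0, 0.5, -0.2, 0.3] + 0.4 * ps + rng.normal(size=n)

def r2(X, y):
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1.0 - e @ e / ((y - y.mean()) @ (y - y.mean()))

X_full = np.column_stack([X_other, ps])
k = X_full.shape[1]

# Route 1: difference in R-squareds
r2_without, r2_with = r2(X_other, y), r2(X_full, y)
partial_r2_a = (r2_with - r2_without) / (1.0 - r2_without)

# Route 2: t-ratio on the added variable, then t^2 / (t^2 + degrees of freedom)
b = np.linalg.lstsq(X_full, y, rcond=None)[0]
e = y - X_full @ b
s2 = e @ e / (n - k)
se = np.sqrt(s2 * np.linalg.inv(X_full.T @ X_full)[-1, -1])
t = b[-1] / se
partial_r2_b = t**2 / (t**2 + (n - k))

print(partial_r2_a, partial_r2_b)    # the two numbers agree
```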

THE Application of Frisch-Waugh The Fixed Effects Model A regression model with a dummy variable for each individual in the sample, each observed Ti times: yi = Xiβ + diαi + εi for each individual. The dummy variables add N columns, and N may be thousands; i.e., the regression has thousands of variables (coefficients).

Estimating the Fixed Effects Model The FEM is a linear regression model but with many independent variables

Application – Health and Income German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods. Data downloaded from the Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals and 27,326 observations in all; the number of observations ranges from 1 to 7 per family (frequencies: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Variables in the file are:
DOCVIS = number of visits to the doctor in the observation period (the dependent variable of interest)
HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income = 0 were dropped)
HHKIDS = 1 if children under age 16 in the household; 0 otherwise
EDUC = years of schooling
AGE = age in years
We also wish to include a separate family effect for each family (7,293 of them). This requires 7,293 dummy variables in addition to the four regressors.

Fixed Effects Estimator (cont.)

‘Within’ Transformations

Least Squares Dummy Variable Estimator b is obtained by 'within' groups least squares (group mean deviations). The normal equations for a are D'Xb + D'Da = D'y, so a = (D'D)-1D'(y - Xb).
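
A minimal numpy sketch of these two steps on a synthetic balanced panel (all names and data are illustrative, not the German health data):

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, k = 50, 5, 2                        # 50 individuals, 5 periods each, 2 regressors
groups = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, k))
alpha = rng.normal(size=N)                # the individual effects
y = X @ [1.0, -0.5] + alpha[groups] + rng.normal(size=N * T)

def demean_by_group(z):
    """Subtract each individual's group mean: the 'within' transformation."""
    sums = np.zeros((N,) + z.shape[1:])
    np.add.at(sums, groups, z)
    return z - (sums / T)[groups]

# b from 'within' groups least squares (no dummy variables needed)
b = np.linalg.lstsq(demean_by_group(X), demean_by_group(y), rcond=None)[0]

# Recover the fixed effects: a = (D'D)^-1 D'(y - Xb) = group means of (y - Xb)
resid = y - X @ b
a = np.bincount(groups, weights=resid) / T

# Check against brute-force LSDV with all N dummy columns
D = np.eye(N)[groups]
coef = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]
print(np.allclose(b, coef[:k]), np.allclose(a, coef[k:]))   # True True
```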

Fixed Effects Regression

Time Invariant Variable

In the millennial edition of its World Health Report, in 2000, the World Health Organization published a study that compared the successes of the health care systems of 191 countries. The results notoriously ranked the United States a dismal 37th, between Costa Rica and Slovenia. The study was widely misrepresented, universally misunderstood, and was, in fact, unhelpful in understanding the different outcomes across countries. Nonetheless, the result remains controversial a decade later, as policy makers argue about why the world's most expensive health care system isn't the world's best.

The COMPOSITE Index Equation

Estimated Model (estimates of β1, β2, β3, and α)

Implications of results: Increases in Health Expenditure and increases in Education are both associated with increases in health outcomes. These are the policy levers in the analysis!

WHO Data