Lecture 2 (Ch3) Multiple linear regression


Research Method Lecture 2 (Ch3): Multiple linear regression

Model with k independent variables

y = β0 + β1x1 + β2x2 + … + βkxk + u

β0 is the intercept; βj, for j=1,…,k, are the slope parameters.

Mechanics of OLS: variable labels
Suppose you have n observations. Then you have data that look like

Obs id   y    x1    x2    …   xk
  1      y1   x11   x12   …   x1k
  2      y2   x21   x22   …   x2k
  :      :    :     :         :
  n      yn   xn1   xn2   …   xnk

The OLS estimates of the parameters are chosen to minimize the sum of squared residuals. That is, you minimize

Q = Σi (yi − b0 − b1xi1 − b2xi2 − … − bkxik)2

by choosing the betas b0, b1, …, bk. This can be achieved by taking the partial derivatives of Q with respect to each bj and setting them equal to zero. (See next page.)

The first order conditions (FOCs):

∂Q/∂b0:  Σi (yi − b0 − b1xi1 − … − bkxik) = 0
∂Q/∂bj:  Σi xij (yi − b0 − b1xi1 − … − bkxik) = 0,   j = 1,…,k

You solve these k+1 equations for the betas. The solutions are the OLS estimators of the coefficients.

The most common method of solving the FOCs is to use matrix notation. We will use this method later. For our purposes, a more useful representation of the estimators is given in the next slide.
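As an aside, a minimal sketch (not from the slides) of the matrix-notation solution, β̂ = (X'X)⁻¹X'y, using NumPy; the data-generating values below are purely illustrative.

    import numpy as np

    # Simulated data: n observations, k = 2 regressors (illustrative values only)
    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=(n, 2))
    y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(size=n)

    # Add a column of ones for the intercept
    X = np.column_stack([np.ones(n), x])

    # OLS estimator beta_hat = (X'X)^{-1} X'y; solve() is more stable than an explicit inverse
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)   # approximately [1.0, 2.0, -0.5]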

The OLS estimators
The slope parameters have the following representation. The jth slope parameter (not the intercept) is given by

β̂j = (Σi r̂ij yi) / (Σi r̂ij2)

where r̂ij is the OLS residual from the regression in which xj is regressed on all the other explanatory variables (and an intercept); that is,

xj = α0 + α1x1 + … + αj-1xj-1 + αj+1xj+1 + … + αkxk + r

Proof: see the front board.
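A small numerical check of this partialling-out representation, as a sketch with simulated data (all values illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=(n, 2))
    y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])

    # Full multiple regression
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Regress x1 on the other regressors (intercept and x2) and keep the residuals
    others = X[:, [0, 2]]
    r1 = X[:, 1] - others @ np.linalg.solve(others.T @ others, others.T @ X[:, 1])

    beta1_partial = (r1 @ y) / (r1 @ r1)
    print(beta1_partial, beta_hat[1])   # the two numbers agree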

Unbiasedness of OLS
Now we introduce a series of assumptions used to show the unbiasedness of OLS.

Assumption MLR.1: Linear in parameters
The population model can be written as
y = β0 + β1x1 + β2x2 + … + βkxk + u

Assumption MLR.2: Random sampling
We have a random sample of n observations {(xi1, xi2, …, xik, yi): i=1,…,n} following the population model.

MLR.2 means the following.
MLR.2a: yi, i=1,…,n, are iid.
MLR.2b: xi1, i=1,…,n, are iid; … ; xik, i=1,…,n, are iid.
MLR.2c: any variables from different observations are independent.
MLR.2d: ui, i=1,…,n, are iid.

Assumption MLR.3: No perfect collinearity In the sample and in the population, none of the independent variables are constant, and there are no exact linear relationships among the independent variables.
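A minimal illustration (not from the slides) of why perfect collinearity is ruled out: if one regressor is an exact linear combination of the others, X'X is singular and the normal equations have no unique solution. The variables below are hypothetical.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50
    x1 = rng.normal(size=n)
    x2 = 2 * x1 + 1                         # exact linear function of x1 (perfect collinearity)
    X = np.column_stack([np.ones(n), x1, x2])

    print(np.linalg.matrix_rank(X.T @ X))   # 2, not 3: X'X is singular
    # solving the normal equations here would be unreliable at best (X'X is singular up to rounding)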

Assumption MLR.4: Zero conditional mean E(u|x1,x2,…,xk)=0

Combined with MLR.2 and MLR.4, we have the following.
MLR.4a: E(ui|xi1, xi2,…,xik) = 0 for i=1,…,n
MLR.4b: E(ui|x11,x12,…,x1k, x21,x22,…,x2k, …, xn1,xn2,…,xnk) = 0 for i=1,…,n
MLR.4b means that, conditional on all the data, the expected value of ui is zero. We usually write this as E(ui|X) = 0.

Unbiasedness of the OLS parameters
Theorem 3.1: Under assumptions MLR.1 through MLR.4, we have E(β̂j) = βj for j=0,1,…,k.
Proof: see the front board.
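A small Monte Carlo sketch of Theorem 3.1: data are repeatedly generated so that MLR.1 through MLR.4 hold by construction, and the OLS estimates average out to the true parameters (all numbers illustrative).

    import numpy as np

    rng = np.random.default_rng(4)
    n, reps = 100, 2000
    beta = np.array([1.0, 2.0, -0.5])       # true (beta0, beta1, beta2)
    estimates = np.zeros((reps, 3))

    for r in range(reps):
        x = rng.normal(size=(n, 2))
        X = np.column_stack([np.ones(n), x])
        y = X @ beta + rng.normal(size=n)   # E(u|X) = 0 holds by construction
        estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

    print(estimates.mean(axis=0))           # close to (1.0, 2.0, -0.5)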

Omitted variable bias
Suppose that the following population model satisfies MLR.1 through MLR.4:

y = β0 + β1x1 + β2x2 + u    (1)

But further suppose that you instead estimate the following model, which omits x2, perhaps because of a simple mistake, or perhaps because x2 is not available in your data:

y = β0 + β1x1 + v    (2)

Then the OLS estimates of (1) and (2) have the following relationship:

β̃1 = β̂1 + β̂2·δ̂1

where β̂1 and β̂2 are the OLS estimates from (1), β̃1 is the OLS estimate from (2), and δ̂1 is the OLS slope estimate from the model

x2 = δ0 + δ1x1 + e

The proof will be given later for a general case.

So we have

E(β̃1) = β1 + β2·δ̂1    (conditional on the x's)

So, unless β2 = 0 or δ̂1 = 0, the estimate from equation (2), β̃1, is biased. Notice that δ̂1 > 0 if cov(x1,x2) > 0 and vice versa, so we can predict the direction of the bias in the following way.
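A quick simulation sketch of this result, with illustrative parameter values: here cov(x1,x2) > 0 and β2 > 0, so the short regression that omits x2 is biased upward by roughly β2·δ1.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)                   # delta1 = 0.8, so cov(x1, x2) > 0
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)   # beta1 = 2, beta2 = 3

    # Long regression (includes x2) vs. short regression (omits x2)
    X_long = np.column_stack([np.ones(n), x1, x2])
    X_short = np.column_stack([np.ones(n), x1])
    b_long = np.linalg.solve(X_long.T @ X_long, X_long.T @ y)
    b_short = np.linalg.solve(X_short.T @ X_short, X_short.T @ y)

    print(b_long[1])    # close to 2.0 (unbiased)
    print(b_short[1])   # close to 2.0 + 3.0*0.8 = 4.4 (upward bias)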

Summary of bias

                   δ̂1 > 0 (i.e., cov(x1,x2) > 0)   δ̂1 < 0 (i.e., cov(x1,x2) < 0)
  β2 > 0           positive bias (upward bias)      negative bias (downward bias)
  β2 < 0           negative bias (downward bias)    positive bias (upward bias)

Question
Suppose the population model (satisfying MLR.1 through MLR.4) is given by

(crop yield) = β0 + β1(fertilizer) + β2(land quality) + u    (1)

But your data do not have a land quality variable, so you estimate the following:

(crop yield) = β0 + β1(fertilizer) + v    (2)

Questions on the next page.

Consider the following two scenarios.
Scenario 1: On the farm where the data were collected, farmers used more fertilizer on pieces of land where the land quality is better.
Scenario 2: On the farm where the data were collected, scientists randomly assigned different quantities of fertilizer to different pieces of land, irrespective of the land quality.
Question 1: In which scenario do you expect to get an unbiased estimate?
Question 2: If the estimate under one of the above scenarios is biased, predict the direction of the bias.

Omitted variable bias, the more general case
Suppose the population model (which satisfies MLR.1 through MLR.3) is given by

y = β0 + β1x1 + β2x2 + … + βk-1xk-1 + βkxk + u    (1)

But you estimate a model which omits xk:

y = β0 + β1x1 + β2x2 + … + βk-1xk-1 + v    (2)

Then we have the following relationship:

β̃j = β̂j + β̂k·δ̂j    for j = 1,…,k−1

where β̂j and β̂k are the OLS estimates from (1), β̃j is the OLS estimate from (2), and δ̂j is the OLS estimate on xj from the following regression:

xk = δ0 + δ1x1 + … + δk-1xk-1 + e

It is difficult to predict the direction of the bias in this general case. However, an approximation is often useful. Note that δ̂j is likely to be positive if the correlation between xj and xk is positive. Using this, you can predict the “approximate” direction of the bias.

Endogeneity
Consider the following model:

y = β0 + β1x1 + β2x2 + … + βk-1xk-1 + βkxk + u

A variable xj is said to be endogenous if xj and u are correlated. This causes a bias in β̂j and, in certain cases, in the estimates of the other coefficients as well. One reason why endogeneity occurs is the omitted variable problem described in the previous slides.

Variance of the OLS estimators
First, we introduce one more assumption.

Assumption MLR.5: Homoskedasticity
Var(u|x1,x2,…,xk) = σ2
This means that the variance of u does not depend on the values of the independent variables.

Combining MLR.5 with MLR.2, we also have
MLR.5a: Var(ui|X) = σ2 for i=1,…,n
where X denotes all the independent variables for all the observations, that is, x11, x12,…,x1k, x21, x22,…,x2k, …, xn1, xn2,…,xnk.

Sampling variance of the OLS slope estimators
Theorem 3.2: Under assumptions MLR.1 through MLR.5, we have

Var(β̂j) = σ2 / [SSTj (1 − Rj2)]    for j = 1,…,k

where SSTj = Σi (xij − x̄j)2 is the total sample variation in xj, and Rj2 is the R-squared from regressing xj on all the other independent variables; that is, the R-squared from the following regression:

xj = α0 + α1x1 + … + αj-1xj-1 + αj+1xj+1 + … + αkxk + r

Proof: see the front board.

The standard deviation of the OLS slope estimators is given by the square root of the variance, which is

sd(β̂j) = σ / [SSTj (1 − Rj2)]1/2    for j = 1,…,k

The estimator of σ2
In Theorem 3.2, σ2 is unknown and has to be estimated. The estimator is given by

σ̂2 = SSR / (n − k − 1) = (Σi ûi2) / (n − k − 1)

where the ûi are the OLS residuals. The divisor n − k − 1 comes from (# obs) − (# parameters estimated, including the intercept). This is called the degrees of freedom.

Theorem 3.3: Unbiased estimator of σ2
Under MLR.1 through MLR.5, we have E(σ̂2) = σ2.
Proof: see the front board.

Estimates of the variance and the standard errors of the OLS slope estimators
We replace σ2 in Theorem 3.2 by σ̂2 to get the estimated variance of the OLS estimators. This is given by

V̂ar(β̂j) = σ̂2 / [SSTj (1 − Rj2)]

Note the hat on Var, indicating that this is an estimate. The standard error of the OLS estimate is the square root of the above:

se(β̂j) = σ̂ / [SSTj (1 − Rj2)]1/2

This is the estimated standard deviation of the slope estimator.
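A sketch that computes σ̂2 and the standard errors on simulated data, both from the formula above and from the matrix expression σ̂2(X'X)⁻¹; the two should agree (all data-generating values are illustrative).

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 2
    x = rng.normal(size=(n, k))
    y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - k - 1)       # SSR divided by degrees of freedom

    # Standard errors from the matrix formula: diagonal of sigma2_hat * (X'X)^{-1}
    se_matrix = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

    # Standard error of beta_1 from sigma2_hat / (SST_1 * (1 - R_1^2))
    others = X[:, [0, 2]]
    r1 = X[:, 1] - others @ np.linalg.solve(others.T @ others, others.T @ X[:, 1])
    sst1 = np.sum((X[:, 1] - X[:, 1].mean()) ** 2)
    R1sq = 1 - (r1 @ r1) / sst1
    se_beta1 = np.sqrt(sigma2_hat / (sst1 * (1 - R1sq)))

    print(se_matrix[1], se_beta1)                  # the two values agree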

Multicollinearity
If xj is highly correlated with the other independent variables, Rj2 gets close to 1. This in turn means that the variance of β̂j gets large. This is the problem of multicollinearity.

In the extreme case where xj is perfectly linearly correlated with the other explanatory variables, Rj2 is equal to 1. In this case, you cannot estimate the betas at all. However, this case is ruled out by MLR.3.

Note that multicollinearity does not violate any of the OLS assumptions (except in the perfect multicollinearity case), and it should not be over-emphasized. You can reduce the variance by increasing the number of observations.
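An illustrative sketch of how a high Rj2 inflates the sampling variance: the factor 1/(1 − Rj2) is often called the variance inflation factor (VIF). The correlation values below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 500
    x1 = rng.normal(size=n)

    for rho in (0.0, 0.9, 0.99):                   # correlation between x1 and x2
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        # R_1^2 from regressing x1 on the other regressors (intercept and x2)
        others = X[:, [0, 2]]
        r1 = x1 - others @ np.linalg.solve(others.T @ others, others.T @ x1)
        R1sq = 1 - (r1 @ r1) / np.sum((x1 - x1.mean()) ** 2)
        print(f"rho={rho}: R1^2={R1sq:.3f}, VIF={1/(1 - R1sq):.1f}")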

Gauss-Markov theorem
Theorem 3.4: Under assumptions MLR.1 through MLR.5, the OLS estimators of the beta parameters are the best linear unbiased estimators (BLUE). This theorem means that among all linear unbiased estimators of the beta parameters, the OLS estimators have the smallest variances.