
1 Research Method: Lecture 2 (Ch3) Multiple linear regression

2 Model with k independent variables
y = β0 + β1x1 + β2x2 + … + βkxk + u
β0 is the intercept; βj, for j = 1, …, k, are the slope parameters.

3 Mechanics of OLS
Variable labels: suppose you have n observations. Then the data look like this:

Obs id   y     x1     x2    ...   xk
1        y1    x11    x12   ...   x1k
2        y2    x21    x22   ...   x2k
:        :     :      :           :
n        yn    xn1    xn2   ...   xnk

4 The OLS estimates of the parameters are chosen to minimize the sum of squared residuals. That is, you choose the betas to minimize
Q = Σi=1…n (yi − b0 − b1xi1 − b2xi2 − … − bkxik)²
This is achieved by taking the partial derivative of Q with respect to each beta and setting it equal to zero. (See next page)

5 The first order conditions (FOCs)
∂Q/∂b0 = −2 Σi=1…n (yi − b0 − b1xi1 − … − bkxik) = 0
∂Q/∂bj = −2 Σi=1…n xij (yi − b0 − b1xi1 − … − bkxik) = 0, for j = 1, …, k
You solve these k + 1 equations for the betas. The solutions are the OLS estimators of the coefficients.

6 Most common method to solve for the FOCs is to use matrix notation
The most common method for solving the FOCs is matrix notation; we will use this method later. For our purposes, a more useful representation of the estimators is given on the next slide.
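As a preview of the matrix approach, here is a minimal numpy sketch (the data and variable names are invented for illustration): stacking the FOCs gives X'(y − Xb) = 0, so b̂ = (X'X)⁻¹X'y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3

# Simulated data: an intercept column plus k regressors (illustrative values).
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 0.5, -2.0, 0.3])
y = X @ beta_true + rng.normal(size=n)

# The stacked FOCs are X'(y - X b) = 0, so b_hat = (X'X)^(-1) X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true
```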

7 The OLS estimators
The slope parameters have the following representation. The jth slope parameter (j = 1, …, k; the intercept is excluded) is given by
β̂j = Σi=1…n r̂ij yi / Σi=1…n r̂ij²
where r̂ij is the OLS residual from the regression of xj on all the other explanatory variables, that is, from
xj = δ0 + δ1x1 + … + δj−1xj−1 + δj+1xj+1 + … + δkxk + error
Proof: see the front board.
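A minimal numpy sketch (made-up data) of this "partialling out" representation: the coefficient on x1 from the full regression equals Σ r̂i1 yi / Σ r̂i1², where r̂i1 are the residuals from regressing x1 on the other regressors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # correlated regressors
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Full regression of y on a constant, x1, and x2.
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Partialling out: regress x1 on the remaining regressors (constant and x2),
# keep the residuals r_hat, then beta1_hat = sum(r_hat * y) / sum(r_hat**2).
Z = np.column_stack([np.ones(n), x2])
gamma = np.linalg.solve(Z.T @ Z, Z.T @ x1)
r_hat = x1 - Z @ gamma
beta1_partial = (r_hat @ y) / (r_hat @ r_hat)

print(beta_full[1], beta1_partial)        # the two numbers coincide
```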

8 Unbiasedness of OLS
Now we introduce a series of assumptions used to show the unbiasedness of OLS.
Assumption MLR.1 (Linear in parameters): the population model can be written as
y = β0 + β1x1 + β2x2 + … + βkxk + u

9 Assumption MLR.2: Random sampling
We have a random sample of n observations {(xi1, xi2, …, xik, yi)}, i = 1, …, n, following the population model.

10 MLR.2 means the following:
MLR.2a: yi, i = 1, …, n, are iid.
MLR.2b: xi1, i = 1, …, n, are iid; …; xik, i = 1, …, n, are iid.
MLR.2c: any variables across different observations are independent.
MLR.2d: ui, i = 1, …, n, are iid.

Obs id   y     x1     x2    ...   xk
1        y1    x11    x12   ...   x1k
2        y2    x21    x22   ...   x2k
:        :     :      :           :
n        yn    xn1    xn2   ...   xnk

11 Assumption MLR.3: No perfect collinearity
In the sample and in the population, none of the independent variables are constant, and there are no exact linear relationships among the independent variables.

12 Assumption MLR.4: Zero conditional mean
E(u|x1,x2,…,xk)=0

13 Combining MLR.2 and MLR.4, we have the following.
MLR.4a: E(ui | xi1, xi2, …, xik) = 0 for i = 1, …, n.
MLR.4b: E(ui | x11, x12, …, x1k, x21, x22, …, x2k, …, xn1, xn2, …, xnk) = 0 for i = 1, …, n.
MLR.4b means that, conditional on all the data, the expected value of ui is zero. We usually write this as E(ui | X) = 0.

14 Unbiasedness of OLS parameters
Theorem 3.1: Under assumptions MLR.1 through MLR.4, we have
E(β̂j) = βj for j = 0, 1, …, k
Proof: see the front board.

15 Omitted variable bias
Suppose that the following population model satisfies MLR.1 through MLR.4:
y = β0 + β1x1 + β2x2 + u    -----(1)
But suppose that you instead estimate the following model, which omits x2, perhaps because of a simple mistake or because x2 is not available in your data:
y = β0 + β1x1 + v    -----(2)

16 Then the OLS estimates from (1) and (2) have the following relationship:
β̃1 = β̂1 + β̂2·δ̂1
where β̂1 and β̂2 are the OLS estimates from (1), β̃1 is the OLS estimate from (2), and δ̂1 is the OLS slope estimate from the following model:
x2 = δ0 + δ1x1 + e
The proof will be given later for a more general case.

17 So we have E(β̃1) = β1 + β2·δ̂1, and hence the bias is β2·δ̂1.
So, unless β2 = 0 or δ̂1 = 0, the estimate from equation (2), β̃1, is biased. Notice that δ̂1 > 0 if cov(x1, x2) > 0 and vice versa, so we can predict the direction of the bias in the following way.
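A quick numerical check of this relationship, as a minimal numpy sketch (the data-generating coefficients are invented for illustration): the short-regression coefficient β̃1 equals β̂1 + β̂2·δ̂1 exactly in any sample.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)        # cov(x1, x2) > 0
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    # OLS coefficients from the normal equations.
    return np.linalg.solve(X.T @ X, X.T @ y)

b_full = ols(np.column_stack([np.ones(n), x1, x2]), y)   # beta0_hat, beta1_hat, beta2_hat
b_short = ols(np.column_stack([np.ones(n), x1]), y)      # tilde beta0, tilde beta1
d = ols(np.column_stack([np.ones(n), x1]), x2)           # delta0_hat, delta1_hat

# tilde beta1 = beta1_hat + beta2_hat * delta1_hat holds exactly in the sample.
print(b_short[1], b_full[1] + b_full[2] * d[1])
```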

18 Summary of bias

           δ̂1 > 0 (i.e., cov(x1,x2) > 0)    δ̂1 < 0 (i.e., cov(x1,x2) < 0)
β2 > 0     Positive bias (upward bias)       Negative bias (downward bias)
β2 < 0     Negative bias (downward bias)     Positive bias (upward bias)

19 Question
Suppose the population model (satisfying MLR.1 through MLR.4) is given by
(Crop yield) = β0 + β1(fertilizer) + β2(land quality) + u    -----(1)
But your data do not have a land quality variable, so you estimate the following:
(Crop yield) = β0 + β1(fertilizer) + v    -----(2)
Questions next page:

20 Consider the following two scenarios.
Scenario 1: On the farm where the data were collected, farmers used more fertilizer on pieces of land where land quality was better.
Scenario 2: On the farm where the data were collected, scientists randomly assigned different quantities of fertilizer to different pieces of land, irrespective of land quality.
Question 1: In which scenario do you expect to get an unbiased estimate?
Question 2: If the estimate under one of the above scenarios is biased, predict the direction of the bias.
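One way to check your answers is a small simulation. This is an illustrative sketch only; the coefficients and the way fertilizer is generated in each scenario are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
quality = rng.normal(size=n)                      # unobserved land quality

def slope(x, y):
    # OLS slope of y on x with an intercept.
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

# Scenario 1: more fertilizer is applied on better land, so fertilizer and quality are correlated.
fert1 = 1.0 * quality + rng.normal(size=n)
yield1 = 2.0 * fert1 + 3.0 * quality + rng.normal(size=n)

# Scenario 2: fertilizer is assigned at random, independent of quality.
fert2 = rng.normal(size=n)
yield2 = 2.0 * fert2 + 3.0 * quality + rng.normal(size=n)

print(slope(fert1, yield1))   # well above 2: upward bias from omitted quality
print(slope(fert2, yield2))   # close to 2: no bias under random assignment
```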

21 Omitted variable bias: more general case
Suppose the population model (which satisfies MLR.1 through MLR.4) is given by
y = β0 + β1x1 + β2x2 + … + βk-1xk-1 + βkxk + u    -----(1)
But you estimate a model which omits xk:
y = β0 + β1x1 + β2x2 + … + βk-1xk-1 + v    -----(2)

22 Then, we have the following
β̃j = β̂j + β̂k·δ̂j for j = 1, …, k−1
where β̂j is the OLS estimate from (1), β̃j is the OLS estimate from (2), and δ̂j is the OLS estimate of the coefficient on xj in the following regression:
xk = δ0 + δ1x1 + … + δk-1xk-1 + e

23 It is difficult to predict the direction of the bias in this general case.
However, an approximation is often useful. Note that δ̂j is likely to be positive if the correlation between xj and xk is positive. Using this, you can predict the "approximate" direction of the bias.

24 Endogeneity Consider the following model
y = β0 + β1x1 + β2x2 + … + βk-1xk-1 + βkxk + u
A variable xj is said to be endogenous if xj and u are correlated. This causes a bias in the estimate of βj and, in certain cases, in the estimates of other coefficients as well. One reason why endogeneity occurs is the omitted variable problem described in the previous slides.

25 Variance of OLS estimators
First, we introduce one more assumption.
Assumption MLR.5 (Homoskedasticity): Var(u | x1, x2, …, xk) = σ²
This means that the variance of u does not depend on the values of the independent variables.

26 Combining MLR.5 with MLR.2, we also have
MLR.5a: Var(ui | X) = σ² for i = 1, …, n
where X denotes all the independent variables for all the observations, that is, x11, x12, …, x1k, x21, x22, …, x2k, …, xn1, xn2, …, xnk.

27 Sampling variance of OLS slope estimators
Theorem 3.2: Under assumptions MLR.1 through MLR.5, we have, for j = 1, …, k,
Var(β̂j) = σ² / [SSTj(1 − Rj²)]
where SSTj = Σi=1…n (xij − x̄j)² is the total sample variation in xj, and Rj² is the R-squared from regressing xj on all the other independent variables, that is, from the regression
xj = δ0 + δ1x1 + … + δj−1xj−1 + δj+1xj+1 + … + δkxk + error
Proof: see the front board.

28 The standard deviation of an OLS slope estimator is given by the square root of its variance, which is
sd(β̂j) = σ / √[SSTj(1 − Rj²)] for j = 1, …, k

29 The estimator of σ²
In Theorem 3.2, σ² is unknown and has to be estimated. The estimator is given by
σ̂² = Σi=1…n ûi² / (n − k − 1) = SSR / (n − k − 1)
Here n − k − 1 = (# obs) − (# parameters estimated, including the intercept). This is called the degrees of freedom.

30 Theorem 3.3 (Unbiased estimation of σ²)
Under MLR.1 through MLR.5, we have
E(σ̂²) = σ²
Proof: see the front board.

31 Estimates of the variance and the standard errors of OLS slope parameters
We replace σ² in Theorem 3.2 with σ̂² to get the estimated variance of the OLS slope estimators:
Vâr(β̂j) = σ̂² / [SSTj(1 − Rj²)]
Note the hat over Var, indicating that this is an estimate. The standard error of the OLS estimate is the square root of the above:
se(β̂j) = σ̂ / √[SSTj(1 − Rj²)]
This is the estimated standard deviation of the slope estimator.
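A minimal numpy sketch (data and names invented for illustration) that computes σ̂², Vâr(β̂1), and se(β̂1) from these formulas and checks them against the matrix expression σ̂²(X'X)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 300, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# sigma2_hat = SSR / (n - k - 1), the degrees-of-freedom adjusted estimator.
sigma2_hat = resid @ resid / (n - k - 1)

# Var_hat(beta1_hat) = sigma2_hat / (SST_1 * (1 - R_1^2)),
# where R_1^2 comes from regressing x1 on the other regressors.
SST1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
R1_sq = 1 - (r1 @ r1) / SST1
var_beta1 = sigma2_hat / (SST1 * (1 - R1_sq))
se_beta1 = np.sqrt(var_beta1)

# The same number comes from the matrix formula sigma2_hat * (X'X)^(-1).
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
print(se_beta1, se_matrix)
```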

32 Multicollinearity
If xj is highly correlated with the other independent variables, Rj² gets close to 1, which in turn means that the variance of β̂j gets large. This is the problem of multicollinearity.
In the extreme case where xj is perfectly linearly correlated with the other explanatory variables, Rj² equals 1 and you cannot estimate the betas at all. This case, however, is ruled out by MLR.3.
Note that multicollinearity does not violate any of the OLS assumptions (except in the perfect collinearity case) and should not be over-emphasized. You can reduce the variance by increasing the number of observations.
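As an illustration (made-up data), the quantity 1/(1 − Rj²), often called the variance inflation factor, shows how a nearly collinear regressor inflates Var(β̂j):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)    # nearly collinear with x1

# R_1^2 from regressing x1 on the other regressor; VIF_1 = 1 / (1 - R_1^2).
Z = np.column_stack([np.ones(n), x2])
r = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
R1_sq = 1 - (r @ r) / np.sum((x1 - x1.mean()) ** 2)
print(R1_sq, 1 / (1 - R1_sq))                # R_1^2 near 1, so the variance factor is large
```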

33 Gauss-Markov theorem
Theorem 3.4: Under assumptions MLR.1 through MLR.5, the OLS estimators of the beta parameters are the best linear unbiased estimators (BLUE).
This theorem means that, among all linear unbiased estimators of the beta parameters, the OLS estimators have the smallest variances.

