The Simple Regression Model



Presentation on theme: "The Simple Regression Model"— Presentation transcript:

1 The Simple Regression Model
Chapter 2

2 I. Outline
Simple linear regression model: used to explain one variable in terms of another
Model assumptions
OLS estimator: a method of estimating the effect of one variable on another
Computing the estimator
Statistical properties: unbiasedness and variance
Units of measurement

3 II. Simple Linear Regression (SLR)
Basic Idea: y and x are two variables; we want to explain y in terms of x, i.e. how y varies with changes in x. y: soybean crop, hourly wage, crime rate. x: lbs of fertilizer, years of education, # of police. SLR model: y = b0 + b1x + u
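To fix ideas, here is a minimal sketch (in Python, with made-up parameter values) of what the SLR model asserts: each y is built from b0 + b1x plus an unobserved disturbance u, and with u held fixed, a one-unit change in x moves y by exactly b1.

```python
import random

random.seed(42)

b0, b1 = 2.0, 0.5          # hypothetical population parameters
n = 1000

# u collects all unobserved factors affecting y; drawn here with mean zero
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
y = [b0 + b1 * xi + ui for xi, ui in zip(x, u)]

# ceteris paribus: a one-unit change in x with u held fixed moves y by b1
dy = (b0 + b1 * (3 + 1)) - (b0 + b1 * 3)
print(dy)  # 0.5
```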

4 II. SLR Terminology (Variables)
y = b0 + b1x + u
u represents all factors other than x that affect y. u is called the: error term, disturbance, or unobserved component (unobserved by the econometrician).
y is called the: dependent variable, left-hand side variable, explained variable, or regressand.
x is called the: independent variable, right-hand side variable, explanatory variable, regressor, covariate, or control variable.

5 II. SLR Terminology (Parameters)
y = b0 + b1x + u
b0 is the intercept or constant term: the baseline value of y when x = 0.
b1 is the slope parameter: it measures the relationship between y and x, telling us how y changes when x changes by some amount. How do we isolate this effect? Δy = b1Δx if Δu = 0, i.e. ceteris paribus: holding other factors fixed.

6 II. SLR Examples
SLR model: y = b0 + b1x + u
Example 1: Soybeans. y: soybean yield; x: fertilizer (lbs); u: land quality, rainfall. Δyield = b1Δfertilizer measures the change in yield from adding another unit of fertilizer, holding all other factors fixed.
Example 2: Wages. y: wage; x: education (years); u: innate ability, experience, work ethic. Δwage = b1Δeduc measures the change in wage from attaining another year of education, holding all other factors fixed.

7 II. SLR Notes
SLR assumes linearity: y = b0 + b1x + u is the equation of a straight line, so the slope is constant: a one-unit change in x has the same effect on y regardless of the initial value of x. Example: going from the 10th to the 11th year of school has the same impact on wage as going from the 11th to the 12th, which may not be realistic. We will consider more flexible functional forms later.

8 II. SLR Assumptions Simplifying Assumption: Mean Zero Error
The average value of the error term u in the population is 0. Terminology: expectation just means the average: E(u) = 0. Ex: average ability is zero; average land quality is zero. This is not a restrictive assumption, since we can always use b0 to normalize E(u) to 0: writing α0 = E(u), y = b0 + b1x + u + α0 − α0 = (α0 + b0) + b1x + (u − α0), and the new error u − α0 has mean zero; b1 is not affected.

9 II. SLR Assumptions More Important Assumption: Zero Conditional Mean
In order for b1 to capture only the effect of x on y, we need a crucial assumption about how u and x are related: the average value of u does not depend on the value of x. Terminology: conditioning on a variable w means we use values of w to explain values of z: E(z|w). If w does not tell us anything about z, then E(z|w) = E(z). Note: E(z|w) = E(z) implies Cov(z, w) = 0. The assumption is E(u|x) = E(u): knowing x gives us no information about the average of u; in particular, x and u are uncorrelated.

10 II. SLR Assumptions: More Important Assumption: Zero Conditional Mean
Example: wage equation: wage = b0 + b1educ + u, where u represents unobserved ability. E(u|educ) = E(u) requires that average ability is the same regardless of years of education: E(ability|educ=12) = E(ability|educ=16). How likely is this? We generally think people who choose to get more education are more able, i.e. E(ability|educ=12) < E(ability|educ=16).

11 II. SLR Assumptions Combining the two assumptions: E(u|x) = E(u) = 0
Taking the expectation of both sides of the SLR model conditional on x: E(y|x) = E(b0 + b1x + u|x) = E(b0|x) + E(b1x|x) + E(u|x) = b0 + b1x. This is called the Population Regression Function. Now the average of y is written only in terms of x, which allows us to identify the impact of x on y. Note: the derivation uses several properties of E(·), including linearity, conditioning on a constant, and conditioning on the variable itself (see Appendices A-C).

12 III. Ordinary Least Squares (OLS)
Basic Idea: take the SLR model and estimate the parameters of interest using a sample of data; OLS is a method for estimating those parameters. Data: let {(xi, yi): i = 1, …, n} denote a random sample of size n from the population. Model: for each observation we can write yi = b0 + b1xi + ui, or y = b0 + b1x + u in vector notation.

13 Copyright © 2009 South-Western/Cengage Learning

14 III. Deriving OLS Estimates
To derive the OLS estimates, we use the SLR assumptions E(u) = 0 and E(u|x) = E(u), so E(u|x) = E(u) = 0. Recall, this means Cov(u, x) = 0. Covariance is a measure of the linear dependence between two variables. Using the definition of covariance: Cov(u, x) = E(xu) − E(x)E(u) = E(xu) = 0. So now we have two population restrictions: E(u) = 0 and E(xu) = 0.
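The covariance identity used here, Cov(u, x) = E(xu) − E(x)E(u), and the fact that it is zero when x and u are unrelated, can be checked with a quick simulation (the distributions below are chosen arbitrarily for illustration):

```python
import random

random.seed(0)

# Simulated x and u, drawn independently so Cov(u, x) = 0 by construction
n = 100_000
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]

Ex = sum(x) / n
Eu = sum(u) / n
Exu = sum(xi * ui for xi, ui in zip(x, u)) / n

cov = Exu - Ex * Eu   # sample analogue of Cov(u, x) = E(xu) - E(x)E(u)
print(cov)            # close to 0
```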

15 III. Deriving OLS Estimates (continued)
We can write our 2 restrictions just in terms of x, y, b0 and b1, since u = y − b0 − b1x: E(u) = 0 becomes E(y − b0 − b1x) = 0, and E(xu) = 0 becomes E[x(y − b0 − b1x)] = 0. These restrictions are often called moment restrictions or first order conditions. It is important to note that we have 2 equations and 2 unknowns, so we have an exactly identified system of equations. OLS finds estimates b̂0 and b̂1 so that the sample analogues of these equations are satisfied; "hats" denote that we are talking about estimates.

16 III. Deriving OLS Estimates (continued)
Step 1: We know E(·) is just the mean, so the sample counterparts to the two moment equations, evaluated at the estimated parameters, are:
(1/n) Σ (yi − b̂0 − b̂1xi) = 0
(1/n) Σ xi(yi − b̂0 − b̂1xi) = 0

17 III. Deriving OLS Estimates (continued)
Step 2: Using the facts that (1/n) Σ yi = ȳ (and similarly for x) and that summation is a linear operator, we can rewrite the first moment condition as ȳ = b̂0 + b̂1x̄, so that b̂0 = ȳ − b̂1x̄.

18 III. Deriving OLS Estimates (continued)
Step 3: Substituting b̂0 = ȳ − b̂1x̄ into the second moment condition gives Σ xi(yi − ȳ − b̂1(xi − x̄)) = 0. Note: dropping the 1/n doesn't affect the estimation. Using the summation properties Σ xi(xi − x̄) = Σ (xi − x̄)² and Σ xi(yi − ȳ) = Σ (xi − x̄)(yi − ȳ), this becomes Σ (xi − x̄)(yi − ȳ) = b̂1 Σ (xi − x̄)².

19 III. Deriving OLS Estimates (continued)
Step 4: Solving for the parameter estimate:
b̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
The denominator is non-zero as long as at least one xi differs from the others.
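Applied to a tiny hypothetical five-observation sample (values made up), the Step 4 formula and the Step 2 intercept formula give:

```python
# Hypothetical five-observation sample
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # Σ(xi-x̄)(yi-ȳ)
den = sum((xi - xbar) ** 2 for xi in x)                       # Σ(xi-x̄)²
b1_hat = num / den
b0_hat = ybar - b1_hat * xbar
print(b1_hat, b0_hat)  # slope 0.9, intercept 1.3 (up to float rounding)
```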

20 III. Summary of OLS slope estimate
The slope estimate is the sample covariance between x and y divided by the sample variance of x. Variance: a measure of spread in the distribution of a random variable. Covariance: a measure of linear dependence between two random variables. If x and y are positively correlated, the slope will be positive; if negatively correlated, the slope will be negative.

21 III. Deriving OLS Estimates Alternative Approach
Intuition on OLS: we are fitting a line through the sample points (xi, yi). Claim: we are defining the line of "best fit" as the one that makes the sum of squared residuals as small as possible. What is a residual? The residual is the estimate of the error term: ûi = yi − b̂0 − b̂1xi. Minimization problem: choose b̂0, b̂1 to minimize Σ ûi² = Σ (yi − b̂0 − b̂1xi)².

22 III. Deriving OLS Estimates Alternative Approach
To solve the minimization problem we take first order conditions with respect to each parameter:
−2 Σ (yi − b̂0 − b̂1xi) = 0
−2 Σ xi(yi − b̂0 − b̂1xi) = 0
These first order conditions are the same as the sample moment conditions, up to a constant factor, so OLS finds the parameters that best solve these equations. This leads to the name least squares estimator.

23 Sample OLS Line of Best Fit
(Figure: scatter of sample points (x1, y1), …, (x4, y4) with the fitted OLS line; the braces mark each point's residual, its vertical distance to the line.)

24 IV. Properties of OLS Algebraic
The sum of the OLS residuals is zero: Σ ûi = 0, so the sample average of the OLS residuals is zero as well. The sample covariance between the regressor and the OLS residuals is zero: Σ xi ûi = 0. The OLS regression line always goes through the sample means: ȳ = b̂0 + b̂1x̄.
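These algebraic properties hold in any sample by construction; a sketch verifying them on a small made-up dataset:

```python
# Small made-up sample; fit OLS by the closed-form formulas
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1_hat = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))
b0_hat = ybar - b1_hat * xbar

resid = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]

sum_resid = sum(resid)                                           # property 1: 0
cov_x_resid = sum((xi - xbar) * ri for xi, ri in zip(x, resid))  # property 3: 0
on_line = b0_hat + b1_hat * xbar                                 # property 4: equals ȳ
print(sum_resid, cov_x_resid, on_line)
```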

25 IV. Properties of OLS Algebraic
We can think of each observation yi as composed of 2 parts, explained and unexplained: yi = ŷi + ûi. Define: total sum of squares SST = Σ (yi − ȳ)²; explained sum of squares SSE = Σ (ŷi − ȳ)²; residual sum of squares SSR = Σ ûi². The total variation in y is the sum of the explained variation plus the unexplained variation: SST = SSE + SSR.

26 IV. Proof that SST = SSE + SSR
SST = Σ (yi − ȳ)² = Σ [(yi − ŷi) + (ŷi − ȳ)]² = Σ ûi² + 2 Σ ûi(ŷi − ȳ) + Σ (ŷi − ȳ)² = SSR + SSE, because the cross term Σ ûi(ŷi − ȳ) = 0: the residuals sum to zero and are uncorrelated with x, hence with ŷ.

27 IV. Goodness-of-Fit
We use these definitions to measure how well our independent variable explains the dependent variable: compute the fraction of the total sum of squares (SST) that is explained by the model, R² = SSE/SST = 1 − SSR/SST, also known as the coefficient of determination. It measures the fraction of the variation in y explained by variation in x; it lies between 0 and 1, and a smaller number indicates a poorer fit. It is often multiplied by 100%.
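The decomposition and R² can be computed directly; a sketch on a small made-up sample:

```python
# Made-up sample; fit OLS, then decompose the variation in y
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1_hat = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))
b0_hat = ybar - b1_hat * xbar
fitted = [b0_hat + b1_hat * xi for xi in x]

SST = sum((yi - ybar) ** 2 for yi in y)                    # total
SSE = sum((fi - ybar) ** 2 for fi in fitted)               # explained
SSR = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))     # residual
R2 = SSE / SST
print(SST, SSE + SSR, R2)  # SST equals SSE + SSR; here R² = 0.81
```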

28 V. Examples: CEO Salary & Return on Equity
Regression specification. Model: salary = b0 + b1·ROE + u ("regress salary on ROE"). Data: salary is in thousands of $, so a value of 856.3 means $856,300; ROE is in percentages. Parameter: b1 measures the change in annual salary (in thousands of $) when ROE increases by one percentage point (one unit).


30 V. Examples: CEO Salary & Return on Equity
Results: sample regression function: predicted salary = 963.191 + 18.501·ROE. Intercept: if ROE = 0, predicted salary is $963,191. Slope estimate: if ROE increases by one percentage point, salary is predicted to change by 18.501, i.e. about $18,500. Linearity imposes that the predicted salary change is the same regardless of the initial ROE. If ROE = 20, predicted salary is 963.191 + 18.501·20 = 1,333.215, i.e. $1,333,215; in reality, the actual salary of such a CEO in the data is $1,145,000. R² = 0.013 from the regression: variation in ROE explains 1.3% of the variation in salary.


32 V. Examples: Wage & Education; Voting Outcomes & Expenditure
Example 1: Wage & Education. Data: wage in $ per hour; educ in years of education. The negative predicted wage for a person with no education implies the regression line does a bad job at low levels of educ. Predicted wage for 8 years of education is $3.42 = −0.90 + 0.54·8. An increase in education by 1 year (one unit) leads to an increase in hourly wage of $0.54, so an increase of 4 years leads to $0.54·4 = $2.16. Is it reasonable that each extra year leads to the same wage increase?
Example 2: Voting Outcomes & Expenditure. Data: voteA is the % of the vote received by candidate A; shareA is the % of total campaign expenditures accounted for by A. If candidate A's share of spending increases by one percentage point (one unit), that candidate is predicted to receive 0.464% more of the vote.

33 VI. Properties of OLS Estimator Unbiasedness
One key statistical property of the OLS estimator is that it gives us unbiased estimates, b̂0 and b̂1, of the parameters: E(b̂0) = b0 and E(b̂1) = b1. Intuition: we only have a single sample to estimate the parameters, so the estimates we get may or may not equal the true parameters. But if we had many samples of data and computed the estimates in each, the average of all these estimates would equal the population parameters. There are 4 assumptions we must make to ensure unbiasedness.

34 VI. Properties of OLS Unbiasedness
SLR.1 Linear in Parameters: assume the population model is linear in the parameters, y = b0 + b1x + u; i.e. we are estimating b0 and b1, not, say, b1³. SLR.2 Random Sampling: assume we have a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population. This allows us to write the sample model yi = b0 + b1xi + ui.

35 VI. Properties of OLS Unbiasedness
SLR.3 Sample Variation in x: there is variation in x across i, i.e. Var(x) ≠ 0. SLR.4 Zero Conditional Mean: the most important assumption for unbiasedness; E(u|x) = 0, and thus E(ui|xi) = 0.

36 VI. Properties of OLS Unbiasedness
To show unbiasedness, we first rewrite our OLS estimator. Recall that Σ (xi − x̄)(yi − ȳ) = Σ (xi − x̄)yi (App. A). Using algebra and substituting yi = b0 + b1xi + ui:
b̂1 = Σ (xi − x̄)yi / Σ (xi − x̄)² = b1 + Σ (xi − x̄)ui / Σ (xi − x̄)²

37 VI. Properties of OLS Unbiasedness
Let SSTx = Σ (xi − x̄)² and di = xi − x̄, so that b̂1 = b1 + (1/SSTx) Σ di ui. Then, taking expectations conditional on the x's and using E(ui|xi) = 0:
E(b̂1) = b1 + (1/SSTx) Σ di E(ui|xi) = b1
so the OLS slope estimator is unbiased.

38 VI. Properties of OLS Unbiasedness
We can do the same for b̂0 (in the text). Unbiasedness is a description of the estimator: in any given sample of data, we may be "near" or "far" from the true parameter (i.e. the true effect of x on y). Unbiasedness says that if we had many estimates from many different samples, their average would equal the true parameter b1. The proof of unbiasedness depends on our 4 assumptions; if any assumption fails, OLS is not necessarily unbiased. SLR.1 can be relaxed (outside the scope of the text); SLR.3 almost always holds.
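The "many samples" thought experiment behind unbiasedness can be sketched as a Monte Carlo simulation (true parameters and the error distribution below are made up; the key is that SLR.1-SLR.4 hold by construction):

```python
import random

random.seed(1)

b0, b1 = 1.0, 2.0                      # hypothetical true parameters
n, reps = 50, 2000
x = [i / 5 for i in range(n)]          # fixed regressor values; SLR.3 holds
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

estimates = []
for _ in range(reps):
    # fresh sample each time; u independent of x, so SLR.4 holds
    y = [b0 + b1 * xi + random.gauss(0, 1) for xi in x]
    ybar = sum(y) / n
    b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    estimates.append(b1_hat)

avg = sum(estimates) / reps
print(avg)  # close to the true value 2.0
```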

39 VI. Properties of OLS Unbiasedness
SLR.2 can be relaxed when looking at time series and panel data (later chapters); for cross-sectional data, assume SLR.2 holds. SLR.4 is the most crucial assumption, and unfortunately the hardest to guarantee. As we saw with unobserved ability, it is likely that x is correlated with u. In that case OLS reports a spurious, or biased, estimate of the effect of x on y: it partly picks up the effect of unobserved factors on y because they are correlated with x.

40 VI. Properties of OLS Unbiasedness
Example: student performance and the National School Lunch Program (NSLP). We expect that, other factors being equal, a student who receives a free lunch at school will have improved performance. Regression: b̂1 = −0.319, b̂0 = 32.14, indicating that participation has a negative effect on achievement. It is likely that u (school quality, motivation) is correlated with NSLP participation, meaning E(u|x) differs across participating and non-participating students.

41 VII. Properties of OLS Variance
For a given sample of data, we compute estimates b̂0 and b̂1. Even with unbiasedness, we know our estimate is usually not equal to the true parameter. We would like to know, on average, how far our estimate is from the true parameter: the variance of an estimator measures how spread out the distributions of b̂0 and b̂1 are. The measure of spread is the variance (or its square root, the standard deviation). Note: if we had multiple methods of estimating the parameters, we would use this rubric to determine which is best (i.e. lowest variance).

42 VII. Properties of OLS Variance
To calculate the variance of an estimator, we first make a simplifying assumption. SLR.5 Homoskedasticity (constant variance): Var(u|x) = σ². This means the error term u has the same variance (spread) given any value of the explanatory variable (see the figures below). Algebra: Var(u|x) = E(u²|x) − [E(u|x)]². We know E(u|x) = 0, so E(u²|x) = E(u²) = Var(u) = σ² (this follows from Var(u) = E(u²) − [E(u)]² and E(u) = 0). σ² is also the unconditional variance, called the error variance; σ, the square root of the error variance, is called the standard deviation of the error.

43 Homoskedastic Case
(Figure: the conditional density f(y|x) around the line E(y|x) = b0 + b1x has the same spread at x1 and x2.)

44 Heteroskedastic Case
(Figure: the conditional density f(y|x) around the line E(y|x) = b0 + b1x spreads out as x increases from x1 to x3.)

45 VII. Properties of OLS Variance
People often re-write SLR.4 and SLR.5 in terms of y. SLR.4: E(u|x) = 0, so from y = b0 + b1x + u we get E(y|x) = E(b0|x) + E(b1x|x) + E(u|x) = b0 + b1x. SLR.5: Var(u|x) = σ²; similarly, Var(y|x) = Var(u|x) = σ². Assuming homoskedasticity, we can derive an estimator for the variance of the OLS parameter estimates (heteroskedasticity is more likely in practice, but we ignore it for now). This gives us an idea of how precisely the parameter is estimated: we would like a small variance, because this means our parameter estimate is more likely to be close to the true value.

46 VII. Properties of OLS Variance
Calculating the variance of the estimator: under SLR.1-SLR.5,
Var(b̂1) = σ² / Σ (xi − x̄)² = σ² / SSTx
Properties: the larger the error variance σ², the larger the variance of the slope estimate (a bad thing). The larger the variability in the xi, the smaller the variance of the slope estimate, i.e. the easier it is to pinpoint how y varies with x (a good thing). Consequently, a larger sample size should decrease the variance of the slope estimate.
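The formula Var(b̂1) = σ²/SSTx can be checked by simulation: draw many samples, and compare the empirical variance of the slope estimates with the theoretical value (all parameter choices below are made up for illustration):

```python
import random

random.seed(2)

b0, b1, sigma = 1.0, 2.0, 1.0
n, reps = 50, 4000
x = [i / 5 for i in range(n)]          # fixed regressor values
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

ests = []
for _ in range(reps):
    y = [b0 + b1 * xi + random.gauss(0, sigma) for xi in x]
    ybar = sum(y) / n
    ests.append(sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx)

mean = sum(ests) / reps
emp_var = sum((e - mean) ** 2 for e in ests) / (reps - 1)  # across samples
theo_var = sigma ** 2 / sxx                                # Var(b̂1) = σ²/SSTx
print(emp_var, theo_var)  # the two should be close
```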

47 VII. Properties of OLS Variance
Calculating the error variance: recall σ² = E(u²) = Var(u). Problem: we don't know what the error variance σ² is, because we don't observe the errors ui; what we observe are the residuals ûi. We can use the residuals to form an estimate of the error variance.

48 VII. Properties of OLS Variance
Then, an unbiased estimator of σ² = E(u²) is:
σ̂² = SSR / (n − 2) = (1/(n − 2)) Σ ûi²
We generally look at the spread of an estimator in terms of the standard error (the estimate of its standard deviation), which is the square root of the estimated variance:
se(b̂1) = σ̂ / √SSTx
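Putting the pieces together on a small made-up sample, the error-variance estimate and the standard error of the slope are:

```python
import math

# Made-up sample; fit OLS, then estimate σ² from the residuals
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
b1_hat = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
b0_hat = ybar - b1_hat * xbar

SSR = sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y))
sigma2_hat = SSR / (n - 2)             # unbiased: divide by n − 2, not n
se_b1 = math.sqrt(sigma2_hat / sxx)    # standard error of the slope
print(sigma2_hat, se_b1)
```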

49 VIII. Units of Measurement and Functional Form
We are essentially always trying to estimate the impact of x on y. The units of our variables affect how we interpret the estimates, but the punchline is the same. Example: CEO salary and ROE. Model: salary = b0 + b1·ROE + u. Data: salary is measured in thousands of $, so a value of 856.3 means $856,300; ROE is in %, so one unit of change is 1 percentage point. Results: when ROE increases by 1 percentage point, salary is predicted to increase by 18.501, or $18,501.

50 VIII. Units of Measurement and Functional Form
Rule #1: if the dependent variable is multiplied by a constant c, then the OLS intercept and slope estimates are also multiplied by c. Rule #2: if the independent variable is divided (multiplied) by some nonzero constant c, then the OLS slope coefficient is multiplied (divided) by c; the intercept is not affected. Suppose ROE is now measured as a decimal, so 0.01 = 1%. When ROE increases by one unit of 0.01 (i.e. 1 percentage point), salary is predicted to increase by 1,850.1 × 0.01 = 18.501; since salary is measured in thousands of $, this is an $18,501 increase.
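Both rules are mechanical consequences of the OLS formulas, and can be verified on any sample; a sketch with made-up data and arbitrary scaling constants:

```python
# Made-up sample and a small closed-form OLS helper
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)

def ols(xs, ys):
    xb, yb = sum(xs) / n, sum(ys) / n
    b1 = (sum((a - xb) * (b - yb) for a, b in zip(xs, ys))
          / sum((a - xb) ** 2 for a in xs))
    return yb - b1 * xb, b1

b0_hat, b1_hat = ols(x, y)

# Rule 1: scaling y by c = 1000 scales both estimates by 1000
b0_y, b1_y = ols(x, [1000 * yi for yi in y])
# Rule 2: dividing x by c = 100 multiplies the slope by 100; intercept unchanged
b0_x, b1_x = ols([xi / 100 for xi in x], y)

print(b1_y / b1_hat, b1_x / b1_hat, b0_x - b0_hat)  # 1000, 100, ~0
```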

51 VIII. Units of Measurement and Functional Form
We can incorporate nonlinearities in the variables to make our estimation more realistic. Wage example estimate: predicted wage = −0.90 + 0.54·educ. This restricts each additional year of education to have the same effect as the previous one (10th to 11th and 11th to 12th both yield a $0.54 increase). This is unrealistic, as the 12th year culminates in a high school degree and is likely rewarded in the labor market.

52 VIII. Units of Measurement and Functional Form
An improvement would be to say that wage increases by a constant percentage with each additional year of education. This allows the monetary impact of going from 10 to 11 years to differ from that of 11 to 12, although the percentage increase is the same. Model: log(wage) = b0 + b1·educ + u. Using this form implies an increasing (monetary) return to education.


54 VIII. Units of Measurement and Functional Form
Estimate: log(wage) = b0 + b1·educ + u. Results: b̂1 = 0.083. It is standard to multiply b̂1 by 100% to get the percentage change in wage from one additional unit (year) of schooling: an extra year of education results in an 8.3% increase in predicted wage.
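The log-level interpretation can be sketched with simulated data (all numbers below are made up; the true semi-elasticity is set to 0.08, i.e. an 8% wage gain per year of education, and OLS recovers it):

```python
import random

random.seed(3)

# Hypothetical data: each extra year of education raises wage by ~8%,
# so log(wage) is linear in educ with slope 0.08
n = 5000
educ = [random.randint(8, 20) for _ in range(n)]
logwage = [0.6 + 0.08 * e + random.gauss(0, 0.3) for e in educ]

ebar = sum(educ) / n
lbar = sum(logwage) / n
b1_hat = (sum((e - ebar) * (w - lbar) for e, w in zip(educ, logwage))
          / sum((e - ebar) ** 2 for e in educ))

# multiply by 100 to read the slope as a percent change in wage per year
print(100 * b1_hat)  # close to 8
```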

55 VIII. Units of Measurement and Functional Form
What if both our LHS and RHS are in logs? This is called the constant elasticity model. Estimate: log(salary) = b0 + b1·log(sales) + u, with salary in $ and sales in millions of $. Here b1 estimates the elasticity of salary with respect to sales. Result: a 1% increase in firm sales increases salary by b̂1 percent.


