The Simple Regression Model (Chapter 2)

I. Outline
Simple linear regression model: used to explain one variable in terms of another.
Model assumptions.
OLS estimator: a method of estimating the effect of one variable on another.
Computing the estimator.
Statistical properties: unbiasedness and variance.
Units of measurement.

II. Simple Linear Regression (SLR)
Basic idea: y and x are two variables; we want to explain y in terms of x, i.e., how y varies with changes in x.
y: soybean yield, hourly wage, crime rate.
x: lbs of fertilizer, years of education, number of police.
SLR model: y = b0 + b1x + u

II. SLR Terminology (Variables)
y = b0 + b1x + u
u represents all factors other than x that affect y; it is called the error term, disturbance, or unobserved component (unobserved by the econometrician).
y is called the dependent variable, left-hand side variable, explained variable, or regressand.
x is called the independent variable, right-hand side variable, explanatory variable, regressor, covariate, or control variable.

II. SLR Terminology (Parameters)
y = b0 + b1x + u
b0 is the intercept or constant term: the basic value of y when x = 0.
b1 is the slope parameter: it measures the relationship between y and x, telling us how y changes when x changes by some amount.
How to isolate this effect? Δy = b1Δx if Δu = 0 (ceteris paribus: holding other factors fixed).

II. SLR Examples
SLR model: y = b0 + b1x + u
Example 1: Soybeans. y: soybean yield; x: fertilizer (lbs); u: land quality, rainfall. Δyield = b1Δfertilizer measures the change in yield from adding another unit of fertilizer, holding all other factors fixed.
Example 2: Wages. y: wage; x: education (years); u: innate ability, experience, work ethic. Δwage = b1Δeduc measures the change in wage from attaining another year of education, holding all other factors fixed.

II. SLR Notes
SLR assumes linearity: y = b0 + b1x + u is the equation of a straight line, so the slope is constant. A one-unit change in x has the same effect on y regardless of the initial value of x.
Example: going from the 10th to the 11th year of school has the same impact on wage as going from the 11th to the 12th, which may not be realistic. We will consider more flexible functional forms later.

II. SLR Assumptions
Simplifying assumption: Mean Zero Error. The average value of u, the error term, in the population is 0: E(u) = 0. (Terminology: expectation is just the population average.)
Ex: average ability is zero; average land quality is zero.
This is not a restrictive assumption, since we can always use b0 to normalize E(u) to 0. If E(u) = α0, write y = b0 + b1x + u + α0 - α0 = (α0 + b0) + b1x + (u - α0); the new error has mean zero, and b1 is not affected.

II. SLR Assumptions
More important assumption: Zero Conditional Mean. For b1 to capture only the effect of x on y, we need a crucial assumption about how u and x are related: the average value of u does not depend on the value of x, i.e., E(u|x) = E(u).
Terminology: conditioning on a variable w means we use values of w to explain values of z: E(z|w). If w tells us nothing about the mean of z, then E(z|w) = E(z). Note: E(z|w) = E(z) implies Cov(z, w) = 0.
Under E(u|x) = E(u), knowing x gives us no information about the average of u: x and u are mean-independent.

II. SLR Assumptions: More Important Assumption: Zero Conditional Mean (continued)
Example: wage equation wage = b0 + b1educ + u, where u represents unobserved ability. E(u|educ) = E(u) requires that average ability be the same regardless of years of education: E(ability|educ=12) = E(ability|educ=16).
How likely is this? We generally think people who choose to get more education are more able: E(ability|educ=12) < E(ability|educ=16).

II. SLR Assumptions
Combining the two assumptions: E(u|x) = E(u) = 0.
Taking the conditional expectation of both sides of the SLR model:
E(y|x) = E(b0 + b1x + u | x) = E(b0|x) + E(b1x|x) + E(u|x) = b0 + b1x
E(y|x) = b0 + b1x is called the Population Regression Function. The mean of y is now written only in terms of x, which allows us to identify the impact of x on y.
Note: the derivation uses several properties of E(·), including linearity, conditioning on a constant, and conditioning on the variable itself (see Appendices A-C).

III. Ordinary Least Squares (OLS)
Basic idea: take the SLR model and estimate the parameters of interest using a sample of data. OLS is a method for estimating the parameters.
Data: let {(xi, yi): i = 1, …, n} denote a random sample of size n from the population.
Model: for each observation we can write yi = b0 + b1xi + ui (or y = b0 + b1x + u in vector notation).
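To make this concrete, here is a minimal Python sketch (not from the original slides) that simulates a random sample from the SLR model; the parameter values and sample size are made up for illustration, and the later snippets reuse these arrays:

```python
import numpy as np

# Simulate a random sample {(x_i, y_i)} from y = b0 + b1*x + u with
# E(u|x) = 0. All numbers here are illustrative, not from the lecture.
rng = np.random.default_rng(0)
n = 100
b0_true, b1_true = 2.0, 0.5

x = rng.uniform(0, 10, size=n)   # explanatory variable
u = rng.normal(0, 1, size=n)     # error term, drawn independently of x
y = b0_true + b1_true * x + u    # dependent variable
```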


III. Deriving OLS Estimates
To derive the OLS estimates, we use the SLR assumptions E(u) = 0 and E(u|x) = E(u), so that E(u|x) = E(u) = 0. Recall this implies Cov(u, x) = 0; covariance is a measure of the linear dependence between two variables.
Using the definition of covariance: Cov(u, x) = E(xu) - E(x)E(u) = E(xu) = 0.
So now we have two population restrictions: E(u) = 0 and E(xu) = 0.

III. Deriving OLS Estimates (continued)
We can write our two restrictions just in terms of x, y, b0 and b1, since u = y - b0 - b1x:
E(u) = 0 becomes E(y - b0 - b1x) = 0
E(xu) = 0 becomes E[x(y - b0 - b1x)] = 0
These restrictions are often called moment restrictions or first order conditions. Note that we have 2 equations in 2 unknowns, so the system is exactly identified. OLS chooses the estimates b̂0 and b̂1 so that the sample versions of these equations are satisfied ("hats" denote estimates).

III. Deriving OLS Estimates (continued)
Step 1: E(·) is just a population mean, so the sample counterparts to the two moment equations, evaluated at the estimated parameters, are:
(1/n) Σi (yi - b̂0 - b̂1xi) = 0
(1/n) Σi xi(yi - b̂0 - b̂1xi) = 0

III. Deriving OLS Estimates (continued)
Step 2: Using the algebraic properties that (1/n) Σi yi = ȳ (and similarly for x) and that summation is a linear operator, we can rewrite the first moment condition as:
ȳ = b̂0 + b̂1x̄, so b̂0 = ȳ - b̂1x̄

III. Deriving OLS Estimates (continued)
Step 3: Substituting b̂0 = ȳ - b̂1x̄ into the second moment condition (dropping the 1/n, which doesn't affect the solution):
Σi xi(yi - ȳ + b̂1x̄ - b̂1xi) = 0, so Σi xi(yi - ȳ) = b̂1 Σi xi(xi - x̄)
Using the summation properties Σi xi(yi - ȳ) = Σi (xi - x̄)(yi - ȳ) and Σi xi(xi - x̄) = Σi (xi - x̄)², this becomes:
Σi (xi - x̄)(yi - ȳ) = b̂1 Σi (xi - x̄)²

III. Deriving OLS Estimates (continued)
Step 4: Solving for the slope estimate:
b̂1 = Σi (xi - x̄)(yi - ȳ) / Σi (xi - x̄)²
The denominator is non-zero as long as there is at least one xi that differs from the others.

III. Summary of OLS Slope Estimate
The slope estimate is the sample covariance between x and y divided by the sample variance of x.
Variance: a measure of spread in the distribution of a random variable. Covariance: a measure of linear dependence between two random variables.
If x and y are positively correlated, the slope will be positive; if they are negatively correlated, the slope will be negative.
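As a sketch of this formula (continuing the simulated sample above), the slope and intercept can be computed directly from the sample covariance and variance; note that both must use the same degrees-of-freedom convention:

```python
# OLS slope = sample cov(x, y) / sample var(x); intercept from the
# first moment condition. np.cov with default settings divides by n-1,
# so ddof=1 in np.var keeps the denominators consistent.
b1_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b0_hat = y.mean() - b1_hat * x.mean()
print(b0_hat, b1_hat)  # should be close to the true values (2.0, 0.5)
```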

III. Deriving OLS Estimates: Alternative Approach
Intuition on OLS: we are fitting a line through the sample points (xi, yi).
Claim: we define the line of "best fit" as the one that makes the sum of squared residuals as small as possible.
What is a residual? The residual ûi = yi - b̂0 - b̂1xi is the estimate of the error term.
Minimization problem: choose b0, b1 to minimize Σi (yi - b0 - b1xi)².

III. Deriving OLS Estimates: Alternative Approach (continued)
To solve the minimization problem we take first order conditions with respect to each parameter:
Σi (yi - b̂0 - b̂1xi) = 0
Σi xi(yi - b̂0 - b̂1xi) = 0
Apart from a factor of n⁻¹, these first order conditions are the same as the sample moment conditions, so they lead to the same estimates. This minimization is what gives the least squares estimator its name.
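As a quick check that the two derivations agree, one can minimize the sum of squared residuals numerically and compare with the closed-form estimates; this is a sanity check only (OLS is not computed this way in practice) and continues the earlier snippets:

```python
from scipy.optimize import minimize

# Numerically minimize SSR(b0, b1) = sum (y_i - b0 - b1*x_i)^2.
ssr = lambda b: np.sum((y - b[0] - b[1] * x) ** 2)
res = minimize(ssr, x0=[0.0, 0.0])
print(res.x)  # approximately (b0_hat, b1_hat) from the closed form
```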

[Figure: sample OLS line of best fit through points (x1, y1), …, (x4, y4), with braces marking the residuals between each point and the fitted line.]

IV. Properties of OLS: Algebraic
The sum of the OLS residuals is zero, Σi ûi = 0, so the sample average of the OLS residuals is zero as well.
The sample covariance between the regressor and the OLS residuals is zero: Σi xiûi = 0.
The OLS regression line always goes through the mean of the sample: ȳ = b̂0 + b̂1x̄.
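These algebraic properties are easy to verify numerically on the simulated sample (a sketch continuing the snippets above):

```python
# Fitted values and residuals from the estimates computed earlier.
y_hat = b0_hat + b1_hat * x
u_hat = y - y_hat

print(np.isclose(u_hat.sum(), 0.0))        # residuals sum to zero
print(np.isclose(np.sum(x * u_hat), 0.0))  # zero sample covariance with x
print(np.isclose(y.mean(),                 # line passes through (x-bar, y-bar)
                 b0_hat + b1_hat * x.mean()))
```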

IV. Properties of OLS: Algebraic (continued)
We can think of each observation yi as being composed of 2 parts, explained (ŷi) and unexplained (ûi): yi = ŷi + ûi.
Define the following:
SST = Σi (yi - ȳ)² (total sum of squares)
SSE = Σi (ŷi - ȳ)² (explained sum of squares)
SSR = Σi ûi² (residual sum of squares)
Total variation in y is the explained variation plus the unexplained variation: SST = SSE + SSR.

IV. Proof that SST = SSE + SSR
SST = Σi (yi - ȳ)² = Σi [(yi - ŷi) + (ŷi - ȳ)]² = Σi [ûi + (ŷi - ȳ)]²
= Σi ûi² + 2 Σi ûi(ŷi - ȳ) + Σi (ŷi - ȳ)² = SSR + 2 Σi ûi(ŷi - ȳ) + SSE
The cross term is zero because the residuals sum to zero and have zero sample covariance with x (hence with the fitted values), so SST = SSR + SSE.

IV. Goodness-of-Fit
We use these definitions to measure how well our independent variable explains the dependent variable: compute the fraction of the total sum of squares (SST) that is explained by the model.
R² = SSE/SST = 1 - SSR/SST
Also known as the coefficient of determination, R² measures the fraction of the variation in y that is explained by variation in x. It lies between 0 and 1; a smaller number indicates a poorer fit. It is often multiplied by 100 to express it as a percentage.
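Continuing the running example, a short sketch of the decomposition and R²:

```python
# Sum-of-squares decomposition and goodness of fit.
sst = np.sum((y - y.mean()) ** 2)       # total
sse = np.sum((y_hat - y.mean()) ** 2)   # explained
ssr_val = np.sum(u_hat ** 2)            # residual

print(np.isclose(sst, sse + ssr_val))   # SST = SSE + SSR
print(sse / sst)                        # R^2 = SSE/SST = 1 - SSR/SST
```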

V. Examples: CEO Salary & Return on Equity
Regression specification. Model: salary = b0 + b1·ROE + u ("regress salary on ROE").
Data: salary is in thousands of $, so 856.3 means $856,300; ROE is in percent.
Parameter: b1 measures the change in annual salary (in thousands of $) when ROE increases by one percentage point (one unit).


V. Examples: CEO Salary & Return on Equity (continued)
Results. Sample regression function: predicted salary = 963.191 + 18.501·ROE.
Intercept: if ROE = 0, predicted salary is 963.191, i.e., $963,191.
Slope estimate: if ROE increases by one percentage point, salary is predicted to change by 18.501, i.e., $18,501. Linearity imposes that the predicted salary change is 18.501 regardless of the initial ROE.
At ROE = 20, predicted salary is $1,333,215; in reality, the actual salary for such a CEO in the sample is $1,145,000.
R² = 0.0132 from the regression: variation in ROE explains only 1.3% of the variation in salary.


V. Examples: Wage & Education; Voting Outcomes & Expenditure
Wage and education. Data: wage in $ per hour; educ in years of education. Regression results: predicted wage = -0.90 + 0.54·educ.
The negative predicted wage for a person with no education implies the regression line does a bad job at low levels of educ. The predicted wage for 8 years of education is -0.90 + 0.54·8 = $3.42.
An increase in education of 1 year (one unit) raises the predicted hourly wage by $0.54; an increase of 4 years raises it by 4·$0.54 = $2.16. Is it reasonable that each extra year leads to the same wage increase?
Voting outcomes and expenditure. Data: voteA is the % of the vote received by candidate A; shareA is the % of total campaign expenditures accounted for by A. Result: if candidate A's share of spending increases by one percentage point (one unit), that candidate is predicted to receive 0.464 percentage points more of the vote.

VI. Properties of OLS Estimator: Unbiasedness
One key statistical property of the OLS estimator is that, under certain assumptions, it gives us unbiased estimates b̂0 and b̂1 of the parameters: E(b̂0) = b0 and E(b̂1) = b1.
Intuition: we only have a single sample to estimate the parameters, so the estimates we get may or may not be close to the true values. But if we had many samples of data and estimated the parameters in each, the average of all these estimates would equal the population parameters.
There are 4 assumptions we must make to ensure unbiasedness.

VI. Properties of OLS: Unbiasedness
SLR.1 Linear in Parameters: assume the population model is linear in parameters, y = b0 + b1x + u; i.e., we are estimating b0 and b1, not, say, b1³.
SLR.2 Random Sampling: assume we have a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population. This allows us to write the sample model yi = b0 + b1xi + ui.

VI. Properties of OLS: Unbiasedness
SLR.3 Sample Variation in x: there is variation in x across i, i.e., Var(x) ≠ 0.
SLR.4 Zero Conditional Mean (most important for unbiasedness): E(u|x) = 0, and thus E(ui|xi) = 0.

VI. Properties of OLS: Unbiasedness
To show unbiasedness, we first rewrite the OLS estimator. Recall (App. A) that Σi (xi - x̄)(yi - ȳ) = Σi (xi - x̄)yi. Using algebra and substituting yi = b0 + b1xi + ui:
b̂1 = Σi (xi - x̄)yi / Σi (xi - x̄)²
= [b0 Σi (xi - x̄) + b1 Σi (xi - x̄)xi + Σi (xi - x̄)ui] / Σi (xi - x̄)²

VI. Properties of OLS: Unbiasedness (continued)
We know that Σi (xi - x̄) = 0 and Σi (xi - x̄)xi = Σi (xi - x̄)². Let SSTx = Σi (xi - x̄)². Then:
b̂1 = b1 + Σi (xi - x̄)ui / SSTx
Taking expectations conditional on the x's and using SLR.4, E(ui|xi) = 0:
E(b̂1) = b1 + Σi (xi - x̄)E(ui|xi) / SSTx = b1

VI. Properties of OLS: Unbiasedness (continued)
The same can be done for b̂0 (see text). Unbiasedness is a property of the estimator, not of any single estimate: in any given sample of data, we may be "near" or "far" from the true parameter (i.e., the true effect of x on y). Unbiasedness says that if we had estimates from many different samples, their average would equal the true parameter b1.
The proof of unbiasedness depends on our 4 assumptions; if any assumption fails, OLS is not necessarily unbiased. SLR.1 can be relaxed (beyond the scope of the text), and SLR.3 almost always holds.
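A small Monte Carlo sketch (with the made-up population values from the earlier snippets) illustrates what unbiasedness means: re-draw many samples from the same population and average the slope estimates:

```python
# Unbiasedness: averaged over many random samples, the OLS slope
# centers on the true parameter when SLR.1-SLR.4 hold.
def ols_slope(x, y):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

estimates = []
for _ in range(5000):
    x_s = rng.uniform(0, 10, size=100)
    u_s = rng.normal(0, 1, size=100)   # independent of x, so SLR.4 holds
    y_s = b0_true + b1_true * x_s + u_s
    estimates.append(ols_slope(x_s, y_s))

print(np.mean(estimates))  # close to the true slope, 0.5
```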

VI. Properties of OLS: Unbiasedness (continued)
SLR.2 can be relaxed for time series data and panel data (later chapters); for cross-sectional data, we assume SLR.2 holds.
SLR.4 is the most crucial assumption, and unfortunately the hardest to guarantee. As we saw with unobserved ability, x is often correlated with u. In that case OLS gives a spurious or biased estimate of the effect of x on y: it partly picks up the effect of unobserved factors on y because they are correlated with x.

VI. Properties of OLS: Unbiasedness (continued)
Example: student performance and the National School Lunch Program (NSLP). We would expect that, other factors being equal, a student who receives a free lunch at school will have improved performance.
Regression: b̂1 = -0.319, b̂0 = 32.14, which indicates participation has a negative effect on achievement.
It is likely that u (school quality, motivation) is correlated with NSLP participation, meaning E(u|x) differs between participating and non-participating students, so SLR.4 fails.
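A hypothetical variation on the Monte Carlo above shows the bias when SLR.4 fails, mimicking the ability story: an omitted factor raises both x and y, so OLS overstates the slope (all values are made up for illustration):

```python
# Violating SLR.4: 'ability' enters the error AND is correlated with x.
biased = []
for _ in range(5000):
    ability = rng.normal(0, 1, size=100)
    x_s = rng.uniform(0, 10, size=100) + ability  # x correlated with u
    u_s = ability + rng.normal(0, 1, size=100)
    y_s = b0_true + b1_true * x_s + u_s
    biased.append(ols_slope(x_s, y_s))

print(np.mean(biased))  # noticeably above 0.5: upward (positive) bias
```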

VII. Properties of OLS: Variance
For a given sample of data we compute an estimate; even with unbiasedness, we know our estimate usually does not equal the true parameter. We would like to know, on average, how far our estimate is from the true parameter.
Variance of an estimator: how spread out the distribution of b̂1 (and b̂0) is. The measure of spread is the variance, or its square root, the standard deviation.
Note: if we had multiple methods of estimating the parameters, we would use this criterion to determine which is best (i.e., lowest variance).

VII. Properties of OLS: Variance
To calculate the variance of an estimator, we first make a simplifying assumption.
SLR.5 Homoskedasticity (constant variance): Var(u|x) = σ². The error term u has the same variance (spread) given any value of the explanatory variable (see the figures below).
Algebra: Var(u|x) = E(u²|x) - [E(u|x)]². We know E(u|x) = 0, so E(u²|x) = E(u²) = Var(u) = σ² (using Var(u) = E(u²) - [E(u)]² and E(u) = 0).
σ² is also the unconditional variance, called the error variance; σ, its square root, is called the standard deviation of the error.

[Figure: homoskedastic case. The conditional density f(y|x) has the same spread at x1 and x2 around the line E(y|x) = b0 + b1x.]

[Figure: heteroskedastic case. The spread of f(y|x) around E(y|x) = b0 + b1x grows as x moves from x1 to x3.]

VII. Properties of OLS: Variance
People often rewrite SLR.4 and SLR.5 in terms of y:
SLR.4: E(u|x) = 0, so E(y|x) = E(b0|x) + E(b1x|x) + E(u|x) = b0 + b1x.
SLR.5: Var(u|x) = σ², so Var(y|x) = Var(u|x) = σ².
Assuming homoskedasticity, we can derive an estimator for the variance of the OLS parameter estimates. (Heteroskedasticity is often more realistic, but we ignore it for now.) This gives us an idea of how precisely the parameter is estimated; we would like a small variance, because then our parameter estimate is more likely to be close to the true value.

VII. Properties of OLS: Variance
Calculating the variance of the estimator (conditional on the x's):
Var(b̂1) = σ² / SSTx = σ² / Σi (xi - x̄)²
Properties:
The larger the error variance σ², the larger the variance of the slope estimate (a bad thing).
The larger the variability in the xi, the smaller the variance of the slope estimate, i.e., the easier it is to pinpoint how y varies with x (a good thing). Consequently, a larger sample size should decrease the variance of the slope estimate.
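The formula can be checked by simulation, holding x fixed across replications so that SSTx stays constant (a sketch continuing the earlier snippets, with σ² = 1 by construction):

```python
# Compare the Monte Carlo variance of the slope with sigma^2 / SST_x.
x_fixed = rng.uniform(0, 10, size=100)
sst_x = np.sum((x_fixed - x_fixed.mean()) ** 2)

slopes = []
for _ in range(5000):
    u_s = rng.normal(0, 1, size=100)  # sigma^2 = 1
    y_s = b0_true + b1_true * x_fixed + u_s
    slopes.append(ols_slope(x_fixed, y_s))

print(np.var(slopes), 1.0 / sst_x)  # the two should be close
```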

VII. Properties of OLS: Variance
Calculating the error variance: recall σ² = E(u²) = Var(u).
Problem: we don't know the error variance σ², because we don't observe the errors ui; what we observe are the residuals ûi. We can use the residuals to form an estimate of the error variance.

VII. Properties of OLS: Variance
Then, an unbiased estimator of σ² = E(u²) is:
σ̂² = (1/(n-2)) Σi ûi² = SSR/(n-2)
We generally report the spread of an estimator in terms of the standard error (the estimate of the standard deviation), which is the square root of the estimated variance:
se(b̂1) = σ̂ / sqrt(SSTx), compared with the true standard deviation sd(b̂1) = σ / sqrt(SSTx).
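On the original simulated sample, the estimate and standard error look like this (a sketch under the same made-up values):

```python
# sigma^2_hat = SSR / (n - 2); se(b1_hat) = sigma_hat / sqrt(SST_x).
sigma2_hat = np.sum(u_hat ** 2) / (len(y) - 2)
sst_x_sample = np.sum((x - x.mean()) ** 2)
se_b1 = np.sqrt(sigma2_hat / sst_x_sample)
print(sigma2_hat, se_b1)
```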

VIII. Units of Measurement and Functional Form
We are essentially always trying to estimate the impact of x on y. The units of our variables affect how we interpret the estimates, but the punchline is the same.
Example: CEO salary and ROE. Model: salary = b0 + b1·ROE + u. Data: salary is measured in thousands of $, so 856.3 means $856,300; ROE is in %, so one unit of change is one percentage point.
Results: when ROE increases by one percentage point, salary is predicted to increase by 18.501, or $18,501.

VIII. Units of Measurement and Functional Form
Rule #1: if the dependent variable is multiplied by a constant c, then the OLS intercept and slope estimates are also multiplied by c.
Rule #2: if the independent variable is divided (multiplied) by some nonzero constant c, then the OLS slope coefficient is multiplied (divided) by c; the intercept is not affected.
Suppose ROE is now measured as a decimal, so 1% becomes 0.01. Results: the slope becomes 1,850.1. When ROE increases by one unit (units are now decimals, so 0.01 = 1%), salary is predicted to increase by 1,850.1·0.01 = 18.501. Since salary is measured in thousands of $, this is an $18,501 increase, exactly as before.
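Both rules are easy to confirm numerically on the simulated sample (a sketch; the constant c = 1000 is arbitrary):

```python
# Rule 1: scaling y by c scales the slope (and intercept) by c.
c = 1000.0
b1_y_scaled = np.cov(x, c * y)[0, 1] / np.var(x, ddof=1)
print(np.isclose(b1_y_scaled, c * b1_hat))

# Rule 2: dividing x by c multiplies the slope by c.
b1_x_div = np.cov(x / c, y)[0, 1] / np.var(x / c, ddof=1)
print(np.isclose(b1_x_div, c * b1_hat))
```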

VIII. Units of Measurement and Functional Form
We can incorporate nonlinearities in the variables to make our estimation more realistic.
Wage example estimate: predicted wage = -0.90 + 0.54·educ. This restricts each additional year of education to have the same effect as the previous one (10th to 11th and 11th to 12th both yield a $0.54 increase). This is unrealistic, as the 12th year culminates in a high school degree and is likely rewarded in the labor market.

VIII. Units of Measurement and Functional Form
An improvement is to say that wage increases by a constant percentage with each additional year of education. This allows the monetary impact of going from 10 to 11 years to differ from that of going from 11 to 12, although the percentage increase is the same.
Model: log(wage) = b0 + b1·educ + u. Using this form implies an increasing dollar return to education.


VIII. Units of Measurement and Functional Form
Estimate: log(wage) = b0 + b1·educ + u, giving b̂1 = 0.083.
It is standard to multiply b̂1 by 100 to get the approximate percentage change in wage from one additional unit (year) of schooling: an extra year of education results in an 8.3% increase in predicted wage.

VIII. Units of Measurement and Functional Form
What if both our LHS and RHS are in logs? This is called the constant elasticity model.
Estimate: log(salary) = b0 + b1·log(sales) + u, with sales in millions of $. Here b1 estimates the elasticity of salary with respect to sales.
Result: b̂1 = 0.257 implies that a 1% increase in firm sales increases salary by about 0.257%.
