
**Chapter 4: Simple Linear Regression**

**Learning Objectives**

- Understand the goals of simple linear regression analysis
- Consider what the error term contains
- Define the population regression model and the sample regression function
- Estimate the sample regression function
- Interpret the estimated sample regression function
- Predict outcomes based on the estimated sample regression function
- Assess the goodness-of-fit of the estimated sample regression function
- Understand how to read regression output in Excel
- Understand the difference between correlation and causation

**Understand the Goals of Simple Linear Regression Analysis**

Regression analysis is used to:

- Obtain the marginal effect that a one-unit change in the independent variable has on the dependent variable
- Predict the value of the dependent variable based on the value of the independent variable

Dependent (or response) variable: the variable we wish to explain. Independent (or explanatory) variable: the variable used to explain the dependent variable.

**Simple Linear Regression Model**

- The term simple refers to the fact that there is only one independent variable, x
- The relationship between x and y is described by a linear function
- Regression refers to the manner in which the relationship is estimated
- Changes in y are assumed to be caused by changes in x (although the data alone cannot establish this causal direction)

**Types of Regression Models**

- Positive linear relationship
- Negative linear relationship
- No relationship
- Relationship that is not linear

**Population Linear Regression Model**

The population regression model is

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where y is the dependent variable, x is the independent variable, β₀ is the population y-intercept, β₁ is the population slope coefficient, and ε is the random error term (or residual). The term β₀ + β₁x is the linear component and ε is the random error component.

**Population Linear Regression**

[Figure: scatter of y against x with the population regression line. For a given xᵢ, the observed value of y equals the predicted value β₀ + β₁xᵢ plus the random error εᵢ; the line has intercept β₀ and slope β₁.]
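The population model above can be illustrated with simulated data. A minimal Python sketch, assuming made-up parameter values (β₀ = 5, β₁ = 2, and a standard-normal error term, none of which come from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (illustrative assumptions).
beta0, beta1 = 5.0, 2.0

x = rng.uniform(0, 10, size=100)     # independent variable
eps = rng.normal(0, 1.0, size=100)   # random error component, epsilon
y = beta0 + beta1 * x + eps          # population regression model
```

Each simulated y is the linear component β₀ + β₁x plus a random draw of ε, which is exactly the decomposition on the slide.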

**Consider What the Random Error Component, ε, Contains**

- Omitted variables: independent variables that are related to the dependent variable, y, but are not included in the regression model (i.e., they are omitted).
- Measurement error: the difference between the measured value of an observation and its true value. This can occur when there is a data-entry error or when a person, firm, etc. does not know the true value and instead reports an incorrect one.

**Consider What the Random Error Component, ε, Contains (continued)**

- Incorrect functional form: the wrong model is fit to the data. For example, a linear function is fit between y and x when the true relationship is quadratic.
- Random component: the variable being studied is inherently random. Even if two people have the same number of years of education, they may earn different salaries due to random factors aside from the omitted factors listed above.

**Estimated Regression Function**

The sample regression line provides an estimate of the population regression line:

$$\hat{y} = b_0 + b_1 x$$

where ŷ is the estimated (or predicted) value of y, b₀ is the estimate of the regression intercept, b₁ is the estimate of the regression slope, and x is the independent variable.

**What is a Residual?**

A residual is the difference between the observed value of y and the predicted value of y: eᵢ = yᵢ − ŷᵢ. The residual, computed from the sample, is an estimate of the error term ε, which resides in the population.

**Graph of the Sample Regression Function**

**Graph of Predictions and Residuals for Multiple Observations**

**Estimate the Sample Regression Function**

The estimates b₀ and b₁ are obtained by minimizing the sum of the squared residuals with respect to b₀ and b₁:

$$\min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2$$

**The Least Squares Equation**

The formulas for b₁ and b₀ are:

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$
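These formulas translate directly into code. A small Python sketch using made-up toy data (not the salary.xls data from the chapter):

```python
import numpy as np

def least_squares(x, y):
    """Estimate the slope b1 and intercept b0 by the least-squares formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # b0 = ybar - b1 * xbar
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Toy data for illustration (not the salary.xls data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)  # b0 = 2.2, b1 = 0.6
```

As a sanity check, `np.polyfit(x, y, 1)` returns the same slope and intercept, since it also minimizes the sum of squared residuals.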

**Interpretation of the Slope and the Intercept**

- b₀ is, on average, the estimated value of y when x is equal to zero
- b₁ is, on average, the estimated change in the value of y resulting from a one-unit change in x

**Salary (y) vs. Education (x) Example in salary.xls**

**Example (continued)**

**A Graphical Representation of the Estimated Regression Line**

**Using Excel to Compute the Estimated Regression Equation in a Scatter Plot**

1. Create a scatter diagram in Excel
2. Position the mouse over any data point and right-click
3. Select the Add Trendline option
4. When the Add Trendline dialog box appears:
   - On the Type tab, select Linear (the default)
   - On the Options tab, check the Display equation on chart box (note the equation is displayed with the slope first and the intercept second)
5. Click OK

**Interpret the Estimated Sample Regression Function**

- b₁: on average, if education goes up by one year, then estimated salary goes up by $11,
- b₀: on average, if an individual has 0 years of education, then their estimated salary is $-121, (this estimate is obviously unrealistic)

**Predict Outcomes Based on Our Estimated Sample Regression Function**

Say we want to predict salary for a person with 12 years of education. We would substitute x = 12 into the sample regression function, ŷ = b₀ + b₁(12). We predict a salary of $13, for a person with 12 years of education.

**Assess the Goodness-of-Fit of the Estimated Regression Function**

Goodness-of-fit measures how well the regression model describes the observed data. There are two measures of goodness-of-fit:

- R-squared
- Standard error of the regression

**Comparing the Goodness-of-Fit of Two Hypothetical Data Sets**

**A Venn Diagram Demonstrating Joint Variation between y and x**

**The Sample Regression Function Explains None of the Variation in y**

**The Sample Regression Function Explains All of the Variation in y**


**Explained and Unexplained Variation**

Total variation in the dependent variable is made up of two parts:

$$TSS = ESS + USS$$

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where ȳ = average value of the dependent variable, yᵢ = observed values of the dependent variable, and ŷᵢ = estimated value of y for the given x value.
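The decomposition of total variation can be verified numerically. A Python sketch on made-up toy data (not the salary.xls data), using `np.polyfit` to obtain the least-squares line:

```python
import numpy as np

# Toy data for illustration.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

b1, b0 = np.polyfit(x, y, 1)           # least-squares slope and intercept
y_hat = b0 + b1 * x                    # predicted values

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
uss = np.sum((y - y_hat) ** 2)         # unexplained sum of squares
# tss equals ess + uss (up to floating-point rounding)
```

This identity holds exactly for least-squares fits with an intercept, which is why the two goodness-of-fit pieces always add up to the total.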

**Explained and Unexplained Variation**

- TSS = total sum of squares: measures the total variation of the yᵢ values around the mean of y. This is the numerator of the variance of y.
- ESS = explained sum of squares: variation in y attributable to the portion of the dependent variable y that is explained by the independent variable x.
- USS = unexplained sum of squares: variation in y attributable to factors other than the relationship between x and y.

**Explained and Unexplained Variation**

[Figure: for a single observation, the deviation of yᵢ from ȳ splits into an explained part (ŷᵢ − ȳ) and an unexplained part (yᵢ − ŷᵢ), with TSS = Σ(yᵢ − ȳ)², ESS = Σ(ŷᵢ − ȳ)², and USS = Σ(yᵢ − ŷᵢ)².]

**Coefficient of Determination, R²**

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called R-squared and is denoted R²:

$$R^2 = \frac{ESS}{TSS}, \qquad 0 \le R^2 \le 1$$

**Coefficient of Determination, R²**

Note: in simple linear regression, R² is equal to the square of the correlation coefficient:

$$R^2 = r_{xy}^2$$

where R² = coefficient of determination and r_xy = correlation coefficient between x and y.

**How are the Correlation Coefficient and the Coefficient of Determination Related?**

R2 = rxy2 Note that this relationship only occurs with simple linear regression.
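This equality is easy to confirm numerically. A Python sketch on made-up toy data (not the salary.xls data), computing R² from the sums of squares and r from `np.corrcoef`:

```python
import numpy as np

# Toy data for illustration.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

b1, b0 = np.polyfit(x, y, 1)   # least-squares fit
y_hat = b0 + b1 * x

# R^2 = ESS / TSS
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

# Sample correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

# In simple linear regression, r2 equals r ** 2
```

With two or more independent variables this shortcut no longer applies, which is why the slide stresses it holds only for simple linear regression.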

**What is the Intuition Behind This Relationship?**

In the case of a linear relationship between two variables, both the coefficient of determination and the sample correlation coefficient measure the strength of the relationship. The coefficient of determination lies between 0 and 1, whereas the correlation coefficient lies between -1 and 1. The coefficient of determination can also be used for nonlinear relationships and for relationships with two or more independent variables. Why might the correlation coefficient be preferred to the coefficient of determination?

**Examples of Approximate R² Values**

[Figures: R² = 1. A perfect linear relationship between x and y: 100% of the variation in y is explained by variation in x. Note that R² equals 1 whether the line slopes upward or downward.]

**Examples of Approximate R² Values**

[Figures: 0 < R² < 1. A weaker linear relationship between x and y: some, but not all, of the variation in y is explained by variation in x.]

**Examples of Approximate R² Values**

[Figure: R² = 0. No linear relationship between x and y: the value of y does not depend on x, so none of the variation in y is explained by variation in x.]

**What does R² mean?**

R² means that R² × 100% of the variation in y is explained by x. For example, if R² = 0.85 we would say that 85% of the variation in y is explained by x.

**Calculating R² for the salary.xls Example**

R² = 0.6373: 63.73% of the variation in salary is explained by education.

**Using Excel to Compute the Coefficient of Determination**

1. Position the mouse pointer over any data point in the scatter diagram and right-click to display the chart menu
2. Select the Add Trendline option
3. When the Add Trendline dialog box appears, on the Options tab check the Display R-squared value on chart box and click OK

**The Standard Error of the Estimated Sample Regression Function**

The standard error of the regression measures, on average, how far the observed points fall from the regression line:

$$s_e = \sqrt{\frac{USS}{n - k - 1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1}}$$

where k = the number of explanatory variables. In simple linear regression, k = 1.
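The formula above can be sketched in Python on made-up toy data (not the salary.xls data); the degrees-of-freedom divisor n − k − 1 follows the definition on this slide:

```python
import numpy as np

def regression_standard_error(x, y):
    """Standard error of the regression: sqrt(USS / (n - k - 1)), k = 1."""
    b1, b0 = np.polyfit(x, y, 1)       # least-squares fit
    resid = y - (b0 + b1 * x)          # residuals y_i - y_hat_i
    n, k = len(y), 1                   # one explanatory variable
    return np.sqrt(np.sum(resid ** 2) / (n - k - 1))

# Toy data for illustration.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
s_e = regression_standard_error(x, y)
```

A smaller s_e means the observations cluster more tightly around the fitted line, complementing R² as a goodness-of-fit measure.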

**Calculation of the Standard Error for the salary.xls Example**

**Reading Regression Output in Excel: Intercept and Slope**

**Reading Regression Output in Excel: R²**

63.73% of the variation in salary is explained by the variation in education. [Excel regression output: the ANOVA table reports the explained, unexplained, and total sums of squares.]

**Reading Regression Output in Excel: Standard Error**

[Excel regression output: the ANOVA table reports the explained, unexplained, and total sums of squares.]

**Excel’s Regression Tool**

1. Select the Tools menu
2. Choose the Data Analysis option
3. Choose Regression from the list of Analysis Tools
4. Enter the y data into the Input Y Range
5. Enter the x data into the Input X Range
6. Select Labels
7. Select Output Range in the sheet
8. Click OK

**Understand the Difference between Correlation and Causation**

- Correlation exists when there is a linear relationship between two random variables.
- Causation occurs between two random variables when changes in one variable (say x) cause changes in another variable (say y).
- Spurious correlation occurs when the correlation between two random variables results from each one's relationship with a third random variable.

**Understand the Difference between Correlation and Causation (continued)**

Correlation between two random variables does not imply causation. Examples:

- The number of firemen at a fire is linked to greater monetary damage from the fire.
- The number of shark attacks and ice cream sales are positively related.
- Students who are tutored tend to get worse grades than students who are not tutored.

See Google Correlate for more real-world examples of this phenomenon.
