Simple Linear Regression


1 Simple Linear Regression
and Correlation

2 Correlation question:
From 1983 to 2001 in the state of Tennessee, were motor gasoline consumption and ethanol consumption significantly related to each other? In a correlation problem, one is interested in measuring the strength of the relationship between variables.

3 Regression question: From 1983 to 2001 in the state of Tennessee, could the ethanol consumption in one year have been used to predict motor gasoline consumption in the following year? In a regression problem, one is interested in predicting one variable (called the dependent variable) based on another variable (called the independent variable).

4 Simple Linear Regression
The Key Word Simple Linear Regression and Correlation

5 Simple Linear Regression
A Straight Line Simple Linear Regression and Correlation

6 What is the equation for a straight line?
Do you recall $y = mx + b$? Here x is the independent variable, and y is the dependent variable. What is $m$? Answer: the slope. What is $b$? Answer: the y-intercept. In the text, the equation is given by: $\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope.

7 The General Simple Linear Regression Problem
Given a random sample of the related x and y values, find the value of the slope and the value of the y-intercept that yields the “best” fit to these points.

8 Visually
[Figure: scatter plot of the sample points in the X-Y plane with a candidate line.] Given a random sample of the related x and y values, find the value of the slope and the value of the y-intercept that yield the “best” fit to these points. What does “best” mean? By “best” we mean the smallest error in prediction.

9 Error Defined
If one picks an arbitrary point in the random sample, $(X_i, Y_i)$, how “far” is the point from the line $\hat{Y} = b_0 + b_1 X$? $Y_i$ is the actual y-value. $\hat{Y}_i$ is the predicted y-value (the value on the line). The error is the difference between $Y_i$ and $\hat{Y}_i$: Error $= Y_i - \hat{Y}_i$. By “best” we mean the smallest error in prediction.

10 General Problem Restated
Given a random sample of the related x and y values, find the value of the slope and the value of the y-intercept that yield the smallest error over all the sample, where Error $= Y_i - \hat{Y}_i$. What would you want? $\sum (Y_i - \hat{Y}_i) = 0$: the errors for the points above the line should balance the errors for the points below the line, resulting in a sum of zero. Unfortunately, there are an infinite number of lines possessing this property. Any line that passes through the point $(\bar{X}, \bar{Y})$ will have this property, because it is a property of the mean.
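
The claim that every line through $(\bar{X}, \bar{Y})$ has errors summing to zero can be checked numerically. The sketch below is only an illustration: it assumes the ten (commercials, gallons) points listed on the scatter graph slide later in this transcript, and the trial slopes and the name `sum_of_errors` are arbitrary choices, not part of the original slides.

```python
# Sample points (Number of Commercials, H2O Consumption in gallons),
# taken from the scatter graph slide of this presentation.
xs = [14, 13, 10, 12, 11, 10, 8, 7, 7, 8]
ys = [10000, 10000, 8000, 9000, 8000, 7000, 5000, 5000, 4000, 4000]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)


def sum_of_errors(slope):
    """Sum of (actual - predicted) for the line with this slope through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar  # forces the line through the point of means
    return sum(y - (intercept + slope * x) for x, y in zip(xs, ys))


# Arbitrary trial slopes (hypothetical values chosen for illustration).
totals = {slope: sum_of_errors(slope) for slope in (-500.0, 0.0, 300.0, 910.0)}
for slope, total in totals.items():
    print(f"slope {slope:8.1f}: sum of errors = {total:.6f}")
```

Every trial slope gives a sum of errors of zero (up to floating-point rounding), which is why the sum of errors alone cannot single out one “best” line.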

11 General Problem Restated in terms of Least Squares
Given a random sample of the related x and y values, find the value of the slope and the value of the y-intercept that yield the smallest sum of the squares of the errors (SSE) over all the sample, where Error $= Y_i - \hat{Y}_i$. Find the value of $b_0$ and the value of $b_1$ that will minimize $SSE = \sum (Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i = b_0 + b_1 X_i$.

12 Solution of the Least Squares Problem
Find the value of $b_0$ and the value of $b_1$ that will minimize $SSE = \sum (Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i = b_0 + b_1 X_i$. Noting that SSE is a function of two variables, we can restate the problem once again.

13 Solution of the Least Squares Problem
Find the value of $b_0$ and the value of $b_1$ that will minimize $f(b_0, b_1) = \sum (Y_i - b_0 - b_1 X_i)^2$. Finding the values of variables that will maximize/minimize a function is a calculus problem. Because calculus is not a prerequisite to this course, the details are omitted, but the process results in two equations and two unknowns.
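
For readers who do know some calculus, the omitted step can be sketched as follows. This is the standard textbook derivation, not something taken from the slides: set both partial derivatives of $f$ to zero and rearrange, with all sums running over the $n$ sample points.

```latex
\begin{align*}
\frac{\partial f}{\partial b_0} &= -2 \sum (Y_i - b_0 - b_1 X_i) = 0
  &&\Longrightarrow\quad n\,b_0 + b_1 \sum X_i = \sum Y_i \\
\frac{\partial f}{\partial b_1} &= -2 \sum X_i\,(Y_i - b_0 - b_1 X_i) = 0
  &&\Longrightarrow\quad b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i
\end{align*}
```

These are the two equations in the two unknowns $b_0$ and $b_1$.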

14 The Normal Equations
Algebraic form: $n\,b_0 + b_1 \sum X_i = \sum Y_i$ and $b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i$. Matrix form: $\begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$. There are many ways to solve a system of two equations and two unknowns. If you have a favorite, feel free to use it. Two relationships that I expect you to know are: $b_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$ and $b_0 = \bar{Y} - b_1 \bar{X}$.

15 The Normal Equations
Now the specifics are introduced with an example.
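
One way to solve the two normal equations is Cramer's rule. The sketch below is only one of the “many ways” the slide mentions; it assumes the ten sample points shown on the scatter graph slide later in this transcript, and all variable names are illustrative.

```python
# Sample points (Number of Commercials, H2O Consumption in gallons),
# taken from the scatter graph slide of this presentation.
xs = [14, 13, 10, 12, 11, 10, 8, 7, 7, 8]
ys = [10000, 10000, 8000, 9000, 8000, 7000, 5000, 5000, 4000, 4000]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Normal equations:
#   n     * b0 + sum_x  * b1 = sum_y
#   sum_x * b0 + sum_xx * b1 = sum_xy
# Solve the 2x2 system with Cramer's rule.
det = n * sum_xx - sum_x * sum_x
b0 = (sum_y * sum_xx - sum_x * sum_xy) / det
b1 = (n * sum_xy - sum_x * sum_y) / det

print(f"b0 (y-intercept) = {b0:.2f}")
print(f"b1 (slope)       = {b1:.2f}")
```

The intercept this produces agrees with the negative 2,107.14 gallons discussed on the interpretation slides.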

16 The Random Sample

17 Generate Graph
First, graph the data. The scatter plot of the data may indicate that a linear model is totally inappropriate and a waste of time. The following three slides give some examples of nonlinear patterns. Following the nonlinear examples, the graph of the data in the random sample is constructed.

18 Example of a Nonlinear Pattern

19 Example of a Nonlinear Pattern

20 Example of a Nonlinear Pattern

21 Example of a Nonlinear Pattern
Transformed to a linear pattern

22 The Scatter Graph
[Scatter plot: H2O Consumption versus Number of Commercials.] Plotted points (Number of Commercials, H2O Consumption): (14, 10000), (13, 10000), (10, 8000), (12, 9000), (11, 8000), (10, 7000), (8, 5000), (7, 5000), (7, 4000), (8, 4000).

23 The Scatter Graph
[Scatter plot: H2O Consumption versus Number of Commercials.] Find the slope and the y-intercept of the line that is the “best” fit to these points.

24 The Scatter Graph (with “guesstimated” line)
[Scatter plot: H2O Consumption versus Number of Commercials, with a “guesstimated” line drawn through the points.] Find the slope and the y-intercept of the line that is the “best” fit to these points.

25 The Initial Calculations

26 Some Basic Formulas

27 X = Number of Commercials;
Y = Water Consumption (gallons)

28 Interpretation of the Slope and the Y-intercept
X = Number of Commercials; Y = Water Consumption (gallons). Interpret the slope. (What does the slope mean in terms of the problem?) For each additional commercial, we expect the water consumption to increase by 910.71 gallons. Interpret the y-intercept. (What does the y-intercept mean in terms of the problem?) If there are no commercials, we expect the water consumption to be negative 2,107.14 gallons. ?????????? Think about it. ??????????

29 If the water consumption is a negative 2,107.14 gallons, which way is the water flowing in the pipe from the reservoir to the city?
We know that the water does not flow back into the reservoir. Does this result mean that the regression model is worthless?
[Diagram: Reservoir, River, Sensor line, and City Water Plant, with a sign reading “Welcome to Mulvany, Tennessee”.]

30 Interpolation versus Extrapolation
[Figure: scatter graph with the smallest X and the largest X marked.]

31 Interpolation versus Extrapolation
Between the smallest (7) and the largest (14) values of X used to compute the sample regression model, we may interpolate, provided the model is statistically significant. Predicting outside this relevant range is extrapolation. To determine if the model has statistical significance, we still have to perform some more calculations.
[Figure: H2O Consumption versus Number of Commercials. The relevant range runs from the smallest X = 7, at points (7, 4000) and (7, 5000), to the largest X = 14, at point (14, 10000); interpolation lies inside this range and extrapolation lies on either side of it.]
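
The distinction between interpolation and extrapolation can be made explicit in code. The sketch below is a hypothetical helper, not something from the slides: `fit_line` and `predict` are illustrative names, and the data are the ten points from the scatter graph slide. It returns the prediction together with a flag saying whether X lies inside the relevant range.

```python
# Sample points (Number of Commercials, H2O Consumption in gallons),
# taken from the scatter graph slide of this presentation.
xs = [14, 13, 10, 12, 11, 10, 8, 7, 7, 8]
ys = [10000, 10000, 8000, 9000, 8000, 7000, 5000, 5000, 4000, 4000]


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for the sample."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(xs, ys)


def predict(x):
    """Return (predicted gallons, True if x is inside the relevant range)."""
    inside_relevant_range = min(xs) <= x <= max(xs)
    return b0 + b1 * x, inside_relevant_range


for x in (0, 7, 10, 14, 20):
    y_hat, ok = predict(x)
    label = "interpolation" if ok else "extrapolation"
    print(f"X = {x:2d}: predicted {y_hat:10.2f} gallons ({label})")
```

The prediction at X = 0 is the negative intercept from the earlier slide, and the flag marks it as an extrapolation, which is why the negative value does not by itself make the model worthless.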

32 Calculation of SSE by Definition

33 Calculation of SSE by Definition
First, you insert the $X_i$ values into the sample regression equation to calculate the predicted values, $\hat{Y}_i = b_0 + b_1 X_i$.

34 Calculation of SSE by Definition
Second, you calculate the deviations of the points from the line, $Y_i - \hat{Y}_i$.

35 Calculation of SSE by Definition
Finally, you calculate the squares of the deviations of the points from the line and sum them to obtain $SSE = \sum (Y_i - \hat{Y}_i)^2$.
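
The three steps can be sketched directly in code. This is a minimal illustration assuming the ten points from the scatter graph slide; the helper name `fit_line` and the variable names are not from the original slides.

```python
# Sample points (Number of Commercials, H2O Consumption in gallons),
# taken from the scatter graph slide of this presentation.
xs = [14, 13, 10, 12, 11, 10, 8, 7, 7, 8]
ys = [10000, 10000, 8000, 9000, 8000, 7000, 5000, 5000, 4000, 4000]


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for the sample."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(xs, ys)

# Step 1: predicted values from the sample regression equation.
predicted = [b0 + b1 * x for x in xs]

# Step 2: deviations of the points from the line.
deviations = [y - y_hat for y, y_hat in zip(ys, predicted)]

# Step 3: square the deviations and sum them to obtain SSE.
sse = sum(d ** 2 for d in deviations)

print(f"SSE = {sse:,.2f}")
```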

36 Calculation of SSE by “Backing” into it
Total variation = variation explained by regression + variation not explained by regression, that is, $SST = SSR + SSE$.

37 Calculation of SSE by “Backing” into it
Therefore, $SSE = SST - SSR$.

38 Calculation of SSE by “Backing” into it
However, $SST = \sum (Y_i - \bar{Y})^2$

39 Calculation of SSE by “Backing” into it
and $SSR = \sum (\hat{Y}_i - \bar{Y})^2$. Hence, $SSE = \sum (Y_i - \bar{Y})^2 - \sum (\hat{Y}_i - \bar{Y})^2$.
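
The sketch below checks numerically that backing into SSE gives the same value as computing it by definition. It assumes the ten points from the scatter graph slide and the standard definitions of total and explained variation; the names `sst`, `ssr`, and `fit_line` are illustrative, not from the slides.

```python
# Sample points (Number of Commercials, H2O Consumption in gallons),
# taken from the scatter graph slide of this presentation.
xs = [14, 13, 10, 12, 11, 10, 8, 7, 7, 8]
ys = [10000, 10000, 8000, 9000, 8000, 7000, 5000, 5000, 4000, 4000]


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for the sample."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(xs, ys)
y_bar = sum(ys) / len(ys)
predicted = [b0 + b1 * x for x in xs]

# Total variation and variation explained by regression.
sst = sum((y - y_bar) ** 2 for y in ys)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)

# "Backing" into SSE versus computing it by definition.
sse_backed = sst - ssr
sse_by_definition = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))

print(f"SST = {sst:,.2f}")
print(f"SSR = {ssr:,.2f}")
print(f"SSE (backed into)     = {sse_backed:,.2f}")
print(f"SSE (by definition)   = {sse_by_definition:,.2f}")
```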

40 Calculation of the Standard Error of the Estimate
$s_e^2$ = error variance = $\dfrac{SSE}{n - 2}$. $s_e$ = standard error of the estimate = $\sqrt{\dfrac{SSE}{n - 2}}$, measured in gallons. Interpretation: The “typical” error made when predicting the number of gallons of water consumed based on the number of commercials is about $s_e$ gallons.
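
The numeric values on this slide did not survive in the transcript. As a hedged reconstruction, the sketch below recomputes them from the ten points on the scatter graph slide, using the usual $n - 2$ degrees of freedom for simple linear regression; the variable names are illustrative and the printed figures may differ in rounding from the original slide.

```python
import math

# Sample points (Number of Commercials, H2O Consumption in gallons),
# taken from the scatter graph slide of this presentation.
xs = [14, 13, 10, 12, 11, 10, 8, 7, 7, 8]
ys = [10000, 10000, 8000, 9000, 8000, 7000, 5000, 5000, 4000, 4000]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Sum of squared errors about the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Error variance and standard error of the estimate (n - 2 degrees of freedom).
error_variance = sse / (n - 2)
std_error_estimate = math.sqrt(error_variance)

print(f"error variance                 = {error_variance:,.2f}")
print(f"standard error of the estimate = {std_error_estimate:,.2f} gallons")
```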

41 The Question
At the .05 level of significance, is there evidence that a linear relationship exists between the number of commercials and water consumption? We have almost enough calculated to be able to answer the question. Just one more calculation.

42 Calculation of the Standard Error of the Slope
(also called the standard error of the regression coefficient, b1)
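
The formula on this slide was lost in the transcript. A common textbook form, assumed here rather than taken from the slide, is $s_{b_1} = s_e / \sqrt{\sum (X_i - \bar{X})^2}$. The sketch below applies it to the ten points from the scatter graph slide; the variable names are illustrative.

```python
import math

# Sample points (Number of Commercials, H2O Consumption in gallons),
# taken from the scatter graph slide of this presentation.
xs = [14, 13, 10, 12, 11, 10, 8, 7, 7, 8]
ys = [10000, 10000, 8000, 9000, 8000, 7000, 5000, 5000, 4000, 4000]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
std_error_estimate = math.sqrt(sse / (n - 2))

# Standard error of the slope (assumed textbook formula, see lead-in).
std_error_slope = std_error_estimate / math.sqrt(ss_xx)

print(f"standard error of the slope = {std_error_slope:.2f} gallons per commercial")
```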

43 Test Statistics for Regression
Now, what is ? Well, that’s another story. or
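
The test statistic formulas on this slide were lost in the transcript. One common choice, assumed here and not confirmed by the slides, is the t statistic for the slope under the null hypothesis of no linear relationship, $t = b_1 / s_{b_1}$ with $n - 2$ degrees of freedom. The sketch below applies it to the ten points from the scatter graph slide; the critical value 2.306 is the standard two-tailed table value for 8 degrees of freedom at the .05 level, hard-coded to keep the example dependency-free.

```python
import math

# Sample points (Number of Commercials, H2O Consumption in gallons),
# taken from the scatter graph slide of this presentation.
xs = [14, 13, 10, 12, 11, 10, 8, 7, 7, 8]
ys = [10000, 10000, 8000, 9000, 8000, 7000, 5000, 5000, 4000, 4000]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
std_error_slope = math.sqrt(sse / (n - 2)) / math.sqrt(ss_xx)

# t statistic for H0: the population slope is zero (assumed form, see lead-in).
t_statistic = b1 / std_error_slope

# Two-tailed critical value from a t table: alpha = .05, df = n - 2 = 8.
T_CRITICAL = 2.306
reject_h0 = abs(t_statistic) > T_CRITICAL

print(f"t = {t_statistic:.2f}, critical value = {T_CRITICAL}")
print("Evidence of a linear relationship" if reject_h0 else "No evidence of a linear relationship")
```

Under these assumptions the statistic far exceeds the critical value, so the sample would support a linear relationship between the number of commercials and water consumption at the .05 level.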

44 The End

