
1 Modeling a Linear Relationship Lecture 44 Secs. 13.1 – 13.3.1 Tue, Apr 24, 2007

2 Bivariate Data Data is called bivariate if each observation consists of a pair of values (x, y). x is the explanatory variable. y is the response variable. x is also called the independent variable. y is also called the dependent variable.

3 Scatterplots Scatterplot – A display in which each observation (x, y) is plotted as a point in the xy plane.

4 Example Draw a scatterplot of the following data of calories vs. cholesterol in Subway sandwiches.
Calories (x):    350  290  330  290  320  370  280  290  310  230
Cholesterol (y):  50   20   45   15   35   50   20   25   20    0
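As an aside outside the slides' TI-83 workflow, here is a minimal Python sketch of the same plot (assuming the matplotlib library is available; the variable names are ours, not from the slides):

```python
import matplotlib.pyplot as plt

# Data from the slide: calories (x) and cholesterol (y) for Subway sandwiches
calories = [350, 290, 330, 290, 320, 370, 280, 290, 310, 230]
cholesterol = [50, 20, 45, 15, 35, 50, 20, 25, 20, 0]

# Plot each observation (x, y) as a point in the xy plane
plt.scatter(calories, cholesterol)
plt.xlabel("Calories")
plt.ylabel("Cholesterol")
plt.show()
```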

5 Example [Scatterplot of cholesterol (0-50) vs. calories (200-400) for the data above]

6 Example Does there appear to be a relationship? How can we tell?

7 TI-83 - Scatterplots To set up a scatterplot,  Enter the x values in L1.  Enter the y values in L2.  Press 2nd STAT PLOT.  Select Plot1 and press ENTER.

8 TI-83 - Scatterplots The Stat Plot display appears.  Select On and press ENTER.  Under Type, select the first icon (a small image of a scatterplot) and press ENTER.  For XList, enter L1.  For YList, enter L2.  For Mark, select the one you want and press ENTER.

9 TI-83 - Scatterplots To draw the scatterplot,  Press ZOOM. The Zoom menu appears.  Select ZoomStat (#9) and press ENTER. The scatterplot appears.  Press TRACE and use the arrow keys to inspect the individual points.

10 Describing a Linear Relationship How would we describe this relationship? [Scatterplot of cholesterol vs. calories]

11 Linear Association Draw (or imagine) an oval around the data set. If the oval is tilted, then there is some linear association. If the oval is tilted upwards from left to right, then there is positive association. If the oval is tilted downwards from left to right, then there is negative association. If the oval is not tilted at all, then there is no association.

12 Positive Linear Association [Scatterplot: points rising from left to right]

13 [The same scatterplot with an upward-tilted oval drawn around the points]

14 Negative Linear Association [Scatterplot: points falling from left to right]

15 [The same scatterplot with a downward-tilted oval drawn around the points]

16 No Linear Association [Scatterplot: points with no left-to-right trend]

17 [The same scatterplot with an untilted oval drawn around the points]

18 Strong vs. Weak Association The association is strong if the oval is narrow. The association is weak if the oval is wide.
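The slides judge direction and strength by eye. As a hedged aside (not computed anywhere in these slides), the usual numeric counterpart is the sample correlation coefficient r: its sign matches the tilt of the oval, and its magnitude, between 0 and 1, matches how narrow the oval is. A minimal sketch, with our own function name:

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r.
    Sign: direction of the association (tilt of the oval).
    Magnitude: strength (narrow oval -> |r| near 1, wide oval -> |r| near 0)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The Subway data from earlier: r is about 0.95, a strong positive association
calories = [350, 290, 330, 290, 320, 370, 280, 290, 310, 230]
cholesterol = [50, 20, 45, 15, 35, 50, 20, 25, 20, 0]
print(correlation(calories, cholesterol))
```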

19 Strong Positive Linear Association [Scatterplot: a narrow band of points rising from left to right]

20 [The same scatterplot with a narrow oval drawn around the points]

21 Weak Positive Linear Association [Scatterplot: a wide band of points rising from left to right]

22 [The same scatterplot with a wide oval drawn around the points]

23 Example [Scatterplot of cholesterol vs. calories]

24 Describing the Relationship [Scatterplot of cholesterol vs. calories]

25 Describing the Relationship There appears to be a strong positive linear association between calories and cholesterol in Subway sandwiches.

26 Example Draw a scatterplot of the following data.
x:  2   3   5   6   9
y:  3   5   9  12  16

27 Simple Linear Regression To quantify the linear relationship between x and y, we wish to find the equation of the line that “best” fits the data. Typically, there will be many lines that all look pretty good. How do we measure how well a line fits the data?

28 Measuring the Goodness of Fit Which line better fits the data? [Scatterplot with a candidate line]

29 Measuring the Goodness of Fit Which line better fits the data? [The same scatterplot with a different candidate line]

30 Measuring the Goodness of Fit Which line better fits the data? [The same scatterplot with a different candidate line]

31 Measuring the Goodness of Fit Which line better fits the data? [The same scatterplot with a different candidate line]

32 Measuring the Goodness of Fit Start with the scatterplot. [Scatterplot]

33 Measuring the Goodness of Fit Draw any line through the scatterplot. [Scatterplot with a line drawn through it]

34 Measuring the Goodness of Fit Measure the vertical distances from every point to the line. [Scatterplot with a vertical segment from each point to the line]

35 Measuring the Goodness of Fit Each of these vertical distances represents a deviation, called a residual, from the line. [Scatterplot with one vertical segment labeled e]

36 Residuals The ith residual – The difference between the observed value of yᵢ and the predicted, or expected, value of yᵢ. We write ŷᵢ for the predicted yᵢ. The formula for the ith residual is eᵢ = yᵢ − ŷᵢ.

37 Residuals Notice that the residual is positive if the data point is above the line and it is negative if the data point is below the line.

38 Measuring the Goodness of Fit The ith residual. [Scatterplot showing eᵢ as the vertical distance at xᵢ between the observed yᵢ and the predicted ŷᵢ]

39 Measuring the Goodness of Fit Find the sum of the squared residuals. [The same scatterplot]

40 Measuring the Goodness of Fit The smaller the sum of squared residuals, the better the fit. [The same scatterplot]
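A minimal Python sketch of this procedure (the function names are ours): compute each residual eᵢ = yᵢ − ŷᵢ for a candidate line ŷ = a + bx, then sum the squares.

```python
def residuals(xs, ys, a, b):
    """Residuals e_i = y_i - yhat_i for the candidate line yhat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sse(xs, ys, a, b):
    """Sum of squared residuals; the smaller, the better the fit."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))
```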

41 Example Consider the data points
x:  2   3   5   6   9
y:  3   5   9  12  16

42 Example [Scatterplot of the data; x axis 2-9, y axis 5-15]

43 Least Squares Line Let’s see how good the fit is for the line ŷ = -1 + 2x, where ŷ represents the predicted value of y, not the observed value.

44 Sum of Squared Residuals Begin with the data set.
x:  2   3   5   6   9
y:  3   5   9  12  16

45 Sum of Squared Residuals Compute the predicted y, using ŷ = -1 + 2x.
x    y    ŷ
2    3    3
3    5    5
5    9    9
6   12   11
9   16   17

46 Sum of Squared Residuals Compute the residuals, y − ŷ.
x    y    ŷ    y − ŷ
2    3    3     0
3    5    5     0
5    9    9     0
6   12   11     1
9   16   17    -1

47 Sum of Squared Residuals Square the residuals.
x    y    ŷ    y − ŷ   (y − ŷ)²
2    3    3     0       0
3    5    5     0       0
5    9    9     0       0
6   12   11     1       1
9   16   17    -1       1

48 Sum of Squared Residuals Find the sum of the squared residuals.
x    y    ŷ    y − ŷ   (y − ŷ)²
2    3    3     0       0
3    5    5     0       0
5    9    9     0       0
6   12   11     1       1
9   16   17    -1       1
SSE = Σ(y − ŷ)² = 2.00
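Applying the sse sketch from above reproduces the table's total:

```python
xs = [2, 3, 5, 6, 9]
ys = [3, 5, 9, 12, 16]
print(sse(xs, ys, a=-1, b=2))  # 2, matching SSE = 2.00 in the table
```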

49 Least Squares Line Least squares line – The line for which the sum of the squares of the residuals is as small as possible. The least squares line is also called the line of best fit or the regression line.

50 Regression Line We will write the regression line as ŷ = a + bx.  a is the y-intercept.  b is the slope. This is the usual slope-intercept form with the two terms rearranged and relabeled.
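The slides identify the least squares line by comparing candidates; the standard closed-form solution (not derived in these slides) is b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and a = ȳ − b·x̄. A minimal sketch, with our own function name:

```python
def least_squares(xs, ys):
    """Closed-form least squares coefficients (a, b) for yhat = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx           # slope
    a = ybar - b * xbar     # y-intercept
    return a, b
```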

51 TI-83 – Computing Residuals It is not hard to compute the residuals and the sum of their squares on the TI-83. (Later, we will see a faster method.)  Enter the x-values in list L1 and the y-values in list L2.  Compute a + b*L1 and store it in list L3 (the ŷ values).  Compute (L2 – L3)². This is a list of the squared residuals.  Compute sum(Ans). This is the sum of the squared residuals.

52 Sum of Squared Residuals Now let’s see how good the fit is for the line ŷ = -0.5 + 1.9x. We will compute the sum of squared residuals, SSE.

53 Sum of Squared Residuals Begin with the data set.
x:  2   3   5   6   9
y:  3   5   9  12  16

54 Sum of Squared Residuals Compute the predicted y, using ŷ = -0.5 + 1.9x.
x    y     ŷ
2    3     3.3
3    5     5.2
5    9     9.0
6   12    10.9
9   16    16.6

55 Sum of Squared Residuals Compute the residuals, y − ŷ.
x    y     ŷ     y − ŷ
2    3     3.3   -0.3
3    5     5.2   -0.2
5    9     9.0    0.0
6   12    10.9    1.1
9   16    16.6   -0.6

56 Sum of Squared Residuals Compute the squared residuals.
x    y     ŷ     y − ŷ   (y − ŷ)²
2    3     3.3   -0.3    0.09
3    5     5.2   -0.2    0.04
5    9     9.0    0.0    0.00
6   12    10.9    1.1    1.21
9   16    16.6   -0.6    0.36

57 Sum of Squared Residuals Find the sum of the squared residuals.
x    y     ŷ     y − ŷ   (y − ŷ)²
2    3     3.3   -0.3    0.09
3    5     5.2   -0.2    0.04
5    9     9.0    0.0    0.00
6   12    10.9    1.1    1.21
9   16    16.6   -0.6    0.36
SSE = Σ(y − ŷ)² = 1.70
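The same sse sketch confirms this total (up to floating-point rounding), using the xs and ys defined earlier:

```python
print(sse(xs, ys, a=-0.5, b=1.9))  # approximately 1.70
```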

58 Sum of Squared Residuals We conclude that ŷ = -0.5 + 1.9x is a better fit than ŷ = -1 + 2x. Is it the best fit?

59 Sum of Squared Residuals [Scatterplot of the data with the line ŷ = -1 + 2x]

60 Sum of Squared Residuals [Scatterplot of the data with the line ŷ = -0.5 + 1.9x]

61 Example For all the lines that one could draw through this data set, it turns out that 1.70 is the smallest possible value for the sum of the squares of the residuals.
x:  2   3   5   6   9
y:  3   5   9  12  16

62 Example Therefore, ŷ = -0.5 + 1.9x is the regression line for this data set.
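Running the least_squares sketch from earlier on this data set reproduces the slide's result:

```python
print(least_squares([2, 3, 5, 6, 9], [3, 5, 9, 12, 16]))  # (-0.5, 1.9)
```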

63 Prediction Use the regression line to predict y when  x = 4  x = 7  x = 20 Interpolation – Using an x value within the observed extremes of x values to predict y. Extrapolation – Using an x value beyond the observed extremes of x values to predict y.
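Worked directly from the regression line ŷ = -0.5 + 1.9x:

```python
yhat = lambda x: -0.5 + 1.9 * x
print(yhat(4))   # about 7.1  -- interpolation: 4 is inside the observed x range [2, 9]
print(yhat(7))   # about 12.8 -- interpolation
print(yhat(20))  # about 37.5 -- extrapolation: 20 is well outside [2, 9]
```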

64 Interpolation vs. Extrapolation Interpolated values are more reliable than extrapolated values. The farther out the values are extrapolated, the less reliable they are.

