Published by Marybeth Cole. Modified over 9 years ago.
2
Chapters 8, 9, 10: Least Squares Regression Line. Fitting a Line to Bivariate Data
3
Suppose there is a relationship between two numerical variables. Data: (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ). Let x be the amount spent on advertising and y be the amount of sales for the product during a given period. You might want to predict product sales for a month (y) when the amount spent on advertising is $10,000 (x). The letter y denotes the variable you want to predict, called the response variable. The other variable, denoted by x, is the explanatory variable.
4
Simplest Relationship: The simplest equation that describes the dependence of variable y on variable x is the linear equation y = b₀ + b₁x. The slope b₁ is the amount by which y changes when x increases by 1 unit. The y-intercept b₀ is where the line crosses the y-axis; that is, the value of y when x = 0.
5
Graph is a line. [Figure: the line y = b₀ + b₁x, crossing the y-axis at b₀, with rise and run marked.] Slope b₁ = rise/run
6
How do you find an appropriate line for describing a bivariate data set? Two candidate lines: y = 10 + 2x and y = 4 + 2.5x. Let's look at only the blue line. To assess the fit of a line, we look at how the points deviate vertically from the line. The point (15, 44) has a deviation of +4 from the line y = 10 + 2x, which predicts 10 + 2(15) = 40. What is the meaning of a negative deviation? (The point lies below the line.) To assess the fit of a line, we need a way to combine the n deviations into a single measure of fit.
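The deviation of the point (15, 44) can be checked against both candidate lines. This is a minimal Python sketch; the function name `deviation` is introduced here for illustration and does not come from the slides.

```python
def deviation(x, y, intercept, slope):
    """Vertical deviation of the point (x, y) from the line y = intercept + slope * x."""
    return y - (intercept + slope * x)


point = (15, 44)
dev_line_a = deviation(*point, intercept=10, slope=2)    # line y = 10 + 2x
dev_line_b = deviation(*point, intercept=4, slope=2.5)   # line y = 4 + 2.5x
print(dev_line_a, dev_line_b)
```

The deviation of +4 quoted on the slide matches the line y = 10 + 2x. A negative result would mean the point lies below the line.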
7
The deviations are referred to as residuals and denoted eᵢ.
8
Residuals: graphically
9
The Least Squares (Regression) Line: A good line is one that minimizes the sum of squared differences between the points and the line.
10
The Least Squares (Regression) Line
Let us compare two lines for the points (1, 2), (2, 4), (3, 1.5), (4, 3.2).
First line (heights 1, 2, 3, 4 at x = 1, 2, 3, 4): sum of squared differences = (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
Second line (horizontal, at y = 2.5): sum of squared differences = (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99
The smaller the sum of squared differences, the better the fit of the line to the data.
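The arithmetic for the two lines can be reproduced with a short Python sketch. It assumes the first line is y = x, which is what its heights 1, 2, 3, 4 at x = 1, 2, 3, 4 imply; the helper name `sum_squared_differences` is illustrative.

```python
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]


def sum_squared_differences(points, predict):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)


# First line: heights 1, 2, 3, 4 at x = 1, 2, 3, 4 (assumed to be y = x).
sse_first = sum_squared_differences(points, lambda x: x)
# Second line: horizontal at y = 2.5.
sse_second = sum_squared_differences(points, lambda x: 2.5)

print(round(sse_first, 2), round(sse_second, 2))
```

The horizontal line gives the smaller sum, so by this criterion it fits these four points better than the first line.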
11
Criterion for choosing what line to draw: method of least squares. The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible. This line has the slope b₁ and intercept b₀ that minimize Σ eᵢ² = Σ (yᵢ − (b₀ + b₁xᵢ))², summed over i = 1, …, n.
12
Least Squares Line y = b₀ + b₁x: Slope b₁ and Intercept b₀
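For reference, the standard closed-form solution of the least squares problem is given below in this deck's notation, with bars denoting sample means. It is an assumption that the slide used this form rather than an equivalent one, such as the slope written in terms of the correlation coefficient and the two standard deviations.

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```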
13
Scatterplot with least squares prediction line. (xᵢ, yᵢ): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)
14
Observed y, Predicted y: the predicted y when x = 2.7 is ŷ = b₀ + b₁x = b₀ + b₁(2.7)
15
Car Weight, Fuel Consumption Example, cont. (xᵢ, yᵢ): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)
16
Wt (x)    Fuel (y)   x − x̄    (x − x̄)²   y − ȳ    (y − ȳ)²   (x − x̄)(y − ȳ)
3.4       5.5        0.5      0.25       1.11     1.2321     0.555
3.8       5.9        0.9      0.81       1.51     2.2801     1.359
4.1       6.5        1.2      1.44       2.11     4.4521     2.532
2.2       3.3        -0.7     0.49       -1.09    1.1881     0.763
2.6       3.6        -0.3     0.09       -0.79    0.6241     0.237
2.9       4.6        0        0          0.21     0.0441     0
2.0       2.9        -0.9     0.81       -1.49    2.2201     1.341
2.7       3.6        -0.2     0.04       -0.79    0.6241     0.158
1.9       3.1        -1.0     1          -1.29    1.6641     1.29
3.4       4.9        0.5      0.25       0.51     0.2601     0.255
29        43.9       0        5.18       0        14.589     8.49     (col. sum)
17
Calculations
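The slope and intercept follow from the column sums of the table. The sketch below recomputes them in Python from the ten (weight, fuel) pairs. It assumes the standard formulas b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b₀ = ȳ − b₁x̄; the variable names are illustrative.

```python
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # x, car weight
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]     # y, fuel consumption

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(fuel) / n

# Column sums of (x - x_bar)^2 and (x - x_bar)(y - y_bar).
s_xx = sum((x - x_bar) ** 2 for x in weights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, fuel))

b1 = s_xy / s_xx          # slope
b0 = y_bar - b1 * x_bar   # intercept

print(f"x_bar={x_bar:.2f}, y_bar={y_bar:.2f}, Sxx={s_xx:.2f}, Sxy={s_xy:.2f}")
print(f"b1={b1:.3f}, b0={b0:.3f}")
```

With these data the slope is 8.49 / 5.18, about 1.639, and the intercept is about −0.363.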
18
Scatterplot with least squares prediction line
19
The Least Squares Line Always Goes Through (x̄, ȳ): (x̄, ȳ) = (2.9, 4.39)
20
Using the least squares line for prediction. Fuel consumption of 3,000 lb car? (x=3)
21
Be Careful! Fuel consumption of 500 lb car? (x = 0.5) x = 0.5 is outside the range of the x-data (1.9 to 4.1) that we used to determine the least squares line.
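One way to guard against this kind of extrapolation is to check the new x against the observed range before trusting the prediction. The following is a sketch only; `fit_line` and `predict` are hypothetical helper names, and the warning behaviour is a design choice made here, not something the slides prescribe.

```python
import warnings

# Car data from the slides: weight (x, apparently in thousands of pounds)
# and fuel consumption (y).
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def predict(x_new, xs, ys):
    """Predict y at x_new; warn when x_new is outside the observed x-range."""
    b0, b1 = fit_line(xs, ys)
    if not min(xs) <= x_new <= max(xs):
        warnings.warn(
            f"x = {x_new} is outside the observed range "
            f"[{min(xs)}, {max(xs)}]; this is an extrapolation."
        )
    return b0 + b1 * x_new


pred_3000_lb = predict(3.0, weights, fuel)  # inside the data range
pred_500_lb = predict(0.5, weights, fuel)   # outside the range: triggers the warning
print(f"{pred_3000_lb:.2f} {pred_500_lb:.2f}")
```

The line still returns a number for x = 0.5, but nothing in the data supports a linear relationship that far below the lightest car observed.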
22
Avoid GIGO! Evaluating the least squares line
1. Create scatterplot. Approximately linear?
2. Calculate r², the square of the correlation coefficient
3. Examine residual plot
23
r²: The Variation Accounted For. The square of the correlation coefficient r gives important information about the usefulness of the least squares line.
24
r²: important information for evaluating the usefulness of the least squares line. The square of the correlation coefficient, r², is the fraction of the variation in y that is explained by the least squares regression of y on x; equivalently, it is the fraction of the variation in y that is explained by differences in x. Because −1 ≤ r ≤ 1, it follows that 0 ≤ r² ≤ 1.
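As an illustration, r² can be computed for the car weight and fuel consumption data used earlier in the deck. The sketch below computes it twice: as the square of the correlation coefficient, and as 1 minus the ratio of the residual sum of squares to the total variation in y. The slides do not report this value for the car data, so the result comes from this sketch alone; the variable names are illustrative.

```python
import math

weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # x
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]     # y

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(fuel) / n

s_xx = sum((x - x_bar) ** 2 for x in weights)
s_yy = sum((y - y_bar) ** 2 for y in fuel)   # total variation in y
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, fuel))

# Correlation coefficient and its square.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

# Same quantity from the residuals of the least squares line.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(weights, fuel))
r_squared_from_residuals = 1 - sse / s_yy

print(f"r={r:.4f}, r^2={r_squared:.4f}, 1 - SSE/SST={r_squared_from_residuals:.4f}")
```

Both routes give roughly 0.95, so about 95% of the variation in fuel consumption in this small data set is explained by differences in weight.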
25
March Madness: S(k) = Sagarin rating of the kth-seeded team; Yᵢⱼ = Vegas point spread between seed i and seed j, i < j. 94.8% of the variation in point spreads is explained by the variation in Sagarin ratings.
26
SAT scores: result. r² = (−0.86845)² = 0.7542. Approximately 75.4% of the variation in mean SAT math scores is explained by differences in the percent of seniors taking the SAT.
27
Avoid GIGO! Evaluating the least squares line
1. Create scatterplot. Approximately linear?
2. Calculate r², the square of the correlation coefficient
3. Examine residual plot
28
Residuals: residual = observed y − predicted y = y − ŷ
Properties of residuals
1. The residuals always sum to 0 (therefore the mean of the residuals is 0)
2. The least squares line always goes through the point (x̄, ȳ)
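Both properties follow from the least squares criterion itself. The short derivation below is a sketch, assuming the line is obtained by minimizing the sum of squared residuals S(b₀, b₁) over both coefficients; setting the partial derivative with respect to b₀ to zero gives the first property, and dividing that condition by n gives the second.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = -2 \sum_{i=1}^{n} e_i = 0
\;\Longrightarrow\; \sum_{i=1}^{n} e_i = 0 \\
\frac{1}{n} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) &= \bar{y} - b_0 - b_1 \bar{x} = 0
\;\Longrightarrow\; \bar{y} = b_0 + b_1 \bar{x}
\end{aligned}
```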
29
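Graphically: residual = y − ŷ. [Figure: at x = xᵢ, the observed value yᵢ, the predicted value ŷᵢ on the line, and the residual eᵢ = yᵢ − ŷᵢ as the vertical distance between them.]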
30
Residual plots: A residual plot is a graph of the residuals, namely a scatterplot of the (x, residual) pairs. Residuals can also be graphed against the predicted y-values. We make a scatterplot of the residuals in the hope of finding… NOTHING! Isolated points or a pattern of points in the residual plot indicate potential problems. A careful look at the residuals can reveal many potential problems.
31
Car weight, fuel consumption, continued
Weight (x)   Fuel Consumption (y)   Predicted Fuel Consump.   Residual
3.4          5.5                    5.21                      0.29
3.8          5.9                    5.87                      0.03
4.1          6.5                    6.36                      0.14
2.2          3.3                    3.24                      0.06
2.6          3.6                    3.90                      -0.30
2.9          4.6                    4.39                      0.21
2.0          2.9                    2.91                      -0.01
2.7          3.6                    4.06                      -0.46
1.9          3.1                    2.75                      0.35
3.4          4.9                    5.21                      -0.31
Plot the residuals against the Weight (x)
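The predicted values and residuals in this table can be regenerated from the data. This is a minimal Python sketch assuming the least squares line is fitted with the standard slope and intercept formulas; the variable names are illustrative.

```python
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # x
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]     # y

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(fuel) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, fuel))
      / sum((x - x_bar) ** 2 for x in weights))
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in weights]
residuals = [y - y_hat for y, y_hat in zip(fuel, predicted)]  # observed - predicted

print(f"{'Weight':>6} {'Fuel':>6} {'Predicted':>10} {'Residual':>9}")
for x, y, y_hat, e in zip(weights, fuel, predicted, residuals):
    print(f"{x:6.1f} {y:6.1f} {y_hat:10.2f} {e:9.2f}")
```

The (weight, residual) pairs produced this way are the points to plot when examining the residuals against Weight (x).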
32
[Figure: residuals plotted against Weight (x)]
Weight (x):  3.4    3.8    4.1    2.2    2.6    2.9    2.0    2.7    1.9    3.4
Residual:    0.29   0.03   0.14   0.06   -0.30  0.21   -0.01  -0.46  0.35   -0.31
33
Residuals: Sagarin Ratings and Point Spreads
Yij      Predicted Yij    Residuals
20       23.48573586      -3.485735859
24       21.3717734       2.628226598
18       13.96719139      4.032808608
11       11.52185104      -0.521851036
6        5.774158519      0.225841481
8.5      7.613877198      0.886122802
4        1.683355495      2.316644505
4        2.186135755      1.813864245
28       27.26801463      0.731985367
16       15.53266629      0.467333708
11.5     10.56199781      0.938002187
12       10.11635167      1.883648327
4        5.397073324      -1.397073324
7        6.836853159      0.163146841
-1.5     1.500526309      -3.000526309
2        1.946172449      0.053827551
25       23.58857728      1.411422725
18.5     18.34366502      0.156334982
10.5     12.85878945      -2.358789455
11.5     10.95050983      0.549490168
4.5      2.597501422      1.902498578
5        6.631170326      -1.631170326
4        3.203123099      0.796876901
-3.5     0.095026946      -3.595026946
23       24.15991848      -1.15991848
20.5     21.24607834      -0.746078337
18       20.0919691       -2.091969104
10.5     11.62469245      -1.124692453
9        6.836853159      2.163146841
7        5.979841353      1.020158647
2        3.283110867      -1.283110867
5        6.745438567      -1.745438567
34
Plot of Sagarin Residuals Good!
35
Linear Relationship?
36
Garbage In Garbage Out
37
Residual Plot – Clue to GIGO