Published by Jonas Potter. Modified over 9 years ago.
2
Least Squares Regression: Fitting a Line to Bivariate Data
3
Automating Least Squares Line and Related Calculations
- Excel: in the text, see p. 200-201. Excel has extensive capabilities related to least squares lines; in the Excel help search line, type terms such as slope, intercept, trendline, and regression for more information.
- Statcrunch: in the left panel of our class webpage, http://www.stat.ncsu.edu/people/reiland/courses/st311/, click on Student Resources. In “Statcrunch Instructional Videos” see “Scatterplots and Regression”; in “Many Statcrunch Instructional Videos” see videos 16, 19, 24, 48, and 48 (these numbers may change as more videos are added to this YouTube site).
- TI calculator: in the left panel of our class webpage, click on Student Resources. Under “Graphing Calculators, Online Calculations”, either click on TI Graphing Calculator Guide and see p. 7-9, or click on Online Graphing Calculator Tutorials.
4
Linear Relationships
Avg. occupants per car
- 1980: 6/car
- 1990: 3/car
- 2000: 1.5/car
- By the year 2010 every fourth car will have nobody in it!
Food for Thought
- What kind of mathematical relationship is there between year and avg. no. of occupants per car?
- Why might the relationship break down by 2010?
5
Basic Terminology
- Scatterplots, correlation: interested in the association between 2 variables (assign x and y arbitrarily)
- Least squares regression: does one quantitative variable explain or cause changes in another variable?
6
Basic Terminology (cont.)
- Explanatory variable: explains or causes changes in the other variable; the x-variable (independent variable)
- Response variable: the y-variable; it responds to changes in the x-variable (dependent variable)
7
Examples
- Fertilizer (x) → corn yield (y)
- Advertising $ (x) → store income (y)
- Drug dose (x) → blood pressure (y)
- Daily temperature (x) → natural gas demand (y)
- Change in min wage (x) → unemployment rate (y)
8
Simplest Relationship
- Simplest equation that describes the dependence of variable y on variable x: \(y = b_0 + b_1 x\)
- Linear equation
- Graph is a line with slope \(b_1\) and y-intercept \(b_0\)
9
Graph
[Figure: the line \(y = b_0 + b_1 x\) plotted against x and y, with y-intercept \(b_0\) and slope \(b_1\) = rise/run]
10
Notation
- Data: \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\)
- Draw the line \(y = b_0 + b_1 x\) through the scatterplot; the point on the line corresponding to \(x_i\) is \((x_i, \hat{y}_i)\), where \(\hat{y}_i = b_0 + b_1 x_i\)
11
Observed y, Predicted y
Predicted y when x = 2.7: \(\hat{y} = b_0 + b_1 x = b_0 + b_1 (2.7)\)
12
Scatterplot: Fuel Consumption vs Car Weight “Best” line?
13
Scatterplot with least squares prediction line
14
How do we draw the line? Residuals
15
Residuals: graphically
16
Criterion for choosing what line to draw: method of least squares
- The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible
- This line has the slope \(b_1\) and intercept \(b_0\) that minimize \(\sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2\)
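The criterion can be tried out numerically. The sketch below uses the car weight and fuel consumption pairs from the deck's later example, computes the sum of squared residuals (SSE) for the least squares line, and compares it with a few nearby lines. The helper names and the competing lines are hypothetical, chosen only for illustration, and the closed-form fit is the standard textbook one rather than anything taken from the slides.

```python
# Car weight (x) and fuel consumption (y) pairs from the deck's car example.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sse(b0, b1):
    """Sum of squared residuals for the candidate line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Standard closed-form least squares fit; returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = least_squares(xs, ys)
best = sse(b0, b1)

# Hypothetical competing lines: nudge the intercept or the slope a little.
competitors = [(b0 + 0.1, b1), (b0 - 0.1, b1), (b0, b1 + 0.05), (b0, b1 - 0.05)]
for c0, c1 in competitors:
    print(f"b0={c0:.3f}, b1={c1:.3f}: SSE={sse(c0, c1):.4f} (least squares: {best:.4f})")
```

Every competing line should report a larger SSE than the least squares line, which is exactly what the criterion promises.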
17
Least Squares Line \(y = b_0 + b_1 x\): Slope \(b_1\) and Intercept \(b_0\)
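For reference, the standard closed-form expressions for the least squares slope and intercept are sketched below. They are written in terms of the deviation sums that the car weight example tabulates later in the deck; the slide itself may present an algebraically equivalent form.

```latex
\[
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
\]
```

Here \(\bar{x}\) and \(\bar{y}\) denote the sample means of the x- and y-values.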
18
Example: Income vs Consumption Expenditure
19
Questions
- Construct a scatterplot; determine if a linear model is appropriate. If so …
- … find the least squares prediction line
- Estimate consumption expenditure in a household with an income of (i) $6,000 (ii) $25,000. Comfortable with the estimates?
- Compute the residuals
20
Scatterplot
21
Solution
22
Calculations
23
least squares prediction line
24
Least Squares Prediction Line
25
Consumption Expenditure Prediction When x = $6,000
[Figure: least squares line with x = 6 and the predicted y = 7.4 marked]
26
Consumption Expenditure Prediction When x = $25,000
[Figure: least squares line with x = 25 and the predicted y = 11.2 marked]
27
The least squares line always goes through the point with coordinates \((\bar{x}, \bar{y})\)
\((\bar{x}, \bar{y}) = (9, 8)\)
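The two predictions and the point-of-means property can be checked with a few lines of Python. This is a minimal sketch that takes the intercept 6.2 and slope 0.2 from the SSE arithmetic that appears later in the deck; measuring income x in thousands of dollars is an assumption inferred from $6,000 being plotted at x = 6, and the function name is hypothetical.

```python
# Intercept and slope of the income/consumption line, as used in the deck's
# SSE arithmetic (b0 = 6.2, b1 = 0.2). Treating x as income in thousands of
# dollars is an assumption inferred from "$6,000" being plotted at x = 6.
B0, B1 = 6.2, 0.2
X_BAR, Y_BAR = 9, 8  # the point of means stated on the slide


def predict(income_thousands):
    """Predicted consumption expenditure on the line y = B0 + B1*x."""
    return B0 + B1 * income_thousands


for x in (6, 25, X_BAR):
    print(f"x = {x}: predicted y = {predict(x):.1f}")
```

The x = 25 prediction deserves the caution the Questions slide hints at: whether $25,000 lies inside the range of observed incomes cannot be read off from the summary numbers alone.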
28
C. Compute the Residuals
29
Residuals
30
Income Residual Plot
31
\(\sum \text{residuals}\), \(\sum (\text{residuals})^2\)
- Note that \(\sum \text{residuals} = 0\) and \(\sum (\text{residuals})^2 = 3.6\)*
*From formula on slide 15: \(\text{SSE} = \sum y_i^2 - b_0 \sum y_i - b_1 \sum x_i y_i = 330 - 6.2(40) - 0.2(392) = 330 - 248 - 78.4 = 3.6\)
Any other line drawn through the scatterplot will have \(\sum (\text{residuals})^2 > 3.6\)
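The shortcut arithmetic can be reproduced directly from the three sums and two coefficients quoted on the slide. This is only a sketch of that one calculation; the variable names are hypothetical.

```python
# Summary quantities quoted on the slide for the income/consumption example.
sum_y_squared = 330   # sum of y_i^2
sum_y = 40            # sum of y_i
sum_xy = 392          # sum of x_i * y_i
b0, b1 = 6.2, 0.2     # intercept and slope of the least squares line

# Shortcut formula: SSE = sum(y^2) - b0*sum(y) - b1*sum(x*y)
sse = sum_y_squared - b0 * sum_y - b1 * sum_xy
print(f"SSE = {sse:.1f}")
```

The shortcut is valid only for the least squares coefficients; for any other line the SSE has to be computed from the individual residuals.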
32
Car Weight, Fuel Consumption Example, cont.
\((x_i, y_i)\): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)
33
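Wt (x)  Fuel (y)  \(x-\bar{x}\)    \((x-\bar{x})^2\)    \(y-\bar{y}\)    \((y-\bar{y})^2\)    \((x-\bar{x})(y-\bar{y})\)
3.4     5.5        0.5              0.25                  1.11             1.2321               0.555
3.8     5.9        0.9              0.81                  1.51             2.2801               1.359
4.1     6.5        1.2              1.44                  2.11             4.4521               2.532
2.2     3.3       -0.7              0.49                 -1.09             1.1881               0.763
2.6     3.6       -0.3              0.09                 -0.79             0.6241               0.237
2.9     4.6        0                0                     0.21             0.0441               0
2.0     2.9       -0.9              0.81                 -1.49             2.2201               1.341
2.7     3.6       -0.2              0.04                 -0.79             0.6241               0.158
1.9     3.1       -1                1                    -1.29             1.6641               1.29
3.4     4.9        0.5              0.25                  0.51             0.2601               0.255
29      43.9       0                5.18                  0                14.589               8.49     (col. sum)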
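The column sums in the table can be recomputed from the ten data pairs, and the slope and intercept follow from them. The sketch below assumes the standard formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄; the names `sxx`, `syy`, and `sxy` are hypothetical labels for the three deviation sums.

```python
# Car weight (x) and fuel consumption (y) pairs from the slide.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(xs)
x_bar = sum(xs) / n  # mean car weight
y_bar = sum(ys) / n  # mean fuel consumption

# Deviation sums: these correspond to the table's column totals.
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Standard least squares slope and intercept (assumed formulas).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}")
print(f"Sxx = {sxx:.2f}, Syy = {syy:.3f}, Sxy = {sxy:.2f}")
print(f"least squares line: y = {b0:.4f} + {b1:.4f} x")
```

The printed sums should agree with the table's totals of 5.18, 14.589, and 8.49.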
34
Calculations
35
Scatterplot with least squares prediction line
36
The Least Squares Line Always Goes Through \((\bar{x}, \bar{y})\)
\((\bar{x}, \bar{y}) = (2.9, 4.39)\)
37
Using the least squares line for prediction. Fuel consumption of 3,000 lb car? (x=3)
38
Be Careful!
Fuel consumption of 500 lb car? (x = 0.5)
x = 0.5 is outside the range of the x-data that we used to determine the least squares line
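Both questions, the 3,000 lb car and the 500 lb car, can be sketched in code by attaching a simple range check to the prediction. The fit uses the standard closed-form formulas on the deck's ten data pairs; the function names and the idea of returning an "inside range" flag are illustrative assumptions, not part of the slides.

```python
# Car weight (x, apparently in thousands of lb) and fuel consumption (y).
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def fit_line(xs, ys):
    """Standard closed-form least squares fit; returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def predict(x, b0, b1, x_range):
    """Return (prediction, inside_range) for the line y = b0 + b1*x."""
    lo, hi = x_range
    return b0 + b1 * x, lo <= x <= hi


b0, b1 = fit_line(xs, ys)
x_range = (min(xs), max(xs))  # observed weights run from 1.9 to 4.1

for x in (3.0, 0.5):
    y_hat, inside = predict(x, b0, b1, x_range)
    note = "" if inside else "  <- extrapolation: outside the observed x-range"
    print(f"x = {x}: predicted y = {y_hat:.2f}{note}")
```

The line still returns a number for x = 0.5, which is the danger: nothing in the arithmetic signals that the data say nothing about cars that light.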
39
Avoid GIGO! Evaluating the least squares line
1. Create scatterplot. Approximately linear?
2. Calculate \(r^2\), the square of the correlation coefficient
3. Examine residual plot
40
\(r^2\): The Variation Accounted For
- The square of the correlation coefficient r gives important information about the usefulness of the least squares line
41
\(r^2\): important information for evaluating the usefulness of the least squares line
The square of the correlation coefficient, \(r^2\), is the fraction of the variation in y that is explained by the least squares regression of y on x.
\(-1 \le r \le 1\) implies \(0 \le r^2 \le 1\)
The square of the correlation coefficient, \(r^2\), is the fraction of the variation in y that is explained by the variation in x.
42
Example: car weight, fuel consumption
- x = car weight, y = fuel consumption
  \(r^2 = (0.9766)^2 \approx 0.95\)
  About 95% of the variation in fuel consumption (y) is explained by the linear relationship between car weight (x) and fuel consumption (y).
- What else affects fuel consumption?
  - Driver, size of engine, tires, road, etc.
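The value of r and the "fraction of variation explained" reading of r² can both be checked on the car data. The sketch below computes r from the deviation sums and, separately, computes 1 − SSE/Syy from the residuals of the fitted line; the two should agree. All names are hypothetical and the formulas are the standard ones.

```python
import math

# Car weight (x) and fuel consumption (y) pairs from the deck's car example.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Correlation coefficient from the deviation sums.
r = sxy / math.sqrt(sxx * syy)

# Fraction of variation explained, computed from the residuals instead.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r2_from_sse = 1 - sse / syy

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, 1 - SSE/Syy = {r2_from_sse:.4f}")
```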
43
Example: SAT scores
44
SAT scores: calculations
45
SAT scores: result
\(r^2 = (-0.868)^2 = 0.7534\)
If 57% of NC seniors take the SAT, the predicted mean score is
46
Avoid GIGO! Evaluating the least squares line
1. Create scatterplot. Approximately linear?
2. Calculate \(r^2\), the square of the correlation coefficient
3. Examine residual plot
47
Residuals
- residual = observed y − predicted y = \(y - \hat{y}\)
- Properties of residuals
  1. The residuals always sum to 0 (therefore the mean of the residuals is 0)
  2. The least squares line always goes through the point \((\bar{x}, \bar{y})\)
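Both properties follow in a line or two once the intercept is written as \(b_0 = \bar{y} - b_1 \bar{x}\), which is the standard least squares intercept (assumed here, since the slide states the properties without proof). A short sketch of the argument, writing \(e_i = y_i - \hat{y}_i\) for the i-th residual:

```latex
\begin{align*}
\sum_{i=1}^{n} e_i
  &= \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr) \\
  &= n\bar{y} - n b_0 - n b_1 \bar{x} \\
  &= n\bigl(\bar{y} - b_0 - b_1 \bar{x}\bigr) = 0
  \quad \text{since } b_0 = \bar{y} - b_1 \bar{x}, \\[6pt]
\hat{y}\,\big|_{x = \bar{x}}
  &= b_0 + b_1 \bar{x}
   = (\bar{y} - b_1 \bar{x}) + b_1 \bar{x}
   = \bar{y}.
\end{align*}
```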
48
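Graphically
residual = \(y - \hat{y}\)
[Figure: at \(x_i\), the observed value \(y_i\), the predicted value \(\hat{y}_i\) on the line, and the residual \(e_i = y_i - \hat{y}_i\) as the vertical gap between them]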
49
Residual Plot
- Residuals help us determine if fitting a least squares line to the data makes sense
- When a least squares line is appropriate, it should model the underlying relationship; nothing interesting should be left behind
- We make a scatterplot of the residuals in the hope of finding… NOTHING!
50
Car Wt/Fuel Consump: Residuals
CAR WT.  FUEL CONSUMP.  Pred FUEL CONSUMP.  Residuals
3.4      5.5            5.209498069          0.290501931
3.8      5.9            5.865096525          0.034903475
4.1      6.5            6.356795367          0.143204633
2.2      3.3            3.242702703          0.057297297
2.6      3.6            3.898301158         -0.29830115
2.9      4.6            4.39                 0.21
2        2.9            2.914903475         -0.01490347
2.7      3.6            4.062200772         -0.46220077
1.9      3.1            2.751003861          0.348996139
3.4      4.9            5.209498069         -0.309498069
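The predicted values and residuals in this table can be regenerated from the ten data pairs. The sketch below fits the line with the standard closed-form formulas, lists each residual, and confirms that the residuals sum to zero up to floating-point rounding; the variable names are hypothetical.

```python
# Car weight (x) and fuel consumption (y) pairs from the deck's car example.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Predicted value on the line and residual (observed minus predicted).
predicted = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(f"{'wt':>5} {'fuel':>5} {'pred':>12} {'residual':>12}")
for x, y, y_hat, e in zip(xs, ys, predicted, residuals):
    print(f"{x:>5.1f} {y:>5.1f} {y_hat:>12.9f} {e:>12.9f}")

print(f"sum of residuals = {sum(residuals):.1e}")
```

Plotting `residuals` against `xs` gives the residual plot; for a well-chosen line it should show no pattern.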
51
Example: Car wt/fuel consump. residual plot
52
SAT Residuals
53
Linear Relationship?
54
Garbage In Garbage Out
55
Residual Plot – Clue to GIGO