CHAPTER 3 Describing Relationships

Slides:



Advertisements
Similar presentations
Chapter 3: Describing Relationships
Advertisements

Scatter Diagrams and Linear Correlation
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Section 3.2 Least-Squares Regression
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 3: Describing Relationships Section 3.2 Least-Squares Regression.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 3 Describing Relationships 3.2 Least-Squares.
3.2 Least-Squares Regression Objectives SWBAT: INTERPRET the slope and y intercept of a least-squares regression line. USE the least-squares regression.
LEAST-SQUARES REGRESSION 3.2 Least Squares Regression Line and Residuals.
CHAPTER 3 Describing Relationships
LEAST-SQUARES REGRESSION 3.2 Role of s and r 2 in Regression.
Chapter 3: Describing Relationships
Describing Relationships. Least-Squares Regression  A method for finding a line that summarizes the relationship between two variables Only in a specific.
+ Warm Up 10/2 Type the following data into your calculator L1, L2 so you can follow along today…. 1 NEA calFat gain kg
CHAPTER 3 Describing Relationships
Describing Relationships
CHAPTER 3 Describing Relationships
Aim – How can we analyze the results of a linear regression?
3.2 Least-Squares Regression
Statistics 101 Chapter 3 Section 3.
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Chapter 3: Describing Relationships
LSRL Least Squares Regression Line
Regression and Residual Plots
Ice Cream Sales vs Temperature
Least-Squares Regression
residual = observed y – predicted y residual = y - ŷ
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
Least-Squares Regression
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
Chapter 3: Describing Relationships
GET OUT p.161 HW!.
Least-Squares Regression
Chapter 3 Describing Relationships Section 3.2
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
Least-Squares Regression
Chapter 3: Describing Relationships
Least-Squares Regression
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Chapter 3: Describing Relationships
Warmup A study was done comparing the number of registered automatic weapons (in thousands) along with the murder rate (in murders per 100,000) for 8.
CHAPTER 3 Describing Relationships
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
3.2 – Least Squares Regression
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
Warm-up: Pg 197 #79-80 Get ready for homework questions
A medical researcher wishes to determine how the dosage (in mg) of a drug affects the heart rate of the patient. Find the correlation coefficient & interpret.
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
9/27/ A Least-Squares Regression.
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
Honors Statistics Review Chapters 7 & 8
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
Presentation transcript:

CHAPTER 3 Describing Relationships 3.2 Least-Squares Regression

Least-Squares Regression INTERPRET the slope and y intercept of a least-squares regression line. USE the least-squares regression line to predict y for a given x.

Regression Line Linear (straight-line) relationships between two quantitative variables are common and easy to understand. A regression line summarizes the relationship between two variables, but only in settings where one of the variables helps explain or predict the other. A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Interpreting a Regression Line A regression line is a model for the data, much like density curves. The equation of a regression line gives a compact mathematical description of what this model tells us about the relationship between the response variable y and the explanatory variable x. Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A regression line relating y to x has an equation of the form ŷ = a + bx In this equation, ŷ (read “y hat”) is the predicted value of the response variable y for a given value of the explanatory variable x. b is the slope, the amount by which y is predicted to change when x increases by one unit. a is the y intercept, the predicted value of y when x = 0.

Difference Between y and ŷ y is the observed, or actual, value of y. It is a real data value. ŷ is a predicted value found using a regression line.

Example: Interpreting slope and y intercept The equation of the regression line shown is a) Interpret the slope and y intercept of the regression line. b) Predict the price of a used Ford with 100,000 miles. SOLUTION: The slope b = -0.1629 tells us that the price of a used Ford F-150 is predicted to go down by 0.1629 dollars (16.29 cents) for each additional mile that the truck has been driven. The y intercept a = $38,257 is the predicted price of a Ford F-150 that has been driven 0 miles. Try Exercise 39

Prediction We can use a regression line to predict the response ŷ for a specific value of the explanatory variable x. Use the regression line to predict price for a Ford F-150 with 100,000 miles driven.

Prediction Predict the price of a used Ford that has 250,000 on the odometer. How confident are you with this prediction? Why?

Extrapolation We can use a regression line to predict the response ŷ for a specific value of the explanatory variable x. The accuracy of the prediction depends on how much the data scatter about the line. While we can substitute any value of x into the equation of the regression line, we must exercise caution in making predictions outside the observed values of x. Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate. Don’t make predictions using values of x that are much larger or much smaller than those that actually appear in your data.

Example  

Example The following data show the number of miles driven and advertised price for 11 used Honda CR-Vs from the 2002-2006 model years. 1. Make scatterplot on calculator (draw a rough sketch in notes) 2. Describe the scatterplot. 3. Find the correlation coefficient. 4. Find the regression line. 5. Interpret the regression line. 6. Predict the price of a car with 50,000 miles on it. Miles Driven (Thousands) Price (Dollars) 22 17,998 29 16,450 35 14,998 39 13,998 45 14,599 49 14,988 55 13,599 56 69 11,998 70 14,450 86 10,998

Least-Squares Regression INTERPRET the slope and y intercept of a least-squares regression line. USE the least-squares regression line to predict y for a given x.

Homework Pg. 193 #35, 37, 39, 41

Least-Squares Regression CALCULATE and INTERPRET residuals and their standard deviation. EXPLAIN the concept of least squares. DETERMINE the equation of a least-squares regression line using a variety of methods. CONSTRUCT and INTERPRET residual plots to assess whether a linear model is appropriate.

Residuals In most cases, no line will pass exactly through all the points in a scatterplot. A good regression line makes the vertical distances of the points from the line as small as possible. A residual is the difference between an observed value of the response variable and the value predicted by the regression line. residual = observed y – predicted y residual = y - ŷ

Example  

Least Squares Regression Line Different regression lines produce different residuals. The regression line we want is the one that minimizes the sum of the squared residuals. The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.

Example 12 of Taco Bell’s chicken menu items: Calculate the equation of the least-squares regression line using technology. Make sure to define variables! Sketch the scatterplot with the graph of the least-squares regression line. Units are grams for both. b) Interpret the slope and y-intercept in context. c) Calculate and interpret the residual for the first item listed, the Chicken Burrito Supreme, with 12g of fat and 50g of carbs.

Residual Plots One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between two variables. We see departures from this pattern by looking at the residuals. A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.

Examining Residual Plots A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. The residual plot should show no obvious patterns The residuals should be relatively small in size. Pattern in residuals Linear model not appropriate

Interpreting Residual Plots A residual plot can reveal patterns that were not apparent (not obvious) in the original scatterplot. What to look for: BAD: A curved pattern, meaning the relationship between x and y is not linear. BAD: Changing vertical spread, meaning predictions will be less accurate for some values of x. (A line may still be the best fit, but we need to be careful about making predictions.) GOOD: A uniform, even, random scatter, meaning any errors in our predictions will be random and about equal for any value of x. (Line is a good fit.) Also, smaller residuals mean the line is a better predictor!

Making a Residual Plot in your Calculator  

Example Construct and interpret a residual plot for the Taco Bell chicken items.

Least-Squares Regression CALCULATE and INTERPRET residuals and their standard deviation. EXPLAIN the concept of least squares. DETERMINE the equation of a least-squares regression line using a variety of methods. CONSTRUCT and INTERPRET residual plots to assess whether a linear model is appropriate.

Homework pg. 194-195 #47, 49, 51, 54

Least-Squares Regression ASSESS how well the least-squares regression line models the relationship between two variables.

Standard Deviation of the Residuals To assess how well the line fits all the data, we need to consider the residuals for each observation, not just one. Using these residuals, we can estimate the “typical” prediction error when using the least-squares regression line. If we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by This value gives the approximate size of a “typical” prediction error (residual).

Example  

The Coefficient of Determination The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y. The coefficient of determination r2 is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r2 using the following formula: r2 tells us how much better the LSRL does at predicting values of y than simply guessing the mean y for each value in the dataset.

Interpretation ____ % of the variation in the (response variable) is accounted for by the least-squares regression line relating (explanatory variable) and (response variable)

 

Example Find and interpret the coefficient of determination for the taco bell data.

Example: Residual plots, s, and r2  

Example: Residual plots, s, and r2 (a) Is a linear model appropriate for these data? Explain. (b) Interpret the value s = 1.24. (c) Interpret the value r2 = 0.88. Because there is no obvious pattern left over in the residual plot, the linear model is appropriate. When using the least-squares regression line with x = points per game to predict y = the number of wins, we will typically be off by about 1.24 wins. About 88% of the variation in wins is accounted for by the linear model relating wins to points per game.

Least-Squares Regression ASSESS how well the least-squares regression line models the relationship between two variables.

Homework Pg. 195-196 #55, 57

Least-Squares Regression DESCRIBE how the slope, y intercept, standard deviation of the residuals, and r 2 are influenced by outliers. Determine the equation of a least-squares regression line using computer output. Find the slope and y-intercept of the least squares regression line from the means and standard deviations of x and y and their correlation.

Interpreting Computer Regression Output A number of statistical software packages produce similar regression output. Be sure you can locate the slope b the y intercept a the values of s and r2

Example Several factors contribute to the speed of a roller coaster: largest drop, maximum angle of drop, initial speed at the top of the largest hill, etc. Some parks report both height and drop, others only report the height. To investigate the relationship between height and speed for steel sit-down coasters, output from a regression analysis of 39 such American rides is shown below. The units for height are feet and the units for speed are miles per hour. Predictor Coef SE Coef T P Constant 25 1.13 11.7 0.0000 Height 0.24 0.02 12 0.0000   S = 6.54 R-Sq = 86.0% R-Sq(adj) = 87.2%

Regression to the Mean Using technology is often the most convenient way to find the equation of a least-squares regression line. It is also possible to calculate the equation of the least- squares regression line using only the means and standard deviations of the two variables and their correlation. How to Calculate the Least-Squares Regression Line We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means and the standard deviations of the two variables and their correlation r. The least-squares regression line is the line ŷ = a + bx with slope And y intercept

Example  

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, be aware of their limitations. 1. The distinction between explanatory and response variables is important in regression.

Correlation and Regression Wisdom 2. Correlation and regression lines describe only linear relationships. r = 0.816.

Correlation and Regression Wisdom 3. Correlation and least-squares regression lines are not resistant.

Outliers and Influential Observations in Regression Least-squares lines make the sum of the squares of the vertical distances to the points as small as possible. A point that is extreme in the x direction with no other points near it pulls the line toward itself. We call such points influential. An outlier is an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not the x direction of a scatterplot have large residuals. Other outliers may not have large residuals. An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.

Example a) Is child 18 an outlier, influential point, both, or neither? Explain. b) Is child 19 an outlier, influential point, both, or neither? Explain.

Least-Squares Regression DESCRIBE how the slope, y intercept, standard deviation of the residuals, and r 2 are influenced by outliers. Determine the equation of a least-squares regression line using computer output. Find the slope and y-intercept of the least squares regression line from the means and standard deviations of x and y and their correlation.

Homework pg. 196-199 #59, 61, 63, 68, 71-78 (Look at pg. 185-186 in book for example of 4-step process)