Least Squares Regression


Least Squares Regression A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use it to predict values of y. Regression lines take the form ŷ = a + bx, where b is the slope: the amount ŷ changes when x increases by one unit, and a is the y-intercept: the predicted value of y when x = 0.

LSRL The slope here, b = -0.00344, tells us that fat gained goes down by 0.00344 kg for each added calorie of NEA, according to this linear model. The y-intercept, a = 3.505 kg, is the fat gain estimated by this model if NEA does not change when a person overeats. The slope of the regression line is the predicted RATE OF CHANGE in the response y as the explanatory variable x changes.

LSRL In most cases, no line will pass exactly through all the points in a scatterplot, and different people will draw different regression lines by eye. Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot. A good regression line makes the vertical distances of the points from the line as small as possible. Error: observed response - predicted response.

LSRL Cont.

LSRL Equation The LSRL has equation ŷ = a + bx, where a = the intercept of the line and b = the slope of the line, with b = r(s_y/s_x) and a = ȳ - b·x̄. Fact!: Every LSRL passes through the point (x̄, ȳ). ***y denotes the actual value; ŷ (y-hat) denotes the predicted value.
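As a minimal sketch of these formulas (hypothetical data, plain Python), the slope and intercept can be computed directly from the means, standard deviations, and correlation:

```python
from statistics import mean, stdev

def lsrl(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x,
    using b = r * (s_y / s_x) and a = y-bar - b * x-bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # correlation: average product of standardized values (divisor n - 1)
    r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
            for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b

# Hypothetical data lying on y = 1 + 2x: slope comes out near 2, intercept near 1
a, b = lsrl([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```

Because the intercept is defined as a = ȳ - b·x̄, the fitted line is guaranteed to pass through (x̄, ȳ), which is the "Fact!" stated above.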

YOU TRY 

Prediction We can use a regression line to predict the response y for a specific value of the explanatory variable x. Extrapolation: the use of the regression line for prediction outside the range of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.

Ex: Dinosaur Bones Archaeologists want to determine if a new bone belongs to a certain species of dinosaur. They have a set of bones that they KNOW go together and have recorded the femur lengths and humerus lengths. Analyze the data and determine if there is a relationship: Create a scatterplot. Analyze the scatterplot. Find the correlation. Find the LSRL. Interpret.

Residuals The errors of our predictions, or vertical distances from predicted y to observed y, are called residuals because they are the “left-over” variation in the response. Residual: the difference between an observed value of the response variable and the value predicted by the regression line. That is, residual = observed y - predicted y = y - ŷ.
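The definition above can be sketched in a couple of lines (hypothetical line and observed points):

```python
def residuals(x, y, a, b):
    """residual = observed y - predicted y-hat, for each data point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical fitted line y-hat = 1 + 2x and three observed points
res = residuals([1, 2, 3], [3.5, 4.8, 7.1], a=1, b=2)
# res is roughly [0.5, -0.2, 0.1]; for a true LSRL the residuals sum to zero
```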

RANDOM EXAMPLE

Residuals List on Calc Clear any other stat plots and deselect any other equations in Y=. Press [2nd][Y=][2] to access Stat Plot2 and enter the Xlist you used in your regression. Enter the Ylist by pressing [2nd][STAT] and using the up- and down-arrow keys to scroll to RESID. Press [ENTER] to insert the RESID list. Then ZoomStat.

Residual Plot Residual plot: a scatterplot of the regression residuals against the explanatory variable. It helps us assess how well a regression line fits the data. The sum of the least-squares residuals is always zero, so the mean of the residuals is always zero; the horizontal line at zero in the figure helps orient us. This “residual = 0” line corresponds to the regression line.

Examining Residual Plot A residual plot should show no obvious pattern. A curved pattern shows that the relationship is not linear and a straight line may not be the best model. Residuals should be relatively small in size: a regression line that fits the data well should come close to most of the points. A commonly used measure of this is the standard deviation of the residuals, given by s = √( Σ(yᵢ - ŷᵢ)² / (n - 2) ).
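The standard deviation of the residuals can be sketched as follows (note the divisor n - 2, not n - 1):

```python
from math import sqrt

def residual_sd(res):
    """s = sqrt(sum of squared residuals / (n - 2)): the typical size
    of a prediction error when using the LSRL."""
    n = len(res)
    return sqrt(sum(e * e for e in res) / (n - 2))

# e.g. residuals of +/-1 from four points give s = sqrt(4 / 2)
s = residual_sd([1, -1, 1, -1])
```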

RANDOM EXAMPLE Continued…..

Residual Plot on Calc Produce the scatterplot and regression line from the data (let's use BAC if it's still in there). Turn all plots off. Create a new scatterplot with Xlist as your explanatory variable and Ylist as the residuals ([2nd][STAT], RESID). Then ZoomStat.

Facts about LSRL The distinction between explanatory and response variables is essential in regression. If we reverse the roles, we get a different least-squares regression line. There is a close connection between correlation and the slope of the LSRL: the slope is b = r(s_y/s_x). The LSRL always passes through the point (x̄, ȳ). r² is the fraction of the variation in the values of y that is explained by the least-squares regression on x.
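The first fact, that reversing the roles of x and y gives a different line, can be checked numerically with hypothetical data: the two slopes are r·s_y/s_x and r·s_x/s_y, whose product is r², so they are reciprocals of each other only when r = ±1.

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]   # hypothetical data with r well below 1
n = len(x)
x_bar, y_bar = mean(x), mean(y)
r = sum((xi - x_bar) * (yi - y_bar)
        for xi, yi in zip(x, y)) / ((n - 1) * stdev(x) * stdev(y))

b_y_on_x = r * stdev(y) / stdev(x)   # slope of the LSRL predicting y from x
b_x_on_y = r * stdev(x) / stdev(y)   # slope of the LSRL predicting x from y
# b_y_on_x * b_x_on_y equals r**2, so the two lines differ unless r = +/-1
```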

Influential vs Outlier Correlation r is not resistant: a single point in the scatterplot can greatly affect the value of r. The LSRL is not resistant either. Outlier: an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not the x direction have large residuals. Influential: a point that pulls the regression line toward it; generally it is a point extreme in the x direction with no other points near it.

Lurking Variables Lurking variable: a variable that is not among the explanatory or response variables and yet may influence the interpretation of relationships among those variables. Example: A College Board study of high school graduates found a strong correlation between the amount of math minority students took in high school and their later success in college. News articles quoted the College Board as saying that “math is the gatekeeper for success in college.” But minority students from middle-class homes with educated parents no doubt take more high school math courses. They are also more likely to have a stable family, parents who emphasize education, and the means to pay for college. These students would likely succeed in college even if they took fewer math courses. The family background of the students is a lurking variable that probably explains much of the relationship between math courses and college success.

Corrosion and Strength Consider the following data from the article “The Carbonation of Concrete Structures in the Tropical Environment of Singapore” (Magazine of Concrete Research, 1996: 293-300), which discusses how the corrosion of steel (caused by carbonation) is the biggest problem affecting concrete strength: x = carbonation depth in concrete (mm), y = strength of concrete (MPa).

x (depth, mm):      8    20    20    30    35    40    50    55    65
y (strength, MPa): 22.8  17.1  21.5  16.1  13.4  12.4  11.4   9.7   6.8

Define the explanatory and response variables. Plot the data and describe the relationship.

Corrosion and Strength There is a strong, negative, linear relationship between depth of corrosion and concrete strength. As the depth increases, the strength decreases at a roughly constant rate.

Corrosion and Strength The mean depth of corrosion is 35.89 mm with a standard deviation of 18.53 mm. The mean strength is 14.58 MPa with a standard deviation of 5.29 MPa.

Corrosion and Strength Find the equation of the least-squares regression line (LSRL) that models the relationship between corrosion and strength: ŷ = 24.52 - 0.28x, i.e. predicted strength = 24.52 - 0.28(depth), with r = -0.97.

Corrosion and Strength ŷ = 24.52 - 0.28x; predicted strength = 24.52 - 0.28(depth); r = -0.97. What does r tell us? There is a strong, negative, LINEAR relationship between depth of corrosion and strength of concrete. What does b = -0.28 tell us? For every increase of 1 mm in depth of corrosion, we predict a 0.28 MPa decrease in the strength of the concrete.

Corrosion and Strength Use the prediction model (LSRL) to determine the following: What is the predicted strength of concrete with a corrosion depth of 25 mm? strength = 24.52 - 0.28(25) ≈ 17.59 MPa (using the unrounded slope, -0.277). What is the predicted strength of concrete with a corrosion depth of 40 mm? strength = 24.52 - 0.28(40) ≈ 13.44 MPa. How does this prediction compare with the observed strength at a corrosion depth of 40 mm?

Residuals Note, the predicted strength when corrosion = 40 mm is 13.44 MPa. The observed strength is 12.4 MPa. The prediction did not match the observation: there was an “error,” or residual, between our prediction and the actual observation. RESIDUAL = observed y - predicted y. The residual when corrosion = 40 mm is: residual = 12.4 - 13.44 = -1.04 MPa.
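The whole corrosion example can be verified in a short script. One assumption: the slide's x-list shows only eight depths against nine strengths, so a second reading of 20 mm is assumed here, which reproduces the quoted mean depth of 35.89 mm.

```python
from statistics import mean

# Carbonation depth (mm) and strength (MPa) from the slides; the second
# 20 is an assumed reading, restored so mean(depth) = 35.89 mm as quoted.
depth    = [8, 20, 20, 30, 35, 40, 50, 55, 65]
strength = [22.8, 17.1, 21.5, 16.1, 13.4, 12.4, 11.4, 9.7, 6.8]

x_bar, y_bar = mean(depth), mean(strength)
sxx = sum((x - x_bar) ** 2 for x in depth)
syy = sum((y - y_bar) ** 2 for y in strength)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(depth, strength))

b = sxy / sxx                 # slope, about -0.28
a = y_bar - b * x_bar         # intercept, about 24.52
r2 = sxy ** 2 / (sxx * syy)   # coefficient of determination, about 0.9375

pred_40 = a + b * 40          # predicted strength at 40 mm, about 13.44
residual_40 = 12.4 - pred_40  # observed - predicted, about -1.04
```

Running this reproduces every number quoted in the slides, including the residual of about -1.04 MPa at a depth of 40 mm.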

Assessing the Model Is the LSRL the most appropriate prediction model for strength? r suggests it will provide strong predictions...can we do better? To determine this, we need to study the residuals generated by the LSRL. Make a residual plot. Look for a pattern. If no pattern exists, the LSRL may be our best bet for predictions. If a pattern exists, a better prediction model may exist...

Residual Plot Construct a residual plot for the (depth, strength) LSRL. There appears to be no pattern in the residual plot; therefore, the LSRL may be our best prediction model.

R squared - Coefficient of Determination r² = 1 - Σ(yᵢ - ŷᵢ)² / Σ(yᵢ - ȳ)²

R squared - Coefficient of Determination If all the points fall directly on the least-squares line, then r² = 1 and all the variation in y is explained by the linear relationship with x. So if r² = 0.606, about 61% of the variation in y among the individual subjects is due to the straight-line relationship with x; the other 39% is “not explained.” r² is a measure of how successful the regression was in explaining the response.
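The “fraction of variation explained” reading of r² can also be computed directly as 1 - SSE/SST; a minimal sketch with hypothetical predictions:

```python
from statistics import mean

def r_squared(y, y_hat):
    """r^2 = 1 - SSE/SST: the fraction of the variation in y that
    the prediction model accounts for."""
    y_bar = mean(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    return 1 - sse / sst

# Perfect predictions explain all the variation (r^2 = 1); predicting
# the mean y-bar for every point explains none of it (r^2 = 0).
```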

Coefficient of Determination We know what r tells us about the relationship between depth and strength...what about r²? 93.75% of the variability in concrete strength can be explained by the LSRL on depth.

Summary When exploring a bivariate relationship: Make and interpret a scatterplot: strength, direction, form. Describe x and y: mean and standard deviation in context. Find the least-squares regression line; write it in context. Construct and interpret a residual plot. Interpret r and r² in context. Use the LSRL to make predictions...