Least Squares Regression


1 Least Squares Regression
A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use it to predict values of y. Regression lines take the form $\hat{y} = a + bx$, where b is the slope (the amount y is predicted to change when x increases by one unit) and a is the y-intercept.

2 LSRL The slope here, b, tells us by how many kilograms predicted fat gain goes down for each added calorie of NEA (nonexercise activity), according to this linear model. The y-intercept, a = 3.505 kg, is the fat gain this model estimates if NEA does not change when a person overeats. The slope of our regression equation is the predicted rate of change in the response y as the explanatory variable x changes.

3 LSRL In most cases, no line will pass exactly through all the points in a scatterplot, and different people will draw different regression lines by eye. Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot. A good regression line makes the vertical distances of the points from the line as small as possible; the least-squares regression line (LSRL) does this by making the sum of the squares of these vertical distances as small as possible. Error = observed response - predicted response.
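To make the least-squares criterion concrete, here is a minimal Python sketch that scores a line by the sum of its squared vertical errors (observed - predicted). It borrows the corrosion data used later in this presentation; the comparison line 25 - 0.30x is a hypothetical "by eye" line chosen only for illustration.

```python
# Corrosion data (nine pairs; depth 20 mm appears twice, consistent with the reported mean depth of 35.89 mm).
depth = [8, 20, 20, 30, 35, 40, 50, 55, 65]  # carbonation depth (mm)
strength = [22.8, 17.1, 21.5, 16.1, 13.4, 12.4, 11.4, 9.7, 6.8]  # concrete strength (MPa)


def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical errors (observed y - predicted y) for y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Least-squares line for this data (about 24.52 - 0.277x) vs. a hypothetical line drawn by eye.
lsrl_sse = sum_squared_errors(24.517, -0.2769, depth, strength)
eyeball_sse = sum_squared_errors(25.0, -0.30, depth, strength)
print(f"LSRL: {lsrl_sse:.2f}   by-eye line: {eyeball_sse:.2f}")
```

The exact least-squares coefficients give the smallest possible total; any other intercept and slope give a larger one.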

4 LSRL Cont.

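5 LSRL Equation $\hat{y} = a + bx$, where a = the intercept of the line and b = the slope of the line: $b = r\dfrac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$.
Fact: every LSRL passes through the point $(\bar{x}, \bar{y})$. Here y denotes an actual (observed) value and $\hat{y}$ ("y-hat") denotes a predicted value.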
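The slope and intercept formulas can be sketched directly in Python with the standard `statistics` module. This is a minimal illustration, not a calculator's exact routine: `correlation` and `fit_lsrl` are hypothetical helper names, r is computed as the average product of z-scores (dividing by n - 1), and the data is the corrosion example used later in the presentation.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson r: sum of products of z-scores divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def fit_lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


# Corrosion data (nine pairs; depth 20 mm appears twice).
depth = [8, 20, 20, 30, 35, 40, 50, 55, 65]
strength = [22.8, 17.1, 21.5, 16.1, 13.4, 12.4, 11.4, 9.7, 6.8]

a, b = fit_lsrl(depth, strength)
print(f"y-hat = {a:.2f} + ({b:.3f})x")
# Because a = y_bar - b*x_bar, the line passes through (x_bar, y_bar):
print(a + b * mean(depth), mean(strength))
```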

6 YOU TRY 

7 Prediction We can use a regression line to predict the response y for a specific value of the explanatory variable x. Extrapolation: the use of a regression line for prediction outside the range of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.
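One way to guard against accidental extrapolation is to flag any x outside the range used to fit the line. A minimal sketch, assuming the corrosion LSRL from later in the presentation (approximate unrounded coefficients) and a hypothetical `predict` helper:

```python
def predict(a, b, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for y-hat = a + b*x.

    is_extrapolation is True when x lies outside [x_min, x_max], the range of
    explanatory values used to fit the line.
    """
    y_hat = a + b * x
    return y_hat, not (x_min <= x <= x_max)


# Corrosion LSRL (approximate unrounded coefficients) and the observed depth range in mm.
A, B = 24.517, -0.2769
DEPTH_MIN, DEPTH_MAX = 8, 65

for depth_mm in (25, 100):
    y_hat, extrapolated = predict(A, B, depth_mm, DEPTH_MIN, DEPTH_MAX)
    note = "EXTRAPOLATION" if extrapolated else "within data range"
    print(f"depth {depth_mm} mm -> {y_hat:.2f} MPa ({note})")
```

At 100 mm the line predicts a negative strength, which is physically impossible: a concrete case of extrapolation going wrong.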

8 Ex: Dinosaur Bones
Archeologists want to determine whether a new bone belongs to a certain species of dinosaur. They have a set of bones that they KNOW go together and have recorded the femur lengths and humerus lengths. Analyze the data and determine whether there is a relationship: create a scatterplot, analyze the scatterplot, find the correlation, find the LSRL, and interpret.

9 Residuals The errors of our predictions, the vertical distances from predicted y to observed y, are called residuals because they are the "left-over" variation in the response. Residual: the difference between an observed value of the response variable and the value predicted by the regression line. That is, residual = observed y - predicted y = $y - \hat{y}$.

10 RANDOM EXAMPLE

11 Residuals List on Calc Clear other stat plots and deselect any other equations in Y=. Press [2nd][Y=][2] to access Stat Plot2 and enter the Xlist you used in your regression. Enter the Ylist by pressing [2nd][STAT] and using the up- and down-arrow keys to scroll to RESID. Press [ENTER] to insert the RESID list. Finally, use Zoom Stat to display the plot.

12 Residual Plot Residual plot: a scatterplot of the regression residuals against the explanatory variable. It helps us assess how well a regression line fits the data. The sum of the least-squares residuals is always zero, so the mean of the residuals is always zero; the horizontal line at zero in the figure helps orient us. This "residual = 0" line corresponds to the regression line.

13 Examining Residual Plot
A residual plot should show no obvious pattern. A curved pattern shows that the relationship is not linear and a straight line may not be the best model. Residuals should be relatively small in size. A regression line in a model that fits the data well should come "close" to most of the points. A commonly used measure of this is the standard deviation of the residuals, given by: $s = \sqrt{\dfrac{\sum \text{residuals}^2}{n-2}} = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n-2}}$

14 RANDOM EXAMPLE Continued…..

15 Residual Plot on Calc Produce the scatterplot and regression line from the data (let's use the BAC data if it is still in the calculator). Turn all plots off. Create a new scatterplot with the X list as your explanatory variable and the Y list as the residuals ([2nd][STAT], RESID). Then use Zoom Stat.

16 Facts about LSRL The distinction between explanatory and response variables is essential in regression. If we reverse the roles, we get a different least-squares regression line. There is a close connection between correlation and the slope of the LSRL: the slope is $b = r\dfrac{s_y}{s_x}$. The LSRL always passes through the point $(\bar{x}, \bar{y})$. $r^2$ is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.

17 Influential vs Outlier
Correlation r is not resistant: a single point in the scatterplot can greatly affect the value of r. The LSRL is not resistant either. Outlier: an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not in the x direction have large residuals. Influential: a point that pulls the regression line toward it, so that removing it would markedly change the line. Generally it is a point extreme in the x direction with no other points near it.

18 Lurking Variables Lurking variable: a variable that is not among the explanatory or response variables and yet may influence the interpretation of relationships among these variables. Example: A College Board study of high school graduates found a strong correlation between how much math minority students took in high school and their later success in college. News articles quoted the College Board as saying that "math is the gatekeeper for success in college." But minority students from middle-class homes with educated parents no doubt take more high school math courses. They are also more likely to have a stable family and parents who emphasize education and can pay for college. These students would likely succeed in college even if they took fewer math courses. The family background of students is a lurking variable that probably explains much of the relationship between math courses and college success.

19 Corrosion and Strength
Consider the following data from the article "The Carbonation of Concrete Structures in the Tropical Environment of Singapore" (Magazine of Concrete Research, 1996), which discusses how the corrosion of steel (caused by carbonation) is the biggest problem affecting concrete strength:
x = carbonation depth in concrete (mm)
y = strength of concrete (MPa)

x (mm)    8     20    20    30    35    40    50    55    65
y (MPa)  22.8  17.1  21.5  16.1  13.4  12.4  11.4   9.7   6.8

Define the explanatory and response variables. Plot the data and describe the relationship.

20 Corrosion and Strength
There is a strong, negative, linear relationship between depth of corrosion and concrete strength: as the depth increases, the strength tends to decrease at a roughly constant rate.
[Scatterplot: strength (MPa) vs. depth (mm)]

21 Corrosion and Strength
The mean depth of corrosion is 35.89 mm with a standard deviation of 18.53 mm. The mean strength is 14.58 MPa with a standard deviation of 5.29 MPa.
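These summary statistics can be reproduced with Python's `statistics` module, where `stdev` uses the sample (n - 1) formula. A minimal sketch; note that the standard deviation of strength computes to about 5.2999 MPa, so the slide's 5.29 appears to be truncated rather than rounded.

```python
from statistics import mean, stdev

# Corrosion data (nine pairs; depth 20 mm appears twice).
depth = [8, 20, 20, 30, 35, 40, 50, 55, 65]  # mm
strength = [22.8, 17.1, 21.5, 16.1, 13.4, 12.4, 11.4, 9.7, 6.8]  # MPa

print(f"depth:    mean {mean(depth):.2f} mm, sd {stdev(depth):.2f} mm")
print(f"strength: mean {mean(strength):.2f} MPa, sd {stdev(strength):.2f} MPa")
```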

22 Corrosion and Strength
Find the equation of the least-squares regression line (LSRL) that models the relationship between corrosion and strength. $\hat{y} = 24.52 + (-0.28)x$, that is, predicted strength = 24.52 + (-0.28)(depth), with r = -0.968.

23 Corrosion and Strength
$\hat{y} = 24.52 + (-0.28)x$, that is, predicted strength = 24.52 + (-0.28)(depth), with r = -0.968. What does r tell us? There is a strong, negative, linear relationship between depth of corrosion and strength of concrete. What does b = -0.28 tell us? For every 1 mm increase in depth of corrosion, we predict a 0.28 MPa decrease in the strength of the concrete.

24 Corrosion and Strength
Use the prediction model (LSRL) to answer the following. (These predictions use the unrounded coefficients, a ≈ 24.517 and b ≈ -0.2769; with the rounded slope of -0.28 the answers come out slightly lower.)
What is the predicted strength of concrete with a corrosion depth of 25 mm? Predicted strength = 24.517 + (-0.2769)(25) ≈ 17.59 MPa.
What is the predicted strength of concrete with a corrosion depth of 40 mm? Predicted strength = 24.517 + (-0.2769)(40) ≈ 13.44 MPa.
How does this prediction compare with the observed strength at a corrosion depth of 40 mm?

25 Residuals Note that the predicted strength when corrosion depth = 40 mm is 13.44 MPa, while the observed strength is 12.4 MPa. The prediction did not match the observation: there was an "error," or "residual," between our prediction and the actual observation. RESIDUAL = observed y - predicted y. The residual when corrosion depth = 40 mm is: residual = 12.4 - 13.44 = -1.04 MPa.

26 Assessing the Model Is the LSRL the most appropriate prediction model for strength? r suggests it will provide strong predictions, but can we do better? To determine this, we need to study the residuals generated by the LSRL. Make a residual plot and look for a pattern. If no pattern exists, the LSRL may be our best bet for predictions. If a pattern exists, a better prediction model may exist.

27 Residual Plot Construct a Residual Plot for the (depth,strength) LSRL.
[Residual plot: residuals (MPa) vs. depth (mm)]
There appears to be no pattern in the residual plot; therefore, the LSRL may be our best prediction model.

28 R Squared: Coefficient of Determination

29 R Squared: Coefficient of Determination
If all the points fall directly on the least-squares line, $r^2 = 1$; then all the variation in y is explained by the linear relationship with x. So if $r^2 = 0.606$, about 61% of the variation in y among individual subjects is explained by the linear relationship with x. The other 39% is not explained by the regression. $r^2$ is a measure of how successful the regression was in explaining the response.

30 Coefficient of Determination
We know what r tells us about the relationship between depth and strength. What about $r^2$? Here $r^2 = 0.9375$: 93.75% of the variability in concrete strength can be explained by the LSRL of strength on depth.

31 Summary When exploring a bivariate relationship:
Make and interpret a scatterplot: strength, direction, and form.
Describe x and y: mean and standard deviation, in context.
Find the least-squares regression line and write it in context.
Construct and interpret a residual plot.
Interpret r and r² in context.
Use the LSRL to make predictions.

