2 Essentials: Regression (Predictions based upon the known.) Understand what the regression process does - prediction.Be able to state the steps we use leading up to the decision to conduct regression.Be able to calculate the slope of a line and the y-intercept.Be able to calculate a regression equation and apply it to the prediction of other values. Know that these are estimates, not necessarily the actual values that might occur.Know what the Least Squares Property and Line of Best Fit. Residual – what’s that?
3 Cartesian Coordinate System We’ve seen the positive section of this. 54321-1-2-3-4-5xyCartesian Coordinate SystemWe’ve seen the positive section of this.It also has a negative section.
4 54321-1-2-3-4-5xy.(-1, 4).(-2, 2)Points (x, y) are plotted in the negative section in the same way we plot points in the positive section.Here we see the points (-2, 2) and (-1, 4)-x-y
5 A Linear Equation in One Independent Variable b is they-intercept (the point at which theline intersects the y-axis). It is thevalue of y when x = 0.y is the dependentvariable (also called the responsevariable). Its value depends on thevalue of x.y = mx + bA linear equation will always have a straight line as its graph.x and y are variablem and b are fixed values.Linear equations are many times used to express the relationship between 2 variables.x is theindependent variable(also known as the predictorvariable.)m is the slope of the line.The slope indicates how much the y-valueincreases (or decreases if the slope is negative)when the x-value increases by 1 unit. When m ispositive, the line will have an upward slope. Whenm is negative, the line will have a downward slope.
6 . y=mx+b y=2x+1 y x y 0 1 1 3 x 2 5 -1 -1 -2 -3 -3 -5 1 2 3 4 5 4 3 2 54321-1-2-3-4-5xy.y=mx+by=2x+1x y0 11 32 5-1 -1-2 -3-3 -5
7 Regression Definitions y = b0 + b1x Regression Line Regression EquationGiven a collection of paired data, the regression equationy = b0 + b1x^algebraically describes the relationship between the two variablesRegression Line(line of best fit or least-squares line)The regression line is the graph of the regression equation
8 Always Look at a Scatterplot First You should be able to “see” a straight line being passed through the data points.If data doesn’t exhibit a linear pattern, then what we are about to do does not apply, and therefore should not be used.
9 Regression Line Plotted on Scatterplot We said that a “good” line, would be one that minimizes the distance from each point to the line.In this block, we will find the best line to do this, for a given set of data.
10 The Regression Line is calculated to minimize the distance of the line from the observed values.We said that a “good” line, would be one that minimizes the distance from each point to the line.In this block, we will find the best line to do this, for a given set of data.
11 The Regression Equation x is the independent variable (predictor variable)^y is the dependent variable (response variable)^y = b0 +b1xWhere: b0 = y interceptb1 = slope(recall, y = mx +b )
12 Notation for Regression Equation PopulationParameterSampleStatisticy-intercept of regression equation b0Slope of regression equation b1Equation of the regression line y = 0 + 1x y = b0 + b1x1^
13 Formulas for b0 and b1 Slope: y-intercept: HOWEVER (NEXT SLIDE)y-intercept:NOTE: If you do not find b1 first, then b0 may be determined by:
14 The Regression Line y = b0 +b1x ^ Fits the sample points best. Distances between this line and the sample points are at a minimum.
15 When is it reasonable to do Regression Start by asking the following:Does it make sense to look at the relationship between these two variables?Does a scatter plot present a relationship (either positive or negative)?If yes to both, calculate r (the correlation).Is the correlation statistically significant?Yes - go on to regressionNo – best estimate becomes the mean of the y variableConduct regression analysis (if yes above)Use the regression equation to calculate (estimate) a y-value given a specific x-value.
16 PredictionsIn predicting a value of y based on some given value of x ...1. If there is not a significant linear correlation, the best predicted y-value is y.2. If there is a significant linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation.
17 Predicting the Value of a Variable StartCalculate the value of rand test the hypothesisthat = 0Isthere asignificant linearcorrelation?Use the regressionequation to makepredictions. Substitutethe given value in theregression equation.YesNoGiven any value of onevariable, the best predictedvalue of the other variableis its sample mean.
18 Guidelines for Using The Regression Equation If there is no significant linear correlation, don’t use the regression equation to make predictions.When using the regression equation for predictions, stay within the scope of the available sample data.A regression equation based on old data is not necessarily valid now.Don’t make predictions about a population that is different from the population from which the sample data was drawn.
19 Definitions Marginal Change Outlier Influential Points the amount a variable changes when the other variable changes by exactly one unitOutliera point lying far away from the other data pointsInfluential Pointspoints which strongly affect the graph of the regression lineThe slope b1 in the regression equation represents the marginal change in y that occurs when x changes by one unit.
20 Residuals and the Least-Squares Property DefinitionsResidualFor a sample of paired (x,y) data, the difference (y - y) between an observed sample y-value and the value of y-hat, which is the value of y that is predicted by using the regression equation.Least-Squares PropertyA straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.^
21 Residuals and the Least-Squares Property xyy = 5 + 4x2468101214161820222426283032135•xyResidual = 7Residual = -13Residual = -5Residual = 11^
22 Example : Orion CarsOrion Cars: The age and price for a sample of 11 Orions are noted below. Calculate a correlation coefficient and , if appropriate, a regression equation for the relationship. Determine the value of cars that are 4.5 years and 10 years old.Car Age (yrs.) Price ($100’s)