Presentation on theme: "Regression: Using Correlation To Make Predictions" (Anthony Greene) — Presentation transcript:

1 Anthony Greene — Regression: Using Correlation To Make Predictions

2 Making a prediction: to obtain the predicted value of y from a known value of x and a known correlation. Note what happens for positive versus negative values of r, and for high, low, and near-zero values of r.

3 Graph of y = 5 − 3x

4 y-Intercept and Slope: for a linear equation y = a + bx relating the variables x and y, the constant a is the y-intercept and the constant b is the slope.

5 Straight-line graphs of three linear equations y = a + bx, where a = y-intercept and b = slope (rise/run)

6 Graphical Interpretation of Slope: the straight-line graph of the linear equation y = a + bx slopes upward if b > 0, slopes downward if b < 0, and is horizontal if b = 0.
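As a minimal sketch of the slope rule, the sign of b alone fixes the line's direction (the downward line here is slide 3's y = 5 − 3x; the other coefficients are illustrative):

```python
def line(a, b):
    """Return the linear function y = a + b*x (a = y-intercept, b = slope)."""
    return lambda x: a + b * x

up = line(5, 3)     # b > 0: slopes upward
down = line(5, -3)  # b < 0: slopes downward (slide 3's line)
flat = line(5, 0)   # b = 0: horizontal

assert up(2) > up(1)        # increasing in x
assert down(2) < down(1)    # decreasing in x
assert flat(2) == flat(1)   # constant
```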

7 Graphical interpretation of slope

8 Four data points

9 Scatter plot

10 Two possible straight-line fits to the data points

11 Determining how well the data points are fit by Line A vs. Line B

12 Least-Squares Criterion: the straight line that best fits a set of data points is the one with the smallest possible sum of squared errors. Recall that the sum of squared errors is the error variance.
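A sketch of the criterion applied to the Line A vs. Line B comparison above. The four data points and the two candidate lines are hypothetical (the slide's actual values are not in the transcript); the point is only that the better fit is the line with the smaller sum of squared errors:

```python
def sse(points, a, b):
    """Sum of squared errors of the line y = a + b*x over (x, y) points."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)

# hypothetical data points and candidate lines (not the slide's values)
pts = [(1, 2), (2, 3), (3, 5), (4, 6)]
line_A = (0.5, 1.4)   # (intercept a, slope b)
line_B = (0.0, 2.0)

# least-squares criterion: pick the line with the smaller SSE
best = min([line_A, line_B], key=lambda ab: sse(pts, *ab))
```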

13 Regression Line and Regression Equation. Regression line: the straight line that best fits a set of data points according to the least-squares criterion. Regression equation: the equation of the regression line.

14 The best-fit line minimizes the squared vertical distances between the actual data and the predicted values

15 Residual, e, of a data point: e = y − ŷ, the vertical distance from the data point to the regression line

16 Notation Used in Regression and Correlation. We define SS_x, SP, and SS_y by
SS_x = Σx² − (Σx)²/n
SP = Σxy − (Σx)(Σy)/n
SS_y = Σy² − (Σy)²/n
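The three sums of squares can be sketched directly from these computational formulas:

```python
def sums_of_squares(xs, ys):
    """SS_x, SP, and SS_y via the computational formulas above."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return ss_x, sp, ss_y

# small made-up data, just to exercise the formulas
ss_x, sp, ss_y = sums_of_squares([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```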

17 Regression Equation: the regression equation for a set of n data points is
Ŷ = a + bx, where b = SP/SS_x and a = (1/n)(Σy − bΣx)
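A minimal sketch of the regression equation, computing b = SP/SS_x and a = (Σy − bΣx)/n from raw data:

```python
def regression_line(xs, ys):
    """Least-squares intercept a and slope b: b = SP/SS_x, a = (Σy - bΣx)/n."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = sp / ss_x
    a = (sum(ys) - b * sum(xs)) / n
    return a, b

# on exactly linear toy data the fit recovers the line y = 1 + 2x
a, b = regression_line([1, 2, 3], [3, 5, 7])
```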

18 The relationship between b and r: b = r√(SS_y/SS_x). That is, the regression slope is just the correlation coefficient rescaled to the units of the variables x and y.
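This relationship can be checked numerically on illustrative data (not from the slides): computing r and b separately from the sums of squares, the identity b = r√(SS_y/SS_x) holds exactly:

```python
import math

# illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
n = len(xs)
ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

r = sp / math.sqrt(ss_x * ss_y)   # correlation coefficient
b = sp / ss_x                     # regression slope
# the slope is r rescaled by the relative spread of y versus x
assert math.isclose(b, r * math.sqrt(ss_y / ss_x))
```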


20 Criterion for Finding a Regression Line: before finding a regression line for a set of data points, draw a scatter diagram. If the data points do not appear to be scattered about a straight line, do not determine a regression line.

21 Linear regression requires linear data: (a) data points scattered about a curve; (b) an inappropriate straight-line fit to those data. Higher-order regression equations exist but are beyond the scope of this course.

22 Uniform Variance (example: math proficiency by grade)

23 Assumptions for Regression Inferences

24 Table for obtaining the three sums of squares for the used-car data

25 Regression line and data points for the used-car data. What is a fair asking price for a 2.5-year-old car? Since the price unit is $100s, the best prediction is $17,271.

26 Extrapolation in the used-car example

27 Sums of Squares in Regression. Total sum of squares, SST: the variation in the observed values of the response variable, SST = Σ(y − ȳ)². Regression sum of squares, SSR: the variation in the observed values of the response variable that is explained by the regression, SSR = Σ(ŷ − ȳ)². Error sum of squares, SSE: the variation in the observed values of the response variable that is not explained by the regression, SSE = Σ(y − ŷ)².

28 Regression Identity: the total sum of squares equals the regression sum of squares plus the error sum of squares. In symbols, SST = SSR + SSE.
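The identity can be verified numerically on toy data (not the used-car numbers): fit the least-squares line, then compute the three sums of squares and check that they add up:

```python
# toy data (not the used-car numbers): verify SST = SSR + SSE
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
n = len(xs)
ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
b = sp / ss_x
a = (sum(ys) - b * sum(xs)) / n

y_bar = sum(ys) / n
y_hat = [a + b * x for x in xs]
sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
ssr = sum((p - y_bar) ** 2 for p in y_hat)          # explained by the regression
sse = sum((y - p) ** 2 for y, p in zip(ys, y_hat))  # unexplained (error)
assert abs(sst - (ssr + sse)) < 1e-9
```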

29 Graphical portrayal of regression for the used-car data: ŷ = a + bx

30 What sort of things could regression be used for? In any instance where a known correlation exists, regression can be used to predict a new score. Examples: 1. If you knew there was a past correlation between the amount of study time and the grade on an exam, you could make a good prediction about the grade before the exam happened. 2. If you knew that certain features of a stock correlate with its price, you could use regression to predict the price before it happens.

31 Regression Example: Low Correlation. Find the regression equation for predicting height based on knowledge of weight. The existing data are for 10 male statistics students.

32 [Data table: columns X (weight) and Y (height) for the 10 students — values shown on the slide]

33 [Table extended with XY, X², and Y² columns]

34 [Table with column sums Σ added]

35 Computing the regression (n = 10), from the column totals
Σx = 2,082  Σy = 721  Σxy = 151,325  Σx² = 465,844  Σy² = 52,147:
SS_x = Σx² − (Σx)²/n = 465,844 − 433,472.4 = 32,371.6
SP = Σxy − (Σx)(Σy)/n = 151,325 − 150,112.2 = 1,212.8
b = SP/SS_x = 1,212.8/32,371.6 ≈ 0.037
a = (1/n)(Σy − bΣx) = (721 − 78.0)/10 ≈ 64.3
So Ŷ = 0.037x + 64.3
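The same computation sketched from the slide's column totals (the raw 10 data pairs are not in the transcript, but the coefficients follow from the sums alone; exact values here, before any rounding):

```python
# regression coefficients from the slide's column totals (X = weight, Y = height, n = 10)
n = 10
sum_x, sum_y = 2082, 721
sum_xy, sum_x2 = 151325, 465844

ss_x = sum_x2 - sum_x ** 2 / n      # 465,844 - 433,472.4 = 32,371.6
sp = sum_xy - sum_x * sum_y / n     # 151,325 - 150,112.2 = 1,212.8
b = sp / ss_x                       # ≈ 0.037
a = (sum_y - b * sum_x) / n         # ≈ 64.3
```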

36 Ŷ = 0.037x + 64.3

37 Regression Example: High Correlation. Find the regression equation for predicting the probability of a teenage suicide attempt based on weekly heroin use.

38 Data (n = 21):
X    Y      XY     X²   Y²
1    0.2    0.2    1    0.04
1    0.31   0.31   1    0.0961
1    0.18   0.18   1    0.0324
2    0.27   0.54   4    0.0729
2    0.38   0.76   4    0.1444
2    0.46   0.92   4    0.2116
3    0.9    2.7    9    0.81
3    0.58   1.74   9    0.3364
3    0.45   1.35   9    0.2025
4    0.84   3.36   16   0.7056
4    0.74   2.96   16   0.5476
4    0.68   2.72   16   0.4624
5    0.85   4.25   25   0.7225
5    0.78   3.9    25   0.6084
5    0.73   3.65   25   0.5329
6    0.88   5.28   36   0.7744
6    0.82   4.92   36   0.6724
6    0.78   4.68   36   0.6084
7    0.92   6.44   49   0.8464
7    0.85   5.95   49   0.7225
7    0.91   6.37   49   0.8281
Σ    84    13.51  63.18  420  9.9779


41 Computing the regression (n = 21):
SS_x = Σx² − (Σx)²/n = 420 − 336 = 84
SP = Σxy − (Σx)(Σy)/n = 63.18 − 54.04 = 9.14
b = SP/SS_x = 9.14/84 = 0.109
a = (1/n)(Σy − bΣx) = (1/21)(13.51 − 9.156) = 0.207
So Ŷ = 0.109x + 0.207
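The computation above can be reproduced from the 21 (x, y) pairs in the table on slide 38:

```python
# the 21 (x, y) pairs from the table on slide 38
xs = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7]
ys = [0.20, 0.31, 0.18, 0.27, 0.38, 0.46, 0.90, 0.58, 0.45,
      0.84, 0.74, 0.68, 0.85, 0.78, 0.73, 0.88, 0.82, 0.78,
      0.92, 0.85, 0.91]
n = len(xs)
ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n                 # 420 - 336 = 84
sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n  # ≈ 9.14
b = sp / ss_x                                                    # ≈ 0.109
a = (sum(ys) - b * sum(xs)) / n   # ≈ 0.208 exactly; 0.207 with the slide's rounding of b
```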

42 Why Is It Called Regression? For low correlations, the predicted value is close to the mean. For zero correlation, the prediction is the mean. Only for perfect correlations (r² = 1.0) do the predicted scores show as much variation as the actual scores. Since perfect correlations are rare, we say that the predicted scores show regression toward the mean.
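A sketch of this effect in standardized units. Since b = r·s_y/s_x, the prediction in standard-score form is ẑ_y = r·z_x, so the predicted score is never more extreme than the input score unless r = 1:

```python
# standardized prediction: z_y_hat = r * z_x (follows from b = r * s_y / s_x),
# so predictions are pulled toward the mean whenever |r| < 1
def z_prediction(r, z_x):
    return r * z_x

z_x = 2.0                        # a score 2 SDs above the mean of x
for r in (0.0, 0.3, 0.8, 1.0):
    z_hat = z_prediction(r, z_x)
    assert abs(z_hat) <= abs(z_x)   # never more extreme than the input
# r = 0 predicts the mean itself; only r = 1 preserves the full deviation
```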

