Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.

1 Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith Gloucester County College Chapter Ten Regression and Correlation

2 Scatter Diagram: a plot of paired data to determine or show a relationship between two variables

3 Paired Data

5 Scatter Diagram

6 Linear Correlation: The general trend of the points seems to follow a straight line segment.

7 Linear Correlation

8 Non-Linear Correlation

9 No Linear Correlation

10 High Linear Correlation: Points lie close to a straight line.

11 High Linear Correlation

12 Moderate Linear Correlation

13 Low Linear Correlation

14 Perfect Linear Correlation

15 Questions Arising: Can we find a relationship between x and y? How strong is the relationship?

16 When there appears to be a linear relationship between x and y, attempt to “fit” a line to the scatter diagram.

17 When using x values to predict y values: call x the explanatory variable; call y the response variable.

18 The Least Squares Line: The sum of the squares of the vertical distances from the points to the line is made as small as possible.

19 Least Squares Criterion: The sum of the squares of the vertical distances from the points to the line is made as small as possible.

20 Equation of the Least Squares Line: y = a + bx, where a = the y-intercept and b = the slope

21 Finding the slope

22 Finding the y-intercept

23 Find the Least Squares Line

24 Finding the slope

25 Finding the y-intercept

26 The equation of the least squares line is: y = a + bx, which here gives y = 2.8 + 1.7x

27 The following point will always be on the least squares line: (x̄, ȳ), the point whose coordinates are the average of the x values and the average of the y values.
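
The slope and intercept formulas on the slides above were images and did not survive in this transcript. As a minimal sketch, assuming the standard computing formulas b = SS_xy / SS_x and a = ȳ − b·x̄, the fit could be written as below. The function name `least_squares_line` and the five data pairs are made up for illustration; they are not the trip data used on the slides.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # SS_x = sum(x^2) - (sum x)^2 / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    # SS_xy = sum(xy) - (sum x)(sum y) / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = ss_xy / ss_x                  # slope
    a = sum_y / n - b * (sum_x / n)   # y-intercept: y_bar - b * x_bar
    return a, b


# Hypothetical paired data (an assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

print(f"y = {a:.1f} + {b:.1f}x")
# The point of means (x_bar, y_bar) should satisfy the fitted equation.
print(abs((a + b * x_bar) - y_bar) < 1e-9)
```

The last line checks numerically that the point (x̄, ȳ) lies on the fitted line, which is why it is a convenient first point when graphing.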

28 Graphing the least squares line: Using two values in the range of x, compute two corresponding y values. Plot these points. Join the points with a straight line.

29 Graphing y = 2.8 + 1.7x: Use (8.3, 16.9), the point (average of the x’s, average of the y’s). Try x = 5. Compute y: y = 2.8 + 1.7(5) = 11.3

30 Sketching the Line Using the Points (8.3, 16.9) and (5, 11.3)

31 Using the Equation of the Least Squares Line to Make Predictions: Choose a value for x (within the range of x values). Substitute the selected x in the least squares equation. Determine the corresponding value of y.

32 Predict the time to make a trip of 14 miles. Equation of least squares line: y = 2.8 + 1.7x. Substitute x = 14: y = 2.8 + 1.7(14) = 26.6. According to the least squares equation, a trip of 14 miles would take 26.6 minutes.
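
The substitution step can be sketched in a few lines. The coefficients 2.8 and 1.7 are the ones quoted on the slides; the function name `predict_minutes` is a hypothetical choice for illustration.

```python
def predict_minutes(miles, a=2.8, b=1.7):
    """Predicted trip time in minutes from the line y = a + b*x."""
    return a + b * miles


# Prediction for a 14-mile trip, rounded to one decimal place.
print(round(predict_minutes(14), 1))
# The second plotting point used when sketching the line (x = 5).
print(round(predict_minutes(5), 1))
```

Rounding to one decimal place matches the precision of the slide values and hides floating-point noise in the sum.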

33 Interpolation: Using the least squares line to predict y values for x values that fall between the points in the scatter diagram

34 Extrapolation: Prediction beyond the range of observations

35 Standard Error of Estimate: A method for measuring the spread of a set of points about the least squares line

36 The Residual: y – y_p = difference between the y value of a data point on the scatter diagram and the y value of the point on the least-squares line with the same x value

37 The Residual: difference between the y value of a data point and the y value of the point on the line with the same x value

38 Standard Error of Estimate

39 Standard Error of Estimate: The number of points must be greater than or equal to three. If n = 2, the line is a perfect fit and there is no need to compute S_e. The nearer the points are to the least squares line, the smaller S_e will be. The larger S_e is, the more scattered the points are.

40 Calculating Formula for S_e

41 Calculating Formula for S_e: Use caution in rounding. Uses quantities also used to determine the least squares line.
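
The formulas for S_e were images and are missing from this transcript. The sketch below assumes the usual definition, S_e = sqrt(Σ(y − y_p)² / (n − 2)), and the commonly stated computing form, S_e = sqrt((Σy² − aΣy − bΣxy) / (n − 2)), and shows that the two agree. The function name and the data are hypothetical, not taken from the slides.

```python
import math


def standard_error_of_estimate(xs, ys):
    """Return S_e computed two ways: from residuals and from sums."""
    n = len(xs)
    if n < 3:
        raise ValueError("S_e needs at least three data points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    b = (sum_xy - sum_x * sum_y / n) / ss_x
    a = sum_y / n - b * sum_x / n

    # Definition: square the residuals y - y_p and divide by n - 2.
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_definition = math.sqrt(residual_ss / (n - 2))

    # Computing form: reuses the sums already needed for the line.
    sum_y2 = sum(y * y for y in ys)
    se_computing = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))
    return se_definition, se_computing


# Hypothetical paired data (an assumption, not from the slides).
se_def, se_comp = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(se_def, 4), round(se_comp, 4))
```

The computing form subtracts large, nearly equal sums, which is one reason to carry extra digits and round only at the end.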

42 Find S_e

43 Finding the Standard Error of Estimate

44 Finding S_e

45 Finding S_e

46 Finding S_e

47 Confidence Interval for y: The least squares line gives a predicted y value, y_p, for a given x. The least squares line estimates the true y value. The true y value is given by y = α + βx + ε, where α = y-intercept, β = slope, ε = random error.

48 For a Specific x, a c Confidence Interval for y

49 For a Specific x, a c Confidence Interval for y

50 For a Specific x, a c Confidence Interval for y

51 For a Specific x, a c Confidence Interval for y

52 For a Specific x, a c Confidence Interval for y

53 Find a 95% confidence interval for the number of minutes for a trip of eight miles

54 The least squares line and prediction, y_p: y = a + bx, y = 2.8 + 1.7x. For x = 8, y_p = 2.8 + 1.7(8) = 16.4

55 For x = 8, a c Confidence Interval for y

56 Finding E

57 Finding E

58 For x = 8, a 95% Confidence Interval for y

59 For x = 8 miles we are 95% sure that the trip will take between 11.3 and 21.5 minutes.
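
The formula for E was shown as an image and is not in this transcript. As a sketch, assuming the standard form E = t_c · S_e · sqrt(1 + 1/n + (x − x̄)² / SS_x), the interval can be recomputed from figures quoted elsewhere in the slides for this data set: S_e ≈ 1.85, SS_x ≈ 115.4, x̄ = 8.3, n = 7, and t = 2.571 for d.f. = 5 at 95%. The function name `interval_for_y` is hypothetical.

```python
import math


def interval_for_y(x, a, b, s_e, n, x_bar, ss_x, t_c):
    """Return (low, high) bounds y_p - E and y_p + E for a given x."""
    y_p = a + b * x
    # Assumed standard form of the maximal error of estimate E.
    e = t_c * s_e * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / ss_x)
    return y_p - e, y_p + e


low, high = interval_for_y(
    x=8, a=2.8, b=1.7, s_e=1.85, n=7, x_bar=8.3, ss_x=115.4, t_c=2.571
)
print(round(low, 1), round(high, 1))
```

Because (x − x̄)² appears under the square root, E grows as x moves away from the mean of the x values.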

60 Confidence Interval for y at a Specific x: The value of E increases as x is chosen further from the mean of the x values. The confidence interval for y becomes wider for values of x further from the mean.

61 Try not to use the least squares line to predict y values for x values beyond the data extremes of the sample x distribution.

62 The Linear Correlation Coefficient, r: A measurement of the strength of the linear association between two variables. Also called the Pearson product-moment correlation coefficient.

63 Positive Linear Correlation: High values of x are paired with high values of y and low values of x are paired with low values of y.

64 Negative Linear Correlation: High values of x are paired with low values of y and low values of x are paired with high values of y.

65 Little or No Linear Correlation: Both high and low values of x are sometimes paired with high values of y and sometimes with low values of y.

66 Positive Correlation (scatter diagram, axes x and y)

67 Negative Correlation (scatter diagram, axes x and y)

68 Little or No Linear Correlation (scatter diagram, axes x and y)

69 What type of correlation is expected? Height and weight; mileage on tires and remaining tread; IQ and height; years of driving experience and insurance rates

70 Calculating the Correlation Coefficient, r

71 Linear correlation coefficient: –1 ≤ r ≤ +1

72 If r = 0, scatter diagram might look like: (scatter diagram, axes x and y)

73 If r = +1, all points lie on the least squares line (scatter diagram, axes x and y)

74 If r = –1, all points lie on the least squares line (scatter diagram, axes x and y)

75 –1 < r < 0 (scatter diagram, axes x and y)

76 0 < r < 1 (scatter diagram, axes x and y)

77 To Compute r: Complete a table with columns listing x, y, x², y², xy. Compute SS_xy, SS_x, and SS_y. Use the formula: r = SS_xy / √(SS_x · SS_y)
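
The table-and-sums procedure can be sketched as follows, assuming the standard formula r = SS_xy / sqrt(SS_x · SS_y). The function name `correlation_coefficient` and the data are hypothetical; the slide's own data table is not in this transcript.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson r from the column sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    return ss_xy / math.sqrt(ss_x * ss_y)


# Hypothetical paired data (an assumption, not from the slides).
r = correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))
```

Swapping the roles of x and y leaves r unchanged, since SS_xy is symmetric and the denominator is a product.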

78 Find the Correlation Coefficient

79 Calculations:

80 The Correlation Coefficient: r = 0.9753643, so r ≈ 0.98

81 A relationship between correlation coefficient, r, and the slope, b, of the least squares line:

82 A statistic related to r: the coefficient of determination = r²

83 Coefficient of Determination: a measure of the proportion of the variation in y that is explained by the regression line using x as the predicting variable

84 Formula for Coefficient of Determination

85 Interpretation of r²: If r = 0.9753643, then what percent of the variation in minutes (y) is explained by the linear relationship with x, miles traveled? What percent is explained by other causes?

86 Interpretation of r²: If r = 0.9753643, then r² = 0.9513355. Approximately 95 percent of the variation in minutes (y) is explained by the linear relationship with x, miles traveled. Less than five percent is explained by other causes.

87 Warning: The correlation coefficient (r) measures the strength of the linear relationship between two variables. Just because two variables are related does not imply that there is a cause-and-effect relationship between them.

88 Testing the Correlation Coefficient: Determining whether a value of the sample correlation coefficient, r, is far enough from zero to indicate correlation in the population.

89 The Population Correlation Coefficient: ρ = Greek letter “rho”

90 Hypotheses to Test Rho: Assume that both variables x and y are normally distributed. To test if the (x, y) values are correlated in the population, set up the null hypothesis that they are not correlated: H_0: x and y are not correlated, so ρ = 0.

91 H_0: ρ = 0. If you believe ρ is positive, use a right-tailed test. H_1: ρ > 0

92 H_0: ρ = 0. If you believe ρ is negative, use a left-tailed test. H_1: ρ < 0

93 H_0: ρ = 0. If you believe ρ is not equal to zero, use a two-tailed test. H_1: ρ ≠ 0

94 Convert r to a Student’s t Distribution

95 A researcher wishes to determine (at 5% level of significance) if there is a positive correlation between x, the number of hours per week a child watches television, and y, the cholesterol measurement for the child. Assume that both x and y are normally distributed.

96 Correlation Between Hours of Television and Cholesterol: Suppose that a sample of x and y values for 25 children showed the correlation coefficient, r, to be 0.42. Use a right-tailed test. The null hypothesis: H_0: ρ = 0. The alternate hypothesis: H_1: ρ > 0. α = 0.05

97 Convert the sample statistic r = 0.42 to t using n = 25

98 Find critical t value for right-tailed test with α = 0.05: Use Table 6. d.f. = 25 – 2 = 23. t = 1.714. Since 2.22 > 1.714, reject the null hypothesis. Conclude that there is a positive correlation between the variables.
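
The conversion formula itself was an image and is missing here. A minimal sketch, assuming the standard conversion t = r·sqrt(n − 2) / sqrt(1 − r²) with d.f. = n − 2, reproduces the t value of 2.22 used in the comparison. The critical value 1.714 is copied from the slide rather than computed, and the function name `r_to_t` is hypothetical.

```python
import math


def r_to_t(r, n):
    """Convert a sample correlation coefficient to a t value (d.f. = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t = r_to_t(0.42, 25)
critical_t = 1.714  # right-tailed, alpha = 0.05, d.f. = 23 (from the slide)

print(round(t, 2))
print("reject H0" if t > critical_t else "do not reject H0")
```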

99 P Value Approach: Use Table 6 in Appendix II, d.f. = 23. Our t value = 2.22 is between 2.069 and 2.500. This gives P between 0.025 and 0.010. Since we would reject H_0 for any α ≥ P, we reject H_0 for α = 0.05. We conclude that there is a positive correlation between the variables.

100 Conclusion: We conclude that there is a positive correlation between the number of hours spent watching television and the cholesterol measurement.

101 Note: Even though a significance test indicates the existence of a correlation between x and y in the population, it does not signify a cause-and-effect relationship.

102 Testing the Slope: β = slope of the population-based least squares line. b = slope of the sample-based least squares line.

103 To test the slope: Use H_0: the population slope equals zero, β = 0. H_1 may be β > 0 or β < 0 or β ≠ 0. Convert b to a Student’s t distribution:

104 Standard Error for b

105 Test the Slope

106 We have: The least squares line: y = 2.8 + 1.7x. Slope = b = 1.7. S_e ≈ 1.85. SS_x ≈ 115.4. We suspect the slope β is positive.

107 Hypothesis Test: H_0: β = 0. H_1: β > 0. Use 1% level of significance. Convert the sample test statistic b = 1.7 to a t value.

108 t value: For d.f. = 7 – 2 = 5 and α′ = 0.01, critical value of t = 3.365. From Table 6, we note that P < 0.005. Since we would reject H_0 for any α ≥ P, we reject H_0 for α = 0.01. We conclude that β is positive.
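
The t value on this slide was an image, so its number is not in the transcript. As a sketch, assuming the usual standard error for b, S_e / sqrt(SS_x), and t = b divided by that standard error, the quoted figures b = 1.7, S_e ≈ 1.85 and SS_x ≈ 115.4 give a t value well above the critical value 3.365. The function name `slope_t_value` is hypothetical.

```python
import math


def slope_t_value(b, s_e, ss_x):
    """t statistic for H0: beta = 0, with d.f. = n - 2."""
    standard_error_b = s_e / math.sqrt(ss_x)  # assumed standard form
    return b / standard_error_b


t = slope_t_value(b=1.7, s_e=1.85, ss_x=115.4)
critical_t = 3.365  # one-tailed, alpha' = 0.01, d.f. = 5 (from the slide)

print(round(t, 2))
print("reject H0" if t > critical_t else "do not reject H0")
```

A t value this far beyond the table's largest entry for d.f. = 5 is consistent with the slide's statement that P < 0.005.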

109 Confidence Intervals for the Slope β: We wish to estimate the slope of the population-based least squares line.

110 Confidence Intervals for the Slope: β = slope of the population-based least squares line. b = slope of the sample-based least squares line.

111 To determine a confidence interval for β: Convert b to a Student’s t distribution:

112 A c Confidence Interval for β

113 b – E < β < b + E

114 Find a 95% Confidence Interval for β

115 We have: The least squares line: y = 2.8 + 1.7x. Slope = b = 1.7. S_e ≈ 1.85. SS_x ≈ 115.4. c = 95% = 0.95. d.f. = n – 2 = 7 – 2 = 5. t_0.95 = 2.571

116 b – E < β < b + E

117 Conclusion: We are 95% confident that the true slope of the regression line is between 1.26 and 2.14.
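
The interval can be checked numerically. This sketch assumes E = t_c · S_e / sqrt(SS_x), the standard form, since the slide's formula image is missing, and uses the figures quoted above: b = 1.7, S_e ≈ 1.85, SS_x ≈ 115.4 and t = 2.571 for d.f. = 5. The function name `slope_confidence_interval` is hypothetical.

```python
import math


def slope_confidence_interval(b, s_e, ss_x, t_c):
    """Return (b - E, b + E) with E = t_c * S_e / sqrt(SS_x)."""
    e = t_c * s_e / math.sqrt(ss_x)
    return b - e, b + e


low, high = slope_confidence_interval(b=1.7, s_e=1.85, ss_x=115.4, t_c=2.571)
print(round(low, 2), round(high, 2))
```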

118 Multiple Regression: More than a single random variable is used in the computation of predictions.

119 Common formula for linear relationships among more than two variables: y = b_0 + b_1 x_1 + b_2 x_2 + … + b_k x_k, where y = response variable; x_1, x_2, …, x_k = explanatory variables, variables on which predictions will be based; b_0, b_1, b_2, …, b_k = coefficients obtained from the least squares criterion
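
One way the coefficients b_0, …, b_k could be obtained is by solving the normal equations (XᵀX)b = Xᵀy, which is what the least squares criterion leads to. The sketch below does this in plain Python with Gaussian elimination; it is an illustration under that assumption, not the method of any program named in the slides. The function names and the data are hypothetical, and the y values are generated from y = 1 + 2x_1 + 3x_2 so the expected coefficients are known.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot slot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def fit_multiple_regression(rows, ys):
    """Return [b_0, b_1, ..., b_k] minimizing the sum of squared errors."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for b_0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: two explanatory variables, noise-free response.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefficients = fit_multiple_regression(rows, ys)
print([round(c, 6) for c in coefficients])
```

With real, noisy data the recovered coefficients would only approximate the underlying relationship, and dedicated statistical software is the more practical choice.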

120 Regression Model: A collection of random variables with a number of properties

121 Properties of a Regression Model: One variable is identified as the response variable. All other variables are explanatory variables. For any application there will be a collection of numerical values for each variable.

122 Properties of a Regression Model: Using numerical data values and the least squares criterion, the least-squares equation (regression equation) can be constructed. Usually includes a measure of “goodness of fit” of the regression equation to the data values.

123 Properties of a Regression Model: Allows us to supply given values of explanatory variables in order to predict the corresponding value of the response variable. A c% confidence interval can be constructed for the least-squares prediction.

124 “Goodness of Fit” of Least-Squares Regression Equation: May be measured by the coefficient of multiple determination, r²

125 Multiple regression models are analyzed by computer programs such as: ComputerStat, Minitab, Excel

