
1 Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis

2 Copyright © Cengage Learning. All rights reserved. 13.5 Confidence Intervals for Regression

3 Confidence Intervals for Regression The confidence interval for μ_Y|x₀ and the prediction interval for y_x₀ are constructed in a similar fashion, with ŷ replacing x̄ as our point estimate. If we were to randomly select several samples from the population, construct the line of best fit for each sample, calculate ŷ for a given x using each regression line, and plot the various ŷ values (they would vary because each sample would yield a slightly different regression line), we would find that the ŷ values form a normal distribution.

4 Confidence Intervals for Regression That is, the sampling distribution of ŷ is normal, just as the sampling distribution of x̄ is normal. What about the appropriate standard deviation of ŷ? The standard deviation in both cases (estimating the mean μ_Y|x₀ and predicting an individual y_x₀) is calculated by multiplying the square root of the variance of the error by an appropriate correction factor. We know that the variance of the error, s_e², is calculated by means of formula (13.8).
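Formula (13.8) itself is not reproduced on this slide. The short Python sketch below shows how s_e could be computed from a sample, assuming the residual-based definition s_e² = Σ(y − ŷ)²/(n − 2); the data arrays are hypothetical placeholders, not the textbook's sample.

```python
# Sketch: estimate the standard deviation of error, s_e, for a line of best fit.
# Assumes s_e^2 = sum((y - y_hat)^2) / (n - 2); the data below are hypothetical.
import numpy as np

x = np.array([3.0, 5.0, 7.0, 10.0, 12.0, 15.0, 18.0, 20.0])    # e.g., miles
y = np.array([9.0, 12.0, 18.0, 22.0, 27.0, 31.0, 38.0, 41.0])  # e.g., minutes

n = len(x)
b1, b0 = np.polyfit(x, y, 1)                 # slope and intercept of the best-fit line
y_hat = b0 + b1 * x                          # predicted y at each observed x
se2 = np.sum((y - y_hat) ** 2) / (n - 2)     # variance of the error, s_e^2
se = np.sqrt(se2)                            # standard deviation of the error, s_e
print(f"s_e^2 = {se2:.2f}, s_e = {se:.2f}")
```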

5 Confidence Intervals for Regression Before we look at the correction factors for the two cases, let's see why they are necessary. We know that the line of best fit passes through the point (x̄, ȳ), the centroid. If we draw lines with slopes equal to the extremes of the confidence interval for the slope found earlier, 1.27 to 2.51, through the centroid [which is (12.3, 26.9)] on the scatter diagram, we will see that the value for ŷ fluctuates considerably for different values of x (Figure 13.11). FIGURE 13.11 Lines Representing the Confidence Interval for Slope

6 Confidence Intervals for Regression Therefore, we should suspect a need for a wider confidence interval as we select values of x that are farther away from x̄. Hence we need a correction factor to adjust for the distance between x₀ and x̄. This factor must also adjust for the variation of the y values about ŷ. First, let's estimate the mean value of y at a given value of x, μ_Y|x₀. The confidence interval formula is:

ŷ ± t(n − 2, α/2) · s_e · √(1/n + (x₀ − x̄)²/Σ(x − x̄)²)    (13.16)

7 Confidence Intervals for Regression Note: The numerator of the second term under the radical sign is the square of the distance of x₀ from x̄. The denominator is closely related to the variance of x and has a "standardizing effect" on this term. Formula (13.16) can be modified for greater ease of calculation. Here is the new form:

ŷ ± t(n − 2, α/2) · s_e · √(1/n + (x₀ − x̄)²/(Σx² − (Σx)²/n))    (13.17)
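As an illustration of formulas (13.16) and (13.17), here is a minimal Python sketch (not from the text) that builds the confidence interval for μ_Y|x₀ from raw data. The function name mean_response_ci and its arguments are illustrative, and scipy's t.ppf stands in for the Table 6 lookup; the sketch also checks that the two forms of the term under the radical agree.

```python
# Sketch: 1 - alpha confidence interval for the mean value of y at a given x0,
# following formulas (13.16)/(13.17).  Names and data handling are illustrative.
import numpy as np
from scipy.stats import t

def mean_response_ci(x, y, x0, alpha=0.05):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)                     # line of best fit
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    x_bar = x.mean()
    # Term under the radical, formula (13.16):
    rad_16 = 1/n + (x0 - x_bar) ** 2 / np.sum((x - x_bar) ** 2)
    # Same term, computational form (13.17):
    rad_17 = 1/n + (x0 - x_bar) ** 2 / (np.sum(x**2) - np.sum(x) ** 2 / n)
    assert np.isclose(rad_16, rad_17)                # the two forms agree
    y0_hat = b0 + b1 * x0                            # point estimate of mu_Y|x0
    E = t.ppf(1 - alpha/2, n - 2) * s_e * np.sqrt(rad_17)   # maximum error
    return y0_hat - E, y0_hat + E
```

Called with the 15 ordered pairs from Example 5 in Section 13.3 and x0 = 7, a function like this should reproduce the interval worked out in Example 10 below.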

8 Confidence Intervals for Regression Let's compare formula (13.16) with formula (9.1), the confidence interval for a mean, x̄ ± t(df, α/2) · (s/√n): ŷ replaces x̄, and

9 Confidence Intervals for Regression s_e · √(1/n + (x₀ − x̄)²/Σ(x − x̄)²), the estimated standard deviation of ŷ in estimating μ_Y|x₀, replaces s/√n, the standard deviation of x̄. The degrees of freedom are now n − 2 instead of n − 1 as before.

10 Example 10 – Constructing a Confidence Interval for μ_Y|x₀ Construct a 95% confidence interval for the mean travel time for the co-workers who travel 7 miles to work (refer to Example 5 in Section 13.3). Solution: Step 1 Parameter of interest: μ_Y|x=7, the mean travel time for co-workers who travel 7 miles to work. Step 2 a. Assumptions: The ordered pairs form a random sample, and we will assume that the y values (minutes) at each x (miles) have a normal distribution. cont'd

11 Example 10 – Solution b. Probability distribution and formula: Student's t-distribution and formula (13.17). c. Level of confidence: 1 − α = 0.95. Step 3 Sample information: n = 15 and s_e² = 29.17 (found in Example 5 in Section 13.3); therefore, s_e = √29.17 = 5.40. cont'd

12 Example 10 – Solution ŷ = 3.64 + 1.89x; at x₀ = 7, ŷ = 3.64 + 1.89(7) = 16.87. Step 4 a. Confidence coefficient: t(13, 0.025) = 2.16 (from Table 6 in Appendix B). b. Maximum error of estimate: Using formula (13.17) with the sample sums from Example 5, we have

E = t(13, 0.025) · s_e · √(1/n + (x₀ − x̄)²/(Σx² − (Σx)²/n)) = 2.16(5.40)(0.380) ≈ 4.43 cont'd

13 Example 10 – Solution c. Lower and upper confidence limits: ŷ − E = 16.87 − 4.43 = 12.44 and ŷ + E = 16.87 + 4.43 = 21.30. Thus, 12.44 to 21.30 is the 95% confidence interval for μ_Y|x=7. That is, with 95% confidence, the mean travel time for commuters who travel 7 miles to work is between 12.44 minutes (12 min, 26 sec) and 21.30 minutes (21 min, 18 sec). cont'd
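As a quick arithmetic check of the solution above (a sketch using only the values reported on these slides; the raw sample data are not repeated here), scipy can supply the confidence coefficient in place of Table 6:

```python
# Verify Example 10's arithmetic from the reported summary values.
from scipy.stats import t

y0_hat = 3.64 + 1.89 * 7                        # point estimate at x0 = 7 -> 16.87
t_crit = t.ppf(0.975, df=13)                    # confidence coefficient -> about 2.16
E = 4.43                                        # maximum error from Step 4b
print(round(y0_hat, 2), round(t_crit, 2))       # 16.87 2.16
print(round(y0_hat - E, 2), round(y0_hat + E, 2))   # 12.44 21.3
```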

14 Example 10 – Solution This confidence interval is shown in Figure 13.12 by the dark red vertical line. Figure 13.12 Confidence Belts for μ_Y|x₀ cont'd

15 Example 10 – Solution The confidence belt showing the upper and lower boundaries of all intervals at 95% confidence is also shown in red. Notice that the boundary lines for x values far away from x̄ become close to the two lines that represent the equations with slopes equal to the extreme values of the 95% confidence interval for the slope (see Figure 13.12). cont'd

16 Confidence Intervals for Regression The formula for the prediction interval for the value of a single randomly selected y is:

ŷ ± t(n − 2, α/2) · s_e · √(1 + 1/n + (x₀ − x̄)²/(Σx² − (Σx)²/n))
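A matching Python sketch for this prediction interval is shown below (again illustrative names, not the text's notation). The only change from the mean-response interval is the extra 1 added under the radical, which makes the prediction interval wider than the confidence interval for the mean at the same x₀.

```python
# Sketch: prediction interval for a single randomly selected y at x0.
import numpy as np
from scipy.stats import t

def prediction_interval(x, y, x0, alpha=0.05):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)                     # line of best fit
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    # "1 +" is the only difference from the mean-response radical in (13.16)/(13.17)
    radical = 1 + 1/n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
    E = t.ppf(1 - alpha/2, n - 2) * s_e * np.sqrt(radical)
    y0_hat = b0 + b1 * x0
    return y0_hat - E, y0_hat + E
```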

17 Confidence Intervals for Regression There are three basic precautions that you need to be aware of as you work with regression analysis: 1. Remember that the regression equation is meaningful only in the domain of the x-variable studied. Estimation outside this domain is extremely dangerous; it requires that we know or assume that the relationship between x and y remains the same outside the domain of the sample data. However, although projections outside the interval may be somewhat dangerous, they may be the best predictors available.

18 Confidence Intervals for Regression 2. Don't get caught by the common fallacy of applying the regression results inappropriately. Basically, the results of one sample should not be used to make inferences about a population other than the one from which the sample was drawn.

19 Confidence Intervals for Regression 3. Don't jump to the conclusion that the results of the regression prove that x causes y to change. (This is perhaps the most common fallacy.) Regressions measure only movement between x and y; they never prove causation. The most common difficulty in this regard occurs because of what is called the missing-variable, or third-variable, effect. That is, we observe a relationship between x and y because a third variable, one that is not in the regression, affects both x and y.

