Correlation and Regression

1 Correlation and Regression
Chapter 11: Correlation and Regression

2 Outline
11-1 Introduction
11-2 Scatter Plots
11-3 Correlation
11-4 Regression

3 Outline
11-5 Coefficient of Determination and Standard Error of Estimate

4 Objectives
Draw a scatter plot for a set of ordered pairs.
Find the correlation coefficient.
Test the hypothesis H0: ρ = 0.
Find the equation of the regression line.

5 Objectives
Find the coefficient of determination.
Find the standard error of estimate.
Find a prediction interval.

6 11-2 Scatter Plots
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.

7 11-2 Scatter Plots - Example
Construct a scatter plot for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects. The data are given on the next slide.

8 11-2 Scatter Plots - Example

9 11-2 Scatter Plots - Example
[Scatter plot: positive relationship]

10 11-2 Scatter Plots - Other Examples
[Scatter plot: negative relationship]

11 11-2 Scatter Plots - Other Examples
[Scatter plot: no relationship]

12 11-3 Correlation Coefficient
The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.
Sample correlation coefficient: r.
Population correlation coefficient: ρ.

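13 11-3 Range of Values for the Correlation Coefficient
The value of r ranges from −1 to +1. Values near −1 indicate a strong negative relationship, values near 0 indicate no linear relationship, and values near +1 indicate a strong positive relationship.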

14 11-3 Formula for the Correlation Coefficient r
$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
where n is the number of data pairs.
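As a rough illustration, the formula for r can be written directly in Python. The age and blood pressure table is not reproduced in this transcript, so the sketch borrows the copy machine data (age x, monthly cost y) from the prediction interval example later in the chapter. The function name `correlation` is an assumption made for illustration only.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Copy machine data from the later example: age in years, monthly cost in dollars
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
r = correlation(ages, costs)
print(round(r, 2))  # 0.93
```

A value this close to +1 suggests a strong positive linear relationship between machine age and maintenance cost.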

15 11-3 Correlation Coefficient - Example (Verify)
Compute the correlation coefficient for the age and blood pressure data.

16 11-3 The Significance of the Correlation Coefficient
The population correlation coefficient, ρ, is the correlation between all possible pairs of data values (x, y) taken from a population.

17 11-3 The Significance of the Correlation Coefficient
H0: ρ = 0
H1: ρ ≠ 0
This tests for a significant correlation between the variables in the population.

18 11-3 Formula for the t Test for the Correlation Coefficient
$t = r\sqrt{\dfrac{n-2}{1-r^2}}$ with d.f. = n − 2

19 11-3 Example
Test the significance of the correlation coefficient for the age and blood pressure data. Use α = 0.05 and the sample r computed for these data.
Step 1: State the hypotheses. H0: ρ = 0, H1: ρ ≠ 0.

20 11-3 Example
Step 2: Find the critical values. Since α = 0.05 and there are 6 − 2 = 4 degrees of freedom, the critical values are t = +2.776 and t = −2.776.
Step 3: Compute the test value. t = 4.059 (verify).

21 11-3 Example
Step 4: Make the decision. Reject the null hypothesis, since the test value falls in the critical region (4.059 > 2.776).
Step 5: Summarize the results. There is a significant relationship between the variables of age and blood pressure.
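The five steps can be sketched in Python as follows. This is a minimal sketch, not the presenter's own computation: the function names are hypothetical, and the critical value 2.776 is simply the one quoted in the example for α = 0.05 and 4 degrees of freedom. Because the sample r is not preserved in this transcript, the sketch backs it out from the reported test value t = 4.059 by inverting the t formula, then recomputes t as a consistency check.

```python
from math import sqrt

def t_statistic(r, n):
    """Test value t = r * sqrt((n - 2) / (1 - r^2)), with d.f. = n - 2."""
    return r * sqrt((n - 2) / (1 - r ** 2))

def r_from_t(t, n):
    """Invert the t formula: recover r from a reported t and sample size n."""
    return t / sqrt(t ** 2 + n - 2)

n = 6
critical = 2.776          # two-tailed critical value quoted for alpha = 0.05, d.f. = 4
r = r_from_t(4.059, n)    # r implied by the reported test value
t = t_statistic(r, n)     # recomputing t should return 4.059
reject_h0 = abs(t) > critical
print(round(r, 3), round(t, 3), reject_h0)  # 0.897 4.059 True
```

Comparing |t| with the critical value covers both tails of the test at once.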

22 11-4 Regression
The scatter plot for the age and blood pressure data displays a linear pattern. We can model this relationship with a straight line. This line is called the regression line, or the line of best fit. The equation of the line is y′ = a + bx.

23 11-4 Formulas for the Regression Line y′ = a + bx
$a = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$
$b = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
where a is the y′ intercept and b is the slope of the line.
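The intercept and slope formulas translate almost line for line into Python. The sketch below assumes the copy machine data from the prediction interval example later in the chapter, for which the slides quote a = 55.57 and b = 8.13. The function name `regression_line` is hypothetical.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2       # shared denominator of a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b

# Copy machine data from the later example: age in years, monthly cost in dollars
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
a, b = regression_line(ages, costs)
print(round(a, 2), round(b, 2))  # 55.57 8.13
```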

24 11-4 Example
Find the equation of the regression line for the age and blood pressure data. Substituting into the formulas gives the values of a and b (verify), and hence the equation y′ = a + bx. Note that a represents the intercept and b the slope of the line.

25 11-4 Example
[Scatter plot of the age and blood pressure data with the regression line y′ = a + bx]

26 11-4 Using the Regression Line to Predict
The regression line can be used to predict a value for the dependent variable (y) for a given value of the independent variable (x).
Caution: Use x values within the experimental region when predicting y values.
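One way to build the caution into code is to refuse predictions outside the observed range of x. The sketch below assumes the copy machine regression equation y′ = 55.57 + 8.13x from the later example, where the observed ages run from 1 to 6 years. The function `predict` and its range arguments are hypothetical.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y' = a + b*x, refusing to extrapolate outside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x = {x} is outside the observed range [{x_min}, {x_max}]")
    return a + b * x

a, b = 55.57, 8.13                 # coefficients quoted in the copy machine example
y_pred = predict(a, b, 3, x_min=1, x_max=6)
print(round(y_pred, 2))            # 79.96
```

Raising an error is a deliberately strict choice; a warning would also be reasonable when mild extrapolation is acceptable.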

27 11-4 Example
Use the equation of the regression line to predict the blood pressure for a person who is 50 years old. Substituting x = 50 into the regression equation y′ = a + bx gives y′ ≈ 129. Note that the value of 50 is within the range of x values.

28 11-5 Coefficient of Determination and Standard Error of Estimate
The coefficient of determination, denoted by r², is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable.

29 11-5 Coefficient of Determination and Standard Error of Estimate
r² is the square of the correlation coefficient. The coefficient of nondetermination is (1 − r²).
Example: If r = 0.90, then r² = 0.81.
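To make "explained variation" concrete, the sketch below computes r² as the ratio of explained variation to total variation about the mean of y. That ratio equals the square of r for a least-squares line. It assumes the copy machine data from the prediction interval example later in the chapter, and the variable names are illustrative.

```python
# Copy machine data from the later example: age in years, monthly cost in dollars
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
n = len(ages)

sum_x, sum_y = sum(ages), sum(costs)
sum_xy = sum(x * y for x, y in zip(ages, costs))
sum_x2 = sum(x * x for x in ages)

# Least-squares intercept and slope, kept unrounded
denom = n * sum_x2 - sum_x ** 2
b = (n * sum_xy - sum_x * sum_y) / denom
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom

mean_y = sum_y / n
predicted = [a + b * x for x in ages]
explained = sum((p - mean_y) ** 2 for p in predicted)   # variation explained by the line
total = sum((y - mean_y) ** 2 for y in costs)           # total variation in y

r_squared = explained / total
print(round(r_squared, 3), round(1 - r_squared, 3))     # 0.857 0.143
```

Under these assumptions, roughly 86% of the variation in monthly cost is explained by machine age, and about 14% is left unexplained.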

30 11-5 Coefficient of Determination and Standard Error of Estimate
The standard error of estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y′ values. The formula is given on the next slide.

31 11-5 Formula for the Standard Error of Estimate
$s_{\text{est}} = \sqrt{\dfrac{\sum (y - y')^2}{n - 2}}$
or
$s_{\text{est}} = \sqrt{\dfrac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$
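Both forms of the formula can be checked numerically. The sketch below assumes the copy machine data and the rounded coefficients a = 55.57 and b = 8.13 quoted in the examples that follow. The helper names are hypothetical.

```python
from math import sqrt

# Copy machine data from the later example: age in years, monthly cost in dollars
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
n = len(ages)
sum_y = sum(costs)
sum_y2 = sum(y * y for y in costs)
sum_xy = sum(x * y for x, y in zip(ages, costs))

def s_est_definition(a, b):
    """Definition form: sqrt of the sum of squared residuals over n - 2."""
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, costs))
    return sqrt(sse / (n - 2))

def s_est_shortcut(a, b):
    """Shortcut form: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))."""
    return sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

a, b = 55.57, 8.13  # rounded coefficients quoted in the example
print(round(s_est_shortcut(a, b), 2))    # 6.48
print(round(s_est_definition(a, b), 2))  # 6.51
```

The two forms are algebraically equal only for the exact least-squares a and b. The shortcut subtracts large, nearly equal quantities, so rounding the coefficients to two decimals shifts its result slightly, which appears to be why it gives 6.48 while the definition form gives about 6.51.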

32 11-5 Standard Error of Estimate - Example
From the regression equation y′ = 55.57 + 8.13x and n = 6, find s_est. Here, a = 55.57, b = 8.13, and n = 6. Substituting into the formula gives s_est = 6.48 (verify).

33 11-5 Prediction Interval
A prediction interval is an interval constructed about a predicted y value, y′, for a specified x value.

34 11-5 Prediction Interval
For a given α value, we can state with (1 − α)100% confidence that the interval will contain the actual mean of the y values that correspond to the given value of x.

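35 11-5 Formula for the Prediction Interval about a Value y′
$y' - t_{\alpha/2}\, s_{\text{est}} \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{\text{est}} \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}}$
with d.f. = n − 2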

36 11-5 Prediction Interval - Example
A researcher collects the data shown on the next slide and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is y′ = 55.57 + 8.13x. Find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.

37 11-5 Prediction Interval - Example
Machine   Age x (years)   Monthly cost y
A         1               $62
B         2               $78
C         3               $70
D         4               $90
E         4               $93
F         6               $103

38 11-5 Prediction Interval - Example
Step 1: Find Σx, Σx², and X̄. Σx = 20, Σx² = 82, X̄ = 20/6 ≈ 3.3.
Step 2: Find y′ for x = 3. y′ = 55.57 + 8.13(3) = 79.96.
Step 3: Find s_est. s_est = 6.48, as shown in the previous example.

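39 11-5 Prediction Interval - Example
Step 4: Substitute in the formula and solve. For 95% confidence, t_{α/2} = 2.776 with d.f. = 6 − 2 = 4. Substituting x = 3, y′ = 79.96, s_est = 6.48, n = 6, Σx = 20, and Σx² = 82 gives the lower and upper limits of the interval (verify).
Hence, one can be 95% confident that this interval contains the actual value of y.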
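Step 4 can be carried out with a short Python function. This is a sketch under stated assumptions: the function name `prediction_interval` is hypothetical, and the inputs a = 55.57, b = 8.13, s_est = 6.48, and t = 2.776 are the rounded values quoted in the example. Because X̄ is kept unrounded here, the endpoints may differ in the last digits from a hand calculation that rounds intermediate values.

```python
from math import sqrt

def prediction_interval(x, xs, a, b, s_est, t_crit):
    """Return (lower, upper) limits of the prediction interval about y' at x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(v * v for v in xs)
    x_bar = sum_x / n
    y_pred = a + b * x
    margin = t_crit * s_est * sqrt(
        1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    return y_pred - margin, y_pred + margin

ages = [1, 2, 3, 4, 4, 6]  # ages of copy machines A to F, in years
low, high = prediction_interval(3, ages, a=55.57, b=8.13, s_est=6.48, t_crit=2.776)
print(round(low, 2), round(high, 2))
```

The interval is centered on the predicted value y′ = 79.96 and extends a little under $20 on either side.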

