Published by Abigayle Ryan. Modified over 9 years ago.
1
© The McGraw-Hill Companies, Inc., 2000 11-1 Chapter 11 Correlation and Regression
2
Outline
11-1 Introduction
11-2 Scatter Plots
11-3 Correlation
11-4 Regression
3
Outline
11-5 Coefficient of Determination and Standard Error of Estimate
4
Objectives
Draw a scatter plot for a set of ordered pairs.
Find the correlation coefficient.
Test the hypothesis $H_0: \rho = 0$.
Find the equation of the regression line.
5
Objectives
Find the coefficient of determination.
Find the standard error of estimate.
Find a prediction interval.
6
11-2 Scatter Plots
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.
7
11-2 Scatter Plots - Example
Construct a scatter plot for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects. The data are given on the next slide.
8
11-2 Scatter Plots - Example
[Table: age and systolic blood pressure for the six subjects]
9
11-2 Scatter Plots - Example
[Figure: scatter plot showing a positive relationship]
10
11-2 Scatter Plots - Other Examples
[Figure: scatter plot showing a negative relationship]
11
11-2 Scatter Plots - Other Examples
[Figure: scatter plot showing no relationship]
12
11-3 Correlation Coefficient
The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.
Sample correlation coefficient: $r$.
Population correlation coefficient: $\rho$.
13
11-3 Range of Values for the Correlation Coefficient
The correlation coefficient ranges from $-1$ to $+1$. Values near $-1$ indicate a strong negative relationship, values near $+1$ indicate a strong positive relationship, and values near 0 indicate no linear relationship.
14
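11-3 Formula for the Correlation Coefficient r
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where $n$ is the number of data pairs.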
15
11-3 Correlation Coefficient - Example
Compute the correlation coefficient for the age and blood pressure data (verify).
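The computation can be sketched in Python. The data table did not survive in this transcript, so the six (age, pressure) pairs below are an assumption: they were chosen because they reproduce the r = 0.897 quoted later in the deck. The function name `correlation` is illustrative only.

```python
import math

# ASSUMED data: the slide's table is missing from this transcript.
# These pairs reproduce the r = 0.897 used later in the deck.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

def correlation(xs, ys):
    """Sample correlation coefficient r via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

r = correlation(ages, pressures)
print(round(r, 3))  # 0.897
```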
16
11-3 The Significance of the Correlation Coefficient
The population correlation coefficient, $\rho$, is the correlation computed from all possible pairs of data values (x, y) taken from a population.
17
11-3 The Significance of the Correlation Coefficient
$H_0: \rho = 0$
$H_1: \rho \neq 0$
This tests for a significant correlation between the variables in the population.
18
11-3 Formula for the t Test for the Correlation Coefficient
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
with d.f. $= n - 2$.
19
11-3 Example
Test the significance of the correlation coefficient for the age and blood pressure data. Use $\alpha = 0.05$ and $r = 0.897$.
Step 1: State the hypotheses. $H_0: \rho = 0$ and $H_1: \rho \neq 0$.
20
11-3 Example
Step 2: Find the critical values. Since $\alpha = 0.05$ and there are $6 - 2 = 4$ degrees of freedom, the critical values are $t = +2.776$ and $t = -2.776$.
Step 3: Compute the test value. $t = 4.059$ (verify).
21
11-3 Example
Step 4: Make the decision. Reject the null hypothesis, since the test value falls in the critical region ($4.059 > 2.776$).
Step 5: Summarize the results. There is a significant relationship between the variables of age and blood pressure.
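Steps 3 and 4 of this test can be checked with a short Python sketch. The values of r, n, and the critical value are taken from the example. The function name `t_statistic` is an illustrative choice, and the critical value is hard-coded from the slide, not looked up from a t table.

```python
import math

def t_statistic(r, n):
    """Test value t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 d.f."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.897, 6        # values given in the example
t_critical = 2.776     # two-tailed critical value from the slide (alpha = 0.05, d.f. = 4)

t = t_statistic(r, n)
reject_null = abs(t) > t_critical   # two-tailed decision rule

print(round(t, 3), reject_null)
```

Comparing the absolute value of t with the critical value covers both tails of the rejection region at once.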
22
11-4 Regression
The scatter plot for the age and blood pressure data displays a linear pattern, so the relationship can be modeled with a straight line. This line is called the regression line or the line of best fit. Its equation is $y' = a + bx$.
23
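11-4 Formulas for the Regression Line $y' = a + bx$
$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
where $a$ is the y intercept and $b$ is the slope of the line.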
24
11-4 Example
Find the equation of the regression line for the age and blood pressure data. Substituting into the formulas gives $a = 81.048$ and $b = 0.964$ (verify). Hence, $y' = 81.048 + 0.964x$. Note that $a$ represents the intercept and $b$ the slope of the line.
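The substitution can be verified with a minimal Python sketch of the formulas for a and b. As before, the data table is missing from this transcript, so the six pairs below are an assumption chosen because they reproduce the quoted coefficients. The function name `regression_line` is illustrative.

```python
# ASSUMED data: the slide's table is missing from this transcript.
# These pairs reproduce a = 81.048 and b = 0.964 as quoted on the slide.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2   # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

a, b = regression_line(ages, pressures)
print(round(a, 3), round(b, 3))  # 81.048 0.964
```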
25
11-4 Example
$y' = 81.048 + 0.964x$
26
11-4 Using the Regression Line to Predict
The regression line can be used to predict a value for the dependent variable (y) for a given value of the independent variable (x).
Caution: Use x values within the experimental region when predicting y values.
27
11-4 Example
Use the equation of the regression line to predict the blood pressure for a person who is 50 years old. Since $y' = 81.048 + 0.964x$, then $y' = 81.048 + 0.964(50) = 129.248 \approx 129$. Note that the value of 50 is within the range of x values.
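The prediction and the caution about the experimental region can be combined in a small Python sketch. The coefficients come from the example. The bounds 43 and 70 are an assumption, since the data table with the actual age range is missing from this transcript, and `predict_pressure` is a hypothetical helper name.

```python
# Regression coefficients quoted in the example.
A, B = 81.048, 0.964

# ASSUMED experimental region: the age range of the (missing) data table.
X_MIN, X_MAX = 43, 70

def predict_pressure(age):
    """Predict systolic blood pressure, refusing to extrapolate."""
    if not X_MIN <= age <= X_MAX:
        raise ValueError(f"age {age} is outside the range {X_MIN}-{X_MAX}")
    return A + B * age

predicted = predict_pressure(50)
print(round(predicted, 3))  # 129.248
```

Raising an error for out-of-range ages is one possible design. A warning would also work if extrapolation is sometimes acceptable.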
28
11-5 Coefficient of Determination and Standard Error of Estimate
The coefficient of determination, denoted by $r^2$, is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable.
29
11-5 Coefficient of Determination and Standard Error of Estimate
$r^2$ is the square of the correlation coefficient. The coefficient of nondetermination is $1 - r^2$.
Example: If $r = 0.90$, then $r^2 = 0.81$.
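The arithmetic in this example is short enough to check directly. The Python sketch below uses the r = 0.90 from the example; the variable names are illustrative.

```python
r = 0.90

r_squared = r ** 2                 # coefficient of determination
nondetermination = 1 - r_squared   # coefficient of nondetermination

print(round(r_squared, 2), round(nondetermination, 2))  # 0.81 0.19
```

Here 81% of the variation in y is explained by the regression line, and the remaining 19% is unexplained.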
30
11-5 Coefficient of Determination and Standard Error of Estimate
The standard error of estimate, denoted by $s_{est}$, is the standard deviation of the observed y values about the predicted $y'$ values. The formula is given on the next slide.
31
11-5 Formula for the Standard Error of Estimate
$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$$
or
$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
32
11-5 Standard Error of Estimate - Example
From the regression equation $y' = 55.57 + 8.13x$ and $n = 6$, find $s_{est}$. Here, $a = 55.57$, $b = 8.13$, and $n = 6$. Substituting into the formula gives $s_{est} = 6.48$ (verify).
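A Python sketch of both forms of the formula follows. It uses the copy machine data listed later in the deck (age in years, monthly maintenance cost in dollars) and the rounded coefficients from the example; the variable names are illustrative. With rounded coefficients the two forms do not agree exactly, because the shortcut form is sensitive to rounding in a and b.

```python
import math

# Copy machine data from the prediction interval example.
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]

a, b = 55.57, 8.13   # rounded coefficients quoted in the example
n = len(ages)

# Shortcut form: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))
sum_y = sum(costs)
sum_y2 = sum(y * y for y in costs)
sum_xy = sum(x * y for x, y in zip(ages, costs))
s_est_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Definitional form: sqrt(sum (y - y')^2 / (n - 2))
residuals = [y - (a + b * x) for x, y in zip(ages, costs)]
s_est_residual = math.sqrt(sum(e * e for e in residuals) / (n - 2))

print(round(s_est_shortcut, 2))   # 6.48, matching the example
print(round(s_est_residual, 2))   # slightly larger, due to rounding in a and b
```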
33
11-5 Prediction Interval
A prediction interval is an interval constructed about a predicted y value, $y'$, for a specified x value.
34
11-5 Prediction Interval
For a given value of x, we can state with $(1 - \alpha)100\%$ confidence that the interval will contain the actual y value that corresponds to that value of x.
35
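11-5 Formula for the Prediction Interval about a Value $y'$
$$y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - \left(\sum x\right)^2}} < y < y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - \left(\sum x\right)^2}}$$
with d.f. $= n - 2$.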
36
11-5 Prediction Interval - Example
A researcher collects the data shown on the next slide and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is $y' = 55.57 + 8.13x$. Find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.
37
11-5 Prediction Interval - Example
Machine   Age x (years)   Monthly cost y ($)
A         1               62
B         2               78
C         3               70
D         4               90
E         4               93
F         6               103
38
11-5 Prediction Interval - Example
Step 1: Find $\sum x$, $\sum x^2$, and $\bar{X}$. $\sum x = 20$, $\sum x^2 = 82$, $\bar{X} = 20/6 \approx 3.33$.
Step 2: Find $y'$ for $x = 3$. $y' = 55.57 + 8.13(3) = 79.96$.
Step 3: Find $s_{est}$. $s_{est} = 6.48$, as shown in the previous example.
39
11-5 Prediction Interval - Example
Step 4: Substitute in the formula and solve. $t_{\alpha/2} = 2.776$ with d.f. $= 6 - 2 = 4$ for 95% confidence, which gives $60.53 < y < 99.39$ (verify). Hence, one can be 95% confident that the interval $60.53 < y < 99.39$ contains the actual value of y.
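The four steps can be sketched in Python as follows. The inputs (predicted value, standard error, critical value, and machine ages) are the ones given in the example. The function name `prediction_interval` is illustrative, and the critical value is hard-coded from the slide, not computed. Because the code carries full precision through the square root, its endpoints may differ from the slide's by a few hundredths.

```python
import math

ages = [1, 2, 3, 4, 4, 6]   # machine ages (x values) from the data table

def prediction_interval(x, y_pred, s_est, t_crit, xs):
    """Prediction interval about y_pred for a single new observation at x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(v * v for v in xs)
    x_bar = sum_x / n
    spread = 1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    margin = t_crit * s_est * math.sqrt(spread)
    return y_pred - margin, y_pred + margin

# Values from the example: y' = 79.96, s_est = 6.48, t = 2.776 (d.f. = 4).
low, high = prediction_interval(3, 79.96, 6.48, 2.776, ages)
print(round(low, 2), round(high, 2))
```

The interval is narrowest at the mean age and widens as x moves away from it, which is what the $(x - \bar{X})^2$ term encodes.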