Slide 1 Copyright © 2004 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-1 Overview Overview 10-2 Correlation 10-3 Regression-3 Regression.

Slides:



Advertisements
Similar presentations
Sections 10-1 and 10-2 Review and Preview and Correlation.
Advertisements

Learning Objectives Copyright © 2002 South-Western/Thomson Learning Data Analysis: Bivariate Correlation and Regression CHAPTER sixteen.
Probabilistic & Statistical Techniques Eng. Tamer Eshtawi First Semester Eng. Tamer Eshtawi First Semester
Correlation and Regression
Chapter 4 The Relation between Two Variables
The Simple Regression Model
SIMPLE LINEAR REGRESSION
Regression Chapter 10 Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania.
SIMPLE LINEAR REGRESSION
Correlation and Regression Analysis
Simple Linear Regression and Correlation
1 Chapter 10 Correlation and Regression We deal with two variables, x and y. Main goal: Investigate how x and y are related, or correlated; how much they.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS.
Correlation & Regression
Correlation and Linear Regression
STATISTICS ELEMENTARY C.M. Pascual
Linear Regression.
Chapter 10 Correlation and Regression
SIMPLE LINEAR REGRESSION
Introduction to Linear Regression and Correlation Analysis
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-3 Regression.
Relationship of two variables
Linear Regression and Correlation
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.
Copyright © 2010, 2007, 2004 Pearson Education, Inc Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by.
Correlation.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Correlation and Regression
Sections 9-1 and 9-2 Overview Correlation. PAIRED DATA Is there a relationship? If so, what is the equation? Use that equation for prediction. In this.
1 Chapter 9. Section 9-1 and 9-2. Triola, Elementary Statistics, Eighth Edition. Copyright Addison Wesley Longman M ARIO F. T RIOLA E IGHTH E DITION.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-1 Review and Preview.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Probabilistic and Statistical Techniques 1 Lecture 24 Eng. Ismail Zakaria El Daour 2010.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
1 Chapter 10 Correlation and Regression 10.2 Correlation 10.3 Regression.
Chapter 10 Correlation and Regression
Correlation & Regression
Examining Relationships in Quantitative Research
Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of.
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Chapter Thirteen Copyright © 2006 John Wiley & Sons, Inc. Bivariate Correlation and Regression.
Chapter 10 Correlation and Regression Lecture 1 Sections: 10.1 – 10.2.
Slide 1 Copyright © 2004 Pearson Education, Inc..
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-2 Correlation 10-3 Regression.
Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.
1 MVS 250: V. Katch S TATISTICS Chapter 5 Correlation/Regression.
Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Lecture Slides Elementary Statistics Tenth Edition and the.
Slide Slide 1 Chapter 10 Correlation and Regression 10-1 Overview 10-2 Correlation 10-3 Regression 10-4 Variation and Prediction Intervals 10-5 Multiple.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Lecture Slides Elementary Statistics Twelfth Edition
Linear Regression Essentials Line Basics y = mx + b vs. Definitions
Review and Preview and Correlation
Correlation and Simple Linear Regression
Elementary Statistics
Correlation and Simple Linear Regression
Lecture Slides Elementary Statistics Thirteenth Edition
Correlation and Regression
Chapter 10 Correlation and Regression
Correlation and Simple Linear Regression
Lecture Slides Elementary Statistics Eleventh Edition
Lecture Slides Elementary Statistics Tenth Edition
Simple Linear Regression and Correlation
Correlation and Regression Lecture 1 Sections: 10.1 – 10.2
Lecture Slides Elementary Statistics Eleventh Edition
Created by Erin Hodgess, Houston, Texas
Chapter 9 Correlation and Regression
Presentation transcript:

Slide 1 Copyright © 2004 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-1 Overview Overview 10-2 Correlation 10-3 Regression-3 Regression 10-4 Variation and Prediction Intervals-4 Variation and Prediction Intervals 10-5 Rank Correlation-5 Rank Correlation

Slide 2 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 10-1 & 10-2 Overview and Correlation and Regression

Slide 3 Copyright © 2004 Pearson Education, Inc. Overview Paired Data  Is there a relationship?  If so, what is the equation?  Use that equation for prediction.

Slide 4 Copyright © 2004 Pearson Education, Inc. Definition  A correlation exists between two variables when one of them is related to the other in some way.

Slide 5 Copyright © 2004 Pearson Education, Inc.  A Scatterplot (or scatter diagram) is a graph in which the paired ( x, y ) sample data are plotted with a horizontal x- axis and a vertical y- axis. Each individual ( x, y ) pair is plotted as a single point. Definition

Slide 6 Copyright © 2004 Pearson Education, Inc. Scatter Diagram of Paired Data

Slide 7 Copyright © 2004 Pearson Education, Inc. Positive Linear Correlation Figure 10-2 Scatter Plots

Slide 8 Copyright © 2004 Pearson Education, Inc. Negative Linear Correlation Figure 10-2 Scatter Plots

Slide 9 Copyright © 2004 Pearson Education, Inc. No Linear Correlation Figure 10-2 Scatter Plots

Slide 10 Copyright © 2004 Pearson Education, Inc. Definition The linear correlation coefficient r measures strength of the linear relationship between paired x and y values in a sample.

Slide 11 Copyright © 2004 Pearson Education, Inc. Assumptions 1. The sample of paired data ( x, y ) is a random sample. 2. The pairs of ( x, y ) data have a bivariate normal distribution.

Slide 12 Copyright © 2004 Pearson Education, Inc. Notation for the Linear Correlation Coefficient n = number of pairs of data presented  denotes the addition of the items indicated.  x denotes the sum of all x- values.  x 2 indicates that each x - value should be squared and then those squares added. (  x ) 2 indicates that the x- values should be added and the total then squared.  xy indicates that each x -value should be first multiplied by its corresponding y- value. After obtaining all such products, find their sum. r represents linear correlation coefficient for a sample  represents linear correlation coefficient for a population

Slide 13 Copyright © 2004 Pearson Education, Inc. n  xy – (  x)(  y) n(  x 2 ) – (  x) 2 n(  y 2 ) – (  y) 2 r =r = Formula 10-1 The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample. Calculators can compute r  (rho) is the linear correlation coefficient for all paired data in the population. Definition

Slide 14 Copyright © 2004 Pearson Education, Inc.  Round to three decimal places so that it can be compared to critical values in Table A-6.  Use calculator or computer if possible. Rounding the Linear Correlation Coefficient r

Slide 15 Copyright © 2004 Pearson Education, Inc Data x y Calculating r

Slide 16 Copyright © 2004 Pearson Education, Inc. Calculating r

Slide 17 Copyright © 2004 Pearson Education, Inc Data x y Calculating r n  xy – (  x)(  y) n(  x 2 ) – (  x) 2 n(  y 2 ) – (  y) 2 r =r =  48  – (10)(20) 4(36) – (10) 2 4(120) – (20) 2 r =r = – r =r = = – 0.135

Slide 18 Copyright © 2004 Pearson Education, Inc. Interpreting the Linear Correlation Coefficient  If the absolute value of r exceeds the value in Table A - 6, conclude that there is a significant linear correlation.  Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.

Slide 19 Copyright © 2004 Pearson Education, Inc. Example: Boats and Manatees Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant linear correlation between the number of registered boats and the number of manatees killed by boats. Using the same procedure previously illustrated, we find that r = Referring to Table A-6, we locate the row for which n =10. Using the critical value for  =5, we have Because r = 0.922, its absolute value exceeds 0.632, so we conclude that there is a significant linear correlation between number of registered boats and number of manatee deaths from boats.

Slide 20 Copyright © 2004 Pearson Education, Inc. Properties of the Linear Correlation Coefficient r 1. –1  r  1 2. Value of r does not change if all values of either variable are converted to a different scale. 3. The r is not affected by the choice of x and y. interchange x and y and the value of r will not change. 4. r measures strength of a linear relationship.

Slide 21 Copyright © 2004 Pearson Education, Inc. Interpreting r: Explained Variation The value of r 2 is the proportion of the variation in y that is explained by the linear relationship between x and y.

Slide 22 Copyright © 2004 Pearson Education, Inc. Using the boat/manatee data in Table 9-1, we have found that the value of the linear correlation coefficient r = What proportion of the variation of the manatee deaths can be explained by the variation in the number of boat registrations? With r = 0.922, we get r 2 = We conclude that (or about 85%) of the variation in manatee deaths can be explained by the linear relationship between the number of boat registrations and the number of manatee deaths from boats. This implies that 15% of the variation of manatee deaths cannot be explained by the number of boat registrations. Example: Boats and Manatees

Slide 23 Copyright © 2004 Pearson Education, Inc. Common Errors Involving Correlation 1. Causation: It is wrong to conclude that correlation implies causality. 2. Averages: Averages suppress individual variation and may inflate the correlation coefficient. 3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.

Slide 24 Copyright © 2004 Pearson Education, Inc. FIGURE 10-3 Scatterplot of Distance above Ground and Time for Object Thrown Upward Common Errors Involving Correlation

Slide 25 Copyright © 2004 Pearson Education, Inc. Formal Hypothesis Test  We wish to determine whether there is a significant linear correlation between two variables.  We present two methods.  Both methods let H 0 :  =  (no significant linear correlation) H 1 :    (significant linear correlation)

Slide 26 Copyright © 2004 Pearson Education, Inc. FIGURE 10-4 Testing for a Linear Correlation

Slide 27 Copyright © 2004 Pearson Education, Inc. Test statistic: Critical values: Use Table A-3 with degrees of freedom = n – 2 1 – r 2 n – 2 r t = Method 1: Test Statistic is t (follows format of earlier chapters)

Slide 28 Copyright © 2004 Pearson Education, Inc.  Test statistic: r  Critical values: Refer to Table A-6 (no degrees of freedom) Method 2: Test Statistic is r (uses fewer calculations)

Slide 29 Copyright © 2004 Pearson Education, Inc. Using the boat/manatee data in Table 9-1, test the claim that there is a linear correlation between the number of registered boats and the number of manatee deaths from boats. Use Method 1. 1 – r 2 n – 2 r t = 1 – – t = = Example: Boats and Manatees

Slide 30 Copyright © 2004 Pearson Education, Inc. Method 1: Test Statistic is t (follows format of earlier chapters) Figure 9-5

Slide 31 Copyright © 2004 Pearson Education, Inc. Using the boat/manatee data in Table 9-1, test the claim that there is a linear correlation between the number of registered boats and the number of manatee deaths from boats. Use Method 2. The test statistic is r = The critical values of r =  are found in Table A-6 with n = 10 and  = Example: Boats and Manatees

Slide 32 Copyright © 2004 Pearson Education, Inc.  Test statistic: r  Critical values: Refer to Table A-6 (10 degrees of freedom) Method 2: Test Statistic is r (uses fewer calculations) Figure 9-6

Slide 33 Copyright © 2004 Pearson Education, Inc. Using the boat/manatee data in Table 9-1, test the claim that there is a linear correlation between the number of registered boats and the number of manatee deaths from boats. Use both (a) Method 1 and (b) Method 2. Using either of the two methods, we find that the absolute value of the test statistic does exceed the critical value (Method 1: > Method 2: > 0.632); that is, the test statistic falls in the critical region. We therefore reject the null hypothesis. There is sufficient evidence to support the claim of a linear correlation between the number of registered boats and the number of manatee deaths from boats. Example: Boats and Manatees

Slide 34 Copyright © 2004 Pearson Education, Inc. Justification for r Formula Figure 10-7 r =r = (n -1 ) S x S y  (x -x) (y -y) Formula 10-1 is developed from (x, y) centroid of sample points

Slide 35 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 10-3 Regression

Slide 36 Copyright © 2004 Pearson Education, Inc. Regression Definition  Regression Equation The regression equation expresses a relationship between x (called the independent variable, predictor variable or explanatory variable, and y (called the dependent variable or response variable. The typical equation of a straight line y = mx + b is expressed in the form y = b 0 + b 1 x, where b 0 is the y - intercept and b 1 is the slope. ^

Slide 37 Copyright © 2004 Pearson Education, Inc. Assumptions 1. We are investigating only linear relationships. 2. For each x- value, y is a random variable having a normal (bell-shaped) distribution. All of these y distributions have the same variance. Also, for a given value of x, the distribution of y- values has a mean that lies on the regression line. (Results are not seriously affected if departures from normal distributions and equal variances are not too extreme.)

Slide 38 Copyright © 2004 Pearson Education, Inc. Regression Definition  Regression Equation Given a collection of paired data, the regression equation  Regression Line The graph of the regression equation is called the regression line (or line of best fit, or least squares line). y = b 0 + b 1 x ^ algebraically describes the relationship between the two variables

Slide 39 Copyright © 2004 Pearson Education, Inc. Notation for Regression Equation y -intercept of regression equation  0 b 0 Slope of regression equation  1 b 1 Equation of the regression line y =  0 +  1 x y = b 0 + b 1 x Population Parameter Sample Statistic ^

Slide 40 Copyright © 2004 Pearson Education, Inc. Formula for b 0 and b 1 Formula 10-2 n(  xy) – (  x) (  y) b 1 = (slope) n(  x 2 ) – (  x) 2 b 0 = y – b 1 x ( y -intercept) Formula 10-3 calculators or computers can compute these values

Slide 41 Copyright © 2004 Pearson Education, Inc. If you find b 1 first, then Formula 10-4 b 0 = y - b 1 x Can be used for Formula 9-2, where y is the mean of the y -values and x is the mean of the x values

Slide 42 Copyright © 2004 Pearson Education, Inc. The regression line fits the sample points best.

Slide 43 Copyright © 2004 Pearson Education, Inc. Rounding the y -intercept b 0 and the slope b 1  Round to three significant digits.  If you use the formulas 10-2 and 10-3, try not to round intermediate values.

Slide 44 Copyright © 2004 Pearson Education, Inc Data x y Calculating the Regression Equation In Section 9-2, we used these values to find that the linear correlation coefficient of r = – Use this sample to find the regression equation.

Slide 45 Copyright © 2004 Pearson Education, Inc Data x y Calculating the Regression Equation n = 4  x = 10  y = 20  x 2 = 36  y 2 = 120  xy = 48 n(  xy) – (  x) (  y) n(  x 2 ) –(  x) 2 b 1 = 4(48) – (10) (20) 4(36) – (10) 2 b 1 = –8 44 b 1 = = –

Slide 46 Copyright © 2004 Pearson Education, Inc Data x y Calculating the Regression Equation n = 4  x = 10  y = 20  x 2 = 36  y 2 = 120  xy = 48 b 0 = y – b 1 x 5 – (– )(2.5) = 5.45

Slide 47 Copyright © 2004 Pearson Education, Inc Data x y Calculating the Regression Equation n = 4  x = 10  y = 20  x 2 = 36  y 2 = 120  xy = 48 The estimated equation of the regression line is: y = 5.45 – x ^

Slide 48 Copyright © 2004 Pearson Education, Inc. Given the sample data in Table 9-1, find the regression equation. Using the same procedure as in the previous example, we find that b 1 = 2.27 and b 0 = –113. Hence, the estimated regression equation is: y = – x ^ Example: Boats and Manatees

Slide 49 Copyright © 2004 Pearson Education, Inc. Given the sample data in Table 9-1, find the regression equation. Example: Boats and Manatees

Slide 50 Copyright © 2004 Pearson Education, Inc. In predicting a value of y based on some given value of x If there is not a significant linear correlation, the best predicted y -value is y. Predictions 2.If there is a significant linear correlation, the best predicted y -value is found by substituting the x -value into the regression equation.

Slide 51 Copyright © 2004 Pearson Education, Inc. Figure 9-8 Predicting the Value of a Variable

Slide 52 Copyright © 2004 Pearson Education, Inc. Given the sample data in Table 9-1, we found that the regression equation is y = – x. Assume that in 2001 there were 850,000 registered boats. Because Table 9-1 lists the numbers of registered boats in tens of thousands, this means that for 2001 we have x = 85. Given that x = 85, find the best predicted value of y, the number of manatee deaths from boats. ^ Example: Boats and Manatees

Slide 53 Copyright © 2004 Pearson Education, Inc. We must consider whether there is a linear correlation that justifies the use of that equation. We do have a significant linear correlation (with r = 0.922). Given the sample data in Table 9-1, we found that the regression equation is y = – x. Given that x = 85, find the best predicted value of y, the number of manatee deaths from boats. ^ Example: Boats and Manatees

Slide 54 Copyright © 2004 Pearson Education, Inc. Given the sample data in Table 9-1, we found that the regression equation is y = – x. Given that x = 85, find the best predicted value of y, the number of manatee deaths from boats. ^ Example: Boats and Manatees y = – x – (85) = 80.0 ^ The predicted number of manatee deaths is The actual number of manatee deaths in 2001 was 82, so the predicted value of 80.0 is quite close.

Slide 55 Copyright © 2004 Pearson Education, Inc. 1. If there is no significant linear correlation, don’t use the regression equation to make predictions. 2. When using the regression equation for predictions, stay within the scope of the available sample data. 3. A regression equation based on old data is not necessarily valid now. 4. Don’t make predictions about a population that is different from the population from which the sample data was drawn. Guidelines for Using The Regression Equation

Slide 56 Copyright © 2004 Pearson Education, Inc. Definitions  Marginal Change: The marginal change is the amount that a variable changes when the other variable changes by exactly one unit.  Outlier: An outlier is a point lying far away from the other data points.  Influential Points: An influential point strongly affects the graph of the regression line.

Slide 57 Copyright © 2004 Pearson Education, Inc. Definitions  Residual for a sample of paired ( x, y ) data, the difference ( y - y ) between an observed sample y -value and the value of y, which is the value of y that is predicted by using the regression equation. ^ ^ Residuals and the Least-Squares Property

Slide 58 Copyright © 2004 Pearson Education, Inc. Definitions  Residual for a sample of paired ( x, y ) data, the difference ( y - y ) between an observed sample y -value and the value of y, which is the value of y that is predicted by using the regression equation.  Least-Squares Property A straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible. ^ Residuals and the Least-Squares Property ^

Slide 59 Copyright © 2004 Pearson Education, Inc. x y y = x ^ Figure 9-9 Residuals and the Least-Squares Property

Slide 60 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 10-4 Variation and Prediction Intervals

Slide 61 Copyright © 2004 Pearson Education, Inc. Definitions We consider different types of variation that can be used for two major applications:  To determine the proportion of the variation in y that can be explained  by the linear relationship between x and y. 2. To construct interval estimates of predicted y -values. Such intervals are called prediction intervals.

Slide 62 Copyright © 2004 Pearson Education, Inc. Definitions Total Deviation The total deviation from the mean of the particular point ( x, y ) is the vertical distance y – y, which is the distance between the point ( x, y ) and the horizontal line passing through the sample mean y.

Slide 63 Copyright © 2004 Pearson Education, Inc. Definitions Total Deviation The total deviation from the mean of the particular point ( x, y ) is the vertical distance y – y, which is the distance between the point ( x, y ) and the horizontal line passing through the sample mean y. Explained Deviation is the vertical distance y - y, which is the distance between the predicted y- value and the horizontal line passing through the sample mean y. ^

Slide 64 Copyright © 2004 Pearson Education, Inc. Definitions Unexplained Deviation is the vertical distance y - y, which is the vertical distance between the point ( x, y ) and the regression line. (The distance y - y is also called a residual, as defined in Section 9-3.). ^ ^

Slide 65 Copyright © 2004 Pearson Education, Inc. Figure Unexplained, Explained, and Total Deviation

Slide 66 Copyright © 2004 Pearson Education, Inc. (total deviation) = (explained deviation) + (unexplained deviation) ( y - y ) = ( y - y ) + (y - y ) ^ ^ (total variation) = (explained variation) + (unexplained variation)  ( y - y ) 2 =  ( y - y ) 2 +  (y - y) 2 ^^ Formula 10-4

Slide 67 Copyright © 2004 Pearson Education, Inc. Definition r 2 = explained variation. total variation or simply square r (determined by Formula 10-1, section 10-2) Coefficient of determination the amount of the variation in y that is explained by the regression line

Slide 68 Copyright © 2004 Pearson Education, Inc. The standard error of estimate is a measure of the differences (or distances) between the observed sample y values and the predicted values y that are obtained using the regression equation. Prediction Intervals ^ Definition

Slide 69 Copyright © 2004 Pearson Education, Inc. Standard Error of Estimate s e = or s e =  y 2 – b 0  y – b 1  xy n – 2 Formula 10-5  ( y – y ) 2 n – 2 ^

Slide 70 Copyright © 2004 Pearson Education, Inc. Given the sample data in Table 9-1, we found that the regression equation is y = – x. Find the standard error of estimate s e for the boat/manatee data. ^ n = 10  y 2 =  y = 558  xy = b 0 = – b 1 = s e = n - 2  y 2 - b 0  y - b 1  xy s e = 10 – –(– )(558) – ( )(42414) Example: Boats and Manatees

Slide 71 Copyright © 2004 Pearson Education, Inc. Given the sample data in Table 9-1, we found that the regression equation is y = – x. Find the standard error of estimate s e for the boat/manatee data. ^ n = 10  y 2 =  y = 558  xy = b 0 = – b 1 = Example: Boats and Manatees s e = = 6.61

Slide 72 Copyright © 2004 Pearson Education, Inc. y - E < y < y + E ^ ^ Prediction Interval for an Individual y where E = t   2 s e n(x2)n(x2) – (  x) 2 n(x 0 – x) n x 0 represents the given value of x t   2 has n – 2 degrees of freedom

Slide 73 Copyright © 2004 Pearson Education, Inc. Given the sample data in Table 9-1, we found that the regression equation is y = – x. We have also found that when x = 85, the predicted number of manatee deaths is Construct a 95% prediction interval given that x = 85. ^ E = t  2 s e + n(  x 2 ) – (  x) 2 n(x 0 – x) n E = (2.306)(6.6123) 10(55289) – (741) 2 10(85 –74 ) E = 18.1 Example: Boats and Manatees

Slide 74 Copyright © 2004 Pearson Education, Inc. Given the sample data in Table 9-1, we found that the regression equation is y = – x. We have also found that when x = 85, the predicted number of manatee deaths is Construct a 95% prediction interval given that x = 85. ^ Example: Boats and Manatees y – E < y < y + E 80.6 – 18.1 < y < < y < 98.7 ^ ^

Slide 75 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 10-5 Rank Correlation

Slide 76 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 10-5 Rank Correlation

Slide 77 Copyright © 2004 Pearson Education, Inc. Rank Correlation Definition  Rank Correlation uses the ranks of sample data consisting of matched pairs.  The rank correlation test is used to test for an association between two variables  H o :  s = 0 (There is no correlation between the two variables.)  H 1 :  s  0 (There is a correlation between the two variables.)

Slide 78 Copyright © 2004 Pearson Education, Inc. Advantages 1. The nonparametric method of rank correlation can be used in a wider variety of circumstances than the parametric method of linear correlation. With rank correlation, we can analyze paired data that are ranks or can be converted to ranks. 2. Rank correlation can be used to detect some (not all) relationships that are not linear. 3. The computations for rank correlation are much simpler than the computations for linear correlation, as can be readily seen by comparing the formulas used to compute these statistics.

Slide 79 Copyright © 2004 Pearson Education, Inc. Disadvantages A disadvantage of rank correlation is its efficiency rating of 0.91, as described in Section This efficiency rating shows that with all other circumstances being equal, the nonparametric approach of rank correlation requires 100 pairs of sample data to achieve the same results as only 91 pairs of sample observations analyzed through parametric methods.

Slide 80 Copyright © 2004 Pearson Education, Inc. Assumptions 1. The sample data have been randomly selected. 2. Unlike the parametric methods of Section 10-2, there is no requirement that the sample pairs of data have a bivariate normal distribution. There is no requirement of a normal distribution for any population.

Slide 81 Copyright © 2004 Pearson Education, Inc. Notation r s = rank correlation coefficient for sample paired data ( r s is a sample statistic)  s = rank correlation coefficient for all the population data (  s is a population parameter) n = number of pairs of data d = difference between ranks for the two values within a pair r s is often called Spearman’s rank correlation coefficient

Slide 82 Copyright © 2004 Pearson Education, Inc. Test Statistic for the Rank Correlation Coefficient where each value of d is a difference between the ranks for a pair of sample data Critical values:  If n  30, refer to Table A-6  If n > 30, use Formula 9-1 r s = 1 – 6  d 2 n(n 2 – 1 )

Slide 83 Copyright © 2004 Pearson Education, Inc. Formula 9-6 where the value of z corresponds to the significance level r s = n – 1  z (critical values when n > 30)

Slide 84 Copyright © 2004 Pearson Education, Inc. Complete the computation of to get the sample statistic. Figure 12-4 Rank Correlation for Testing H 0 :  s = 0 Start Calculate the difference d for each pair of ranks by subtracting the lower rank from the higher rank. Let n equal the total number of signs. Are the n pairs of data in the form of ranks ? Convert the data of the first sample to ranks from 1 to n and then do the same for the second sample. No Square each difference d and then find the sum of those squares to get r s = 1 – 6d26d2 n(n 2 –1 ) (d2)(d2) Yes

Slide 85 Copyright © 2004 Pearson Education, Inc. Complete the computation of to get the sample statistic. r s = 1 – 6d26d2 n(n 2 – 1 ) Is n  30 ? If the sample statistic r s is positive and exceeds the positive critical value, there is a correlation. If the sample statistic r s is negative and is less than the negative critical value, there is a correlation. If the sample statistic r s is between the positive and negative critical values, there is no correlation. Find the critical values of r s in Table A-9 Calculate the critical values where z corresponds to the significance level r s = n – 1  z Yes No Figure 13-4 Rank Correlation for Testing H 0 :  s = 0

Slide 86 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty Use the data in Table 12-6 to determine if there is a correlation between the rankings of men and women in terms of what they find attractive. Use a significance level of  = 0.05.

Slide 87 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty Use the data in Table 12-6 to determine if there is a correlation between the rankings of men and women in terms of what they find attractive. Use a significance level of  = H 0 :  s = 0 H 1 :  s  0 n = 10 r s = 1 – 6  d 2 n(n 2 – 1 ) r s = r s = 1 – 6(74) 10(10 2 – 1 )

Slide 88 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty Use the data in Table 12-6 to determine if there is a correlation between the rankings of men and women in terms of what they find attractive. Use a significance level of  = We refer to Table A-9 to determine that the critical values are  Because the test statistic of r s = does not exceed the critical value of 0.648, we fail to reject the null hypothesis. There is no sufficient evidence to support a claim of a correlation between the rankings of men and women.

Slide 89 Copyright © 2004 Pearson Education, Inc. Assume that the preceding example is expanded by including a total of 40 women and that the test statistic r s is found to be If the significance level of  = 0.05, what do you conclude about the correlation? Example: Perceptions of Beauty with Large Samples

Slide 90 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty with Large Samples r s = n – 1  z r s = 40 – 1  1.96 =  These are the critical values.

Slide 91 Copyright © 2004 Pearson Education, Inc. The test statistic of r s = does not exceed the critical value of 0.314, so we fail to reject the null hypothesis. There is not sufficient evidence to support the claim of a correlation between men and women. Example: Perceptions of Beauty with Large Samples

Slide 92 Copyright © 2004 Pearson Education, Inc. The data in Table 12-7 are the numbers of games played and the last scores (in millions) of a Raiders of the Lost Ark pinball game. We expect that there should be an association between the number of games played and the pinball score. Is there sufficient evidence to support the claim that there is such an association? Example: Detecting a Nonlinear Pattern

Slide 93 Copyright © 2004 Pearson Education, Inc. Example: Detecting a Nonlinear Pattern

Slide 94 Copyright © 2004 Pearson Education, Inc. H 0:  s = 0 H 1:  s  0 n = 9 r s = 1 – 6  d 2 n(n 2 – 1 ) r s = 1 – 6(6) 9(9 2 – 1 ) r s = Example: Detecting a Nonlinear Pattern