Published by Marybeth Newman. Modified over 9 years ago.
1
Topic 9 Correlation Coefficient (page 185) Unit 2 Exploring Data: Comparisons and Relationships
2
Lists you will need in your calculator for this topic: ACC60, CMPG, EXM1A, EXM1B, EXM1C, EXM2A, EXM2B, EXM2C, FCAP, FWGT, HMPG, HOTEL, HOUSE, LFEXP, MILE, MONTH, ORING, PERTV, PNUM, POS, PRICE, PUBF, PUBT, RALEI, RENT, TEMP, WGT … and the programs CORR & SCATSIM.
3
OVERVIEW You have seen how scatterplots provide useful visual information about the relationship between two quantitative variables. Just as you made use of numerical summaries of various aspects of the distribution of a single variable, it would also be handy to have a numerical measure of the association between two variables.
4
OVERVIEW This topic introduces you to just such a measure and asks you to investigate some of its properties. This measure is one of the most famous in statistics: the correlation coefficient.
5
Do the Preliminaries (page 186)
6
Essential Question What are the properties of the correlation coefficient as a numerical measure of the degree of association between two variables?
7
Activity 9-1 Properties of Correlation (page 186)
The correlation coefficient, denoted by r, is a measure of the degree to which two variables are associated.
8
Properties of Correlation
(a)
number  letter  r  association
1       D          strongly negative
2       G
3       A          mildly negative
4       H
5       C          virtually none
6       E
7       I          mildly positive
8       F
9       B          strongly positive
9
Please be sure you have the following lists:
ACC60 … time to accelerate from 0 to 60 mph
CMPG … city miles per gallon rating
FCAP … fuel capacity
FWGT … % front weight
HMPG … highway miles per gallon rating
MILE … time to cover the quarter mile
PNUM … page number on which the car appeared
WGT … weight of the car
… and the program CORR.
10
(b)
number  letter  r        association        (x-list, y-list)
1       D       -0.8876  strongly negative  (WGT, CMPG)
2       G       -0.6853                     (FCAP, HMPG)
3       A       -0.4516  mildly negative    (WGT, MILE)
4       H       -0.1675                     (FCAP, FWGT)
5       C       -0.0671  virtually none     (PNUM, FCAP)
6       E        0.2316                     (CMPG, FWGT)
7       I        0.5098  mildly positive    (CMPG, MILE)
8       F        0.8867                     (CMPG, HMPG)
9       B        0.9943  strongly positive  (ACC60, MILE)
11
(c) I would believe that the largest value the correlation coefficient can assume is 1 and that the smallest value the correlation coefficient can assume is -1.
(d) I believe that in order for the correlation coefficient to assume its largest or smallest value, the data would have to fall … exactly on a straight line.
(e) The sign of the correlation matches the direction of the association. (+ or -)
(f) The stronger the association, the closer the correlation comes to ±1. The weaker the association, the closer the correlation comes to 0.
12
The correlation coefficient has to be between -1 and +1. If it is equal to one of these values, then the observations form a straight line. The sign of the correlation reflects the direction of the association. The magnitude of the correlation indicates the strength of the association, with values closer to -1 or +1 signifying stronger associations.
13
(g) Does there seem to be any relationship between temperature and month in Raleigh? YES. The data reveal a curvilinear relationship.
(h) [Copy lists named MONTH and RALEI.] correlation coefficient = 0.2571. Are you surprised? yes/no? Its value seems to indicate a weak positive relationship. [explain] This is not consistent with the answer to part (g), because r measures the strength of a linear association. Line is the root of linear!
14
The correlation coefficient measures only linear (straight-line) relationships between two variables. Curvilinear relationships can go undetected by r. Therefore, always examine a scatterplot as well as the value of r.
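The point about curvilinear relationships going undetected can be checked with a small sketch. The data below are made up for illustration (they are not the MONTH and RALEI lists), and `pearson_r` is a hypothetical helper name, not the calculator program CORR.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y is completely determined by x, but along a U-shaped curve.
x_vals = [-3, -2, -1, 0, 1, 2, 3]
y_vals = [x ** 2 for x in x_vals]

r_curve = pearson_r(x_vals, y_vals)
print(r_curve)  # essentially 0: r does not see the curved pattern
```

A scatterplot of these points would show the relationship at once, which is why the slide says to look at the plot as well as r.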
15
(i) [Copy lists named PUBT and PUBF.] correlation coefficient = 0.5065. Are you surprised? yes/no? Its value seems to indicate a moderate positive relationship. [explain] The correlation coefficient is higher than the scatterplot appears to suggest.
16
Essential Question Is the correlation coefficient resistant?
17
Activity 9-2 Monopoly Prices (pages 165 to 167)
As of 1999, the wheeling-and-dealing board game Monopoly is the most played board game. It has been played by an estimated 500 million people worldwide.
(a) [Copy lists named PRICE and RENT.] My guess as to the value of the correlation coefficient between rent and price is … [your guess].
18
(b) correlation coefficient = 0.9711. Are you surprised? yes/no? [explain] The points almost lie in a straight line.
(c) Make a scatterplot first, guess the correlation, then calculate the correlation.
Boardwalk price: 400 10011
Boardwalk rent: 501001100050 100
guess for correlation: your guess
actual correlation: 0.9711
19
(c) Boardwalk price: 400 10011
Boardwalk rent: 501001100050 100
guess for correlation: your guess
actual correlation: 0.9711, 0.7940, 0.6695, 0.4896, 0.5834, 0.3936, -0.0189
(d) Based on my analyses of Boardwalk's effect, I would say that the correlation coefficient is not a resistant measure of association. My reason for stating this is that a single change in the data can have a drastic (profound) effect on r.
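The lack of resistance can be seen in a minimal sketch. The numbers are made up (they are not the PRICE and RENT lists), and `pearson_r` is a hypothetical helper, not the CORR program: ten points lie exactly on a line, then one y-value is changed.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: ten points exactly on the line y = 2x.
xs = list(range(1, 11))
ys = [2 * x for x in xs]
r_before = pearson_r(xs, ys)

# Change a single observation: the last y-value drops from 20 to 0.
ys_changed = ys[:-1] + [0]
r_after = pearson_r(xs, ys_changed)

print(round(r_before, 4), round(r_after, 4))  # r falls from 1 to below 0.5
```

Nine of the ten points are untouched, yet r is cut by more than half, which is the sense in which the slide calls r "not resistant".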
20
The formula for the calculation of the correlation coefficient (r) is:
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$
where: $x_i$ denotes the $i$th observation of one variable, $y_i$ denotes the $i$th observation of the other variable, $\bar{x}$ and $\bar{y}$ the respective sample means, $s_x$ and $s_y$ the respective sample standard deviations, and $n$ the sample size.
21
Essential Question What are the properties of the correlation coefficient as a numerical measure of the degree of association between two variables?
22
Activity 9-3 Cars Fuel Efficiency (continued) (pages 168 & 169)
Remember how to calculate a z-score?
$z = \frac{x_i - \bar{x}}{s_x}$
where $x_i$ = value of interest, $\bar{x}$ = mean, $s_x$ = standard deviation.
23
(a) Calculation for the z-score for the weight of a Chevy Corvette:
mean for the weights = $\bar{x}$ = 2997 pounds
standard deviation for the weights = $s_x$ = 357.6 pounds
Corvette weight = $x_i$ = 3295 pounds
z-score = (3295 - 2997) / 357.6 = 0.833
z-score Corvette weight × z-score Corvette MPG = (0.833)(-1.270) = -1.058
24
model               weight  z-score  MPG  z-score  product
BMW 318Ti           2790    -0.579   23    0.701   -0.406
BMW Z3              2960    -0.103   19   -0.613    0.063
Chevrolet Camaro    3545     1.532   17   -1.270   -1.946
Chevy Corvette      3295     0.833   17   -1.270   -1.058
Ford Mustang        3270     0.763   17   -1.270   -0.970
Honda Prelude       3040     0.120   22    0.372    0.045
Hyundai Tiburon     2705    -0.817   22    0.372   -0.304
Mazda MX-5 Miata    2365    -1.767   25    1.358   -2.400
Mercedes Benz       3020     0.064   22    0.372    0.024
Mercury Cougar      3140     0.400   20   -0.285   -0.114
Mitsubishi Eclipse  3235     0.666   23    0.701    0.466
Pontiac Firebird    3545     1.532   18   -0.942   -1.443
Porsche Boxster     2905    -0.257   19   -0.613    0.158
Saturn SC           2420    -1.613   27
Toyota Celica       2720    -0.775   22    0.372   -0.288
25
Calculation for the z-score for the MPG rating of a Saturn SC:
mean for the MPG ratings = $\bar{y}$ = 20.867 mpg
standard deviation for the MPG ratings = $s_y$ = 3.044 mpg
Saturn SC mpg rating = $y_i$ = 27 mpg
z-score = (27 - 20.867) / 3.044 = 2.015
z-score Saturn weight × z-score Saturn MPG = (-1.613)(2.015) = -3.250
28
model               weight  z-score  MPG  z-score  product
BMW 318Ti           2790    -0.579   23    0.701   -0.406
BMW Z3              2960    -0.103   19   -0.613    0.063
Chevrolet Camaro    3545     1.532   17   -1.270   -1.946
Chevy Corvette      3295     0.833   17   -1.270   -1.058
Ford Mustang        3270     0.763   17   -1.270   -0.970
Honda Prelude       3040     0.120   22    0.372    0.045
Hyundai Tiburon     2705    -0.817   22    0.372   -0.304
Mazda MX-5 Miata    2365    -1.767   25    1.358   -2.400
Mercedes Benz       3020     0.064   22    0.372    0.024
Mercury Cougar      3140     0.400   20   -0.285   -0.114
Mitsubishi Eclipse  3235     0.666   23    0.701    0.466
Pontiac Firebird    3545     1.532   18   -0.942   -1.443
Porsche Boxster     2905    -0.257   19   -0.613    0.158
Saturn SC           2420    -1.613   27    2.015   -3.250
Toyota Celica       2720    -0.775   22    0.372   -0.288
sum of products = -11.423
n = 15
29
The formula for the calculation of the correlation coefficient (r) is:
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$
where: $x_i$ denotes the $i$th observation of one variable, $y_i$ denotes the $i$th observation of the other variable, $\bar{x}$ and $\bar{y}$ the respective sample means, $s_x$ and $s_y$ the respective sample standard deviations, and $n$ the sample size.
30
(b) The correlation coefficient between weight & MPG is -0.8159.
n = 15, so r = (sum of the z-score products) / (n - 1) = -11.423 / 14 = -0.8159
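The hand calculation in this activity can be reproduced with a short sketch: standardize each list, multiply the paired z-scores, and divide the sum by n - 1. The weights and MPG ratings are copied from the table; the variable names are illustrative. Because the slides round each z-score to three decimals, the unrounded result may differ from -0.8159 in the last digit.

```python
from statistics import mean, stdev

# Weights (pounds) and MPG ratings for the 15 cars, in table order.
weights = [2790, 2960, 3545, 3295, 3270, 3040, 2705, 2365,
           3020, 3140, 3235, 3545, 2905, 2420, 2720]
mpgs = [23, 19, 17, 17, 17, 22, 22, 25, 22, 20, 23, 18, 19, 27, 22]

n = len(weights)
mean_w, sd_w = mean(weights), stdev(weights)  # stdev is the sample SD (n - 1)
mean_m, sd_m = mean(mpgs), stdev(mpgs)

# Product of the two z-scores for each car (the "product" column).
z_products = [((w - mean_w) / sd_w) * ((m - mean_m) / sd_m)
              for w, m in zip(weights, mpgs)]

product_sum = sum(z_products)
r = product_sum / (n - 1)

print(round(mean_w, 1), round(sd_w, 1))  # compare with 2997 and 357.6
print(round(product_sum, 3), round(r, 4))  # compare with -11.423 and -0.8159
```

Most of the products are negative because heavier-than-average cars tend to have lower-than-average MPG, which is what drives r toward -1.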
31
(c) The MPG z-scores of most of the cars with negative weight z-scores tend to be positive values. [explain] The positive z-score values of one variable will correspond to negative z-score values of the other variable. Remember the association for weight and MPG was negative. positive value × negative value = negative value
32
Assignment Activity 9-6: Properties of Correlation (continued) (page 197)
Assignment Activity 9-7: Properties of Correlation (continued) (page 197)
Assignment Activity 9-8: Properties of Correlation (continued) (page 198)
33
Essential Question What is the distinction between correlation and causation?
34
Activity 9-4 Televisions and Life Expectancy (pages 193 & 194)
(a) Country with the fewest people per television set is U.S., with 1.3. Country with the most people per television set is Haiti, with 234.
(b) [Copy lists named LFEXP and PERTV.] Make a scatterplot of life expectancy vs. people per TV. X-list is PERTV and Y-list is LFEXP.
36
(b) [Copy lists named LFEXP and PERTV.] Does there appear to be an association between the two variables? YES. The association seems to indicate a strong negative relationship, since the countries with fewer people per TV have a longer life expectancy.
37
(c) The correlation coefficient between life expectancy and people per TV is -0.8038.
(d) [comment] How absurd it would be to send TVs to countries with lower life expectancies to cause their inhabitants to live longer. These variables are obviously associated with another variable.
(e) If two variables have a correlation close to +1 or to -1, indicating a strong linear association between them, must there be a cause-and-effect relationship between them? NO
38
Two variables may be strongly associated (as measured by the correlation coefficient) without a cause-and-effect relationship existing between them. Often the explanation is that both variables are related to a third variable not being measured … which is called a lurking or confounding variable.
39
(f) In the case of life expectancy and television sets, a confounding variable associated with a country's life expectancy and with the prevalence of televisions in the country would be …
The wealth of the country
The location of the country
Correlation does not mean causation!
40
Assignment Activity 9-15: Ice Cream, Drownings, and Fire Damage (page 202)
Assignment Activity 9-18: Space Shuttle O-Ring Failures (continued) (page 204)
Assignment Activity 9-12: Monopoly Prices (continued) (page 200)
41
Essential Question Can you judge a correlation value from a scatterplot?
42
Activity 9-5 Guess the Correlation (page 195) This activity will give you practice at judging the value of a correlation coefficient by examining the scatterplot. Have fun with this activity!
43
(a) [Copy the program SCATSIM.] Before running this program, delete all lists that are not needed. Please follow the directions when you run the program. Be sure to make a guess before pressing the ENTER key. Be careful to include the (-) key for negative correlations. Record your guesses and actual values in the chart.
44
(b) Write your guess for your SCORE before pressing ENTER!
(c) Record your SCORE and comment whether you are surprised.
REPEAT (a) through (c). See if you can beat your previous SCORE.
REPEAT (a) through (c) again! See if you can beat the highest class SCORE.
45
Be careful to include the (-) key for negative correlations. Note: This program makes lists called GUESS and ACTUA. If you miss an entry, record it on your paper, continue with the program, place GUESS and ACTUA in the SetUpEditor, then correct the entry. Run the program CORR to get your correct SCORE (correlation coefficient). Record your guesses and actual values in the chart.
46
(d) Is there evidence that your guesses got better or worse as you went along? Yes or No, only you know from your scatterplot. [explain]
[Scatterplot: guess vs. actual, with the line y = x]
Are your points close to or farther from the y = x line? Graph the y = x line in your calculator. Press trace to see how you did from 1st guess to 10th guess.
47
(e) Is there evidence that you are better at guessing certain values of correlation than others? Yes or No, only you know from your scatterplot.
[Scatterplot: error vs. actual, with the line error = 0 (no error)]
ERROR = GUESS - ACTUAL. ERROR is automatically made by the program SCATSIM. If you were a great guesser, then all of your points should be on or very close to the actual (x) axis.
48
If all your guesses were perfect, this is how your scatterplots would look. (Take note this is for only 5 points.)
[Scatterplots: error vs. actual; guess vs. actual, with the line y = x]
49
(f) If all of my guesses were too high by exactly 0.1, then the correlation between my guesses and the actual values would be 1.0.
(g) If all of my guesses were too high by exactly 0.5, then the correlation between my guesses and the actual values would be 1.0.
[Scatterplots: error vs. actual; guess vs. actual, with the line y = x]
50
(h) If the correlation between your guesses and the actual values is 1.0, does this mean that you guessed perfectly every time? NO. A correlation of 1.0 does not necessarily indicate perfect guessing as shown. The correlation coefficient is not the best way to determine the best guesser. SORRY!
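Parts (f) through (h) can be checked with a minimal sketch. The "actual" values below are made up (they are not output from SCATSIM), and `pearson_r` is a hypothetical helper, not the CORR program. Every guess is too high by exactly 0.5, yet the correlation between guesses and actual values still comes out as 1.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up actual correlations and guesses that are all 0.5 too high.
actual = [-0.9, -0.6, -0.2, 0.1, 0.4]
guesses = [a + 0.5 for a in actual]

# ERROR = GUESS - ACTUAL, as defined on the slide.
errors = [g - a for g, a in zip(guesses, actual)]
r_shifted = pearson_r(actual, guesses)

print([round(e, 2) for e in errors])  # every error is 0.5
print(round(r_shifted, 4))            # still 1.0
```

Shifting every guess by the same amount keeps the points on a straight line (parallel to y = x), so r cannot tell a consistently biased guesser from a perfect one.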
51
WRAP-UP In this topic you have discovered the very important correlation coefficient as a measure of the linear relationship between two variables. You have derived some of the properties of this measure, such as the values it can assume, how its sign and value relate to the direction and strength of the association, and its lack of resistance to outliers.
52
You have also practiced judging the direction and strength of a relationship from looking at a scatterplot. In addition, you have discovered the distinction between correlation and causation and learned that one needs to be very careful about inferring causal relationships between variables based solely on a strong correlation.
53
The next topic will expand your understanding of relationships between variables by introducing you to least squares regression, a formal mathematical model that is often useful for describing such relationships.
54
Your topic is due! Quiz on Topic 9: Correlation Coefficient