
1 Warm-up 8/25/14
Compare Data A to Data B using the five-number summary, a measure of center, and a measure of spread.
A) 18, 33, 18, 87, 12, 23, 93, 34, 71, 91
B) 30, 16, 77, 42, 80, 15, 86, 46, 82, 55
1.1.2: Comparing Data Sets

2 Unit 1 Lesson 1.1.2—Comparing Data Sets
Standard(s)/Element(s): MCC9-12.S.ID.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets.
Essential Question(s):
- How can you use statistics to describe a data set?
- How can outliers or other extreme values affect your choice of which statistics you use to describe a data set?
- How can two data sets be compared quantitatively?

3 Vocabulary: box plot, data, data distribution, dot plot, extreme value, first quartile, five-number summary, interquartile range, maximum, mean, mean absolute deviation, measure of center, measure of spread, measure of variability, median, minimum, negatively skewed, outlier, positively skewed, range, second quartile, sigma (lowercase), sigma (uppercase), skewed distribution, skewed to the left, skewed to the right, standard deviation, statistics, symmetric distribution, third quartile, and variance

4 Introduction To compare data sets, use the same types of statistics that you use to represent or describe data sets. These statistics include measures of center and measures of spread, or variability.

5 Key Concepts Recall that a measure of center is a single number that best represents or describes a data set. The two commonly used measures of center are the median and the mean. Three commonly used measures of spread, or variability, are the range, the interquartile range, and the standard deviation. When one or more of the data sets being compared has an outlier, the median is normally used to compare typical data values; when there are no outliers, the mean is normally used. When comparing average data values, the mean is always used.

6 Key Concepts, continued
Comparing Data Sets
To compare data sets, you need to compare both measures of center and measures of spread. When comparing measures of center to compare typical values (values that fall within the data set and are not outliers), use the following table as a guide.

7 Key Concepts, continued
Choosing Appropriate Measures of Center and Spread for Comparing Data Sets

| | If there is an outlier, use: | If there is no outlier, use: |
|---|---|---|
| Measure of center | Median (Q2) | Mean (x̄) |
| Rough measure of spread | Range | Range |
| Additional measure of spread | Interquartile range (IQR) | Standard deviation (σ)* |

*Mean absolute deviation (MAD) and variance (σ²) may sometimes be used as well.
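The table can be read as a simple lookup. The sketch below encodes it in Python; the function name `recommended_measures` and the dictionary keys are illustrative choices, not part of the lesson, and the sketch only covers comparisons of typical values (when averages or totals are compared, the Key Concepts say to use the mean).

```python
def recommended_measures(has_outlier: bool) -> dict[str, str]:
    """Hypothetical lookup mirroring the table: which statistics to compare
    for typical values, depending on whether any data set has an outlier."""
    if has_outlier:
        return {
            "center": "median (Q2)",
            "rough spread": "range",
            "additional spread": "interquartile range (IQR)",
        }
    return {
        "center": "mean",
        "rough spread": "range",
        "additional spread": "standard deviation",
    }


print(recommended_measures(has_outlier=True))
```

Keeping the center and the spread in one result makes it harder to pair a median with a standard deviation, one of the misconceptions listed later in this lesson.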

8 Key Concepts, continued
When comparing measures of center to compare average values, use the mean. Even when there is an outlier, the mean is appropriate for comparison if the totals of the data sets are being compared, because for a fixed number of values the mean is directly proportional to the total. Recall that a data distribution is an arrangement of data values. When the data values are displayed in a dot plot, the shape of the distribution can be described as symmetric (with the values balanced on either side of the median) or skewed (with most values concentrated on one side of the median).
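Why the mean tracks the total can be seen by multiplying the mean formula by n, the number of values. For two data sets with the same n, the set with the greater mean therefore also has the greater total, whether or not outliers are present. For example, ten players who average 24.8 points scored 10 × 24.8 = 248 points in all.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\Longrightarrow\qquad
\sum_{i=1}^{n} x_i = n\,\bar{x}
```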

9 Key Concepts, continued
A distribution is skewed to the right (positively skewed) if most of the data values are concentrated on the left; that is, there is a “tail” of a few values to the right. A distribution is skewed to the left (negatively skewed) if most of the data values are concentrated on the right; that is, there is a “tail” of a few values to the left.
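One informal way to check the direction of skew without drawing a dot plot is to compare the mean with the median: a long right tail tends to pull the mean above the median, and a long left tail tends to pull it below. This is a rule of thumb added here as an assumption, not a definition from the lesson, and it can give the wrong answer for some distributions, so a dot plot remains the better check. The function name `skew_hint` and its `tolerance` parameter are hypothetical.

```python
from statistics import mean, median


def skew_hint(data, tolerance=0.0):
    """Rough rule of thumb (an assumption, not a definition):
    mean above the median by more than `tolerance` suggests a right skew,
    mean below the median by more than `tolerance` suggests a left skew."""
    m, med = mean(data), median(data)
    if m - med > tolerance:
        return "possibly skewed right"
    if med - m > tolerance:
        return "possibly skewed left"
    return "roughly symmetric"


data_a = [18, 33, 18, 87, 12, 23, 93, 34, 71, 91]  # Warm-up Data A
print(mean(data_a), median(data_a), skew_hint(data_a))
```

For Data A the mean (48) sits well above the median (33.5), which hints at a tail toward the high values.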

10 Common Errors/Misconceptions
- confusing the terms mean and median, and how to calculate each measure
- confusing the terms mean absolute deviation, variance, and standard deviation, and how to calculate each measure
- forgetting that when the medians are compared as the measure of center, the interquartile ranges should be compared as a measure of spread

11 Common Errors/Misconceptions, continued
- forgetting that when the means are compared as the measure of center, the standard deviations should be compared as a measure of spread
- comparing different measures of center or spread with each other (for example, the mean of one data set with the median of the other)
- comparing the means when comparing data sets that have one or more outliers
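To keep mean absolute deviation, variance, and standard deviation apart, it can help to compute all three side by side. The sketch below assumes the population formulas (dividing by n), which matches the σ notation used in this lesson; some courses divide by n − 1 for a sample standard deviation instead. The function name `spread_measures` is hypothetical.

```python
import math


def spread_measures(data):
    """Return (MAD, variance, standard deviation), all measured around the mean,
    using population formulas that divide by n."""
    n = len(data)
    x_bar = sum(data) / n
    mad = sum(abs(x - x_bar) for x in data) / n          # average distance from the mean
    variance = sum((x - x_bar) ** 2 for x in data) / n   # average squared distance (sigma squared)
    sigma = math.sqrt(variance)                          # square root of the variance
    return mad, variance, sigma


data_a = [18, 33, 18, 87, 12, 23, 93, 34, 71, 91]  # Warm-up Data A
mad, var, sd = spread_measures(data_a)
print(round(mad, 2), round(var, 2), round(sd, 2))
```

The three measures differ in how they treat distances from the mean: MAD averages the distances, the variance averages their squares, and the standard deviation undoes the squaring so the result is back in the data's units.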

12 Guided Practice Example 1
The dot plots at right show the numbers of hours of service learning recorded by members of the student council and the Environmental Action Club.

13 Guided Practice: Example 1, continued
Determine which measure of center is more appropriate for comparing the data sets and then compare the values for that measure of center. Compare the values for the measures of spread that best correspond to that measure of center. Compare the values for the less appropriate measure of center and explain why that measure is less appropriate.

14 Guided Practice: Example 1, continued
Step 1: Find the five-number summary for each data set. Arrange the data for the student council from least to greatest. The minimum value is 3.5. The median is the average of the two middle values of the data set.

15 Guided Practice: Example 1, continued
The median of the data for the student council is 4.5. The first quartile, Q1, is 4. The third quartile, Q3, is 7. The maximum value is 13.5.

16 Guided Practice: Example 1, continued
Arrange the data for the Environmental Action Club from least to greatest. The minimum value is 3.5. The median is the average of the two middle values of the data set. The median of the data for the Environmental Action Club is 5.5.

17 Guided Practice: Example 1, continued
The first quartile, Q1, is 4. The third quartile, Q3, is 6. The maximum value is 8.

18 Guided Practice: Example 1, continued
Step 2: Find the interquartile range for each data set and use it to identify any outliers. The interquartile range is the difference between Q3 and Q1. Find the IQR for the student council, with Q3 = 7 and Q1 = 4.
$$\text{IQR} = Q_3 - Q_1 = 7 - 4 = 3$$
Use the IQR to find any outliers in the student council data.

19 Guided Practice: Example 1, continued
A data value is an outlier if it is less than $Q_1 - 1.5(\text{IQR})$ or greater than $Q_3 + 1.5(\text{IQR})$.
$$Q_1 - 1.5(\text{IQR}) = 4 - 1.5(3) = 4 - 4.5 = -0.5$$
$$Q_3 + 1.5(\text{IQR}) = 7 + 1.5(3) = 7 + 4.5 = 11.5$$
There are no data values less than −0.5, so there are no low outliers. The data value 13.5 is greater than 11.5, so 13.5 is a high outlier. There is one outlier in the student council data: 13.5.
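The cut-off points above follow the same two formulas every time, so they are easy to compute directly. The sketch below takes Q1 and Q3 as inputs (the raw student council values come from a dot plot that is not reproduced here); the function names `outlier_fences` and `find_outliers` are hypothetical, and the last line applies them to the Lady Patriots scores from Example 2 using that data set's quartiles (Q1 = 20, Q3 = 29).

```python
def outlier_fences(q1, q3):
    """Return the cut-off points (Q1 - 1.5(IQR), Q3 + 1.5(IQR))."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Return, in increasing order, the values lying strictly outside the fences."""
    low, high = outlier_fences(q1, q3)
    return sorted(x for x in data if x < low or x > high)


print(outlier_fences(4, 7))  # student council: (-0.5, 11.5)
print(outlier_fences(4, 6))  # Environmental Action Club: (1.0, 9.0)

lady_patriots = [27, 15, 22, 31, 26, 22, 93, 29, 5, 20]
print(find_outliers(lady_patriots, 20, 29))  # [5, 93]
```

The comparisons are strict (`<` and `>`), matching the wording “less than” and “greater than”: a value exactly on a cut-off point is not flagged.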

20 Guided Practice: Example 1, continued
Find the IQR for the Environmental Action Club, with Q3 = 6 and Q1 = 4.
$$\text{IQR} = Q_3 - Q_1 = 6 - 4 = 2$$

21 Guided Practice: Example 1, continued
Use the IQR to find any outliers in the Environmental Action Club data.
$$Q_1 - 1.5(\text{IQR}) = 4 - 1.5(2) = 4 - 3 = 1$$
$$Q_3 + 1.5(\text{IQR}) = 6 + 1.5(2) = 6 + 3 = 9$$
There are no data values less than 1 or greater than 9, so there are no outliers in the Environmental Action Club data set. The only outlier in these two data sets, 13.5, is a high outlier in the student council data set.

22 Guided Practice: Example 1, continued
Step 3: Determine which measure of center is more appropriate for comparing the data sets. The median best represents the student council data set because that set has an outlier. Therefore, the medians of the data sets should be compared.

23 Guided Practice: Example 1, continued
Step 4: Determine the corresponding appropriate measures of spread. The range is always appropriate as a rough measure of spread. The interquartile range is the additional measure of spread that is appropriate when the median is used as the measure of center.

24 Guided Practice: Example 1, continued
Step 5: Find the range and interquartile range of each data set. We determined the interquartile range for each data set in step 2: student council IQR = 3; Environmental Action Club IQR = 2. We still need to find the range for each set. The range is the difference between the maximum and minimum values. Use the minimum and maximum values found in step 1.

25 Guided Practice: Example 1, continued
Find the range for the student council, using the maximum of 13.5 and the minimum of 3.5.
$$\text{range} = \text{maximum} - \text{minimum} = 13.5 - 3.5 = 10$$
The range of the student council data is 10.

26 Guided Practice: Example 1, continued
Find the range for the Environmental Action Club, using the maximum of 8 and the minimum of 3.5.
$$\text{range} = \text{maximum} - \text{minimum} = 8 - 3.5 = 4.5$$
The range of the Environmental Action Club data is 4.5.

27 Guided Practice: Example 1, continued
Step 6: Find the mean of each data set. The mean is the average of all the values in the data set. Find the mean for the student council data.

28 Guided Practice: Example 1, continued
The formula for calculating the mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Substitute values from the data set for $x_i$ (repeated values are listed as products). There are 12 data values, so $n = 12$. Simplify. The mean for the student council is 6.

29 Guided Practice: Example 1, continued
Find the mean for the Environmental Action Club data using the same formula, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Substitute values from the data set for $x_i$. There are 14 data values, so $n = 14$. Simplify. The mean for the club is approximately 5.321.

30 Guided Practice: Example 1, continued
Step 7: Organize your results in a table.

| | Mean | Median | Range | Interquartile range |
|---|---|---|---|---|
| Student council | 6 | 4.5 | 10 | 3 |
| Environmental Action Club | ≈5.321 | 5.5 | 4.5 | 2 |

31 Guided Practice: Example 1, continued
Step 8: Use the table to summarize your results. Because there is an outlier in the student council data, we compared the medians of the two sets. The Environmental Action Club data has the higher median (5.5 versus 4.5), as shown in the table. Using the median as the measure of center meant comparing the range and the interquartile range of each set.

32 ✔ Guided Practice: Example 1, continued
The student council data has a much higher range (10 versus 4.5) because of its outlier, 13.5. The student council also has a slightly higher interquartile range (3 versus 2), indicating that the middle “half” of its data is slightly more spread out. The less appropriate measure of center for comparing these data sets is the mean, because the high outlier raises the mean of the student council data set. The table shows that the student council has the higher mean (6 versus about 5.321), even though it has the lower median.


34 Guided Practice Example 2
Two rival basketball teams each have ten players. The total points scored by each player in the first five games of the season are shown below.
Lady Angels: 21, 30, 8, 41, 11, 21, 26, 28, 32, 30
Lady Patriots: 27, 15, 22, 31, 26, 22, 93, 29, 5, 20
The coaches want to compare the points scored by a typical player on each team. Which statistic should the coaches use? Compare those statistics. Then compare any other statistics that are appropriate so that both center and spread are compared for the two data sets. Identify any outliers and explain their effects.

35 Guided Practice: Example 2, continued
Step 1: Find the five-number summary for each data set. Arrange the data for the Lady Angels from least to greatest: 8, 11, 21, 21, 26, 28, 30, 30, 32, 41.

36 Guided Practice: Example 2, continued
The minimum value is 8. The median is the average of the two middle values of the data set, 26 and 28, so the median of the data for the Lady Angels is 27. The first quartile, Q1, is the median of the lower half (8, 11, 21, 21, 26), which is 21. The third quartile, Q3, is the median of the upper half (28, 30, 30, 32, 41), which is 30. The maximum value is 41.

37 Guided Practice: Example 2, continued
Arrange the data for the Lady Patriots from least to greatest: 5, 15, 20, 22, 22, 26, 27, 29, 31, 93. The minimum value is 5. The median is the average of the two middle values of the data set, 22 and 26, so the median of the data for the Lady Patriots is 24. The first quartile, Q1, is 20. The third quartile, Q3, is 29. The maximum value is 93.
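The steps above (sort, take the median, then take the median of each half) can be sketched in Python as follows. The function name `five_number_summary` is hypothetical; for an even number of values the halves split evenly, as in this example, and for an odd number of values the sketch assumes the common textbook convention of leaving the overall median out of both halves. It needs at least two values.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the
    'median of each half' method; requires at least two values."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


lady_angels = [21, 30, 8, 41, 11, 21, 26, 28, 32, 30]
lady_patriots = [27, 15, 22, 31, 26, 22, 93, 29, 5, 20]
print(five_number_summary(lady_angels))
print(five_number_summary(lady_patriots))
```

Software quartile functions may use other interpolation rules and can return slightly different Q1 and Q3 values than this hand method.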

38 Guided Practice: Example 2, continued
Step 2: Find the interquartile range for each data set and use it to identify any outliers. The interquartile range is the difference between Q3 and Q1. Find the IQR for the Lady Angels, with Q3 = 30 and Q1 = 21.
$$\text{IQR} = Q_3 - Q_1 = 30 - 21 = 9$$

39 Guided Practice: Example 2, continued
Use the IQR to find any outliers in the Lady Angels data set. A data value is an outlier if it is less than $Q_1 - 1.5(\text{IQR})$ or greater than $Q_3 + 1.5(\text{IQR})$.
$$Q_1 - 1.5(\text{IQR}) = 21 - 1.5(9) = 21 - 13.5 = 7.5$$
$$Q_3 + 1.5(\text{IQR}) = 30 + 1.5(9) = 30 + 13.5 = 43.5$$
There are no data values less than 7.5 or greater than 43.5, so there are no outliers in the Lady Angels data set.

40 Guided Practice: Example 2, continued
Find the IQR for the Lady Patriots, with Q3 = 29 and Q1 = 20.
$$\text{IQR} = Q_3 - Q_1 = 29 - 20 = 9$$
Use the IQR to find any outliers in the Lady Patriots data set.
$$Q_1 - 1.5(\text{IQR}) = 20 - 1.5(9) = 20 - 13.5 = 6.5$$
$$Q_3 + 1.5(\text{IQR}) = 29 + 1.5(9) = 29 + 13.5 = 42.5$$

41 Guided Practice: Example 2, continued
The data value 5 is less than 6.5, so 5 is a low outlier. The value 93 is greater than 42.5, so 93 is a high outlier. There are two outliers, both in the Lady Patriots data set: the low outlier 5 and the high outlier 93.

42 Guided Practice: Example 2, continued
Step 3: Determine which measure of center is more appropriate for comparing the data sets. The Lady Patriots data set has both a low outlier and a high outlier. In some cases, a low outlier and a high outlier tend to balance each other out, creating little or no net effect on the mean.

43 Guided Practice: Example 2, continued
Examine the Lady Patriots’ outliers to see whether that is the case: the low outlier 5 is just barely less than the lower cut-off point (limit for outliers) of 6.5, by only 1.5 points. The high outlier 93 is much greater than the upper cut-off point of 42.5, by 50.5 points.

44 Guided Practice: Example 2, continued
In this case, the low outlier and the high outlier do not balance out, because 93 is so far from the upper cut-off point for outliers. That is, the high outlier raises the mean significantly, despite the presence of a low outlier. Since the outliers do not cancel out each other’s effects on the mean, the median best represents the Lady Patriots data set. Therefore, the medians of the data sets should be compared.
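The imbalance between the two outliers can be checked numerically by recomputing the Lady Patriots mean and median with each outlier removed. This is an illustration added to the lesson, not one of its steps; the helper name `drop` and the scenario labels are hypothetical.

```python
from statistics import mean, median

lady_patriots = [27, 15, 22, 31, 26, 22, 93, 29, 5, 20]


def drop(values, *outliers):
    """Return a copy of values with one occurrence of each listed value removed."""
    remaining = list(values)
    for x in outliers:
        remaining.remove(x)
    return remaining


scenarios = {
    "all ten players": lady_patriots,
    "without low outlier 5": drop(lady_patriots, 5),
    "without high outlier 93": drop(lady_patriots, 93),
    "without both outliers": drop(lady_patriots, 5, 93),
}
for label, values in scenarios.items():
    print(f"{label}: mean = {float(mean(values)):.2f}, median = {median(values)}")
```

Removing only 93 lowers the mean from 29 to about 21.89, while removing only 5 raises it by less than 3 points (to about 31.67); the median moves by just 2 points either way. With both outliers removed, the mean and median are both 24.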

45 Guided Practice: Example 2, continued
Step 4: Determine the corresponding appropriate measures of spread. The range is always appropriate as a rough measure of spread. The interquartile range is the additional measure of spread that is appropriate when the median is used as the measure of center.

46 Guided Practice: Example 2, continued
Step 5: Find the range and the interquartile range of each data set. In step 2, we determined that the interquartile range for both the Lady Angels and the Lady Patriots is 9. We still need to find the range for each set. The range is the difference between the maximum and minimum values. Use the minimum and maximum values found in step 1.

47 Guided Practice: Example 2, continued
Find the range for the Lady Angels, using the maximum of 41 and the minimum of 8.
$$\text{range} = \text{maximum} - \text{minimum} = 41 - 8 = 33$$
The range of the data for the Lady Angels is 33.

48 Guided Practice: Example 2, continued
Find the range for the Lady Patriots, using the maximum of 93 and the minimum of 5.
$$\text{range} = \text{maximum} - \text{minimum} = 93 - 5 = 88$$
The range of the data for the Lady Patriots is 88.

49 Guided Practice: Example 2, continued
Step 6: Find the mean of each data set. There are 10 data values in each set. The mean is the average of all the values in the data set. Find the mean for the Lady Angels data set.

50 Guided Practice: Example 2, continued
The formula for calculating the mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Substitute values from the data set for $x_i$ (repeated values are listed as products). There are 10 data values, so $n = 10$. Simplify.
$$\bar{x} = \frac{8 + 11 + 2(21) + 26 + 28 + 2(30) + 32 + 41}{10} = \frac{248}{10} = 24.8$$
The mean for the Lady Angels is 24.8.

51 Guided Practice: Example 2, continued
Find the mean for the Lady Patriots data set using the same formula. Substitute values from the data set for $x_i$. There are 10 data values, so $n = 10$. Simplify.
$$\bar{x} = \frac{5 + 15 + 20 + 2(22) + 26 + 27 + 29 + 31 + 93}{10} = \frac{290}{10} = 29$$
The mean for the Lady Patriots is 29.
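The substitution step groups repeated values as products, such as 2(21) instead of 21 + 21. The sketch below does the same grouping with `collections.Counter` before dividing by n; the function name `mean_from_counts` is hypothetical, and the result equals an ordinary sum divided by n.

```python
from collections import Counter


def mean_from_counts(data):
    """Mean computed as sum(value * count) / n, grouping repeated values
    the way the worked substitution writes 2(21) or 2(30)."""
    counts = Counter(data)
    total = sum(value * count for value, count in counts.items())
    n = sum(counts.values())
    return total / n


lady_angels = [21, 30, 8, 41, 11, 21, 26, 28, 32, 30]
lady_patriots = [27, 15, 22, 31, 26, 22, 93, 29, 5, 20]
print(mean_from_counts(lady_angels))    # 24.8
print(mean_from_counts(lady_patriots))  # 29.0
```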

52 Guided Practice: Example 2, continued
Step 7: Organize your results in a table.

| | Mean | Median | Range | Interquartile range |
|---|---|---|---|---|
| Lady Angels | 24.8 | 27 | 33 | 9 |
| Lady Patriots | 29 | 24 | 88 | 9 |

53 Guided Practice: Example 2, continued
Step 8: Use the table to summarize your results. Because the outliers in the Lady Patriots data do not balance each other out, the median is the best measure of center for representing that data set. Therefore, we compared the medians of both sets. The Lady Angels have the higher median (27 versus 24), as shown in the table.

54 Guided Practice: Example 2, continued
Comparing the medians, it looks like the Lady Angels players are “better” than the Lady Patriots players, because the Lady Angels’ median (27) is higher than the Lady Patriots’ median (24): a typical Lady Angels player scored more points than a typical Lady Patriots player. However, the Lady Patriots have one very high-scoring player (the player who scored the high outlier of 93 points) and one low-scoring player (the player who scored the low outlier of 5 points).

55 ✔ Guided Practice: Example 2, continued
The Lady Patriots have a much wider range of scores than the Lady Angels (88 versus 33) because of both outliers. The interquartile ranges for the teams are equal (9), indicating that the middle “half” of the data in each set is equally spread out. The less appropriate measure of center is the mean, because the high outlier raises the mean of the Lady Patriots data set. The table shows that the Lady Patriots have the higher mean (29 versus 24.8), even though they have the lower median.
