3 Averages and Variation

Presentation on theme: "3 Averages and Variation"— Presentation transcript:

1 3 Averages and Variation
Copyright © Cengage Learning. All rights reserved. Larson/Farber 4th ed.

2 Measures of Central Tendency: Mode, Median, and Mean
Section 3.1

3 Focus Points
Compute mean, median, and mode from raw data.
Interpret what mean, median, and mode tell you.
Explain how mean, median, and mode can be affected by extreme data values.
What is a trimmed mean?
Compute a weighted average.

4 Measures of Central Tendency
Measure of central tendency: a value that represents a typical, or central, entry of a data set. The most common measures of central tendency are the mean, the median, and the mode.

5 Measures of Central Tendency: Mode, Median, and Mean
The average price of an ounce of gold is $1200. The Zippy car averages 39 miles per gallon on the highway. A survey showed the average shoe size for women is size 9. In each of the preceding statements, one number is used to describe the entire sample or population. Such a number is called an average. There are many ways to compute averages, but we will study only three of the major ones. The easiest average to compute is the mode. Larson/Farber 4th ed.

7 Exercise 1 – Mode
Count the letters in each word of this sentence and give the mode. The numbers of letters in the words of the sentence are 5 3 7 2 4 4 2 4 8 3 4 3 4. Scanning the data, we see that 4 is the mode because more words have 4 letters than any other number. For larger data sets, it is useful to order (or sort) the data before scanning them for the mode.

8 Measures of Central Tendency: Mode, Median, and Mean
Not every data set has a mode. For example, if Professor Fair gives equal numbers of A’s, B’s, C’s, D’s, and F’s, then there is no modal grade. In addition, the mode is not very stable. Changing just one number in a data set can change the mode dramatically. However, the mode is a useful average when we want to know the most frequently occurring data value, such as the most frequently requested shoe size. Larson/Farber 4th ed.

9 Measures of Central Tendency: Mode, Median, and Mean
When no number occurs more than once in a data set, there is no mode. If two numbers tie for the greatest frequency, we say the set is bimodal. Example (bimodal): 1, 1, 1, 2, 2, 2, 3. Both 1 and 2 occur three times, more often than any other value, so they are both modes. 3 is not a mode because it occurs less frequently.
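
The counting rules for the mode can be sketched in a few lines of Python. This is only an illustration: the function name modes is an assumption, and it follows the slides in reporting no mode when no value repeats or when every value occurs equally often.

```python
from collections import Counter


def modes(data):
    """Return all values tied for the highest frequency, or [] if there is no mode."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    winners = sorted(value for value, count in counts.items() if count == top)
    # No mode when nothing repeats, or when every distinct value ties
    # (for example, equal numbers of A's, B's, C's, D's, and F's).
    if top == 1 or (len(counts) > 1 and len(winners) == len(counts)):
        return []
    return winners


sentence = "Count the letters in each word of this sentence and give the mode"
word_lengths = [len(word) for word in sentence.split()]

print(modes(word_lengths))           # [4]
print(modes([1, 1, 1, 2, 2, 2, 3]))  # [1, 2] (bimodal)
```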

10 Measures of Central Tendency: Mode, Median, and Mean
Another average that is useful is the median, or central value, of an ordered distribution. The median is the middle value. When you are given the median, you know there are an equal number of data values in the ordered distribution that are above it and below it. Larson/Farber 4th ed.

11 Measure of Central Tendency: Median
Median: the value that lies in the middle of the data when the data set is ordered. It measures the center of an ordered data set by dividing it into two equal parts. If the data set has an odd number of entries, the median is the middle data entry. If it has an even number of entries, the median is the mean of the two middle data entries.
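
The odd and even cases above can be sketched as a short Python function. The name median and the sample lists are illustrative assumptions, not data from the slides.

```python
def median(data):
    """Median of a data set: middle entry (odd n) or mean of the two middle entries (even n)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the entry in position (n + 1) / 2, counting from 1.
        return ordered[middle]
    # Even n: average the two entries on either side of position (n + 1) / 2.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([3, 1, 2]))     # odd number of entries
print(median([4, 1, 3, 2]))  # even number of entries
```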

12 Measures of Central Tendency: Mode, Median, and Mean
Procedure: Larson/Farber 4th ed.

13 Exercise 2 – Median Case: n is even
What do barbecue-flavored potato chips cost? According to Consumer Reports, Vol. 66, No. 5, the prices per ounce in cents of the rated chips are To find the median, we first order the data, and then note that there are an even number of entries. So the median is constructed using the two middle values. Larson/Farber 4th ed.

14 Exercise 2 – Median (Sorted) prices per ounce in cents of the rated chips are (b) According to Consumer Reports, the brand with the lowest overall taste rating costs 35 cents per ounce. Eliminate that brand, and find the median price per ounce for the remaining barbecue-flavored chips. Larson/Farber 4th ed.

15 Exercise 2 – Median Case: n is odd Again order the data.
The median is simply the middle value. middle value Median = middle value = 19 cents Larson/Farber 4th ed.

16 Exercise 2 – Median
(c) One ounce of potato chips is considered a small serving. Is it reasonable to budget about $10.45 to serve the barbecue-flavored chips to 55 people? Yes, since the median price of the chips is 19 cents per small serving: 55 × $0.19 = $10.45. This budget for chips assumes that there is plenty of other food! Each guest gets just 1 ounce!

17 Measures of Central Tendency: Mode, Median, and Mean
The median uses the position rather than the specific value of each data entry. If the extreme values of a data set change, the median usually does not change. This is why the median is often used as the average for house prices. If one mansion costing several million dollars sells in a community of much-lower-priced homes, the median selling price for houses in the community would be affected very little, if at all. Larson/Farber 4th ed.

18 Measures of Central Tendency: Mode, Median, and Mean
Note: For small ordered data sets, we can easily scan the set to find the location of the median. However, for large ordered data sets of size n, it is convenient to have a formula to find the middle of the data set. Larson/Farber 4th ed.

19 Measures of Central Tendency: Mode, Median, and Mean
For instance, if n = 99 then the middle value is the (99 + 1)/2 or 50th data value in the ordered data. If n = 100, then (100 + 1)/2 = 50.5 tells us that the two middle values are in the 50th and 51st positions. An average that uses the exact value of each entry is the mean (sometimes called the arithmetic mean).

20 Measure of Central Tendency: Mean
Mean (average): the sum of all the data entries divided by the number of entries. Sigma notation: Σx = add all of the data entries (x) in the data set. Population mean: μ = Σx / N. Sample mean: x̄ = Σx / n.

21 Exercise 3 – Population Mean
To graduate, Linda needs at least a B in biology. She did not do very well on her first three tests; however, she did well on the last four. Here are all her scores: Compute the mean and determine if Linda’s grade will be a B (80 to 89 average) or a C (70 to 79 average). Larson/Farber 4th ed.

22 Exercise 3 – Solution Since the average is 80, Linda will get the needed B. Larson/Farber 4th ed.

23 Exercise 4: Finding a Sample Mean
The prices (in dollars) for a sample of roundtrip flights from Chicago, Illinois to Cancun, Mexico are listed. What is the mean price of the flights? Larson/Farber 4th ed.

24 Solution: Finding a Sample Mean
The sum of the flight prices is Σx = 3695. To find the mean price, divide the sum of the prices by the number of prices in the sample. The mean price of the flights is about $

25 Exercise 5: Compare Measures of Center
The unit loads of 40 randomly selected students from a college are shown below. Find the mean, median, and mode: 17 12 14 13 16 18 20 15 19

26 Exercise 5: Compare Measures of Center
Solution: Mean 15.0 Median 15 Mode 12 If the state is going to fund the college according to the “average” credit load, which “average” do you think the college will report? Why? Larson/Farber 4th ed.

27 Measures of Central Tendency: Mode, Median, and Mean
We have seen three averages: the mode, the median, and the mean. For later work, the mean is the most important. A disadvantage of the mean, however, is that it can be affected by exceptional values. A resistant measure is one that is not influenced by extremely high or low data values. The mean is not a resistant measure of center because we can make the mean as large as we want by changing the size of only one data value. Larson/Farber 4th ed.

28 Measures of Central Tendency: Mode, Median, and Mean
The median, on the other hand, is more resistant. However, a disadvantage of the median is that it is not sensitive to the specific size of a data value. A measure of center that is more resistant than the mean but still sensitive to specific data values is the trimmed mean. A trimmed mean is the mean of the data values left after “trimming” a specified percentage of the smallest and largest data values from the data set. Larson/Farber 4th ed.

29 Measures of Central Tendency: Mode, Median, and Mean
Usually a 5% trimmed mean is used. This implies that we trim the lowest 5% of the data as well as the highest 5% of the data. A similar procedure is used for a 10% trimmed mean. Procedure: Larson/Farber 4th ed.
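
The trimming idea described above can be sketched as follows. This is a minimal illustration: the function name, the round-half-up rule for how many values to drop from each end, and the 20-value data set are all assumptions, not taken from the slides.

```python
def trimmed_mean(data, percent=5):
    """Mean of the data left after dropping `percent` percent from each end."""
    ordered = sorted(data)
    n = len(ordered)
    # Number of values to drop from each end: percent of n, rounded half up
    # (integer arithmetic avoids floating-point surprises).
    k = (n * percent + 50) // 100
    kept = ordered[k:n - k]
    if not kept:
        raise ValueError("trimming removed every data value")
    return sum(kept) / len(kept)


# Hypothetical data: 19 ordinary values and one extreme value.
values = list(range(1, 20)) + [100]

print(sum(values) / len(values))  # ordinary mean, pulled up by the extreme value
print(trimmed_mean(values, 5))    # 5% trimmed mean drops 1 and 100
```

With 20 values, 5% of n is exactly one value, so one entry is removed from each end before averaging.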

30 Exercise 6: Find measures of central tendency
The class sizes of 20 randomly chosen Introductory Algebra classes in California are shown. Compute the mean Compute a 5% trimmed mean Larson/Farber 4th ed.

31 Exercise 6 Solution: Find Trimmed Mean

32 Measures of Central Tendency: Mode, Median, and Mean
In general, when a data distribution is mound-shaped symmetrical, the values for the mean, median, and mode are the same or almost the same. For skewed-left distributions, the mean is less than the median and the median is less than the mode. For skewed-right distributions, the mode is the smallest value, the median is the next largest, and the mean is the largest. Larson/Farber 4th ed.

33 Measures of Central Tendency: Mode, Median, and Mean
Figure 3-1 shows the general relationships among the mean, median, and mode for different types of distributions. Figure 3-1: (a) Mound-shaped symmetrical (b) Skewed left (c) Skewed right

34 Example: Comparing the Mean, Median, and Mode
Find the mean, median, and mode of the sample ages of a class shown. Which measure of central tendency best describes a typical entry of this data set? Are there any outliers? Ages in a class 20 21 22 23 24 65 Larson/Farber 4th ed.

35 Example: Comparing the Mean, Median, and Mode
Ages in a class 20 21 22 23 24 65 Mean: Median: Mode: 20 years (the entry occurring with the greatest frequency) Larson/Farber 4th ed.

36 Example: Comparing the Mean, Median, and Mode
Mean ≈ 23.8 years, median = 21.5 years, mode = 20 years. The mean takes every entry into account, but is influenced by the outlier of 65. The median uses the position of every entry in the ordered data, and it is not affected by the outlier. In this case the mode exists, but it doesn't appear to represent a typical entry.

37 Comparing the Mean, Median, and Mode
All three measures describe a typical entry of a data set. Advantage of using the mean: The mean is a reliable measure because it takes into account every entry of a data set. Disadvantage of using the mean: Greatly affected by outliers (a data entry that is far removed from the other entries in the data set). Larson/Farber 4th ed.

38 Weighted Average Larson/Farber 4th ed.

39 Weighted Average Sometimes we wish to average numbers, but we want to assign more importance, or weight, to some of the numbers. Category Weight Class Work 20% Homework Test 30% Quiz Larson/Farber 4th ed.

40 Weighted Average
The average you need is the weighted average: weighted average = Σ(x∙w) / Σw, where x is a data value and w is the weight assigned to that data value.

41 Exercise 7: Find Weighted Average
You are taking a class in which your grade is determined from five sources: 50% from your test mean, 15% from your midterm, 20% from your final exam, 10% from your computer lab work, and 5% from your homework. Your scores are 86 (test mean), 96 (midterm), 82 (final exam), 98 (computer lab), and 100 (homework). What is the weighted mean of your scores? If the minimum average for an A is 90, did you get an A? Larson/Farber 4th ed.

42 Ex 7 Solution: Finding a Weighted Mean
Source          Score, x   Weight, w   x∙w
Test Mean       86         0.50        86(0.50) = 43.0
Midterm         96         0.15        96(0.15) = 14.4
Final Exam      82         0.20        82(0.20) = 16.4
Computer Lab    98         0.10        98(0.10) = 9.8
Homework        100        0.05        100(0.05) = 5.0
Total                      Σw = 1      Σ(x∙w) = 88.6
Your weighted mean for the course is Σ(x∙w) / Σw = 88.6 / 1 = 88.6. Because 88.6 is below 90, you did not get an A.
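
The same calculation can be sketched in Python. The function name weighted_mean is an assumption; the scores and weights are the ones from Exercise 7.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value*weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Test mean, midterm, final exam, computer lab, homework.
scores = [86, 96, 82, 98, 100]
weights = [0.50, 0.15, 0.20, 0.10, 0.05]

course_mean = weighted_mean(scores, weights)
print(round(course_mean, 1))  # 88.6
print(course_mean >= 90)      # False: below the minimum for an A
```

Dividing by the sum of the weights means the weights do not have to add up to 1, so the same function also works when the weights are counts.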

43 Example: Weighted Mean
The data below represent customer satisfaction ratings from 4 different restaurants in a chain. Find the average customer satisfaction rating.
Location   Average rating, x   Number of customers, w   Product, xw
1          7.8                 117                      912.6
2          8.5                 86                       731
3          6.6                 68                       448.8
4          7.4                 90                       666
TOTAL                          361                      2758.4

44 Example: Weighted Mean
Weighted average = Σxw / Σw = 2758.4 / 361 ≈ 7.64

45 Mean of Grouped Data Larson/Farber 4th ed.

46 Mean of Grouped Data
Mean of a frequency distribution: approximated by x̄ = Σ(x∙f) / n, where n = Σf and x and f are the midpoint and frequency of a class, respectively.

47 Finding the Mean of a Frequency Distribution
In words, with the corresponding symbols:
Find the midpoint of each class: x = (lower limit + upper limit) / 2
Find the sum of the products of the midpoints and the frequencies: Σ(x∙f)
Find the sum of the frequencies: n = Σf
Find the mean of the frequency distribution: x̄ = Σ(x∙f) / n

48 Exercise 8: Find the Mean of a Frequency Distribution
Use the frequency distribution to approximate the mean number of minutes that a sample of Internet subscribers spent online during their most recent session.
Class     Midpoint   Frequency, f
7 – 18    12.5       6
19 – 30   24.5       10
31 – 42   36.5       13
43 – 54   48.5       8
55 – 66   60.5       5
67 – 78   72.5       6
79 – 90   84.5       2

49 Solution: Find the Mean of a Frequency Distribution
Class     Midpoint, x   Frequency, f   (x∙f)
7 – 18    12.5          6              12.5∙6 = 75.0
19 – 30   24.5          10             24.5∙10 = 245.0
31 – 42   36.5          13             36.5∙13 = 474.5
43 – 54   48.5          8              48.5∙8 = 388.0
55 – 66   60.5          5              60.5∙5 = 302.5
67 – 78   72.5          6              72.5∙6 = 435.0
79 – 90   84.5          2              84.5∙2 = 169.0
n = 50   Σ(x∙f) = 2089.0
x̄ = Σ(x∙f) / n = 2089.0 / 50 ≈ 41.8 minutes
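
The grouped-data approximation can be sketched in Python using the class limits and frequencies from the table. The function name and the tuple layout (lower limit, upper limit, frequency) are assumptions made for this example.

```python
def grouped_mean(classes):
    """Approximate the mean of a frequency distribution from class midpoints."""
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("the frequency distribution is empty")
    # Each class contributes midpoint * frequency to the total.
    total = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)
    return total / n


# (lower limit, upper limit, frequency) for minutes spent online.
online_minutes = [
    (7, 18, 6),
    (19, 30, 10),
    (31, 42, 13),
    (43, 54, 8),
    (55, 66, 5),
    (67, 78, 6),
    (79, 90, 2),
]

print(round(grouped_mean(online_minutes), 1))  # 41.8
```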

50 Summary: Section 3.1 Computed mean, median, and mode from raw data
Interpreted mean, median, and mode Explained how mean, median, and mode can be affected by extreme data values Computed Trimmed mean Weighted average Mean of frequency distribution Larson/Farber 4th ed.

51 Section 3.2 Measures of Variation Larson/Farber 4th ed.

52 Objectives Determine the range of a data set Determine the variance and standard deviation of a population and of a sample Use Chebychev’s Theorem to interpret standard deviation Approximate the sample standard deviation for grouped data Larson/Farber 4th ed.

53 Range Range The difference between the maximum and minimum data entries in the set. The data must be quantitative. Range = (Max. data entry) – (Min. data entry) Larson/Farber 4th ed.

54 Example: Finding the Range
A corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the range of the starting salaries. Starting salaries (1000s of dollars) Larson/Farber 4th ed.

55 Solution: Finding the Range
Ordering the data helps to find the least and greatest salaries Range = (Max. salary) – (Min. salary) = 47 – 37 = 10 The range of starting salaries is 10 or $10,000. minimum maximum Larson/Farber 4th ed.

56 Deviation, Variance, and Standard Deviation
Deviation: the difference between the data entry, x, and the mean of the data set. Population data set: deviation of x = x – μ. Sample data set: deviation of x = x – x̄.

57 Exercise 1: Finding the Deviation
A corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the deviation of the starting salaries. Starting salaries (1000s of dollars) Solution: First determine the mean starting salary. Larson/Farber 4th ed.

58 Solution: Finding the Deviation
Determine the deviation for each data entry.
Salary ($1000s), x   Deviation: x – μ
41                   41 – 41.5 = –0.5
38                   38 – 41.5 = –3.5
39                   39 – 41.5 = –2.5
45                   45 – 41.5 = 3.5
47                   47 – 41.5 = 5.5
41                   41 – 41.5 = –0.5
44                   44 – 41.5 = 2.5
41                   41 – 41.5 = –0.5
37                   37 – 41.5 = –4.5
42                   42 – 41.5 = 0.5
Σx = 415             Σ(x – μ) = 0

59 Deviation, Variance, and Standard Deviation
The variance can be thought of as a kind of average of the squares of the deviations. Standard deviation is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.

60 Deviation, Variance, and Standard Deviation
Sum of squares: SSx = Σ(x – μ)². Population variance: σ² = Σ(x – μ)² / N. Population standard deviation: σ = √σ² = √(Σ(x – μ)² / N).

61 Finding the Population Variance & Standard Deviation
In words, with the corresponding symbols:
Find the mean of the population data set: μ = Σx / N
Find the deviation of each entry: x – μ
Square each deviation: (x – μ)²
Add to get the sum of squares: SSx = Σ(x – μ)²

62 Finding the Population Variance & Standard Deviation
In words, with the corresponding symbols:
Divide by N to get the population variance: σ² = Σ(x – μ)² / N
Find the square root to get the population standard deviation: σ = √(Σ(x – μ)² / N)

63 Exercise 2: Finding the Population Standard Deviation
A corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the population variance and standard deviation of the starting salaries. Starting salaries (1000s of dollars) Recall μ = 41.5 Larson/Farber 4th ed.

64 Solution: Finding the Population Standard Deviation
Determine SSx. N = 10
Salary, x   Deviation: x – μ    Squares: (x – μ)²
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
38          38 – 41.5 = –3.5    (–3.5)² = 12.25
39          39 – 41.5 = –2.5    (–2.5)² = 6.25
45          45 – 41.5 = 3.5     (3.5)² = 12.25
47          47 – 41.5 = 5.5     (5.5)² = 30.25
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
44          44 – 41.5 = 2.5     (2.5)² = 6.25
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
37          37 – 41.5 = –4.5    (–4.5)² = 20.25
42          42 – 41.5 = 0.5     (0.5)² = 0.25
            Σ(x – μ) = 0        SSx = 88.5

65 Solution: Finding the Population Standard Deviation
Population variance: σ² = SSx / N = 88.5 / 10 ≈ 8.9. Population standard deviation: σ = √(88.5 / 10) ≈ 3.0. The population standard deviation is about 3.0, or $3000.

66 Deviation, Variance, and Standard Deviation
Sample variance: s² = Σ(x – x̄)² / (n – 1). Sample standard deviation: s = √s² = √(Σ(x – x̄)² / (n – 1)).

67 Finding the Sample Variance & Standard Deviation
In words, with the corresponding symbols:
Find the mean of the sample data set: x̄ = Σx / n
Find the deviation of each entry: x – x̄
Square each deviation: (x – x̄)²
Add to get the sum of squares: SSx = Σ(x – x̄)²

68 Finding the Sample Variance & Standard Deviation
In words, with the corresponding symbols:
Divide by n – 1 to get the sample variance: s² = Σ(x – x̄)² / (n – 1)
Find the square root to get the sample standard deviation: s = √(Σ(x – x̄)² / (n – 1))

69 Exercise 3 : Finding the Sample Standard Deviation
The starting salaries are for the Chicago branches of a corporation. The corporation has several other branches, and you plan to use the starting salaries of the Chicago branches to estimate the starting salaries for the larger population. Find the sample standard deviation of the starting salaries. Starting salaries (1000s of dollars) Larson/Farber 4th ed.

70 Solution: Finding the Sample Standard Deviation
Determine SSx. n = 10
Salary, x   Deviation: x – x̄    Squares: (x – x̄)²
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
38          38 – 41.5 = –3.5    (–3.5)² = 12.25
39          39 – 41.5 = –2.5    (–2.5)² = 6.25
45          45 – 41.5 = 3.5     (3.5)² = 12.25
47          47 – 41.5 = 5.5     (5.5)² = 30.25
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
44          44 – 41.5 = 2.5     (2.5)² = 6.25
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
37          37 – 41.5 = –4.5    (–4.5)² = 20.25
42          42 – 41.5 = 0.5     (0.5)² = 0.25
            Σ(x – x̄) = 0        SSx = 88.5

71 Solution: Finding the Sample Standard Deviation
Sample variance: s² = SSx / (n – 1) = 88.5 / 9 ≈ 9.8. Sample standard deviation: s = √(88.5 / 9) ≈ 3.1. The sample standard deviation is about 3.1, or $3100.
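
The only difference between the population and sample calculations is the divisor (N versus n – 1). A minimal Python sketch makes this concrete. The function names are assumptions, and the ten salaries are reconstructed from the slides: the two entries missing from the transcript are taken to be 41, which is the only choice consistent with the totals Σx = 415 and Σx² = 17311.

```python
import math

# Starting salaries in $1000s (two entries inferred from the slide totals).
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]


def sum_of_squares(data):
    """SSx: sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def population_std(data):
    """Population standard deviation: divide SSx by N."""
    return math.sqrt(sum_of_squares(data) / len(data))


def sample_std(data):
    """Sample standard deviation: divide SSx by n - 1."""
    if len(data) < 2:
        raise ValueError("a sample needs at least two entries")
    return math.sqrt(sum_of_squares(data) / (len(data) - 1))


print(sum_of_squares(salaries))              # 88.5
print(round(population_std(salaries), 1))    # 3.0
print(round(sample_std(salaries), 1))        # 3.1
```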

72 Sample Variance: Computational Formula
SSx = Σx² – (Σx)² / n, so the sample variance is s² = SSx / (n – 1).

73 Exercise 3: Variance Computational Formula
Entry   x     x²
1       41    1681
2       38    1444
3       39    1521
4       45    2025
5       47    2209
6       41    1681
7       44    1936
8       41    1681
9       37    1369
10      42    1764
SUM     415   17311
SSx = Σx² – (Σx)² / n = 17311 – 415² / 10 = 17311 – 17222.5 = 88.5
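
As a rough check, the computational formula can be coded directly from the two column sums. The function name is an assumption, and the salary list uses the same reconstructed data as the table (entries 6 and 8 taken to be 41, as implied by the column totals).

```python
# Starting salaries in $1000s (entries 6 and 8 inferred from the column totals).
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]


def sum_of_squares_shortcut(data):
    """SSx from the computational formula: sum(x^2) - (sum(x))^2 / n."""
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n


ss = sum_of_squares_shortcut(salaries)
sample_variance = ss / (len(salaries) - 1)

print(sum(salaries), sum(x * x for x in salaries))  # 415 17311
print(ss)                                           # 88.5
print(round(sample_variance, 1))                    # 9.8
```

The shortcut needs only Σx and Σx², so no deviations have to be computed, and it gives the same SSx as the defining formula.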

74 Example: Using Technology to Find the Standard Deviation
Sample office rental rates (in dollars per square foot per year) for Miami’s central business district are shown in the table. Use a calculator or a computer to find the mean rental rate and the sample standard deviation. (Adapted from: Cushman & Wakefield Inc.) Office Rental Rates 35.00 33.50 37.00 23.75 26.50 31.25 36.50 40.00 32.00 39.25 37.50 34.75 37.75 37.25 36.75 27.00 35.75 26.00 29.00 40.50 24.50 33.00 38.00 Larson/Farber 4th ed.

75 Solution: Using Technology to Find the Standard Deviation
Sample Mean Sample Standard Deviation Larson/Farber 4th ed.

76 Interpreting Standard Deviation
Standard deviation is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation. Larson/Farber 4th ed.

77 Coefficient of Variation
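The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean: CV = (s / x̄) ∙ 100% for a sample, and CV = (σ / μ) ∙ 100% for a population. Notice that the numerator and denominator in the definition of CV have the same units, so CV itself has no units of measurement.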

78 Coefficient of Variation
This gives us the advantage of being able to directly compare the variability of two different populations using the coefficient of variation. In the next example, we will compute the CV of a population and of a sample and then compare the results. Larson/Farber 4th ed.

79 Exercise 4 – Coefficient of Variation
The Trading Post on Grand Mesa is a small, family-run store in a remote part of Colorado. It has just eight different types of spinners for sale. The prices (in dollars) are Since the Trading Post has only eight different kinds of spinners for sale, we consider the eight data values to be the population. Larson/Farber 4th ed.

80 Exercise 4a – Coefficient of Variation
(a) Use a calculator with appropriate statistics keys to verify that for the Trading Post data, μ ≈ $2.14 and σ ≈ $0.22. Solution: Since the computation formulas for x̄ and μ are identical, most calculators provide the value of x̄ only. Use the output of this key for μ. The computation formulas for the sample standard deviation s and the population standard deviation σ are slightly different.

81 Exercise 4b – Coefficient of Variation
(b) Compute the CV of prices for the Trading Post and comment on the meaning of the result. Solution: CV = (σ / μ) ∙ 100% = (0.22 / 2.14) ∙ 100% ≈ 10.28%. Since the Trading Post is very small, it carries a small selection of spinners that are all priced similarly. The CV tells us that the standard deviation of the spinner prices is only 10.28% of the mean.
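
A small Python sketch of the CV calculation, using the rounded Trading Post values from part (a). The function name is an assumption.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


# Trading Post spinner prices: mu is about $2.14, sigma is about $0.22.
cv = coefficient_of_variation(0.22, 2.14)
print(round(cv, 2))  # 10.28
```

Because the units cancel, the same function can compare the spread of two data sets measured in different units.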

82 Chebychev’s Theorem
The portion of any data set lying within k standard deviations (k > 1) of the mean is at least 1 – 1/k². k = 2: In any data set, at least 1 – 1/2² = 3/4, or 75%, of the data lie within 2 standard deviations of the mean. k = 3: In any data set, at least 1 – 1/3² = 8/9, or 88.9%, of the data lie within 3 standard deviations of the mean.

83 Chebychev’s Theorem
The portion of any data set lying within k standard deviations (k > 1) of the mean is at least 1 – 1/k²:
k          2     3       4       5     10
At least   75%   88.9%   93.8%   96%   99%
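
The table values follow directly from 1 – 1/k². A minimal Python sketch (the function name is an assumption) reproduces them:

```python
def chebychev_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4, 5, 10):
    # Print each bound as a percentage with one decimal place.
    print(k, f"{chebychev_bound(k) * 100:.1f}%")
```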

84 Example: Using Chebychev’s Theorem
The age distribution for Florida is shown in the histogram. Apply Chebychev’s Theorem to the data using k = 2. What can you conclude? Larson/Farber 4th ed.

85 Solution: Using Chebychev’s Theorem
k = 2: μ – 2σ = 39.2 – 2(24.8) = –10.4 (use 0 since age can’t be negative). μ + 2σ = 39.2 + 2(24.8) = 88.8. At least 75% of the population of Florida is between 0 and 88.8 years old.

86 Exercise 5: Using Chebychev’s Theorem
A newspaper periodically runs an ad in its own advertising section offering a free month’s subscription. Over a period of two years the mean number of responses was 525 with a sample standard deviation of s = 30. a) What is the smallest percentage of data we expect to fall within 2 standard deviations of the mean (i.e. between 465 and 585)? Answer: 75%. b) Determine the interval from A to B about the mean in which at least 88.9% of the data fall. Answer: 88.9% corresponds to k = 3, so the interval runs from 525 – 3(30) = 435 to 525 + 3(30) = 615.

87 Exercise 5: Using Chebychev’s Theorem
A newspaper periodically runs an ad in its own advertising section offering a free month’s subscription. Over a period of two years the mean number of responses was 525 with a sample standard deviation of s = 30. c) What is the smallest percent of respondents to the ad that falls within 2.5 standard deviations of the mean? d) What is the interval from A to B from part c? Explain its meaning in this application.
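
Parts of this exercise can be checked with a short Python sketch that returns both the interval and the Chebychev bound for a given k. The function name and the return layout are assumptions; the mean of 525 and standard deviation of 30 come from the exercise.

```python
def chebychev_interval(mean, std_dev, k):
    """Return (A, B, bound): the interval mean ± k*std_dev and the minimum
    fraction of the data that Chebychev's Theorem places inside it."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    bound = 1 - 1 / k ** 2
    return mean - k * std_dev, mean + k * std_dev, bound


for k in (2, 2.5, 3):
    low, high, bound = chebychev_interval(525, 30, k)
    print(f"k = {k}: at least {bound * 100:.1f}% of responses between {low} and {high}")
```

For k = 2.5 this gives at least 84% of the data between 450 and 600 responses.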

88 Standard Deviation for Grouped Data: Defining Formula
Sample standard deviation for a frequency distribution: s = √(Σ(x – x̄)²f / (n – 1)), where n = Σf (the number of entries in the data set) and x̄ = Σ(xf) / n. When a frequency distribution has classes, estimate the sample mean and standard deviation by using the midpoint of each class.

89 Exercise 6: Finding the Standard Deviation for Grouped Data
You collect a random sample of the number of children per household in a region. Find the sample mean and the sample standard deviation of the data set. Number of Children in 50 Households 1 3 2 5 6 4 Larson/Farber 4th ed.

90 Solution: Finding the Standard Deviation for Grouped Data
First construct a frequency distribution. Find the mean of the frequency distribution.
x   f    xf
0   10   0(10) = 0
1   19   1(19) = 19
2   7    2(7) = 14
3   7    3(7) = 21
4   2    4(2) = 8
5   1    5(1) = 5
6   4    6(4) = 24
Σf = 50   Σ(xf) = 91
x̄ = Σ(xf) / n = 91 / 50 ≈ 1.8. The sample mean is about 1.8 children.

91 Solution: Finding the Standard Deviation for Grouped Data
Determine the sum of squares.
x   f    x – x̄            (x – x̄)²         (x – x̄)²f
0   10   0 – 1.8 = –1.8   (–1.8)² = 3.24   3.24(10) = 32.40
1   19   1 – 1.8 = –0.8   (–0.8)² = 0.64   0.64(19) = 12.16
2   7    2 – 1.8 = 0.2    (0.2)² = 0.04    0.04(7) = 0.28
3   7    3 – 1.8 = 1.2    (1.2)² = 1.44    1.44(7) = 10.08
4   2    4 – 1.8 = 2.2    (2.2)² = 4.84    4.84(2) = 9.68
5   1    5 – 1.8 = 3.2    (3.2)² = 10.24   10.24(1) = 10.24
6   4    6 – 1.8 = 4.2    (4.2)² = 17.64   17.64(4) = 70.56
Σ(x – x̄)²f = 145.40

92 Solution: Finding the Standard Deviation for Grouped Data
Find the sample standard deviation: s = √(Σ(x – x̄)²f / (n – 1)) = √(145.40 / 49) ≈ 1.7. The standard deviation is about 1.7 children.
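
The grouped calculation can be sketched in Python from the frequency distribution alone. The function name and the dictionary layout (value mapped to frequency) are assumptions. The code keeps the unrounded mean of 1.82 rather than the rounded 1.8 used on the slides, which changes the sum of squares slightly but gives the same result to one decimal place.

```python
import math

# Number of children per household mapped to its frequency (50 households).
children = {0: 10, 1: 19, 2: 7, 3: 7, 4: 2, 5: 1, 6: 4}


def grouped_sample_stats(freq):
    """Return (n, mean, sample standard deviation) for a frequency distribution."""
    n = sum(freq.values())
    if n < 2:
        raise ValueError("a sample needs at least two entries")
    mean = sum(x * f for x, f in freq.items()) / n
    # Weight each squared deviation by how often the value occurs.
    ss = sum((x - mean) ** 2 * f for x, f in freq.items())
    return n, mean, math.sqrt(ss / (n - 1))


n, mean, std_dev = grouped_sample_stats(children)
print(n, round(mean, 1), round(std_dev, 1))  # 50 1.8 1.7
```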

93 Standard Deviation for Grouped Data: Computational Formula
SSx = Σ(x²f) – (Σxf)² / n, so the sample standard deviation is s = √(SSx / (n – 1)), where n = Σf.

94 Example: Sample Variance of a Frequency Distribution – Computational Formula
Lower Limit   Upper Limit   Midpoint x   f    x²f        xf
7             18            12.5         6    937.5      75
19            30            24.5         10   6002.5     245
31            42            36.5         13   17319.25   474.5
43            54            48.5         8    18818      388
55            66            60.5         5    18301.25   302.5
67            78            72.5         6    31537.5    435
79            90            84.5         2    14280.5    169
TOTAL                                    50   107196.5   2089

95 Summary Determined the range of a data set
Determined the variance and standard deviation of a population and of a sample Used the Empirical Rule and Chebychev’s Theorem to interpret standard deviation Approximated the sample standard deviation for grouped data Larson/Farber 4th ed.

96 Measures of Position Box-and-Whisker Plots
Section 3.3 Measures of Position Box-and-Whisker Plots Larson/Farber 4th ed.

97 Objectives Determine the quartiles of a data set
Interpret other fractiles such as percentiles Determine the interquartile range of a data set Create a box-and-whisker plot Larson/Farber 4th ed.

98 Percentiles
A percentile measures the position of a single data item based on the percentage of data items below that single data item. Standardized tests taken by large numbers of students convert raw scores to percentile scores. If approximately n percent of the items in a distribution are less than the number x, then x is the nth percentile of the distribution, denoted Pn.

99 Quartiles Fractiles are numbers that partition (divide) an ordered data set into equal parts. Quartiles approximately divide an ordered data set into four equal parts. First quartile, Q1: About one quarter of the data fall on or below Q1. Second quartile, Q2: About one half of the data fall on or below Q2 (median). Third quartile, Q3: About three quarters of the data fall on or below Q3. Larson/Farber 4th ed.

100 Percentiles and Other Fractiles
Summary Symbols Quartiles Divides data into 4 equal parts Q1, Q2, Q3 Deciles Divides data into 10 equal parts D1, D2, D3,…, D9 Percentiles Divides data into 100 equal parts P1, P2, P3,…, P99 Larson/Farber 4th ed.

102 Example: Interpreting Percentiles
The ogive represents the cumulative frequency distribution for SAT test scores of college-bound students in a recent year. What test score represents the 72nd percentile? How should you interpret this? (Source: College Board Online) Larson/Farber 4th ed.

103 Solution: Interpreting Percentiles
The 72nd percentile corresponds to a test score of 1700. This means that 72% of the students had an SAT score of 1700 or less.

104 Exercise 1: Interpreting Percentiles
Suppose you challenge freshman composition by taking an exam. If your score was in the 89th percentile, what percentage of scores was at or below your score? Answer: 89%. If the scores ranged from 0 to 100 and your raw score was 95, does that mean that your score is at the 95th percentile? Answer: No! A percentile score is based on position, not on the raw score.

105 Exercise 2: Finding Quartiles
The test scores of 15 employees enrolled in a CPR training course are listed. Find the first, second, and third quartiles of the test scores Solution: Q2 divides the data set into two halves. Lower half Upper half Q2 Larson/Farber 4th ed.

106 Solution: Finding Quartiles
The first and third quartiles are the medians of the lower and upper halves of the data set Lower half Upper half Q1 Q2 Q3 About one fourth of the employees scored 10 or less, about one half scored 15 or less; and about three fourths scored 18 or less. Larson/Farber 4th ed.

107 Interquartile Range Interquartile Range (IQR) The difference between the third and first quartiles. IQR = Q3 – Q1 Larson/Farber 4th ed.

108 Exercise 3: Find the Interquartile Range
Find the interquartile range of the test scores. Recall Q1 = 10, Q2 = 15, and Q3 = 18 Solution: IQR = Q3 – Q1 = 18 – 10 = 8 The test scores in the middle portion of the data set vary by at most 8 points. Larson/Farber 4th ed.
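
The quartile method used in these exercises (Q2 is the median; Q1 and Q3 are the medians of the lower and upper halves) can be sketched in Python. The function names are assumptions, and the 15 scores below are an illustrative data set chosen to reproduce the five-number summary quoted in the slides, since the original scores do not appear in the transcript. Other software may use different quartile conventions and give slightly different values.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two entries to form two halves")
    half = n // 2
    lower = ordered[:half]
    # For odd n the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# Illustrative scores (hypothetical), consistent with Min = 5 and Max = 37.
scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]

q1, q2, q3 = quartiles(scores)
iqr = q3 - q1
print(min(scores), q1, q2, q3, max(scores))  # 5 10 15 18 37
print(iqr)                                   # 8
```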

109 Box-and-Whisker Plot Box-and-whisker plot
Exploratory data analysis tool. Highlights important features of a data set. Requires (five-number summary): Minimum entry First quartile Q1 Median Q2 Third quartile Q3 Maximum entry Larson/Farber 4th ed.

110 Drawing a Box-and-Whisker Plot
Find the five-number summary of the data set. Construct a horizontal scale that spans the range of the data. Plot the five numbers above the horizontal scale. Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the box at Q2. Draw whiskers from the box to the minimum and maximum entries. Whisker Maximum entry Minimum entry Box Median, Q2 Q3 Q1 Larson/Farber 4th ed.

111 Exercise 4: Draw Box-and-Whisker Plot
Draw a box-and-whisker plot that represents the 15 test scores. Recall Min = 5 Q1 = 10 Q2 = 15 Q3 = 18 Max = 37 Solution: 5 10 15 18 37 Larson/Farber 4th ed.

112 Exercise 5: Interpret Box-and-Whisker Plot
Recall Min = 5 Q1 = 10 Q2 = 15 Q3 = 18 Max = 37 Solution: 5 10 15 18 37 About half the scores are between 10 and 18. By looking at the length of the right whisker, you can conclude 37 is a possible outlier. Larson/Farber 4th ed.

113 Summary Determined the quartiles of a data set
Interpreted other fractiles such as percentiles Determined the interquartile range of a data set Created a box-and-whisker plot Larson/Farber 4th ed.

