CHAPTER 3 – Numerical Techniques for Describing Data 3.1 Measures of Central Tendency 3.2 Measures of Variability.

1 CHAPTER 3 – Numerical Techniques for Describing Data 3.1 Measures of Central Tendency 3.2 Measures of Variability

2 Exploring Measures of Central Tendency This section discusses ways we can determine the most “typical” data values within a distribution. Typical data values give an idea of the center of the distribution (when organized from smallest to greatest). Example: If all the test scores for students who take an exam are considered, measures of central tendency will examine the average or typical test score of the distribution. The three measures of central tendency are the mean, median, and mode.

3 The Mean The common term “average” is referred to as the mean in statistics. The mean of a sample is determined by summing all the data values of the sample and dividing this sum by the total number of data values. The symbol for the sample mean is $\bar{x}$, and the formula to calculate the sample mean is: $\bar{x} = \frac{\sum x}{n}$, where $n$ is the number of data values in the sample.

4 The mean of a population is denoted by the symbol μ. The formula to calculate the mean of a finite population of size N is: $\mu = \frac{\sum x}{N}$

5 Example 3.2 page 87 A quality control inspector of a battery manufacturer would like to estimate the life expectancy of the manufacturer’s 9 volt batteries manufactured during the day shift. To estimate the life expectancy of all the batteries made during the day shift, a random sample of 20 batteries was obtained. The following data represent the battery life, in months, of the 20 batteries:
20, 11, 15, 18, 24, 17, 19, 12, 19, 22,
18, 15, 19, 21, 20, 15, 17, 24, 16, 18
Calculate the sample mean, using the appropriate symbol.
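As a rough check on Example 3.2, the sample mean can be computed in a few lines of Python. This is only a sketch: the names `battery_life`, `n`, and `x_bar` are illustrative and do not come from the slides.

```python
# Battery life in months for the 20 sampled batteries (Example 3.2).
battery_life = [20, 11, 15, 18, 24, 17, 19, 12, 19, 22,
                18, 15, 19, 21, 20, 15, 17, 24, 16, 18]

n = len(battery_life)            # sample size
x_bar = sum(battery_life) / n    # sample mean: sum of the values divided by n

print(f"n = {n}, x-bar = {x_bar} months")  # n = 20, x-bar = 18.0 months
```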

6 Example 3.3 (pg. 89) According to the US Department of Energy, the following data represent the estimated miles per gallon (mpg) for the 2006 mini-compact cars under city driving conditions: 9, 20, 20, 20, 23, 20, 22, 21, 22, 17, 17, 16, 13, 21, 22, 28, 30, 18, 18. Assuming these estimated mpg values represent the population of all mini-compact cars, find the mean of this population, using the appropriate symbol.

7 Example 3.7 page 91 Given the following ages of seven children at a playground: 5, 4, 4, 8, 4, 9, 8 (a) Compute the mean age, (b) Evaluate the sum of the deviations from the mean.

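8 Deviations from the mean (mean = 42/7 = 6):
x    x - mean
5    -1
4    -2
4    -2
8     2
4    -2
9     3
8     2
Sum of the deviations from the mean = 0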
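The property illustrated by Example 3.7, that the deviations from the mean sum to zero, can be checked with a short Python sketch. The variable names are illustrative only.

```python
# Ages of the seven children at the playground (Example 3.7).
ages = [5, 4, 4, 8, 4, 9, 8]

mean_age = sum(ages) / len(ages)             # (a) the mean age
deviations = [x - mean_age for x in ages]    # each value minus the mean
total_deviation = sum(deviations)            # (b) sum of the deviations

print(mean_age)         # 6.0
print(deviations)       # [-1.0, -2.0, -2.0, 2.0, -2.0, 3.0, 2.0]
print(total_deviation)  # 0.0
```

The negative deviations (values below the mean) exactly cancel the positive ones, which is why the plain sum of deviations cannot serve as a measure of spread.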

9 The Median The median is the middle value of a set of data values after they have been arranged in numerical order. Example 3.8 For the quiz grades 8, 7, 9, 9, 9, 6, 10, 7, 5, find the median quiz grade. First, put the numbers in ascending order: 5, 6, 7, 7, 8, 9, 9, 9, 10. The median is the middle grade: 8.

10 Example 3.8 with an additional score of 10 For the quiz grades 8, 7, 9, 9, 9, 6, 10, 7, 5, 10, find the median quiz grade. In ascending order: 5, 6, 7, 7, 8, 9, 9, 9, 10, 10. There is an even number of data values, so there is no single middle value. Is the median 8 or 9? The median is the mean of the two middle grades: (8 + 9)/2 = 8.5. Rule: If there is an even number of data values, the median is the mean of the two middle values.
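The odd and even cases of the median rule can be sketched in Python as follows. The function name `median` and the variable `quiz` are illustrative, not from the slides; Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the middle pair

quiz = [8, 7, 9, 9, 9, 6, 10, 7, 5]
print(median(quiz))          # 8   (nine grades)
print(median(quiz + [10]))   # 8.5 (ten grades)
```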

11 The Mode The mode of a data set is the data value that occurs most frequently. A data set may have no mode or may have more than one mode. A distribution is called a bi-modal distribution if it has two data values that appear with the greatest frequency. If a distribution has more than two modes, then, under the convention used in this chapter, the distribution is called non-modal and is treated as having no mode.

12 Example 3.10 page 96 Find the modal test score for the following eight test scores: 65, 75, 45, 90, 75, 68, 85, 60 Since 75 appears most frequently, it is the modal test score. Example 3.12 page 96 Find the mode for the following distribution: 5, 4, 6, 10, 6, 4, 10, 3, 8, 7 Since there are more than two data values which appear most frequently (4, 6, and 10 appear twice), we will refer to such a distribution as having no mode.
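The mode convention used in these examples can be sketched in Python. The function `textbook_mode` is a hypothetical helper written for this illustration; it assumes that a data set in which no value repeats has no mode, and it follows the slides in reporting no mode when more than two values tie for the greatest frequency.

```python
from collections import Counter

def textbook_mode(values):
    """Return the list of modes under the chapter's convention (empty list = no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []            # assumption: no value repeats, so there is no mode
    modes = sorted(v for v, c in counts.items() if c == top)
    if len(modes) > 2:
        return []            # more than two tied values: treated as having no mode
    return modes             # one mode, or two modes (bi-modal)

print(textbook_mode([65, 75, 45, 90, 75, 68, 85, 60]))   # [75]
print(textbook_mode([5, 4, 6, 10, 6, 4, 10, 3, 8, 7]))   # []
```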

13 Example 3.14 page 97 The small community of Poorich is applying for federal aid. One question on the application requests the average family income for the community. The following data are the family incomes for the five community residents: $25,000, $20,000, $200,000, $30,000, $15,000. Find the mean, median and mode. Which one depicts the typical income level accurately?

14 The mean is $58,000. The median is $25,000. The distribution has no mode. Since the mean income of $58,000 is greater than four of the five family incomes, it does not represent the typical income level of the community. The median income of $25,000 more accurately depicts the typical income level of the residents, since it is not sensitive to the one extremely high income within the community. $25,000 is closer to four of the five values: $25,000, $20,000, $30,000, $15,000.
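The Poorich comparison can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

# Family incomes of the five Poorich residents (Example 3.14).
incomes = [25_000, 20_000, 200_000, 30_000, 15_000]

mean_income = statistics.mean(incomes)       # pulled upward by the $200,000 income
median_income = statistics.median(incomes)   # middle value of the sorted incomes

# How many incomes fall below the mean?
below_mean = sum(1 for x in incomes if x < mean_income)

print(mean_income, median_income, below_mean)  # 58000 25000 4
```

Four of the five incomes sit below the mean, which is the sense in which the mean fails to describe the typical family here.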

15 The Relationship of the Mean, Median and Mode – Page 99 I. Symmetric, bell-shaped distribution: The mean, median and mode are all located at the center of the distribution. [Figure label: Mean, Median, Mode at the center]

16 Skewed to the Left Distribution Both the mean and the median have smaller values than the mode, because they are influenced by the extreme SMALL data values that occur in the left tail of the distribution. However, the median is larger than the mean, because the median is not influenced as greatly as the mean by the extreme SMALL values in the left tail.

17 Skewed to the Right Distribution Both the mean and the median have larger values than the mode, because they are influenced by the extreme LARGE data values that occur in the right tail of the distribution. However, the median is less than the mean, because the median is not influenced as greatly as the mean by the extreme LARGE values in the right tail.

18 The mean is not a resistant measure of central tendency, as it is not resistant to the influence of extreme values (outliers). ◦ For example, a few very high incomes in a distribution will have a great influence on the mean, even if most data values are lower. The median is a more resistant measure of central tendency than the mean, as its value is not significantly influenced by a few extreme data values (outliers), regardless of how large they may be. ◦ For example, a few very high incomes in a distribution will not have a significant influence on the median. The mode is the most resistant to outliers.

19 3.2 Measures of Variability What is Variability? Table 3.3 page 103
Distribution 1    Distribution 2
4                 1
4                 22
4                 24
4                 31
4                 100
In Distribution 1, there is NO VARIABILITY since all the data values are the same. In Distribution 2, there is VARIABILITY because the data values vary from 1 to 100.

20 3.2 Measures of Variability We use measures of variability in addition to measures of central tendency to describe the characteristics of a distribution. The three measures of variability are: Range, variance, and standard deviation Range: The range of a distribution is the number representing the difference between the largest and the smallest data values.

21 Example 3.16 page 103 The math test grades for two students are:
Student A: 50, 70, 70, 70, 70, 70, 90
Student B: 50, 55, 62, 70, 78, 85, 90
Compute the range for each student.
Student A: range = 40
Student B: range = 40
Examine the variability of the test grades for both students. Do you believe that the grades for both of these students have the same variability? Explain. No. While the range for both students is the same, the grades for student B vary more than the grades for student A. The variability of student B becomes noticeable when examining all the test grades for each student rather than just their extreme grades.

22 Shortcoming of the Range The range gives a quick method for describing variability, but it does not take into consideration the value of all the data values when measuring the variability of a distribution. We couldn’t see that student B had the greater variation of grades by just looking at the range.
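The shortcoming can be made concrete with the two students' grades from Example 3.16. The sketch below compares the range with the sum of squared deviations from the mean, the quantity on which the variance introduced next is built. The helper names `data_range` and `sum_squared_deviations` are illustrative.

```python
student_a = [50, 70, 70, 70, 70, 70, 90]
student_b = [50, 55, 62, 70, 78, 85, 90]

def data_range(values):
    """Largest value minus smallest value; uses only the two extremes."""
    return max(values) - min(values)

def sum_squared_deviations(values):
    """Sum of squared distances from the mean; uses every data value."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

print(data_range(student_a), data_range(student_b))   # 40 40
print(sum_squared_deviations(student_a))              # 800.0
print(sum_squared_deviations(student_b))              # 1378.0
```

Both students have the same range and the same mean (70), yet student B's grades lie farther from the mean overall.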

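23 Variance of a Sample and Population The variance of a sample of n data values is equal to the sum of the squared deviations from the mean divided by (n-1): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$. The variance of a population of N data values is the sum of the squared deviations from the population mean divided by N: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$.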

24 Calculating a Sample Variance Compute the sample variance for the numbers 3, 5, 7, 9. Step 1. Arrange the numbers in a column and find the sample mean.
x
3
5
7
9
$\bar{x} = \frac{3 + 5 + 7 + 9}{4} = 6$

25 Calculating a Sample Variance Step 2. Subtract the mean (x-bar) from each term in the given distribution.
x    x - x-bar
3    3 - 6 = -3
5    5 - 6 = -1
7    7 - 6 = 1
9    9 - 6 = 3

26 Calculating the Sample Variance Step 3. Square each result from the previous step and add up the results.
x    x - x-bar     (x - x-bar)^2
3    3 - 6 = -3    9
5    5 - 6 = -1    1
7    7 - 6 = 1     1
9    9 - 6 = 3     9
Sum of the squared deviations: 9 + 1 + 1 + 9 = 20

27 Calculating the Sample Variance Step 4. Take the result from the previous step and divide by n-1. The squared deviations 9, 1, 1, 9 sum to 20 and n = 4, so $s^2 = \frac{20}{4 - 1} \approx 6.67$.

28 The variance is in SQUARE units, but the data values are NOT. By squaring the deviations in Step 3, we also square their units. This means that if our data values were initially measured in inches, the sample variance would be in square inches. Since the units of the variance and the units of the data will always be different (inches vs. square inches), it may be difficult to use and interpret the variance as a measure of variability.

29 Standard Deviation How can we “undo” the squared units of the variance? Take the square root of the sample variance. This is the sample standard deviation. Similarly, if we take the square root of the population variance, we get the population standard deviation.

30 Sample Standard Deviation In the previous example, we found the sample variance, $s^2$, to be 6.67. The sample standard deviation is: $s = \sqrt{s^2} = \sqrt{6.67} \approx 2.58$. So, 2.58 is the typical deviation from the mean for the set of data values 3, 5, 7, 9.
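Steps 1 through 4, followed by the square root, can be traced in a short Python sketch for the data 3, 5, 7, 9. The variable names are illustrative and not taken from the slides.

```python
import math

data = [3, 5, 7, 9]
n = len(data)

x_bar = sum(data) / n                      # Step 1: sample mean
deviations = [x - x_bar for x in data]     # Step 2: subtract the mean from each value
squared = [d ** 2 for d in deviations]     # Step 3: square each deviation...
sum_sq = sum(squared)                      #         ...and add up the squares
s2 = sum_sq / (n - 1)                      # Step 4: divide by n - 1 (sample variance)
s = math.sqrt(s2)                          # square root gives the standard deviation

print(x_bar, sum_sq)                # 6.0 20.0
print(round(s2, 2), round(s, 2))    # 6.67 2.58
```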

31 Symbols we’ve used so far
Formula               Sample       Population
Size                  $n$          $N$
Mean                  $\bar{x}$    $\mu$
Variance              $s^2$        $\sigma^2$
Standard Deviation    $s$          $\sigma$

32 Use the calculator to get the sample and population standard deviation. Put the data into a list. Press STAT -> CALC -> 1 VAR STATS (enter) -> 2nd L1 -> enter. To get the variance, square the standard deviation.
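For readers working without the calculator, a rough software equivalent is Python's standard `statistics` module, where `stdev` divides by n - 1 (sample) and `pstdev` divides by N (population). The sketch below uses the data 3, 5, 7, 9 from the variance example; the variable names are illustrative.

```python
import statistics

data = [3, 5, 7, 9]

s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)   # population standard deviation (divides by N)

# As on the calculator, squaring a standard deviation gives the variance.
sample_variance = s ** 2
population_variance = sigma ** 2

print(round(s, 2), round(sigma, 2))                               # 2.58 2.24
print(round(sample_variance, 2), round(population_variance, 2))   # 6.67 5.0
```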

33 Example 3.19 page 111 Two high school women's basketball players are being considered for an award to be given to the most consistent player with the highest scoring average for the season. The number of points scored per game for all the games played during the season is given below:
Player A points per game: 5, 32, 45, 25, 8, 10, 35, 12, 37, 21
Player B points per game: 18, 20, 22, 15, 35, 24, 29, 19, 25, 23
a) Calculate the population mean and population standard deviation of the number of points scored per game for each player. b) Using the results of part (a), determine which player should receive the award.

34 Example 3.19 page 111
More consistent means smallest standard deviation.
More varied means largest standard deviation.
Greatest dispersion means largest standard deviation.

35 Example 3.19 page 111 For player A, the population mean and standard deviation are $\mu = 23$ and $\sigma \approx 13.24$. For player B, the population mean and standard deviation are $\mu = 23$ and $\sigma \approx 5.48$. Player B should receive the award for the most consistently outstanding women's scorer for the season. Both players have the same scoring average, and player B had the smaller standard deviation. The smaller standard deviation indicates that player B was more consistent, because her points per game were closer to the mean than player A's.
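The comparison in Example 3.19 can be sketched in Python. One assumption is flagged loudly here: the score lists on the slide are run together, and this sketch reads each of them as ten games (Player A as 5, 32, 45, 25, 8, 10, 35, 12, 37, 21). That reading is an inference, supported only by the fact that it gives both players the same mean. The variable names are illustrative.

```python
import statistics

# ASSUMPTION: ten games per player, split from the run-together digits on the slide.
player_a = [5, 32, 45, 25, 8, 10, 35, 12, 37, 21]
player_b = [18, 20, 22, 15, 35, 24, 29, 19, 25, 23]

# All games of the season are given, so population formulas apply (divide by N).
mu_a, sigma_a = statistics.mean(player_a), statistics.pstdev(player_a)
mu_b, sigma_b = statistics.mean(player_b), statistics.pstdev(player_b)

print(mu_a, round(sigma_a, 2))   # 23 13.24
print(mu_b, round(sigma_b, 2))   # 23 5.48

# Same scoring average, so the smaller standard deviation decides consistency.
more_consistent = "A" if sigma_a < sigma_b else "B"
print(more_consistent)           # B
```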

36 In Class Example For the distribution 58, 67, 45, 25, 33, 15, 85: a) Find the value of $\bar{x}$ and $s$. b) Compute $\bar{x} + s$. c) Find the data values that are within one standard deviation above the mean. d) Compute $\bar{x} - s$. e) Find the data values that are within one standard deviation below the mean. f) Compute $\bar{x} - 2s$ and $\bar{x} + 2s$. g) What percent of the data values are within 1 standard deviation from the mean? h) What percent of the data values are within 2 standard deviations from the mean?

37 $\bar{x} = 46.86$, $s = 24.74$
$\bar{x} + s = 71.6$, $\bar{x} - s = 22.12$
$\bar{x} - 2s = -2.62$, $\bar{x} + 2s = 96.34$
Which data values are within 1 standard deviation above the mean? 58, 67
Which data values are within 1 standard deviation below the mean? 25, 33, 45
What percent of data values are within 1 standard deviation (above and below) from the mean? 25, 33, 45, 58, 67: 5/7 * 100 = 71.43%
What percent of data values are within 2 standard deviations from the mean? 15, 25, 33, 45, 58, 67, 85: 7/7 * 100 = 100%
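The in-class example can be checked with a short Python sketch that treats the seven values as a sample, which is what the value 24.74 implies (division by n - 1). The helper `within` and the other names are illustrative.

```python
import statistics

data = [58, 67, 45, 25, 33, 15, 85]

x_bar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)

def within(k):
    """Data values lying within k standard deviations of the mean."""
    low, high = x_bar - k * s, x_bar + k * s
    return sorted(x for x in data if low <= x <= high)

one_sd = within(1)
two_sd = within(2)
pct_one = 100 * len(one_sd) / len(data)
pct_two = 100 * len(two_sd) / len(data)

print(round(x_bar, 2), round(s, 2))   # 46.86 24.74
print(one_sd, round(pct_one, 2))      # [25, 33, 45, 58, 67] 71.43
print(two_sd, round(pct_two, 2))      # [15, 25, 33, 45, 58, 67, 85] 100.0
```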

38 TEST #1 Wednesday, 2/29. Chapter 2&3 Note: Study homework examples

