1 Chapter 3 Numerically Summarizing Data. Copyright of the definitions and examples is reserved to Pearson Education, Inc. To use this PowerPoint presentation, the required textbook for the class is Fundamentals of Statistics: Informed Decisions Using Data, Michael Sullivan, III, fourth edition. Prepared by DW, Los Angeles Mission College

2 Chapter 3.1 Measures of Central Tendency. Objective A: Mean, Median, and Mode. Objective B: Relation Between the Mean, Median, and Distribution Shape.

3 Chapter 3.1 Measures of Central Tendency. Objective A: Mean, Median, and Mode. We discuss three measures of central tendency: the mean, the median, and the mode. A1. Mean. The mean of a variable is the sum of all data values divided by the number of observations. Population mean: \( \mu = \frac{\sum x_i}{N} \), where \(x_i\) is each data value and \(N\) is the population size (the number of observations in the population). Sample mean: \( \bar{x} = \frac{\sum x_i}{n} \), where \(x_i\) is each data value and \(n\) is the sample size (the number of observations in the sample).
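
The population mean and the sample mean use the same arithmetic; only the data being averaged differ. The Python sketch below uses a hypothetical population of eight values (the slide's own data did not survive extraction) and draws a simple random sample of size 4 with `random.sample`, so \(\mu\) is fixed while \(\bar{x}\) changes from run to run.

```python
import random

def arithmetic_mean(values):
    """Sum of the data values divided by the number of observations."""
    return sum(values) / len(values)

# Hypothetical population of N = 8 values (not the slide's data).
population = [12, 15, 9, 20, 18, 11, 14, 17]
mu = arithmetic_mean(population)        # population mean: always the same

# Simple random sample of size n = 4, drawn without replacement.
sample = random.sample(population, 4)
x_bar = arithmetic_mean(sample)         # sample mean: varies with the sample

print(f"mu = {mu:.2f}, sample = {sample}, x-bar = {x_bar:.2f}")
```

Running it several times shows that \(\bar{x}\) varies from sample to sample while \(\mu\) stays the same.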

4 Example 1: For the population given, compute the population mean and the sample mean from a simple random sample of size 4. Does the sample mean equal the population mean? Does the population mean or the sample mean stay the same? Explain. (a) Population mean: \( \mu = \frac{\sum x_i}{N} \). (Round the mean to one more decimal place than is used in the raw data.)

5 (b) Sample mean: The sample values were selected using a lottery method, and \( \bar{x} = \frac{\sum x_i}{n} \) with \(n = 4\). (c) Does the sample mean equal the population mean? No. (d) Does the population mean or the sample mean stay the same? Explain. The population mean, \(\mu\), stays the same; the sample mean, \(\bar{x}\), varies from sample to sample.

6 A2. Median. The median, \(M\), is the value that lies in the middle of the data when arranged in ascending order. If \(n\) is odd, the median is the data value in the middle of the data set; its location is the \( \frac{n+1}{2} \)th position. If \(n\) is even, the median is the mean of the two middle observations in the data set, which lie in the \( \frac{n}{2} \)th and \( \left(\frac{n}{2} + 1\right) \)th positions, respectively.
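
The odd/even location rule translates directly into code. A minimal Python sketch (the function name `median` is our own; the standard library also offers `statistics.median`) that sorts first, then picks the middle value or averages the two middle values:

```python
def median(data):
    """Median using the (n + 1)/2 location rule on the sorted data."""
    ordered = sorted(data)              # arrange in ascending order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2                        # 0-based index of the upper middle value
    if n % 2 == 1:
        # n odd: the single middle value, position (n + 1)/2 counting from 1
        return ordered[mid]
    # n even: mean of the values in positions n/2 and n/2 + 1 (counting from 1)
    return (ordered[mid - 1] + ordered[mid]) / 2

prices = [35.34, 42.09, 38.72, 43.28, 39.45, 49.36, 30.15, 40.88]
print(median(prices))   # mean of the 4th and 5th ordered prices
```

With the eight prices from Example 2 below, the function averages $39.45 and $40.88.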

7 Example 1: Find the median of the data given below. After reordering the data in ascending order, the location of the median is the \( \frac{n+1}{2} = \frac{9+1}{2} = \frac{10}{2} = 5 \)th position, so the median is 18.

8 Example 2: Find the median of the data given below. $35.34 $42.09 $38.72 $43.28 $39.45 $49.36 $30.15 $40.88 Reorder: $30.15 $35.34 $38.72 $39.45 $40.88 $42.09 $43.28 $49.36. Since \(n = 8\) is even, the median lies between the \( \frac{n}{2} \)th and \( \left(\frac{n}{2} + 1\right) \)th positions, that is, between the \( \frac{8}{2} = 4 \)th and \( \frac{8}{2} + 1 = 5 \)th positions. The median is \( M = \frac{39.45 + 40.88}{2} = 40.165 \), or $40.165.

9 A3. Mode. The mode is the most frequent observation in the data set. Example 1: Find the mode of the data given below. Reorder: Mode = 60 and 80. Example 2: Find the mode of the data given below. A C D C B C A B B F B W F D B W D A D C D Reorder: A A A B B B B B C C C C D D D D D F F W W. Mode = B and D (each occurs five times, so the data are bimodal).
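
Because a data set can have more than one mode (or none), a helper should return every value tied for the highest frequency. A Python sketch using `collections.Counter`; the function name `modes` is our own, and it follows the convention used in these slides that when no value repeats there is no mode:

```python
from collections import Counter

def modes(data):
    """All values tied for the highest frequency; [] when no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []                       # every value occurs once: no mode
    return sorted(value for value, count in counts.items() if count == top)

grades = "A C D C B C A B B F B W F D B W D A D C D".split()
print(modes(grades))   # ['B', 'D']
```

The same function works for qualitative data such as these letter grades, which is where the mode is most useful.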

10 Example 3: The following data represent the G.P.A. of 12 students. Find the mean, median, and mode G.P.A. Reorder: (a) Mean: \( \bar{x} = \frac{\sum x_i}{n} \) with \(n = 12\).

11 (b) Median: Since \(n = 12\) is even, the median lies between the \( \frac{12}{2} = 6 \)th and \( \frac{12}{2} + 1 = 7 \)th positions of the reordered data, so it is the mean of the 6th and 7th values. (c) Mode: None.

13 Objective B: Relation Between the Mean, Median, and Distribution Shape
- The mean is sensitive to extreme values. For continuous data with a bell-shaped (symmetric) distribution, the mean is the better measure of central tendency because it uses every data value in the data set.
- The median is resistant to extreme values. For continuous data with a distribution skewed to the right or left, the median is the better measure of central tendency.
- The mode is used as the measure of central tendency for qualitative data.
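
A quick way to see the difference in resistance: replace one value with an extreme one and recompute both measures. The data below are hypothetical, chosen only for illustration, and the sketch uses Python's standard `statistics.mean` and `statistics.median`.

```python
from statistics import mean, median

# Hypothetical data, used only to illustrate sensitivity to extreme values.
data = [10, 12, 13, 14, 15]
with_outlier = [10, 12, 13, 14, 150]   # largest value replaced by an extreme one

print(mean(data), median(data))                  # 12.8 13
print(mean(with_outlier), median(with_outlier))  # 39.8 13
```

One extreme value moves the mean from 12.8 to 39.8, while the median stays at 13.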

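14 Mean or Median versus Skewness. Figure: relative positions of the mean and median for skewed-left, symmetric, and skewed-right distributions. Typically, the mean is less than the median when the distribution is skewed left, roughly equal to the median when the distribution is symmetric, and greater than the median when the distribution is skewed right.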

15 Chapter 3.2 Measures of Dispersion. Objective A: Range, Variance, and Standard Deviation. Objective B: Empirical Rule. Objective C: Chebyshev’s Inequality.

16 Chapter 3.2 Measures of Dispersion (Part I). Objective A: Range, Variance, and Standard Deviation. A measure of dispersion is a numerical measure that quantifies the spread of data. In this section, we discuss three numerical measures of dispersion: the range, the variance, and the standard deviation. In a later section, we will discuss another measure of dispersion, the interquartile range (IQR). A1. Range. \( R = \text{largest data value} - \text{smallest data value} \). The range is not resistant because it is affected by extreme values in the data set.

17 A2. Variance and Standard Deviation. The standard deviation is based on the deviations about the mean. Since the sum of the deviations about the mean is always zero, we cannot use the average deviation about the mean as a measure of spread; we use the average squared deviation (the variance) instead. The population variance, \(\sigma^2\), of a variable is the sum of the squared deviations about the population mean, \(\mu\), divided by the number of observations in the population, \(N\). Definition formula: \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \). Computational formula: \( \sigma^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N} \).

18 The sample variance, \(s^2\), of a variable is the sum of the squared deviations about the sample mean, \(\bar{x}\), divided by the number of observations in the sample minus 1, \(n - 1\). Definition formula: \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \). Computational formula: \( s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} \).

19 To use the sample variance as an unbiased estimate of the population variance, we divide the sum of the squared deviations about the sample mean by \(n - 1\). We call \(n - 1\) the degrees of freedom because the first \(n - 1\) observations are free to take any value, but the \(n\)th value has no freedom: it must force \( \sum (x_i - \bar{x}) \) to be zero. The sample standard deviation, \(s\), is the square root of the sample variance: \( s = \sqrt{s^2} \). The population standard deviation, \(\sigma\), is the square root of the population variance: \( \sigma = \sqrt{\sigma^2} \). To avoid round-off error, never use the rounded value of the variance to compute the standard deviation; keep a few extra decimal places in intermediate calculations.
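
The definition and computational formulas give the same result; the only choices are the divisor (\(N\) for a population, \(n - 1\) for a sample) and whether deviations are computed explicitly. A Python sketch (function names are our own) applied to the population and sample from Examples 1 and 2 below, taking the square root of the unrounded variance as advised above:

```python
import math

def variance_definition(data, sample=True):
    """Sum of squared deviations about the mean, divided by n - 1 or N."""
    n = len(data)
    center = sum(data) / n
    ss = sum((x - center) ** 2 for x in data)
    return ss / (n - 1 if sample else n)

def variance_computational(data, sample=True):
    """Same quantity via sum(x^2) - (sum x)^2 / n, with no explicit deviations."""
    n = len(data)
    ss = sum(x * x for x in data) - sum(data) ** 2 / n
    return ss / (n - 1 if sample else n)

population = [4, 10, 12, 13, 21]
sample = [83, 65, 91, 84]

sigma2 = variance_definition(population, sample=False)
s2 = variance_definition(sample)
# Keep full precision: take the square root of the unrounded variance.
print(sigma2, math.sqrt(sigma2))
print(s2, variance_computational(sample), math.sqrt(s2))
```

Python's standard library also provides `statistics.pvariance` and `statistics.variance` for the population and sample versions.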

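20 Example 1: Use the definition formula to find the population variance and standard deviation. Population: 4, 10, 12, 13, 21. Definition formula: \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \), where \( \mu = \frac{4 + 10 + 12 + 13 + 21}{5} = \frac{60}{5} = 12 \). Population variance: \( \sigma^2 = \frac{(4-12)^2 + (10-12)^2 + (12-12)^2 + (13-12)^2 + (21-12)^2}{5} = \frac{64 + 4 + 0 + 1 + 81}{5} = \frac{150}{5} = 30 \). Population standard deviation: \( \sigma = \sqrt{30} \approx 5.5 \).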

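21 Example 2: Use the definition formula to find the sample variance and standard deviation. Sample: 83, 65, 91, 84. Sample mean: \( \bar{x} = \frac{83 + 65 + 91 + 84}{4} = \frac{323}{4} = 80.75 \). Definition formula: \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \). Sample variance: \( s^2 = \frac{(2.25)^2 + (-15.75)^2 + (10.25)^2 + (3.25)^2}{4 - 1} = \frac{368.75}{3} \approx 122.9167 \). Sample standard deviation: \( s = \sqrt{122.9167} \approx 11.1 \).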

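22 Example 3: Use the computational formula to find the sample variance and standard deviation. Sample: 83, 65, 91, 84 (same data set as Example 2). Computational formula (sample variance): \( s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{26451 - \frac{323^2}{4}}{3} = \frac{26451 - 26082.25}{3} = \frac{368.75}{3} \approx 122.9167 \). Sample standard deviation: \( s = \sqrt{122.9167} \approx 11.1 \).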

23 Example 4: Use StatCrunch to find the sample variance and standard deviation. Sample: 83, 65, 91, 84 (same data set as Example 2). Step 1: Click the StatCrunch navigation button on the Course Home page → Click StatCrunch website → Click Open StatCrunch → Enter the raw data in the var1 column → Click Stat → Click Summary Stats → Columns

24 Step 2: Click var1 under Select column(s): → Under Statistics:, choose Variance and Std. dev. (click them while holding the Ctrl key on the keyboard) → Click Compute!

25 The variance and standard deviation are computed. Note: For a small data set, students are expected to calculate the standard deviation by hand. For more detailed instructions, please download “Q” by clicking the StatCrunch Handout navigation button on the course homepage.

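27 Objective B: Empirical Rule. If a distribution is bell-shaped, then approximately 68% of the data lie within 1 standard deviation of the mean, approximately 95% lie within 2 standard deviations of the mean, and approximately 99.7% lie within 3 standard deviations of the mean.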

28 Figure: the Empirical Rule illustrated on a bell-shaped curve.

29 Example 1: SAT Math scores have a bell-shaped distribution with a mean of 515 and a standard deviation of 114. (Source: College Board, 2007) (a) What percentage of SAT scores are between 401 and 629? Since \(401 = 515 - 114\) and \(629 = 515 + 114\), these scores lie 1 standard deviation below and above the mean. According to the Empirical Rule, approximately 68% of the data lie within 1 standard deviation of the mean, so approximately 68% of SAT scores are between 401 and 629.

30 Example 1: (b) What percentage of SAT scores are between 287 and 743? Since \(287 = 515 - 2(114)\) and \(743 = 515 + 2(114)\), these scores lie 2 standard deviations below and above the mean. According to the Empirical Rule, approximately 95% of the data lie within 2 standard deviations of the mean, so approximately 95% of SAT scores are between 287 and 743.

31 Example 1: (c) What percentage of SAT scores are less than 401 or greater than 629? These scores lie more than 1 standard deviation from the mean, so the percentage is \(100\% - 68\% = 32\%\).

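32 Example 1: (d) What percentage of SAT scores are between 515 and 743? This interval runs from the mean to 2 standard deviations above the mean, which by symmetry is half of the middle 95%: \( \frac{95\%}{2} = 47.5\% \).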

33 Example 1: (e) About 99.7% of SAT scores will be between what scores? According to the Empirical Rule, approximately 99.7% of the data lie within 3 standard deviations of the mean: \(515 - 3(114) = 173\) and \(515 + 3(114) = 857\). So about 99.7% of SAT scores are between 173 and 857.
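
The intervals used in parts (a), (b), and (e) all have the form mean ± k standard deviations. A small Python sketch (the function name is our own) that pairs each interval with its Empirical Rule percentage; it is only meaningful when the distribution is bell-shaped, as assumed for SAT Math scores here:

```python
def empirical_rule_intervals(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3 with the Empirical Rule percentages."""
    approx_pct = {1: 68, 2: 95, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in approx_pct.items()}

for k, (low, high, pct) in empirical_rule_intervals(515, 114).items():
    print(f"about {pct}% of SAT Math scores between {low} and {high} ({k} SD)")
```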

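35 Objective C: Chebyshev’s Inequality. For any data set or distribution, at least \( \left(1 - \frac{1}{k^2}\right) \times 100\% \) of the observations lie within \(k\) standard deviations of the mean, where \(k\) is any number greater than 1.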

36 Example 1: According to the U.S. Census Bureau, the mean commute time to work for a resident of Boston, Massachusetts, is 27.3 minutes. Assume that the standard deviation of the commute time is 8.1 minutes to answer the following: (a) What minimum percentage of commuters in Boston have a commute time within 2 standard deviations of the mean? According to Chebyshev’s Inequality, at least \( \left(1 - \frac{1}{2^2}\right) \times 100\% = 75\% \) of commuters have a commute time within 2 standard deviations of the mean, that is, between \(27.3 - 2(8.1) = 11.1\) and \(27.3 + 2(8.1) = 43.5\) minutes.

37 Example 1: (b) (i) What minimum percentage of commuters in Boston have a commute time within 1.5 standard deviations of the mean? (ii) What are the commute times within 1.5 standard deviations of the mean? (i) According to Chebyshev’s Inequality, at least \( \left(1 - \frac{1}{1.5^2}\right) \times 100\% \approx 55.6\% \) of the data lie within 1.5 standard deviations of the mean. (ii) Since \(27.3 - 1.5(8.1) = 15.15\) and \(27.3 + 1.5(8.1) = 39.45\), at least 55.6% of commuters in Boston have a commute time between 15.15 minutes and 39.45 minutes.
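
Chebyshev’s bound depends only on \(k\), so it applies to any distribution, whatever its shape. A Python sketch (the function name is our own) that reproduces both parts of the Boston commute example:

```python
def chebyshev_min_fraction(k):
    """Lower bound 1 - 1/k^2 on the fraction of data within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return 1 - 1 / k ** 2

mean, sd = 27.3, 8.1   # Boston commute times, in minutes
for k in (2, 1.5):
    low, high = mean - k * sd, mean + k * sd
    print(f"k = {k}: at least {chebyshev_min_fraction(k):.1%} "
          f"between {low:.2f} and {high:.2f} minutes")
```

Note that the bound is a minimum: for a bell-shaped distribution the actual percentages are larger (compare 75% here with the Empirical Rule’s 95% for \(k = 2\)).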

38 Chapter 3.3 Measures of Central Tendency and Dispersion from Grouped Data. In this section, we learn how to calculate the mean, \(\bar{x}\), and the weighted mean, \(\bar{x}_w\), from data that have already been summarized in frequency distributions (grouped data). Class midpoint = (sum of consecutive lower class limits) ÷ 2. Since the raw data cannot be retrieved from a frequency table, the class midpoint is used to approximate the mean of the data values within each class.

39 Chapter 3.3 Measures of Central Tendency and Dispersion from Grouped Data. Objective A: Approximate the sample mean of a variable from grouped data. Objective B: The weighted mean, \(\bar{x}_w\).

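40 Objective A: Approximate the sample mean of a variable from grouped data. Sample mean: \( \bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{f_1 + f_2 + \cdots + f_n} \), where \(x_i\) is the midpoint of the \(i\)th class, \(f_i\) is the frequency of the \(i\)th class, and \(n\) is the number of classes.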
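
A Python sketch of the grouped-data mean. It assumes equal class widths, so each midpoint is the average of a lower limit and the next consecutive lower limit. The frequency table is hypothetical because the slide’s test-score table did not survive extraction.

```python
def grouped_mean(lower_limits, class_width, frequencies):
    """Approximate x-bar = sum(x_i * f_i) / sum(f_i) using class midpoints x_i."""
    # Midpoint = (lower limit + next consecutive lower limit) / 2,
    # assuming every class has the same width.
    midpoints = [(low + (low + class_width)) / 2 for low in lower_limits]
    total = sum(frequencies)
    return sum(x * f for x, f in zip(midpoints, frequencies)) / total

# Hypothetical test-score table: classes 40-49, 50-59, 60-69, 70-79, 80-89.
lower_limits = [40, 50, 60, 70, 80]
frequencies = [2, 5, 8, 6, 3]
print(grouped_mean(lower_limits, 10, frequencies))   # 66.25
```

The result is only an approximation: every score in a class is treated as if it were the class midpoint.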

41 Example 1: The following frequency distribution represents the second test scores of my Math 227 class from last semester. Approximate the mean score.

42 The mean score: From the previous slide,

44 Objective B: The weighted mean, \(\bar{x}_w\). We compute the weighted mean when data values are not weighted equally: \( \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \), where \(w_i\) is the weight of the \(i\)th observation and \(x_i\) is the value of the \(i\)th observation.

45 Example 1: Michael and Kevin want to buy nuts. They can't agree on whether they want peanuts, cashews, or almonds, so they agree to create a mix. They bought 2.5 pounds of peanuts for $1.30 per pound, 4 pounds of cashews for $4.50 per pound, and 2 pounds of almonds for $3.75 per pound. Determine the price per pound of the mix. The price per pound of the mix: \( \bar{x}_w = \frac{2.5(1.30) + 4(4.50) + 2(3.75)}{2.5 + 4 + 2} = \frac{3.25 + 18 + 7.5}{8.5} = \frac{28.75}{8.5} \approx 3.38 \), so the mix costs about $3.38 per pound.

46 Example 2: In Marissa's calculus course, attendance counts for 5% of the grade, quizzes count for 10%, exams count for 60%, and the final exam counts for 25%. Marissa had a 100% average for attendance, 93% for quizzes, 86% for exams, and 85% on the final. Determine Marissa's course average. Marissa’s course average: \( \bar{x}_w = \frac{0.05(100) + 0.10(93) + 0.60(86) + 0.25(85)}{0.05 + 0.10 + 0.60 + 0.25} = \frac{5 + 9.3 + 51.6 + 21.25}{1} = 87.15 \), so her course average is 87.15%.
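
Both examples are the same computation with different weights: pounds purchased in the first, grade percentages in the second. A Python sketch (the function name is our own) that reproduces both results:

```python
def weighted_mean(values, weights):
    """x-bar_w = sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Nut mix: price per pound, weighted by the pounds bought.
price_per_lb = weighted_mean([1.30, 4.50, 3.75], [2.5, 4, 2])
# Course grade: component averages, weighted by their share of the grade.
course_avg = weighted_mean([100, 93, 86, 85], [0.05, 0.10, 0.60, 0.25])
print(f"{price_per_lb:.2f} {course_avg:.2f}")   # 3.38 87.15
```

Dividing by the sum of the weights means the weights need not add to 1; in the grade example they do, so the denominator is 1.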

47 Ch 3.4 Measures of Position and Outliers. Objective A: z-scores. Objective B: Percentiles and Quartiles. Objective C: Outliers.

48 Ch 3.4 Measures of Position and Outliers. Measures of position determine the relative position of a certain data value within the entire set of data. Objective A: z-scores. The z-score represents the distance that a data value is from the mean in terms of the number of standard deviations. Population z-score: \( z = \frac{x - \mu}{\sigma} \). Sample z-score: \( z = \frac{x - \bar{x}}{s} \).

49 Example 1: The average 20- to 29-year-old man is 69.6 inches tall, with a standard deviation of 3.0 inches, while the average 20- to 29-year-old woman is 64.1 inches tall, with a standard deviation of 3.8 inches. Who is relatively taller, a 67-inch man or a 62-inch woman? Man: \( z = \frac{67 - 69.6}{3.0} \approx -0.87 \), so he is 0.87 standard deviation below the mean. Woman: \( z = \frac{62 - 64.1}{3.8} \approx -0.55 \), so she is 0.55 standard deviation below the mean. Therefore, the 62-inch woman is relatively taller than the 67-inch man.
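
Comparing z-scores puts two differently scaled groups on a common ruler. A Python sketch (the function name is our own) that reproduces the height comparison above:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean: (x - mean) / sd."""
    return (x - mean) / sd

z_man = z_score(67, 69.6, 3.0)     # about -0.87
z_woman = z_score(62, 64.1, 3.8)   # about -0.55
# The larger z-score is relatively taller within its own group.
taller = "woman" if z_woman > z_man else "man"
print(round(z_man, 2), round(z_woman, 2), taller)
```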

51 Objective B: Percentiles and Quartiles. B1. Percentiles. The \(k\)th percentile, \(P_k\), of a set of data is a value such that \(k\) percent of the observations are less than or equal to the value. Example 1: Explain the meaning of the statement “The 5th percentile of the weight of males 36 months of age is 12.0 kg.” 5% of 36-month-old males weigh 12.0 kg or less, and 95% of 36-month-old males weigh more than 12.0 kg.

52 The most common percentiles are quartiles. The first quartile, \(Q_1\), is equivalent to \(P_{25}\). The second quartile, \(Q_2\), is equivalent to \(P_{50}\), the median. The third quartile, \(Q_3\), is equivalent to \(P_{75}\).

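53 Example 2: Determine the quartiles of the following data. Arrange the data in ascending order; \(Q_2\) is the median, \(Q_1\) is the median of the lower half of the data, and \(Q_3\) is the median of the upper half. Ascending order: Lower half of the data: Upper half of the data: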
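
The “lower half / upper half” layout of this example suggests the halves method for quartiles. The Python sketch below assumes that when \(n\) is odd the median is left out of both halves, a common textbook convention. Software (including StatCrunch or `statistics.quantiles`) may use a different interpolation rule, so its quartiles can differ slightly. Function names are our own.

```python
def median_of(sorted_vals):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(data):
    """(Q1, Q2, Q3): Q2 is the median; Q1 and Q3 are medians of the halves.
    When n is odd, the median itself is excluded from both halves."""
    s = sorted(data)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = s[: n // 2]             # values below the median position
    upper = s[(n + 1) // 2 :]       # values above the median position
    return median_of(lower), median_of(s), median_of(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # Q1, Q2, Q3
```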

54 B2. Interquartile Range. The interquartile range, IQR, is the measure of dispersion based on quartiles: \( \text{IQR} = Q_3 - Q_1 \). The range and standard deviation are affected by extreme values; the IQR is resistant to extreme values.

55 Example 1: One variable that is measured by online homework systems is the amount of time a student spends on homework for each section of the text. The following is a summary of the number of minutes students spent on each section of the text during the fall 2007 semester in a College Algebra class at Joliet Junior College. (a) Provide an interpretation of these results. \(Q_1 = 42\): 25% of the students spend 42 minutes or less on homework for each section, and 75% of the students spend more than 42 minutes. \(Q_2 = 51.5\): 50% of the students spend 51.5 minutes or less on homework for each section, and 50% of the students spend more than 51.5 minutes. \(Q_3 = 72.5\): 75% of the students spend 72.5 minutes or less on homework for each section, and 25% of the students spend more than 72.5 minutes.

56 (b) Determine and interpret the interquartile range. \( \text{IQR} = Q_3 - Q_1 = 72.5 - 42 = 30.5 \) minutes: the middle 50% of all students have a range of 30.5 minutes of time spent on homework. (c) Do you believe that the distribution of time spent doing homework is skewed or symmetric? Why? Skewed right, because the difference between \(Q_1\) and \(Q_2\) (\(51.5 - 42 = 9.5\)) is less than the difference between \(Q_2\) and \(Q_3\) (\(72.5 - 51.5 = 21\)).

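59 Objective C: Outliers. Extreme observations are called outliers; they may result from errors in measurement or data entry, or from errors in sampling. To check for outliers using quartiles, compute the lower fence, \( Q_1 - 1.5(\text{IQR}) \), and the upper fence, \( Q_3 + 1.5(\text{IQR}) \); a data value less than the lower fence or greater than the upper fence is considered an outlier.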
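
The quartile check for outliers uses a lower fence at \( Q_1 - 1.5(\text{IQR}) \) and an upper fence at \( Q_3 + 1.5(\text{IQR}) \). A Python sketch (function names are our own) that takes the quartiles as inputs, so it works with quartiles found by hand or by software. The sample call uses the homework-time quartiles from the previous example; the data list in the usage line is hypothetical.

```python
def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data, q1, q3):
    """Values lying below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Quartiles from the homework-time example (Q1 = 42, Q3 = 72.5 minutes).
print(fences(42, 72.5))   # (-3.75, 118.25)
```

For example, `outliers([10, 50, 130], 42, 72.5)` flags only 130, since it exceeds the upper fence of 118.25 minutes.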

60 Example 1: The following data represent the hemoglobin (in g/dL) for 20 randomly selected cats. (Source: Joliet Junior College Veterinarian Technology Program) (a) Determine the quartiles. Ascending order:

61 (b) Compute and interpret the interquartile range, IQR. Lower half of the data: Upper half of the data:

62 (c) Determine the lower and upper fences. Are there any outliers, according to this criterion? Ascending order of the original data: All data fall within the fences (the lower fence is 6.23) except one value, which is an outlier.

63 Ch 3.5 The Five-Number Summary and Boxplots. Objective A: The Five-Number Summary. Objective B: Boxplots. Objective C: Using a Boxplot to describe the shape of a distribution.

64 Ch 3.5 The Five-Number Summary and Boxplots. Objective A: The Five-Number Summary. The five-number summary of a set of data consists of the minimum, \(Q_1\), \(M\), \(Q_3\), and maximum.

65 Example 1: The number of chocolate chips in 21 randomly selected name-brand cookies was recorded. The results are shown. Find the five-number summary. Ascending order: Lower half of the data: Upper half of the data: Five-number summary: Minimum = 19, \(Q_1 = 22.5\), \(M = 25\), \(Q_3 = 28.5\), Maximum = 33.

67 Objective B: Boxplots. The five-number summary can be used to construct a graph called a boxplot.

68 Example 1: A stockbroker recorded the number of clients she saw each day over an 11-day period. The data are shown. Draw a boxplot. Ascending order: Since all data fall between the lower fence, 12, and the upper fence, 60, there are no outliers.

70 Objective C: Using a Boxplot to describe the shape of a distribution

71 Example 1: Use the side-by-side boxplots shown to answer the questions that follow. (a) To the nearest integer, what is the median of variable ? (b) To the nearest integer, what is the first quartile of variable ?

72 (c) Which variable has more dispersion? Why? One variable has more dispersion because its IQR (the width of its box) is wider than the IQR of the other variable. (d) Does the variable have any outliers? If so, what is the value of the outlier? Yes; there is an asterisk on the right side of the boxplot, which marks an outlier. (e) Describe the shape of the variable. Support your position. Since the left whisker is longer and \(M - Q_1\) is bigger than \(Q_3 - M\), the distribution is skewed to the left.

73 Example 2: The following data represent the carbon dioxide emissions per capita (total carbon dioxide emissions, in tons, divided by total population) for the countries of Western Europe in

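74 (a) Find the five-number summary. Ascending order: Minimum = 1.01, \(Q_1 = 1.61\), \(M = 2.165\), \(Q_3 = 2.68\), Maximum = 6.81. (b) Determine the lower and upper fences. \( \text{IQR} = 2.68 - 1.61 = 1.07 \); lower fence \( = 1.61 - 1.5(1.07) = 0.005 \); upper fence \( = 2.68 + 1.5(1.07) = 4.285 \).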

75 (c) Construct a boxplot. One of the large values is a mild outlier, represented by an asterisk; the other is an extreme outlier because it lies beyond the cutoff for extreme outliers, and an extreme outlier is represented by an open circle. (d) Use the boxplot and quartiles to describe the shape of the distribution. Since there are two large outliers, the distribution is skewed to the right.
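
The slides distinguish mild from extreme outliers but the cutoff for “extreme” did not survive extraction. The Python sketch below assumes the common convention that an extreme outlier lies more than 3 IQR beyond \(Q_1\) or \(Q_3\), while a mild outlier lies beyond the 1.5 IQR fences but within that range. The function name `classify` is our own. It uses the quartiles from part (a).

```python
def classify(x, q1, q3):
    """Label x as 'extreme', 'mild', or 'none'.
    Assumed convention: extreme beyond 3*IQR, mild beyond the 1.5*IQR fences."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild"
    return "none"

q1, q3 = 1.61, 2.68   # CO2 emissions per capita, quartiles from part (a)
print(classify(6.81, q1, q3))   # the maximum, compared with 2.68 + 3(1.07) = 5.89
```

Under this assumed convention, the maximum of 6.81 would be classified as an extreme outlier.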

76 Note: Parts (a) and (c) can easily be done using StatCrunch. For instructions, please refer to the StatCrunch handout.

