Presentation is loading. Please wait.

Presentation is loading. Please wait.

3.1 - 1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1.

Similar presentations


Presentation on theme: "3.1 - 1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1."— Presentation transcript:

1 3.1 - 1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Review and Preview 3-2 Measures of Center 3-3 Measures of Variation 3-4 Measures of Relative Standing and Boxplots

2 3.1 - 2 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  Descriptive Statistics In this chapter we’ll learn to summarize or describe the important characteristics of a known set of data  Inferential Statistics In later chapters we’ll learn to use sample data to make inferences or generalizations about a population Preview

3 3.1 - 3 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Section 3-2 Measures of Center

4 3.1 - 4 Measure of Center  Measure of Center the value at the center or middle of a data set  The three common measures of center are the mean, the median, and the mode.

5 3.1 - 5 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  Mean the measure of center obtained by adding the data values and then dividing the total by the number of values What most people call an average also called the arithmetic mean.

6 3.1 - 6 Notation  Greek letter sigma used to denote the sum of a set of values. x is the variable usually used to represent the data values. n represents the number of data values in a sample.

7 3.1 - 7 Example of summation If there are n data values that are denoted as: Then:

8 3.1 - 8 Example of summation data Then: 21,25,32,48,53,62,62,64

9 3.1 - 9 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Sample Mean x = n  x x is pronounced ‘x-bar’ and denotes the mean of a set of sample values x

10 3.1 - 10 Example of Sample Mean data Then: 21,25,32,48,53,62,62,64

11 3.1 - 11 Notation µ Greek letter mu used to denote the population mean N represents the number of data values in a population.

12 3.1 - 12 Population Mean N µ =  x x Note: here x represents the data values in the population

13 3.1 - 13 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  Advantages  Is relatively reliable: means of samples drawn from the same population don’t vary as much as other measures of center  Takes every data value into account Mean

14 3.1 - 14 Mean  Disadvantage Is sensitive to every data value, one extreme value can affect it dramatically; is not a resistant measure of center Example: 21,25,32,48,53,62,62,64 → 21,25,32,48,53,62,62,300 →

15 3.1 - 15 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Median  Median the measure of center which is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude  often denoted by x (pronounced ‘x-tilde’) ~  is not affected by an extreme value - is a resistant measure of the center

16 3.1 - 16 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Finding the Median 1.If the number of data values is odd, the median is the number located in the exact middle of the list. 2.If the number of data values is even, the median is found by computing the mean of the two middle numbers. First sort the values (arrange them in order), the follow one of these

17 3.1 - 17 Example of Median 6 data values: 5.40 1.10 0.420.73 0.48 1.10 Sorted data: 0.42 0.48 0.731.10 1.10 5.40 (even number of values – no exact middle)

18 3.1 - 18 Example of Median 7 data values: 5.40 1.10 0.420.73 0.48 1.10 0.66 Sorted data: 0.42 0.48 0.660.73 1.10 1.10 5.40

19 3.1 - 19  Mode the value that occurs with the greatest frequency  Data set can have one, more than one, or no mode

20 3.1 - 20 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Mode Mode is the only measure of central tendency that can be used with nominal data Bimodal two data values occur with the same greatest frequency Multimodalmore than two data values occur with the same greatest frequency No Modeno data value is repeated

21 3.1 - 21 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. a) 5.40 1.10 0.42 0.73 0.48 1.10 b) 27 27 27 55 55 55 88 88 99 c) 1 2 3 6 7 8 9 10 Mode - Examples  Mode is 1.10  Bimodal - 27 & 55  No Mode

22 3.1 - 22 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  Midrange the value midway between the maximum and minimum data values in the original data set (average of highest and lowest data values) Definition = maximum data value + minimum data value 2

23 3.1 - 23 Example of Midrange data values: 5.41 1.13 0.420.49 0.65 1.86 1.69

24 3.1 - 24 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  Sensitive to extremes because it uses only the maximum and minimum values, so rarely used Midrange  Redeeming Features (1)very easy to compute (2) reinforces that there are several ways to define the center (3)avoids confusion with median

25 3.1 - 25 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Best Measure of Center

26 3.1 - 26 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Carry one more decimal place than is present in the original set of values. Round-off Rule for Measures of Center

27 3.1 - 27 Example of Round-off Rule data values: 5.41 1.13 0.420.49 0.65 1.86 1.69

28 3.1 - 28 Example problem 12, page 95 These data values represent weight gain or loss in kg for a simple random sample (SRS) of18 college freshman (negative data values indicate weight loss) 11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2 Do these values support the legend that college students gain 15 pounds (6.8 kg) during their freshman year? Explain

29 3.1 - 29 Sample Mean Example problem 12, page 95 Median -5 -2 -2 -2 -2 0 0 1 1 2 2 3 3 4 5 7 8 11

30 3.1 - 30 Example problem 12, page 95 Mode is -2 Midrange

31 3.1 - 31 Example problem 12, page 95 All of the measures of center are below 6.8 kg (15 pounds) Based on measures of center, these data values do not support the idea that college students gain 15 pounds (6.8 kg) during their freshman year

32 3.1 - 32 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Think about the method used to collect the sample data. See example 7, page 89 Critical Thinking Think about whether the results are reasonable. See example 6, page 89

33 3.1 - 33 First, enter the list of data values Then select 2nd STAT (LIST) and arrow right to MATH option 3:mean( or 4: median( and input the desired list Mean/Median with Graphing Calculator

34 3.1 - 34 Example of Computing the Mean Using Calculator Sorted amounts of Strontium-90 (in millibecquerels) in a simple random sample of baby teeth obtained from Philadelphia residents born after 1979 Note: this data is related to Three Mile Island nuclear power plant Accident in 1979. x = n  x x = 149.2

35 3.1 - 35 Example of Computing the Mean Using Calculator Median is 150

36 3.1 - 36 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Beyond the Basics of Measures of Center Part 2

37 3.1 - 37 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Assume that all sample values in each class are equal to the class midpoint. Mean from a Frequency Distribution

38 3.1 - 38 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. use class midpoints for variable x Mean from a Frequency Distribution

39 3.1 - 39 Example of Computing the Mean from a Frequency Distribution 676465 64 5967 7265 6462666766 6070686164 6068656662 Heights (inches) of 25 Women

40 3.1 - 40 Example (cont.) Frequency distribution of heights of women HEIGHTFREQUENCY 59-603 61-623 63-644 65-667 67-686 69-701 71-721 Class midpoints are: 59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5

41 3.1 - 41 Example (cont.) HEIGHTFREQUENCY 59-603 61-623 63-644 65-667 67-686 69-701 71-721

42 3.1 - 42 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Weighted Mean x = w  (w x)  When data values are assigned different weights, we can compute a weighted mean.

43 3.1 - 43 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example: Weighted Mean In this class, homework/quizz average is weighted 10%, 2 exams are weighted 60%, and final exam is weighted 30%. Suppose a student makes homework/quiz average 87, exam scores of 80 and 92, and final exam score 85. Compute the weighted average.

44 3.1 - 44 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example: Weighted Mean Weighted average is:

45 3.1 - 45 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  Symmetric distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half  Skewed distribution of data is skewed if it is not symmetric and extends more to one side than the other Skewed and Symmetric

46 3.1 - 46 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  Skewed to the left (also called negatively skewed) have a longer left tail, mean and median are to the left of the mode  Skewed to the right (also called positively skewed) have a longer right tail, mean and median are to the right of the mode Skewed Left or Right

47 3.1 - 47 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. The mean and median cannot always be used to identify the shape of the distribution. Shape of the Distribution

48 3.1 - 48 Shape of the Distribution Consider the data: 55 5 5 5 10 10 10 10 10 10 15 15 15 15 15 20 20 20 20 25 25 25 30 30 30 35 35 40 45 Mean: Median: The distribution is skewed and the mean is “pulled” in the direction of the outliers 40 and 45

49 3.1 - 49 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Distribution Skewed Left Outliers that are much less than the median

50 3.1 - 50 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Symmetric Distribution No outliers

51 3.1 - 51 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Distribution Skewed Right Outliers that are much more than the median

52 3.1 - 52 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Recap In this section we have discussed:  Types of measures of center Mean Median Mode  Mean from a frequency distribution  Weighted means  Best measures of center  Skewness

53 3.1 - 53 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Section 3-3 Measures of Variation

54 3.1 - 54 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Key Concept Discuss characteristics of variation, in particular, measures of variation, such as standard deviation, for analyzing data. Make understanding and interpreting the standard deviation a priority.

55 3.1 - 55 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Basics Concepts of Measures of Variation Part 1

56 3.1 - 56 Why is it important to understand variation? A measure of the center by itself can be misleading Example: Two nations with the same median family income are very different if one has extremes of wealth and poverty and the other has little variation among families (see the following table).

57 3.1 - 57 Example of variation Data Set A Data Set B 50,00010,000 60,00020,000 70,00070,000 80,000120,000 90,000130,000 MEAN70,00070,000 MEDIAN70,00070,000 Data set B has more variation about the mean

58 3.1 - 58 Histograms: example of variation Data set B has more variation about the mean (Target).

59 3.1 - 59 How do we quantify variation?

60 3.1 - 60 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Definition The range of a set of data values is the difference between the maximum data value and the minimum data value. Range = (maximum value) – (minimum value)

61 3.1 - 61 Range = 30 - 6 = 24 Example of range. Data: 27 28 25 6 27 30 26

62 3.1 - 62 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Range (cont.) This shows that the range is very sensitive to extreme values; therefore not as useful as other measures of variation. Ignoring the outlier of 6 in the previous data set gives data 27 28 25 27 30 26 Range = 30 - 25 = 5

63 3.1 - 63 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Definition The standard deviation of a set of sample values, denoted by s, is a measure of variation of values about the mean.

64 3.1 - 64 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Sample Standard Deviation Formula  ( x – x ) 2 n – 1 s =s =

65 3.1 - 65 Steps to calculate the sample standard deviation 1.Calculate the sample mean 2.Find the squared deviations from the sample mean for each sample data value: 3.Add the squared deviations 4.Divide the sum in step 3 by n-1 5.Take the square root of the quotient in step 4

66 3.1 - 66 Example: Standard Deviation Given the data set: 8, 5, 12, 8, 9, 15, 21, 16, 3 Find the standard deviation

67 3.1 - 67 Example: Standard Deviation Find the mean

68 3.1 - 68 Data Value Squared Deviations From the Mean 8 5 12 8 9 15 21 16 3

69 3.1 - 69 Example: Standard Deviation Add the squared deviations (last column in the table above)

70 3.1 - 70 Divide the sum by 9-1=8: Take the square root: Example: Standard Deviation

71 3.1 - 71 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Round-Off Rule for Measures of Variation When rounding the value of a measure of variation, carry one more decimal place than is present in the original set of data. Round only the final answer, not values in the middle of a calculation.

72 3.1 - 72 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Sample Standard Deviation (Shortcut Formula) n ( n – 1) s =s = n  x 2 ) – (  x ) 2

73 3.1 - 73 Example: Standard Deviation Page 110, problem 10 Data: 2 1 1 1 1 1 1 4 1 2 2 1 2 3 3 2 3 1 3 1 3 1 3 2 2 The range is 4 - 1 = 3 Determine the standard deviation using the previous formula

74 3.1 - 74 Example: Standard Deviation We need to find each the following:

75 3.1 - 75 Data Table (25 data values) TOTALS: 47 109

76 3.1 - 76 Example: Standard Deviation Thus:

77 3.1 - 77 Example: Standard Deviation And:

78 3.1 - 78 Example: Standard Deviation Page 110, problem 10 Do the results make sense? NO: although we can determine the standard deviation, the results make no sense because the data is actually categorical (describing phenotype) and do not represent counts. Note that if we renumber the categories, the standard deviation would change but the results would not.

79 3.1 - 79 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Standard Deviation - Important Properties  The standard deviation is a measure of variation of all values from the mean.  The value of the standard deviation s is never negative and usually not zero.  The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).  The units of the standard deviation s are the same as the units of the original data values.

80 3.1 - 80 Calculator Example Data from problem on page 111, problem 20 is strontium- 90 data from previous class examples

81 3.1 - 81 Calculator Example Amounts of Strontium-90 (in millibecquerels) in a simple random sample of baby teeth obtained from Philadelphia residents born after 1979 Note: this data is related to Three Mile Island nuclear power plant Accident in 1979. Find standard deviation DIRECTIONS:

82 3.1 - 82 To compute standard deviation: 1) Enter the list of data values 2) Select 2nd STAT (LIST) and arrow right to MATH option 7:stdDev( 3) Input the data by choosing the desired list Previous example: s = 15.0 millibecquerels Calculator Example

83 3.1 - 83 Calculator Example Press STAT. Arrow to the right to CALC. Now choose option #1: 1-Var Stats. When 1-Var Stats appears on the home screen, tell the calculator the name of the list containing the data

84 3.1 - 84 Calculator Example Strontium-90 data, 1-Var Stats Displays: ETC.

85 3.1 - 85 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Population Standard Deviation 2  ( x – µ ) N  = This formula is similar to the previous formula, but instead, the population mean and population size are used.

86 3.1 - 86 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education  Population variance:  2 - Square of the population standard deviation  Variance  The variance of a set of values is a measure of variation equal to the square of the standard deviation.  Sample variance: s 2 - Square of the sample standard deviation s

87 3.1 - 87 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Unbiased Estimator The sample variance s 2 is an unbiased estimator of the population variance  2, which means values of s 2 tend to target the value of  2 instead of systematically tending to overestimate or underestimate  2.

88 3.1 - 88 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Variance  Unlike standard deviation, the units of variance do not match the units of the original data set, they are the square of the units in the original data set.

89 3.1 - 89 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Variance - Notation s = sample standard deviation s 2 = sample variance  = population standard deviation  2 = population variance

90 3.1 - 90 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Beyond the Basics of Measures of Variation Part 2

91 3.1 - 91 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Range Rule of Thumb for many data sets, the vast majority (such as 95%) of sample values lie within two standard deviations of the mean.

92 3.1 - 92 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Range Rule of Thumb usual values in a data set are those that are “typical” and not too extreme rough estimates of the minimum and maximum “usual” sample values can be computed using the “range rule of thumb”

93 3.1 - 93 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Range Rule of Thumb minimum “usual” value = (mean) – 2  (standard deviation)

94 3.1 - 94 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Range Rule of Thumb maximum “usual” value = (mean) + 2  (standard deviation)

95 3.1 - 95 Range Rule of Thumb  Usual values fall between the maximum and minimum usual values:  Otherwise the value is unusual

96 3.1 - 96 Example: Standard Deviation Page 110, problem 12 Data (kg): 113 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2

97 3.1 - 97 Example (cont.) Thus:

98 3.1 - 98 Example (cont.)

99 3.1 - 99 Example (cont.) Minimum usual value Maximum usual value

100 3.1 - 100 Example (cont.) Because 6.8 kg falls between the minimum usual value and the maximum usual value, it is not considered “unusual”

101 3.1 - 101 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Range Rule of Thumb for Estimating Standard Deviation s To roughly estimate the standard deviation from a collection of known sample data use where range = (maximum value) – (minimum value)

102 3.1 - 102 Example: Standard Deviation Page 110, problem 12 Data (kg): 113 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2 Range rule of thumb:

103 3.1 - 103 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Properties of the Standard Deviation Measures the variation among data values Values close together have a small standard deviation, but values with much more variation have a larger standard deviation Has the same units of measurement as the original data

104 3.1 - 104 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Properties of the Standard Deviation For many data sets, a value is unusual if it differs from the mean by more than two standard deviations Compare standard deviations of two different data sets only if the they use the same scale and units, and they have means that are approximately the same

105 3.1 - 105 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Empirical (or 68-95-99.7) Rule For data sets having a distribution that is approximately bell shaped, the following properties apply:  About 68% of all values fall within 1 standard deviation of the mean.  About 95% of all values fall within 2 standard deviations of the mean.  About 99.7% of all values fall within 3 standard deviations of the mean.

106 3.1 - 106 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education The Empirical Rule

107 3.1 - 107 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education The Empirical Rule

108 3.1 - 108 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education The Empirical Rule

109 3.1 - 109 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Example: The Empirical Rule Page 113, problem 34 One standard deviation from the mean is between: 68% of the data are between 124.7 volts and 125.3 volts

110 3.1 - 110 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Example: The Empirical Rule Page 113, problem 34 Two standard deviations from the mean is between: 95% of the data are between 124.4 volts and 125.6 volts

111 3.1 - 111 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Example: The Empirical Rule Page 113, problem 34 Three standard deviations from the mean is between: 99.7% of the data are between 124.1 volts and 125.9 volts

112 3.1 - 112 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Example: The Empirical Rule Page 113, problem 34 a) 95% of the data are between 124.4 volts and 125.6 volts b) 99.7% of the data are between 124.1 volts and 125.9 volts

113 3.1 - 113 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Chebyshev’s Theorem The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least 1–1/K 2, where K is any positive number greater than 1.  For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.  For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.

114 3.1 - 114 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Example: Chebyshev’s Theorem Page 113, problem 36 By Chebyshev’s Theorem there must be within three standard deviations of the mean

115 3.1 - 115 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Example: Chebyshev’s Theorem Page 113, problem 36 The minimum voltage amount within 3 standard deviations of the mean is and the maximum amount is

116 3.1 - 116 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Comparing Variation in Different Samples It’s a good practice to compare two sample standard deviations with s only when the sample means are approximately the same. In general, it is better to use the coefficient of variation when comparing any two standard deviations.

117 3.1 - 117 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Coefficient of Variation The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean. SamplePopulation s x CV =  100%  CV =   100%

118 3.1 - 118 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Example: Comparing Variation Page 112, problem 24 Waiting times for Jefferson Valley:

119 3.1 - 119 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Example: Comparing Variation Page 112, problem 24 Waiting times for Providence:

120 3.1 - 120 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Example: Comparing Variation Page 112, problem 24 CV for Jefferson Valley wait times much smaller than CV for Providence wait times implies that variation for Jefferson Valley is much smaller than that for Providence

121 3.1 - 121 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education  ( x – x ) 2 n – 1 s =s = Formula for Standard Deviation of a Sample Why do we use n-1 instead of n?

122 3.1 - 122 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Rationale for using n – 1 versus n There are only n – 1 independent values. With a given mean, only n – 1 values can be freely assigned any number before the last value is determined (see homework problem from 3-2 number 35).

123 3.1 - 123 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Rationale for using n – 1 versus n Dividing by n – 1 yields better results than dividing by n. It causes s 2 to target  2 whereas division by n causes s 2 to underestimate  2.

124 3.1 - 124 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.Copyright © 2010 Pearson Education Recap In this section we have looked at:  Range  Standard deviation of a sample and population  Variance of a sample and population  Coefficient of variation (CV)  Range rule of thumb  Empirical distribution  Chebyshev’s theorem

125 3.1 - 125 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Section 3-4 Measures of Relative Standing and Boxplots

126 3.1 - 126 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Key Concepts This section introduces measures of relative standing, which are numbers showing the location of data values relative to the other values within a data set. Measures of relative standing can be used to compare values from different data sets, or to compare values within the same data set.

127 3.1 - 127 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Key Concepts The most important concept is the z score. We will also discuss percentiles and quartiles, as well as a new statistical graph called the boxplot.

128 3.1 - 128 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Basics of z Scores, Percentiles, Quartiles, and Boxplots Part 1

129 3.1 - 129 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  z Score (or standardized value) the number of standard deviations that a given value x is above or below the mean Z score

130 3.1 - 130 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Sample Population x – µ z =  Round z scores to 2 decimal places Measures of Position z Score z = x – x s

131 3.1 - 131 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Interpreting Z Scores Whenever a value is less than the mean, its corresponding z score is negative Ordinary values: –2 ≤ z score ≤ 2 Unusual Values: z score 2

132 3.1 - 132 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 6 on page 127 Given mean and standard deviation: a) Difference between Hoffman’s age and mean age is

133 3.1 - 133 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 6 on page 127 Note: book answer is absolute value of the difference between Hoffman’s age and mean age which is

134 3.1 - 134 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 6 on page 127 b) How many standard deviations is that? Here take the ratio of the absolute value of the difference and the standard deviation:

135 3.1 - 135 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 6 on page 127 c) Convert Hoffman’s age to a z score Here take the ratio of the difference and the standard deviation:

136 3.1 - 136 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 6 on page 127 d) Is Hoffman’s age usual or unusual? “Usual” means z score is between - 2 and 2 and since -0.65 is between - 2 and 2, Hoffman’s age is usual.

137 3.1 - 137 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 10 on page 127 Given mean and standard deviation for heights of women:

138 3.1 - 138 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 10 on page 127 Z scores for height requirements for women in army Minimum height requirement z score

139 3.1 - 139 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 10 on page 127 Z scores for height requirements for women in army Maximum height requirement z score

140 3.1 - 140 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 10 on page 127 Minimum height requirement z score is less than -2 so it is considered unusual Maximum height requirement z score is greater than 2 so it is considered unusual

141 3.1 - 141 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 13 on page 128 Score on SAT test is 1190 with mean of scores on SAT test 1518 and standard deviation 325 has z score of

142 3.1 - 142 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 13 on page 128 Score on ACT test is 26 with mean of scores on ACT test 21 and standard deviation 4.8 has z score of

143 3.1 - 143 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Z score Problem 13 on page 128 Comparing z scores the ACT z score was better than the SAT z score which means the ACT score was higher above the mean than the SAT score (relative to the standard deviation) so the ACT score is better.

144 3.1 - 144 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Percentiles are measures of location. There are 99 percentiles denoted P 1, P 2,... P 99, which divide a set of data into 100 groups with about 1% of the values in each group.

145 3.1 - 145 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Finding the Percentile of a Data Value Percentile of value x =

146 3.1 - 146 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Finding Percentile Problem 16 on page 128 Data (NFL superbowl total points): 36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75 Find the percentile corresponding to 65

147 3.1 - 147 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Finding Percentile Problem 16 on page 128 Percentile of value x =

148 3.1 - 148 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. n total number of values in the data set k percentile being used L locator that gives the position of a value P k k th percentile L = n k 100 Notation Converting from the kth Percentile to the Corresponding Data Value

149 3.1 - 149 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Converting from the kth Percentile to the Corresponding Data Value

150 3.1 - 150 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Finding Data Value Given Its Percentile Problem 22 on page 128 Data (NFL superbowl total points): 36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75 Find the data value at P 80

151 3.1 - 151 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Finding Data Value Given Its Percentile Problem 22 on page 128 For 80 th percentile, the k value in the locator formula is 80 If L is not a whole number, then round up to next whole number: P 80

152 3.1 - 152 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Finding Data Value Given Its Quartile The 20 th score is 61: 36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75 Thus:

153 3.1 - 153 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Quartiles  Q 1 (First Quartile) separates the bottom 25% of sorted values from the top 75%.  Q 2 (Second Quartile) same as the median; separates the bottom 50% of sorted values from the top 50%.  Q 3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%. Are measures of location, denoted Q 1, Q 2, and Q 3, which divide a set of data into four groups with about 25% of the values in each group.

154 3.1 - 154 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Quartiles The first quartile is the 25 th percentile The second quartile is the 50 th percentile The third quartile is the 75 th percentile

155 3.1 - 155 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Finding Data Value Given Its Quartile Problem 20 on page 128 Data (NFL superbowl total points): 36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75 Find the data value at

156 3.1 - 156 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Finding Data Value Given Its Quartile Convert quartile to percentile: For 25 th percentile, the k value in the locator formula is 25 P 25

157 3.1 - 157 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Finding Data Value Given Its Quartile If L is a whole number, then round up to next whole number, find the percentile by adding the Lth value and the next value then dividing by 2 Sixth value is 41 and seventh value is 43:

158 3.1 - 158 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Q 1, Q 2, Q 3 divide ranked scores into four equal parts Quartiles 25% Q3Q3 Q2Q2 Q1Q1 (minimum)(maximum) (median)

159 3.1 - 159 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  Interquartile Range (or IQR): Q 3 – Q 1  10 - 90 Percentile Range: P 90 – P 10  Semi-interquartile Range: 2 Q 3 – Q 1  Midquartile: 2 Q 3 + Q 1 Some Other Statistics

160 3.1 - 160 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  For a set of data, the 5-number summary consists of the minimum value; the first quartile Q 1 ; the median (or second quartile Q 2 ); the third quartile, Q 3 ; and the maximum value. 5-Number Summary

161 3.1 - 161 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.  A boxplot (or box-and-whisker- diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q 1 ; the median; and the third quartile, Q 3. Boxplot

162 3.1 - 162 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Boxplots Boxplot of Movie Budget Amounts (See example 7 and 8 on page 121)

163 3.1 - 163 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Boxplots - Normal Distribution Normal Distribution: Heights from a Simple Random Sample of Women

164 3.1 - 164 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Boxplots - Skewed Distribution Skewed Distribution: Salaries (in thousands of dollars) of NCAA Football Coaches

165 3.1 - 165 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Five Number Summary and Boxplots Problem 30 on page 128 uses 10 data values from Strontium-90 data ordered as follows: 128 130 133 137 138 142 142 144 147 149 151 151 151 155 156 161 163 163 166 172

166 3.1 - 166 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Five Number Summary and Boxplots Min data value is 128 First quartile

167 3.1 - 167 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Five Number Summary and Boxplots Second quartile

168 3.1 - 168 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Five Number Summary and Boxplots Third quartile

169 3.1 - 169 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Five Number Summary and Boxplots Max data value is 172 Five Number Summary 128 140 150 158.5 172 Boxplot?

170 3.1 - 170 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example of Five Number Summary Boxplot

171 3.1 - 171 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Calculator: Five Number Summary Five Number Summary (from website mathbits.com) Enter the data in a list:

172 3.1 - 172 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Calculator: Five Number Summary Go to STAT - CALC and choose 1-Var Stats

173 3.1 - 173 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Calculator: Five Number Summary On the HOME screen, when 1-Var Stats appears, type the list containing the data.

174 3.1 - 174 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Calculator: Five Number Summary Arrow down to the five number summary (last five items in the list)

175 3.1 - 175 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Calculator: Boxplot CLEAR out the graphs under y = (or turn them off). Enter the data into the calculator lists. (choose STAT, #1 EDIT and type in entries)

176 3.1 - 176 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Calculator: Boxplot CLEAR out the graphs under y = (or turn them off). Enter the data into the calculator lists. (choose STAT, #1 EDIT and type in entries)

177 3.1 - 177 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Calculator: Boxplot Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the second box-and- whisker icon is highlighted, and that the list you will be using is indicated next to Xlist.

178 3.1 - 178 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Calculator: Boxplot To see the box-and-whisker plot, press ZOOM and #9 ZoomStat. Press the TRACE key to see on-screen data about the box-and-whisker plot.

179 3.1 - 179 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Outliers and Modified Boxplots OMIT THIS PART OF 3-4 Part 2

180 3.1 - 180 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Recap In this section we have discussed:  z Scores  z Scores and unusual values  Quartiles  Percentiles  Converting a percentile to corresponding data values  Other statistics  5-number summary

181 3.1 - 181 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Putting It All Together Always consider certain key factors:  Context of the data  Source of the data  Measures of Center  Sampling Method  Measures of Variation  Outliers  Practical Implications  Changing patterns over time  Conclusions  Distribution


Download ppt "3.1 - 1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1."

Similar presentations


Ads by Google