Chapter 1: Exploring Data, cont. 1.2 Describing Distributions with Numbers. Measuring Center: The Mean. Most common measure of center. Arithmetic average: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum x_i$, where $\bar{x}$ is the mean and n is the number of observations. Note: The mean is not a resistant measure; it is sensitive to the influence of a few extreme observations. A resistant measure is a measure that is not influenced by extreme observations.
Measuring Center: The Median. Midpoint of a distribution: half the observations are smaller, the other half larger, than the median. The median is a “resistant” measure. Comparing the mean and the median: The mean and median of a symmetric distribution are close together. In a skewed distribution, the mean is farther out in the long tail than the median.
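The contrast between the mean and the median can be checked numerically. The following is a minimal sketch with made-up data (the two lists are hypothetical and not taken from the slides): replacing one value with an extreme observation drags the mean toward the tail, while the median stays put.

```python
from statistics import mean, median

# Hypothetical data: a small symmetric set, then the same set with its
# largest value replaced by one extreme observation.
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

mean_typical = mean(typical)
median_typical = median(typical)
mean_outlier = mean(with_outlier)
median_outlier = median(with_outlier)

# Symmetric data: mean and median agree.
print("typical:     ", mean_typical, median_typical)
# One extreme value: the mean moves toward it, the median does not.
print("with outlier:", mean_outlier, median_outlier)
```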
Measuring Spread. One way to measure spread is to use the range: the difference between the highest and lowest values. Another way is to use the quartiles. The quartiles mark out the middle half of the data. The first quartile (Q1) is greater than 25% of the observations. The third quartile (Q3) is greater than 75% of the observations. The quartiles are resistant measures; the range is not, because it depends only on the two most extreme observations. The interquartile range (IQR) is the difference between the third and first quartiles.
Outliers. One rule of thumb for identifying outliers is the 1.5 × IQR rule: if an observation falls more than 1.5 × IQR above Q3 or below Q1, call it an outlier. The Five-Number Summary. An additional way to summarize data. It includes the smallest observation (min), the first quartile (Q1), the median (M or Q2), the third quartile (Q3) and the largest observation (max), written in order from smallest to largest.
A boxplot is a graphical display based on the five-number summary. It is best used for side-by-side comparisons of more than one distribution. A basic boxplot conceals outliers. A modified boxplot plots outliers as isolated points, so it shows more detail. We will make modified boxplots, even if not specified.
Example p. 48 Exercise 1.37 HOW OLD ARE THE PRESIDENTS? Return to the data on presidential ages in Table 1.4 on page 19. In Example 1.6, we constructed a histogram of the age data. (a) From the shape of the histogram (Figure 1.7, page 20), do you expect the mean to be much less than the median, about the same as the median, or much greater than the median? Explain. (b) Find the five-number summary and verify your expectation from (a). (c) What is the range of the middle half of the ages of new presidents? (d) Using the 1.5 × IQR rule, determine if there are any outliers. (e) Construct by hand a (modified) boxplot of the ages of new presidents.
Measuring Spread: The Standard Deviation. Standard deviation “s”: measures spread by looking at how far the observations are from the mean; it is the square root of the variance. Variance “$s^2$”: average of the squares of the deviations of the observations from their mean. With $\bar{x}$ the mean of the n observations, $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$ and $s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$.
Let’s practice calculating standard deviation by hand. p. 52 Exercise 1.40 PHOSPHATE LEVELS The level of various substances in the blood influences our health. Here are measurements of the level of phosphate in the blood of a patient, in milligrams of phosphate per deciliter of blood, made on 6 consecutive visits to a clinic: 5.6 5.2 4.6 4.9 5.7 6.4. A graph of only 6 observations gives little information, so we proceed to compute the mean and standard deviation. (a) Find the mean. (b) Find the standard deviation from its definition. That is, find the deviations of each observation from the mean, square the deviations, then obtain the variance and the standard deviation.
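(b) Find the standard deviation from its definition. That is, find the deviations of each observation from the mean, square the deviations, then obtain the variance and the standard deviation. The observations $x_i$ are 5.6, 5.2, 4.6, 4.9, 5.7, 6.4. Complete the table of deviations and squared deviations, find the sum of the squared deviations, then compute $s^2$ and $s$.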
The sum of the deviations of the observations from their mean will always be zero. Typically a calculator or computer software is used to calculate the standard deviation. Notice that when calculating the variance, you divide by n - 1, not n. This number is called the “degrees of freedom” of the variance or standard deviation. Because the deviations always sum to 0, knowing any n - 1 of them determines the last one, so only n - 1 of the deviations can vary freely.
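The by-hand steps for the phosphate exercise can be mirrored in code. This is a minimal sketch that uses the six readings from the exercise; the variable names are illustrative. The last line compares the result with Python's built-in `statistics.stdev`, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev

# Phosphate levels (mg/dl) from the six clinic visits in the exercise.
phosphate = [5.6, 5.2, 4.6, 4.9, 5.7, 6.4]

n = len(phosphate)
x_bar = sum(phosphate) / n                    # the mean
deviations = [x - x_bar for x in phosphate]   # x_i minus the mean
squared = [d ** 2 for d in deviations]        # squared deviations

variance = sum(squared) / (n - 1)             # divide by n - 1, not n
s = sqrt(variance)                            # standard deviation

print("mean:", round(x_bar, 3))                           # 5.4
print("sum of deviations:", round(sum(deviations), 10))   # zero up to rounding
print("variance:", round(variance, 3))                    # 0.412
print("standard deviation:", round(s, 3))                 # about 0.642
print("matches statistics.stdev:", abs(s - stdev(phosphate)) < 1e-9)
```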
Properties of the Standard Deviation (s): s measures spread about the mean and should only be used when the mean is chosen as the measure of center. s = 0 when there is NO spread (occurs when all of the observations have the same value). s is not resistant (strong skewness or outliers can make s very large).
Choosing measures of center and spread: for distributions that are skewed or have strong outliers, use the five-number summary; for symmetric distributions that are free of outliers, use the mean and the standard deviation. Always plot your data first! Graphs give the best overall picture because numerical measures do not describe the entire shape.
Changing the unit of measurement. Linear transformation: changes the original variable x into the new variable $x_{\text{new}}$ by $x_{\text{new}} = a + bx$. a: shifts all x values up or down by the same amount. b: changes the size of the unit of measurement. Note: adding a constant amount to each observation does not change the spread or shape of the distribution.
Effect of a linear transformation: Multiplying each observation by a positive number b multiplies both the center (mean & median) and measures of spread (s & IQR) by b Adding the same number a (either positive or negative) to each observation adds a to measures of center and to quartiles but does not change s or IQR
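These effects can be verified numerically. The sketch below uses made-up data and arbitrary constants a and b (all hypothetical, not from the slides), and an `iqr` helper that assumes quartiles are the medians of the lower and upper halves of the sorted data.

```python
from statistics import mean, median, stdev

def iqr(data):
    """IQR, assuming quartiles are medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[len(xs) - half:]) - median(xs[:half])

# Hypothetical observations and transformation x_new = a + b * x.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
a, b = 10.0, 2.54
x_new = [a + b * xi for xi in x]

# Center: both the shift a and the scale b apply.
print(mean(x_new), a + b * mean(x))
print(median(x_new), a + b * median(x))

# Spread: only the scale b applies; the shift a drops out.
print(stdev(x_new), b * stdev(x))
print(iqr(x_new), b * iqr(x))
```

Each printed pair should agree up to floating-point rounding. For a pure change of units, such as inches to centimeters, a = 0 and b is the conversion factor, so the mean and the standard deviation are both simply multiplied by b.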
p. 56 Exercise 1.44 COCKROACHES! Maria measures the lengths of 5 cockroaches that she finds at school. Here are her results (in inches): (a)Find the mean and standard deviation of Maria’s measurements. (You may use a calculator.) (b) Maria’s science teacher is furious to discover that she has measured the cockroach lengths in inches rather than centimeters. (There are 2.54 cm in 1 inch.) She gives Maria two minutes to report the mean and standard deviation of the 5 cockroaches in centimeters. Maria succeeded. Will you? 188.8.131.52.61.2
Comparing distributions– Side-by-side bar graph (displays similarities and differences within categories)