BPS - 5th Ed. Chapter 22 Numerical Summaries u Center of the data –mean –median u Variation –range –quartiles (interquartile range) –variance –standard deviation
BPS - 5th Ed. Chapter 23 Mean or Average u Traditional measure of center u Sum the values and divide by the number of values
BPS - 5th Ed. Chapter 24 Median (M) u A resistant measure of the data’s center u At least half of the ordered values are less than or equal to the median value u At least half of the ordered values are greater than or equal to the median value u If n is odd, the median is the middle ordered value u If n is even, the median is the average of the two middle ordered values
BPS - 5th Ed. Chapter 25 Comparing the Mean & Median u The mean and median of data from a symmetric distribution should be close together. The actual (true) mean and median of a symmetric distribution are exactly the same. u In a skewed distribution, the mean is farther out in the long tail than is the median [the mean is ‘pulled’ in the direction of the possible outlier(s)].
BPS - 5th Ed. Chapter 26 Question A recent newspaper article in California said that the median price of single-family homes sold in the past year in the local area was $136,000 and the mean price was $149,160. Which do you think is more useful to someone considering the purchase of a home, the median or the mean?
BPS - 5th Ed. Chapter 27 Spread, or Variability u If all values are the same, then they all equal the mean. There is no variability. u Variability exists when some values are different from (above or below) the mean. u We will discuss the following measures of spread: range, quartiles, variance, and standard deviation
BPS - 5th Ed. Chapter 28 Range u One way to measure spread is to give the smallest (minimum) and largest (maximum) values in the data set; Range = max min u The range is strongly affected by outliers
BPS - 5th Ed. Chapter 29 Quartiles u Three numbers which divide the ordered data into four equal sized groups. u Q 1 has 25% of the data below it. u Q 2 has 50% of the data below it. (Median) u Q 3 has 75% of the data below it.
BPS - 5th Ed. Chapter 210 Boxplot u Central box spans Q 1 and Q 3. u A line in the box marks the median M. u Lines extend from the box out to the minimum and maximum.
BPS - 5th Ed. Chapter 211 Example from Text: Boxplots
BPS - 5th Ed. Chapter 212 Identifying Outliers u The central box of a boxplot spans Q 1 and Q 3 ; recall that this distance is the Interquartile Range (IQR). u We call an observation a suspected outlier if it falls more than 1.5 IQR above the third quartile or below the first quartile.
BPS - 5th Ed. Chapter 213 Variance and Standard Deviation u Recall that variability exists when some values are different from (above or below) the mean. u Each data value has an associated deviation from the mean:
BPS - 5th Ed. Chapter 214 Deviations u what is a typical deviation from the mean? (standard deviation) u small values of this typical deviation indicate small variability in the data u large values of this typical deviation indicate large variability in the data
BPS - 5th Ed. Chapter 215 Variance u Find the mean u Find the deviation of each value from the mean u Square the deviations u Sum the squared deviations u Divide the sum by n-1 (gives typical squared deviation from mean)
BPS - 5th Ed. Chapter 216 Choosing a Summary u Outliers affect the values of the mean and standard deviation. u The five-number summary should be used to describe center and spread for skewed distributions, or when outliers are present. u Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers.