Warm-up The numbers of deaths among persons aged 15 to 24 years in the United States in 1997 from the seven leading causes of death for this age group were: accidents, 12,958; homicide, 5,793; suicide, 4,146; cancer, 1,583; heart disease, 1,013; congenital defects, 383; AIDS, 276. Make a bar graph to display these data. What additional information do you need to make a pie chart?
Section 1.2 Describing Distributions with Numbers
Specific Ways to Describe Shape, Center and Spread
Center:
Mean: the ordinary arithmetic average, written x̄ and pronounced “x-bar.” x̄ = (x1 + x2 + ... + xn)/n = (1/n) Σ xi.
Median: the midpoint of the data set, denoted M.
Σ, pronounced “sigma,” means “the sum of.” In other words, you add up terms 1 through n.
Bonds vs. Aaron (home runs per season)
Barry Bonds: 16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73
Hank Aaron: 13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32, 44, 39, 29, 44, 38, 47, 34, 40, 20
Have no fear… Your calculator is here! You can get all this information from your calculator. Enter Bonds’ data in list L1 and Aaron’s data in list L2. Press STAT, arrow over to CALC, select 1-Var Stats, and enter L1. Do the same thing for L2.
Compare Centers Find the mean and median of both Bonds’ and Aaron’s home runs. Bonds has a higher mean number of home runs, but his mean is pulled up by the extreme value of 73. Aaron’s median is higher than Bonds’ median, indicating that Aaron hit more home runs than Bonds in a typical season.
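As a quick check on this comparison, here is a minimal Python sketch. It assumes the season totals from the Bonds vs. Aaron slide are the values listed in the code; the variable names are illustrative only.

```python
from statistics import mean, median

# Home runs per season (assumed values, read from the Bonds vs. Aaron slide).
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

# Report both measures of center for each player.
for name, data in (("Bonds", bonds), ("Aaron", aaron)):
    print(f"{name}: mean = {mean(data):.2f}, median = {median(data)}")
```

With these values, Bonds has the higher mean (about 35.4 versus about 34.9), while Aaron has the higher median (38 versus 34).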
Resistant and Non-resistant The mean is affected by extreme observations, such as Bonds’ single-season record of 73 home runs. It is a non-resistant measure of center. The median, however, is resistant to extreme observations. It is the preferred measure of center when a data set has outliers.
Think About This Change Bonds’ single season record from 73 home runs to 100 home runs. How is the mean affected? The median? How do the mean and median compare to each other in a symmetric distribution? In a (unimodal) skewed right distribution? In a (unimodal) skewed left distribution?
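One way to test your answer to the first question is to make the change and recompute. The sketch below assumes Bonds’ season totals are the sixteen values listed in the code; the name `changed` is just for illustration.

```python
from statistics import mean, median

# Bonds' home runs per season (assumed values from the earlier slide).
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

# Replace the record 73 with a hypothetical 100.
changed = [100 if x == 73 else x for x in bonds]

print("original:", mean(bonds), median(bonds))
print("changed: ", mean(changed), median(changed))
```

The mean moves up because every value contributes to the sum, while the median depends only on the middle of the ordered list and stays where it was.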
Introduction to Measures of Spread Today, we’ll learn about quartiles. As the name suggests, they divide a data set into fourths (25% sections). Finding quartiles is like finding the median: you locate the midpoint, and average the middle two numbers if there is an even number of data points.
A Visual Representation of Quartiles
Q1: Lower Quartile, 25th percentile
Q2: Median, 50th percentile
Q3: Upper Quartile, 75th percentile
Each of the four sections contains 25% of the data. So, there are really only THREE quartiles, and the middle one isn’t usually called a quartile (it’s called the median). We generally refer to Q1, M, and Q3.
To find Q1, you find the median of the lower half of the data. To find Q3, you find the median of the upper half of the data.
Try it! 16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73 Find the Range, Median, Q1, and Q3
Solution 16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
Median = 34; Q1 = 25; Q3 = 41. The Range is 73 - 16 = 57. This gives us a little information about the variability of Bonds’ home runs in a season. The middle 50% of the data lies between 25 and 41, so we see where the middle half of the data is spread.
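The median-of-halves procedure can be sketched in Python as follows. The function name `quartiles` is made up for this example, and the handling of an odd number of observations (leaving the overall median out of both halves) is an assumption, since the slides only work an even-sized example.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves method.

    Assumption: with an odd number of observations, the overall
    median is left out of both halves. Needs at least two values.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]    # observations below the median's position
    upper = xs[-half:]   # observations above the median's position
    return median(lower), median(xs), median(upper)

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
q1, m, q3 = quartiles(bonds)
data_range = max(bonds) - min(bonds)
print(f"Range = {data_range}, Q1 = {q1}, M = {m}, Q3 = {q3}")
```

Be aware that calculators and software packages do not all use the same quartile rule, so results can differ slightly from this hand method on other data sets.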
Interquartile Range and the Outlier Rule The IQR is simply Q3 - Q1. In our Barry Bonds example, IQR = 41 - 25 = 16. The IQR is a resistant measure of spread and is paired with the median. We use the IQR to define what an outlier is: an outlier is any value that falls more than 1.5*IQR above the upper quartile or more than 1.5*IQR below the lower quartile.
“Fences”
Think of the 1.5*IQR rule as fences: they mark the boundaries beyond which values are outliers. Is Barry Bonds’ 73-homer season an outlier?
Recall: Q1 = 25; Q3 = 41; IQR = 16. So, 1.5*IQR = 1.5*16 = 24.
Add 24 to Q3 and subtract 24 from Q1:
Upper boundary = 41 + 24 = 65
Lower boundary = 25 - 24 = 1
Conclusion: 73 falls above the upper boundary of 65, so it is an outlier!
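The fence arithmetic can be checked with a short Python sketch. The function name `outlier_fences` and its parameter `k` are illustrative, and the data list assumes Bonds’ season totals are the sixteen values shown.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k*IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles from the Bonds example.
low, high = outlier_fences(25, 41)

# Bonds' home runs per season (assumed values from the earlier slide).
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

# Anything strictly beyond a fence is flagged as an outlier.
outliers = [x for x in bonds if x < low or x > high]
print(f"fences: {low} to {high}; outliers: {outliers}")
```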
5 Number Summary The five number summary consists of the lowest value, Q1, the Median, Q3, and the highest value. It is important because we’ll use it to create a new kind of graph: a boxplot (also called a box-and-whiskers plot).
Bonds’ Boxplot Recall his 5 number summary: L = 16; Q1 = 25; M = 34; Q3 = 41; H = 73
[Boxplot; horizontal axis from 10 to 70, labeled “Number of home runs in a season”]
Modified Boxplots Modified boxplots show outliers as isolated points. Bonds’ 73 home run season was an outlier, so the upper whisker in a modified boxplot extends only to the largest data point that is NOT an outlier. Each outlier is shown as a star (*). CAUTION: Many students extend the whisker to the outlier “fence” (i.e., 65). This is WRONG! The whisker should stop at the last actual data point. So tell me: where should the upper whisker end in a modified boxplot of Bonds’ home runs per season? Answer: 49.
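To see why the whisker stops at a data point rather than at the fence, here is a minimal sketch. The helper `whisker_ends` is hypothetical, and the data list assumes Bonds’ season totals are the sixteen values shown.

```python
def whisker_ends(data, q1, q3, k=1.5):
    """Return the (lower, upper) whisker ends for a modified boxplot.

    Each whisker ends at the most extreme actual observation that
    lies inside the k*IQR fences, not at the fence itself.
    """
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return min(inside), max(inside)

# Bonds' home runs per season (assumed values from the earlier slide).
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
print(whisker_ends(bonds, q1=25, q3=41))
```

The upper fence is 65, but no season total equals 65, so the upper whisker ends at the largest value inside the fence.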
We can look at these in the calculator as well. Go to StatPlot.
It’s Never Too Soon for a Practice AP Question 2005 AP Statistics Problem #1
Question 1, Part (a) Part (a) is graded Essentially Correct, Partially Correct, or Incorrect. To receive an Essentially Correct, a student must successfully compare center, shape, and spread. Specific numeric values are not required. To receive a Partially Correct, a student must successfully compare two of the three characteristics (center, shape, and spread). All other responses are graded Incorrect.
Special Notes “Compare” means you state which is larger. For example, “the mean of the rural students’ daily caloric intake is greater than the mean for the urban students” is a correct comparison. However, stating “the mean of the rural students’ daily caloric intake is 40.45 while the mean for the urban students is 32.6” is not a COMPARISON.
In Conclusion
Graders were looking for three comparisons:
Center: the mean caloric intake of the rural students is greater than the mean caloric intake of the urban students.
Spread: the spread of the rural students’ distribution is larger than the spread of the urban students’ distribution.
Shape: the rural students’ caloric intakes are roughly symmetric, while the urban students’ caloric intakes are skewed right.
There’s More to Spread than IQR Section 1.2 Standard Deviation
Describing Data with Numbers So far, we’ve learned the 5 Number Summary to describe a set of data: Min, Q1, M, Q3, and Max. We’ve also used the mean as another measure of center.
Measuring Spread: Standard Deviation The most commonly used measure of spread is the standard deviation. Standard deviation tells us, on average, how far the observations are away from the mean.
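Standard Deviation and Variance Variance is the average of the squares of the deviations of the observations from the mean, where the “average” divides by n - 1 rather than n: s² = (1/(n - 1)) Σ (xi - x̄)². The standard deviation s is the square root of the variance. WHAT??? But your calculator can tell you all of this!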
Properties of Standard Deviation
s² is called variance. Square root of s² is __.
s measures spread about the mean and is called standard deviation.
s = 0 only when there is NO SPREAD (in other words, all the data values are the same).
As the observations become more spread out about their mean, s gets larger.
s is not resistant to skewness or outliers. WHY?
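The properties above can be explored with a small Python sketch. It assumes the sample definitions (dividing by n - 1), which matches the use of the symbol s; the function names and the constant data set are made up for illustration, and Bonds’ season totals are assumed to be the sixteen values shown.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """Standard deviation: the square root of the sample variance."""
    return sqrt(sample_variance(data))

# Bonds' home runs per season (assumed values from the earlier slide).
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
changed = [100 if x == 73 else x for x in bonds]  # hypothetical record of 100

print(sample_sd([5, 5, 5, 5]))  # no spread at all
print(sample_sd(bonds))         # actual seasons
print(sample_sd(changed))       # one value made more extreme
```

Because each deviation is squared, a single extreme season contributes heavily to the sum, which is one way to see why s is not resistant.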
Recap
Measures of spread: IQR, standard deviation
Measures of center: median, mean
When to use which? The mean and the standard deviation are not resistant to outliers, so use them only when the distribution is roughly symmetric and has no outliers. Use the 5 Number Summary when the distribution is strongly skewed or has outliers.
How the AP Folks Test Your Ability to Reason
How does each of the following affect the mean? The median? The standard deviation?
Adding the same amount to every value in a data set
Multiplying every value in a data set by the same number
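After you have reasoned through both questions, a sketch like the one below is one way to check your answers. The data values, the shift of 10, and the multiplier of 3 are all made up for illustration.

```python
from statistics import mean, median, stdev

def summarize(data):
    """Return (mean, median, sample standard deviation)."""
    return mean(data), median(data), stdev(data)

data = [2, 4, 4, 7, 13]            # made-up illustrative values
shifted = [x + 10 for x in data]   # add the same amount to every value
scaled = [3 * x for x in data]     # multiply every value by the same number

for label, values in (("original", data), ("plus 10", shifted), ("times 3", scaled)):
    m, med, s = summarize(values)
    print(f"{label:>8}: mean = {m}, median = {med}, s = {s:.3f}")
```

Adding a constant shifts the mean and median by that constant and leaves s unchanged. Multiplying by a positive constant multiplies the mean, median, and s by that constant.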
Homework Day 1: Chapter 1 #40, 41, 45, 50, 52 Day 2: Chapter 1 #63, 91, 94, 96, 101