Presentation on theme: "Describing Distributions with Numbers"— Presentation transcript:
1 Describing Distributions with Numbers Chapter 1 Section 1.2
2 Parameter - Fixed value about a population. Typically unknown.
3 Statistic - Value calculated from a sample.
4 Measures of Central Tendency Mean - the arithmetic average. Use $\mu$ to represent a population mean (a parameter). Use $\bar{x}$ to represent a sample mean (a statistic). This is on the formula sheet, so you do not have to memorize it. Formula: $\bar{x} = \frac{\sum x_i}{n}$. $\Sigma$ is the capital Greek letter sigma; it means to sum the values that follow.
5 Measures of Central Tendency Median - the middle of the data; the 50th percentile. Observations must be in numerical order. It is the single middle value if n is odd, and the average of the middle two values if n is even. NOTE: n denotes the sample size.
6 Measures of Central Tendency Mode - the observation that occurs most often. There can be more than one mode. If all values occur only once, there is no mode. Not used as often as the mean & median.
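The mode rules on this slide can be sketched in a few lines of Python. The helper name `modes` and the sample lists are made up for illustration; they are not from the slides.

```python
from collections import Counter

def modes(values):
    """Return all modes of a non-empty list, or [] if every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values occur only once: there is no mode
    # Every value tied for the highest count is a mode.
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 3, 4, 8]))     # one mode
print(modes([2, 3, 3, 4, 8, 8]))  # two modes
print(modes([2, 3, 4, 8, 12]))    # no mode
```

Returning a list rather than a single number is a design choice that makes the "more than one mode" and "no mode" cases explicit.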
7 Measures of Central Tendency Range - The difference between the largest and smallest observations. This is only one number! If the data run from 3 to 8, the range is 5, not "3-8".
8 Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following numbers of lollipops: 2, 3, 4, 8, 12. Find the median. The numbers are in order & n is odd, so find the middle observation. The median is 4 lollipops!
9 Suppose we have a sample of 6 customers that buy the following numbers of lollipops: 2, 3, 4, 6, 8, 12. Find the median. The numbers are in order & n is even, so find the middle two observations (4 and 6). Now, average these two values. The median is 5 lollipops!
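The odd/even rule for the median can be checked with a short Python sketch. The function below is an illustration, not part of the slides, and the six-customer data (2, 3, 4, 6, 8, 12) is inferred from the later "what if" slides rather than stated on this one.

```python
def median(values):
    """Median by the slide's rule: middle value if n is odd,
    average of the two middle values if n is even."""
    ordered = sorted(values)  # observations must be in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

five_customers = [2, 3, 4, 8, 12]
six_customers = [2, 3, 4, 6, 8, 12]  # assumed data, inferred from later slides

print(median(five_customers))  # 4
print(median(six_customers))   # 5.0
```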
10 Suppose we have a sample of 6 customers that buy the following numbers of lollipops: 2, 3, 4, 6, 8, 12. Find the mean. To find the mean number of lollipops, add the observations and divide by n: (2 + 3 + 4 + 6 + 8 + 12) / 6 = 35 / 6 ≈ 5.83 lollipops.
11 What would happen to the median & mean if the 12 lollipops were 20? The data are now 2, 3, 4, 6, 8, 20. The median is 5. The mean is 7.17. What happened?
12 What would happen to the median & mean if the 20 lollipops were 50? The data are now 2, 3, 4, 6, 8, 50. The median is 5. The mean is 12.17. What happened?
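The two "what if" slides can be replayed with Python's standard `statistics` module. This is only a sketch: the starting value of 12 for the largest observation is taken from the wording of the question on slide 11.

```python
from statistics import mean, median

base = [2, 3, 4, 6, 8]  # the five observations that never change

# Replace the largest observation with 12, then 20, then 50.
for largest in (12, 20, 50):
    data = base + [largest]
    print(largest, median(data), round(mean(data), 2))
```

The median stays at 5 in every case, while the mean moves from 5.83 to 7.17 to 12.17 as the largest value grows.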
13 Resistant - Statistics that are not affected by outliers. Is the median resistant? YES. Is the mean resistant? NO.
14 Look at the following data set. Find the mean. Now find how each observation deviates from the mean. (This is the deviation from the mean.) What is the sum of the deviations from the mean? Will this sum always equal zero? YES
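Why the sum is always zero: the deviations add up to (sum of the observations) minus n times the mean, and n times the mean is the sum of the observations. A small Python check is sketched below. The data set from this slide did not survive in the transcript, so the values here are a stand-in chosen for illustration. Exact fractions are used so that rounding cannot hide the result.

```python
from fractions import Fraction

data = [2, 3, 4, 6, 8, 12]  # stand-in data, not the slide's own data set
n = len(data)

mean = Fraction(sum(data), n)           # exact mean: 35/6
deviations = [x - mean for x in data]   # how each observation deviates

print(sum(deviations))  # 0
```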
15 Look at the following data set. Create a histogram with the data (use an x-scale of 2). Then find the mean and median. Mean = 27, Median = 27. Look at the placement of the mean and median in this symmetrical distribution.
16 Look at the following data set. Create a histogram with the data (use an x-scale of 8). Then find the mean and median. Mean = 28.176, Median = 25. Look at the placement of the mean and median in this right skewed distribution.
17 Look at the following data set. Create a histogram with the data (use a scale of 2 on the graph). Then find the mean and median. Mean = 54.588, Median = 58. Look at the placement of the mean and median in this skewed left distribution.
19 Recap: In a symmetrical distribution, the mean and median are equal. In a skewed distribution, the mean is pulled in the direction of the skewness. In a symmetrical distribution, you should report the mean! In a skewed distribution, the median should be reported as the measure of center!
20 Quartiles Arrange the observations in increasing order and locate the median M in the ordered list of observations. The first quartile Q1 is the median of the 1st half of the observations. The third quartile Q3 is the median of the 2nd half of the observations.
22 What if there is an odd number of observations? When dividing the data in half, leave out the middle number (the median).
23 The interquartile range (IQR) The distance between the first and third quartiles. IQR = Q3 - Q1. Never negative, because Q3 is at least as large as Q1.
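The quartile rule from these slides (median of each half, leaving out the middle number when n is odd) can be sketched in Python as follows. The function names and the sample data are made up for illustration. Calculators and software packages may use slightly different quartile conventions, so their answers can differ from this method.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule (needs n >= 2)."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # observations below the middle
    upper_half = ordered[(n + 1) // 2 :]  # skips the middle number if n is odd
    return median(lower_half), median(ordered), median(upper_half)

q1, m, q3 = quartiles([2, 3, 4, 6, 8, 12, 20])  # hypothetical data, n is odd
print(q1, m, q3, "IQR =", q3 - q1)
```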
24 Outlier: We call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile. Let's look back at the same data: Q1 = 25, Q3 = 41, IQR = 41 - 25 = 16. Lower cutoff: 25 - 1.5 x 16 = 1. Upper cutoff: 41 + 1.5 x 16 = 65.
25 Since 73 is above the upper cutoff, we will call it an outlier.
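The cutoff arithmetic can be checked with a short Python sketch. The function name `outlier_cutoffs` is made up for illustration; the quartiles and the value 73 come from the slides.

```python
def outlier_cutoffs(q1, q3, k=1.5):
    """Return (lower cutoff, upper cutoff) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = outlier_cutoffs(25, 41)
print(lower, upper)  # 1.0 65.0

observation = 73
print(observation < lower or observation > upper)  # True: 73 is an outlier
```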
27 If you plot these five numbers on a graph, you have a boxplot.
28 Advantages of boxplots: ease of construction; convenient handling of outliers; construction is not subjective (unlike histograms); used with medium or large size data sets (n > 10); useful for comparative displays.
29 Disadvantages of boxplots: does not retain the individual observations; should not be used with small data sets (n < 10).
30 How to construct: find the five-number summary (Min, Q1, Med, Q3, Max); draw a box from Q1 to Q3; draw the median as a center line in the box; extend whiskers to the min & max.
31 Modified boxplots display outliers. Fences mark off the outliers. Whiskers extend to the largest (smallest) data value inside the fence. ALWAYS use modified boxplots in this class!!!
32 Modified Boxplot The interquartile range (IQR) is the range (length) of the box: Q3 - Q1. The fences are at Q1 - 1.5IQR and Q3 + 1.5IQR. These are called the fences and should not be seen. Any observation outside the fences is an outlier! Put a dot for each outlier.
33 Modified Boxplot . . . Draw the “whisker” from each quartile to the most extreme observation that is within the fence!
34 A report from the U.S. Department of Justice gave the following percent increases in federal prison populations in 20 northeastern & mid-western states in 1999. Use the calculator to create a modified boxplot. Describe the distribution.
35 Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follow are the radon concentrations in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer. (See data on note page.) Create parallel boxplots. Compare the distributions.
36 [Figure: parallel boxplots of radon concentration for the Cancer and No Cancer groups; axis ticks at 100 and 200.] The median radon concentration for the no cancer group is lower than the median for the cancer group. The range of the cancer group is larger than the range for the no cancer group. Both distributions are skewed right. The cancer group has outliers at 39, 45, 57, and [...]. The no cancer group has outliers at 55 and 85.
50 Linear transformation rule When multiplying a random variable by a constant or adding a constant to it, the mean and median change by both operations. The standard deviation changes only by the multiplication. Formulas: $\mu_{a+bX} = a + b\mu_X$ and $\sigma_{a+bX} = |b|\,\sigma_X$.
51 An appliance repair shop charges a $30 service call to go to a home for a repair. It also charges $25 per hour for labor. From past history, the average length of repairs is 1 hour 15 minutes (1.25 hours) with a standard deviation of 20 minutes (1/3 hour). Including the charge for the service call, what are the mean and standard deviation of the charges?
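One way to work this problem is to write the charge as 30 + 25 x (hours) and apply the linear transformation rule. The Python sketch below does the arithmetic; the variable names are made up for illustration.

```python
service_call = 30.0   # added constant, in dollars
hourly_rate = 25.0    # multiplied constant, in dollars per hour
mean_hours = 1.25     # mean repair time
sd_hours = 1 / 3      # standard deviation of repair time (20 minutes)

# The mean is changed by both the multiplication and the addition.
mean_charge = service_call + hourly_rate * mean_hours
# The standard deviation is changed only by the multiplication.
sd_charge = abs(hourly_rate) * sd_hours

print(round(mean_charge, 2), round(sd_charge, 2))  # 61.25 8.33
```

So the mean charge is $61.25 with a standard deviation of about $8.33; the $30 service call shifts the mean but does not affect the spread.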
52 Rules for Combining two variables To find the mean of the sum (or difference), add (or subtract) the two means. To find the standard deviation of the sum (or difference), ALWAYS add the variances, then take the square root. Formulas: $\mu_{X \pm Y} = \mu_X \pm \mu_Y$; if the variables are independent, $\sigma_{X \pm Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}$.
53 Bicycles arrive at a bike shop in boxes. Before they can be sold, they must be unpacked, assembled, and tuned (lubricated, adjusted, etc.). Based on past experience, the times for each setup phase are independent with the following means & standard deviations (in minutes). What are the mean and standard deviation for the total bicycle setup times?
Phase       Mean    SD
Unpacking    3.5   0.7
Assembly    21.8   2.4
Tuning      12.3   2.7
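Because the three phases are independent, the means add and the variances add. A short Python sketch of that calculation follows; the dictionary layout and variable names are made up for illustration, and the numbers come from the table on this slide.

```python
from math import sqrt

# phase: (mean, standard deviation), both in minutes
phases = {
    "Unpacking": (3.5, 0.7),
    "Assembly": (21.8, 2.4),
    "Tuning": (12.3, 2.7),
}

total_mean = sum(m for m, _ in phases.values())
# Add the variances (SD squared), never the standard deviations themselves.
total_variance = sum(sd ** 2 for _, sd in phases.values())
total_sd = sqrt(total_variance)

print(round(total_mean, 1), round(total_sd, 2))  # 37.6 3.68
```

Adding the standard deviations directly would give 5.8 minutes, which overstates the spread of the total setup time.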