2 Describing the Distribution
- Center
  - Median
  - Mean
- Spread
  - Range
  - Interquartile Range
  - Standard Deviation
3 Median
- Literally the middle number (data value)
- When n (the number of observations) is odd:
  - Order the data from smallest to largest
  - The median is the middle number on the list: the ((n+1)/2)th value counting from the smallest
  - Ex: If n = 11, the median is the (11+1)/2 = 6th value from the smallest
  - Ex: If n = 37, the median is the (37+1)/2 = 19th value from the smallest
4 Example – August Temps
- High temperatures for Des Moines, Iowa, taken from the first 13 days of August 2005
- Remember to order the values if they aren’t already in order!
- 13 observations, so the median is the (13+1)/2 = 7th observation from the bottom
- Median = 90
5 Median
- When n is even:
  - Order the data from smallest to largest
  - The median is the average of the two middle numbers
  - (n+1)/2 falls halfway between the positions of these two numbers
  - Ex: If n = 10, (10+1)/2 = 5.5, so the median is the average of the 5th and 6th values from the smallest
6 Example – Yankees
- Scores of last 10 games
- Remember to order the values if they aren’t already in order!
- 10 observations: (10+1)/2 = 5.5, so average the 5th and 6th observations from the bottom
- Median = 5
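The (n+1)/2 position rule for odd and even n can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function name `median_by_position` and both sample lists are hypothetical.

```python
def median_by_position(values):
    """Median using the (n + 1) / 2 position rule described on the slides."""
    ordered = sorted(values)              # always order the data first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    position = (n + 1) / 2                # 1-based position from the smallest value
    if n % 2 == 1:
        # n odd: the position is a whole number, so take that single value
        return ordered[int(position) - 1]
    # n even: the position ends in .5, so average the values on either side
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


# Hypothetical data, not taken from the slides
odd_sample = [7, 3, 9, 1, 5]              # n = 5, position (5 + 1) / 2 = 3
even_sample = [8, 2, 6, 4]                # n = 4, position (4 + 1) / 2 = 2.5

print(median_by_position(odd_sample))     # 5
print(median_by_position(even_sample))    # 5.0
```

Sorting inside the function guards against the most common hand-calculation mistake, which is reading the middle value from an unordered list.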
7 Mean
- Ordinary average
  - Add up all observations
  - Divide by the number of observations
- Formula, for n observations with values y1, y2, y3, …, yn:
  $\bar{y} = \frac{y_1 + y_2 + y_3 + \cdots + y_n}{n} = \frac{\sum y}{n}$
9 Example – Vikings (as of 1/9)
- Find the mean of the scores (17 values)
10 Example – Colts (as of 1/9)
- Find the mean of the scores (17 values)
11 Mean vs. Median
- Median = middle number
- Mean = value where the histogram balances
- Mean and median are similar when
  - Data are symmetric
- Mean and median are different when
  - Data are skewed
  - There are outliers
12 Mean vs. Median
- The mean is influenced by unusually high or unusually low values
- Example: income in a small town of 6 people
  $25,000  $27,000  $29,000  $35,000  $37,000  $38,000
  - The mean income is about $31,833
  - The median income is $32,000
13 Mean vs. Median
- Bill Gates moves to town
  $25,000  $27,000  $29,000  $35,000  $37,000  $38,000  $40,000,000
  - The mean income is $5,741,571
  - The median income is $35,000
- The mean is pulled by the outlier; the median is not
- The mean is not a good measure of center for these data
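The income example can be checked with a short Python sketch that uses the standard library's `statistics` module. The income values are the ones on slides 12 and 13; the variable names are illustrative only.

```python
from statistics import mean, median

# Incomes from slide 12 (six residents)
incomes = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]
# Slide 13: the same town after one very large income is added
with_outlier = incomes + [40_000_000]

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"before: mean = {mean_before:,.2f}, median = {median_before:,.0f}")
print(f"after:  mean = {mean_after:,.2f}, median = {median_after:,.0f}")

# How far each measure of center moved when the outlier arrived
print(f"mean shifted by   {mean_after - mean_before:,.2f}")
print(f"median shifted by {median_after - median_before:,.0f}")
```

The mean moves by millions of dollars while the median moves by only $3,000, which is what "the median is resistant" means in practice.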
14 Mean vs. Median
- Skewness pulls the mean in the direction of the tail
  - Skewed to the right: mean > median
  - Skewed to the left: mean < median
- Outliers pull the mean in their direction
  - Large outlier: mean > median
  - Small outlier: mean < median
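15 Weighted Mean
- Used when values are not equally represented
- Multiply each value by its weight, add the products, and divide by the sum of the weights:
  $\bar{x}_w = \frac{\sum w x}{\sum w}$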
16 Example (weighted mean)
- A recent survey of a new diet cola reported the following percentages of people who liked the taste. Find the weighted mean of the percentages.

  Area   % Favored   Number surveyed
  1      40          1000
  2      30          3000
  3      50          800
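One way to work this example is to weight each percentage by the number of people surveyed in that area. The Python sketch below assumes that reading of the table (areas 1 to 3 with 40%, 30%, 50% favoring and 1000, 3000, 800 surveyed); the function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


percent_favored = [40, 30, 50]        # % who liked the taste, areas 1 to 3
number_surveyed = [1000, 3000, 800]   # weights: people surveyed per area

weighted = weighted_mean(percent_favored, number_surveyed)
unweighted = sum(percent_favored) / len(percent_favored)

print(f"weighted mean   = {weighted:.1f}%")    # 35.4%
print(f"unweighted mean = {unweighted:.1f}%")  # 40.0%
```

The weighted mean sits below the plain average because the area with the most respondents (3000) also had the lowest percentage.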
18 Spread
- Range is a very basic measure of spread (Max – Min)
  - It is highly affected by outliers, which make the spread appear larger than it really is
  - Ex. The annual numbers of deaths from tornadoes in the U.S. from 1990 to 2000:
    - Range with outlier: 130 – 25 = 105
    - Range without outlier: 94 – 25 = 69
19 Spread
- Interquartile Range (IQR)
  - First Quartile (Q1): 25th percentile
  - Third Quartile (Q3): 75th percentile
  - IQR = Q3 – Q1
  - Covers the center (middle) 50% of the values
20 Finding Quartiles
- Order the data
- Split into two halves at the median
  - When n is odd, include the median in both halves
  - When n is even, do not include the median in either half
- Q1 = median of the lower half
- Q3 = median of the upper half
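The split-at-the-median method can be sketched in Python as below. This is an illustration of the rule on this slide, reusing the income values from slides 12 and 13 as test data; the function name `quartiles` is hypothetical. Library routines such as `statistics.quantiles` follow other conventions and may return slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)          # order the data
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    if n % 2 == 1:
        # n odd: the median (index `half`) goes into both halves
        lower = ordered[: half + 1]
        upper = ordered[half:]
    else:
        # n even: the median falls between two values, so neither half contains it
        lower = ordered[:half]
        upper = ordered[half:]
    return median(lower), median(upper)


incomes = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]   # n = 6 (even)
with_outlier = incomes + [40_000_000]                         # n = 7 (odd)

q1, q3 = quartiles(incomes)
q1_out, q3_out = quartiles(with_outlier)

print(f"n = 6: Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
print(f"n = 7: Q1 = {q1_out}, Q3 = {q3_out}, IQR = {q3_out - q1_out}")
```

Adding the $40,000,000 income barely changes the IQR, which is why the IQR is a useful measure of spread when outliers are present.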
21 Top 15 Populations US Cities 2004

  City                     Population*
  New York, N.Y.           810
  Los Angeles, Calif.      385
  Chicago, Ill.            286
  Houston, Tex.            201
  Philadelphia, Pa.        147
  Phoenix, Ariz.           142
  San Diego, Calif.        126
  San Antonio, Tex.        124
  Dallas, Tex.             121
  San Jose, Calif.         90
  Detroit, Mich.
  Indianapolis, Ind.       78
  Jacksonville, Fla.
  San Francisco, Calif.    74

* Populations were all divided by 10,000.
22 Example – Top City Populations
- Order the values (14 values)
- Lower half: Q1 = median of the lower half = 90
- Upper half: Q3 = median of the upper half = 201
- IQR = Q3 – Q1 = 201 – 90 = 111
23 August High Temps (8/1–8/13)
- Order the values (13 values)
- Lower half: Q1 = median of the lower half = 81
- Upper half: Q3 = median of the upper half = 93
- IQR = Q3 – Q1 = 93 – 81 = 12
24 August High Temps (8/14–8/25)
- Order the values (12 values)
- Lower half: Q1 = median of the lower half = 78
- Upper half: Q3 = median of the upper half = 87
- IQR = Q3 – Q1 = 87 – 78 = 9
26 Examples
- Vikings (as of 1/9): Min = 13, Q1 = 20, Median = 27, Q3 = 31, Max = 38
- Colts (as of 1/9): Min = 14, Q1 = 24, Median = 34, Q3 = 41, Max = 51
27 Graph of Five Number Summary
- Boxplot
  - Box between Q1 and Q3
  - Line in the box marks the median
  - Lines extend out to the minimum and maximum
- Best used for comparisons
- Use this simpler method
28 Example – Vikings & Colts
- Boxplot of Vikings scores
  - Box from 20 to 31
  - Line in box at 27
  - Lines extend out from box to 13 and 38
- Boxplot of Colts scores
  - Box from 24 to 41
  - Line in box at 34
  - Lines extend out from box to 14 and 51
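The simple boxplot described here needs only the five-number summary. The Python sketch below takes the summaries listed on slide 26 (Vikings minimum 13, Colts minimum 14) and draws a rough text version of each boxplot on a shared axis. The class `FiveNumberSummary` and the function `text_boxplot` are hypothetical names, and the text rendering is only a stand-in for a real plot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveNumberSummary:
    minimum: int
    q1: int
    median: int
    q3: int
    maximum: int

    @property
    def data_range(self):
        return self.maximum - self.minimum

    @property
    def iqr(self):
        return self.q3 - self.q1


def text_boxplot(summary, axis_max):
    """Draw one boxplot as text on an axis running from 0 to axis_max."""
    cells = [" "] * (axis_max + 1)
    for x in range(summary.minimum, summary.maximum + 1):
        cells[x] = "-"           # lines out to the minimum and maximum
    for x in range(summary.q1, summary.q3 + 1):
        cells[x] = "="           # box between Q1 and Q3
    cells[summary.median] = "|"  # line in the box marks the median
    return "".join(cells)


vikings = FiveNumberSummary(13, 20, 27, 31, 38)
colts = FiveNumberSummary(14, 24, 34, 41, 51)
axis_max = max(vikings.maximum, colts.maximum)

for name, summary in (("Vikings", vikings), ("Colts", colts)):
    print(f"{name:<8}{text_boxplot(summary, axis_max)}")
    print(f"{'':<8}range = {summary.data_range}, IQR = {summary.iqr}")
```

Putting both plots on the same axis is what makes the comparison readable: the Colts box sits further right (higher scores) and is wider (more spread).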
29 Side by Side Boxplots of Vikings Scores and Colts Scores
30 Spread
- Standard deviation
  - “Average” spread from the mean
  - Most common measure of spread
  - Denoted by the letter s
  - Make a table when calculating by hand
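32 Example – Deaths from Tornadoes

  Deaths   Deviation from mean (56.27)   Squared deviation
  53       -3.27                         10.69
  39       -17.27                        298.25
  33       -23.27                        541.49
  69       12.73                         162.05
  30       -26.27                        690.11
  25       -31.27                        977.81
  67       10.73                         115.13
  130      73.73                         5436.11
  94       37.73                         1423.55
  40       -16.27                        264.71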
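The table method can be sketched in Python as follows. This is an illustration under stated assumptions, not the slide's own working. The ten death counts visible in the table average 58.0, while the deviation shown for 53 (-3.27) implies a mean of 56.27 over the eleven years 1990 to 2000. The sketch therefore assumes one more year with 39 deaths, which reproduces that mean and gives s close to the 31.97 quoted on slide 34. It also assumes the sample formula, s = sqrt(sum of squared deviations / (n - 1)).

```python
from math import sqrt

# ASSUMPTION: eleven annual values; the second 39 is inferred from the
# mean implied by the table (56.27), not stated in the source.
deaths = [53, 39, 39, 33, 69, 30, 25, 67, 130, 94, 40]

n = len(deaths)
mean_deaths = sum(deaths) / n

# One table row per observation: value, deviation, squared deviation
rows = [(y, y - mean_deaths, (y - mean_deaths) ** 2) for y in deaths]

print(f"{'deaths':>6} {'deviation':>10} {'squared':>10}")
for y, deviation, squared in rows:
    print(f"{y:>6} {deviation:>10.2f} {squared:>10.2f}")

sum_of_squares = sum(squared for _, _, squared in rows)
s = sqrt(sum_of_squares / (n - 1))    # sample standard deviation

print(f"mean = {mean_deaths:.2f}")
print(f"s    = {s:.2f}")
```

The printed squared deviations can differ from the table in the second decimal place, because the table rounds the mean to 56.27 before subtracting.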
33 Example – Vikings
- Find the standard deviation of the scores of Vikings games given the following statistic:
34 Properties of s
- s = 0 only when all observations are equal; otherwise, s > 0
- s has the same units as the data
- s is not resistant
  - Skewness and outliers affect s, just like the mean
- Tornado example:
  - s with outlier: 31.97
  - s without outlier:
35 Which summaries should you use with different distributions?
- When your distribution is symmetric, the appropriate measures of center and spread are:
  - Mean
  - Standard deviation
- When your distribution is skewed, the appropriate measures of center and spread are:
  - Median
  - IQR
36 Comparing Variance
- When comparing the variability of two sets of numbers, find the coefficient of variation:
  $\text{CVar} = \frac{s}{\bar{x}} \cdot 100\%$
- Then compare the percentages.
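A small Python sketch of this comparison, using the usual definition of the coefficient of variation (standard deviation divided by the mean, expressed as a percentage). The inputs are borrowed from the test-score summaries on slide 39 purely for illustration; the function name is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a mean of 0")
    return std_dev / mean * 100


# Summaries from slide 39: English (mean 83, s = 3), Spanish (mean 30, s = 2)
english_cvar = coefficient_of_variation(3, 83)
spanish_cvar = coefficient_of_variation(2, 30)

print(f"English: {english_cvar:.2f}%")   # 3.61%
print(f"Spanish: {spanish_cvar:.2f}%")   # 6.67%
```

Although the English scores have the larger standard deviation (3 versus 2), the Spanish scores vary more relative to their mean.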
37 Standardizing (first look)
- I got an 85 on my English test and you got a 35 on your Spanish test. Who did better?
- How can we compare things that come from different scales?
- Standardizing
  - Use the z formula (the result is called a z-score)
38 Standardizing
  $z = \frac{x - \bar{x}}{s}$
- z = standardized score
- x = raw score
- x̄ (x-bar) = mean of raw scores
- s = sample standard deviation
- So what does this mean for our test scores?
39 Standardizing
- I got an 85 on my English test and you got a 35 on your Spanish test. Who did better?
- Now I need to give you more information.
  - The English class’s tests had a mean of 83 and a standard deviation of 3.
  - The Spanish tests had a mean of 30 and a standard deviation of 2.
41 Comparing Standardized Scores
- I scored 0.667 standard deviations above the mean on my English test, whereas you scored 2.5 standard deviations above the mean on your Spanish test.
- Comparatively, you scored better on your exam.
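The two standardized scores can be reproduced with a short Python sketch, assuming the standard z-score definition (raw score minus mean, divided by the standard deviation) and the test summaries from slide 39. The function name `z_score` is illustrative.

```python
def z_score(raw, mean, std_dev):
    """Number of standard deviations that raw lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / std_dev


english_z = z_score(85, mean=83, std_dev=3)   # my English test
spanish_z = z_score(35, mean=30, std_dev=2)   # your Spanish test

print(f"English z = {english_z:.3f}")   # 0.667
print(f"Spanish z = {spanish_z:.3f}")   # 2.500

better = "Spanish" if spanish_z > english_z else "English"
print(f"Relative to the class, the {better} score is better.")
```

Because z-scores have no units, they can be compared directly even though the two tests were graded on different scales.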