1. Class Session #2: Numerically Summarizing Data
  Measures of Central Tendency
  Measures of Dispersion
  Measures of Central Tendency and Dispersion from Grouped Data
  Measures of Position
2. Recall the Definitions
  Parameter – a descriptive measure of a population (p = parameter = population, usually written in Greek letters)
  Statistic – a descriptive measure of a sample (s = statistic = sample, usually written in Roman letters)
3. Common “descriptions”?
  “Average”? – “typical” as described in the news reports
    Give some of today’s examples
  Data distributions’ “characteristics”
    Shape – look at a picture (histogram)
    Center – mean, median, mode
    Spread – range, variance, standard deviation
4. Central Tendency Definitions
  Arithmetic mean – the sum of all the values of the variable in the data set, divided by the number of observations
  Population arithmetic mean – computed using all the individuals in the population (“mu” = μ, the Greek letter, not the micro sign µ): μ = (Σ x_i) / N
  Sample arithmetic mean – computed using the sample data (“x-bar” = x̄): x̄ = (Σ x_i) / n
  Note: x̄ is a statistic, μ is a parameter
5. More Central Tendency Definitions
  Median – the value that lies in the middle of the data when arranged in ascending order (think of the median strip in the middle of a highway)
  Mode – the most frequent observation of the variable in the data set (think “à la mode”: in fashion, on top)
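As a quick illustration of the three measures of center, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made-up data, not taken from the slides.

```python
import statistics

# Hypothetical sample of seven quiz scores (illustrative only).
scores = [4, 8, 8, 10, 13, 20, 21]

mean = statistics.mean(scores)        # sum of the values / number of observations
median = statistics.median(scores)    # middle value once the data are sorted
modes = statistics.multimode(scores)  # list of every most-frequent value

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
```

`multimode` returns a list because a data set may have no repeated value, one mode, or several modes.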
6. Measures of Dispersion Definitions
  Range (R) – the difference between the largest data value (maximum) and the smallest data value (minimum)
  Deviation about the mean – how far a data value lies from the mean; taken together, the deviations show how “spread out” the data are
  Question: for both a population and a sample, the sum of all deviations about the mean equals what?
  Question: the square of a non-zero number is what?
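The two questions on this slide can be explored numerically. The sketch below uses a made-up data set and plain Python to compute the range, the deviations about the mean, and the squared deviations.

```python
# Hypothetical data set (illustrative only).
data = [3, 7, 8, 12, 20]

data_range = max(data) - min(data)             # R = maximum - minimum
mean = sum(data) / len(data)
deviations = [x - mean for x in data]          # deviation about the mean, per value
squared = [d ** 2 for d in deviations]         # squaring removes the sign

print("range:", data_range)
print("sum of deviations:", sum(deviations))
print("sum of squared deviations:", sum(squared))
```

The positive and negative deviations cancel, which is why variance is built from squared deviations rather than the raw ones.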
7. More Measures of Dispersion Definitions
  Population variance (σ², “sigma squared”) – the sum of squared deviations about the population mean, divided by the number of observations in the population, N: σ² = Σ (x_i − μ)² / N
  Question: population variance is the mean of the ______ _________ ____ __ _________ ___ ?
  Answer: population variance is the mean of the squared deviations about the population mean
8. More Measures of Dispersion Definitions
  Sample variance (s², “s squared”) – the sum of the squared deviations about the sample mean, divided by the number of observations minus one: s² = Σ (x_i − x̄)² / (n − 1)
  The “n − 1” is the degrees of freedom
9. More Measures of Dispersion Definitions
  Population standard deviation – the square root of the population variance (sigma, written “σ”)
  Sample standard deviation – the square root of the sample variance (written “s”)
  BTW, later we discover that “s” itself is a random variable
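To see how the N and n − 1 denominators change the result, the sketch below runs the standard library's population and sample functions on the same made-up numbers. Treating one list as both a population and a sample is only for comparison.

```python
import statistics

# Hypothetical data (illustrative only); 5 observations.
data = [3, 7, 8, 12, 20]

pop_var = statistics.pvariance(data)   # squared deviations summed, divided by N
samp_var = statistics.variance(data)   # squared deviations summed, divided by n - 1
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

print("population variance:", pop_var, " std dev:", round(pop_sd, 3))
print("sample variance:    ", samp_var, " std dev:", round(samp_sd, 3))
```

The sample values are always the larger of the two, because the same sum of squares is divided by a smaller number.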
10. Empirical Rule for Symmetric Data
  If the distribution is bell shaped:
    about 68% of the data lie within 1 standard deviation of the mean
    about 95% of the data lie within 2 standard deviations of the mean
    about 99.7% of the data lie within 3 standard deviations of the mean
  The rule holds for both samples and populations
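The 68 / 95 / 99.7 figures are rounded areas under the normal curve. As a check, and assuming an exactly normal distribution (real data will only approximate it), they can be recovered from the standard library's `NormalDist`:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal distribution lying within k standard deviations of the mean.
within = {}
for k in (1, 2, 3):
    within[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {within[k]:.4f}")
```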
11. Supposing Grouped Data
  Approximate the mean of a variable from a frequency distribution
    Use the midpoint of each class (x_i)
    Use the frequency of each class (f_i)
    Use the number of classes
  Population mean: μ = Σ (x_i · f_i) / Σ f_i
  Sample mean: x̄ = Σ (x_i · f_i) / Σ f_i
12. Supposing Grouped Data
  Weighted Mean
    Good to use when certain data values have higher importance (or weight)
    [Sum of each value of the variable times its weight] / [sum of the weights]
    Examples: Grade Point Average (GPA) and mixed nuts pricing
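The weighted mean formula above can be sketched as a small helper. The function name `weighted_mean` and the GPA figures are hypothetical, chosen only to illustrate the calculation.

```python
def weighted_mean(values, weights):
    """Return [sum of value * weight] / [sum of weights]."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical GPA example: grade points weighted by credit hours.
grade_points = [4.0, 3.0, 2.0]   # A, B, C
credit_hours = [3, 4, 1]

gpa = weighted_mean(grade_points, credit_hours)
print("GPA:", gpa)
```

A mixed-nuts price works the same way, with price per pound as the value and pounds purchased as the weight.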
13. Supposing Grouped Data
  Population Variance
    sum of [(midpoint – mean)² times frequency] / [sum of frequencies]
  Sample Variance
    as before, except the denominator is [sum of frequencies] – 1 (the degrees of freedom thing again)
14. Supposing Grouped Data
  Population Standard Deviation
    take the square root of the population variance
  Sample Standard Deviation
    take the square root of the sample variance
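The grouped-data recipes can be put together in one short sketch. The frequency table below is invented for illustration, and the results are approximations because every value in a class is treated as if it sat at the class midpoint.

```python
import math

# Hypothetical frequency distribution: (lower limit, upper limit, frequency).
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

midpoints = [(low + high) / 2 for low, high, _ in classes]
freqs = [f for _, _, f in classes]
n = sum(freqs)                                   # sum of frequencies

# Approximate mean: sum of (midpoint * frequency) / sum of frequencies.
mean = sum(x * f for x, f in zip(midpoints, freqs)) / n

# Sum of (midpoint - mean)^2 * frequency, shared by both variances.
sum_sq = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))

pop_var = sum_sq / n            # population: divide by the sum of frequencies
samp_var = sum_sq / (n - 1)     # sample: divide by (sum of frequencies) - 1
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print("mean:", mean)
print("population variance:", pop_var, " std dev:", pop_sd)
print("sample variance:", round(samp_var, 3), " std dev:", round(samp_sd, 3))
```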
15. Measures of Position Definition
  z-Score – the distance of a data value from the mean, in terms of standard deviations. Equals [(data value minus mean) divided by standard deviation]
  Population z-score: z = (x − μ) / σ
  Sample z-score: z = (x − x̄) / s
16. Measures of Position Definitions
  z-score equals [(data value minus mean) divided by standard deviation]
  Is a "unitless" measure
  Converting every data value to its z-score “normalizes” the data to get
    Mean of zero
    Standard deviation of one
17. Measures of Position Definitions
  The purpose of the z-score is to provide a way to "compare apples and oranges" by converting variables with different centers and/or spreads to variables with the same center (0) and spread (1).
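Here is a minimal sketch of the "apples and oranges" idea. The exam figures and the helper name `z_score` are hypothetical. The second half standardizes a whole made-up data set to show the mean of 0 and standard deviation of 1.

```python
import statistics


def z_score(value, mean, std_dev):
    """Return (data value - mean) / standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


# Hypothetical exams on different scales.
z_a = z_score(85, mean=75, std_dev=5)      # exam A, scored out of 100
z_b = z_score(620, mean=500, std_dev=100)  # exam B, scored out of 800
print("exam A z-score:", z_a)
print("exam B z-score:", z_b)

# Standardizing every value of a data set gives center 0 and spread 1.
data = [3, 7, 8, 12, 20]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)
standardized = [z_score(x, mu, sigma) for x in data]
print("mean of z-scores:", round(statistics.mean(standardized), 10))
print("std dev of z-scores:", round(statistics.pstdev(standardized), 10))
```

With these made-up numbers the exam A score sits further above its mean, in standard deviation units, even though the raw score on exam B is larger.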
18. Measures of Position Definition
  Percentiles – the kth percentile of a set of data divides the lower k% of the data from the upper (100 − k)%
  Divide into 100 parts, so 99 percentiles exist
  Written “P sub k” (P_k)
  Use to give the relative standing of a data value
19. Measures of Position Definition
  Quartiles – divide the data into four equal parts
  Four parts, so three quartiles exist
  “Q sub one, two, or three” (Q1, Q2, Q3)
    Q2 is the median of the data
    Q1 is the median of the lower half
    Q3 is the median of the upper half
20. Numerical summary of data
  Five-number summary
  Interquartile range (IQR = Q3 – Q1) is resistant to extreme values
  Compute the five-number summary:
    Min value | Q1 | M | Q3 | Max value
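The five-number summary can be sketched with the "median of each half" rule for quartiles. One assumption here: when the number of observations is odd, the median itself is left out of both halves. Textbooks and software differ on this point, so other tools may report slightly different quartiles. The data and the function name are illustrative.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, M, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    # Assumption: with an odd count, exclude the middle value from both halves.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


# Hypothetical data with one extreme value.
data = [1, 3, 4, 7, 8, 10, 12, 15, 40]
minimum, q1, m, q3, maximum = five_number_summary(data)
iqr = q3 - q1

print("Min | Q1 | M | Q3 | Max:", minimum, q1, m, q3, maximum)
print("IQR:", iqr)
```

Replacing the 40 with 400 would change the maximum but leave Q1, M, Q3, and the IQR untouched, which is what "resistant to extreme values" means.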
21. Building a Box Plot – part 1
  1. Calculate the interquartile range (IQR)
  2. Compute the lower and upper fences
    Lower fence = Q1 – 1.5 (IQR)
    Upper fence = Q3 + 1.5 (IQR)
  3. Draw a scale, then mark Q1 and Q3
  4. Box in Q1 to Q3, then mark M
22. Building a Box Plot – part 2
  5. Temporarily mark the fences with brackets
  6. Draw a line from Q1 to the smallest value inside the lower fence, and a line from Q3 to the largest value inside the upper fence
  7. Put * for all values outside of the fences
  8. Erase the brackets
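The numeric part of the box plot steps (the fences, the whisker ends, and the starred values) can be sketched as follows. The function name `box_plot_parts` and the data are hypothetical. Q1 and Q3 are passed in so that any quartile convention can be used. The upper fence is taken as Q3 + 1.5 × IQR, mirroring the lower fence.

```python
def box_plot_parts(data, q1, q3):
    """Return the fences, whisker ends, and outliers for a box plot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),    # smallest value inside the lower fence
        "whisker_high": max(inside),   # largest value inside the upper fence
        "outliers": sorted(x for x in data
                           if x < lower_fence or x > upper_fence),  # drawn as *
    }


# Hypothetical data; Q1 = 3.5 and Q3 = 13.5 by the median-of-halves rule.
data = [1, 3, 4, 7, 8, 10, 12, 15, 40]
parts = box_plot_parts(data, q1=3.5, q3=13.5)
for name, value in parts.items():
    print(f"{name}: {value}")
```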
23. Distribution based on Boxplot
  Symmetric
    median near center of box
    horizontal lines about the same length
  Skewed Right / Positive Skew
    median towards left of box
    right line much longer than left line
  Skewed Left / Negative Skew
    median towards right of box
    left line much longer than right line
24. Which measure is best to report?
  Symmetric distribution
    Mean
    Standard Deviation
  Skewed distribution
    Median
    Interquartile Range
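A small experiment suggests why the median is preferred for skewed data. The two lists below are made up and differ only in their largest value.

```python
import statistics

# Hypothetical data: identical except for one extreme value on the right.
symmetric = [2, 4, 6, 8, 10]
skewed_right = [2, 4, 6, 8, 100]

sym_mean, sym_median = statistics.mean(symmetric), statistics.median(symmetric)
skew_mean, skew_median = statistics.mean(skewed_right), statistics.median(skewed_right)

print("symmetric:    mean =", sym_mean, " median =", sym_median)
print("skewed right: mean =", skew_mean, " median =", skew_median)
```

The single large value pulls the mean toward the long right tail while the median stays put, so for right-skewed data the mean tends to exceed the median.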
25. Self Quiz
  When can the mean and the median be about equal?
  In the 2000 census conducted by the U.S. Census Bureau, two average household incomes were reported: $41,349 and $55,263. One of these averages is the mean and the other is the median. Which is which, and why?
26. Self Quiz
  The U.S. Department of Housing and Urban Development (HUD) uses the median to report the average price of a home in the United States.
  Why do they do that?
27. Self Quiz
  A histogram of a set of data indicates that the distribution of the data is skewed right.
  Which measure of central tendency will be larger, the mean or the median?
  Why?
28. Self Quiz
  If a data set contains 10,000 values arranged in increasing order, where is the median located?
  Matching: (parameter; statistic)
    _____ is a descriptive measure of a population
    _____ is a descriptive measure of a sample
29. Self Quiz
  A data set will always have exactly one mode. (true or false)
  If the number of observations, n, is odd, then the median, M, is the value calculated by the formula M = (n+1)/2
30. Self Quiz
  Find the Sample Mean:
    20, 13, 4, 8, 10
    83, 65, 91, 87, 84
  Find the Population Mean:
    3, 6, 10, 12, 14
31. Self Quiz
  The median for the given list of six data values is 26.5.
    7, 12, 21, ___, 41, 50
  What is the missing value?
32. Self Quiz
  The following data represent the monthly cell phone bill for six randomly selected months.
    $ $ $39.43$ $ $49.26
  Compute the mean, median, and mode cell phone bill.
33. Self Quiz
  Heather and Bill go to the store to purchase nuts, but cannot decide among peanuts, cashews, or almonds. They agree to create a mix. They bought 2.5 pounds of peanuts for $1.30 per pound, 4 pounds of cashews for $4.50 per pound, and 2 pounds of almonds for $3.75 per pound. Determine the price per pound of the mix.