Presentation on theme: "Describing Distributions With Numbers"— Presentation transcript:
1 Describing Distributions With Numbers. Section 1.3 cont. (five-number summary, boxplots, variance, standard deviation). Target Goal: I can calculate a five-number summary and construct a boxplot. I can describe spread using the standard deviation of a distribution. Hw: pg 71: 92, 93, 95, 96, 97, 103, 105,
2 Five-Number Summary: The smallest observation, first quartile, median, third quartile, and largest observation of a data set, written in order: Min, Q1, M, Q3, Max. It gives us a quick summary of both center and spread.
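A minimal Python sketch of the five-number summary, assuming the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (with the overall median left out of both halves when n is odd). The function name `five_number_summary` and the sample values are hypothetical, chosen only for illustration; other software may use a different quartile rule.

```python
from statistics import median

def five_number_summary(data):
    """Return (Min, Q1, M, Q3, Max) using the median-of-halves rule.

    Assumes at least two observations.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # skip the median itself when n is odd
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

# Hypothetical data, not taken from the slides.
print(five_number_summary([7, 1, 5, 3, 9, 11, 13]))
```

Sorting first matters: every entry of the summary is defined by position in the ordered data.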
4 Box (and Whiskers) Plot: A graph of the five-number summary of a distribution; best for side-by-side comparisons, since boxplots show less detail than histograms or stemplots; drawn either horizontally or vertically.
5 Modified Boxplot: Because the regular boxplot conceals outliers, we will use the modified boxplot. It plots outliers as isolated points and extends the "whiskers" out to the largest and smallest data points that are not outliers. Remember: label the axis, title the graph, and scale the axis.
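The pieces of a modified boxplot can be sketched in Python as follows. This assumes the 1.5 × IQR rule for flagging outliers and the median-of-halves rule for quartiles; the function name `modified_boxplot_parts` and the data are hypothetical.

```python
from statistics import median

def modified_boxplot_parts(data):
    """Return the whisker ends and the outliers for a modified boxplot.

    Assumes at least two observations and the 1.5 x IQR outlier rule.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])
    q3 = median(xs[half + 1:] if n % 2 else xs[half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are NOT outliers.
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {"whisker_low": inside[0], "whisker_high": inside[-1], "outliers": outliers}

# Hypothetical data with one low and one high outlier.
print(modified_boxplot_parts([2, 10, 11, 12, 13, 14, 15, 30]))
```

Here the whiskers end at 10 and 15, while 2 and 30 are drawn as isolated points.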
6 Figure: Regular (a) and modified (b) boxplots comparing Barry Bonds's and Hank Aaron's home runs. Labeled features: Min, Q1, M, Q3, Max, Outlier.
7 Activity: Acing the First Test. Enter the scores of Mrs. Liao's students on their first statistics test into L1 (page 71, ex. 92). Sort the data (ascending). Nspire: place the cursor on the column title, then select Menu, 1:Actions, 6:Sort, and sort by (a). Nspire: see Appendix A6.
8 a. Find the five-number summary and verify your expectation from a. Calculator activity: Enter the scores into L1 from page 71. Calculator: 1 VAR STAT(L1) gives 43, 82, 87.75, 93, 98. Mean = 2544/30 = 84.8. The median (87.75) is greater than the mean.
9 b. What is the range of the middle half of the scores of the statistics students? Between Q1 and Q3: between 82 and 93.
10 Acing the First Test Cont. Construct by hand a modified boxplot of the stats students' scores. First find potential outliers. IQR = ____. Q1 − 1.5 × IQR = ____. Q3 + 1.5 × IQR = ____. Outliers: ____. Graph: Mark a small x for each outlier, then mark the smallest value that is not an outlier, Q1, M, Q3, and the max. Draw the box-and-whisker plot.
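As a check on the hand calculation, the outlier bounds can be worked out from the five-number summary reported earlier for these scores (43, 82, 87.75, 93, 98). The sketch below uses only those five values, so it can say whether the minimum or maximum is an outlier but cannot list every outlier in the full data set.

```python
# Values taken from the five-number summary on the earlier slide.
minimum, q1, q3, maximum = 43, 82, 93, 98

iqr = q3 - q1                    # spread of the middle half
lower_bound = q1 - 1.5 * iqr     # anything below this is an outlier
upper_bound = q3 + 1.5 * iqr     # anything above this is an outlier

min_is_outlier = minimum < lower_bound
max_is_outlier = maximum > upper_bound

print(f"IQR = {iqr}, bounds = ({lower_bound}, {upper_bound})")
print(f"min is outlier: {min_is_outlier}, max is outlier: {max_is_outlier}")
```

The low score of 43 falls below the lower bound of 65.5, so it is plotted as an isolated point; the maximum of 98 is well inside the upper bound.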
11 Acing the First Test Cont. On your calculator: First define Plot1 to be a modified boxplot using the list. Graph, trace, and compare. Is there an outlier? If so, was it the same as in part a? Based on the boxplot, conjecture the shape of the corresponding histogram. Histogram shape: ______________________
12 Acing the First Test Cont. Next, define Plot2 to be a histogram using the same list. Trace and compare. Did you guess correctly? Roughly draw the histogram below.
13 Important Note: If a distribution contains outliers, use the median and the IQR to describe the distribution.
14 The most common numerical measure of spread is the standard deviation (s): it measures spread by looking at how far the observations are from their mean. The standard deviation (s) is the square root of the variance (s²).
15 Variance (s²) of a set of observations is the average of the squares of the deviations of the observations from their mean, where the average is taken by dividing by n − 1 rather than n. Note: Most of the time we will use the calculator (STAT:CALC:1VAR STAT).
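Written out for observations x_1, ..., x_n with mean x̄, the definitions of the variance and the standard deviation are usually stated as follows (the index i and the summation notation are standard conventions, not taken from the slide):

```latex
\[
  s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,
  \qquad
  s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}.
\]
```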
16 Why square the deviations? Squaring makes them all nonnegative, so that observations far from the mean in either direction have large positive squared deviations.
17 Properties of the Standard Deviation: The sum of the deviations of the observations from their mean will always be zero. Choose s only when the mean is chosen as the measure of center. s = 0 only when there is no spread (all observations have the same value). s, like the mean, is not resistant: strong skewness or a few outliers can make s very large. As a rough rule of thumb, a value more than 2 standard deviations from the mean may be considered an outlier.
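A short Python sketch illustrating three of these properties on made-up numbers: deviations from the mean sum to zero, s is 0 when all values are equal, and a single extreme value inflates s. The data values are hypothetical.

```python
from statistics import mean, stdev

data = [2, 4, 4, 6, 9]                  # hypothetical observations
xbar = mean(data)

# Property 1: deviations from the mean cancel out.
# (With non-integer data, expect a tiny rounding error instead of exactly 0.)
deviations = [x - xbar for x in data]
total_deviation = sum(deviations)

# Property 2: no spread means s = 0.
s_constant = stdev([7, 7, 7, 7])

# Property 3: s is not resistant; one extreme value makes it much larger.
s_before = stdev(data)
s_after = stdev(data + [60])

print(total_deviation, s_constant)
print(f"s without extreme value: {s_before:.2f}, with it: {s_after:.2f}")
```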
18 Why divide by (n − 1)? Degrees of freedom: Since x̄ is the exact balancing point of the data, the data will almost always be closer to x̄, on average, than they will be to μ. The sum of the squared deviations from x̄ will therefore underestimate the sum of the squared deviations from μ. To correct for this, we divide by n − 1 instead of n.
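The underestimate can be seen in a small simulation. The sketch below is only an illustration under assumed settings (a normal population with μ = 0 and variance 1, samples of size 5, a fixed random seed): it averages the two candidate formulas over many samples and compares them with the true variance.

```python
import random

random.seed(0)                        # fixed seed so the run is repeatable
MU, SIGMA = 0.0, 1.0                  # assumed population: true variance is 1
N, TRIALS = 5, 20000                  # assumed sample size and number of samples

def spread_estimates(sample):
    """Return (sum of squares / n, sum of squares / (n - 1)) about the sample mean."""
    n = len(sample)
    xbar = sum(sample) / n
    sum_sq = sum((x - xbar) ** 2 for x in sample)
    return sum_sq / n, sum_sq / (n - 1)

total_by_n = 0.0
total_by_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    by_n, by_n_minus_1 = spread_estimates(sample)
    total_by_n += by_n
    total_by_n_minus_1 += by_n_minus_1

avg_by_n = total_by_n / TRIALS
avg_by_n_minus_1 = total_by_n_minus_1 / TRIALS
print(f"dividing by n:     {avg_by_n:.3f}")
print(f"dividing by n - 1: {avg_by_n_minus_1:.3f}")
```

With these settings the divide-by-n average should land near 0.8, noticeably below the true variance of 1, while the divide-by-(n − 1) average should land close to 1.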
19 Example: Roger Maris. New York Yankee Roger Maris held the single-season home run record from 1961 until Here are Maris's home run counts for his 10 years in the American League:
20 Maris's mean number of home runs is x̄ = ____. Find the standard deviation s from its definition (by hand). ∑(xi − x̄)² = ( )² + ( )² + … ; s² = ____ / (n − 1); s² = ____ / 9; s² = ____; s = ____
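The by-hand steps can be mirrored in Python. The transcript does not list the ten season counts, so the values below are an assumption: they are Maris's commonly cited American League totals, and they are consistent with the mean of 22.2 (without the 61) and the upper bound of 61.5 quoted on the slides that follow.

```python
from math import sqrt

# ASSUMPTION: season counts are not in the transcript; these are the
# commonly cited values for Maris's ten American League seasons.
home_runs = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13]

n = len(home_runs)
xbar = sum(home_runs) / n                             # step 1: the mean
squared_devs = [(x - xbar) ** 2 for x in home_runs]   # step 2: squared deviations
sum_sq = sum(squared_devs)                            # step 3: add them up
variance = sum_sq / (n - 1)                           # step 4: divide by n - 1 = 9
s = sqrt(variance)                                    # step 5: square root

print(f"mean = {xbar:.1f}, sum of squares = {sum_sq:.1f}")
print(f"s^2 = {variance:.2f}, s = {s:.2f}")
```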
21 b. Use your calculator to verify your results (STAT:CALC:1 var stat:L1). Then use your calculator to find the mean and s for the 9 observations that remain when you leave out any outlier(s). Recall the 1.5 × IQR rule. Note: the exercise treats 61 as an outlier even though it falls just below the upper bound of 61.5.
22 Mean = 22.2, Sx = ____. How does leaving out the "outlier" affect the values of the mean and s? It causes the values of both measures to decrease. Is s a resistant measure of spread? Clearly, s is not a resistant measure of spread.
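A sketch of the same comparison in Python, again assuming the commonly cited season counts (they are not listed in this transcript) and the median-of-halves rule for quartiles. It checks the 1.5 × IQR upper bound and then compares the mean and s with and without the 61.

```python
from statistics import mean, median, stdev

# ASSUMPTION: commonly cited values for Maris's ten American League seasons;
# the transcript itself does not list them.
home_runs = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13]

xs = sorted(home_runs)
q1 = median(xs[:5])                    # median of the lower five values
q3 = median(xs[5:])                    # median of the upper five values
upper_bound = q3 + 1.5 * (q3 - q1)     # 1.5 x IQR rule

without_61 = [x for x in home_runs if x != 61]

mean_all, s_all = mean(home_runs), stdev(home_runs)
mean_9, s_9 = mean(without_61), stdev(without_61)

print(f"upper bound = {upper_bound}; is 61 above it? {61 > upper_bound}")
print(f"all 10 seasons: mean = {mean_all:.1f}, s = {s_all:.2f}")
print(f"without 61:     mean = {mean_9:.1f}, s = {s_9:.2f}")
```

Dropping one season lowers both the mean and s, which is what it means for neither measure to be resistant.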