Published by Brendan Buck. Modified about 1 year ago.

1
5-Minute Check on Lesson 1-2
1. What 4 terms are used to describe data sets or distributions? Shape, Outliers, Center, Spread (SOCS)
2. Which type of graph can our calculators do (bar or histogram)? histogram
3. How many classes should a histogram have? classes = square root (number of observations)
4. What needs to be looked for in time-series graphs? seasonal trends
5. What is the major difference between a histogram and a stem-plot? A histogram summarizes the data; a stem-plot maintains the data
6. Name a possible graphical error in a histogram. overlapping categories

2
Lesson Describing Quantitative Data with Numbers adapted from Mr. Molesky’s TPS 4E slides

3
Objectives
–Calculate and interpret measures of center (mean, median, mode)
–Calculate and interpret measures of spread (IQR, standard deviation, range)
–Identify outliers using the 1.5 x IQR rule
–Make a boxplot
–Select appropriate measures of center and spread
–Use appropriate graphs and numerical summaries to compare distributions of quantitative variables

4
Vocabulary
Boxplot – graphs the five-number summary and any outliers
Degrees of freedom – the number of independent pieces of information that are included in your measurement
Five-number summary – the minimum, Q1, Median, Q3, maximum
Interquartile range (IQR) – the range of the middle 50% of the data; IQR = Q3 – Q1
Mean – the average value (balance point); x-bar
Median – the middle value (in an ordered list); M
Mode – the most frequent data value

5
Vocabulary cont
Outlier – a data value that lies outside the interval [Q1 – 1.5 IQR, Q3 + 1.5 IQR]
P th percentile – p percent of the observations (in an ordered list) fall at or below this number
Quartile – multiples of the 25th percentile (Q1 – 25th; Q2 – 50th or median; Q3 – 75th)
Range – difference between the largest and smallest observations
Resistant measure – a measure (statistic or parameter) that is not sensitive to the influence of extreme observations
Standard Deviation – the square root of the variance
Variance – the average of the squares of the deviations from the mean

6
Measures of Center
Numerical descriptions of distributions begin with a measure of their “center”.
If you could summarize the data with one number, what would it be?
Mean: The “average” value of a dataset
Median: The “middle” value of an ordered dataset
1. Arrange observations in order min to max
2. Locate the middle observation, average if needed

7
Mean vs Median
The mean and the median are the most common measures of center.
If a distribution is perfectly symmetric, the mean and the median are the same.
The mean is not resistant to outliers.
The mode, the data value that occurs the most often, is a common measure of center for categorical data.
You must decide which number is the most appropriate description of the center...
Use the mean on symmetric data and the median on skewed data or data with outliers.

8
Distributions Parameters
Skewed Left (tail to the left): Mean substantially smaller than median (tail pulls mean toward it)
Mean < Median < Mode
[Figure: skewed-left distribution with the mode, median, and mean marked]

9
Distributions Parameters
Symmetric: Mean roughly equal to median
Mean ≈ Median ≈ Mode
[Figure: symmetric distribution with the mode, median, and mean marked]

10
Distributions Parameters
Skewed Right (tail to the right): Mean substantially greater than median (tail pulls mean toward it)
Mean > Median > Mode
[Figure: skewed-right distribution with the mode, median, and mean marked]

11
Central Measures Comparisons
Measure of Central Tendency | Computation | Interpretation | When to use
Mean | μ = (∑x_i) / N (population); x̄ = (∑x_i) / n (sample) | Center of gravity | Data are quantitative and frequency distribution is roughly symmetric
Median | Arrange data in ascending order and divide the data set in half | Divides into bottom 50% and top 50% | Data are quantitative and frequency distribution is skewed
Mode | Tally data to determine most frequent observation | Most frequent observation | Data are categorical or the most frequent observation is the desired measure of central tendency

12
Measuring Center: Example 1
Use the data below to calculate the mean and median of the commuting times (in minutes) of 20 randomly selected New York workers. Example, page
[Stemplot of the travel times not reproduced]
Key: 4|5 represents a New York worker who reported a 45-minute travel time to work.

13
Example 2
Which of the following measures of central tendency are resistant?
1. Mean – not resistant
2. Median – resistant
3. Mode – resistant

14
Example 3
Given the following set of data: 70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51, 56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52
What is the mean? 51.125
What is the median? 51
What is the mode? 48, 51, 56
What is the shape of the distribution? Symmetric (tri-modal)
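The three measures of center for the Example 3 data can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

# The 40 observations listed in Example 3.
data = [70, 56, 48, 48, 53, 52, 66, 48, 36, 49,
        28, 35, 58, 62, 45, 60, 38, 73, 45, 51,
        56, 51, 46, 39, 56, 32, 44, 60, 51, 44,
        63, 50, 46, 69, 53, 70, 33, 54, 55, 52]

mean = statistics.mean(data)        # sum of the values divided by n
median = statistics.median(data)    # average of the two middle values when n is even
modes = sorted(statistics.multimode(data))  # every value tied for most frequent

print("n =", len(data))
print("mean =", mean)
print("median =", median)
print("modes =", modes)
```

Because three values tie for most frequent, `multimode` is used rather than `mode`, which would report only one of them.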

15
Example 4
Given the following types of data and sample sizes, list the measure of central tendency you would use and explain why.
Sample sizes: Sample of 50, Sample of 200
Types of data: Hair color, Height, Weight, Parent’s Income, Number of Siblings, Age
Answers shown: mode, mean, median, mean
Does sample size affect your decision? Not in this case, but a larger sample size might allow us to use the mean instead of the median.

16
Day 1 Summary and Homework
Summary
–Four characteristics must be used to describe distributions (from histograms or similar charts):
  Shape (uniform, symmetric, bi-modal, etc)
  Outliers (rule next lesson)
  Center (mean, median, mode measures)
  Spread (IQR, variance – next lesson)
–Median is resistant to outliers; mean is not!
–Use Mean for symmetric data
–Use Median for skewed data (or data with outliers)
–Use Mode for categorical data
Homework
–pg 70-74; prob 79, 81, 83, 87, 89

17
5-Minute Check on Lesson 1-3a
1. What are the two quantitative measures of center? Mean and median
2. When do we use one versus the other? Mean for symmetric data and median for skewed
3. Which one is resistant to outliers? Median
4. Which measure of center is used for qualitative data? Mode
5. Find the mean, median and mode of the following data set: 7, 15, 4, 8, 16, 17, 2, 5, 11, 8, 12, 6. Mean: 9.25, Median: 8, Mode: 8

18
Measures of Spread
Variability is the key to Statistics. Without variability, there would be no need for the subject.
When describing data, never rely on center alone.
Measures of Spread:
–Range {rarely used... why?}
–Quartiles and InterQuartile Range {IQR = Q3 - Q1}
–Variance and Standard Deviation {var and s_x}
Like Measures of Center, you must choose the most appropriate measure of spread.

19
Standard Deviation
Another common measure of spread is the Standard Deviation: a measure of the “average” deviation of all observations from the mean.
To calculate Standard Deviation:
1. Calculate the mean.
2. Determine each observation’s deviation (x - xbar).
3. Square each deviation.
4. “Average” the squared deviations by dividing the total squared deviation by (n-1). This quantity is the Variance.
5. Take the square root of the result to determine the Standard Deviation.
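The calculation can be traced step by step in code. The sketch below is illustrative only: it reuses the data set from the 5-Minute Check on Lesson 1-3a, and the function name `sample_std` is an assumption, not something from the slides.

```python
import math

def sample_std(values):
    """Return (variance, standard deviation) using the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n                      # calculate the mean
    deviations = [x - mean for x in values]     # each observation's deviation
    squared = [d ** 2 for d in deviations]      # squared deviations
    variance = sum(squared) / (n - 1)           # "average" with n - 1
    return variance, math.sqrt(variance)        # square root gives s

# Data set from the 5-Minute Check on Lesson 1-3a (mean 9.25).
scores = [7, 15, 4, 8, 16, 17, 2, 5, 11, 8, 12, 6]
variance, s = sample_std(scores)
print(f"variance = {variance:.4f}, s = {s:.4f}")
```

The standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they should agree with this function.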

20
Standard Deviation Properties
–s measures spread about the mean and should be used only when the mean is used as the measure of center
–s = 0 only when there is no spread/variability. This happens only when all observations have the same value. Otherwise, s > 0.
–As the observations become more spread out about their mean, s gets larger
–s, like the mean x-bar, is not resistant. A few outliers can make s very large

21
Standard Deviation
Variance: s_x² = ∑(x_i – x̄)² / (n – 1)
Standard Deviation: s_x = √( ∑(x_i – x̄)² / (n – 1) )
Example 1.16 (p.85 of TPS 3E): Metabolic Rates

22
Standard Deviation
Metabolic Rates: mean = 1600
[Table with columns x, (x – x̄), (x – x̄)² and a Totals row not reproduced]
Total Squared Deviation = 214870
Variance: var = 214870/6 ≈ 35811.67
Standard Deviation: s = √35811.67 ≈ 189.24 cal
What does this value, s, mean?

23
The Interquartile Range (IQR)
–A measure of center alone can be misleading.
–A useful numerical description of a distribution requires both a measure of center and a measure of spread.
How to Calculate the Quartiles and the Interquartile Range
To calculate the quartiles:
1) Arrange the observations in increasing order and locate the median M.
2) The first quartile Q1 is the median of the observations located to the left of the median in the ordered list.
3) The third quartile Q3 is the median of the observations located to the right of the median in the ordered list.
The interquartile range (IQR) is defined as: IQR = Q3 – Q1
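The quartile procedure translates directly into code. The following is a minimal sketch of the "median of each half" method described here, applied to the data set from the 5-Minute Check on Lesson 1-3a; the function name `quartiles` is an illustrative choice. Calculators and software packages may use slightly different quartile conventions, so their answers can differ for small data sets.

```python
import statistics

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    ordered = sorted(values)            # arrange in increasing order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # observations left of the median
    # When n is odd, the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

scores = [7, 15, 4, 8, 16, 17, 2, 5, 11, 8, 12, 6]
q1, m, q3 = quartiles(scores)
iqr = q3 - q1
print(f"Q1 = {q1}, M = {m}, Q3 = {q3}, IQR = {iqr}")
```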

24
Quartiles
Quartiles Q1 and Q3 represent the 25th and 75th percentiles. To find them, order data from min to max. Determine the median, averaging if necessary. The first quartile is the middle of the ‘bottom half’. The third quartile is the middle of the ‘top half’.
[Figure: ordered data with the quartiles marked; surviving labels: Q1 = 23, Q3 = 29.5, med = 79]

25
Example 1
Which of the following measures of spread are resistant?
1. Range – not resistant
2. Variance – not resistant
3. Standard Deviation – not resistant
4. Interquartile Range (IQR) – resistant

26
Example 2
Travel times to work for 20 randomly selected New Yorkers. Example, page
M = 22.5, Q1 = 15, Q3 = 42.5
IQR = Q3 – Q1 = 42.5 – 15 = 27.5 minutes
Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes.

27
Determining Outliers
InterQuartile Range “IQR”: Distance between Q1 and Q3. It is a resistant measure of spread because it only measures the middle 50% of the data. IQR = Q3 - Q1 {width of the “box” in a boxplot}
1.5 IQR Rule: If an observation falls more than 1.5 IQRs above Q3 or below Q1, it is an outlier.
Why 1.5? According to John Tukey, 1 IQR seemed like too little and 2 IQRs seemed like too much...

28
Outliers: 1.5 IQR Rule
To determine outliers:
1. Find the 5 Number Summary
2. Determine the IQR
3. Multiply the IQR by 1.5
4. Set up “fences”
   A. Lower Fence: Q1 - (1.5 IQR)
   B. Upper Fence: Q3 + (1.5 IQR)
5. Observations “outside” the fences are outliers.

29
Example 2 part 2
In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.
Definition: The 1.5 x IQR Rule for Outliers
Call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
Example, page 57
In the New York travel time data, we found Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes.
For these data, 1.5 x IQR = 1.5(27.5) = 41.25
Q1 – 1.5 x IQR = 15 – 41.25 = –26.25
Q3 + 1.5 x IQR = 42.5 + 41.25 = 83.75
Any travel time shorter than –26.25 minutes or longer than 83.75 minutes is considered an outlier.
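The fence arithmetic for the New York travel times can be reproduced in a few lines. This is a sketch under the quartile values given on the slide (Q1 = 15, Q3 = 42.5); the function names `outlier_fences` and `is_outlier` are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3, k=1.5):
    """True if x falls more than k x IQR below Q1 or above Q3."""
    lower, upper = outlier_fences(q1, q3, k)
    return x < lower or x > upper

# Quartiles reported for the New York travel time data.
lower, upper = outlier_fences(15, 42.5)
print(f"lower fence = {lower}, upper fence = {upper}")

# The maximum travel time on the boxplot slide is 85 minutes.
print("85 minutes is an outlier:", is_outlier(85, 15, 42.5))
```

Since a travel time cannot be negative, only the upper fence matters in practice for these data.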

30
5-Number Summary, Boxplots
The 5 Number Summary provides a reasonably complete description of the center and spread of a distribution.
We can visualize the 5 Number Summary with a boxplot.
[Figure: boxplot of Quiz Scores with MIN, Q1, MED, Q3, MAX labeled; min = 45, Q1 = 74, med = 79, Q3 = 91, max = (value not shown)]
Outlier?

31
Drawing a Boxplot
The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.
1. Draw and label a number line that includes the range of the distribution.
2. Draw a central box from Q1 to Q3.
3. Note the median M inside the box.
4. Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.
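Before drawing, it helps to compute the numbers the boxplot needs: the box edges, the median, the whisker ends, and any outliers. The sketch below does this for the Example 3 data set from earlier in the lesson, assuming the median-of-halves quartile method and the 1.5 x IQR rule. The function name `boxplot_stats` and the extra value 95 are hypothetical, added only to show a whisker stopping short of an outlier.

```python
import statistics

def boxplot_stats(values, k=1.5):
    """Compute box edges, median, whisker ends, and outliers."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    q1 = statistics.median(lower_half)
    q3 = statistics.median(upper_half)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers reach the most extreme values that are NOT outliers.
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {"q1": q1, "median": statistics.median(ordered), "q3": q3,
            "whiskers": (inside[0], inside[-1]), "outliers": outliers}

example3 = [70, 56, 48, 48, 53, 52, 66, 48, 36, 49,
            28, 35, 58, 62, 45, 60, 38, 73, 45, 51,
            56, 51, 46, 39, 56, 32, 44, 60, 51, 44,
            63, 50, 46, 69, 53, 70, 33, 54, 55, 52]

stats = boxplot_stats(example3)
print(stats)

# Hypothetical extra observation (not in the slides) to create an outlier.
with_outlier = boxplot_stats(example3 + [95])
print(with_outlier)
```

With the hypothetical value added, the upper whisker still ends at 73 and 95 is plotted separately as an outlier.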

32
Example 2 part 3
Boxplot: Min = (value not shown), Q1 = 15, M = 22.5, Q3 = 42.5, Max = 85
Recall, Max = 85 is an outlier by the 1.5 x IQR rule.

33
Example 3 Consumer Reports did a study of ice cream bars (sigh, only vanilla flavored) in their August 1989 issue. Twenty-seven bars having a taste-test rating of at least “fair” were listed, and calories per bar was included. Calories vary quite a bit partly because bars are not of uniform size. Just how many calories should an ice cream bar contain? Construct a boxplot for the data above

34
Example 3 - Answer
Q1 = 182, Q2 = 221.5, Q3 = 319
Min = 111, Max = 439, Range = 328
IQR = 137, UF = 524.5, LF = –23.5
[Figure: boxplot of Calories]

35
Example 4 The weights of 20 randomly selected juniors at MSHS are recorded below: a) Construct a boxplot of the data b) Determine if there are any mild or extreme outliers c) Comment on the distribution

36
Example 4 - Answer
Q1 = 130.5, Q2 = 138, Q3 = 145.5
Min = 121, Max = 213, Range = 92
IQR = 15, UF = 168, LF = 108
Mean and StDev: values not shown
[Figure: boxplot of Weight (lbs) with two points marked * as extreme outliers (> 3 IQR from Q3)]
Shape: somewhat symmetric
Outliers: 2 extreme outliers
Center: Median = 138
Spread: IQR = 15
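The distinction between mild and extreme outliers can be sketched in code. The example below assumes the common convention that a mild outlier lies between 1.5 and 3 IQRs beyond a quartile and an extreme outlier lies more than 3 IQRs beyond it; the slide states only the 3 IQR cutoff. Q3 = 145.5 is derived here as Q1 + IQR = 130.5 + 15, and the weight of 170 lbs is a hypothetical value used only for illustration.

```python
def classify(x, q1, q3):
    """Label x as an extreme outlier, a mild outlier, or neither."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme outlier"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"

q1 = 130.5
q3 = q1 + 15        # derived from the reported IQR of 15

print(213, classify(213, q1, q3))   # reported maximum
print(121, classify(121, q1, q3))   # reported minimum
print(170, classify(170, q1, q3))   # hypothetical weight, not in the data
```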

37
Example 5
Consider the following test scores for a small class: [data not reproduced]
Plot the data and describe the SOCS:
Shape? skewed left
Outliers? maybe 45
Center? M = 79
Spread? IQR = 91 – 74 = 17
Why use the median to describe the “center”? Why use the IQR to describe the “spread”? data skewed

38
Choosing Measures of Center & Spread
We now have a choice between two descriptions for center and spread:
–Mean and Standard Deviation
–Median and Interquartile Range
The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers.
Use mean and standard deviation only for reasonably symmetric distributions that don’t have outliers.
NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA!

39
Using the TI-83
Enter the test data into List L1
–STAT, EDIT, enter data into L1
Calculate the 5 Number Summary
–Hit STAT, go over to CALC, select 1-Var Stats, and hit 2nd 1 (L1)
Use 2nd Y= (STAT PLOT) to graph the box plot
–Turn Plot1 ON
–Select BOX PLOT (4th option, first in second row)
–Xlist: L1
–Freq: 1
–Hit ZOOM 9:ZoomStat to graph the box plot
Copy the graph with appropriate labels and titles

40
Day 2 Summary and Homework
Summary
–Sample variance is found by dividing by (n – 1) so that it remains an unbiased estimator of the population variance (the correction is needed because we estimate the population mean, μ, with the sample mean, x-bar)
–The larger the standard deviation, the more dispersion the distribution has
–Boxplots can be used to check outliers and distributions
–Use comparative boxplots for two datasets
–Identifying a distribution from boxplots or histograms is subjective!
–Use standard deviation with mean and IQR with median
Homework
–pg 82: prob 33; pg 89 probs 40, 41; pg 97 probs 45, 46
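The claim about dividing by (n – 1) can be checked exactly on a tiny case. The sketch below uses a hypothetical population {1, 2, 3} (not from the slides), lists every equally likely sample of size 2 drawn with replacement, and compares the average sample variance under each divisor with the population variance. Exact fractions are used so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]                       # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N

def sample_var(sample, divisor):
    """Sum of squared deviations from x-bar, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

# All 9 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

avg_n_minus_1 = sum(sample_var(s, len(s) - 1) for s in samples) / len(samples)
avg_n = sum(sample_var(s, len(s)) for s in samples) / len(samples)

print("population variance:      ", pop_var)
print("average using n - 1:      ", avg_n_minus_1)
print("average using n (biased): ", avg_n)
```

In this case the (n – 1) version averages to the population variance, while dividing by n underestimates it.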
