# Click the mouse button or press the Space Bar to display the answers.


5-Minute Check on Lesson 1-2
1. What four terms are used to describe data sets or distributions? Answer: Shape, Outliers, Center, Spread (SOCS)
2. Which type of graph can our calculators make (bar graph or histogram)? Answer: Histogram
3. How many classes should a histogram have? Answer: Number of classes ≈ √(number of observations)
4. What needs to be looked for in time-series graphs? Answer: Seasonal trends
5. What is the major difference between a histogram and a stemplot? Answer: A histogram summarizes the data; a stemplot keeps the actual data values.
6. Name a possible graphical error in a histogram. Answer: Overlapping categories

Describing Quantitative Data with Numbers
Lesson 1-3, adapted from Mr. Molesky's TPS 4E slides

Objectives
- Calculate and interpret measures of center (mean, median, mode)
- Calculate and interpret measures of spread (IQR, standard deviation, range)
- Identify outliers using the 1.5 × IQR rule
- Make a boxplot
- Select appropriate measures of center and spread
- Use appropriate graphs and numerical summaries to compare distributions of quantitative variables

Vocabulary
- Boxplot – a graph of the five-number summary and any outliers
- Degrees of freedom – the number of independent pieces of information that are included in your measurement
- Five-number summary – the minimum, Q1, median, Q3, and maximum
- Interquartile range (IQR) – the range of the middle 50% of the data; IQR = Q3 − Q1
- Mean – the average value (balance point); denoted x̄ (x-bar)
- Median – the middle value (in an ordered list); denoted M
- Mode – the most frequent data value

Vocabulary cont.
- Outlier – a data value that lies outside the interval [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR]
- pth percentile – the value such that p percent of the observations (in an ordered list) fall at or below it
- Quartiles – the 25th, 50th, and 75th percentiles (Q1 – 25th; Q2 – 50th, the median; Q3 – 75th)
- Range – the difference between the largest and smallest observations
- Resistant measure – a measure (statistic or parameter) that is not sensitive to the influence of extreme observations
- Standard deviation – the square root of the variance
- Variance – the average of the squared deviations from the mean (for a sample, the total squared deviation is divided by n − 1)

Measures of Center
Numerical descriptions of distributions begin with a measure of their "center." If you could summarize the data with one number, what would it be?
- Mean: the "average" value of a dataset
- Median: the "middle" value of an ordered dataset
  1. Arrange the observations in order from min to max.
  2. Locate the middle observation; if there are two middle observations (n even), average them.
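
The two procedures above can be sketched in Python using only built-ins. This is a minimal illustration, not part of the original lesson: the function names `mean` and `median` and the sample list are hypothetical.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; averages the two middle values when n is even."""
    xs = sorted(values)                  # arrange observations from min to max
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:                       # odd n: a single middle observation
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2   # even n: average the two middle observations


data = [3, 9, 4, 7, 2, 5]   # hypothetical observations
print(mean(data))           # 5.0
print(median(data))         # 4.5  (sorted: 2 3 4 5 7 9, so (4 + 5) / 2)
```

With an odd number of observations no averaging is needed: `median([5, 1, 3])` returns the single middle value, 3.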

Mean vs Median
- The mean and the median are the most common measures of center.
- If a distribution is perfectly symmetric, the mean and the median are the same.
- The mean is not resistant to outliers.
- The mode, the data value that occurs most often, is a common measure of center for categorical data.
- You must decide which number is the most appropriate description of the center: use the mean on symmetric data and the median on skewed data or data with outliers.
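
A quick way to see that the mean is not resistant while the median is: change one value into an outlier and recompute both. This sketch uses Python's standard `statistics` module; the data values are hypothetical.

```python
from statistics import mean, median

data = [2, 4, 6, 8, 10]            # hypothetical, symmetric data
with_outlier = [2, 4, 6, 8, 100]   # same data, largest value replaced by an outlier

print(mean(data), median(data))                  # mean = 6, median = 6
print(mean(with_outlier), median(with_outlier))  # mean = 24, median = 6
```

One extreme value quadruples the mean but leaves the median unchanged, which is why the median is preferred for skewed data or data with outliers.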

Distributions and Measures of Center
- Skewed left (tail to the left): Mean < Median < Mode. The mean is substantially smaller than the median (the tail pulls the mean toward it).
- Symmetric: Mean ≈ Median ≈ Mode. The mean is roughly equal to the median.
- Skewed right (tail to the right): Mean > Median > Mode. The mean is substantially greater than the median (the tail pulls the mean toward it).

Central Measures Comparisons

| Measure of central tendency | Computation | Interpretation | When to use |
|---|---|---|---|
| Mean | $\mu = \frac{\sum x_i}{N}$ (population); $\bar{x} = \frac{\sum x_i}{n}$ (sample) | Center of gravity | Data are quantitative and the frequency distribution is roughly symmetric |
| Median | Arrange data in ascending order and divide the data set in half | Divides the data into the bottom 50% and top 50% | Data are quantitative and the frequency distribution is skewed |
| Mode | Tally data to determine the most frequent observation | Most frequent observation | Data are categorical, or the most frequent observation is the desired measure of central tendency |

Measuring Center: Example 1
Use the data below to calculate the mean and median of the commuting times (in minutes) of 20 randomly selected New York workers (Example, page 53).
[Stemplot of the 20 travel times. Key: 4|5 represents a New York worker who reported a 45-minute travel time to work.]

Example 2: Which of the following measures of central tendency are resistant?
- Mean: not resistant
- Median: resistant
- Mode: resistant

Example 3: Given the following set of data:
70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51, 56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52
- What is the mean? 51.125
- What is the median? 51
- What is the mode? 48, 51, and 56 (each appears three times)
- What is the shape of the distribution? Symmetric (trimodal)
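
The answers to Example 3 can be double-checked with Python's standard `statistics` module. This is a verification sketch only; `multimode` returns every value tied for most frequent (it needs Python 3.8 or later) and lists them in order of first appearance, so the result is sorted for readability.

```python
from statistics import mean, median, multimode

data = [70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51,
        56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52]

print(len(data))                # 40 observations
print(mean(data))               # 51.125
print(median(data))             # 51.0 (average of the 20th and 21st ordered values)
print(sorted(multimode(data)))  # [48, 51, 56]
```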

Example 4: Given the following types of data and sample sizes, list the measure of central tendency you would use and explain why.

| Data | Sample of … | Sample of 200 |
|---|---|---|
| Hair color | mode | mode |
| Height | mean | mean |
| Weight | mean | mean |
| Parent's income | median | median |
| Number of siblings | mean | mean |
| Age | mean | mean |

Does sample size affect your decision? Not in this case, but a larger sample size might allow us to use the mean rather than the median.

Day 1 Summary and Homework
- Four characteristics are used to describe distributions (from histograms or similar charts):
  - Shape (uniform, symmetric, bimodal, etc.)
  - Outliers (rule next lesson)
  - Center (mean, median, mode)
  - Spread (IQR, variance; next lesson)
- The median is resistant to outliers; the mean is not!
- Use the mean for symmetric data.
- Use the median for skewed data (or data with outliers).
- Use the mode for categorical data.

Homework: pp. 70–74, probs 79, 81, 83, 87, 89

5-Minute Check on Lesson 1-3a
1. What are the two quantitative measures of center? Answer: Mean and median
2. When do we use one versus the other? Answer: Mean for symmetric data and median for skewed data
3. Which one is resistant to outliers? Answer: Median
4. Which measure of center is used for qualitative data? Answer: Mode
5. Find the mean, median, and mode of the following data set: …, 15, 4, 8, 16, 17, 2, 5, 11, 8, 12, 6. Answer: Mean: ___ Median: ___ Mode: ___

Measures of Spread
Variability is the key to statistics. Without variability, there would be no need for the subject. When describing data, never rely on center alone.
Measures of spread:
- Range {rarely used ... why?}
- Quartiles and the interquartile range {IQR = Q3 − Q1}
- Variance and standard deviation {var and sₓ}

As with measures of center, you must choose the most appropriate measure of spread.

Standard Deviation
Another common measure of spread is the standard deviation: a measure of the "average" deviation of all observations from the mean. To calculate the standard deviation:
1. Calculate the mean.
2. Determine each observation's deviation (x − x̄).
3. Square each deviation and add them up to get the total squared deviation. "Average" the squared deviations by dividing this total by (n − 1). This quantity is the variance.
4. Take the square root of the variance to find the standard deviation.
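
The four steps can be traced in Python on the metabolic-rate data from Example 1.16 (1792, 1666, 1362, 1614, 1460, 1867, 1439). This is a minimal sketch; the helper names `sample_variance` and `sample_sd` are illustrative. Python's built-in `statistics.variance` and `statistics.stdev` also divide by n − 1 and should give the same results.

```python
import math


def sample_variance(values):
    n = len(values)
    xbar = sum(values) / n                      # step 1: the mean
    deviations = [x - xbar for x in values]     # step 2: deviations x - x̄
    total_sq = sum(d ** 2 for d in deviations)  # step 3: total squared deviation...
    return total_sq / (n - 1)                   # ...divided by n - 1 (the variance)


def sample_sd(values):
    return math.sqrt(sample_variance(values))   # step 4: square root


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
print(round(sample_variance(rates), 2))  # 35811.67
print(round(sample_sd(rates), 2))        # 189.24
```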

Standard Deviation Properties
- s measures spread about the mean and should be used only when the mean is used as the measure of center.
- s = 0 only when there is no spread/variability. This happens only when all observations have the same value. Otherwise, s > 0.
- As the observations become more spread out about their mean, s gets larger.
- s, like the mean x̄, is not resistant. A few outliers can make s very large.

Standard Deviation

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \qquad s = \sqrt{s^2}$$

Example 1.16 (p. 85 of TPS 3E): Metabolic Rates
1792 1666 1362 1614 1460 1867 1439

Metabolic rates: mean x̄ = 1600

| x | x − x̄ | (x − x̄)² |
|---|---|---|
| 1792 | 192 | 36864 |
| 1666 | 66 | 4356 |
| 1362 | −238 | 56644 |
| 1614 | 14 | 196 |
| 1460 | −140 | 19600 |
| 1867 | 267 | 71289 |
| 1439 | −161 | 25921 |
| Total | | 214870 |

Total squared deviation = 214,870
Variance: s² = 214,870 / 6 ≈ 35,811.67
Standard deviation: s = √35,811.67 ≈ 189.24 calories
What does this value, s, mean?

The Interquartile Range (IQR)
A measure of center alone can be misleading. A useful numerical description of a distribution requires both a measure of center and a measure of spread.
How to calculate the quartiles and the interquartile range:
1. Arrange the observations in increasing order and locate the median M.
2. The first quartile Q1 is the median of the observations located to the left of the median in the ordered list.
3. The third quartile Q3 is the median of the observations located to the right of the median in the ordered list.
4. The interquartile range is IQR = Q3 − Q1.
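
The quartile rule above (Q1 and Q3 are medians of the halves on either side of the overall median, excluding the median itself when n is odd) can be sketched as follows. This is a minimal illustration using `statistics.median`; `quartiles` is a hypothetical helper that needs at least two observations. Other software, including `statistics.quantiles`, may use different quartile conventions and can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, M, Q3): Q1 and Q3 are the medians of the observations
    strictly to the left and right of the overall median."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]              # observations left of the median
    upper = xs[len(xs) - half:]    # observations right of the median (skips M when n is odd)
    return median(lower), median(xs), median(upper)


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # metabolic rates, Example 1.16
q1, m, q3 = quartiles(rates)
print(q1, m, q3, q3 - q1)   # 1439 1614 1792 353
```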

Quartiles
Quartiles Q1 and Q3 represent the 25th and 75th percentiles. To find them, order the data from min to max and determine the median (average the two middle values if necessary). The first quartile is the middle of the "bottom half"; the third quartile is the middle of the "top half".
- 19 22 23 26 27 28 29 30 31 32: Q1 = 23, median = 27.5, Q3 = 30
- 45 68 74 75 76 82 91 93 98: med = 79, with Q1 and Q3 marked

Example 1: Which of the following measures of spread are resistant?
- Range: not resistant
- Variance: not resistant
- Standard deviation: not resistant
- Interquartile range (IQR): resistant
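
These answers can be checked numerically by replacing one value with an outlier and recomputing each measure of spread. A minimal sketch with hypothetical data; `iqr` uses the median-of-halves quartile rule from the IQR slide, and `stdev` comes from Python's `statistics` module.

```python
from statistics import median, stdev


def iqr(values):
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[len(xs) - half:]) - median(xs[:half])   # Q3 - Q1


def spread(values):
    return {"range": max(values) - min(values),
            "stdev": round(stdev(values), 2),
            "IQR": iqr(values)}


data = [2, 4, 5, 6, 7, 8, 9, 10]           # hypothetical observations
with_outlier = [2, 4, 5, 6, 7, 8, 9, 100]  # largest value replaced by an outlier
print(spread(data))
print(spread(with_outlier))
```

The range jumps from 8 to 98 and the standard deviation grows more than tenfold, while the IQR stays at 4.0 because only the middle half of the data enters it.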

Example 2: Travel times to work for 20 randomly selected New Yorkers (Example, page 57)
Q1 = 15, M = 22.5, Q3 = 42.5
IQR = Q3 − Q1 = 42.5 − 15 = 27.5 minutes
Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes.

Determining Outliers: "1.5 × IQR Rule"
Interquartile range (IQR): the distance between Q1 and Q3. It is a resistant measure of spread because it measures only the middle 50% of the data. IQR = Q3 − Q1 {the width of the "box" in a boxplot}
1.5 × IQR rule: If an observation falls more than 1.5 IQRs above Q3 or below Q1, it is an outlier.
Why 1.5? According to John Tukey, 1 IQR seemed like too little and 2 IQRs seemed like too much...

Outliers: 1.5 × IQR Rule
To determine outliers:
1. Find the five-number summary.
2. Determine the IQR.
3. Multiply 1.5 × IQR.
4. Set up "fences":
   - Lower fence: Q1 − (1.5 × IQR)
   - Upper fence: Q3 + (1.5 × IQR)
5. Observations "outside" the fences are outliers.
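
The fence steps translate directly into a small function. A minimal sketch: `fences` and `outliers` are hypothetical helper names, Q1 = 15 and Q3 = 42.5 are the New York travel-time quartiles from Example 2, and the list holds the distinct travel times shown on that slide rather than all 20 observations.

```python
def fences(q1, q3):
    iqr = q3 - q1               # step 2: IQR
    step = 1.5 * iqr            # step 3: 1.5 x IQR
    return q1 - step, q3 + step # step 4: lower and upper fences


def outliers(values, q1, q3):
    low, high = fences(q1, q3)
    return [x for x in values if x < low or x > high]   # step 5: outside the fences


low, high = fences(15, 42.5)    # New York travel times: Q1 = 15, Q3 = 42.5
print(low, high)                # -26.25 83.75
print(outliers([5, 10, 15, 20, 25, 30, 40, 45, 60, 65, 85], 15, 42.5))  # [85]
```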

Example 2 part 2
In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.
Definition: The 1.5 × IQR Rule for Outliers. Call an observation an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.
Example, page 57: In the New York travel time data, we found Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes. For these data, 1.5 × IQR = 1.5(27.5) = 41.25, so
- Q1 − 1.5 × IQR = 15 − 41.25 = −26.25
- Q3 + 1.5 × IQR = 42.5 + 41.25 = 83.75

Any travel time shorter than −26.25 minutes or longer than 83.75 minutes is considered an outlier, so the longest travel time, 85 minutes, is an outlier.

5-Number Summary, Boxplots
The five-number summary (MIN, Q1, MED, Q3, MAX) provides a reasonably complete description of the center and spread of a distribution. We can visualize the five-number summary with a boxplot.
[Boxplot of quiz scores: min = 45, Q1 = 74, med = 79, Q3 = 91, max = 98. Is 45 an outlier?]

Drawing a Boxplot
The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.
1. Draw and label a number line that includes the range of the distribution.
2. Draw a central box from Q1 to Q3.
3. Note the median M inside the box.
4. Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.
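
The parts of a boxplot (box, median line, whiskers, and separately plotted outliers) can be computed before drawing anything. A minimal sketch, assuming the median-of-halves quartile rule and the 1.5 × IQR fences from the earlier slides; `boxplot_parts` is a hypothetical name and the data are made up so that one outlier appears.

```python
from statistics import median


def boxplot_parts(values):
    xs = sorted(values)
    half = len(xs) // 2
    q1, m, q3 = median(xs[:half]), median(xs), median(xs[len(xs) - half:])
    step = 1.5 * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    inside = [x for x in xs if low_fence <= x <= high_fence]   # non-outliers
    return {
        "box": (q1, q3),                          # box drawn from Q1 to Q3
        "median": m,                              # line inside the box
        "whiskers": (min(inside), max(inside)),   # farthest values that are not outliers
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


print(boxplot_parts([2, 4, 5, 6, 7, 8, 9, 30]))
# {'box': (4.5, 8.5), 'median': 6.5, 'whiskers': (2, 9), 'outliers': [30]}
```

The upper whisker stops at 9 rather than at the maximum, 30, because 30 lies beyond the upper fence of 14.5 and is plotted as a separate point.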

Example 2 part 3: Boxplot
New York travel times: Min = 5, Q1 = 15, M = 22.5, Q3 = 42.5, Max = 85
Recall that the maximum, 85 minutes, is an outlier by the 1.5 × IQR rule, so it is plotted as a separate point and the upper whisker stops at the largest non-outlier, 65 minutes.

Example 3
Consumer Reports did a study of ice cream bars (sigh, only vanilla flavored) in their August 1989 issue. Twenty-seven bars having a taste-test rating of at least "fair" were listed, and the number of calories per bar was included. Calories vary quite a bit, partly because bars are not of uniform size. Just how many calories should an ice cream bar contain? Construct a boxplot for the data:
342 377 319 353 295 234 294 286 182 310 439 111 201 197 209 147 190 151 131

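Example 3: Answer
- Q1 = 182, Q2 = ___, Q3 = 319, Min = 111, Max = 439
- Range = 328, IQR = 137
- Upper fence: UF = 319 + 1.5(137) = 524.5; lower fence: LF = 182 − 1.5(137) = −23.5

[Boxplot of calories per bar]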

Example 4
The weights of 20 randomly selected juniors at MSHS are recorded below:
121 126 130 132 143 137 141 144 148 205 125 128 131 133 135 139 147 153 213
a) Construct a boxplot of the data.
b) Determine whether there are any mild or extreme outliers.
c) Comment on the distribution.

Example 4: Answer
- Q1 = 130.5, Q2 = 138, Q3 = 145.5, Min = 121, Max = 213
- Range = 92, IQR = 15
- UF = 168, LF = 108
- Mean = ___, StDev = 23.91
- Extreme outliers (more than 3 × IQR = 45 above Q3, i.e., above 190.5): 205 and 213, plotted as *

[Boxplot of weight (lbs)]
- Shape: somewhat symmetric
- Outliers: 2 extreme outliers
- Center: Median = 138
- Spread: IQR = 15
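
Part (b) distinguishes mild from extreme outliers. A minimal sketch, assuming the convention suggested by the answer slide: an observation more than 1.5 × IQR beyond a quartile is an outlier, and one more than 3 × IQR beyond is extreme (so "mild" means between 1.5 and 3 IQRs). Q1 and Q3 are backed out of the reported fences LF = 108 and UF = 168 with IQR = 15, and the weights are the values listed in the problem. `classify` is a hypothetical helper name.

```python
def classify(values, q1, q3):
    iqr = q3 - q1
    mild, extreme = [], []
    for x in values:
        distance = max(q1 - x, x - q3, 0)   # how far x lies outside the box (0 if inside)
        if distance > 3 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
    return mild, extreme


# Q1 = LF + 1.5 * IQR and Q3 = UF - 1.5 * IQR, using LF = 108, UF = 168, IQR = 15
q1, q3 = 108 + 1.5 * 15, 168 - 1.5 * 15    # 130.5, 145.5
weights = [121, 126, 130, 132, 143, 137, 141, 144, 148, 205,
           125, 128, 131, 133, 135, 139, 147, 153, 213]
print(classify(weights, q1, q3))   # ([], [205, 213])
```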

Example 5
Consider the following test scores for a small class: 75 76 82 93 45 68 74 91 98
Plot the data and describe the SOCS:
- Shape: skewed left
- Outliers: maybe 45
- Center: M = 79
- Spread: IQR = 91 − 74 = 17

Why use the median to describe the "center"? Because the data are skewed.
Why use the IQR to describe the "spread"? Because the data are skewed.

Choosing Measures of Center & Spread
We now have a choice between two descriptions for center and spread:
- Mean and standard deviation
- Median and interquartile range

The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers. Use the mean and standard deviation only for reasonably symmetric distributions that don't have outliers.
NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA!

Using the TI-83
1. Enter the test data into list L1: press STAT, choose EDIT, and enter the data into L1.
2. Calculate the five-number summary: press STAT, arrow over to CALC, select 1-Var Stats, and press 2nd 1 (L1).
3. Use 2nd Y= (STAT PLOT) to set up the boxplot:
   - Turn Plot1 ON.
   - Select BOX PLOT (4th option, first in the second row).
   - Xlist: L1; Freq: 1.
4. Press ZOOM, 9:ZoomStat to graph the boxplot.
5. Copy the graph with appropriate labels and titles.

Day 2 Summary and Homework
- Sample variance is found by dividing by (n − 1) instead of n so that it is an unbiased estimator of the population variance; the adjustment is needed because we estimate the population mean μ with the sample mean x̄.
- The larger the standard deviation, the more dispersion the distribution has.
- Boxplots can be used to check for outliers and to examine distributions.
- Use comparative boxplots for two datasets.
- Identifying a distribution from boxplots or histograms is subjective!
- Use the standard deviation with the mean and the IQR with the median.

Homework: pg 82, prob 33; pg 89, probs 40, 41; pg 97, probs 45, 46
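
The claim that dividing by n − 1 makes the sample variance unbiased can be checked exactly on a tiny example. This sketch assumes a hypothetical population {1, 2, 3} and enumerates every sample of size 2 drawn with replacement (independent draws); averaging an estimator over all equally likely samples gives its expected value. Exact fractions avoid rounding noise, and the parameter name `ddof` is just an illustrative label for the amount subtracted from n.

```python
from fractions import Fraction
from itertools import product


def variance(values, ddof):
    """Total squared deviation divided by (n - ddof): ddof=1 gives the
    sample variance s^2, ddof=0 divides by n (the population formula)."""
    n = len(values)
    xbar = Fraction(sum(values), n)
    total_sq = sum((x - xbar) ** 2 for x in values)
    return total_sq / (n - ddof)


population = [1, 2, 3]              # hypothetical tiny population
sigma2 = variance(population, 0)    # population variance (divide by N)

# every ordered sample of size 2 drawn with replacement: 9 equally likely samples
samples = list(product(population, repeat=2))
avg_s2 = sum(variance(s, 1) for s in samples) / len(samples)
avg_div_n = sum(variance(s, 0) for s in samples) / len(samples)

print(sigma2, avg_s2, avg_div_n)    # 2/3 2/3 1/3
```

Averaged over all samples, s² equals the population variance 2/3, while dividing by n gives 1/3, which is systematically too small.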