Boxplots.



Why use boxplots? They are easy to construct. They conveniently show outliers. Their construction is not subjective (unlike histograms, where the choice of bins is a judgment call). They work well for medium or large data sets. They are useful for comparing data sets.

Disadvantages of boxplots: Individual data values cannot be seen. Boxplots should not be used with small data sets (n < 10).

How to construct: Find the five-number summary (Min, Q1, Med, Q3, Max). Draw a box from Q1 to Q3. Draw the median as a line inside the box (it need not be at the center). Extend the whiskers to the min and max.
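The five-number summary can be sketched in a few lines of Python. This is only an illustration: the function name and the sample data are hypothetical, and it assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the overall median when n is odd). Software that interpolates quartiles may give slightly different values.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    with the overall median excluded from both halves when n is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Skip the middle value for odd n so it belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


sample = [12, 5, 40, 8, 2, 15, 7, 10, 4]  # hypothetical data, not from the slides
summary = five_number_summary(sample)
print(summary)  # (2, 4.5, 8, 13.5, 40)
```

The box would run from 4.5 to 13.5 with the median line at 8, and the whiskers of a basic (unmodified) boxplot would reach 2 and 40.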

ALWAYS use modified boxplots in this class! A modified boxplot displays outliers. Fences mark off mild and extreme outliers. Whiskers extend to the most extreme data values inside the fence.

Inner fence. The interquartile range (IQR) is the length of the box: IQR = Q3 - Q1. The inner fences lie at Q1 - 1.5(IQR) and Q3 + 1.5(IQR). Any observation outside these fences is an outlier! Mark each outlier with a dot.

Inner fence: Draw each whisker from the box to the last observation inside the fence.

Outer fence. The outer fences lie at Q1 - 3(IQR) and Q3 + 3(IQR). Any observation between the inner and outer fences is considered a mild outlier. Any observation outside the outer fences is an extreme outlier.
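The fence rules can be sketched in Python as follows. The helper names and the sample data are hypothetical, and the quartiles again assume the median-of-halves convention; an observation lying exactly on a fence is treated as inside it.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves (assumed convention)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(upper)


def classify_outliers(data):
    """Compute the fences, split outliers into mild and extreme, and find whisker ends."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    mild, extreme = [], []
    for x in sorted(data):
        if x < outer[0] or x > outer[1]:
            extreme.append(x)   # beyond the outer fence
        elif x < inner[0] or x > inner[1]:
            mild.append(x)      # between the inner and outer fences
    # Whiskers stop at the most extreme observations inside the inner fences.
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    return {
        "inner": inner,
        "outer": outer,
        "mild": mild,
        "extreme": extreme,
        "whiskers": (min(inside), max(inside)),
    }


sample = [2, 4, 5, 7, 8, 10, 12, 15, 35, 60]  # hypothetical data, not from the slides
result = classify_outliers(sample)
print(result)
```

For this sample, Q1 = 5 and Q3 = 15, so the inner fences are at -10 and 30 and the outer fences at -25 and 45. The value 35 is a mild outlier, 60 is an extreme outlier, and the whiskers run from 2 to 15.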

For the AP Exam, we only need to find outliers, NOT identify them as mild or extreme. So we only need the inner fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR).

A report from the U.S. Department of Justice gave the following percent increases in federal prison populations in 20 northeastern & midwestern states in 2009. 5.9 1.3 5.0 5.9 4.5 5.6 4.1 6.3 4.8 6.9 4.5 3.5 7.2 6.4 5.5 5.3 8.0 4.4 7.2 3.2 Create a modified boxplot. Describe the distribution.
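One way to check the numbers needed for this modified boxplot is the short Python sketch below. It is not the official solution: the variable names are illustrative, and it assumes the median-of-halves convention for quartiles, so software that interpolates quartiles may report slightly different values for Q1 and Q3.

```python
from statistics import median

# Percent increases from the exercise above (20 states).
increases = [5.9, 1.3, 5.0, 5.9, 4.5, 5.6, 4.1, 6.3, 4.8, 6.9,
             4.5, 3.5, 7.2, 6.4, 5.5, 5.3, 8.0, 4.4, 7.2, 3.2]

values = sorted(increases)
half = len(values) // 2            # n = 20 is even, so the halves split cleanly
q1 = median(values[:half])         # median of the lower 10 values
med = median(values)
q3 = median(values[half:])         # median of the upper 10 values
iqr = q3 - q1

low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [x for x in values if x < low_fence or x > high_fence]
inside = [x for x in values if low_fence <= x <= high_fence]

print("Q1, median, Q3:", round(q1, 2), round(med, 2), round(q3, 2))
print("IQR:", round(iqr, 2))
print("Fences:", round(low_fence, 2), round(high_fence, 2))
print("Outliers:", outliers)
print("Whiskers:", inside[0], inside[-1])
```

Under these assumptions the sketch reports Q1 = 4.45, a median of 5.4, Q3 = 6.35, an IQR of 1.9, fences at 1.6 and 9.2, a single outlier at 1.3, and whiskers running from 3.2 to 8.0.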

Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follow are the radon concentrations in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer. (see data on note page) Create parallel boxplots. Compare the distributions.

[Figure: parallel boxplots for the Cancer and No Cancer groups on a common Radon axis, with tick marks at 100 and 200.]

The median radon concentration for the no cancer group is lower than the median for the cancer group. The range of the cancer group is larger than the range for the no cancer group, though its IQR is smaller. Both distributions are skewed right. The cancer group has outliers at 39, 45, 57, and 210. The no cancer group has outliers at 55 and 85.