Chapter 2 Exploring Data with Graphs and Numerical Summaries

Slides:



Advertisements
Similar presentations
Describing Quantitative Variables
Advertisements

C. D. Toliver AP Statistics
Probabilistic & Statistical Techniques
Descriptive Measures MARE 250 Dr. Jason Turner.
CHAPTER 1 Exploring Data
Agresti/Franklin Statistics, 1 of 63  Section 2.4 How Can We Describe the Spread of Quantitative Data?
EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES
1 Distribution Summaries Measures of central tendency Mean Median Mode Measures of spread Range Standard Deviation Interquartile Range (IQR)
B a c kn e x t h o m e Classification of Variables Discrete Numerical Variable A variable that produces a response that comes from a counting process.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by.
Measures of Relative Standing and Boxplots
Agresti/Franklin Statistics, 1 of 63 Chapter 2 Exploring Data with Graphs and Numerical Summaries Learn …. The Different Types of Data The Use of Graphs.
Boxplots The boxplot is an informative way of displaying the distribution of a numerical variable.. It uses the five-figure summary: minimum, lower quartile,
1.3: Describing Quantitative Data with Numbers
Slide 1 Statistics Workshop Tutorial 6 Measures of Relative Standing Exploratory Data Analysis.
Section 1 Topic 31 Summarising metric data: Median, IQR, and boxplots.
Chapter 2 Describing Data.
Describing distributions with numbers
1 Further Maths Chapter 2 Summarising Numerical Data.
Chapter 5 Describing Distributions Numerically.
+ Chapter 1: Exploring Data Section 1.3 Describing Quantitative Data with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
Summary Statistics: Measures of Location and Dispersion.
Chapter 6: Interpreting the Measures of Variability.
Using Measures of Position (rather than value) to Describe Spread? 1.
Chapter 5 Describing Distributions Numerically Describing a Quantitative Variable using Percentiles Percentile –A given percent of the observations are.
Probability & Statistics Box Plots. Describing Distributions Numerically Five Number Summary and Box Plots (Box & Whisker Plots )
Statistics Unit Test Review Chapters 11 & /11-2 Mean(average): the sum of the data divided by the number of pieces of data Median: the value appearing.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 2 Exploring Data with Graphs and Numerical Summaries Section 2.1 Different Types of.
CHAPTER 1 Exploring Data
Chapter 16: Exploratory data analysis: numerical summaries
Statistics 1: Statistical Measures
Describing Distributions Numerically
Statistics Unit Test Review
Unit 2 Section 2.5.
CHAPTER 1 Exploring Data
Chapter 2b.
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Box and Whisker Plots Algebra 2.
Numerical Measures: Skewness and Location
Describing Distributions of Data
CHAPTER 1 Exploring Data
Basic Practice of Statistics - 3rd Edition
Measures of Central Tendency
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Day 52 – Box-and-Whisker.
Chapter 1: Exploring Data
Basic Practice of Statistics - 3rd Edition
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Lecture Slides Elementary Statistics Eleventh Edition
Presentation transcript:

Chapter 2 Exploring Data with Graphs and Numerical Summaries Section 2.5 Using Measures of Position to Describe Variability

Percentile The th percentile is a value such that percent of the observations fall below or at that value.

Quartiles Figure 2.14 The Quartiles Split the Distribution Into Four Parts. 25% is below the first quartile (Q1), 25% is between the first quartile and the second quartile (the median, Q2), 25% is between the second quartile and the third quartile (Q3), and 25% is above the third quartile. Question: Why is the second quartile also the median?

SUMMARY: Finding Quartiles Arrange the data in order. Consider the median. This is the second quartile, Q2. Consider the lower half of the observations (excluding the median itself if n is odd). The median of these observations is the first quartile, Q1. Consider the upper half of the observations (excluding the median itself if n is odd). Their median is the third quartile, Q3.

Example: Cereal Sodium Data Consider the sodium values for the 20 breakfast cereals. What are the quartiles for the 20 cereal sodium values? From Table 2.3, the sodium values, in ascending order, are: The median of the 20 values is the average of the 10th and 11th observations, 180 and 180, which is Q2 = 180 mg. The first quartile Q1 is the median of the 10 smallest observations (in the top row), which is the average of 130 and 140, Q1 = 135mg. The third quartile Q3 is the median of the 10 largest observations (in the bottom row), which is the average of 200 and 210, Q3 = 205 mg.

The Interquartile Range (IQR) The interquartile range is the distance between the third quartile and first quartile: IQR = Q3  Q1 IQR gives spread of middle 50% of the data In Words: If the interquartile range of U.S. music teacher salaries equals $16,000, this means that for the middle 50% of the distribution of salaries, $16,000 is the distance between the largest and smallest salaries.

Detecting Potential Outliers Examining the data for unusual observations, such as outliers, is important in any statistical analysis. Is there a formula for flagging an observation as potentially being an outlier? The 1.5 x IQR Criterion for Identifying Potential Outliers An observation is a potential outlier if it falls more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile.

5-Number Summary of Positions The 5-number summary is the basis of a graphical display called the box plot, and consists of Minimum value First Quartile Median Third Quartile Maximum value

SUMMARY: Constructing a Box Plot A box goes from Q1 to Q3. A line is drawn inside the box at the median. A line goes from the lower end of the box to the smallest observation that is not a potential outlier and from the upper end of the box to the largest observation that is not a potential outlier. The potential outliers are shown separately.

Example: Boxplot for Cereal Sodium Data Figure 2.15 shows a box plot for the sodium values. Labels are also given for the five-number summary of positions. Figure 2.15 Box Plot and Five-Number Summary for 20 Breakfast Cereal Sodium Values. The central box contains the middle 50% of the data. The line in the box marks the median. Whiskers extend from the box to the smallest and largest observations, which are not identified as potential outliers. Potential outliers are marked separately. Question: Why is the left whisker drawn down only to 50 rather than to 0?

Comparing Distributions A box plot does not portray certain features of a distribution, such as distinct mounds and possible gaps, as clearly as does a histogram. Box plots are useful for identifying potential outliers. Figure 2.16 Box Plots of Male and Female College Student Heights. The box plots use the same scale for height. Question: What are approximate values of the quartiles for the two groups?

Z-Score The z-score also identifies position and potential outliers. The z-score for an observation is the number of standard deviations that it falls from the mean. A positive z-score indicates the observation is above the mean. A negative z-score indicates the observation is below the mean. For sample data, the z -score is calculated as: An observation from a bell-shaped distribution is a potential outlier if its z-score < -3 or > +3 (3 standard deviation criterion) .