Describing Distributions with Numbers


Section 1.3: Describing Distributions with Numbers

Quantitative Data. Measuring Center: Mean, Median. Measuring Spread: Quartiles, Five-Number Summary, Standard Deviation. Boxplots.

Measuring Center: The Mean. The most common measure of center is the arithmetic average, or mean. To find the mean $\bar{x}$ (pronounced “x-bar”) of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, x_3, \ldots, x_n$, their mean is: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$. In more compact notation: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$. (Source: Basic Practice of Statistics, 3rd Edition.)

Calculations: Mean highway mileage for the 19 two-seaters: 25.8 miles per gallon. Issue: the Honda Insight gets 68 miles per gallon! Excluding it, the mean mileage is only 23.4 mpg. What does this say about the mean?
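The pull of the single extreme car can be checked from the summary figures alone. The following is a minimal sketch, assuming only the rounded mean of 25.8 mpg quoted on the slide (so the result is approximate); the function name `mean_without` is illustrative and not part of the original material.

```python
def mean_without(mean_all, n, removed_value):
    """Mean of the remaining n - 1 observations after dropping one value."""
    total = mean_all * n                     # recover the sum from the mean
    return (total - removed_value) / (n - 1)

# Figures from the slide: 19 two-seaters, mean 25.8 mpg, Honda Insight at 68 mpg.
mean_rest = mean_without(25.8, 19, 68)
print(round(mean_rest, 2))  # about 23.46; the slide reports 23.4 from unrounded data
```

One observation out of 19 shifts the mean by more than 2 mpg, which is what "not resistant" means in practice.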

Problem: The mean is easily influenced by outliers. It is NOT a resistant measure of center. Solution: the median. The median is the midpoint of a distribution. It is a resistant (robust) measure of center, i.e., it is not sensitive to extreme observations.

Mean vs. Median: In a symmetric distribution, mean = median. In a skewed distribution, the mean is farther out in the long tail than the median. Example: house prices are usually right-skewed. The mean price of existing houses sold in 2014 in West Lafayette was $231,000 (the mean chases the right tail). The median price of these houses was only $169,900.

Measuring Center: The Median Because the mean cannot resist the influence of extreme observations, it is not a resistant measure of center. Another common measure of center is the median. The median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger. To find the median of a distribution: Arrange all observations from smallest to largest. If the number of observations n is odd, the median M is the center observation in the ordered list. If the number of observations n is even, the median M is the average of the two center observations in the ordered list.

Measures of spread: Quartiles divide the data into four parts (together with the median). The pth percentile is the value such that p percent of the observations fall at or below it. Median: 50th percentile. First quartile (Q1): 25th percentile (median of the lower half of the data). Third quartile (Q3): 75th percentile (median of the upper half of the data). The median and the two quartiles break the data into four 25% pieces.

Calculating the median. Trick: the median is always at position (n+1)/2 in the ordered data. Example: Data: 1 2 3 4 5 6 7 8 9. (n+1)/2 = 5, so the median is the 5th observation: Median = 5. Example: Data: 1 2 3 4 5 6 7 8 9 10. (n+1)/2 = 5.5, so the median is at position 5.5, i.e., the average of the 5th and 6th observations: Median = (5 + 6)/2 = 5.5.
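The position rule can be sketched in a few lines. This is an illustrative implementation, assuming a non-empty list of numbers; the function name `median_by_position` is hypothetical.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                  # 1-based position; ends in .5 when n is even
    lower = ordered[int(position) - 1]      # observation at the floor of the position
    if position.is_integer():
        return lower                        # odd n: the center observation
    upper = ordered[int(position)]          # even n: the next observation up
    return (lower + upper) / 2              # average of the two center observations

print(median_by_position(range(1, 10)))   # 5
print(median_by_position(range(1, 11)))   # 5.5
```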

Calculating Quartiles. Example: Data: 1 2 3 4 5 6 7 8 9. Median = 5 = “Q2”. Q1 is the median of the lower half (1 2 3 4) = 2.5. Q3 is the median of the upper half (6 7 8 9) = 7.5. (Ignore the median itself when counting.) Example: Data: 1 2 3 4 5 6 7 8 9 10. Median = 5.5. Q1 = median of 1 2 3 4 5 = 3. Q3 = median of 6 7 8 9 10 = 8.
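A sketch of the median-of-halves rule described above, using Python's standard `statistics.median`. The function name `quartiles` is illustrative. Statistical software often uses other quartile conventions (for example, interpolation between observations), so packaged routines may return slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                  # observations below the median position
    # When n is odd, skip the median itself; when n is even, split cleanly in two.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles(range(1, 10)))    # (2.5, 5, 7.5)
print(quartiles(range(1, 11)))    # (3, 5.5, 8)
```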

Five-Number Summary: the 5 numbers are Minimum, Q1, Median, Q3, Maximum.

Find the five-number summaries. Example data set 1: 26 13 35 76 44 58. Example data set 2: 84 89 89 64 78.

Boxplots: The median and quartiles divide the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot. How to make a boxplot: (1) Draw and label a number line that includes the range of the distribution. (2) Draw a central box from Q1 to Q3. (3) Note the median M inside the box. (4) Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.

Find the five-number summary and make a boxplot. Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues: 10 12 13 20 24 26 27 29 30 32 34 34 38 39 39 40 40 44 44 44 44 45 47
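One way to check a hand calculation for this exercise is sketched below. It assumes the median-of-halves quartile rule used in these slides; the function name `five_number_summary` is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) with median-of-halves quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # drop the median when n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

# Hank Aaron's home runs per season, as listed on the slide.
aaron = [10, 12, 13, 20, 24, 26, 27, 29, 30, 32, 34, 34,
         38, 39, 39, 40, 40, 44, 44, 44, 44, 45, 47]
print(five_number_summary(aaron))   # (10, 26, 34, 44, 47)
```

The box of the boxplot then runs from 26 to 44, with the median line at 34.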

Criterion for suspected outliers: Interquartile range (IQR) = Q3 - Q1. An observation is a suspected outlier if it is greater than Q3 + 1.5 × IQR or less than Q1 - 1.5 × IQR.

Criterion for suspected outliers Are there any outliers?

Criterion for suspected outliers. Find the five-number summary: Min = 1, Q1 = 54.5, Median = 103.5, Q3 = 200, Max = 2631. Are there any outliers? IQR = Q3 - Q1 = 200 - 54.5 = 145.5. Multiply by 1.5: 145.5 × 1.5 = 218.25. Add to Q3: 200 + 218.25 = 418.25. Anything higher is a high outlier: 7 observations. Subtract from Q1: 54.5 - 218.25 = -163.75. Anything lower is a low outlier: no observations.
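The arithmetic above can be sketched as a small helper. The function names `outlier_fences` and `suspected_outliers` are illustrative; the quartiles are taken as given from the slide, since the raw data are not listed.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper cutoffs for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def suspected_outliers(values, q1, q3):
    """Observations falling strictly outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

low, high = outlier_fences(54.5, 200)
print(low, high)                                  # -163.75 418.25

# Only the minimum and maximum are known from the slide; the maximum is flagged.
print(suspected_outliers([1, 2631], 54.5, 200))   # [2631]
```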

Criterion for suspected outliers Seven high outliers circled… Find and circle the eighth outlier.

Modified Boxplot: Outliers are plotted individually as dots or stars. Each whisker extends only to the most extreme observation that is not an outlier.

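Standard deviation. Deviation: $x_i - \bar{x}$. Variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$. Standard deviation: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$.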

Finding the standard deviation by hand. Data points: 1792 1666 1362 1614 1460 1867 1439. Mean = 1600. Find the deviations from the mean: Deviation1 = 1792 - 1600 = 192, Deviation2 = 1666 - 1600 = 66, …, Deviation7 = 1439 - 1600 = -161. Square the deviations. Add them up and divide the sum by n - 1 = 6; this gives $s^2$. Take the square root: standard deviation = s = 189.24.
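The hand calculation can be followed step by step in code. This is a minimal sketch using the seven data points from the slide; the variable names are illustrative.

```python
from math import sqrt

data = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(data)

mean = sum(data) / n                         # 1600.0
deviations = [x - mean for x in data]        # 192, 66, ..., -161 (they sum to 0)
squared = [d ** 2 for d in deviations]
variance = sum(squared) / (n - 1)            # s^2: divide by n - 1 = 6, not n
s = sqrt(variance)

print(mean, round(s, 2))   # 1600.0 189.24
```

Python's standard `statistics.stdev(data)` applies the same n - 1 divisor and can be used as a cross-check.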

Properties of the standard deviation: The standard deviation is always non-negative. s = 0 only when there is no spread. s has the same units as the original observations. s is not resistant to the presence of outliers. The five-number summary usually describes a skewed distribution, or a distribution with outliers, better. s is used when we use the mean. The mean and standard deviation are usually used for reasonably symmetric distributions without outliers.

Find the mean and standard deviation. Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues: 13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20 12 10

Linear Transformations: changing units of measurement. $x_{new} = a + b\,x_{old}$. Common conversions. Distance: 100 km is equivalent to 62 miles: $x_{miles} = 0 + 0.62\,x_{km}$. Weight: 1 ounce is equivalent to 28.35 grams: $x_{g} = 0 + 28.35\,x_{oz}$. Temperature (Fahrenheit to Celsius): $x_{C} = -\frac{160}{9} + \frac{5}{9}\,x_{F}$.

Linear transformations do not change the shape of a distribution. However, they change the center and spread. Example: weights of newly hatched pythons:

Python        1     2     3     4     5
Weight (oz)   1.13  1.02  1.23  1.06  1.16
Weight (g)    32    29    35    30    33

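Ounces: mean weight = (1.13 + … + 1.16)/5 = 1.12 oz; standard deviation ≈ 0.083 oz. Grams: mean weight = (32 + … + 33)/5 = 31.8 g, or 1.12 × 28.35 ≈ 31.8 g; standard deviation ≈ 2.39 g from the gram values in the table, or 28.35 × 0.083 ≈ 2.35 g from the ounce values. The two results differ slightly only because the gram weights in the table are rounded to whole numbers.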

Effect of a linear transformation Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (IQR and standard deviation) by b. Adding the same number a to each observation adds a to measures of center and to quartiles and other percentiles but does not change measures of spread (IQR and standard deviation)

Effects of Linear Transformations. Your transformation: $x_{new} = a + b\,x_{old}$. Then: $mean_{new} = a + b \cdot mean$; $median_{new} = a + b \cdot median$; $stdev_{new} = |b| \cdot stdev$; $IQR_{new} = |b| \cdot IQR$. Here |b| is the absolute value of b (the value without its sign).

Example: Winter temperature recorded in Fahrenheit: mean = 20, stdev = 10, median = 22, IQR = 11. Convert into Celsius: mean = -160/9 + 5/9 × 20 = -6.67 °C; stdev = 5/9 × 10 = 5.56 °C; median = -160/9 + 5/9 × 22 = -5.56 °C; IQR = 5/9 × 11 = 6.11 °C.
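The transformation rules can be checked numerically. The sketch below first applies them to the summary values from the winter temperature example, then verifies them on raw data. The list `temps_f` is hypothetical (the slides give only summary statistics), and the helper name `center_and_spread` is illustrative; quartiles use the median-of-halves rule.

```python
from statistics import mean, median, stdev

def center_and_spread(values):
    """Mean, median, standard deviation and IQR (median-of-halves quartiles)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "stdev": stdev(ordered),
        "iqr": median(upper) - median(lower),
    }

# Fahrenheit to Celsius written as x_new = a + b * x_old.
a, b = -160 / 9, 5 / 9

# Rules applied to the summary values from the example.
print(round(a + b * 20, 2))    # mean:   -6.67
print(round(abs(b) * 10, 2))   # stdev:   5.56
print(round(a + b * 22, 2))    # median: -5.56
print(round(abs(b) * 11, 2))   # IQR:     6.11

# Hypothetical raw temperatures (not from the slides) to check the rules directly.
temps_f = [5, 12, 20, 22, 25, 30, 34]
temps_c = [a + b * x for x in temps_f]
before = center_and_spread(temps_f)
after = center_and_spread(temps_c)
print(after["mean"], a + b * before["mean"])       # the two values agree
print(after["stdev"], abs(b) * before["stdev"])    # the two values agree
```

The shift a moves the center but drops out of both spread measures, which is why only |b| appears in the rules for the standard deviation and IQR.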

SAS tips: The “proc univariate” procedure generates all the descriptive summaries. For the time being, draw boxplots by hand from the five-number summary. Optional: proc boxplot. See plot.doc.

Summary (1.2). Measures of location: mean, median, quartiles. Measures of spread: stdev, IQR. The mean and stdev are affected by extreme observations. The median and IQR are robust to extreme observations. Five-number summary and boxplot. Linear transformations.