# Chapter 5 Describing Distributions Numerically.



Describing the Distribution

- Center
  - Median (0.5 quantile, 2nd quartile, 50th percentile)
  - Mean
- Spread
  - Range
  - Interquartile Range
  - Standard Deviation

Median

- Literally the middle number (data value)
- Has the same units as the data
- When n (the number of observations) is odd:
  - Order the data from smallest to largest
  - The median is the middle number on the list, the (n+1)/2-th number from the smallest value
  - Ex: If n = 11, the median is the (11+1)/2 = 6th number from the smallest value
  - Ex: If n = 37, the median is the (37+1)/2 = 19th number from the smallest value

Example – Frank Thomas

- Career home runs (remember to order the values if they aren't already in order!)
- 15 observations
- (15+1)/2 = 8, so the median is the 8th observation from the bottom
- Median = 32 HRs

Median

- When n is even:
  - Order the data from smallest to largest
  - The median is the average of the two middle numbers
  - (n+1)/2 will be halfway between these two numbers
  - Ex: If n = 10, (10+1)/2 = 5.5, so the median is the average of the 5th and 6th numbers from the smallest value
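The odd and even cases can be handled by one small function. This is a minimal Python sketch of the position rule described on the slides; the function name `median` and the sample lists are illustrative and do not come from the presentation.

```python
def median(values):
    """Median via the (n + 1) / 2 position rule."""
    ordered = sorted(values)  # always order the data first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2-th value, which is index n // 2 (0-based).
        return ordered[mid]
    # Even n: (n + 1) / 2 falls halfway between two positions,
    # so average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data, not from the slides.
odd_sample = [7, 1, 5, 3, 9]        # n = 5, middle is the 3rd value
even_sample = [7, 1, 5, 3, 9, 11]   # n = 6, average of 3rd and 4th values
print(median(odd_sample))   # 5
print(median(even_sample))  # 6.0
```

Python's standard library offers `statistics.median`, which follows the same rule; the hand-written version is shown only to make the two cases explicit.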

Example – Ryne Sandberg

- Career home runs (remember to order the values if they aren't already in order!)
- 16 observations
- (16+1)/2 = 8.5, so the median is the average of the 8th and 9th observations from the bottom
- Median = average of 16 and 19 = 17.5 HRs

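Mean

- Ordinary average
  - Add up all observations
  - Divide by the number of observations
- Has the same units as the data
- Formula, where n is the number of observations and y1, y2, y3, …, yn are the values:

$$\bar{y} = \frac{y_1 + y_2 + y_3 + \cdots + y_n}{n} = \frac{\sum y_i}{n}$$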

Mean

Examples Thomas Sandberg FIND THE MEAN

Mean vs. Median

- Median = middle number
- Mean = value where the histogram balances
- Mean and median are similar when
  - Data are symmetric
- Mean and median are different when
  - Data are skewed
  - There are outliers

Mean vs. Median

- The mean is influenced by unusually high or unusually low values
- Example: Income in a small town of 6 people
  - \$25,000 \$27,000 \$29,000 \$35,000 \$37,000 \$38,000
  - The mean income is about \$31,833
  - The median income is \$32,000

Mean vs. Median

- Bill Gates moves to town
  - \$25,000 \$27,000 \$29,000 \$35,000 \$37,000 \$38,000 \$40,000,000
  - The mean income is \$5,741,571
  - The median income is \$35,000
- The mean is pulled by the outlier; the median is not
- The mean is not a good measure of center for these data
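The income example is easy to reproduce. The sketch below uses Python's standard `statistics` module with the incomes from the slides; the variable names are illustrative.

```python
from statistics import mean, median

# Incomes from the small-town example (6 people).
incomes = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]
# The same town after one very large income is added.
with_outlier = incomes + [40_000_000]

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"before: mean = {mean_before:,.0f}, median = {median_before:,.0f}")
# before: mean = 31,833, median = 32,000
print(f"after:  mean = {mean_after:,.0f}, median = {median_after:,.0f}")
# after:  mean = 5,741,571, median = 35,000
```

One added value moves the mean by millions of dollars, while the median shifts by only \$3,000.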

Mean vs. Median

- Skewness pulls the mean in the direction of the tail
  - Skewed to the right: mean > median
  - Skewed to the left: mean < median
- Outliers pull the mean in their direction
  - Large outlier: mean > median
  - Small outlier: mean < median

Spread

- Range = maximum – minimum
- Thomas: Min = 4, Max = 43, Range = 43 – 4 = 39 HRs
- Sandberg: Min = 0, Max = 40, Range = 40 – 0 = 40 HRs

- The range is highly affected by outliers
  - An outlier makes the spread appear larger than it really is
- Ex. The annual numbers of deaths from tornadoes in the U.S. from 1990 to 2000:
  - Range with outlier: 130 – 25 = 105 deaths
  - Range without outlier: 94 – 25 = 69 deaths

Spread

- Interquartile Range (IQR)
- First Quartile (Q1)
  - Larger than about 25% of the data
- Third Quartile (Q3)
  - Larger than about 75% of the data
- IQR = Q3 – Q1
  - Spread of the center (middle) 50% of the values

Finding Quartiles

- Order the data
- Split into two halves at the median
  - When n is odd, include the median in both halves
  - When n is even, do not include the median in either half
- Q1 = median of the lower half
- Q3 = median of the upper half
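The splitting rule above can be sketched in Python as follows. The function names and the sample data are hypothetical, and the code follows the convention on this slide (the median is included in both halves when n is odd). Other textbooks and software use different quartile conventions and may give slightly different values.

```python
def _median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) using the split-at-the-median rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    if n % 2 == 1:
        # Odd n: the median (index `half`) belongs to both halves.
        lower, upper = ordered[: half + 1], ordered[half:]
    else:
        # Even n: the halves are the bottom and top n / 2 values.
        lower, upper = ordered[:half], ordered[half:]
    return _median(lower), _median(upper)


# Hypothetical data, not from the slides.
data = [12, 3, 7, 9, 15, 1, 5]   # n = 7 (odd)
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)           # 4.0 10.5 6.5
```

Here the ordered data are 1, 3, 5, 7, 9, 12, 15, so the lower half is 1, 3, 5, 7 and the upper half is 7, 9, 12, 15.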

Example – Frank Thomas

- Order the values (15 values)
- Lower half: Q1 = median of lower half = 21 HRs
- Upper half: Q3 = median of upper half = 40 HRs
- IQR = 40 – 21 = 19 HRs

Example – Ryne Sandberg

- Order the values (16 values)
- Lower half: Q1 = median of lower half = 8.5 HRs
- Upper half: Q3 = median of upper half = 26 HRs
- IQR = Q3 – Q1 = 26 – 8.5 = 17.5 HRs

Five Number Summary

- Minimum
- Q1
- Median
- Q3
- Maximum

Examples

| | Thomas | Sandberg |
|---|---|---|
| Min | 4 HRs | 0 HRs |
| Q1 | 21 HRs | 8.5 HRs |
| Median | 32 HRs | 17.5 HRs |
| Q3 | 40 HRs | 26 HRs |
| Max | 43 HRs | 40 HRs |
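Both the range and the IQR can be read straight off a five-number summary. The sketch below stores the two summaries from the slides in a small Python structure; the class name `FiveNumberSummary` and its property names are illustrative, and Thomas's Q3 of 40 is taken from the quartile example.

```python
from typing import NamedTuple


class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def data_range(self):
        # Range = maximum - minimum
        return self.maximum - self.minimum

    @property
    def iqr(self):
        # IQR = Q3 - Q1, the spread of the middle 50% of the values
        return self.q3 - self.q1


# Values from the slides (home runs per season).
thomas = FiveNumberSummary(4, 21, 32, 40, 43)
sandberg = FiveNumberSummary(0, 8.5, 17.5, 26, 40)

for name, s in [("Thomas", thomas), ("Sandberg", sandberg)]:
    print(f"{name}: range = {s.data_range} HRs, IQR = {s.iqr} HRs")
# Thomas: range = 39 HRs, IQR = 19 HRs
# Sandberg: range = 40 HRs, IQR = 17.5 HRs
```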

Graph of Five Number Summary

- Boxplot
  - Box between Q1 and Q3
  - Line in the box marks the median
  - Lines extend out to minimum and maximum
- Best used for comparisons
- Use this simpler method

Example – Thomas & Sandberg

- Boxplot of Thomas home runs
  - Box from 21 to 40
  - Line in box at 32
  - Lines extend out from box to 4 and 43
- Boxplot of Sandberg home runs
  - Box from 8.5 to 26
  - Line in box at 17.5
  - Lines extend out from box to 0 and 40

Side by Side Boxplots of Thomas & Sandberg Home Runs

Standard Deviation

- Most common measure of spread (although it is influenced by skewness and outliers)
- Denoted by the letter s
- Make a table when calculating by hand

Standard Deviation
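For reference, the usual sample standard deviation can be written as below. This is a sketch assuming the common n − 1 divisor, which the transcript itself does not state; the symbols follow the mean formula, with y1, …, yn as the observations and ȳ as their mean.

$$s = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$$

In words: subtract the mean from each observation, square each deviation, add the squares, divide by n − 1, and take the square root.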

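| y | y − ȳ | (y − ȳ)² |
|---|---|---|
| 53 | -3.27 | 10.69 |
| 39 | -17.27 | 298.25 |
| 33 | -23.27 | 541.49 |
| 69 | 12.73 | 162.05 |
| 30 | -26.27 | 690.11 |
| 25 | -31.27 | 977.81 |
| 67 | 10.73 | 115.13 |
| 130 | 73.73 | 5436.11 |
| 94 | 37.73 | 1423.55 |
| 40 | -16.27 | 264.71 |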
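The table method for calculating s by hand translates directly into code. The Python sketch below builds the same three columns (value, deviation, squared deviation) and then combines them. It assumes the usual sample formula with an n − 1 divisor; the function name and the data are hypothetical, not taken from the slides.

```python
import math
from statistics import stdev


def standard_deviation_table(values):
    """Return (rows, s); each row is (y, deviation, squared deviation)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    y_bar = sum(values) / n
    rows = [(y, y - y_bar, (y - y_bar) ** 2) for y in values]
    # Assumption: sample standard deviation, so divide by n - 1.
    s = math.sqrt(sum(sq for _, _, sq in rows) / (n - 1))
    return rows, s


# Hypothetical data, not from the slides.
data = [2, 4, 4, 4, 5, 5, 7, 9]
rows, s = standard_deviation_table(data)

for y, dev, sq in rows:
    print(f"{y:>4} {dev:>8.2f} {sq:>8.2f}")
print(f"s = {s:.2f}")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(s, stdev(data))
```

For this data the mean is 5 and the squared deviations sum to 32, so s is the square root of 32/7, about 2.14.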

Example – Frank Thomas

Find the standard deviation of the number of home runs given the following statistic:

Properties of s

- s = 0 only when all observations are equal; otherwise, s > 0
- s has the same units as the data
- s is not resistant
  - Skewness and outliers affect s, just like the mean
- Tornado example: s is larger with the outlier than without it

Which summaries should you use?

- What numbers are affected by outliers?
  - Mean
  - Standard deviation
  - Range
- What numbers are not affected by outliers?
  - Median
  - IQR

Which summaries should you use?

- Five Number Summary
  - Skewed data
  - Data with outliers
- Mean and Standard Deviation
  - Symmetric data
- ALWAYS PLOT YOUR DATA!!