1
Chapter 5 Describing Distributions Numerically

2
**Describing the Distribution**

Center: median (0.5 quantile, 2nd quartile, 50th percentile) and mean. Spread: range, interquartile range, and standard deviation.

3
**Median**

Literally the middle number (data value). Has the same units as the data. When n (the number of observations) is odd: order the data from smallest to largest; the median is the middle number on the list, the ((n+1)/2)th number from the smallest value. Ex: If n = 11, the median is the (11+1)/2 = 6th number from the smallest value. Ex: If n = 37, the median is the (37+1)/2 = 19th number from the smallest value.

4
**Example – Frank Thomas**

Career home runs, 15 observations. Remember to order the values if they aren’t already in order! (15+1)/2 = 8, so the median is the 8th observation from the bottom. Median = 32 HRs.

5
**Median**

When n is even: order the data from smallest to largest; the median is the average of the two middle numbers. (n+1)/2 falls halfway between the positions of these two numbers. Ex: If n = 10, (10+1)/2 = 5.5, so the median is the average of the 5th and 6th numbers from the smallest value.

6
**Example – Ryne Sandberg**

Career home runs, 16 observations. Remember to order the values if they aren’t already in order! (16+1)/2 = 8.5, so the median is the average of the 8th and 9th observations from the bottom. Median = average of 16 and 19 = 17.5 HRs.
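
The (n+1)/2 position rule for odd and even n can be sketched in Python. The function name `median_by_position` and the sample lists are hypothetical and chosen only for illustration; they are not the home run data.

```python
def median_by_position(values):
    """Median via the (n+1)/2 position rule (positions count from 1)."""
    ordered = sorted(values)              # always order the data first
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    position = (n + 1) / 2                # e.g. 6.0 for n = 11, 5.5 for n = 10
    if n % 2 == 1:
        # odd n: the position is a whole number, so take that value
        return ordered[int(position) - 1]
    # even n: the position ends in .5, so average the two neighbours
    lower = ordered[int(position) - 1]    # e.g. 5th value when position = 5.5
    upper = ordered[int(position)]        # e.g. 6th value when position = 5.5
    return (lower + upper) / 2


# Hypothetical data, not taken from the slides
odd_sample = [7, 3, 9, 1, 5]              # n = 5: median is the 3rd value
even_sample = [7, 3, 9, 1, 5, 11]         # n = 6: average of 3rd and 4th values
print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```

Sorting inside the function mirrors the reminder to order the values before counting positions.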

7
**Mean**

Ordinary average: add up all observations and divide by the number of observations. Has the same units as the data. Notation for the formula: n is the number of observations and y1, y2, y3, …, yn are the values.

8
Mean
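
With n observations y1, y2, …, yn, the ordinary average is usually written as follows. The symbol ȳ for the mean is an assumption here; the surviving slide text does not show which symbol the slides use.

```latex
\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i
```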

9
Examples: find the mean for Thomas and for Sandberg.

10
**Mean vs. Median**

Median = middle number. Mean = value where the histogram balances. The mean and median are similar when the data are symmetric. They differ when the data are skewed or there are outliers.

11
**Mean vs. Median**

The mean is influenced by unusually high or unusually low values. Example: income in a small town of 6 people: $25,000, $27,000, $29,000, $35,000, $37,000, $38,000. The mean income is about $31,833. The median income is $32,000.

12
**Mean vs. Median**

Bill Gates moves to town: $25,000, $27,000, $29,000, $35,000, $37,000, $38,000, $40,000,000. The mean income is $5,741,571. The median income is $35,000. The mean is pulled by the outlier; the median is not. The mean is not a good measure of center for these data.
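
The small-town income comparison can be reproduced with Python's standard `statistics` module. This is only a sketch; the variable names are made up, and the incomes are the ones listed on the two slides.

```python
from statistics import mean, median

# Incomes from the small-town example (six residents)
town = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]
# Same town after the $40,000,000 newcomer arrives
with_outlier = town + [40_000_000]

for label, incomes in [("before", town), ("after", with_outlier)]:
    print(f"{label}: mean = ${mean(incomes):,.0f}, median = ${median(incomes):,.0f}")
```

One extra value moves the mean by millions of dollars, while the median only shifts from between the 3rd and 4th incomes to the 4th income.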

13
**Mean vs. Median**

Skewness pulls the mean in the direction of the tail. Skewed to the right: mean > median. Skewed to the left: mean < median. Outliers pull the mean in their direction. Large outlier: mean > median. Small outlier: mean < median.

14
**Spread**

Range = maximum – minimum. Thomas: Min = 4, Max = 43, Range = 43 – 4 = 39 HRs. Sandberg: Min = 0, Max = 40, Range = 40 – 0 = 40 HRs.

15
**Spread**

The range is a very basic measure of spread. It is highly affected by outliers, which make the spread appear larger than it really is. Ex: the annual numbers of deaths from tornadoes in the U.S. from 1990 to 2000. Range with outlier: 130 – 25 = 105 deaths. Range without outlier: 94 – 25 = 69 deaths.
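
A minimal sketch of the range calculation with and without the outlier. The counts are the values that appear on the later tornado standard deviation slide; that list may be incomplete in this transcript, but its minimum, maximum, and second-largest value match the numbers quoted here. The helper name `value_range` is hypothetical.

```python
def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


# Annual tornado deaths as listed on the later slide (possibly incomplete)
deaths = [53, 39, 33, 69, 30, 25, 67, 130, 94, 40]
# Drop the single unusually large year
without_outlier = [d for d in deaths if d != 130]

print(value_range(deaths))
print(value_range(without_outlier))
```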

16
**Spread**

Interquartile Range (IQR): IQR = Q3 – Q1. First Quartile (Q1): larger than about 25% of the data. Third Quartile (Q3): larger than about 75% of the data. The IQR spans the center (middle) 50% of the values.

17
**Finding Quartiles**

Order the data. Split into two halves at the median. When n is odd, include the median in both halves. When n is even, do not include the median in either half. Q1 = median of the lower half. Q3 = median of the upper half.
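
The splitting rule above can be sketched as a small Python function. The name `quartiles` and the sample data are hypothetical. Statistical software often uses other quartile conventions, so its results may differ slightly from this hand method.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) by splitting the ordered data at the median."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    half = n // 2
    if n % 2 == 1:
        # odd n: the median (index `half`) is included in both halves
        lower = ordered[: half + 1]
        upper = ordered[half:]
    else:
        # even n: the median falls between two values, so neither half holds it
        lower = ordered[:half]
        upper = ordered[half:]
    return median(lower), median(upper)


# Hypothetical data, not taken from the slides
q1, q3 = quartiles([2, 4, 6, 8, 10, 12, 14])
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```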

18
**Example – Frank Thomas**

Order the values (15 values). Lower half: Q1 = median of lower half = 21 HRs. Upper half: Q3 = median of upper half = 40 HRs. IQR = 40 – 21 = 19 HRs.

19
**Example – Ryne Sandberg**

Order the values (16 values). Lower half: Q1 = median of lower half = 8.5 HRs. Upper half: Q3 = median of upper half = 26 HRs. IQR = Q3 – Q1 = 26 – 8.5 = 17.5 HRs.

20
**Five Number Summary**

Minimum, Q1, Median, Q3, Maximum

21
**Examples**

Thomas: Min = 4 HRs, Q1 = 21 HRs, Median = 32 HRs, Q3 = 40 HRs, Max = 43 HRs. Sandberg: Min = 0 HRs, Q1 = 8.5 HRs, Median = 17.5 HRs, Q3 = 26 HRs, Max = 40 HRs.
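
As a rough illustration, the two five number summaries can be stored in a small Python structure that also reports the IQR and the range. The class `FiveNumberSummary` is a hypothetical helper; the numbers are the ones stated on the slides (Thomas's Q3 of 40 comes from the quartile example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveNumberSummary:
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self):
        """Interquartile range: Q3 - Q1."""
        return self.q3 - self.q1

    @property
    def value_range(self):
        """Range: maximum - minimum."""
        return self.maximum - self.minimum


# Home run summaries as stated on the slides
thomas = FiveNumberSummary(4, 21, 32, 40, 43)
sandberg = FiveNumberSummary(0, 8.5, 17.5, 26, 40)

for name, summary in [("Thomas", thomas), ("Sandberg", sandberg)]:
    print(f"{name}: IQR = {summary.iqr} HRs, range = {summary.value_range} HRs")
```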

22
**Graph of Five Number Summary**

Boxplot: box between Q1 and Q3; line in the box marks the median; lines extend out to the minimum and maximum. Best used for comparisons. Use this simpler method.

23
**Example – Thomas & Sandberg**

Boxplot of Thomas home runs: box from 21 to 40, line in box at 32, lines extend out from the box to 4 and 43. Boxplot of Sandberg home runs: box from 8.5 to 26, line in box at 17.5, lines extend out from the box to 0 and 40.

24
**Side by Side Boxplots of Thomas & Sandberg Home Runs**

25
**Spread**

Standard deviation: “average” spread from the mean. Most common measure of spread (although it is influenced by skewness and outliers). Denoted by the letter s. Make a table when calculating by hand.

26
Standard Deviation
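
A common way to write the sample standard deviation is shown below. It assumes the usual n − 1 divisor and the symbol ȳ for the mean; neither is spelled out in the surviving slide text.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}
```

Each term under the sum is one squared deviation from the mean, which is why a table of deviations and squared deviations is convenient when working by hand.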

27
**Example – Deaths from Tornadoes**

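| Deaths | Deaths – mean | (Deaths – mean)² |
|---|---|---|
| 53 | -3.27 | 10.69 |
| 39 | -17.27 | 298.25 |
| 33 | -23.27 | 541.49 |
| 69 | 12.73 | 162.05 |
| 30 | -26.27 | 690.11 |
| 25 | -31.27 | 977.81 |
| 67 | 10.73 | 115.13 |
| 130 | 73.73 | 5436.11 |
| 94 | 37.73 | 1423.55 |
| 40 | -16.27 | 264.71 |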
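
The hand calculation can be sketched in Python as a table of deviations and squared deviations. Two assumptions are built in and should be treated as such: the n − 1 divisor, and the data list itself. Only ten values are visible in this transcript, but the deviation 53 – mean = -3.27 implies a mean of 56.27, which matches eleven yearly counts (1990 to 2000) if 39 occurs twice. The second 39 is therefore an inference, not something shown on the slide. The function name `sd_table` is hypothetical.

```python
from math import sqrt


def sd_table(values):
    """Return (mean, rows, s); each row is (y, y - mean, (y - mean)^2)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    y_bar = sum(values) / n
    rows = [(y, y - y_bar, (y - y_bar) ** 2) for y in values]
    # Sum the squared deviations, divide by n - 1 (assumed), take the root
    s = sqrt(sum(sq for _, _, sq in rows) / (n - 1))
    return y_bar, rows, s


# ASSUMPTION: eleven yearly counts, with 39 occurring twice (see lead-in)
deaths = [53, 39, 39, 33, 69, 30, 25, 67, 130, 94, 40]

y_bar, rows, s = sd_table(deaths)
for y, dev, sq in rows:
    print(f"{y:>4} {dev:>8.2f} {sq:>9.2f}")
print(f"mean = {y_bar:.2f}, s = {s:.2f}")

# Same calculation with the outlier year (130 deaths) removed
_, _, s_without = sd_table([d for d in deaths if d != 130])
print(f"s without the outlier = {s_without:.2f}")
```

The printed squared deviations can differ from the slide in the last digits because the slide appears to round each deviation to two decimals before squaring.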

28
**Example – Frank Thomas**

Find the standard deviation of the number of home runs given the following statistic:

29
**Properties of s**

s = 0 only when all observations are equal; otherwise, s > 0. s has the same units as the data. s is not resistant: skewness and outliers affect s, just like the mean. Tornado example: compare s with the outlier to s without the outlier (both measured in deaths).

30
**Which summaries should you use?**

What numbers are affected by outliers? Mean, standard deviation, range. What numbers are not affected by outliers? Median, IQR.

31
**Which summaries should you use?**

Five number summary: skewed data, data with outliers. Mean and standard deviation: symmetric data. ALWAYS PLOT YOUR DATA!!
