Chapter 3 Describing Distributions Numerically
Describing the Distribution Center –Median –Mean Spread –Range –Interquartile Range –Standard Deviation
Median Literally = middle number (data value) n (number of observations) is odd –Order the data from smallest to largest –Median is the middle number on the list –It is the (n+1)/2th number from the smallest value Ex: If n=11, median is the (11+1)/2 = 6th number from the smallest value Ex: If n=37, median is the (37+1)/2 = 19th number from the smallest value
Example – August Temps High Temperatures for Des Moines, Iowa taken from the first 13 days of August Remember to order the values, if they aren’t already in order! 13 observations –(13+1)/2 = 7th observation from the bottom Median = 90
Median n is even –Order the data from smallest to largest –Median is the average of the two middle numbers –(n+1)/2 will be halfway between these two numbers Ex: If n=10, (10+1)/2 = 5.5, median is the average of the 5th and 6th numbers from the smallest value
Example – Yankees Scores of last 10 games Remember to order the values if they aren’t already in order! 10 observations –(10 + 1)/2 = 5.5, average of 5th and 6th observations from bottom Median = 5
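The (n+1)/2 rule for odd and even n can be sketched in a few lines of Python. The two data lists below are hypothetical, chosen only to show both cases; they are not the temperatures or scores from the slides.

```python
def median(values):
    """Median via the (n + 1) / 2 position rule described on the slides."""
    ordered = sorted(values)          # always order the data first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the (n + 1) / 2-th value (index mid when counting from 0)
        return ordered[mid]
    # n even: average of the two values on either side of position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [7, 3, 9, 1, 5]           # hypothetical, n = 5
even_data = [4, 8, 2, 6, 10, 12]     # hypothetical, n = 6

odd_median = median(odd_data)        # 3rd value of 1, 3, 5, 7, 9
even_median = median(even_data)      # average of 6 and 8
print(odd_median, even_median)
```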
Mean Ordinary average –Add up all observations –Divide by the number of observations Formula –n observations –$y_1, y_2, y_3, \ldots, y_n$ are the values
Mean ($\bar{y}$): $\bar{y} = \dfrac{y_1 + y_2 + \cdots + y_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$
Example – Vikings (as of 1/9) Find the mean of the scores (17 values)
Example – Colts (as of 1/9) Find the mean of the scores (17 values)
Mean vs. Median Median = middle number Mean = value where histogram balances Mean and Median similar when –Data are symmetric Mean and median different when –Data are skewed –There are outliers
Mean vs. Median Mean is influenced by unusually high or unusually low values –Example: Income in a small town of 6 people $25,000 $27,000 $29,000 $35,000 $37,000 $38,000 **The mean income is about $31,833 **The median income is $32,000
Mean vs. Median –Bill Gates moves to town $25,000 $27,000 $29,000 $35,000 $37,000 $38,000 $40,000,000 **The mean income is $5,741,571 **The median income is $35,000 –Mean is pulled by the outlier –Median is not –Mean is not a good center of these data
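The small-town income example can be checked with Python's standard `statistics` module. The variable names are illustrative; the incomes are the ones listed on the slides.

```python
from statistics import mean, median

# Incomes from the small-town example
incomes = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]
# Same town after one very large income is added
with_outlier = incomes + [40_000_000]

mean_before = mean(incomes)
median_before = median(incomes)
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print(f"before: mean = {mean_before:,.0f}, median = {median_before:,.0f}")
print(f"after:  mean = {mean_after:,.0f}, median = {median_after:,.0f}")
```

The mean jumps by millions of dollars while the median moves only from $32,000 to $35,000, which is what "resistant to outliers" means in practice.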
Mean vs. Median Skewness pulls the mean in the direction of the tail –Skewed to the right = mean > median –Skewed to the left = mean < median Outliers pull the mean in their direction –Large outlier = mean > median –Small outlier = mean < median
Weighted Mean Used when values are not equally represented. Weighted mean = $\dfrac{\sum w_i x_i}{\sum w_i}$, where $x_i$ are the values and $w_i$ are their weights.
Example (weighted mean) A recent survey of a new diet cola reported the following percentages of people who liked the taste. Find the weighted mean of the percentages. Table columns: Area | % Favored | Number surveyed
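A minimal sketch of a weighted mean, with each percentage weighted by the number of people surveyed in that area. The survey numbers below are hypothetical stand-ins, not the figures from the slide.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical survey: % who liked the taste, and number surveyed, per area
percent_favored = [40.0, 30.0, 50.0]
number_surveyed = [1000, 3000, 1000]

weighted = weighted_mean(percent_favored, number_surveyed)
unweighted = sum(percent_favored) / len(percent_favored)
print(weighted, unweighted)
```

In this made-up data the area with the most respondents has the lowest percentage, so the weighted mean comes out below the plain average of the three percentages.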
Spread Range is a very basic measure of spread (Max – Min). –It is highly affected by outliers –Makes spread appear larger than reality –Ex. The annual numbers of deaths from tornadoes in the U.S. from 1990 to 2000: Range with outlier: 130 – 25 = 105 Range without outlier: 94 – 25 = 69
Spread Interquartile Range (IQR) –First Quartile (Q1) 25th Percentile –Third Quartile (Q3) 75th Percentile IQR = Q3 – Q1 –Center (Middle) 50% of the values
Finding Quartiles Order the data Split into two halves at the median –When n is odd, include the median in both halves –When n is even, do not include the median in either half Q1 = median of the lower half Q3 = median of the upper half
Top 15 Populations US Cities 2004 (populations divided by 10,000)
New York, N.Y.: 810
Los Angeles, Calif.: 385
Chicago, Ill.: 286
Houston, Tex.: 201
Philadelphia, Pa.: 147
Phoenix, Ariz.: 142
San Diego, Calif.: 126
San Antonio, Tex.: 124
Dallas, Tex.: 121
San Jose, Calif.: 90
Detroit, Mich.: 90
Indianapolis, Ind.: 78
Jacksonville, Fla.: 78
San Francisco, Calif.: 74
Example – Top City Populations Order the values (14 values) Lower Half = 74, 78, 78, 90, 90, 121, 124 –Q1 = Median of lower half = 90 Upper Half = 126, 142, 147, 201, 286, 385, 810 –Q3 = Median of upper half = 201 IQR = Q3 – Q1 = 201 – 90 = 111
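The split-into-halves rule for quartiles can be sketched as below, using the 14 city populations listed above (in units of 10,000). The function names are illustrative. Library routines often use other quartile conventions and can give slightly different answers.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[:half + 1]   # n odd: median included in both halves
        upper = ordered[half:]
    else:
        lower = ordered[:half]       # n even: the data split cleanly in two
        upper = ordered[half:]
    return median_of_sorted(lower), median_of_sorted(upper)


# City populations from the slide, divided by 10,000
populations = [810, 385, 286, 201, 147, 142, 126, 124, 121, 90, 90, 78, 78, 74]

q1, q3 = quartiles(populations)
iqr = q3 - q1
print(q1, q3, iqr)
```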
August High Temps (8/1–8/13) Order the values (13 values) –Q1 = Median of lower half = 81 –Q3 = Median of upper half = 93 –IQR = Q3 – Q1 = 93 – 81 = 12
August High Temps (8/14–8/25) Order the values (12 values) –Q1 = Median of lower half = 78 –Q3 = Median of upper half = 87 –IQR = Q3 – Q1 = 87 – 78 = 9
Five Number Summary Minimum Q1 Median Q3 Maximum
Graph of Five Number Summary Boxplot –Box between Q1 and Q3 –Line in the box marks the median –Lines extend out to minimum and maximum Best used for comparisons Use this simpler method
Example – Vikings & Colts Boxplot of Vikings scores –Box from 20 to 31 –Line in box at 27 –Lines extend out from box to 14 and 38 Boxplot of Colts scores –Box from 24 to 41 –Line in box at 34 –Lines extend out from box to 14 and 51
Side by Side Boxplots of Vikings Scores and Colts Scores
Spread Standard deviation –“Average” spread from mean –Most common measure of spread –Denoted by letter s –Make a table when calculating by hand
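The by-hand table can be mimicked in code: one row per observation, showing its deviation from the mean and the squared deviation. This sketch assumes the usual sample formula with an n − 1 divisor, which the slides do not show explicitly, and the data are hypothetical.

```python
from math import sqrt


def sample_std(values):
    """Sample standard deviation: sqrt(sum((y - ybar)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    ybar = sum(values) / n
    total = 0.0
    print(f"{'y':>6} {'y - ybar':>10} {'(y - ybar)^2':>14}")
    for y in values:
        deviation = y - ybar
        total += deviation ** 2
        print(f"{y:>6} {deviation:>10.2f} {deviation ** 2:>14.2f}")
    # Assumption: divide by n - 1 (sample formula), not n
    return sqrt(total / (n - 1))


data = [1, 2, 3, 4, 5]   # hypothetical observations
s = sample_std(data)
print(f"s = {s:.4f}")
```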
Example – Deaths from Tornadoes
Example - Vikings Find the standard deviation of the scores of Vikings games given the following statistic:
Properties of s s = 0 only when all observations are equal; otherwise, s > 0 s has the same units as the data s is not resistant –Skewness and outliers affect s, just like mean –Tornado Example: s with outlier: s without outlier: 21.70
Which summaries should you use with different distributions? The appropriate measures of center and spread when your distribution is symmetric are: –Mean –Standard deviation The appropriate measures of center and spread when your distribution is skewed are: –Median –IQR
Comparing Variance When comparing the variability of two sets of numbers, find the coefficient of variation for each: CVar = $\dfrac{s}{\bar{x}} \times 100\%$ –Then compare the percentages.
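A short sketch of comparing two data sets by coefficient of variation, taken here as the sample standard deviation divided by the mean, expressed as a percentage. Both data sets are hypothetical and were chosen to have the same standard deviation but different means.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


# Hypothetical measurements on two different scales
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(f"heights: {cv_heights:.2f}%  weights: {cv_weights:.2f}%")
```

Because the spread is judged relative to each mean, the weights vary more in relative terms even though both lists have the same s.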
Standardizing (first look) I got an 85 on my English test and you got a 35 on your Spanish test. Who did better? How can we compare things that come from different scales? Standardizing –Use z formula (called z-score)
Standardizing $z = \dfrac{x - \bar{x}}{s}$ –z = standardized score –x = raw score –$\bar{x}$ = mean of raw scores –s = sample standard deviation So what does this mean for our test scores?
Standardizing I got an 85 on my English test and you got a 35 on your Spanish test. Who did better? Now I need to give you more information. The English class’s tests had a mean of 83 and a standard deviation of 3. The Spanish tests had a mean of 30 and a standard deviation of 2.
Comparing Standardized Scores I scored 0.667 standard deviations above the mean on my English test, whereas you scored 2.5 standard deviations above the mean on your Spanish test. Relative to your class, you did better on your exam.
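The two test scores can be standardized with a small helper. The function name is illustrative; the scores, means, and standard deviations are the ones given on the slides.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that raw lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


z_english = z_score(85, mean=83, sd=3)   # English: score 85, mean 83, s = 3
z_spanish = z_score(35, mean=30, sd=2)   # Spanish: score 35, mean 30, s = 2

print(f"English z = {z_english:.3f}, Spanish z = {z_spanish:.3f}")
```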