Summary Statistics 9/23/2018 Summary Statistics


1 Summary Statistics
Last week we used stemplots and histograms to describe the shape, location, and spread of a distribution. This week we use numerical summaries of location and spread. 9/23/2018 Summary Statistics HS 167

2 Main Summary Statistics by Type
Central location: mean, median, mode
Spread: variance and standard deviation; quartiles and interquartile range (IQR)
Shape: statistical measures of shape (e.g., skewness and kurtosis) are available but are seldom used in practice (not covered)

3 Notation
n ≡ sample size
X ≡ variable
x_i ≡ value of individual i
Σ ≡ sum all values (capital sigma)
Illustrative example (sample.sav), data: n = 10, X = age, x_1 = 21, x_2 = 42, …, x_10 = 52
Σx = 21 + 42 + … + 52 = 290

4 Sample Mean
$\bar{x} = \frac{1}{n}\sum x_i$
Illustrative example: n = 10 and $\sum x_i = 290$ (data and intermediate calculations on prior slide), so $\bar{x} = 290 / 10 = 29.0$
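The sample mean calculation can be sketched in Python. This is a minimal illustration: only n = 10 and the total of 290 are taken from the Notation slide, and the helper name `sample_mean` is hypothetical. The total is used directly because the transcript does not list all ten ages.

```python
# Quantities stated on the Notation slide for sample.sav.
n = 10        # sample size
sum_x = 290   # sum of all ages

def sample_mean(total, size):
    """Return xbar = (sum of x_i) / n."""
    return total / size

xbar = sample_mean(sum_x, n)
print(xbar)  # 29.0
```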

5 Population Mean
Same operation as the sample mean, but based on the entire population (N = population size): $\mu = \frac{1}{N}\sum x_i$
Not available in practice, but important conceptually

6 Interpretation of xbar
The sample mean is used to predict:
  an observation drawn at random from the sample
  an observation drawn at random from the population
  the population mean
Gravitational center (balance point)

7 Median – a different kind of average
“Middle value” (covered last week)
Order the data
Depth of median is (n + 1) / 2
When n is odd → middle value
When n is even → average the two middle values
Illustrative example: n = 10 → median has depth (10 + 1) / 2 = 5.5 → median = average of 27 and 28 = 27.5
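The depth rule can be sketched as a small Python function. The function name `median_by_depth` is hypothetical, and the ten ages below are illustrative values assumed for this sketch: they agree with the slides' n = 10, total of 290, and middle values 27 and 28, but the actual sample.sav listing is not reproduced in the transcript.

```python
def median_by_depth(values):
    """Locate the median at depth (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2
    if depth.is_integer():
        # Odd n: the depth points at a single middle value.
        return ordered[int(depth) - 1]
    # Even n: the depth falls between two values; average them.
    lower = ordered[int(depth) - 1]
    upper = ordered[int(depth)]
    return (lower + upper) / 2

# Assumed, illustrative ages (not taken verbatim from the slides).
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
print(median_by_depth(ages))  # 27.5
```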

8 Median is “robust”
Robust ≡ resistant to skews and outliers
This data set has a mean (xbar) of 1600: [data not shown]
This data set has an outlier and a mean of 2743: [data not shown]
The median is 1614 in both instances. The median was not influenced by the outlier.
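A short Python check illustrates the point. The seven metabolic rates are the ones tabulated on the standard deviation slides (mean 1600). The outlier is an assumption: the transcript does not show the second data set, so 9867 is substituted for 1867 here only because it yields a mean that rounds to the reported 2743.

```python
from statistics import mean, median

# Metabolic rates (cal/day) as tabulated on the standard deviation slides.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

# Hypothetical outlier: 9867 replaces 1867 (assumed value, chosen so the
# mean rounds to the 2743 reported on the slide).
rates_with_outlier = [1792, 1666, 1362, 1614, 1460, 9867, 1439]

print(mean(rates), median(rates))
print(round(mean(rates_with_outlier)), median(rates_with_outlier))
```

The mean moves by more than 1100 cal/day while the median stays at 1614, because the median depends only on the ordering of the middle value.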

9 Mode
Mode ≡ value with greatest frequency
e.g., {4, 7, 7, 7, 8, 8, 9} has mode = 7
Used only in very large data sets
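Counting frequencies makes the definition concrete. The sketch below uses Python's `collections.Counter`; the helper name `mode_with_count` is hypothetical.

```python
from collections import Counter

def mode_with_count(values):
    """Return the most frequent value together with its frequency."""
    value, count = Counter(values).most_common(1)[0]
    return value, count

print(mode_with_count([4, 7, 7, 7, 8, 8, 9]))  # (7, 3)
```

When two values tie for the greatest frequency, this sketch reports only one of them, which is one reason the mode is a poor summary for small data sets.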

10 Mean, Median, Mode
Symmetrical data: mean = median
Positive skew: mean > median [mean gets “pulled” by tail]
Negative skew: mean < median

11 Spread = Variability
Variability ≡ amount values spread above and below the average
Measures of spread:
  Range and interquartile range
  Standard deviation and variance (this week)

12 Range = max – min
The range is rarely used in practice because it tends to underestimate the population range and is not robust

13 Standard deviation
Most common descriptive measure of spread
Deviation = $x_i - \bar{x}$
Sum of squared deviations = $SS = \sum (x_i - \bar{x})^2$
Sample variance = $s^2 = \frac{SS}{n - 1}$
Sample standard deviation = $s = \sqrt{s^2}$

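14 Standard deviation (formula)
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
The sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$, and the sample standard deviation s is used to estimate the population standard deviation $\sigma$. The population standard deviation $\sigma$ is rarely known in practice.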

15 New data set (“Metabolic Rates”)
This example is not in your lecture notes
Metabolic rates (cal/day), n = 7

16 Metabolic rates showing mean (*) and deviations of first two observations

17 Standard Deviation Calculation
metabolic.sav – introduced slide 15
Observations   Deviations             Squared deviations
1792           1792 − 1600 = 192      (192)^2 = 36,864
1666           1666 − 1600 = 66       (66)^2 = 4,356
1362           1362 − 1600 = −238     (−238)^2 = 56,644
1614           1614 − 1600 = 14       (14)^2 = 196
1460           1460 − 1600 = −140     (−140)^2 = 19,600
1867           1867 − 1600 = 267      (267)^2 = 71,289
1439           1439 − 1600 = −161     (−161)^2 = 25,921
SUMS           0*                     SS = 214,870
* Sum of deviations will always equal zero

18 Standard Deviation Metabolic data (cont.)
Variance: $s^2 = \frac{SS}{n - 1} = \frac{214{,}870}{7 - 1} = 35{,}811.667$ (cal/day)^2
Standard deviation: $s = \sqrt{35{,}811.667} = 189.24$ cal/day
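The hand calculation for the metabolic data can be reproduced step by step in Python. This is a sketch that follows the slide's columns (deviations, squared deviations, SS) and then divides by n − 1; the variable names are chosen here for illustration.

```python
import math

# Metabolic rates (cal/day), n = 7, from the calculation table.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
xbar = sum(rates) / n                          # sample mean
deviations = [x - xbar for x in rates]         # x_i - xbar
squared = [d ** 2 for d in deviations]         # (x_i - xbar)^2
ss = sum(squared)                              # sum of squared deviations
variance = ss / (n - 1)                        # sample variance s^2
sd = math.sqrt(variance)                       # sample standard deviation s

print(xbar, sum(deviations), ss)
print(round(variance, 3), round(sd, 2))
```

The standard library offers `statistics.variance` and `statistics.stdev`, which use the same n − 1 divisor and can serve as a cross-check.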

19 General rule for rounding means and standard deviations
Report the mean to one additional decimal place beyond that of the data
To achieve accuracy, intermediate calculations should carry at least one further decimal place
Illustrative example:
  Suppose data are recorded with one-decimal accuracy (i.e., xx.x)
  Report the mean with two-decimal accuracy (i.e., xx.xx)
  Carry all intermediate calculations with at least three-decimal accuracy (i.e., xx.xxx)
Even more important: always use common sense and judgment.

20 TI-30XIIS – about $12
In practice, we often use software or a calculator to check our standard deviation

21 Interpretation of Standard Deviation
Larger standard deviation → greater variability
  s1 = 15 and s2 = 10 → group 1 has more variability
68–95–99.7 rule – Normal data only
  68% of data within 1 SD of mean, 95% within 2 SD of mean, and 99.7% within 3 SD of mean
  e.g., if mean = 30 and SD = 10, then 95% of individuals are in the range 30 ± (2)(10) = 30 ± 20 = (10 to 50)
Chebychev’s rule – All data
  at least 75% of data within 2 SD of mean
  e.g., if mean = 30 and SD = 10, then at least 75% of individuals are in the range 30 ± (2)(10) = (10 to 50)
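Both rules use the same interval, mean ± k·SD, and differ only in the proportion they attach to it. The sketch below computes the interval and the Chebychev lower bound. The general bound 1 − 1/k² is the standard statement of Chebychev's rule; the slide quotes only the k = 2 case. The function names are hypothetical.

```python
def sd_interval(mean, sd, k):
    """Return the interval mean ± k standard deviations."""
    return (mean - k * sd, mean + k * sd)

def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k SD of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

# Proportions under the 68-95-99.7 rule (Normal data only).
NORMAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

low, high = sd_interval(30, 10, 2)
print(low, high)                  # 10 50
print(NORMAL_RULE[2])             # 0.95 if the data are Normal
print(chebyshev_lower_bound(2))   # 0.75 for any data
```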

22 Quartiles and IQR
Quartiles divide the ordered data into four equally-sized groups
Q0 = minimum
Q1 = 25th percentile
Q2 = 50th percentile (median)
Q3 = 75th percentile
Q4 = maximum

23 Rule for quartiles
Find the median → Q2
Middle of lower half of the data set → Q1
Middle of upper half of the data set → Q3
[Ordered data split into bottom half | top half, with Q1, Q2, and Q3 marked; values not shown]
IQR = Q3 – Q1 = 42 – 21 = 21 gives spread of middle 50% of the data

24 5-Point Summary (sample.sav)
Q0 = 5 (minimum)
Q1 = 21 (lower hinge)
Q2 = 27.5 (median)
Q3 = 42 (upper hinge)
Q4 = 52 (maximum)
Best descriptive statistics for skewed data

25 Illustrative example (metabolic.sav)
Ordered data: 1362, 1439, 1460, 1614, 1666, 1792, 1867 → median = 1614
Bottom half: 1362, 1439, 1460, 1614 → Q1 = (1439 + 1460) / 2 = 1449.5
Top half: 1614, 1666, 1792, 1867 → Q3 = (1666 + 1792) / 2 = 1729
5-point summary: 1362, 1449.5, 1614, 1729, 1867
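The quartile rule can be sketched in Python as follows. One assumption is made explicit here: when n is odd, the median is counted in both halves, which is the convention that reproduces Q3 = 1729 for the metabolic data. Software packages may use other quartile definitions and return slightly different values. The function name `five_point_summary` is hypothetical.

```python
from statistics import median

def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) using the hinge rule."""
    ordered = sorted(values)
    n = len(ordered)
    # Size of each half; for odd n the median belongs to both halves.
    half = (n + 1) // 2
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

# Metabolic rates (cal/day) from the standard deviation table.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
summary = five_point_summary(rates)
print(summary)
print("IQR =", summary[3] - summary[1])
```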

26 Box-and-whiskers plot (boxplot)
5-point summary + “outside values”
Procedure:
  1. Determine the 5-point summary
  2. Draw a box from Q1 to Q3
  3. Draw a line at Q2
  4. Calculate IQR = Q3 – Q1
  5. Calculate fences: FLower = Q1 – 1.5(IQR), FUpper = Q3 + 1.5(IQR)
  6. Determine whether there are any outside values; if so, plot them separately
  7. Determine the inside values and draw whiskers from the box to the inside values
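The fence calculation in the procedure can be sketched as a small helper. The function name `fences` and the parameter `k` are hypothetical; the quartile pairs used in the calls are the ones reported in the two boxplot examples in this deck.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)

print(fences(21, 42))   # (-10.5, 73.5)
print(fences(22, 29))   # (11.5, 39.5)
```

Values beyond either fence are the "outside values" plotted separately; the fences themselves are not drawn.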

27 Boxplot example
5-point: 5, 21, 27.5, 42, 52
IQR = 42 – 21 = 21
FU = 42 + (1.5)(21) = 73.5; no values above (outside); upper inside value = 52
FL = 21 – (1.5)(21) = –10.5; no values below (outside); lower inside value = 5
[Boxplot on an axis from 10 to 60: upper inside = 52, Q3 = 42, Q2 = 27.5, Q1 = 21, lower inside = 5]

28 Boxplot example 2
5-point: 3, 22, 25.5, 29, 51
IQR = 29 – 22 = 7
FU = 29 + (1.5)(7) = 39.5; one outside (51); inside value = 31
FL = 22 – (1.5)(7) = 11.5; one outside (3); inside value = 21

29 Boxplot example 3 (metabolic.sav)
5-point: 1362, 1449.5, 1614, 1729, 1867 (slide 25)
IQR = 1729 – 1449.5 = 279.5
FU = 1729 + (1.5)(279.5) = 2148.25; none outside; upper inside = 1867
FL = 1449.5 – (1.5)(279.5) = 1030.25; none outside; lower inside = 1362
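The last two steps of the boxplot procedure, separating outside values and finding the whisker ends, can be sketched as below. The function name `whisker_ends_and_outside` is hypothetical. The data are the seven metabolic rates from the standard deviation table, and the hinges are taken as Q1 = 1449.5 and Q3 = 1729, which is consistent with the IQR of 279.5 stated on the slide.

```python
def whisker_ends_and_outside(values, q1, q3):
    """Return (lower inside value, upper inside value, outside values)."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Inside values lie on or between the fences; whiskers end at the
    # most extreme inside values.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outside = sorted(x for x in values if x < lower_fence or x > upper_fence)
    return (min(inside), max(inside), outside)

# Metabolic rates (cal/day).
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
result = whisker_ends_and_outside(rates, q1=1449.5, q3=1729)
print(result)  # (1362, 1867, [])
```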

30 Interpretation of boxplots
Location: position of median; position of box
Spread: hinge-spread (box length) = IQR; whisker-to-whisker spread (range or range minus the outside values)
Shape: symmetry of box; size of whiskers; outside values (potential outliers)

31 Side-by-side boxplots
Boxplots are especially useful for comparing groups: [figure not shown]

