Summary Statistics 9/23/2018 Summary Statistics


Summary Statistics (HS 167)
Last week we used stemplots and histograms to describe the shape, location, and spread of a distribution. This week we use numerical summaries of location and spread.

Main Summary Statistics by Type
Central location: mean, median, mode
Spread: variance and standard deviation; quartiles and interquartile range (IQR)
Shape: statistical measures of shape (e.g., skewness and kurtosis) are available but are seldom used in practice (not covered)

Notation
$n$ ≡ sample size
$X$ ≡ variable
$x_i$ ≡ value of individual $i$
$\sum$ ≡ sum all values (capital sigma)
Illustrative example (sample.sav), data: 21 42 5 11 30 50 28 27 24 52
$n = 10$
$X$ = age
$x_1 = 21$, $x_2 = 42$, …, $x_{10} = 52$
$\sum x_i = 21 + 42 + … + 52 = 290$

Sample Mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Illustrative example: n = 10 (data and intermediate calculations on prior slide)
$\bar{x} = \frac{290}{10} = 29.0$
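The sample mean calculation can be sketched in a few lines of Python. This is only an illustration using the ages from the illustrative example; the variable names (`ages`, `total`, `xbar`) are chosen here and are not part of the slides.

```python
# Ages from the illustrative example (sample.sav), n = 10
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

n = len(ages)      # sample size
total = sum(ages)  # sum of all values (capital sigma)
xbar = total / n   # sample mean = sum divided by n

print(n, total, xbar)  # 10 290 29.0
```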

Population Mean
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Same operation as the sample mean, but based on the entire population (N = population size).
Not available in practice, but important conceptually.

Interpretation of xbar
The sample mean is used to predict:
  an observation drawn at random from the sample
  an observation drawn at random from the population
  the population mean
It is also the gravitational center (balance point) of the data.

Median – a different kind of average
“Middle value” (covered last week)
Order the data. The depth of the median is (n+1) / 2.
When n is odd → middle value
When n is even → average the two middle values
Illustrative example: n = 10 → median has depth (10+1) / 2 = 5.5
05 11 21 24 27 28 30 42 50 52 → median = average of 27 and 28 = 27.5
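The depth rule can be written out as a small function. This is a minimal sketch, assuming a non-empty list of numbers; the function name `median_by_depth` is made up for illustration.

```python
def median_by_depth(values):
    """Median via the depth rule: depth = (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2
    if n % 2 == 1:
        # Odd n: the depth is a whole number; take that ordered value (1-based).
        return ordered[int(depth) - 1]
    # Even n: the depth ends in .5; average the values on either side of it.
    lower = ordered[int(depth) - 1]
    upper = ordered[int(depth)]
    return (lower + upper) / 2


ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
print(median_by_depth(ages))  # 27.5
```

Python's standard library offers `statistics.median`, which gives the same result for this data.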

Median is “robust”
Robust ≡ resistant to skews and outliers
This data set has a mean (xbar) of 1600: 1362 1439 1460 1614 1666 1792 1867
This data set has an outlier (9867) and a mean of 2743: 1362 1439 1460 1614 1666 1792 9867
The median is 1614 in both instances. The median was not influenced by the outlier.
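The comparison can be checked with Python's standard `statistics` module. The sketch below uses the two data sets from this slide; the list names are chosen here for illustration.

```python
from statistics import mean, median

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]  # 1867 replaced by 9867

# The mean is pulled upward by the single outlier...
print(mean(rates), round(mean(with_outlier)))

# ...while the median stays at the same middle value.
print(median(rates), median(with_outlier))
```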

Mode
Mode ≡ value with greatest frequency
e.g., {4, 7, 7, 7, 8, 8, 9} has mode = 7
Used only in very large data sets
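One way to find the mode is to tally frequencies. The sketch below uses `collections.Counter` on the example set; it reports a single most frequent value and does not handle ties between several values.

```python
from collections import Counter

data = [4, 7, 7, 7, 8, 8, 9]

counts = Counter(data)                       # frequency of each distinct value
value, frequency = counts.most_common(1)[0]  # most frequent value and its count

print(value, frequency)  # 7 3
```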

Mean, Median, Mode
Symmetrical data: mean = median
Positive skew: mean > median [mean gets “pulled” by tail]
Negative skew: mean < median

Spread = Variability
Variability ≡ amount values spread above and below the average
Measures of spread:
  Range and inter-quartile range
  Standard deviation and variance (this week)

Range = max – min
The range is rarely used in practice because it tends to underestimate the population range and is not robust.

Standard deviation
Most common descriptive measure of spread
Deviation = $x_i - \bar{x}$
Sum of squared deviations = $SS = \sum (x_i - \bar{x})^2$
Sample variance = $s^2 = \frac{SS}{n-1}$
Sample standard deviation = $s = \sqrt{s^2}$

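Standard deviation (formula)
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
Sample standard deviation $s$ is the estimator of population standard deviation $\sigma$. Strictly, it is the sample variance $s^2$ that is an unbiased estimator of $\sigma^2$.
Population standard deviation $\sigma$ is rarely known in practice.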

New data set (“Metabolic Rates”)
This example is not in your lecture notes.
Metabolic rates (cal/day), n = 7: 1792 1666 1362 1614 1460 1867 1439

[Figure: metabolic rates showing mean (*) and deviations of first two observations]

Standard Deviation Calculation (metabolic.sav – introduced slide 15)

Observations   Deviations              Squared deviations
1792           1792 – 1600 = 192       (192)^2 = 36,864
1666           1666 – 1600 = 66        (66)^2 = 4,356
1362           1362 – 1600 = -238      (-238)^2 = 56,644
1614           1614 – 1600 = 14        (14)^2 = 196
1460           1460 – 1600 = -140      (-140)^2 = 19,600
1867           1867 – 1600 = 267       (267)^2 = 71,289
1439           1439 – 1600 = -161      (-161)^2 = 25,921
SUMS           0*                      SS = 214,870

* Sum of deviations will always equal zero

Standard Deviation: Metabolic data (cont.)
Variance: $s^2 = \frac{SS}{n-1} = \frac{214{,}870}{7-1} \approx 35{,}811.67$
Standard deviation: $s = \sqrt{s^2} = \sqrt{35{,}811.67} \approx 189.24$ cal/day
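The hand calculation for the metabolic data can be reproduced step by step. This is a sketch that follows the same sequence (deviations, sum of squares, variance, standard deviation); the variable names are chosen here for illustration.

```python
from math import sqrt

# Metabolic rates (cal/day), n = 7
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
xbar = sum(rates) / n                   # sample mean
deviations = [x - xbar for x in rates]  # each observation minus the mean
ss = sum(d ** 2 for d in deviations)    # sum of squared deviations (SS)
variance = ss / (n - 1)                 # sample variance, divides by n - 1
sd = sqrt(variance)                     # sample standard deviation

print(sum(deviations))  # deviations sum to zero
print(ss, round(variance, 2), round(sd, 2))
```

In practice, `statistics.variance` and `statistics.stdev` from the standard library compute the same sample quantities directly.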

General rule for rounding means and standard deviations
Report the mean to one additional decimal place beyond that of the data.
To achieve accuracy, intermediate calculations should carry still another decimal place.
Illustrative example:
  Suppose data are recorded with one-decimal accuracy (i.e., xx.x)
  Report the mean with two-decimal accuracy (i.e., xx.xx)
  Carry all intermediate calculations with at least three-decimal accuracy (i.e., xx.xxx)
Even more important: always use common sense and judgment.

TI-30XIIS – about $12
In practice, we often use software or a calculator to check our standard deviation.

Interpretation of Standard Deviation
Larger standard deviation → greater variability
  e.g., s1 = 15 and s2 = 10 → group 1 has more variability
68-95-99.7 rule – Normal data only
  68% of data within 1 SD of the mean, 95% within 2 SD of the mean, and 99.7% within 3 SD of the mean
  e.g., if mean = 30 and SD = 10, then 95% of individuals are in the range 30 ± (2)(10) = 30 ± 20 = (10 to 50)
Chebychev’s rule – All data
  At least 75% of data within 2 SD of the mean
  e.g., if mean = 30 and SD = 10, then at least 75% of individuals are in the range 30 ± (2)(10) = (10 to 50)
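Both rules describe the same interval, mean ± k SD, and differ only in the percentage they attach to it. The sketch below computes the interval and the Chebychev lower bound. The slide states only the k = 2 case (75%); the general form 1 – 1/k² used here is the standard statement of the rule, and the function names are hypothetical.

```python
def k_sd_interval(mean, sd, k):
    """Return the interval mean ± k standard deviations as (low, high)."""
    return (mean - k * sd, mean + k * sd)


def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k SD of the mean (k > 1)."""
    return 1 - 1 / k ** 2


print(k_sd_interval(30, 10, 2))  # (10, 50)
print(chebyshev_lower_bound(2))  # 0.75
```

For Normal data the 68-95-99.7 rule gives about 95% for the same (10 to 50) interval, while Chebychev's rule guarantees at least 75% for any data.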

Quartiles and IQR
Quartiles divide the ordered data into four equally-sized groups:
  Q0 = minimum
  Q1 = 25th %ile
  Q2 = 50th %ile (median)
  Q3 = 75th %ile
  Q4 = maximum

Rule for quartiles
Find the median → Q2
Middle of lower half of the data → Q1
Middle of upper half of the data → Q3
Bottom half | Top half
05 11 21 24 27 | 28 30 42 50 52
Q1 = 21, Q2 = 27.5, Q3 = 42
IQR = Q3 – Q1 = 42 – 21 = 21
The IQR gives the spread of the middle 50% of the data.

5-Point Summary (sample.sav)
Q0 = 5 (minimum)
Q1 = 21 (lower hinge)
Q2 = 27.5 (median)
Q3 = 42 (upper hinge)
Q4 = 52 (maximum)
Best descriptive statistics for skewed data

Illustrative example (metabolic.sav)
1362 1439 1460 1614 1666 1792 1867 → median = 1614
Because n is odd, the median is included in both halves.
Bottom half: 1362 1439 1460 1614 → Q1 = (1439 + 1460) / 2 = 1449.5
Top half: 1614 1666 1792 1867 → Q3 = (1666 + 1792) / 2 = 1729
5-point summary: 1362, 1449.5, 1614, 1729, 1867
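The quartile rule used in these examples (split the ordered data at the median, and include the median in both halves when n is odd) can be sketched as below. The function name `five_point_summary` is made up for illustration. Note that software packages often use other quartile conventions and may return slightly different Q1 and Q3 values.

```python
from statistics import median


def five_point_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4) using the split-at-the-median rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2          # size of each half; odd n shares the median
    bottom = ordered[:half]      # lower half of the ordered data
    top = ordered[n - half:]     # upper half of the ordered data
    return (ordered[0], median(bottom), median(ordered), median(top), ordered[-1])


ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

print(five_point_summary(ages))   # sample.sav example
print(five_point_summary(rates))  # metabolic.sav example
```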

Box-and-whiskers plot (boxplot)
5-point summary + “outside values”
Procedure:
  1. Determine the 5-point summary
  2. Draw a box from Q1 to Q3
  3. Draw a line at Q2
  4. Calculate IQR = Q3 – Q1
  5. Calculate the fences: FLower = Q1 – 1.5(IQR), FUpper = Q3 + 1.5(IQR)
  6. Determine whether there are any outside values; if so, plot them separately
  7. Determine the inside values and draw whiskers from the box to the inside values

Boxplot example
Data: 05 11 21 24 27 28 30 42 50 52
5-point: 5, 21, 27.5, 42, 52
IQR = 42 – 21 = 21
FU = 42 + (1.5)(21) = 73.5
  No values above (outside)
  Upper inside value = 52
FL = 21 – (1.5)(21) = –10.5
  No values below (outside)
  Lower inside value = 5
[Boxplot figure: vertical axis from 10 to 60, marking lower inside = 5, Q1 = 21, Q2 = 27.5, Q3 = 42, upper inside = 52]

Boxplot example 2
Data: 3 21 22 24 25 26 28 29 31 51
5-point: 3, 22, 25.5, 29, 51
IQR = 29 – 22 = 7
FU = 29 + (1.5)(7) = 39.5
  One outside (51)
  Inside value = 31
FL = 22 – (1.5)(7) = 11.5
  One outside (3)
  Inside value = 21

Boxplot example 3 (metabolic.sav)
Data: 1362 1439 1460 1614 1666 1792 1867
5-point: 1362, 1449.5, 1614, 1729, 1867 (slide 30)
IQR = 1729 – 1449.5 = 279.5
FU = 1729 + (1.5)(279.5) = 2148.25
  None outside
  Upper inside = 1867
FL = 1449.5 – (1.5)(279.5) = 1030.25
  None outside
  Lower inside = 1362
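The fence calculation in the three boxplot examples follows one pattern, sketched below. The function `boxplot_fences` and its dictionary keys are hypothetical names; Q1 and Q3 are passed in as already computed by the quartile rule above, so the sketch covers only the fences, outside values, and inside values.

```python
def boxplot_fences(values, q1, q3):
    """Compute fences, outside values, and whisker ends for a boxplot."""
    iqr = q3 - q1
    f_lower = q1 - 1.5 * iqr  # lower fence
    f_upper = q3 + 1.5 * iqr  # upper fence
    inside = [x for x in values if f_lower <= x <= f_upper]
    outside = [x for x in values if x < f_lower or x > f_upper]
    return {
        "iqr": iqr,
        "fences": (f_lower, f_upper),
        "outside": sorted(outside),             # plotted as separate points
        "whiskers": (min(inside), max(inside)),  # lower and upper inside values
    }


example2 = [3, 21, 22, 24, 25, 26, 28, 29, 31, 51]
result = boxplot_fences(example2, q1=22, q3=29)
print(result["fences"])    # (11.5, 39.5)
print(result["outside"])   # [3, 51]
print(result["whiskers"])  # (21, 31)
```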

Interpretation of boxplots
Location:
  Position of median
  Position of box
Spread:
  Hinge-spread (box length) = IQR
  Whisker-to-whisker spread (range, or range minus the outside values)
Shape:
  Symmetry of box
  Size of whiskers
  Outside values (potential outliers)

Side-by-side boxplots
Boxplots are especially useful for comparing groups.