Summary Statistics 9/23/2018 Summary Statistics

Presentation transcript:

Summary Statistics (HS 167)
Last week we used stemplots and histograms to describe the shape, location, and spread of a distribution. This week we use numerical summaries of location and spread.

Main Summary Statistics by Type
Central location: mean, median, mode
Spread: variance and standard deviation; quartiles and interquartile range (IQR)
Shape: statistical measures of shape (e.g., skewness and kurtosis) are available but are seldom used in practice (not covered)

Notation
n: sample size
X: variable
$x_i$: value of individual i
$\sum$: sum all values (capital sigma)
Illustrative example (sample.sav), data: 21 42 5 11 30 50 28 27 24 52
n = 10
X = age
$x_1 = 21$, $x_2 = 42$, …, $x_{10} = 52$
$\sum x_i = 21 + 42 + \dots + 52 = 290$

Sample Mean
$\bar{x} = \frac{1}{n} \sum x_i$
Illustrative example: n = 10 (data and intermediate calculations on prior slide), so $\bar{x} = \frac{290}{10} = 29.0$
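
The sample mean for the illustrative data can be checked with a few lines of Python. This is a minimal sketch; the variable names (`ages`, `total`, `xbar`) are chosen here for illustration and are not part of the slides.

```python
# Ages from the illustrative example (sample.sav)
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

n = len(ages)        # sample size
total = sum(ages)    # sum of all values
xbar = total / n     # sample mean = sum / n

print(n, total, xbar)
```

The printed sum should match the 290 shown on the notation slide.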

Population Mean
$\mu = \frac{1}{N} \sum x_i$
Same operation as the sample mean, but based on the entire population (N = population size).
Not available in practice, but important conceptually.

Interpretation of xbar
The sample mean is used to predict:
- an observation drawn at random from the sample
- an observation drawn at random from the population
- the population mean
It is also the gravitational center (balance point) of the data.

Median: a different kind of average
“Middle value” (covered last week)
- Order the data
- Depth of median is (n + 1) / 2
- When n is odd: the middle value
- When n is even: average the two middle values
Illustrative example: n = 10, so the median has depth (10 + 1) / 2 = 5.5
05 11 21 24 27 28 30 42 50 52
median = average of 27 and 28 = 27.5
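
The depth rule can be sketched in Python as follows. The function name `median_by_depth` is hypothetical, and the sketch assumes a non-empty list of numbers.

```python
def median_by_depth(values):
    """Median via the depth rule: position (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2                  # 1-based depth of the median
    if n % 2 == 1:
        return ordered[int(depth) - 1]   # odd n: the single middle value
    lower = ordered[int(depth) - 1]      # even n: value just below the depth
    upper = ordered[int(depth)]          # even n: value just above the depth
    return (lower + upper) / 2           # average the two middle values


ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
print(median_by_depth(ages))
```

For n = 10 the depth is 5.5, so the function averages the 5th and 6th ordered values.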

Median is “robust”
Robust = resistant to skews and outliers
This data set has a mean (xbar) of 1600: 1362 1439 1460 1614 1666 1792 1867
This data set has an outlier (9867) and a mean of 2743: 1362 1439 1460 1614 1666 1792 9867
The median is 1614 in both instances. The median was not influenced by the outlier.
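
The comparison can be reproduced with Python's standard `statistics` module. This is only an illustrative check; the variable names are assumptions, not from the slides.

```python
from statistics import mean, median

original = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]  # 1867 replaced by 9867

for label, data in [("original", original), ("with outlier", with_outlier)]:
    # The mean shifts when the outlier is present; the median does not.
    print(label, "mean:", round(mean(data)), "median:", median(data))
```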

Mode
Mode = value with greatest frequency, e.g., {4, 7, 7, 7, 8, 8, 9} has mode = 7
Used only in very large data sets
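
One way to find the mode is to count how often each value occurs. The sketch below uses `collections.Counter` from the standard library; the helper name `mode_of` is hypothetical. If several values tie for the greatest frequency, it returns the one that appears first in the data.

```python
from collections import Counter


def mode_of(values):
    """Return the value with the greatest frequency."""
    counts = Counter(values)                     # value -> frequency
    value, frequency = counts.most_common(1)[0]  # most frequent entry
    return value


print(mode_of([4, 7, 7, 7, 8, 8, 9]))
```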

Mean, Median, Mode
Symmetrical data: mean = median
Positive skew: mean > median (the mean gets “pulled” by the tail)
Negative skew: mean < median

Spread = Variability
Variability = amount values spread above and below the average
Measures of spread:
- Range and interquartile range
- Standard deviation and variance (this week)

Range = max - min
The range is rarely used in practice because it tends to underestimate the population range and is not robust.

Standard deviation
Most common descriptive measure of spread
Deviation = $x_i - \bar{x}$
Sum of squared deviations = $SS = \sum (x_i - \bar{x})^2$
Sample variance = $s^2 = \frac{SS}{n - 1}$
Sample standard deviation = $s = \sqrt{s^2}$

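Standard deviation (formula)
$s = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}$
The sample standard deviation s is used to estimate the population standard deviation $\sigma$ (strictly, it is $s^2$ that is an unbiased estimator of $\sigma^2$). The population standard deviation $\sigma$ is rarely known in practice.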

New data set (“Metabolic Rates”)
This example is not in your lecture notes.
Metabolic rates (cal/day), n = 7: 1792 1666 1362 1614 1460 1867 1439

[Figure: Metabolic rates showing mean (*) and deviations of first two observations]

Standard Deviation Calculation (metabolic.sav, introduced on slide 15)

Observation   Deviation              Squared deviation
1792          1792 - 1600 =  192     (192)^2  = 36,864
1666          1666 - 1600 =   66     (66)^2   =  4,356
1362          1362 - 1600 = -238     (-238)^2 = 56,644
1614          1614 - 1600 =   14     (14)^2   =    196
1460          1460 - 1600 = -140     (-140)^2 = 19,600
1867          1867 - 1600 =  267     (267)^2  = 71,289
1439          1439 - 1600 = -161     (-161)^2 = 25,921
SUMS                           0*    SS       = 214,870

* The sum of deviations will always equal zero.

Standard Deviation: Metabolic data (cont.)
Variance: $s^2 = \frac{SS}{n - 1} = \frac{214{,}870}{7 - 1} \approx 35{,}811.67$ (cal/day)^2
Standard deviation: $s = \sqrt{35{,}811.67} \approx 189.24$ cal/day
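
The hand calculation for the metabolic data can be checked step by step in Python. This is a sketch that mirrors the columns of the calculation table (deviations, squared deviations, SS); the variable names are illustrative assumptions.

```python
from math import sqrt

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # metabolic rates, cal/day

n = len(rates)
xbar = sum(rates) / n                     # sample mean
deviations = [x - xbar for x in rates]    # x_i - xbar for each observation
ss = sum(d ** 2 for d in deviations)      # sum of squared deviations (SS)
variance = ss / (n - 1)                   # sample variance s^2
sd = sqrt(variance)                       # sample standard deviation s

print("mean:", xbar, "sum of deviations:", sum(deviations), "SS:", ss)
print("variance:", round(variance, 2), "sd:", round(sd, 2))
```

Dividing by n - 1 rather than n is what makes this the sample (not population) variance.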

General rule for rounding means and standard deviations
- Report the mean to one additional decimal place beyond that of the data
- To achieve accuracy, intermediate calculations should carry at least one further decimal place
Illustrative example:
- Suppose data are recorded with one-decimal accuracy (i.e., xx.x)
- Report the mean with two-decimal accuracy (i.e., xx.xx)
- Carry all intermediate calculations with at least three-decimal accuracy (i.e., xx.xxx)
Even more important: always use common sense and judgment.

TI-30XIIS (about $12)
In practice, we often use software or a calculator to check our standard deviation.

Interpretation of Standard Deviation
Larger standard deviation = greater variability
- s1 = 15 and s2 = 10: group 1 has more variability
68-95-99.7 rule (Normal data only)
- 68% of data within 1 SD of the mean, 95% within 2 SD of the mean, and 99.7% within 3 SD of the mean
- e.g., if mean = 30 and SD = 10, then 95% of individuals are in the range 30 ± (2)(10) = 30 ± 20 = (10 to 50)
Chebychev’s rule (all data)
- At least 75% of data within 2 SD of the mean
- e.g., if mean = 30 and SD = 10, then at least 75% of individuals are in the range 30 ± (2)(10) = (10 to 50)
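
Both rules use the same interval, mean ± k·SD; they differ only in the proportion of data they promise inside it. The sketch below computes that interval and the general form of Chebychev's bound, 1 - 1/k², which gives the 75% figure when k = 2. The function names are hypothetical.

```python
def sd_interval(mean, sd, k):
    """Return the interval mean ± k*sd as a (low, high) tuple."""
    return (mean - k * sd, mean + k * sd)


def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k SDs of the mean (for k > 1)."""
    return 1 - 1 / k ** 2


print(sd_interval(30, 10, 2))      # range covered by mean ± 2 SD
print(chebyshev_lower_bound(2))    # guaranteed proportion for any data set
```

For Normal data the same interval holds about 95% of observations, which is a much stronger statement than the 75% guaranteed for any distribution.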

Quartiles and IQR
Quartiles divide the ordered data into four equally-sized groups
Q0 = minimum
Q1 = 25th percentile
Q2 = 50th percentile (median)
Q3 = 75th percentile
Q4 = maximum

Rule for quartiles
- Find the median: Q2
- Middle of lower half of the data set: Q1
- Middle of upper half of the data set: Q3
Bottom half | Top half
05 11 21 24 27 | 28 30 42 50 52
Q1 = 21, Q2 = 27.5, Q3 = 42
IQR = Q3 - Q1 = 42 - 21 = 21, which gives the spread of the middle 50% of the data

5-Point Summary (sample.sav)
Q0 = 5 (minimum)
Q1 = 21 (lower hinge)
Q2 = 27.5 (median)
Q3 = 42 (upper hinge)
Q4 = 52 (maximum)
Best descriptive statistics for skewed data

Illustrative example (metabolic.sav)
1362 1439 1460 1614 1666 1792 1867; median = 1614
Because n is odd, the median is included in both halves.
Bottom half: 1362 1439 1460 1614, so Q1 = (1439 + 1460) / 2 = 1449.5
Top half: 1614 1666 1792 1867, so Q3 = (1666 + 1792) / 2 = 1729
5-point summary: 1362, 1449.5, 1614, 1729, 1867
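
The hinge rule used in these examples can be sketched in Python as below. The function names are hypothetical. The sketch assumes the convention shown on the slides: when n is odd, the median belongs to both halves. Statistical software often uses other quartile conventions, which can give slightly different Q1 and Q3 values.

```python
def middle(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_point_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4) using hinges as Q1 and Q3."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2            # size of each half; median is shared when n is odd
    lower = ordered[:half]         # bottom half
    upper = ordered[n - half:]     # top half
    return (ordered[0], middle(lower), middle(ordered), middle(upper), ordered[-1])


print(five_point_summary([21, 42, 5, 11, 30, 50, 28, 27, 24, 52]))      # sample.sav
print(five_point_summary([1792, 1666, 1362, 1614, 1460, 1867, 1439]))   # metabolic.sav
```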

Box-and-whiskers plot (boxplot)
5-point summary + “outside values”
Procedure:
1. Determine the 5-point summary
2. Draw a box from Q1 to Q3
3. Draw a line at Q2
4. Calculate IQR = Q3 - Q1
5. Calculate fences: F_Lower = Q1 - 1.5(IQR), F_Upper = Q3 + 1.5(IQR)
6. Determine whether there are any outside values; if so, plot them separately
7. Determine inside values and draw whiskers from the box to the inside values

Boxplot example
Data: 05 11 21 24 27 28 30 42 50 52
5-point: 5, 21, 27.5, 42, 52
IQR = 42 - 21 = 21
FU = 42 + (1.5)(21) = 73.5; no values above (no outside values); upper inside value = 52
FL = 21 - (1.5)(21) = -10.5; no values below (no outside values); lower inside value = 5
[Figure: boxplot marking lower inside = 5, Q1 = 21, Q2 = 27.5, Q3 = 42, upper inside = 52]

Boxplot example 2
Data: 3 21 22 24 25 26 28 29 31 51
5-point: 3, 22, 25.5, 29, 51
IQR = 29 - 22 = 7
FU = 29 + (1.5)(7) = 39.5; one outside value (51); upper inside value = 31
FL = 22 - (1.5)(7) = 11.5; one outside value (3); lower inside value = 21

Boxplot example 3 (metabolic.sav)
Data: 1362 1439 1460 1614 1666 1792 1867
5-point: 1362, 1449.5, 1614, 1729, 1867 (slide 30)
IQR = 1729 - 1449.5 = 279.5
FU = 1729 + (1.5)(279.5) = 2148.25; none outside; upper inside value = 1867
FL = 1449.5 - (1.5)(279.5) = 1030.25; none outside; lower inside value = 1362
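
The fence calculations in the boxplot examples can be sketched as a small Python helper. The name `boxplot_fences` and the dictionary layout are assumptions made for illustration; Q1 and Q3 are passed in as already-computed hinges.

```python
def boxplot_fences(values, q1, q3):
    """Compute fences, whisker ends (inside values), and outside values."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Inside values lie within the fences; whiskers reach the most extreme of them.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    # Outside values lie beyond a fence and are plotted separately.
    outside = [x for x in values if x < lower_fence or x > upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
        "outside": sorted(outside),
    }


example_2 = [3, 21, 22, 24, 25, 26, 28, 29, 31, 51]
print(boxplot_fences(example_2, q1=22, q3=29))
```

For example 2 this reports 3 and 51 as outside values, with whiskers ending at 21 and 31.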

Interpretation of boxplots
Location: position of median; position of box
Spread: hinge-spread (box length) = IQR; whisker-to-whisker spread (range, or range minus the outside values)
Shape: symmetry of box; size of whiskers; outside values (potential outliers)

Side-by-side boxplots
Boxplots are especially useful for comparing groups.