Summary statistics Using a single value to summarize some characteristic of a dataset. For example, the arithmetic mean (or average) is a summary statistic.

Presentation transcript:

Summary statistics: using a single value to summarize some characteristic of a dataset. For example, the arithmetic mean (or average) is a summary statistic because it gives the average value of a dataset, such as the average of a set of blood pressure readings.

4.1 Indices of Central Tendency (or location)

(Arithmetic) mean: the average of a set of values.

Blood pressure readings (mm Hg): $X_1 = 95$, $X_2 = 98$, $X_3 = 101$, $X_4 = 87$, $X_5 = 105$; sum = 486.

Arithmetic mean = $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ = 486 / 5 = 97.2 mm Hg
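The calculation on this slide can be reproduced with a minimal Python sketch. The variable names are illustrative and not part of the original slides.

```python
# Blood pressure readings (mm Hg) from the slide
readings = [95, 98, 101, 87, 105]

total = sum(readings)                     # sum of the five readings
arithmetic_mean = total / len(readings)   # divide by the number of readings

print(total)                      # 486
print(round(arithmetic_mean, 1))  # 97.2
```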

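4.2 Robust Measure of Location

The mean is very sensitive (not robust) to extreme values.

Blood pressure readings (mm Hg), ordered: $X_1 = 87$, $X_2 = 95$, $X_3 = 98$, $X_4 = 101$, $X_5 = 105$. Mean = 97.2.

If the decimal point in the last reading is overlooked, so that 105 is recorded as 1050, the mean becomes 1431 / 5 = 286.2.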

Robust measure of location

The median (the middle value of an ordered data set) is less sensitive (more robust) to extreme values in the data.

Blood pressure readings (mm Hg), ordered: 87, 95, 98, 101, 105. The median value = 98 is unchanged when the largest reading is replaced by an extreme value.

A trimmed mean (e.g. the 10% trimmed mean is the average after deleting 10% of the data at each end) is also less affected by extreme values.
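The contrast between the two measures can be sketched in Python as below. The miskeyed value 1050 is an assumption: it is the value implied by the quoted mean of 286.2 (5 × 286.2 = 1431), not a number that survives in the transcript.

```python
from statistics import mean, median

correct = [87, 95, 98, 101, 105]
# Assumption: the last reading entered as 1050 (decimal point overlooked),
# which is the value implied by the slide's mean of 286.2.
miskeyed = [87, 95, 98, 101, 1050]

print(round(mean(correct), 1), median(correct))    # 97.2 98
print(round(mean(miskeyed), 1), median(miskeyed))  # 286.2 98
```

The single bad entry roughly triples the mean but leaves the median where it was.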

Intervals between failures of an air conditioner (in operating hours): 413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201, 118, 34, 31, 18, 19, 67, 57, 62, 7, 22, 34, 90, 10

Mean = ? 8% trimmed mean = ? Median = ?

Ordered values: 7, 9, 10, 14, 18, 19, 22, 31, 34, 34, 36, 37, 57, 58, 62, 65, 67, 90, 100, 118, 169, 184, 201, 413, 447

Measures of location (sample size = 25):
mean = 2302/25 = 92.1 hrs
8% of 25 = 2, so leave out 2 observations at each end: 8% trimmed mean = 1426/21 = 67.9 hrs
median = 13th ordered value = 57 hrs
so median (57) < trimmed mean (67.9) < mean (92.1)
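The three measures of location above can be checked with a short Python sketch. `trimmed_mean` is a hypothetical helper written for this example. It takes the trimming percentage as an integer and rounds the number of dropped observations down, which matches the slide's "8% of 25 = 2".

```python
from statistics import mean, median

# Intervals between failures (operating hours), as listed on the slide
intervals = [413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201, 118,
             34, 31, 18, 19, 67, 57, 62, 7, 22, 34, 90, 10]

def trimmed_mean(values, percent):
    """Average after dropping `percent`% of the observations at each end.

    Hypothetical helper: `percent` is an integer and the count dropped
    per end is rounded down.
    """
    ordered = sorted(values)
    k = (percent * len(ordered)) // 100   # 8% of 25 -> 2 per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

print(round(mean(intervals), 1))             # 92.1
print(round(trimmed_mean(intervals, 8), 1))  # 67.9
print(median(intervals))                     # 57
```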

Desirable properties of the median

Not sensitive to extreme values in the data.
More suitable for describing skewed distributions (e.g., median income vs average income).
The relative positions of the data points are unchanged when log-transformed. As a result, the median of the log-transformed data is just the log of the median of the original data. Not so for the mean: the mean of log X is not obtainable from the mean of X.

Example: 87 < 95 < 98 < 101 < 105, Med = 98
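The log-transform property can be illustrated with the five blood pressure readings. This is a sketch using natural logs. With an odd number of observations the median is a single data point, so the equality is exact.

```python
import math
from statistics import mean, median

values = [87, 95, 98, 101, 105]
logs = [math.log(x) for x in values]

# The log keeps the ordering, so the middle observation is the same one.
print(median(logs) == math.log(median(values)))   # True

# The mean of the logs is not the log of the mean (it is smaller here).
print(mean(logs) < math.log(mean(values)))        # True
```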

Relative positions of median and mean for skewed distributions

Positively skewed, or skewed to the right (where the longer tail is): Mean > Median.
Negatively skewed, or skewed to the left (where the longer tail is): Mean < Median.

When to use the mean or the median: use both, by all means. The mean performs best when the distribution is normal, or symmetric with thin tails. If the distribution is skewed, or when we want to limit the influence of outliers, use the median.

Indices of Dispersion or Spread

Range: difference between the largest and the smallest value.
Problem: it does not consider how the values in between are scattered. In the following, for both sets of data, the numbers of observations, means, medians and ranges are all equal. Which one has more scatter?

10, 12, 13, 14, 15, 16, 17, 18, 20
10, 15, 15, 15, 15, 15, 15, 15, 20

[Figure: two datasets with the same range but different scatter of values]
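A quick Python check of the two datasets is sketched below. The names `spread_out` and `clustered` are illustrative. The sample standard deviation, a measure the later slides introduce, is used here only to show that the scatter differs while the range does not.

```python
from statistics import mean, median, stdev

spread_out = [10, 12, 13, 14, 15, 16, 17, 18, 20]
clustered = [10, 15, 15, 15, 15, 15, 15, 15, 20]

for name, data in (("spread_out", spread_out), ("clustered", clustered)):
    data_range = max(data) - min(data)
    # n, mean, median and range agree; only the last column differs
    print(name, len(data), mean(data), median(data), data_range,
          round(stdev(data), 2))
```

The first dataset has the larger standard deviation (about 3.12 against 2.5), even though every other summary printed is identical.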

Indices of Dispersion

A good index of dispersion should be one that summarises the dispersion of individual values from some central value like the mean.

[Figure: individual values (X) scattered around the mean]

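Indices of Dispersion

The problem with averaging the deviations of individual values from the mean is that the average is always zero:

$87 - 97.2 = -10.2$
$95 - 97.2 = -2.2$
$98 - 97.2 = 0.8$
$101 - 97.2 = 3.8$
$105 - 97.2 = 7.8$

where 97.2 is the mean of the values 87, 95, 98, 101, 105. The deviations sum to 0, so the average of the deviations of individual values from the mean is 0.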

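Indices of Dispersion

Usual approach: consider the squared deviations from the mean and take their average.

$(87 - 97.2)^2 = 104.04$
$(95 - 97.2)^2 = 4.84$
$(98 - 97.2)^2 = 0.64$
$(101 - 97.2)^2 = 14.44$
$(105 - 97.2)^2 = 60.84$

Sum of squares of deviations from the mean = 184.8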

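Variance calculation from a sample: it is customary to divide by n-1 (the default option in most software) rather than by n.

Variance = (sum of squared deviations from the mean) / (n-1) = 184.8 / 4 = 46.2

Here n-1 = 4 is the effective sample size, also called the degrees of freedom.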
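The steps from deviations to sample variance can be sketched in Python for the five blood pressure readings. Variable names are illustrative. `statistics.variance` divides by n-1, so it should agree with the hand calculation.

```python
from statistics import variance

readings = [87, 95, 98, 101, 105]
n = len(readings)
sample_mean = sum(readings) / n                   # 97.2

deviations = [x - sample_mean for x in readings]
sum_of_deviations = sum(deviations)               # zero, up to floating-point rounding
sum_of_squares = sum(d ** 2 for d in deviations)  # about 184.8

sample_variance = sum_of_squares / (n - 1)        # divide by n-1 = 4

print(round(sum_of_squares, 2))       # 184.8
print(round(sample_variance, 2))      # 46.2
print(round(variance(readings), 2))   # 46.2
```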

Variance of a sample Can be shown mathematically:

Why subtract 1?

It results in a better estimator of the population variance.
It acknowledges the fact that the population mean is unknown and has to be estimated by the sample mean (the effective sample size is decreased by 1 for every parameter estimated).
There is no need to subtract 1 if we calculate the variance using deviations from the population mean.
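These claims can be checked exactly on a toy example. The sketch below makes an assumption purely for illustration: it treats the five blood pressure readings as a whole population and enumerates every equally likely sample of size 2 drawn with replacement. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Assumption for illustration: treat these five values as the population.
population = [87, 95, 98, 101, 105]
N = len(population)
mu = Fraction(sum(population), N)                          # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N        # population variance

n = 2
samples = list(product(population, repeat=n))  # all 25 ordered samples, with replacement

def spread_about(sample, centre, divisor):
    """Sum of squared deviations of `sample` about `centre`, over `divisor`."""
    return sum((x - centre) ** 2 for x in sample) / divisor

def average(values):
    values = list(values)
    return sum(values) / len(values)

# Deviations from the sample mean, dividing by n-1 and by n
avg_div_n_minus_1 = average(spread_about(s, Fraction(sum(s), n), n - 1) for s in samples)
avg_div_n = average(spread_about(s, Fraction(sum(s), n), n) for s in samples)
# Deviations from the known population mean, dividing by n
avg_known_mean = average(spread_about(s, mu, n) for s in samples)

print(sigma2, avg_div_n_minus_1, avg_div_n, avg_known_mean)
```

On average, dividing by n-1 recovers the population variance, dividing by n underestimates it by the factor (n-1)/n, and no correction is needed when deviations are taken from the population mean.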

Variance of a sample

The problem with the variance is its awkward unit of measurement, as the values have been squared. The problem is overcome by taking the square root of the variance, which reverts to the original unit of measurement. The square root of the variance gives the standard deviation.

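Sample Standard Deviation

The sample standard deviation (S or SD) is the square root of the sample variance: $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$, where $\bar{X}$ is the sample mean.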

4.4 Robust Measure of Dispersion

Variance is defined as the mean of the squared deviations and as such is even less robust to extreme values than the mean (an extreme deviation becomes even more extreme after squaring).

A robust measure of dispersion is IQR/1.35, where IQR = 3rd quartile - 1st quartile = inter-quartile range.

The reason for dividing the IQR by 1.35 is to make it comparable with the standard deviation when the underlying distribution is normal.

Intervals between failures of an air conditioner (in operating hours): 413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201, 118, 34, 31, 18, 19, 67, 57, 62, 7, 22, 34, 90, 10

Mean = ? 8% trimmed mean = ? Median = ? SD = ? IQR/1.35 = ?

Ordered values: 7, 9, 10, 14, 18, 19, 22, 31, 34, 34, 36, 37, 57, 58, 62, 65, 67, 90, 100, 118, 169, 184, 201, 413, 447

Measures of location (sample size = 25):
mean = 2302/25 = 92.1 hrs
8% of 25 = 2, so leave out 2 observations at each end: 8% trimmed mean = 1426/21 = 67.9 hrs
median = 13th ordered value = 57 hrs
so median (57) < trimmed mean (67.9) < mean (92.1)

Measures of dispersion:
SD = 115.5 hrs
1st quartile = 7th ordered value = 22 hrs
3rd quartile = 19th ordered value = 100 hrs
IQR/1.35 = 78/1.35 = 57.8 hrs
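The dispersion measures can be sketched in Python as follows. The quartiles are taken the way the slide takes them, as the 7th and 19th ordered values. Quartile conventions differ between software packages, so the agreement with `statistics.quantiles(..., method="inclusive")` shown here is specific to that method and should not be assumed of other tools or of the default method.

```python
from statistics import quantiles, stdev

intervals = [413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201, 118,
             34, 31, 18, 19, 67, 57, 62, 7, 22, 34, 90, 10]
ordered = sorted(intervals)

sd = stdev(intervals)     # sample SD (divides by n-1)

q1 = ordered[7 - 1]       # 7th ordered value, as on the slide
q3 = ordered[19 - 1]      # 19th ordered value, as on the slide
iqr = q3 - q1
robust_sd = iqr / 1.35    # comparable with the SD under normality

print(round(sd, 1))         # 115.5
print(q1, q3, iqr)          # 22 100 78
print(round(robust_sd, 1))  # 57.8

# The "inclusive" method of statistics.quantiles gives the same quartiles here.
print(quantiles(intervals, n=4, method="inclusive"))
```

The two extreme intervals (413 and 447) inflate the SD to about twice the robust figure.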

5-Number Summary of a data set

Min, 1st quartile, median, 3rd quartile, max.

Represent graphically by a box plot.
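A five-number summary for the air-conditioner data might be computed as below. `five_number_summary` is a hypothetical helper, and the choice of the "inclusive" quartile method is an assumption made so that the quartiles match the ordered values used earlier in the deck.

```python
from statistics import quantiles

intervals = [413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201, 118,
             34, 31, 18, 19, 67, 57, 62, 7, 22, 34, 90, 10]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); hypothetical helper.

    Quartiles use the "inclusive" method (an assumption; other
    conventions can give slightly different quartiles).
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

print(five_number_summary(intervals))   # (7, 22.0, 57.0, 100.0, 447)
```

These five values are what a box plot draws: the box spans the quartiles, the line inside marks the median, and in the simplest version the whiskers run to the minimum and maximum.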