Descriptive Measures MARE 250 Dr. Jason Turner.

Presentation transcript:

Descriptive Measures MARE 250 Dr. Jason Turner

Descriptive Measures: Descriptive Measures – numbers that are used to describe data sets. They are part of descriptive statistics and are used to summarize raw data.

Descriptive Measures: Measures of Center; Measures of Variation – how data are distributed around the center; 5-number summary – used to construct a visual representation, the boxplot

Measures of Center: Measures of Central Tendency – indicate where the center or most typical value of a data set lies: Mean, Median, Mode

Measures of Center: Mean – the sum of the observations in a data set divided by the number of observations; the arithmetic average. Example: 10 + 20 + 30 + 40 + 50 + 60 + 70 + 80 + 90 + 100 = 550; 550 / 10 = 55
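The arithmetic on this slide can be reproduced in a few lines. This is only a sketch using Python's standard library; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

total = sum(data)            # sum of the observations
n = len(data)                # number of observations
manual_mean = total / n      # sum divided by count

# statistics.mean should agree with the manual calculation
library_mean = mean(data)
print(total, n, manual_mean, library_mean)
```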

Measures of Center: Median – the number that divides the bottom 50% of the data from the top 50%. 1) Arrange the data in increasing order. 2) If the number of observations is ODD, the median is the observation exactly in the middle. 3) If the number of observations is EVEN, the median is the mean of the middle two observations.

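Measures of Center: The median is located at position (n+1)/2 in the ordered data. 10,20,30,40,50,60,70,80,90,100,110 (ODD, n = 11): position (11+1)/2 = 6, so Median = 60. 10,20,30,40,50,60,70,80,90,100 (EVEN, n = 10): position (10+1)/2 = 5.5, so Median = (50+60)/2 = 55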
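The odd/even rule can be written out directly. The function below is a minimal sketch (the name `median` and the two sample lists are chosen for illustration); it assumes a non-empty list of numbers.

```python
def median(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)      # step 1: arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # ODD: the observation exactly in the middle
        return ordered[mid]
    # EVEN: the mean of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
even_data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(median(odd_data), median(even_data))
```

Note the parentheses around the sum of the two middle values: without them the division would apply only to the second value.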

Measures of Center: Mode – the value that occurs most often; found by obtaining the frequency of each value in the data set. If no value occurs more than once – No Mode; 10,20,30,40,50,60,70,80,90,100. Otherwise – any value with the greatest frequency is a Mode; 10,20,30,40,50,50,60,70,80,90,100: Mode is 50
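One way to apply this definition is to count frequencies and keep every value tied for the greatest count. This is a sketch under the slide's convention that a data set with no repeated value has no mode; the function name `modes` is hypothetical and a non-empty input is assumed.

```python
from collections import Counter

def modes(values):
    """Return all values with the greatest frequency, or [] if nothing repeats."""
    counts = Counter(values)          # frequency of each value
    top = max(counts.values())
    if top == 1:
        # no value occurs more than once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

no_mode = modes([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
one_mode = modes([10, 20, 30, 40, 50, 50, 60, 70, 80, 90, 100])
print(no_mode, one_mode)
```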

Measures of Center: The mode is useful if the distribution is skewed or bimodal (having two very pronounced values around which data are concentrated). [Figure: frequency plot; vertical axis: Number of Individuals]

You are so totally skewed! The mean is sensitive to extreme (very large or small) observations and the median is not. Therefore, you can gauge how skewed your data are by looking at the relationship between the median and the mean. Mean is Greater than the Median: typically right-skewed. Mean and Median are Equal: symmetric. Mean is Less than the Median: typically left-skewed.
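The comparison between mean and median can be turned into a quick check. Treat this as a rule of thumb rather than a formal test of skewness; the function name and the sample data sets are made up for illustration.

```python
from statistics import mean, median

def skew_direction(values):
    """Compare mean and median to suggest the direction of skew."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"     # large values pull the mean upward
    if m < med:
        return "left-skewed"      # small values pull the mean downward
    return "symmetric"

print(skew_direction([10, 20, 30, 40, 500]))
print(skew_direction([1, 460, 470, 480, 490]))
print(skew_direction([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))
```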

Resistant Measures: A resistant measure is not sensitive to the influence of a few extreme observations. Median – resistant measure of center; Mean – not resistant. Outliers have little or no effect on the median; outliers DO affect the mean.

Resistant Measures: The resistance of the mean can be improved by using Trimmed Means – a specified percentage of the smallest and largest observations is removed before computing the mean. We will do something like this later when exploring the data and evaluating outliers (and their effects upon the mean).
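A trimmed mean can be sketched as follows. The function assumes the percentage is removed from each end of the sorted data and is less than 50; both the name `trimmed_mean` and the sample data (with one extreme value, 1000) are illustrative.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the observations from EACH end."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)    # observations dropped per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
plain = sum(data) / len(data)        # pulled upward by the value 1000
trimmed = trimmed_mean(data, 10)     # drops the smallest and largest value
print(plain, trimmed)
```

With the extreme value trimmed away, the result lands near the center of the remaining observations, which is the sense in which trimming improves resistance.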

Measures of Variation: Measures of Variation (Spread) – the amount of variability in the data set: Range, Standard Deviation, Variance. Range = Maximum Observation – Minimum Observation. Example: 10,20,30,40,50,60,70,80,90,100; Range = 100 - 10 = 90

Measures of Variation: Standard Deviation (±SD) – measures the variation by indicating how far, on average, the observations are from the mean. Large SD – observations lie far from the mean; small SD – observations lie close to the mean.

Measures of Variation: Variance (the measure used by statistical formulas) – the square of the standard deviation. “Equal Variance” is one of the assumptions of parametric means testing (we will learn this later).
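The three measures of variation can be computed on the running example. The slides do not say whether the sample or population formula is intended; this sketch assumes the sample versions (dividing by n - 1), which is what `statistics.stdev` and `statistics.variance` compute.

```python
from statistics import stdev, variance

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

data_range = max(data) - min(data)   # maximum minus minimum
sd = stdev(data)                     # sample standard deviation (assumed)
var = variance(data)                 # sample variance (assumed)

# The variance is the square of the standard deviation
print(data_range, round(sd, 2), round(var, 2))
```

For the population versions, `statistics.pstdev` and `statistics.pvariance` divide by n instead.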

Measures of Variation: Three Standard Deviations Rule – almost all observations in any data set lie within three standard deviations to either side of the mean; “almost all” is defined in 2 ways by stats nerds…

Measures of Variation: Three Standard Deviations Rule. Chebychev’s Rule – at least 89% of the data lie within 3 standard deviations of the mean, for any data set. Empirical Rule – about 99.7% of observations lie within 3 standard deviations of the mean, if the data are approximately bell-shaped.
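The 89% figure comes from the general form of Chebychev's Rule, which says that at least 1 - 1/k² of the observations lie within k standard deviations of the mean (for k > 1). A small sketch, with a hypothetical function name:

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k SDs of the mean (k > 1)."""
    return 1 - 1 / k ** 2

for k in (2, 3):
    percent = round(chebyshev_lower_bound(k) * 100, 1)
    print(f"within {k} SDs: at least {percent}%")
```

Because the bound holds for any data set, it is much weaker than the Empirical Rule's 99.7%, which applies only to roughly bell-shaped data.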

5 Number Summary: Percentiles – the data set is divided into hundredths (100 equal parts). Why? Percentiles are not sensitive to the influence of a few extreme observations (outliers).

5 Number Summary: Quartiles – the data set is divided into quarters (4 equal parts); most typically used. A data set has 3 quartiles: Q1, Q2, Q3. Q1 – the number that divides the bottom 25% from the top 75%. Q2 – the median; divides the bottom 50% from the top 50%. Q3 – the number that divides the bottom 75% from the top 25%.

5 Number Summary: Interquartile Range (IQR) – the difference between the first and third quartiles; IQR = Q3 – Q1. The IQR gives you the range of the middle 50% of the data.
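Quartiles and the IQR can be computed with the standard library. Several quartile conventions exist and they can give slightly different values on small data sets; this sketch assumes the "inclusive" interpolation method of `statistics.quantiles`, which may not match the hand method used in the course.

```python
from statistics import quantiles

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                 # range of the middle 50% of the data

print(q1, q2, q3, iqr)
```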

Outlier, Outlier: Outliers – observations that fall well outside the overall pattern of the data. They require special attention. May be the result of: a measurement or recording error; an observation from a different population; an unusual extreme observation.

Pants on Fire! Must deal with outliers (Yes, really!): if an error – can delete; otherwise it is a judgment call. Can use quartiles and the IQR to identify potential outliers.

The Outer Limits: Lower and Upper Limits. Lower limit – the number that lies 1.5 IQRs below the first quartile; Lower Limit = Q1 - 1.5 * IQR. Upper limit – the number that lies 1.5 IQRs above the third quartile; Upper Limit = Q3 + 1.5 * IQR.

The Outer Limits: If a value is outside the “Outer Limits” of a dataset it is an… OUTLIER!
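The limits and the outlier check can be sketched together. The data set below is invented for illustration (one value, 300, is deliberately extreme), and the quartiles again assume the "inclusive" method of `statistics.quantiles`; a different quartile convention would shift the limits slightly.

```python
from statistics import quantiles

def outlier_limits(values):
    """Return (lower limit, upper limit) based on 1.5 * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 300]
lower, upper = outlier_limits(data)

# Any observation outside the limits is flagged as a potential outlier
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```

A flagged value is only a candidate: whether to delete it still depends on whether it is an error or a genuine extreme observation.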

5 Number Summary: 5-Number Summary – Min, Q1, Q2, Q3, Max. Written in increasing order. Provides information on center and variation. Used to construct boxplots.
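The five values can be gathered in one small helper. This is a sketch only: the function name is hypothetical and the quartiles assume the "inclusive" method of `statistics.quantiles`, one of several conventions.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (Min, Q1, Q2, Q3, Max) in increasing order."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
summary = five_number_summary(data)
print(summary)
```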

Boxplot: Boxplot (Box-and-Whisker Diagram) – based on the 5-number summary; provides a graphic display of the center and variation. [Figure: boxplot labeled Min, Q1, Q2, Q3, Max]

Boxplot: Modified Boxplot – includes outliers. Note that Min & Max are determined after outliers are removed! [Figure: modified boxplot with a potential outlier marked *]

Boxplot: Boxplots summarize information about the shape, dispersion, and center of your data. They can also help you spot outliers.

Boxplot: The left edge of the box represents the first quartile (Q1), while the right edge represents the third quartile (Q3). The box portion of the plot represents the interquartile range (IQR), the middle 50% of the data. [Figure: boxplot labeled Lower Limit, Q1, Q2, Q3, Upper Limit]

Boxplot: The line drawn through the box represents the median of the data. The lines extending from the box are called whiskers. The whiskers extend outward to the smallest and largest observations that lie within the Lower and Upper limits of the data set (that is, excluding outliers).

Boxplot: Extreme values, or outliers, are represented by dots. A value is considered an outlier if it is outside of the box (greater than Q3 or less than Q1) by more than 1.5 times the IQR. [Figure: boxplot with a potential outlier marked *]

Boxplot: Use the boxplot to assess the symmetry of the data. If the data are fairly symmetric, the median line will be roughly in the middle of the IQR box and the whiskers will be similar in length.

Boxplot: Use the boxplot to assess the symmetry of the data. If the data are skewed, the median may not fall in the middle of the IQR box, and one whisker will likely be noticeably longer than the other.