Measures of Variability: “The crowd was scattered all across the park, but a fairly large group was huddled together around the statue in the middle.”


Why Can’t Everyone Be Like Me?
Have you ever noticed that while many objects are very similar, they are not exactly alike? Is every quarter-pounder exactly a quarter-pound? Why do two pairs of pants of the same size fit slightly differently? How much do adults differ in the amount of sleep they need? All of the above are questions about variability.

Measure of variability: a value that indicates the degree to which a set of scores is clustered or scattered around a measure of central tendency.
Measures of variability do not:
1. specify how far a particular score diverges from the mean
2. provide information about the level of performance of a set of scores
3. describe the shape of a distribution

We will examine four measures of variability:
1. range
2. interquartile range (and semi-interquartile range)
3. standard deviation
4. index of dispersion

Range
The range is the difference between the upper exact limit of the highest score and the lower exact limit of the lowest score. In the data for this slide (a score/frequency table with n = 40, not reproduced in this transcript), 28 is the highest score and 12 is the lowest. Therefore, the range is 28.5 - 11.5 = 17.
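As a rough illustration of the exact-limit definition, the sketch below computes the range for a made-up list of whole-number scores. The function name `exact_limit_range`, the `unit` parameter, and the score list are assumptions for this example; only the lowest score (12) and highest score (28) come from the slide.

```python
def exact_limit_range(scores, unit=1.0):
    """Range measured between exact limits.

    The exact limits lie half a measurement unit beyond the observed
    extremes, so whole-number scores of 12 and 28 span 11.5 to 28.5.
    """
    half_unit = unit / 2
    upper_exact_limit = max(scores) + half_unit
    lower_exact_limit = min(scores) - half_unit
    return upper_exact_limit - lower_exact_limit


# Hypothetical scores; only the minimum and maximum match the slide.
scores = [12, 15, 18, 19, 19, 22, 28]
print(exact_limit_range(scores))  # 17.0
```

Note that only the two extreme scores enter the calculation, which is why the range says nothing about the scores in between.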

Advantages of the Range
The range:
- is easy to calculate
- is easily understood by general audiences
- can provide a very quick and dirty idea of dispersion

Disadvantages of the Range
The range:
- does not tell us about the scores between the end points
- can be grossly distorted by a single extreme score
- generally grows as the sample size grows
- is a terminal statistic

Interquartile Range
The interquartile range is the difference between the 1st and 3rd quartiles.
(Diagram: a distribution divided by Q1, Q2, and Q3 into segments that each contain 25% of the cases.)
You can see from the diagram that Q2 is actually the median.

Another way to think of it is that Q2 is the same as the centile with a rank of 50 (i.e., the score below which 50% of the cases fall). In the same way, Q1 and Q3 are the centiles with ranks of 25 and 75, respectively. That is, they are the scores below which 25% and 75% of the cases fall. Once the centiles are calculated, you simply calculate the difference:
Interquartile range = Q3 - Q1

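Consider the following data (a score / frequency / cumulative-frequency table with n = 40; the table and its interval limits are not reproduced in this transcript). Each centile is the lower exact limit of the interval containing the target case, plus an interpolated fraction of the interval width of 3:
- Q3 (C75): 40 x .75 = 30; add [(1/5) x 3] to the lower exact limit
- Q2 (C50): 40 x .5 = 20; add [(5/14) x 3] to the lower exact limit (unnecessary for calculation of the interquartile range)
- Q1 (C25): 40 x .25 = 10; add [(7/12) x 3] to the lower exact limit
Interquartile range = Q3 - Q1 = 4.85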
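The interpolation behind these centiles can be sketched as follows. The original frequency table did not survive, so the table below is hypothetical: its interval limits and the last two frequencies are assumptions, chosen only so that the interpolation fractions (7/12, 5/14, 1/5) and the resulting interquartile range of 4.85 agree with the figures that do survive. The function name `grouped_centile` is also an assumption.

```python
def grouped_centile(intervals, rank):
    """Centile of a grouped frequency distribution by linear interpolation.

    intervals: list of (lower_exact_limit, width, frequency), ascending.
    rank: centile rank between 0 and 100 (e.g. 25 for Q1).
    """
    n = sum(freq for _, _, freq in intervals)
    target = n * rank / 100          # position of the target case
    cum_below = 0                    # cases below the current interval
    for lower, width, freq in intervals:
        if freq > 0 and cum_below + freq >= target:
            fraction = (target - cum_below) / freq
            return lower + fraction * width
        cum_below += freq
    raise ValueError("rank must be between 0 and 100")


# Hypothetical table (n = 40), NOT the original slide data.
table = [
    (11.5, 3, 3),    # scores 12-14
    (14.5, 3, 12),   # scores 15-17
    (17.5, 3, 14),   # scores 18-20
    (20.5, 3, 5),    # scores 21-23
    (23.5, 3, 4),    # scores 24-26 (assumed frequency)
    (26.5, 3, 2),    # scores 27-29 (assumed frequency)
]

q1 = grouped_centile(table, 25)
q3 = grouped_centile(table, 75)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))
```

For Q1, for example, the 10th case falls in the second interval: 3 cases lie below it and it holds 12 cases, so the centile sits (10 - 3)/12 of the way through a width of 3.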

Advantages of the Interquartile Range
The interquartile range:
- is not sensitive to extreme scores
- is the only reasonable measure of variability with open-ended distributions
- should be used with highly skewed distributions

Disadvantages of the Interquartile Range
The interquartile range:
- is a terminal statistic
- is unfamiliar to most people

A related measure is the semi-interquartile range. It is half the distance between the first and third quartiles:
Semi-interquartile range = (Q3 - Q1) / 2

A Short Tangent
Below are several people standing near a tree, at distances of 10 ft., 7 ft., 0 ft., 6 ft., and 9 ft. If we wanted to find out, on average, how far the people were from the tree, we could simply add the distances and divide by the number of people:
(10 + 7 + 0 + 6 + 9) / 5 = 6.4 ft.

Standard Deviation
Now consider the following data (score/frequency table not reproduced in this transcript): n = 40, $\bar{X} = 18.85$.

You can see that some scores are closer to the mean than others (same score/frequency table: n = 40, $\bar{X} = 18.85$). We can determine the distance of a score from the mean by calculating a deviation score, which indicates how far a score is above or below the mean.

Deviation Score: A Brief Review
$x = X - \bar{X}$ tells us the position of $X$ relative to $\bar{X}$.
For example, a score of 24 would have a deviation score of 5.15: $x = 24 - 18.85 = 5.15$. That is, it is 5.15 points above the mean.
A score of 16, in contrast, would have a deviation score of -2.85: $x = 16 - 18.85 = -2.85$. That is, it is 2.85 points below the mean.

Using deviation scores, $x = X - \bar{X}$, we could find out how far each score is from the mean. If we wanted to find the average of those distances, we could add them all and divide by the number of scores. Unfortunately, since the mean is the balance point, $\sum x = 0$.

What we can do, however, is take the absolute value of each deviation score, $|x| = |X - \bar{X}|$, and find the mean of them:
Mean distance $= \frac{\sum |X - \bar{X}|}{n} = 3.56$
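A small sketch of this idea, using a made-up set of eight scores (not the slide data): the signed deviation scores cancel out, while their absolute values give a usable average distance from the mean. The variable names are assumptions for illustration.

```python
# Hypothetical scores chosen so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n                       # 5.0

# Deviation scores: x = X - mean. Positive and negative values cancel.
deviations = [score - mean for score in scores]
sum_of_deviations = sum(deviations)          # 0.0

# Mean of the absolute deviations: the average distance from the mean.
mean_distance = sum(abs(d) for d in deviations) / n

print(mean, sum_of_deviations, mean_distance)  # 5.0 0.0 1.5
```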

Standard Deviation
…the “average” distance of a set of scores from the mean. DO NOT FORGET THIS!

Well, Not Exactly…
The definition just given, while an excellent way of understanding and interpreting “standard deviation,” is not technically correct (but it is a mean):

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{\sum x^2}{n}}$$

Calculation formula:

$$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}}$$

Just a reminder: $\bar{X} = \frac{\sum X}{n}$
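To check that the definitional and calculation forms agree, here is a minimal sketch on made-up scores (not the slide data). It assumes, as the slide's formulas do, that the sum of squared deviations is divided by n rather than n - 1; the function names are assumptions, and `statistics.pstdev` from the Python standard library is used only as a cross-check.

```python
import math
import statistics


def sd_definitional(scores):
    """S = sqrt( sum((X - mean)^2) / n )."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / n)


def sd_computational(scores):
    """S = sqrt( (sum(X^2) - (sum(X))^2 / n) / n )."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(x ** 2 for x in scores)
    return math.sqrt((sum_x_squared - sum_x ** 2 / n) / n)


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5
print(sd_definitional(scores))      # 2.0
print(sd_computational(scores))     # 2.0
print(statistics.pstdev(scores))    # divide-by-n standard deviation
```

For these scores the mean absolute deviation is 1.5 while S is 2.0, which shows why the standard deviation is only loosely an "average distance": squaring gives extra weight to scores far from the mean.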

Advantages of the Standard Deviation
The standard deviation:
- is quite resistant to sampling variability
- is mathematically tractable

Disadvantages of the Standard Deviation
The standard deviation:
- is not a good index of variability when there are a few very extreme scores
- should not be used with highly skewed distributions
- cannot be used with open-ended distributions

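Coefficient of Variation
Consider the following:
$\bar{X}_1 = 9.00$, $S_1 = 3.00$
$\bar{X}_2 = 90.00$, $S_2 = 3.00$
Although the two standard deviations are equal, the dispersion around $\bar{X}_1$ appears considerably greater, relative to the size of the mean, than the dispersion around $\bar{X}_2$.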

Coefficient of Variation
If two means are very different, we may consider a relative measure of dispersion:

$$CV = 100\left(\frac{S}{\bar{X}}\right)$$

In our example:

$$CV_1 = 100\left(\frac{3.00}{9.00}\right) = 33.33 \qquad CV_2 = 100\left(\frac{3.00}{90.00}\right) = 3.33$$

The larger the CV, the larger the dispersion relative to the mean.
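The comparison can be reproduced with a few lines of code. This is only a sketch; the function name `coefficient_of_variation` is an assumption, and the inputs are the two mean and standard deviation pairs from the example (9.00 and 90.00, each with S = 3.00).

```python
def coefficient_of_variation(mean, sd):
    """CV = 100 * (S / mean): dispersion expressed relative to the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean


cv1 = coefficient_of_variation(9.00, 3.00)
cv2 = coefficient_of_variation(90.00, 3.00)
print(round(cv1, 2), round(cv2, 2))  # 33.33 3.33
```

The same standard deviation of 3.00 amounts to a third of the first mean but only about 3% of the second.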

Coefficient of Variation
The coefficient of variation is also useful when comparing the standard deviations of two variables with different units of measure (e.g., SAT scores vs. age).

Index of Dispersion (D)
When you have a qualitative variable, the index of dispersion is available as a measure of variability. It is defined as the ratio of the number of distinguishable pairs (DP) to the number of distinguishable pairs under the condition of maximum dispersion ($DP_{max}$):

$$D = \frac{DP}{DP_{max}}$$

Political Affiliation
Consider the following data from a survey asking individuals their political affiliation:
Category A: a1, a2
Category B: b1, b2, b3, b4
Two observations can be distinguished only if they fall in different categories. We can distinguish between the pair (a2, b3), but not between the pair (b2, b4).
Eight pairs of observations can be distinguished:
a1b1 a1b2 a1b3 a1b4
a2b1 a2b2 a2b3 a2b4

Political Affiliation
The diagram below illustrates the “condition of maximum dispersion” (i.e., the observations spread equally across the available categories):
Category A: a1, a2, a3
Category B: b1, b2, b3
Nine pairs of observations can be distinguished under the condition of maximum dispersion:
a1b1 a1b2 a1b3
a2b1 a2b2 a2b3
a3b1 a3b2 a3b3

Index of Dispersion

$$D = \frac{DP}{DP_{max}} = \frac{8}{9} = .89$$

D can range from 0 to 1:
- “0” if all observations are in one category and none are in any other
- “1” if the observations are equally divided among the categories
D should be interpreted as the proportion of $DP_{max}$ that was observed (here, 89%). It is useful when comparing two distributions with an equal number of categories.

Computational Formula for D

$$D = \frac{c\left(n^2 - \sum_{j=1}^{c} n_j^2\right)}{n^2 (c - 1)}$$

where:
n = number of observations
c = number of categories
$n_j$ = number of observations in category j
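The sketch below computes D two ways for the political-affiliation example (2 observations in Category A, 4 in Category B): by counting distinguishable pairs directly, and by a computational formula. It assumes the computational formula reads D = c(n² - Σ n_j²) / (n²(c - 1)), which is a reconstruction of the garbled slide that matches the pair-counting result; the function names are assumptions.

```python
from itertools import combinations


def distinguishable_pairs(counts):
    """Number of pairs whose two observations fall in different categories."""
    return sum(a * b for a, b in combinations(counts, 2))


def index_of_dispersion(counts):
    """D = c * (n^2 - sum(n_j^2)) / (n^2 * (c - 1))."""
    n = sum(counts)          # number of observations
    c = len(counts)          # number of categories
    if n == 0 or c < 2:
        raise ValueError("need at least one observation and two categories")
    return c * (n ** 2 - sum(k ** 2 for k in counts)) / (n ** 2 * (c - 1))


observed = [2, 4]            # Category A, Category B from the example
maximum = [3, 3]             # same six observations spread equally

dp = distinguishable_pairs(observed)       # 8
dp_max = distinguishable_pairs(maximum)    # 9
print(dp, dp_max, round(dp / dp_max, 2))   # 8 9 0.89
print(round(index_of_dispersion(observed), 2))
```

The formula avoids listing pairs, which matters once there are more than a handful of observations, and it also covers cases where n is not evenly divisible by c.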