Measures of Variability: “The crowd was scattered all across the park, but a fairly large group was huddled together around the statue in the middle.”

2 Measures of Variability: “The crowd was scattered all across the park, but a fairly large group was huddled together around the statue in the middle.”

3 Why Can’t Everyone Be Like Me? Have you ever noticed that while many objects are very similar, they are not exactly alike? Is every quarter-pounder exactly a quarter-pound? Why do two pairs of pants the same size fit slightly differently? How much do adults differ in the amount of sleep they need? All the above are concerned with variability.

4 Measure of variability - a value which indicates the degree to which a set of scores is clustered or scattered around a measure of central tendency. What measures of variability do not do is:
1. specify how far a particular score diverges from the mean
2. provide information about the level of performance of a set of scores
3. describe the shape of a distribution

5 We will examine four measures of variability:
1. range
2. interquartile range (and semi-interquartile range)
3. standard deviation
4. index of dispersion

6 Range The range is the difference between the upper exact limit of the highest score and the lower exact limit of the lowest score. In the data below, 28 is the highest score and 12 is the lowest. Therefore, the range is 28.5 - 11.5 = 17.
Score  f    Score  f
28     1    19     2
27     0    18     5
26     1    17     7
25     2    16     2
24     2    15     5
23     2    14     0
22     1    13     2
21     2    12     1
20     5
n = 40
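
The range calculation above can be sketched in a few lines of Python. This is only an illustration: the dictionary `freq` and the helper name `exact_limit_range` are assumptions introduced here, and the half-unit adjustment assumes scores are recorded to the nearest whole number.

```python
# Frequency table from the slide: score -> frequency (n = 40).
freq = {28: 1, 27: 0, 26: 1, 25: 2, 24: 2, 23: 2, 22: 1, 21: 2, 20: 5,
        19: 2, 18: 5, 17: 7, 16: 2, 15: 5, 14: 0, 13: 2, 12: 1}

def exact_limit_range(freq, unit=1.0):
    """Upper exact limit of the highest observed score minus the
    lower exact limit of the lowest observed score."""
    observed = [score for score, f in freq.items() if f > 0]
    upper = max(observed) + unit / 2   # 28 + 0.5 = 28.5
    lower = min(observed) - unit / 2   # 12 - 0.5 = 11.5
    return upper - lower

n = sum(freq.values())
score_range = exact_limit_range(freq)
print(n, score_range)
```

Only scores with a nonzero frequency are considered, so an empty row at either end of the table would not stretch the range.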

7 Advantages of the Range The range: – is easy to calculate – is easily understood by general audiences – can provide a very quick and dirty idea of dispersion

8 Disadvantages of the Range The range: - does not tell us about the scores between the end points [Diagram: scores between end points 10 and 30; range = 20]

9 - a single extreme score can grossly distort the degree of variability - in general, the larger the sample size, the larger the range - the range is a terminal statistic [Diagram: end points 10 and 30]

10 Interquartile Range The interquartile range is the difference between the 1st and 3rd quartiles. [Diagram: distribution divided by Q1, Q2, and Q3 into segments that each contain 25% of the cases] You can see from the diagram that Q2 is actually the median.

11 Another way to think of it is that Q2 is the same as the centile with a rank of 50 (i.e., the score below which there are 50% of the cases). In the same way, Q1 and Q3 are the centiles with ranks of 25 and 75, respectively. That is, they are the scores below which there are 25% and 75% of the cases. Once the centiles are calculated, you simply calculate the difference: Interquartile range = Q3 - Q1

12 Consider the following data:
Score     f    cum f
27 - 29    1    40
24 - 26    5    39
21 - 23    5    34
18 - 20   14    29
15 - 17   12    15
12 - 14    3     3
n = 40
Q3 (C75): 40 x .75 = 30; [(1/5) x 3] + 20.5 = 21.10
Q2 (C50): 40 x .5 = 20; [(5/14) x 3] + 17.5 = 18.57 (unnecessary for calculation of the interquartile range)
Q1 (C25): 40 x .25 = 10; [(7/12) x 3] + 14.5 = 16.25
Interquartile range = Q3 - Q1 = 21.10 - 16.25 = 4.85
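
The centile interpolation used on this slide can be sketched as follows. The function name `centile` and the tuple layout of `classes` are assumptions made for illustration; the half-unit exact limits assume whole-number scores, as in the slide's arithmetic.

```python
# Grouped data from the slide, lowest class first:
# (lowest score in class, highest score in class, frequency)
classes = [(12, 14, 3), (15, 17, 12), (18, 20, 14),
           (21, 23, 5), (24, 26, 5), (27, 29, 1)]

def centile(classes, rank):
    """Score below which `rank` percent of the cases fall, found by
    linear interpolation inside the class holding the target case."""
    n = sum(f for _, _, f in classes)
    target = n * rank / 100              # e.g. 40 x .75 = 30
    cum_below = 0                        # cases below the current class
    for low, high, f in classes:
        if f > 0 and cum_below + f >= target:
            lower_exact = low - 0.5      # lower exact limit of the class
            width = (high + 0.5) - lower_exact
            return lower_exact + (target - cum_below) / f * width
        cum_below += f
    raise ValueError("rank must be between 0 and 100")

q1, q2, q3 = (centile(classes, r) for r in (25, 50, 75))
iqr = q3 - q1
print(round(q1, 2), round(q2, 2), round(q3, 2), round(iqr, 2))
```

For the slide's data this reproduces Q1 = 16.25, Q2 = 18.57, Q3 = 21.10, and an interquartile range of 4.85.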

13 Advantages of the Interquartile Range The interquartile range: – is not sensitive to extreme scores – is the only reasonable measure of variability with open-ended distributions – should be used with highly skewed distributions [Diagram: distribution divided by Q1, Q2, and Q3 into 25% segments]

14 Disadvantages of the Interquartile Range The interquartile range: – is a terminal statistic – is unfamiliar to most people

15 A related measure is the semi-interquartile range. It is half the distance between the first and third quartiles:
$$\text{Semi-interquartile range} = \frac{Q_3 - Q_1}{2}$$
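
As a worked example, assuming the quartiles computed from the grouped data on slide 12 (Q1 = 16.25, Q3 = 21.10), the semi-interquartile range would be:

```latex
\text{Semi-interquartile range}
  = \frac{Q_3 - Q_1}{2}
  = \frac{21.10 - 16.25}{2}
  = \frac{4.85}{2}
  = 2.425 \approx 2.43
```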

16 A Short Tangent Below are several people standing near a tree. [Diagram: five people standing 10 ft., 7 ft., 0 ft., 6 ft., and 9 ft. from a tree] If we wanted to find out, on average, how far the people were from the tree, we could simply add the distances and divide by the number of people: (10 + 7 + 0 + 6 + 9) / 5 = 6.4 ft.

17 Standard Deviation Now consider the following data:
Score  f    Score  f
28     1    19     2
27     0    18     5
26     1    17     7
25     2    16     2
24     2    15     5
23     2    14     0
22     1    13     2
21     2    12     1
20     5
n = 40, $\bar{X}$ = 18.85

18 You can see that some scores are closer to the mean than are others. (Same frequency table as on the previous slide: n = 40, $\bar{X}$ = 18.85.) We can determine the distance a score is from the mean by calculating a deviation score, which indicates how far a score is above or below the mean.

19 Deviation Score: A Brief Review $x = X - \bar{X}$ tells us the position of $X$ relative to $\bar{X}$. For example, a score of 24 would have a deviation score of 5.15: x = 24 - 18.85 = 5.15. That is, it is 5.15 points above the mean. A score of 16, in contrast, would have a deviation score of -2.85: x = 16 - 18.85 = -2.85. That is, it is 2.85 points below the mean. [Diagram: number line marking 16, 18.85, and 24, with deviation scores -2.85 and 5.15]

20 [Diagram: individual scores plotted as x's around the mean, 18.85] Using deviation scores, we could find out how far away each score is from the mean. $x = X - \bar{X}$: -5.82, 3.71, -2.62, 2.75, ..., 2.64, -2.83, -2.02, 2.33. If we wanted to find the average of those distances, we could add them all and divide by the number of scores. Unfortunately, since the mean is the balance point, $\sum x = 0$.

21 What we can do, however, is take the absolute value of each deviation score and find the mean of them: $|x| = |X - \bar{X}|$: 5.82, 3.71, 2.62, 2.75, ..., 2.64, 2.83, 2.02, 2.33; $\sum |x| = 142.56$. Mean distance = 142.56 / 40 = 3.56
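
The two steps above (signed deviations summing to zero, then averaging absolute deviations) can be tried out in a short sketch. As an assumption, it uses the frequency table from the Standard Deviation slide rather than the deviation scores printed on slides 20 and 21, which do not appear to come from that table, so it gives a mean distance of about 3.04 rather than 3.56. The variable names are illustrative.

```python
# Frequency table from the Standard Deviation slide: score -> frequency.
freq = {28: 1, 27: 0, 26: 1, 25: 2, 24: 2, 23: 2, 22: 1, 21: 2, 20: 5,
        19: 2, 18: 5, 17: 7, 16: 2, 15: 5, 14: 0, 13: 2, 12: 1}

n = sum(freq.values())
mean = sum(score * f for score, f in freq.items()) / n

# Signed deviations cancel out because the mean is the balance point.
sum_dev = sum(f * (score - mean) for score, f in freq.items())

# Absolute deviations cannot cancel, so their mean is a usable distance.
mean_abs_dev = sum(f * abs(score - mean) for score, f in freq.items()) / n

print(round(mean, 2), abs(sum_dev) < 1e-9, round(mean_abs_dev, 3))
```

The sum of signed deviations is compared against a small tolerance rather than exactly zero because of floating-point rounding.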

22 Standard Deviation …the “average” distance a set of scores is from the mean. DO NOT FORGET THIS!

23 Well, Not Exactly… The definition just given, while an excellent way of understanding and interpreting “standard deviation,” is not technically correct (but it is a mean):
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
Calculation formula:
$$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}} = \sqrt{\frac{\sum x^2}{n}}$$
(just a reminder)
$$\bar{X} = \frac{\sum X}{n}$$
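
To see that the definitional and calculation formulas agree, here is a minimal sketch applied to the frequency table from slide 17. The function names are assumptions introduced for illustration, and both versions divide by n, as the slide's formulas do.

```python
import math

# Frequency table from slide 17: score -> frequency (n = 40).
freq = {28: 1, 27: 0, 26: 1, 25: 2, 24: 2, 23: 2, 22: 1, 21: 2, 20: 5,
        19: 2, 18: 5, 17: 7, 16: 2, 15: 5, 14: 0, 13: 2, 12: 1}

def sd_definitional(freq):
    """Square root of the mean squared deviation from the mean."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in freq.items())
    return math.sqrt(sum_sq_dev / n)

def sd_computational(freq):
    """Same quantity from raw sums: sqrt((sum X^2 - (sum X)^2 / n) / n)."""
    n = sum(freq.values())
    sum_x = sum(x * f for x, f in freq.items())
    sum_x2 = sum(x * x * f for x, f in freq.items())
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / n)

s_def = sd_definitional(freq)
s_comp = sd_computational(freq)
print(round(s_def, 3), round(s_comp, 3))
```

For these data both routes give S of roughly 3.73, somewhat larger than the mean absolute distance, because squaring gives extra weight to scores far from the mean.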

24 Advantages of the Standard Deviation The standard deviation: – is quite resistant to sampling variability – is mathematically tractable

25 Disadvantages of the Standard Deviation The standard deviation: – is not a good index of variability with a few very extreme scores – should not be used with highly skewed distributions – cannot be used with open-ended distributions

26 Coefficient of Variation Consider the following: $\bar{X}_1 = 9.00$, $S_1 = 3.00$; $\bar{X}_2 = 90.00$, $S_2 = 3.00$. Note that the dispersion indicated by $S_1$, relative to $\bar{X}_1$, appears considerably greater than that indicated by $S_2$ relative to $\bar{X}_2$.

27 Coefficient of Variation If two means are very different, we may consider a relative measure of dispersion:
$$CV = 100\left(\frac{S}{\bar{X}}\right)$$
In our example:
$$CV_1 = 100\left(\frac{3.00}{9.00}\right) = 33.33 \qquad CV_2 = 100\left(\frac{3.00}{90.00}\right) = 3.33$$
The larger the CV, the larger the dispersion relative to the mean.
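
The coefficient of variation is simple enough to check directly. In this sketch the function name `coefficient_of_variation` is an assumption, and the guard against a zero mean is an added safeguard that the slide does not discuss.

```python
def coefficient_of_variation(sd, mean):
    """CV = 100 * (S / mean): dispersion expressed relative to the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean

cv1 = coefficient_of_variation(3.00, 9.00)    # first example on the slide
cv2 = coefficient_of_variation(3.00, 90.00)   # second example on the slide
print(round(cv1, 2), round(cv2, 2))
```

Although both groups have the same standard deviation, the first has a CV ten times larger because its mean is ten times smaller.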

28 Coefficient of Variation The coefficient of variation is also useful when comparing the standard deviations of two variables with different units of measure (e.g., SAT scores vs. age).

29 Index of Dispersion (D) When you have a qualitative variable, the index of dispersion is available as a measure of variability. It is defined as the ratio between the number of distinguishable pairs (DP) and the number of distinguishable pairs under the condition of maximum dispersion ($DP_{max}$):
$$D = \frac{DP}{DP_{max}}$$

30 Consider the following data of a survey asking individuals their political affiliation:
Political Affiliation
Category A: a1, a2
Category B: b1, b2, b3, b4
Eight pairs of observations can be distinguished: a1b1, a1b2, a1b3, a1b4, a2b1, a2b2, a2b3, a2b4.
Can distinguish between this pair: (a2, b3). Cannot distinguish between this pair: (b2, b4).

31 The diagram below illustrates the “condition of maximum dispersion” (i.e., if the observations were equally spread across the available categories):
Political Affiliation
Category A: a1, a2, a3
Category B: b1, b2, b3
Nine pairs of observations can be distinguished under the condition of maximum dispersion: a1b1, a1b2, a1b3, a2b1, a2b2, a2b3, a3b1, a3b2, a3b3.

32 Index of Dispersion
$$D = \frac{DP}{DP_{max}} = \frac{8}{9} = .89$$
D can range between 0 and 1: “0” if all observations are in one category and none in any others; “1” if all observations are equally divided between categories. D should be interpreted as the proportion of $DP_{max}$ (here, 89%). It is useful when comparing two distributions with an equal number of categories.

33 Computational Formula for D
$$D = \frac{c\left(n^2 - \sum_{j=1}^{c} n_j^2\right)}{n^2 (c - 1)}$$
where: n = number of observations; c = number of categories; $n_j$ = number of observations in category j
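
As a check that the computational formula matches the pair-counting definition, the sketch below counts distinguishable pairs by brute force for the political-affiliation example and compares the ratio with the formula. The function names and the label lists are assumptions made for illustration.

```python
from itertools import combinations

def index_of_dispersion(counts):
    """D = c(n^2 - sum of n_j^2) / (n^2 (c - 1)) from category counts."""
    c = len(counts)
    n = sum(counts)
    if c < 2 or n == 0:
        raise ValueError("need at least two categories and one observation")
    return c * (n ** 2 - sum(nj ** 2 for nj in counts)) / (n ** 2 * (c - 1))

def distinguishable_pairs(labels):
    """Count pairs of observations that fall in different categories."""
    return sum(1 for a, b in combinations(labels, 2) if a != b)

observed = ["A"] * 2 + ["B"] * 4   # survey example: 2 in A, 4 in B
even = ["A"] * 3 + ["B"] * 3       # same 6 observations spread evenly

dp = distinguishable_pairs(observed)
dp_max = distinguishable_pairs(even)
d = index_of_dispersion([2, 4])
print(dp, dp_max, round(d, 2))
```

The brute-force count gives DP = 8 and $DP_{max}$ = 9, and the formula returns the same ratio, 8/9, or about .89. Passing `[6, 0]` gives 0 and `[3, 3]` gives 1, matching the stated limits of D.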

