Measures of Dispersion

Presentation on theme: "Measures of Dispersion"— Presentation transcript:

1 Measures of Dispersion
Dr Anshul Singh Thapa

2 INTRODUCTION We have studied how to sum up the data into a single representative value. However, that value does not reveal the variability present in the data. Now we will study those measures, which seek to quantify variability of the data.

3 Family incomes
Sr. No.          Family I    Family II   Family III
1                12,000      7,000       0
2                14,000      10,000      7,000
3                16,000      14,000      8,000
4                18,000      17,000      10,000
5                –           20,000      50,000
6                –           22,000      –
Total income     60,000      90,000      75,000
Average income   15,000      15,000      15,000
It is quite obvious that averages tell only one aspect of a distribution, i.e. a representative size of the values. To understand it better, you also need to know the spread of the values.

4 We can see that in Family I, differences in incomes are comparatively lower. In Family II, differences are higher and in Family III, the differences are the highest. Knowledge of only average is insufficient. If we have another value which reflects the quantum of variation in values, our understanding of a distribution improves considerably. Dispersion is the extent to which values in a distribution differ from the average of the distribution.

5 To quantify the extent of the variation, there are certain measures, namely:
Range
Quartile Deviation
Mean Deviation
Standard Deviation
Apart from these measures, which give a numerical value, there is a graphic method for estimating dispersion. Range and Quartile Deviation measure the dispersion by calculating the spread within which the values lie. Mean Deviation and Standard Deviation calculate the extent to which the values differ from the average.

6 MEASURES BASED UPON SPREAD OF VALUES
Range
Range (R) is the difference between the largest (L) and the smallest value (S) in a distribution. Thus, R = L – S. A higher value of Range implies higher dispersion and vice versa. Range is unduly affected by extreme values. It is not based on all the values: as long as the minimum and maximum values remain unaltered, any change in other values does not affect Range. It cannot be calculated for an open-ended frequency distribution. Notwithstanding these limitations, Range is understood and used frequently because of its simplicity. For example, we see the maximum and minimum temperatures of different cities almost daily on our TV screens and form judgments about the temperature variations in them.

7 Quartiles
Quartiles are the measures which divide the data into four equal parts, each containing an equal number of observations. There are three quartiles. The first Quartile (denoted by Q1), or lower quartile, has 25% of the items of the distribution below it and 75% of the items above it. The second Quartile (denoted by Q2), or median, has 50% of the items below it and 50% above it. The third Quartile (denoted by Q3), or upper Quartile, has 75% of the items of the distribution below it and 25% of the items above it. Thus, Q1 and Q3 denote the two limits within which the central 50% of the data lies.

8 Quartile Deviation
The presence of even one extremely high or low value in a distribution can reduce the utility of Range as a measure of dispersion. Thus, you may need a measure which is not unduly affected by outliers. In such a situation, if the entire data is divided into four equal parts, each containing 25% of the values, we get the values of the Quartiles and the Median. The upper and lower quartiles (Q3 and Q1, respectively) are used to calculate the Inter-Quartile Range, which is Q3 – Q1. The Inter-Quartile Range is based upon the middle 50% of the values in a distribution and is, therefore, not affected by extreme values. Half of the Inter-Quartile Range is called Quartile Deviation (Q.D.). Thus:
Q.D. = (Q3 – Q1) / 2
Q.D. is therefore also called the Semi-Inter-Quartile Range.

10 Calculation of Range and Q.D. for ungrouped data
Calculate Range and Q.D. of the following observations: 20, 25, 29, 30, 35, 39, 41, 48, 51, 60 and 70.
Range is clearly 70 – 20 = 50.
For Q.D., we need to calculate the values of Q3 and Q1.
Q1 is the size of the ((n + 1)/4)th value. n being 11, Q1 is the size of the 3rd value. As the values are already arranged in ascending order, it can be seen that Q1, the 3rd value, is 29.
Similarly, Q3 is the size of the (3(n + 1)/4)th value, i.e. the 9th value. Hence Q3 = 51.
Q.D. = (Q3 – Q1)/2 = (51 – 29)/2 = 11
Q.D. is the average difference of the Quartiles from the median.

11 Example
Monthly wages: 120, 150, 170, 180, 181, 187, 190, 192, 200, 210
Q1 = Size of ((N + 1)/4)th item = size of ((10 + 1)/4)th item = Size of 2.75th item
= Size of 2nd item + ¾ (size of 3rd item – size of 2nd item) = 150 + ¾ (170 – 150)
Q1 = 165
Q3 = Size of (3(N + 1)/4)th item = size of (3(10 + 1)/4)th item = Size of 8.25th item
= Size of 8th item + ¼ (size of 9th item – size of 8th item) = 192 + ¼ (200 – 192)
Q3 = 194
QD = (Q3 – Q1)/2 = (194 – 165)/2 = 29/2 = 14.5
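The positional rule used in the two examples above can be sketched in Python. This is only an illustration: the function names `item_at` and `range_and_qd` are made up here, and the sketch assumes at least three observations so that the (3(n + 1)/4)th position exists.

```python
def item_at(sorted_values, position):
    """Size of the item at a 1-based, possibly fractional, position."""
    whole = int(position)            # e.g. 2 for the 2.75th item
    fraction = position - whole      # e.g. 0.75
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    # Interpolate between the two neighbouring items.
    return lower + fraction * (upper - lower)


def range_and_qd(values):
    """Return (Range, Q1, Q3, Q.D.) using the (n + 1)/4 positional rule."""
    data = sorted(values)
    n = len(data)
    q1 = item_at(data, (n + 1) / 4)
    q3 = item_at(data, 3 * (n + 1) / 4)
    return data[-1] - data[0], q1, q3, (q3 - q1) / 2


observations = [20, 25, 29, 30, 35, 39, 41, 48, 51, 60, 70]
wages = [120, 150, 170, 180, 181, 187, 190, 192, 200, 210]
print(range_and_qd(observations))
print(range_and_qd(wages))
```

Library routines for quantiles often use other interpolation conventions, so their results may differ slightly from this textbook rule.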

12 Calculation of Range for a frequency distribution
Discrete Series: Calculate the Range of the following series:
Size: 10 11 12 13 14 15 16 18
Frequency: 1 24 20
Here, H = 18 and L = 10, where H denotes the highest size and L the lowest. Range depends only on the sizes, not on the frequencies.
Range (R) = H – L = 18 – 10 = 8 (Ans)

13 Calculation of Range for a frequency distribution
Continuous Series: Find out the Range of the following series:
Size       Mid-value   Frequency
5 – 10     7.5         4
10 – 15    12.5        9
15 – 20    17.5        15
20 – 25    22.5        30
25 – 30    27.5        40
Range = H – L = 27.5 – 7.5 = 20 (Ans), where H and L are here the highest and lowest mid-values.

14 Calculation of Q.D. for grouped data
The following data shows daily wages of 199 workers of a factory. Find out the Q.D.:
Wages            10   20   30   40   50   60   70   80   90   100
No. of Workers    2    8   20   35   42   20   28   26   16     2

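15 Q1 = Size of ((N + 1)/4)th item = (199 + 1)/4 = 200/4 = 50th item
Wages   Frequency   Cumulative Frequency
10      2           2
20      8           10
30      20          30
40      35          65
50      42          107
60      20          127
70      28          155
80      26          181
90      16          197
100     2           199
N = 199
Q1 = Size of ((N + 1)/4)th item = (199 + 1)/4 = 200/4 = 50th item. The 50th item falls within cumulative frequency 65, so Q1 = 40.
Q3 = Size of (3(N + 1)/4)th item = 3(199 + 1)/4 = 600/4 = 150th item. The 150th item falls within cumulative frequency 155, so Q3 = 70.
QD = (Q3 – Q1)/2 = (70 – 40)/2 = 30/2 = 15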
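The cumulative-frequency lookup used here can be sketched as follows. The helper name `item_size` is hypothetical, and the frequencies for wages 30, 60 and 100 (20, 20 and 2) are the ones implied by the cumulative-frequency column and the total of 199.

```python
from itertools import accumulate


def item_size(sizes, frequencies, position):
    """Size of the first item whose cumulative frequency reaches `position`."""
    for size, cum_freq in zip(sizes, accumulate(frequencies)):
        if cum_freq >= position:
            return size
    raise ValueError("position exceeds the total frequency")


wages = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
workers = [2, 8, 20, 35, 42, 20, 28, 26, 16, 2]

n = sum(workers)                                   # 199
q1 = item_size(wages, workers, (n + 1) / 4)        # 50th item
q3 = item_size(wages, workers, 3 * (n + 1) / 4)    # 150th item
qd = (q3 - q1) / 2
print(q1, q3, qd)
```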

16 Calculation of Q.D. for a frequency distribution
Continuous Series: Find out the Quartile Deviation of the following series:
Age (Yrs.)   0 – 20   20 – 40   40 – 60   60 – 80   80 – 100
Persons      4        10        15        20        11

17 Q1 = Size of (N/4)th item = 60/4 = 15th item
Age (Years)   No. of Persons   Cumulative Frequency
0 – 20        4                4
20 – 40       10               14
40 – 60       15               29
60 – 80       20               49
80 – 100      11               60
Q1 = Size of (N/4)th item = 60/4 = 15th item
The 15th item lies in the group 40 – 60, since it falls within cumulative frequency 29.
Q1 = l1 + ((N/4 – c.f.) / f) × i
(l1 = lower limit of the quartile class, N = sum total of the frequencies, c.f. = cumulative frequency of the class preceding the first quartile class, f = frequency of the quartile class, i = class interval)
Thus, Q1 = 40 + ((15 – 14) / 15) × 20 = 41.33

18 Q3 = Size of 3(N/4)th item = 3(60/4) = 45th item
The 45th item lies in the group 60 – 80, since it falls within cumulative frequency 49.
Q3 = l1 + ((3(N/4) – c.f.) / f) × i
(l1 = lower limit of the quartile class, N = sum total of the frequencies, c.f. = cumulative frequency of the class preceding the third quartile class, f = frequency of the quartile class, i = class interval)
Thus, Q3 = 60 + ((45 – 29) / 20) × 20 = 76
Having found the values of Q1 and Q3, the Quartile Deviation is:
QD = (Q3 – Q1)/2 = (76 – 41.33)/2 ≈ 17.34
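A minimal sketch of the interpolation formula l1 + ((position – c.f.) / f) × i, applied to the age distribution. The function name `grouped_quantile` and the tuple representation of classes are assumptions made for this illustration.

```python
def grouped_quantile(classes, frequencies, position):
    """Interpolate inside the class whose cumulative frequency reaches `position`."""
    cum_before = 0                       # c.f. of the preceding class
    for (lower, upper), f in zip(classes, frequencies):
        if cum_before + f >= position:
            return lower + (position - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("position exceeds the total frequency")


ages = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
persons = [4, 10, 15, 20, 11]

n = sum(persons)                                  # 60
q1 = grouped_quantile(ages, persons, n / 4)       # 15th item
q3 = grouped_quantile(ages, persons, 3 * n / 4)   # 45th item
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2))
```

Without rounding Q1 to 41.33 first, the quartile deviation comes out at about 17.33 rather than 17.34.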

19 MEASURES OF DISPERSION FROM AVERAGE
Recall that dispersion was defined as the extent to which values differ from their average. Range and Quartile Deviation do not measure how far the values are from their average. Yet, by calculating the spread of values, they do give a good idea about the dispersion. Two measures which are based upon the deviation of the values from their average are Mean Deviation and Standard Deviation. Since the average is a central value, some deviations are positive and some are negative. If these are added as they are, the sum will not reveal anything. In fact, the sum of deviations from the Arithmetic Mean is always zero. Mean Deviation overcomes this problem by ignoring the signs of the deviations, i.e., it considers all deviations positive. For Standard Deviation, the deviations are first squared and averaged, and then the square root of the average is found. We shall now discuss them separately in detail.
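The statement that deviations from the Arithmetic Mean always sum to zero can be checked in one line. The symbols x_i (the values), n (their number) and x̄ (their mean) are notation assumed here, not taken from the slides. The second line uses the Family I incomes from slide 3, in thousands.

```latex
\[
\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0
\]
\[
(12 - 15) + (14 - 15) + (16 - 15) + (18 - 15) = -3 - 1 + 1 + 3 = 0
\]
```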

20 Mean Deviation
Mean Deviation is the arithmetic average of the deviations of all the values taken from some average value (mean, median, mode) of the series, ignoring signs (+ or -) of the deviations.
Clark and Schkade: “Mean deviation is the arithmetic average of deviations of all the values taken from a statistical average (mean, median or mode) of series. In taking deviation of values, algebraic signs + and - are not taken into consideration, that is, negative deviations are also treated as positive deviations.”
Suppose a college is proposed for students of five towns A, B, C, D and E which lie in that order along a road. Distances of the towns in kilometers from town A and the number of students in these towns are given below:

21 Town    Distance from town A (km)   No. of Students
A          0                           90
B          2                           150
C          6                           100
D          14                          200
E          18                          80
Total                                  620

22 MEAN DEVIATION (classification chart)
UNGROUPED DATA: from arithmetic mean; from median
GROUPED DATA, DISCRETE SERIES: from median; from arithmetic mean
GROUPED DATA, CONTINUOUS SERIES: from median; from arithmetic mean
Methods shown under the branches: direct method, assumed mean method, short-cut method

23 MEAN DEVIATION (classification chart with formulas)
UNGROUPED DATA: from arithmetic mean; from median
GROUPED DATA, DISCRETE SERIES: from median; from arithmetic mean
GROUPED DATA, CONTINUOUS SERIES: from median; from arithmetic mean
Formula shown under each of the six branches: Σ|dx| / n

24 CALCULATION OF MEAN DEVIATION FOR UNGROUPED DATA
STEPS (FROM MEDIAN)
I. Arrange the data in ascending order.
II. Calculate the median: Me = Size of ((N + 1)/2)th item.
III. Find out the deviations of the items from the median. Ignore (-) and (+) signs. Express them by the |dm| sign.
IV. Sum up the deviations.
V. Use the relevant formula to find the required answer: MDm = Σ|dm| / N
STEPS (FROM MEAN)
I. Calculate the arithmetic mean by adding up the data and dividing by the number of items.
II. Find out the deviations of the items from the mean. Ignore (-) and (+) signs. Express them by the |dx| sign.
III. Sum up the deviations.
IV. Use the relevant formula to find the required answer: MDx = Σ|dx| / N

25 Calculation of Mean Deviation from Arithmetic Mean for ungrouped data – Direct Method
Steps:
I. The Arithmetic Mean of the values is calculated.
II. The difference between each value and the Arithmetic Mean is calculated. All differences are considered positive. These are denoted as |d|.
III. The Arithmetic Mean of these differences (called deviations) is the Mean Deviation.

26 Example
X     |dx| = |X – X̄|
2     4
4     2
7     1
8     2
9     3
ΣX = 30   Σ|dx| = 12
The A.M. = ΣX / N = 30 / 5 = 6
The M.D.x = Σ|dx| / N = 12 / 5 = 2.4

27 Calculation of Mean Deviation from Arithmetic Mean for ungrouped data – Assumed Mean Method
Mean Deviation can also be calculated by calculating deviations from an assumed mean. This method is adopted especially when the actual mean is a fractional number. (Take care that the assumed mean is close to the true mean). For the values in example 3, suppose value 7 is taken as assumed mean, M.D. can be calculated as under:

28 Example
X     |dx| = |X – Ax|
2     5
4     3
7     0
8     1
9     2
Σ|dx| = 11
In such cases, the following formula is used:
MDx = [Σ|dx| + (X̄ – Ax)(ΣfB – ΣfA)] / n

29 Where
Σ|dx| is the sum of absolute deviations taken from the assumed mean.
X̄ is the actual mean.
Ax is the assumed mean used to calculate deviations.
ΣfB is the number of values below the actual mean, including the actual mean.
ΣfA is the number of values above the actual mean.
Substituting the values in the above formula:
MDx = [11 + (6 – 7)(2 – 3)] / 5 = 12 / 5 = 2.4
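The direct method and the assumed-mean formula can be compared in a short Python sketch. The values 2, 4, 7, 8, 9 are read from the example tables (they sum to 30 and give Σ|dx| = 12 from the mean and 11 from the assumed mean 7). The function names are hypothetical. The guard reflects an assumption worth stating: the correction term is only exact when no value lies between the actual mean and the assumed mean, which is why the assumed mean should be close to the true mean.

```python
def mean_deviation(values):
    """Direct method: average absolute deviation from the actual mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def mean_deviation_assumed(values, assumed):
    """Short-cut: deviations from an assumed mean plus a correction term."""
    n = len(values)
    mean = sum(values) / n
    # The correction is only exact if no value separates the two means.
    if assumed >= mean:
        blocked = any(mean < x < assumed for x in values)
    else:
        blocked = any(assumed < x <= mean for x in values)
    if blocked:
        raise ValueError("choose an assumed mean closer to the actual mean")
    total_abs = sum(abs(x - assumed) for x in values)   # Σ|dx|
    below = sum(1 for x in values if x <= mean)         # ΣfB
    above = n - below                                   # ΣfA
    return (total_abs + (mean - assumed) * (below - above)) / n


values = [2, 4, 7, 8, 9]
print(mean_deviation(values))              # direct method
print(mean_deviation_assumed(values, 7))   # assumed mean 7
```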

30 Mean Deviation from median for ungrouped data
Direct Method
M.D. from the Median can be calculated as follows:
I. Calculate the median, which is 4400 in this case.
II. Calculate the absolute deviations from the median and denote them as |d|.
III. Find the average of these absolute deviations. It is the Mean Deviation.

31 Mean Deviation from median for ungrouped data
Income (X)   Deviation from Median 4400, |dx|
4000         400
4200         200
4400         0
4600         200
4800         400
Σ|dx| = 1200
Median = Size of ((N + 1)/2)th item = (5 + 1)/2 = 3rd item
Size of the third item is 4400.
M.D.median = Σ|dx| / n = 1200 / 5 = 240
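The same calculation as a small Python sketch, using `statistics.median` from the standard library. The function name `mean_deviation_from_median` is made up for this illustration.

```python
from statistics import median


def mean_deviation_from_median(values):
    """Average absolute deviation of the values from their median."""
    med = median(values)
    return sum(abs(x - med) for x in values) / len(values)


incomes = [4000, 4200, 4400, 4600, 4800]
print(median(incomes))                      # the 3rd of 5 ordered items
print(mean_deviation_from_median(incomes))
```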

32 Mean Deviation from median for ungrouped data
Short-cut method
To calculate Mean Deviation by the short-cut method, a value (A) is used to calculate the deviations and the following formula is applied:
MDmedian = [Σ|dx| + (Median – A)(ΣfB – ΣfA)] / n
Where A = the constant from which deviations are calculated. (Other notations are the same as given in the assumed mean method.)

33 Mean Deviation from mean for discrete series
Height     158  159  160  161  162  163  164  165  166
Students    15   20   32   35   33   22   20   10    8

34 Mean Deviation from mean for discrete series (computation)
Height   Students (f)   d (Ax = 162)   fd     |d'| (from mean, signs ignored)   f|d'|
158      15             -4             -60    3.51                              52.65
159      20             -3             -60    2.51                              50.20
160      32             -2             -64    1.51                              48.32
161      35             -1             -35    0.51                              17.85
162      33             0              0      0.49                              16.17
163      22             +1             +22    1.49                              32.78
164      20             +2             +40    2.49                              49.80
165      10             +3             +30    3.49                              34.90
166      8              +4             +32    4.49                              35.92
Σf = 195   Σfd = -95   Σf|d'| = 338.59
X̄ = A + Σfd / Σf = 162 + (-95)/195 = 161.51
Mean Deviation (mean) = Σf|d'| / Σf = 338.59 / 195 = 1.74
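A sketch of this computation in Python: the mean is first obtained from the assumed mean 162, and the absolute deviations from that mean are then weighted by the frequencies. The function name is hypothetical, and the frequency 20 for height 164 is the value implied by fd = +40 and Σf = 195.

```python
def discrete_mean_deviation(sizes, frequencies, assumed):
    """Return (mean, mean deviation from mean) for a discrete series."""
    n = sum(frequencies)
    # Mean via the assumed-mean shortcut: A + Σfd / Σf
    sum_fd = sum(f * (x - assumed) for x, f in zip(sizes, frequencies))
    mean = assumed + sum_fd / n
    # Mean deviation: Σf|d'| / Σf, with d' taken from the actual mean
    sum_f_abs = sum(f * abs(x - mean) for x, f in zip(sizes, frequencies))
    return mean, sum_f_abs / n


heights = [158, 159, 160, 161, 162, 163, 164, 165, 166]
students = [15, 20, 32, 35, 33, 22, 20, 10, 8]

mean, md = discrete_mean_deviation(heights, students, assumed=162)
print(round(mean, 2), round(md, 2))
```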

35 Mean Deviation from median for discrete series
X    10   11   12   13   14
f     3   12   18   12    3

36 Median = size of ((N + 1)/2)th item = (48 + 1)/2 = 49/2 = 24.5th item
x     f     |d|   f|d|   c.f.
10    3     2     6      3
11    12    1     12     15
12    18    0     0      33
13    12    1     12     45
14    3     2     6      48
Σf = 48   Σf|d| = 36
The 24.5th item falls within cumulative frequency 33; therefore the median is 12.
Mean Deviation (median) = Σf|d| / Σf = 36 / 48 = 0.75

37 Mean Deviation from Mean for Continuous distribution
Steps:
I. Calculate the mean of the distribution.
II. Calculate the absolute deviations |d| of the class midpoints from the mean.
III. Multiply each |d| value with its corresponding frequency to get f|d| values. Sum them up to get Σf|d|.
IV. Apply the following formula: M.D.x = Σf|d| / Σf

38 Example: Profit of companies (Rs in Lakhs)
Class interval   Number of Companies
10 – 20          5
20 – 30          8
30 – 50          16
50 – 70          8
70 – 80          3
Total            40

39 A.M. = Σfm / Σf = 1620 / 40 = 40.5; M.D.x = Σf|d| / Σf = 519 / 40 = 12.975
Class Interval   f    Mid point m   fm     |d|    f|d|
10 – 20          5    15            75     25.5   127.5
20 – 30          8    25            200    15.5   124.0
30 – 50          16   40            640    0.5    8.0
50 – 70          8    60            480    19.5   156.0
70 – 80          3    75            225    34.5   103.5
Σf = 40   Σfm = 1620   Σf|d| = 519.0

40 Mean Deviation from median for Continuous distribution
The procedure to calculate Mean Deviation from the median is the same as for M.D. from the Mean, except that deviations are taken from the median, as given below:
Class Interval   f    Mid point m   c.f.   |d|   f|d|
20 – 30          5    25            5      25    125
30 – 40          10   35            15     15    150
40 – 60          20   50            35     0     0
60 – 80          9    70            44     20    180
80 – 90          6    85            50     35    210
Σf = 50   Σf|d| = 665

41 Median = L + ((N/2 – c.f.) / f) × i
Median = 40 + ((25 – 15) / 20) × 20 = 50
Mean Deviation (median) = Σf|d| / Σf = 665 / 50 = 13.3
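The two steps for a continuous distribution, interpolating the median and then averaging the weighted absolute deviations of the class midpoints, can be sketched as below. The function names and the tuple representation of classes are assumptions for illustration.

```python
def grouped_median(classes, frequencies):
    """Median = L + ((N/2 - c.f.) / f) * i for the median class."""
    half = sum(frequencies) / 2
    cum_before = 0                      # c.f. of the class before the median class
    for (lower, upper), f in zip(classes, frequencies):
        if cum_before + f >= half:
            return lower + (half - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("empty distribution")


def grouped_md_from_median(classes, frequencies):
    """Σf|d| / Σf with |d| measured from class midpoints to the median."""
    med = grouped_median(classes, frequencies)
    total = sum(f * abs((lo + hi) / 2 - med)
                for (lo, hi), f in zip(classes, frequencies))
    return total / sum(frequencies)


classes = [(20, 30), (30, 40), (40, 60), (60, 80), (80, 90)]
freqs = [5, 10, 20, 9, 6]
print(grouped_median(classes, freqs))
print(grouped_md_from_median(classes, freqs))
```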

42 Standard Deviation Standard Deviation is the positive square root of the mean of squared deviations from mean. So if there are five values x1, x2, x3, x4 and x5, first their mean is calculated. Then deviations of the values from mean are calculated. These deviations are then squared. The mean of these squared deviations is the variance. Positive square root of the variance is the standard deviation. (Note that Standard Deviation is calculated on the basis of the mean only).

43 STANDARD DEVIATION (classification chart)
UNGROUPED DATA: Actual Mean Method, Assumed Mean Method, Direct Method, Step Deviation Method
GROUPED DATA, DISCRETE SERIES: Actual Mean Method, Assumed Mean Method, Step Deviation Method
GROUPED DATA, CONTINUOUS SERIES: Actual Mean Method, Assumed Mean Method, Step Deviation Method

44 Calculation of Standard Deviation for ungrouped data
Four alternative methods are available for the calculation of the standard deviation of individual values. All these methods result in the same value of standard deviation. These are:
Actual Mean Method
Assumed Mean Method
Direct Method
Step-Deviation Method

45 Actual Mean Method
Suppose you have to calculate the standard deviation of the following values: 5, 10, 25, 30, 50. Their mean is X̄ = 120 / 5 = 24.
X     d (X – X̄)   d²
5     -19         361
10    -14         196
25    +1          1
30    +6          36
50    +26         676
Σd² = 1270

46 Assumed Mean Method
For the same values, deviations may be calculated from any arbitrary value Ax such that d = X – Ax. Taking Ax = 25, the computation of the standard deviation is shown below:
X     d (X – Ax)   d²
5     -20          400
10    -15          225
25    0            0
30    +5           25
50    +25          625
Σd = -5   Σd² = 1275

47 Direct Method
Standard Deviation can also be calculated from the values directly, i.e., without taking deviations, as shown below:
X     X²
5     25
10    100
25    625
30    900
50    2500
ΣX = 120   ΣX² = 4150

48 Step-deviation Method
If the values are divisible by a common factor, they can be so divided and the standard deviation can be calculated from the resultant values as follows (here the common factor is 5):
X     X'    d (X' – X̄')   d²
5     1     -3.8          14.44
10    2     -2.8          7.84
25    5     +0.2          0.04
30    6     +1.2          1.44
50    10    +5.2          27.04
ΣX' = 24   Σd² = 50.80
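The four methods can be checked against each other in a short Python sketch. The formulas below are the standard textbook ones for each method and are assumed to match those on the slides, whose formula images did not survive in this transcript. Function names are hypothetical.

```python
from math import sqrt


def sd_actual_mean(xs):
    """sqrt(Σd² / n) with d = X - mean."""
    mean = sum(xs) / len(xs)
    return sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


def sd_assumed_mean(xs, assumed):
    """sqrt(Σd² / n - (Σd / n)²) with d = X - assumed mean."""
    n = len(xs)
    d = [x - assumed for x in xs]
    return sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


def sd_direct(xs):
    """sqrt(ΣX² / n - mean²), no deviations needed."""
    n = len(xs)
    return sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)


def sd_step_deviation(xs, factor):
    """Divide by a common factor, compute the S.D., then scale back."""
    return factor * sd_actual_mean([x / factor for x in xs])


values = [5, 10, 25, 30, 50]
results = [
    sd_actual_mean(values),          # Σd² = 1270
    sd_assumed_mean(values, 25),     # Σd = -5, Σd² = 1275
    sd_direct(values),               # ΣX = 120, ΣX² = 4150
    sd_step_deviation(values, 5),    # Σd² = 50.80 on the divided values
]
print([round(r, 3) for r in results])
```

Each method reduces to the square root of 254, about 15.94, which illustrates the claim that all four give the same standard deviation.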

49 Standard Deviation in Continuous frequency distribution
As with ungrouped data, S.D. can be calculated for grouped data by any of the following methods:
Actual Mean Method
Assumed Mean Method
Step-Deviation Method

50 Actual Mean Method
The following steps are required:
I. Calculate the mean of the distribution: mean = Σfm / Σf = 1620/40 = 40.5
II. Calculate deviations of the mid-values from the mean so that d = m – mean.
III. Multiply the deviations with their corresponding frequencies to get ‘fd’ values.
IV. Calculate ‘fd²’ values by multiplying ‘fd’ values with ‘d’ values. Sum these up to get Σfd².
V. Apply the formula.

51 CI        f    m    fm     d       fd       fd²
10 – 20      5    15   75     -25.5   -127.5   3251.25
20 – 30      8    25   200    -15.5   -124.0   1922.00
30 – 50      16   40   640    -0.5    -8.0     4.00
50 – 70      8    60   480    +19.5   +156.0   3042.00
70 – 80      3    75   225    +34.5   +103.5   3570.75
Σf = 40   Σfm = 1620   Σfd² = 11790

52 Assumed Mean Method
For the values in the earlier example, standard deviation can be calculated by taking deviations from an assumed mean (say 40). The following steps are required:
I. Calculate the mid-points of the classes.
II. Calculate deviations of the mid-points from an assumed mean such that d = m – Assumed mean. Assumed Mean = 40.
III. Multiply values of ‘d’ with corresponding frequencies to get ‘fd’ values. (Note that the total of this column is not zero, since deviations have been taken from the assumed mean.)
IV. Multiply ‘fd’ values with ‘d’ values to get ‘fd²’ values. Find Σfd².
V. Standard Deviation can then be calculated by the formula.

53 CI        f    m    d     fd     fd²
10 – 20      5    15   -25   -125   3125
20 – 30      8    25   -15   -120   1800
30 – 50      16   40   0     0      0
50 – 70      8    60   +20   +160   3200
70 – 80      3    75   +35   +105   3675
Σf = 40   Σfd = +20   Σfd² = 11800

54 Step-deviation Method
In case the values of deviations are divisible by a common factor, the calculations can be simplified by the step-deviation method. Steps required:
I. Calculate class mid-points (Col. 3) and deviations from an arbitrarily chosen value, just like in the assumed mean method. In this example, deviations have been taken from the value 40 (Col. 4).
II. Divide the deviations by a common factor denoted as ‘c’; c = 5 in this example. The values so obtained are ‘d'’ values (Col. 5).
III. Multiply ‘d'’ values with corresponding ‘f’ values (Col. 2) to obtain ‘fd'’ values (Col. 6).
IV. Multiply ‘fd'’ values with ‘d'’ values to get ‘fd'²’ values (Col. 7).
V. Sum up values in Col. 6 and Col. 7 to get Σfd' and Σfd'² values.
VI. Apply the step-deviation formula.

55 CI        f    m    d     d'    fd'    fd'²
10 – 20      5    15   -25   -5    -25    125
20 – 30      8    25   -15   -3    -24    72
30 – 50      16   40   0     0     0      0
50 – 70      8    60   +20   +4    +32    128
70 – 80      3    75   +35   +7    +21    147
Σf = 40   Σfd' = +4   Σfd'² = 472
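The three grouped-data methods can be written as one Python function and compared. This is a sketch under the standard formula σ = c × √(Σfd'²/Σf – (Σfd'/Σf)²), assumed to match the formulas on the slides, which are not preserved in this transcript. The name `grouped_sd` and its parameters `origin` and `factor` are hypothetical.

```python
from math import sqrt


def grouped_sd(mids, freqs, origin=None, factor=1):
    """S.D. of a frequency distribution from class midpoints.

    origin=None  -> actual mean method (deviations from the true mean)
    origin=A     -> assumed mean method
    factor=c     -> step-deviation method (deviations divided by c)
    """
    n = sum(freqs)
    if origin is None:
        origin = sum(f * m for f, m in zip(freqs, mids)) / n
    steps = [(m - origin) / factor for m in mids]            # d or d'
    sum_fd = sum(f * d for f, d in zip(freqs, steps))        # Σfd
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))   # Σfd²
    return factor * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


classes = [(10, 20), (20, 30), (30, 50), (50, 70), (70, 80)]
freqs = [5, 8, 16, 8, 3]
mids = [(lo + hi) / 2 for lo, hi in classes]

actual = grouped_sd(mids, freqs)                       # Σfd² = 11790
assumed = grouped_sd(mids, freqs, origin=40)           # Σfd = 20, Σfd² = 11800
step = grouped_sd(mids, freqs, origin=40, factor=5)    # Σfd' = 4, Σfd'² = 472
print(round(actual, 3), round(assumed, 3), round(step, 3))
```

All three calls reduce to the square root of 294.75, about 17.17, matching the column totals in the three tables.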

