SESSION 19 & 20 Last Update 16th March 2011 Measures of Dispersion Measures of Variability - Grouped Data -


1 SESSION 19 & 20 Last Update 16th March 2011 Measures of Dispersion Measures of Variability - Grouped Data -

2 Lecturer: Florian Boehlandt
University: University of Stellenbosch Business School
Domain: http://www.hedge-fund-analysis.net/pages/vega.php

3 Learning Objectives
All measures for grouped data:
1. Measures of relative standing: Median, Quartiles, Deciles and Percentiles
2. Measures of dispersion: Range
3. Measures of variability: Variance and Standard Deviation
4. Empirical Rule and Chebycheff’s Theorem
5. Coefficient of Variation

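4 Percentiles
We can determine any percentile for grouped data using the following formula:
$$P_P = O_{LP} + C \cdot \frac{(n + 1) \cdot \frac{P}{100} - f(<)}{f_{LP}}$$
For quartiles, the formula ‘simplifies’ to:
$$Q_m = O_{LP} + C \cdot \frac{(n + 1) \cdot \frac{m}{4} - f(<)}{f_{LP}}$$
where m = 1, 2, 3 or 4 for the first, second, third and fourth quartile.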

5 Calculation of Percentile
1. Calculate the less-than cumulative frequencies f(<) from the observed frequencies f
2. Use the following formula to determine the location of the P-th percentile: L_p = (n + 1) * (P / 100)
3. Locate the interval L_p falls into

6 Calculation of Percentile
4. Determine the following parameters
5. Apply the formula for the P-th percentile

P       The percentile (e.g. 25 for the first quartile)
n       Sample size
O_LP    The lower limit of the interval L_p falls into
C       Class width
f(<)    The cumulative frequency of the interval preceding the one L_p falls into
f_LP    The observed frequency of the interval L_p falls into
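The five steps above can be sketched in Python. This is a minimal illustration, not the course's official method: it assumes the percentile is obtained by linear interpolation within the located interval, i.e. O_LP + C * (L_p - f(<)) / f_LP, and the function name `grouped_percentile` and its arguments are made up for this example.

```python
from itertools import accumulate


def grouped_percentile(lower_limits, freqs, width, p):
    """Estimate the p-th percentile of grouped data.

    Assumption: linear interpolation inside the interval that contains
    the location L_p = (n + 1) * (p / 100).
    """
    if not 0 < p <= 100:
        raise ValueError("p must lie in (0, 100]")
    n = sum(freqs)
    cum = list(accumulate(freqs))      # step 1: less-than cumulative frequencies f(<)
    l_p = (n + 1) * p / 100            # step 2: location of the percentile
    # Step 3: first interval whose cumulative frequency reaches L_p.
    # If L_p exceeds n, the loop ends on the last interval.
    for i, c in enumerate(cum):
        if l_p <= c:
            break
    # Step 4: parameters of that interval.
    o_lp = lower_limits[i]                  # lower limit O_LP
    f_lp = freqs[i]                         # observed frequency f_LP
    f_prev = cum[i - 1] if i > 0 else 0     # f(<) of the preceding interval
    # Step 5: interpolate within the interval.
    return o_lp + width * (l_p - f_prev) / f_lp


# Data of the worked example: intervals 40 to <50, ..., 80 to <90.
lower_limits = [40, 50, 60, 70, 80]
freqs = [6, 14, 11, 6, 3]
print(round(grouped_percentile(lower_limits, freqs, 10, 25), 2))
```

Passing p = 50 or p = 75 gives the median and the third quartile from the same function, since quartiles are just the 25th, 50th and 75th percentiles.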

7 Percentile: An example
Let us assume the following grouped data is to be assessed:

Interval    f     f(<)
40 to 49    6     6
50 to 59    14    20
60 to 69    11    31
70 to 79    6     37
80 to 89    3     40

C = 10, n = 40
C = Upper + 1 – Lower
C = 49 + 1 – 40 = 10

8 Percentile: An example
If the data is interval-scaled (student marks approximately are), writing the intervals with inequalities may be more appropriate. This example comes from your student manual. The intervals in the second table, which include inequalities, may be somewhat more intuitive.

Interval    f     f(<)
40 to 49    6     6
50 to 59    14    20
60 to 69    11    31
70 to 79    6     37
80 to 89    3     40

C = 10, n = 40
C = Upper + 1 – Lower
C = 49 + 1 – 40 = 10

Interval      f     f(<)
40 to < 50    6     6
50 to < 60    14    20
60 to < 70    11    31
70 to < 80    6     37
80 to < 90    3     40

C = 10, n = 40
C = Upper – Lower
C = 50 – 40 = 10

9 Solution – Step 1
Use the location formula to determine which interval the 25th percentile (the first quartile) falls into:
L_p = (40 + 1) * (25 / 100) = 10.25
Since 6 < 10.25 < 20, the percentile interval is 50 to < 60. Beware that the interval is to be looked up in the cumulative frequency column f(<), not the interval column!

Interval      f     f(<)
40 to < 50    6     6
50 to < 60    14    20
60 to < 70    11    31
70 to < 80    6     37
80 to < 90    3     40

C = 10, n = 40, P = 25

10 Solution – Step 2
Read off the parameters required for the percentile formula for grouped data.

Interval      f     f(<)
40 to < 50    6     6
50 to < 60    14    20
60 to < 70    11    31
70 to < 80    6     37
80 to < 90    3     40

C = 10, n = 40, P = 25, L_p = 10.25, O_LP = 50, f_LP = 14, f(<) = 6

The formula:
$$P_{25} = O_{LP} + C \cdot \frac{L_p - f(<)}{f_{LP}}$$
now yields:
$$P_{25} = 50 + 10 \cdot \frac{10.25 - 6}{14} \approx 53.04$$
It is left as an exercise to confirm that the formula for Q yields the same result.

11 Variance Using the midpoints allows us to calculate the variance of grouped data as well. In the case of interval data, as with the mean, the original data is to be preferred to the grouped data. For ordinal or nominal data the variance has no probabilistic meaning! Measures of relative standing (i.e. percentiles) may be used for ordinal data. There are no measures of variability for nominal data (Example: 1 = married, 2 = single, 3 = divorced, 4 = widowed).

12 Calculation of Variance
1. Determine the interval midpoints x
2. Multiply the observed frequencies f with the interval midpoints (fx)
3. Sum the results from 2. and divide by n
(Steps 1 to 3 are identical to calculating the mean for grouped data)
4. Square x and multiply by f, yielding $fx^2$

13 Calculation of Variance
6. Use the following formula to determine the variance for grouped data (sample):
$$s^2 = \frac{\sum f x^2 - n \bar{x}^2}{n - 1}$$
And for the population:
$$\sigma^2 = \frac{\sum f x^2 - N \mu^2}{N}$$
Note that x denotes the midpoints here and not the actual observations.

14 Variance: An example
Let us assume the following grouped data is to be assessed:

Interval      f
40 to < 49    6
50 to < 59    14
60 to < 69    11
70 to < 79    6
80 to < 89    3

C = 9, n = 40

15 Solution – Step 1

Interval      f     x       fx        x^2        fx^2
40 to < 49    6     44.5    267.0     1980.25    11881.50
50 to < 59    14    54.5    763.0     2970.25    41583.50
60 to < 69    11    64.5    709.5     4160.25    45762.75
70 to < 79    6     74.5    447.0     5550.25    33301.50
80 to < 89    3     84.5    253.5     7140.25    21420.75
Total         40            2440.0               153950.00
Average                     61.0

Inputs for the variance formula: mean = 61, n = 40, sum of fx^2 = 153950

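16 Solution – Step 2
Using the formula for the sample variance yields:
$$s^2 = \frac{153950 - 40 \cdot 61^2}{40 - 1} = \frac{5110}{39} \approx 131.03$$
As before, the square root yields the standard deviation:
$$s = \sqrt{131.03} \approx 11.45$$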
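The variance calculation for grouped data can be sketched in Python as follows. The sketch assumes the computational form (sum of fx^2 minus n times the squared mean, divided by n - 1 for a sample or n for a population). The function name `grouped_variance` is illustrative only.

```python
import math


def grouped_variance(midpoints, freqs, sample=True):
    """Variance of grouped data, using interval midpoints as x.

    Assumption: computational formula (sum f*x^2 - n * mean^2) / d,
    with d = n - 1 for a sample and d = n for a population.
    """
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))       # steps 2 and 3
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))  # step 4, summed
    mean = sum_fx / n
    denominator = n - 1 if sample else n
    return (sum_fx2 - n * mean ** 2) / denominator


# Midpoints and frequencies from the example table.
midpoints = [44.5, 54.5, 64.5, 74.5, 84.5]
freqs = [6, 14, 11, 6, 3]

variance = grouped_variance(midpoints, freqs)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```

Setting `sample=False` divides by n instead of n - 1 and gives the population variance for the same table.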

17 Empirical Rule
In normal bell-shaped frequency distribution polygons, we find the following:
1. Approx. 68.2% of all observations fall within one standard deviation of the mean
2. Approx. 95.4% of all observations fall within two standard deviations of the mean
3. Approx. 99.7% of all observations fall within three standard deviations of the mean
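The three percentages can be checked numerically. The sketch below assumes an exactly normal distribution, for which the proportion within k standard deviations of the mean equals erf(k / sqrt(2)). The helper name `normal_proportion_within` is made up for this example.

```python
import math


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints the proportion as a percentage for k = 1, 2, 3.
    print(k, round(100 * normal_proportion_within(k), 1))
```

The results agree with the rule's figures to within rounding. Real data that are only roughly bell-shaped will deviate somewhat from these values.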

18 Chebycheff’s Theorem
Chebycheff’s Theorem is a more general alternative to the Empirical Rule, which applies to all shapes of histograms. The proportion of observations that lie within k standard deviations of the mean is at least:
$$1 - \frac{1}{k^2} \quad \text{for } k > 1$$
where k denotes the number of standard deviations away from the mean.

19 Chebycheff’s Theorem - Example

k        Formula        Chebycheff    Empirical
k = 1    not defined    n.a.          ≈ 68.2%
k = 2    1 – 1/4        ≥ 75%         ≈ 95.4%
k = 3    1 – 1/9        ≥ 88.9%       ≈ 99.7%
k = 4    1 – 1/16       ≥ 93.75%      n.a.

The Empirical Rule provides approximate proportions under the assumption of a bell-shaped normal distribution, whereas Chebycheff’s Theorem provides lower bounds on the proportions for any type of distribution. Consequently, Chebycheff allows for more of the observations to lie in the tail-ends of the distribution. Chebycheff is not relevant to your examination!
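The lower bounds in the table can be reproduced and compared with a non-normal data set. In this sketch, the data set `skewed` is invented purely for illustration, and the population standard deviation is used when counting observations. Both function names are hypothetical.

```python
import statistics


def chebycheff_bound(k):
    """Lower bound on the proportion within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only stated for k > 1")
    return 1 - 1 / k ** 2


def proportion_within(data, k):
    """Observed proportion of data within k population standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)


# Invented, strongly right-skewed data (not from the lecture).
skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10, 30]

for k in (2, 3, 4):
    print(k, chebycheff_bound(k), proportion_within(skewed, k))
```

For every k the observed proportion is at least as large as the bound, regardless of the shape of the data.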

20 Coefficient of Variation
The coefficient of variation of a set of observations is the standard deviation divided by their mean:
Sample: $$cv = \frac{s}{\bar{x}}$$
Population: $$CV = \frac{\sigma}{\mu}$$
By relating the standard deviation to the mean, one can make a statement about the relative variability of the data. Compare a standard deviation of 10 relative to a mean of 100 and relative to a mean of 1,000,000!
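The comparison suggested on this slide can be made concrete with a short sketch. The function name `coefficient_of_variation` is illustrative, and the two means are the ones quoted above.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a fraction of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a mean of 0")
    return std_dev / mean


# The same standard deviation of 10 against two very different means.
cv_small_mean = coefficient_of_variation(10, 100)
cv_large_mean = coefficient_of_variation(10, 1_000_000)
print(f"{cv_small_mean:.1%} versus {cv_large_mean:.3%}")
```

A standard deviation of 10 amounts to 10% of a mean of 100 but only 0.001% of a mean of 1,000,000, which is why the ratio says more about variability than the standard deviation alone.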

