1 Measures of Variability
Range
Interquartile range
Variance
Standard deviation
Coefficient of variation
2 Consider the sample of starting salaries of business graduates. We would like to know whether there is a low or high degree of variability, or dispersion, in the starting salaries received.
3 Range
The range is simply the difference between the largest and smallest values in the sample. It is the simplest measure of variability. Note, however, that the range is highly sensitive to the largest and smallest values: a single extreme observation can change it dramatically.
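A minimal Python sketch of the range calculation. The six salaries are the first six entries of the starting-salary table on slide 33; the extra value 5000 is hypothetical and is appended only to show how one extreme observation inflates the range.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    if not values:
        raise ValueError("the range is undefined for an empty sample")
    return max(values) - min(values)


# First six starting salaries from the z-score table (slide 33).
salaries = [2850, 2950, 3050, 2880, 2755, 2710]
print(data_range(salaries))            # 3050 - 2710 = 340

# Hypothetical extreme value appended to show the range's sensitivity.
print(data_range(salaries + [5000]))   # 5000 - 2710 = 2290
```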
4 Example: Apartment Rents
Seventy studio apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed in ascending order on the next slide.
6 Interquartile Range
The interquartile range (IQR) of a data set is the difference between the third quartile and the first quartile: IQR = $Q_3 - Q_1$. It is the range of the middle 50% of the data. Because it ignores the lowest 25% and the highest 25% of the values, it overcomes the range's sensitivity to extreme data values.
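A minimal Python sketch of the IQR, assuming the percentile rule suggested by the quiz on slide 14: sort the data, compute the index i = (p/100)n, round i up if it is not an integer, and average the values in positions i and i + 1 if it is. This is one textbook convention; other tools, including Excel's quartile functions, interpolate differently and can give slightly different quartiles. The function names and sample data are hypothetical.

```python
import math


def percentile(values, p):
    """pth percentile using the index rule i = (p/100) * n on sorted data.

    Assumed convention (1-based positions): if i is not an integer, round it
    up and take that position; if i is an integer, average positions i and i+1.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("percentile is undefined for an empty sample")
    i = p * n / 100
    if i.is_integer():
        i = int(i)
        return (data[i - 1] + data[i]) / 2   # average of positions i and i+1
    return data[math.ceil(i) - 1]            # position ceil(i)


def interquartile_range(values):
    """IQR = Q3 - Q1: the range of the middle 50% of the data."""
    return percentile(values, 75) - percentile(values, 25)


sample = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]           # hypothetical data, n = 10
print(percentile(sample, 25), percentile(sample, 75))  # 4 10
print(interquartile_range(sample))                     # 6

# Making the largest value extreme leaves the IQR unchanged.
print(interquartile_range(sample[:-1] + [150]))        # 6
```

The last line is the point of the IQR: the range of the modified sample jumps from 13 to 148, but the middle 50% of the data, and therefore the IQR, does not move.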
8 Variance
The variance is a measure of variability that uses all the data. It is based on the difference between each observation ($x_i$) and the mean ($\bar{x}$ for the sample and $\mu$ for the population).
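9 The variance is the average of the squared differences between the observations and the mean value. (For a sample, the sum of squared differences is divided by n - 1 rather than n.)
For the population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
For the sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
where N is the population size and n is the sample size.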
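A minimal Python sketch of the two variance formulas, written out term by term rather than calling a library. The data values are hypothetical, chosen so that the mean is 5 and the sum of squared deviations is 32.

```python
def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / N"""
    N = len(values)
    if N == 0:
        raise ValueError("the variance is undefined for an empty data set")
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)"""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical values; mean = 5
print(population_variance(data))    # 32 / 8 = 4.0
print(sample_variance(data))        # 32 / 7, about 4.571
```

The only difference between the two functions is the divisor, which is why the sample variance is always a little larger than the population formula applied to the same numbers.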
10 Standard Deviation
The standard deviation of a data set is the square root of the variance. It is measured in the same units as the data (the variance is in squared units), which makes it easier to interpret.
11 Computing a standard deviation
For the population: $\sigma = \sqrt{\sigma^2}$
For the sample: $s = \sqrt{s^2}$
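A minimal Python sketch of the standard deviation as the square root of the variance, with a cross-check against `statistics.pstdev` (population) and `statistics.stdev` (sample) from Python's standard library. The function name `standard_deviation` and the data values are hypothetical.

```python
import math
import statistics


def standard_deviation(values, sample=True):
    """Square root of the variance: s (divisor n - 1) or sigma (divisor N)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)


data = [2, 4, 4, 4, 5, 5, 7, 9]                  # hypothetical values
print(standard_deviation(data, sample=False))    # sigma = sqrt(4) = 2.0
print(standard_deviation(data))                  # s = sqrt(32/7), about 2.138

# Cross-check against Python's standard library.
print(statistics.pstdev(data), statistics.stdev(data))
```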
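12 Coefficient of Variation
The coefficient of variation expresses the standard deviation relative to the mean: divide the standard deviation by the mean and multiply by 100 to get a percentage. This makes it useful for comparing the variability of data sets with different means.
For the population: $\frac{\sigma}{\mu} \times 100\%$
For the sample: $\frac{s}{\bar{x}} \times 100\%$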
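A minimal Python sketch of the coefficient of variation built on the standard-library `statistics` module. The function name and the data values are hypothetical; the `sample` flag switches between $s/\bar{x}$ and $\sigma/\mu$.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """CV = (standard deviation / mean) * 100, expressed as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]                       # hypothetical values
print(coefficient_of_variation(data, sample=False))   # 2 / 5 * 100 = 40.0
print(round(coefficient_of_variation(data), 2))       # s / x_bar * 100, about 42.76
```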
13 The heights (in inches) of 25 individuals were recorded, and the following statistics were calculated: mean = 70, range = 20, mode = 73, variance = 784, median = 74. The coefficient of variation equals:
11.2%
1120%
0.4%
40%
14 If the index i (which is used to determine the location of the pth percentile) is not an integer, its value should be:
squared
divided by (n - 1)
rounded down
rounded up
15 Which of the following symbols represents the variance of the population?
$\sigma^2$
$\sigma$
$\mu$
16 Which of the following symbols represents the size of the sample?
17 The symbol s is used to represent:
the variance of the population
the standard deviation of the sample
the standard deviation of the population
the variance of the sample
18 The numerical value of the variance:
is always larger than the numerical value of the standard deviation
is always smaller than the numerical value of the standard deviation
is negative if the mean is negative
can be larger or smaller than the numerical value of the standard deviation
19 If the coefficient of variation is 40% and the mean is 70, then the variance is:
28
2800
1.75
784
21 Broker-Assisted: 100 Shares at $50 per Share
Range: 45.05
Interquartile range: 23.98
Variance: 190.67
Standard deviation: 13.8
Coefficient of variation: 38.02%
Percentile index i for the 25th percentile: 6
Percentile index i for the 75th percentile: 18
First quartile (Q1): 24.995
Third quartile (Q3): 48.975
Mean: 36.32
22 Online: 500 Shares at $50 per Share
Range: 57.50
Interquartile range: 11.475
Standard deviation: 11.859
Coefficient of variation: 57.949%
First quartile (Q1): 13.475
Third quartile (Q3): 24.95
Mean: 20.46
23 As measured by the interquartile range, variance, and standard deviation, the variability of commissions is greater for broker-assisted trades.
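A small Python sketch that recomputes the IQR and coefficient of variation from the summary values on slides 21 and 22 (the raw commission data are not in this transcript). It shows that the absolute measures (IQR, standard deviation) are larger for broker-assisted trades, while the CV, which scales by the mean, is larger for online trades. Small differences from the slide's CV values likely come from using the rounded mean and standard deviation. The dictionary layout and the function name `compare` are hypothetical.

```python
# Summary statistics copied from slides 21 and 22 (commissions, in dollars).
commissions = {
    "broker-assisted": {"mean": 36.32, "std_dev": 13.8, "q1": 24.995, "q3": 48.975},
    "online": {"mean": 20.46, "std_dev": 11.859, "q1": 13.475, "q3": 24.95},
}


def compare(summaries):
    """Return {name: (IQR, CV in percent)} from precomputed summary statistics."""
    return {
        name: (s["q3"] - s["q1"], s["std_dev"] / s["mean"] * 100)
        for name, s in summaries.items()
    }


for name, (iqr, cv) in compare(commissions).items():
    print(f"{name}: IQR = {iqr:.3f}, CV = {cv:.1f}%")
# broker-assisted: IQR = 23.980, CV = 38.0%
# online: IQR = 11.475, CV = 58.0%
```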
24 Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation
Formula Worksheet. Note: Rows 8-71 are not shown.
25 Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation
Value Worksheet. Note: Rows 8-71 are not shown.
26 Using Excel’s Descriptive Statistics Tool
Step 4: When the Descriptive Statistics dialog box appears:
Enter B1:B71 in the Input Range box
Select Grouped By Columns
Select Labels in First Row
Select Output Range
Enter D1 in the Output Range box
Select Summary Statistics
Click OK
27 Using Excel’s Descriptive Statistics Tool
Descriptive Statistics Dialog Box
28 Using Excel’s Descriptive Statistics Tool
Value Worksheet (Partial). Note: Rows 9-71 are not shown.
29 Using Excel’s Descriptive Statistics Tool
Value Worksheet (Partial). Note: Rows 1-8 and are not shown.
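For readers without Excel, a minimal Python sketch that reproduces a subset of the measures the Descriptive Statistics tool reports for one column of data. The labels mirror Excel's output labels; Excel's "Standard Error" is the standard error of the mean, $s/\sqrt{n}$. Mode, kurtosis, and skewness are deliberately left out of this sketch. The function name and data values are hypothetical stand-ins for the 70 rents in column B.

```python
import math
import statistics


def summary_statistics(values):
    """Subset of the measures in Excel's Descriptive Statistics summary output.

    Mode, kurtosis, and skewness are omitted from this sketch.
    """
    n = len(values)
    s = statistics.stdev(values)          # sample standard deviation (n - 1)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": s / math.sqrt(n),
        "Median": statistics.median(values),
        "Standard Deviation": s,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical stand-in for the data column
for label, value in summary_statistics(data).items():
    print(f"{label:20} {value}")
```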
30 Measures of Relative Location and Detecting Outliers
z-scores
Chebyshev’s Theorem
Detecting Outliers
By using the mean and standard deviation together, we can learn more about the relative location of observations in a data set.
31 z-score
Here we compare the deviation of a single observation from the mean to the standard deviation. The z-score is computed for each $x_i$:
$z_i = \frac{x_i - \bar{x}}{s}$
where $z_i$ is the z-score for $x_i$, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation.
32 The z-score can be interpreted as the number of standard deviations $x_i$ is from the sample mean. A positive z-score means $x_i$ lies above the mean; a negative z-score means it lies below.
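33 z-scores for the starting salary data
Graduate | Starting Salary | $x_i - \bar{x}$ | z-score
1 | 2850 | -90 | -0.543
2 | 2950 | 10 | 0.060
3 | 3050 | 110 | 0.664
4 | 2880 | -60 | -0.362
5 | 2755 | -185 | -1.117
6 | 2710 | -230 | -1.388
7 | 2890 | -50 | -0.302
8 | 3130 | 190 | 1.147
9 | 2940 | 0 | 0.000
10 | 3325 | 385 | 2.324
11 | 2920 | -20 | -0.121
12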
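A minimal Python sketch of the z-score formula $z_i = (x_i - \bar{x})/s$ applied to every observation, using the sample mean and sample standard deviation from the `statistics` module. The function name and data values are hypothetical; the salary table above could be fed in the same way.

```python
import statistics


def z_scores(values):
    """z_i = (x_i - x_bar) / s for every observation, using the sample mean and s."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - x_bar) / s for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical values; x_bar = 5, s about 2.138
for x, z in zip(data, z_scores(data)):
    print(f"{x:>3}  z = {z:+.3f}")
```

Because the deviations from the mean sum to zero, the z-scores of any data set also sum to (approximately) zero, which is a quick sanity check on a hand-built table like the one above.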
34 Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the data values must be within z standard deviations of the mean, where z is any value greater than 1. This theorem enables us to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean, whatever the shape of the distribution.
35 Implications of Chebyshev’s Theorem
At least .75, or 75 percent, of the data values must be within 2 (z = 2) standard deviations of the mean.
At least .89, or 89 percent, of the data values must be within 3 (z = 3) standard deviations of the mean.
At least .94, or 94 percent, of the data values must be within 4 (z = 4) standard deviations of the mean.
Note: z must be greater than one but need not be an integer.
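A minimal Python sketch of the Chebyshev lower bound $1 - 1/z^2$, reproducing the .75, .89, and .94 figures for z = 2, 3, 4 and the non-integer case z = 1.5. The function name is hypothetical.

```python
def chebyshev_bound(z):
    """Minimum proportion of data within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


for z in (1.5, 2, 3, 4):
    print(f"z = {z}: at least {chebyshev_bound(z):.2f}")
# prints 0.56, 0.75, 0.89, 0.94 for z = 1.5, 2, 3, 4
```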
36 Chebyshev’s Theorem
For example, let z = 1.5 with $\bar{x} = 490.80$ and $s = 54.74$. At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$.
(Actually, 86% of the rent values are between 409 and 573.)
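A minimal Python sketch of the interval calculation in the rent example, taking $\bar{x} = 490.80$ (the rent mean consistent with the 409 and 573 limits on the slide) and s = 54.74. The 70 individual rents are not reproduced in this transcript, so the helper `proportion_within`, which counts the observed share of values inside the interval, is demonstrated on hypothetical data instead. Both function names are hypothetical.

```python
import statistics


def chebyshev_interval(mean, std_dev, z):
    """Interval mean +/- z * std_dev, which must hold at least 1 - 1/z^2 of the data."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return mean - z * std_dev, mean + z * std_dev


def proportion_within(values, low, high):
    """Observed fraction of values in the closed interval [low, high]."""
    return sum(low <= x <= high for x in values) / len(values)


# Rent summary values: x_bar = 490.80, s = 54.74, z = 1.5.
low, high = chebyshev_interval(490.80, 54.74, 1.5)
print(round(low), round(high))   # 409 573

# Hypothetical data: the observed share can be well above the guaranteed minimum.
data = [2, 4, 4, 4, 5, 5, 7, 9]
lo, hi = chebyshev_interval(statistics.mean(data), statistics.stdev(data), 1.5)
print(proportion_within(data, lo, hi))   # 7 of 8 values, 0.875 >= 0.56
```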
37 Detecting Outliers
You can use z-scores to detect extreme values, or “outliers,” in the data set. When a z-score is very large in absolute value, it is a good idea to recheck the data for accuracy.
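A minimal Python sketch of z-score screening. The slide does not give a cutoff, so this sketch assumes the common rule of thumb |z| > 3; the `cutoff` parameter, the function name, and the data are all hypothetical. Flagged values are candidates to recheck, not automatic deletions.

```python
import statistics


def flag_outliers(values, cutoff=3.0):
    """Return (position, value, z) for each observation with |z| > cutoff.

    cutoff=3.0 is an assumed rule of thumb, not a value taken from the slides.
    """
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    flagged = []
    for i, x in enumerate(values):
        z = (x - x_bar) / s
        if abs(z) > cutoff:
            flagged.append((i, x, round(z, 2)))
    return flagged


# Hypothetical data: 19 ordinary readings and one suspicious entry.
readings = [49, 51] * 9 + [50, 100]
print(flag_outliers(readings))   # [(19, 100, 4.23)]
```

One caution on the design: with the sample standard deviation, no z-score can exceed $(n - 1)/\sqrt{n}$ in absolute value, so a |z| > 3 rule can never flag anything in a sample of 10 or fewer observations.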