# Measures of Variability


Measures of Variability
- Range
- Interquartile range
- Variance
- Standard deviation
- Coefficient of variation

Consider the sample of starting salaries of business grads. We would be interested in knowing if there was a low or high degree of variability or dispersion in starting salaries received.

Range
The range is simply the difference between the largest and smallest values in the sample. It is the simplest measure of variability. Note that the range is highly sensitive to the largest and smallest values.

Example: Apartment Rents
Seventy studio apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed in ascending order on the next slide.

Range = largest value - smallest value

Interquartile Range
The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data, so it overcomes the range's sensitivity to extreme data values.

3rd Quartile (Q3) = 525, 1st Quartile (Q1) = 445

Interquartile Range = Q3 - Q1 = 525 - 445 = 80
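The contrast between the range and the interquartile range can be sketched in Python. The rent values below are hypothetical (the sampled rents are not reproduced in this transcript), and the quartile rule is an assumption: compute the index i = (p/100)n, round up when i is not an integer, and average the values in positions i and i + 1 when it is. Other tools may use a different quartile convention and give slightly different results.

```python
def percentile(values, p):
    """Return the p-th percentile (integer p, 0 < p < 100) using the index rule."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)
    n = len(data)
    # i = (p / 100) * n, kept in integer arithmetic to avoid float round-off
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (zero-based index whole)
        return data[whole]
    # i is an integer: average the values in positions i and i + 1
    return (data[whole - 1] + data[whole]) / 2


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Third quartile minus first quartile."""
    return percentile(values, 75) - percentile(values, 25)


# Hypothetical monthly rents, not the data from the slides
rents = [425, 430, 435, 440, 445, 450, 460, 470, 480, 500, 525, 550]
# Same sample with the largest value replaced by an extreme one
rents_with_extreme = rents[:-1] + [1500]

print(data_range(rents), interquartile_range(rents))
print(data_range(rents_with_extreme), interquartile_range(rents_with_extreme))
```

In this sketch a single extreme rent changes the range substantially while the interquartile range stays the same, which is the sensitivity difference described above.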

Variance
The variance is a measure of variability that uses all the data. It is based on the difference between each observation ($x_i$) and the mean ($\bar{x}$ for the sample and $\mu$ for the population).

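The variance is the average of the squared differences between the observations and the mean value (the sample version divides by $n - 1$ rather than $n$).

For the population:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

For the sample:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$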
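A minimal Python sketch of the two variance definitions follows. The five data values are hypothetical, and the function names are chosen here for illustration. The results are cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import statistics


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical sample: mean is 44, squared deviations sum to 256
sample = [46, 54, 42, 46, 32]

print(sample_variance(sample))      # 256 / 4
print(population_variance(sample))  # 256 / 5
print(statistics.variance(sample), statistics.pvariance(sample))
```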

Standard Deviation
The standard deviation of a data set is the square root of the variance. It is measured in the same units as the data, making it easier to interpret than the variance.

Computing a standard deviation
For the population: $\sigma = \sqrt{\sigma^2}$

For the sample: $s = \sqrt{s^2}$
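As a small illustration, the standard deviation can be computed by taking the square root of the variance. The data are hypothetical and the function names are made up for this sketch. `statistics.stdev` and `statistics.pstdev` from the standard library serve as a cross-check.

```python
import math
import statistics


def sample_std_dev(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


def population_std_dev(values):
    """Square root of the population variance (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)


# Hypothetical sample with sample variance 64
sample = [46, 54, 42, 46, 32]

print(sample_std_dev(sample))      # same units as the data
print(population_std_dev(sample))
```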

Coefficient of Variation
Just divide the standard deviation by the mean and multiply by 100.

Computing the coefficient of variation:

For the population: $\left(\frac{\sigma}{\mu} \times 100\right)\%$

For the sample: $\left(\frac{s}{\bar{x}} \times 100\right)\%$
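The sample coefficient of variation can be sketched as below. The data are hypothetical and the function name is an assumption for illustration. The second data set is the first one rescaled by 10, to show that the coefficient of variation does not depend on the measurement scale.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    x_bar = statistics.mean(values)
    if x_bar == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    s = statistics.stdev(values)
    return s / x_bar * 100


# Hypothetical sample: mean 44, sample standard deviation 8
sample = [46, 54, 42, 46, 32]
# Same data expressed in units ten times smaller
rescaled = [x * 10 for x in sample]

print(round(coefficient_of_variation(sample), 2))
print(round(coefficient_of_variation(rescaled), 2))
```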

The heights (in inches) of 25 individuals were recorded and the following statistics were calculated:

mean = 70, range = 20, mode = 73, variance = 784, median = 74

The coefficient of variation equals
- 11.2%
- 1120%
- 0.4%
- 40%

If index i (which is used to determine the location of the pth percentile) is not an integer, its value should be
- squared
- divided by (n - 1)
- rounded down
- rounded up

Which of the following symbols represents the variance of the population?
- $\sigma^2$
- $\sigma$
- $\mu$

Which of the following symbols represents the size of the sample?

The symbol s is used to represent
- the variance of the population
- the standard deviation of the sample
- the standard deviation of the population
- the variance of the sample

The numerical value of the variance
- is always larger than the numerical value of the standard deviation
- is always smaller than the numerical value of the standard deviation
- is negative if the mean is negative
- can be larger or smaller than the numerical value of the standard deviation

If the coefficient of variation is 40% and the mean is 70, then the variance is
- 28
- 2800
- 1.75
- 784

Problem 22, page 94

Broker-Assisted 100 Shares at \$50 per Share
| Statistic | Value |
| --- | --- |
| Range | 45.05 |
| Interquartile Range | 23.98 |
| Variance | 190.67 |
| Standard Deviation | 13.8 |
| Coefficient of Variation | 38.02 |
| 25th percentile | 6 |
| 75th percentile | 18 |
| interquart 25 | 24.995 |
| interquart 75 | 48.975 |
| Mean | 36.32 |

Online 500 Shares at \$50 per Share
| Statistic | Value |
| --- | --- |
| Range | 57.50 |
| Interquartile Range | 11.475 |
| Variance | |
| Standard Deviation | 11.859 |
| Coefficient of Variation | 57.949 |
| 25th percentile | |
| 75th percentile | |
| interquart 25 | 13.475 |
| interquart 75 | 24.95 |
| Mean | 20.46 |

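Measured by the standard deviation (13.8 versus 11.859) and the interquartile range (23.98 versus 11.475), the variability of commissions is greater for broker-assisted trades. Relative to the mean, however, online commissions vary more: their coefficient of variation is 57.949 versus 38.02.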
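As a rough check on the summary statistics reported for the two trade types, the coefficients of variation can be recomputed from the reported means and standard deviations. The function name is made up for this sketch. Because the inputs are rounded as reported, the results may differ slightly in the last digits from the reported coefficients.

```python
import math


def cv_from_summary(std_dev, mean):
    """Coefficient of variation (in percent) from a standard deviation and a mean."""
    return std_dev / mean * 100


# Broker-assisted: reported variance 190.67 and mean 36.32
broker_cv = cv_from_summary(math.sqrt(190.67), 36.32)
# Online: reported standard deviation 11.859 and mean 20.46
online_cv = cv_from_summary(11.859, 20.46)

print(round(broker_cv, 2), round(online_cv, 2))
```

The absolute spread (standard deviation) is larger for broker-assisted trades, while the spread relative to the mean (coefficient of variation) is larger for online trades.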

Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation
Formula Worksheet Note: Rows 8-71 are not shown.

Value Worksheet Note: Rows 8-71 are not shown.

Using Excel’s Descriptive Statistics Tool
Step 4 When the Descriptive Statistics dialog box appears:
- Enter B1:B71 in the Input Range box
- Select Grouped By Columns
- Select Labels in First Row
- Select Output Range
- Enter D1 in the Output Range box
- Select Summary Statistics
- Click OK
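For readers working outside Excel, a comparable summary table can be sketched in Python with the standard library. This is not Excel's implementation. The function name, the dictionary labels, and the data are assumptions for illustration, and only a subset of typical summary measures is computed.

```python
import math
import statistics


def summary_statistics(values):
    """Return a dictionary of common descriptive statistics for a sample."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": s / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": s,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Hypothetical sample, not the rent data from the worksheet
sample = [46, 54, 42, 46, 32]
summary = summary_statistics(sample)

for label, value in summary.items():
    print(f"{label}: {value}")
```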

Descriptive Statistics Dialog Box

Value Worksheet (Partial) Note: Rows 9-71 are not shown.

Value Worksheet (Partial) Note: Rows 1-8 and are not shown.

Measures of Relative Location and Detecting Outliers
- z-scores
- Chebyshev’s Theorem
- Detecting Outliers

By using the mean and standard deviation together, we can learn more about the relative location of observations in a data set.

z-score
Here we compare the deviation from the mean of a single observation to the standard deviation. The z-score is computed for each $x_i$:

$$z_i = \frac{x_i - \bar{x}}{s}$$

where $z_i$ is the z-score for $x_i$, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation.

The z-score can be interpreted as the number of standard deviations $x_i$ is from the sample mean.
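A minimal Python sketch of the z-score computation follows. The data are hypothetical (chosen so that the mean is 44 and the sample standard deviation is 8), and the function name is made up for illustration.

```python
import statistics


def z_scores(values):
    """Return (x - mean) / s for each observation, using the sample standard deviation."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - x_bar) / s for x in values]


# Hypothetical sample: mean 44, sample standard deviation 8
sample = [46, 54, 42, 46, 32]

for x, z in zip(sample, z_scores(sample)):
    # A positive z-score lies above the mean, a negative one below it
    print(x, z)
```

For instance, the value 32 lies 12 units below the mean of 44, which is 1.5 standard deviations, so its z-score is -1.5.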

Z-scores for the starting salary data
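| Graduate | Starting Salary | $x_i - \bar{x}$ | z-score |
| --- | --- | --- | --- |
| 1 | 2850 | -90 | -0.543 |
| 2 | 2950 | 10 | 0.060 |
| 3 | 3050 | 110 | 0.664 |
| 4 | 2880 | -60 | -0.362 |
| 5 | 2755 | -185 | -1.117 |
| 6 | 2710 | -230 | -1.388 |
| 7 | 2890 | -50 | -0.302 |
| 8 | 3130 | 190 | 1.147 |
| 9 | 2940 | 0 | 0.000 |
| 10 | 3325 | 385 | 2.324 |
| 11 | 2920 | -20 | -0.121 |
| 12 | | | |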

Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the data values must be within z standard deviations of the mean, where z is greater than 1. This theorem enables us to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean.

Implications of Chebyshev’s Theorem
- At least .75, or 75 percent, of the data values must be within 2 (z = 2) standard deviations of the mean.
- At least .89, or 89 percent, of the data values must be within 3 (z = 3) standard deviations of the mean.
- At least .94, or 94 percent, of the data values must be within 4 (z = 4) standard deviations of the mean.

Note: z must be greater than one but need not be an integer.
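The percentages above follow directly from the bound $1 - 1/z^2$. A short Python sketch (the function name is chosen here for illustration) evaluates it for several values of z, including a non-integer one:

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z ** 2


for z in (1.5, 2, 3, 4):
    # Printed as a proportion rounded to two decimals
    print(z, round(chebyshev_lower_bound(z), 2))
```

The bound holds for any data set regardless of its shape, which is why the actual proportion within a given interval is often well above it.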

Chebyshev’s Theorem For example:
Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.

At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between

$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$

and

$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$

(Actually, 86% of the rent values are between 409 and 573.)

Detecting Outliers
You can use z-scores to detect extreme values in the data set, or “outliers.” When a z-score is very large in absolute value, it is a good idea to recheck the data for accuracy.
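One way to apply this in Python is sketched below. The cutoff of 3 standard deviations is an assumption (a common rule of thumb, not a threshold given in the slides), and the data and function name are hypothetical.

```python
import statistics


def find_outliers(values, threshold=3.0):
    """Return the observations whose absolute z-score exceeds the threshold."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        return []  # all values identical, so nothing stands out
    return [x for x in values if abs((x - x_bar) / s) > threshold]


# Hypothetical data: fourteen values near 100 and one suspicious entry
data = [98, 99, 100, 101, 102, 97, 103, 100, 99, 101, 100, 98, 102, 100, 200]

print(find_outliers(data))
```

A flagged value is a candidate for rechecking, not automatically an error. Note also that in very small samples no observation can reach a z-score of 3, so a lower assumed threshold may be needed there.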