Measures of Variability: Range, Interquartile range, Variance, Standard deviation, Coefficient of variation
Consider the sample of starting salaries of business grads. We would be interested in knowing if there was a low or high degree of variability or dispersion in starting salaries received.
Range: The range is simply the difference between the largest and smallest values in the sample. It is the simplest measure of variability. Note that the range is highly sensitive to the largest and smallest values.
Example: Apartment Rents. Seventy studio apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed in ascending order on the next slide.
Range: Range = largest value - smallest value = 190
Interquartile Range: The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data. It overcomes the sensitivity to extreme data values.
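As a minimal sketch of the two measures so far, the Python snippet below computes the range and the interquartile range for a small made-up rent sample (the numbers are hypothetical, not the 70 apartments from the slides). The quartiles come from the standard library's `statistics.quantiles`; its default interpolation rule is an assumption here and may give slightly different quartiles than the percentile procedure used in the course.

```python
import statistics


def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


def interquartile_range(data):
    """Third quartile minus first quartile.

    Uses statistics.quantiles with its default ('exclusive') method,
    which is an assumption and not necessarily the textbook rule.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


# Hypothetical monthly rents; the last value is deliberately extreme.
rents = [425, 440, 450, 465, 480, 510, 525, 1200]
rents_without_extreme = rents[:-1]

print("without extreme value:",
      value_range(rents_without_extreme),
      interquartile_range(rents_without_extreme))
print("with extreme value:   ",
      value_range(rents),
      interquartile_range(rents))
```

In this made-up sample the single extreme rent changes the range far more than it changes the interquartile range, which is the sensitivity the slides describe.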
Variance: The variance is a measure of variability that uses all the data. It is based on the difference between each observation ($x_i$) and the mean ($\bar{x}$ for the sample and $\mu$ for the population).
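The variance is the average of the squared differences between the observations and the mean value (the sample form divides by $n - 1$ rather than $n$). For the population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$. For the sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$.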
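To make the difference between the two versions concrete, here is a short Python sketch that computes both by hand: the population form divides the sum of squared deviations by N, the sample form by n - 1. The five data values are hypothetical and chosen only so the arithmetic is easy to follow; the function names are illustrative.

```python
def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


# Hypothetical data: mean is 44, squared deviations sum to 256.
values = [46, 54, 42, 46, 32]

print("population variance:", population_variance(values))  # 256 / 5
print("sample variance:    ", sample_variance(values))      # 256 / 4
```

The same results are available from the standard library as `statistics.pvariance` and `statistics.variance`.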
Standard Deviation: The standard deviation of a data set is the square root of the variance. The standard deviation is measured in the same units as the data, making it easy to interpret.
Computing a standard deviation. For the population: $\sigma = \sqrt{\sigma^2}$. For the sample: $s = \sqrt{s^2}$.
Coefficient of Variation: Divide the standard deviation by the mean and multiply by 100. For the sample: $\left(\frac{s}{\bar{x}} \times 100\right)\%$. For the population: $\left(\frac{\sigma}{\mu} \times 100\right)\%$.
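The following Python sketch chains the two steps: take the square root of the sample variance to get the sample standard deviation, then divide by the mean and multiply by 100 to get the coefficient of variation. The data are hypothetical and the helper names are illustrative.

```python
import math
import statistics


def sample_std(data):
    """Square root of the sample variance (n - 1 divisor)."""
    return math.sqrt(statistics.variance(data))


def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# Hypothetical data: sample variance is 64, mean is 44.
values = [46, 54, 42, 46, 32]

s = sample_std(values)                                   # sqrt(64) = 8
cv = coefficient_of_variation(s, statistics.mean(values))

print(f"sample standard deviation: {s:.2f}")
print(f"coefficient of variation:  {cv:.2f}%")
```

Because the coefficient of variation is a ratio, it lets you compare variability between data sets whose means or units differ.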
The heights (in inches) of 25 individuals were recorded and the following statistics were calculated: mean = 70, range = 20, mode = 73, variance = 784, median = 74. The coefficient of variation equals: 1. …% 2. …% 3. 0.4% 4. 40%
If index i (which is used to determine the location of the pth percentile) is not an integer, its value should be: 1. squared 2. divided by (n - 1) 3. rounded down 4. rounded up
Which of the following symbols represents the variance of the population?
Which of the following symbols represents the size of the sample? 1. … 2. … 3. N 4. n
The symbol s is used to represent: 1. the variance of the population 2. the standard deviation of the sample 3. the standard deviation of the population 4. the variance of the sample
The numerical value of the variance: 1. is always larger than the numerical value of the standard deviation 2. is always smaller than the numerical value of the standard deviation 3. is negative if the mean is negative 4. can be larger or smaller than the numerical value of the standard deviation
If the coefficient of variation is 40% and the mean is 70, then the variance is
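One way to work this out is to rearrange the definition of the coefficient of variation (standard deviation divided by the mean, times 100) to recover the standard deviation, then square it. The symbol $s$ is used here on the assumption that the question refers to a sample.

```latex
\[
s = \frac{\text{CV}}{100}\,\bar{x} = 0.40 \times 70 = 28,
\qquad
s^2 = 28^2 = 784
\]
```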
Problem 22, page 94
Broker-Assisted, 100 Shares at $50 per Share: Range 45.05; Interquartile Range 23.98; Variance …; Standard Deviation 13.8; Coefficient of Variation …; 25th percentile 6; 75th percentile 18; Mean 36.32
Online, 500 Shares at $50 per Share: Range 57.50; Interquartile Range …; Variance …; Standard Deviation …; Coefficient of Variation …; 25th percentile …; 75th percentile …; Mean 20.46
The variability of commissions is greater for broker-assisted trades
Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation: Formula Worksheet. Note: Rows 8-71 are not shown.
Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation: Value Worksheet. Note: Rows 8-71 are not shown.
Using Excel's Descriptive Statistics Tool, Step 4: When the Descriptive Statistics dialog box appears: enter B1:B71 in the Input Range box; select Grouped By Columns; select Labels in First Row; select Output Range; enter D1 in the Output Range box; select Summary Statistics; click OK.
Using Excel's Descriptive Statistics Tool: Descriptive Statistics Dialog Box
Using Excel's Descriptive Statistics Tool: Value Worksheet (Partial). Note: Rows 9-71 are not shown.
Using Excel's Descriptive Statistics Tool: Value Worksheet (Partial). Note: Rows 1-8 and … are not shown.
Measures of Relative Location and Detecting Outliers: z-scores, Chebyshev's Theorem, Detecting Outliers. By using the mean and standard deviation together, we can learn more about the relative location of observations in a data set.
z-score: Here we compare the deviation from the mean of a single observation to the standard deviation. The z-score is computed for each $x_i$: $z_i = \frac{x_i - \bar{x}}{s}$, where $z_i$ is the z-score for $x_i$, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation.
The z-score can be interpreted as the number of standard deviations $x_i$ is from the sample mean.
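A small Python sketch of the z-score calculation follows: subtract the sample mean from each observation and divide by the sample standard deviation. The data are hypothetical (not the starting salary sample), and `z_scores` is an illustrative helper name.

```python
import statistics


def z_scores(data):
    """Return (x_i - sample mean) / sample standard deviation for each x_i."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return [(x - x_bar) / s for x in data]


# Hypothetical data: mean is 44, sample standard deviation is 8.
values = [46, 54, 42, 46, 32]

for x, z in zip(values, z_scores(values)):
    print(f"x = {x:>3}  z = {z:+.2f}")
```

A positive z-score means the observation lies above the mean, a negative one means it lies below, and the magnitude counts standard deviations.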
Z-scores for the starting salary data. Columns: Graduate, Starting Salary, $x_i - \bar{x}$, z-score
Chebyshev's Theorem: This theorem enables us to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean. At least $(1 - 1/z^2)$ of the data values must be within $z$ standard deviations of the mean, where $z$ is greater than 1.
Implications of Chebyshev's Theorem: At least .75, or 75 percent, of the data values must be within 2 ($z = 2$) standard deviations of the mean. At least .89, or 89 percent, of the data values must be within 3 ($z = 3$) standard deviations of the mean. At least .94, or 94 percent, of the data values must be within 4 ($z = 4$) standard deviations of the mean. Note: $z$ must be greater than one but need not be an integer.
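The percentages above follow directly from the bound 1 - 1/z². As a quick check, this Python sketch evaluates the bound for a few values of z, including a non-integer one; the function name is illustrative.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem applies only for z > 1")
    return 1 - 1 / z ** 2


for z in (1.5, 2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.2%} of the data values")
```

The bound holds for any distribution, which is why it is often much lower than the proportion actually observed in a given data set.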
For example (Chebyshev's Theorem): Let $z = 1.5$, with $s = 54.74$ for the apartment rents. At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between $\bar{x} - z(s) = \bar{x} - 1.5(54.74) = 409$ and $\bar{x} + z(s) = \bar{x} + 1.5(54.74) = 573$. (Actually, 86% of the rent values are between 409 and 573.)
Detecting Outliers: You can use z-scores to detect extreme values in the data set, or outliers. When a z-score is very large in absolute value, it is a good idea to recheck the data for accuracy.
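A minimal Python sketch of this check is shown below. The slide does not say how large a z-score must be to count as "very high"; the cutoff of 3 used here is an assumption (a common rule of thumb), and both the data and the function name are hypothetical.

```python
import statistics


def find_outliers(data, threshold=3.0):
    """Return values whose |z-score| exceeds the threshold.

    threshold=3.0 is an assumed rule of thumb, not a value from the slides.
    """
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - x_bar) / s) > threshold]


# Hypothetical rents: twenty ordinary values plus one suspicious entry.
rents = list(range(425, 525, 5)) + [1200]

print(find_outliers(rents))
```

A flagged value is a prompt to recheck the record, not proof of an error: it may be a data-entry mistake or a genuine but unusual observation.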