2 Chapter 3, Part A Descriptive Statistics: Numerical Measures Measures of LocationMeasures of Variability
3 Measures of Location Mean If the measures are computed for data from a sample,they are called sample statistics.MedianModePercentilesIf the measures are computedfor data from a population,they are called population parameters.QuartilesA sample statistic is referred toas the point estimator of thecorresponding population parameter.
4 Mean The mean of a data set is the average of all the data values. The sample mean is the point estimator of the population mean m.
5 Sample Mean Sum of the values of the n observations Number of in the sample
6 Population Mean m Sum of the values of the N observations Number of observations inthe population
7 Sample Mean Example: Apartment Rents Seventy efficiency apartments were randomlysampled in a small college town. The monthly rentprices for these apartments are listed below.
9 Median The median of a data set is the value in the middle when the data items are arranged in ascending order.Whenever a data set has extreme values, the medianis the preferred measure of central location.The median is the measure of location most oftenreported for annual income and property value data.A few extremely large incomes or property valuescan inflate the mean.
10 Median For an odd number of observations: 26 18 27 12 14 27 19 in ascending orderthe median is the middle value.Median = 19
11 Median For an even number of observations: 26 18 27 12 14 27 30 19 in ascending orderthe median is the average of the middle two values.Median = ( )/2 =
12 Averaging the 35th and 36th data values: MedianExample: Apartment RentsAveraging the 35th and 36th data values:Median = ( )/2 =Note: Data is in ascending order.
13 Mode The mode of a data set is the value that occurs with greatest frequency.The greatest frequency can occur at two or moredifferent values.If the data have exactly two modes, the data arebimodal.If the data have more than two modes, the data aremultimodal.
14 450 occurred most frequently (7 times) ModeExample: Apartment Rents450 occurred most frequently (7 times)Mode = 450Note: Data is in ascending order.
15 Using Excel to Compute the Mean, Median, and Mode Excel Formula WorksheetNote: Rows 7-71 are not shown.
16 Using Excel to Compute the Mean, Median, and Mode Value WorksheetNote: Rows 7-71 are not shown.
17 Percentiles A percentile provides information about how the data are spread over the interval from the smallestvalue to the largest value.Admission test scores for colleges and universitiesare frequently reported in terms of percentiles.The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
18 Percentiles Arrange the data in ascending order. Compute index i, the position of the pth percentile.i = (p/100)nIf i is not an integer, round up. The p th percentileis the value in the i th position.If i is an integer, the p th percentile is the averageof the values in positions i and i +1.
19 Averaging the 56th and 57th data values: 80th PercentileExample: Apartment Rentsi = (p/100)n = (80/100)70 = 56Averaging the 56th and 57th data values:80th Percentile = ( )/2 = 542Note: Data is in ascending order.
20 80th Percentile Example: Apartment Rents “At least 80% of the items take on avalue of 542 or less.”“At least 20% of theitems take on avalue of 542 or more.”56/70 = .8 or 80%14/70 = .2 or 20%
21 Using Excel’s Rank and Percentile Tool to Compute Percentiles and QuartilesUsing Excel’s Percentile FunctionThe formula Excel uses to compute the location (Lp)of the pth percentile isLp = (p/100)n + (1 – p/100)Excel would compute the location of the 80thpercentile for the apartment rent data as follows:L80 = (80/100)70 + (1 – 80/100) = = 56.2The 80th percentile would be( ) = =
22 Using Excel’s Rank and Percentile Tool to Compute Percentiles and QuartilesExcel Formula Worksheet80th percentileIt is not necessaryto put the datain ascending order.Note: Rows 7-71 are not shown.
23 Using Excel’s Rank and Percentile Tool to Compute Percentiles and QuartilesExcel Value WorksheetNote: Rows 7-71 are not shown.
24 Quartiles Quartiles are specific percentiles. First Quartile = 25th PercentileSecond Quartile = 50th Percentile = MedianThird Quartile = 75th Percentile
25 Third quartile = 75th percentile Example: Apartment RentsThird quartile = 75th percentilei = (p/100)n = (75/100)70 = 52.5 = 53Third quartile = 525Note: Data is in ascending order.
26 Third Quartile Using Excel’s Quartile Function Excel computes the locations of the 1st, 2nd, and 3rdquartiles by first converting the quartiles topercentiles and then using the following formula tocompute the location (Lp) of the pth percentile:Lp = (p/100)n + (1 – p/100)Excel would compute the location of the 3rd quartile(75th percentile) for the rent data as follows:L75 = (75/100)70 + (1 – 75/100) = = 52.75The 3rd quartile would be( ) = =
27 Third Quartile Excel Formula Worksheet 3rd quartile It is not necessaryto put the datain ascending order.Note: Rows 7-71 are not shown.
28 Third QuartileExcel Value WorksheetNote: Rows 7-71 are not shown.
29 Excel’s Rank and Percentile Tool Step 1 Click the Data tab on the RibbonStep 2 In the Analysis group, click Data AnalysisStep 3 Choose Rank and Percentile from the list ofAnalysis ToolsStep 4 When the Rank and Percentile dialog box appears(see details on next slide)
30 Excel’s Rank and Percentile Tool Step 4 Complete the Rank and Percentile dialogbox as follows:
31 Excel’s Rank and Percentile Tool Excel Value WorksheetNote: Rows are not shown.
32 Averages and Standard Deviation Ch 3 Geometric Mean (GM)The Geometric Mean is useful in finding the averages of increases in:PercentsRatiosIndexesGrowth RatesThe Geometric Mean will always be less than or equal to (never more than) the arithmetic meanThe GM gives a more conservative figure that is not drawn up by large values in the setStatistics Is Fun! & Statistics Are Fun!
33 Averages and Standard Deviation Ch 3 Geometric MeanThe GM of a set of n positive numbers is defined as the nth root of the product of n values. The formula is:Statistics Is Fun! & Statistics Are Fun!
34 Geometric Mean Example 1: Percentage Increase Averages and Standard Deviation Ch 3Geometric Mean Example 1: Percentage IncreaseStatistics Is Fun! & Statistics Are Fun!
35 Verify Geometric Mean Example Averages and Standard Deviation Ch 3Verify Geometric Mean ExampleStatistics Is Fun! & Statistics Are Fun!
36 Another Use Of GM: Ave. % Increase Over Time Averages and Standard Deviation Ch 3Another Use Of GM: Ave. % Increase Over TimeAnother use of the geometric mean is to determine the percent increase in sales, production or other business or economic series from one time period to anotherWhere n = number of periodsStatistics Is Fun! & Statistics Are Fun!
37 Example for GM: Ave. % Increase Over Time Averages and Standard Deviation Ch 3Example for GM: Ave. % Increase Over TimeThe total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in That is, the geometric mean rate of increase is 1.27%.The annual rate of increase is 1.27%For the years 1992 through 2000, the rate of female enrollment growth at American colleges was 1.27% per yearStatistics Is Fun! & Statistics Are Fun!
38 Measures of Variability It is often desirable to consider measures of variability(dispersion), as well as measures of location.For example, in choosing supplier A or supplier B wemight consider not only the average delivery time foreach, but also the variability in delivery time for each.
39 Measures of Variability RangeInterquartile RangeVarianceStandard DeviationCoefficient of Variation
40 Range The range of a data set is the difference between the largest and smallest data values.It is the simplest measure of variability.It is very sensitive to the smallest and largest datavalues.
41 Range = largest value - smallest value Example: Apartment RentsRange = largest value - smallest valueRange = = 190Note: Data is in ascending order.
42 Interquartile RangeThe interquartile range of a data set is the differencebetween the third quartile and the first quartile.It is the range for the middle 50% of the data.It overcomes the sensitivity to extreme data values.
43 Interquartile Range = Q3 - Q1 = 525 - 445 = 80 Example: Apartment Rents3rd Quartile (Q3) = 5251st Quartile (Q1) = 445Interquartile Range = Q3 - Q1 = = 80Note: Data is in ascending order.
44 Variance The variance is a measure of variability that utilizes all the data.It is based on the difference between the value ofeach observation (xi) and the mean ( for a sample,m for a population).
45 Variance The variance is the average of the squared differences between each data value and the mean.The variance is computed as follows:for asamplefor apopulation
46 Standard DeviationThe standard deviation of a data set is the positivesquare root of the variance.It is measured in the same units as the data, makingit more easily interpreted than the variance.
47 Standard Deviation The standard deviation is computed as follows: for asamplefor apopulation
48 Coefficient of Variation The coefficient of variation indicates how large thestandard deviation is in relation to the mean.The coefficient of variation is computed as follows:for asamplefor apopulation
49 Sample Variance, Standard Deviation, And Coefficient of Variation Example: Apartment RentsVarianceStandard Deviationthe standarddeviation isabout 11%of the meanCoefficient of Variation
50 Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation Formula WorksheetNote: Rows 8-71 are not shown.
51 Using Excel to Compute the Sample Variance, Standard Deviation, and Coefficient of Variation Value WorksheetNote: Rows 8-71 are not shown.
52 Using Excel’s Descriptive Statistics Tool Step 1 Click the Data tab on the RibbonStep 2 In the Analysis group, click Data AnalysisStep 3 Choose Descriptive Statistics from the list ofAnalysis ToolsStep 4 When the Descriptive Statistics dialog box appears:(see details on next slide)
57 Chapter 3, Part B Descriptive Statistics: Numerical Measures Measures of Distribution Shape, Relative Location, and Detecting OutliersExploratory Data Analysis
58 Measures of Distribution Shape, Relative Location, and Detecting Outliers z-ScoresChebyshev’s TheoremEmpirical RuleDetecting Outliers
59 Distribution Shape: Skewness An important measure of the shape of a distribution is called skewness.The formula for computing skewness for a data set is somewhat complex.Skewness can be easily computed using statistical software.Excel’s SKEW function can be used to compute theskewness of a data set.
60 Distribution Shape: Skewness Symmetric (not skewed)Skewness is zero.Mean and median are equal.Skewness = 0.05.10.15.126.96.36.199Relative Frequency
61 Distribution Shape: Skewness Moderately Skewed LeftSkewness is negative.Mean will usually be less than the median.Skewness =Relative Frequency.05.10.15.20.25.30.35
62 Distribution Shape: Skewness Moderately Skewed RightSkewness is positive.Mean will usually be more than the median.Skewness = .31Relative Frequency.05.10.15.20.25.30.35
63 Distribution Shape: Skewness Highly Skewed RightSkewness is positive (often above 1.0).Mean will usually be more than the median.Skewness = 1.25Relative Frequency.05.10.15.20.25.30.35
64 Distribution Shape: Skewness Example: Apartment RentsSeventy efficiency apartments were randomlysampled in a college town. The monthly rent pricesfor the apartments are listed below in ascending order.
65 Distribution Shape: Skewness Example: Apartment Rents.05.10.15.20.25.30.35Skewness = .92Relative Frequency
66 z-Scores The z-score is often called the standardized value. It denotes the number of standard deviations a datavalue xi is from the mean.
67 z-Scores An observation’s z-score is a measure of the relative location of the observation in a data set.A data value less than the sample mean will have az-score less than zero.A data value greater than the sample mean will havea z-score greater than zero.A data value equal to the sample mean will have az-score of zero.
68 Standardized Values for Apartment Rents z-ScoresExample: Apartment Rentsz-Score of Smallest Value (425)Standardized Values for Apartment Rents
69 Chebyshev’s TheoremAt least (1 - 1/z2) of the items in any data set will bewithin z standard deviations of the mean, where z isany value greater than 1.
70 Chebyshev’s Theorem At least of the data values must be 75% within of the mean.75%z = 2 standard deviationsAt least of the data values must bewithin of the mean.89%z = 3 standard deviationsAt least of the data values must bewithin of the mean.94%z = 4 standard deviations
71 Chebyshev’s Theorem Example: Apartment Rents Let z = 1.5 with = and s = 54.74At least (1 - 1/(1.5)2) = = 0.56 or 56%of the rent values must be between- z(s) = (54.74) = 409and+ z(s) = (54.74) = 573(Actually, 86% of the rent valuesare between 409 and 573.)
72 +/- 1 standard deviation Empirical RuleFor data having a bell-shaped distribution:of the values of a normal random variableare within of its mean.68.26%+/- 1 standard deviationof the values of a normal random variableare within of its mean.95.44%+/- 2 standard deviationsof the values of a normal random variableare within of its mean.99.72%+/- 3 standard deviations
73 Empirical Rule x 99.72% 95.44% 68.26% m m – 3s m – 1s m + 1s m + 3s 99.72%95.44%68.26%xmm – 3sm – 1sm + 1sm + 3sm – 2sm + 2s
74 Detecting Outliers An outlier is an unusually small or unusually large value in a data set.A data value with a z-score less than -3 or greaterthan +3 might be considered an outlier.It might be:an incorrectly recorded data valuea data value that was incorrectly included in thedata seta correctly recorded data value that belongs inthe data set
75 Standardized Values for Apartment Rents Detecting OutliersExample: Apartment RentsThe most extreme z-scores are and 2.27Using |z| > 3 as the criterion for an outlier, thereare no outliers in this data set.Standardized Values for Apartment Rents
76 Exploratory Data Analysis Five-Number SummaryBox Plot
77 Five-Number Summary 1 Smallest Value 2 First Quartile 3 Median 4 Third Quartile5Largest Value
78 Five-Number Summary Example: Apartment Rents Lowest Value = 425 First Quartile = 445Median = 475Third Quartile = 525Largest Value = 615
79 Box Plot Example: Apartment Rents A box is drawn with its ends located at the first andthird quartiles.A vertical line is drawn in the box at the location ofthe median (second quartile).400425450475500525550575600625Q1 = 445Q3 = 525Q2 = 475
80 Box PlotLimits are located (not drawn) using the interquartile range (IQR).Data outside these limits are considered outliers.The locations of each outlier is shown with the symbol * .continued
81 Box Plot Example: Apartment Rents The lower limit is located 1.5(IQR) below Q1.Lower Limit: Q (IQR) = (80) = 325The upper limit is located 1.5(IQR) above Q3.Upper Limit: Q (IQR) = (80) = 645There are no outliers (values less than 325 orgreater than 645) in the apartment rent data.
82 Box Plot Example: Apartment Rents Whiskers (dashed lines) are drawn from the endsof the box to the smallest and largest data valuesinside the limits.400425450475500525550575600625Smallest valueinside limits = 425Largest valueinside limits = 615