# Describing Data: Measures of Central Tendency

## Presentation on theme: "Describing Data: Measures of Central Tendency"— Presentation transcript:

Describing Data: Measures of Central Tendency
Topic-3 Describing Data: Measures of Central Tendency

Average: A Measure of Central Tendency
Average: A single numerical characteristic that represents a specific feature of a set of data - the center of the values. Population Mean (Overall Average): Computed by the Total Sum divided by the total number of items in the population. It is a Parameter - a measurable characteristic of a population. Sample Mean (Sample Average): Computed by the Total Sum of the sample divided by the total number of items in the sample. It is a Statistic - a measurable characteristic of a sample. Both Population Mean and Sample Mean are the Arithmetic Mean.

Important Properties of Arithmetic Mean
Every set of data (interval and ratio level) has a unique mean. All values are included in computing the mean. The "sum of the deviations" of each value from the mean will always be equal to zero. Mean can be viewed as a "balance" point. Mean can be easily affected by unusually (large or small) values (called as "outliers"). There is no "mean" if open-end classes are included in the data.

Median The value of midpoint - after the data have been ordered from the smallest to the largest (vise verse). There are the same number of values above the median as below it. For an even set, the median is computed as the arithmetic average of the two middle numbers. The median is not affected by extreme values in the set. Median is unique, can be determined even with open-end classes. Median can be computed for all levels of data (except nominal level data).

Mode The value of the observation that appears most frequently.
Mode can be determined for all levels of data sets. Mode is not affected by extreme values in the set. Mode can be determined even with open-end classes. However, mode may not be unique for some data sets, while may not exist in other data sets. Another measure of central tendency.

Weighted Mean To allow each item weighted differently according to its importance in the calculation of the mean. All weights (Wi) must be determined carefully. Different weight sets may result in different weighted mean. Weighted mean allows subjective managerial considerations to be considered and evaluated. Widely used in many business statistical applications. Examples

Geometric Mean The Nth root of the product of N numbers for a set of N positive numbers, normally used: to average percents, indexes, and relatives, and to determine average percent changes (in sales, production) from one time period to another. Geometric mean is a more conservative measure of average. Examples

Mean, Median, Mode for Grouped Data
In practice, we have to, more often, make estimates about some thing that are of interest from a sample, in a form of grouped data, as frequency distribution. Mean: (Arithmetic) the sum of the products of the midpoint by its frequency of each group divided by the total number of frequencies. (Examples) This mean may be different from the population mean, because it is only an estimate of the population mean. Median: only can be estimated by locating the median group and then calculating the midpoint assuming this group is evenly spaced. (Examples) This median is an estimate, may be different from population median. In the calculation, either frequency or its percentage is enough and open-end class at either side has no effect. Mode: can be estimated by the midpoint of the group with the largest group frequency. (Examples) There may be more than one "mode" (multimodal) within a group of data. "Bimodal data" is for having two modes co-existing.

Symmetric Distribution
Symmetric Distribution: symmetrical with bell-shaped, having the same shape on both sides of the center axis. Its mean, median, and mode are equal, located at the center. Asymmetric Distribution: bell-shaped but skewed toward one side, having different shapes on two sides of the center axis. If positively skewed, the mean will be the largest among three averages (Mean > Median > Mode), because the different influences of extreme values on the three averages. (Examples) If negatively skewed, the mean will be the lowest among three averages (Mean < Median < Mode). (Examples) Approximate relationship among the three: when the data set is large enough and not skewed too much, then the median is 1/3 from the mean to the mode. If two averages are known, the third can be estimated by given formulas.

Population Mean Definition: For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values. To compute the population mean, use the following formula: Where: µ is mu Σ is Sigma Χ is the Individual value Ν is the Population size

Sample Mean Definition: For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values. To compute the sample mean, use the following formula: Where: is X-bar Σ is Sigma Χ is the Individual value n is the Sample size

Example 1 Parameter: a measurable characteristic of a population.
The Kiers family owns 4 cars. The following is the mileage attained by each car: 56,000; 23,000; 42,000; and 73,000. Find the average miles covered by each car. The mean is (56, , , ,000)/4 = 48,500.

Example 2 Statistic: a measurable characteristic of a sample.
A sample of 5 executives received the following bonuses last year: \$14,000; \$15,000; \$17,000; \$16,000 and \$15,000. Find the average bonus for these 5 executives. Since these values represent a sample size of 5, the sample mean is (\$14,000 + \$15,000 + \$17,000 + \$16,000 + \$15,000)/5 = \$15,400.

Illustration Consider the set of values: 3, 8, and 4. The mean is 5. So, (3 – 5) + (8 – 5) + (4 – 5) = – 1 = 0. Symbolically, we write:

The Weighted Mean Definition: The weighted mean of a set of numbers X1, X2, …, Xn, with corresponding weights w1, w2, …, wn, is computed from the following formulas:

Example 3 Compute the median for the following data.
The age of a sample of 5 college students is: 21, 25, 19, 20, and 22. Arranging the data in ascending order gives: 19, 20, 21, 22, and 25. Thus, the median is 21. The height of 4 basketball players, in inches, is 76, 73, 80, and 75. Arranging the data is ascending order gives: 73, 75, 76, and 80. Thus, the median is 75.5.

Example 4 The exam scores for 10 students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, and 87. Since the score of 81 occurs the most, the modal score is 81.

Example 5 During a 1-hour period on a hot Saturday afternoon, cabana boy Chris served 50 drinks. Compute the weighted mean of the price of the drinks. Price (\$), Number sold): (0.50, 5); (0.75, 15); (0.90, 15); (1.10, 15). The weighted mean is: \$((0.50 x 5) + (0.75 x 15) + (0.90 x 15) + (1.10 x 15)) = \$43.75/50 = \$0.875.

The Geometric Mean (I) Definition: The geometric mean (GM) of a set of n numbers is defined as the nth root of the product of n numbers. The formula for the geometric mean is given by: One main use of the geometric mean is to average percents, indexes, and relatives.

The Geometric Mean (II)
The other main use of the geometric mean is to determine the average percent increase in sales, production, or other business or economic series from one time period to another. The formula for the geometric mean as applied to this type of problem is:

Example 6 The interest rates on 3 bonds were 5, 7, and 4 percent.
The geometric mean is: The arithmetic mean is: ( )/3 = The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 7%.

Example 7 The total number of females enrolled in American colleges increased from 755,000 in 1986 to 835,000 in 1995. Here n = 10, so (n – 1) = 9. That is, the geometric mean rate of increase is 1.27%.

The Mean of Grouped Data
The mean of a sample of data organized in a frequency distribution is computed by the following formula: Where: is X-bar For ΣXf, the X value is the class midpoint and the f value is the class frequency For Σf is the sum of the frequencies n is the sample size

The Median of Grouped Data
The median of a sample of data organized in a frequency distribution is computed by the following formula: Where: L is the lower limit of the median class n is the sample size CF is the cumulative frequency preceding the median class f is the frequency of the median class i is the median class interval.

Example 8 A sample of 10 movie theatres in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing. To determine the median class for grouped data: Construct a cumulative frequency distribution. Divide the total number of data values by 2. Determine which class will contain this value. For example, if n = 50, 50/2 = 25, then determine which class will contain the 25th value – the median class.

Example 8 Movies Showing Frequency, f Class Midpoint, X (f)(X) 1 – 2 1
1.5 3 – 4 2 3.5 7.0 5 – 6 3 5.5 16.5 7 – 8 7.5 9 – 10 9.5 28.5 Total 10 61 61/10 = 6.1 movies

Example 8 Movies Showing Frequency, f Cumulative Frequency 1 – 2 1 3 – 4 2 3 5 – 6 6 7 – 8 7 9 – 10 10 The median class is 5 – 6, since it contains the 5th value (n/2=5). From the table, L = 5, n = 10, f = 3, i = 2, & CF = 3. Thus, the median

Example 9 A sample of 20 appliance stores in a large metropolitan area revealed the following number of DVD players sold last week. Compute the mean number sold. The formula and computation is shown below:

Example 9 Number Sold Frequency, f Class Midpoint, X (f)(X)
The table also gives the necessary computations: Number Sold Frequency, f Class Midpoint, X (f)(X) 5 – 9 2 7 14 10 – 14 4 12 48 15 – 19 10 17 170 20 – 24 3 22 66 25 – 29 1 27 Σf = 20 ΣfX = 325

Table: Comparison of the Three Averages
Mean (M) Median (Mdn) Mode (Mo) Definition Balance point Middle point Most frequent score When to Use Symmetrical distributions of interval/ratio data When mean is inappropriate (except for nominal data) Nominal data Frequency of Use in Scientific Reporting Very frequent Somewhat frequent Very infrequent1 Relative Values for Positive Skew2 Higher than Mdn and Mo Between M and Mo Lower than M and Mdn Relative Values for Negative Skew Lower than Mdn and Mo Higher than M and Mdn 1Although nominal data (for which the mode is appropriate) is frequently reported in scientific writing, most researchers report percentages for each category rather than reporting the modal category. Thus, the mode is seldom used. 2See Section 9 to review skewed distributions

Symmetric Distribution
Zero skewness Mode = Median = Mean

Right Skewed Distribution
Positively skewed: Mean and Median are to the right of the Mode Mode < Median < Mean

Left Skewed Distribution
Negatively skewed: Mean and Median are to the left of the Mode Mean < Median < Mode

Skewness If two averages of a moderately skewed frequency distribution are known, the third can be approximated. Mode = mean – 3(mean – median) Mean = [3(median) – mode]/2 Median = [2(mean) + mode]/3

Exercise 3A Springers sold 95 Antonelli men’s suits for the regular price of \$400. For the spring sale the suits were reduced to \$200 and 126 were sold. At the final clearance, the price was reduced to \$100 and the remaining 79 suits were sold. What was the weighted mean price of an Antonelli suit? Springers paid \$200 a suit for the 300 suits. Comment on the store’s profit per suit if a salesperson receives a \$25 commission for each suit sold.

Exercises 3B & 3C In 1950 there were 51 countries that belonged to the United Nations. By 1996 this number had increased to What was the geometric mean annual rate of increase in membership during this period? Darenfest and Associates stated that in 1988 hospitals spent \$3.9 billion on computer systems. They estimated that by the year 2000 this would increase to \$14.0 billion. If expenditures did increase to \$14.0 billion, what is the geometric mean annual rate of increase for the period?