 Measures of Location and Dispersion


A measure of CENTRAL TENDENCY is exactly what it sounds like: a numerical value that indicates the ‘central location’ of the data. A measure of DISPERSION, on the other hand, measures how spread out the data are.

Mean, Median, Mode Of Statistical Data
Remember that data are the observations or scores recorded.

Population Mean
The population mean is the sum of the population values divided by the number of population values:
$$\mu = \frac{\sum X}{N}$$
where μ is the population mean, X is a particular value, Σ indicates the operation of adding, and N is the number of observations in the population. In other words, the mean is the arithmetic average: the sum of the scores divided by the total number of scores. The POPULATION is all scores or members of a group that are of interest to a researcher; it is the group to which the researcher wishes to generalize. Remember that the SAMPLE is a pre-determined portion of the population. Read the formula like this: THE MEAN IS EQUAL TO THE SUM OF THE RAW SCORES DIVIDED BY N. The symbol for the mean, μ, is the Greek letter “mu” (pronounced “m-yu”), and the symbol for sum, Σ, is the Greek letter “sigma”.

PRACTICE
Parameter: a measurable characteristic of a population.
The scores of 4 low-achieving students are compared: 56, 23, 42, and 73. Find the mean score. The mean is (56 + 23 + 42 + 73)/4 = 194/4 = 48.5. A PARAMETER is to a POPULATION what a STATISTIC is to a SAMPLE.
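The population-mean formula translates almost line for line into code. Below is a minimal Python sketch, assuming the four scores are treated as the entire population of interest; the function name `population_mean` is our own illustrative choice, not a library API.

```python
def population_mean(values):
    """Return mu = (sum of X) / N for a complete population of scores.

    Illustrative helper (hypothetical name), not a library function.
    """
    if not values:
        raise ValueError("the mean is undefined for an empty data set")
    return sum(values) / len(values)


scores = [56, 23, 42, 73]   # the four students' scores from the practice problem
mu = population_mean(scores)
print(mu)  # 48.5
```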

Sample Mean
The sample mean is the sum of the sample values divided by the number of sample values:
$$\bar{X} = \frac{\sum X}{n}$$
where X̄ (“X-bar”) is the sample mean and n is the number of observations in the sample. This formula is calculated exactly the same way as the POPULATION MEAN, but the symbols are different because it is for a sample. Don’t forget: the POPULATION is all scores or members of a group that are of interest to a researcher, the group to which the researcher wishes to generalize, and the SAMPLE is a pre-determined portion of that population.

PRACTICE
Statistic: a measurable characteristic of a sample.
A sample of five part-time teacher salaries is compared: \$15,000, \$15,000, \$17,000, \$16,000, and \$14,000. Find the mean salary for these five teachers. Since these values represent a sample of size 5, the sample mean is (\$15,000 + \$15,000 + \$17,000 + \$16,000 + \$14,000)/5 = \$77,000/5 = \$15,400.
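The sample mean uses the same arithmetic, only with different symbols. A short Python sketch using the salary data above; the standard library's `statistics.mean` does not distinguish populations from samples, so it serves here only as a cross-check.

```python
import statistics

salaries = [15_000, 15_000, 17_000, 16_000, 14_000]  # sample of n = 5 salaries (dollars)

# Sample mean X-bar = (sum of X) / n: same arithmetic as the population mean.
x_bar = sum(salaries) / len(salaries)

# Cross-check against the standard-library function.
assert x_bar == statistics.mean(salaries)
print(x_bar)  # 15400.0
```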

Properties of the Arithmetic Mean
Every set of interval- and ratio-level data has a mean. All the values are included in computing the mean. A data set has only one unique mean. The mean is a useful measure for comparing two or more populations (we'll discuss this one later when we get to t-tests and analysis of variance), and…

The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero. (This follows from the definition: Σ(X - μ) = ΣX - Nμ = ΣX - ΣX = 0.) These properties, particularly the last one (that the sum of the deviations of each value from the mean is always zero), make the mean a valuable number in other calculations, such as the variance and standard deviation. Because the raw deviations always add up to zero, their sum cannot describe spread; the variance therefore SQUARES each deviation from the mean, so that positive and negative deviations no longer cancel out.

PRACTICE
What is the mean of 3, 8, and 4? The mean is (3 + 8 + 4)/3 = 15/3 = 5.
Illustrate the fifth property: (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0.
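The zero-sum property can be checked directly. This Python sketch uses exact fractions (`fractions.Fraction`) so that floating-point round-off cannot blur the exact zero; using `Fraction` is our own choice, made for illustration.

```python
from fractions import Fraction

values = [3, 8, 4]
mean = Fraction(sum(values), len(values))   # exact arithmetic: 15/3 = 5

deviations = [x - mean for x in values]     # deviation of each value from the mean
print([int(d) for d in deviations])         # [-2, 3, -1]
print(sum(deviations))                      # 0
```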

The Median
Median: the midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest (the 50th percentile). There are as many values above the median as below it in the data array. Note: for an even number of values, the median is the arithmetic mean of the two middle values.

PRACTICE
Compute the median for the following data:
The ages of a sample of five college students are 21, 25, 19, 20, and 22. Arranging the data in ascending order gives 19, 20, 21, 22, 25. Thus the median is 21.
The heights of four basketball players, in inches, are 76, 73, 80, and 75. Arranging the data in ascending order gives 73, 75, 76, 80. Since there is an even number of values, the median is (75 + 76)/2 = 75.5.
What does the median tell you? In the first example, it tells you that the ‘middle’ age is 21, but it contains no information about the other ages.
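The ordering step and the even/odd rule can be written as a small function. A minimal Python sketch; `median` here is a hand-written illustration of the definition above (the standard library's `statistics.median` follows the same rule).

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    if not values:
        raise ValueError("the median is undefined for an empty data set")
    ordered = sorted(values)      # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd n: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values


print(median([21, 25, 19, 20, 22]))  # 21
print(median([76, 73, 80, 75]))      # 75.5
```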

Properties of the Median
There is a unique median for each data set. It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur. It can be computed for ratio-level, interval-level, and ordinal-level data. It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class. The median is never used to describe NOMINAL DATA. (Remember that nominal is the level of measurement where numbers represent different categories that are mutually exclusive and exhaustive; the categories have no logical order, so the data cannot be arranged from smallest to largest.) The median is often preferred for describing ORDINAL DATA (the level of measurement that represents rank, where the differences between values are not equal; ordinal categories are mutually exclusive and exhaustive). Computing the median ignores some important information available in the data, since it reflects only the order of the scores; it does not consider how far apart their values are.
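To see why extreme values do not move the median, compare the mean and median when one value is made very large. In this Python sketch the outlier (an age of 250) is purely hypothetical, standing in for a data-entry error in the student-age example.

```python
import statistics

ages = [19, 20, 21, 22, 25]
ages_with_outlier = [19, 20, 21, 22, 250]   # hypothetical: the age 25 mistyped as 250

print(statistics.mean(ages), statistics.median(ages))                            # 21.4 21
print(statistics.mean(ages_with_outlier), statistics.median(ages_with_outlier))  # 66.4 21
```

The mean jumps by 45 years because every value enters its calculation, while the median stays at 21 because only the middle position matters.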

The Mode
The mode is the value of the observation that appears most frequently.
EXAMPLE 6: the exam scores for ten students are 83, 89, 84, 75, 99, 87, 83, 75, 83, 87. Find the mode. Since the score of 83 occurs most often (three times), the modal score is 83. The mode is usually used to describe NOMINAL DATA. It does not take into account any scores other than the most frequent, and is therefore often a poor summary of the data as a whole.
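Finding the mode amounts to counting how often each score occurs. A Python sketch using the standard library's `collections.Counter` and `statistics.mode`; `statistics.multimode` (Python 3.8 or later) returns every value tied for most frequent, which matters when a data set has more than one mode.

```python
from collections import Counter
import statistics

exam_scores = [83, 89, 84, 75, 99, 87, 83, 75, 83, 87]

counts = Counter(exam_scores)              # frequency of each score
print(counts.most_common(3))               # [(83, 3), (75, 2), (87, 2)]
print(statistics.mode(exam_scores))        # 83
print(statistics.multimode(exam_scores))   # [83]
```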

Measures of Dispersion
Calculating Data SPREAD
The mean, median, and mode are measures of ‘central tendency’: they indicate different ‘center points’ of the data. But on their own, they tell you little about how the data are spread out. Various ‘measures of dispersion’ fill that gap, from the simple range (highest value minus lowest value) to the variance and the standard deviation.

The range is the difference between the highest and lowest values in a set of data:
RANGE = Highest Value - Lowest Value
PRACTICE: A sample of five accounting graduates revealed the following starting salaries: \$22,000, \$28,000, \$31,000, \$23,000, \$24,000. The range is \$31,000 - \$22,000 = \$9,000. The range is a simple measure of dispersion: it depends only on the highest and lowest values and contains no information about where the data tend to cluster.
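The range needs only the largest and smallest values. A one-step Python sketch with the starting-salary data; the variable is named `data_range` (our own name) to avoid shadowing Python's built-in `range`.

```python
salaries = [22_000, 28_000, 31_000, 23_000, 24_000]  # starting salaries (dollars)

# RANGE = highest value - lowest value
data_range = max(salaries) - min(salaries)
print(data_range)  # 9000
```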

Population Variance
The population variance is the arithmetic mean of the squared deviations from the population mean:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
This is a REALLY IMPORTANT FORMULA! It will show up many times in advanced statistical procedures. It's really not a difficult formula: subtract the mean (μ, “m-yu”) from each raw score (X), square each difference, add up (Σ, sigma) the squared differences to get the sum of squares, and divide by N, the total number in the population. It is nearly meaningless to ‘interpret’ the variance on its own: it is expressed in squared units (for example, years squared) and is often a large number that tells you, the researcher, little in and of itself. However, the variance becomes very important in other calculations, such as the standard deviation, which IS a number that you, the researcher, can interpret.

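PRACTICE The ages of the Jones family are 2, 18, 34, and 42 years. What is the population variance?
The mean is μ = (2 + 18 + 34 + 42)/4 = 96/4 = 24, so σ² = [(2 - 24)² + (18 - 24)² + (34 - 24)² + (42 - 24)²]/4 = (484 + 36 + 100 + 324)/4 = 944/4 = 236.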
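The definitional formula can be followed step by step: find μ, take each deviation, square it, sum, and divide by N. A minimal Python sketch; `population_variance` is an illustrative name of our own, not a library function.

```python
def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N  (definitional formula; illustrative helper)."""
    n = len(values)
    mu = sum(values) / n                        # population mean
    return sum((x - mu) ** 2 for x in values) / n


ages = [2, 18, 34, 42]   # the Jones family
print(population_variance(ages))  # 236.0
```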

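Alternative formulas for the population variance are:
$$\sigma^2 = \frac{\sum X^2}{N} - \mu^2 \qquad \sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$
These may look like more difficult formulas, but, like the weighted mean, they are ‘computational formulas’: they work directly from the raw scores and their squares, without computing every deviation first, which is useful if you have to do your calculations by hand. The good news is that with SPSS and other statistical packages, we rarely have to calculate these formulas by hand anymore.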
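Both routes should give the same answer. This Python sketch applies the computational formula σ² = (ΣX² - (ΣX)²/N)/N to the Jones family ages; `population_variance_computational` is a hypothetical helper written for this example.

```python
def population_variance_computational(values):
    """sigma^2 = (sum(X^2) - (sum(X))^2 / N) / N  (computational formula; illustrative helper)."""
    n = len(values)
    sum_x = sum(values)                    # sum of X      -> 96 for the Jones ages
    sum_x2 = sum(x * x for x in values)    # sum of X^2    -> 3248 for the Jones ages
    return (sum_x2 - sum_x ** 2 / n) / n


ages = [2, 18, 34, 42]
print(population_variance_computational(ages))  # 236.0
```

One caution with ordinary floating-point arithmetic: when the values are large compared with their spread, subtracting two nearly equal sums can lose precision, which is one reason software often avoids this formula even though it is convenient by hand.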

The Population Standard Deviation
The population standard deviation (σ) is the square root of the population variance. For the PRACTICE problem, what is the standard deviation of the ages? The variance is 236, so the population standard deviation is σ = √236 ≈ 15.36 years. Remember the difference between a population and a sample: one is the WHOLE GROUP you are trying to generalize to, the other is a PORTION OF THAT GROUP. Whenever you are dealing with numbers that describe the POPULATION, they are called PARAMETERS. PARAMETERS are usually represented by Greek letters. In the case of the population standard deviation, the symbol is σ (the lower-case sigma, as opposed to the upper-case sigma, Σ, which means ‘sum’).
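The standard deviation is one more step: a square root. A Python sketch for the Jones family ages, cross-checked against the standard library's `statistics.pstdev`, which divides by N (the population formula).

```python
import math
import statistics

ages = [2, 18, 34, 42]
sigma = math.sqrt(236)                                # square root of the population variance
print(round(sigma, 2))                                # 15.36
print(math.isclose(sigma, statistics.pstdev(ages)))   # True
```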

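Sample Variance
The sample variance, s², estimates the population variance. Dividing by n - 1 instead of n makes s² an unbiased estimate of σ².
Conceptual formula:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Computational formula:
$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$$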

PRACTICE
A sample of five hourly wages for various jobs on campus is: \$7, \$5, \$11, \$8, \$6. Find the variance. Here ΣX = 37 and ΣX² = 295, so by the computational formula
$$s^2 = \frac{295 - \frac{37^2}{5}}{5 - 1} = \frac{295 - \frac{1369}{5}}{5 - 1} = \frac{21.2}{4} = 5.3$$
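The sample-variance calculation can be checked both ways. This Python sketch computes s² with the conceptual and computational formulas, then compares against the standard library's `statistics.variance`, which divides by n - 1. Results are rounded for display because floating-point subtraction can leave tiny round-off errors.

```python
import math
import statistics

wages = [7, 5, 11, 8, 6]   # hourly wages (dollars)
n = len(wages)
x_bar = sum(wages) / n     # sample mean, 7.4

# Conceptual formula: squared deviations from the sample mean, divided by n - 1
s2_conceptual = sum((x - x_bar) ** 2 for x in wages) / (n - 1)

# Computational formula: (sum of X^2 - (sum of X)^2 / n) / (n - 1)
s2_computational = (sum(x * x for x in wages) - sum(wages) ** 2 / n) / (n - 1)

print(round(s2_conceptual, 4), round(s2_computational, 4))  # 5.3 5.3
print(statistics.variance(wages))                            # 5.3
```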

Sample Standard Deviation
The sample standard deviation is the square root of the sample variance. Find the sample standard deviation for the hourly-wage sample above: s² = 5.3, so s = √5.3 ≈ 2.30.
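A Python sketch of this last step, formatted to two decimal places as in the example; `statistics.stdev` is the standard library's n - 1 (sample) standard deviation, used here as a cross-check.

```python
import math
import statistics

wages = [7, 5, 11, 8, 6]
s = math.sqrt(statistics.variance(wages))         # square root of s^2 = 5.3
print(f"{s:.2f}")                                 # 2.30
print(math.isclose(s, statistics.stdev(wages)))   # True
```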

Interpretation and Uses of the Standard Deviation
Empirical Rule: for any symmetrical, bell-shaped distribution, approximately 68% of the observations lie within ±1 SD of the mean, approximately 95% lie within ±2 SD of the mean, and approximately 99.7% lie within ±3 SD of the mean. The bell-shaped curve is an important concept in statistics. It means that the data are symmetrically distributed, with most values clustering around the mean (in a true bell curve, the MEAN, MEDIAN, and MODE are the same number) and becoming less frequent toward either tail.
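The 68/95/99.7 percentages are rounded values for the normal distribution, the idealized bell curve. Assuming the data are exactly normal, the share of observations within ±k standard deviations of the mean equals erf(k/√2), which this Python sketch evaluates with `math.erf`; `share_within_k_sd` is an illustrative name of our own.

```python
import math


def share_within_k_sd(k):
    """Proportion of a normal distribution lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, f"{share_within_k_sd(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```

The unrounded figures are about 68.27%, 95.45%, and 99.73%, which is why the rule says "approximately". For real data that are only roughly bell-shaped, the actual shares will differ somewhat.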

[Figure: bell-shaped curve with the horizontal axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ]
