CHAPTER 3 DESCRIPTIVE STATISTICS


1 CHAPTER 3 DESCRIPTIVE STATISTICS
This chapter begins our discussion of descriptive statistics. It focuses primarily on measures of central tendency. Later chapters will focus on other descriptive statistics. We have already seen how group patterns can be made clearer by sorting raw data to form frequency distributions and how to represent such distributions by graphing them. Frequency distribution patterns have three features: 1) Form or shape (symmetrical/skewed distributions) 2) The frequency data cluster around a central value (topic of this chapter) 3) Frequency data can be characterized by their spread or dispersion (topic of next chapter)

2 CHAPTER 3 DESCRIPTIVE STATISTICS
Often, however, we are not interested in group patterns but instead want to characterize a group as a whole. Frequency distributions organize data, but they do not summarize the group with a single value. We often need a single, concise, and more convenient value to summarize a distribution of data. Why a single summary measure? We might have questions such as: What is the average annual income of police officers? What is the average number of inmate assaults on staff? Such questions ask for a single number that will best represent a whole distribution of measurements. Where in a distribution will this representative number usually be located? Near the center, where the measures tend to be concentrated, rather than at either extreme, where typically only a few measures fall. From this fact comes the term “measure of central tendency.” There are three measures of central tendency.

3 Learning Objectives: By the end of this chapter, you will be able to:
1. Explain the purposes of measures of central tendency and interpret the information they convey.
2. Calculate, explain, and compare and contrast the mode, median, and mean.
3. Explain the mathematical characteristics of the mean.
4. Select an appropriate measure of central tendency according to level of measurement and skew.

4 Three measures: Mode: The most common score.
Median: The score of the middle case. Mean: The average score. They are the mode, the median, and the mean.

5 Mode The most common score.
Can be used with variables at all three levels of measurement. Most often used with nominal-level variables. Let’s start with the mode. The mode is simply the most frequently occurring score in the distribution. It is also the most versatile measure of central tendency in that it can be used with all of the different levels of measurement.

6 Finding the Mode Count the number of times each score occurred.
The score that occurs most often is the mode. If the variable is presented in a frequency distribution, the mode is the largest category. If the variable is presented in a line chart, the mode is the highest peak. Here is how you find the mode…

7 Finding the Mode
“People should live together before marriage.”
Response   Freq.   %
Agree      864     58.98
Neutral    227     15.49
Disagree   374     25.53
Total      1465    100.00
Here’s an example. The mode is “Agree,” the category with the largest frequency.

8 Mode Most frequent score: 68, 92, 92, 108, 110
In this set of scores (68, 92, 92, 108, 110), the mode is 92. Although the mode is the least used measure of central tendency, it has some advantages:
- It is appropriate when you need a quick, rough estimate of central tendency.
- It is versatile: it can be used with all levels of measurement, and it is the only measure of central tendency that can be used with nominal variables.
- It is not affected by extreme scores.
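As a minimal sketch of the counting procedure (not part of the original slides), the mode of the five scores above can be found by tallying how often each score occurs. The function name `mode` is chosen here for illustration only.

```python
from collections import Counter


def mode(scores):
    """Return the most frequent score (the first one seen if there is a tie)."""
    counts = Counter(scores)  # tally how many times each score occurs
    score, _frequency = counts.most_common(1)[0]  # highest tally
    return score


scores = [68, 92, 92, 108, 110]
print(mode(scores))  # 92
```

Because the function only counts occurrences, it works equally well on nominal categories such as "Agree" or "Disagree".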

9 Mode Most frequent score
Here’s another example that looks at the variable “Crime Type.”
Crime Type    Frequency
Felony        20
Misdemeanor   60
Other         137
Total         217
What is the mode in this example? The single largest category, which is “Other.”

10 Gender Frequency
Keep in mind that the mode has some limitations. 1) First, some distributions may not have a mode.
Gender   Frequency (f)
Male     20
Female   20
Total    40
Note that there is no single most frequent score.
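The two frequency tables (crime type and gender) can be contrasted with a short sketch. It assumes the frequency distribution is held in a dictionary, and the helper name `modes` is hypothetical; it returns every category tied for the highest frequency, so a result with more than one entry signals that there is no single mode.

```python
def modes(freq):
    """Return every category that shares the highest frequency."""
    highest = max(freq.values())
    return [category for category, f in freq.items() if f == highest]


crime = {"Felony": 20, "Misdemeanor": 60, "Other": 137}
gender = {"Male": 20, "Female": 20}

print(modes(crime))   # ['Other']
print(modes(gender))  # ['Male', 'Female'], a tie, so no single mode
```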

11 Scores Frequency
2) Second, with some data, the mode may not be centrally located in the distribution. Example of a distribution of scores:
Score   Frequency
68      3
70      3
72      4
74      3
76      4
77      5
78      2
79      2
95      6

12 Scores Frequency
What is the mode? The mode is 95, but is this score close to the majority of scores? No. If you tried to summarize this distribution by reporting only the mode, would you be conveying an accurate picture of the distribution as a whole? No; in some cases the mode should not be used to summarize a distribution. So the disadvantages of the mode are:
- It is the least used measure of central tendency.
- Some distributions may not have a mode.
- The mode may not be centrally located.

13 Median The score of the middle case.
Can be used with variables measured at the ordinal or interval-ratio levels. Cannot be used for nominal-level variables. A second measure of central tendency is the median. The median is defined as the score in a distribution above and below which one half of the frequencies fall. The median is the exact center of a distribution or, put another way, the score at the 50th percentile.

14 Finding the Median Array the cases from high to low.
Locate the middle case. If N is odd: the median is the score of the middle case. If N is even: the median is the average of the scores of the two middle cases. Here’s how you find the median… To find the median for an odd number of scores, simply find the middle value. To find the median for an even number of scores, get the mean of the 2 middle values.

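15 Finding the Median
Robbery Rate for 7 Cities
City            Robbery rate
Atlanta         1037.8
Chicago         668.0
Dallas          582.8
San Francisco   444.9
Los Angeles     420.2
Boston          416.0
New York        406.6
The median in this example is the score of the middle (fourth) case: 444.9, for San Francisco. Why is it so easy to find? Because N is odd and the cases are already arrayed from high to low.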
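The odd/even rule for finding the median can be sketched as follows. This is an illustrative helper, not code from the slides; it sorts from low to high, which gives the same middle case as arraying from high to low.

```python
def median(scores):
    """Score of the middle case, or the average of the two middle cases."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # N is odd: the median is the score of the middle case
        return ordered[middle]
    # N is even: average the scores of the two middle cases
    return (ordered[middle - 1] + ordered[middle]) / 2


robbery = {
    "Atlanta": 1037.8, "Chicago": 668.0, "Dallas": 582.8,
    "San Francisco": 444.9, "Los Angeles": 420.2,
    "Boston": 416.0, "New York": 406.6,
}

print(median(robbery.values()))          # 444.9
print(median([7, 8, 9, 10, 11, 12]))     # 9.5
```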

16 Finding the Median How would the median change if we added an 8th case? San Diego had a robbery rate of There are now two middle cases, so the median is the average of the scores of the two middle cases: ( )/2 =

17 Median Is the exact center of a distribution
7, 8, 9, 10, 11: median = 9
7, 8, 9, 10, 11, 12: median = 9.5
Median position = (N + 1) / 2; for N = 6, (6 + 1) / 2 = 3.5
Here’s another example. We have 5 scores: 7, 8, 9, 10, 11. What is the median? 9. To find the median for an even number of scores, such as 7, 8, 9, 10, 11, 12, get the mean of the 2 middle values: (9 + 10) / 2 = 9.5. As a general rule, (N + 1) / 2 tells you the median POSITION, not the median itself: (6 + 1) / 2 = 7/2 = 3.5, so the median lies halfway between the 3rd and 4th cases.
Advantages of the median:
- It is a positional measure, since it locates the position of a case relative to the positions of other cases.
- It can be used with ordinal or interval-ratio scales.
- It is not affected by extreme scores, which is why it is most appropriate when a distribution shows peculiarities.
Disadvantages of the median:
- Once the median and mode are computed, little more can be done with them.
- It does not enter into advanced statistical analysis.
- It cannot be calculated with nominal data, although it can be used with both ordinal and interval-ratio data.
In addition to serving as a measure of central tendency, the median is also a member of a class of statistics that measure position or location.

18 Other Measures of Position
Percentiles divide a distribution into 100 parts. If a score of 284 is reported as being at the 93rd percentile, what does this mean? 93% of the scores were lower than 284.
Deciles divide a distribution into 10 parts.
Quartiles divide a distribution into 4 parts.
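To make the percentile interpretation concrete, here is a small sketch that computes the percentage of cases scoring below a given score. The function name `percentile_rank` and the data (the whole numbers 1 through 100) are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def percentile_rank(scores, score):
    """Percentage of cases whose scores are strictly lower than `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical data: 100 cases with scores 1, 2, ..., 100
scores = list(range(1, 101))
print(percentile_rank(scores, 94))  # 93.0, so 93% of the scores are lower than 94
```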

19 Deciles Deciles- divide a distribution into 10 parts

20 Quartiles
Quartiles divide a distribution into 4 parts (1, 2, 3, 4). The 1st quartile is the 25th percentile: 25% of the cases had scores lower than the score associated with the 25th percentile.

21 What score corresponds to the 50th percentile?
With 9 cases, what score is the median, or 50th percentile? 9 x .5 = 4.5th case, or a score of 10.
What score corresponds to the 25th percentile? 9 x .25 = 2.25th case, or a score of 9.
What score corresponds to the 8th decile? 9 x .8 = 7.2th case, or a score of 11.

22 Mean Is the arithmetic average
$\bar{X} = \frac{\sum X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$
The arithmetic mean is a measure with which most of you are already familiar, a measure popularly known as the “average.” The use of the mean to summarize a group is associated with Adolphe Quetelet (early 19th century), who coined the term “average man” to describe a characteristic of a social aggregate. The “average man” was the first term that allowed people to talk about Man without mentioning men. The “average man” was an abstraction derived by taking the average of each important characteristic of all members of a group.

23 Mean The average score. Requires variables measured at the interval-ratio level but is often used with ordinal-level variables. Cannot be used for nominal-level variables. The mean is the sum of the scores or values of a variable divided by the number of scores. Because the calculation involves addition and division, interval-ratio-level data are required. However, researchers often calculate the mean for variables measured at the ordinal level because the mean is much more flexible than the median.

24 Finding the Mean The mean, or arithmetic average, is by far the most commonly used measure of central tendency. The mean reports the average score of a distribution. The calculation is straightforward: add the scores and then divide by the number of scores (N). In computing the mean we must locate two values: the sum of all the scores and the number of scores present. We then divide the total by the number of scores or cases.

25 Finding the Mean
Robbery Rate for 7 Cities
City            Robbery rate
Atlanta         1037.8
Chicago         668.0
Dallas          582.8
San Francisco   444.9
Los Angeles     420.2
Boston          416.0
New York        406.6
Total           3976.3
Here’s an example. We first find the total of all the scores: 1037.8 + 668.0 + 582.8 + 444.9 + 420.2 + 416.0 + 406.6 = 3976.3.

26 Finding the Mean
The mean is 3976.3/7 = 568.04. Divide the total of the scores by the number of scores: 3976.3/7 = 568.04. The result tells us that these cities averaged 568.04 robberies per 100,000 population.
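The two-step calculation (total the scores, then divide by N) can be checked with a few lines of code. This is only an illustrative sketch using the seven robbery rates from the example; the variable names are arbitrary.

```python
# Robbery rates per 100,000 population for the 7 cities in the example
rates = [1037.8, 668.0, 582.8, 444.9, 420.2, 416.0, 406.6]

total = sum(rates)         # step 1: add the scores
mean = total / len(rates)  # step 2: divide by the number of scores (N)

print(round(total, 1))  # 3976.3
print(round(mean, 2))   # 568.04
```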

27 Properties of the mean
1. Stable
2. Is the point around which all of the scores cancel out: $\sum (X_i - \bar{X}) = 0$
Since the mean is the most popular measure of central tendency, let’s look at its properties very closely. 1) The mean is more stable than the median or mode; i.e., it will vary less than the mode or median from sample to sample. It is the most stable, or reliable, measure of central tendency. For example, if we took 50 separate random samples from a population of 1000, the mean would show less fluctuation than any other measure of central tendency. 2) The mean is a point of balance where the sum of deviations of the scores above the mean is equal in absolute value (without regard to + or – signs) to the sum of deviations below the mean.

28 The mean indicates the central tendency of a variable, but “central” has a special meaning. Suppose we have a teeter-totter and each square represents a child of a certain weight sitting on the teeter-totter. The mean is the point at which a fulcrum placed under the teeter-totter would make it balance. Simply put, the mean is the point around which all of the scores ($X_i$) cancel out. In other words, the mean is a point of balance where the sum of deviations of the scores above the mean is equal in absolute value (without regard to + or – signs) to the sum of deviations below the mean. Scores greater than the mean are plus deviations. Scores less than the mean are minus deviations. If we add the plus deviations (those above the mean) and the minus deviations (those below it), the two sums cancel each other out so that the sum of the total deviations is 0 ($\sum (X_i - \bar{X}) = 0$). Notice that the mean is sensitive to cases, or children, with large values or weights. Note: the mean is very sensitive to outliers or extreme scores.
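The balance-point property can be verified numerically. The sketch below uses the scores 7, 8, 9, 10, 11 from the median example rather than the teeter-totter weights, which are not given in the text; the variable names are illustrative.

```python
scores = [7, 8, 9, 10, 11]
mean = sum(scores) / len(scores)

# Deviation of each score from the mean (plus above, minus below)
deviations = [x - mean for x in scores]

above = sum(d for d in deviations if d > 0)  # sum of plus deviations
below = sum(d for d in deviations if d < 0)  # sum of minus deviations

print(mean)             # 9.0
print(above, below)     # 3.0 -3.0
print(sum(deviations))  # 0.0
```

With real-valued data the sum may differ from zero by a tiny floating-point rounding error, so a tolerance check such as `math.isclose(total, 0, abs_tol=1e-9)` is safer than testing for exact equality.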

29 Properties of the mean
1. Stable
2. Is the point around which all of the scores cancel out
3. Sum of squares of deviations from the mean is less than the sum of squares of deviations about any other score: $\sum (X_i - \bar{X})^2 = \text{minimum}$
Third property: 3) The sum of squares of deviations from the arithmetic mean is less than the sum of squares of deviations about any other score. If you take each score in a distribution, get the difference between it and the mean, square it, and sum those squared differences, that value is less than if you had picked any other score and done the same process. What this means is that the mean is closer, in terms of squared deviations, to all of the scores taken together than any other score. This property becomes more important when we take up later statistical topics.

30 In the first teeter-totter, if you take each score in the distribution, get the difference between it and the mean, square it, and sum those squared differences, you get a value of 108. In the second teeter-totter, if you instead get the difference between each score and some other score, for example 49, square it, and sum those squared differences, you get a value of 116. 108 is less than 116, and the value obtained from the mean will always be less than if you had picked any other score and done the same process. As mentioned earlier, what this means is that the mean is closer, in terms of squared deviations, to all of the scores taken together than any other score. This property becomes more important when we take up later statistical topics.
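The least-squares property can be illustrated with a short sketch. The teeter-totter scores behind the values 108 and 116 are not given in the text, so the data below are hypothetical and will not reproduce those numbers; the helper name `sum_of_squares` is also an assumption.

```python
def sum_of_squares(scores, point):
    """Sum of squared deviations of the scores about `point`."""
    return sum((x - point) ** 2 for x in scores)


# Hypothetical scores, chosen only for easy arithmetic
scores = [2, 4, 6, 8, 10]
mean = sum(scores) / len(scores)  # 6.0

print(sum_of_squares(scores, mean))  # 40.0
print(sum_of_squares(scores, 5))     # 45
print(sum_of_squares(scores, 7))     # 45
```

Moving the reference point away from the mean in either direction increases the sum of squared deviations, which is the sense in which the mean is "closest" to all of the scores.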

31 Properties of the mean
1. Stable
2. Is the point around which all of the scores cancel out
3. Sum of squares of deviations from the mean is less than the sum of squares of deviations about any other score: $\sum (X_i - \bar{X})^2 = \text{minimum}$
4. Is affected by every score in the distribution
The last important characteristic of the mean is that it is affected by every score in the distribution. This quality is both an advantage and a disadvantage. It is an advantage because the mean uses all of the information in the distribution (every score affects it). It can be a disadvantage because the mean can be misleading when it is affected by extreme scores.
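To see how one extreme score affects the mean but barely moves the median, the sketch below recomputes both measures for the robbery rates with and without Atlanta, the highest case. It relies on Python's standard `statistics` module; treating Atlanta as the "extreme score" is an illustrative choice, not something the slides state.

```python
import statistics

rates = {
    "Atlanta": 1037.8, "Chicago": 668.0, "Dallas": 582.8,
    "San Francisco": 444.9, "Los Angeles": 420.2,
    "Boston": 416.0, "New York": 406.6,
}

all_rates = list(rates.values())
# Drop the most extreme case to see how much each measure shifts
without_atlanta = [r for city, r in rates.items() if city != "Atlanta"]

for label, data in [("all 7 cities", all_rates), ("without Atlanta", without_atlanta)]:
    print(label, round(statistics.mean(data), 2), round(statistics.median(data), 2))
```

Removing the single extreme case lowers the mean by roughly 78 robberies per 100,000, while the median drops by only about 12.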

32 Symmetrical Distribution
Mean Median Mode
Note that in a perfectly symmetrical distribution with a single peak (e.g., a normal distribution), all of the measures of central tendency fall at the exact same place (i.e., in the middle).

33 Positive Skew
Mode Median Mean
When the mean is higher than the median, the distribution is positively skewed. In a skewed distribution, the relative positions of the three averages are generally predictable:
- The mode will be under the peak of the distribution.
- The mean will be pulled out in the direction of the skew (i.e., to the positive side, or to the right).
- The median will be in between the mode and the mean.

34 Negative Skew
Mean Median Mode
When the mean is lower than the median, the distribution is negatively skewed. Note that:
- The mode is under the peak of the distribution.
- The mean will be pulled out in the direction of the skew (i.e., to the negative side, or to the left).
- The median will be in between the mode and the mean.
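The mean-versus-median comparison can be turned into a rough check for the direction of skew. This is a simplified sketch: the function name `skew_direction` is hypothetical, and equal mean and median is reported only as "no skew indicated" because that comparison alone does not prove a distribution is symmetrical.

```python
import statistics


def skew_direction(scores):
    """Compare mean and median to suggest the direction of skew."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "positive"  # mean pulled toward the high (right) tail
    if mean < median:
        return "negative"  # mean pulled toward the low (left) tail
    return "no skew indicated"


robbery_rates = [1037.8, 668.0, 582.8, 444.9, 420.2, 416.0, 406.6]
print(skew_direction(robbery_rates))       # positive
print(skew_direction([1, 8, 9, 10, 11]))   # negative
print(skew_direction([7, 8, 9, 10, 11]))   # no skew indicated
```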

35 Relationship Between LOM and Measures of Central Tendency
Measure   Nominal   Ordinal   Interval-ratio
Mode      Yes       Yes       Yes
Median    No        Yes       Yes
Mean      No        Yes (?)   Yes
Here’s a matrix that sums up the relationship between levels of measurement (LOM) and the measures of central tendency.

36 Use Mode When:
The mode is most appropriate when:
- Variables are measured at the nominal level.
- You want a quick and easy measure for ordinal and interval-ratio variables.
- You want to report the most common score.

37 Use Median When:
The median is most appropriate when:
- Variables are measured at the ordinal level.
- Variables measured at the interval-ratio level have highly skewed distributions.
- You want to report the central score. The median always lies at the exact center of a distribution.

38 Use Mean When:
The mean is most appropriate when:
- Variables are measured at the interval-ratio level (except for highly skewed distributions).
- You want to report the typical score. The mean is “the fulcrum that exactly balances all of the scores.”
- You anticipate additional statistical analysis.
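The selection guidelines on the last three slides can be condensed into a small decision function. This is a deliberately simplified sketch: the function name, the level labels, and the `highly_skewed` flag are assumptions made for illustration, and it returns only the primary recommendation for each level of measurement.

```python
def recommend_measure(level, highly_skewed=False):
    """Suggest a measure of central tendency from level of measurement and skew."""
    level = level.lower()
    if level == "nominal":
        return "mode"  # the only measure usable with nominal data
    if level == "ordinal":
        return "median"
    if level == "interval-ratio":
        # The mean is preferred unless the distribution is highly skewed
        return "median" if highly_skewed else "mean"
    raise ValueError(f"unknown level of measurement: {level!r}")


print(recommend_measure("nominal"))                             # mode
print(recommend_measure("ordinal"))                             # median
print(recommend_measure("interval-ratio"))                      # mean
print(recommend_measure("interval-ratio", highly_skewed=True))  # median
```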

39 Summing Up
The mean is the preferred statistic for representing central tendency. Why?
- It is more easily manipulated (the mean is used in further statistical operations).
- Since it is based on interval-level data, it can be used in advanced statistical analysis.
- It uses more information than the median (exact scores vs. only the positions of the scores).
- It is more stable than the median (it varies less from sample to sample).
- It is the preferred measure of central tendency if the distribution is not skewed.
When in doubt, use the mean instead of the median, but whenever a distribution is highly skewed, the median will generally be more appropriate than the mean, because the mean is pulled in the direction of skewness, or the tail.

40 Normal Distribution The last topic of this chapter pertains to something known as “kurtosis.” To understand kurtosis, first note where the curve of a normal distribution meets the horizontal axis.

41 Positive Kurtosis If a larger proportion of cases falls into the tails of a distribution than into those of a normal distribution, the distribution has positive kurtosis. Note the space now between the ends of the curve and the horizontal axis.

42 Negative Kurtosis Conversely, if a smaller proportion of cases falls into the tails of a distribution than into those of a normal distribution, the distribution has negative kurtosis.
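The slides describe kurtosis only in words. As a hedged illustration, the sketch below uses one common moment-based formula, excess kurtosis (the fourth central moment divided by the squared second central moment, minus 3), which is about 0 for a normal distribution. The formula choice, the function name, and both data sets are assumptions introduced here, not taken from the chapter.

```python
def excess_kurtosis(scores):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (about 0 for a normal curve)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Hypothetical data: most cases at the center, a few far out in the tails
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]
# Hypothetical data: cases spread evenly, with nothing far out in the tails
flat = [1, 2, 3, 4, 5]

print(excess_kurtosis(heavy_tails) > 0)  # True: positive kurtosis
print(excess_kurtosis(flat) < 0)         # True: negative kurtosis
```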

43 END CHAPTER 3

