Averages Dr. Richard Jackson

1 Averages
Dr. Richard Jackson, jackson_r@mercer.edu
© Mercer University 2005. All Rights Reserved. RJ Seg 3
In this module we will take a look at three descriptive statistics, specifically averages. We will examine each of these three averages and how they are calculated from raw data and from grouped data.

2 Averages
A number that represents a large number of subjects
Three averages: mean, median, mode
How are they calculated? When are they used, and misused?
An average is a number that represents a large number of subjects. There are three averages, and we will look at how each is calculated and when it is used or misused. The three averages are the mean, the median, and the mode, and we will discuss them in that order.

3 Mean
Arithmetic average
Symbol: $\bar{X}$, pronounced “X bar”
$X$ by itself is the symbol for a datum or data
Let's begin our discussion with the most often used average, the mean. The mean is the arithmetic average, and it is the one you are probably most familiar with. The symbol for the mean is a capital letter X with a line over it, pronounced in the statistical vernacular as “X bar”. A capital letter X by itself is the statistical symbol for an individual piece of data.

4 Mean Calculation: Raw data
$\bar{X} = \frac{\sum X}{N} = \frac{1324.5}{90} = 14.717$
Hemoglobin concentrations in mg/dl in ninety adult women, ages 30-45:
15.2 16.2 13.0 12.6 12.8 16.1 16.4 17.8 17.5 13.6 14.5 12.4 14.7 13.2 17.1 12.7 15.0 12.3 15.1 16.9 14.2 14.4 14.0 15.7 16.0 15.9 14.8 14.9 13.5 15.5 16.7 15.3 17.0 13.3 13.8 17.3 13.9 16.5 14.1 14.3 12.1 13.7 15.6 14.6 13.4 15.8 15.4
Let's begin with the calculation of the mean using raw data. The formula is X bar equals the sum of X divided by N. This is the first real formula that we have encountered, and I should point out that on any exam you will be given any and all formulas that you might need. There is no need to memorize formulas, and if you need one that is not given on the exam, please feel free to ask during the exam. For our example we will use the data we have already seen, the hemoglobin concentrations of 90 women. The sum of X is simply the sum of all the individual values, or 1,324.5, and N of course symbolizes the number of subjects in the sample. So the mean from these raw data is 1,324.5 divided by 90, or 14.717.
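
The raw-data calculation can be sketched in a few lines of Python. This is only an illustration: the function names `mean` and `mean_from_total` are made up for this example, and the total 1324.5 and N = 90 are the figures quoted on the slide.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


def mean_from_total(total, n):
    """Mean when only the sum of X and the number of subjects N are known."""
    return total / n


# Sum of X and N as quoted on the slide.
print(round(mean_from_total(1324.5, 90), 3))

# The same formula applied to the first four listed values.
print(mean([15.2, 16.2, 13.0, 12.6]))
```

Both functions apply the same formula; the second form is convenient here because the slide reports the sum of X directly.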

5 Mean Calculation: Grouped data
The Frequency Distribution
Select an arbitrary reference interval (it could be any interval)
Construct x'
Construct fx'
Class Interval    f     x'    fx'
12.0-12.4         3     -5    -15
12.5-12.9         3     -4    -12
13.0-13.4         5     -3    -15
13.5-13.9        11     -2    -22
14.0-14.4        14     -1    -14
14.5-14.9        20      0      0
15.0-15.4        13     +1    +13
15.5-15.9         8     +2    +16
16.0-16.4         5     +3    +15
16.5-16.9         3     +4    +12
17.0-17.4         3     +5    +15
17.5-17.9         2     +6    +12
Calculating the mean from grouped data is somewhat different. We first look at the grouped data, in other words the frequency distribution, and choose any interval as an arbitrary reference interval. For our example we will choose the interval 14.5 to 14.9. We then construct another column, labeled little x prime. The numbers in this column indicate how many intervals each interval lies below or above the reference interval. For example, the interval 14.0 to 14.4 is one interval below our arbitrary reference interval, so it is assigned -1; the next lowest interval is assigned -2, and so forth. The interval 15.0 to 15.4 is one interval above the reference interval, so it is assigned +1; the others are +2, +3, and so forth. The next column, labeled f x prime, is the product of the little f column and the little x prime column. It is obtained by multiplying, for each interval, the interval frequency by the value of x prime.

6 Mean Calculation: Grouped data
Formula: $\bar{X} = M' + \left(\frac{\sum fx'}{N}\right)(i)$
Sum of fx': $\sum fx' = 5$
We can now calculate the mean from grouped data using the appropriate formula. First we sum the f x prime column, that is, we determine sigma f x prime, which in this case comes out to 5. That number is then used in the formula. Note that the sigma sign belongs in the numerator of the fraction. In this formula, M prime is the midpoint of the arbitrary reference interval that we chose, i is the interval size, and sigma f x prime is the sum of the f x prime column.

7 Mean Calculation: Grouped data
Formula: $\bar{X} = M' + \left(\frac{\sum fx'}{N}\right)(i)$
$\bar{X} = 14.7 + \left(\frac{5}{90}\right)(0.5) = 14.73$
Substituting the appropriate numbers into the formula gives us the mean of the grouped data. M prime in this case is 14.7, the midpoint of our arbitrary reference interval, which if you recall was 14.5 to 14.9. The sum of f x prime is 5, N is 90, and the interval size is 0.5. Carrying out the arithmetic gives a mean of 14.73.
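
The grouped-data method can be sketched as follows. Two things here are assumptions rather than something the slides state outright: the twelve class intervals are inferred from the reference interval 14.5-14.9 and the interval size 0.5, and the frequencies are reconstructed from the cumulative frequencies shown later for the median. The function name `grouped_mean` is hypothetical.

```python
# Class intervals 12.0-12.4 ... 17.5-17.9 (inferred, see lead-in).
intervals = [(round(12.0 + 0.5 * k, 1), round(12.4 + 0.5 * k, 1)) for k in range(12)]
# Frequencies reconstructed from the cumulative frequency column (assumption).
frequencies = [3, 3, 5, 11, 14, 20, 13, 8, 5, 3, 3, 2]


def grouped_mean(intervals, frequencies, ref_index):
    """Return (mean, sum of fx') using the interval at ref_index as reference."""
    lower, upper = intervals[ref_index]
    m_prime = (lower + upper) / 2                  # midpoint of reference interval
    width = intervals[1][0] - intervals[0][0]      # interval size i
    n = sum(frequencies)
    # x' is the number of intervals above (+) or below (-) the reference.
    sum_fx = sum(f * (k - ref_index) for k, f in enumerate(frequencies))
    return m_prime + (sum_fx / n) * width, sum_fx


mean_value, sum_fx = grouped_mean(intervals, frequencies, ref_index=5)
print(sum_fx, round(mean_value, 2))

# Any other reference interval gives the same mean.
other_mean, _ = grouped_mean(intervals, frequencies, ref_index=0)
print(round(other_mean, 2))
```

Because the reference interval is arbitrary, the second call is a useful check: moving the reference changes the sum of fx' but not the resulting mean.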

8 Mean Calculation: Error of grouping
Grouped data: $\bar{X} = 14.73$
Raw data: $\bar{X} = 14.717$
The mean that we calculated for the grouped data was 14.73. The mean for the raw data was 14.717. The difference is due to what is known as the error of grouping. When we put raw data into groups, we assume that within each group the subjects are evenly distributed across the interval. For example, grouping assumes that all 20 subjects in the interval 14.5 to 14.9 are evenly distributed from 14.5 to 14.9, so that the mean of those 20 individuals is 14.7. In reality, however, it would be possible for all 20 of those individuals to have a measurement of 14.9, in which case their mean would of course not be 14.7.
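
A small sketch makes the error of grouping concrete. The two sets of 20 values below are invented for illustration and are not the actual hemoglobin readings; `grouping_error` is a hypothetical helper name.

```python
from statistics import fmean


def grouping_error(values, midpoint):
    """Actual mean of the values in one interval minus the interval midpoint."""
    return fmean(values) - midpoint


midpoint = 14.7  # midpoint of the interval 14.5-14.9

# Hypothetical case 1: the 20 values are spread evenly over the interval.
spread = [14.5, 14.6, 14.7, 14.8, 14.9] * 4
# Hypothetical case 2: all 20 values sit at the top of the interval.
extreme = [14.9] * 20

print(round(grouping_error(spread, midpoint), 3))
print(round(grouping_error(extreme, midpoint), 3))
```

In the first case the midpoint represents the interval well; in the second, every subject in that interval is misrepresented by the same amount, and that shift carries into the grouped mean.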

9 Median
Point or measure in a distribution (collection of data) where 50% of subjects are above and 50% of subjects are below
Calculation, raw data, odd number of subjects: the median is the middle number when the values are arranged in order, regardless of the magnitude of the other numbers
100, 90, 50, 40, 20: Mdn = 50
Let's take a look at the second average, known as the median. The median, as we mentioned earlier, is the point in a distribution or collection of data at which 50% of the subjects are above and 50% are below. For example, if we have an odd number of subjects, the median is the middle number. You arrange the values in increasing or decreasing order, and the middle number is the median, representing the point in the distribution where 50% of the subjects are above and 50% are below. It makes no difference what the other numbers are. In this example, if the top number were 1,000 instead of 100, the median would still be 50. If the bottom number were changed from 20 to 2, the median would still be 50.

10 Median Calculation: Raw data
Even number of measures: the median is the point midway between the two middle measures, regardless of the other numbers
100, 80, 60, 40: Mdn = 70
If we have an even number of subjects, the data are arranged in ascending or descending order and the median is the point midway between the two measures in the middle, again regardless of the other numbers. The median of these four pieces of data is 70, which is halfway between 60 and 80.
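
The odd and even cases can be written as one short function. This is a plain sketch of the rule described in the narration; Python's standard library offers `statistics.median`, which behaves the same way for these inputs.

```python
def median(values):
    """Middle value for an odd count; midpoint of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([100, 90, 50, 40, 20]))   # odd number of subjects
print(median([1000, 90, 50, 40, 2]))   # extreme values changed
print(median([100, 80, 60, 40]))       # even number of subjects
```

The second call shows the point made in the narration: changing the largest and smallest values leaves the median where it was.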

11 Median Example: Calculation from raw data
Hemoglobin concentrations in mg/dl in ninety adult women, ages 30-45 (ranked)
Even number of measures: the median is the point midway between the two middle measures, regardless of the other numbers
N = 90: 45th value = 14.6, 46th value = 14.7, Mdn = 14.65
Let's take a look at the calculation of the median using the data in our example. There are 90 subjects, an even number, so the median is the point midway between subject number 45 and subject number 46. When the data are arranged in ascending or descending order, subject number 45 has a value of 14.6 and subject number 46 has a value of 14.7. Halfway between those two is 14.65, which is the median. In other words, in this example 50% of the subjects had hemoglobin concentrations greater than 14.65 and 50% had concentrations less than 14.65.

12 Median Example: Calculation from grouped data
Class Interval    f     Cumulative Frequency
12.0-12.4         3      3
12.5-12.9         3      6
13.0-13.4         5     11
13.5-13.9        11     22
14.0-14.4        14     36
14.5-14.9        20     56
15.0-15.4        13     69
15.5-15.9         8     77
16.0-16.4         5     82
16.5-16.9         3     85
17.0-17.4         3     88
17.5-17.9         2     90
N = 90; 90 / 2 = 45; 45 - 36 = 9 of the 20 subjects in the interval 14.5-14.9
$Mdn = 14.45 + \left(\frac{9}{20}\right)(0.5) = 14.675$
Calculating the median from grouped data is a little more difficult. We make use of the cumulative frequency distribution, and we want to know the point in this distribution where 50% of the subjects are above and 50% are below. We have 90 subjects in this sample, and 90 divided by 2 is 45, so we want the point below which 45 subjects fall and above which 45 fall. If we start at the top of the cumulative frequency distribution and move downward, we see the number of subjects that fall below the upper limit of each interval. We want to reach the 45th subject and determine the hemoglobin value for that individual. Reading down, the cumulative frequency goes from 3 to 6, 11, 22, and 36, and the next interval has 20 individuals, which carries us up to 56. So we know that the 45th individual falls in the interval 14.5 to 14.9. Looking at that interval in the center of the slide, we see that it contains 20 individuals and that 36 individuals fall below its lower limit. We therefore need 9 of the 20 individuals in that interval to bring us up to the 45th individual. To determine how far to go through the interval we use the fraction 9/20. In other words, we want to go 9/20 of the way through the interval, to the point designated by the star. That is the point in the distribution where 45 of the subjects fall below and 45 above. To find it, we take 14.45, the true lower limit of the interval, and add the fraction of the interval we need to go through, that is, 9/20 times 0.5. Carrying out the arithmetic gives a value of 14.675.
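
The interpolation can be sketched as below. The frequencies are reconstructed from the cumulative frequency column, and the true lower limit is taken as the stated limit minus half a measurement unit (0.05), which is an assumption that reproduces the slide's result of 14.675. The function name `grouped_median` is hypothetical.

```python
def grouped_median(lower_limits, frequencies, width, unit=0.1):
    """Median from a frequency distribution by linear interpolation.

    lower_limits: stated lower limit of each class interval, in ascending order
    width: interval size i
    unit: measurement precision; the true lower limit is lower - unit / 2
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no subjects in the distribution")
    target = n / 2          # position of the median subject
    below = 0               # cumulative frequency below the current interval
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and below + f >= target:
            true_lower = lower - unit / 2
            return true_lower + ((target - below) / f) * width
        below += f
    raise ValueError("median interval not found")


lower_limits = [12.0 + 0.5 * k for k in range(12)]          # 12.0, 12.5, ..., 17.5
frequencies = [3, 3, 5, 11, 14, 20, 13, 8, 5, 3, 3, 2]      # reconstructed (assumption)

print(round(grouped_median(lower_limits, frequencies, width=0.5), 3))
```

The loop mirrors reading down the cumulative frequency column: it stops at the first interval whose cumulative count reaches N/2, then moves the required fraction of the way through that interval.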

13 Median and Mean: Summary
            Symbol        Raw data    Grouped data
Median      Mdn           14.65       14.675
Mean        $\bar{X}$     14.717      14.73
To summarize, the median for the grouped data was 14.675 and the median for the raw data was 14.65. I should mention that the symbol for the median is Mdn. The mean for the grouped data was 14.73 and the mean for the raw data was 14.717; its symbol, of course, is X bar.

14 Mode
Raw data: the datum that occurs most frequently (two such values: bimodal)
Grouped data: midpoint of the interval with the greatest frequency
Interval 14.5-14.9: f = 20, midpoint = 14.7
The third and last average is the mode. For raw data, the mode is defined as the datum that occurs most frequently. In our example there are two values that occur most frequently, with equal frequency: one value appears five times and 14.8 also appears five times. If the graph of a frequency distribution has two modes, or peaks, that distribution is said to be bimodal. Most frequency polygons are unimodal, meaning that they have one peak, and the peak is the mode, the value that occurs most frequently. Sometimes, however, a distribution has two peaks, and that distribution is said to be bimodal. For grouped data, the mode is the midpoint of the interval with the greatest frequency. An examination of the frequency distribution, or grouped data, reveals that the interval 14.5 to 14.9 has the greatest frequency, 20, and its midpoint is 14.7.
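
Both definitions of the mode can be sketched briefly. The short raw-data list is invented to show a bimodal case and is not the hemoglobin data; the class intervals and frequencies are reconstructed from the slides and should be read as assumptions. The names `modes` and `grouped_mode` are hypothetical.

```python
from collections import Counter


def modes(values):
    """All values that share the highest frequency (two values means bimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def grouped_mode(intervals, frequencies):
    """Midpoint of the class interval with the greatest frequency."""
    k = max(range(len(frequencies)), key=lambda idx: frequencies[idx])
    lower, upper = intervals[k]
    return (lower + upper) / 2


# Hypothetical raw data with two equally frequent values.
print(modes([14.8, 14.8, 15.1, 15.1, 13.0]))

intervals = [(round(12.0 + 0.5 * k, 1), round(12.4 + 0.5 * k, 1)) for k in range(12)]
frequencies = [3, 3, 5, 11, 14, 20, 13, 8, 5, 3, 3, 2]
print(round(grouped_mode(intervals, frequencies), 1))
```

Returning a list from `modes` keeps the bimodal case visible instead of silently picking one of the tied values.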

15 Normal Distribution
If data are normal, the averages will be similar
If data are skewed, the averages will be different
[Figure: frequency histogram of hemoglobin concentration (mg/dl); horizontal axis shows interval midpoints from 12.2 to 17.7, vertical axis shows frequency from 0 to 20]
In a normal distribution, if you calculate the three averages, the mean, the median, and the mode, they will be very similar. In fact, if the distribution is perfectly normal, the mean will equal the median, which will equal the mode. If, however, the data are skewed, the averages will be different.

16 Skewed Data: Positive skew
Mode: always at the peak
Median: in the middle
Mean: pulled to the right
[Figure: a positively skewed distribution with the mode, median, and mean marked from left to right; below it, a normal distribution in which the mean, median, and mode coincide]
In a positively skewed distribution the mode is always at the peak, but the mean is pulled far to the right by a few rather large values, and the median falls in between. In the bottom part of the slide you see the normal distribution.

17 “Play Ball”
This slide demonstrates the difference between two of the averages, the mean and the median, and it illustrates how statistics can sometimes be used, or misused, to prove one's own point. This example, taken from the sports pages of the local newspaper, illustrates the dispute between the owners and the players in baseball. The owners maintain that the average salary for a baseball player is 1.2 million dollars, which is already high enough, so salaries should not be increased. The players, on the other hand, maintain that the average is about 371,000 dollars and should therefore be increased. Who is right in this case? Well, both. The owners and the players are both using averages, but the distribution of baseball salaries is very highly positively skewed: most players make salaries toward the lower end of the range, but a few star players make exceedingly high salaries. The owners are using the mean, which is a very high figure because, as we just showed, the mean is pulled to the right in a positively skewed distribution. The players are using the median, which is a lower number at 371,000 dollars.

18 Baseball Salaries
Who was using statistics right? The players were.
Median: appropriate average for skewed data
Mean: appropriate average for normal data
[Figure: positively skewed salary distribution with the mode at the peak, the median near $400,000, and the mean at $1.2 million]
Who is misusing statistics in this example? In this case the players were correct, and it was the owners who were misusing statistics to prove their point. When the data are skewed, the appropriate average to use is always the median and not the mean, because in a skewed distribution the mean is always pulled in the direction of the tail. As we see here, it is pulled far to the right, in the direction of the skew. For normal data the appropriate average is the mean, but for skewed data the appropriate average is the median.
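
The effect of a long right tail on the two averages can be seen with a toy example. The ten salaries below are entirely hypothetical, chosen only to produce a positively skewed set; they are not the baseball figures quoted in the narration.

```python
from statistics import mean, median

# Hypothetical salaries: most are modest, one star salary forms the right tail.
salaries = [200_000] * 6 + [300_000] * 3 + [9_000_000]

print(mean(salaries))     # pulled toward the tail by the single large value
print(median(salaries))   # stays with the bulk of the salaries
```

One extreme value is enough to move the mean well above what nine of the ten people earn, while the median does not change at all if that top salary doubles.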

19 Negative Skew
Mode: always at the peak
Median: in the middle
Mean: pulled to the left
[Figure: a negatively skewed distribution with the mean, median, and mode marked from left to right]
In a negatively skewed distribution the mean, as we stated earlier, is pulled in the direction of the tail. In this case it is pulled to the left. For negatively skewed data the mean will be less than the median, which in turn is less than the mode.

20 Appropriate Statistics
Median: appropriate average for skewed data
Mean: appropriate average for normal data
Mode: appropriate average for nominal data
[Figures: skewed and normal distributions with the mean, median, and mode marked]
To reiterate, the skewness of the distribution determines which statistical average is appropriate. The median is always the appropriate average to use when the data are skewed. The mean is used when we have normal data, and the mode is used when we have nominal data.

21 Averaging Means: Weighted mean
$\bar{X}_1 = 60$, $N_1 = 10$
$\bar{X}_2 = 50$, $N_2 = 60$
$\bar{X}_3 = 40$, $N_3 = 30$
$\bar{X} = \frac{\sum X}{N}$, so $\sum X = (N)(\bar{X})$
To conclude our discussion of averages, let's take a look at a situation in which we have three groups of subjects with known N's and means, and we want to determine the average of the entire group, that is, a weighted mean. Recall that the simple formula for a mean is X bar equals the sum of X over N. By rearranging, we find that the sum of X equals the mean times N. We have three groups with unequal N's. If they were all equal, that is, if each group had the same number of subjects, we could simply add the three means and divide by three to get the mean of the entire group. In this case, however, we have to account for the fact that some groups are larger and therefore carry more weight. So for each of the three groups of 10, 60, and 30, with means of 60, 50, and 40 respectively, we are going to determine the sum of X.

22 Averaging Means
$\sum X = (N)(\bar{X})$
Group 1: $\bar{X}_1 = 60$, $N_1 = 10$, $\sum X = (60)(10) = 600$
Group 2: $\bar{X}_2 = 50$, $N_2 = 60$, $\sum X = (50)(60) = 3000$
Group 3: $\bar{X}_3 = 40$, $N_3 = 30$, $\sum X = (40)(30) = 1200$
Total: $\sum N = 100$, $\sum X = 4800$
Carrying out the arithmetic shows that the sum for group 1 is 600, for group 2 is 3,000, and for group 3 is 1,200. Adding them gives the sum of X for the entire group, or 4,800.

23 Averaging Means
$\sum N = 100$, $\sum X = 4800$
$\bar{X}_T = \frac{\sum X}{\sum N} = \frac{4800}{100} = 48$
We know, then, that the sum of X is 4,800 and that the total number of subjects in all three samples is 100. It is therefore a very simple task to plug the sum of X and the sum of N for the entire group into our formula for the mean, giving a weighted mean of 48 for the three groups combined. This concludes our discussion of averages, a very important descriptive statistic. The averages include the mean, the median, and the mode. They are very easy to calculate, and they form an integral part of describing data as well as an integral part of other statistics that we will discuss later on.
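
The weighted mean of the three groups can be checked with a short sketch. The function name `weighted_mean` is made up for this example; the means and group sizes are the ones given on the slides.

```python
def weighted_mean(means, sizes):
    """Combine group means, weighting each by its number of subjects."""
    if len(means) != len(sizes):
        raise ValueError("each group needs both a mean and a size")
    total_x = sum(m * n for m, n in zip(means, sizes))   # sum of X over all groups
    total_n = sum(sizes)                                 # sum of N over all groups
    return total_x / total_n


group_means = [60, 50, 40]
group_sizes = [10, 60, 30]

print(weighted_mean(group_means, group_sizes))    # weighted by group size
print(sum(group_means) / len(group_means))        # simple average of the three means
```

The second figure is what you would get by ignoring group sizes; it differs from the weighted mean because the largest group has a mean of 50 and the smallest has a mean of 60.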

