Measures of Central Tendency Measures of Dispersion.

Presentation on theme: "Measures of Central Tendency Measures of Dispersion."— Presentation transcript:

1

2 Measures of Central Tendency

3 Measures of Dispersion

4 Multiple Choice. 30 questions. Lectures 4 and 5 (today). 10% of your course grade. In this room. 45 minutes. 8:10 start. 8:55 end. DO NOT MISS IT. THERE WILL BE NO MAKE-UPS. SECOND QUIZ NEXT WEEK

5 The Frightening Power of Central Tendency (George Carlin – funniest guy ever)

6

7 Measures of Central Tendency The value that best represents the mid-point of a set of values, but which may not actually be found in the set of values themselves. Major types are: Means (Arithmetic, Weighted/Grouped, Geometric, Harmonic, Trimmed), Median, and Mode. Some are more robust than others…

8 What Does Being Robust Mean? When a statistic is robust it means that deviations from the underlying assumptions of a data distribution do not affect the statistic’s ability to represent the data values that comprise a dataset’s distribution. WHAT DOES THAT MEAN? That a sample’s statistics are a good representation of what’s happening in the population from which it came. And, the larger a sample gets, the closer its statistics approximate the population’s statistics. This is called regression to the mean, or The Wisdom of the Crowd (statisticians also call this convergence the law of large numbers).

9 The Wisdom of the Crowd Regression to the Mean Experiment You have a jar of popcorn kernels and ask people how many are in it. You get a range of answers, one guess for each person you ask. There are 52 guesses (blue dots). The actual number of kernels is 5,524 (red line). Note that the running mean of the guesses (green solid line) converges on (or regresses to (green dotted line)) the actual mean.
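
The running mean on this slide can be sketched in a few lines of Python. This is only an illustration: the 52 guesses below are simulated, not the slide's data, and the sketch assumes the guesses are unbiased and normally scattered around the true count (the spread of 1,500 is an arbitrary choice).

```python
import random

TRUE_COUNT = 5524   # actual number of kernels, from the slide
N_GUESSES = 52      # number of guesses, from the slide

def running_means(values):
    """Return the mean of the first k values for k = 1..len(values)."""
    means, total = [], 0.0
    for k, v in enumerate(values, start=1):
        total += v
        means.append(total / k)
    return means

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Assumption: simulated guesses, unbiased, with an arbitrary spread of 1500.
guesses = [rng.gauss(TRUE_COUNT, 1500) for _ in range(N_GUESSES)]
means = running_means(guesses)

for k in (1, 5, 20, 52):
    print(f"after {k:2d} guesses: running mean = {means[k - 1]:8.1f}")
```

With unbiased guesses the later running means tend to sit closer to 5,524 than the early ones, which is the behaviour the green line shows.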

10 So What Does Regression to the Mean, Mean? What regression to the mean (more or less) means is that eventually, with a large enough sample, sample values for a variable will get closer and closer to the mean of all values for the variable. Put another way, the further a given sample value is from the mean, the higher the probability that the next sample will be closer to the mean. But to say this we need to have some assumptions about the sample’s view of the population’s reality. These are called the sample’s underlying assumptions.

11 The Assumptions Underlying Why The Crowd’s View is a Good Approximate of The Population’s Reality These are the underlying assumptions for The Crowd: Data distributions are normally distributed (a.k.a. bell shaped). This means that: They have no outliers. They have no gaps. They are not skewed (skewness). They are not peaked (kurtosis). They have no extreme values. They are not bi-modal (two peaks). They are not poly-modal (many peaks). Their measures of central tendency are equal (hmmm). This is called being “robust”.

12 What Does Being Robust Mean? Deviant distributions (distributions that deviate from the assumptions) happen because of non-normal attributes such as: extreme values, bi-modality or poly-modality, outliers, gaps, skewness, kurtosis, and extreme differences between values. These, thankfully for Statistics, are rare occurrences, but we must always check for them.

13 What Does Being Robust Not Mean? Robustness: ability to withstand assumption violation. Discrimination: ability to accurately represent a set of values. Statistical tools with high robustness usually have lower ability to discriminate. Statistical tools with low robustness usually have higher ability to discriminate. Being robust does not mean being better or more accurate – quite the opposite. It means that a robust statistical tool is better able to withstand assumption violation but less able to discriminate accurately.

14 Parametric and Non-parametric Tools Parametric statistical tools usually have a low ability to withstand assumption violation. Non-parametric statistical tools usually have a high ability to withstand assumption violation. Non-parametric statistical tools usually have higher robustness but have a lower ability to discriminate. Parametric statistical tools usually have lower robustness but have a higher ability to discriminate. The assumptions underlying statistical distributions are called parameters. Therefore…

15 The Effect of Extreme Values and Outliers Ten people are sitting at a bar. Each earns $50K a year. Their average is $50,000. In walks Bill Gates and sits down. He earns $1,000,000,000 a year. Their average is now about $91 million. The data and the averages are both correct but the result is ridiculous because an underlying assumption - that there are no extreme values - has been violated. Moral: don’t hang around in bars
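
The bar example is easy to check. The sketch below uses Python's standard `statistics` module and the incomes stated on the slide; the variable names are illustrative only.

```python
from statistics import mean, median

incomes = [50_000] * 10                 # ten patrons earning $50K each
print(mean(incomes), median(incomes))   # both are 50000

incomes.append(1_000_000_000)           # Bill Gates sits down
new_mean = mean(incomes)                # pulled up by the one extreme value
new_median = median(incomes)            # middle case of 11, still a $50K patron
print(round(new_mean), new_median)
```

The mean jumps to roughly $91 million while the median does not move, which is the point the next slides make about robustness.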

16 Why Median Income is a Robust Statistic (n=9)
                  Value   Rank   Incremental   Incremental   Difference      Difference
                                 difference,   difference,   mean to value   median to rank
                                 each value    each rank
                      1      1             1             1             -13               -4
                      2      2             1             1             -12               -3
                      3      3             1             1             -11               -2
                      4      4             1             1             -10               -1
                      5      5             1             1              -9                0
                      6      6             1             1              -8                1
                      7      7             1             1              -7                2
                      8      8            82             1              -6                3
                     90      9            na            na              76                4
Median                5      5             1             1             n/a                0
Arithmetic mean      14      5         11.13             1               0                0
Middle value: 5. Extreme value: 90. Mean is almost 3 times the median. Ranks ‘neutralize’ the extreme value.

17 These are our values and where they lie on the distribution. This is the median (5). This is the mean (14). This is the outlier – waaaaay out. …and it drags the mean out with it. Thus the median is the more ‘robust’ statistic because it is less affected by the extreme value – it represents the dataset more accurately.

18 Being Robust – Removing the Extreme Value (n=9)
                  Value   Rank   Incremental   Incremental   Difference      Difference
                                 difference,   difference,   mean to value   median to rank
                                 each value    each rank
                      1      1             1             1              -4               -4
                      2      2             1             1              -3               -3
                      3      3             1             1              -2               -2
                      4      4             1             1              -1               -1
                      5      5             1             1               0                0
                      6      6             1             1               1                1
                      7      7             1             1               2                2
                      8      8             1             1               3                3
                      9      9            na            na               4                4
Median                5      5             1             1             n/a                0
Arithmetic mean       5      5             1             1               0                0

19 These are our values and where they lie on the distribution. This is the median and the mean. Outlier is gone. …so the mean moves back to better represent the data values. Now the mean is the better statistic because it is more accurate since it uses arithmetic to represent the dataset. The median is still the more robust statistic, but it does not discriminate as well.

20 Calculating Mean Median Mode

21 Arithmetic Mean Returns the arithmetic centre of the data distribution, such that the sum of all differences between data values and the mean equals zero. The arithmetic mean is the arithmetic middle point of a set of values. This means that the differences between any value x in a dataset and the mean of that dataset will sum to zero. THIS DOES NOT MEAN THAT THE ARITHMETIC MEAN IS THE MID POINT OF THE DATASET’S DISTRIBUTION BECAUSE THE ARITHMETIC MEAN IS STRONGLY INFLUENCED BY EXTREME VALUES. Arithmetic lives here.

22 Median Returns the exact centre value of the dataset and hence the second quartile value. Half the values of the dataset will be above the median and half will be below. The median is the middle case of a set of cases (records or rows). THIS MEANS THAT THE MEDIAN IS THE EXACT CENTRAL POINT OF A DATASET’S DISTRIBUTION – 50% of values will be below the median and 50% will be above. THIS MEANS THAT THE MEDIAN IS NOT AFFECTED BY EXTREME VALUES. Arithmetic does not live here.

23 Mode (or Modality) Returns the most frequently occurring data value in the dataset. Sometimes reported as a label, when the “value” is nominal level data – e.g. religious or political affiliation.
CASE         RELIGIOUS AFFILIATION
Person #1    Protestant
Person #2    Protestant
Person #3    Muslim
Person #4    Catholic
Person #5    Protestant
Person #6    Jewish
Person #7    Catholic
The “modality” of this sample would be Protestant because it is the most frequently occurring “value”.

24 An Example of How They All Work Together You work for a computer manufacturer and are asked to do a quality control analysis, so you gather data on the number of faults your computers have. These data show that your company has an average of 9.1 faults per 100 computers and that your competition has 9.1 faults per 100 as well. In other words the average number of faults is about the same. Should you report to your boss that your company is alright? It’s no worse than your competitors? Perhaps not, because you are a smart statistician and you collected more than the bare bones dataset.

25 Percentage of Faults by Category
Number of Faults Per Unit   Your Company   Your Competitor
Zero                                  30                16
One                                   20                17
Two                                   10                 9
Three                                  4                 8
Four                                   3                12
Five                                   2                 8
Six                                    2                 8
Seven                                  1                 8
Eight                                  2                 7
Nine                                   0                 3
Ten+                                  26                 4
TOTAL                                100               100
MEAN                                 9.1               9.1
MEDIAN                               One             Three
Rather than the very basic arithmetic mean data you collected these. They show what proportion of the machines had no faults, 1 fault, 2 faults etc. The average proportion of faults stays at 9.1%, but the median shows that your company is doing much better with 50% of machines having 1 or no faults, whereas your competitor’s median shows 50% of machines having up to 3 faults. Your problem lies in the 26% of machines having 10 or more faults. Further investigation shows that one of your assembly lines is the culprit due to sloppy workers.

26

27 The Arithmetic Mean

28 SAMPLE AND POPULATION SYMBOLOGY In formulas for a sample (such as the arithmetic mean), Latin letters are used, and a lower case ‘n’ is used for the number of cases. In formulas for a population (such as the arithmetic mean), Greek letters are used (here mu, μ, for the mean) and an upper case ‘N’ is used for the number of cases.

29 THE ARITHMETIC MEAN - DIFFERENCES SUM TO ZERO e.g. 38.25 – 21 = 17.25, 38.25 – 34 = 4.25, etc.
        Data Values   Differences from Mean
                 21                   17.25
                 34                    4.25
                 45                   -6.75
                 56                  -17.75
                 54                  -15.75
                 43                   -4.75
                 32                    6.25
                 21                   17.25
Sum             306                       0
N                 8
Mean          38.25
Remember this.
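
The zero-sum property can be verified directly from the slide's eight values. A minimal sketch (variable names are illustrative):

```python
values = [21, 34, 45, 56, 54, 43, 32, 21]   # data values from the slide

mean = sum(values) / len(values)             # 306 / 8
deviations = [mean - x for x in values]      # the slide computes mean - x

print(mean)
print(deviations)
print(sum(deviations))                       # positive and negative differences cancel
```

Whatever the data, the differences from the arithmetic mean cancel out, which is why the standard deviation later has to square them.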

30 THE ARITHMETIC MEAN - EFFECT OF EXTREME VALUES
                Dataset #1     Dataset #2     Dataset #3
Values          $45,000.00     $80,000.00     $80,000.00
                $45,000.00     $45,000.00     $45,000.00
                $45,000.00     $45,000.00     $45,000.00
                $43,000.00     $43,000.00     $43,000.00
                $41,000.00     $41,000.00     $41,000.00
                $40,000.00     $40,000.00     $40,000.00
                $37,000.00     $37,000.00     $37,000.00
                $35,000.00     $35,000.00          $1.00
Sum            $331,000.00    $366,000.00    $331,001.00
n                        8              8              8
Mean            $41,375.00     $45,750.00     $41,375.13
Median          $42,000.00     $42,000.00     $42,000.00
Mode            $45,000.00     $45,000.00     $45,000.00
Middle dataset is unbalanced. It has one extreme value that pulls the mean higher than all but that extreme value. Note that median and mode are not affected. An opposite extreme value balances the dataset again. When an extreme value is present the median should be used and not the arithmetic mean because the distribution will be skewed. BUT YOU ARE STILL LEFT WITH EXTREME VALUES. These will affect the deviation in the dataset (s and s²).
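
The three datasets can be recomputed with Python's standard `statistics` module. This is a sketch: the value lists are reconstructed from the slide's table so that they reproduce its sums, and the names `base`, `datasets` and `summary` are illustrative.

```python
from statistics import mean, median, mode

# Values shared by all three datasets (reconstructed from the slide's sums).
base = [45_000, 45_000, 43_000, 41_000, 40_000, 37_000]
datasets = {
    "Dataset #1": [45_000] + base + [35_000],
    "Dataset #2": [80_000] + base + [35_000],   # one high extreme value
    "Dataset #3": [80_000] + base + [1],        # high and low extreme values
}

summary = {name: (mean(v), median(v), mode(v)) for name, v in datasets.items()}
for name, (m, med, mo) in summary.items():
    print(f"{name}: mean={m:,.2f}  median={med:,.2f}  mode={mo:,}")
```

Only the mean changes from one dataset to the next; the median and mode stay put.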

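31 THE MEDIAN
                Dataset #1     Dataset #2     Dataset #3
Values          $45,000.00     $80,000.00     $80,000.00
                $45,000.00     $45,000.00     $45,000.00
                $45,000.00     $45,000.00     $45,000.00
                $43,000.00     $43,000.00     $43,000.00
                $41,000.00     $41,000.00     $41,000.00
                $40,000.00     $40,000.00     $40,000.00
                $37,000.00     $37,000.00     $37,000.00
                $35,000.00     $35,000.00          $1.00
Sum            $331,000.00    $366,000.00    $331,001.00
n                        8              8              8
Mean            $41,375.00     $45,750.00     $41,375.13
Median          $42,000.00     $42,000.00     $42,000.00
Mode            $45,000.00     $45,000.00     $45,000.00
The median is the middle point of a set of cases. Half the values above… …and half below. Since there are an even number of values you take the mean of the centre two values ($43,000 and $41,000), which gives $42,000. If there were an odd number of values, then the single middle value is the median.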

32 Mean and Median – Points to Remember The Mean is the arithmetic middle point of a set of values. It is calculated arithmetically from all data values. It is not the exact mid point of a set of values because it is strongly influenced by extreme values. BECAUSE OF THIS THE ARITHMETIC MEAN IS NOT A ROBUST STATISTIC BUT IT IS A DISCRIMINATING ONE. The Median is the middle point of a set of cases (records or rows). It is calculated by dividing the number of rows into two halves. Because it is the exact mid point of a set of cases it is not influenced by extreme values. BECAUSE OF THIS THE MEDIAN IS A ROBUST STATISTIC BUT IT IS NOT A DISCRIMINATING ONE.

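33 The Mode
                Dataset #1     Dataset #2     Dataset #3
Values          $45,000.00     $80,000.00     $80,000.00
                $45,000.00     $45,000.00     $45,000.00
                $45,000.00     $45,000.00     $45,000.00
                $43,000.00     $43,000.00     $43,000.00
                $41,000.00     $41,000.00     $41,000.00
                $40,000.00     $40,000.00     $40,000.00
                $37,000.00     $37,000.00     $37,000.00
                $35,000.00     $35,000.00          $1.00
Sum            $331,000.00    $366,000.00    $331,001.00
n                        8              8              8
Mean            $41,375.00     $45,750.00     $41,375.13
Median          $42,000.00     $42,000.00     $42,000.00
Mode            $45,000.00     $45,000.00     $45,000.00
The most frequently occurring data value (in this example $45,000 in each dataset).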

34 Gaps and Outliers The following histogram has outliers—there are three cities in the leftmost bar. This creates a gap where there are effectively no values. Gap Outliers

35 Using Measures of Central Tendency Use the method that returns the most information about the centre of the dataset – usually the arithmetic mean. BUT… With highly skewed (such as income) or non-unimodal datasets the median should be used. Means and medians cannot be used with nominal level data – the mode can be used to describe the most frequently occurring label. USING THE MODE IS CALLED MODAL ANALYSIS.

36 Other means to an end… Weighted mean: Useful when the ‘x’s have unequal weights as in grade calculations (e.g. tests worth 20% labs worth 30%, etc). Grouped data mean: Useful when you only have data in categories, as with income classes – is a special case of the weighted mean. Geometric mean: Useful when you have percentages, ratios, indexes or data covering several orders of magnitude. Harmonic mean: Useful when you have rates as in calculating average speeds. Trimmed mean: Useful for removing outliers.

37 Weighted Mean The weighted mean is used when data values have weighting schemes, as with the grades in this course.

38 Weighted mean example #1
Component          Weight (w_i)   Your Mark (x_i)   Mark X Weight (x_i * w_i)
Lab Assignments             45%               86%                        3870
Tests                       15%               80%                        1200
Final Exam                  40%               80%                        3200
Totals                     100%               246                        8270
The weighted mean method: 8270/100 = 82.7%
The arithmetic mean method: 246/3 = 82% (n=3)

39 Weighted mean example #2 Changing the weights
Component          Weight (w_i)   Your Mark (x_i)   Mark X Weight (x_i * w_i)
Lab Assignments             10%               86%                         860
Tests                       30%               80%                        2400
Final Exam                  60%               80%                        4800
Totals                     100%               246                        8060
The weighted mean method: 8060/100 = 80.6%
The arithmetic mean method: 246/3 = 82% (n=3)
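
Both grade examples follow the same recipe: multiply each mark by its weight, add the products, and divide by the total weight. A short sketch, with `weighted_mean` as an illustrative helper name:

```python
def weighted_mean(marks, weights):
    """Sum of mark * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(marks, weights)) / sum(weights)

marks = [86, 80, 80]                              # labs, tests, final exam (%)

example_1 = weighted_mean(marks, [45, 15, 40])    # weights from example #1
example_2 = weighted_mean(marks, [10, 30, 60])    # weights from example #2
unweighted = sum(marks) / len(marks)              # plain arithmetic mean

print(example_1, example_2, unweighted)
```

Shifting weight away from the strongest component (labs) lowers the weighted mean, while the unweighted mean cannot see the change at all.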

40 Grouped data mean example Population in Census Tract 12345.6
Income Classes       Frequency (f) (w_i)   Class midpoints (CM) (x_i)   CM X f (w_i * x_i)
$0-$10,000                            22                       $5,000             $110,000
$10,001-$20,000                       56                      $15,000             $840,000
$20,001-$30,000                       81                      $25,000           $2,025,000
$30,001-$40,000                       45                      $35,000           $1,575,000
$40,001-$50,000                       23                      $45,000           $1,035,000
$50,001-$60,000                       15                      $55,000             $825,000
>$60,000                               7                     Excluded                    0
Totals                               249                     $180,000           $6,410,000
The weighted mean method: $6,410,000/249 = $25,742.97
The arithmetic mean method (6 class midpoints): $180,000/6 = $30,000
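
The grouped mean can be recomputed from the class midpoints and frequencies. The sketch below is illustrative only. Note one detail it makes visible: the slide divides by 249, a total that still counts the 7 cases in the excluded top class, whereas the six classes that contribute to the numerator hold 242 cases. Both divisors are shown; which one is appropriate depends on how the open-ended class should be treated.

```python
# (class midpoint, frequency) for the six classes that have a midpoint
classes = [
    (5_000, 22), (15_000, 56), (25_000, 81),
    (35_000, 45), (45_000, 23), (55_000, 15),
]
excluded_cases = 7                       # the open-ended ">$60,000" class

weighted_total = sum(mid * f for mid, f in classes)
included_cases = sum(f for _, f in classes)

mean_slide = weighted_total / (included_cases + excluded_cases)   # divisor 249
mean_included = weighted_total / included_cases                   # divisor 242
midpoint_mean = sum(mid for mid, _ in classes) / len(classes)     # unweighted

print(weighted_total, round(mean_slide, 2), round(mean_included, 2), midpoint_mean)
```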

41 Geometric mean \( GM = \sqrt[n]{\prod x} \) Where: GM: geometric mean; x: data values; \( \sqrt[n]{\ } \): nth root of the product of all x. The ∏ symbol is the upper case Greek letter pi and signifies the product of a set of values multiplied together. Used extensively in biology and finance.

42 Geometric mean – use when your data: Are percentages, ratios, indexes or growth rates; Have an exponential distribution; Have a high value more than 3 times the low value; Cover several orders of magnitude. Geometric mean – do not use when your data: Are already log scaled such as decibels or pH; Have a high value less than 3 times the low value.

43 Geometric Mean Example Example using bacteria counts (they typically vary widely)
Water Sample #     Enteric Bacteria Count per ml
1                                              6
2                                             50
3                                              9
4                                           1200
Arithmetic mean                           316.25
Geometric Mean                             42.43
GM = \( \sqrt[4]{6 \times 50 \times 9 \times 1200} \) = 42.43 Basically the data are log transformed. Thus extreme values are tempered.
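
The remark that the data are "basically log transformed" can be made concrete: taking the nth root of the product gives the same answer as averaging the logarithms and transforming back. A minimal sketch using the slide's four counts (`statistics.geometric_mean` is in the standard library from Python 3.8):

```python
import math
from statistics import geometric_mean, mean

counts = [6, 50, 9, 1200]   # bacteria counts per ml, from the slide

gm_root = math.prod(counts) ** (1 / len(counts))                      # nth root of the product
gm_logs = math.exp(sum(math.log(c) for c in counts) / len(counts))    # mean of logs, back-transformed
gm_lib = geometric_mean(counts)                                       # standard-library version

print(mean(counts), round(gm_root, 2), round(gm_logs, 2), round(gm_lib, 2))
```

The single count of 1,200 dominates the arithmetic mean but is heavily damped in the geometric mean.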

44 Harmonic mean \( HM = \frac{n}{\sum \frac{1}{x}} \) Where: HM: harmonic mean; 1/x: reciprocals of data values; n: number of data values. Useful when you have rates per unit (such as distance per unit of time, i.e. speed) to average out.

45 Harmonic mean example of transportation & speed What’s the average speed for a delivery truck given these data:
Segment      Length (km)   Speed (kph)   Time taken
Outbound             180            60   180/60 = 3 hrs
Inbound              180            80   180/80 = 2.25 hrs
Total distance       360
Arithmetic mean speed = (60kph + 80kph)/2 = 70kph. But the time taken is 3hrs + 2.25hrs = 5.25hrs. Therefore the actual average speed = 360km/5.25hrs = 68.57kph. This is what the harmonic mean does: HM = 2/((1/60kph) + (1/80kph)) = 68.57kph. The difference is small, but over more segments it can be significant.
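
The truck example can be checked two ways: from total distance over total time, and with the harmonic mean of the two speeds. A sketch (names are illustrative); the two agree here because both segments are the same length:

```python
from statistics import harmonic_mean, mean

speeds = [60, 80]        # kph, outbound and inbound
segment_km = 180         # each segment has the same length

total_time = sum(segment_km / s for s in speeds)       # 3 h + 2.25 h
true_average = segment_km * len(speeds) / total_time   # distance / time
hm = harmonic_mean(speeds)
am = mean(speeds)

print(total_time, round(true_average, 2), round(hm, 2), am)
```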

46 Trimmed Mean This is easy: it is any mean where the outliers have been stripped or trimmed away. Thus you would sort your data and drop the top and bottom 10% of your values. This is called a 10% trimmed mean. You can drop whatever proportion of the dataset you wish - within reason.

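47 Example of Trimmed Mean
                      Dataset #1     Dataset #2     Dataset #3
Values                $45,000.00     $80,000.00     $80,000.00
                      $45,000.00     $45,000.00     $45,000.00
                      $45,000.00     $45,000.00     $45,000.00
                      $43,000.00     $43,000.00     $43,000.00
                      $41,000.00     $41,000.00     $41,000.00
                      $40,000.00     $40,000.00     $40,000.00
                      $37,000.00     $37,000.00     $37,000.00
                      $35,000.00     $35,000.00          $1.00
Sum                  $331,000.00    $366,000.00    $331,001.00
n                              8              8              8
Mean                  $41,375.00     $45,750.00     $41,375.13
Median                $42,000.00     $42,000.00     $42,000.00
Mode                  $45,000.00     $45,000.00     $45,000.00
After trimming the extreme values ($80,000.00 and $1.00):
Sum                  (unchanged)    $286,000.00    $251,000.00
n                    (unchanged)              7              6
Mean                  $41,375.00     $40,857.14     $41,833.33
Median                $42,000.00     $41,000.00     $42,000.00
Mode                  $45,000.00     $45,000.00     $45,000.00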
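
A trimmed mean is just an ordinary mean taken after sorting and dropping values from the ends. The sketch below assumes a symmetric trim of `k` values from each end (the helper name `trimmed_mean` is illustrative). The slide's Dataset #2 result drops only the single high value, so that case is shown separately.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    if 2 * k >= len(values):
        raise ValueError("k is too large for this dataset")
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)

dataset_2 = [80_000, 45_000, 45_000, 43_000, 41_000, 40_000, 37_000, 35_000]
dataset_3 = [80_000, 45_000, 45_000, 43_000, 41_000, 40_000, 37_000, 1]

# Dataset #3: drop one value from each end ($1 and $80,000).
trimmed_3 = trimmed_mean(dataset_3, 1)

# Dataset #2 on the slide drops only the top value, a one-sided trim.
kept_2 = sorted(dataset_2)[:-1]
trimmed_2 = sum(kept_2) / len(kept_2)

print(round(trimmed_2, 2), round(trimmed_3, 2))
```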

48 Moving average trend lines produce a smoother ‘actual value’ trend line that is based on consecutive recalculations of an arithmetic mean of set size. The weakness is that the line loses values, depending on what size you make the averaging group.

49 Calculating the Moving Average
Original Data Values   Moving Average, Period 2   Moving Average, Period 3
10                     (10+22)/2 = 16             (10+22+12)/3 = 14.67
22                     (22+12)/2 = 17             (22+12+14)/3 = 16
12                     (12+14)/2 = 13             (12+14+16)/3 = 14
14                     (14+16)/2 = 15             (14+16+20)/3 = 16.67
16                     (16+20)/2 = 18             (16+20+32)/3 = 22.67
20                     (20+32)/2 = 26
32
# of values to be plotted: 7 (original), 6 (period 2), 5 (period 3)
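
The table above amounts to sliding a fixed-size window along the data and averaging each window. A minimal sketch, with `moving_average` as an illustrative helper name:

```python
def moving_average(values, period):
    """Arithmetic mean of each run of `period` consecutive values."""
    return [
        sum(values[i:i + period]) / period
        for i in range(len(values) - period + 1)
    ]

data = [10, 22, 12, 14, 16, 20, 32]   # original data values from the slide

ma2 = moving_average(data, 2)
ma3 = moving_average(data, 3)

print(ma2)
print([round(v, 2) for v in ma3])
print(len(data), len(ma2), len(ma3))  # the line loses period - 1 values
```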

50 All Geography students are above average.

51 Measures of Dispersion

52 What is Dispersion? Refers to the way in which quantitative data values are dispersed or spread out in a dataset. The most powerful dispersion statistics calculate the quantitative spread of the data values around the arithmetic mean and are called measures of deviation. The various measures of deviation calculate the arithmetic differences between each data value and the arithmetic mean of the dataset.

53 Why bother with measuring deviation? Consider the following datasets: 3+3+3+3+3 and 1+1+1+2+10. First we calculate their arithmetic means using: \( \bar{x} = \frac{\sum x}{n} \) Both sum to 15 with n = 5, so both means are 3. Are they the same? According to the mean they are.

54 Then we calculate their standard deviations using: \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \) Same means, very different standard deviations (0 and about 3.94). So are the datasets the same – or not?
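
The two small datasets make a quick check. The sketch uses `statistics.stdev`, which is the sample standard deviation (n - 1 in the denominator); the variable names are illustrative.

```python
from statistics import mean, stdev

a = [3, 3, 3, 3, 3]
b = [1, 1, 1, 2, 10]

mean_a, mean_b = mean(a), mean(b)   # identical means
sd_a, sd_b = stdev(a), stdev(b)     # very different spreads

print(mean_a, mean_b)
print(sd_a, round(sd_b, 2))
```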

55 Measures of Dispersion and Deviation The Range (a measure of dispersion): The range is the difference between the lowest value (called MIN) and the highest value (called MAX) in a dataset. The Standard Deviation (a measure of deviation): Measures the average difference between a data value and the arithmetic mean of all data values. The Variance (a measure of deviation): Squares the average difference between a data value and the arithmetic mean of the data set. Thus it is the standard deviation squared.

56 The Range (Range = MAX-MIN)

57 The Range The range describes the span of your dataset, from the minimum value (MIN) to the maximum value (MAX) using: Range = MAX – MIN Used as a measure of data dispersion NOT deviation, because deviation implies a difference between your data values and something, e.g. the arithmetic mean. The Range is used in finding histogram (or bar chart) classes.

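58
                Dataset #1     Dataset #2     Dataset #3
Values          $45,000.00     $80,000.00     $80,000.00
                $45,000.00     $45,000.00     $45,000.00
                $45,000.00     $45,000.00     $45,000.00
                $43,000.00     $43,000.00     $43,000.00
                $41,000.00     $41,000.00     $41,000.00
                $40,000.00     $40,000.00     $40,000.00
                $37,000.00     $37,000.00     $37,000.00
                $35,000.00     $35,000.00          $1.00
Sum            $331,000.00    $366,000.00    $331,001.00
n                        8              8              8
Mean            $41,375.00     $45,750.00     $41,375.13
Median          $42,000.00     $42,000.00     $42,000.00
Mode            $45,000.00     $45,000.00     $45,000.00
MAX             $45,000.00     $80,000.00     $80,000.00
MIN             $35,000.00     $35,000.00          $1.00
Range           $10,000.00     $45,000.00     $79,999.00
Even the range is telling us more about the data than just the central tendency measures do. Compare dataset #1 with #3.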

59 The Standard Deviation ( s )

60 The Standard Deviation The standard deviation measures the average difference between a data value and the arithmetic mean of all data values. It is given by: \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \) Where: s is the sample standard deviation; x is a value in the dataset; \( \bar{x} \) is the arithmetic mean of the dataset; n is the number of values in the dataset. The standard deviation and the variance are related insofar as s is the square root of the variance (or the variance is s²). s is the most widely used measure of deviation, though it should always be used in conjunction with the variance.

61 Interpreting the Standard Deviation Formula Subtract the arithmetic mean from each data value x and sum the differences: \( \sum (x - \bar{x}) \) But this returns a set of plus and minus differences that add to zero. So to remove the signs we square each difference and sum the squared differences: \( \sum (x - \bar{x})^2 \) … then divide by n - 1 and take the square root to return to the magnitude of the original values.

62 A reminder of the effect of squaring…
#     1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18   19   20   (an arithmetic progression)
#²    1   4   9  16  25  36  49  64  81  100  121  144  169  196  225  256  289  324  361  400   (a quadratic progression)
… it emphasizes higher values.

63 Why Squares and Roots? This is a list of numbers, x.
        x    x - mean   (x - mean) squared   sqrt of (x - mean) squared
        1        -9.5                90.25                          9.5
        2        -8.5                72.25                          8.5
        3        -7.5                56.25                          7.5
        4        -6.5                42.25                          6.5
        5        -5.5                30.25                          5.5
        6        -4.5                20.25                          4.5
        7        -3.5                12.25                          3.5
        8        -2.5                 6.25                          2.5
        9        -1.5                 2.25                          1.5
       10        -0.5                 0.25                          0.5
       11         0.5                 0.25                          0.5
       12         1.5                 2.25                          1.5
       13         2.5                 6.25                          2.5
       14         3.5                12.25                          3.5
       15         4.5                20.25                          4.5
       16         5.5                30.25                          5.5
       17         6.5                42.25                          6.5
       18         7.5                56.25                          7.5
       19         8.5                72.25                          8.5
       20         9.5                90.25                          9.5
Mean 10.5     Sum 0.0
The difference \( x - \bar{x} \) produces negative numbers and a sum of zero, but … the square of a number is always positive, and… … differences between squares increase more rapidly than differences between original numbers, so… …taking the square root of the squared data values simply returns them to the original numbers, and also removes the sign.

64
                Dataset #1     Dataset #2     Dataset #3
Values          $45,000.00     $80,000.00     $80,000.00
                $45,000.00     $45,000.00     $45,000.00
                $45,000.00     $45,000.00     $45,000.00
                $43,000.00     $43,000.00     $43,000.00
                $41,000.00     $41,000.00     $41,000.00
                $40,000.00     $40,000.00     $40,000.00
                $37,000.00     $37,000.00     $37,000.00
                $35,000.00     $35,000.00          $1.00
Sum            $331,000.00    $366,000.00    $331,001.00
n                        8              8              8
Mean            $41,375.00     $45,750.00     $41,375.13
Median          $42,000.00     $42,000.00     $42,000.00
Mode            $45,000.00     $45,000.00     $45,000.00
MAX             $45,000.00     $80,000.00     $80,000.00
MIN             $35,000.00     $35,000.00          $1.00
Range           $10,000.00     $45,000.00     $79,999.00
s                $3,852.18     $14,290.36     $21,559.86
Low s means that the data are clustered around the mean (data are leptokurtic or ‘peaked’). High s means that the data are spread out around the mean (data are platykurtic or ‘flat’). REMEMBER s values do not indicate skewness. They do indicate kurtosis.

65 Standard deviation calculations the hard way

66 Review Slide: Standard Deviation and the ‘Shape’ of Data [Figure: frequency curves for a ‘normal’, a ‘small’ and a ‘large’ standard deviation.] This ‘peakedness’ of the distribution is called kurtosis. Use the kurtosis statistic to test for normality.

67 The Variance (s²)

68 The Variance Squares the average difference between a data value and the arithmetic mean of the data set. It is given by: \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \) Where: s² is the sample variance; x is a value in the dataset; \( \bar{x} \) is the arithmetic mean of the dataset; n is the number of values in the dataset. Since it uses the arithmetic mean, it is subject to the same effect of extreme values – except much more because of the effect of squaring.

69 Interpreting the Variance Formula Subtract the arithmetic mean from each data value x and sum the differences. But this returns a set of plus and minus differences that adds to zero. So to remove the signs we square each difference thus: \( (x - \bar{x})^2 \) …and sum the squared differences: \( \sum (x - \bar{x})^2 \), then divide by n - 1.

70 Variance and SD Compared By squaring the differences you remove the negative signs and exaggerate more extreme differences to make them more obvious for analysis. By taking the square root you return the differences to their original magnitude but the signs are removed so the differences no longer sum to zero. In comparing the two, when s is small, the difference between the variance (s²) and s is smaller than if s is large – that’s what happens when you square numbers.

71
                    Dataset #1         Dataset #2         Dataset #3
Values              $45,000.00         $80,000.00         $80,000.00
                    $45,000.00         $45,000.00         $45,000.00
                    $45,000.00         $45,000.00         $45,000.00
                    $43,000.00         $43,000.00         $43,000.00
                    $41,000.00         $41,000.00         $41,000.00
                    $40,000.00         $40,000.00         $40,000.00
                    $37,000.00         $37,000.00         $37,000.00
                    $35,000.00         $35,000.00              $1.00
Sum                $331,000.00        $366,000.00        $331,001.00
n                            8                  8                  8
Mean                $41,375.00         $45,750.00         $41,375.13
Median              $42,000.00         $42,000.00         $42,000.00
Mode                $45,000.00         $45,000.00         $45,000.00
MAX                 $45,000.00         $80,000.00         $80,000.00
MIN                 $35,000.00         $35,000.00              $1.00
Range               $10,000.00         $45,000.00         $79,999.00
s                    $3,852.18         $14,290.36         $21,559.86
s²              $14,839,285.71    $204,214,285.71    $464,827,464.41
Note that the highest s is 5.6 times the lowest whereas the highest s² is 31 times the lowest – this is the effect of squaring extreme values.
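
The s and s² rows, and the 5.6 versus 31 comparison, can be reproduced with the sample formulas in Python's `statistics` module. This is a sketch: the value lists are reconstructed from the table so that they match its sums, and the names are illustrative.

```python
from statistics import stdev, variance

base = [45_000, 45_000, 43_000, 41_000, 40_000, 37_000]
ds1 = [45_000] + base + [35_000]
ds2 = [80_000] + base + [35_000]
ds3 = [80_000] + base + [1]

# Sample statistics (n - 1 in the denominator), as on the slides.
sds = [stdev(d) for d in (ds1, ds2, ds3)]
variances = [variance(d) for d in (ds1, ds2, ds3)]

sd_ratio = max(sds) / min(sds)                  # highest s over lowest s
var_ratio = max(variances) / min(variances)     # highest s² over lowest s²

print([round(s, 2) for s in sds])
print([round(v, 2) for v in variances])
print(round(sd_ratio, 1), round(var_ratio, 1))
```

The variance ratio is the square of the standard deviation ratio, which is why squaring makes the extreme values stand out so much more.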

72 N and n-1 Why do the sample standard deviation and sample variance (in fact, sample anything) formulas have n-1 as the denominator? Because n-1 gives a more conservative estimate of deviation by increasing the standard deviation and variance values. If you have a larger standard deviation or variance, you have a higher standard to pass in making your case. Why? Because if you are testing to see if a data value is 1.96 s away from the mean of its dataset, then a larger s means the data value has to meet a stricter test – i.e. it has to be higher.

73 Sample versus population – n-1 versus N
Sample     Value of numerator   Biased estimate of       Unbiased estimate of        Difference between    Difference
size (n)   in standard          population standard      population standard         biased and unbiased   (%)
           deviation formula    deviation (i.e.          deviation (dividing         estimates
                                dividing by N)           by n-1)
10         500                  √(500/10) = 7.07         √(500/(10-1)) = 7.45        .38                   5.0%
100        500                  √(500/100) = 2.24        √(500/(100-1)) = 2.25       .01                   0.4%
1000       500                  √(500/1000) = 0.7071     √(500/(1000-1)) = 0.7075    .0004                 0.056%
Source: After Salkind, page 40. Note: 1. With n-1 the standard deviation is higher. 2. The larger the sample, the smaller the effect of n-1.
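
The table's two estimates differ only in the divisor, so they are easy to recompute. A short sketch that keeps the numerator fixed at 500, as the table does (the `rows` structure is illustrative):

```python
import math

numerator = 500   # sum of squared differences, held constant as in the table

rows = []
for n in (10, 100, 1000):
    biased = math.sqrt(numerator / n)          # dividing by N
    unbiased = math.sqrt(numerator / (n - 1))  # dividing by n - 1
    rows.append((n, biased, unbiased, unbiased - biased))

for n, biased, unbiased, diff in rows:
    print(f"n={n:5d}  biased={biased:.4f}  unbiased={unbiased:.4f}  diff={diff:.4f}")
```

The n - 1 version is always the larger of the two, and the gap shrinks quickly as the sample grows.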

75 Interpreting Variance & Standard Deviation s gives the average difference between each data value and the mean of a dataset, and s² squares it and so exaggerates it. The larger the values of s and s², the more spread out the data values are and the larger the differences between them. If s and s² are equal to zero then there are no differences between your data values. The standard deviation and the variance each require an arithmetic mean to work, not the median or the mode. Therefore they require the same rigour as the mean and are sensitive to extreme values as well, especially the variance.

76 The Coefficient of Variation ( Cv)

77 Calculating the Coefficient Of Variation The equation for the sample coefficient of variation is: \( C_v = \frac{s}{\bar{x}} \times 100 \) And, for the population: \( C_v = \frac{\sigma}{\mu} \times 100 \)

78 Interpreting The Coefficient Of Variation The coefficient of variation expresses the standard deviation as a percentage of the mean. Allows easy comparison of standard deviations with one another.

79 Interpreting The Coefficient Of Variation By way of example: Compare an s of $2,400 on a per capita average income of $55,000 against an s of $300 on a per capita average income of $2,000 – how to interpret? Here the coefficients of variation are 4.4% and 15%, indicating a much wider range of variability in the poorer nation – that is, a much wider gap between rich and poor. Case in point: the coefficient of variation for global GNI is 108.9%! This indicates an extraordinary gap between rich and poor nations.
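
The two coefficients in this example follow directly from dividing each standard deviation by its mean. A minimal sketch, with `coefficient_of_variation` as an illustrative helper name:

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean

richer = coefficient_of_variation(2_400, 55_000)   # s = $2,400 on a $55,000 mean
poorer = coefficient_of_variation(300, 2_000)      # s = $300 on a $2,000 mean

print(round(richer, 1), round(poorer, 1))
```

Although $300 is a far smaller standard deviation than $2,400 in absolute terms, it is a much larger share of its mean.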

80
                    Dataset #1         Dataset #2         Dataset #3
Values              $45,000.00         $80,000.00         $80,000.00
                    $45,000.00         $45,000.00         $45,000.00
                    $45,000.00         $45,000.00         $45,000.00
                    $43,000.00         $43,000.00         $43,000.00
                    $41,000.00         $41,000.00         $41,000.00
                    $40,000.00         $40,000.00         $40,000.00
                    $37,000.00         $37,000.00         $37,000.00
                    $35,000.00         $35,000.00              $1.00
Sum                $331,000.00        $366,000.00        $331,001.00
n                            8                  8                  8
Mean                $41,375.00         $45,750.00         $41,375.13
Median              $42,000.00         $42,000.00         $42,000.00
Mode                $45,000.00         $45,000.00         $45,000.00
MAX                 $45,000.00         $80,000.00         $80,000.00
MIN                 $35,000.00         $35,000.00              $1.00
Range               $10,000.00         $45,000.00         $79,999.00
s                    $3,852.18         $14,290.36         $21,559.86
s²              $14,839,285.71    $204,214,285.71    $464,827,464.41
Cv                       9.31%             31.24%             52.11%
Note that the highest Cv is 5.6 times the lowest, indicating that dataset #3 is considerably more variable than dataset #1 – the effect of the two extreme values is evident.

81 Summary Stats So Far Arithmetic mean and standard deviation are fundamental to statistics. Form the heart of descriptive statistics. Are the essential building blocks of all other statistical methods – look for them as elements in future formulas. Other measures of dispersion have their roles, are more robust, but not as powerful.

82 All Geography students are deviants.

83 All Geography students are above average deviants.

