Presentation on theme: "1 Multiple-choice example. 2 Solution The typical level of score achieved is the AVERAGE or CENTRAL TENDENCY. (Some authorities prefer the term LEVEL.)"— Presentation transcript:
1 Multiple-choice example
2 Solution The typical level of score achieved is the AVERAGE or CENTRAL TENDENCY. (Some authorities prefer the term LEVEL.) Statement A is false. B looks good; BUT TRY THE OTHERS. The SD is the square root of the variance, so C is false. D provides no further information. We accept B.
3 Study questions The mean weight of three people in a car is 170 pounds. They pick up another person, whose weight is 190 pounds. What is now the mean weight of the people in the car?
4 Answer From the definition of the mean as the total divided by the number of values, the total weight of the first three people is 170 × 3 = 510 lbs. Adding the fourth person, we have a new total weight of 510 + 190 = 700 lbs. The new mean weight is 700/4 = 175 lbs.
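The arithmetic in this answer can be sketched in a few lines of Python. The function name mean_after_adding is invented for this illustration; it simply recovers the old total from the old mean, adds the new value, and divides by the new count.

```python
def mean_after_adding(old_mean, old_count, new_value):
    """Mean of a set after one extra value joins it.

    The old total is recovered as old_mean * old_count, the new value is
    added to it, and the sum is divided by the enlarged count.
    """
    old_total = old_mean * old_count
    return (old_total + new_value) / (old_count + 1)


# The car problem: three people averaging 170 lbs, then a 190 lb passenger.
car_mean = mean_after_adding(170, 3, 190)
print(car_mean)
```

The same function applies whenever a single score is added to a set whose mean and size are known.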
5 Adding and multiplying by 2 We have seen the values of the mean and the SD of the scores in the Caffeine group. Suppose we add a constant of 2 to each of the 20 scores. What effect would that have upon the values of the mean, the variance and the SD? What would be the effect of multiplying each score by 2?
6 A histogram showing the distribution of height in 523 men
7 A normal curve The histogram of men's heights is approximately symmetrical and bell-shaped. The curve, which is truly SYMMETRICAL and BELL-SHAPED, is known as a NORMAL curve. A variable with such a distribution is said to have a NORMAL DISTRIBUTION.
8 Representing a distribution The height distribution can be represented like this. Not all distributions have this shape. But the data from the Caffeine and Placebo conditions in our experiment do, more or less.
9 Notation Returning to the data from the Caffeine experiment, let M be the mean score of those participants tested after ingesting caffeine. Let s² and s be the variance and standard deviation of the 20 scores, respectively. (The standard deviation is the square root of the variance.)
10 Adding and multiplying by a constant Adding a constant k: the mean M becomes M + k; the variance s² remains s². Multiplying by a constant k: the mean M becomes kM; the variance s² becomes k²s².
11 Adding a constant Adding a constant of 2 to every score simply shifts the whole distribution two units to the right. So the new mean will be the old one plus two: new mean = old mean + 2. The SPREAD of the scores, however, will be unaltered, so the variance and the SD will have the same values as before.
12 Adding a constant k If you know summation algebra, you can easily show that adding a constant k adds the same value to the mean. In the derivation, M_X is the mean of the original scores, whereas M_{X+k} is the mean of the scores with k added to each of them. The terms s²_X and s²_{X+k} are to be interpreted in a similar way. The addition of k makes NO DIFFERENCE to the value of the variance (or the standard deviation).
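The derivation itself did not survive in this transcript. The following is a sketch of how it can go, assuming n scores X_1, …, X_n and a variance defined with the divisor n − 1 (the slides do not say which divisor is used; the result is the same with divisor n).

```latex
\begin{align*}
M_{X+k} &= \frac{1}{n}\sum_{i=1}^{n}(X_i + k)
         = \frac{1}{n}\sum_{i=1}^{n}X_i + \frac{nk}{n}
         = M_X + k, \\[4pt]
s^2_{X+k} &= \frac{1}{n-1}\sum_{i=1}^{n}\bigl[(X_i + k) - (M_X + k)\bigr]^2
           = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - M_X)^2
           = s^2_X .
\end{align*}
```

The constant k cancels inside every deviation, which is why the spread is unchanged.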
13 Multiplying by a constant Multiplying each score by a constant k not only increases the mean by a factor of k, but also increases the SPREAD of the scores about the new mean. The new mean will be k times the old one. The new variance will be k² times the old variance. The new SD will be k times the old one.
14 Multiplying by a constant … The SD and variance were originally 3.28 and 10.73, respectively. When all scores are multiplied by a factor of 2, the mean doubles; the SD becomes 2 × 3.28 ≈ 6.55 (working from the unrounded SD, √10.73 ≈ 3.276); the variance becomes 10.73 × 4 = 42.92. The standard deviation has increased by a factor of 2; but the variance has increased by a factor of 4.
15 Multiplying by a constant k A little more summation algebra shows that multiplying by a constant k multiplies the value of the mean by k. The dispersion or variance also increases, by a factor of k².
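The summation algebra for this case can be sketched as follows, again assuming n scores X_1, …, X_n and a variance with the divisor n − 1 (an assumption; the slides do not state the divisor, and the result does not depend on it). Here M_{kX} and s²_{kX} denote the mean and variance of the scores after each has been multiplied by k.

```latex
\begin{align*}
M_{kX} &= \frac{1}{n}\sum_{i=1}^{n}kX_i
        = k\cdot\frac{1}{n}\sum_{i=1}^{n}X_i
        = kM_X, \\[4pt]
s^2_{kX} &= \frac{1}{n-1}\sum_{i=1}^{n}(kX_i - kM_X)^2
          = \frac{k^2}{n-1}\sum_{i=1}^{n}(X_i - M_X)^2
          = k^2 s^2_X .
\end{align*}
```

Each deviation is multiplied by k, so each squared deviation is multiplied by k².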
16 Standard deviation of kX Since the standard deviation is the square root of the variance, multiplying the original scores by the constant k multiplies the standard deviation by k.
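These rules can be checked numerically. The sketch below uses a small set of hypothetical scores, invented for illustration (they are not the Caffeine data), and Python's standard statistics module, whose variance and stdev functions use the n − 1 divisor.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
scores = [8, 10, 11, 12, 12, 13, 15, 17]
k = 2


def summarize(values):
    """Return (mean, sample variance, sample SD) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.variance(values),
        statistics.stdev(values),
    )


original = summarize(scores)
shifted = summarize([x + k for x in scores])  # constant k added to each score
scaled = summarize([x * k for x in scores])   # each score multiplied by k

for label, (mean, variance, sd) in [
    ("original", original),
    ("add k", shifted),
    ("multiply by k", scaled),
]:
    print(f"{label:14s} mean={mean:.2f} variance={variance:.2f} SD={sd:.2f}")
```

Adding k should change only the mean; multiplying by k should multiply the mean and SD by k and the variance by k².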
17 Lecture 4 MORE DESCRIPTIVE STATISTICS
18 Properties of a distribution The three most important properties of a distribution are: 1. The typical value, AVERAGE or CENTRAL TENDENCY. (The terms LEVEL and LOCATION are also used.) 2. The SPREAD or DISPERSION of scores around the average. 3. The SHAPE of the distribution.
19 Statistics The CENTRAL TENDENCY of a distribution is measured by AVERAGES, one measure of the average being the MEAN. Today I shall consider two additional measures of the average; but there are several others. The SPREAD or DISPERSION of a distribution is measured by the VARIANCE, STANDARD DEVIATION and various RANGE STATISTICS. There are also statistics for measuring the asymmetry or SKEWNESS of a distribution.
20 The Poisson distribution By no means all variables are normally distributed. In a nursery, there are 20 electric lights. The mean rate at which light bulbs blow is 2 per week. But occasionally many more blow. The distribution looks like this.
21 A Poisson distribution
22 Poisson distribution … This distribution is POSITIVELY SKEWED: that is, it has a tail to the right. Most of the values bunch around a mean of 2 bulbs blowing. The tail represents the occasional large numbers of blow-outs.
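The shape described here can be reproduced from the Poisson formula P(X = x) = e^(−λ) λ^x / x!, taking λ = 2 from the light-bulb example. The sketch below is an illustration only; the function name poisson_pmf and the choice to stop at 10 blow-outs are assumptions made for the example.

```python
import math


def poisson_pmf(count, rate):
    """Probability of exactly `count` events when the mean rate is `rate`."""
    return math.exp(-rate) * rate ** count / math.factorial(count)


rate = 2.0  # mean number of bulbs blowing per week, as in the example

# Probabilities for 0 to 10 blow-outs in a week, with a crude text histogram.
probabilities = {count: poisson_pmf(count, rate) for count in range(11)}
for count, p in probabilities.items():
    print(f"{count:2d}  {p:.3f}  {'#' * round(p * 100)}")

# For a Poisson distribution the theoretical skewness is 1 / sqrt(rate),
# which is always positive: the tail is on the right.
poisson_skewness = 1 / math.sqrt(rate)
print(f"skewness = {poisson_skewness:.3f}")
```

The bars bunch around 1 and 2 blow-outs and trail off to the right, which is the positive skew described on the slide.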
23 Measuring skewness Asymmetry or skewness is measured with a statistic which I shall call simply Skewness. (Skewness is a complex measure, involving the cube of the deviations of the scores about their mean.) The Statistical Package for the Social Sciences (SPSS) will calculate the value of Skewness for any distribution. If the value of Skewness is positive, the distribution is positively skewed; a negative value indicates negative skewness.
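As a rough illustration of a cube-based measure, the sketch below computes the simple moment coefficient of skewness: the mean cubed deviation divided by the 1.5th power of the mean squared deviation. This is an assumption about the general form only; statistical packages typically apply a small-sample correction, so their values may differ slightly from this one. All data sets are hypothetical.

```python
import statistics


def skewness(values):
    """Simple moment coefficient of skewness (no small-sample correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    second_moment = sum((x - mean) ** 2 for x in values) / n
    third_moment = sum((x - mean) ** 3 for x in values) / n
    return third_moment / second_moment ** 1.5


# Hypothetical data sets, invented for illustration.
symmetric = [0, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 6]
right_tailed = [0, 1, 1, 1, 2, 2, 3, 4, 9]
left_tailed = [-x for x in right_tailed]

for label, data in [
    ("symmetric", symmetric),
    ("right-tailed", right_tailed),
    ("left-tailed", left_tailed),
]:
    print(f"{label:12s} skewness = {skewness(data):+.3f}")
```

Because the deviations are cubed, their signs are preserved, so a long right tail gives a positive value and a long left tail a negative one.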
26 Outliers Often data sets contain scores that are atypical of the distribution as a whole. Such an atypical score is known as an OUTLIER. With small data sets, outliers can have marked effects upon the values of some statistics. Such statistics can become UNREPRESENTATIVE of the data as a whole.
27 The mean as the centre of gravity The mean can be thought of as THE CENTRE OF GRAVITY of a distribution, the point at which it would BALANCE on a knife-point. Positive and negative deviations can be thought of as the distances of the points to the right and the left of the balance point. The deviations must sum to zero if balance is to be maintained. Points further from the balance point exert more LEVERAGE: the scores 0 and 6 exert more leverage than the 4s and the 1s. The 3s exert no leverage at all, since they are situated at the balance point.
28 An outlier exerts leverage upon the value of the mean Add a score of 20 to the set. This is clearly an OUTLIER. There were 16 scores in the old set; 17 in the augmented set. The new mean is [(3×16) + 20]/17 = 68/17 = 4. (See the car problem.) The outlier has exerted LEVERAGE upon the value of the mean. Arguably the value of the new mean isn't typical of the distribution.
29 Other measures of the average There are other measures of the average or central tendency which are more ROBUST to the influence of outliers. Two such measures are the MEDIAN and the MODE.
30 The mode The MODE is the MOST FREQUENT score. In the original distribution, the mode is 3. In the new distribution, the mode is still 3. But the mean has been drawn to the right by the outlier. The mode is more resistant to the outlier's influence.
31 Problems with the mode The mode is only useful as a measure of the average with well-shaped data sets. Take the distribution of salaries in a firm, which would, of course, be positively skewed, with the directors up in the tail on the right. Several of the directors, however, might be on exactly the same salary, which might therefore be the modal salary. Here the mode (200K?) might be quite atypical of the salary of employees.
32 A bimodal distribution
33 The median The MEDIAN is the MIDDLE SCORE. The median is the score below (or above) which half the distribution lies. Obtain the median by arranging the scores in order and taking the middle one. The median of the scores (1, 2, 7, 8, 9) is 7. The median of the original distribution on the left is 3. The median of the augmented distribution is still 3. Like the mode, the median is ROBUST to the pull of outliers.
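The contrast between the three averages can be sketched in Python. The 16 scores below are hypothetical: the transcript does not list the original data, so this set was constructed only to share the stated properties (16 scores with a mean, median and mode of 3).

```python
import statistics

# Hypothetical data: 16 scores chosen to have mean, median and mode equal to 3.
original = [0, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 6]
augmented = original + [20]  # the same scores plus an outlier of 20


def averages(values):
    """Return the mean, median and mode of a list of scores."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


before = averages(original)
after = averages(augmented)
print("original :", before)
print("augmented:", after)
```

Only the mean moves when the outlier is added; the median and the mode stay where they were.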
34 Salaries again The mean value of 34K seems rather atypically high. The median of 29K seems somewhat more typical. Pay no attention to the mode with salary distributions – it can be very misleading. (Mean = 34K; median = 29K; mode = 31K.)
35 Uses of the median Classical statistical theory is based upon the MEAN, rather than the median. But the median is very useful for EXPLORING YOUR DATA before proceeding to the stage of making formal statistical tests. The comparison between the values of the mean and the median provides important information about the shape of a distribution. Early measures of skewness were based upon the difference between the mean and the median.
36 Relative frequency as an area Think of the AREAS of the bars as representing the RELATIVE FREQUENCIES with which values within their class intervals occur in the distribution. Their total area is approximately the area under the normal curve. The total area under the curve represents UNITY or 100%. ALL values lie SOMEWHERE under the curve.
37 Relative frequency as an area Relative frequency of heights between 65 inches and 70 inches.
38 Percentiles A PERCENTILE is the VALUE or SCORE below which a specified percentage or proportion of the distribution lies. The 30th percentile is the value below which 30% of the distribution lies. The 70th percentile is the value below which 70% of scores lie.
39 The 30th and 70th percentiles
40 The median is the 50th percentile
41 Quartiles The three QUARTILES are percentiles which divide the distribution into four parts. The FIRST QUARTILE Q1 (also known as the LOWER QUARTILE) is the value below which 25% of scores lie. The SECOND QUARTILE Q2 is the score below which 50% of scores lie. The second quartile is the MEDIAN. The THIRD QUARTILE Q3 (also known as the UPPER QUARTILE) is the value below which 75% of the distribution lies.
43 More measures of spread: the range statistics
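44 The interquartile range (IQR) The interquartile range is the distance between the third and first quartiles: IQR = Q3 − Q1. It spans the middle 50% of the values in the distribution.
45 The semi-interquartile range (SIQR) The semi-interquartile range is half the interquartile range: SIQR = (Q3 − Q1)/2. Equivalently, it is the median of the absolute deviations (that is, the deviations with signs ignored) of scores from the midquartile, (Q1 + Q3)/2. The midquartile is NOT the median.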
46 Heights of men IQR = 3.44 SIQR = 1.72
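The range statistics can be sketched with Python's standard library. The nine heights below are hypothetical (the 523 measurements are not in this transcript), and the "inclusive" quartile method is an assumption: packages use several slightly different conventions for quartiles, so other software may give somewhat different values.

```python
import statistics

# Hypothetical heights in inches, invented for illustration.
heights = [62, 64, 65, 66, 67, 69, 71, 72, 74]

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(heights, n=4, method="inclusive")

iqr = q3 - q1                # interquartile range
siqr = iqr / 2               # semi-interquartile range
midquartile = (q1 + q3) / 2  # midpoint of the two outer quartiles

# Median of the absolute deviations of the scores from the midquartile.
median_abs_deviation = statistics.median(abs(x - midquartile) for x in heights)

print(f"Q1={q1} Q2={q2} Q3={q3}")
print(f"IQR={iqr} SIQR={siqr} midquartile={midquartile}")
print(f"median |x - midquartile| = {median_abs_deviation}")
```

In this example the midquartile differs from the median (Q2), and the median absolute deviation from the midquartile coincides with the SIQR.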
47 Comparison of the measures In this distribution, the SIQR and the SD have different values. For a well-shaped distribution like this, we should prefer the SD. For the purposes of exploration, however, the SIQR might provide a more useful measure of spread.
48 95% of the distribution 95% of ANY distribution lies between the 2.5th percentile and the 97.5th percentile. BELOW the 2.5th percentile lie .025 (2.5%) of the scores. ABOVE the 97.5th percentile lie .025 (2.5%) of the scores. Outside those limits lie .025 + .025 = .05 (5%) of the scores.
49 95% of ANY continuous distribution 0.95 (95%) of the distribution lies between the 2.5th percentile and the 97.5th percentile.
50 Normal distribution A NORMAL DISTRIBUTION is symmetrical and bell-shaped. If a variable is normally distributed, 95% of values lie within 1.96 standard deviations (2 approx.) on EITHER side of the mean, that is, between mean – 1.96×SD and mean + 1.96×SD. That leaves 2½% (.025) of values in each tail.
51 Another useful interval NINETY-NINE per cent of values in a normal distribution lie within 2.58 standard deviations on either side of the mean. Only .01/2 = .005 or ½% of values lie above this range. Only .01/2 = .005 or ½% of values lie below this range. The upper value is the 99.5th percentile; the lower value is the 0.5th percentile.
52 99% of values 0.99 (99%) of values lie between Mean – 2.58×SD and Mean + 2.58×SD, with ½% = 0.005 in each tail.
53 Within ONE standard deviation SIXTY-EIGHT per cent of values in a normal distribution lie within 1 standard deviation on either side of the mean, leaving 32% outside, 16% in each tail. So the upper limit of this interval (mean + 1SD) is the [68 + 32/2]th percentile, that is, the 84th percentile. The lower limit of this interval (mean – 1SD) is the [32/2]th percentile, that is, the 16th percentile.
54 The 95th percentile NINETY-FIVE per cent of values lie BELOW the value that is 1.64 standard deviations ABOVE the mean, i.e., mean + 1.64×SD. (Because of the symmetry of the normal distribution, we can also say that 95% of values lie ABOVE the value that is 1.64 standard deviations BELOW the mean, i.e., mean – 1.64×SD.) These statements apply only to the normal distribution.
55 The 95th percentile of a normal distribution 0.95 (95%) of values lie below Mean + 1.64×SD.
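The multipliers quoted on these slides (1, 1.64, 1.96 and 2.58) can be checked against the standard normal distribution. The sketch below uses statistics.NormalDist from the Python standard library; the helper name central_coverage is invented for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def central_coverage(z):
    """Proportion of a normal distribution within z SDs either side of the mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


for z in (1.0, 1.96, 2.58):
    print(f"within {z:.2f} SD of the mean: {central_coverage(z):.4f}")

# One-sided statement: proportion lying below mean + 1.64 SD.
below_1_64 = standard_normal.cdf(1.64)
print(f"below mean + 1.64 SD: {below_1_64:.4f}")

# Going the other way: the z value that cuts off the lowest 95%.
z_95 = standard_normal.inv_cdf(0.95)
print(f"z for the 95th percentile: {z_95:.3f}")
```

The exact z value for the 95th percentile is slightly above 1.64, so the quoted multipliers should be read as rounded values.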
56 Study question: distribution of IQ The IQ has an approximately normal distribution, with a mean of 100 and a standard deviation of 15. If 1000 people are drawn at random from the population, how many of them can we expect to have IQs … 1. greater than 130? 2. between 100 and 130? 3. less than 85?
57 Study question At which percentile in the IQ distribution is 1. an IQ of 130? 2. an IQ of 115? 3. an IQ of 100? 4. an IQ of 85?
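One way to check answers to these two study questions is sketched below, treating IQ as exactly normal with mean 100 and SD 15 (the slide says only "approximately normal", so this is an idealisation). The variable names are invented for the example. Working by hand with the rounded rules from the lecture (95% within about 2 SDs, 68% within 1 SD) gives slightly different, rounder figures.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # idealised IQ distribution
sample_size = 1000

# Expected numbers of people out of 1000 in each region.
p_above_130 = 1 - iq.cdf(130)
p_100_to_130 = iq.cdf(130) - iq.cdf(100)
p_below_85 = iq.cdf(85)

print(f"IQ > 130       : about {round(sample_size * p_above_130)} people")
print(f"100 < IQ < 130 : about {round(sample_size * p_100_to_130)} people")
print(f"IQ < 85        : about {round(sample_size * p_below_85)} people")

# Percentile rank of a score: percentage of the distribution lying below it.
percentile_ranks = {score: 100 * iq.cdf(score) for score in (130, 115, 100, 85)}
for score, rank in percentile_ranks.items():
    print(f"IQ {score}: percentile {rank:.1f}")
```

Scores of 130, 115 and 85 sit 2 SDs above, 1 SD above and 1 SD below the mean, so the percentile ranks can be compared with the 84th and 16th percentiles derived earlier.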
58 Next week Next week, I shall show you how to use the statistical package SPSS to explore the results of an experiment and obtain the sorts of statistics I have been talking about this afternoon, including percentiles. I shall show you how to obtain graphs of a distribution.