5Univariate, bivariate and multivariate data sets We can classify data according to the number of measured variables in the data set.If there is one measured variable, we have a UNIVARIATE data set.If there are two measured variables, we have a BIVARIATE data set.If there are three or more measured variables, we have a MULTIVARIATE data set.
6 Levels of measurement: There are three levels of measurement: scale (also called interval or continuous), ordinal and nominal.
7 Scale data: Measures on an independent scale with units. Heights, weights, performance scores and IQs are all scale data. So also are counts, such as the number of hits. Each score has ‘stand-alone’ meaning.
8Ordinal dataData in the form of RANKS (1st, 3rd, 53rd). A rank has meaning only in relation to the other individuals in the sample. A rank does not express, in units, the extent to which a property is possessed.Rarely would a researcher collect data in the form of ranks. But there are hidden issues here. Some would argue that ratings are really ordinal data (with ties) and should be treated as such in statistical analysis.
9 Nominal data: Assignments to categories (so-many males, so-many females). Nominal data may be coded numerically, but the numbers are arbitrary LABELS, as when John receives a 1 for Sex, while Jane receives a 2. Nominal data are not really measurements at all.
10 Experimental versus correlational research: In a true experiment such as a randomised clinical trial, the researcher manipulates one variable, the INDEPENDENT VARIABLE (IV), with a view to demonstrating that it has a causal effect upon the DEPENDENT VARIABLE (DV). The DV is measured during the course of the experiment. In correlational research, ALL variables are measured as they occur in the people studied.
11ComparisonExperimental research usually results in univariate data sets.The statistical analysis usually involves COMPARISON of scores obtained under the different experimental conditions.For example, performance under an active condition might be compared with performance under a control condition.
12AssociationCorrelational research results in bivariate or multivariate data sets.Here, the interest centres on the possible existence of statistical ASSOCIATIONS among the variables measured.If watching screened violence promotes actual violence, we should find that those who watch most screened violence should tend to be the most violent, those who watch least should be the least violent and so on.
13Uses of statisticsWe use statistics to SUMMARISE and DESCRIBE our data.We use statistics to CONFIRM patterns in our data. One aspect of this process of confirmation is the making of statistical TESTS.
14A simple two-group experiment The experimenter wants to show that ingestion of caffeine improves shooting accuracy, as measured by number of Hits.Participants are randomly assigned to one of the two conditions.All participants shoot at the same target.
16The raw dataThe table shows the RAW DATA, that is, the ORIGINAL SCORES achieved by the participants.From inspection, it seems that the Caffeine group tended to have higher scores.With larger data sets, however, it can be very difficult to see what’s going on merely from inspection.
17DistributionThe DISTRIBUTION of a variable is a table or diagram showing the relative FREQUENCIES, over the entire range, with which different values occur.A good first move in a statistical analysis is to draw a graph of the distribution.
19Three important aspects of a distribution Its LEVEL or CENTRAL TENDENCY.The SPREAD or DISPERSION of scores around the centre.The SHAPE of the distribution.
20Different central tendencies The scores of the Caffeine group TEND to be higher than do the scores of the Placebo group. The two distributions differ in LEVEL or CENTRAL TENDENCY.There is, however, considerable overlap: some participants in the Placebo condition outperformed those in the Caffeine condition.
21Individual differences In the Caffeine distribution, values are densest around 13; whereas in the Placebo distribution, values are densest around 9.But there is a huge RANGE in performance.The worst performer (who scored 2) was in the Caffeine group; the best (who scored 20) was in the Placebo group.
22Central tendency: the “average” An average is a measure of level or central tendency, the “typical” value.It is clear from inspection of the figure that the average score of the Caffeine distribution should be higher than the average score of the Placebo distribution.There are several different measures of the “average” of a set of scores.
23 The mean: The MEAN of a set of scores is the sum of their values divided by the number of scores. If X is a score and n is the number of scores, the mean M is: $M = \frac{\sum X}{n}$
24ExampleThe mean of the scores 10, 1, 3, 4 and 2 is …
26Deviation scoresA deviation score d is a score from which the mean has been subtracted.Deviation scores have the very important property that they sum to zero.Therefore, their mean is also zero.
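As a quick check of the mean and of the zero-sum property of deviation scores, here is a minimal Python sketch using the five scores from the example slide. The variable names are illustrative only.

```python
scores = [10, 1, 3, 4, 2]                 # the scores from the example slide
mean = sum(scores) / len(scores)          # sum of scores divided by n: 20 / 5
deviations = [x - mean for x in scores]   # each score minus the mean

print(mean)             # 4.0
print(deviations)       # [6.0, -3.0, -1.0, 0.0, -2.0]
print(sum(deviations))  # 0.0
```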
27 Centring: Column X contains raw scores, which are centred on their mean value of 2. Place the deviation scores d in the next column. This operation is known as CENTRING and is common in regression analysis. The new values are now centred on zero, rather than on the mean of the original values.
28The mean as the ‘centre of gravity’ The mean can be thought of as THE CENTRE OF GRAVITY of a distribution, the point at which it would BALANCE on a knife-point.We can see (because this distribution is symmetrical) that the mean of this distribution is 3.
29OutliersOften data sets contain scores that are atypical of the distribution as a whole.Such an atypical score is known as an OUTLIER.With small data sets, outliers can have marked effects upon the values of some statistics.Such statistics can become UNREPRESENTATIVE of the data as a whole.
30An outlier (20 hits) exerts ‘leverage’ upon the value of the mean.
31Other measures of ‘the average’ There are other measures of the average or central tendency which are more ROBUST to the influence of outliers.Two such measures are the MEDIAN and the MODE.
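The leverage of an outlier on the mean, and the relative robustness of the median, can be seen in a small sketch. The scores below are hypothetical (they are not the data from the slides); only the idea of adding one atypical score of 20 hits is taken from the text.

```python
import statistics

# Hypothetical scores, not the data shown on the slides.
scores = [7, 8, 9, 10, 11]
with_outlier = scores + [20]   # add one atypical score of 20 hits

for label, data in [("without outlier", scores), ("with outlier", with_outlier)]:
    print(label,
          "mean =", round(statistics.mean(data), 2),
          "median =", statistics.median(data))
```

The mean moves by almost two hits, while the median moves by only half a hit.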
32The medianThe MEDIAN of a distribution is the MIDDLE number. It is the value below which 50% of the distribution lies.The medians of the scores in the Placebo and Caffeine groups are, respectively, 9 and
33Points about the median Notice that, for the Placebo group, the median does not have the value of any of the actual scores.With symmetrical distributions, the median and the mean have similar values.
34 The mode: The MODE is the MOST FREQUENT value. For the Placebo and Caffeine groups, the values of the mode are 8 and 13, respectively. All three measures of central tendency therefore agree that the Caffeine group typically performed at a higher level than did the Placebo group.
35Comparison of the three measures The mean is the basis of classical statistical theory, because it has many useful mathematical properties.The median is useful for exploring data sets, particularly in comparison with the mean. With an extremely asymmetrical distribution, the median is arguably a truer measure of level in the data as a whole.The mode is seldom used.
36Properties of the meanWe have seen that deviations about the mean sum to zero.The sum of the SQUARES of deviations about the mean is a MINIMUM, that is, it is smaller than the sum of squared deviations about any other value.
37A property of the median The sum of ABSOLUTE deviations about the MEDIAN is also a minimum.But absolute values are less useful mathematically.
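Both minimum properties can be checked numerically. The sketch below uses hypothetical scores and simply tries a grid of candidate centres; it illustrates the two properties and is not a proof.

```python
import statistics

data = [1, 2, 3, 4, 20]   # hypothetical scores with one outlier

def sum_squared_dev(centre):
    """Sum of squared deviations of the data about a candidate centre."""
    return sum((x - centre) ** 2 for x in data)

def sum_absolute_dev(centre):
    """Sum of absolute deviations of the data about a candidate centre."""
    return sum(abs(x - centre) for x in data)

mean = statistics.mean(data)      # 6
median = statistics.median(data)  # 3

# Try candidate centres from 0 to 20 in steps of 0.5.
candidates = [c / 2 for c in range(0, 41)]
best_sq = min(candidates, key=sum_squared_dev)
best_abs = min(candidates, key=sum_absolute_dev)

print(best_sq, mean)     # the squared-deviation minimum falls at the mean
print(best_abs, median)  # the absolute-deviation minimum falls at the median
```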
38A second scenarioThe scores of both groups cluster around the same value: Since the distributions are completely symmetrical, the mean of either is clearly 12.In the Caffeine distribution, however, the scores are more widely SPREAD OUT or DISPERSED than those of the Placebo group.
39The simple rangeThe SIMPLE RANGE is the highest score minus the lowest score.So, for the Placebo group in Scenario 2, the simple range is (15 – 9) = 6 score units.For the Caffeine group, the simple range is (18 – 6) = 12 score units.On this measure of dispersion, therefore, the Caffeine distribution shows twice as much spread or dispersion of scores around the mean.
40A problem with the simple range The simple range statistic only uses TWO scores out of the whole distribution.Should those particular scores be highly atypical of the distribution, the range may not reflect the true spread of scores about the mean of the distribution. The data from the original scenario (left) exemplify this situation.
41 Other range statistics: Nevertheless, the simple range can be a very useful statistic when you are EXPLORING a data set. Also available are more complex RANGE STATISTICS (the interquartile range, the semi-interquartile range) which use more of the information in a data set than does the simple range.
42 The variance and the standard deviation (SD): The VARIANCE ($s^2$) and the STANDARD DEVIATION (s or SD) are also measures of dispersion. Both statistics use the values of ALL the scores in the distribution.
43Deviation scores again The DEVIATION SCORE is the building block from which the variance and SD are calculated.Could the mean deviation serve as a measure of spread?No, because deviations about the mean sum to zero. So the mean deviation is also zero, whatever the spread of your data.
44Squared deviationsThe sum of the SQUARED deviations is always either positive (when scores have different values) or zero (if all the scores have the same value).If there is any variability in the scores at all, the sum of the squared deviations will have a positive value.
45 Formula for the variance: $s^2 = \frac{\sum (X - M)^2}{n - 1}$. The Greek letter sigma (Σ) is used to indicate that you are to obtain the deviation of each score from the mean, square it, then add up all the squared deviations. The sample variance $s^2$ is close to being the MEAN SQUARED DEVIATION. The value 1 is subtracted from n in order to improve the sample variance as an estimate of the spread of values in the population.
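The steps just described (deviation, square, sum, divide by n – 1) can be followed in a short sketch. It reuses the five scores from the earlier mean example; the comparison with Python's `statistics.variance`, which also uses the n – 1 divisor, is only a cross-check.

```python
import statistics

scores = [10, 1, 3, 4, 2]
n = len(scores)
mean = sum(scores) / n
squared_devs = [(x - mean) ** 2 for x in scores]   # 36, 9, 1, 0, 4
variance = sum(squared_devs) / (n - 1)              # 50 / 4

print(variance)                      # 12.5
print(statistics.variance(scores))   # 12.5 (same n - 1 divisor)
```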
48 Adding a constant: Adding a constant of ten to every score in the Caffeine group simply shifts the whole distribution ten units to the right. So the new mean will be the old one plus ten: new mean = old mean + 10 = 13.90. The SPREAD of the scores, however, will be unaltered, so the variance and the SD will have the same values as before.
49Multiplying by a constant Multiplying each score by a constant of ten not only increases the mean by a factor of ten, but also increases the SPREAD of the scores about the new mean.The new mean will be ten times the old one.The new variance will be ten SQUARED, that is one hundred, times the old variance.The new SD will be ten times the old one.
50Adding and multiplying scores by a constant of ten
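The effects of adding and multiplying by a constant of ten can be tabulated with a small sketch. The scores are illustrative (they are not the Caffeine data), and the helper name `summary` is an assumption made for this example.

```python
import statistics

scores = [10, 1, 3, 4, 2]            # illustrative scores only
shifted = [x + 10 for x in scores]   # add a constant of ten
scaled = [x * 10 for x in scores]    # multiply by a constant of ten

def summary(data):
    """Return the mean, the variance and the SD (n - 1 divisor)."""
    return statistics.mean(data), statistics.variance(data), statistics.stdev(data)

for label, data in [("original", scores), ("plus ten", shifted), ("times ten", scaled)]:
    mean, var, sd = summary(data)
    print(f"{label:10s} mean={mean:6.2f} variance={var:8.2f} SD={sd:6.2f}")
```

Adding ten changes only the mean; multiplying by ten multiplies the mean and SD by ten and the variance by one hundred.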
52Effect of centringWhen you centre scores by subtracting the mean, the mean becomes zero.The variance, however, remains unaltered.
53Interpreting the variance The simple range statistic has the merit of being in the same units as the raw data.The variance, since it is based on the squares of the deviations, is in SQUARED UNITS and is therefore difficult to interpret.If you take the (positive) square root of the variance, you have the STANDARD DEVIATION, which is in the original units of measurement.
54 The standard deviation is the positive square root of the variance: We found that the variance of the scores of the Caffeine group was 10.73. To obtain the standard deviation, we take the square root of 10.73, which is 3.28. The square root operation restores the measure of spread to the original measurement units: we can say that the standard deviation is 3.28 hits.
55Tables of resultsAs well as means, always include the standard deviations.
56Vulnerability of variance and SD to outliers We have seen that the mean is vulnerable to the leverage exerted by outliers.This is true, a fortiori, of the variance, because it is the sum of the SQUARES of deviations from the mean.The leverage effect is NOT removed by taking the square root of the variance to obtain the standard deviation.
57 Standard or z scores: A standard or z score is a special kind of deviation score which expresses a value as so-many standard deviations above or below the mean (0): $z = \frac{X - M}{s}$
58Mean and SD of z scoresTheir mean is always zero (because they are deviation scores).Their variance and standard deviation are 1.
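A minimal sketch, using illustrative scores, confirms that standardised scores have a mean of zero and an SD of 1 (up to floating-point rounding).

```python
import statistics

scores = [10, 1, 3, 4, 2]             # illustrative scores only
m = statistics.mean(scores)
s = statistics.stdev(scores)          # SD with the n - 1 divisor
z = [(x - m) / s for x in scores]     # subtract the mean, divide by the SD

print([round(v, 2) for v in z])
print(statistics.mean(z), statistics.stdev(z))
```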
59 Advantage of z scores: Scores in different units (heights and weights) cannot be directly compared. But when someone’s weight has a z score of –1 (one SD below the mean of 0) and their height has a z score of +2 (two SDs above the mean), we can say that the person is tall and thin. If we can make additional assumptions about the distribution, knowledge of z scores is even more informative.
60Distribution shapeWe have measured the AVERAGE and the SPREAD of the Caffeine and Placebo distributions.We noted that both distributions were (at least approximately) SYMMETRICAL.There are circumstances in which that would not be the case.
61 A disappointing result: The mean for the Caffeine group is only very slightly greater than the Placebo mean. But note that both means are near the top of the scale (20). And notice how small the SDs are.
62Ceiling effectThe scores of both groups are bunched around the top of the scale.Any possible effect of caffeine intake has been masked by a CEILING EFFECT.The task chosen was TOO EASY for the participants.No conclusions about the effects of ingestion of caffeine can be drawn from these data.
63 Another disappointing result: Again the Caffeine mean is only slightly greater than the Placebo mean. But both means are near the bottom of the scale (zero). Once again, note the small SDs.
64 Floor effect: The scores of both groups are bunched around the bottom of the scale. The task was too difficult. No conclusions about the effects of ingestion of caffeine can be drawn from these data either.
65SkewnessIn both Scenarios 3 and 4, the distributions are asymmetric or SKEWED.When a distribution has a tail to the left, it is said to be NEGATIVELY SKEWED; when it has a tail to the right, it is POSITIVELY SKEWED.When there is a ceiling effect, the distributions are negatively skewed; when there is a floor effect, they are positively skewed.
66Screen violence and actual violence Does screened violence promote actual violence?Ethical and practical considerations may rule out direct manipulation of the amount of violent material that children watch.It may be more feasible to measure children on the amount of screen violence they watch and upon their actual violence.
67CorrelationA statistical ASSOCIATION or CORRELATION is a tendency for events or values to occur together.If exposure to screen violence promotes actual violence, we should expect those who watch more violence to be more violent and those who watch less violence to be less violent.Such a POSITIVE ASSOCIATION would be at least consistent with the hypothesis.
68 A scatterplot: Here is a picture of the results of our study. In this SCATTERPLOT, each point represents one of the children. Richard got a score of 2 on Exposure and 4 on Actual. John got 9 on Exposure and 8 on Actual. Jim got scores of 5 on both Exposure and Actual.
69A strong positive correlation When the shape of a scatterplot is a narrow ellipse like this, a strong correlation is indicated.The results of the study are consistent with the hypothesis.
70 A negative correlation? Does the number of complaints made against GPs vary inversely with the average length of their appointments? The following scatterplot supports this hypothesis.
72Scatterplot indicating no association When the cloud of points is circular, there is NO ASSOCIATION between the variables.
73Linear functionsY is a LINEAR FUNCTION of X if the graph of Y upon X is a straight line.For example, temperature in degrees Fahrenheit is a linear function of temperature in degrees Celsius.
74The Pearson correlation The PEARSON CORRELATION (r), is designed to measure the strength of a supposed linear relationship between two variables.A correlation can only take values within the range from –1 to +1, inclusive.The closer the value of a correlation to unity (forgetting the sign), the STRONGER the linear association.
75 Formula for the Pearson correlation: There are several equivalent formulae. Here is the simplest. Transform X and Y to standard scores z. Divide the sum of the products of the pairs of standard scores by (n – 1): $r = \frac{\sum z_X z_Y}{n - 1}$
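The two steps (standardise, then divide the sum of products by n – 1) can be sketched as follows. The function name `pearson_r` is an assumption for this example. The first three pairs are the scores quoted for Richard, John and Jim; the remaining pairs are invented, so the result is not the value reported for the violence data.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson r as the sum of products of z scores divided by n - 1."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)   # n - 1 divisor
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Richard, John and Jim, followed by three invented children.
exposure = [2, 9, 5, 3, 7, 6]
actual = [4, 8, 5, 2, 7, 5]
print(round(pearson_r(exposure, actual), 3))
```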
76The calculation of r for the violence data The value of r (.892) is high and positive, consistent with the appearance of the scatterplot.
77 Centring again: What is the effect upon the value of r when the variables involved are centred? There is no effect. In fact, no linear transformation of either variable (or both variables) will change the ABSOLUTE value of r. Suppose you measure the heights and weights of 100 people in inches and pounds and find that the correlation is +.6. If you convert the heights and weights to centimetres and grams, respectively, the correlation is still +.6. Merely subtracting their mean from the values of each variable leaves the correlation unchanged.
78Reversing the slopeIf you multiply all the scores on one variable by –1, you will change the slope of the scatterplot; but the absolute value of r will remain the same.
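The invariance of r under changes of unit, centring and sign reversal can be checked with a sketch. The heights and weights are hypothetical, and `pearson_r` is a helper written for this example using the equivalent sums-of-squares form of the formula.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson r from sums of squares and cross-products of deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

heights_in = [62, 65, 68, 70, 73]        # hypothetical heights in inches
weights_lb = [120, 150, 140, 175, 180]   # hypothetical weights in pounds

r = pearson_r(heights_in, weights_lb)
# Convert inches to centimetres and pounds to grams.
r_metric = pearson_r([h * 2.54 for h in heights_in],
                     [w * 453.592 for w in weights_lb])
# Centre the heights on their mean.
mean_height = statistics.mean(heights_in)
r_centred = pearson_r([h - mean_height for h in heights_in], weights_lb)
# Multiply the heights by -1.
r_flipped = pearson_r([-h for h in heights_in], weights_lb)

print(r, r_metric, r_centred, r_flipped)
```

The first three values agree; the last has the same absolute value with the opposite sign.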
79Centring in regression We have seen that centring does not change the variance of a variable in the data set.Nor does centring change the correlations among the variables.Centring is used in several multivariate procedures in order to help the algorithm to find a unique solution.
80QuestionWe have been told of a bivariate data set, from which the calculated Pearson correlation is ZERO: r = 0.From this information alone, can we conclude that the two variables are independent, that is, there is no association between them?The answer is NO!
81The scatterplotThere is a perfect, but nonlinear association between the two variables.Yet the Pearson correlation is zero.
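The data behind the slide's scatterplot are not reproduced here, so the sketch below assumes a simple quadratic relationship, y = x², over values of x placed symmetrically about zero. Y is then completely determined by X, yet the Pearson correlation is zero.

```python
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]       # assumed relationship: y is exactly x squared

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # cross-products
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
r = sxy / (sxx * syy) ** 0.5

print(r)   # 0.0
```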
82 Anscombe’s data set: Many years ago, Frank Anscombe (American Statistician, 1973) published a famous paper warning readers of the pitfalls awaiting the unwary user of information about correlations. There were four bivariate data sets, all of which produced a Pearson correlation with a value of +.82.
83An elliptical scatterplot This is fine.The elliptical scatterplot indicates that there is indeed a basically linear relationship between variable Y1 and variable X1.
84A non-linear relationship There is actually a perfect association between variable Y2 and variable X1.This relationship, however, is non-linear and is understated by the value of r.
85An understatement by r There is a substantial correlation. The scatterplot, however, is not elliptical.Basically there is a perfect linear relationship between Y3 and X1.The outlier (a typo?) has depressed the value of r.
86 Anscombe’s rule: When you examine a scatterplot (something you should ALWAYS do when interpreting a correlation), ask yourself the following question: “Would the removal of one or two points at random affect the basically elliptical shape of the scatterplot? If the shape would remain essentially the same, the value of r accurately reflects the association between the variables”.
87In summary …The Pearson correlation r is a measure of the strength of a supposed LINEAR relationship between 2 variables.It is one of the most widely used of statistical measures; but it is also one of the most misused.Wherever possible, a value of r should be interpreted in the context of the scatterplot.
88Have we really gathered evidence for the hypothesis that viewing screened violence increases actual violence?
89A famous dictumCORRELATIONdoes not implyCAUSATION
90A causal model The scientific hypothesis implies this CAUSAL MODEL. The results are CONSISTENT with the hypothesis.
91 Another causal model: The child’s tendencies towards, and appetite for, violence lead to his watching violent programmes as often as possible. This model is also consistent with the data.
92Yet another causal model NEITHER variable causes the other.Both are determined by the behaviour of the child’s parents.
93Direction of causality Returning to the caffeine experiment, it would be ridiculous to suggest that shooting accuracy determines the group to which one is assigned.In the violence study, however, which was of CORRELATIONAL, rather than EXPERIMENTAL design, the direction of causation is uncertain.Indeed, at least three possible MODELS OF CAUSATION are consistent with the results.
94 A background variable: Perhaps neither Exposure nor Actual violence causes the other. Perhaps both are caused by a background parental behaviour variable. We have data on such a variable. The background variable correlates highly with both Exposure and Actual violence.
95Partial correlationA PARTIAL CORRELATION is what remains of a Pearson correlation between two variables when the influence of a third variable has been removed, or PARTIALLED OUT.
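The slides do not give a formula, but one standard way to partial out a single third variable is the first-order partial correlation, computed from the three pairwise Pearson correlations. The sketch below uses that formula with hypothetical correlations (not the values from the violence study); the function name `partial_r` is an assumption.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y with Z partialled out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations: X and Y correlate .80 with each other,
# and each correlates .90 with the background variable Z.
print(round(partial_r(0.80, 0.90, 0.90), 3))   # -0.053
```

When both variables correlate highly with the background variable, little or nothing of the original correlation may remain.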
96The partial correlation The partial correlation fails to reach significance.Now that we have taken the background variable into consideration, we see that there is no significant correlation between Exposure and Actual violence.It appears that, of the three possible causal models, the ‘third party’ model gives the most convincing account of these data.
100Features of a histogram The entire range of variation (shown on the x-axis) is divided into CLASS INTERVALS.The heights of the bars are proportional to the FREQUENCIES of values (y-axis) falling within the class intervals represented by the bases of the bars.The bars touch each other, indicating the CONTINUOUS variation of the variable.
102Salaries in the US Many variables have asymmetrical distributions. Skewness = 2.13
103Measuring skewnessAsymmetry or skewness is measured with a statistic which I shall call simply ‘Skewness’.(Skewness is a complex measure, involving the cube of the deviations of the scores about their mean.)PASW will calculate the value of Skewness for any distribution.If the value of Skewness is positive, the distribution is positively skewed; a negative value indicates negative skewness.
107Proportion of heights either below 65” or above 75”.
108Unity All values lie within the total range. The area of the green bars is 100% or unity.
109Populations and samples We have some scores on shooting accuracy from the caffeine trial.The POPULATION of such scores is the reference set, that is, the infinite set of all possible scores.Our data are merely a subset or SAMPLE from the population.
110Theoretical populations or distributions In these talks, the term “population” always refers to a theoretical distribution.For example the 1000 men’s heights are a sample from a theoretical NORMAL population whose mean is 69” and whose standard deviation is 2.59”.This NORMAL distribution is symmetrical and bell-shaped.
111Statistics versus parameters STATISTICS are characteristics of SAMPLES.PARAMETERS are characteristics of populations.
112 Notational convention: Roman letters denote statistics such as our sample means and SDs. Greek letters denote the corresponding population characteristics or parameters.
113Two parametersThere is an infinitely large family of normal distributions.To specify a normal distribution you must assign values to TWO parameters:The meanThe standard deviation
116ProbabilityThe PROBABILITY of an event is a measure of its likelihood, which can take values from zero (an impossible event) to unity (a certainty).There have been several definitions of probability.All of them raise serious philosophical questions.
117An ‘event’An EVENT is the outcome of an experiment of chance, such as rolling a die, tossing a coin – or running a psychological experiment.Chance is an important factor in the outcome of an experiment.Joe, Fred and Mary participated this time; but Anne, Jim and Fiona could easily have done so – and their scores would certainly have been different.
118Classical ‘probability’ “The first impetus came from a situation in which the dissolute nobility of France were competing in a race to ruin at the gaming tables” (Hogben, 1967; p.551).In 1654, Pascal and Fermat analysed the gambling strategies of one particular nobleman.Their approach was to determine the number of ways an outcome (such as a particular hand in cards) could occur in comparison with the total number of possibilities.
119Classical definition of a probability The probability of an event is the NUMBER OF WAYS in which the event can occur, divided by the TOTAL NUMBER OF OUTCOMES.Roll a die.What is the probability of a six?There is ONE way of getting a six. There are SIX possible outcomes.So the probability of a six is 1/6.
120More examples Roll a die. What is the probability of an even number? That could happen in three ways: 2 spots, 4 spots or six spots.So the probability is 3/6 = ½.What is the probability of a seven? There is NO WAY in which that could happen, so the probability is 0/6 = 0 (indicating an IMPOSSIBILITY).A number between 1 and 6, inclusive? That event could happen in six ways, so the probability is 6/6 = 1 (indicating a CERTAINTY).
121 A formula for classical probability: If an experiment of chance has N possible outcomes and an event E can occur in n ways, the probability of E is $P(E) = \frac{n}{N}$
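The die examples can be reproduced by counting the ways an event can occur and dividing by the total number of outcomes. This is a minimal sketch that assumes the six faces are equally likely; the helper name `probability` is illustrative.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]   # faces of a die, assumed equally likely

def probability(event):
    """Ways the event can occur divided by the total number of outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

print(probability(lambda o: o == 6))        # 1/6
print(probability(lambda o: o % 2 == 0))    # 1/2
print(probability(lambda o: o == 7))        # 0
print(probability(lambda o: 1 <= o <= 6))   # 1
```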
122A problem with the classical definition The classical definition is circular.The “number of ways” in which an experiment of chance could turn out were stated to be “equally likely”, which (by implication) pressed the term into service for its own definition.
123The empirical definition of a probability This notion is implicit in the notion of a FAIR coin.A fair coin is one that, IN THE LONG RUN, shows heads half the time.This “convergence”, however, which is a special case of what I shall call simplistically “The law of large numbers”, is an empirical fact.It cannot, however, be proved “analytically”, that is, by mathematical deduction.
124Interpretation of a probability If a coin is ‘fair’, the probability of a head is ½.This does not mean that if I toss the coin 100 times, I shall get 50 heads.Nor does it mean that if I toss the coin a million times, I shall get close to half a million heads.But with a million tosses, the proportion of heads will be closer to ½ than it would be if I were to toss the coin 10 times, 100 times or 1000 times.A probability is a PROPORTION to which we can get as close as desired by taking a sample of sufficient size.
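A simulation gives a feel for this long-run behaviour. The sketch below assumes a fair coin modelled by Python's pseudo-random number generator, with a fixed seed so that the run is repeatable; the sample sizes are arbitrary.

```python
import random

random.seed(1)   # fixed seed so the run is repeatable

def proportion_heads(n_tosses):
    """Toss a simulated fair coin n_tosses times; return the proportion of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 100, 1000, 100_000):
    print(n, proportion_heads(n))
```

The proportion for the largest sample should sit very close to ½, while the small samples can stray some way from it.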
125Health eventsA HEALTH EVENT is an uncertain occurrence, such as acute appendicitis, admission to a dental clinic - or death.ADVERSE events are those occurring after admission to hospital.The likelihood of such events occurring is quantified as proportions obtainable from the records over a period of time.These proportions are thus EMPIRICAL PROBABILITIES.
126The laws of large numbers You can make a sample resemble the population as closely as you like by making it sufficiently large.So small samples from the same population can show considerable variation; whereas very large samples show little variation.
127ExampleI draw five samples of size ten from a normal population with mean zero and standard deviation 1. (The STANDARD normal distribution.)I then draw five samples of size one million from the same population.
129Large samples and populations With the lower histograms, you are looking at the population, rather than at samples.… relative frequencies become PROBABILITIES.Visualise the probability of a value within a specified interval as the area under the curve of the theoretical distribution between the limits of the interval.
131Probability distribution When we take a measurement such as a person’s height, we assume we have performed an experiment of chance.We have sampled from a theoretical population.Since areas under the curve represent probabilities, theoretical distributions are known as PROBABILITY DISTRIBUTIONS.
132 Random variable or variate: A RANDOM VARIABLE or VARIATE is a variable that takes values in an unpredictable way. The values of a random variable make up a theoretical distribution or population, i.e., a probability distribution. Let X be a value selected at random from a normal population with mean 69 and standard deviation 2.59. The variable X is a normal random variable or normal VARIATE.
133Cumulative probability The cumulative probability of a value from a distribution is the probability of a value less than or equal to that value.The cumulative probability of 75 is .99; the cumulative probability of 70 is .65 .
136Probability of a height in the range from 70 to 75 inches Just subtract the cumulative probability of 70 from the cumulative probability of 75.
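The subtraction can be carried out with Python's `statistics.NormalDist`, using the mean (69) and standard deviation (2.59) of the theoretical heights population given earlier. This is only a sketch of the calculation.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=2.59)   # theoretical heights population
p75 = heights.cdf(75)   # cumulative probability of 75 inches
p70 = heights.cdf(70)   # cumulative probability of 70 inches

print(round(p75, 2), round(p70, 2))   # 0.99 0.65
print(round(p75 - p70, 2))            # 0.34
```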
137PercentilesA PERCENTILE is the value below which a specified proportion of the distribution lies.The 90th percentile is the value below which 90% of values lie.The 10th percentile is the value below which 10% of values lie.The 50th percentile (the MEDIAN) is the value below which 50% of values lie.
138The 30th and 70th percentiles The green areas are the cumulative probabilities of the 30th and 70th percentile values.
139The median is the 50th percentile The cumulative probability of the median or middle value is .50.
140 95% of the distribution: 95% of ANY distribution lies between the 2.5th percentile and the 97.5th percentile. BELOW the 2.5th percentile lie .025 (2.5%) of the scores. ABOVE the 97.5th percentile lie .025 (2.5%) of the scores. Outside those limits lie .025 + .025 = .05 (5%) of the scores.
141 95% of ANY continuous distribution lies between the 2.5th and 97.5th percentiles
142Normal distributionA NORMAL DISTRIBUTION is symmetrical and bell-shaped.If a variable is normally distributed, 95% of values lie within 1.96 standard deviations (2 approx.) on EITHER side of the mean.
143 The 95th percentile: NINETY-FIVE per cent of values lie BELOW 1.64 standard deviations above the mean. (Because of the symmetry of the normal distribution, we can also say that 95% of values lie ABOVE the value that is 1.64 standard deviations BELOW the mean, i.e., mean – 1.64×SD.) These statements apply only to the normal distribution.
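The multipliers 1.96 and 1.64 are percentiles of the standard normal distribution, and can be recovered with the inverse cumulative distribution function. A brief sketch:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal distribution: mean 0, SD 1

print(round(z.inv_cdf(0.975), 2))             # 1.96 (the 97.5th percentile)
print(round(z.inv_cdf(0.95), 2))              # 1.64 (the 95th percentile)
print(round(z.cdf(1.96) - z.cdf(-1.96), 3))   # 0.95 (area within 1.96 SDs)
```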
145The standard normal variable z Let X be a normal variable with mean μ and SD σ.Let z be defined as in the formula.z is also normally distributed, and is known as the STANDARD NORMAL VARIABLE.
146Mean and standard deviation of the standard normal distribution We have seen that the effect of standardising scores is to centre the distribution on zero and produce a variance and standard deviation of 1.Thus the standard normal distribution has a mean of zero and an SD of 1.
150Questions about probability Questions about the probabilities of ranges of values of a normally distributed random variable can always be rephrased in terms of the standard normal distribution.Just convert the raw values to z scores by subtracting the mean and dividing by the standard deviation.
151A question about IQThe IQ measure has an approximately normal distribution, with a mean of 100 and a standard deviation of 15.If 1000 people are drawn at random from the population, how many of them can we expect to have IQs greater than 130?
152 Solution: Transform 130 to z: (130 – 100)/15 = 2. A proportion of about .025 (taking 2 as an approximation to 1.96), that is, about 25 in a thousand values, are at least as large as 130.
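The figure of .025 rests on treating 2 as an approximation to 1.96. A minimal sketch with `statistics.NormalDist` gives the tail area beyond z = 2 without that rounding, assuming IQ is exactly normal with mean 100 and SD 15.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
z = (130 - iq.mean) / iq.stdev   # convert 130 to a z score
tail = 1 - iq.cdf(130)           # proportion of IQs greater than 130

print(z)                    # 2.0
print(round(tail, 4))       # 0.0228
print(round(1000 * tail))   # 23
```

So the expected number in a sample of 1000 is about 23, slightly fewer than the 25 given by the 2-SD approximation.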
153 Taking samples: Suppose I take 16 people’s IQs and calculate the mean. I take another 16 people and find that their mean is different. I draw a total of 4000 samples, calculating the value of the mean each time. The means will vary considerably, but not so much as the original distribution of IQs.
154The mean is a random variable A random variable X is one whose values are not predictable. One can only assign probabilities to ranges of its values. A statistic such as the mean M, since its value depends upon the values of X selected for the sample, is also a random variable or variate. The variate M has a distribution of its own.
155Sampling distribution The probability distribution of a STATISTIC (such as the mean or the variance) is known as its SAMPLING DISTRIBUTION. If X is normally distributed, then so is M. If we can specify the sampling distribution of M by giving a value to its SD, we can assign probabilities to ranges of values for M.
157Drawing to scale If I request a histogram of the sampling distribution of the mean, it will look similar to the histogram of IQ. But if I ask for BACK-TO-BACK HISTOGRAMS, we can compare the two distributions drawn to the same scale. In the following figure, the distribution on the right is the sampling distribution of the mean.
159Shape of the sampling distribution It’s narrower than the original distribution. The standard deviation has been much reduced. The areas of both distributions are the same (unity, 100%, or a probability of one). But values of the mean are particularly thick on the ground in the region of the population mean value of the IQ, that is 100.
161Standard error of the mean The STANDARD ERROR of a statistic is the standard deviation of its SAMPLING or PROBABILITY distribution. It is called the standard “error” because, if a sample value were to be used as an estimate of the corresponding parameter (the population mean), the estimate would be, to at least some degree, wide of the mark.
162Standard error of the mean If we draw samples of size n from a normal distribution with mean μ and standard deviation σ, the standard error of the mean σM is given by σM = σ/√n.
164Sample size As the sample size n increases, the denominator of the formula increases and the standard error of the mean is reduced. The distribution becomes taller and narrower. The effect of increasing the size of the sample is to reduce the dispersion or variance of the sampling distribution of the mean.
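To make the effect of sample size concrete, here is a minimal sketch that evaluates the standard error of the mean, σ/√n, for the IQ population (σ = 15) at n = 16 and n = 64. The function name is hypothetical, chosen only for this illustration.

```python
import math

def standard_error_of_mean(sigma: float, n: int) -> float:
    """Return sigma / sqrt(n), the SD of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)

for n in (16, 64):
    print(n, standard_error_of_mean(15, n))
# 16 3.75
# 64 1.875
```

Quadrupling the sample size halves the standard error, because n enters the formula through its square root.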
165Effect of increasing the sample size n [Figure: sampling distributions of the mean for n = 16 and n = 64, drawn against the IQ distribution, with μ marked.]
166Referring to z A question about a range of values of ANY normally distributed variable can always be translated into a question about a range of values of the standard normal variable z. Just subtract the mean and divide by the standard deviation. BUT if your question is about a range of values for the MEAN, you must divide by the STANDARD ERROR, not the original population SD.
167Question If I select 9 IQs at random and take their mean M, what is the probability that M is at least 110?
168Convert values to z This question is about a mean, so we must refer to the sampling distribution of the mean. The standard error of the mean is 15 divided by the square root of 9, that is, 5. If M = 110, z = (110 – 100)/5 = 2. So we want the probability of a value of z of more than 2.
171Important! If your question is about MEANS, divide by the STANDARD ERROR OF THE MEAN σM, not the standard deviation of the original population.
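The following sketch illustrates why the choice of divisor matters, using the question about the mean of 9 IQs being at least 110. It relies on the standard-library `statistics.NormalDist`; the variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

mu, sigma, n, m = 100, 15, 9, 110
std_normal = NormalDist()                 # mean 0, SD 1

# Correct: a question about a mean uses the standard error of the mean.
sem = sigma / math.sqrt(n)                # 15 / 3 = 5.0
z_correct = (m - mu) / sem                # 2.0
p_correct = 1 - std_normal.cdf(z_correct)

# Incorrect: dividing by the population SD answers a different question,
# namely the probability that a SINGLE IQ is at least 110.
z_wrong = (m - mu) / sigma
p_wrong = 1 - std_normal.cdf(z_wrong)

print(z_correct, round(p_correct, 4))     # 2.0 0.0228
print(round(z_wrong, 2), round(p_wrong, 2))  # 0.67 0.25
```

Using the wrong divisor overstates the probability roughly tenfold in this example.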
172Question If I select a sample of size n = 16 from the IQ population, what is the probability that the mean lies between 92.5 and 100?
173Convert values to z The question is about a mean, so we must use the standard error of the mean to find the z values. The SEM is 15 divided by root 16 (4), that is, 3.75. So z = (92.5 – 100)/3.75 = –2. For 100, z = 0.
174Referring to the standard normal distribution About 95% of values lie between –2 and +2. So the green area, between z = –2 and z = 0, is half of this, 47.5%. The probability is .475.
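A quick numerical check of this answer, sketched with the standard-library `statistics.NormalDist` (variable names are illustrative). It models the sampling distribution of the mean directly as a normal distribution with mean 100 and SD equal to the standard error.

```python
import math
from statistics import NormalDist

sem = 15 / math.sqrt(16)                        # 3.75
sampling_dist = NormalDist(mu=100, sigma=sem)   # distribution of M for n = 16

# Probability that the sample mean lies between 92.5 and 100.
p = sampling_dist.cdf(100) - sampling_dist.cdf(92.5)
print(round(p, 4))                              # 0.4772
```

The exact figure is slightly above .475 because the 95% rule strictly applies to z = ±1.96 rather than ±2.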
175Two populations Suspend your disbelief and suppose that two barrels each contain millions of tickets, on each of which is the value of an IQ. So each barrel contains a normal distribution with mean 100 and SD 15. I draw a sample of size 16 from each barrel and calculate the means M1 and M2. I also calculate the difference M1 – M2 and put it in a third barrel. The process is repeated millions of times. The third barrel now contains the sampling distribution of the DIFFERENCE (between means). The sampling distribution of the difference is also normal.
177Another random variable We have seen that the sample mean M is a random variable, whose probability distribution is the sampling distribution of the mean. The difference between means M1 – M2 is also a random variable. Its probability distribution is known as the SAMPLING DISTRIBUTION OF THE DIFFERENCE (between means).
178Sampling distribution of the difference (between means)
179Variance of the difference We have seen that the sample means M1 and M2 are random variables. They are INDEPENDENT random variables – separate barrels. The variance of the sum OR DIFFERENCE BETWEEN independent random variables is the sum of their separate variances. (Remember that a variance cannot be negative.)
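180Sampling variance of the difference Sampling variance of means from the first barrel: σM1² = σ²/n = 225/16 = 14.0625. From the second: σM2² = σ²/n = 225/16 = 14.0625. Sampling variance of M1 – M2: σM1² + σM2² = 28.125. Standard error of the difference between means: √28.125 ≈ 5.30.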
183Question I draw a sample of size 16 from each of two identical IQ distributions, with mean 100 and SD 15. What is the probability that the difference (M1 – M2) is at least 10.61? What is the probability of a difference in EITHER direction?
184Answer The question is about a difference between means, so we must refer to the sampling distribution of the difference. We have found that the standard error of the difference is 5.30. As usual, we convert the value to z: z = (10.61 – 0)/5.30 = +2. So we want the probability of a value of z at least as great as +2.
185Referring to the standard normal distribution We know that .025 (2.5%) of the distribution lies above z = 1.96 (2 approx). So the probability of a difference greater than 10.61 is .025. The probability of a difference this large in EITHER direction is .025 × 2 = .05.
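The whole chain for the difference between means (two sampling variances, their sum, the standard error, then z) can be sketched as follows. The code uses only the standard library and illustrative variable names; the exact tail areas come out a little below the rounded .025 and .05, since z = 2 is slightly beyond 1.96.

```python
import math
from statistics import NormalDist

sigma, n = 15, 16
var_mean = sigma**2 / n                  # sampling variance of one mean: 14.0625
var_diff = var_mean + var_mean           # variances add for independent means: 28.125
se_diff = math.sqrt(var_diff)            # standard error of the difference, about 5.30

z = (10.61 - 0) / se_diff                # expected difference is 0 for identical populations
one_tail = 1 - NormalDist().cdf(z)       # difference of at least +10.61
two_tail = 2 * one_tail                  # difference this large in either direction

print(round(se_diff, 2))                 # 5.3
print(round(z, 2))                       # 2.0
print(round(one_tail, 3), round(two_tail, 3))  # 0.023 0.045
```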
186Summary The three most important properties of a distribution are LEVEL, SPREAD and SHAPE. Several measures of these properties were discussed. The notion of population was introduced and the notion of probability introduced in that context. The concept of a sampling distribution was introduced. The sampling distributions of the mean and of the difference between means were discussed. Questions about the probabilities of ranges of values for the mean and difference between means can be answered with reference to the standard normal distribution.
188An experiment of chance An EXPERIMENT OF CHANCE is a procedure with an uncertain outcome, such as tossing a coin or rolling a die. The classical notion of PROBABILITY arises in the context of an experiment of chance.
189The sample space Consider an experiment of chance in which a coin is tossed and a die is rolled. There are twelve possible outcomes, which can be set out in an array called a SAMPLE SPACE (S). Each outcome is known as an ELEMENTARY EVENT. The number of elementary events, n(S), is 12.
192The classical definition revisited Let E be “a one or a two on the die”. Then n(E) = 4. Following the classical definition of a probability, Prob(E) = n(E)/n(S) = 4/12 = 1/3.
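The counting behind the classical definition can be sketched by enumerating the coin-and-die sample space directly. The representation of outcomes as (coin, die) pairs is an assumption made for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space S: every pairing of a coin face with a die face.
sample_space = list(product(["H", "T"], [1, 2, 3, 4, 5, 6]))

# Event E: "a one or a two on the die" (the coin may show either face).
E = [(coin, die) for (coin, die) in sample_space if die in (1, 2)]

p_E = Fraction(len(E), len(sample_space))   # n(E) / n(S)
print(len(sample_space), len(E), p_E)       # 12 4 1/3
```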
193Complementary events Two events are complementary if they are (1) mutually exclusive and (2) exhaustive. If E is “a one or a two on the die”, the event “not E”, which is denoted by Ē, is “any other number on the die”. Events E and Ē are complementary: they have no common outcome points and they exhaust the possibilities.
194Probabilities of complementary events If E and Ē are complementary events, their probabilities, p and q, respectively, sum to one. So p + q = 1; p = 1 – q; q = 1 – p.
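As a small illustration, the sketch below builds E and its complement as sets of (coin, die) outcomes and confirms that they share no outcomes, together cover the sample space, and have probabilities summing to one. The set-based representation and names are assumptions for this example.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product("HT", range(1, 7)))       # 12 elementary events

E = {o for o in sample_space if o[1] in (1, 2)}       # "a one or a two on the die"
not_E = sample_space - E                              # the complement of E

p = Fraction(len(E), len(sample_space))
q = Fraction(len(not_E), len(sample_space))

print(E & not_E == set())          # True: mutually exclusive
print(E | not_E == sample_space)   # True: exhaustive
print(p, q, p + q)                 # 1/3 2/3 1
```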
195Mutually exclusive events Two events, A and B, are said to be MUTUALLY EXCLUSIVE if the probability of their joint occurrence is zero. In terms of S, the event spaces of A and B have no elementary outcome points in common. For example, if A is “a six on the die” and B is “a one or a two on the die”, A and B are mutually exclusive.
199Independent events Two events A and B are INDEPENDENT if the occurrence of either has no effect upon the probability of the occurrence of the other. For example, if A is “a head” and B is “a six”, A and B are independent.
200AND rule for independent events If events A and B are independent, the probability of their joint occurrence Prob(A and B) is the product of their separate probabilities. In our example, Prob(head and six) = 1/2 × 1/6 = 1/12.
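The AND rule can be checked by counting outcomes in the coin-and-die sample space. This is a minimal sketch; the helper `prob` is hypothetical and simply applies the classical definition (favourable outcomes over total outcomes).

```python
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", range(1, 7)))   # 12 (coin, die) outcomes

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

p_head = prob(lambda o: o[0] == "H")                  # event A: a head
p_six = prob(lambda o: o[1] == 6)                     # event B: a six
p_both = prob(lambda o: o[0] == "H" and o[1] == 6)    # A and B together

print(p_head, p_six, p_both)       # 1/2 1/6 1/12
print(p_both == p_head * p_six)    # True
```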