My contact details Colin Gray Room S2 (Thursday mornings, especially)

address: Telephone: (27) 2234 A rapid response to any queries assured!

This afternoon’s programme
1:30 – 3:30pm Simple descriptive statistics. 3:30 – 4:00pm A break for coffee. 4:00 – 4:45pm Finding probabilities.

SESSION 1 Describing data

Kinds of data

Univariate, bivariate and multivariate data sets
We can classify data according to the number of measured variables in the data set. If there is one measured variable, we have a UNIVARIATE data set. If there are two measured variables, we have a BIVARIATE data set. If there are three or more measured variables, we have a MULTIVARIATE data set.

Levels of measurement There are three levels of measurement:
Scale, interval or continuous. Ordinal. Nominal.

Scale data Measures on an independent scale with units. Heights, weights, performance scores, IQs and number of Hits are all scale data. So also are counts of the number of hits and so on. Each score has ‘stand-alone’ meaning.

Ordinal data Data in the form of RANKS (1st, 3rd, 53rd). A rank has meaning only in relation to the other individuals in the sample. A rank does not express, in units, the extent to which a property is possessed. Rarely would a researcher collect data in the form of ranks. But there are hidden issues here. Some would argue that ratings are really ordinal data (with ties) and should be treated as such in statistical analysis.

Nominal data Assignments to categories (so-many males, so-many females.) Nominal data are numerical, but the numbers are arbitrary LABELS, as when John receives a 1 for Sex, while Jane receives a 2. Nominal data are not really measurements at all.

Experimental versus correlational research
In a true experiment such as a randomised clinical trial, the researcher manipulates one variable, the INDEPENDENT VARIABLE (IV), with a view to demonstrating that it has a causal effect upon the DEPENDENT VARIABLE (DV). The DV is measured during the course of the experiment. In correlational research, ALL variables are measured as they occur in the people studied.

Comparison Experimental research usually results in univariate data sets. The statistical analysis usually involves COMPARISON of scores obtained under the different experimental conditions. For example, performance under an active condition might be compared with performance under a control condition.

Association Correlational research results in bivariate or multivariate data sets. Here, the interest centres on the possible existence of statistical ASSOCIATIONS among the variables measured. If watching screened violence promotes actual violence, we should find that those who watch most screened violence should tend to be the most violent, those who watch least should be the least violent and so on.

Uses of statistics We use statistics to SUMMARISE and DESCRIBE our data. We use statistics to CONFIRM patterns in our data. One aspect of this process of confirmation is the making of statistical TESTS.

A simple two-group experiment
The experimenter wants to show that ingestion of caffeine improves shooting accuracy, as measured by number of Hits. Participants are randomly assigned to one of the two conditions. All participants shoot at the same target.

Results of the Caffeine experiment

The raw data The table shows the RAW DATA, that is, the ORIGINAL SCORES achieved by the participants. From inspection, it seems that the Caffeine group tended to have higher scores. With larger data sets, however, it can be very difficult to see what’s going on merely from inspection.

Distribution The DISTRIBUTION of a variable is a table or diagram showing the relative FREQUENCIES, over the entire range, with which different values occur. A good first move in a statistical analysis is to draw a graph of the distribution.

Distributions of the Caffeine and Placebo data

Three important aspects of a distribution
Its LEVEL or CENTRAL TENDENCY. The SPREAD or DISPERSION of scores around the centre. The SHAPE of the distribution.

Different central tendencies
The scores of the Caffeine group TEND to be higher than do the scores of the Placebo group. The two distributions differ in LEVEL or CENTRAL TENDENCY. There is, however, considerable overlap: some participants in the Placebo condition outperformed those in the Caffeine condition.

Individual differences
In the Caffeine distribution, values are densest around 13; whereas in the Placebo distribution, values are densest around 9. But there is a huge RANGE in performance. The worst performer (who scored 2) was in the Caffeine group; the best (who scored 20) was in the Placebo group.

Central tendency: the “average”
An average is a measure of level or central tendency, the “typical” value. It is clear from inspection of the figure that the average score of the Caffeine distribution should be higher than the average score of the Placebo distribution. There are several different measures of the “average” of a set of scores.

The mean The MEAN of a set of scores is the sum of their values divided by the number of scores. If X is a score and n is the number of scores, the mean M is: M = ΣX / n.

Example The mean of the scores 10, 1, 3, 4 and 2 is …
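The arithmetic in this example can be checked in a few lines of Python. This is only an illustrative sketch; the variable names are my own.

```python
from statistics import mean

scores = [10, 1, 3, 4, 2]        # the five scores from the example

# Sum of the values divided by the number of scores.
m = sum(scores) / len(scores)
print(m)                         # 4.0

# The standard library function gives the same value.
print(mean(scores) == m)         # True
```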

The two group means

Deviation scores A deviation score d is a score from which the mean has been subtracted. Deviation scores have the very important property that they sum to zero. Therefore, their mean is also zero.

Centring Column X contains raw scores, centred on their mean value of 2. Place the deviation scores d in the next column. This operation is known as CENTRING and is common in regression analysis. The new values are now centred on zero, rather than on the mean of the original values.
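A short sketch of centring, using the five scores from the earlier mean example (10, 1, 3, 4 and 2) rather than the column shown on the slide. It shows that the deviation scores sum to zero.

```python
scores = [10, 1, 3, 4, 2]                 # illustrative scores, mean 4
m = sum(scores) / len(scores)

# Centring: subtract the mean from every score.
deviations = [x - m for x in scores]

print(deviations)                         # [6.0, -3.0, -1.0, 0.0, -2.0]
print(sum(deviations))                    # 0.0
```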

The mean as the ‘centre of gravity’
The mean can be thought of as THE CENTRE OF GRAVITY of a distribution, the point at which it would BALANCE on a knife-point. We can see (because this distribution is symmetrical) that the mean of this distribution is 3.

Outliers Often data sets contain scores that are atypical of the distribution as a whole. Such an atypical score is known as an OUTLIER. With small data sets, outliers can have marked effects upon the values of some statistics. Such statistics can become UNREPRESENTATIVE of the data as a whole.

An outlier (20 hits) exerts ‘leverage’ upon the value of the mean.

Other measures of ‘the average’
There are other measures of the average or central tendency which are more ROBUST to the influence of outliers. Two such measures are the MEDIAN and the MODE.

The median The MEDIAN of a distribution is the MIDDLE number. It is the value below which 50% of the distribution lies. The medians of the scores in the Placebo and Caffeine groups are, respectively, 9 and

Points about the median
Notice that, for the Placebo group, the median does not have the value of any of the actual scores. With symmetrical distributions, the median and the mean have similar values.

The mode The MODE is the MOST FREQUENT value.
For the Placebo and Caffeine groups, the values of the mode are 8 and 13, respectively. All three measures of central tendency, therefore, agree that the Caffeine group typically performed at a higher level than did the Placebo group.

Comparison of the three measures
The mean is the basis of classical statistical theory, because it has many useful mathematical properties. The median is useful for exploring data sets, particularly in comparison with the mean. With an extremely asymmetrical distribution, the median is arguably a truer measure of level in the data as a whole. The mode is seldom used.

Properties of the mean We have seen that deviations about the mean sum to zero. The sum of the SQUARES of deviations about the mean is a MINIMUM, that is, it is smaller than the sum of squared deviations about any other value.

A property of the median
The sum of ABSOLUTE deviations about the MEDIAN is also a minimum. But absolute values are less useful mathematically.
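Both minimum properties can be checked numerically. The sketch below uses five illustrative scores (not the Caffeine data) and searches a grid of candidate centres. The grid and the helper names sum_sq and sum_abs are my own choices.

```python
from statistics import mean, median

scores = [10, 1, 3, 4, 2]                       # illustrative scores

def sum_sq(c):
    """Sum of squared deviations about the value c."""
    return sum((x - c) ** 2 for x in scores)

def sum_abs(c):
    """Sum of absolute deviations about the value c."""
    return sum(abs(x - c) for x in scores)

# Candidate centres 0.0, 0.1, ..., 10.0.
candidates = [c / 10 for c in range(0, 101)]

best_sq = min(candidates, key=sum_sq)           # centre minimising squared deviations
best_abs = min(candidates, key=sum_abs)         # centre minimising absolute deviations

print(best_sq, mean(scores))                    # both equal the mean, 4
print(best_abs, median(scores))                 # both equal the median, 3
```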

A second scenario The scores of both groups cluster around the same value: Since the distributions are completely symmetrical, the mean of either is clearly 12. In the Caffeine distribution, however, the scores are more widely SPREAD OUT or DISPERSED than those of the Placebo group.

The simple range The SIMPLE RANGE is the highest score minus the lowest score. So, for the Placebo group in Scenario 2, the simple range is (15 – 9) = 6 score units. For the Caffeine group, the simple range is (18 – 6) = 12 score units. On this measure of dispersion, therefore, the Caffeine distribution shows twice as much spread or dispersion of scores around the mean.

A problem with the simple range
The simple range statistic only uses TWO scores out of the whole distribution. Should those particular scores be highly atypical of the distribution, the range may not reflect the true spread of scores about the mean of the distribution. The data from the original scenario (left) exemplify this situation.

Other range statistics
Nevertheless, the simple range can be a very useful statistic when you are EXPLORING a data set. Also available are more complex RANGE STATISTICS (the interquartile range, the semi-interquartile range) which use more of the information in a data set than does the simple range.

The variance and the standard deviation (SD)
The VARIANCE (s²) and the STANDARD DEVIATION (s or SD) are also measures of dispersion. Both statistics use the values of ALL the scores in the distribution.

Deviation scores again
The DEVIATION SCORE is the building block from which the variance and SD are calculated. Could the mean deviation serve as a measure of spread? No, because deviations about the mean sum to zero. So the mean deviation is also zero, whatever the spread of your data.

Squared deviations The sum of the SQUARED deviations is always either positive (when scores have different values) or zero (if all the scores have the same value). If there is any variability in the scores at all, the sum of the squared deviations will have a positive value.

Formula for the variance
The Greek letter sigma (Σ) is used to indicate that you are to obtain the deviation of each score from the mean, square it, then add up all the squared deviations: s² = Σ(X – M)² / (n – 1). The sample variance s² is close to being the MEAN SQUARED DEVIATION. The value 1 is subtracted from n in order to improve the sample variance as an estimate of the spread of values in the population.
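The steps described here (deviate, square, sum, divide by n – 1) can be followed in code. The scores are illustrative ones of my own choosing, not the Caffeine data.

```python
from statistics import variance

scores = [10, 1, 3, 4, 2]                      # illustrative scores
n = len(scores)
m = sum(scores) / n                            # mean = 4.0

# Sum of the squared deviations about the mean.
sum_sq = sum((x - m) ** 2 for x in scores)     # 36 + 9 + 1 + 0 + 4 = 50

# Divide by (n - 1) to obtain the sample variance.
s2 = sum_sq / (n - 1)
print(s2)                                      # 12.5

# The standard library uses the same (n - 1) definition.
print(variance(scores) == s2)                  # True
```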

Applying the formula

Variance of the Caffeine scores in Scenario 1

Adding a constant Adding a constant of ten to every score in the Caffeine group simply shifts the whole distribution ten units to the right. So the new mean will be the old one plus ten: new mean = old mean + 10 = 13.90. The SPREAD of the scores, however, will be unaltered, so the variance and the SD will have the same values as before.

Multiplying by a constant
Multiplying each score by a constant of ten not only increases the mean by a factor of ten, but also increases the SPREAD of the scores about the new mean. The new mean will be ten times the old one. The new variance will be ten SQUARED, that is one hundred, times the old variance. The new SD will be ten times the old one.
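A small sketch of both effects, using illustrative scores of my own rather than the Caffeine data: adding ten shifts the mean only, while multiplying by ten scales the mean and SD by ten and the variance by one hundred.

```python
from math import isclose
from statistics import mean, stdev, variance

scores = [10, 1, 3, 4, 2]                 # illustrative scores
shifted = [x + 10 for x in scores]        # add a constant of ten
scaled = [x * 10 for x in scores]         # multiply by a constant of ten

# Adding a constant: mean goes up by ten, spread is unaltered.
print(mean(scores), mean(shifted))            # mean 4 becomes 14
print(variance(scores), variance(shifted))    # variance stays at 12.5

# Multiplying by a constant: mean x 10, variance x 100, SD x 10.
print(mean(scaled), variance(scaled))         # mean 40, variance 1250
print(isclose(stdev(scaled), 10 * stdev(scores)))
```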

Adding and multiplying scores by a constant of ten

Examples

Effect of centring When you centre scores by subtracting the mean, the mean becomes zero. The variance, however, remains unaltered.

Interpreting the variance
The simple range statistic has the merit of being in the same units as the raw data. The variance, since it is based on the squares of the deviations, is in SQUARED UNITS and is therefore difficult to interpret. If you take the (positive) square root of the variance, you have the STANDARD DEVIATION, which is in the original units of measurement.

The standard deviation is the positive square root of the variance
We found that the variance of the scores of the Caffeine group was 10.73. To obtain the standard deviation, we take the square root of 10.73, which is 3.28. The square root operation restores the measure of spread to the original measurement units: we can say that the standard deviation is 3.28 hits.

Tables of results As well as means, always include the standard deviations.

Vulnerability of variance and SD to outliers
We have seen that the mean is vulnerable to the leverage exerted by outliers. This is true, a fortiori, of the variance, because it is the sum of the SQUARES of deviations from the mean. The leverage effect is NOT removed by taking the square root of the variance to obtain the standard deviation.

Standard or z scores A standard or z score is a special kind of deviation score which expresses a value as so-many standard deviations above or below the mean (which has a z score of 0): z = (X – M) / s, where M is the mean and s is the standard deviation.

Mean and SD of z scores Their mean is always zero (because they are deviation scores). Their variance and standard deviation are 1.
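These two properties are easy to confirm numerically. The sketch below standardises five illustrative scores (my own choice) and checks the mean and SD of the resulting z scores.

```python
from statistics import mean, stdev

scores = [10, 1, 3, 4, 2]                 # illustrative scores
m, s = mean(scores), stdev(scores)

# Standardise: subtract the mean, then divide by the standard deviation.
z = [(x - m) / s for x in scores]

print([round(v, 2) for v in z])
print(round(mean(z), 10))                 # zero, apart from rounding error
print(round(stdev(z), 10))                # one, apart from rounding error
```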

Advantage of z scores Scores in different units (heights and weights) cannot be directly compared. But when someone’s weight has a z score of –1 (one SD below the mean of 0) and their height has a z score of +2 (two SDs above the mean), we can say that this person is tall and thin. If we can make additional assumptions about the distribution, knowledge of z scores is even more informative.

Distribution shape We have measured the AVERAGE and the SPREAD of the Caffeine and Placebo distributions. We noted that both distributions were (at least approximately) SYMMETRICAL. There are circumstances in which that would not be the case.

A disappointing result
The mean for the Caffeine group is only very slightly greater than the Placebo mean. But note that both means are near the top of the scale (20). And notice how small the SDs are.

Ceiling effect The scores of both groups are bunched around the top of the scale. Any possible effect of caffeine intake has been masked by a CEILING EFFECT. The task chosen was TOO EASY for the participants. No conclusions about the effects of ingestion of caffeine can be drawn from these data.

Another disappointing result
Again the Caffeine mean is only slightly greater than the Placebo mean. But both means are near the bottom of the scale (zero). Once again, note the small SDs.

Floor effect The scores of both groups are bunched around the bottom of the scale. The task was too difficult. No conclusions about the effects of ingestion of caffeine can be drawn from these data either.

Skewness In both Scenarios 3 and 4, the distributions are asymmetric or SKEWED. When a distribution has a tail to the left, it is said to be NEGATIVELY SKEWED; when it has a tail to the right, it is POSITIVELY SKEWED. When there is a ceiling effect, the distributions are negatively skewed; when there is a floor effect, they are positively skewed.

Screen violence and actual violence
Does screened violence promote actual violence? Ethical and practical considerations may rule out direct manipulation of the amount of violent material that children watch. It may be more feasible to measure children on the amount of screen violence they watch and upon their actual violence.

Correlation A statistical ASSOCIATION or CORRELATION is a tendency for events or values to occur together. If exposure to screen violence promotes actual violence, we should expect those who watch more violence to be more violent and those who watch less violence to be less violent. Such a POSITIVE ASSOCIATION would be at least consistent with the hypothesis.

A scatterplot Here is a picture of the results of our study.
In this SCATTERPLOT, each point represents one of the children. Richard got a score of 2 on Exposure and 4 on Actual. John got 9 on Exposure and 8 on Actual. Jim got scores of 5 on both Exposure and Actual.

A strong positive correlation
When the shape of a scatterplot is a narrow ellipse like this, a strong correlation is indicated. The results of the study are consistent with the hypothesis.

A negative correlation?
Does the number of complaints made against GPs vary inversely with the average length of their appointments? The following scatterplot supports this hypothesis.

A strong negative correlation

Scatterplot indicating no association
When the cloud of points is circular, there is NO ASSOCIATION between the variables.

Linear functions Y is a LINEAR FUNCTION of X if the graph of Y upon X is a straight line. For example, temperature in degrees Fahrenheit is a linear function of temperature in degrees Celsius.

The Pearson correlation
The PEARSON CORRELATION (r), is designed to measure the strength of a supposed linear relationship between two variables. A correlation can only take values within the range from –1 to +1, inclusive. The closer the value of a correlation to unity (forgetting the sign), the STRONGER the linear association.

Formula for the Pearson correlation
There are several equivalent formulae. Here is the simplest. Transform X and Y to standard scores z. Divide the sum of the products of the pairs of standard scores by (n – 1): r = Σ(zX × zY) / (n – 1).
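The z score recipe can be sketched as a small function. The function name pearson_r and the scores are hypothetical, invented for illustration; they are not the data from the violence study.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson r: sum of products of paired z scores, divided by (n - 1)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)          # sample SDs, with (n - 1) in the denominator
    zx = [(x - mx) / sx for x in xs]       # standard scores for X
    zy = [(y - my) / sy for y in ys]       # standard scores for Y
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Hypothetical Exposure and Actual scores (invented for illustration).
exposure = [2, 9, 5, 7, 3, 6]
actual = [4, 8, 5, 6, 2, 7]

r = pearson_r(exposure, actual)
print(round(r, 3))
```

A value close to +1 or –1 would indicate a strong linear association; a perfectly linear pair of variables gives exactly ±1, apart from rounding error.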

The calculation of r for the violence data
The value of r (.892) is high and positive, consistent with the appearance of the scatterplot.

Centring again What is the effect upon the value of r when the variables involved are centred? There is no effect. In fact, no linear transformation of either variable (or both variables) will change the ABSOLUTE value of r. Suppose you measure the heights and weights of 100 people in inches and pounds and find that the correlation is +.6. If you convert the heights and weights to centimetres and grams, respectively, the correlation is still +.6. Merely subtracting their mean from the values of each variable leaves the correlation unchanged.

Reversing the slope If you multiply all the scores on one variable by –1, you will reverse the direction of the slope of the scatterplot and the sign of r; but the absolute value of r will remain the same.
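The following sketch checks these claims on hypothetical heights and weights (invented for illustration): a change of units and centring leave r unchanged, while multiplying one variable by –1 changes only its sign. The helper pearson_r is my own.

```python
from math import isclose
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson r from paired z scores (sample SDs)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical heights (inches) and weights (pounds).
heights_in = [64, 66, 68, 69, 71, 73]
weights_lb = [130, 155, 150, 170, 165, 190]
r = pearson_r(heights_in, weights_lb)

# Change of units: inches to centimetres, pounds to grams.
heights_cm = [h * 2.54 for h in heights_in]
weights_g = [w * 453.59237 for w in weights_lb]
r_units = pearson_r(heights_cm, weights_g)

# Centring: subtract the mean height from every height.
m = mean(heights_in)
r_centred = pearson_r([h - m for h in heights_in], weights_lb)

# Reversing the slope: multiply every height by -1.
r_flipped = pearson_r([-h for h in heights_in], weights_lb)

print(isclose(r, r_units), isclose(r, r_centred), isclose(r, -r_flipped))
```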

Centring in regression
We have seen that centring does not change the variance of a variable in the data set. Nor does centring change the correlations among the variables. Centring is used in several multivariate procedures in order to help the algorithm to find a unique solution.

Question We have been told of a bivariate data set, from which the calculated Pearson correlation is ZERO: r = 0. From this information alone, can we conclude that the two variables are independent, that is, there is no association between them? The answer is NO!

The scatterplot There is a perfect, but nonlinear association between the two variables. Yet the Pearson correlation is zero.
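One simple way to produce such a data set, offered as an assumption since the slide's own data are not reproduced here, is to let Y be the square of X over a range of X values that is symmetrical about zero. Y is then completely determined by X, yet r is zero.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson r from paired z scores (sample SDs)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]          # perfect, but non-linear, dependence of Y on X

r = pearson_r(x, y)
print(abs(r) < 1e-12)            # r is zero, apart from rounding error
```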

Anscombe’s data set Many years ago, Frank Anscombe (American Statistician, 1973) published a famous paper warning readers of the pitfalls awaiting the unwary user of information about correlations. There were four bivariate data sets, all of which produced a Pearson correlation with a value of +.82.

An elliptical scatterplot
This is fine. The elliptical scatterplot indicates that there is indeed a basically linear relationship between variable Y1 and variable X1.

A non-linear relationship
There is actually a perfect association between variable Y2 and variable X1. This relationship, however, is non-linear and is understated by the value of r.

An understatement by r There is a substantial correlation.
The scatterplot, however, is not elliptical. Basically there is a perfect linear relationship between Y3 and X1. The outlier (a typo?) has depressed the value of r.

Anscombe’s rule When you examine a scatterplot (something you should ALWAYS do when interpreting a correlation), ask yourself the following question: “Would the removal of one or two points at random affect the basically elliptical shape of the scatterplot?” If the shape would remain essentially the same, the value of r accurately reflects the association between the variables.

In summary … The Pearson correlation r is a measure of the strength of a supposed LINEAR relationship between 2 variables. It is one of the most widely used of statistical measures; but it is also one of the most misused. Wherever possible, a value of r should be interpreted in the context of the scatterplot.

Have we really gathered evidence for the hypothesis that viewing screened violence increases actual violence?

A famous dictum CORRELATION does not imply CAUSATION

A causal model The scientific hypothesis implies this CAUSAL MODEL.
The results are CONSISTENT with the hypothesis.

Another causal model The child’s violent tendencies and appetite for violence lead to his watching violent programmes as often as possible. This model is also consistent with the data.

Yet another causal model
NEITHER variable causes the other. Both are determined by the behaviour of the child’s parents.

Direction of causality
Returning to the caffeine experiment, it would be ridiculous to suggest that shooting accuracy determines the group to which one is assigned. In the violence study, however, which was of CORRELATIONAL, rather than EXPERIMENTAL design, the direction of causation is uncertain. Indeed, at least three possible MODELS OF CAUSATION are consistent with the results.

A background variable Perhaps neither Exposure nor Actual violence causes the other. Perhaps both are caused by a background parental behaviour variable. We have data on such a variable. The background variable correlates highly with both Exposure and Actual violence.

Partial correlation A PARTIAL CORRELATION is what remains of a Pearson correlation between two variables when the influence of a third variable has been removed, or PARTIALLED OUT.
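For three variables, the usual first-order formula for the partial correlation can be sketched as below. The value .892 is the Exposure–Actual correlation reported earlier; the two correlations with the background variable are hypothetical (the slides say only that they are high), so the printed result is purely illustrative.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the third variable Z partialled out."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_exposure_actual = 0.892        # reported for the violence data
r_exposure_background = 0.90     # hypothetical value, for illustration only
r_actual_background = 0.90       # hypothetical value, for illustration only

r_partial = partial_r(r_exposure_actual, r_exposure_background, r_actual_background)
print(round(r_partial, 3))
```

With a background variable that correlates this highly with both measures, the partial correlation is much smaller than the original r.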

The partial correlation
The partial correlation fails to reach significance. Now that we have taken the background variable into consideration, we see that there is no significant correlation between Exposure and Actual violence. It appears that, of the three possible causal models, the ‘third party’ model gives the most convincing account of these data.

Coffee break

Histograms A HISTOGRAM is useful for displaying the distribution of a large data set. Here is a histogram of the heights of 1000 men.

Heights of 1000 men

Features of a histogram
The entire range of variation (shown on the x-axis) is divided into CLASS INTERVALS. The heights of the bars are proportional to the FREQUENCIES of values (y-axis) falling within the class intervals represented by the bases of the bars. The bars touch each other, indicating the CONTINUOUS variation of the variable.

A normal distribution

Salaries in the US Many variables have asymmetrical distributions.
Skewness = 2.13

Measuring skewness Asymmetry or skewness is measured with a statistic which I shall call simply ‘Skewness’. (Skewness is a complex measure, involving the cube of the deviations of the scores about their mean.) PASW will calculate the value of Skewness for any distribution. If the value of Skewness is positive, the distribution is positively skewed; a negative value indicates negative skewness.

Skewness of three distributions

Relative frequency as an area
The area of a bar is the proportion of values within the range of its base. The green area is the proportion of heights between 70 inches and 75 inches.

Proportion between 65” and 75”

Proportion of heights either below 65” or above 75”.

Unity All values lie within the total range.
The area of the green bars is 100% or unity.

Populations and samples
We have some scores on shooting accuracy from the caffeine trial. The POPULATION of such scores is the reference set, that is, the infinite set of all possible scores. Our data are merely a subset or SAMPLE from the population.

Theoretical populations or distributions
In these talks, the term “population” always refers to a theoretical distribution. For example the 1000 men’s heights are a sample from a theoretical NORMAL population whose mean is 69” and whose standard deviation is 2.59”. This NORMAL distribution is symmetrical and bell-shaped.

Statistics versus parameters
STATISTICS are characteristics of SAMPLES. PARAMETERS are characteristics of populations.

Notational convention
Roman letters denote statistics such as our sample means and SDs. Greek letters denote the corresponding population characteristics or parameters.

Two parameters There is an infinitely large family of normal distributions. To specify a normal distribution you must assign values to TWO parameters: The mean The standard deviation

The height population

Probability

Probability The PROBABILITY of an event is a measure of its likelihood, which can take values from zero (an impossible event) to unity (a certainty). There have been several definitions of probability. All of them raise serious philosophical questions.

An ‘event’ An EVENT is the outcome of an experiment of chance, such as rolling a die, tossing a coin – or running a psychological experiment. Chance is an important factor in the outcome of an experiment. Joe, Fred and Mary participated this time; but Anne, Jim and Fiona could easily have done so – and their scores would certainly have been different.

Classical ‘probability’
“The first impetus came from a situation in which the dissolute nobility of France were competing in a race to ruin at the gaming tables” (Hogben, 1967; p.551). In 1654, Pascal and Fermat analysed the gambling strategies of one particular nobleman. Their approach was to determine the number of ways an outcome (such as a particular hand in cards) could occur in comparison with the total number of possibilities.

Classical definition of a probability
The probability of an event is the NUMBER OF WAYS in which the event can occur, divided by the TOTAL NUMBER OF OUTCOMES. Roll a die. What is the probability of a six? There is ONE way of getting a six. There are SIX possible outcomes. So the probability of a six is 1/6.

More examples Roll a die. What is the probability of an even number?
That could happen in three ways: 2 spots, 4 spots or six spots. So the probability is 3/6 = ½. What is the probability of a seven? There is NO WAY in which that could happen, so the probability is 0/6 = 0 (indicating an IMPOSSIBILITY). A number between 1 and 6, inclusive? That event could happen in six ways, so the probability is 6/6 = 1 (indicating a CERTAINTY).

A formula for classical probability
If an experiment of chance has N possible outcomes and an event E can occur in n ways, the probability of E is p(E) = n / N.
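The die examples can be reproduced by counting outcomes. This is a minimal sketch which assumes, as the classical definition does, that all outcomes are equally likely; the function name classical_probability is my own.

```python
from fractions import Fraction

def classical_probability(event, outcomes):
    """Number of ways the event can occur divided by the total number of outcomes.

    Assumes that every outcome is equally likely.
    """
    ways = sum(1 for o in outcomes if event(o))
    return Fraction(ways, len(outcomes))

die = [1, 2, 3, 4, 5, 6]

p_six = classical_probability(lambda o: o == 6, die)            # 1/6
p_even = classical_probability(lambda o: o % 2 == 0, die)       # 1/2
p_seven = classical_probability(lambda o: o == 7, die)          # 0, an impossibility
p_any = classical_probability(lambda o: 1 <= o <= 6, die)       # 1, a certainty

print(p_six, p_even, p_seven, p_any)
```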

A problem with the classical definition
The classical definition is circular. It works only if the possible outcomes of the experiment of chance are assumed to be “equally likely”, that is, equally probable, so that the term being defined is (by implication) pressed into service for its own definition.

The empirical definition of a probability
This notion is implicit in the notion of a FAIR coin. A fair coin is one that, IN THE LONG RUN, shows heads half the time. This “convergence”, however, which is a special case of what I shall call simplistically “The law of large numbers”, is an empirical fact. It cannot, however, be proved “analytically”, that is, by mathematical deduction.

Interpretation of a probability
If a coin is ‘fair’, the probability of a head is ½. This does not mean that if I toss the coin 100 times, I shall get 50 heads. Nor does it mean that if I toss the coin a million times, I shall get close to half a million heads. But with a million tosses, the proportion of heads will be closer to ½ than it would be if I were to toss the coin 10 times, 100 times or 1000 times. A probability is a PROPORTION to which we can get as close as desired by taking a sample of sufficient size.

Health events A HEALTH EVENT is an uncertain occurrence, such as acute appendicitis, admission to a dental clinic - or death. ADVERSE events are those occurring after admission to hospital. The likelihood of such events occurring is quantified as proportions obtainable from the records over a period of time. These proportions are thus EMPIRICAL PROBABILITIES.

The laws of large numbers
You can make a sample resemble the population as closely as you like by making it sufficiently large. So small samples from the same population can show considerable variation; whereas very large samples show little variation.
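A simple coin-tossing simulation illustrates the point. The seed and the sample sizes are arbitrary choices of mine, and the printed proportions depend on them.

```python
import random

random.seed(1)                   # arbitrary seed, so that the run is repeatable

def proportion_of_heads(n_tosses):
    """Toss a simulated fair coin n_tosses times; return the proportion of heads."""
    heads = sum(1 for _ in range(n_tosses) if random.random() < 0.5)
    return heads / n_tosses

# Proportions from small samples wander; those from large samples settle near 1/2.
results = {n: proportion_of_heads(n) for n in (10, 100, 1000, 100_000)}
for n, p in results.items():
    print(n, p)
```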

Example I draw five samples of size ten from a normal population with mean zero and standard deviation 1. (The STANDARD normal distribution.) I then draw five samples of size one million from the same population.

Size ten versus size one million

Large samples and populations
With the lower histograms, you are looking at the population, rather than at samples. … relative frequencies become PROBABILITIES. Visualise the probability of a value within a specified interval as the area under the curve of the theoretical distribution between the limits of the interval.

Relative frequency becomes probability

Probability distribution
When we take a measurement such as a person’s height, we assume we have performed an experiment of chance. We have sampled from a theoretical population. Since areas under the curve represent probabilities, theoretical distributions are known as PROBABILITY DISTRIBUTIONS.

Random variable or variate
A RANDOM VARIABLE or VARIATE is a variable that takes values in an unpredictable way. The values of a random variable make up a theoretical distribution or population, i.e., a probability distribution. Let X be a value selected at random from a normal population with mean 69 and standard deviation 2.59. The variable X is a normal random variable or normal VARIATE.

Cumulative probability
The cumulative probability of a value from a distribution is the probability of a value less than or equal to that value. The cumulative probability of 75 is .99; the cumulative probability of 70 is .65 .

Cumulative probability of 75”

Cumulative probability of 70”

Probability of a height in the range from 70 to 75 inches
Just subtract the cumulative probability of 70 from the cumulative probability of 75.
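The subtraction can be carried out with Python's statistics.NormalDist, using the height population described earlier (mean 69 inches, SD 2.59 inches). This is a sketch, and the variable names are my own.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=2.59)    # the theoretical height population

cum_75 = heights.cdf(75)                   # cumulative probability of 75 inches
cum_70 = heights.cdf(70)                   # cumulative probability of 70 inches

print(round(cum_75, 2))                    # 0.99
print(round(cum_70, 2))                    # 0.65

# Probability of a height between 70 and 75 inches.
p_between = cum_75 - cum_70
print(round(p_between, 2))                 # 0.34
```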

Percentiles A PERCENTILE is the value below which a specified proportion of the distribution lies. The 90th percentile is the value below which 90% of values lie. The 10th percentile is the value below which 10% of values lie. The 50th percentile (the MEDIAN) is the value below which 50% of values lie.

The 30th and 70th percentiles
The green areas are the cumulative probabilities of the 30th and 70th percentile values.

The median is the 50th percentile
The cumulative probability of the median or middle value is .50.

95% of the distribution 95% of ANY distribution lies between the 2.5th percentile and the 97.5th percentile. BELOW the 2.5th percentile lie .025 (2.5%) of the scores. ABOVE the 97.5th percentile lie .025 (2.5%) of the scores. Outside those limits lie .025 + .025 = .05 (5%) of the scores.

95% of ANY continuous distribution lies between the 2.5th and 97.5th percentiles

Normal distribution A NORMAL DISTRIBUTION is symmetrical and bell-shaped. If a variable is normally distributed, 95% of values lie within 1.96 standard deviations (2 approx.) on EITHER side of the mean.

The 95th percentile NINETY-FIVE per cent of values lie BELOW the value that is 1.64 standard deviations above the mean. (Because of the symmetry of the normal distribution, we can also say that 95% of values lie ABOVE the value that is 1.64 standard deviations BELOW the mean, i.e., mean – 1.64×SD.) These statements apply only to the normal distribution.
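The multipliers 1.96 and 1.64 are percentiles of the standard normal distribution, and can be recovered with the inverse cumulative distribution function in statistics.NormalDist. The height population (mean 69, SD 2.59) is reused for the last line.

```python
from statistics import NormalDist

standard_normal = NormalDist()                       # mean 0, SD 1

z_975 = standard_normal.inv_cdf(0.975)               # 97.5th percentile
z_95 = standard_normal.inv_cdf(0.95)                 # 95th percentile

print(round(z_975, 2))                               # 1.96
print(round(z_95, 2))                                # 1.64

# The 95th percentile of the height population: mean + 1.64 x SD (approximately).
height_95 = NormalDist(mu=69, sigma=2.59).inv_cdf(0.95)
print(round(height_95, 1))
```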

The 95th percentile of a normal distribution

The standard normal variable z
Let X be a normal variable with mean μ and SD σ. Let z be defined by the formula z = (X – μ) / σ. Then z is also normally distributed, and is known as the STANDARD NORMAL VARIABLE.

Mean and standard deviation of the standard normal distribution
We have seen that the effect of standardising scores is to centre the distribution on zero and produce a variance and standard deviation of 1. Thus the standard normal distribution has a mean of zero and an SD of 1.

Standard normal curve

Any normal distribution can be transformed to the standard normal distribution by subtracting the mean from each value and dividing the difference by the standard deviation.

The standard normal distribution

Questions about probability
Questions about the probabilities of ranges of values of a normally distributed random variable can always be rephrased in terms of the standard normal distribution. Just convert the raw values to z scores by subtracting the mean and dividing by the standard deviation.

A question about IQ The IQ measure has an approximately normal distribution, with a mean of 100 and a standard deviation of 15. If 1000 people are drawn at random from the population, how many of them can we expect to have IQs greater than 130?

Solution Transform 130 to z: z = (130 – 100)/15 = 2.
Taking z = 2 as approximately 1.96, a proportion of about .025, that is, about 25 in a thousand values, are at least as large as 130.
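The IQ question can also be worked without rounding z = 2 to 1.96. This is a sketch only, assuming Python 3.8 or later (statistics.NormalDist); the names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # IQ population given in the question

z = (130 - iq.mean) / iq.stdev     # raw value converted to a z score
p_above = 1 - iq.cdf(130)          # proportion with IQ greater than 130
expected = 1000 * p_above          # expected number in 1000 people

print(z, round(p_above, 4), round(expected, 1))
```

The exact tail area above z = 2 is a little under .025, so the expected count is closer to 23 than to 25 in a thousand.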

Taking samples Suppose I take 16 people’s IQs and calculate the mean. I take another 16 people and find that their mean has a different value. I draw a total of 4000 samples, calculating the value of the mean each time. The means will vary considerably, but not so much as the original distribution of IQs.

The mean is a random variable
A random variable X is one whose values are not predictable. One can only assign probabilities to ranges of its values. A statistic such as the mean M, since its value depends upon the values of X selected for the sample, is also a random variable or variate. The variate M has a distribution of its own.

Sampling distribution
The probability distribution of a STATISTIC (such as the mean or the variance) is known as its SAMPLING DISTRIBUTION. If X is normally distributed, then so is M. If we can specify the sampling distribution of M by giving a value to its SD, we can assign probabilities to ranges of values for M.

The IQ distribution

Drawing to scale If I request a histogram of the sampling distribution of the mean, it will look similar to the histogram of IQ. But if I ask for BACK-TO-BACK HISTOGRAMS, we can compare the two distributions drawn to the same scale. In the following figure, the distribution on the right is the sampling distribution of the mean.

Back-to-back histograms

Shape of the sampling distribution
It’s narrower than the original distribution. The standard deviation has been much reduced. The areas of both distributions are the same (unity, 100%, or a probability of one). But values of the mean are particularly thick on the ground in the region of the population mean value of the IQ, that is 100.
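The sampling process described in the slides (4000 samples of 16 IQs each) can be simulated. This is a rough sketch, not the procedure used to draw the figures: the seed is an arbitrary choice made so that the run is repeatable, and the variable names are illustrative.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # arbitrary fixed seed, so that the run is repeatable

MU, SIGMA = 100, 15      # IQ population: mean 100, SD 15
N, N_SAMPLES = 16, 4000  # sample size and number of samples

# Draw 4000 samples of 16 IQs and record the mean of each sample
sample_means = [
    mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(N_SAMPLES)
]

centre = mean(sample_means)    # expected to be close to 100
spread = pstdev(sample_means)  # expected to be well below 15

print(round(centre, 2), round(spread, 2))
```

The simulated means centre on the population mean, and their SD is much smaller than the SD of the individual IQs.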

Sampling distribution of the mean

Standard error of the mean
The STANDARD ERROR of a statistic is the standard deviation of its SAMPLING or PROBABILITY distribution. It is called the standard “error” because, if a sample value were to be used as an estimate of the corresponding parameter (the population mean), the estimate would be, to at least some degree, wide of the mark.

If we draw samples of size n from a normal distribution with mean μ and standard deviation σ, the standard error of the mean σM is given by

σM = σ/√n

Sample size As the sample size n increases, the denominator of the formula increases and the standard error of the mean is reduced. The distribution becomes taller and narrower. The effect of increasing the size of the sample is to reduce the dispersion or variance of the sampling distribution of the mean.
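The standard error of the mean is the population SD divided by the square root of the sample size. A short sketch for the IQ population (SD 15) shows how it shrinks as n grows; the function name standard_error is illustrative, not taken from the slides.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean for samples of size n."""
    return sigma / sqrt(n)

se_16 = standard_error(15, 16)  # n = 16
se_64 = standard_error(15, 64)  # n = 64

print(se_16, se_64)
```

Quadrupling the sample size from 16 to 64 halves the standard error.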

Effect of increasing the sample size n
Sampling distributions of the mean for n = 16 and n = 64, shown with the IQ distribution.

Referring to z A question about a range of values of ANY normally distributed variable can always be translated into a question about a range of values of the standard normal variable z. Just subtract the mean and divide by the standard deviation. BUT if your question is about a range of values for the MEAN, you must divide by the STANDARD ERROR, not the original population SD.

Question If I select 9 IQs at random and take their mean M, what is the probability that M is at least 110?

Convert values to z This question is about a mean, so we must refer to the sampling distribution of the mean. The standard error of the mean is 15 divided by the square root of 9, that is, 5. If M = 110, z = (110 – 100)/5 = 2. So we want the probability of a value of z of more than 2.

Referring to the standard normal distribution

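Answer The probability of a value of z of more than 2 is approximately .025, so the probability that M is at least 110 is approximately .025.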

Important! If your question is about MEANS, divide by the STANDARD ERROR OF THE MEAN σM, not the standard deviation of the original population.

Question If I select a sample of size n = 16 from the IQ population, what is the probability that the mean lies between 92.5 and 100?

Convert values to z The question is about a mean, so we must use the standard error of the mean to find the z values. The SEM is 15 divided by root 16 (4), that is, 3.75. So for 92.5, z = (92.5 – 100)/3.75 = –2. For 100, z = 0.

Approximately 95% of values lie between z = –2 and z = +2. So the green area, between z = –2 and z = 0, is 47.5%. The probability is .475.
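Both questions about means can be checked by building the sampling distribution of the mean directly, with the standard error as its SD. This is a sketch assuming Python 3.8 or later (statistics.NormalDist); the helper sampling_dist_of_mean is a hypothetical name introduced for the example.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 100, 15  # IQ population: mean 100, SD 15

def sampling_dist_of_mean(n):
    """Sampling distribution of the mean for samples of size n."""
    return NormalDist(MU, SIGMA / sqrt(n))

# n = 9: probability that the mean M is at least 110
m9 = sampling_dist_of_mean(9)
p_at_least_110 = 1 - m9.cdf(110)

# n = 16: probability that the mean lies between 92.5 and 100
m16 = sampling_dist_of_mean(16)
p_between = m16.cdf(100) - m16.cdf(92.5)

print(round(p_at_least_110, 4), round(p_between, 4))
```

The computed values differ slightly from .025 and .475 because the slides treat z = 2 as if it were 1.96.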

Two populations Suspend your disbelief and suppose that two barrels each contain millions of tickets, on each of which is the value of an IQ. So each barrel contains a normal distribution with mean 100 and SD 15. I draw a sample of size 16 from each barrel and calculate the means M1 and M2. I also calculate the difference M1 – M2 and put it in a third barrel. The process is repeated millions of times. The third barrel now contains the sampling distribution of the DIFFERENCE (between means). The sampling distribution of the difference is also normal.

Barrels

Another random variable
We have seen that the sample mean M is a random variable, whose probability distribution is the sampling distribution of the mean. The difference between means M1 – M2 is also a random variable. Its probability distribution is known as the SAMPLING DISTRIBUTION OF THE DIFFERENCE (between means).

Sampling distribution of the difference (between means)

Variance of the difference
We have seen that the sample means M1 and M2 are random variables. They are INDEPENDENT random variables – separate barrels. The variance of the sum OR DIFFERENCE BETWEEN independent random variables is the sum of their separate variances. (Remember that a variance cannot be negative.)

Sampling variance of the difference
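Sampling variance of means from the first barrel: σ²/n1. From the second: σ²/n2. Sampling variance of M1 – M2: σ²/n1 + σ²/n2.

Standard error of the difference
The standard error of the difference between means is the square root of the sampling variance of the difference: σ(M1 – M2) = √(σ²/n1 + σ²/n2).

In our example, σ(M1 – M2) = √(15²/16 + 15²/16) = √28.125 = 5.30.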

Question I draw a sample of size 16 from each of two identical IQ distributions, with mean 100 and SD 15. What is the probability that the difference (M1 – M2) is at least 10.61? What is the probability of a difference this large in EITHER direction?

Answer The question is about a difference between means, so we must refer to the sampling distribution of the difference. The standard error of the difference is √(15²/16 + 15²/16) = 5.30. As usual, we convert the value to z: z = (10.61 – 0)/5.30 = +2. So we want the probability of a value of z at least as great as +2.

We know that .025 (2.5%) of the distribution lies above z = 1.96 (2 approx). So the probability of a difference greater than 10.61 is .025. The probability of a difference this large in EITHER direction is .025 × 2 = .05.
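The calculation for the difference between means can be sketched in the same way. The figures (SD 15, samples of 16 from each population, a difference of 10.61) come from the example; the variable names are illustrative, and statistics.NormalDist requires Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N1, N2 = 15, 16, 16  # population SD and the two sample sizes

# The variances of independent means add; the standard error is the root
se_diff = sqrt(SIGMA**2 / N1 + SIGMA**2 / N2)

# Sampling distribution of M1 - M2 for two identical populations
diff_dist = NormalDist(0, se_diff)

z = (10.61 - 0) / se_diff         # difference converted to a z score
p_one = 1 - diff_dist.cdf(10.61)  # difference of at least 10.61
p_either = 2 * p_one              # this large in either direction

print(round(se_diff, 2), round(z, 2), round(p_one, 4), round(p_either, 4))
```

As in the earlier examples, the exact tail areas are a little smaller than .025 and .05, because z = 2 lies slightly beyond 1.96.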

Summary The three most important properties of a distribution are LEVEL, SPREAD and SHAPE. Several measures of these properties were discussed. The notion of population was introduced and the notion of probability introduced in that context. The concept of a sampling distribution was introduced. The sampling distributions of the mean and of the difference between means were discussed. Questions about the probabilities of ranges of values for the mean and difference between means can be answered with reference to the standard normal distribution.

Appendix PROBABILITY

An experiment of chance
An EXPERIMENT OF CHANCE is a procedure with an uncertain outcome, such as tossing a coin or rolling a die. The classical notion of PROBABILITY arises in the context of an experiment of chance.

The sample space Consider an experiment of chance in which a coin is tossed and a die is rolled. There are twelve possible outcomes, which can be set out in an array called a SAMPLE SPACE (S). Each outcome is known as an ELEMENTARY EVENT. The number of elementary events, n(S), is 12.

Drawing of the sample space

Drawing of an event space

The classical definition revisited
Let E be “a one or a two on the die”. Then n(E) = 4. Following the classical definition of a probability, Prob(E) = n(E)/n(S) = 4/12 = 1/3.
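The sample space for the coin-and-die experiment is small enough to enumerate, so the classical definition can be applied by counting. This is a minimal sketch; the helper prob and the labels "H" and "T" are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every (coin, die) pair is one elementary event
sample_space = list(product(coin, die))

def prob(event):
    """Classical probability: n(E) / n(S), counted over the sample space."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# E: a one or a two on the die (with either face of the coin)
p_E = prob(lambda outcome: outcome[1] in (1, 2))

print(len(sample_space), p_E)
```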

Complementary events Two events are complementary if they are (1) mutually exclusive and (2) exhaustive. If E is “a one or a two on the die”, the event “not E”, which is denoted by Ē, is “any other number on the die”. Events E and Ē are complementary: they have no common outcome points and they exhaust the possibilities.

Probabilities of complementary events
If E and Ē are complementary events, their probabilities, p and q, respectively, sum to one. So p + q = 1; p = 1 – q; q = 1 – p.

Mutually exclusive events
Two events, A and B, are said to be MUTUALLY EXCLUSIVE if the probability of their joint occurrence is zero. In terms of S, the event spaces of A and B have no elementary outcome points in common. For example, if A is “a six on the die” and B is “a one or a two on the die”, A and B are mutually exclusive.

Two mutually exclusive events

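The exclusive OR rule If A and B are two mutually exclusive events, the probability of either occurring, that is, Prob(A or B), is the sum of their separate probabilities.

In our example, where A is “a six on the die” and B is “a one or a two on the die”, Prob(A or B) = Prob(A) + Prob(B) = 2/12 + 4/12 = 6/12 = 1/2.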

Independent events Two events A and B are INDEPENDENT if the occurrence of either has no effect upon the probability of the occurrence of the other. For example, if A is “a head” and B is “a six”, A and B are independent.

AND rule for independent events
If events A and B are independent, the probability of their joint occurrence Prob(A and B) is the product of their separate probabilities. In our example, where A is “a head” and B is “a six”, Prob(A and B) = Prob(A) × Prob(B) = 1/2 × 1/6 = 1/12.
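The rules for complementary, mutually exclusive and independent events can all be verified by counting outcomes in the coin-and-die sample space. The sketch below is illustrative only; the helper prob and the event names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space: 2 coin faces x 6 die faces = 12 elementary events
sample_space = list(product(["H", "T"], [1, 2, 3, 4, 5, 6]))

def prob(event):
    """Classical probability: favourable outcomes over all outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

def one_or_two(o): return o[1] in (1, 2)
def six(o): return o[1] == 6
def head(o): return o[0] == "H"

# Complementary events: probabilities sum to one
p = prob(one_or_two)
q = prob(lambda o: not one_or_two(o))

# Exclusive OR rule: "six" and "one or two" cannot occur together
p_or = prob(lambda o: six(o) or one_or_two(o))

# AND rule: "head" and "six" are independent
p_and = prob(lambda o: head(o) and six(o))

print(p + q, p_or, p_and)
```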

References
Hogben, L. (1967). Mathematics for the million. London: Pan Books. Chapter 12, The Algebra of Choice and Chance.
Ross, S. (1976). A first course in probability. New York: Macmillan. Pages 20 onwards.
Woodroofe, M. (1975). Probability with applications. Tokyo: McGraw-Hill Kogakusha. Chapter 2, page 38 in particular.
