# Lecture 2 ANALYSIS OF VARIANCE: AN INTRODUCTION



Between subjects experiments
The caffeine experiment had a between subjects design, that is, each participant was tested under only one condition. Participants were RANDOMLY ASSIGNED to the conditions, so that there was no basis on which the data could be paired. Between subjects experiments result in INDEPENDENT SAMPLES of data.

More than two conditions
In more complex experiments, there may be three or more conditions. For example, we could compare the performance of groups of participants who have ingested four different supposedly performance-enhancing drugs with that of a control or placebo group.

Factors
In the context of analysis of variance (ANOVA), a FACTOR is a set of related treatments, conditions or categories. The ANOVA term ‘factor’ is a synonym for the term ‘independent variable’.

One-factor experiments
In the drug experiment, there is just ONE set of (drug-related) conditions. The experiment therefore has ONE treatment factor. The conditions making up a factor are known as its LEVELS. In the drug experiment, the treatment factor has 5 levels.

Results of the experiment
[Table: raw scores, with the grand mean]

Statistics of the results
[Table: group (cell) means, group (cell) variances and group (cell) standard deviations]

The null hypothesis
The null hypothesis states that, in the population, all the means have the same value. We cannot test this hypothesis with the t statistic.

The alternative hypothesis
The alternative hypothesis is that, in the population, the means do NOT all have the same value. MANY POSSIBILITIES are implied by H1.
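For the five-group drug experiment, the two hypotheses can be written compactly. The symbols below are assumed notation (the slides do not name them): each mu stands for the population mean under one of the five conditions.

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5
\qquad
H_1:\ \text{not all of } \mu_1, \dots, \mu_5 \text{ are equal}
```

H1 is satisfied if just one mean differs from the rest, if all five differ, or by any pattern in between, which is why it implies many possibilities.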

The One-way ANOVA
The ANOVA of a one-factor between groups experiment is also known as the ONE-WAY ANOVA. The one-way ANOVA must be sharply distinguished from the one-factor WITHIN SUBJECTS (or REPEATED MEASURES) ANOVA, which is appropriate when participants are tested at every level of the treatment factor. The between subjects and within subjects ANOVA are based upon different statistical models.

There are some large differences among the five treatment means, suggesting that the null hypothesis is false.

Mean square (MS)
In ANOVA, the numerator of a variance estimate is known as a SUM OF SQUARES (SS). The denominator is known as the DEGREES OF FREEDOM (df). The variance estimate itself is known as a MEAN SQUARE (MS), so that MS = SS/df.

Accounting for variability
The building block for any variance estimate is a DEVIATION of some sort. The TOTAL DEVIATION of any score from the grand mean (GM) can be divided into 2 components: 1. a BETWEEN GROUPS component; 2. a WITHIN GROUPS component.
total deviation = between groups deviation + within groups deviation
(score - GM) = (group mean - GM) + (score - group mean)

Example of the breakdown
The score, the group mean and the grand mean have been ringed in the table. This breakdown holds for each of the fifty scores in the data set.
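The breakdown of a single score can be checked with simple arithmetic. The sketch below uses made-up numbers (the score, group mean and grand mean are hypothetical, not the ringed values from the lecture's table).

```python
# Hypothetical values, not taken from the lecture's data table.
score = 14.0
group_mean = 11.0
grand_mean = 9.0

total_deviation = score - grand_mean         # score from the grand mean
between_deviation = group_mean - grand_mean  # group mean from the grand mean
within_deviation = score - group_mean        # score from its own group mean

# The total deviation is the sum of the two components.
print(total_deviation, "=", between_deviation, "+", within_deviation)
```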

Breakdown (partition) of the total sum of squares
If you sum the squares of the deviations over all 50 scores, you obtain an expression which breaks down the total variability in the scores into between groups and within groups components.
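The partition of the total sum of squares can be verified numerically. This is a minimal sketch on a small hypothetical data set (three groups of three scores, chosen so the arithmetic is easy to follow), not the lecture's fifty scores; the variable names are illustrative.

```python
# Hypothetical scores for three small groups (not the lecture's data).
groups = {
    "Placebo": [4.0, 5.0, 6.0],
    "Drug A": [7.0, 8.0, 9.0],
    "Drug B": [9.0, 11.0, 13.0],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Total SS: squared deviations of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

ss_between = 0.0
ss_within = 0.0
for scores in groups.values():
    group_mean = sum(scores) / len(scores)
    # Between groups: each score contributes (group mean - grand mean)^2.
    ss_between += len(scores) * (group_mean - grand_mean) ** 2
    # Within groups: squared deviations of scores from their own group mean.
    ss_within += sum((x - group_mean) ** 2 for x in scores)

print(ss_total, ss_between, ss_within)
```

With these numbers the between groups and within groups sums of squares add up exactly to the total.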

How ANOVA works
The variability BETWEEN the treatment means is compared with the average spread of scores around their means WITHIN the treatment groups. The comparison is made with a statistic called the F-RATIO.

The variances of the scores in each group around their group mean are averaged to obtain a WITHIN GROUPS MEAN SQUARE.

From the values of the five treatment means, a BETWEEN GROUPS MEAN SQUARE is calculated.
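When every group has the same number of scores, the two mean squares can be obtained directly from the group variances and the group means. The sketch below assumes equal group sizes and uses hypothetical data; with unequal group sizes the simple average of variances would need to be replaced by SS/df.

```python
from statistics import fmean, variance

# Hypothetical scores with equal group sizes (not the lecture's data).
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [9.0, 11.0, 13.0],
]
n = len(groups[0])  # number of scores per group

# Within groups MS: the average of the group variances (equal n assumed).
group_variances = [variance(g) for g in groups]
ms_within = fmean(group_variances)

# Between groups MS: n times the variance of the group means (equal n assumed).
group_means = [fmean(g) for g in groups]
ms_between = n * variance(group_means)

print("MS within =", ms_within, "MS between =", ms_between)
```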

The statistic F is calculated by dividing the between groups MS by the within groups MS, thus: F = MSbetween / MSwithin.

The F ratio

The value of MSbetween, since it is calculated from the MEANS, reflects random error, plus any real differences among the population means that there may be.

The value of MSwithin, since it is calculated only from the variances of the scores within groups and ignores the values of the group means, reflects ONLY RANDOM ERROR.

What F is measuring
If there are differences among the population means, the numerator will be inflated and F will increase. If there are no differences, F will be close to 1.
F = (error + real differences) / (error only)

Expectations
If the null hypothesis is true, the values of MSbetween and MSwithin will be similar, because both variance estimates merely reflect individual differences and random variation or ERROR. If so, the value of F will be around 1. If the null hypothesis is false, real differences among the population means will inflate the value of MSbetween but the value of MSwithin will be unaffected. The result will be a LARGE value of F.
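The claim that a real difference inflates MSbetween while leaving MSwithin untouched can be illustrated by shifting one group. This is a sketch with hypothetical scores; the helper `mean_squares` is an illustrative name, not part of any package. The starting data set has identical group means, which is an artificial extreme (real samples under the null hypothesis would still differ a little by chance).

```python
from statistics import fmean

def mean_squares(groups):
    """Return (MS_between, MS_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)
    k, n_total = len(groups), len(all_scores)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return ss_between / (k - 1), ss_within / (n_total - k)

# Hypothetical scores: three groups with identical means.
same = [[4.0, 5.0, 6.0], [4.0, 5.0, 6.0], [4.0, 5.0, 6.0]]
# The same scores, but with 6 points added to every score in the third group.
shifted = [same[0], same[1], [x + 6.0 for x in same[2]]]

ms_between_same, ms_within_same = mean_squares(same)
ms_between_shifted, ms_within_shifted = mean_squares(shifted)

print("identical means: MS between =", ms_between_same, "MS within =", ms_within_same)
print("third group + 6: MS between =", ms_between_shifted, "MS within =", ms_within_shifted)
print("F after the shift =", ms_between_shifted / ms_within_shifted)
```

Adding a constant to one group changes the group means, and hence MSbetween, but leaves the spread of scores inside each group, and hence MSwithin, exactly as it was.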

Range of variation of F
The F statistic is the ratio of two sample variances. A variance can take only non-negative values. So the lower limit for F is zero. There is no upper limit for F.

Imagine…
Suppose the null hypothesis is true. Imagine the experiment were to be repeated thousands and thousands of times, with fresh samples of participants each time. There would be thousands and thousands of data sets, from each of which a value of F could be calculated.
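The thought experiment can be run as a small simulation. The sketch below makes several assumptions that are not in the slides: scores are drawn from a standard normal population, there are five groups of ten scores (matching the drug experiment's layout), and the experiment is repeated 5000 times. The cut-off 2.58 is the critical value the lecture quotes for F(4, 45).

```python
import random
from statistics import fmean

def f_ratio(groups):
    """One-way ANOVA F ratio for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)
    k, n_total = len(groups), len(all_scores)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

rng = random.Random(2)  # fixed seed so that the run is repeatable
n_reps = 5000
f_values = []
for _ in range(n_reps):
    # Null hypothesis true: all five groups come from the same population.
    sample = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(5)]
    f_values.append(f_ratio(sample))

mean_f = fmean(f_values)
prop_above = sum(f > 2.58 for f in f_values) / n_reps
print(f"mean F = {mean_f:.2f}, proportion of F values above 2.58 = {prop_above:.3f}")
```

Under these assumptions the simulated F values should centre near 1, and roughly 5% of them should exceed 2.58.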

Sampling distribution
To test the null hypothesis, you must be able to locate YOUR value of F in the population or PROBABILITY DISTRIBUTION of such values. The probability distribution of a statistic is known as its SAMPLING DISTRIBUTION. To specify a sampling distribution, you must assign values to properties known as PARAMETERS.

Parameters of F
Recall that the t distribution has ONE parameter: the DEGREES OF FREEDOM (df). The F distribution has TWO parameters: the degrees of freedom of the between groups and within groups mean squares, which we shall denote by dfbetween and dfwithin, respectively.

Rule for finding the degrees of freedom
There’s a useful rule for finding the degrees of freedom of a statistic. Take the number of independent observations and subtract the number of parameters estimated. The sample variance of n scores is based upon n independent observations. But to obtain the deviations, we need an estimate of ONE parameter, namely, the mean. So the degrees of freedom of the sample variance is n – 1, not n.
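The rule can be checked against Python's built-in sample variance, which also divides by n - 1. The four scores below are hypothetical.

```python
from math import isclose
from statistics import fmean, variance

scores = [2.0, 4.0, 6.0, 8.0]  # hypothetical scores
n = len(scores)                # number of independent observations

mean = fmean(scores)           # ONE parameter is estimated: the mean
ss = sum((x - mean) ** 2 for x in scores)  # sum of squares (SS)
df = n - 1                     # observations minus parameters estimated
ms = ss / df                   # mean square: MS = SS/df

# statistics.variance is the sample variance, so the two values agree.
print(ms, variance(scores))
```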

Rule for obtaining the df
df = number of independent observations - number of parameters estimated

Degrees of freedom of the two mean squares
The degrees of freedom of MSbetween is the number of treatment groups minus 1. (One parameter estimated: the grand mean.) The degrees of freedom of MSwithin is the total number of scores minus the number of treatment groups. (Five parameters are estimated: the five group means.)
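Applied to the drug experiment (five groups, fifty scores in all), the rule gives the two df values directly. The function name `anova_df` is illustrative only.

```python
def anova_df(n_groups, n_total):
    """Degrees of freedom for a one-way between subjects ANOVA."""
    df_between = n_groups - 1       # group means minus one grand mean
    df_within = n_total - n_groups  # scores minus the estimated group means
    return df_between, df_within

df_between, df_within = anova_df(n_groups=5, n_total=50)
print(df_between, df_within)

# The two components add up to the total df, n_total - 1.
df_total = df_between + df_within
```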

The correct F distribution
We shall specify an F distribution with the notation F(dfbetween, dfwithin). We have seen that in our example, dfbetween = 4 and dfwithin = 45. The correct F distribution for our test of the null hypothesis is therefore F(4, 45).

The distribution of F(1, 45)
F distributions are POSITIVELY SKEWED, i.e., they have a long tail to the right. However, the shape of F varies quite markedly with the values of the df.

The distribution of F(4, 45)

Distribution of F(4, 45)
The critical region is in the upper tail of this F distribution. If we set the significance level at .05, the value of F must be at least 2.58. The value 2.58 is the 95th percentile of the distribution F(4, 45).

The F distribution
[Figure: the distribution F(dfbetween, dfwithin) = F(4, 45); .95 of the area lies below the 95th percentile (F = 2.58) and .05 above it.]
An F distribution is asymmetric, with an infinitely long tail to the right. The critical region lies above the 95th percentile which, in this F distribution, is 2.58.
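The critical value can be reproduced without statistical tables. The sketch below is one possible approach, not the method used by SPSS: it integrates the F density numerically (Simpson's rule) and searches for the 95th percentile by bisection. The function names are illustrative, and the simple integration assumes a numerator df greater than 2, as here.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0  # correct at zero when d1 > 2, as assumed in this sketch
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
        - log_beta
    )
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, steps=2000):
    """Approximate P(F <= x) by Simpson's rule (steps must be even)."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def f_percentile(p, d1, d2, lo=0.0, hi=100.0):
    """Find the value below which a proportion p of the distribution lies."""
    for _ in range(50):  # bisection: halve the search interval each time
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

critical_value = f_percentile(0.95, 4, 45)
print(f"95th percentile of F(4, 45) = {critical_value:.2f}")
```

The result should agree with the lecture's value of 2.58 to two decimal places.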

The ANOVA summary table
F is large: about nine times unity, the value expected under the null hypothesis, and well over the critical value of 2.58. The p-value (Sig.) is less than .01, so F is significant beyond the .01 level. Write this result as follows: ‘with an alpha-level of .05, F is significant: F(4, 45) = 9.09; p < .01’. Do NOT write the p-value as ‘.000’! Notice that SStotal = SSbetween groups + SSwithin groups.

SPSS advice
A few general points.
Give close attention to the labels you give to your variables, and to the appearance of your data. Unnecessary decimal places clutter the display. It is particularly important to assign VALUE LABELS to the code numbers you choose for any grouping variables. Specify also the LEVEL OF MEASUREMENT of each variable.

Start in Variable View
Work in Variable View first, amending the settings so that when you enter Data View, your variables are already labelled, the scores appear without unnecessary decimals and you will have the option of displaying the value labels of your grouping variable.

Graphics
The latest SPSS graphics require you to specify the level of measurement of the data on each variable. The group code numbers are at the NOMINAL level of measurement, because they are merely CATEGORY LABELS. Make the appropriate entry in the Measure column.

Grouping variables
To instruct SPSS to analyse data from between subjects experiments, you must construct a GROUPING VARIABLE consisting of code numbers identifying the treatment condition under which a score was achieved. So we could set 1 = Placebo, 2 = Drug A, 3 = Drug B, 4 = Drug C, and 5 = Drug D.
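The same layout (one row per participant, a code number for the group, and a separate key of value labels) can be sketched outside SPSS. This is Python, not SPSS syntax, and the scores are hypothetical; only the code-to-label key comes from the slide.

```python
# Key to the code numbers, as given on the slide.
value_labels = {1: "Placebo", 2: "Drug A", 3: "Drug B", 4: "Drug C", 5: "Drug D"}

# Hypothetical rows: (group code, score), one row per participant.
rows = [(1, 8), (1, 10), (2, 12), (3, 14), (4, 11), (5, 9)]

# Collect the scores under each condition, using the labels instead of the codes.
scores_by_group = {}
for code, score in rows:
    scores_by_group.setdefault(value_labels[code], []).append(score)

for label, scores in scores_by_group.items():
    print(label, scores)
```

The code numbers carry no quantitative meaning; they only say which condition each score belongs to.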

Data View
This is what Data View will look like. Entering data for an ANOVA in SPSS is similar to the procedure we followed for the independent-samples t-test. On the right, the VALUE LABELS are displayed, instead of the values themselves. (This option appears in the View menu.)

Assignment of values in Variable View

Variable View completed
Note the setting of Decimals so that only whole numbers will appear in Data View. Note the informative variable LABELS, which will appear in the output. Note the VALUE LABELS giving the key to the code numbers you have chosen for your grouping variable. (The ‘values’ themselves are the code numbers you have chosen.)

The One-Way ANOVA dialog box

More statistics
By clicking Options, you can order more statistics than would normally appear in the ANOVA output. Tick the Descriptive checkbox to order the extra statistics and then click Continue to return to the ANOVA dialog box.

A word of warning
Modern computing packages such as SPSS afford a bewildering variety of attractive graphs and displays to help you bring out the most important features of your results. You should certainly use them. But there are pitfalls awaiting the unwary. Suppose the drug experiment had turned out rather differently. The researcher proceeds as follows.

Ordering a means plot

A picture of the results

The picture is false!
The table of means shows minuscule differences among the five group means. The value of F is very small indeed. The p-value of F is very high: unity to two places of decimals. The experiment has failed to show that any of the drugs works.

A small scale view
Only a microscopically small section of the scale is shown on the vertical axis. This greatly magnifies even small differences among the group means.

Putting things right
Double-click on the image to get into the Graph Editor. Double-click on the vertical axis to access the scale specifications.

Putting things right …
Uncheck the minimum value box and enter zero as the desired minimum point. Click Apply.

The true picture!

The true picture …
The effect is dramatic.
The profile now reflects the true situation. Always be suspicious of graphs that do not show the complete vertical scale.
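The size of the distortion can be put into numbers by asking what fraction of the plotted axis height the spread of the means occupies. The five means and the axis limits below are hypothetical, and `share_of_axis` is an illustrative helper.

```python
def share_of_axis(values, axis_min, axis_max):
    """Fraction of the vertical axis spanned by the range of the values."""
    return (max(values) - min(values)) / (axis_max - axis_min)

# Hypothetical group means that differ only trivially.
means = [10.00, 10.10, 10.05, 10.02, 10.08]

# Axis cut to a narrow band around the means versus an axis starting at zero.
truncated = share_of_axis(means, axis_min=9.99, axis_max=10.11)
from_zero = share_of_axis(means, axis_min=0.0, axis_max=10.11)

print(f"truncated axis: {truncated:.0%} of the height")
print(f"axis from zero: {from_zero:.1%} of the height")
```

On the truncated axis the same tiny differences fill most of the plot, whereas on the full scale they are barely visible.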

Summary
In the one-way ANOVA, we compare two variance estimates, MSbetween and MSwithin, by means of their ratio, which is called the F statistic. If F is large, we conclude that there is at least one significant difference somewhere among the array of treatment means.

Multiple-choice question

Multiple-choice example
