Lecture 2 ANALYSIS OF VARIANCE: AN INTRODUCTION

Between subjects experiments The caffeine experiment was of between subjects design, that is, each participant was tested under only one condition. Participants were RANDOMLY ASSIGNED to the conditions, so that there was no basis on which the data could be paired. Between subjects experiments result in INDEPENDENT SAMPLES of data.

More than two conditions In more complex experiments, there may be three or more conditions. For example, we could compare the performance of groups of participants who have ingested four different supposedly performance-enhancing drugs with that of a control or placebo group.

Factors In the context of analysis of variance (ANOVA), a FACTOR is a set of related treatments, conditions or categories. The ANOVA term ‘factor’ is a synonym for the term ‘independent variable’.

One-factor experiments In the drug experiment, there is just ONE set of (drug-related) conditions. The experiment therefore has ONE treatment factor. The conditions making up a factor are known as its LEVELS. In the drug experiment, the treatment factor has 5 levels.

Results of the experiment [Table: the raw scores for the five groups, with the grand mean of all fifty scores.]

Statistics of the results [Table: the group (cell) means, variances, and standard deviations.]

The null hypothesis The null hypothesis states that, in the population, all the means have the same value. We cannot test this hypothesis with the t statistic.

The alternative hypothesis The alternative hypothesis is that, in the population, the means do NOT all have the same value. MANY POSSIBILITIES are implied by H1.
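
In symbols (a standard formulation, assumed here rather than taken from the slides, with μ1, …, μ5 denoting the five population means):

```latex
% Null and alternative hypotheses for the five-group drug experiment
H_0\colon \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5
\qquad
H_1\colon \text{not all } \mu_j \text{ are equal}
```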

The One-way ANOVA The ANOVA of a one-factor between groups experiment is also known as the ONE-WAY ANOVA. The one-way ANOVA must be sharply distinguished from the one-factor WITHIN SUBJECTS (or REPEATED MEASURES) ANOVA, which is appropriate when participants are tested at every level of the treatment factor. The between subjects and within subjects ANOVA are based upon different statistical models.

There are some large differences among the five treatment means, suggesting that the null hypothesis is false.

Mean square (MS) In ANOVA, the numerator of a variance estimate is known as a SUM OF SQUARES (SS). The denominator is known as the DEGREES OF FREEDOM (df). The variance estimate itself is known as a MEAN SQUARE (MS), so that MS = SS/df .

Accounting for variability [Figure: the total deviation of a score from the grand mean, split into a between groups deviation and a within groups deviation.] The building block for any variance estimate is a DEVIATION of some sort. The TOTAL DEVIATION of any score from the grand mean (GM) can be divided into 2 components: 1. a BETWEEN GROUPS component; 2. a WITHIN GROUPS component.
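
In symbols (a sketch using assumed notation: x_ij for the i-th score in group j, M_j for the mean of group j, GM for the grand mean):

```latex
% Total deviation = between groups deviation + within groups deviation
(x_{ij} - GM) = (M_j - GM) + (x_{ij} - M_j)
```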

Example of the breakdown [Table: one score ringed, together with its group mean and the grand mean.] The score, the group mean and the grand mean have been ringed in the table. This breakdown holds for each of the fifty scores in the data set.

Breakdown (partition) of the total sum of squares If you sum the squares of the deviations over all 50 scores, you obtain an expression which breaks down the total variability in the scores into between groups and within groups components.
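
With the same assumed notation (and n_j scores in group j), the partition reads:

```latex
% Partition of the total sum of squares (the cross-product term sums to zero)
\sum_{j}\sum_{i} (x_{ij} - GM)^2
  = \sum_{j} n_j (M_j - GM)^2 + \sum_{j}\sum_{i} (x_{ij} - M_j)^2
% i.e.  SS_total = SS_between + SS_within
```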

How ANOVA works The variability BETWEEN the treatment means is compared with the average spread of scores around their means WITHIN the treatment groups. The comparison is made with a statistic called the F-RATIO.

The variances of the scores in each group around their group mean are averaged to obtain a WITHIN GROUPS MEAN SQUARE (MSwithin).

From the values of the five treatment means, a BETWEEN GROUPS MEAN SQUARE (MSbetween) is calculated.

The statistic F is calculated by dividing the between groups MS by the within groups MS, thus: F = MSbetween / MSwithin.

The F ratio
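
As a concrete illustration, a minimal Python sketch of these computations on made-up data (five groups of ten scores; the population means and SD below are invented, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=2.0, size=10) for m in (10, 10, 12, 13, 14)]

k = len(groups)                      # number of treatment groups
N = sum(len(g) for g in groups)      # total number of scores
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)    # between groups mean square
ms_within = ss_within / (N - k)      # within groups mean square
F = ms_between / ms_within
print(f"F({k - 1}, {N - k}) = {F:.2f}")
```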

The value of MSbetween, since it is calculated from the MEANS, reflects random error plus any real differences there may be among the population means.

The value of MSwithin , since it is calculated only from the variances of the scores within groups and ignores the values of the group means, reflects ONLY RANDOM ERROR.

What F is measuring If there are differences among the population means, the numerator will be inflated and F will increase. If there are no differences, F will be close to 1. Schematically, F = (error + real differences) / (error only).

Expectations If the null hypothesis is true, the values of MSbetween and MSwithin will be similar, because both variance estimates merely reflect individual differences and random variation or ERROR. If so, the value of F will be around 1. If the null hypothesis is false, real differences among the population means will inflate the value of MSbetween but the value of MSwithin will be unaffected. The result will be a LARGE value of F.

Range of variation of F The F statistic is the ratio of two sample variances. A variance can take only non-negative values. So the lower limit for F is zero. There is no upper limit for F.

Imagine… Suppose the null hypothesis is true. Imagine the experiment were to be repeated thousands and thousands of times, with fresh samples of participants each time. There would be thousands and thousands of data sets, from each of which a value of F could be calculated.

Sampling distribution To test the null hypothesis, you must be able to locate YOUR value of F in the population or PROBABILITY DISTRIBUTION of such values. The probability distribution of a statistic is known as its SAMPLING DISTRIBUTION. To specify a sampling distribution, you must assign values to properties known as PARAMETERS.
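
This thought experiment can be run directly. A Python sketch, assuming five groups of ten and an arbitrary common population mean and SD (so the null hypothesis is true by construction):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, n = 5, 10                                         # five groups of ten
fs = []
for _ in range(10_000):                              # "thousands of repetitions"
    data = rng.normal(loc=50, scale=5, size=(k, n))  # same population mean everywhere
    ss_b = n * ((data.mean(axis=1) - data.mean()) ** 2).sum()
    ss_w = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum()
    fs.append((ss_b / (k - 1)) / (ss_w / (k * n - k)))

print(np.quantile(fs, 0.95))                 # simulated 95th percentile ...
print(stats.f.ppf(0.95, k - 1, k * n - k))   # ... close to that of F(4, 45)
```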

Parameters of F Recall that the t distribution has ONE parameter: the DEGREES OF FREEDOM (df ). The F distribution has TWO parameters: the degrees of freedom of the between groups and within groups mean squares, which we shall denote by dfbetween and dfwithin, respectively.

Rule for finding the degrees of freedom There’s a useful rule for finding the degrees of freedom of a statistic. Take the number of independent observations and subtract the number of parameters estimated. The sample variance of n scores is based upon n independent observations. But to obtain the deviations, we need an estimate of ONE parameter, namely, the mean. So the degrees of freedom of the sample variance is n – 1, not n.

Rule for obtaining the df: df = (number of independent observations) − (number of parameters estimated).

Degrees of freedom of the two mean squares The degrees of freedom of MSbetween is the number of treatment groups minus 1. (One parameter estimated: the grand mean.) The degrees of freedom of MSwithin is the total number of scores minus the number of treatment groups. (Five parameters are estimated: the five group means.)
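
Applying the rule, with k treatment groups and N scores in all (symbols assumed here for brevity):

```latex
df_{\text{between}} = k - 1 \qquad df_{\text{within}} = N - k
% With k = 5 groups and N = 50 scores: df_between = 4, df_within = 45.
```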

The correct F distribution We shall specify an F distribution with the notation F(dfbetween, dfwithin). We have seen that in our example, dfbetween = 4 and dfwithin = 45. The correct F distribution for our test of the null hypothesis is therefore F(4, 45).

The distribution of F(1, 45) F distributions are POSITIVELY SKEWED, i.e., they have a long tail to the right. However, the shape of F varies quite markedly with the values of the df.

The distribution of F(4, 45)

Distribution of F(4, 45) The critical region is in the upper tail of this F distribution. If we set the significance level at .05, the value of F must be at least 2.58, the 95th percentile of the distribution F(4, 45).

The F distribution [Figure: the density of F(dfbetween, dfwithin) = F(4, 45), with the lower .95 of the area below the 95th percentile of 2.58 and the .05 critical region above it.] An F distribution is asymmetric, with an infinitely long tail to the right. The critical region lies above the 95th percentile which, in this F distribution, is 2.58.

The ANOVA summary table F is large: nine times larger than unity, the value expected under the null hypothesis, and well over the critical value of 2.58. The p-value (Sig.) is less than .01, so F is significant beyond the .01 level. Write this result as follows: ‘with an alpha-level of .05, F is significant: F(4, 45) = 9.09; p < .01’. Do NOT write the p-value as ‘.000’! Notice that SStotal = SSbetween groups + SSwithin groups.
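
Both numbers quoted here can be checked directly; a minimal sketch using scipy (the observed F of 9.09 is the lecture's, the rest is standard):

```python
from scipy import stats

print(stats.f.ppf(0.95, 4, 45))   # 95th percentile of F(4, 45): about 2.58
print(stats.f.sf(9.09, 4, 45))    # p-value for the observed F = 9.09: far below .01
```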

SPSS advice A few general points. Give close attention to the labels you give to your variables, and to the appearance of your data. Unnecessary decimal places clutter the display. It is particularly important to assign VALUE LABELS to the code numbers you choose for any grouping variables. Specify also the LEVEL OF MEASUREMENT of each variable.

Start in Variable View Work in Variable View first, amending the settings so that when you enter Data View, your variables are already labelled, the scores appear without unnecessary decimals and you will have the option of displaying the value labels of your grouping variable.

Graphics The latest SPSS graphics require you to specify the level of measurement of the data on each variable. The group code numbers are at the NOMINAL level of measurement, because they are merely CATEGORY LABELS. Make the appropriate entry in the Measure column.

Grouping variables To instruct SPSS to analyse data from between subjects experiments, you must construct a GROUPING VARIABLE consisting of code numbers identifying the treatment condition under which a score was achieved. So we could set 1 = Placebo, 2 = Drug A, 3 = Drug B, 4 = Drug C, and 5 = Drug D.
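
For comparison, a minimal pandas sketch of the same long-format layout: one column of scores plus a coded grouping variable with value labels attached (the six scores below are invented for illustration):

```python
import pandas as pd

value_labels = {1: "Placebo", 2: "Drug A", 3: "Drug B", 4: "Drug C", 5: "Drug D"}
df = pd.DataFrame({
    "score": [62, 58, 71, 69, 75, 77],
    "drug":  [1, 1, 2, 2, 3, 3],     # the grouping variable: one code per condition
})
df["drug"] = df["drug"].map(value_labels).astype("category")   # attach the labels
print(df.groupby("drug", observed=True)["score"].mean())
```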

Data View This is what Data View will look like. The entry of data for an ANOVA on SPSS is similar to the procedure we followed when making an independent-samples t-test. On the right, the VALUE LABELS are displayed, instead of the values themselves. (This option appears in the Data menu.)

Assignment of values in Variable View

Variable View completed Note the setting of Decimals so that only whole numbers will appear in Data View. Note the informative variable LABELS, which will appear in the output. Note the VALUE LABELS giving the key to the code numbers you have chosen for your grouping variable. (The ‘values’ themselves are the code numbers you have chosen.)

The One-Way ANOVA dialog box

More statistics By clicking Options, you can order more statistics than would normally appear in the ANOVA output. Click the Descriptive button to order the extra statistics and then Continue to return to the ANOVA dialog box.

A word of warning Modern computing packages such as SPSS afford a bewildering variety of attractive graphs and displays to help you bring out the most important features of your results. You should certainly use them. But there are pitfalls awaiting the unwary. Suppose the drug experiment had turned out rather differently. The researcher proceeds as follows.

Ordering a means plot

A picture of the results

The picture is false! The table of means shows minuscule differences among the five group means. The value of F is very small indeed. The p-value of F is very high – unity to two places of decimals. The experiment has failed to show that any of the drugs works.

A small scale view Only a microscopically small section of the scale is shown on the vertical axis. This greatly magnifies even small differences among the group means.

Putting things right Double-click on the image to get into the Graph Editor. Double-click on the vertical axis to access the scale specifications.

Putting things right … Uncheck the minimum value box and enter zero as the desired minimum point. Click Apply.

The true picture!

The true picture … The effect is dramatic. The profile now reflects the true situation. Always be suspicious of graphs that do not show the complete vertical scale.
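
The same pitfall is easy to reproduce outside SPSS; a small matplotlib sketch with invented, nearly identical means, plotted once on a truncated axis and once on the complete scale:

```python
import matplotlib.pyplot as plt

means = [10.10, 10.12, 10.11, 10.13, 10.12]   # nearly identical group means
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(means, marker="o")
ax1.set_title("Truncated axis: differences look big")
ax2.plot(means, marker="o")
ax2.set_ylim(0, 12)                           # show the complete vertical scale
ax2.set_title("Full axis: the true picture")
plt.show()
```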

Summary In the one-way ANOVA, we compare two variance estimates, MSbetween and MSwithin, by means of their ratio, which is called the F statistic. If F is large, we conclude that there is at least one significant difference somewhere among the array of treatment means.
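
In practice, the whole test can be run in a single call; a sketch using scipy's f_oneway on the same made-up groups as the earlier sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=2.0, size=10) for m in (10, 10, 12, 13, 14)]
F, p = stats.f_oneway(*groups)               # one-way ANOVA in one line
print(f"F(4, 45) = {F:.2f}, p = {p:.4f}")
```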

Multiple-choice question

Multiple-choice example