ANALYSIS OF VARIANCE  Gerry Quinn & Mick Keough, 1998 Do not copy or distribute without permission of authors. One factor.



Barnacle recruitment Effects of 4 surface types on barnacle recruitment on rocky shore Surface type is independent variable: –alga sp.1, alga sp.2, bare rock, scraped rock Independent variable: –categorical with 2 or more levels –factor –levels termed groups (“treatments”)

Five replicate plots for each surface type Dependent (response) variable: –number of barnacles after 4 weeks [Figure: layout of replicate plots on the shore, labelled by surface type (Bare, Alg 2, Scraped, Alg 1, Scraped, etc.)]

Data: number of barnacles per plot, by treatment group (Alga sp.1, Alga sp.2, Bare, Scraped)

ANOVA vs regression One factor ANOVA: –1 continuous dependent variable and 1 categorical independent variable (factor) Compare with regression: –1 continuous dependent variable and 1 continuous independent variable

Aims Measure relative contribution of different sources of variation (factors or combination of factors) to total variation in dependent variable Test hypotheses about group (treatment) population means for dependent variable

Terminology Factor (independent variable): –usually designated factor A –number of levels/groups/treatments = a Number of replicates within each group: –n Each observation: –y

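Data layout
Factor level (group):   1            2            ...   i
Replicates:             $y_{11}$     $y_{21}$     ...   $y_{i1}$
                        $y_{1j}$     $y_{2j}$     ...   $y_{ij}$
                        $y_{1n}$     $y_{2n}$     ...   $y_{in}$
Population means:       $\mu_1$      $\mu_2$      ...   $\mu_i$
Sample means:           $\bar{y}_1$  $\bar{y}_2$  ...   $\bar{y}_i$
Grand mean $\bar{y}$ estimates $\mu$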

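              Alg sp1          Alg sp2          Bare             Scraped
              $y_{11}$ = 27    $y_{21}$ = 24    $y_{31}$ = 9     $y_{41}$ = 12
              $y_{12}$ = 19    $y_{22}$ = 33    $y_{32}$ = 13    $y_{42}$ = 8
              $y_{13}$ = 18    $y_{23}$ = 27    $y_{33}$ = 17    $y_{43}$ = 15
              $y_{14}$ = 23    $y_{24}$ = 26    $y_{34}$ = 14    $y_{44}$ = 20
              $y_{15}$ = 25    $y_{25}$ = 32    $y_{35}$ = 22    $y_{45}$ = 11
Means:        22.4             28.4             15.0             13.2
Overall mean: 19.75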
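The group means and overall mean follow directly from the counts in the table. A minimal Python sketch, assuming the 20 counts are transcribed correctly; the dictionary keys (`alg1`, `alg2`, `bare`, `scraped`) are labels chosen here for illustration:

```python
# Barnacle counts per plot, copied from the data table above.
barnacles = {
    "alg1": [27, 19, 18, 23, 25],
    "alg2": [24, 33, 27, 26, 32],
    "bare": [9, 13, 17, 14, 22],
    "scraped": [12, 8, 15, 20, 11],
}

# Sample mean of each treatment group.
group_means = {g: sum(ys) / len(ys) for g, ys in barnacles.items()}

# Grand mean over all 20 observations.
all_counts = [y for ys in barnacles.values() for y in ys]
grand_mean = sum(all_counts) / len(all_counts)

for g, m in group_means.items():
    print(f"{g:8s} mean = {m:.1f}")
print(f"grand mean = {grand_mean:.2f}")
```

With equal replication (n = 5 per group), the grand mean is also the mean of the four group means.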

Linear model for 1 factor ANOVA: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ where $\mu$ = overall population mean, $\alpha_i$ = effect of ith treatment or group ($\mu_i - \mu$), $\varepsilon_{ij}$ = random or unexplained error (i.e. variation not explained by treatment effects)

Compare with regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$: intercept is replaced by $\mu$; slope is replaced by $\alpha_i$ (treatment effect): –independent variable is categorical rather than continuous –still measures “effect” of independent variable

Types of factor Fixed factor: –all levels or groups of interest are used in study –conclusions are restricted to those groups Random factor: –random sample of all groups of interest is used in study –conclusions extrapolate to all possible groups

Null hypothesis H O: $\mu_1 = \mu_2 = \dots = \mu_i = \mu$. No difference between population group (treatment) means. Mean number of barnacle recruits is same on four substrata.

H O - fixed factor Treatment or group effects: $\alpha_1 = (\mu_1 - \mu)$, $\alpha_2 = (\mu_2 - \mu)$, $\alpha_i = (\mu_i - \mu)$ where $\alpha_i$ = the effect of group or treatment i. H O: $\alpha_1 = \alpha_2 = \dots = \alpha_i = 0$. No group (treatment) effects. No effect of any surface type on barnacle recruitment.
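The treatment effects can be estimated from the sample by replacing each population mean with its sample mean. A short sketch, assuming the barnacle counts given earlier and equal replication; the variable names are illustrative only:

```python
# Barnacle counts per plot (four surface types, five plots each).
barnacles = {
    "alg1": [27, 19, 18, 23, 25],
    "alg2": [24, 33, 27, 26, 32],
    "bare": [9, 13, 17, 14, 22],
    "scraped": [12, 8, 15, 20, 11],
}

group_means = {g: sum(ys) / len(ys) for g, ys in barnacles.items()}
grand_mean = sum(group_means.values()) / len(group_means)  # valid because n is equal

# Estimated effect of each group: sample group mean minus grand mean.
effects = {g: m - grand_mean for g, m in group_means.items()}

for g, e in effects.items():
    print(f"estimated effect of {g:8s} = {e:+.2f}")
```

With equal n the estimated effects sum to zero, so only a - 1 of them are free to vary, which matches the a - 1 degrees of freedom for groups.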

H O - random factor H O: $\mu_1 = \mu_2 = \dots = \mu_i = \dots = \mu_a = \mu$ i.e. all possible group means the same. H O: $\sigma_\alpha^2 = 0$ i.e. no variance between groups

Basic assumption of ANOVA $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_i^2 = \sigma^2$ where $\sigma_i^2$ = population variance of dependent variable (y) in each group. Each group (or treatment) population has same variance –homogeneity of variance assumption

Partitioning variation Variation in DV partitioned into: –variation explained by difference between groups (or treatments) –variation not explained (residual variation)

SS Total = SS Between groups + SS Within groups (Residual)

SS Total = $(27 - 19.75)^2 + (19 - 19.75)^2 + \dots + (11 - 19.75)^2$ = 1033.75 = Total variation in dependent variable (y) across all groups

SS Between groups = $5 \times [(22.4 - 19.75)^2 + (28.4 - 19.75)^2 + \text{etc.}]$ = 736.55 = Variation between group means = treatment variation

SS Residual = SS Within groups = $(27 - 22.4)^2 + (19 - 22.4)^2 + \dots + (11 - 13.2)^2$ = 297.20 = Pooled variation between replicates within each group
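The partition of the total sum of squares can be checked numerically. The sketch below assumes the barnacle counts listed in the data table and computes each sum of squares from its definition:

```python
# Barnacle counts per plot (four surface types, five plots each).
barnacles = {
    "alg1": [27, 19, 18, 23, 25],
    "alg2": [24, 33, 27, 26, 32],
    "bare": [9, 13, 17, 14, 22],
    "scraped": [12, 8, 15, 20, 11],
}

all_counts = [y for ys in barnacles.values() for y in ys]
grand_mean = sum(all_counts) / len(all_counts)

# Total: every observation against the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_counts)

# Between groups: each group mean against the grand mean, weighted by n.
ss_groups = sum(
    len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in barnacles.values()
)

# Residual: every observation against its own group mean.
ss_residual = sum(
    (y - sum(ys) / len(ys)) ** 2 for ys in barnacles.values() for y in ys
)

print(f"SS Total    = {ss_total:.2f}")
print(f"SS Groups   = {ss_groups:.2f}")
print(f"SS Residual = {ss_residual:.2f}")
```

The two components add up to the total, which is the additivity that the ANOVA table relies on.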

Mean squares Average sum-of-squared deviations Degrees of freedom: –number of components minus 1 –df total [an-1] = df groups [a-1] + df residual [a(n-1)] Mean square is a variance: –SS divided by df

ANOVA table
Source      SS            df        MS
Groups      SS Groups     a-1       SS Groups / (a-1)
Residual    SS Residual   a(n-1)    SS Residual / [a(n-1)]
Total       SS Total      an-1

Worked example
Source      SS         df    MS
Groups      736.55     3     245.52
Residual    297.20     16    18.58
Total       1033.75    19

Treatments (= groups) explain nothing, i.e. SS Groups equals zero. Table: Replicate, Group1, Group2, Group3, Group; Mean. Grand mean = 16.0

Treatments (= groups) explain everything, i.e. SS Residual equals zero. Table: Replicate, Group1, Group2, Group3, Group; Mean. Grand mean = 16.0
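The two extreme cases can be illustrated with made-up numbers. The data sets below are hypothetical (they are not the values from the original slides) and are chosen only so that the grand mean is 16.0 in both cases:

```python
def ss_partition(groups):
    """Return (SS Groups, SS Residual) for a list of equal-sized groups."""
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / len(means)
    ss_groups = n * sum((m - grand_mean) ** 2 for m in means)
    ss_residual = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return ss_groups, ss_residual


# Hypothetical case 1: every group has the same mean (16), so groups explain nothing.
explain_nothing = [[14, 15, 16, 17, 18] for _ in range(4)]

# Hypothetical case 2: no variation within groups, so groups explain everything.
explain_everything = [[10] * 5, [14] * 5, [18] * 5, [22] * 5]

print("explain nothing:   ", ss_partition(explain_nothing))
print("explain everything:", ss_partition(explain_everything))
```

Real data fall between these extremes, and the F-ratio measures where.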

Testing ANOVA H O All population group means the same: $\mu_1 = \mu_2 = \dots = \mu_i = \dots = \mu_a = \mu$. No population group or treatment effects: $\alpha_1 = \alpha_2 = \dots = \alpha_i = 0$

F-ratio statistic F-ratio statistic is ratio of 2 sample variances (i.e. 2 mean squares) Probability distribution of F-ratio known –different distributions depending on df of 2 variances If homogeneity of variances holds, F-ratio follows F-distribution

F-ratio distribution [Figure: probability P(F) plotted against F for 3, 16 df]

Expected mean squares If fixed factor and if homogeneity of variance assumption holds: MS Groups estimates $\sigma^2 + n \sum (\alpha_i)^2 / (a-1)$; MS Residual estimates $\sigma^2$

If H O is true: –all $\alpha_i$'s = 0 –MS Groups and MS Residual both estimate $\sigma^2$ –so F-ratio should be about 1 If H O is false: –at least one $\alpha_i \neq 0$ –MS Groups estimates $\sigma^2$ + treatment effects –so F-ratio > 1

Expected mean squares If random factor and homogeneity of variance assumption holds: MS Groups estimates $\sigma^2 + n\sigma_\alpha^2$; MS Residual estimates $\sigma^2$

If H O is true: –$\sigma_\alpha^2 = 0$ –MS Groups and MS Residual both estimate $\sigma^2$ –so F-ratio should be about 1 If H O is false: –$\sigma_\alpha^2 > 0$ –MS Groups estimates $\sigma^2$ plus added variance due to groups or treatments –so F-ratio > 1

Worked example MS Groups = 245.52 MS Residual = 18.58 F-ratio = 245.52/18.58 = 13.22 (computed from unrounded mean squares) Probability of getting F-ratio of 13.22 (or larger) if H O true (and F-ratio should be 1)?
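The mean squares and F-ratio of the worked example can be reproduced from the raw counts. This is a minimal sketch for a balanced design; the function name `one_way_anova` is a hypothetical helper, not a library call:

```python
def one_way_anova(groups):
    """One-factor ANOVA for equal-sized groups; returns the table entries."""
    a = len(groups)            # number of groups
    n = len(groups[0])         # replicates per group
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / a
    ss_groups = n * sum((m - grand_mean) ** 2 for m in means)
    ss_residual = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_groups, df_residual = a - 1, a * (n - 1)
    ms_groups = ss_groups / df_groups
    ms_residual = ss_residual / df_residual
    return {
        "df_groups": df_groups,
        "df_residual": df_residual,
        "ms_groups": ms_groups,
        "ms_residual": ms_residual,
        "f_ratio": ms_groups / ms_residual,
    }


# Barnacle counts: alg1, alg2, bare, scraped.
barnacle_groups = [
    [27, 19, 18, 23, 25],
    [24, 33, 27, 26, 32],
    [9, 13, 17, 14, 22],
    [12, 8, 15, 20, 11],
]

result = one_way_anova(barnacle_groups)
for name, value in result.items():
    print(f"{name:12s} = {value:.3f}")
```

A mean square is simply a sum of squares divided by its degrees of freedom, so the function needs nothing beyond the partition of SS.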

Testing H O Compare sample F-ratio (13.22) to probability distribution of F-ratio: –distribution of F when H O is true (sampling distribution of F-ratio) Degrees of freedom: –df Groups = 3 –df Residual =16

[Figure: probability distribution P(F) of F for 3, 16 df, with critical value F = 3.24 marking the upper tail of area $\alpha$ = 0.05]

Any F-ratio > 3.24 has < 0.05 (5%) chance of occurring if H O is true. F = 13.22 >> 3.24: much less than 0.05 chance of occurring if H O is true. We reject H O –statistically significant result
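The meaning of the critical value can be illustrated by simulation instead of an F table. The sketch below is a Monte Carlo approximation (a swapped-in technique, not the method used in the slides): it draws 4 groups of 5 observations from one normal population, so H O is true by construction, and records how often the F-ratio exceeds 3.24 and 13.22. The seed and number of simulations are arbitrary choices.

```python
import random


def f_ratio(groups):
    """F-ratio for a balanced one-factor design."""
    a, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / a
    ms_groups = n * sum((m - grand_mean) ** 2 for m in means) / (a - 1)
    ms_residual = sum(
        (y - m) ** 2 for g, m in zip(groups, means) for y in g
    ) / (a * (n - 1))
    return ms_groups / ms_residual


random.seed(1)          # arbitrary seed, for repeatability only
n_sims = 20000

# Simulate F-ratios when all groups share the same mean and variance.
null_f = [
    f_ratio([[random.gauss(0, 1) for _ in range(5)] for _ in range(4)])
    for _ in range(n_sims)
]

prop_above_critical = sum(f > 3.24 for f in null_f) / n_sims
prop_above_observed = sum(f >= 13.22 for f in null_f) / n_sims

print(f"P(F > 3.24 | H0 true)   ~ {prop_above_critical:.3f}")
print(f"P(F >= 13.22 | H0 true) ~ {prop_above_observed:.4f}")
```

The first proportion should come out close to 0.05, and the second should be far smaller, which is why the observed F-ratio leads to rejecting H O.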

Assumptions ANOVA assumptions apply to dependent variable. Observations within each group come from normally distributed populations ANOVA robust: –use boxplots to check for skewness and outliers

Assumptions (cont.) Variances of group populations are the same - homogeneity of variance assumption –skewed populations produce unequal group variances –ANOVA reliable if group n’s are equal and variances not too different: ratio of largest to smallest variance $\le$ 3:1
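The variance-ratio rule of thumb is easy to check for the barnacle data. A small sketch using Python's standard `statistics` module, assuming the counts from the data table:

```python
import statistics

# Barnacle counts per plot (four surface types, five plots each).
barnacles = {
    "alg1": [27, 19, 18, 23, 25],
    "alg2": [24, 33, 27, 26, 32],
    "bare": [9, 13, 17, 14, 22],
    "scraped": [12, 8, 15, 20, 11],
}

# Sample variance (n - 1 denominator) and mean of each group.
variances = {g: statistics.variance(ys) for g, ys in barnacles.items()}
means = {g: statistics.mean(ys) for g, ys in barnacles.items()}

variance_ratio = max(variances.values()) / min(variances.values())

for g in barnacles:
    print(f"{g:8s} mean = {means[g]:5.1f}  variance = {variances[g]:5.1f}")
print(f"largest / smallest variance = {variance_ratio:.2f}")
```

The same mean and variance pairs are what a variance vs mean plot would display.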

Residuals in ANOVA Residual: difference between observed and predicted value of dependent variable. In ANOVA, residual is difference between each y-value and its group mean
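Residuals for the barnacle data can be computed directly from that definition. A minimal sketch, assuming the counts from the data table; the names are illustrative:

```python
# Barnacle counts per plot (four surface types, five plots each).
barnacles = {
    "alg1": [27, 19, 18, 23, 25],
    "alg2": [24, 33, 27, 26, 32],
    "bare": [9, 13, 17, 14, 22],
    "scraped": [12, 8, 15, 20, 11],
}

# Residual = observed value minus the mean of its own group.
residuals = {}
for group, counts in barnacles.items():
    group_mean = sum(counts) / len(counts)
    residuals[group] = [y - group_mean for y in counts]

for group, res in residuals.items():
    print(group, [round(r, 1) for r in res])

# Squaring and summing all residuals gives SS Residual.
ss_residual = sum(r ** 2 for res in residuals.values() for r in res)
print(f"SS Residual = {ss_residual:.2f}")
```

Plotting these residuals against the group means gives the residual plot used to check the assumptions.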

Residual plots - residuals vs group means Even spread of residuals Assumptions OK [Figure: residuals (y-axis) plotted against group means (x-axis)]

Residual plots Wedge-shaped spread of residuals Indicates unequal variances and skewed dependent variable Transformation will help [Figure: residuals (y-axis) plotted against group means (x-axis)]

Variance vs mean plot Plot group variances against group means In skewed distributions (lognormal and Poisson), variance is positively related to mean In normal distributions, variance is independent of mean

Variance vs mean No relationship between variance and mean Distribution(s) probably normal [Figure: group variance (y-axis) plotted against group mean (x-axis)]

Variance vs mean Positive relationship between variance and mean Distribution(s) probably skewed Transformation required [Figure: group variance (y-axis) plotted against group mean (x-axis)]

Barnacle example No pattern in residuals Normality & homogeneity of variances OK [Figure: residuals (y-axis) plotted against group means (x-axis)]

Barnacle example No relationship between variance and mean Suggests non-skewed distribution [Figure: group variance (y-axis) plotted against group mean (x-axis)]

Assumptions (cont.) Data should be independent within and between groups –no replicate used more than once –must be considered at design stage

ANOVA with 2 groups Null hypothesis: –no difference between 2 population means ANOVA F-test or t-test: $F = t^2$, P-values identical
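The identity between the two tests can be confirmed numerically. The sketch below uses only the two algal groups from the barnacle data (an illustrative choice) and computes a pooled-variance two-sample t statistic alongside the one-factor ANOVA F-ratio:

```python
import math

# Two groups only: the two algal surfaces from the barnacle data.
alg1 = [27, 19, 18, 23, 25]
alg2 = [24, 33, 27, 26, 32]
n = len(alg1)

mean1 = sum(alg1) / n
mean2 = sum(alg2) / n

# Pooled within-group variance (this is MS Residual with 2 groups).
ss_within = sum((y - mean1) ** 2 for y in alg1) + sum((y - mean2) ** 2 for y in alg2)
pooled_var = ss_within / (2 * (n - 1))

# Two-sample t statistic with pooled variance.
t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n + 1 / n))

# One-factor ANOVA F-ratio for the same two groups (df groups = 1).
grand_mean = (mean1 + mean2) / 2
ms_groups = n * ((mean1 - grand_mean) ** 2 + (mean2 - grand_mean) ** 2) / (2 - 1)
f_ratio = ms_groups / pooled_var

print(f"t = {t_stat:.4f}, t squared = {t_stat ** 2:.4f}, F = {f_ratio:.4f}")
```

Because F equals t squared exactly, the two tests always give the same P-value for a two-group comparison.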

Specific comparisons of groups

Type I error Probability of rejecting H O when true –probability of false significant result Set by significance level (e.g. 0.05) –5% chance of falsely rejecting H O Probability of Type I error for each separate test

Specific comparisons of means Which groups are significantly different from which? Multiple pairwise t-tests: –each test with $\alpha$ = 0.05 Increasing Type I error rate: –probability of at least one Type I error among all comparisons (family-wise Type I error rate) increases

Table columns: No. of groups | No. of comparisons | Familywise probability of Type I error
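How quickly the familywise rate grows can be sketched with a simple formula. The code assumes each comparison is tested at 0.05 and treats the comparisons as independent, which pairwise comparisons are not, so the values are approximations only; the function names are hypothetical:

```python
ALPHA = 0.05  # significance level for each separate test


def n_pairwise(n_groups):
    """Number of distinct pairs among n_groups groups."""
    return n_groups * (n_groups - 1) // 2


def familywise_rate(n_comparisons, alpha=ALPHA):
    """P(at least one Type I error), assuming independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


print("groups  comparisons  familywise rate")
for a in (3, 4, 5):
    c = n_pairwise(a)
    print(f"{a:6d}  {c:11d}  {familywise_rate(c):.3f}")
```

Even with four groups, the chance of at least one false significant result is roughly a quarter, which motivates the adjusted procedures that follow.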

Unplanned pairwise comparisons

Unplanned comparisons Comparisons done after a significant ANOVA F-test Usually comparing each group to each other group: –which are significantly different from which? Lots of comparisons: –not independent

Unplanned comparisons Control familywise Type I error rate to 0.05: –significance level for each comparison must be below 0.05 Many different tests that try to achieve this Called unplanned (pairwise) multiple comparisons

Tukey’s test Tests every pair of group means: –adjusts $\alpha$ (significance level) so probability of Type I error among all tests < 0.05 Uses Q distribution (studentized range distribution) Uses SE = $\sqrt{\mathrm{MS}_{\mathrm{Residual}}/n}$

Compares difference between each pair of means to Q*SE: –differences larger than Q*SE significant –differences less than Q*SE non-significant Available in SYSTAT and SPSS

Barnacle example SE = $\sqrt{\mathrm{MS}_{\mathrm{Residual}}/n}$ = $\sqrt{18.58/5}$ = 1.93 Q with 16 df for 4 groups (from Q table) = 4.05 Q*SE = 7.82 Compare difference between each pair of group means with 7.82

Pairwise comparisons: –alg2 vs scraped = 15.2 (significant) –alg2 vs bare = 13.4 (significant) –alg1 vs scraped = 9.2 (significant) –alg2 vs alg1 = 6.0 (not significant) –alg1 vs bare = 7.4 (not significant) –bare vs scraped = 1.8 (not significant)
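These comparisons can be reproduced in a few lines. The sketch takes MS Residual (18.58) and the critical Q (4.05) as given in the slides rather than computing Q from the studentized range distribution; group labels are illustrative:

```python
import math
from itertools import combinations

# Barnacle counts per plot (four surface types, five plots each).
barnacles = {
    "alg1": [27, 19, 18, 23, 25],
    "alg2": [24, 33, 27, 26, 32],
    "bare": [9, 13, 17, 14, 22],
    "scraped": [12, 8, 15, 20, 11],
}

MS_RESIDUAL = 18.58   # from the worked example
Q_CRITICAL = 4.05     # Q table value quoted for 4 groups and 16 df
n = 5                 # replicates per group

means = {g: sum(ys) / len(ys) for g, ys in barnacles.items()}
threshold = Q_CRITICAL * math.sqrt(MS_RESIDUAL / n)

significant = set()
for g1, g2 in combinations(means, 2):
    diff = abs(means[g1] - means[g2])
    verdict = "significant" if diff > threshold else "not significant"
    if diff > threshold:
        significant.add((g1, g2))
    print(f"{g1} vs {g2}: difference = {diff:.1f} ({verdict})")

print(f"threshold Q*SE = {threshold:.2f}")
```

The threshold comes out near 7.81 rather than 7.82 because the slides round SE to 1.93 before multiplying; the conclusions for all six pairs are unaffected.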

Underlines join means not significantly different: Scraped  Bare  Alg1  Alg2 (joined pairs: Scraped and Bare, Bare and Alg1, Alg1 and Alg2)

Pairwise t-tests Use SE = $\sqrt{\mathrm{MS}_{\mathrm{Residual}}/n}$ in denominator of test Adjust significance levels of each test with Bonferroni adjustment: –0.05 / no. tests 6 t-tests to compare all pairs of 4 groups: –use $\alpha$ = 0.05/6 = 0.0083 Available in SYSTAT and SPSS
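The Bonferroni arithmetic for the four-group case is shown below as a short sketch; the variable names are illustrative. The final line uses the Bonferroni inequality, which bounds the familywise rate by the number of tests times the per-test level and does not require the tests to be independent:

```python
ALPHA = 0.05      # desired familywise Type I error rate
n_groups = 4

# All pairwise comparisons among the groups.
n_tests = n_groups * (n_groups - 1) // 2

# Bonferroni-adjusted significance level for each individual t-test.
alpha_per_test = ALPHA / n_tests

# Upper bound on the familywise rate when every test uses alpha_per_test.
familywise_bound = n_tests * alpha_per_test

print(f"number of tests      = {n_tests}")
print(f"alpha per test       = {alpha_per_test:.4f}")
print(f"familywise rate     <= {familywise_bound:.2f}")
```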

Planned comparisons

Also called contrasts Interesting and logical comparisons of means or combinations of means Planned before data analysis Ideally independent: –therefore only small number of comparisons allowed

Number of independent comparisons $\le$ df Groups –e.g. 4 groups, 3 df, maximum 3 independent contrasts Each test can be done at 0.05 –no correction for increased family-wise error rate ????

Methods for planned comparisons

t-tests Usual t-tests to compare 2 means Use standard error based on whole data set, not just two groups being compared –SE = $\sqrt{\mathrm{MS}_{\mathrm{Residual}}/n}$

Partition variance - ANOVA Partition SS Groups: –SS for each comparison –1 df –test with F-test as part of ANOVA F-test vs t-test: –$F = t^2$ because each comparison compares 2 groups

Barnacle example Specific comparisons planned as part of barnacle experiment H O: No difference in recruitment between 2 algal species H O: $\mu_1 = \mu_2$

H O: $\mu_1 = \mu_2$ or H O: $\mu_1 - \mu_2 = 0$ Linear combination of means using coefficients ($c_i$’s): $\sum c_i \bar{y}_i$ where $\sum c_i = 0$

H O: $\mu_1 = \mu_2$ Note $\sum c_i = 0$: (+1) + (-1) + (0) + (0) = 0, or equally (+10) + (-10) + (0) + (0) = 0

H O: no difference in recruitment between algal and bare surfaces $(\mu_1 + \mu_2)/2 = (\mu_3 + \mu_4)/2$, i.e. $0.5\mu_1 + 0.5\mu_2 = 0.5\mu_3 + 0.5\mu_4$ Note $\sum c_i = 0$

SS for each contrast determined df = 1: –2 groups or combinations of groups being compared SS Contrast = MS Contrast Test with F-test: F = MS Contrast / MS Residual
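The contrast sums of squares can be sketched for the barnacle data. The slides do not show the formula, so the code assumes the standard equal-n expression SS Contrast = (sum of c_i times group mean)^2 / (sum of c_i^2 / n). The third contrast (bare vs scraped) is added here only to show that three independent contrasts use up SS Groups, and 4.49 is the 0.05 critical F for 1, 16 df taken from a standard table:

```python
# Barnacle counts in the order alg1, alg2, bare, scraped.
groups = [
    [27, 19, 18, 23, 25],
    [24, 33, 27, 26, 32],
    [9, 13, 17, 14, 22],
    [12, 8, 15, 20, 11],
]
n = len(groups[0])
means = [sum(g) / n for g in groups]
grand_mean = sum(means) / len(means)

ss_groups = n * sum((m - grand_mean) ** 2 for m in means)
ms_residual = sum(
    (y - m) ** 2 for g, m in zip(groups, means) for y in g
) / (len(groups) * (n - 1))


def contrast_ss(coefs):
    """SS for a contrast with equal n (assumed standard formula)."""
    assert abs(sum(coefs)) < 1e-12, "contrast coefficients must sum to zero"
    estimate = sum(c * m for c, m in zip(coefs, means))
    return estimate ** 2 / (sum(c * c for c in coefs) / n)


contrasts = {
    "alg1 vs alg2": [1, -1, 0, 0],
    "algae vs bare and scraped": [0.5, 0.5, -0.5, -0.5],
    "bare vs scraped": [0, 0, 1, -1],   # added for illustration only
}

F_CRITICAL_1_16 = 4.49  # tabled value for alpha = 0.05 with 1, 16 df
results = {}
for name, coefs in contrasts.items():
    ss = contrast_ss(coefs)             # df = 1, so MS Contrast = SS Contrast
    results[name] = (ss, ss / ms_residual)
    print(f"{name}: SS = {ss:.2f}, F = {ss / ms_residual:.2f}")

print(f"sum of contrast SS = {sum(ss for ss, _ in results.values()):.2f}")
print(f"SS Groups          = {ss_groups:.2f}")
```

Because the three contrasts are mutually orthogonal, their sums of squares add up to SS Groups.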

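Partitioning SS Groups
Source                             SS         df    MS        F        P
Groups                             736.55     3     245.52    13.22    <0.001
  Alg1 vs alg2                     90.00      1     90.00     4.85     <0.05
  Alg1 & alg2 vs bare & scraped    638.45     1     638.45    34.37    <0.001
Residual                           297.20     16    18.58
Total                              1033.75    19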

Reject H O : –significant difference between algal types in barnacle recruitment. Reject H O : –significant difference between algal types (1 & 2) and bare substrata (bare & scraped).

Planned comparisons in the literature

Newman (1994) Ecology 75: Effects of changing food levels on size and age at metamorphosis of tadpoles Four treatments used: –low food (n=5), medium food (n=8), high food (n=6), food decreasing from high to low (n=7) H O : no effect of food levels on size of toads at metamorphosis.

Planned comparison of decreasing food vs constant high food: H O: no difference between decreasing food and high food on size of toads at metamorphosis.
Table columns: Source | df | SS | F | P
Food: P <0.001
High vs decreasing: P <0.001
Residual

2. Pairwise t-tests with adjustment to $\alpha$: uses Bonferroni adjustment ($\alpha/c$, where c is number of tests), e.g. 4 groups, 6 pairwise comparisons, Bonferroni 0.05/6 = 0.0083; more conservative than Tukey’s, i.e. fewer significant results

One factor ANOVAs in the literature

Shapiro et al. (1994) Ecology 75: Spawning time (DV) of female coral reef fish (Thalassoma bifasciatum) of 3 size classes. Null hypothesis: –no difference in mean spawning time between 3 size classes of T. bifasciatum

Table columns: Female size | n | Time (mean ± SE); rows: small, medium, large. ANOVA: F = 6.98. Reject H O.

Cushman et al. (1994) Ecology 75: Effect of different diets on survival of ants: –plant only, plant+butterfly larvae, plant+artificial ant food –three groups (treatments), 10 replicate vials of ants per treatment. Null hypothesis: –no difference in survival between groups.

Table columns: Source | df | MS | F | P; rows: Diet, Residual. Reject H O

Independence of comparisons All pairwise comparisons not independent of each other: –e.g. pairwise comparisons of 3 means –Test 1: mean 1 significantly less than mean 2 –Test 2: mean 2 significantly less than mean 3 –Test 3: compare mean 1 to mean 3. H O much less likely to be true given tests 1 and 2, so P-value difficult to interpret