ANOVA: Analysis of Variance Xuhua Xia


ANOVA: Analysis of Variance Xuhua Xia xxia@uottawa.ca http://dambe.bio.uottawa.ca

Review of t-test
Parametric
– Paired-sample t-test: t.test(x1, x2, paired=TRUE)
– Unpaired two-sample t-test assuming equal variances: t.test(x1, x2, var.equal=TRUE)
– Unpaired two-sample t-test when the two variances are not equal: t.test(x1, x2) (always do a nonparametric test as well and use the result of the more sensitive test)
– Consequence of violating the assumption
Nonparametric
– Mann-Whitney-Wilcoxon test (ensure that x is a 'factor'): wilcox.test(y~x, data=myDat, paired=TRUE or FALSE)
– Alternative: rank the variables and perform a regular t-test
Test equality of variance
– var.test(x1, x2)
– p <- 2*pf(Var_small/Var_large, DF_small, DF_large)
Equivalent methods in EXCEL
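The calls reviewed on this slide can be tried on a small data set. The sketch below uses hypothetical measurements (x1 and x2 are made up for illustration, not taken from the slides) and checks that the manual F-test formula gives the same p-value as var.test when the smaller variance is placed in the numerator.

```r
# Hypothetical measurements for two independent groups (not from the slides)
x1 <- c(10.1, 9.8, 10.3, 10.0, 9.9, 10.2)
x2 <- c(12.5, 9.1, 11.8, 8.7, 13.0, 10.4)

# Parametric tests
t_equal <- t.test(x1, x2, var.equal = TRUE)  # pooled-variance t-test
t_welch <- t.test(x1, x2)                    # Welch test, variances not assumed equal

# Equality of variances: built-in F test vs. the manual formula
v  <- c(var(x1), var(x2))
df <- c(length(x1), length(x2)) - 1
small <- which.min(v)
large <- which.max(v)
p_manual  <- 2 * pf(v[small] / v[large], df[small], df[large])
p_builtin <- var.test(x1, x2)$p.value

# Nonparametric alternative: long format with the grouping variable as a factor
myDat <- data.frame(
  y = c(x1, x2),
  x = factor(rep(c("g1", "g2"), times = c(length(x1), length(x2))))
)
w <- wilcox.test(y ~ x, data = myDat)

# Rank-transform alternative: ordinary t-test on the ranks
t_rank <- t.test(rank(y) ~ x, data = myDat, var.equal = TRUE)

print(c(manual = p_manual, builtin = p_builtin))
```

The manual formula assumes the lower-tail probability of the small/large ratio is at most 0.5, which holds for typical data but is worth checking when one group has very few observations.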

Review of Standard Error (SE)

ANOVA was mainly developed by Ronald A. Fisher (1890-1962)
– Head of the statistics division at the Rothamsted Experimental Station in Hertfordshire.
– One of the three founders of theoretical population genetics.
– Developer of statistical methods, especially the likelihood methods.
– Published The Genetical Theory of Natural Selection in 1930, in which he proposed the fundamental theorem of natural selection.
– The F statistic was named after him.
"To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination; he may be able to say what the experiment died of." (Ronald A. Fisher)

One-way ANOVA Model
$x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ vs. $x_{ij} = \mu + \varepsilon_{ij}$
Is the effect $\alpha_i$ zero?
This is the same model as for the t-test, except that the subscript i is 1 and 2 in the t-test, but 1, 2, ..., n in one-way ANOVA.

ANOVA Rationale
The essence of ANOVA is to partition the total variation into its components. Suppose we have three groups (e.g., a control plus two treatments), each with $N_1 = N_2 = N_3 = 200$ test animals. Given the null hypothesis that the three groups do not differ from each other, i.e., they all represent random samples from the same underlying population, we can estimate the population variance in three ways:
– From all 600 animals: $\mathrm{Var} = SS_{Total}/DF$
– From individual groups: $SS_1/DF_1$, $SS_2/DF_2$, $SS_3/DF_3$, pooled as $\mathrm{Var}_{withinGroup} = (SS_1+SS_2+SS_3)/(DF_1+DF_2+DF_3)$
– From the three group means $M_1$, $M_2$, $M_3$ and the grand mean $M$: $SE = \sqrt{[(M_1-M)^2 + (M_2-M)^2 + (M_3-M)^2]/2}$, so that $\mathrm{Var}_{betweenGroup} = SE^2 \times 200 = [N_1(M_1-M)^2 + N_2(M_2-M)^2 + N_3(M_3-M)^2]/2$
Given the null hypothesis, $\mathrm{Var}_{withinGroup}$ and $\mathrm{Var}_{betweenGroup}$ estimate the same population variance, so ANOVA is an F-test of the two variances. In ANOVA terminology, $\mathrm{Var}_{withinGroup}$ is MS Error and $\mathrm{Var}_{betweenGroup}$ is MS Model.
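The three variance estimates can be illustrated with a small simulation. This is a sketch under stated assumptions: the seed, the population mean of 50 and the standard deviation of 10 are arbitrary choices, not values from the slides, and the data are drawn so that the null hypothesis is true.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 200                         # animals per group
g <- factor(rep(c("Control", "Treat1", "Treat2"), each = n))
x <- rnorm(3 * n, mean = 50, sd = 10)   # H0 true: one common population

# 1. From all 600 animals: Total SS / DF, with DF = 599
var_total <- var(x)

# 2. From individual groups, pooled: (SS1 + SS2 + SS3) / (DF1 + DF2 + DF3)
group_var  <- tapply(x, g, var)
var_within <- sum(group_var * (n - 1)) / (3 * (n - 1))

# 3. From the three group means and the grand mean
m  <- tapply(x, g, mean)
M  <- mean(x)
se <- sqrt(sum((m - M)^2) / 2)   # SE estimated from the spread of the means
var_between <- se^2 * n

# F-test of the two variances (MS Model / MS Error)
F_ratio <- var_between / var_within
p_value <- pf(F_ratio, 2, 3 * (n - 1), lower.tail = FALSE)

print(c(total = var_total, within = var_within, between = var_between))
print(c(F = F_ratio, p = p_value))
```

Under the null hypothesis all three estimates should land near the true variance of 100, although the between-group estimate rests on only 2 degrees of freedom and is therefore much noisier than the other two.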

One-way experimental design

              Low-fat food   Medium-fat food   High-fat food
Weight gain   0              4                 8
              2              6                 10

Numerical Illustration of One-Way ANOVA
Assignment: Repeat the ANOVA computation after first replacing 10 in the High-fat food group by two values, 9 and 20. Submit this slide with all updated values.
Name:
ID:
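For reference, the step-by-step computation on the original six weight-gain values (before the replacement the assignment asks for) can be sketched as follows. The variable names are illustrative choices, not from the slides.

```r
# Original data from the one-way design: two animals per food type
gain <- c(0, 2, 4, 6, 8, 10)
food <- factor(rep(c("Low-fat", "Medium-fat", "High-fat"), each = 2),
               levels = c("Low-fat", "Medium-fat", "High-fat"))

grand_mean <- mean(gain)                 # mean of all six values
group_mean <- tapply(gain, food, mean)   # one mean per food type
n_i        <- tapply(gain, food, length) # group sizes

# Partition the total sum of squares
ss_total <- sum((gain - grand_mean)^2)
ss_model <- sum(n_i * (group_mean - grand_mean)^2)
ss_error <- sum((gain - ave(gain, food))^2)   # ave() gives each value's group mean

# Degrees of freedom, mean squares, F and p
df_model <- nlevels(food) - 1
df_error <- length(gain) - nlevels(food)
ms_model <- ss_model / df_model
ms_error <- ss_error / df_error
F_value  <- ms_model / ms_error
p_value  <- pf(F_value, df_model, df_error, lower.tail = FALSE)

print(c(SS_model = ss_model, SS_error = ss_error, SS_total = ss_total))
print(c(F = F_value, p = round(p_value, 4)))
```

Because the code takes group sizes from the data rather than hard-coding them, the same steps should carry over to the unbalanced data set in the assignment.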

ANOVA Table
Dependent variable: Weight Gain

Source   DF   SS     MS     F      p
Model    2    64.0   32.0   16.0   0.0251
Error    3    6.0    2.0
Total    5    70.0

The null hypothesis H0: X1 = X2 = X3 (equal group means) is rejected. The three kinds of food differ significantly in their effect on weight gain of rabbits. In particular, Medium-fat and High-fat foods are significantly better than Low-fat food. However, Medium-fat and High-fat foods do not differ in their effect on rabbit weight gain.
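The same table can be obtained with aov. The sketch below re-enters the six weight-gain values and also calls TukeyHSD as one possible way to examine pairwise differences; the slides do not say which multiple-comparison procedure lies behind the pairwise statements, so that choice is an assumption.

```r
gain <- c(0, 2, 4, 6, 8, 10)
food <- factor(rep(c("Low-fat", "Medium-fat", "High-fat"), each = 2),
               levels = c("Low-fat", "Medium-fat", "High-fat"))

fit <- aov(gain ~ food)
tab <- summary(fit)[[1]]   # data frame with Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(tab)

# Pairwise comparisons among the three food types (one possible procedure)
print(TukeyHSD(fit))
```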

ANOVA and t-test
Parametric:
– aov(DV~IV1+IV2+…)
– aov(DV~IV1+IV2+IV1:IV2) or aov(DV~IV1*IV2)
– Contrast ANOVA and t-test by using Mercury2Gr_A.txt and Mercury2Gr_B.txt (same data in two different formats, one for t.test and one for aov). Also: DarwinPlantBreeding_A.txt and DarwinPlantBreeding_B.txt (ensure that the variable Species is a factor)
Nonparametric:
– One-way ANOVA: kruskal.test(DV~IV)
– Randomized block design: friedman.test(y~A|B), with A the treatment and B the block
Others:
– summary(fit)
– print(model.tables(fit,"means"),digits=3)
– boxplot(DV~IV)
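With only two groups, the equal-variance t-test and one-way ANOVA are the same test: F equals t squared and the p-values agree. The sketch below shows this on hypothetical data, since the Mercury2Gr and DarwinPlantBreeding files are not reproduced here; y and g are made-up names and values.

```r
# Hypothetical two-group data in the long format that aov expects
y <- c(5.1, 4.8, 6.0, 5.5, 7.2, 6.9, 7.8, 6.5)
g <- factor(rep(c("A", "B"), each = 4))

# Wide-style call for t.test: one vector per group
tt <- t.test(y[g == "A"], y[g == "B"], var.equal = TRUE)

# Long-style call for aov: dependent variable ~ factor
tab <- summary(aov(y ~ g))[[1]]

t_squared <- unname(tt$statistic)^2
F_value   <- tab[1, "F value"]
print(c(t_squared = t_squared, F = F_value))
print(c(p_t = tt$p.value, p_F = tab[1, "Pr(>F)"]))

# Nonparametric counterpart of one-way ANOVA
kw <- kruskal.test(y ~ g)
```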

Randomized complete blocks
Which of the six strains of clover has the highest protein content? The experimenter divided his field into 5 relatively homogeneous blocks, each with 6 plots, and randomly assigned his 6 strains to the 6 plots within each block. After harvesting, he determined the nitrogen content for each strain in each plot.
[Figure: field layout of 5 blocks with 6 plots each, showing strains 1 to 6 randomly assigned within each block]
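The within-block randomization described here can be sketched in a few lines. The seed is arbitrary and the resulting layout is only an illustration; it is not the layout used in the original experiment.

```r
set.seed(42)       # arbitrary seed so the layout is reproducible
n_blocks  <- 5
n_strains <- 6

# For each block, draw an independent random permutation of the strains
layout <- t(replicate(n_blocks, sample(n_strains)))
dimnames(layout) <- list(paste0("Block", 1:n_blocks),
                         paste0("Plot", 1:n_strains))
print(layout)
```

Each row is one block, and every strain appears exactly once per row, which is what makes the design "complete".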

Randomized complete blocks

Block   3dok1   3dok13   3dok4   3dok5   3dok7   compos
B1      33      14.4     19.4    27.9    21      20.8
B2      32.6    14.3     17      25.2    20.7    19.4
B3      32.1    14.2     15.8    24.8    20.5    19.1
B4      27      11.8     11.9    24.3    18.8    17.3
B5      19.4    11.6     9.1     17.7    18.6    16.9

Recode the data into three columns (variables): Yield, Variety and Block, and save it to a text file such as RandCompleteBlock.txt for data analysis in R, e.g.,

Yield   Variety   Block
33      3dok1     B1
32.6    3dok1     B2
……
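The recoding step can also be done in R rather than by hand. This is a minimal sketch: the object names wide, long and out_file are illustrative, and the file is written to a temporary directory only to keep the example free of side effects.

```r
# Wide table as on the slide; check.names = FALSE keeps names that start with a digit
wide <- data.frame(
  Block    = c("B1", "B2", "B3", "B4", "B5"),
  `3dok1`  = c(33, 32.6, 32.1, 27, 19.4),
  `3dok13` = c(14.4, 14.3, 14.2, 11.8, 11.6),
  `3dok4`  = c(19.4, 17, 15.8, 11.9, 9.1),
  `3dok5`  = c(27.9, 25.2, 24.8, 24.3, 17.7),
  `3dok7`  = c(21, 20.7, 20.5, 18.8, 18.6),
  compos   = c(20.8, 19.4, 19.1, 17.3, 16.9),
  check.names = FALSE
)

# Stack the six variety columns into one Yield column
long <- data.frame(
  Yield   = unlist(wide[-1], use.names = FALSE),
  Variety = rep(names(wide)[-1], each = nrow(wide)),
  Block   = rep(wide$Block, times = ncol(wide) - 1)
)
print(head(long))

# Save as a whitespace-delimited text file with a header line
out_file <- file.path(tempdir(), "RandCompleteBlock.txt")
write.table(long, out_file, row.names = FALSE, quote = FALSE)
```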

R functions

md <- read.table("RandCompleteBlock.txt", header=TRUE, stringsAsFactors=TRUE)  # Block and Variety as factors
fit <- aov(Yield~Block+Variety, data=md)
summary(fit)
anova(fit)
TukeyHSD(fit)

$Block
           diff         lwr         upr       p adj
B2-B1 -1.216667   -4.773553   2.3402194   0.8415528
B3-B1 -1.666667   -5.223553   1.8902194   0.6333566
B4-B1 -4.233333   -7.790219  -0.6764472   0.0149154
B5-B1 -7.200000  -10.756886  -3.6431139   0.0000569
B3-B2 -0.450000   -4.006886   3.1068861   0.9952717
...

$Variety
                diff           lwr            upr       p adj
3dok13-3dok1  -15.56  -19.65283761   -11.46716239   0.0000000
3dok4-3dok1   -14.18  -18.27283761   -10.08716239   0.0000000
3dok5-3dok1    -4.84   -8.93283761    -0.74716239   0.0148040
3dok7-3dok1    -8.90  -12.99283761    -4.80716239   0.0000160
compos-3dok1  -10.12  -14.21283761    -6.02716239   0.0000024
3dok4-3dok13    1.38   -2.71283761     5.47283761   0.8913398
3dok5-3dok13   10.72    6.62716239    14.81283761   0.0000010
3dok7-3dok13    6.66    2.56716239    10.75283761   0.0006551
...
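To see what the Block term contributes, the same data can be fitted with and without it. The sketch below re-enters the 30 yields so that it runs on its own; fit_block, fit_noblock and the helper ms_error are illustrative names, not from the slides.

```r
varieties <- c("3dok1", "3dok13", "3dok4", "3dok5", "3dok7", "compos")
yield <- c(33,   32.6, 32.1, 27,   19.4,   # 3dok1,  blocks B1..B5
           14.4, 14.3, 14.2, 11.8, 11.6,   # 3dok13
           19.4, 17,   15.8, 11.9, 9.1,    # 3dok4
           27.9, 25.2, 24.8, 24.3, 17.7,   # 3dok5
           21,   20.7, 20.5, 18.8, 18.6,   # 3dok7
           20.8, 19.4, 19.1, 17.3, 16.9)   # compos
md <- data.frame(
  Yield   = yield,
  Variety = factor(rep(varieties, each = 5), levels = varieties),
  Block   = factor(rep(paste0("B", 1:5), times = 6))
)

fit_block   <- aov(Yield ~ Block + Variety, data = md)  # randomized complete blocks
fit_noblock <- aov(Yield ~ Variety, data = md)          # block structure ignored

# Residual mean square = residual SS / residual DF
ms_error <- function(fit) deviance(fit) / df.residual(fit)
print(c(with_block = ms_error(fit_block), without_block = ms_error(fit_noblock)))

# Pairwise variety differences from the blocked model
tk <- TukeyHSD(fit_block, "Variety")
print(tk$Variety["3dok13-3dok1", ])
```

Removing block-to-block variation from the error term shrinks MS Error, which is why the variety comparisons are sharper in the blocked model than they would be if the blocks were ignored.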

Example
A researcher needs to assess the effect of 3 drugs on reducing appetite. Appetite reduction is measured by inter-meal interval (in minutes). The half-life of the drugs is about 3 days. Seven human subjects differ in age, gender, appetite, degree of obesity and potentially in many other ways. If the researcher randomly allocated these seven subjects into three groups, some groups might contain younger subjects or more males than others, etc., so that any group differences would be confounded by potentially many other factors. He decided to use a randomized complete block design and administer the drugs on Monday in three consecutive weeks. For each subject, he randomized the three drugs over the three Mondays (first table), took an index of appetite, and obtained the data table (second table).

Subject   Week 1   Week 2   Week 3
1         Drug2    Drug1    Drug3
2         Drug1    Drug3    Drug2
3                  Drug3    Drug1
4         Drug3    Drug1    Drug2
5         Drug1    Drug2    Drug3
6                  Drug2    Drug1
7         Drug2    Drug1    Drug3

Subject   Drug 1   Drug 2   Drug 3
1         164      152      178
2         202      181      222
3         143      136      156
4         210      194      216
5         228      219      245
6         173      159      182
7         161      157      165

Using test subjects as blocks is also called repeated measures ANOVA or within-subject ANOVA.
Assignment A: analyze the data and report the effect size and the result of the significance test (in short, what you want to include in a manuscript).
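A possible starting point for the analysis, treating Subject as the block, is sketched below. The column names Interval, Subject and Drug are illustrative choices; interpreting and reporting the output is left to the assignment.

```r
# Inter-meal interval (minutes), entered subject by subject as Drug1, Drug2, Drug3
interval <- c(164, 152, 178,
              202, 181, 222,
              143, 136, 156,
              210, 194, 216,
              228, 219, 245,
              173, 159, 182,
              161, 157, 165)
dat <- data.frame(
  Interval = interval,
  Subject  = factor(rep(1:7, each = 3)),
  Drug     = factor(rep(c("Drug1", "Drug2", "Drug3"), times = 7))
)

# Randomized complete block ANOVA with subjects as blocks
fit <- aov(Interval ~ Subject + Drug, data = dat)
print(summary(fit))

# Drug means, one simple basis for describing effect size
drug_means <- tapply(dat$Interval, dat$Drug, mean)
print(drug_means)

# Nonparametric counterpart for a randomized block design
fr <- friedman.test(Interval ~ Drug | Subject, data = dat)
print(fr)
```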
