ANOVA: Analysis of Variance Xuhua Xia

2 ANOVA: Analysis of Variance Xuhua Xia xxia@uottawa.ca http://dambe.bio.uottawa.ca

3 Ronald A. Fisher (1890-1962)
Head of the statistics division at the Rothamsted Experimental Station in Hertfordshire.
One of the three founders of theoretical population genetics.
Developer of statistical methods, especially likelihood methods.
Published The Genetical Theory of Natural Selection in 1930, in which he proposed the fundamental theorem of natural selection.
“To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination; he may be able to say what the experiment died of.”

4 Analysis of Variance (ANOVA)
ANOVA was mainly developed by Ronald A. Fisher; the F statistic is named after him.
The essence of ANOVA is to partition the total variation into its components.
Assumptions
–Normality
–Equal variance among treatment groups
Alternative methods

5 One-way ANOVA Model
$x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ vs. $x_{ij} = \mu + \varepsilon_{ij}$
Is the treatment effect $\alpha_i$ zero?
This is the same model as for the t-test, except that the subscript i takes the values 1 and 2 in the t-test, but 1, 2, ..., n in one-way ANOVA.

6 t-test and ANOVA

7 Variance and Sum of Squares: sum of squared deviations, degrees of freedom
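
The formulas on this slide did not survive in the transcript. A standard way to relate the three quantities it names is sketched below; the symbols $x_i$, $\bar{x}$, $n$, and $s^2$ are assumed notation rather than the slide's own.

```latex
\begin{align}
SS &= \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
DF &= n - 1 \\
s^2 &= \frac{SS}{DF}
\end{align}
```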

8 Partition of Variance: grand mean, between-group deviation, within-group deviation
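
The partition named on this slide can be written out as follows. This is the standard identity, not a copy of the slide's lost figure; $\bar{x}$ (grand mean), $\bar{x}_i$ (mean of group $i$), and $n_i$ (size of group $i$) are assumed notation.

```latex
\begin{align}
x_{ij} - \bar{x} &= (\bar{x}_i - \bar{x}) + (x_{ij} - \bar{x}_i) \\
\sum_i \sum_j (x_{ij} - \bar{x})^2 &= \sum_i n_i (\bar{x}_i - \bar{x})^2 + \sum_i \sum_j (x_{ij} - \bar{x}_i)^2 \\
SS_T &= SS_B + SS_W
\end{align}
```

The first line splits each deviation from the grand mean into a between-group part and a within-group part; squaring and summing gives the second line because the cross-product term sums to zero within every group.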

9 Numerical Illustration of One-Way ANOVA
1 5 9
Now repeat the ANOVA computation with the addition of the numbers in red. Email me SS_B, SS_W, DF_num, and DF_denom.

10 ANOVA Table
Dependent variable: Weight Gain
Source    DF     SS      MS      F       p
Model      2    64.0    32.0    16.0    0.0251
Error      3     6.0     2.0
Total      5    70.0

11 F-distribution
[Table: Mean 1, s_1^2, Mean 2, s_2^2, and the ratio s_1^2/s_2^2 for repeated pairs of samples.]
[Figure: empirical F distribution, with frequency f plotted against F.]
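
An empirical F distribution of the kind named on this slide can be sketched by repeatedly drawing two samples from the same population and recording the variance ratio s_1^2/s_2^2. The sample size (5), the population (normal with mean 3 and standard deviation 1), the number of pairs, and the bin width are all assumptions chosen for illustration; the slide's own settings are not recoverable from the transcript.

```python
import random
import statistics


def empirical_f_ratios(n_pairs=2000, sample_size=5, seed=1):
    """Return s1^2/s2^2 for n_pairs pairs of samples drawn from one normal population."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_pairs):
        sample1 = [rng.gauss(3.0, 1.0) for _ in range(sample_size)]
        sample2 = [rng.gauss(3.0, 1.0) for _ in range(sample_size)]
        ratios.append(statistics.variance(sample1) / statistics.variance(sample2))
    return ratios


def relative_frequencies(ratios, bin_width=0.5, n_bins=8):
    """Relative frequency f per bin; the last bin collects all larger ratios."""
    counts = [0] * n_bins
    for r in ratios:
        counts[min(int(r / bin_width), n_bins - 1)] += 1
    return [c / len(ratios) for c in counts]


ratios = empirical_f_ratios()
freqs = relative_frequencies(ratios)
for i, f in enumerate(freqs):
    print(f"bin starting at F = {i * 0.5:.1f}: f = {f:.3f}")
```

Because both samples share one population variance, large ratios are rare; an observed ratio far in the right tail is evidence against equal variances.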

12 One-way experimental design
               Low-fat food    Medium-fat food    High-fat food
Weight gain         0                4                  8
                    2                6                 10
The null hypothesis H0: μ1 = μ2 = μ3 is rejected. The three kinds of food differ significantly in their effect on weight gain of rabbits. In particular, Medium-fat and High-fat foods are significantly better than Low-fat food. However, Medium-fat and High-fat foods do not differ in their effect on rabbit weight gain.
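
The ANOVA table for the weight-gain data (0, 2; 4, 6; 8, 10) can be reproduced by hand or with a short script. The sketch below is a plain-Python illustration, not the author's method; the function names are hypothetical, and the p-value helper uses a closed form that holds only when the numerator degrees of freedom equal 2, as they do here.

```python
def one_way_anova(groups):
    """Return (SS_B, SS_W, DF_num, DF_denom, F) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group SS: group size times squared deviation of group mean from grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: squared deviations of observations from their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_num = len(groups) - 1
    df_denom = len(all_values) - len(groups)
    f_value = (ss_between / df_num) / (ss_within / df_denom)
    return ss_between, ss_within, df_num, df_denom, f_value


def p_value_numerator_df_2(f_value, df_denom):
    """Upper-tail probability of F; this closed form is valid only for DF_num = 2."""
    return (1 + 2 * f_value / df_denom) ** (-df_denom / 2)


weight_gain = [[0, 2], [4, 6], [8, 10]]  # low-, medium-, high-fat food
ss_b, ss_w, df_num, df_denom, f_value = one_way_anova(weight_gain)
p = p_value_numerator_df_2(f_value, df_denom)
print(ss_b, ss_w, df_num, df_denom, f_value, round(p, 4))
# prints: 64.0 6.0 2 3 16.0 0.0251
```

The printed values match the Model and Error rows of the ANOVA table: SS 64.0 and 6.0, DF 2 and 3, F = 16.0, p = 0.0251.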

13 Assumptions

14 Paired-sample t-test
Using blocks to reduce confounding environmental factors (everything else being equal except for the treatment effect) in evaluating the protein content of two wheat varieties.
How should we allocate the two crop varieties to the plots? What comparison would be fair?
[Figure: alternative layouts of varieties 1 and 2 across the plots of Block 1 to Block 4: 12121212, 21212121, 11221122.]

15 Randomized Complete Blocks
Using blocks to reduce confounding environmental factors (everything else being equal except for the treatment effect).
The three crop varieties are randomly allocated to the plots within each block.
[Figure: field layout showing the plots of Block 1 to Block 4, with variety labels in random order within each block.]

16 Randomized complete blocks
Which of the six strains of clover has the highest protein content? The experimenter divided his field into 5 relatively homogeneous blocks, each with 6 plots, and randomly assigned his 6 strains to the 6 plots within each block. After harvesting, he determined the nitrogen content for each strain in each plot.
[Figure: field layout of Block 1 to Block 5 with the strains 3dok1, 3dok4, 3dok5, 3dok7, 3dok13, and compos in random order within each block, and a second layout labelled "If only two strains:" using 3dok13 and 3dok4.]

17 Data and SAS Program
Options ls=75;
data clover;
  input strain $ nitrogen @@;
  cards;
3dok1 19.4 3dok1 32.6 3dok1 27.0 3dok1 32.1 3dok1 33.0
3dok5 17.7 3dok5 24.8 3dok5 27.9 3dok5 25.2 3dok5 24.3
3dok4 17.0 3dok4 19.4 3dok4 9.1 3dok4 11.9 3dok4 15.8
3dok7 20.7 3dok7 21.0 3dok7 20.5 3dok7 18.8 3dok7 18.6
3dok13 14.3 3dok13 14.4 3dok13 11.8 3dok13 11.6 3dok13 14.2
compos 17.3 compos 19.4 compos 19.1 compos 16.9 compos 20.8
;
proc anova;
  class strain;
  model nitrogen=strain;
  means strain / duncan HOVTEST=LEVENE;
run;
The MEANS options request a multiple comparison (duncan) and a test for heteroscedasticity (HOVTEST; the default is LEVENE, and the alternatives are BARTLETT, BF, and OBRIEN).

18 Bartlett’s Test
The null hypothesis for the F-test (or variance ratio test): $H_0: v_1 = v_2$.
The null hypothesis for Bartlett’s or Levene’s test: $H_0: v_1 = v_2 = \dots = v_n$.
The formulae in this sheet use defined variables in EXCEL: Insert|name|define
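
The slide's EXCEL sheet is not part of the transcript. As a stand-in, the sketch below computes Bartlett's statistic in plain Python for the six clover strains, using the textbook formula; the function name and return layout are hypothetical. Under the null hypothesis of equal variances, the statistic is approximately chi-square distributed with k - 1 degrees of freedom, where k is the number of groups.

```python
import math
import statistics


def bartlett_statistic(groups):
    """Return (Bartlett statistic, pooled variance, degrees of freedom)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    variances = [statistics.variance(g) for g in groups]
    # Pooled variance: within-group sums of squares divided by N - k
    pooled = sum((len(g) - 1) * v for g, v in zip(groups, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (len(g) - 1) * math.log(v) for g, v in zip(groups, variances)
    )
    correction = 1 + (
        sum(1 / (len(g) - 1) for g in groups) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction, pooled, k - 1


clover = {
    "3dok1": [19.4, 32.6, 27.0, 32.1, 33.0],
    "3dok5": [17.7, 24.8, 27.9, 25.2, 24.3],
    "3dok4": [17.0, 19.4, 9.1, 11.9, 15.8],
    "3dok7": [20.7, 21.0, 20.5, 18.8, 18.6],
    "3dok13": [14.3, 14.4, 11.8, 11.6, 14.2],
    "compos": [17.3, 19.4, 19.1, 16.9, 20.8],
}
statistic, pooled_variance, df = bartlett_statistic(list(clover.values()))
print(f"Bartlett statistic = {statistic:.3f} on {df} df")
print(f"pooled variance = {pooled_variance:.5f}")
```

The pooled variance equals the error mean square of the one-way ANOVA on the same data, which gives a quick consistency check against the SAS output.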

19 Do Six Strains of Clover Differ?

20 Multiple Comparison
Duncan's Multiple Range Test for variable: NITROGEN
NOTE: This test controls the type I comparisonwise error rate, not the experimentwise error rate
Alpha= 0.05  df= 24  MSE= 11.78867
Number of Means (difference spanning)      2        3        4        5        6
Critical Range                         4.482    4.707    4.852    4.954    5.031
Means with the same letter are not significantly different.
Duncan Grouping      Mean    N    STRAIN
    A              28.820    5    3dok1
    B              23.980    5    3dok5
  C B              19.920    5    3dok7
  C D              18.700    5    compos
  E D              14.640    5    3dok4
  E                13.260    5    3dok13
Means are arranged in descending order.

21 Comparisonwise & Experimentwise Errors
Type I comparisonwise error rate is the probability of a Type I error for an individual test of hypothesis, symbolized by $\alpha_c$.
Type I experimentwise error rate is the probability of making at least one Type I error for a set of hypothesis tests, symbolized by $\alpha_e$.
If $\alpha_c = 0.05$ and N hypotheses are tested, then $\alpha_e \approx 1 - (1 - \alpha_c)^N$.
For 5 treatments, there are a total of 10 pairwise comparisons between means (the six clover strains would give 15). Thus, $\alpha_c = 0.05$ would imply $\alpha_e \approx 0.40$. That is, if all means are in fact equal, there is roughly a probability of 0.4 that at least one hypothesis will be incorrectly rejected.
If we are to control the experimentwise error rate at 0.05, we can set $\alpha_e = 0.05$: $\alpha_e \approx 1 - (1 - \alpha_c)^N = 1 - (1 - \alpha_c)^{10} = 0.05$, and solve the equation, which yields $\alpha_c \approx 0.005$. This of course makes it harder to reject a null hypothesis, even if the null hypothesis is false.
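
The arithmetic on this slide can be checked with a few lines of Python. The sketch assumes, as the slide's formula does, that the tests behave as if they were independent; the function names are hypothetical.

```python
from math import comb


def experimentwise_rate(alpha_c, n_tests):
    """Probability of at least one Type I error across n_tests tests."""
    return 1 - (1 - alpha_c) ** n_tests


def comparisonwise_rate(alpha_e, n_tests):
    """Per-test rate that yields the requested experimentwise rate."""
    return 1 - (1 - alpha_e) ** (1 / n_tests)


n_tests = comb(5, 2)  # pairwise comparisons among 5 treatment means
alpha_e = experimentwise_rate(0.05, n_tests)
alpha_c = comparisonwise_rate(0.05, n_tests)
print(n_tests, round(alpha_e, 4), round(alpha_c, 4))
# prints: 10 0.4013 0.0051
```

Replacing `comb(5, 2)` with `comb(6, 2)` gives the 15 comparisons that arise with six treatment means.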

22 Control for experimentwise error rate
...
proc anova;
  class strain;
  model nitrogen=strain;
  means strain / tukey;
run;
Tukey's Studentized Range (HSD) Test for nitrogen
This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha                                      0.05
Error Degrees of Freedom                     20
Error Mean Square                      4.238667
Critical Value of Studentized Range     4.44524
Minimum Significant Difference           4.0928
Means with the same letter are not significantly different.
Tukey Grouping      Mean    N    strain
    A             28.820    5    3dok1
    B             23.980    5    3dok5
  C B             19.920    5    3dok7
  C D             18.700    5    compos
  E D             14.640    5    3dok4
  E               13.260    5    3dok13
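
The Minimum Significant Difference in the Tukey output follows from the other reported quantities as q × sqrt(MSE / n). The sketch below takes the critical value, error mean square, and strain means directly from the output above (it does not compute the studentized range quantile itself) and lists the pairs of strains whose means differ by no more than that threshold.

```python
import math
from itertools import combinations

q_critical = 4.44524  # Critical Value of Studentized Range, from the output
mse = 4.238667        # Error Mean Square, from the output
n_per_mean = 5        # observations per strain

msd = q_critical * math.sqrt(mse / n_per_mean)
print(round(msd, 4))  # prints: 4.0928

means = {
    "3dok1": 28.820, "3dok5": 23.980, "3dok7": 19.920,
    "compos": 18.700, "3dok4": 14.640, "3dok13": 13.260,
}
# Pairs whose difference does not exceed the MSD share a grouping letter
not_different = [
    (a, b) for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) <= msd
]
print(not_different)
```

The four pairs it reports correspond to the shared letters B, C, D, and E in the Tukey grouping.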

23 Taking the Block into consideration
Options ls=75;
data clover;
  input strain $ nitrogen Block @@;
  cards;
3dok1 19.4 5 3dok1 32.6 2 3dok1 27.0 4 3dok1 32.1 3 3dok1 33.0 1
3dok5 17.7 5 3dok5 24.8 3 3dok5 27.9 1 3dok5 25.2 2 3dok5 24.3 4
3dok4 17.0 2 3dok4 19.4 1 3dok4 9.1 5 3dok4 11.9 4 3dok4 15.8 3
3dok7 20.7 2 3dok7 21.0 1 3dok7 20.5 3 3dok7 18.8 4 3dok7 18.6 5
3dok13 14.3 2 3dok13 14.4 1 3dok13 11.8 4 3dok13 11.6 5 3dok13 14.2 3
compos 17.3 4 compos 19.4 2 compos 19.1 3 compos 16.9 5 compos 20.8 1
;
proc anova;
  class strain Block;
  model nitrogen=strain Block;
  means strain / duncan;
run;

24 SAS output: I
Dependent Variable: nitrogen
                          Sum of
Source             DF     Squares         Mean Square    F Value    Pr > F
Model               9     1045.201333     116.133481       27.40    <.0001
Error              20       84.773333       4.238667
Corrected Total    29     1129.974667
R-Square    Coeff Var    Root MSE    nitrogen Mean
0.924978    10.35268     2.058802    19.88667
Source     DF    Anova SS        Mean Square     F Value    Pr > F
strain      5    847.0466667     169.4093333       39.97    <.0001
Block       4    198.1546667      49.5386667       11.69    <.0001
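
The sums of squares in this output can be reproduced from the clover data with a short script. The sketch below is a plain-Python illustration, not a substitute for PROC ANOVA; the function names are hypothetical, and the error degrees of freedom are computed as (strains - 1) × (blocks - 1), which assumes a complete design with one observation per strain in each block.

```python
# strain -> list of (nitrogen, block), as in the SAS data step
clover = {
    "3dok1": [(19.4, 5), (32.6, 2), (27.0, 4), (32.1, 3), (33.0, 1)],
    "3dok5": [(17.7, 5), (24.8, 3), (27.9, 1), (25.2, 2), (24.3, 4)],
    "3dok4": [(17.0, 2), (19.4, 1), (9.1, 5), (11.9, 4), (15.8, 3)],
    "3dok7": [(20.7, 2), (21.0, 1), (20.5, 3), (18.8, 4), (18.6, 5)],
    "3dok13": [(14.3, 2), (14.4, 1), (11.8, 4), (11.6, 5), (14.2, 3)],
    "compos": [(17.3, 4), (19.4, 2), (19.1, 3), (16.9, 5), (20.8, 1)],
}


def group_ss(groups, grand_mean):
    """Sum over groups of size * (group mean - grand mean)^2."""
    return sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values())


def rcb_anova(data):
    by_strain, by_block, values = {}, {}, []
    for strain, plots in data.items():
        for nitrogen, block in plots:
            by_strain.setdefault(strain, []).append(nitrogen)
            by_block.setdefault(block, []).append(nitrogen)
            values.append(nitrogen)
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_strain = group_ss(by_strain, grand_mean)
    ss_block = group_ss(by_block, grand_mean)
    ss_error = ss_total - ss_strain - ss_block  # what is left after strain and block
    df_strain, df_block = len(by_strain) - 1, len(by_block) - 1
    mse = ss_error / (df_strain * df_block)
    return {
        "SS strain": ss_strain,
        "SS block": ss_block,
        "SS error": ss_error,
        "MSE": mse,
        "F strain": ss_strain / df_strain / mse,
        "F block": ss_block / df_block / mse,
    }


result = rcb_anova(clover)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Removing the block sum of squares from the error term is what lowers the error mean square from 11.78867 in the one-way analysis to 4.238667 here.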

25 Multiple Comparison
Duncan's Multiple Range Test for nitrogen
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.
Alpha = 0.05, DFE = 20, MSE = 4.238667
Number of Means        2        3        4        5        6
Critical Range     2.716    2.851    2.937    2.997    3.041
Means with the same letter are not significantly different.
Duncan Grouping      Mean    N    strain
    A              28.820    5    3dok1
    B              23.980    5    3dok5
    C              19.920    5    3dok7
    C              18.700    5    compos
    D              14.640    5    3dok4
    D              13.260    5    3dok13

26 Ex. ANOVA with repeated measures
What is the treatment effect? What is the block? Analyze the data with SAS. Write a concise 1-page report. Submit at the beginning of the next class in hardcopy.

27 Two-way experimental design
Testing the effect of food and sex on rabbit food consumption
Cell means:
           Fresh food    Rancid food
Male         695.67        535.33
Female       642.67        517.33
Food consumed:
           Fresh food       Rancid food
Male       709, 679, 699    592, 538, 476
Female     657, 594, 677    508, 505, 539

28 Dependent Variable: CONSUMED
                          Sum of         Mean
Source             DF     Squares        Square         F Value    Pr > F
Model               3     65903.5833     21967.8611       15.06    0.0012
Error               8     11666.6667      1458.3333
Corrected Total    11     77570.2500
R-Square    C.V.        Root MSE    CONSUMED Mean
0.849599    6.388646    38.1881     597.750
Source      DF    Anova SS       Mean Square    F Value    Pr > F
FOOD         1    61204.0833     61204.0833       41.97    0.0002
SEX          1     3780.7500      3780.7500        2.59    0.1460
FOOD*SEX     1      918.7500       918.7500        0.63    0.4503
What is the interaction effect?
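
To see where the FOOD, SEX, and FOOD*SEX rows come from, the sums of squares can be rebuilt from the twelve observations. The sketch below is a plain-Python illustration for a balanced two-way design (equal numbers per cell); the function names are hypothetical. The interaction sum of squares is obtained as the between-cell sum of squares minus the two main-effect sums of squares.

```python
cells = {
    ("fresh", "male"): [709, 679, 699],
    ("fresh", "female"): [657, 594, 677],
    ("rancid", "male"): [592, 538, 476],
    ("rancid", "female"): [508, 505, 539],
}


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """Sums of squares and F values for a balanced two-way design with replication."""
    values = [x for obs in cells.values() for x in obs]
    grand = mean(values)

    def main_effect_ss(position):
        # Pool observations by the level of one factor (0 = food, 1 = sex)
        levels = {}
        for key, obs in cells.items():
            levels.setdefault(key[position], []).extend(obs)
        return sum(len(v) * (mean(v) - grand) ** 2 for v in levels.values()), len(levels)

    ss_food, n_food = main_effect_ss(0)
    ss_sex, n_sex = main_effect_ss(1)
    ss_cells = sum(len(obs) * (mean(obs) - grand) ** 2 for obs in cells.values())
    ss_inter = ss_cells - ss_food - ss_sex
    ss_error = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)
    mse = ss_error / (len(values) - len(cells))
    return {
        "FOOD": (ss_food, ss_food / (n_food - 1) / mse),
        "SEX": (ss_sex, ss_sex / (n_sex - 1) / mse),
        "FOOD*SEX": (ss_inter, ss_inter / ((n_food - 1) * (n_sex - 1)) / mse),
        "Error": (ss_error, None),
    }


table = two_way_anova(cells)
for source, (ss, f_value) in table.items():
    f_text = "" if f_value is None else f"  F = {f_value:.2f}"
    print(f"{source:<9} SS = {ss:.4f}{f_text}")
```

A small FOOD*SEX sum of squares, as here, means the fresh-versus-rancid difference is about the same size in males and females.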

29 What is Interaction?
When the effect of FOOD is independent of SEX, e.g., when fresh food is preferred by both males and females to the same extent, then there is no interaction effect.
When the effect of FOOD depends on SEX, e.g., when males eat more fresh food than rancid food but females eat more rancid food than fresh food, then there is an interaction effect.
[Figures: two plots of Consumption against Sex (Male, Female), each with one line for Fresh food and one line for Rancid food.]

30 Interaction Effect: Example
Cell means:
           Fresh food    Rancid food
Male         568.67        695.67
Female       642.67        517.33
Food consumed:
           Fresh food       Rancid food
Male       592, 538, 576    709, 679, 699
Female     657, 594, 677    508, 505, 539

31 Significant Interaction
Dependent Variable: CONSUMED
                Sum of         Mean
Source    DF    Squares        Square         F Value    Pr > F
Model      3    55920.2500     18640.0833       23.06    0.0003
Error      8     6466.6667       808.3333
Total     11    62386.9167
R-Square    C.V.        Root MSE    CONSUMED Mean
0.896346    4.690973    28.4312     606.083
Source      DF    Anova SS       Mean Square    F Value    Pr > F
FOOD         1    47754.0833     47754.0833       59.08    0.0001
SEX          1        2.0833         2.0833        0.00    0.9608
FOOD*SEX     1     8164.0833      8164.0833       10.10    0.0130
Can we conclude that SEX has no effect on food consumption?

32 SAS Program for two-way ANOVA
proc format;
  value sexLevel 1='male' 2='female';
  value foodLevel 1='fresh' 2='rancid';
data assign63;
  do food=1 to 2;
    do sex=1 to 2;
      do n=1 to 3;
        input Consumed @@;
        output;
      end;
    end;
  end;
  format sex sexLevel. food foodLevel.;
  cards;
709 679 699 657 594 677
592 538 476 508 505 539
;
proc anova;
  class food sex;
  model Consumed=food|sex;
  means food / duncan;
run;
Ex.
1. Rewrite the “data” block of the SAS program by using:
data assign63;
  input food sex consumed;
  cards;
......
;
2. Run the resulting program to check if the rewriting is correct.

33 Three-way ANOVA
Cell means:
Race         Sex       Fresh    Rancid
Short-ear    Male      647.5    515.5
             Female    611      500.5
Long-ear     Male      706      594.5
             Female    652.5    548
Food consumed:
Race         Sex       Fresh       Rancid
Short-ear    Male      650, 645    511, 520
             Female    610, 612    500, 501
Long-ear     Male      700, 712    601, 588
             Female    650, 655    550, 546

34 SAS Program
/* PROC FORMAT is optional, but will increase clarity in the output */
proc format;
  value sex 1='male' 2='female';
  value food 1='fresh' 2='rancid';
  value race 1='short-ear' 2='long-ear';
data assign71;
  input race sex food Consumed;
  format sex sex. food food. race race.;
  cards;
1 1 1 650
1 1 1 645
1 1 2 511
1 1 2 520
1 2 1 610
1 2 1 612
1 2 2 500
1 2 2 501
2 1 1 700
2 1 1 712
2 1 2 601
2 1 2 588
2 2 1 650
2 2 1 655
2 2 2 550
2 2 2 546
;
proc anova;
  class food sex race;
  model Consumed=food|sex|race;
run;
The semicolon that ends the data needs to be on a new line, i.e., not "2 2 2 546;".

35 ANOVA Table
Dependent Variable: CONSUMED
                          Sum of         Mean
Source             DF     Squares        Square         F Value    Pr > F
Model               7     72138.4375     10305.4911      354.60    0.0001
Error               8       232.5000        29.0625
Corrected Total    15     72370.9375
R-Square    C.V.        Root MSE    CONSUMED Mean
0.996787    0.903104    5.39096     596.938
Source           DF    Anova SS       Mean Square    F Value    Pr > F
FOOD              1    52555.5625     52555.5625     1808.36    0.0001
SEX               1     5738.0625      5738.0625      197.44    0.0001
FOOD*SEX          1      203.0625       203.0625        6.99    0.0296
RACE              1    12825.5625     12825.5625      441.31    0.0001
FOOD*RACE         1      175.5625       175.5625        6.04    0.0395
SEX*RACE          1      588.0625       588.0625       20.23    0.0020
FOOD*SEX*RACE     1       52.5625        52.5625        1.81    0.2156
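
Because every factor here has exactly two levels, each row of this table can be checked with a single contrast: code the levels as +1 and -1, multiply the codes of the factors in the effect, and compute SS = (sum of sign × observation)² / N. The sketch below applies this shortcut in plain Python; it is an illustration that works only for balanced designs with two-level factors, and the variable and function names are hypothetical.

```python
from itertools import combinations

# (race, sex, food, consumed) with levels coded 1 and 2, as in the SAS data step
data = [
    (1, 1, 1, 650), (1, 1, 1, 645), (1, 1, 2, 511), (1, 1, 2, 520),
    (1, 2, 1, 610), (1, 2, 1, 612), (1, 2, 2, 500), (1, 2, 2, 501),
    (2, 1, 1, 700), (2, 1, 1, 712), (2, 1, 2, 601), (2, 1, 2, 588),
    (2, 2, 1, 650), (2, 2, 1, 655), (2, 2, 2, 550), (2, 2, 2, 546),
]
# Factor name and its column index, ordered so effect names match the SAS output
factors = (("FOOD", 2), ("SEX", 1), ("RACE", 0))


def effect_ss(rows, columns):
    """SS of a main effect or interaction among two-level factors via its contrast."""
    contrast = 0
    for row in rows:
        sign = 1
        for c in columns:
            sign *= 1 if row[c] == 1 else -1
        contrast += sign * row[3]
    return contrast ** 2 / len(rows)


ss = {}
for size in (1, 2, 3):
    for subset in combinations(factors, size):
        name = "*".join(label for label, _ in subset)
        ss[name] = effect_ss(data, [column for _, column in subset])

grand_mean = sum(row[3] for row in data) / len(data)
ss_total = sum((row[3] - grand_mean) ** 2 for row in data)
ss_error = ss_total - sum(ss.values())
mse = ss_error / (len(data) - 8)  # 16 observations minus 8 cells

for name, value in ss.items():
    print(f"{name:<14} SS = {value:.4f}  F = {value / mse:.2f}")
print(f"{'Error':<14} SS = {ss_error:.4f}")
```

Each effect has one degree of freedom, so its mean square equals its sum of squares and F is simply SS divided by the error mean square.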

36 SAS program listing
data assign71;
  do race=1 to 2;
    do sex=1 to 2;
      do food=1 to 2;
        do n=1 to 2;
          input Consumed @@;
          output;
        end;
      end;
    end;
  end;
  cards;
650 645 511 520 610 612 500 501
700 712 601 588 650 655 550 546
;
proc anova;
  class food sex race;
  model Consumed=food|sex|race;
run;
data assign71;
  input race sex food Consumed;
  cards;
1 1 1 650
1 1 1 645
1 1 2 511
1 1 2 520
1 2 1 610
1 2 1 612
1 2 2 500
1 2 2 501
2 1 1 700
2 1 1 712
2 1 2 601
2 1 2 588
2 2 1 650
2 2 1 655
2 2 2 550
2 2 2 546
;

37 The Efficacy of Prayer
Francis Galton (1822-1911)
Class                                N       Mean
Members of Royal family              97     64.04
Clergy                              945     69.49
Lawyers                             294     68.14
Medical Profession                  244     67.31
English aristocracy                1179     67.31
Gentry                             1632     70.22
Trade and commerce                  513     68.74
Officers in the Royal Navy          366     68.40
English literature and science      395     67.55
Officers of the Army                569     67.07
Fine arts                           239     65.96
Other data collected by Galton:
1. Rate of successful delivery between church-going parents and others
2. Life span of believers and non-believers from insurance companies
Galton’s data could be analyzed by a one-way ANOVA. One criterion for a good ANOVA design is that everything else is equal except for the treatment effect. Does the data set above satisfy this criterion?

38 Model I and Model II ANOVA
Metabolic rate in rabbit liver cells, taken for two samples of liver tissue
Replicate 1213 2675
How can we optimize the experiment? More rabbits or more replicates?

39 Determining Calcium Content in Leaves
          First determination         Second determination
Plant     Leaf 1   Leaf 2   Leaf 3    Leaf 1   Leaf 2   Leaf 3
1          3.28     3.52     2.88      3.09     3.48     2.80
2          2.46     1.87     2.19      2.44     1.92     2.19
3          2.77     3.74     2.55      2.66     3.44     2.55
4          3.78     4.07     3.31      3.87     4.12     3.31

40 SAS Program
data turnip;
  input plant leaf calcium @@;
  cards;
1 1 3.28 1 1 3.09 1 2 3.52 1 2 3.48 1 3 2.88 1 3 2.80
2 1 2.46 2 1 2.44 2 2 1.87 2 2 1.92 2 3 2.19 2 3 2.19
3 1 2.77 3 1 2.66 3 2 3.74 3 2 3.44 3 3 2.55 3 3 2.55
4 1 3.78 4 1 3.87 4 2 4.07 4 2 4.12 4 3 3.31 4 3 3.31
;
proc nested;
  class plant leaf;
  var calcium;
run;
proc glm;
  class plant leaf;
  model calcium=plant leaf(plant);
run;

41 SAS Output: NESTED
Nested Random Effects Analysis of Variance for Variable CALCIUM
Variance              Sum of                            Error
Source      DF       Squares     F Value    Pr > F      Term
TOTAL       23     10.270396
PLANT        3      7.560346       7.665    0.0097      LEAF
LEAF         8      2.630200      49.409    0.0000      ERROR
ERROR       12      0.079850
Variance                       Variance     Percent
Source      Mean Square       Component    of Total
TOTAL          0.446539        0.532938    100.0000
PLANT          2.520115        0.365223     68.5302
LEAF           0.328775        0.161060     30.2212
ERROR          0.006654        0.006654      1.2486
Mean                        3.01208333
Standard error of mean      0.32404445

42 SAS Output: GLM
Dependent Variable: calcium
                          Sum of
Source             DF     Squares         Mean Square    F Value    Pr > F
Model              11     10.19054583     0.92641326      139.22    <.0001
Error              12      0.07985000     0.00665417
Corrected Total    23     10.27039583
R-Square    Coeff Var    Root MSE    calcium Mean
0.992225    2.708195     0.081573    3.012083
Source         DF    Type I SS      Mean Square    F Value    Pr > F
plant           3    7.56034583     2.52011528      378.73    <.0001
leaf(plant)     8    2.63020000     0.32877500       49.41    <.0001
Source         DF    Type III SS    Mean Square    F Value    Pr > F
plant           3    7.56034583     2.52011528      378.73    <.0001
leaf(plant)     8    2.63020000     0.32877500       49.41    <.0001
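
The NESTED and GLM outputs report different F values for plant because they divide the plant mean square by different denominators: the leaf mean square (F = 7.665) in NESTED and the error mean square (F = 378.73) in GLM. The sketch below rebuilds the mean squares and the variance components from the turnip data in plain Python. It assumes a balanced design (3 leaves per plant, 2 determinations per leaf) and uses hypothetical names throughout.

```python
# plant -> leaf -> two calcium determinations, as in the SAS data step
calcium = {
    1: {1: [3.28, 3.09], 2: [3.52, 3.48], 3: [2.88, 2.80]},
    2: {1: [2.46, 2.44], 2: [1.87, 1.92], 3: [2.19, 2.19]},
    3: {1: [2.77, 2.66], 2: [3.74, 3.44], 3: [2.55, 2.55]},
    4: {1: [3.78, 3.87], 2: [4.07, 4.12], 3: [3.31, 3.31]},
}


def mean(values):
    return sum(values) / len(values)


a, b, n = 4, 3, 2  # plants, leaves per plant, determinations per leaf
grand = mean([x for leaves in calcium.values() for obs in leaves.values() for x in obs])

ss_plant = ss_leaf = ss_error = 0.0
for leaves in calcium.values():
    plant_mean = mean([x for obs in leaves.values() for x in obs])
    ss_plant += b * n * (plant_mean - grand) ** 2
    for obs in leaves.values():
        leaf_mean = mean(obs)
        ss_leaf += n * (leaf_mean - plant_mean) ** 2
        ss_error += sum((x - leaf_mean) ** 2 for x in obs)

ms_plant = ss_plant / (a - 1)
ms_leaf = ss_leaf / (a * (b - 1))
ms_error = ss_error / (a * b * (n - 1))

f_plant_vs_leaf = ms_plant / ms_leaf    # denominator used by PROC NESTED
f_plant_vs_error = ms_plant / ms_error  # denominator used in the GLM output
f_leaf = ms_leaf / ms_error

# Variance components for the random-effects (Model II) nested design
var_error = ms_error
var_leaf = (ms_leaf - ms_error) / n
var_plant = (ms_plant - ms_leaf) / (b * n)

print(f"F plant (vs leaf)  = {f_plant_vs_leaf:.3f}")
print(f"F plant (vs error) = {f_plant_vs_error:.2f}")
print(f"F leaf  (vs error) = {f_leaf:.2f}")
print(f"variance components: plant {var_plant:.6f}, leaf {var_leaf:.6f}, error {var_error:.6f}")
```

When leaves are a random sample within each plant, variation among leaves is part of the noise against which plants are compared, which is why the NESTED output lists LEAF as the error term for PLANT.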

