University of Ottawa - Bio 4158 – Applied Biostatistics © Antoine Morin and Scott Findlay 06/07/2016 6:16 AM 1 Single classification analysis of variance.


1 Single classification analysis of variance (ANOVA)
–When to use ANOVA
–ANOVA models and partitioning sums of squares
–ANOVA: hypothesis testing
–ANOVA: assumptions
–A non-parametric alternative: Kruskal-Wallis ANOVA
–Power analysis in single classification ANOVA

2 When to use ANOVA
Tests for effect of “discrete” independent variables.
Each independent variable is called a factor, and each factor may have two or more levels or treatments (e.g. crop yields with nitrogen (N) or nitrogen and phosphorus (N + P) added).
ANOVA tests whether all group means are the same.
Use when number of levels (groups) is greater than two.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with group means $\mu_C$, $\mu_N$ and $\mu_{N+P}$.]

3 Why not use multiple 2-sample tests?
For k comparisons, the probability of accepting a true $H_0$ for all k is $(1 - \alpha)^k$.
For 4 means there are 6 pairwise comparisons, so $(1 - \alpha)^k = (0.95)^6 = 0.735$.
So $\alpha$ (for all comparisons) = 0.265.
So, when comparing the means of four samples from the same population, we would expect to detect a significant difference in at least one pair 27% of the time.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with the pairwise comparisons $\mu_C : \mu_N$, $\mu_N : \mu_{N+P}$ and $\mu_C : \mu_{N+P}$.]
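
The arithmetic on this slide can be checked with a short sketch. The function name `familywise_error` is made up for illustration, and the calculation assumes the pairwise comparisons are independent, as the slide's formula does.

```python
from math import comb

def familywise_error(n_means, alpha=0.05):
    """Return (number of pairwise comparisons, probability of at least one
    false rejection), assuming independent comparisons each run at alpha."""
    k = comb(n_means, 2)                 # number of distinct pairs of means
    return k, 1 - (1 - alpha) ** k       # 1 - P(accept a true H0 in all k)

k, fw = familywise_error(4)
print(k, round(fw, 3))                   # 6 0.265
```

With only three means there are three comparisons, and the same function shows how quickly the error rate grows as groups are added.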

4 What ANOVA does/doesn’t do
Tells us whether all group means are equal (at a specified $\alpha$ level)...
...but if we reject $H_0$, the ANOVA does not tell us which pairs of means are different from one another.
[Figure: two sets of frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with group means $\mu_C$, $\mu_N$ and $\mu_{N+P}$.]

5 Model I ANOVA: effects of temperature on trout growth
3 treatments determined (set) by investigator.
Dependent variable is growth rate, factor (X) is temperature.
Since X is controlled, we can estimate the effect of a unit increase in X (temperature) on growth rate (the effect size)...
…and can predict growth rate at other temperatures.
[Figure: growth rate (cm/day) plotted against water temperature (°C).]

6 Model II ANOVA: geographical variation in body size of black bears
3 locations (groups) sampled from set of possible locations.
Dependent variable is body size, factor (X) is location.
Even if locations differ, we have no idea what factors are controlling this variability...
…so we cannot predict body size at other locations.
[Figure: body size (kg) at Riding Mountain, Kluane and Algonquin.]

7 Model differences
In Model I, the putative causal factor(s) can be manipulated by the experimenter, whereas in Model II they cannot.
In Model I, we can estimate the magnitude of treatment effects and make predictions, whereas in Model II we can do neither.
In one-way (but NOT multi-way!) ANOVA, calculations are identical for both models.

8 How is it done? And why call it ANOVA?
In ANOVA, the total variance in the dependent variable is partitioned into two components:
–among-groups: variance of means of different groups (treatments)
–within-groups (error): variance of individual observations within groups around the mean of the group

9 Statistical analysis as model building
All statistical analyses begin with a mathematical model that supposedly “describes” the data, e.g., regression, ANOVA.
“Model fitting” is then the process by which model parameters are estimated.
[Figure: two panels, linear regression (Y against X) and ANOVA (Y for Groups 1, 2 and 3).]

10 Least squares estimation (LSE)
An ordinary least squares (OLS) estimate of a model parameter $\theta$ is that which minimizes the sum of squared differences between observed and predicted values: $SS_R = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Predicted values are derived from some model whose parameters we wish to estimate.
[Figure: $SS_R$ plotted against $\theta$, with its minimum at the OLS estimate.]

11 Example: LSE of model parameters in simple linear regression
Data consist of a set of n paired observations $(x_1, y_1), \ldots, (x_n, y_n)$.
The “model” for the ith observation is: $y_i = \alpha + \beta x_i + \varepsilon_i$
What is the LSE of the model parameters $\alpha$ and $\beta$?
Residual: $\varepsilon_i = y_i - \hat{y}_i$
[Figure: Y against X with a fitted line; the residual is the vertical distance between an observation and the line.]
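
For simple linear regression the least squares estimates have a closed form, which the following sketch computes. The function names and the data are hypothetical, chosen only to illustrate the idea; the intercept and slope correspond to the two parameters of the straight-line model.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (intercept, slope) for a straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def ss_residual(x, y, intercept, slope):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_fit(x, y)
best = ss_residual(x, y, intercept, slope)
```

Any other choice of intercept or slope, for example `ss_residual(x, y, intercept, slope + 0.1)`, gives a larger residual sum of squares than `best`, which is what "least squares" means.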

12 The general ANOVA model
The general model is: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where $\mu$ is the grand mean, $\alpha_i$ is the effect of group i, and $\varepsilon_{ij}$ is the residual for observation j in group i.
ANOVA algorithms fit the above model (by ordinary least squares) to estimate the $\alpha_i$’s.
$H_0$: all $\alpha_i$’s = 0
[Figure: Y by group (Groups 1, 2 and 3), showing the grand mean, the group effects and the residuals.]

13 Partitioning the total sums of squares
Total SS = Model (Groups) SS + Error SS
$\sum_{i}\sum_{j} (Y_{ij} - \bar{Y})^2 = \sum_{i} n_i (\bar{Y}_i - \bar{Y})^2 + \sum_{i}\sum_{j} (Y_{ij} - \bar{Y}_i)^2$
[Figure: Y by group (Groups 1, 2 and 3), showing deviations from the grand mean and from the group means.]
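
A minimal sketch of the partition, using made-up data for three groups. The function name `partition_ss` and the values are assumptions for illustration; the point is that the groups and error sums of squares add up to the total.

```python
def partition_ss(groups):
    """Return (SS total, SS groups, SS error) for a list of groups."""
    pooled = [y for g in groups for y in g]
    grand_mean = sum(pooled) / len(pooled)
    # Total: every observation around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in pooled)
    # Groups: each group mean around the grand mean, weighted by group size.
    ss_groups = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Error: every observation around its own group mean.
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return ss_total, ss_groups, ss_error

# Hypothetical data: three groups of three observations.
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
ss_total, ss_groups, ss_error = partition_ss(groups)
print(ss_total, ss_groups, ss_error)     # 60.0 54.0 6.0
```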

14 The ANOVA table
Source of variation   Sum of squares   Degrees of freedom (df)   Mean square   F
Groups                SS groups        k - 1                     SS/df         MS groups / MS error
Error                 SS error         n - k                     SS/df
Total                 SS total         n - 1

15 Variance components and group means
MS groups measures average squared difference among group means.
MS error is a measure of precision.
[Figure: two sets of frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with group means $\mu_C$, $\mu_N$ and $\mu_{N+P}$.]

16 ANOVA: the null hypothesis
$H_0$: all group means are the same, or...
$H_0$: all group effects ($\alpha_i$) are zero, or...
$H_0$: F = MS groups / MS error = 1
For k groups and N observations, compare with the F distribution at the desired $\alpha$ level with k - 1 and N - k degrees of freedom.
[Figure: two sets of frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with group means $\mu_C$, $\mu_N$ and $\mu_{N+P}$.]

17 Lab example: temporal variation in size of sturgeon (Model II ANOVA)
Prediction: dam construction resulted in loss of large sturgeon.
Test: compare sturgeon size before and after dam construction.
$H_0$: mean size is the same for all years (?)
[Figure annotation: Dam construction]

18 Temporal variation in size of sturgeon (ANOVA results)
    *** Analysis of Variance Model ***
    Short Output:
    Call:
       aov(formula = FKLNGTH ~ YEAR, data = Dam10dat, na.action = na.exclude)
    Terms:
                        YEAR Residuals
     Sum of Squares  485.264  3095.295
    Deg. of Freedom        3       114
    Residual standard error: 5.210731
    Estimated effects may be unbalanced
    Type III Sum of Squares
               Df Sum of Sq  Mean Sq F Value        Pr(F)
         YEAR   3   485.264 161.7547 5.95744 0.0008246026
    Residuals 114  3095.295  27.1517
Conclusion: reject $H_0$
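
The mean squares and F value in this output follow directly from the sums of squares and degrees of freedom. The sketch below recomputes them from the printed numbers; the function name `anova_f` is made up for illustration, and the p-value is not recomputed because that would need an F distribution routine outside the standard library.

```python
def anova_f(ss_groups, df_groups, ss_error, df_error):
    """Return (MS groups, MS error, F) from sums of squares and df."""
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return ms_groups, ms_error, ms_groups / ms_error

# Values copied from the YEAR and Residuals rows of the output above.
ms_groups, ms_error, f_value = anova_f(485.264, 3, 3095.295, 114)
residual_se = ms_error ** 0.5            # "Residual standard error"

print(round(ms_groups, 4), round(ms_error, 4), round(f_value, 5))
```

The residual standard error reported by the software is simply the square root of MS error.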

19 ANOVA assumptions
Residuals are independent of one another.
Residuals are normally distributed.
Variance of residuals within groups is the same for all groups (homoscedasticity).
Note: all assumptions apply to the residuals, not the raw data, so all tests of assumptions are done after the analysis is completed (and residuals have been generated).

20 The general ANOVA model
The general model is: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
… so the predicted value of all observations in the ith group is $\hat{Y}_i = \mu + \alpha_i$
The difference between the predicted value for an observation and the observed value is its residual, $\varepsilon_{ij} = Y_{ij} - \hat{Y}_i$.
[Figure: Y by group (Groups 1, 2 and 3), showing the grand mean, the group effects and the residuals.]

21 Why do observations need to be independent?
If observations are not independent, then the true degrees of freedom are fewer (sometimes far fewer) than the calculated degrees of freedom …
… the distribution used to calculate p will be wrong …
… and p will be smaller than it ought to be.
[Figure: probability against calculated t, comparing the t distribution for the calculated df with that for the true df.]

22 Checking independence of observations (residuals)
Does the experimental design suggest that sampling units may not be independent (e.g. spatiotemporal correlation)?
Do autocorrelation plots to check for serial autocorrelation.
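
An autocorrelation plot shows the correlation between residuals and lagged copies of themselves. As a rough sketch, the function below computes the usual sample autocorrelation at a given lag; the name `autocorrelation` and the example series are assumptions for illustration, and the residuals are assumed to be listed in the order the observations were collected.

```python
def autocorrelation(residuals, lag=1):
    """Sample autocorrelation of a residual series at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denominator = sum(d * d for d in dev)
    numerator = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numerator / denominator

# Hypothetical residual series: one trending, one alternating in sign.
trending = [1, 2, 3, 4, 5, 6]
alternating = [1, -1, 1, -1, 1, -1]
print(autocorrelation(trending), autocorrelation(alternating))
```

Values near zero at every lag are what independent residuals would be expected to give; a strongly positive or negative value at lag 1, as in these two series, suggests serial dependence.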

23 Testing normality of residuals
Generate normal probability plot of residuals and check for linearity.
If warranted, run Lilliefors test, keeping in mind the power issue!
[Figure: normal probability plot of residuals, annotated “Outliers?”.]

24 Testing homoscedasticity I: plotting residuals against estimates
Does “spread” of residuals appear the same for each group?
[Figure: residuals plotted against estimates, with labelled observations and one annotated “Outlier?”.]

25 Testing homoscedasticity II: Levene’s test
Calculate mean absolute residual for each group.
Does this value vary among groups?

26 Testing homoscedasticity II: Levene’s test (cont’d)
    *** Analysis of Variance Model ***
    Short Output:
    Call:
       aov(formula = absres ~ YEAR, data = Dam10dat, na.action = na.exclude)
    Terms:
                        YEAR Residuals
     Sum of Squares  108.083  1413.692
    Deg. of Freedom        3       114
    Residual standard error: 3.521478
    Estimated effects may be unbalanced
               Df Sum of Sq  Mean Sq  F Value      Pr(F)
         YEAR   3   108.083 36.02755 2.905257 0.03782483
    Residuals 114  1413.692 12.40081
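
The output above is a one-way ANOVA run on the absolute residuals (`absres`), which is how Levene's test is carried out here. A minimal sketch of the same two steps follows, on made-up data; the function names are hypothetical, and the residuals are taken around group means as in the original form of the test.

```python
def one_way_f(groups):
    """F statistic of a one-way ANOVA on a list of groups."""
    pooled = [y for g in groups for y in g]
    grand_mean = sum(pooled) / len(pooled)
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_groups = len(groups) - 1
    df_error = len(pooled) - len(groups)
    return (ss_groups / df_groups) / (ss_error / df_error)

def levene_f(groups):
    """Levene's test statistic: one-way ANOVA on absolute residuals."""
    abs_residuals = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    return one_way_f(abs_residuals)

# Hypothetical data: same centre, very different spread.
groups = [[4, 5, 6], [2, 5, 8], [5, 5, 5]]
levene = levene_f(groups)
print(round(levene, 3))                  # 2.8
```

The resulting F is compared with the F distribution on k - 1 and N - k degrees of freedom, exactly as for the main analysis.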

27 Effects of violations of assumptions
Calculation of p assumes p(F) = p(F*) …
… but as residuals conform less to required assumptions, the deviation between the two increases.
Therefore, calculated p values are incorrect.
[Figure: probability against F, comparing the true F (F*) with the calculated F under high and low conformity to the assumptions.]

28 Robustness of ANOVA with respect to violation of assumptions

29 Residual analysis: questions
Which assumptions are not met, and how robust is ANOVA to their violation?
What is the sample size?
Is the violation of assumptions due to a couple of outliers?
How close is p to $\alpha$?
Possible remedies:
–Eliminate outliers and rerun analysis.
–Transform data.
–Try a non-parametric alternative (generally recommended if sample sizes are small, i.e. < 10 per group) such as Kruskal-Wallis ANOVA.

30 A non-parametric alternative: Kruskal-Wallis ANOVA
Calculate rank sum ($R_g$) for each group.
$H_0$: $R_C = R_1 = R_2$
Calculate K-W H statistic: $H = \frac{12}{N(N+1)} \sum_{g=1}^{k} \frac{R_g^2}{n_g} - 3(N+1)$, where N is the total number of observations and $n_g$ is the number of observations in group g …
… which is distributed as $\chi^2$ with k - 1 df if the sample size in each group is not too small; otherwise use critical values for H.
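
A minimal sketch of the H calculation, using the standard Kruskal-Wallis formula on rank sums. The function names and data are hypothetical; tied values receive the average of their ranks, and no tie correction is applied to H, which is a simplification.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1), without tie correction."""
    pooled = [y for g in groups for y in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])   # R_g for this group
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical data: three completely separated groups.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3))                       # 7.2
```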

31 Power and sample size in single-classification ANOVA
If $H_0$ is true, then variance ratio MS groups / MS error follows central F distribution.
But, if $H_0$ is false, then MS groups / MS error follows non-central F, defined by $\nu_1$, $\nu_2$ and non-centrality parameter $\phi$.
So, power calculations depend on non-central F.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with group means $\mu_C$, $\mu_N$ and $\mu_{N+P}$.]

32 Power and sample size in single-classification ANOVA
Power of a test involving k groups with n replicates per group at specified $\alpha$ can be calculated when (1) group means are known, or (2) the minimal detectable difference is specified.
Estimation of minimum sample size and minimal detectable difference among groups.
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with group means $\mu_C$, $\mu_N$ and $\mu_{N+P}$.]

33 Power and sample size in single-classification ANOVA
ANOVA with k groups with n replicates per group at specified $\alpha$.
If we have an estimate of the within-group variability $s^2$ (MS error), we can calculate $\phi$: $\phi = \sqrt{\frac{n \sum_{i=1}^{k} (\mu_i - \mu)^2}{k s^2}}$
[Figure: frequency distributions of yield for the Control, Experimental (N) and Experimental (N+P) groups, with group means $\mu_C$, $\mu_N$ and $\mu_{N+P}$.]
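
As a rough sketch, the non-centrality parameter for known group means can be computed as below. It assumes the form used in Zar (1996), the square root of n times the sum of squared deviations of the group means from their overall mean, divided by k times the within-group variance; the function names and numbers are made up for illustration.

```python
from math import sqrt

def phi_from_means(group_means, n, s2):
    """Non-centrality parameter for k known group means, n replicates per
    group and within-group variance s2 (assumed Zar-style formula)."""
    k = len(group_means)
    grand_mean = sum(group_means) / k
    ss_means = sum((m - grand_mean) ** 2 for m in group_means)
    return sqrt(n * ss_means / (k * s2))

def f_degrees_of_freedom(k, n):
    """Numerator and denominator df for k groups of n replicates."""
    return k - 1, k * (n - 1)

# Hypothetical design: three group means, 5 replicates each, s2 = 4.
phi = phi_from_means([10, 12, 14], n=5, s2=4.0)
nu1, nu2 = f_degrees_of_freedom(k=3, n=5)
print(round(phi, 3), nu1, nu2)           # 1.826 2 12
```

These three numbers, together with the chosen significance level, are what is needed to look up power in published power curves.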

34 Calculating power given $\phi$
Given $\nu_1$, $\nu_2$, $\alpha$ and $\phi$, we can read 1 - $\beta$ from suitable tables or curves (e.g. Zar (1996), Appendix Figure B.1).
[Figure: power curves for $\nu_1$ = 2, showing 1 - $\beta$ against $\phi$ for $\alpha$ = .05 and $\alpha$ = .01, with one curve per value of $\nu_2$ (decreasing $\nu_2$).]

35 Model I ANOVA: minimal detectable difference
Suppose we want to detect a difference between the two most different sample means of at least $\delta$.
To test at the $\alpha$ significance level with 1 - $\beta$ power, we can calculate the minimal sample size $n_{min}$ required to detect $\delta$, given a sample group variance $s^2$, by solving iteratively.
[Figure: frequency distributions of yield with group means $\mu_C$, $\mu_N$ and $\mu_{N+P}$ and the difference $\delta$ marked.]
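
One way to organise the iteration is to tabulate the non-centrality parameter for a range of candidate sample sizes and look each row up in power curves until the target power is reached. The sketch below assumes the form given in Zar (1996) for a minimal detectable difference, the square root of n times delta squared over 2 k s squared; that formula, the function names and the numbers are assumptions for illustration, and the power lookup itself is not implemented.

```python
from math import sqrt

def phi_from_delta(delta, n, k, s2):
    """Non-centrality parameter for a minimal detectable difference delta
    between the two most different means (assumed Zar-style formula)."""
    return sqrt(n * delta ** 2 / (2 * k * s2))

def candidate_table(delta, k, s2, n_values):
    """Rows of (n, nu1, nu2, phi) for each candidate sample size n."""
    rows = []
    for n in n_values:
        rows.append((n, k - 1, k * (n - 1), phi_from_delta(delta, n, k, s2)))
    return rows

# Hypothetical design: detect delta = 2 among k = 3 groups with s2 = 4.
rows = candidate_table(delta=2.0, k=3, s2=4.0, n_values=range(5, 21, 5))
for n, nu1, nu2, phi in rows:
    print(n, nu1, nu2, round(phi, 3))
```

Because the non-centrality parameter and the error degrees of freedom both grow with n, power rises with each row; the smallest n whose row reaches the target power is the minimal sample size.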

36 Model I ANOVA: power of the test
If $H_0$ is accepted, it is good practice to calculate power!
Knowing MS groups, $s^2$ (= MS error), and k, we can calculate $\phi$.
[Figure: frequency distributions of yield with group means $\mu_C$, $\mu_N$ and $\mu_{N+P}$.]

37 Power of the test: an example
Effect of temperature on insect development time.
4 eggs each at two temperatures, 5 at the third (k = 3, $n_1 = n_2 = 4$, $n_3 = 5$).
So, there is a 67% chance of committing a Type II error ($\beta$ = 0.67, power = 0.33).

38 Factors determining power in single classification ANOVA
Power increases with increasing $\phi$.
Therefore, power increases with (1) increasing sample size n; (2) increasing differences among group means (MS groups); (3) decreasing number of groups; (4) decreasing within-group variability $s^2$ (MS error).

39 Power in single-classification Model II ANOVA
In this case, we can calculate 1 - $\beta$ from the central F distribution.
Knowing $\nu_1$, $\nu_2$, $\alpha$ and MS groups, we can estimate 1 - $\beta$.
[Figure: body size (kg) at Riding Mountain, Kluane and Algonquin.]

40 Power in non-parametric single-classification ANOVA
If assumptions of parametric ANOVA are met, then non-parametric ANOVA is 3/$\pi$ = 95% as powerful.
If non-parametric ANOVA is used, calculate power for parametric ANOVA to get a rough estimate of power of non-parametric test.

41 Power with G*Power

42 Concepts map

