Basics of ANOVA Why ANOVA Assumptions used in ANOVA


Basics of ANOVA
- Why ANOVA
- Assumptions used in ANOVA
- Various forms of ANOVA
- Simple ANOVA tables
- Interpretation of values in the table
- R commands for ANOVA
- Exercises

Why ANOVA

If we have two samples then under mild conditions we can use a t-test to check whether the difference between the means is significant. When there are more than two samples, repeated use of t-tests becomes unreliable. ANalysis Of VAriance (ANOVA) is designed to test differences between means when there are many samples.

Examples of ANOVA: Suppose that we want to test the effect of various exercises on weight loss, using 5 different exercises. We recruit 20 men and assign four of them to each exercise. After a few weeks we record the weight loss. Let i = 1,...,5 denote the exercise number and j = 1,...,4 the person number. Then Yij is the weight loss of the jth person on the ith exercise programme. This is a one-way balanced ANOVA: one-way because there is only one category (exercise programme), balanced because there is exactly the same number of men on each exercise programme.

Another example: Now we subdivide each exercise into 4 subcategories and recruit four men for each subcategory of each exercise. Again we measure weight loss after a few weeks. With i the exercise category, j the exercise subcategory and k the man, Yijk is the weight loss of the kth man in the jth subcategory of the ith category. The number of observations is 5x4x4 = 80. This is a two-fold nested ANOVA. We want to test: a) there are no significant differences between categories; b) there are no significant differences between subcategories.
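The one-way balanced design described above can be sketched in R with simulated data; the group means and noise level below are invented for illustration, not taken from the slides.

```r
# One-way balanced ANOVA: 5 exercise programmes, 4 men each (simulated data)
set.seed(1)
exercise <- factor(rep(1:5, each = 4))               # i = 1..5, four men each
y <- rnorm(20, mean = rep(2:6, each = 4), sd = 0.5)  # hypothetical weight losses
fit <- lm(y ~ exercise)                              # one-way ANOVA as a linear model
anova(fit)                                           # 4 df between groups, 15 df error
```

With 5 groups the hypothesis has 5 - 1 = 4 degrees of freedom and the error has 20 - 5 = 15, illustrating the degrees-of-freedom rules discussed later.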

Examples of ANOVA

One more example: We have 5 categories of exercises and 4 categories of diets. For each exercise-diet combination we recruit 4 persons, so there are 5x4x4 = 80 men. This is a two-way crossed ANOVA: two-way because the men are categorised in two ways, by exercise and by diet. The model is also balanced: there is exactly the same number of men for each exercise-diet combination. With i the exercise number, j the diet number and k the person, Yijk is the weight loss of the kth person on the ith exercise and jth diet.

In this case we can test two different types of hypotheses. Assume that the mean for each exercise-diet combination is μij. If we assume that the model is additive, i.e. the effects of exercise and diet add up, then μij = αi + βj, where αi is the effect of the ith exercise and βj is the effect of the jth diet. Then we want to test the following hypotheses: a) μij does not depend on exercise; b) μij does not depend on diet. Sometimes we do not want to assume additivity. Then we want to test one more hypothesis: that the model is additive. If the model is not additive then there might be problems interpreting the other hypotheses; in that case it might be useful to apply a transformation that makes the model additive.

Models used for ANOVA can be made more and more complicated. We can design three- or four-way crossed models or nested models, and we can combine nested and crossed models together. The number of possible ANOVA models is very large.
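The two-way crossed design and the additivity hypothesis can be sketched as follows; the effect sizes are simulated assumptions, and the additivity test is carried out by comparing the additive model against the full model with interaction.

```r
# Two-way crossed balanced design: 5 exercises x 4 diets x 4 men (simulated)
set.seed(2)
exercise <- factor(rep(1:5, each = 16))            # i: exercise number
diet     <- factor(rep(rep(1:4, each = 4), 5))     # j: diet number, crossed
# hypothetical additive means alpha_i + beta_j, plus noise
y <- rnorm(80, mean = as.numeric(exercise) + 0.5 * as.numeric(diet), sd = 1)
full     <- lm(y ~ exercise * diet)   # unrestricted mu_ij (with interaction)
additive <- lm(y ~ exercise + diet)   # mu_ij = alpha_i + beta_j
anova(additive, full)                 # F-test of the additivity hypothesis
```

The interaction being tested has (5 - 1)(4 - 1) = 12 degrees of freedom.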

Assumptions

ANOVA models are special cases of linear models. We can write the model as

  Y = μ + ε

where Y is the observation vector, μ is the vector of means composed of the treatment means and ε is the error vector. The basic assumptions in ANOVA models are:

1) Expected values of the errors are 0
2) Variances of all errors are equal to each other
3) Errors are independent
4) Errors are normally distributed

All ANOVA treatments are very sensitive to assumptions 1)-3). F-tests are meant to be robust against assumption 4). If assumptions 1)-3) are valid then 4) will hold at least asymptotically, i.e. for a large number of observations.
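These assumptions can be inspected after fitting. A minimal sketch on simulated data; the dataset and the particular diagnostic tests below are illustrative choices, not prescribed by the slides.

```r
# Residual diagnostics for the ANOVA assumptions (simulated one-way data)
set.seed(3)
g <- factor(rep(1:3, each = 10))
y <- rnorm(30, mean = as.numeric(g))
fit <- lm(y ~ g)
r <- residuals(fit)
mean(r)                 # assumption 1: residuals average to ~0 by construction
bartlett.test(r ~ g)    # assumption 2: equal error variances across groups
shapiro.test(r)         # assumption 4: normality of the errors
plot(fit, which = 1:2)  # residuals vs fitted, and normal Q-Q plot
```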

ANOVA tables

Standard ANOVA tables look like:

effect   df   SSh   MS            F         prob
v1       d1   SS1   MS1 = SS1/d1  MS1/MSe   pr1
...
vp       dp   SSp   MSp = SSp/dp  MSp/MSe   prp
error    de   SSe   MSe = SSe/de
total    N    SSt

Here v1,...,vp are the values we want to test for being 0, df is the number of degrees of freedom corresponding to each value, and SSh is the corresponding sum of squares (h denotes hypothesis). F is the value referred to the F distribution with (di, de) degrees of freedom, and prob is the corresponding probability. If the probability is very low then we reject the hypothesis that the value is 0; if it is not small enough then we do not reject the null hypothesis.

These values are calculated using the likelihood ratio test. Say we want to test the hypothesis H0: vi = 0 vs H1: vi ≠ 0. We maximise the likelihood under the null hypothesis and find the corresponding variance, then maximise the likelihood under the alternative hypothesis and find its variance. From the sums of squares for the null and alternative hypotheses we then form the F statistic.
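The columns of such a table can be reproduced by hand from the sums of squares. A sketch with simulated data (the group structure is chosen only for illustration):

```r
# Rebuilding MS, F and prob from the sums of squares by hand
set.seed(4)
g <- factor(rep(1:4, each = 6))       # 4 groups, 6 observations each
y <- rnorm(24, mean = as.numeric(g))
tab <- anova(lm(y ~ g))
SSh <- tab[["Sum Sq"]][1]; dh <- tab$Df[1]    # hypothesis row
SSe <- tab[["Sum Sq"]][2]; de <- tab$Df[2]    # error row
Fval <- (SSh / dh) / (SSe / de)               # F = MSh / MSe
pval <- pf(Fval, dh, de, lower.tail = FALSE)  # the prob column
all.equal(Fval, tab[["F value"]][1])          # agrees with R's own table
```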

LR test for ANOVA

From the maximised likelihoods under the null and alternative hypotheses we obtain the hypothesis and error sums of squares, SSh and SSe, and form the statistic

  F = (SSh/dfh) / (SSe/dfe)

Since the first sum of squares is χ² with dfh degrees of freedom and the second is χ² with dfe degrees of freedom, and they are independent, their ratio has an F distribution with (dfh, dfe) degrees of freedom. This ratio has the F distribution if the null hypothesis is true; otherwise it has a non-central F distribution. The number of degrees of freedom of the hypothesis is defined by the number of constraints it implies; in the simplest case it is the number of elements in the category minus 1. The number of degrees of freedom of the error is, as usual, the number of observations minus the number of parameters.

Using this type of ANOVA table we can only tell whether there are significant differences between the means. It does not tell us which ones are significantly different.
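The distributional claim can be checked by simulation: the ratio of two independent χ² variables, each divided by its degrees of freedom, matches the F distribution. The degrees of freedom below are arbitrary choices.

```r
# Ratio of scaled chi-squares vs the F distribution (simulation check)
set.seed(5)
dh <- 3; de <- 20
x <- (rchisq(1e5, dh) / dh) / (rchisq(1e5, de) / de)
quantile(x, 0.95)   # empirical 95% point of the ratio
qf(0.95, dh, de)    # theoretical F(3, 20) quantile; should be close
```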

Example: Two-way ANOVA

Let us consider an example taken from Box, Hunter and Hunter. An experiment was done on animals: the survival times of animals given various poisons and treatments were recorded (columns are treatments A-D, with four replicate rows per poison):

poison      A     B     C     D
I         0.31  0.82  0.43  0.45
          0.45  1.10  0.45  0.71
          0.46  0.88  0.63  0.66
          0.43  0.72  0.76  0.62
II        0.36  0.92  0.44  0.56
          0.29  0.61  0.35  1.02
          0.40  0.49  0.31  0.71
          0.23  1.24  0.40  0.38
III       0.22  0.30  0.23  0.30
          0.21  0.37  0.25  0.36
          0.18  0.38  0.24  0.31
          0.23  0.29  0.22  0.33
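The table can be entered in R and analysed directly. A sketch assuming the layout above (columns A-D are treatments, four replicate rows per poison); the factor names pois and treat match the R output shown on the following slides.

```r
# Box, Hunter and Hunter poison data: 3 poisons x 4 treatments x 4 animals
y <- c(0.31, 0.82, 0.43, 0.45,  0.45, 1.10, 0.45, 0.71,
       0.46, 0.88, 0.63, 0.66,  0.43, 0.72, 0.76, 0.62,  # poison I
       0.36, 0.92, 0.44, 0.56,  0.29, 0.61, 0.35, 1.02,
       0.40, 0.49, 0.31, 0.71,  0.23, 1.24, 0.40, 0.38,  # poison II
       0.22, 0.30, 0.23, 0.30,  0.21, 0.37, 0.25, 0.36,
       0.18, 0.38, 0.24, 0.31,  0.23, 0.29, 0.22, 0.33)  # poison III
pois  <- factor(rep(c("I", "II", "III"), each = 16))
treat <- factor(rep(rep(c("A", "B", "C", "D"), times = 4), 3))
anova(lm(y ~ pois * treat))      # raw scale, with interaction
anova(lm(1/y ~ pois * treat))    # reciprocal transformation
```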

ANOVA table

The ANOVA table produced by R:

            Df  Sum Sq Mean Sq F value    Pr(>F)
pois         2 1.03828 0.51914 22.5135 4.551e-07 ***
treat        3 0.92569 0.30856 13.3814 5.057e-06 ***
pois:treat   6 0.25580 0.04263  1.8489    0.1170
Residuals   36 0.83013 0.02306

The most important values are F and Pr(>F). In this table we have tests for pois and treat, and moreover the "interaction" between these two categories. Interaction means that it would be difficult to separate the effects of the two categories; they should be considered simultaneously. The Pr(>F) for the interaction is not very small, but it is not large enough to discard the interaction effects. In such situations a transformation of the variables might help. Let us consider the ANOVA table for the transformed observations, using the transformation 1/y. Now the ANOVA table looks like:

            Df Sum Sq Mean Sq F value    Pr(>F)
pois         2 34.903  17.452 72.2347 2.501e-13 ***
treat        3 20.449   6.816 28.2131 1.457e-09 ***
pois:treat   6  1.579   0.263  1.0892    0.3874
Residuals   36  8.697   0.242

ANOVA table

According to this table the Pr(>F) corresponding to the interaction term is high, which means that the interaction is not significant for the transformed variables. We can therefore drop the interaction terms and build the ANOVA table without them:

            Df Sum Sq Mean Sq F value    Pr(>F)
pois         2 34.903  17.452  71.326 3.124e-14 ***
treat        3 20.449   6.816  27.858 4.456e-10 ***
Residuals   42 10.276   0.245

Now we can say that there are significant differences between poisons as well as between treatments.

It is sometimes wise to use a transformation to reduce the effect of interactions. Several different transformations (inverse, inverse square, log) can be tried; ANOVA tables are built for each of them, and by inspection you can decide which transformation gives the best results. The following argument can be used to justify transformation: if the effects of two categories are multiplicative then their logarithms have additive effects, and additive effects are easier to interpret than others.
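The multiplicative-to-additive argument can be verified numerically: if the cell means are mu_ij = a_i * b_j, then log mu_ij = log a_i + log b_j is exactly additive, so the interaction sum of squares vanishes on the log scale. The effect values below are arbitrary.

```r
# Multiplicative effects become additive after a log transformation
a <- c(1, 2, 4); b <- c(1, 3)          # hypothetical multiplicative effects
y  <- as.vector(outer(a, b))           # cell means mu_ij = a_i * b_j
f1 <- factor(rep(1:3, times = 2))      # row factor
f2 <- factor(rep(1:2, each = 3))       # column factor
anova(lm(y ~ f1 * f2))[["Sum Sq"]][3]        # interaction SS > 0 on raw scale
anova(lm(log(y) ~ f1 * f2))[["Sum Sq"]][3]   # ~0 on the log scale
```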

R commands for ANOVA

There are basically two types of commands in R: first fit a general linear model, then analyse the results. The command to fit a linear model is lm, used as

lm(response ~ formula)

The formula defines the design matrix (see the help page for formula). For example, for the PlantGrowth data (available in R) we can use:

data(PlantGrowth)   # load the data into R from the standard package
lmPlant = lm(PlantGrowth$weight ~ PlantGrowth$group)

The linear model is then fitted to the data and the result is stored in lmPlant. Now we can analyse it:

anova(lmPlant)      # gives the ANOVA table

If there is more than one factor (category) then for two-way crossed designs we can use:

lm(data ~ f1*f2)    # fits the complete model with interactions
lm(data ~ f1+f2)    # fits only the additive model
lm(data ~ f1+f1:f2) # fits f1 and the interaction between f1 and f2; used for nested models

Other useful commands for linear models and their analysis are:

summary(lmPlant)    # gives a summary after fitting
plot(lmPlant)       # produces several useful plots

Please let me know if any of the results is not clear; then we can discuss and try to sort out the problems.
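The PlantGrowth commands above run end-to-end as follows; the data= form of lm used here is an equivalent, slightly tidier spelling of the slide's version.

```r
# One-way ANOVA on the built-in PlantGrowth data (30 plants, 3 groups)
data(PlantGrowth)
lmPlant <- lm(weight ~ group, data = PlantGrowth)
anova(lmPlant)      # ANOVA table with 2 and 27 degrees of freedom
summary(lmPlant)    # coefficients and the overall F-test
```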

R commands for ANOVA

Another useful command for ANOVA is

confint(lmPlant)

This command gives confidence intervals for the coefficients and therefore for differences between the effects of different factor levels. To find confidence intervals for the difference between any two given effects one can use the bootstrap.
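A minimal sketch of confint on the PlantGrowth fit from the previous slide; the 99% call is shown only to illustrate the level argument.

```r
# Confidence intervals for the coefficients of a one-way ANOVA fit
data(PlantGrowth)
lmPlant <- lm(weight ~ group, data = PlantGrowth)
confint(lmPlant)                # 95% intervals: ctrl mean, trt1-ctrl, trt2-ctrl
confint(lmPlant, level = 0.99)  # other levels via the level argument
```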

Bootstrap for ANOVA

Algorithm for the bootstrap:

1) Use lm to fit the model to the data
2) Resample the residuals and add them to the fitted values
3) Use lm to fit the model to these new observations
4) Save the coefficients
5) Repeat steps 2)-4) (around 200-2000 times)
6) Build distributions for each coefficient and other statistics of interest

For confidence intervals one can then use:

lm1 = sort(lmBoot$coefficients[2,])
l = length(lm1)
lmlow = lm1[round(0.025*l)]
lmhigh = lm1[round(0.975*l)]

This gives the lower and upper limits of the 95% confidence interval. An example of an implementation is:

http://www.ysbl.york.ac.uk/~garib/mres_course/2005/boot_anova.r
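The steps above can be sketched as a self-contained residual bootstrap on the PlantGrowth data. The number of resamples B, the coefficient index and the variable names are illustrative assumptions; the linked script is the course's own implementation.

```r
# Residual bootstrap for a one-way ANOVA (steps 1-6 above)
set.seed(6)
data(PlantGrowth)
fit <- lm(weight ~ group, data = PlantGrowth)     # step 1: fit the model
fv  <- fitted(fit); res <- residuals(fit)
B   <- 500                                        # number of resamples
coefBoot <- replicate(B, {
  ystar <- fv + sample(res, replace = TRUE)       # step 2: resample residuals
  coef(lm(ystar ~ group, data = PlantGrowth))     # steps 3-4: refit, keep coefs
})
# step 6: percentile 95% interval for the trt1 - ctrl effect
quantile(coefBoot[2, ], c(0.025, 0.975))
```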

Exercise 2

Analyse these data using ANOVA:

http://www.ysbl.york.ac.uk/~garib/mres_course/2005/exercise_2.html

What do you think about the differences?

References

Stuart, A., Ord, J.K. and Arnold, S. (1999) Kendall's Advanced Theory of Statistics, Volume 2A.
Box, G.E.P., Hunter, W.G. and Hunter, J.S. (1978) Statistics for Experimenters.