# ANOVA and Linear Models

## Data

Data is from the University of York project on variation in British liquids.

JK Local, Alan Wrench, Paul Carter

## Correlation

When we have two variables, we can measure the strength of their linear association by correlation. In the strict statistical sense, correlation is the linear relationship between two variables.

## Correlation

Often we are interested not in the differences between two groups but in the relationship between two variables measured on the same set of subjects.

- Ex: Are post-graduate salary and GPA related?
- Ex: Is the F1.0 measurement related to the F1.1 measurement?

Correlation is a measure of LINEAR dependence. Non-linear dependencies have to be modeled separately.

## Correlation

There is a theoretical (population) correlation, usually represented by $\rho_{X,Y}$. We can calculate the sample correlation between two variables (x, y) with the Pearson coefficient (given to the left on the original slide). It varies between -1.0 and 1.0; the sign indicates the direction of the relationship.
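The formula image from the slide is not part of this transcript. For reference, the usual textbook definition of the sample Pearson coefficient for $n$ paired observations $(x_i, y_i)$ is given below; the notation ($r$, $\bar{x}$, $\bar{y}$) is the conventional one and is assumed here rather than taken from the slide.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```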

## Correlation

    Pearson's product-moment correlation

    data:  york.data$F1.0 and york.data$F1.1
    t = 45.9262, df = 318, p-value < 2.2e-16
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     0.9161942 0.9452264
    sample estimates:
         cor
    0.932194
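The slides use R; as a language-neutral illustration, here is a minimal Python sketch of how the numbers in this output relate. The function names `pearson_r` and `t_from_r` are made up for this example. The t statistic for testing a zero correlation follows from the estimate and the degrees of freedom (n - 2), so the reported t can be rechecked from the reported `cor` and `df`.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, df):
    """t statistic for H0: correlation = 0, where df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


# Values reported on the slide: cor = 0.932194, df = 318.
t_slide = t_from_r(0.932194, 318)
print(round(t_slide, 2))  # should be close to the reported t = 45.9262
```

The york.data measurements themselves are not in the transcript, so `pearson_r` can only be tried on your own paired data.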

## Correlation Types

- Pearson's r: X and Y are continuous variables.
- Kendall's tau: X and Y are continuous or ordinal. The measure is based on the ranks of X and the ranks of Y rather than on the raw values.
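To make the rank-based idea concrete, here is a small Python sketch of Kendall's tau in its simplest form (tau-a), which counts concordant and discordant pairs. It makes no correction for ties; statistical packages often report a tie-corrected variant, so treat this as an illustration rather than a reproduction of any particular library. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    A pair (i, j) is concordant when x and y order it the same way,
    discordant when they order it oppositely; tied pairs count as neither.
    """
    pairs = list(combinations(range(len(x)), 2))
    concordant = 0
    discordant = 0
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Toy data (hypothetical): one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

Only the ordering of the values matters, which is why the measure also works for ordinal data.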

## One-Way ANOVA

If we want to test the equality of more than two means, we have to use an expanded test: one-way ANOVA.

## An Example

Vowels: a, i, O, u

Are the F1 measurements the same for each corresponding vowel in the segment?

Assumptions: normality, equal variance in each group (level of vowel), and independent measurements.

## The ANOVA Table
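The table on this slide was an image and is missing from the transcript. As a stand-in, the standard one-way ANOVA quantities are sketched below, assuming $k$ groups, $n_i$ observations in group $i$, $N$ observations in total, group means $\bar{y}_i$ and grand mean $\bar{y}$. These symbols are the conventional ones, not necessarily those used on the slide.

```latex
\begin{aligned}
SS_{\text{between}} &= \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2, & df_{\text{between}} &= k - 1 \\
SS_{\text{within}}  &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2, & df_{\text{within}} &= N - k \\
MS &= \frac{SS}{df}, & F &= \frac{MS_{\text{between}}}{MS_{\text{within}}}
\end{aligned}
```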

## Results

    Analysis of Variance Table

    Response: york.data$F1.0
               Df       SS      MS      F    Pr(>F)
    Vowel       3 10830838 3610279 189.96 < 2.2e-16 ***
    Residuals 316  6005850   19006
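The columns of this table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The Python sketch below computes these quantities from raw groups (the function name and toy data are hypothetical) and then rechecks the F value from the sums of squares printed above.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within),
        "ss": (ss_between, ss_within),
        "ms": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


# Toy data (hypothetical), three groups of three measurements.
toy = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

# Recheck the slide's F from its printed SS and Df columns.
f_slide = (10830838 / 3) / (6005850 / 316)
print(round(f_slide, 2))  # should be close to the reported F = 189.96
```

The p-value requires the F distribution, which the Python standard library does not provide, so it is left out of this sketch.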

## What about the assumptions?

Can we test for equal variance? Yes.

If the variance is not equal, is there a solution that will still allow us to use ANOVA? Yes.

## Post-hoc analysis

There is a difference between the mean of at least one vowel and the others, so what?

We can test where the difference is occurring through pairwise t-tests. This type of analysis is often referred to as a post-hoc analysis.

## Bonferroni

    Pairwise comparisons using t tests with pooled SD

    data:  york.data$F1.0 and york.data$Vowel

      a       i       O
    i < 2e-16 -       -
    O < 2e-16 <2e-16  -
    u < 2e-16 1       6.5e-14

    P value adjustment method: bonferroni
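The Bonferroni adjustment itself is a one-liner: multiply each raw p-value by the number of comparisons and cap the result at 1. A minimal Python sketch follows (the function name and the raw p-values are hypothetical; the slide only shows adjusted values). An adjusted value of exactly 1, like the u versus i entry above, is consistent with this cap.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a list of raw p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Four vowels give 4 * 3 / 2 = 6 pairwise comparisons.
raw = [1e-20, 1e-18, 1e-17, 0.4, 1e-14, 1e-19]  # hypothetical raw p-values
adjusted = bonferroni(raw)
print(adjusted[3])  # the large raw p-value is capped at 1.0
```

The adjustment is conservative: it controls the chance of any false positive across all six tests at the cost of some power.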

## Multi-Way ANOVA

Usually we are interested not in one factor alone but in the effects of several factors on our dependent (response) variable.

Same principle, except that now we have several 'between groups variables'.

## Multi-Way ANOVA

               Df   Sum Sq Mean Sq F value    Pr(>F)
    Vowel       3   173482   57827  2.0353 0.1077197
    Liquid      1   216198  216198  7.6092 0.0059747 **
    Sex         1   340872  340872 11.9971 0.0005687 ***
    Residuals 634 18013735   28413

## Testing Assumptions

Bartlett's Test:

- H0: All variances for each of your cells are equal.
- If your p-value is significant (< .05), then you should not be using an ANOVA, but some non-parametric test that relies on ranks.

We don't have to worry as much about the normality assumption with large samples. The central limit theorem states that with enough data you will eventually get normality (of the mean).
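For readers who want to see what Bartlett's test computes, here is a Python sketch of the test statistic using the standard textbook formula. Under H0 the statistic is approximately chi-square distributed with k - 1 degrees of freedom; the p-value lookup is omitted because the standard library has no chi-square distribution. Function names and toy data are hypothetical.

```python
import math


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def bartlett_statistic(groups):
    """Return (statistic, degrees of freedom) for Bartlett's test."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [sample_variance(g) for g in groups]

    # Pooled variance: group variances weighted by their degrees of freedom.
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)

    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1


# Toy data (hypothetical): the second group is twice as spread out.
stat, df = bartlett_statistic([[1, 2, 3], [2, 4, 6]])
print(round(stat, 3), df)
```

When all group variances are identical the statistic is zero, and it grows as the variances diverge.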

## Higher Order Interactions

It often isn't enough to test factors by themselves; we also want to model higher-order interactions.

We are looking at Sex, Liquid and Vowel: there are Sex x Liquid, Sex x Vowel, Vowel x Liquid and Sex x Liquid x Vowel as possible interaction effects.
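The list of possible interactions grows quickly with the number of factors (for k factors there are 2^k - k - 1 of them). A short Python sketch that enumerates them, using a hypothetical helper name:

```python
from itertools import combinations


def interaction_terms(factors):
    """List every interaction of order 2 and above among the given factors."""
    terms = []
    for order in range(2, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(" x ".join(combo))
    return terms


print(interaction_terms(["Sex", "Liquid", "Vowel"]))
# ['Sex x Liquid', 'Sex x Vowel', 'Liquid x Vowel', 'Sex x Liquid x Vowel']
```

With three factors this yields the four interactions named above; a fourth factor would already bring the count to eleven.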

## An Alternative Approach: Linear Model

Linear models offer an easily expandable approach that lets us answer questions more explicitly without having to add more machinery with each new factor or covariate.

The underlying form in an ANOVA is essentially a linear model.

## What would it look like?

In a linear model, we estimate parameters (or coefficients) of the predictors on a response.

Ex: We want to model the effect of Vowels on F1.0
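The model equation on this slide did not survive in the transcript. Judging from the symbols the deck defines ($\alpha$, $\tau_i$, $\varepsilon$), it presumably had the usual one-way form below, where $y_{ij}$ (a label assumed here) is the $j$-th F1.0 measurement for the $i$-th vowel.

```latex
y_{ij} = \alpha + \tau_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```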

## What are the pieces?

- $\alpha$ represents the intercept term and the mean for F1.0 when the type of vowel is controlled for.
- $\tau_i$ represents the treatment effect of the $i$-th vowel.
- $\varepsilon$ represents the noise and is assumed to be $N(0, \sigma^2)$ (i.e. normally distributed with a mean of zero and constant variance).

## Inestimability

We can't really estimate all of the parameters in our model. We don't have a control group where there isn't a vowel effect, so the data determine only the sums $\alpha + \tau_i$ (the vowel means), not $\alpha$ and each $\tau_i$ separately.

## Two Solutions

- Stick with the model. You can only test functions of the parameters, and only if they are estimable. [The hard way, and only if you know a fair amount of linear algebra.]
- Pick a control group and allow that to be your baseline (or alpha).

## The Simple Way

    Call:
    lm(formula = F1.0 ~ Vowel)

    Residuals:
         Min       1Q   Median       3Q      Max
     -322.62  -109.44   -31.20    67.48  1044.13

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   426.43      13.51  31.566   <2e-16 ***
    Voweli        -42.62      19.10  -2.231   0.0260 *
    VowelO        -33.94      19.10  -1.776   0.0761 .
    Vowelu        -35.16      19.10  -1.841   0.0662 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 170.9 on 636 degrees of freedom
    Multiple R-Squared: 0.009255,  Adjusted R-squared: 0.004582
    F-statistic: 1.98 on 3 and 636 DF,  p-value: 0.1157
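In this output vowel a is the baseline, since it is the only level without a coefficient of its own: the intercept is the mean for a, and each other coefficient is that vowel's mean minus the baseline mean (so the fitted mean for i is 426.43 - 42.62 = 383.81). For a model with a single factor, least squares reproduces the group means exactly, which the Python sketch below uses. The function name and the toy measurements are hypothetical.

```python
def treatment_coefficients(groups, baseline):
    """Baseline (treatment) coding for a one-factor model.

    groups maps each level name to its list of measurements.
    Returns the intercept (baseline mean) and, for every other level,
    the difference between that level's mean and the baseline mean.
    """
    means = {level: sum(values) / len(values) for level, values in groups.items()}
    coefficients = {"(Intercept)": means[baseline]}
    for level, mean in means.items():
        if level != baseline:
            coefficients[level] = mean - means[baseline]
    return coefficients


# Toy measurements (hypothetical, not the York data).
toy_groups = {
    "a": [700.0, 720.0, 710.0],
    "i": [300.0, 310.0, 290.0],
    "u": [350.0, 330.0, 340.0],
}
coefs = treatment_coefficients(toy_groups, baseline="a")
print(coefs)
```

Choosing a different baseline changes the individual coefficients but not the fitted group means, which is why the choice is a matter of convenience.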

## Model Assessment

- Standard F: Are any of the levels significant?
- $R^2$: How much variation in the response is explained by the predictor(s)?
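Both summaries can be recomputed from the R-squared value alone once the sample size is known. The Python sketch below uses the standard formulas for adjusted R-squared and the overall F statistic, and checks them against the `lm` output shown earlier (R-squared 0.009255, 3 predictors, 636 residual degrees of freedom, hence n = 640). The function names are hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """Overall F statistic on p and n - p - 1 degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Values from the lm summary: R^2 = 0.009255, p = 3, n = 636 + 3 + 1 = 640.
adj = adjusted_r_squared(0.009255, 640, 3)
f_stat = overall_f(0.009255, 640, 3)
print(round(adj, 6), round(f_stat, 2))  # compare with 0.004582 and 1.98
```

An R-squared below 1% with a non-significant F says that, in this fit, vowel identity explains very little of the variation in F1.0.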

## What's Next?

- How to handle repeated measures?
- Generalized Linear Models (counts, proportions)
- Classification and Regression Trees (decision trees)
