
ANOVA and Linear Models

Data The data are from the University of York project on variation in British liquids (JK Local, Alan Wrench, Paul Carter).

Correlation When we have two variables, we can measure the strength of their linear association by correlation. In the strict technical statistical sense, correlation is the linear relationship between two variables.

Correlation Many times we are not interested in the differences between two groups, but instead in the relationship between two variables measured on the same set of subjects. Ex: Are post-graduate salary and GPA related? Ex: Is the F1.0 measurement related to the F1.1 measurement? Correlation is a measurement of LINEAR dependence; non-linear dependencies have to be modeled in a separate manner.

Correlation There is a theoretical correlation, usually represented by ρX,Y. We can calculate the sample correlation between two variables (x, y) using the Pearson coefficient: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ). This varies between -1.0 and 1.0, with the sign indicating the direction of the relationship.
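The calculation above can be sketched in base R. The data here are simulated (the york.data set is not reproduced in the transcript), so the variable names and numbers are illustrative only:

```r
# Simulate two linearly related variables (illustrative stand-in data)
set.seed(1)
x <- rnorm(100)
y <- 0.8 * x + rnorm(100, sd = 0.5)

# Sample Pearson coefficient, computed directly from the formula
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in equivalent, plus a significance test of H0: true correlation = 0
r_builtin <- cor(x, y)
ct <- cor.test(x, y)
```

cor.test() is the function that produces the Pearson's product-moment output shown on the next slide.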

Correlation Pearson's product-moment correlation
data: york.data$F1.0 and york.data$F1.1
t = , df = 318, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
sample estimates:
cor

Correlation Types Pearson's r: X, Y are continuous variables. Kendall's Tau: X, Y are continuous or ordinal. The measure is based on X ranked and Y ranked; the ranks are used as the basis of the statistic.
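A sketch of the difference in R, again with simulated data: the cubic transform below makes the relationship monotone but non-linear, which Kendall's tau, being rank-based, is insensitive to.

```r
set.seed(2)
x <- rnorm(50)
y <- x^3 + rnorm(50, sd = 0.1)   # monotone but non-linear in x

r_pearson   <- cor(x, y, method = "pearson")   # linear association
tau_kendall <- cor(x, y, method = "kendall")   # rank-based association

# Because tau depends only on ranks, it is unchanged if we replace
# the raw values with their ranks
tau_ranks <- cor(rank(x), rank(y), method = "kendall")
```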

One-Way ANOVA If we want to test the equality of more than two means, we have to use an expanded test: the one-way ANOVA.

An Example Vowels: a, i, O, u. Are the F1 measurements the same for each corresponding vowel in the segment? Assumptions: normality, each group (level of vowel) has the same variance, and independent measurements.

The ANOVA Table For a one-way design with k groups and n observations in total, the table partitions the variability:
Source     Df     Sum Sq   Mean Sq            F
Between    k - 1  SSB      MSB = SSB/(k-1)    MSB/MSE
Residuals  n - k  SSE      MSE = SSE/(n-k)

Results Analysis of Variance Table
Response: york.data$F1.0
           Df   SS   MS   F   Pr(>F)
Vowel                          < 2.2e-16 ***
Residuals
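A table like this can be produced with aov(). The sketch below uses simulated vowel data (york.data is not available here); the factor names simply mirror the slides:

```r
# Simulated F1 values with a different mean for each vowel
set.seed(3)
vowel <- factor(rep(c("a", "i", "O", "u"), each = 40),
                levels = c("a", "i", "O", "u"))
mu <- ifelse(vowel == "a", 700,
      ifelse(vowel == "i", 300,
      ifelse(vowel == "O", 500, 350)))
f1 <- mu + rnorm(160, sd = 50)

# One-way ANOVA: does mean F1 differ across vowels?
fit <- aov(f1 ~ vowel)
summary(fit)   # Df, Sum Sq, Mean Sq, F value, Pr(>F)
```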

What about the assumptions? Can we test for equal variance? Yes. If the variance is not equal, is there a solution that will still allow us to use ANOVA? Yes.

Post-hoc analysis There is a difference between the mean of at least one vowel and the others, so what? We can test where the difference is occurring through pairwise t-tests. This type of analysis is often referred to as a post-hoc analysis.

Bonferroni Pairwise comparisons using t tests with pooled SD
data: york.data$F1.0 and york.data$Vowel
    a        i        O
i   < 2e
O   < 2e-16  <2e-16   -
u   < 2e              e-14
P value adjustment method: bonferroni
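The matrix above comes from pairwise.t.test(). A self-contained sketch on simulated data (the names mirror the slides, not the real york.data):

```r
set.seed(4)
vowel <- factor(rep(c("a", "i", "O", "u"), each = 40),
                levels = c("a", "i", "O", "u"))
mu <- ifelse(vowel == "a", 700,
      ifelse(vowel == "i", 300,
      ifelse(vowel == "O", 500, 350)))
f1 <- mu + rnorm(160, sd = 50)

# Pairwise t tests with a Bonferroni correction, as on the slide
ph <- pairwise.t.test(f1, vowel, p.adjust.method = "bonferroni")
ph$p.value   # matrix of adjusted p-values, one per pair of vowels
```

Other p.adjust.method choices ("holm", "fdr", ...) trade power against the strictness of the family-wise error control.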

Multi-Way ANOVA Usually we are not interested in merely one factor, but in the effects of several factors on our dependent variable. Same principle, except now we have several 'between groups' variables.

Multi-Way ANOVA
           Df  Sum Sq  Mean Sq  F value  Pr(>F)
Vowel
Liquid                                    **
Sex                                       ***
Residuals
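In R this is just an aov() formula with additional terms. A sketch with simulated data (factor names as on the slides, effects invented for illustration):

```r
set.seed(5)
n <- 240
vowel  <- factor(sample(c("a", "i", "O", "u"), n, replace = TRUE))
liquid <- factor(sample(c("l", "r"), n, replace = TRUE))
sex    <- factor(sample(c("F", "M"), n, replace = TRUE))
# Invented main effects for Liquid and Sex, none for Vowel
f1 <- 500 + 80 * (liquid == "r") + 120 * (sex == "F") + rnorm(n, sd = 50)

# Additive (main-effects-only) multi-way ANOVA
fit <- aov(f1 ~ vowel + liquid + sex)
summary(fit)
```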

Testing Assumptions Bartlett's Test: H0: all of the variances for each of your cells are equal. If your p-value is significant (< .05), then you should not be using an ANOVA, but some non-parametric test that relies on ranks. We worry less about the normality assumption with large samples: the central limit theorem states that with enough data you will eventually get normality (of the mean).
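Bartlett's test is built into base R. A sketch on simulated data where the equal-variance null actually holds:

```r
set.seed(6)
vowel <- factor(rep(c("a", "i", "O", "u"), each = 40),
                levels = c("a", "i", "O", "u"))
f1 <- rnorm(160, mean = 500, sd = 50)   # same variance in every group

# Bartlett's test of H0: equal variances across the vowel groups
bt <- bartlett.test(f1 ~ vowel)
bt$p.value
```

If the test rejects, two common fallbacks are oneway.test(..., var.equal = FALSE) (Welch's ANOVA) or the rank-based kruskal.test().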

Higher Order Interactions It often isn't enough to test factors by themselves; we also want to model higher-order interactions. We are looking at Sex, Liquid and Vowel, so there are Sex x Liquid, Sex x Vowel, Vowel x Liquid and Sex x Liquid x Vowel as possible interaction effects.
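In an R formula, the * operator expands to all main effects plus every interaction. A sketch with simulated data (the Liquid-by-Sex interaction below is invented for illustration):

```r
set.seed(7)
n <- 240
vowel  <- factor(sample(c("a", "i", "O", "u"), n, replace = TRUE))
liquid <- factor(sample(c("l", "r"), n, replace = TRUE))
sex    <- factor(sample(c("F", "M"), n, replace = TRUE))
# An effect that only appears for female speakers producing /r/
f1 <- 500 + 60 * (liquid == "r") * (sex == "F") + rnorm(n, sd = 50)

# vowel * liquid * sex expands to the three main effects, the three
# two-way interactions, and the three-way interaction
fit <- aov(f1 ~ vowel * liquid * sex)
summary(fit)
```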

An Alternative Approach: Linear Models Linear models allow for an easily expandable approach that lets us answer questions more explicitly without having to add more machinery with each new factor or covariate. The underlying form of an ANOVA is essentially a linear model.

What would it look like? In a linear model, we estimate parameters (or coefficients) of the predictors on a response. Ex: we want to model the effect of Vowels on F1.0: F1.0ij = α + τi + εij.

What are each of the pieces? α represents the intercept term and the mean for F1.0 when the type of vowel is controlled for. τi represents the treatment effect of the i-th vowel. ε represents the noise and is assumed to be N(0, σ²), i.e. normally distributed with a mean of zero and constant variance.

Inestimability We can't really estimate all of the parameters in our model: we don't have a control group where there isn't a vowel effect, so the intercept and the vowel effects are not separately identifiable.

Two Solutions Stick with the model. You can only test functions of the parameters, and only if they are estimable [the hard way, and only if you know a fair amount of linear algebra]. Or pick a control group and allow that to be your baseline (or α).
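R takes the second route by default: its treatment contrasts drop the first factor level, which becomes the baseline absorbed into the intercept. A sketch on simulated data:

```r
set.seed(8)
vowel <- factor(rep(c("a", "i", "O", "u"), each = 40),
                levels = c("a", "i", "O", "u"))
mu <- ifelse(vowel == "a", 700,
      ifelse(vowel == "i", 300,
      ifelse(vowel == "O", 500, 350)))
f1 <- mu + rnorm(160, sd = 50)

# Level "a" becomes the baseline; the other coefficients are
# differences from that baseline, not raw group means
fit <- lm(f1 ~ vowel)
coef(fit)   # (Intercept), voweli, vowelO, vowelu

# The baseline can be changed with relevel()
fit2 <- lm(f1 ~ relevel(vowel, ref = "u"))
```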

The Simple Way Call: lm(formula = F1.0 ~ Vowel)
Residuals:
     Min      1Q  Median      3Q     Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
Voweli                                           *
VowelO
Vowelu
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 636 degrees of freedom
Multiple R-Squared: , Adjusted R-squared:
F-statistic: 1.98 on 3 and 636 DF, p-value:

Model Assessment Standard F: are any of the levels significant? R²: how much variation in the response is explained by the predictor(s)?
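Both quantities can be pulled out of a fitted model's summary. A self-contained sketch (simulated data; only the first group has a shifted mean):

```r
set.seed(9)
x <- factor(rep(c("a", "i", "O", "u"), each = 25))
y <- 3 * (x == "a") + rnorm(100)
s <- summary(lm(y ~ x))

s$r.squared    # proportion of response variance explained by the predictor
s$fstatistic   # overall F statistic with its numerator/denominator df
# p-value of the overall F test: are any of the levels significant?
pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3], lower.tail = FALSE)
```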

What’s Next? How to handle repeated measures? How to handle repeated measures? Generalized Linear Models (Counts, proportions) Generalized Linear Models (Counts, proportions) Classification and Regression Trees (Decision Trees). Classification and Regression Trees (Decision Trees).