Analysis of Variance (ANOVA)


Analysis of Variance (ANOVA) Shibin Liu SAS Beijing R&D

Agenda
0. Lesson overview
1. Two-Sample t-Tests
2. One-Way ANOVA
3. ANOVA with Data from a Randomized Block Design
4. ANOVA Post Hoc Tests
5. Two-Way ANOVA with Interactions
6. Summary


Lesson overview: A one-sample t-test compares a population mean μ with a fixed value μ0.

Lesson overview: A two-sample t-test asks whether two population means differ (μ1 ≠ μ2).

Lesson overview: ANOVA asks whether the means of three or more populations (μ1, μ2, μ3, and so on) differ.

Lesson overview: In ANOVA, a continuous response variable is explained by a categorical predictor variable that has several levels.

Lesson overview: An ANOVA can have one predictor variable or several predictor variables for the same response variable.

Lesson overview: one-sample t-test, two-sample t-test, and ANOVA.

Lesson overview: Which analysis should you use?
What do you want to examine?
- The location, spread, and shape of the data's distribution (descriptive statistics). Summary statistics or graphics? Summary statistics only: the SUMMARY STATISTICS task. Both: the DISTRIBUTION ANALYSIS task (descriptive statistics, histogram, normal probability plots).
- The difference between groups on one or more variables (inferential statistics). How many groups? Two: the TTEST task. Two or more: the LINEAR MODELS task (analysis of variance).
- The relationship between variables (inferential statistics). Which kind of variables? Continuous only: CORRELATIONS and LINEAR REGRESSION. Categorical response variable: ONE-WAY FREQUENCIES & TABLE ANALYSIS (frequency tables, chi-square test) and LOGISTIC REGRESSION.
The chart groups these branches into Lesson 1, Lesson 2, Lessons 3 & 4, and Lesson 5.


The Two-Sample t-Test: Introduction. Are two population means different, μ1 ≠ μ2?

The Two-Sample t-Test: Introduction. For example, does mean salary differ between two groups? Does mean blood pressure?

The Two-Sample t-Test: Introduction. In this topic, we will learn to do the following:
- analyze differences between two population means using the t Test task
- verify the assumptions of a two-sample t-test and perform the test
- perform a one-sided t-test

The Two-Sample t-Test: Introduction. The alternative hypothesis Ha: μ1 ≠ μ2 can be written equivalently as Ha: μ1 − μ2 ≠ 0.

Two-Sample t-Tests: Assumptions. Before we start the analysis, examine the data to verify that the statistical assumptions are valid:
- Independent observations: no observation provides information about another, and the observations do not affect one another.
- Normally distributed data for each group.
- Equal variances for each group: σ1² = σ2².
If one of the assumptions is not valid and no adjustments are made, the probability of drawing incorrect conclusions from the analysis could increase.

Two-Sample t-Tests: F-Test for Equality of Variance
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
$$F = \frac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)}$$
This test is valid only for independent samples from normal distributions; normality is required even for large sample sizes. If you reject the null hypothesis, use the unequal-variance t-test to test the equality of the means.
When the null hypothesis is true, what value will the F statistic be close to? F ≈ 1. A large value of F is evidence against H0.
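As a rough illustration of the folded F statistic above, the Python sketch below divides the larger sample variance by the smaller one and reports the numerator and denominator degrees of freedom. The function name `folded_f` and the data are hypothetical. Turning F into a p-value needs an F-distribution CDF, which the Python standard library does not provide, so in practice you would read the p-value from the t Test task output.

```python
from statistics import variance


def folded_f(sample1, sample2):
    """Folded F statistic for H0: sigma1^2 = sigma2^2.

    The larger sample variance goes in the numerator, so F >= 1.
    Returns (F, numerator df, denominator df).
    """
    v1, v2 = variance(sample1), variance(sample2)  # sample variances, n - 1 divisor
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


# Hypothetical measurements for two independent groups.
group1 = [12.1, 11.8, 12.6, 13.0, 12.4]
group2 = [11.0, 13.9, 10.2, 14.5, 12.8]

f, df_num, df_den = folded_f(group1, group2)
print(f"F = {f:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

Values near 1 are consistent with equal variances; here the second group is far more spread out, so F is large.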

Two-Sample t-Tests: Examining the Equal Variance t-Test and p-Values. In the F-test for equal variance (H0: σ1² = σ2²), p = 0.7446 > 0.05, so we do not reject H0 and use the equal-variance t-test.

Two-Sample t-Tests: Examining the Equal Variance t-Test and p-Values. Assuming σ1² = σ2², the t-test of H0: μ1 − μ2 = 0 against Ha: μ1 − μ2 ≠ 0 gives p = 0.0003 < 0.05, so we reject H0.

Two-Sample t-Tests: Examining the Unequal Variance t-Test and p-Values. In this example, the F-test for equal variance (H0: σ1² = σ2²) gives p = 0.0185 < 0.05, so we reject H0 and use the unequal-variance t-test.

Two-Sample t-Tests: Examining the Unequal Variance t-Test and p-Values. With σ1² ≠ σ2², the unequal-variance t-test of H0: μ1 − μ2 = 0 against Ha: μ1 − μ2 ≠ 0 gives p = 0.0320 < 0.05, so we reject H0.
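To make the difference between the two versions of the test concrete, here is a minimal Python sketch of the equal-variance (pooled) t statistic and the unequal-variance (Welch) t statistic with Satterthwaite degrees of freedom. The function names and the scores are hypothetical, and the sketch stops at the statistics; p-values would need a t-distribution CDF, which the standard library lacks.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Equal-variance (pooled) two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Unequal-variance (Welch) t statistic with Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    a, b = variance(x) / n1, variance(y) / n2  # squared standard errors of each mean
    t = (mean(x) - mean(y)) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical scores for two groups of different sizes.
x = [520, 540, 610, 580, 555, 600]
y = [500, 470, 530, 610, 450, 490, 520]

print("pooled:", pooled_t(x, y))
print("Welch: ", welch_t(x, y))
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ; the choice matters most when both the sizes and the variances differ.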

Two-Sample t-Tests: Demo. Scenario: compare two groups' means, girls' and boys' SAT scores. Identify the data: TestScores. Which is the classification variable? Which is the continuous variable to analyze?

Two-Sample t-Tests: Demo. Analyze > ANOVA > t Test

Two-Sample t-Tests: Demo. Interpreting the results: are the data in each group approximately normal?

Two-Sample t-Tests: Demo. Interpreting the results: first check H0: σ1² = σ2² (equality of variances), then H0: μ1 − μ2 = 0 (equality of means).

Two-Sample t-Tests: Demo. Interpreting the results: The 95% confidence interval for the mean difference, (−3.6950, 125.2), includes 0. This implies that you cannot say with 95% confidence that the difference between boys and girls is not zero. Therefore, the two-sided p-value is also greater than 0.05.

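Two-Sample t-Tests: One-Sided Tests. A two-sided test uses Ha: μ1 ≠ μ2. When an effect can go only one way (for example, a new drug can only have a positive effect), use a one-sided test, such as Ha: μ1 > μ2 or, in the other direction, Ha: μ1 − μ2 < 0.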

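Two-Sample t-Tests: One-Sided Tests. The corresponding null hypotheses are H0: μ1 ≤ μ2 (against Ha: μ1 > μ2) and H0: μ1 ≥ μ2 (against Ha: μ1 < μ2). Each one-sided test looks in one direction only.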

Two-Sample t-Tests: One-Sided Tests. For a lower-tailed test, the hypotheses are H0: μ1 ≥ μ2 and Ha: μ1 < μ2 (instead of H0: μ1 = μ2 for a two-sided test), and the whole critical region lies in one tail.
What is power? Power is the probability that the test rejects the null hypothesis when the null hypothesis is false, that is, the probability of detecting a difference when the difference actually exists: probability of correct rejection = 1 − β = power.
A Type II error is failing to reject H0 when H0 is false. The probability of a Type II error is β.
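The relationship power = 1 − β, and the effect of putting the whole critical region in one tail, can be seen with a normal-approximation sketch. It assumes a known common standard deviation σ and equal group sizes (a z approximation, not the exact t-based power), and the function `approx_power` and its inputs are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def approx_power(delta, sigma, n_per_group, alpha=0.05, sides=2):
    """Normal-approximation power for a true mean difference delta (mu1 - mu2).

    Assumes a known common standard deviation sigma and equal group sizes.
    The one-sided case is upper-tailed (Ha: mu1 - mu2 > 0).
    """
    se = sigma * (2 / n_per_group) ** 0.5  # standard error of the difference
    shift = delta / se                     # mean of the z statistic under Ha
    if sides == 1:
        z_crit = STD_NORMAL.inv_cdf(1 - alpha)
        return 1 - STD_NORMAL.cdf(z_crit - shift)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


for sides in (1, 2):
    power = approx_power(delta=5, sigma=10, n_per_group=30, sides=sides)
    print(f"{sides}-sided: power = {power:.3f}, beta = {1 - power:.3f}")
```

When the true difference lies in the hypothesized direction, the one-sided test has more power at the same α. That extra power is not by itself a reason to choose a one-sided test; the direction should come from subject-matter considerations.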

Two-Sample t-Tests: One-Sided Tests/Scenario. A one-sided, upper-tailed t-test of whether the mean SAT score of group 1 is greater than that of group 2: H0: μ1 − μ2 ≤ 0, Ha: μ1 − μ2 > 0.

Two-Sample t-Tests: One-Sided Tests/Scenario. For this upper-tailed test, only a sufficiently large positive t statistic leads to rejecting H0: μ1 − μ2 ≤ 0 in favor of Ha: μ1 − μ2 > 0.

Two-Sample t-Tests: One-Sided Tests/Scenario. Analyze > ANOVA > t Test > Two Sample > Preview Code > Insert Code

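Two-Sample t-Tests: One-Sided Tests/Scenario. The one-sided p-value is 0.0321 < 0.05, and 0 is not in the one-sided confidence interval, so we reject H0. This does not contradict the two-sided result: when the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided p-value.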

Two-Sample t-Tests Question 1. What justifies the choice of a one-sided test versus a two-sided test?
a) The need for more statistical power
b) Theoretical and subject-matter considerations
c) The non-significance of a two-sided test
d) The need for an unbiased test statistic
Answer: b

Two-Sample t-Tests Question 2. A professor suspects her class is performing below the department average of 73%. She decides to test this claim. Which of the following is the correct alternative hypothesis?
a) μ < 0.73
b) μ > 0.73
c) μ ≠ 0.73
Answer: a. Because the professor suspects that her class is performing below the department average, this is a lower-tailed test, so the alternative hypothesis is μ < 0.73.


One-Way ANOVA: Introduction. Do the means of three groups, μ1, μ2, and μ3, differ?

One-Way ANOVA: Objective
- Analyze differences between population means using the Linear Models task.
- Verify the assumptions of analysis of variance.

One-Way ANOVA: ANOVA overview. A response variable is modeled by a predictor variable with several levels. One-way ANOVA has only one predictor variable.

One-Way ANOVA: ANOVA overview, Case 1. Does a programmer earn more than a teacher? Response variable: salary. Predictor variable: job title, with two levels, Teacher and Programmer.

One-Way ANOVA: ANOVA overview, Case 1. Because the predictor variable has only two levels (Teacher and Programmer), a two-sample t-test can answer the question.

One-Way ANOVA: ANOVA overview, Case 1. A one-way ANOVA can answer the same question. With two groups, the ANOVA F statistic equals the square of the equal-variance two-sample t statistic: F = t².
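The identity F = t² for two groups can be checked numerically. The Python sketch below computes the pooled two-sample t statistic and the one-way ANOVA F statistic for the same two hypothetical salary samples; the function names and numbers are illustrative only.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Equal-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def one_way_f(groups):
    """One-way ANOVA F statistic: (SSM / df_model) / (SSE / df_error)."""
    observations = [v for g in groups for v in g]
    grand_mean = mean(observations)
    ssm = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_model = len(groups) - 1
    df_error = len(observations) - len(groups)
    return (ssm / df_model) / (sse / df_error)


# Hypothetical salaries (in thousands) for the two job titles.
teacher = [48, 52, 45, 50, 55]
programmer = [60, 58, 66, 71, 63, 59]

t = pooled_t(programmer, teacher)
f = one_way_f([teacher, programmer])
print(f"t = {t:.4f}, t squared = {t * t:.4f}, F = {f:.4f}")
```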

One-Way ANOVA: ANOVA overview, Case 2. Response variable: T-cell count (T cells are thymus-dependent lymphocytes). Predictor variable: medication, with levels Medication 1, Medication 2, and Placebo.

One-Way ANOVA: ANOVA overview, Case 2. A two-sample t-test compares only two levels at a time, for example Medication 1 versus Placebo or Medication 2 versus Placebo.

One-Way ANOVA: ANOVA overview, Case 2. A one-way ANOVA compares Medication 1, Medication 2, and Placebo in a single test.

One-Way ANOVA: The ANOVA Hypothesis. Sample means almost always differ a little. ANOVA helps check whether the differences are large enough to reject the null hypothesis that the population means are the same: H0: μ1 = μ2 = μ3 = μ4.

One-Way ANOVA: The ANOVA Hypothesis. The predictor variable has the levels Medication 1, Medication 2, and Placebo.

One-Way ANOVA: The ANOVA Hypothesis. The alternative hypothesis is that at least one mean is different. For example, Ha holds when μ1 = μ2 ≠ μ3, that is, when only one group differs from the others.

One-Way ANOVA: The ANOVA model
$$Y_{ik} = \mu + \tau_i + \varepsilon_{ik}$$
where $i$ indexes the treatments (three types of medication), $k$ indexes the observation number, $\mu$ is the overall population mean, $\tau_i = \mu_i - \mu$ is the effect of treatment $i$ ($\tau_1 = \mu_1 - \mu$, $\tau_2 = \mu_2 - \mu$, $\tau_3 = \mu_3 - \mu$), and $\varepsilon_{ik}$ is the error term.

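One-Way ANOVA: Sums of Squares. To test H0: μ1 = μ2 = μ3, compare the variability between groups with the variability within groups by taking their ratio. The larger the between-group variability relative to the within-group variability, the stronger the evidence against H0.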

One-Way ANOVA: Sums of Squares. SST: Total Sum of Squares. SSM: Model Sum of Squares. SSE: Error Sum of Squares.

One-Way ANOVA: Sums of Squares. Total variability = variability between groups + variability within groups:
$$\mathrm{SST} = \mathrm{SSM} + \mathrm{SSE}$$
$$\mathrm{SST} = \sum_i \sum_k (Y_{ik} - \bar{Y})^2, \quad \mathrm{SSM} = \sum_i n_i (\bar{Y}_i - \bar{Y})^2, \quad \mathrm{SSE} = \sum_i \sum_k (Y_{ik} - \bar{Y}_i)^2$$
where $\bar{Y}$ is the overall mean, $\bar{Y}_i$ is the mean of group $i$, and $n_i$ is the number of observations in group $i$.

One-Way ANOVA: F statistic and Critical Value at α = 0.05
$$F(df_M, df_E) = \frac{\mathrm{MSM}}{\mathrm{MSE}} = \frac{\mathrm{SSM}/df_M}{\mathrm{SSE}/df_E}$$
MSM is the Model Mean Square and MSE is the Error Mean Square. H0 is rejected when F exceeds the critical value of the F(df_M, df_E) distribution at α = 0.05.
In general, degrees of freedom (DF) can be thought of as the number of independent pieces of information:
- Model DF is the number of treatments minus 1.
- Corrected total DF is the sample size minus 1.
- Error DF is the sample size minus the number of treatments (equivalently, the corrected total DF minus the model DF).

One-Way ANOVA: Coefficient of Determination. $R^2$ is the proportion of variance accounted for by the model:
$$R^2 = \frac{\mathrm{SSM}}{\mathrm{SST}}$$
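The sums of squares, degrees of freedom, F statistic, and R² defined above fit together as in the following Python sketch. The function `one_way_anova` and the bulb weights are hypothetical (they are not the MGGARLIC data), and the sketch reports F without a p-value, which would require an F-distribution CDF.

```python
from statistics import mean


def one_way_anova(groups):
    """Sums of squares, degrees of freedom, F statistic, and R^2 for a one-way ANOVA.

    groups: list of lists, one list of responses per treatment level.
    """
    observations = [y for g in groups for y in g]
    n, k = len(observations), len(groups)
    grand_mean = mean(observations)
    group_means = [mean(g) for g in groups]

    sst = sum((y - grand_mean) ** 2 for y in observations)  # total
    ssm = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))  # between groups
    sse = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)  # within groups

    df_model, df_error, df_total = k - 1, n - k, n - 1
    msm, mse = ssm / df_model, sse / df_error
    return {
        "SST": sst, "SSM": ssm, "SSE": sse,
        "df_model": df_model, "df_error": df_error, "df_total": df_total,
        "MSM": msm, "MSE": mse, "F": msm / mse, "R2": ssm / sst,
    }


# Hypothetical bulb weights for four fertilizer groups.
fertilizer_groups = [
    [0.22, 0.25, 0.24, 0.27],
    [0.28, 0.30, 0.26, 0.29],
    [0.21, 0.23, 0.20, 0.22],
    [0.19, 0.21, 0.18, 0.22],
]

result = one_way_anova(fertilizer_groups)
print(f"F({result['df_model']}, {result['df_error']}) = {result['F']:.2f}, "
      f"R^2 = {result['R2']:.3f}")
```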

One-Way ANOVA: Assumptions for ANOVA
- Independent observations
- Error terms are normally distributed
- Error terms have equal variances
Assessing ANOVA assumptions: Good data collection methods help ensure the independence assumption. Diagnostic plots can be used to verify the assumption that the error is approximately normally distributed. The Linear Models task can produce a test of equal variance; H0 for this test is that the variances are equal for all populations.

One-Way ANOVA: Predicted and Residual values. Residuals are estimates of the error term: residual = observation − group mean.

One-Way ANOVA: Predicted and Residual values. The predicted value in ANOVA is the group mean. A residual is the difference between the observed value of the response variable and its predicted value.
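A small Python sketch makes the definitions concrete: each observation's predicted value is its group mean, and its residual is the observation minus that mean. Because the residuals within each group are deviations from the group's own mean, they sum to zero within every group, and therefore over the whole data set (up to floating-point rounding). The function name and the 20 observations are hypothetical.

```python
from statistics import mean


def predicted_and_residuals(groups):
    """Return a (predicted, residuals) pair of lists for each group.

    In one-way ANOVA the predicted value of every observation is its group mean,
    and residual = observed - predicted.
    """
    results = []
    for g in groups:
        group_mean = mean(g)
        predicted = [group_mean] * len(g)
        residuals = [y - group_mean for y in g]
        results.append((predicted, residuals))
    return results


# Hypothetical data: 4 groups x 5 observations = 20 observations.
groups = [
    [10.2, 11.5, 9.8, 10.9, 11.1],
    [12.4, 13.0, 12.8, 11.9, 12.6],
    [9.1, 8.7, 9.9, 9.4, 8.8],
    [10.0, 10.6, 11.2, 9.7, 10.3],
]

all_residuals = [r for _, res in predicted_and_residuals(groups) for r in res]
print(len(all_residuals), "residuals; sum =", round(sum(all_residuals), 10))
```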

One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA. Garlic is grown with three fertilizers plus a control. Response variable: bulb weight. Predictor variable: fertilizer.

One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA. H0: μ1 = μ2 = μ3 = μ4; Ha: at least one mean is different.

One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA. Descriptive check of MGGARLIC across groups, using the Summary Statistics task.

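One-Way ANOVA: Question 1. You have 20 observations in your ANOVA and you calculate the residuals. What will they sum to?
a) −20
b) 0
c) 20
d) 400
e) Need more information
Answer: b. The predicted value for each observation is its group mean, so the residuals within each group sum to 0, and so do all 20 residuals.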

One-Way ANOVA: Question 2. Which of the following phrases describes the model sum of squares, or SSM?
a) The variability between the groups
b) The variability within the groups
c) The variability explained by the error terms
Answer: a

One-Way ANOVA: Question 3. Match each null hypothesis to the correct SAS output, a or b. Answer: H0: σ1² = σ2² matches output b; H0: μ1 = μ2 matches output a.

One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA. Task > ANOVA > Linear Models, with the MGGARLIC data.

One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA. Task > ANOVA > Linear Models. Check the three assumptions: independence, normality, and equal variance. One-way ANOVA can say that at least one mean is different, but it cannot say which specific group is different. A post hoc test can do that.

One-Way ANOVA: Summary
Null hypothesis: all means are equal.
Alternative hypothesis: at least one mean is different.
1. Produce descriptive statistics.
2. Verify the assumptions: independence, normality, and equal variance.
3. Examine the p-value in the ANOVA table. If the p-value is less than alpha, reject the null hypothesis.


ANOVA with Data from a Randomized Block Design: Introduction. Patients receive Placebo, Medication 1, or Medication 2, but their age may also affect the outcome.

ANOVA with Data from a Randomized Block Design: Introduction. Age groups: Under 30, 30-50, and Over 50. We are not interested in age, but we want to minimize its impact.

ANOVA with Data from a Randomized Block Design: Objective
- Recognize the difference between a completely randomized design and a randomized block design.
- Differentiate between observational data and data from designed experiments.
- Use the Linear Models task to analyze data from a randomized block design.

ANOVA with Data from a Randomized Block Design: Observational Studies. In observational or retrospective studies, the data values are observed as they occur, not affected by an experimental design.
- Groups can be naturally occurring, for example gender and ethnicity.
- Random assignment might be unethical or untenable, for example for smoking or credit risk groups.

ANOVA with Data from a Randomized Block Design: Controlled Experiments
- Random assignment might be desirable to eliminate selection bias.
- You often want to look at the outcome measure prospectively.
- You can manipulate the factors of interest and can more reasonably claim causation.
- You can design your experiment to control for other factors contributing to the outcome measure.

ANOVA with Data from a Randomized Block Design Question 3. Can you determine a cause-and-effect relationship in an observational study?
a) Yes
b) No
Answer: b. In an observational study, you often examine what already occurred, and therefore have little control over factors contributing to the outcome. In a controlled experiment, you can manipulate the factors of interest and can more reasonably claim causation.

ANOVA with Data from a Randomized Block Design: Nuisance Factors. Nuisance factors are factors that can affect the outcome but are not of interest in the experiment. For example, T-cell count = medication + age, where age is a nuisance factor. Including age in the model as a block gives a randomized block design.

ANOVA with Data from a Randomized Block Design Question 4. Which part of the ANOVA table contains the variation due to nuisance factors when they are not included in the model?
a) Sum of Squares Model
b) Sum of Squares Error
c) Degrees of Freedom
Answer: b

ANOVA with Data from a Randomized Block Design: Including a Blocking Variable in the Model. Add age to the model as a blocking variable.

ANOVA with Data from a Randomized Block Design: More ANOVA Assumptions
- Independent observations
- Normally distributed data
- Equal variances
- The treatments are randomly assigned within each block (Under 30, 30-50, Over 50).
- No interaction: the treatment effects do not change across the blocks. An interaction would mean that the treatment effects change across the blocks.

ANOVA with Data from a Randomized Block Design Scenario: Creating a Randomized Block Design. Garlic: H0: μ1 = μ2 = μ3 = μ4; Ha: at least one mean is different. What are the nuisance factors in this case? Sun, pH level of the soil, and rain.

ANOVA with Data from a Randomized Block Design Scenario: Creating a Randomized Block Design. Bulb weight = fertilizer + sun + pH of the soil + rain.

ANOVA with Data from a Randomized Block Design Scenario: Creating a Randomized Block Design.

ANOVA with Data from a Randomized Block Design Question 5. In a block design, which part of the ANOVA table contains the variation due to nuisance factors?
a) Sum of Squares Model
b) Sum of Squares Error
c) Degrees of Freedom
Answer: a

ANOVA with Data from a Randomized Block Design: Performing ANOVA with Blocking. Task > ANOVA > Linear Models, with the MGGARLIC_BLOCK data.

ANOVA with Data from a Randomized Block Design: Performing ANOVA with Blocking. Task > ANOVA > Linear Models, with the MGGARLIC_BLOCK data. The F value of Sector, the blocking variable, is 6.53, which is greater than 1. Rule of thumb: if the F value of the blocking variable is greater than 1, the blocking variable should be kept in the model.
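The following Python sketch shows where the blocking variable's F value comes from in a randomized block design with one observation per treatment-block cell: the block sum of squares is pulled out of what would otherwise be error variation. The function name and the 4 × 3 table of bulb weights are hypothetical and are not the MGGARLIC_BLOCK data; the Linear Models task also handles layouts this sketch does not.

```python
from statistics import mean


def randomized_block_anova(table):
    """ANOVA for a randomized block design with one observation per cell.

    table[i][j] is the response for treatment i in block j.
    Returns sums of squares, error degrees of freedom, and F values.
    """
    t, b = len(table), len(table[0])
    values = [y for row in table for y in row]
    grand_mean = mean(values)
    treatment_means = [mean(row) for row in table]
    block_means = [mean([table[i][j] for i in range(t)]) for j in range(b)]

    ss_total = sum((y - grand_mean) ** 2 for y in values)
    ss_treatment = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_block = t * sum((m - grand_mean) ** 2 for m in block_means)
    ss_error = ss_total - ss_treatment - ss_block  # explained by neither factor

    df_treatment, df_block, df_error = t - 1, b - 1, (t - 1) * (b - 1)
    mse = ss_error / df_error
    return {
        "SS_treatment": ss_treatment, "SS_block": ss_block, "SS_error": ss_error,
        "df_error": df_error,
        "F_treatment": (ss_treatment / df_treatment) / mse,
        "F_block": (ss_block / df_block) / mse,
    }


# Hypothetical bulb weights: 4 fertilizers (rows) x 3 blocks (columns).
weights = [
    [0.22, 0.25, 0.28],
    [0.27, 0.29, 0.33],
    [0.20, 0.24, 0.25],
    [0.19, 0.23, 0.24],
]

result = randomized_block_anova(weights)
print(f"F for fertilizer = {result['F_treatment']:.2f}, "
      f"F for block = {result['F_block']:.2f}")
```

Without the block term, SS_block would be added to SS_error, inflating MSE and shrinking the F value for fertilizer.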

ANOVA with Data from a Randomized Block Design: My groups are different. What next? The p-value for Fertilizer indicates that you should reject the H0 that all group means are the same. Which pairs of fertilizers produce garlic bulb weights that differ from one another? Should you go back and do several t-tests?


ANOVA Post Hoc Tests: Introduction. A one-way ANOVA or a randomized block design can tell you that the means μ1, μ2, and μ3 are not all equal, but not which one differs.

ANOVA Post Hoc Tests: Introduction. Pairwise tests of H0: μ1 = μ2, H0: μ1 = μ3, and H0: μ2 = μ3 each produce a p-value, and each carries its own risk of a Type I error. ANOVA post hoc tests use multiple comparison methods to control this Type I error.

ANOVA Post Hoc Tests: Multiple Comparison Methods Question 7. With a fair coin, your probability of getting heads on one flip is 0.5. If you flip a coin and get heads, what is the probability of getting heads on the second try?
a) 0.5
b) 0.25
c) 0.00
d) 1.00
e) 0.75
Answer: a

ANOVA Post Hoc Tests: Multiple Comparison Methods Question 8. With a fair coin, your probability of getting heads on one flip is 0.5. If you flip a coin twice, what is the probability of getting at least one head out of two?
a) 0.5
b) 0.25
c) 0.00
d) 1.00
e) 0.75
Answer: e

ANOVA Post Hoc Tests: Multiple Comparison Methods. Pairwise t-tests of H0: μ1 = μ2, H0: μ1 = μ3, and H0: μ2 = μ3, each at α = 0.05, each risk a Type I error.

ANOVA Post Hoc Tests: Multiple Comparison Methods
Comparisonwise Error Rate | Number of Comparisons | Experimentwise Error Rate
0.05 | 1 | 0.05
0.05 | 3 | 0.14
0.05 | 6 | 0.26
0.05 | 10 | 0.40
$$\mathrm{EER} = 1 - (1 - \alpha)^{n_c}$$
where $n_c$ is the number of comparisons (the formula assumes the comparisons are independent).
The comparisonwise error rate (CER) is the probability of a Type I error on a single pairwise t-test. The experimentwise error rate (EER) is the probability of at least one Type I error across all the pairwise comparisons you are making. For example, with three comparisons at α = 0.05, there is a 14% chance of rejecting at least one of the three null hypotheses just by chance, even when all of them are true.
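The table can be reproduced with a few lines of Python. The sketch below (the function name is hypothetical) applies EER = 1 − (1 − α)^nc under the same independence assumption, and uses the fact that k groups give k(k − 1)/2 pairwise comparisons, so 3, 6, and 10 comparisons correspond to 3, 4, and 5 groups.

```python
from math import comb


def experimentwise_error_rate(alpha, n_comparisons):
    """EER = 1 - (1 - alpha)^nc, assuming independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


for k in (2, 3, 4, 5):        # number of groups
    nc = comb(k, 2)           # number of pairwise comparisons, k(k - 1) / 2
    eer = experimentwise_error_rate(0.05, nc)
    print(f"{k} groups -> {nc:2d} comparisons -> EER = {eer:.2f}")
```

The same formula with α = 0.5 and two tries gives the coin answer in Question 8: 1 − 0.5² = 0.75.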

ANOVA Post Hoc Tests: Tukey's Multiple Comparison Method. Tukey's method controls the EER over the pairwise comparisons H0: μ1 = μ2, H0: μ1 = μ3, and H0: μ2 = μ3.

ANOVA Post Hoc Tests: Tukey's Multiple Comparison Method. This method is appropriate when you consider pairwise comparisons only. The experimentwise error rate is:
- equal to alpha when all pairwise comparisons are considered
- less than alpha when fewer than all pairwise comparisons are considered

ANOVA Post Hoc Tests: Scenario: determine which mean is different. Garlic fertilizers: three organic fertilizers and one control. Ha: at least one mean is different, but which one?

ANOVA Post Hoc Tests: Diffograms and the Tukey Method. In a diffogram, each downward-sloping diagonal line shows the confidence interval for the difference between a pair of least squares means. The upward-sloping, broken gray diagonal line is a reference line showing where the group means would be equal. If the downward-sloping line for a pair intersects the reference line, the confidence interval includes zero and the mean difference between the two groups is not statistically significant.

ANOVA Post Hoc Tests: Diffograms and the Tukey Method. Is there a difference between treatments 1 and 2? Can you identify the pairwise comparisons whose means are not significantly different?

ANOVA Post Hoc Tests: Dunnett's Multiple Comparison Method
Special case of comparing to a control:
- Comparing to a control is appropriate when there is a natural reference group, such as a placebo group in a drug trial.
- The experimentwise error rate is no greater than the stated alpha.
- Comparing to a control takes into account the correlations among tests.
- One-sided hypothesis tests against a control group can be performed.
- Control comparison computes and tests k − 1 group-wise differences, where k is the number of levels of the classification variable.
An example is the Dunnett method. Dunnett's multiple comparison method is recommended when there is a true control group. When appropriate, it is more powerful than methods that control for all possible comparisons.

ANOVA Post Hoc Tests: Control Plots and the Dunnett Method. LS-mean control plots are produced only when you specify that you want to compare all other group means against a control group mean. The value of the control group mean is shown as a horizontal line. The shaded area is bounded by the upper decision limit (UDL) and the lower decision limit (LDL). If the vertical line for a group extends past the shaded area, the group represented by that line is significantly different from the control group.

ANOVA Post Hoc Tests: Performing Post Hoc Tests

ANOVA Post Hoc Tests: Performing Post Hoc Tests: Tukey

ANOVA Post Hoc Tests: Performing Post Hoc Tests: Dunnett

ANOVA Post Hoc Tests: Performing Post Hoc Tests: t-test


Two-Way ANOVA with Interactions: Introduction. One-way ANOVA asks whether the means μ1, μ2, and μ3 of the levels of a single predictor differ.

Two-Way ANOVA with Interactions: Introduction. One-way ANOVA: one response variable and one predictor variable with several levels.

Two-Way ANOVA with Interactions: Introduction. Two-way ANOVA: one response variable and two predictor variables, each with several levels.

Two-Way ANOVA with Interactions: Introduction. Example: two alloys (high alloy and low alloy) combined with four heat settings (Heat 1, Heat 2, Heat 3, and Heat 4).

Two-Way ANOVA with Interactions: Objective
- Fit a two-way ANOVA.
- Detect interactions between factors.
- Analyze the treatments when there is a significant interaction.

Two-Way ANOVA with Interactions: n-Way ANOVA. One-way ANOVA has one predictor variable; n-way ANOVA has more than one predictor variable for the same response variable.

Two-Way ANOVA with Interactions: n-Way ANOVA. A randomized block design is similar to an n-way ANOVA in which one factor is the blocking factor and another is the factor of interest.

Two-Way ANOVA with Interactions: Interactions. No interaction: the effect of one factor is the same at every level of the other factor.

Two-Way ANOVA with Interactions: Interactions. Do alloy and heat setting interact? When you analyze an n-way ANOVA with interactions, you should first look at any tests for interactions among factors. If there is no interaction between the factors, the tests for the individual factor effects can be interpreted as true effects of that factor. If an interaction exists between any factors, the tests for the individual factor effects might be misleading, because the interaction can mask the effects. This is especially true for unbalanced data.

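Two-Way ANOVA with Interactions: The Two-Way ANOVA Model
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where $\mu$ is the overall population mean, regardless of alloy and heat setting; $\alpha_i = \mu_{i\cdot} - \mu$ is the effect of level $i$ of the first factor (alloy); $\beta_j = \mu_{\cdot j} - \mu$ is the effect of level $j$ of the second factor (heat setting); $(\alpha\beta)_{ij}$ is the effect of the interaction; and $\varepsilon_{ijk}$ is the error term. When the interaction is not significant, the main effects can be interpreted directly.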
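For a balanced layout, the effects in this model can be estimated descriptively from the cell means: α_i is a row mean minus the overall mean, β_j is a column mean minus the overall mean, and the interaction effect is what is left over, (αβ)_ij = μ_ij − μ_i· − μ_·j + μ. If every leftover term is zero, the effect of one factor is the same at every level of the other, so there is no interaction. The Python sketch below assumes balanced data; the function name and the alloy × heat-setting cell means are hypothetical, and it does not test significance, which is what the F tests in the Linear Models task do.

```python
from statistics import mean


def two_way_effects(cell_means):
    """Split the cell means of a balanced two-way layout into effects.

    cell_means[i][j] is the mean for level i of factor A and level j of factor B.
    Returns (mu, alpha, beta, interaction) such that
    cell_means[i][j] == mu + alpha[i] + beta[j] + interaction[i][j].
    """
    a, b = len(cell_means), len(cell_means[0])
    mu = mean([m for row in cell_means for m in row])
    row_means = [mean(row) for row in cell_means]
    col_means = [mean([cell_means[i][j] for i in range(a)]) for j in range(b)]
    alpha = [m - mu for m in row_means]  # alpha_i = mu_i. - mu
    beta = [m - mu for m in col_means]   # beta_j  = mu_.j - mu
    interaction = [
        [cell_means[i][j] - mu - alpha[i] - beta[j] for j in range(b)]
        for i in range(a)
    ]
    return mu, alpha, beta, interaction


# Hypothetical cell means: 2 alloys (rows) x 4 heat settings (columns).
strength = [
    [20.0, 24.0, 27.0, 29.0],  # low alloy
    [22.0, 25.0, 31.0, 36.0],  # high alloy
]

mu, alpha, beta, interaction = two_way_effects(strength)
print("overall mean:", mu)
print("alloy effects:", alpha)
print("heat effects:", beta)
print("interaction effects:", interaction)
```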

Two-Way ANOVA with Interactions: Scenario: Using a Two-Way ANOVA. One response variable and two predictor variables, each with several levels.

Two-Way ANOVA with Interactions: Scenario: Using a Two-Way ANOVA. Response variable: blood pressure. Predictor variables: disease type (A, B, C) and drug dose (100 ml, 200 ml, 300 ml, placebo).

Two-Way ANOVA with Interactions: Identify the data: Drug.

Two-Way ANOVA with Interactions: Applying the model. Two-way ANOVA assumptions: independent observations, normally distributed data, and equal variances.

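Two-Way ANOVA with Interactions: The Two-Way ANOVA Model
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where $Y_{ijk}$ is the observed BloodP for each patient, $\mu$ is the overall mean of BloodP, $\alpha_i = \mu_{i\cdot} - \mu$ and $\beta_j = \mu_{\cdot j} - \mu$ are the effects of disease type and drug dose, $(\alpha\beta)_{ij}$ is the effect of the interaction, and $\varepsilon_{ijk}$ is the error term.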

Two-Way ANOVA with Interactions: The Two-Way ANOVA Model. For the overall model, H0: none of the effects are statistically different from zero. What are the null hypotheses for the individual effects?

Two-Way ANOVA with Interactions: Examining Your Data
/* Create format, Method I, via EG UI: start from a data set of dose labels */
data drugdose;
   input dose $8. level;
   cards;
Placebo 1
50 mg   2
100 mg  3
200 mg  4
;
run;

/* Method II, by code */
proc format library=work;
   value dosefmt 1='Placebo'
                 2='50 mg'
                 3='100 mg'
                 4='200 mg';
run;

Two-Way ANOVA with Interactions: Examining Your Data
In which disease type does the drug dose appear to be most effective?

Two-Way ANOVA with Interactions: Performing Two-Way ANOVA with Interactions
Select Tasks > ANOVA > Linear Models.

Two-Way ANOVA with Interactions: Performing Two-Way ANOVA with Interactions
Type I sums of squares depend on the order of effects in the model: each effect is adjusted only for the effects that precede it. For this reason they are also known as sequential sums of squares. They are useful when the marginal contribution of adding terms in a specific order is important. One example is a test of polynomials, where X, X*X, and X*X*X are in the model and each term is tested controlling only for the lower-order terms. Type I sums of squares are additive: they sum to the model sum of squares for the overall model.
Type III sums of squares are commonly called partial sums of squares. The type III sum of squares for a particular variable is the increase in the model sum of squares due to adding that variable to a model that already contains all the other variables in the model. Therefore, type III sums of squares do not depend on the order in which the explanatory variables are specified in the model. They are generally not additive (except in a completely balanced design), so they do not necessarily sum to the model SS. You generally interpret and report results based on the type III SS.
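To make the order dependence concrete, the Python sketch below fits least-squares models to a small, hypothetical, unbalanced 2 x 2 data set. It computes sequential (type I) sums of squares in both orders and partial sums of squares for each factor. The helper names `solve` and `model_ss` and the 0/1 indicator coding are assumptions made for the illustration. For simplicity the model contains main effects only. In that case the type III SS for an effect equals the drop-one partial SS. With an interaction term in the model, SAS defines type III tests through estimable functions, which this sketch does not reproduce.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    m = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def model_ss(columns, y):
    """Model SS (about the mean) of a least-squares fit of y on an intercept plus columns."""
    rows = [[1.0] + [col[k] for col in columns] for k in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    ybar = sum(y) / len(y)
    return sum((sum(c * x for c, x in zip(coef, r)) - ybar) ** 2 for r in rows)


# Hypothetical unbalanced data; a and b are 0/1 indicator codes for two factors.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 0, 1, 0, 1, 1, 1]
y = [10, 11, 12, 15, 13, 18, 19, 20]

ss_a, ss_b, ss_ab = model_ss([a], y), model_ss([b], y), model_ss([a, b], y)
type1_a_first = (ss_a, ss_ab - ss_a)   # SS(A), SS(B | A)
type1_b_first = (ss_b, ss_ab - ss_b)   # SS(B), SS(A | B)
type3 = (ss_ab - ss_b, ss_ab - ss_a)   # SS(A | B), SS(B | A)
print("Type I, A then B:", [round(v, 2) for v in type1_a_first])
print("Type I, B then A:", [round(v, 2) for v in type1_b_first])
print("Type III (A, B): ", [round(v, 2) for v in type3], "model SS:", round(ss_ab, 2))
```

On these numbers, A accounts for 60.5 when it enters first but only 13.5 after adjusting for B. Both type I orderings sum to the model SS of 98. The two partial (type III) values sum to only 51, because the unbalanced cell counts make the factors partly confounded.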

Two-Way ANOVA with Interactions: Performing a Post Hoc Pairwise Comparison

Two-Way ANOVA with Interactions: Performing a Post Hoc Pairwise Comparison
Given all of this information, it seems you would want to treat blood pressure aggressively in people with disease A, using a high dose of the drug. For those with disease B, treating with the drug at all would be a mistake. For those with disease C, the drug seems to have no effect on blood pressure.
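The pairwise comparisons behind conclusions like these are typically Tukey-adjusted. In SAS, this adjustment is requested through an LSMEANS statement with the ADJUST=TUKEY option. The Python sketch below shows the Tukey-Kramer decision rule for a set of group (or cell) means. It does not compute the studentized range critical value itself. The caller must supply `q_crit` for k groups and the error degrees of freedom, for example from a table. The function name, the inputs, and the value 3.67 are hypothetical placeholders, not values looked up for any particular k and df.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer(means, ns, mse, q_crit):
    """Tukey-Kramer pairwise comparisons of group (or LS) means.

    q_crit: upper-alpha critical value of the studentized range distribution for
    k = len(means) groups and the error df, supplied by the caller.
    Returns (i, j, difference, minimum significant difference, significant?) tuples.
    """
    results = []
    for i, j in combinations(range(len(means)), 2):
        diff = means[i] - means[j]
        msd = q_crit * sqrt(mse / 2 * (1 / ns[i] + 1 / ns[j]))
        results.append((i, j, diff, msd, abs(diff) > msd))
    return results


# Hypothetical inputs; 3.67 is a placeholder critical value.
for i, j, diff, msd, sig in tukey_kramer([20.0, 25.0, 21.0], [6, 6, 6], mse=4.0, q_crit=3.67):
    print(f"group {i} vs {j}: diff = {diff:+.1f}, MSD = {msd:.2f}, significant = {sig}")
```

With a significant interaction, the comparisons of interest are among the disease-by-dose cell means rather than the marginal means of either factor.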

Agenda
0. Lesson overview
1. Two-Sample t-Tests
2. One-Way ANOVA
3. ANOVA with Data from a Randomized Block Design
4. ANOVA Post Hoc Tests
5. Two-Way ANOVA with Interactions
6. Summary

Question 9. If you want to compare the average monthly spending for males versus females, which statistical method should you choose?
a. One-Sample t-Tests
b. One-Way ANOVA
c. Two-Way ANOVA
Answer: b
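One reason a one-way ANOVA fits this two-group question is that, with exactly two groups, its F statistic equals the square of the pooled (equal-variance) two-sample t statistic. The two tests therefore give the same p-value. The Python sketch below checks this identity on hypothetical spending figures. The function names `pooled_t` and `one_way_f` are illustrative only.

```python
from math import sqrt


def pooled_t(x, y):
    """Two-sample t statistic with a pooled (equal-variance) standard error."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)
    return (mx - my) / sqrt(sp2 * (1 / nx + 1 / ny))


def one_way_f(groups):
    """One-way ANOVA F ratio: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical monthly spending for two groups.
males = [120, 135, 150, 110]
females = [140, 160, 155, 170, 165]
t = pooled_t(males, females)
f = one_way_f([males, females])
print(f"t^2 = {t * t:.4f}, F = {f:.4f}")
```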

Home Work: Exercise 1
1.1 Using the t Test for Comparing Groups
Elli Sageman, a Master of Education candidate in German Education at the University of North Carolina at Chapel Hill in 2000, collected data for a study of the effectiveness of a new type of foreign-language teaching technique on grammar skills. She selected 30 students to receive tutoring: 15 received the new type of training during the tutorials and 15 received standard tutoring. Two students moved away from the district before completing the study. Scores on a standardized German grammar test were recorded immediately before the 12-week tutorials and again 12 weeks later, at the end of the trial. Sageman wanted to see the effect of the new technique on grammar skills. The data are in the GERMAN data set, which contains these variables:
Change: change in grammar test scores
Group: the assigned treatment, coded Treatment and Control
Assess whether the Treatment group changed the same amount as the Control group. Use a two-sided t-test.
Analyze the data using the t Test task.
Assess whether the Treatment group improved more than the Control group.
Do the two groups appear to be approximately normally distributed?
Do the two groups have approximately equal variances?
Does the new teaching technique seem to result in significantly different change scores compared with the standard technique?

Home Work: Exercise 2
2.1 Analyzing Data in a Completely Randomized Design
Consider an experiment to study four types of advertising: local newspaper ads, local radio ads, in-store salespeople, and in-store displays. The country is divided into 144 locations, and 36 locations are randomly assigned to each type of advertising. The level of sales is measured for each location in thousands of dollars. You want to see whether the average sales are significantly different for the various types of advertising. The Ads data set contains data for these variables:
Ad: type of advertising
Sales: level of sales in thousands of dollars
Examine the data using the Summary Statistics task. What information can you obtain from looking at the data?
Test the hypothesis that the means are equal. Be sure to check that the assumptions of the analysis method that you choose are met. What conclusions can you reach at this point in your analysis?

Home Work: Exercise 3
3.1 Analyzing Data in a Randomized Block Design
When designing the advertising experiment in Exercise 2, you are concerned that there is variability caused by the area of the country. You are not particularly interested in what differences are caused by Area, but you are interested in isolating the variability due to this factor. The ads1 data set contains data for the following variables:
Ad: type of advertising
Area: area of the country
Sales: level of sales in thousands of dollars
Test the hypothesis that the means are equal. Include all of the variables in your model. What can you conclude from your analysis?
Was adding the blocking variable Area into the design and analysis detrimental to the test of Ad?

Home Work: Exercise 4
4.1 Post Hoc Pairwise Comparisons
Consider again the analysis of the Ads1 data set. There was a statistically significant difference among the mean sales for the different types of advertising. Perform a post hoc test to look at the individual differences among means for the advertising campaigns.
Conduct pairwise comparisons with an experimentwise error rate of α = 0.05 (use the Tukey adjustment). Which types of advertising are significantly different?
Use display (case sensitive) as the control group and do a Dunnett comparison of all other advertising methods to see whether those methods resulted in significantly different amounts of sales compared with display advertising in stores.

Home Work: Exercise 5
5.1 Performing Two-Way ANOVA
Consider an experiment to test three different brands of concrete and see whether an additive makes the cement in the concrete stronger. Thirty test plots are poured, and the following features are recorded in the Concrete data set:
Strength: the measured strength of a concrete test plot
Additive: whether an additive was used in the test plot
Brand: the brand of concrete being tested
Use the Summary Statistics task to examine the data, with Strength as the analysis variable and Additive and Brand as the classification variables. What information can you obtain from looking at the data?
Test the hypothesis that the means are equal, making sure to include an interaction term if the results from the Summary Statistics output indicate that it would be advisable. What conclusions can you reach at this point in your analysis?
Perform the appropriate multiple comparisons test for statistically significant effects.

Thank you!