Analysis of Variance (ANOVA) Shibin Liu SAS Beijing R&D
Agenda 0. Lesson overview 1. Two-Sample t-Tests 2. One-Way ANOVA 3. ANOVA with Data from a Randomized Block Design 4. ANOVA Post Hoc Tests 5. Two-Way ANOVA with Interactions 6. Summary 2
Lesson overview One sample t-Test μ μ0 4
Lesson overview Two-sample t-Test ? μ1 ≠ μ2 5
Lesson overview ANOVA ? ? μ1 ≠ μ2 ≠ μ3 6
Lesson overview ANOVA Predictor Variable Response Variable + Levels 7
Lesson overview ANOVA: one response variable with one predictor variable, or one response variable with two predictor variables 8
Lesson overview One sample t-Test Two-sample t-Test ANOVA 9
Lesson overview: What do you want to examine?
The location, spread, and shape of the data's distribution (descriptive statistics). Summary statistics or graphics? Summary statistics: SUMMARY STATISTICS (descriptive statistics). Both: DISTRIBUTION ANALYSIS (descriptive statistics, histogram, normal probability plots). Lesson 1.
The difference between groups on one or more variables (inferential statistics). How many groups? Two: TTEST. Two or more: LINEAR MODELS (analysis of variance). Lesson 2.
The relationship between variables (inferential statistics). Which kind of variables? Continuous only: CORRELATIONS and LINEAR REGRESSION (Lessons 3 & 4). Categorical response variable: ONE-WAY FREQUENCIES & TABLE ANALYSIS (frequency tables, chi-square test) and LOGISTIC REGRESSION (Lesson 5). 10
The Two-Sample t-Test: Introduction ? μ1 ≠ μ2 12
The Two-Sample t-Test: Introduction ? Salary Blood pressure Salary ? Blood pressure? … … 13
The Two-Sample t-Test: Introduction In this topic, we will learn to do the following: analyze differences between two population means using the t Test task; verify the assumptions of, and perform, a two-sample t-test; perform a one-sided t-test. 14
The Two-Sample t-Test: Introduction Ha: μ1 ≠ μ2 Ha: μ1 - μ2 ≠ 0 15
Two-Sample t-Tests: Assumptions Before starting the analysis, examine the data to verify that the statistical assumptions are valid: Independent observations (no observation provides information about, or has an impact on, any other observation). Normally distributed data for each group. Equal variances for each group (σ1² = σ2²). If one of the assumptions is not valid and no adjustments are made, the probability of drawing incorrect conclusions from the analysis could increase. 16
Two-Sample t-Tests: F-Test for Equality of Variance H0: σ1² = σ2² H1: σ1² ≠ σ2² F statistic: $F = \max(s_1^2, s_2^2) / \min(s_1^2, s_2^2)$. This test is valid only for independent samples from normal distributions; normality is required even for large sample sizes. If you reject the null hypothesis, it is recommended that you use the unequal-variance t-test to test the equality of the means. When the null hypothesis is true, what value will the F statistic be close to? F ≅ 1 or F = a large value? 17
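The folded F ratio on this slide can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `folded_f` and the two score lists are invented for the example and are not taken from the TestScores data. When the two sample variances are similar, the ratio lands near 1.

```python
from statistics import variance


def folded_f(sample1, sample2):
    """Return (F, numerator df, denominator df) for the folded F ratio.

    F = max(s1^2, s2^2) / min(s1^2, s2^2), so F is always >= 1.
    """
    v1, v2 = variance(sample1), variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


# Hypothetical scores, invented for illustration only.
group_a = [510, 540, 580, 600, 620, 650]
group_b = [480, 530, 560, 610, 640, 700]

f_stat, df_num, df_den = folded_f(group_a, group_b)
print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

The p-value reported by the t Test task comes from comparing this ratio with an F distribution; that step is left out here because it needs a statistics library.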
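Two-Sample t-Tests: Examining the Equal Variance t-Test and p-Values F-test for equal variance, H0: σ1² = σ2²: the p-value is 0.7446 > 0.05, so do not reject H0 and read the equal-variance t-test. 18
Two-Sample t-Tests: Examining the Equal Variance t-Test and p-Values With σ1² = σ2², the equal-variance t-test of H0: μ1 - μ2 = 0 against Ha: μ1 - μ2 ≠ 0 gives a p-value of 0.0003 < 0.05, so reject H0. 19
Two-Sample t-Tests: Examining the Unequal Variance t-Test and p-Values F-test for equal variance, H0: σ1² = σ2²: the p-value is 0.0185 < 0.05, so reject H0 and read the unequal-variance t-test. 20
Two-Sample t-Tests: Examining the Unequal Variance t-Test and p-Values With σ1² ≠ σ2², the unequal-variance t-test of H0: μ1 - μ2 = 0 against Ha: μ1 - μ2 ≠ 0 gives a p-value of 0.0320 < 0.05, so reject H0. 21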
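To make the difference between the two t-tests concrete, here is a minimal Python sketch of both statistics. It assumes the standard pooled formula for the equal-variance test and the Welch statistic with Satterthwaite degrees of freedom for the unequal-variance test; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Equal-variance t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Unequal-variance t statistic with Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    a, b = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

t_pooled, df_pooled = pooled_t(x, y)
t_welch, df_welch = welch_t(x, y)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}")
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ, which is why the two tests often agree closely on balanced data.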
Two-Sample t-Tests: Demo Scenario: compare the mean SAT scores of two groups, girls and boys. Identify the data: TestScores. Classification variable? Continuous variable to analyze? 22
Two-Sample t-Tests: Demo Analyze > ANOVA > t Test 23
Two-Sample t-Tests: Demo Result interpreting: Normal? 24
Two-Sample t-Tests: Demo Result interpreting Normal? 25
Two-Sample t-Tests: Demo Result interpreting H0: μ1 - μ2 =0 H0: 𝜎1 2 = 𝜎2 2 26
Two-Sample t-Tests: Demo Result interpreting The confidence interval for the mean difference (-3.6950, 125.2) includes 0. This implies that you cannot say with 95% confidence that the difference between boys and girls is not zero. It also implies that the p-value is greater than 0.05. 27
Two-Sample t-Tests: One-Sided Tests Two-sided test: Ha: μ1 ≠ μ2. One-sided test: Ha: μ1 > μ2 (for example, when a new drug can only have a positive effect) or Ha: μ1 − μ2 < 0. 28
Two-Sample t-Tests: One-Sided Tests H0: μ1 ≤ μ2 H0: μ1 ≥ μ2 One direction One direction <0 29
Two-Sample t-Tests: One-Sided Tests H0: μ1 ≥ μ2 (tested at equality) Ha: μ1 < μ2 Power is the probability that you reject the null hypothesis when it is false. Probability of correct rejection = 1 − β = power. Type II error: failing to reject the null hypothesis when H0 is false. Probability of Type II error = β. Critical region 30
Two-Sample t-Tests: One-Sided Tests H0: μ1 ≥ μ2 (tested at equality) Ha: μ1 < μ2 What is power? Power is the probability that your test rejects the null hypothesis when the null hypothesis is false, that is, the probability that you detect a difference when the difference actually exists. Probability of correct rejection = 1 − β = power. Type II error: failing to reject the null hypothesis when H0 is false. Probability of Type II error = β. 31
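The relationship power = 1 − β can be illustrated numerically. The sketch below is a simplification, not the calculation the t Test task performs: it assumes a one-sided, upper-tailed z-test with a known common standard deviation and equal group sizes. The function name and all inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power for H0: mu1 - mu2 <= 0 versus Ha: mu1 - mu2 > 0.

    Assumes a z-test with known sigma (a simplification of the t-test).
    delta is the true difference mu1 - mu2.
    """
    z = NormalDist()
    se = sigma * sqrt(2 / n_per_group)   # standard error of the mean difference
    z_crit = z.inv_cdf(1 - alpha)        # edge of the critical region
    return 1 - z.cdf(z_crit - delta / se)


# Hypothetical inputs, invented for illustration only.
power = power_upper_tailed(delta=50, sigma=100, n_per_group=40)
beta = 1 - power  # probability of a Type II error
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

When the true difference is zero, the function returns alpha, because rejecting a true null hypothesis is a Type I error rather than a correct rejection.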
Two-Sample t-Tests: One-Sided Tests/Scenario one-sided upper-tailed t-test > SAT Scores SAT Scores Group 1 Group 2 H0: μ1 −μ2≤0 Ha: μ1 −μ2 >0 32
Two-Sample t-Tests: One-Sided Tests/Scenario one-sided upper-tailed t-test t statistic H0: μ1 −μ2≤0 Ha: μ1 −μ2 >0 33
Two-Sample t-Tests: One-Sided Tests/Scenario Analyze > ANOVA > t Test > Two sample > Preview Code> Insert code 34
Two-Sample t-Tests: One-Sided Tests/Scenario The p-value is 0.0321 < 0.05, and 0 is not in the confidence interval. 35
Two-Sample t-Tests Question 1. What justifies the choice of a one-sided test versus a two-sided test? a) The need for more statistical power b) Theoretical and subject-matter considerations c) The non-significance of a two-sided test d) The need for an unbiased test statistic Answer: b 36
Two-Sample t-Tests Question 2. A professor suspects her class is performing below the department average of 73%. She decides to test this claim. Which of the following is the correct alternative hypothesis? a) μ < 0.73 b) μ > 0.73 c) μ ≠ 0.73 Answer: a. Because the professor suspects that her class is performing below the department average, this is a lower-tailed test, so the alternative hypothesis is μ < 0.73. 37
One-Way ANOVA: Introduction ? ? μ1 ≠ μ2 ≠ μ3 39
One-Way ANOVA: Objective Analyze differences between population means using the Linear Models task. Verify the assumptions of analysis of variance. 40
One-Way ANOVA: ANOVA overview Response Variable Predictor Variable + Levels One-way ANOVA has only one predictor variable. 41
One-Way ANOVA: ANOVA overview Case 1 Do programmers earn more than teachers? Response variable: salary. Predictor variable: job title (Teacher, Programmer). 42
One-Way ANOVA: ANOVA overview Case 1 Do programmers earn more than teachers? With one predictor variable at two levels (Teacher, Programmer), a two-sample t-test applies. 43
One-Way ANOVA: ANOVA overview Case 1 Do programmers earn more than teachers? With two groups (Teacher, Programmer), the one-way ANOVA F statistic equals the square of the equal-variance two-sample t statistic: F = t². 44
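The link between the one-way ANOVA F statistic and the two-sample t statistic can be checked numerically. The sketch below uses only the Python standard library; the salary figures and function names are invented for the example, and the t statistic is the equal-variance (pooled) version.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Equal-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def one_way_f(groups):
    """One-way ANOVA F statistic: model mean square over error mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ssm = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_model = len(groups) - 1
    df_error = len(all_values) - len(groups)
    return (ssm / df_model) / (sse / df_error)


# Hypothetical salaries (in thousands), invented for illustration only.
teachers = [42, 45, 47, 50, 52]
programmers = [55, 58, 60, 66, 70, 71]

t_stat = pooled_t(teachers, programmers)
f_stat = one_way_f([teachers, programmers])
print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The two printed values agree up to floating-point rounding, whether or not the groups have the same size.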
One-Way ANOVA: ANOVA overview Case 2 Response variable: T-cell counts (a T cell is a thymus-dependent lymphocyte). Predictor variable levels: Medication 1, Medication 2, Placebo. 45
One-Way ANOVA: ANOVA overview Case 2 Two-sample t-tests compare T-cell counts for one pair of groups at a time: Medication 1 and Placebo, Medication 2 and Medication 1, Medication 2 and Placebo. 46
One-Way ANOVA: ANOVA overview Case 2 One-way ANOVA Medication 1 Placebo Medication 2 47
One-Way ANOVA: The ANOVA Hypothesis Small differences: ANOVA helps check whether the differences are large enough to reject the hypothesis that the population means are the same. H0: μ1 = μ2 = μ3 = μ4 48
One-Way ANOVA: The ANOVA Hypothesis Medication 1 Predictor Variable Placebo Medication 2 49
One-Way ANOVA: The ANOVA Hypothesis Ha: μ1 = μ2 ≠ μ3 Medication 1 Predictor Variable Placebo Medication 2 different 50
One-Way ANOVA: The ANOVA model $Y_{ik} = \mu + \tau_i + \varepsilon_{ik}$, where $i$ indexes treatments (three types of medication), $k$ is the observation number, $\mu$ is the overall population mean, $\tau_i = \mu_i - \mu$ is the effect of treatment $i$ ($\tau_1 = \mu_1 - \mu$, $\tau_2 = \mu_2 - \mu$, $\tau_3 = \mu_3 - \mu$), and $\varepsilon_{ik}$ is the error term. 51
One-Way ANOVA: Sums of Squares H0: μ1 = μ2 = μ3 ANOVA compares the variability between groups with the variability within groups by taking their ratio. 52
One-Way ANOVA: Sums of Squares SST Total Sum of Squares SSM Model Sum of Squares SSE Error Sum of Squares 53
One-Way ANOVA: Sums of Squares Total variability = variability between groups + variability within groups: SST = SSM + SSE. Total Sum of Squares: $SST = \sum_i \sum_j (Y_{ij} - \bar{Y})^2$. Model Sum of Squares: $SSM = \sum_i n_i (\bar{Y}_i - \bar{Y})^2$. Error Sum of Squares: $SSE = \sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2$. 54
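The decomposition SST = SSM + SSE can be verified directly from the three definitions. The following Python sketch uses only the standard library; the function name and the three small groups are made up for illustration.

```python
from statistics import mean


def sums_of_squares(groups):
    """Return (SST, SSM, SSE) for a list of groups of observations."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    # Total: every observation against the grand mean.
    sst = sum((y - grand_mean) ** 2 for y in all_values)
    # Model: each group mean against the grand mean, weighted by group size.
    ssm = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error: every observation against its own group mean.
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return sst, ssm, sse


# Hypothetical data, invented for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
sst, ssm, sse = sums_of_squares(groups)
print(f"SST = {sst:.2f}, SSM = {ssm:.2f}, SSE = {sse:.2f}")
```

Here the group means (2, 3, 6) are far apart relative to the spread inside each group, so most of the total variability ends up in SSM.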
One-Way ANOVA: F statistic F statistic and critical value at α = 0.05. $F(\text{Model df}, \text{Error df}) = MSM / MSE = (SSM / df_M) / (SSE / df_E)$, where MSM is the Model Mean Square and MSE is the Error Mean Square. In general, degrees of freedom (DF) can be thought of as the number of independent pieces of information. Model DF is the number of treatments minus 1. Corrected total DF is the sample size minus 1. Error DF is the sample size minus the number of treatments (or the difference between the corrected total DF and the model DF). 55
One-Way ANOVA: Coefficient of Determination $R^2 = SSM / SST$: the “proportion of variance accounted for by the model”. 56
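Given the sums of squares, the rest of the ANOVA table follows by arithmetic. The sketch below shows that arithmetic, including the coefficient of determination computed as SSM / SST; the function name `anova_table` and the input numbers are hypothetical.

```python
def anova_table(ssm, sse, n_obs, n_groups):
    """Build the one-way ANOVA quantities from the model and error sums of squares."""
    df_model = n_groups - 1          # number of treatments minus 1
    df_error = n_obs - n_groups      # sample size minus number of treatments
    df_total = n_obs - 1             # corrected total
    msm = ssm / df_model
    mse = sse / df_error
    return {
        "df_model": df_model,
        "df_error": df_error,
        "df_total": df_total,
        "MSM": msm,
        "MSE": mse,
        "F": msm / mse,
        "R2": ssm / (ssm + sse),     # SST = SSM + SSE
    }


# Hypothetical sums of squares: 9 observations in 3 groups.
table = anova_table(ssm=26, sse=6, n_obs=9, n_groups=3)
for name, value in table.items():
    print(f"{name}: {value}")
```

The F value is then compared with the critical value of an F distribution with (model DF, error DF) degrees of freedom, a lookup that is not reproduced here.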
One-Way ANOVA: Assumptions for ANOVA Independent observations. Error terms are normally distributed. Error terms have equal variances. Assessing ANOVA assumptions: Good data collection methods help ensure the independence assumption. Diagnostic plots can be used to verify the assumption that the error is approximately normally distributed. The Linear Models task can produce a test of equal variances; H0 for this test is that the variances are equal for all populations. 57
One-Way ANOVA: Predicted and Residual values Residuals are estimates of the error term: residual = observation − group mean. 58
One-Way ANOVA: Predicted and Residual values The predicted value in ANOVA is the group mean. A residual is the difference between the observed value of the response variable and its predicted value. 59
One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA Garlic Response variable: Bulb Weight. Predictor variable: Fertilizer (three fertilizers + control). 60
One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA ANOVA H0: μ1 = μ2 = μ3 = μ4 Ha: at least one is different 61
One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA Descriptive check of MGGARLIC > Summary Statistics Descriptive check of MGGARLIC across groups 62
One-Way ANOVA: Question 1. You have 20 observations in your ANOVA and you calculate the residuals. What will they sum to? -20 20 400 Need more information Answer: d 63
One-Way ANOVA: Question 2. Which of the following phrases describes the model sum of squares, or SSM? a) The variability between the groups b) The variability within the groups c) The variability explained by the error terms Answer: a 64
One-Way ANOVA: Question 3. Match the null hypothesis to the correct SAS output. a) b H0: σ1² = σ2² b) a H0: μ1 = μ2 65
One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA Task> ANOVA>Linear Models, with MGGARLIC data 66
One-Way ANOVA: Scenario: Comparing Group Means with One-Way ANOVA Task > ANOVA > Linear Models Three assumptions: independence, normality, equal variance. One-way ANOVA can say that at least one mean is different, but it cannot say which specific group is different. A post hoc test can. 67
One-Way ANOVA: Summary Null Hypothesis: All means are equal Alternative Hypothesis: at least one mean is different Produce descriptive statistics. Verify assumptions. Independence Normality Equal variance Examine the p-value in the ANOVA table. If the p-value is less than alpha, reject the null hypothesis. 68
ANOVA with Data from a Randomized Block Design: Introduction Brands age Placebo Medication 1 Medication 2 70
ANOVA with Data from a Randomized Block Design: Introduction Over 50 30-50 Under 30 We are not interested in age, but we want to minimize its impact. 71
ANOVA with Data from a Randomized Block Design: Objective Recognize the difference between a completely randomized design and a randomized block design. Differentiate between observed data and designed experiments. Use the Linear Models task to analyze data from a randomized block design 72
ANOVA with Data from a Randomized Block Design: Observational Studies Groups can be naturally occurring: gender and ethnicity. Random assignment might be unethical or untenable: smoking or credit risk groups. In observational or retrospective studies, the data values are observed as they occur, not affected by an experimental design. 73
ANOVA with Data from a Randomized Block Design: Controlled Experiments Random assignment might be desirable to eliminate selection bias. You often want to look at the outcome measure prospectively. You can manipulate the factors of interest and can more reasonably claim causation. You can design your experiment to control for other factors contributing to the outcome measure. 74
ANOVA with Data from a Randomized Block Design Question 3. Can you determine a cause-and-effect relationship in an observational study? a) Yes b) No Answer: b. In an observational study, you often examine what already occurred, and therefore have little control over factors contributing to the outcome. In a controlled experiment, you can manipulate the factors of interest and can more reasonably claim causation. 75
ANOVA with Data from a Randomized Block Design: Nuisance Factors Nuisance factors are factors that can affect the outcome but are not of interest in the experiment. Randomized block design: T-Cell Count = Medication + Age. 76
ANOVA with Data from a Randomized Block Design Question 4. Which part of the ANOVA table contains the variation due to nuisance factors? a) Sum of Squares Model b) Sum of Squares Error c) Degrees of Freedom Answer: b 77
ANOVA with Data from a Randomized Block Design: Including a Blocking Variable in the Model Age 78
ANOVA with Data from a Randomized Block Design: Including a Blocking Variable in the Model + Age 79
ANOVA with Data from a Randomized Block Design: More ANOVA Assumptions Independent observations. Normally distributed data. Equal variances. The treatments are randomly assigned within each block (Under 30, 30-50, Over 50). No interaction: the treatment effects don't change across the blocks. (Interaction: the treatment effects change across the blocks.) 80
ANOVA with Data from a Randomized Block Design Scenario: Creating a Randomized Block Design Garlic H0: μ1 = μ2 = μ3 = μ4 Ha: at least one is different What are the nuisance factors in this case? Sun, pH level of the soil, rain 81
ANOVA with Data from a Randomized Block Design Scenario: Creating a Randomized Block Design Bulb Weight = Fertilizer + Sun + pH of the soil + Rain 82
ANOVA with Data from a Randomized Block Design Scenario: Creating a Randomized Block Design Randomized block design 83
ANOVA with Data from a Randomized Block Design Scenario: Creating a Randomized Block Design 84
ANOVA with Data from a Randomized Block Design Question 5. In a block design, which part of the ANOVA table contains the variation due to nuisance factors? a) Sum of Squares Model b) Sum of Squares Error c) Degrees of Freedom Answer: a 85
ANOVA with Data from a Randomized Block Design: Performing ANOVA with Blocking Task> ANOVA>Linear Models, with MGGARLIC_BLOCK data 86
ANOVA with Data from a Randomized Block Design: Performing ANOVA with Blocking Task > ANOVA > Linear Models, with MGGARLIC_BLOCK data The F value of Sector, the blocking variable, is 6.53, which is greater than 1. Rule of thumb: if the F value of the blocking variable is greater than 1, the blocking variable should be considered worth including in the model. 87
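To see where the F value of a blocking variable comes from, here is a minimal Python sketch of the sums of squares for a balanced randomized block design with one observation per treatment-block cell. That layout is an assumption made for the example, and the table of values is invented; it is not the MGGARLIC_BLOCK data.

```python
from statistics import mean


def rcbd_sums_of_squares(table):
    """table[i][j] = response for treatment i in block j (one value per cell).

    Returns (SS treatment, SS block, SS error, SS total).
    """
    n_treat, n_block = len(table), len(table[0])
    grand_mean = mean(v for row in table for v in row)
    treat_means = [mean(row) for row in table]
    block_means = [mean(row[j] for row in table) for j in range(n_block)]
    ss_treat = n_block * sum((m - grand_mean) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand_mean) ** 2 for m in block_means)
    ss_total = sum((v - grand_mean) ** 2 for row in table for v in row)
    # Whatever treatments and blocks do not explain is left in the error.
    ss_error = ss_total - ss_treat - ss_block
    return ss_treat, ss_block, ss_error, ss_total


# Hypothetical responses: 3 treatments (rows) by 3 blocks (columns).
table = [[10, 12, 15],
         [11, 14, 15],
         [13, 15, 20]]

ss_treat, ss_block, ss_error, ss_total = rcbd_sums_of_squares(table)
df_block = len(table[0]) - 1
df_error = (len(table) - 1) * (len(table[0]) - 1)
f_block = (ss_block / df_block) / (ss_error / df_error)
print(f"SS block = {ss_block:.2f}, SS error = {ss_error:.2f}, F(block) = {f_block:.2f}")
```

Without the blocking variable, the block sum of squares would stay in the error term (SS total minus SS treatment), which inflates the error mean square and makes the treatment effect harder to detect.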
ANOVA with Data from a Randomized Block Design My groups are different. What next? The p-value for Fertilizer indicates that you should reject the H0 that all groups are the same. For which pairs of fertilizers do garlic bulb weights differ from one another? Should you go back and do several t-tests? 88
ANOVA Post Hoc Tests: Introduction μ1 One-way ANOVA μ2 μ? Randomized block design μ3 90
ANOVA Post Hoc Tests: Introduction μ1 Pairwise test H0: μ1 = μ2 P-value μ? Type I error μ2 Pairwise test H0: μ1 = μ3 P-value Pairwise test H0: μ2 = μ3 P-value μ3 ANOVA Post Hoc Tests Type I error Multiple Comparison Method 91
ANOVA Post Hoc Tests: Multiple Comparison Methods Question 7. With a fair coin, your probability of getting heads on one flip is 0.5. If you flip a coin and get heads, what is the probability of getting heads on the second try? a) 0.5 b) 0.25 c) 0.00 d) 1.00 e) 0.75 Answer: a 92
ANOVA Post Hoc Tests: Multiple Comparison Methods Question 8. With a fair coin, your probability of getting heads on one flip is 0.5. If you flip a coin twice, what is the probability of getting at least one head out of two? a) 0.5 b) 0.25 c) 0.00 d) 1.00 e) 0.75 Answer: e 93
ANOVA Post Hoc Tests: Multiple Comparison Methods Pairwise t-test Type I Error H0: μ1 = μ2 H0: μ1 = μ3 α=0.05 H0: μ2 = μ3 94
ANOVA Post Hoc Tests: Multiple Comparison Methods Comparisonwise Error Rate (CER): the probability of a Type I error on a single pairwise t-test. The Experimentwise Error Rate (EER) uses an alpha that takes into consideration all the pairwise comparisons you are making: $EER = 1 - (1 - \alpha)^{nc}$, where nc is the number of comparisons. With three pairwise t-tests, you can reject one of the three null hypotheses just by chance, even when the null is true (Type I error).
Comparisonwise Error Rate   Number of Comparisons   Experimentwise Error Rate
0.05                        1                       0.05
0.05                        3                       0.14
0.05                        6                       0.26
0.05                        10                      0.40
95
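The experimentwise error rates in the table can be reproduced from the formula EER = 1 − (1 − α)^nc. The short Python sketch below does this for 2 to 5 groups; note that the formula treats the comparisons as independent, which is a simplifying assumption, and that the function name is made up for the example.

```python
from math import comb


def experimentwise_error_rate(alpha, n_comparisons):
    """EER = 1 - (1 - alpha)^nc, treating the comparisons as independent."""
    return 1 - (1 - alpha) ** n_comparisons


alpha = 0.05
rates = {}
for n_groups in (2, 3, 4, 5):
    nc = comb(n_groups, 2)  # number of pairwise comparisons among the groups
    rates[nc] = round(experimentwise_error_rate(alpha, nc), 2)
    print(f"{n_groups} groups -> {nc} comparisons -> EER = {rates[nc]:.2f}")
```

The number of pairwise comparisons grows quickly with the number of groups (1, 3, 6, 10), which is why unadjusted pairwise t-tests become unreliable as groups are added.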
ANOVA Post Hoc Tests: Tukey's Multiple Comparison Method Tukey Method EER H0: μ1 = μ2 H0: μ1 = μ3 Pairwise comparisons H0: μ2 = μ3 96
ANOVA Post Hoc Tests: Tukey's Multiple Comparison Method Tukey Method EER=0.05 EER<0.05 H0: μ1 = μ2 H0: μ1 = μ3 Pairwise comparisons 97
ANOVA Post Hoc Tests: Tukey's Multiple Comparison Method This method is appropriate when you consider pairwise comparisons only. The Experimentwise Error Rate is: Equal to alpha when all pairwise comparisons are considered Less than alpha when fewer than all pairwise comparisons are considered 98
ANOVA Post Hoc Tests: scenario: determine which mean is different Garlic Ha: at least one is different Fertilizers: three organics, one control 99
ANOVA Post Hoc Tests: Diffograms and the Tukey Method Plot labels: difference between the means, least squares means, equality of the means. Each downward-sloping diagonal line shows the confidence interval for the difference between a pair of means. The upward-sloping line is a reference line showing where the group means would be equal. An intersection of the downward-sloping line for a pair with the upward-sloping, broken gray diagonal line implies that the confidence interval includes zero and that the mean difference between the two groups is not statistically significant. 100
ANOVA Post Hoc Tests: Diffograms and the Tukey Method Is there a difference between treatments 1 and 2? Can you identify the pairwise comparisons whose means are not significantly different? A downward-sloping line that crosses the upward-sloping reference line marks a confidence interval that includes zero, so that pair of means is not significantly different. 101
ANOVA Post Hoc Tests: Dunnett's Multiple Comparison Method Special case of comparing to a control. Comparing to a control is appropriate when there is a natural reference group, such as a placebo group in a drug trial. The experimentwise error rate is no greater than the stated alpha. Comparing to a control takes into account the correlations among tests. One-sided hypothesis tests against a control group can be performed. Control comparison computes and tests k-1 groupwise differences, where k is the number of levels of the classification variable. An example is the Dunnett method. Dunnett's multiple comparison method is recommended when there is a true control group. When appropriate, it is more powerful than methods that control for all possible comparisons. 102
ANOVA Post Hoc Tests: Control Plots and the Dunnett Method LS-mean control plots are produced only when you specify that you want to compare all other group means against a control group mean. The value of the control group mean is shown as a horizontal line. The shaded area is bounded by the upper decision limit (UDL) and the lower decision limit (LDL). If a vertical line extends past the shaded area, the group represented by that line is significantly different from the control group. 103
ANOVA Post Hoc Tests: Performing a Post Hoc Tests 104
ANOVA Post Hoc Tests: Performing a Post Hoc Test: Tukey 105
ANOVA Post Hoc Tests: Performing a Post Hoc Tests: Dunnett 106
ANOVA Post Hoc Tests: Performing a Post Hoc Tests: t-test 107
Two-Way ANOVA with Interactions: Introduction One-way ANOVA ? ? μ1 ≠ μ2 ≠ μ3 109
Two-Way ANOVA with Interactions: Introduction One-way ANOVA Predictor Variable Response Variable + Levels 110
Two-Way ANOVA with Interactions: Introduction Response Variable Predictor Variable Predictor Variable + Levels 111
Two-Way ANOVA with Interactions: Introduction High alloy Low alloy Heat 1 Heat 2 Heat 3 Heat 4 112
Two-Way ANOVA with Interactions: Objective Fit a two-way ANOVA Detect interactions between factors Analyze the treatments when there is a significant interaction 113
Two-Way ANOVA with Interactions: n-Way ANOVA One-way ANOVA: one response variable and one predictor variable. N-way ANOVA: one response variable and more than one predictor variable. 114
Two-Way ANOVA with Interactions: n-Way ANOVA ? Randomized block design ≈ N-way ANOVA blocking factor interested factor 115
Two-Way ANOVA with Interactions: interactions No interaction 116
Two-Way ANOVA with Interactions: interactions Alloys Heat setting Interactions When you analyze an n-way ANOVA with interactions, you should first look at any tests for interactions among factors. If there is no interaction between the factors, the tests for the individual factor effects can be interpreted as true effects of that factor. If an interaction exists between any factors, the tests for the individual factor effects might be misleading, due to masking of the effects by the interaction. This is especially true for unbalanced data. 117
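Two-Way ANOVA with Interactions: The Two-Way ANOVA Model $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$, where $\mu$ is the overall population mean, regardless of alloy and heating; $\alpha_i = \mu_i - \mu$ and $\beta_j = \mu_j - \mu$ are the effects of the two factors; $(\alpha\beta)_{ij}$ is the effect of the interaction (dropped when non-significant); and $\varepsilon_{ijk}$ is the error term. 118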
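The terms of the two-way model can be estimated from a table of cell means. The Python sketch below assumes a balanced design, so that plain averages of cell means give the factor-level means; the function name and the alloy-by-heat numbers are invented for illustration.

```python
from statistics import mean


def two_way_effects(cell_means):
    """cell_means[i][j] = mean response at level i of factor A and level j of factor B.

    Returns (mu, alpha, beta, interaction) for a balanced design.
    """
    n_a, n_b = len(cell_means), len(cell_means[0])
    mu = mean(m for row in cell_means for m in row)
    alpha = [mean(row) - mu for row in cell_means]
    beta = [mean(row[j] for row in cell_means) - mu for j in range(n_b)]
    # Interaction: what is left of each cell mean after the additive parts.
    interaction = [
        [cell_means[i][j] - mu - alpha[i] - beta[j] for j in range(n_b)]
        for i in range(n_a)
    ]
    return mu, alpha, beta, interaction


# Hypothetical cell means: 2 alloys (rows) by 4 heat settings (columns).
cell_means = [[10, 12, 14, 16],
              [11, 15, 13, 22]]

mu, alpha, beta, interaction = two_way_effects(cell_means)
print("overall mean:", mu)
print("row effects:", alpha)
print("column effects:", beta)
print("interaction effects:", interaction)
```

If every interaction term is zero, the lines in an interaction plot are parallel and the model reduces to the additive form, with main effects only.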
Two-Way ANOVA with Interactions: Scenario: Using a Two-Way ANOVA Response Variable Predictor Variable Predictor Variable Levels Levels 119
Two-Way ANOVA with Interactions: Scenario: Using a Two-Way ANOVA Response Variable Predictor Variable Predictor Variable Blood pressure Disease types Drug doses A, B, C 100ml, 200ml, 300ml, placebo 120
Two-Way ANOVA with Interactions: Identify the data Drug 121
Two-Way ANOVA with Interactions: Applying the model Two-way ANOVA assumptions: Independent observations Normally distributed data Equal variances 122
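Two-Way ANOVA with Interactions: The Two-Way ANOVA Model $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$, where $Y_{ijk}$ is the observed BloodP for each patient, $\mu$ is the overall mean of BloodP, $\alpha_i = \mu_i - \mu$ and $\beta_j = \mu_j - \mu$ are the effects of the two factors, $(\alpha\beta)_{ij}$ is the effect of the interaction, and $\varepsilon_{ijk}$ is the error term. 123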
Two-Way ANOVA with Interactions: The Two-Way ANOVA Model H0: None are statistically Different H0: ? 124
Two-Way ANOVA with Interactions: Examining Your Data
/* Create the format, Method I: build a lookup data set for the EG UI */
data drugdose;
   input dose $8. level;   /* columns 1-8 hold the label, then the numeric level */
   cards;
Placebo 1
50 mg   2
100 mg  3
200 mg  4
;
run;
/* Method II: create the format by code */
proc format library=work;
   value dosefmt 1='Placebo'
                 2='50 mg'
                 3='100 mg'
                 4='200 mg';
run;
Two-Way ANOVA with Interactions: Examining Your Data In which disease type does the drug dose appear to be most effective? 126
Two-Way ANOVA with Interactions: Performing Two-Way ANOVA with Interactions Task> ANOVA> Linear Models 127
Two-Way ANOVA with Interactions: Performing Two-Way ANOVA with Interactions The Type I SS are model-order dependent. Each effect is adjusted only for the preceding effects in the model. They are known as sequential sums of squares. They are useful in cases where the marginal effect of adding terms in a specific order is important. An example is a test of polynomials, where X, X*X, and X*X*X are in the model. Each term is tested controlling only for the lower-order terms. The Type I SS are additive: they sum to the Model Sum of Squares for the overall model. The Type III sums of squares are commonly called partial sums of squares. The Type III sum of squares for a particular variable is the increase in the model sum of squares due to adding the variable to a model that already contains all the other variables in the model. The Type III sums of squares, therefore, do not depend on the order in which the explanatory variables are specified in the model. The Type III sums of squares are not generally additive (except in a completely balanced design): the values do not necessarily sum to the Model SS. You generally interpret and report results based on the Type III SS. 128
Two-Way ANOVA with Interactions: Performing a Post Hoc Pairwise Comparison 129
Two-Way ANOVA with Interactions: Performing a Post Hoc Pairwise Comparison Given all of this information, it seems you would want to aggressively treat blood pressure in people with disease A with high dose of the drug. For those with disease B, treating with the drug at all would be a mistake. For those with disease C, there seems to be no effect on blood pressure. 130
Question 9. If you want to compare the average monthly spending for males versus females, which statistical method should you choose? a) One-Sample t-Tests b) One-Way ANOVA c) Two-Way ANOVA Answer: b 132
Home Work: Exercise 1 1.1 Using the t Test for Comparing Groups Elli Sageman, a Master of Education candidate in German Education at the University of North Carolina at Chapel Hill in 2000, collected data for a study: she looked at the effectiveness of a new type of foreign language teaching technique on grammar skills. She selected 30 students to receive tutoring; 15 received the new type of training during the tutorials and 15 received standard tutoring. Two students moved away from the district before completing the study. Scores on a standardized German grammar test were recorded immediately before the 12-week tutorials and then again 12 weeks later at the end of the trial. Sageman wanted to see the effect of the new technique on grammar skills. The data are in the GERMAN data set. Variables: Change (change in grammar test scores); Group (the assigned treatment, coded Treatment and Control). Analyze the data using the t Test task. Assess whether the Treatment group changed the same amount as the Control group, using a two-sided t-test. Assess whether the Treatment group improved more than the Control group. Do the two groups appear to be approximately normally distributed? Do the two groups have approximately equal variance? Does the new teaching technique seem to result in significantly different change scores compared with the standard technique? 133
Home Work: Exercise 2 2.1 Analyzing Data in a Completely Randomized Design Consider an experiment to study four types of advertising: local newspaper ads, local radio ads, in-store salespeople, and in-store displays. The country is divided into 144 locations, and 36 locations are randomly assigned to each type of advertising. The level of sales is measured for each region in thousands of dollars. You want to see whether the average sales are significantly different for various types of advertising. The Ads data set contains data for these variables: Ad (type of advertising); Sales (level of sales in thousands of dollars). Examine the data. Use the Summary Statistics task. What information can you obtain from looking at the data? Test the hypothesis that the means are equal. Be sure to check that the assumptions of the analysis method that you choose are met. What conclusions can you reach at this point in your analysis? 134
Home Work: Exercise 3 3.1 Analyzing Data in a Randomized Block Design When you design the advertising experiment in the previous exercise, you are concerned that there is variability caused by the area of the country. You are not particularly interested in what differences are caused by Area, but you are interested in isolating the variability due to this factor. The Ads1 data set contains data for the following variables: Ad (type of advertising); Area (area of the country); Sales (level of sales in thousands of dollars). Test the hypothesis that the means are equal. Include all of the variables in your model. What can you conclude from your analysis? Was adding the blocking variable Area into the design and analysis detrimental to the test of Ad? 135
Home Work: Exercise 4 4.1 Post Hoc Pairwise Comparisons Consider again the analysis of the Ads1 data set. There was a statistically significant difference among means for sales for the different types of advertising. Perform a post hoc test to look at the individual differences among means for the advertising campaigns. Conduct pairwise comparisons with an experimentwise error rate of α = 0.05 (use the Tukey adjustment). Which types of advertising are significantly different? Use display (case sensitive) as the control group and do a Dunnett comparison of all other advertising methods to see whether those methods resulted in significantly different amounts of sales compared with display advertising in stores. 136
Home Work: Exercise 5 5.1 Performing Two-Way ANOVA Consider an experiment to test three different brands of concrete and see whether an additive makes the cement in the concrete stronger. Thirty test plots are poured and the following features are recorded in the Concrete data set: Strength (the measured strength of a concrete test plot); Additive (whether an additive was used in the test plot); Brand (the brand of concrete being tested). Use the Summary Statistics task to examine the data, with Strength as the analysis variable and Additive and Brand as the classification variables. What information can you obtain from looking at the data? Test the hypothesis that the means are equal, making sure to include an interaction term if the results from the Summary Statistics output indicate that would be advisable. What conclusions can you reach at this point in your analysis? Do the appropriate multiple comparisons test for statistically significant effects. 137
Thank you!