4 In words:
- TSS (total SS) = total sample variability among the $y_{ij}$ values
- SSB (SS “between”) = variability explained by differences in group means
- SSW (SS “within”) = unexplained variability (within groups)
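The three quantities can be written out as follows. This is a sketch under assumed index notation: $t$ groups, $n_i$ observations in group $i$, group means $\bar{y}_{i\cdot}$, and overall mean $\bar{y}_{\cdot\cdot}$.

```latex
\begin{align*}
\text{TSS} &= \sum_{i=1}^{t}\sum_{j=1}^{n_i} \left(y_{ij}-\bar{y}_{\cdot\cdot}\right)^2 \\
\text{SSB} &= \sum_{i=1}^{t} n_i \left(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot}\right)^2 \\
\text{SSW} &= \sum_{i=1}^{t}\sum_{j=1}^{n_i} \left(y_{ij}-\bar{y}_{i\cdot}\right)^2 \\
\text{TSS} &= \text{SSB} + \text{SSW}
\end{align*}
```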
5 Analysis of Variance Table Note: unequal sample sizes allowed
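The entries of the table follow from the sums of squares. A sketch, assuming $t$ groups and a total sample size $n_T = \sum_i n_i$ (the symbol $n_T$ is an assumption):

```latex
\begin{align*}
df_{\text{between}} &= t-1, \qquad df_{\text{within}} = n_T - t, \qquad df_{\text{total}} = n_T - 1 \\
\text{MSB} &= \frac{\text{SSB}}{t-1}, \qquad \text{MSW} = \frac{\text{SSW}}{n_T - t}, \qquad F = \frac{\text{MSB}}{\text{MSW}}
\end{align*}
```

The group sizes $n_i$ enter only through $n_T$ and the sums of squares, which is why unequal sample sizes are allowed.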
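6 Extracted from Ex. 8.2, pages 390-391: 3 Methods for Reducing Hostility
12 students displaying similar hostility were randomly assigned to 3 treatment methods. Scores (HLT) at the end of the study were recorded.
[Data table: scores listed by Method 1, Method 2, Method 3]
Test: $H_0: \mu_1 = \mu_2 = \mu_3$ vs. $H_a$: not all three means are equal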
7 ANOVA Table Output – extracted hostility data (calculations done in class)
Source            SS    df    MS    F    p-value
Between samples                          <.001
Within
Totals
8 Fisher's Least Significant Difference (LSD)
- Protected LSD: preceded by an F-test for overall significance. Only use the LSD if F is significant.
- Unprotected LSD (marked with an X on the slide): not preceded by an F-test (like individual t-tests).
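For reference, the LSD for comparing groups $i$ and $j$ can be sketched as below. It assumes MSW is the within-sample mean square from the ANOVA table and $df_W$ its degrees of freedom (both symbols are assumptions).

```latex
\begin{align*}
\text{LSD}_{ij} &= t_{\alpha/2,\,df_W}\,\sqrt{\text{MSW}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)} \\
\text{declare } \mu_i \neq \mu_j \quad &\text{if} \quad \left|\bar{y}_{i\cdot}-\bar{y}_{j\cdot}\right| \ge \text{LSD}_{ij} \\
\text{equal sample sizes } (n_i = n_j = n): \quad \text{LSD} &= t_{\alpha/2,\,df_W}\,\sqrt{\frac{2\,\text{MSW}}{n}}
\end{align*}
```

With equal sample sizes a single LSD value serves every pair. With unequal sample sizes each pair has its own $\text{LSD}_{ij}$.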
9 Hostility Data - Completely Randomized Design
The GLM Procedure
t Tests (LSD) for score
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.
Alpha
Error Degrees of Freedom
Error Mean Square
Critical Value of t
Least Significant Difference
Means with the same letter are not significantly different.
t Grouping    Mean    N    method
A
B
B
B
10 Ex. 8.2: 3 Methods for Reducing Hostility
24 students displaying similar hostility were randomly assigned to 3 treatment methods. Scores (HLT) at the end of the study were recorded.
[Data table: scores listed by Method 1, Method 2, Method 3]
Notice unequal sample sizes
Test: $H_0: \mu_1 = \mu_2 = \mu_3$ vs. $H_a$: not all three means are equal
11 ANOVA Table Output – full hostility data
Source            SS    df    MS    F    p-value
Between samples                          <.0001
Within
Totals
12 The GLM Procedure
t Tests (LSD) for score
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.
Alpha
Error Degrees of Freedom
Error Mean Square
Critical Value of t
Comparisons significant at the 0.05 level are indicated by ***.
method Comparison    Difference Between Means    95% Confidence Limits
                                                                          ***
                                                                          ***
                                                                          ***
                                                                          ***
                                                                          ***
                                                                          ***
Notice the different format: with unequal sample sizes there is no single LSD value with which to make all pairwise comparisons.
13 Duncan's Multiple Range Test for score
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.
Alpha
Error Degrees of Freedom
Error Mean Square
Harmonic Mean of Cell Sizes
NOTE: Cell sizes are not equal.
Number of Means
Critical Range
Means with the same letter are not significantly different.
Duncan Grouping    Mean    N    method
A
B
C
Note: Duncan's test (another multiple comparison test) avoids the issue of different sample sizes by using the harmonic mean of the $n_i$'s.
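The harmonic mean of the cell sizes can be sketched as follows, assuming $t$ groups of sizes $n_1, \dots, n_t$. The sizes in the numeric line are hypothetical, chosen only to make the arithmetic easy; they are not the hostility sample sizes.

```latex
\begin{align*}
\tilde{n} &= \frac{t}{\sum_{i=1}^{t} 1/n_i} \\
\text{hypothetical } (n_1,n_2,n_3) = (4,6,12): \quad
\tilde{n} &= \frac{3}{\tfrac{1}{4}+\tfrac{1}{6}+\tfrac{1}{12}} = \frac{3}{0.5} = 6
\end{align*}
```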
14 Some Multiple Comparison Techniques in SAS
- FISHER'S LSD (LSD)
- BONFERRONI (BON)
- DUNCAN
- STUDENT-NEWMAN-KEULS (SNK)
- DUNNETT
- RYAN-EINOT-GABRIEL-WELSCH (REGWQ)
- SCHEFFE
- TUKEY
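In PROC GLM these techniques are requested as options on the MEANS statement. The sketch below assumes a data set named hostility (a hypothetical name) with the variables method and score that appear in the output above. The control level '1' in the DUNNETT option is also an assumption.

```sas
/* Hypothetical data set name: hostility (not given in the slides) */
proc glm data=hostility;
  class method;
  model score = method;
  /* several multiple-comparison procedures can be requested at once */
  means method / lsd bon duncan snk regwq scheffe tukey;
  /* Dunnett compares every level with one control level;
     using level '1' as the control is an assumption */
  means method / dunnett('1');
run;
quit;
```

Each option produces its own block of output, so the groupings from different procedures can be compared side by side.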
15 Balloon Data
Col. 1-2 - observation number
Col - color (1=pink, 2=yellow, 3=orange, 4=blue)
Col - inflation time in seconds
 1 1 22.4
 2 3 24.6
 3 1 20.3
 4 4 19.8
 5 3 24.3
 6 2 22.2
 7 2 28.5
 8 2 25.7
 9 3 20.2
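A minimal sketch of reading these records into SAS. The data set name balloon is an assumption. The variable names id, color, and time are the ones used in the SAS code later in the deck. Only the nine records visible on the slide are included; any remaining records would go before the closing semicolon.

```sas
/* Assumed data set name: balloon; variables: id (run order), color, time */
data balloon;
  input id color time;   /* list input: values separated by blanks */
  datalines;
1 1 22.4
2 3 24.6
3 1 20.3
4 4 19.8
5 3 24.3
6 2 22.2
7 2 28.5
8 2 25.7
9 3 20.2
;
run;

/* Check that the records were read as intended */
proc print data=balloon;
run;
```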
17 ANOVA --- Balloon Data
General Linear Models Procedure
Dependent Variable: TIME
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model
Error
Corrected Total
R-Square    C.V.    Root MSE    TIME Mean
Source    DF    Type I SS    Mean Square    F Value    Pr > F
Color
18 ANOVA --- Balloon Data
The GLM Procedure
t Tests (LSD) for time
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.
Alpha
Error Degrees of Freedom
Error Mean Square
Critical Value of t
Least Significant Difference
Means with the same letter are not significantly different.
t Grouping    Mean    N    color
A
A
A
B
B
B
19 Experimental Design: Concepts and Terminology
Designed Experiment
- an investigation in which a specified framework is used to compare groups or treatments
Factors
- any feature of the experiment that can be varied from trial to trial
- up to this point we've only looked at experiments with a single factor
20 Treatments
- conditions constructed from the factors (levels of the factor considered, etc.)
Experimental Units
- subjects, material, etc. to which treatment factors are randomly assigned
- there is inherent variability among these units irrespective of the treatment imposed
Replication
- we usually assign each treatment to several experimental units
- these are called replicates
21 Examples: for each data set, identify
1. factor
2. treatments
3. experimental units
4. replicates
Data sets: Car Data, Hostility Data, Balloon Data
22 Question: Balloon Data
Col - observation number (run order)
Col - color (1=pink, 2=yellow, 3=orange, 4=blue)
Col - inflation time in seconds
 1 1 22.4
 2 3 24.6
 3 1 20.3
 4 4 19.8
 5 3 24.3
 6 2 22.2
 7 2 28.5
 8 2 25.7
 9 3 20.2
Question: Why randomize run order? I.e., why not blow up all the pink balloons first, the blue balloons next, etc.?
23 Scatterplot Using GPLOT
[Figure: scatterplot of Time (vertical axis) vs. Run Order (horizontal axis)]
What do we learn from this plot?
24 RECALL: 1-Factor ANOVA Model
$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
- random errors follow a Normal (N) distribution, are independently distributed (ID), and have zero mean and constant variance: $\varepsilon_{ij} \sim \text{NID}(0, \sigma^2)$
-- i.e. variability does not change from group to group
25 Checking Validity of Assumptions
Model Assumptions:
- equal variances
- normality
Equal Variances
1. F-test similar to 2-sample case
- Hartley's test (p.366 text)
- not recommended
2. Graphical
- side-by-side box plots
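For reference, Hartley's statistic is the ratio of the largest to the smallest sample variance. A sketch, with $s_i^2$ denoting the sample variance of group $i$ (notation assumed):

```latex
F_{\max} = \frac{\max_i s_i^2}{\min_i s_i^2}
```

Large values of $F_{\max}$ suggest unequal variances. The test is usually described as requiring (nearly) equal sample sizes and as sensitive to departures from normality, which is consistent with the "not recommended" note above.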
26 Graphical Assessment of Equal Variance Assumption
27 Note: Optional approaches if the equal variance assumption is violated:
1. Use the Kruskal-Wallis nonparametric procedure – Section 8.6
2. Transform the data to induce more nearly equal variances – Section 8.5
-- log
-- square root
Note: These transformations may also help induce normality
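Both options can be sketched in SAS for the balloon data. The data set name balloon and the new names balloon2, logtime, and sqrttime are assumptions. The variables color and time are the ones used elsewhere in the deck.

```sas
/* Assumes a data set 'balloon' with variables color and time */
data balloon2;
  set balloon;
  logtime  = log(time);    /* natural log; requires time > 0  */
  sqrttime = sqrt(time);   /* square root; requires time >= 0 */
run;

/* One-way ANOVA on the log scale */
proc glm data=balloon2;
  class color;
  model logtime = color;
run;
quit;

/* Kruskal-Wallis test: the WILCOXON option produces it
   when the CLASS variable has more than two levels */
proc npar1way data=balloon wilcoxon;
  class color;
  var time;
run;
```

After transforming, the box plots and residual checks should be repeated on the transformed response to see whether the spread is in fact more nearly equal.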
28 Assessing Normality of Errors
$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
so
$\varepsilon_{ij} = y_{ij} - (\mu + \alpha_i) = y_{ij} - \mu_i$
$\varepsilon_{ij}$ is estimated by $e_{ij} = y_{ij} - \bar{y}_{i\cdot}$
The $e_{ij}$'s are called residuals.
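29 SAS Code for Balloon Data
/* One-way ANOVA; residuals are saved as RESBALL in data set NEW */
proc glm;
  class color;
  model time=color;
  title 'ANOVA --- Balloon Data';
  output out=new r=resball;
  means color/lsd;
run;
/* Box plots need the data sorted by the grouping variable */
proc sort; by color;
proc boxplot;
  plot time*color;
  title 'Side-by-Side Box Plots for Balloon Data';
/* Histogram of the residuals with a fitted normal curve */
proc univariate;
  var resball;
  histogram resball/normal;
  title 'Histogram of Residuals -- Balloon Data';
/* Normality tests and normal probability plot; VAR added so only the residuals are analyzed */
proc univariate normal plot;
  var resball;
  title 'Normal Probability Plot for Residuals - Balloon Data';
/* Inflation time against run order */
proc gplot;
  plot time*id;
  title 'Scatterplot of Time vs ID (Run Order)';
run;
quit;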
30 Normal Probability Plot
[Figure: SAS line-printer normal probability plot of the residuals]
31 Caution: Chapter 15 introduces some new notation - i.e. changes notation already defined
32 Recall: Sum-of-Squares Identity, 1-Factor ANOVA
In words: Total SS = SS between samples + SS within samples
33-35 Recall: Sum-of-Squares Identity, 1-Factor ANOVA
- new notation for Chapter 15
In words: Total SS = SS for “treatments” + SS for “error”
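In symbols, the Chapter 15 version renames SSB as SST and SSW as SSE. A sketch, with index notation assumed: $t$ treatments, $n_i$ observations on treatment $i$, treatment means $\bar{y}_{i\cdot}$, and overall mean $\bar{y}_{\cdot\cdot}$.

```latex
\begin{align*}
\text{TSS} &= \text{SST} + \text{SSE} \\
\text{SST} &= \sum_{i=1}^{t} n_i\left(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot}\right)^2
  \quad (\text{formerly SSB}) \\
\text{SSE} &= \sum_{i=1}^{t}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{i\cdot}\right)^2
  \quad (\text{formerly SSW})
\end{align*}
```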
36 Revised ANOVA Table for 1-Factor ANOVA (Ch. 15 terminology - p.857)
Source        SS     df       MS                   F
Treatments    SST    t - 1    MST = SST/(t - 1)    MST/MSE
Error         SSE    N - t    MSE = SSE/(N - t)
Total         TSS    N - 1
37 Recall 1-factor ANOVA (CRD) Model for Gasoline Octane Data
$y_{ij} = \mu_i + \varepsilon_{ij}$ or $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
- $y_{ij}$: observed octane
- $\mu_i$: mean for ith gasoline
- $\varepsilon_{ij}$: unexplained part
-- car-to-car differences
-- temperature
-- etc.
38 Gasoline Octane Data
Question: What if car differences are obscuring gasoline differences?
Similar to diet t-test example:
Recall: person-to-person differences obscured the effect of diet
39 Possible Alternative Design for Octane Study:
Test all 5 gasolines on the same car
- in essence we test the gasoline effect directly and remove the effect of car-to-car variation
Question: How would you randomize an experiment with 4 cars?
40 Blocking an Experiment
- dividing the observations into groups (called blocks) where the observations in each block are collected under relatively similar conditions
- comparisons can often be made more precisely this way
41 Terminology is based on Agricultural Experiments
Consider the problem of testing fertilizers on a crop
- t fertilizers
- n observations on each
43 Randomized Complete Block Strategy
t = 3 fertilizers
- select 5 “blocks”
- randomly assign the 3 treatments to each block
A | C | B
B | A | C
C | B | A
A | B | C
C | A | B
Note: The 3 “plots” within each block are similar
- similar soil type, sun, water, etc.
44 Randomized Complete Block Design
Randomly assign each treatment once to every block
Car Example
- Car 1: randomly assign each gas to this car
- Car 2: etc.
Agricultural Example
- Randomly assign each fertilizer to one of the 3 plots within each block
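One way to generate such a randomization in SAS is PROC PLAN. The sketch below is for the car example (4 cars, 5 gasolines). The seed value is arbitrary, and the gas levels 1-5 are assumed to stand for gasolines A-E.

```sas
/* Randomize the order of 5 gasolines separately within each of 4 cars.
   The seed is arbitrary; it only makes the plan reproducible. */
proc plan seed=20231;
  /* cars are listed in order 1-4; within each car the
     gas levels 1-5 are given in a random order */
  factors car=4 ordered gas=5;
run;
```

Each row of the printed plan gives the order in which the five gasolines would be tested on that car, so every gasoline appears exactly once per block.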
45 Model For Randomized Complete Block (RCB) Design
$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
- $\alpha_i$: effect of ith treatment (gasoline)
- $\beta_j$: effect of jth block (car)
- $\varepsilon_{ij}$: unexplained error
-- temperature
-- etc.
46 Previous Data Table from Chapter 8 for 1-factor ANOVA
- column averages don't make any sense
47 Back to Octane data:
Suppose that instead of 20 cars, there were only 4 cars, and we tested each gasoline on each car.
[Tables: “Old Data Format” with one column per gas (A-E); “Restructured” Data with one row per car and one column per gas (A-E)]
48 Recall: Sum-of-Squares Identity, 1-Factor ANOVA
- using new notation for Chapter 15
In words: Total SS = SS for “treatments” + SS for “error”
49 A New Sum-of-Squares Identity
In words: Total SS = SS for treatments + SS for blocks + SS for error
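In symbols, for $t$ treatments each appearing once in each of $b$ blocks. This is a sketch with assumed notation: $\bar{y}_{i\cdot}$ for treatment means, $\bar{y}_{\cdot j}$ for block means, and $\bar{y}_{\cdot\cdot}$ for the overall mean.

```latex
\begin{align*}
\text{TSS} &= \sum_{i=1}^{t}\sum_{j=1}^{b}\left(y_{ij}-\bar{y}_{\cdot\cdot}\right)^2
            = \text{SST} + \text{SSB} + \text{SSE} \\
\text{SST} &= b\sum_{i=1}^{t}\left(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot}\right)^2 \\
\text{SSB} &= t\sum_{j=1}^{b}\left(\bar{y}_{\cdot j}-\bar{y}_{\cdot\cdot}\right)^2 \\
\text{SSE} &= \sum_{i=1}^{t}\sum_{j=1}^{b}
              \left(y_{ij}-\bar{y}_{i\cdot}-\bar{y}_{\cdot j}+\bar{y}_{\cdot\cdot}\right)^2
\end{align*}
```

Note that SSB now denotes the sum of squares for blocks, not the "between samples" sum of squares of Chapter 8.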
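50 Hypotheses:
To test for treatment effects (i.e. gas differences) we test
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_t = 0$ vs. $H_a$: at least one $\alpha_i \neq 0$
To test for block effects (i.e. car differences; not usually the research hypothesis) we test
$H_0: \beta_1 = \beta_2 = \cdots = \beta_b = 0$ vs. $H_a$: at least one $\beta_j \neq 0$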
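51 Randomized Complete Block Design ANOVA Table
Source        SS     df                MS                            F
Treatments    SST    t - 1             MST = SST/(t - 1)             MST/MSE
Blocks        SSB    b - 1             MSB = SSB/(b - 1)             MSB/MSE
Error         SSE    (b - 1)(t - 1)    MSE = SSE/[(b - 1)(t - 1)]
Total         TSS    bt - 1
See page 866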
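As a worked check of the degrees of freedom, take the restructured car example with $t = 5$ gasolines and $b = 4$ cars:

```latex
\begin{align*}
df_{\text{treatments}} &= t-1 = 4 \\
df_{\text{blocks}} &= b-1 = 3 \\
df_{\text{error}} &= (b-1)(t-1) = 3 \cdot 4 = 12 \\
df_{\text{total}} &= bt-1 = 19 = 4 + 3 + 12
\end{align*}
```

Compared with a completely randomized design on the same 20 observations (error df $= N - t = 15$), blocking moves 3 degrees of freedom from error to blocks. This pays off when the block sum of squares removes enough car-to-car variability from SSE.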
54 “Restructured” CAR Data - SAS Format
[Data listing: 20 records, four per gas, each giving gas (A - E), block (B1 - B4), and octane]
The first variable (A - E) indicates gas, as it did with the Completely Randomized Design. The second variable (B1 - B4) indicates car.
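55 SAS file - Randomized Complete Block Design for CAR Data
/* INPUT belongs to the DATA step that reads the 20 records (gas, block, octane) */
INPUT gas$ block$ octane;
/* Both gas (treatment) and block (car) are classification variables */
PROC GLM;
  CLASS gas block;
  MODEL octane=gas block;
  TITLE 'Gasoline Example - Randomized Complete Block Design';
  /* Pairwise comparisons among the gasoline means */
  MEANS gas/LSD;
RUN;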
56 1-Factor ANOVA Table Output - octane data
Source              SS    df    MS    F    p-value
Gas (treatments)
Error
Totals
57 RCB ANOVA Table Output - car data
Source              SS    df    MS    F    p-value
Gas (treatments)
Cars (blocks)
Error
Totals
58 SAS Output -- RCB CAR Data
Dependent Variable: OCTANE
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model
Error
Corrected Total
R-Square    C.V.    Root MSE    OCTANE Mean
Source    DF    Anova SS    Mean Square    F Value    Pr > F
GAS
BLOCK