1 Topic 12 – Further Topics in ANOVA: Unequal Cell Sizes (Chapter 20)
2 Overview: We’ll start with the Learning Activity: more practice in interpreting ANOVA results, a baby step into 3-way ANOVA, and an illustration of the problems that an unbalanced design will cause. We’ll then continue with a discussion of unbalanced designs (Chapter 20).
3 Collaborative Learning Activity: Take your time going through this. Ask questions as needed!
4 Question 1: Analyze the design elements.
5 Design Chart: Unequal cell sizes, but there is SOME balance achieved. Single-factor analyses will be balanced. Gender*Age = 6 observations per cell. Time*Age = 6 observations per cell. Gender*Time = unbalanced.
8 Interpretations: No interaction is evident between age and time. The middle age group seems to get generally higher offers. Offers during the week seem to be generally higher than on the weekend (this effect is not as big as the age effect).
16 Interpretations: A small interaction is seen; it might be described as follows. There is still a clear main effect: the middle-aged get higher offers in general. There seem to be no gender differences for the middle-aged or the young. For the elderly, women may be getting lower offers than men.
18 ANOVA / LSMeans: Only age differences show up in the ANOVA. “Sliced” LSMeans comparisons do pick up a gender difference within the elderly group. Note: the Type I error rate is uncontrolled, but on the other hand the sample sizes are also fairly small. Conclusions?
21 Interpretations: There seems to be a clear interaction. For men, there is not much difference in the offer between weekday and weekend. Women should go on weekdays, when it seems they average about $400 more. Interestingly, significance is not seen in the ANOVA table, but it is seen in the “sliced” LSMeans output. Remember that the Type I error rate is uncontrolled.
22 ANOVA Table: Why are the Type I and Type III SS different here?
24 Conclusions: This is an intriguing example, because the ANOVA output would lead you to believe there is a small time effect but no gender effect. The interaction plot presents a completely different picture (and likely a more accurate one). Let’s reconsider that, showing the sample sizes.
26 Confounding: This picture illustrates how the effects of gender and time will be confounded. Suppose that women do get lower offers than men in general. Then, because the women received more weekend offers (and the men more weekday offers), the average offer on the weekend will by default be lower than on weekdays. Simple example: suppose men get $2 and women get $1. Then with the sample sizes, the weekday average will be 30/18 (about $1.67) while the weekend average will be only 24/18 (about $1.33).
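The arithmetic in the simple example can be checked with a short sketch. It is written in Python rather than the SAS used in the course, and it assumes cell counts of 12 men and 6 women on weekdays and 6 men and 12 women on weekends. Those counts are not stated in this transcript; they are the ones implied by the totals 30/18 and 24/18.

```python
from fractions import Fraction

# Assumed cell counts (implied by the 30/18 and 24/18 totals; not given in the transcript).
counts = {
    ("weekday", "men"): 12, ("weekday", "women"): 6,
    ("weekend", "men"): 6, ("weekend", "women"): 12,
}

# Offers depend on gender only, so there is no true time effect.
offer = {"men": 2, "women": 1}

def time_average(time):
    """Average offer for one time period, pooling over gender."""
    total = sum(n * offer[g] for (t, g), n in counts.items() if t == time)
    n_obs = sum(n for (t, g), n in counts.items() if t == time)
    return Fraction(total, n_obs)

weekday = time_average("weekday")  # 30/18, reduced to 5/3
weekend = time_average("weekend")  # 24/18, reduced to 4/3
print(weekday, weekend)
```

Even though the offer depends only on gender, the pooled weekday average comes out higher than the weekend average, which is the apparent "time effect" the slide warns about.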
28 Modeling: Removing unimportant terms (starting at the interaction level) seems like a reasonable way to go. Use Type III SS to do this, since the cell sizes are not the same. The procedure leads to a model containing only age and time, suggesting that gender is unimportant. But we know this may not be accurate, since gender and time are confounded.
29 Confounding: What exactly does it mean to say that the time and gender effects are confounded? The biggest thing it means is that the analysis we just did is inappropriate, since the time effect may have been seen because more women went on the weekend. It may well be a gender effect that is disguised as a time effect due to the unbalanced design. Due to the lack of balance, we were forced to use Type III SS, which (due to collinearity / confounding) may not tell the whole story.
30 Importance of Gender? Probably! Direct algorithmic analysis suggests that both time and age are important, while gender is not. But due to confounding, that analysis wasn’t really appropriate. The plot for time*gender indicates what is probably the real story (with such small sample sizes it is hard to get significance). With a balanced design we would be much better off: the effects would not be confounded, and we could therefore see an accurate picture.
31 Importance of Gender? (2): Differing sample sizes mean that estimates for women on weekdays, and for men on weekends, will have larger standard errors. This will reduce our power to detect differences, and the effects will “overlap” to some extent because of the unequal sample sizes. When we looked at the gender*time interaction, the plot suggested there was an important one. Further studies should be conducted to determine whether this is the case.
33 Differing Cell Sizes: Encountered for a variety of reasons. Convenience: usually, if we have an observational study, we have very little control over the cell sizes. Cost effectiveness: sometimes the cost of samples is different, and we may use larger sample sizes when the cost is less. Accident: in experimental studies, you may start with a balanced design but lose that balance if some problem occurs.
34 Differing Cell Sizes (2): What changes? Loss of balance brings “intercorrelation” among the predictors. Type I and Type III SS will be different; typically Type III SS should be used for testing, but as we have seen even that is not perfect! Standard errors for cell means and for multiple comparisons will differ from cell to cell (they depend on the cell size). For the same reason, confidence intervals will have different widths.
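One way to see the “intercorrelation” is to code two two-level factors as 0/1 indicators and compute their correlation from the cell counts. The Python sketch below does this; the function name and both sets of counts are hypothetical, chosen only to contrast a balanced layout with an unbalanced one.

```python
from math import sqrt

def indicator_correlation(n11, n10, n01, n00):
    """Pearson correlation between two 0/1 factor indicators, from 2x2 cell counts.

    n11: both indicators are 1; n10: first is 1, second is 0;
    n01: first is 0, second is 1; n00: both are 0.
    """
    numerator = n11 * n00 - n10 * n01
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return numerator / denominator

# Hypothetical balanced design: 9 observations in every cell.
balanced = indicator_correlation(9, 9, 9, 9)

# Hypothetical unbalanced design: 12 / 6 / 6 / 12.
unbalanced = indicator_correlation(12, 6, 6, 12)

print(balanced, unbalanced)
```

With equal cell sizes the two indicators are uncorrelated; with the unbalanced counts the correlation is 1/3, so part of what one factor explains can also be attributed to the other.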
35 Example: Examine the effects of gender (A) and anxiety level (B) on a toxin level in the bloodstream. There are three categories of anxiety (severe, moderate, and mild). We categorize people on this basis after they are in the study (it is an observational factor). For cost effectiveness, we wouldn’t want to throw away data just to keep a balanced design.
38 Interpretation: The effect seems to be greater if anxiety is more severe. This is an interaction of the “enhancement type”: the effect of anxiety level on toxin levels is greater for women than it is for men. Remember, we aren’t saying anything about significance here; we’ll do that when we look at the ANOVA.
41 Differences in Type I / III SS: The more unbalanced the design, the further apart these may be. There are actually four types of SS: I (Sequential), II (Added Last, Observation), III (Added Last, Cell), and IV (Added Last, Empty Cells).
42 Type I SS: Sequential sums of squares; most appropriate for equal cell sizes. SS(A), SS(B|A), SS(A*B|A,B). Each observation is weighted equally, so the net result for an unbalanced design is that some treatments will be considered with greater weight than others.
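43 Type II SS: Variable Added Last SS in which each main effect is adjusted for the other main effect but not for the interaction that contains it; generally only used for regression, because again each observation is weighted equally. SS(A|B), SS(B|A), SS(A*B|A,B).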
44 Type III SS: Variable Added Last SS, appropriate for unequal cell sizes. Type III SS adjusts for the fact that cell sizes are different. Each cell is weighted equally, with the result that treatments are weighted equally. This means that observations in “smaller” cells will carry more weight. SS(A|B,A*B), SS(B|A,A*B), SS(A*B|A,B).
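The difference between weighting each observation equally and weighting each cell equally can be sketched with marginal means. The Python example below uses made-up cell sizes and cell means for a 2x2 design (all names and numbers are hypothetical). It is only meant to illustrate the weighting idea, not to reproduce any sums-of-squares output.

```python
# Hypothetical cell sizes and cell means for factor A (a1, a2) by factor B (b1, b2).
cells = {
    ("a1", "b1"): {"n": 8, "mean": 10.0},
    ("a1", "b2"): {"n": 2, "mean": 20.0},
    ("a2", "b1"): {"n": 2, "mean": 10.0},
    ("a2", "b2"): {"n": 8, "mean": 20.0},
}

def observation_weighted_mean(level):
    """Marginal mean of A where every observation counts equally."""
    row = [c for (a, _), c in cells.items() if a == level]
    return sum(c["n"] * c["mean"] for c in row) / sum(c["n"] for c in row)

def cell_weighted_mean(level):
    """Marginal mean of A where every cell counts equally, whatever its size."""
    row = [c for (a, _), c in cells.items() if a == level]
    return sum(c["mean"] for c in row) / len(row)

for level in ("a1", "a2"):
    print(level, observation_weighted_mean(level), cell_weighted_mean(level))
```

In this made-up data the cell means do not depend on A at all, and the cell-weighted means agree (15 for both levels). The observation-weighted means differ (12 versus 18) only because the two levels of A have different mixes of B.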
45 Type IV SS: Variable Added Last SS, similar to Type III SS but further allowing for the possibility of empty cells. It is only necessary to use these if there are empty cells (which hopefully there won’t be if you’ve designed the experiment well). SS(A|B,A*B), SS(B|A,A*B), SS(A*B|A,B).
46 General Strategy: Remember that Type I SS and Type III SS examine different null hypotheses. Type III SS are preferred when sample sizes are not equal, but can be somewhat misleading if sample sizes differ greatly. Type IV SS are appropriate if there are empty cells. Type IV SS can be obtained if necessary by using /ss4 in the MODEL statement.
47 Example (continued): The interaction is unimportant, and there is no apparent large effect of gender. Now look at comparing the different levels of anxiety; we should not “change” models at this point, so just average over gender (LSMeans).
48 LSMeans: Must use LSMeans to adjust all means to the same “average level” of gender.
49 Comparisons: The mild group has significantly lower toxin levels than the moderate and severe groups.
50 Confidence Intervals: Could get CIs for means and/or differences if you wanted them. They will be of different widths. Why? It will be harder to detect differences for groups with fewer observations.
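A rough sketch of why the widths differ: with a common error variance, the standard error of a cell mean is sqrt(MSE / n), and the standard error of a difference of two means is sqrt(MSE * (1/n1 + 1/n2)), so smaller cells give wider intervals. The Python snippet below uses a hypothetical MSE of 4.0 and hypothetical cell sizes of 4 and 16; none of these numbers come from the example in the slides.

```python
from math import sqrt

def se_mean(mse, n):
    """Standard error of one cell mean under a common error variance."""
    return sqrt(mse / n)

def se_difference(mse, n1, n2):
    """Standard error of the difference between two independent cell means."""
    return sqrt(mse * (1 / n1 + 1 / n2))

mse = 4.0  # hypothetical mean squared error

small_cell = se_mean(mse, 4)          # cell with 4 observations
large_cell = se_mean(mse, 16)         # cell with 16 observations
difference = se_difference(mse, 4, 16)

print(small_cell, large_cell, difference)
```

Here the cell with 4 observations has twice the standard error of the cell with 16, so its confidence interval is twice as wide for the same critical value.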