Intermediate Applied Statistics STAT 460 Lecture 17, 11/10/2004 Instructor: Aleksandra (Seša) Slavković TA: Wang Yu


1 Intermediate Applied Statistics STAT 460 Lecture 17, 11/10/2004 Instructor: Aleksandra (Seša) Slavković sesa@stat.psu.edu TA: Wang Yu wangyu@stat.psu.edu

2 Revised schedule
Nov 8: lab on 2-way ANOVA
Nov 10: lecture on two-way ANOVA and blocking; post HW9
Nov 12: lecture on repeated measures and review
Nov 15: lab on repeated measures
Nov 17: lecture on categorical data/logistic regression; HW9 due; post HW10
Nov 19: lecture on categorical data/logistic regression
Nov 22: lab on logistic regression & project II introduction
No class (Thanksgiving)
No class (Thanksgiving)
Nov 29: lab
Dec 1: lecture; HW10 due; post HW11
Dec 3: lecture and quiz
Dec 6: lab
Dec 8: lecture; HW11 due
Dec 10: lecture & project II due
Dec 13: project II due

3 Review Two-Way ANOVA & Experimental Design
• Possible readings
  - Chapters 13, 14, and 24 in text
  - Sit, V. (1995) Analyzing ANOVA Designs: Biometrics Information Handbook No. 5. Province of British Columbia: Ministry of Forests Research Program. http://www.for.gov.bc.ca/hfd/pubs/docs/wp/wp07.pdf

4 Review Two-Way ANOVA
• Multiple-way ANOVA is often used to analyze the results of factorial experiments. These experiments are designed to demonstrate the main effects and interactions of two or more categorical predictor variables.
• For now we assume the simplest kind of factorial design, the “completely randomized” design (each group is treated as independent and separate from the other groups).

5 Fish Example
Suppose we want to find out:
1. Do different species of fish in the lake have different average lengths? (Is there a significant main effect for factor A?)
2. Do male fish have a different average length than female fish? (Is there a significant main effect for factor B?)
3. Does the effect of species depend on whether the fish is male or female? (Is there a significant interaction between factors A and B?)

6 One-way ANOVA vs. Two-way ANOVA
• Instead of one test (groups same vs. different) we now have three tests (significance of factor A, factor B, and interaction).
• Instead of SSB (sum-squares between groups) and SSW (sum-squares within groups), now we have a SS for each factor, plus a SS for the interaction, and a SSW (usually called SSE) for error.
• If an interaction effect is significant, then the effect of one factor depends on the level of the other factor.

7 Formal Model for a Two-Way ANOVA
Test for Main Effect A: H0: $\alpha_i = 0$ for all $i$; HA: some $\alpha_i \neq 0$
Test for Main Effect B: H0: $\beta_j = 0$ for all $j$; HA: some $\beta_j \neq 0$
Test for Interaction: H0: $(\alpha\beta)_{ij} = 0$ for all $i, j$; HA: some $(\alpha\beta)_{ij} \neq 0$
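For reference, a model of the kind these three hypotheses refer to can be sketched as follows. The notation is the conventional textbook one and is an assumption here rather than something taken from the slide: $\mu$ is the overall mean, $\alpha_i$ the effect of level $i$ of factor A, $\beta_j$ the effect of level $j$ of factor B, $(\alpha\beta)_{ij}$ the interaction effect, and $\varepsilon_{ijk}$ the error for replicate $k$ in cell $(i, j)$.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2) \text{ independently},
\qquad i = 1, \dots, a, \quad j = 1, \dots, b, \quad k = 1, \dots, n
```

Setting every $(\alpha\beta)_{ij}$ to zero gives the additive model; setting every $\alpha_i$ (or every $\beta_j$) to zero removes the corresponding main effect.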

8 Old formula: SSTotal = SSTreat + SSE
• SSTotal: Sum squares total. Measures total variability of all scores from the grand mean.
• SSTreat: Sum squares between groups. Measures variation accounted for by group membership. A function of the squared distances of each sample mean from the grand mean, i.e., of how different the samples are.
• SSE: Sum squared error, also known as sum squares within groups (SSW). Measures variation not accounted for by group membership. A function of the total squared distances of all the scores in the individual groups away from their appropriate group means.
New formula: SST = SSa + SSb + SSab + SSE
• SST: Sum squares total. Same as above.
• SSa: Variability attributed to factor A (the row factor). A function of the differences in averages among the different rows.
• SSb: Variability attributed to factor B (the column factor). A function of the differences in averages among the different columns.
• SSab: Variability attributed to the interaction between A and B. A function of the differences in cell means between cells left over after adjusting for row and column main effects.
• SSE: Sum squared error. Measures variability not accounted for by the factors or interaction. It is based on the variability of the scores from their respective cell means.
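The two-way decomposition can be checked numerically. The following is a minimal sketch in plain Python, assuming a balanced design (the same number of replicates in every cell). The function name `two_way_ss` and the fish lengths are made up for illustration and do not come from the lecture.

```python
# Sketch: sums of squares for a balanced two-way layout.
# Assumption: every cell holds the same number of observations.

def mean(values):
    return sum(values) / len(values)

def two_way_ss(cells):
    """cells[i][j] is the list of observations at level i of A and level j of B."""
    a = len(cells)          # number of levels of factor A (rows)
    b = len(cells[0])       # number of levels of factor B (columns)
    n = len(cells[0][0])    # replicates per cell

    all_obs = [y for row in cells for cell in row for y in cell]
    grand = mean(all_obs)
    cell_means = [[mean(cell) for cell in row] for row in cells]
    row_means = [mean([y for cell in row for y in cell]) for row in cells]
    col_means = [mean([y for row in cells for y in row[j]]) for j in range(b)]

    # Total variability of all scores around the grand mean.
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    # Row (factor A) and column (factor B) averages around the grand mean.
    ss_a = b * n * sum((m - grand) ** 2 for m in row_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in col_means)
    # What is left in the cell means after removing row and column effects.
    ss_ab = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    # Scores around their own cell means.
    ss_e = sum(
        (y - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for y in cells[i][j]
    )
    return {"SST": ss_total, "SSa": ss_a, "SSb": ss_b, "SSab": ss_ab, "SSE": ss_e}

# Hypothetical fish lengths: rows = 2 species, columns = female / male, 3 fish per cell.
lengths = [
    [[10.0, 12.0, 11.0], [13.0, 15.0, 14.0]],
    [[20.0, 19.0, 21.0], [18.0, 17.0, 19.0]],
]
ss = two_way_ss(lengths)
print(ss)
```

In a balanced layout the four components add up to SST exactly (up to floating-point rounding). With unequal cell sizes this simple additivity generally fails, so these formulas should not be reused for unbalanced data.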

9
• Each combination of factor levels is called a treatment.
• Experimental units are assigned to treatments. Observational units (which in the simplest case are the same as the experimental units) are measured on the response variable.
• Other names for experimental / observational units are “subjects,” “participants,” “cases,” “plots,” and “guinea pigs.”

10 Assigning Units to Treatments
• We use random assignment in order to make valid causal inferences about effects.
• In a completely randomized design, all factors are assigned randomly.
• In a randomized block design, one of the factors is not assigned randomly but represents preexisting “blocks” of units. The others are assigned randomly within each “block.”

11 Completely Randomized Two-Factor Design All treatments are randomized in the same way

12 Randomized Complete Block Design Each block is randomized separately

13 Blocking
• Group similar subjects into “blocks” and randomize the treatment assignments within each block.
• A blocking factor is one that accounts for some of the variability in the response.
• E.g., age, gender, location, apparatus, etc.
• It is included in the model to reduce the unexplained (error) variability, which makes the tests for the treatment factors more sensitive.

14 Completely Randomized Design
[Figure: 2 x 2 layout, Fertilizer (Low, High) by Pesticide (Low, High)]
Plots are randomly assigned, independent of each other, to levels of fertilizer and pesticide.

15 Randomized Block Design
[Figure: 2 x 2 layout, Fertilizer (Low, High) by Field (North, South)]
Plots in the north field are randomly assigned to low or high fertilizer. Plots in the south field are randomly assigned to low or high fertilizer. Field is a blocking factor.
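The difference between the two randomization schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plot labels, the seed, the number of plots, and the helper names `completely_randomized` and `randomized_block` are all hypothetical.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units together, then deal them out evenly across treatments."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[k % len(treatments)] for k, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments, rng):
    """Randomize separately inside each block (blocks maps block name -> units)."""
    assignment = {}
    for block_name, units in blocks.items():
        within = completely_randomized(units, treatments, rng)
        for unit, treatment in within.items():
            assignment[unit] = (block_name, treatment)
    return assignment

rng = random.Random(460)  # arbitrary seed so the sketch is reproducible

# Completely randomized: 8 plots, 4 fertilizer x pesticide treatments.
plots = [f"plot{k}" for k in range(1, 9)]
treatments = [(fert, pest) for fert in ("low", "high") for pest in ("low", "high")]
crd = completely_randomized(plots, treatments, rng)

# Randomized block: fertilizer level is randomized within each field.
fields = {"north": ["N1", "N2", "N3", "N4"], "south": ["S1", "S2", "S3", "S4"]}
rbd = randomized_block(fields, ["low", "high"], rng)

print(crd)
print(rbd)
```

In the blocked version the field is never randomized; only the fertilizer level is, and each field is guaranteed an equal split between low and high.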

16 Independence Assumption
• Note that in both of these examples, the assumption of independent observations is going to be very questionable; but the design with blocking handles it better.
• There is also a type of design called split-plot, where whole fields get assigned levels of one treatment and then subplots of them get assigned levels of another treatment.

17 Blocking (contd.)
• Another example of blocking
  - Have pairs of subjects (chosen because they are twins, or are similar on some demographic variables, etc.).
  - Within each pair, randomly assign one treatment to one subject and the other treatment to the other.
  - This works best if there are only two levels of the factor of interest.
  - So here the blocks are of size 2.

18 Blocks of Size Two
[Figure: Pairs 1 to 7, each pair split between Treatment: Low and Treatment: High]
E.g., the schizophrenia in twins study, pp. 30-31, Sleuth (although that did not involve random assignment).
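Random assignment within blocks of size two can be sketched as a coin flip per pair. The subject labels, the seed, and the function name `assign_within_pairs` below are hypothetical and serve only to illustrate the idea.

```python
import random

def assign_within_pairs(pairs, rng):
    """For each pair (a block of size two), flip a fair coin to decide who gets 'low'."""
    assignment = {}
    for first, second in pairs:
        if rng.random() < 0.5:
            first, second = second, first
        assignment[first] = "low"
        assignment[second] = "high"
    return assignment

rng = random.Random(17)  # arbitrary seed so the sketch is reproducible

# Seven hypothetical matched pairs of subjects.
pairs = [(f"pair{k}_a", f"pair{k}_b") for k in range(1, 8)]
assignment = assign_within_pairs(pairs, rng)
print(assignment)
```

Every pair contributes exactly one subject to each treatment, so the comparison between low and high is made within pairs rather than across them.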

19
• Blocks are usually “random effects” factors but can sometimes be treated as “fixed effects” factors.
• “Random effects” factors are those whose levels represent a sample from a population, so that we are not interested in the means of the levels but only in what they tell us about the variability in response due to variability in that population.
• “Fixed effects” factors are those in which each level is considered to be important in its own right and we want to estimate the mean Y at that level.
• In some situations, the tests and calculations are different for the two kinds of factors.

20
• In either a completely randomized design or a randomized block design, there may be either one or more than one experimental unit in each cell. Especially in the case of the completely randomized design, it is greatly preferred to have more than one experimental unit in each cell.

21 Why Replication (Larger Samples) is Good
1. More power to reject null hypotheses
2. Helps protect you in case of missing data
3. Helps protect you in case of outliers
4. When possible we want to base our theories on reproducible results (although this last reason applies more to replicating your whole study than to just using larger samples)
Disaster Strikes!
Recall that power is one minus the probability of a Type II error. For the F test as for the t-test, higher n means more power.
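How power grows with n can be illustrated with a rough calculation. The sketch below deliberately swaps in a two-sided, two-sample z-test with known standard deviation (a normal approximation) instead of the F test or t-test, because that needs only the Python standard library. The effect size, standard deviation, and sample sizes are made-up values, and `approx_power` is a hypothetical helper name.

```python
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with n units per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means.
    se = sigma * (2 / n_per_group) ** 0.5
    shift = delta / se
    # Probability that the test statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Assumed true difference of 1.0 with standard deviation 2.0 (illustrative only).
powers = {n: approx_power(delta=1.0, sigma=2.0, n_per_group=n) for n in (5, 10, 20, 40, 80)}
for n, p in powers.items():
    print(n, round(p, 3))
```

Power here is one minus the Type II error probability for the assumed true difference, and it increases steadily as the number of units per group grows. When the true difference is zero, the same formula returns the significance level alpha.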

22
• If you have replicates then you are able to test for an interaction between factors. You can fit either an “additive” or a “nonadditive” model.
• If there is only one observation in each cell then you just have to assume that there is no interaction and that the additive model works.

23 Additive vs. Non-Additive Model
Testing for interactions means testing which model is the best description of the data, the non-additive model or the additive model. (Actually the non-additive model always gives a better fit, but we test whether the fit is significantly better.)
Testing for main effects means comparing either the non-additive or the additive model to the null model and deciding which model is better.
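One standard way to carry out the comparison between the additive and non-additive models is an extra-sum-of-squares F test. The sketch below assumes a balanced design with $a$ levels of factor A, $b$ levels of factor B, and $n$ replicates per cell; the subscripts "add" and "non-add" are labels introduced here for the residual sums of squares and degrees of freedom of the two models.

```latex
F = \frac{(\mathrm{SSE}_{\text{add}} - \mathrm{SSE}_{\text{non-add}}) \,/\, (df_{\text{add}} - df_{\text{non-add}})}
         {\mathrm{SSE}_{\text{non-add}} \,/\, df_{\text{non-add}}}
  = \frac{\mathrm{SSab} \,/\, [(a-1)(b-1)]}{\mathrm{SSE} \,/\, [ab(n-1)]}
```

Under the null hypothesis of no interaction, this statistic is compared to an F distribution with $(a-1)(b-1)$ and $ab(n-1)$ degrees of freedom. With a single observation per cell ($n = 1$) the denominator has zero degrees of freedom, which is why the interaction cannot be tested without replicates.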

24
• The limpets example (Sleuth, pp. 375-377, 382-391) is a randomized complete block design. (So, in fact, is the Pygmalion example.)
• Two factors: the treatment factor (grazers allowed) and the block.
• There are 8 blocks (locations).

25 Next lecture
• Repeated measures
• Review
• Return graded quizzes and projects

