STA305 week 2: The One-Factor Model


1 STA305 week 2: The One-Factor Model. A statistical model is used to describe data. It is an equation that shows the dependence of the response variable upon the levels of the treatment factors. Let $Y_{ij}$ be a random variable that represents the response obtained on the $j$-th observation of the $i$-th treatment, for $i = 1, \dots, a$ and $j = 1, \dots, r_i$. Let $\mu$ denote the overall expected response. The expected response for an experimental unit in the $i$-th treatment group is $\mu_i = \mu + \tau_i$, where $\tau_i$ is the deviation of the $i$-th treatment mean from the overall mean; it is referred to as the effect of treatment $i$.

2 The model is $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where $\varepsilon_{ij}$ is the deviation of the individual's response from the treatment group mean. $\varepsilon_{ij}$ is known as the random or experimental error.

3 Fixed Effects versus Random Effects. In some cases the treatments are specifically chosen by the experimenter from all possible treatments. The conclusions drawn from such an experiment apply only to these treatments and cannot be generalized to other treatments not included in the experiment. This is called a fixed effects model. In other cases, the treatments included in the experiment can be regarded as a random selection from the set of all possible treatments. In this situation, conclusions based on the experiment can be generalized to other treatments. When the treatments are a random sample, the treatment effects $\tau_i$ are random variables. This model is called a random effects model or a components of variance model. The random effects model will be studied after the fixed effects model.

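4 More about the Fixed Effects Model. As specified on slide 2, the model is $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where the $\varepsilon_{ij}$ are i.i.d. with distribution $N(0, \sigma^2)$. It follows that the response of experimental unit $j$ in treatment group $i$, $Y_{ij}$, is normally distributed with $E(Y_{ij}) = \mu + \tau_i$ and $\mathrm{Var}(Y_{ij}) = \sigma^2$. In other words, $Y_{ij} \sim N(\mu + \tau_i, \sigma^2)$, independently of the other responses.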
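
To make the fixed effects model concrete, here is a minimal sketch that simulates responses from $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$. The function name `simulate_one_factor` and every parameter value are hypothetical, chosen only for illustration; they are not taken from the course material.

```python
import random


def simulate_one_factor(mu, tau, r, sigma, seed=0):
    """Draw Y_ij = mu + tau_i + eps_ij with eps_ij i.i.d. N(0, sigma^2).

    tau[i] is the effect of treatment i and r[i] its number of observations.
    Returns one list of responses per treatment group.
    """
    rng = random.Random(seed)
    return [
        [mu + tau_i + rng.gauss(0.0, sigma) for _ in range(r_i)]
        for tau_i, r_i in zip(tau, r)
    ]


# Hypothetical values: three treatments, five observations each.
groups = simulate_one_factor(mu=75.0, tau=[-5.0, 5.0, 0.0], r=[5, 5, 5], sigma=3.0)
for i, y in enumerate(groups, start=1):
    print(f"treatment {i}: sample mean = {sum(y) / len(y):.2f}")
```

Note that the chosen effects satisfy $\sum_i r_i \tau_i = 0$, so `mu` plays the role of the overall expected response.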

5 Treatment Effects. Recall that treatment effects have been defined as deviations from the overall mean, and so the model can be parameterized so that $\sum_{i=1}^{a} r_i \tau_i = 0$, where $r_i$ is the number of observations in treatment group $i$. In the special case where $r_1 = r_2 = \cdots = r_a = r$ this condition reduces to $\sum_{i=1}^{a} \tau_i = 0$. The hypothesis that there is no treatment effect can be expressed mathematically as $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$ versus $H_a$: not all $\mu_i$ are equal. This can be expressed equivalently in terms of the $\tau_i$: $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$ versus $H_a$: not all $\tau_i$ are equal to 0.

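6 "Dot" Notation. "Dot" notation will be used to denote treatment and overall totals, as well as treatment and overall means. The sum of all observations in the $i$-th treatment group will be denoted $Y_{i\cdot} = \sum_{j=1}^{r_i} Y_{ij}$. Similarly, the sum of all responses in all treatment groups is denoted $Y_{\cdot\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{r_i} Y_{ij}$. The treatment and overall means are $\bar{Y}_{i\cdot} = Y_{i\cdot} / r_i$ and $\bar{Y}_{\cdot\cdot} = Y_{\cdot\cdot} / n$, where $n = r_1 + \cdots + r_a$ is the total number of observations.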
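
The dot notation maps directly onto sums over a nested list. The sketch below uses toy responses invented for illustration (they are not data from the course), and the variable names are assumptions.

```python
# Toy responses, invented for illustration: one inner list per treatment group.
y = [[4, 6, 5], [8, 9, 10], [6, 7, 8]]

r = [len(group) for group in y]   # r_i: observations in treatment i
n = sum(r)                        # n: total number of observations

treatment_totals = [sum(group) for group in y]   # Y_i.
overall_total = sum(treatment_totals)            # Y_..

treatment_means = [t / r_i for t, r_i in zip(treatment_totals, r)]  # Ybar_i.
overall_mean = overall_total / n                                    # Ybar_..

print("treatment totals:", treatment_totals, "overall total:", overall_total)
print("treatment means:", treatment_means, "overall mean:", overall_mean)
```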

7 Rationale for Analysis of Variance. Consider all of the data from the $a$ treatment groups as a whole. The variability in the data may come from two sources: 1) treatment means differ from the overall mean, which is called between-group variability; 2) within a given treatment group, individual observations differ from the group mean, which is called within-group variability.

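8 Total Sum of Squares. The total variation in the data set as a whole is measured by the total sum of squares. It is given by $SS_T = \sum_{i=1}^{a} \sum_{j=1}^{r_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$. Each deviation from the overall sample mean can be expressed as the sum of 2 parts: 1) the deviation of the observation from the group mean, and 2) the deviation of the group mean from the overall mean. In other words, $Y_{ij} - \bar{Y}_{\cdot\cdot} = (Y_{ij} - \bar{Y}_{i\cdot}) + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})$. $SS_T$ can then be written as $SS_T = \sum_{i=1}^{a} \sum_{j=1}^{r_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 + \sum_{i=1}^{a} r_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = SS_E + SS_{Treat}$, because the cross-product term sums to zero.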
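
The decomposition of the total sum of squares can be checked numerically. This is a small sketch on toy data invented for illustration; the helper name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(groups):
    """Return (SS_T, SS_Treat, SS_E) for a list of treatment groups."""
    n = sum(len(g) for g in groups)
    overall_mean = sum(sum(g) for g in groups) / n
    # Total: every observation against the overall mean.
    ss_total = sum((y - overall_mean) ** 2 for g in groups for y in g)
    # Between groups: each group mean against the overall mean, weighted by r_i.
    ss_treat = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups)
    # Within groups: every observation against its own group mean.
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return ss_total, ss_treat, ss_error


toy = [[4, 6, 5], [8, 9, 10], [6, 7, 8]]  # invented for illustration
ss_total, ss_treat, ss_error = sums_of_squares(toy)
print(ss_total, ss_treat, ss_error)
# The identity SS_T = SS_Treat + SS_E holds up to rounding error.
assert abs(ss_total - (ss_treat + ss_error)) < 1e-9
```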

9 Expected Sums of Squares. Finding the expected values of the sums of squares for error and treatment will lead us to a test of the hypothesis of no treatment effect, i.e., $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$. We start by finding the expected value of $SS_E$, which is $E(SS_E) = (n - a)\sigma^2$. We continue with the expected value of $SS_{Treat}$, which is $E(SS_{Treat}) = (a - 1)\sigma^2 + \sum_{i=1}^{a} r_i \tau_i^2$.
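
The two expectations can be derived along the following lines. This is one possible route, not necessarily the one used in lecture, and it assumes the parameterization $\sum_i r_i \tau_i = 0$ from slide 5.

```latex
% Within group i the Y_{ij} are i.i.d. N(\mu_i, \sigma^2), so the usual
% sample-variance result gives E[\sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2] = (r_i - 1)\sigma^2.
\begin{align*}
E(SS_E) &= \sum_{i=1}^{a} E\Big[\sum_{j=1}^{r_i} (Y_{ij} - \bar{Y}_{i\cdot})^2\Big]
         = \sum_{i=1}^{a} (r_i - 1)\sigma^2 = (n - a)\sigma^2 . \\[4pt]
% For the treatment sum of squares, expand the square and use
% E(X^2) = Var(X) + [E(X)]^2 together with \sum_i r_i \tau_i = 0.
SS_{Treat} &= \sum_{i=1}^{a} r_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2
            = \sum_{i=1}^{a} r_i \bar{Y}_{i\cdot}^2 - n \bar{Y}_{\cdot\cdot}^2 , \\
E(\bar{Y}_{i\cdot}^2) &= \frac{\sigma^2}{r_i} + (\mu + \tau_i)^2 , \qquad
E(\bar{Y}_{\cdot\cdot}^2) = \frac{\sigma^2}{n} + \mu^2 , \\
E(SS_{Treat}) &= a\sigma^2 + n\mu^2 + 2\mu \sum_{i=1}^{a} r_i \tau_i + \sum_{i=1}^{a} r_i \tau_i^2
               - \sigma^2 - n\mu^2
             = (a - 1)\sigma^2 + \sum_{i=1}^{a} r_i \tau_i^2 .
\end{align*}
```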

10 Mean Squares. As we have seen in the calculation above, $MS_E = SS_E / (n - a)$ is an unbiased estimator of $\sigma^2$. $MS_E$ is called the mean square for error. The degrees of freedom associated with $SS_E$ are $n - a$, and it follows that $E(MS_E) = \sigma^2$. The mean square for treatment is defined to be $MS_{Treat} = SS_{Treat} / (a - 1)$. The expected value of $MS_{Treat}$ is $E(MS_{Treat}) = \sigma^2 + \sum_{i=1}^{a} r_i \tau_i^2 / (a - 1)$.

11 Hypothesis Testing. Recall that our goal is to test whether there is a treatment effect. The hypotheses of interest are $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$ versus $H_a$: not all $\tau_i$ are equal to 0. Notice that if $H_0$ is true, then $E(MS_{Treat}) = \sigma^2 = E(MS_E)$. On the other hand, if $H_0$ is false, then at least one $\tau_i \neq 0$, in which case $\sum_{i=1}^{a} r_i \tau_i^2 > 0$ and so $E(MS_{Treat}) > E(MS_E)$. On average, then, the ratio $MS_{Treat}/MS_E$ should be small (close to 1) if $H_0$ is true, and large otherwise. We use this to develop a formal test.

12 Cochran's Theorem. Let $Z_1, Z_2, \dots, Z_n$ be i.i.d. $N(0, 1)$. Suppose that $\sum_{i=1}^{n} Z_i^2 = Q_1 + Q_2 + \cdots + Q_s$, where $s \le n$ and $Q_j$ has $v_j$ d.f. A necessary and sufficient condition for the $Q_j$ to be independent of one another, and for $Q_j \sim \chi^2(v_j)$, is that $n = v_1 + v_2 + \cdots + v_s$. Cochran's theorem implies that, when $H_0$ is true, $SS_E/\sigma^2$ and $SS_{Treat}/\sigma^2$ have independent $\chi^2$ distributions with $n - a$ and $a - 1$ d.f., respectively. Recall: if $X_1$ and $X_2$ are two independent random variables with $\chi^2(v_1)$ and $\chi^2(v_2)$ distributions, then $(X_1/v_1)/(X_2/v_2) \sim F(v_1, v_2)$.
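
A sketch of how the theorem is applied to the one-factor model is given below. It passes over the step that rewrites $SS_T/\sigma^2$ as a sum of $n - 1$ squared independent standard normal variables, and it assumes $H_0$ is true so that all $Y_{ij}$ share the same mean.

```latex
\begin{align*}
% Under H_0 the Y_{ij} are i.i.d. N(\mu, \sigma^2), so SS_T / \sigma^2 \sim \chi^2(n - 1).
\frac{SS_T}{\sigma^2} &= \frac{SS_{Treat}}{\sigma^2} + \frac{SS_E}{\sigma^2},
\qquad n - 1 = (a - 1) + (n - a) . \\[4pt]
% The degrees of freedom add up, so the two pieces are independent chi-squares:
\frac{SS_{Treat}}{\sigma^2} &\sim \chi^2(a - 1), \qquad
\frac{SS_E}{\sigma^2} \sim \chi^2(n - a) . \\[4pt]
% Taking the ratio of each chi-square divided by its d.f. cancels \sigma^2:
\frac{\big(SS_{Treat}/\sigma^2\big)/(a - 1)}{\big(SS_E/\sigma^2\big)/(n - a)}
 &= \frac{MS_{Treat}}{MS_E} \sim F(a - 1,\, n - a) .
\end{align*}
```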

13 Hypothesis Test for Treatment Effects. Cochran's theorem and the result just stated provide the tools to construct a formal hypothesis test of no treatment effects. The hypotheses are again $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$ versus $H_a$: not all $\tau_i$ are equal to 0. The test statistic is $F_{obs} = MS_{Treat}/MS_E$. Note that if $H_0$ is true, then $F_{obs} \sim F(a - 1, n - a)$, so the P-value is $P(F(a - 1, n - a) > F_{obs})$. We reject $H_0$ in favor of $H_a$ if the P-value is less than $\alpha$. Alternatively, reject $H_0$ in favor of $H_a$ if $F_{obs} > F_{\alpha}(a - 1, n - a)$, where $F_{\alpha}(a - 1, n - a)$ is the upper $100\alpha\%$ point of the $F(a - 1, n - a)$ distribution.
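
The test statistic can be computed in a few lines. The sketch below uses toy data invented for illustration and a hypothetical helper `one_way_anova`; because the Python standard library has no F distribution, the critical value is entered by hand from a standard F table rather than computed.

```python
def one_way_anova(groups):
    """Return degrees of freedom, mean squares, and F_obs for a one-factor design."""
    a = len(groups)
    n = sum(len(g) for g in groups)
    overall_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_treat = ss_treat / (a - 1)
    ms_error = ss_error / (n - a)
    return {
        "df": (a - 1, n - a),
        "ms_treat": ms_treat,
        "ms_error": ms_error,
        "f_obs": ms_treat / ms_error,
    }


toy = [[4, 6, 5], [8, 9, 10], [6, 7, 8]]  # invented for illustration
result = one_way_anova(toy)
f_crit = 5.14  # upper 5% point of F(2, 6), read from a standard F table
print(result)
print("reject H0 at the 5% level:", result["f_obs"] > f_crit)
```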

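14 Analysis of Variance Table. The results of the calculations and the hypothesis test are best summarized in an analysis of variance (ANOVA) table, given below.
Source       d.f.    SS          MS                               F
Treatments   a - 1   SS_Treat    MS_Treat = SS_Treat / (a - 1)    F_obs = MS_Treat / MS_E
Error        n - a   SS_E        MS_E = SS_E / (n - a)
Total        n - 1   SS_T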

15 Estimable Functions of Parameters. A function of the model parameters is estimable if and only if it can be written as the expected value of a linear combination of the response variables. In other words, every estimable function is of the form $E\left(\sum_{i=1}^{a} \sum_{j=1}^{r_i} c_{ij} Y_{ij}\right)$, where the $c_{ij}$ are constants. From the results of the previous sections it can be shown that $\mu$, $\mu_i$, and $\sigma^2$ are estimable.

16 Example - Effectiveness of Three Methods for Teaching a Programming Language. A study was conducted to determine whether there is any difference in the effectiveness of 3 methods of teaching a particular programming language. The factor levels (treatments) are the three teaching methods: 1) on-line tutorial, 2) personal attention of an instructor plus hands-on experience, 3) personal attention of an instructor, but no hands-on experience. Replication and randomization: 5 volunteers were randomly allocated to each of the 3 teaching methods, for a total of 15 study participants. Response variable: after the programming instruction, a test was administered to determine how well the students had learned the programming language. Research question: do the data provide any evidence that the instruction methods differ with respect to test score? The data and the solutions are….

17 Conducting an ANOVA in SAS. There are several procedures in SAS that can be used to do an analysis of variance. PROC GLM (for general linear model) will be used in this course. To do the analysis for the example on slide 16, start by creating a SAS dataset:
data teach ;
  input method score ;
  cards ;
1 73
1 77
.....
3 71
;
run ;

18 Use this dataset to conduct an ANOVA using the following SAS code:
proc glm data = teach ;
  class method ;
  model score = method / ss3 ;
run ;
quit ;
The output produced by this procedure is given in the next slide.

19 [SAS output from PROC GLM for the teaching-methods example; not included in the transcript]

20 Estimating Model Parameters. The ANOVA indicates whether there is a treatment effect; however, it doesn't provide any information about individual treatments or how treatments compare with each other. To better understand the outcome of the experiment, it is useful to estimate the mean response for each treatment group. It is also useful to obtain an estimate of how much variability there is within each treatment group. This involves estimating model parameters.

21 Variability. Recall that on slides 9 and 10 we showed that $MS_E$ is an unbiased estimator of $\sigma^2$. Further, Cochran's theorem was used to show that $SS_E/\sigma^2 \sim \chi^2(n - a)$. We can use this result to calculate a $100(1 - \alpha)\%$ confidence interval for $\sigma^2$. The CI is given by $\left( SS_E / \chi^2_{\alpha/2}(n - a),\; SS_E / \chi^2_{1 - \alpha/2}(n - a) \right)$, where $\chi^2_{\alpha/2}(n - a)$ and $\chi^2_{1 - \alpha/2}(n - a)$ are the upper and lower $\alpha/2$ percentage points of the $\chi^2$ distribution with $n - a$ d.f., respectively.

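22 Overall Mean. As discussed at the beginning, the overall expected value is $\mu$. Show that $\bar{Y}_{\cdot\cdot}$ is an unbiased estimator of $\mu$. The variance of $\bar{Y}_{\cdot\cdot}$ is $\sigma^2/n$. So the $100(1 - \alpha)\%$ confidence interval for $\mu$ is $\bar{Y}_{\cdot\cdot} \pm t_{\alpha/2}(n - a)\sqrt{MS_E/n}$. Further, a $100(1 - \alpha)\%$ confidence interval for $\mu_i$ is $\bar{Y}_{i\cdot} \pm t_{\alpha/2}(n - a)\sqrt{MS_E/r_i}$. It follows that $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$ is an unbiased estimator of the effect of treatment $i$, $\tau_i$.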
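
The point estimates of $\mu$, $\mu_i$, and $\tau_i$ are simple functions of the group means. The sketch below uses toy data invented for illustration, with hypothetical variable names, and also checks that the estimated effects satisfy the constraint $\sum_i r_i \hat{\tau}_i = 0$.

```python
toy = [[4, 6, 5], [8, 9, 10], [6, 7, 8]]  # invented for illustration

r = [len(g) for g in toy]
n = sum(r)

mu_hat = sum(sum(g) for g in toy) / n          # overall mean, estimates mu
mu_i_hat = [sum(g) / len(g) for g in toy]      # group means, estimate mu_i
tau_hat = [m - mu_hat for m in mu_i_hat]       # estimated treatment effects

# The estimated effects obey the same constraint as the parameters.
weighted_sum = sum(r_i * t for r_i, t in zip(r, tau_hat))

print("mu_hat:", mu_hat)
print("mu_i_hat:", mu_i_hat)
print("tau_hat:", tau_hat)
print("sum of r_i * tau_hat_i:", weighted_sum)
```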

23 Differences between Treatment Groups. Differences between specific treatment groups will be important from the researcher's point of view. The expected difference in response between treatment groups $i$ and $j$ is $\mu_i - \mu_j = \tau_i - \tau_j$. Since treatment groups are independent of each other, it follows that $\mathrm{Var}(\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}) = \sigma^2 (1/r_i + 1/r_j)$. Therefore, a $100(1 - \alpha)\%$ confidence interval for $\tau_i - \tau_j$ is $(\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}) \pm t_{\alpha/2}(n - a)\sqrt{MS_E (1/r_i + 1/r_j)}$.

24 Example - Methods for Teaching Programming Language Cont'd. Back to the example of three teaching methods and their effect on programming test score. Based on the ANOVA developed earlier, we found a significant difference between the three methods. Which method had the highest average? What is a 95% CI for the mean difference in test scores for the 2 instructor-based methods?

25 Comparisons Among Treatment Means. As mentioned above, ANOVA will indicate whether there is a significant effect of treatments overall, but it doesn't indicate which treatments are significantly different from each other. There are a number of methods available for making pairwise comparisons of treatment means.

26 Least Significant Difference (LSD). This method tests the hypothesis that all treatment pairs have the same mean against the alternative that at least one pair differs; that is, the hypotheses are $H_0: \mu_i - \mu_j = 0$ for all $i, j$ versus $H_a: \mu_i - \mu_j \neq 0$ for at least one pair $i, j$. In testing the difference between any two specific means, reject the null hypothesis if $|\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}| > t_{\alpha/2}(n - a)\sqrt{MS_E (1/r_i + 1/r_j)}$. In the case where the design is balanced and $r_i = r$ for all $i$, the condition above becomes $|\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}| > t_{\alpha/2}(n - a)\sqrt{2 MS_E / r}$.

27 In other words, the smallest difference between the means that would be considered statistically significant is $LSD = t_{\alpha/2}(n - a)\sqrt{2 MS_E / r}$. This quantity, LSD, is called the least significant difference. The LSD method requires that the difference between each pair of means be compared to the LSD. In cases where the difference is greater than the LSD, we conclude that the treatment means differ.
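
The LSD rule for a balanced design can be sketched as follows. The group means and $MS_E$ are toy values invented for illustration, the helper name `lsd` is hypothetical, and the critical value $t_{0.025}(6) = 2.447$ is entered by hand from a standard t table because the Python standard library has no t distribution.

```python
import math
from itertools import combinations


def lsd(ms_error, r, t_crit):
    """Least significant difference for a balanced design with r observations per group."""
    return t_crit * math.sqrt(2.0 * ms_error / r)


# Toy summary statistics, invented for illustration: a = 3 groups, r = 3, n - a = 6.
means = {1: 5.0, 2: 9.0, 3: 7.0}
threshold = lsd(ms_error=1.0, r=3, t_crit=2.447)  # t_{0.025}(6) from a t table

print(f"LSD = {threshold:.3f}")
for i, j in combinations(means, 2):
    diff = abs(means[i] - means[j])
    print(f"groups {i} and {j}: |difference| = {diff}, exceeds LSD: {diff > threshold}")
```

Each pair is tested at level $\alpha$ separately, so the overall Type I error rate across all pairs is not controlled by this procedure.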

28 Important Notes. As in any situation where a large number of significance tests are conducted, the possibility of finding a large difference due to chance alone increases. Therefore, when the number of treatment groups is large, the probability of making this type of error is relatively large. In other words, the probability of committing a Type I error will be increased above $\alpha$. Further, although the ANOVA F-test might find a significant treatment effect, the LSD method might conclude that no 2 treatment means are significantly different from each other. This is because the ANOVA F-test considers the overall trend of the effect of treatment on outcome, and is not restricted to pairwise comparisons.

29 Other Methods for Pairwise Comparisons. Other methods for conducting pairwise comparisons are available. The methods that are implemented in PROC GLM in SAS include: Bonferroni, Duncan's multiple range test, Dunnett's procedure, Scheffé's method, Tukey's test, and several others. Chapter 4 of Dean & Voss discusses some of these methods.

30 Pairwise Comparisons in SAS. Pairwise comparisons can be requested by including a means statement. The code below requests means with LSD comparison:
proc glm data = teach ;
  class method ;
  model score = method / ss3 ;
  means method / lsd cldiff ;
run ;
The part of the output containing the pairwise comparisons is shown in the next slide.

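31 [SAS output: LSD pairwise comparisons for the teaching-methods example; not included in the transcript]

32 [Slide content not included in the transcript]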

33 Contrasts. The ANOVA test indicates only whether there is an overall trend for the treatment means to differ, and does not indicate specifically which treatments are the same, which are different, etc. In the last few slides we looked at pairwise comparisons between treatment means. However, comparisons that are of interest to the researcher may include more than just two groups. They can be linear combinations of means.

34 Example - Does Food Decrease Effectiveness of Pain Killers? Researchers at a pain clinic want to know whether the effectiveness of two leading pain killers is the same when taken on an empty stomach as when taken with food. A study with four treatment groups was designed: 1. aspirin with no food, 2. aspirin with food, 3. Tylenol with no food, 4. Tylenol with food. In addition to determining whether there is a difference between the four treatment groups, the researchers want to determine whether there is a difference between taking medication with food and taking it without. This second hypothesis can be expressed statistically as $H_0: \mu_1 + \mu_3 = \mu_2 + \mu_4$ versus $H_a: \mu_1 + \mu_3 \neq \mu_2 + \mu_4$.

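35 The point estimate of the difference between the fed and not fed conditions is based on sample means: the contrast $\mu_1 + \mu_3 - \mu_2 - \mu_4$ is estimated by $\bar{Y}_{1\cdot} + \bar{Y}_{3\cdot} - \bar{Y}_{2\cdot} - \bar{Y}_{4\cdot}$.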

36 Hypothesis Tests Using Contrasts. As in the example on the previous slides, the comparison of treatment means that is of interest might be a linear combination of means. That is, the hypotheses of interest would be of the form $H_0: c_1\mu_1 + c_2\mu_2 + \cdots + c_a\mu_a = 0$ versus $H_a: c_1\mu_1 + c_2\mu_2 + \cdots + c_a\mu_a \neq 0$. The $c_i$ are constants subject to the constraints: (i) not all $c_i$ are equal to 0, and (ii) $\sum_{i=1}^{a} c_i = 0$. A test of this hypothesis can be constructed using the sample means for each treatment group. The linear combination $c_1\mu_1 + c_2\mu_2 + \cdots + c_a\mu_a$ is called a contrast.

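37 If the assumptions of the model are satisfied, then $\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot} \sim N\left(\sum_{i=1}^{a} c_i \mu_i,\; \sigma^2 \sum_{i=1}^{a} c_i^2 / r_i\right)$. If $\sigma^2$ were known, a test of $H_0$ could be done using $Z = \sum_{i=1}^{a} c_i \bar{Y}_{i\cdot} \Big/ \sqrt{\sigma^2 \sum_{i=1}^{a} c_i^2 / r_i}$, which is $N(0, 1)$ under $H_0$. Since $\sigma^2$ is unknown, we use its unbiased estimate, $MS_E$, and conduct a t-test with $n - a$ d.f. The test statistic is $t_{obs} = \sum_{i=1}^{a} c_i \bar{Y}_{i\cdot} \Big/ \sqrt{MS_E \sum_{i=1}^{a} c_i^2 / r_i}$. Recall, if $X$ is a random variable with a $t(v)$ distribution, then $X^2$ has an $F(1, v)$ distribution.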

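38 So an equivalent test statistic is $F_{obs} = t_{obs}^2 = \left(\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot}\right)^2 \Big/ \left(MS_E \sum_{i=1}^{a} c_i^2 / r_i\right)$. At level $\alpha$, reject $H_0$ in favour of $H_a$ if $F_{obs} > F_{\alpha}(1, n - a)$, or equivalently if $|t_{obs}| > t_{\alpha/2}(n - a)$. The sum of squares for the contrast is $SS_{contrast} = \left(\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot}\right)^2 \Big/ \sum_{i=1}^{a} c_i^2 / r_i$. Each contrast has 1 d.f., so the mean square for the contrast is $MS_{contrast} = SS_{contrast}/1$.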

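39 Summary. The hypotheses: $H_0: c_1\mu_1 + c_2\mu_2 + \cdots + c_a\mu_a = 0$ versus $H_a: c_1\mu_1 + c_2\mu_2 + \cdots + c_a\mu_a \neq 0$. Test statistic: $F_{obs} = MS_{contrast}/MS_E = \left(\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot}\right)^2 \Big/ \left(MS_E \sum_{i=1}^{a} c_i^2 / r_i\right)$. Decision rule: reject $H_0$ if $F_{obs} > F_{\alpha}(1, n - a)$.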
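
The contrast test summarized above can be sketched in code. The group means, replication, and $MS_E$ below are toy values invented for illustration, and `contrast_test` is a hypothetical helper name.

```python
def contrast_test(c, means, r, ms_error):
    """Return (estimate, SS_contrast, F_obs) for the contrast sum_i c_i * mu_i."""
    if abs(sum(c)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    # Point estimate: the same linear combination applied to the sample means.
    estimate = sum(c_i * m for c_i, m in zip(c, means))
    # Variance factor: sum of c_i^2 / r_i.
    scale = sum(c_i ** 2 / r_i for c_i, r_i in zip(c, r))
    ss_contrast = estimate ** 2 / scale
    f_obs = ss_contrast / ms_error  # compare with F_alpha(1, n - a)
    return estimate, ss_contrast, f_obs


# Toy summary statistics, invented for illustration.
means = [5.0, 9.0, 7.0]
r = [3, 3, 3]
ms_error = 1.0

print(contrast_test([1, -1, 0], means, r, ms_error))   # group 1 versus group 2
print(contrast_test([1, 1, -2], means, r, ms_error))   # groups 1 and 2 versus group 3
```

For these toy values the two contrasts are orthogonal, and their sums of squares add up to the treatment sum of squares (24 for the corresponding toy data).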

40 Orthogonal Contrasts. Very often more than one contrast will be of interest. Further, it is possible that one research question will require more than one contrast, e.g., $H_0: \mu_1 = \mu_3$ and $\mu_2 = \mu_4$. Ideally, we want tests about different contrasts to be independent of each other. Suppose that the two contrasts of interest are $c_1\mu_1 + c_2\mu_2 + \cdots + c_a\mu_a$ and $d_1\mu_1 + d_2\mu_2 + \cdots + d_a\mu_a$. These two contrasts are orthogonal to each other iff they satisfy $\sum_{i=1}^{a} c_i d_i / r_i = 0$, which reduces to $\sum_{i=1}^{a} c_i d_i = 0$ when all $r_i$ are equal. If there are $a$ treatments, then $SS_{Treat}$ can be decomposed into a set of $a - 1$ orthogonal contrasts, each with 1 d.f., as follows: $SS_{Treat} = SS_{contrast_1} + SS_{contrast_2} + \cdots + SS_{contrast_{a-1}}$. Unless $a = 2$, there will be more than one set of orthogonal contrasts.

41 Example - Food / Pain Killers Continued. Refer back to the example on slide 34. The study was designed with 4 treatment groups, so the treatment sum of squares can be decomposed into 3 orthogonal contrasts. Since the researchers are interested in the difference between fed and unfed, it makes sense to use the following contrasts: [contrast definitions not preserved in the transcript]

42 Exercise: verify that each is in fact a contrast. Exercise: verify that the contrasts are orthogonal. Note that there is more than one way to decompose the treatment sum of squares into a set of orthogonal contrasts. For example, instead of comparing aspirin and Tylenol, we might be interested in comparing food with no food. In this case, compare (i) aspirin with food and Tylenol with food, (ii) aspirin without food and Tylenol without food, and (iii) the 2 food groups to the 2 no-food groups.

43 ANOVA Table for Orthogonal Contrasts. Contrasts to be used in an experiment must be chosen at the beginning of the study. The hypotheses to be tested should not be selected after viewing the data. Once the treatment SS has been decomposed using preplanned orthogonal contrasts, the ANOVA table can be expanded to show the decomposition, as shown in the next slide.

44 [ANOVA table expanded to show the orthogonal contrasts; not included in the transcript]

45 Example - Pressure on a Torsion Spring [Figure: diagram of a torsion spring]

46 The figure above shows a diagram of a torsion spring. Pressure is applied to the arms to close the spring. A study has been designed to examine the pressure on a torsion spring. Five different angles between the arms of the spring will be studied to determine their impact on the pressure: 67°, 71°, 75°, 79°, and 83°. The researchers are interested in whether there is an overall difference between the different angle settings. In addition, they would like to study a set of orthogonal contrasts which compares the 2 smallest angles to each other and the 2 largest angles to each other. The data collected are shown in the following slide.

47 Torsion Spring Data

48 Solution

49 Contrasts in SAS. To do the analysis for the last example, start by creating a SAS dataset:
data torsion ;
  input angle pressure ;
  cards ;
67 83
67 85
71 87
71 84
...........
79 90
83 90
83 92
;
run ;

50 Here is the additional code that is required to specify the contrasts of interest:
proc glm data = torsion ;
  class angle ;
  model pressure = angle / ss3 ;
  contrast '67-71' angle 1 -1 0 0 0 ;
  contrast '79-83' angle 0 0 0 1 -1 ;
  contrast 'sm vs lg' angle 1 1 0 -1 -1 ;
  contrast 'mid vs oth' angle 1 1 -4 1 1 ;
run ;
quit ;
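
Before running the contrasts, it can be worth checking that each coefficient vector sums to zero and that the vectors are mutually orthogonal. The sketch below copies the coefficients from the SAS contrast statements; the helper names are hypothetical. The replication of the torsion data is not visible in the transcript, so the default check assumes equal replication, and the weighted condition $\sum_i c_i d_i / r_i = 0$ is applied only when a vector of $r_i$ is supplied.

```python
from itertools import combinations

# Coefficients copied from the SAS contrast statements (angles 67, 71, 75, 79, 83).
contrasts = {
    "67-71": [1, -1, 0, 0, 0],
    "79-83": [0, 0, 0, 1, -1],
    "sm vs lg": [1, 1, 0, -1, -1],
    "mid vs oth": [1, 1, -4, 1, 1],
}


def is_contrast(c):
    """A contrast has coefficients that sum to zero and are not all zero."""
    return sum(c) == 0 and any(c_i != 0 for c_i in c)


def are_orthogonal(c, d, r=None):
    """Check sum_i c_i * d_i / r_i = 0; with r=None, equal replication is assumed."""
    if r is None:
        r = [1] * len(c)
    return abs(sum(c_i * d_i / r_i for c_i, d_i, r_i in zip(c, d, r))) < 1e-12


for name, c in contrasts.items():
    print(f"{name}: is a contrast = {is_contrast(c)}")
for (name_c, c), (name_d, d) in combinations(contrasts.items(), 2):
    print(f"{name_c} / {name_d}: orthogonal = {are_orthogonal(c, d)}")
```

With unequal group sizes the same coefficient vectors need not remain orthogonal, which is why the optional `r` argument is included.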

51 The ANOVA part of the output is not shown here. The part of the output generated by the contrast statements looks like this:
Contrast      DF    Contrast SS    Mean Square    F Value    Pr > F
67-71          1     3.37500000     3.37500000       2.92    0.1031
79-83          1     1.33333333     1.33333333       1.15    0.2958
sm vs lg       1    93.35294118    93.35294118      80.70    <0.0001
mid vs oth     1     0.20796354     0.20796354       0.18    0.6761

