
2 Analysis of Variance Chapter 15

3 15.1 Introduction Analysis of variance compares two or more populations of interval data. Specifically, we are interested in determining whether differences exist between the population means. The procedure works by analyzing the sample variance.

4 15.2 One Way Analysis of Variance The analysis of variance is a procedure that tests whether differences exist between two or more population means. To do this, the technique analyzes the sample variances.

5 One Way Analysis of Variance Example 15.1 – An apple juice manufacturer is planning to develop a new product – a liquid concentrate. – The marketing manager has to decide how to market the new product. – Three strategies are considered: Emphasize the convenience of using the product. Emphasize the quality of the product. Emphasize the product's low price.

6 Example 15.1 - continued –An experiment was conducted as follows: In three cities an advertisement campaign was launched. In each city only one of the three characteristics (convenience, quality, and price) was emphasized. The weekly sales were recorded for twenty weeks following the beginning of the campaigns. One Way Analysis of Variance

7 Weekly sales: see file Xm15-01.

8 One Way Analysis of Variance Solution – The data are interval. – The problem objective is to compare sales in three cities. – We hypothesize that the three population means are equal.

9 Defining the Hypotheses Solution
H0: μ1 = μ2 = μ3
H1: At least two means differ
To build the statistic needed to test the hypotheses we use the following notation:

10 Notation Independent samples are drawn from k populations (treatments). X is the "response variable"; its values are called "responses".
Sample 1: x11, x21, …, xn1,1 with sample size n1 and sample mean x̄1
Sample 2: x12, x22, …, xn2,2 with sample size n2 and sample mean x̄2
…
Sample k: x1k, x2k, …, xnk,k with sample size nk and sample mean x̄k
(xij denotes observation i in sample j; for example, x11 is the first observation in the first sample and x22 is the second observation in the second sample.)

11 Terminology In the context of this problem… Response variable – weekly sales. Responses – actual sale values. Experimental unit – the weeks in the three cities over which sales figures are recorded. Factor – the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy. Factor levels – the population (treatment) names. In this problem the factor levels are the marketing strategies.

12 The rationale of the test statistic Two types of variability are employed when testing for the equality of the population means.

13 Graphical demonstration: Employing two types of variability

14 (Two dot plots of the same three treatment means, one with small and one with large within-sample variability.) A small variability within the samples makes it easier to draw a conclusion about the population means. The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.

15 The rationale behind the test statistic – I If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean). If the alternative hypothesis is true, at least some of the sample means would differ. Thus, we measure variability between sample means.

16 Variability between sample means The variability between the sample means is measured as the sum of squared distances between each mean and the grand mean. This sum is called the Sum of Squares for Treatments (SST). In our example treatments are represented by the different advertising strategies.

17 Sum of squares for treatments (SST) With k treatments, sample sizes nj, sample means x̄j, and grand mean x̄:
SST = Σj nj (x̄j − x̄)², summed over j = 1, …, k.
Note: When the sample means are close to one another, their distances from the grand mean are small, leading to a small SST. Thus, a large SST indicates large variation between sample means, which supports H1.

18 Sum of squares for treatments (SST) Solution – continued
The grand mean is x̄ = (20·577.55 + 20·653.00 + 20·608.65)/60 = 613.07
SST = 20(577.55 − 613.07)² + 20(653.00 − 613.07)² + 20(608.65 − 613.07)² = 57,512.23
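A minimal Python sketch (not part of the original slides) that reproduces the grand mean and SST above from the sample sizes and sample means given in Example 15.1:

# Sample sizes and sample means from Example 15.1 (Xm15-01)
n = [20, 20, 20]
xbar = [577.55, 653.00, 608.65]

grand_mean = sum(ni * xi for ni, xi in zip(n, xbar)) / sum(n)      # ≈ 613.07
SST = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(n, xbar))  # ≈ 57,512
print(round(grand_mean, 2), round(SST, 2))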

19 Is SST = 57,512.23 large enough to reject H 0 in favor of H 1 ? See next. Sum of squares for treatments (SST)

20 Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means. Therefore, even though sample means may markedly differ from one another, SST must be judged relative to the “within samples variability”. The rationale behind test statistic – II

21 Within samples variability The variability within samples is measured by adding all the squared distances between observations and their sample means. This sum is called the Sum of Squares for Error (SSE). In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all three cities).

22 Sum of squares for errors (SSE) Solution – continued
SSE = (n1 − 1)s1² + (n2 − 1)s2² + (n3 − 1)s3²
= (20 − 1)(10,774.44) + (20 − 1)(7,238.61) + (20 − 1)(8,670.24) = 506,983.50
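A matching Python sketch for SSE, using the sample variances quoted above (any small discrepancy from 506,983.50 comes from rounding in the quoted variances):

# Sample sizes and sample variances from Example 15.1
n = [20, 20, 20]
s2 = [10774.44, 7238.61, 8670.24]

SSE = sum((ni - 1) * si for ni, si in zip(n, s2))  # ≈ 506,983
print(round(SSE, 2))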

23 Is SST = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that specifies that all the means are equal? Sum of squares for errors (SSE)

24 The mean sum of squares To perform the test we need to calculate the mean squares as follows:
Mean Square for Treatments: MST = SST/(k − 1)
Mean Square for Error: MSE = SSE/(n − k)

25 Calculation of the test statistic
F = MST/MSE, with degrees of freedom ν1 = k − 1 and ν2 = n − k.
Required conditions: 1. The populations tested are normally distributed. 2. The variances of all the populations tested are equal.
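Continuing the sketch, the mean squares and the F statistic for Example 15.1 follow directly from the SST and SSE computed on the earlier slides:

# SST and SSE from the previous slides; k samples, n observations in total
SST, SSE, k, n = 57512.23, 506983.50, 3, 60

MST = SST / (k - 1)   # ≈ 28,756.1
MSE = SSE / (n - k)   # ≈ 8,894.4
F = MST / MSE         # ≈ 3.23, with degrees of freedom (k - 1, n - k) = (2, 57)
print(round(MST, 1), round(MSE, 1), round(F, 2))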

26 The F test rejection region And finally the hypothesis test:
H0: μ1 = μ2 = … = μk
H1: At least two means differ
Test statistic: F = MST/MSE
Rejection region: F > Fα, k−1, n−k

27 The F test
H0: μ1 = μ2 = μ3
H1: At least two means differ
Test statistic: F = MST/MSE = 3.23
Since 3.23 > 3.15, there is sufficient evidence to reject H0 in favor of H1 and conclude that at least one of the mean sales differs from the others.

28 The F test p-value Use Excel to find the p-value: fx > Statistical > FDIST(3.23, 2, 57) = .0467
p-value = P(F > 3.23) = .0467
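The same upper-tail probability that Excel's FDIST returns can be obtained from SciPy's F distribution; a small sketch:

from scipy import stats

# P(F > 3.23) for an F distribution with (2, 57) degrees of freedom
p_value = stats.f.sf(3.23, 2, 57)
print(round(p_value, 4))  # ≈ .047, matching the FDIST result above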

29 Excel single factor ANOVA SS(Total) = SST + SSE Xm15-01.xls
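As an alternative to Excel's single-factor ANOVA, scipy.stats.f_oneway runs the whole one-way test directly from the raw data. The file name and column names below are assumptions about how Xm15-01 might be exported, not the workbook's actual layout:

import pandas as pd
from scipy import stats

# Hypothetical CSV export of Xm15-01: one column of 20 weekly sales figures per city
df = pd.read_csv("Xm15-01.csv")
f_stat, p_value = stats.f_oneway(df["Convenience"], df["Quality"], df["Price"])
print(f_stat, p_value)  # should be close to the Excel output: F ≈ 3.23, p ≈ .0467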

30 15.3 Analysis of Variance Experimental Designs Several elements distinguish one experimental design from another. – The number of factors. Each characteristic investigated is called a factor. Each factor has several levels.

31 (Diagram comparing the two designs.) One-way ANOVA: a single factor; its levels are the treatments (Treatment 1, 2, 3), and the response is recorded for each. Two-way ANOVA: two factors; Factor A and Factor B each have several levels, and the response is recorded for every combination of levels.

32 Groups of matched observations are formed into blocks, in order to remove the effects of “unwanted” variability. By doing so we improve the chances of detecting the variability of interest. Independent samples or blocks

33 Fixed effects –If all possible levels of a factor are included in our analysis we have a fixed effect ANOVA. –The conclusion of a fixed effect ANOVA applies only to the levels studied. Random effects –If the levels included in our analysis represent a random sample of all the possible levels, we have a random-effect ANOVA. –The conclusion of the random-effect ANOVA applies to all the levels (not only those studied). Models of Fixed and Random Effects

34 Models of Fixed and Random Effects In some ANOVA models the test statistic of the fixed effects case may differ from the test statistic of the random effects case. Fixed and random effects – examples: – Fixed effects – the advertisement example (15.1): all the levels of the marketing strategies were included. – Random effects – to determine if there is a difference in the production rate of 50 machines, four machines are randomly selected and their production recorded.

35 15.4 Randomized Blocks (Two-way) Analysis of Variance The purpose of designing a randomized block experiment is to reduce the within-treatments variation thus increasing the relative amount of between treatment variation. This helps in detecting differences between the treatment means more easily.

36 Randomized Blocks Block all the observations with some commonality across treatments. (Layout: each of Blocks 1–3 contains one observation for each of Treatments 1–4.)


38 Partitioning the total variability The total sum of squares is partitioned into three sources of variation: – Treatments – Blocks – Within samples (Error)
SS(Total) = SST + SSB + SSE, where SST is the sum of squares for treatments, SSB is the sum of squares for blocks, and SSE is the sum of squares for error.
Recall: for the independent samples design we have SS(Total) = SST + SSE.

39 Calculating the sums of squares The sums of squares are computed from the treatment means x̄T,j, the block means x̄B,i, and the grand mean x̄:
SST = b Σj (x̄T,j − x̄)², summed over the k treatments
SSB = k Σi (x̄B,i − x̄)², summed over the b blocks
SSE = Σi Σj (xij − x̄T,j − x̄B,i + x̄)²


41 Mean Squares To perform hypothesis tests for treatments and blocks we need:
Mean square for treatments: MST = SST/(k − 1)
Mean square for blocks: MSB = SSB/(b − 1)
Mean square for error: MSE = SSE/(n − k − b + 1)

42 Test statistics for the randomized block design ANOVA
Test statistic for treatments: F = MST/MSE
Test statistic for blocks: F = MSB/MSE

43 The F test rejection regions
Testing the mean responses for treatments: reject if F > Fα, k−1, n−k−b+1
Testing the mean responses for blocks: reject if F > Fα, b−1, n−k−b+1
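A sketch of the randomized block computations above in Python; the function name is ours, and it assumes one observation per (block, treatment) cell, as in Example 15.2:

import numpy as np

def randomized_block_anova(X):
    """X: b x k array with rows = blocks and columns = treatments."""
    b, k = X.shape
    n = b * k
    grand = X.mean()
    treat_means = X.mean(axis=0)   # one mean per treatment (column)
    block_means = X.mean(axis=1)   # one mean per block (row)

    SST = b * np.sum((treat_means - grand) ** 2)
    SSB = k * np.sum((block_means - grand) ** 2)
    SSE = np.sum((X - treat_means[None, :] - block_means[:, None] + grand) ** 2)

    MST = SST / (k - 1)
    MSB = SSB / (b - 1)
    MSE = SSE / (n - k - b + 1)
    return MST / MSE, MSB / MSE    # F for treatments, F for blocks

For Example 15.2 the input would be a 25 × 4 array (25 blocks of matched men, 4 drugs); each returned F is then compared with the rejection regions on the slide above.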

44 Randomized Blocks ANOVA – Example Example 15.2
– Are there differences in the effectiveness of cholesterol reduction drugs?
– To answer this question the following experiment was organized: 25 groups of men with high cholesterol were matched by age and weight. Each group consisted of 4 men. Each person in a group received a different drug. The cholesterol level reduction over two months was recorded.
– Can we infer from the data in Xm15-02 that there are differences in mean cholesterol reduction among the four drugs?

45 Randomized Blocks ANOVA – Example Solution
– Each drug can be considered a treatment.
– The four records in each group form a block, because the men are matched by age and weight.
– This procedure eliminates the variability in cholesterol reduction related to different combinations of age and weight.
– This helps detect differences in the mean cholesterol reduction attributed to the different drugs.

46 Randomized Blocks ANOVA – Example (ANOVA table: Treatments with k − 1 degrees of freedom and F = MST/MSE; Blocks with b − 1 degrees of freedom and F = MSB/MSE.)
Conclusion: At the 5% significance level there is sufficient evidence to infer that the mean cholesterol reduction produced by at least two of the drugs differs.

47 Analysis of Variance Chapter 15 - continued

48 15.5 Two-Factor Analysis of Variance - Example 15.3 –Suppose in Example 15.1, two factors are to be examined: The effects of the marketing strategy on sales. –Emphasis on convenience –Emphasis on quality –Emphasis on price The effects of the selected media on sales. –Advertise on TV –Advertise in newspapers

49 Attempting one-way ANOVA Solution
– We may attempt to analyze combinations of levels, one from each factor, using one-way ANOVA.
– The treatments will be:
Treatment 1: Emphasize convenience and advertise on TV
Treatment 2: Emphasize convenience and advertise in newspapers
…
Treatment 6: Emphasize price and advertise in newspapers

50 Attempting one-way ANOVA Solution – The hypotheses tested are:
H0: μ1 = μ2 = μ3 = μ4 = μ5 = μ6
H1: At least two means differ.

51 Attempting one-way ANOVA Solution
– In each one of six cities sales are recorded for ten weeks.
– In each city a different combination of marketing emphasis and media is employed:
City 1: Convenience / TV; City 2: Convenience / Newspaper; City 3: Quality / TV; City 4: Quality / Newspaper; City 5: Price / TV; City 6: Price / Newspaper.

52 Attempting one-way ANOVA Solution (Xm15-03)
The p-value = .0452. We conclude that there is evidence that differences exist in the mean weekly sales among the six cities.

53 Interesting questions – no answers These results raise some questions:
– Are the differences in sales caused by the different marketing strategies?
– Are the differences in sales caused by the different media used for advertising?
– Are there combinations of marketing strategy and media that interact to affect weekly sales?

54 The current experimental design cannot provide answers to these questions. A new experimental design is needed. Two-way ANOVA (two factors)

55 Factor A: Marketing strategy. Factor B: Advertising media. (The six city samples form a 2 × 3 table: rows are the media, TV and Newspapers; columns are the strategies, Convenience, Quality, and Price.) Are there differences in the mean sales caused by different marketing strategies?

56 Two-way ANOVA (two factors) Test whether the mean sales of "Convenience", "Quality", and "Price" differ significantly from one another.
H0: μConv. = μQuality = μPrice
H1: At least two means differ
Calculations are based on the sum of squares for factor A, SS(A).

57 Factor A: Marketing strategy. Factor B: Advertising media. (Same 2 × 3 layout of city samples.) Are there differences in the mean sales caused by different advertising media?

58 Two-way ANOVA (two factors) Test whether the mean sales of "TV" and "Newspapers" differ significantly from one another.
H0: μTV = μNewspapers
H1: The two means differ
Calculations are based on the sum of squares for factor B, SS(B).

59 Factor A: Marketing strategy. Factor B: Advertising media. (Same 2 × 3 layout; for example, City 3 is the Quality / TV cell.) Are there differences in the mean sales caused by an interaction between marketing strategy and advertising medium?

60 Two-way ANOVA (two factors) Test whether the mean sales in certain cells differ from what the main factor effects alone would lead us to expect.
Calculations are based on the sum of squares for interaction, SS(AB).

61 Graphical description of the possible relationships between factors A and B.

62 (Four plots of mean response against the levels of factor A, one line per level of factor B.)
– Difference between the levels of factor A; no difference between the levels of factor B.
– Difference between the levels of factor A and difference between the levels of factor B; no interaction.
– No difference between the levels of factor A; difference between the levels of factor B.
– Interaction.

63 Sums of squares For a levels of factor A, b levels of factor B, and r replicates per cell (n = abr):
SS(A) = rb Σi (x̄A,i − x̄)²
SS(B) = ra Σj (x̄B,j − x̄)²
SS(AB) = r Σi Σj (x̄AB,ij − x̄A,i − x̄B,j + x̄)²
SSE = Σi Σj Σl (xijl − x̄AB,ij)²
where x̄A,i, x̄B,j, and x̄AB,ij are the factor-A, factor-B, and cell means, and x̄ is the grand mean.

64 F tests for the Two-way ANOVA
Mean squares: MS(A) = SS(A)/(a − 1), MS(B) = SS(B)/(b − 1), MS(AB) = SS(AB)/[(a − 1)(b − 1)], MSE = SSE/(n − ab)
Test for differences between the levels of main factor A: F = MS(A)/MSE; rejection region: F > Fα, a−1, n−ab
Test for differences between the levels of main factor B: F = MS(B)/MSE; rejection region: F > Fα, b−1, n−ab
Test for interaction between factors A and B: F = MS(AB)/MSE; rejection region: F > Fα, (a−1)(b−1), n−ab

65 Required conditions: 1. The response distribution is normal. 2. The treatment variances are equal. 3. The samples are independent.

66 F tests for the Two-way ANOVA Example 15.3 – continued (Xm15-03)

67 F tests for the Two-way ANOVA Example 15.3 – continued
– Test of the difference in mean sales between the three marketing strategies (Factor A: marketing strategy)
H0: μconv. = μquality = μprice
H1: At least two mean sales are different

68 F tests for the Two-way ANOVA Example 15.3 – continued
– Test of the difference in mean sales between the three marketing strategies
H0: μconv. = μquality = μprice
H1: At least two mean sales are different
F = MS(A)/MSE = MS(Marketing strategy)/MSE = 5.33
F critical = Fα, a−1, n−ab = F.05, 3−1, 60−(3)(2) = 3.17 (p-value = .0077)
– At the 5% significance level there is evidence to infer that differences in weekly sales exist among the marketing strategies.

69 F tests for the Two-way ANOVA Example 15.3 – continued
– Test of the difference in mean sales between the two advertising media (Factor B: advertising media)
H0: μTV = μNewspapers
H1: The two mean sales differ

70 F tests for the Two-way ANOVA Example 15.3 – continued
– Test of the difference in mean sales between the two advertising media
H0: μTV = μNewspapers
H1: The two mean sales differ
F = MS(B)/MSE = MS(Media)/MSE = 1.42
F critical = Fα, b−1, n−ab = F.05, 2−1, 60−(3)(2) = 4.02 (p-value = .2387)
– At the 5% significance level there is insufficient evidence to infer that differences in weekly sales exist between the two advertising media.

71 F tests for the Two-way ANOVA Example 15.3 – continued
– Test for interaction between factors A and B (Marketing strategy × Media)
H0: μTV×conv. = μTV×quality = … = μnewsp.×price
H1: At least two means differ

72 F tests for the Two-way ANOVA Example 15.3 – continued
– Test for interaction between factors A and B
H0: μTV×conv. = μTV×quality = … = μnewsp.×price
H1: At least two means differ
F = MS(AB)/MSE = MS(Marketing×Media)/MSE = .09
F critical = Fα, (a−1)(b−1), n−ab = F.05, (3−1)(2−1), 60−(3)(2) = 3.17 (p-value = .9171)
– At the 5% significance level there is insufficient evidence to infer that the two factors interact to affect the mean weekly sales.
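A hedged sketch of how the three F tests for Example 15.3 could be reproduced in Python with statsmodels; the file name and the column names 'Sales', 'Strategy', and 'Media' are assumptions about a long-format export of Xm15-03, not the workbook's actual layout:

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Assumed long format: one row per week, with the sales figure and the two factor levels
df = pd.read_csv("Xm15-03.csv")

model = ols("Sales ~ C(Strategy) * C(Media)", data=df).fit()
print(anova_lm(model))
# If the data match the example, the F column should correspond to the three tests above:
# C(Strategy) ≈ 5.33, C(Media) ≈ 1.42, C(Strategy):C(Media) ≈ 0.09, tested against MSE with n - ab = 54 df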

73 15.7 Multiple Comparisons When the null hypothesis is rejected, it may be desirable to find which mean(s) is (are) different, and at what ranking order. Three statistical inference procedures, geared at doing this, are presented: –Fisher’s least significant difference (LSD) method –Bonferroni adjustment –Tukey’s multiple comparison method

74 Two means are considered different if the difference between the corresponding sample means is larger than a critical number. Then, the larger sample mean is believed to be associated with a larger population mean. Conditions common to all the methods here: –The ANOVA model is the one way analysis of variance –The conditions required to perform the ANOVA are satisfied. –The experiment is fixed-effect 15.7 Multiple Comparisons

75 Fisher Least Significant Difference (LSD) Method This method builds on the equal-variances t-test of the difference between two means. The test statistic is improved by using MSE rather than sp². We can conclude that μi and μj differ (at the α significance level) if |x̄i − x̄j| > LSD, where
LSD = tα/2, n−k √(MSE(1/ni + 1/nj))

76 Experimentwise Type I error rate (αE) (the effective Type I error) Fisher's method may result in an increased probability of committing a Type I error. The experimentwise Type I error rate is the probability of committing at least one Type I error when each comparison is made at significance level α. It is calculated by αE = 1 − (1 − α)^C, where C is the number of pairwise comparisons (C = k(k − 1)/2). The Bonferroni adjustment determines the required Type I error probability per pairwise comparison (α) that secures a predetermined overall αE.
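For Example 15.1 (k = 3 treatments), a quick check of how fast the experimentwise error rate grows:

k, alpha = 3, 0.05
C = k * (k - 1) // 2            # 3 pairwise comparisons
alpha_E = 1 - (1 - alpha) ** C  # ≈ 0.143, well above the nominal .05
print(C, round(alpha_E, 4))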

77 Bonferroni Adjustment The procedure:
– Compute the number of pairwise comparisons C = k(k − 1)/2, where k is the number of populations.
– Set α = αE/C, where αE is the desired probability of making at least one Type I error (the experimentwise Type I error).
– We can conclude that μi and μj differ (at the αE/C significance level per comparison) if |x̄i − x̄j| > tα/2, n−k √(MSE(1/ni + 1/nj))

78 Fisher and Bonferroni Methods Example 15.1 – continued
– Rank the effectiveness of the marketing strategies (based on mean weekly sales).
– Use Fisher's method and the Bonferroni adjustment method.
Solution (Fisher's method)
– The sample mean sales were 577.55, 653.00, and 608.65, with MSE = 8,894.45 and n − k = 57 degrees of freedom.
– Then LSD = t.025, 57 √(MSE(1/20 + 1/20)) = 2.002(29.82) ≈ 59.7. Only |653.00 − 577.55| = 75.45 exceeds the LSD, so the significant difference is between μ1 and μ2.
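A small Python sketch of the Fisher LSD calculation above, using the sample means, MSE, and degrees of freedom from Example 15.1:

from math import sqrt
from scipy import stats

MSE, n, k, n_i = 8894.45, 60, 3, 20
means = {"Convenience": 577.55, "Quality": 653.00, "Price": 608.65}

t_crit = stats.t.ppf(1 - 0.05 / 2, n - k)       # t(.025, 57) ≈ 2.002
LSD = t_crit * sqrt(MSE * (1 / n_i + 1 / n_i))  # ≈ 59.7

for a, b in [("Convenience", "Quality"), ("Convenience", "Price"), ("Quality", "Price")]:
    diff = abs(means[a] - means[b])
    print(a, "vs", b, round(diff, 2), "differ" if diff > LSD else "not significant")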

79 Fisher and Bonferroni Methods Solution (the Bonferroni adjustment)
– We calculate C = k(k − 1)/2 = 3(2)/2 = 3.
– We set α = .05/3 = .0167, thus t.0167/2, 60−3 = 2.467 (Excel), giving a critical difference of 2.467(29.82) ≈ 73.5.
– Again, the significant difference is between μ1 and μ2 (75.45 > 73.5).
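The Bonferroni version only changes the critical t value; a sketch continuing the numbers above:

from math import sqrt
from scipy import stats

alpha_E, k, n, MSE, n_i = 0.05, 3, 60, 8894.45, 20
C = k * (k - 1) // 2                            # 3 pairwise comparisons
alpha = alpha_E / C                             # .0167 per comparison

t_crit = stats.t.ppf(1 - alpha / 2, n - k)      # ≈ 2.467, as quoted from Excel above
critical_diff = t_crit * sqrt(MSE * (2 / n_i))  # ≈ 73.5
print(round(t_crit, 3), round(critical_diff, 1))
# Only |653.00 - 577.55| = 75.45 exceeds it, so again mu1 and mu2 differ.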

80 Tukey Multiple Comparisons The test procedure:
– Find a critical number ω as follows: ω = qα(k, ν) √(MSE/ng), where
k = the number of samples
ν = degrees of freedom = n − k
ng = number of observations per sample (recall, all the sample sizes are the same)
α = significance level
qα(k, ν) = a critical value obtained from the studentized range table

81 Tukey Multiple Comparisons
– If the sample sizes are not extremely different, we can use the above procedure with ng calculated as the harmonic mean of the sample sizes.
– Select a pair of means. Calculate the difference between the larger and the smaller mean. If |x̄max − x̄min| > ω, there is sufficient evidence to conclude that μmax > μmin.
– Repeat this procedure for each pair of samples. Rank the means if possible.

82 Tukey Multiple Comparisons Example 15.1 – continued
We had three populations (three marketing strategies): k = 3. Sample sizes were equal: n1 = n2 = n3 = 20, ν = n − k = 60 − 3 = 57, MSE = 8,894. Take q.05(3, 60) from the table (≈ 3.40), so ω = q.05(3, 60)√(MSE/ng) ≈ 71.7.
Population means: Sales – City 1: 577.55, Sales – City 2: 653.00, Sales – City 3: 608.65.
City 1 vs. City 2: 653.00 − 577.55 = 75.45
City 1 vs. City 3: 608.65 − 577.55 = 31.10
City 2 vs. City 3: 653.00 − 608.65 = 44.35
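A sketch of the Tukey calculation; scipy.stats.studentized_range (available in SciPy 1.7 or newer) stands in for the printed studentized range table, and the numbers are the ones from Example 15.1:

from math import sqrt
from scipy.stats import studentized_range  # requires SciPy >= 1.7

k, nu, MSE, n_g = 3, 57, 8894.0, 20
q_crit = studentized_range.ppf(0.95, k, nu)  # close to the tabled q.05(3, 60) ≈ 3.40
omega = q_crit * sqrt(MSE / n_g)             # ≈ 72

diffs = {"City 1 vs City 2": 75.45, "City 1 vs City 3": 31.10, "City 2 vs City 3": 44.35}
for pair, d in diffs.items():
    print(pair, d, "differ" if d > omega else "not significant")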

83 Excel – Tukey and Fisher LSD method (Xm15-01) Fisher's LSD: α = .05. Bonferroni adjustment: α = .05/3 = .0167.

