Chapter 15: The Analysis of Variance
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Slide 2: A Problem
A study was done on the survival time of patients with advanced cancer of the stomach, bronchus, colon, ovary, or breast when treated with ascorbate.[1] The authors wanted to determine whether the survival times differ based on the affected organ.

[1] Cameron, E. and Pauling, L. (1978). Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Sciences, USA, 75.

Slide 3: A Problem
A comparative dotplot of the survival times is shown below.

Slide 4: A Problem
The hypotheses used to answer the question of interest are

H0: µstomach = µbronchus = µcolon = µovary = µbreast
Ha: At least two of the µ's are different

The question is similar to ones encountered in Chapter 11, where we looked at tests for the difference between the means of two populations. In this case we are interested in comparing more than two.

Slide 5: Single-factor Analysis of Variance (ANOVA)
A single-factor analysis of variance (ANOVA) problem involves a comparison of k population or treatment means µ1, µ2, …, µk. The objective is to test the hypotheses

H0: µ1 = µ2 = µ3 = … = µk
Ha: At least two of the µ's are different

Slide 6: Single-factor Analysis of Variance (ANOVA)
The analysis is based on k independently selected samples, one from each population or for each treatment. In the case of populations, a random sample from each population is selected independently of that from any other population. When comparing treatments, the experimental units (subjects or objects) that receive any particular treatment are chosen at random from those available for the experiment.

Slide 7: Single-factor Analysis of Variance (ANOVA)
A comparison of treatments based on independently selected experimental units is often referred to as a completely randomized design.

Slide 8: Single-factor Analysis of Variance (ANOVA)
Notice that in the above comparative dotplot, the differences in the treatment means are large relative to the variability within the samples.

Slide 9: Single-factor Analysis of Variance (ANOVA)
Notice that in the above comparative dotplot, the differences in the treatment means are not easily judged relative to the sample variability. ANOVA techniques will allow us to determine whether those differences are significant.

Slide 10: ANOVA Notation
k = number of populations or treatments being compared

Population or treatment:           1      2      …    k
Population or treatment mean:      µ1     µ2     …    µk
Sample mean:                       x̄1    x̄2    …    x̄k
Population or treatment variance:  σ1²    σ2²    …    σk²
Sample variance:                   s1²    s2²    …    sk²
Sample size:                       n1     n2     …    nk

Slide 11: ANOVA Notation
N = n1 + n2 + … + nk (total number of observations in the data set)
T = grand total = sum of all N observations

Slide 12: Assumptions for ANOVA
1. For each of the k populations or treatments, the response distribution is normal.
2. σ1 = σ2 = … = σk (the k normal distributions have identical standard deviations).
3. The observations in the sample from any particular one of the k populations or treatments are independent of one another.
4. When comparing population means, the k random samples are selected independently of one another. When comparing treatment means, treatments are assigned at random to subjects or objects.

Slide 13: Definitions
A measure of disparity among the sample means is the treatment sum of squares, denoted by SSTr and given by

SSTr = n1(x̄1 - x̄)² + n2(x̄2 - x̄)² + … + nk(x̄k - x̄)²

where x̄ = T/N is the grand mean of all the observations.

A measure of variation within the k samples, called the error sum of squares and denoted by SSE, is given by

SSE = (n1 - 1)s1² + (n2 - 1)s2² + … + (nk - 1)sk²

Slide 14: Definitions
A mean square is a sum of squares divided by its df. In particular,

mean square for treatments = MSTr = SSTr / (k - 1)
mean square for error = MSE = SSE / (N - k)

The error df comes from adding the df's associated with each of the sample variances:

(n1 - 1) + (n2 - 1) + … + (nk - 1) = n1 + n2 + … + nk - k = N - k
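The quantities defined above can be computed directly. A minimal sketch follows; the three samples are invented for illustration (the transcript does not preserve the slides' data), and NumPy is assumed to be available:

```python
# Sketch of the single-factor ANOVA sums of squares and mean squares,
# using made-up samples (not the slides' machine data).
import numpy as np

samples = [
    np.array([12.0, 11.9, 12.1, 12.0]),   # hypothetical group 1
    np.array([12.2, 12.3, 12.1, 12.4]),   # hypothetical group 2
    np.array([11.8, 11.9, 12.0, 11.7]),   # hypothetical group 3
]

k = len(samples)                           # number of treatments
N = sum(len(s) for s in samples)           # total number of observations
grand_mean = np.concatenate(samples).mean()

# SSTr = sum over groups of n_i * (xbar_i - grand mean)^2
sstr = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
# SSE = sum over groups of (n_i - 1) * s_i^2
sse = sum((len(s) - 1) * s.var(ddof=1) for s in samples)

mstr = sstr / (k - 1)    # mean square for treatments, df = k - 1
mse = sse / (N - k)      # mean square for error, df = N - k
print(mstr, mse)
```

With k = 3 groups of 4 observations each, the treatment df is 2 and the error df is 9, matching the df formulas above.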

Slide 15: Example
Three filling machines are used by a bottler to fill 12 oz cans of soda. In an attempt to determine whether the three machines fill the cans to the same (mean) level, independent samples of cans filled by each machine were selected and the amounts of soda in the cans measured. The samples are given below.

Machine 1
Machine 2
Machine 3

Slide 19: Comments
Both MSTr and MSE are quantities calculated from sample data. As such, both MSTr and MSE are statistics and have sampling distributions. More specifically, when H0 is true, µMSTr = µMSE. However, when H0 is false, µMSTr > µMSE, and the greater the differences among the µ's, the larger µMSTr will be relative to µMSE.

Slide 20: The Single-Factor ANOVA F Test
Null hypothesis: H0: µ1 = µ2 = µ3 = … = µk
Alternate hypothesis: Ha: At least two of the µ's are different
Test statistic: F = MSTr / MSE

Slide 21: The Single-Factor ANOVA F Test
When H0 is true and the ANOVA assumptions are reasonable, F has an F distribution with df1 = k - 1 and df2 = N - k. Values of F more contradictory to H0 than the one calculated are values even farther out in the upper tail, so the P-value is the area captured in the upper tail of the corresponding F curve.
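The upper-tail P-value can be sketched with SciPy's F distribution. The df values (2 and 21) are those of the filling-machine example; the observed F of 3.84 below is an assumed stand-in, since the slide's computed statistic is not preserved in this transcript:

```python
# Upper-tail P-value for the single-factor ANOVA F test, with
# df1 = k - 1 = 2 and df2 = N - k = 21 as in the example.
# The observed F of 3.84 is an assumed stand-in value.
from scipy.stats import f

df1, df2 = 2, 21
f_crit = f.ppf(0.95, df1, df2)    # 5% critical value of the F curve
p_value = f.sf(3.84, df1, df2)    # P-value = area in the upper tail beyond F
print(f_crit, p_value)
```

Because the P-value is always the upper-tail area, `f.sf` (the survival function) is the natural call here rather than `1 - f.cdf`, which loses precision for large F.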

Slide 22: Example
Consider the earlier example involving the three filling machines.

Machine 1
Machine 2
Machine 3

Slide 24: Example
1. Let µ1, µ2, and µ3 denote the true mean amounts of soda in the cans filled by machines 1, 2, and 3, respectively.
2. H0: µ1 = µ2 = µ3
3. Ha: At least two among µ1, µ2, and µ3 are different
4. Significance level: α = 0.01
5. Test statistic: F = MSTr / MSE

Slide 25: Example
6. Looking at the comparative dotplot, it seems reasonable to assume that the distributions have the same σ's. We shall look at the normality assumption on the next slide.*

*When the sample sizes are large, we can make judgments about both the equality of the standard deviations and the normality of the underlying populations with a comparative boxplot.

Slide 26: Example
6. Looking at normal plots for the samples, it certainly appears reasonable to assume that the samples from machines 1 and 2 come from normal distributions. Unfortunately, the normal plot for the sample from machine 3 does not look like a sample from a normal population. So as to have a computational example, we shall continue and finish the test, treating the result with a "grain of salt."
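Normal probability plots are judged by eye; a Shapiro-Wilk test is a common numerical companion to them. The sample below is simulated, not the slide's machine data, and SciPy is assumed to be available:

```python
# Shapiro-Wilk normality test as a numerical companion to a normal
# probability plot.  The fills are simulated, not the slide's data.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
fills = rng.normal(loc=12.0, scale=0.1, size=30)   # hypothetical can fills

stat, p = shapiro(fills)
# A small p-value (say < 0.05) would cast doubt on the normality assumption.
print(stat, p)
```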

Slide 27: Example
7. Computation: F = MSTr / MSE

Slide 28: Example
8. P-value: From the F table with numerator df1 = 2 and denominator df2 = 21, we can see that 0.025 < P-value < 0.05. (Minitab reports this value to be 0.038.)

Slide 29: Example
9. Conclusion: Since P-value > α = 0.01, we fail to reject H0. We are unable to show that the mean fills differ; the differences in the mean fills of the machines are not statistically significant at the 1% level of significance.

Slide 30: Total Sum of Squares
The total sum of squares, denoted by SSTo, is given by

SSTo = sum over all N observations of (x - x̄)²

with associated df = N - 1. The relationship between the three sums of squares is

SSTo = SSTr + SSE

which is often called the fundamental identity for single-factor ANOVA. Informally, this relation is expressed as

Total variation = Explained variation + Unexplained variation
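The fundamental identity can be checked numerically on any data set. The small samples below are arbitrary, chosen only for illustration:

```python
# Numerical check of the fundamental identity SSTo = SSTr + SSE
# on arbitrary made-up samples.
import numpy as np

samples = [np.array([3.1, 2.9, 3.4]),
           np.array([3.8, 4.1, 3.9, 4.0]),
           np.array([2.7, 2.8])]
all_obs = np.concatenate(samples)
grand_mean = all_obs.mean()

ssto = ((all_obs - grand_mean) ** 2).sum()                         # df = N - 1
sstr = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
sse = sum(((s - s.mean()) ** 2).sum() for s in samples)

# Total variation = explained variation + unexplained variation.
assert abs(ssto - (sstr + sse)) < 1e-9
```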

Slide 31: Single-factor ANOVA Table
The following is a fairly standard way of presenting the important calculations from a single-factor ANOVA. The output from most statistical packages will contain an additional column giving the P-value.

Source of variation   df      Sum of squares   Mean square           F
Treatments            k - 1   SSTr             MSTr = SSTr/(k - 1)   F = MSTr/MSE
Error                 N - k   SSE              MSE = SSE/(N - k)
Total                 N - 1   SSTo

Slide 32: Single-factor ANOVA Table
The ANOVA table supplied by Minitab:

One-way ANOVA: Fills versus Machine

Analysis of Variance for Fills
Source     DF    SS    MS    F    P
Machine
Error
Total
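A one-way ANOVA like the Minitab run above can be reproduced with scipy.stats.f_oneway. The fill amounts below are invented stand-ins, since the transcript does not preserve the slides' data:

```python
# One-way ANOVA, as in the Minitab output, via scipy.stats.f_oneway.
# The fill amounts are invented stand-ins for the slides' samples.
from scipy.stats import f_oneway

machine1 = [12.03, 12.01, 11.98, 12.05, 12.02]
machine2 = [12.10, 12.12, 12.08, 12.14, 12.11]
machine3 = [11.97, 11.95, 12.00, 11.96, 11.99]

f_stat, p_value = f_oneway(machine1, machine2, machine3)
print(f_stat, p_value)   # compare p_value with the chosen significance level
```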

Slide 33: Another Example
A food company produces 4 different brands of salsa. In order to determine whether the four brands have the same sodium levels, 10 bottles of each brand were randomly (and independently) obtained, and the sodium content in milligrams (mg) per tablespoon serving was measured. The sample data are given on the next slide. Use the data to perform an appropriate hypothesis test at the 0.05 level of significance.

Slide 34: Another Example
Brand A
Brand B
Brand C
Brand D

Slide 35: Another Example
1. Let µ1, µ2, µ3, and µ4 denote the true mean sodium content per tablespoon for each of the brands, respectively.
2. H0: µ1 = µ2 = µ3 = µ4
3. Ha: At least two among µ1, µ2, µ3, and µ4 are different
4. Significance level: α = 0.05
5. Test statistic: F = MSTr / MSE

Slide 36: Another Example
6. Looking at the following comparative boxplot, it seems reasonable to assume that the distributions have equal σ's and that the samples come from normal distributions.

Slide 37: Example
7. Computation:
Treatment df = k - 1 = 4 - 1 = 3

Brand      ni    x̄i    si
Brand A
Brand B
Brand C
Brand D

Slide 38: Example
7. Computation (continued):
Error df = N - k = 40 - 4 = 36

Slide 39: Example
8. P-value: F = 7.96 with numerator df = 3 and denominator df = 36. Using denominator df = 30 (the nearest tabled value), we find P-value < 0.001.

Slide 40: Example
9. Conclusion: Since P-value < 0.001 < α = 0.05, we reject H0. We can conclude that the mean sodium content is different for at least two of the brands. We need to learn how to interpret the results and will spend some time developing techniques to describe the differences among the µ's.

Slide 41: Multiple Comparisons
A multiple comparison procedure is a method for identifying differences among the µ's once the hypothesis of overall equality (H0) has been rejected. The technique we will present is based on computing confidence intervals for the differences of means for all pairs. Specifically, if k populations or treatments are studied, we would create k(k - 1)/2 differences (i.e., with 3 treatments one would generate confidence intervals for µ1 - µ2, µ1 - µ3, and µ2 - µ3). Notice that it is only necessary to look at the confidence interval for µ1 - µ2 to see whether µ1 and µ2 differ.

Slide 42: The Tukey-Kramer Multiple Comparison Procedure
When there are k populations or treatments being compared, k(k - 1)/2 confidence intervals must be computed. If we denote the relevant Studentized range critical value by q, the intervals are as follows:

For µi - µj:  (x̄i - x̄j) ± q·√((MSE/2)(1/ni + 1/nj))

Two means are judged to differ significantly if the corresponding interval does not include zero.

Slide 43: The Tukey-Kramer Multiple Comparison Procedure
When all of the sample sizes are the same, we denote the common sample size by n (n = n1 = n2 = … = nk), and the confidence intervals (for µi - µj) simplify to

(x̄i - x̄j) ± q·√(MSE/n)
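The equal-n intervals can be sketched with SciPy's Studentized range distribution (scipy.stats.studentized_range, available in SciPy 1.7 and later). The three samples below are made up for illustration:

```python
# Tukey-Kramer 95% intervals for equal sample sizes, via SciPy's
# studentized range distribution.  The three samples are invented.
import numpy as np
from scipy.stats import studentized_range

samples = [np.array([5.1, 4.9, 5.3, 5.0]),
           np.array([6.2, 6.0, 6.4, 6.1]),
           np.array([5.0, 5.2, 4.8, 5.1])]
k = len(samples)
n = len(samples[0])                     # common sample size
N = k * n
mse = sum((n - 1) * s.var(ddof=1) for s in samples) / (N - k)

q = studentized_range.ppf(0.95, k, N - k)   # Studentized range critical value
half_width = q * np.sqrt(mse / n)           # equal-n interval half width

for i in range(k):
    for j in range(i + 1, k):
        diff = samples[i].mean() - samples[j].mean()
        # Means differ significantly if the interval excludes zero.
        print(i + 1, j + 1, diff - half_width, diff + half_width)
```

Here every pairwise interval shares the same half width because the sample sizes are equal; with unequal n's the general Tukey-Kramer formula on the previous slide applies instead.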

Slide 44: Example (continued)
Continuing with the example dealing with the sodium content of the four brands of salsa, we shall compute the 95% Tukey-Kramer confidence intervals for µA - µB, µA - µC, µA - µD, µB - µC, µB - µD, and µC - µD.

Slide 45: Example (continued)
Notice that the confidence intervals for µA - µC, µB - µC, and µC - µD do not contain 0, so we can infer that the mean sodium content for Brand C is different from that of Brands A, B, and D.

Slide 46: Example (continued)
We also illustrate the differences with the following listing of the sample means in increasing order, with a line underneath each block of means that is indistinguishable:

Brand B   Brand A   Brand D     Brand C
---------------------------

Notice that the confidence intervals for µA - µC, µB - µC, and µC - µD do not contain 0, so we can infer that the mean sodium content for Brand C differs from all of the others.

Slide 47: Minitab Output for Example

One-way ANOVA: Sodium versus Brand

Analysis of Variance for Sodium
Source     DF    SS    MS    F    P
Brand
Error
Total

Individual 95% CIs For Mean Based on Pooled StDev
Level      N    Mean    StDev
Brand A                          (-----*------)
Brand B                          (------*-----)
Brand C                          (------*------)
Brand D                          (------*-----)
Pooled StDev =

Slide 48: Minitab Output for Example

Tukey's pairwise comparisons
Family error rate =
Individual error rate =
Critical value = 3.81

Intervals for (column level mean) - (row level mean)

           Brand A    Brand B    Brand C
Brand B
Brand C
Brand D

Slide 49: Simultaneous Confidence Level
The Tukey-Kramer intervals are created in a manner that controls the simultaneous confidence level. For example, at the 95% level, if the procedure is used repeatedly on many different data sets, in the long run only about 5% of the time would at least one of the intervals fail to include the value it is estimating. We then speak of the family error rate being 5%: the maximum probability that one or more of the confidence intervals for the differences in means does not contain the true difference.
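The need to control the family error rate can be illustrated by simulation: with all population means actually equal, unadjusted 95% two-sample t intervals miss the true difference (zero) in at least one pair far more often than 5% of the time. This Monte Carlo sketch uses simulated data and an unadjusted pooled-t interval for each pair:

```python
# Monte Carlo sketch: family error rate of UNADJUSTED 95% t intervals
# for all pairwise differences, when H0 is true (all means equal).
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)
k, n, reps = 4, 10, 2000
tcrit = t.ppf(0.975, 2 * n - 2)        # per-interval 95% critical value

families_with_error = 0
for _ in range(reps):
    groups = rng.normal(0.0, 1.0, size=(k, n))   # true means all equal to 0
    missed = False
    for i in range(k):
        for j in range(i + 1, k):
            diff = groups[i].mean() - groups[j].mean()
            sp2 = (groups[i].var(ddof=1) + groups[j].var(ddof=1)) / 2
            hw = tcrit * np.sqrt(sp2 * 2 / n)    # pooled-t half width
            if abs(diff) > hw:                   # interval misses 0
                missed = True
    families_with_error += missed

rate = families_with_error / reps
print(rate)   # noticeably larger than the per-interval 0.05
```

The Tukey-Kramer critical value q is chosen precisely so that the corresponding family rate stays at the nominal 5%.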