
Chapter 15 Analysis of Variance

The article “Could Mean Platelet Volume be a Predictive Marker for Acute Myocardial Infarction?” (Medical Science Monitor, 2005) described an experiment in which four groups of patients seeking treatment for chest pain were compared with respect to mean platelet volume (MPV). The four groups were (1) noncardiac chest pain, (2) stable angina pectoris, (3) unstable angina pectoris, and (4) myocardial infarction (heart attack). The purpose of the study was to determine whether mean MPV differed among the four groups, in particular for the heart attack group. If so, MPV could be used as an indicator of heart attack risk.

When two or more populations or treatments are being compared, the characteristic that distinguishes the populations or treatments from one another is called the factor. In this experiment, the factor is the clinical diagnosis. The researchers need to compare the means of the four groups to determine whether $\mu_1 = \mu_2 = \mu_3 = \mu_4$ or whether at least one of the means differs from the rest. To compare the means, they use a procedure called single-factor analysis of variance (ANOVA).

[Figure: Graph A and Graph B, each showing three samples with the means of samples 1, 2, and 3 marked]

Whether the null hypothesis of equal means should be rejected depends on how substantially the samples from the different populations or treatments differ from one another. Consider the following example. In Graph A, the three samples have very different means and very little variability within each sample, which would lead us to doubt the claim that $\mu_1 = \mu_2 = \mu_3$. In Graph B, the three samples have the same means as in Graph A. However, because each sample is highly variable and the samples overlap, it is plausible that they come from populations with equal means.

The phrase “analysis of variance” comes from the idea of analyzing variability in the data to see how much can be attributed to differences in the $\mu$'s and how much is due to variability within the individual populations.
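The Graph A / Graph B contrast can be made concrete with a minimal Python sketch on two made-up data sets (not read off the figure) whose samples have the same means, 10, 12, and 14, but very different spread. The function name `f_ratio` is hypothetical; it divides a between-sample mean square by a within-sample mean square, which is the ratio the single-factor F test is built on.

```python
def f_ratio(samples):
    """Between-sample mean square divided by within-sample mean square (MSTr / MSE)."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]
    # Between-sample variability: how far the sample means sit from the grand mean.
    sstr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-sample variability: how far each observation sits from its own sample mean.
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (sstr / (k - 1)) / (sse / (N - k))


# Hypothetical data: in both sets the three sample means are 10, 12 and 14.
graph_a = [[9.8, 10.0, 10.2], [11.8, 12.0, 12.2], [13.8, 14.0, 14.2]]  # little spread
graph_b = [[7, 10, 13], [8, 12, 16], [11, 14, 17]]                     # much more spread

print(f_ratio(graph_a))  # large ratio: the means differ a lot relative to the spread
print(f_ratio(graph_b))  # ratio near 1: the same differences are swamped by the spread
```

With identical sample means, only the within-sample variability changes, and that alone moves the ratio from about 300 down to about 1.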

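ANOVA Notation

$k$ = the number of populations or treatments being compared. For population or treatment $i$ ($i = 1, \dots, k$), $\mu_i$ and $\sigma_i$ denote the population mean and standard deviation, $n_i$ the sample size, and $\bar{x}_i$ and $s_i$ the sample mean and sample standard deviation.

The total number of observations: $N = n_1 + n_2 + \cdots + n_k$
Grand total: $T$ = the sum of all $N$ observations
Grand mean: $\bar{\bar{x}} = \dfrac{T}{N}$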

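ANOVA Notation Continued...

A measure of differences among the sample means is the treatment sum of squares, denoted by SSTr and given by

$$\text{SSTr} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$

A measure of variation within the $k$ samples, called the error sum of squares and denoted SSE, is

$$\text{SSE} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$

Each sum of squares has an associated df: treatment df = $k - 1$ and error df = $N - k$. A mean square is a sum of squares divided by its df: $\text{MSTr} = \dfrac{\text{SSTr}}{k - 1}$ and $\text{MSE} = \dfrac{\text{SSE}}{N - k}$. The number of error degrees of freedom comes from adding the degrees of freedom associated with each of the sample variances: $(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = N - k$.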
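When only summary statistics ($n_i$, $\bar{x}_i$, $s_i$) are available, these formulas can be evaluated directly. The minimal Python sketch below recovers the grand mean as $\sum n_i \bar{x}_i / N$, which equals $T/N$ because each sample total is $n_i \bar{x}_i$. The function name and the summary statistics in the usage line are hypothetical.

```python
def anova_from_summary(ns, means, sds):
    """SSTr, SSE, their df and mean squares for single-factor ANOVA from summary statistics."""
    k, N = len(ns), sum(ns)
    # Grand mean T / N, where T = n_1 * xbar_1 + ... + n_k * xbar_k.
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_tr, df_err = k - 1, N - k
    return {"SSTr": sstr, "SSE": sse, "df_tr": df_tr, "df_err": df_err,
            "MSTr": sstr / df_tr, "MSE": sse / df_err}


# Hypothetical summary statistics for k = 3 treatments with 10 observations each.
result = anova_from_summary(ns=[10, 10, 10], means=[5.0, 6.0, 7.0], sds=[1.0, 1.0, 1.0])
print(result)
# → {'SSTr': 20.0, 'SSE': 27.0, 'df_tr': 2, 'df_err': 27, 'MSTr': 10.0, 'MSE': 1.0}
```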

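The Single-Factor ANOVA F Test

Null hypothesis: $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
Alternative hypothesis: $H_a$: at least two of the $\mu$'s are different
Test statistic: $F = \dfrac{\text{MSTr}}{\text{MSE}}$, with $df_1 = k - 1$ and $df_2 = N - k$
P-value: the area under the appropriate F curve to the right of the calculated F value

When $H_0$ is true, $\mu_{\text{MSTr}} = \mu_{\text{MSE}}$; when $H_0$ is false, $\mu_{\text{MSTr}} > \mu_{\text{MSE}}$. Large values of F are therefore evidence against $H_0$.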
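Python's standard library has no F distribution, so this minimal sketch computes the P-value (the upper-tail area) through the regularized incomplete beta function $I_x(a, b)$, evaluated with the continued fraction described in Numerical Recipes. It relies on the standard fact that, for an F distribution with $df_1$ and $df_2$, the area to the right of $f$ equals $I_{df_2/(df_2 + df_1 f)}(df_2/2,\ df_1/2)$. All function names are hypothetical; with SciPy installed, `scipy.stats.f.sf(f, df1, df2)` gives the same area.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_test_p_value(f_stat, df1, df2):
    """Area under the F(df1, df2) curve to the right of f_stat."""
    if f_stat <= 0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


# Hypothetical mean squares for k = 3 treatments and N = 30 observations.
mstr, mse, k, N = 10.0, 1.0, 3, 30
F = mstr / mse
p = f_test_p_value(F, k - 1, N - k)
print(f"F = {F:.2f}, df = ({k - 1}, {N - k}), P-value = {p:.6f}")
```

For $df_1 = 2$ the upper-tail area has the closed form $(1 + 2f/df_2)^{-df_2/2}$, which gives a convenient check on the sketch.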

The Single-Factor ANOVA F Test Continued...

Assumptions:
1. Each of the $k$ population or treatment response distributions is normal.
2. The $k$ normal distributions have identical standard deviations ($\sigma_1 = \sigma_2 = \cdots = \sigma_k$).
3. The observations in the sample from any particular one of the $k$ populations or treatments are independent of one another.
4. When comparing population means, the $k$ random samples are selected independently of each other. When comparing treatment means, treatments are assigned at random to subjects or objects.

If sample sizes are large, individual boxplots or normal probability plots for each sample can be used to check for normality. If the sample sizes are small, a combined normal probability plot should be used instead: first find the deviations of each observation from its own sample mean, then combine all of the deviations into a single normal probability plot.

While there is a formal procedure to check for equal standard deviations, its use is not recommended because it is sensitive to any departure from normality. As a rule of thumb, the ANOVA F test can safely be used if the largest sample standard deviation is not more than twice the smallest sample standard deviation.
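The two practical checks described above, the rule of thumb on standard deviations and the combined normal probability plot of deviations from each sample mean, can be sketched in Python as follows. The data are hypothetical, and using Blom's plotting positions, $(i - 0.375)/(n + 0.25)$, to obtain the normal scores is an assumption; statistical packages use several similar conventions. Function names are illustrative only.

```python
from statistics import NormalDist, mean, stdev


def sd_rule_of_thumb(samples):
    """True if the largest sample SD is at most twice the smallest sample SD."""
    sds = [stdev(s) for s in samples]
    return max(sds) <= 2 * min(sds)


def pooled_deviations(samples):
    """Deviations of each observation from its own sample mean, combined and sorted."""
    devs = []
    for s in samples:
        m = mean(s)
        devs.extend(x - m for x in s)
    return sorted(devs)


def normal_scores(n):
    """Approximate expected standard normal order statistics (Blom plotting positions)."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


samples = [[4.1, 4.5, 3.9, 4.3], [5.0, 5.4, 4.8], [4.6, 4.4, 4.9, 5.1, 4.5]]  # hypothetical
print(sd_rule_of_thumb(samples))
devs = pooled_deviations(samples)
# (normal score, deviation) pairs; a roughly straight plot supports the normality assumption.
points = list(zip(normal_scores(len(devs)), devs))
```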

Heart Attack Risk Continued...

Here are the summary statistics for the four groups:

| Group | Description | Sample size | Sample mean | Sample standard deviation |
|---|---|---|---|---|
| 1 | Noncardiac chest pain | 35 | | |
| 2 | Stable angina pectoris | 35 | | |
| 3 | Unstable angina pectoris | 35 | | |
| 4 | Myocardial infarction | 35 | | |

State the hypotheses:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: at least two of the $\mu$'s are different

Verify the assumptions. The four boxplots are approximately symmetric with no outliers, so the assumption of normality is plausible. To check the equality of the standard deviations, notice that the largest sample standard deviation (group 4) is less than twice the smallest sample standard deviation (group 1). The subjects were randomly selected from groups of individuals who had been diagnosed with the four conditions.

Heart Attack Risk Continued...

Using the summary statistics for the four groups, calculate the sum of squares terms, SSTr and SSE, and the corresponding mean squares, MSTr and MSE. Then calculate the F test statistic, $F = \text{MSTr}/\text{MSE}$.

Heart Attack Risk Continued...

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: at least two of the $\mu$'s are different
Test statistic: $F = \text{MSTr}/\text{MSE}$, with $df_1 = 3$ and $df_2 = 136$
P-value < .001, $\alpha = .05$

Since P-value < $\alpha$, we reject $H_0$. There is convincing evidence that mean MPV is not the same for all four patient populations.

Summarizing an ANOVA

ANOVA calculations are often summarized in a tabular format called an ANOVA table. To understand such a table, we need one more sum of squares term. The total sum of squares, denoted by SSTo, is given by

$$\text{SSTo} = \sum_{\text{all } N \text{ obs.}} (x - \bar{\bar{x}})^2$$

with df = $N - 1$. The three sums of squares are related by

$$\text{SSTo} = \text{SSTr} + \text{SSE}$$

This is the fundamental identity for single-factor ANOVA.
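A quick numerical check of the fundamental identity, as a minimal Python sketch on hypothetical data: SSTo is computed directly from all $N$ observations and compared with SSTr + SSE. The helper name `sums_of_squares` is illustrative only.

```python
def sums_of_squares(samples):
    """Return (SSTo, SSTr, SSE) for a list of samples, one per treatment."""
    all_obs = [x for s in samples for x in s]
    N = len(all_obs)
    grand_mean = sum(all_obs) / N
    means = [sum(s) / len(s) for s in samples]
    ssto = sum((x - grand_mean) ** 2 for x in all_obs)                          # df = N - 1
    sstr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))  # df = k - 1
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)          # df = N - k
    return ssto, sstr, sse


samples = [[4.1, 4.5, 3.9, 4.3], [5.0, 5.4, 4.8], [4.6, 4.4, 4.9, 5.1, 4.5]]  # hypothetical
ssto, sstr, sse = sums_of_squares(samples)
print(f"SSTo = {ssto:.4f}   SSTr + SSE = {sstr + sse:.4f}")
```

The degrees of freedom split the same way: $(k - 1) + (N - k) = N - 1$.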

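The General Format for a Single-Factor ANOVA Table

| Source of variation | df | Sum of squares | Mean square | F | P-value |
|---|---|---|---|---|---|
| Treatment | $k - 1$ | SSTr | $\text{MSTr} = \dfrac{\text{SSTr}}{k - 1}$ | $F = \dfrac{\text{MSTr}}{\text{MSE}}$ | |
| Error | $N - k$ | SSE | $\text{MSE} = \dfrac{\text{SSE}}{N - k}$ | | |
| Total | $N - 1$ | SSTo | | | |

When the analysis is done by statistical software, the P-value appears in the P-value column of the Treatment row.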
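The entries of the general format can be filled in from SSTr, SSE, $k$, and $N$ alone. This minimal Python sketch prints such a table for hypothetical values (SSTr = 20, SSE = 27, $k = 3$, $N = 30$); it leaves out the P-value column, which software obtains from the F distribution with $k - 1$ and $N - k$ df. The function name is illustrative.

```python
def anova_table(sstr, sse, k, N):
    """Plain-text single-factor ANOVA table (without the P-value column)."""
    df_tr, df_err, df_tot = k - 1, N - k, N - 1
    mstr, mse = sstr / df_tr, sse / df_err
    F = mstr / mse
    rows = [
        ("Treatment", df_tr, sstr, mstr, F),
        ("Error", df_err, sse, mse, None),
        ("Total", df_tot, sstr + sse, None, None),  # SSTo = SSTr + SSE
    ]
    lines = [f"{'Source':<12}{'df':>5}{'SS':>10}{'MS':>10}{'F':>8}"]
    for name, df, ss, ms, f in rows:
        ms_txt = f"{ms:10.3f}" if ms is not None else " " * 10
        f_txt = f"{f:8.2f}" if f is not None else ""
        lines.append(f"{name:<12}{df:>5}{ss:10.3f}{ms_txt}{f_txt}")
    return "\n".join(lines)


print(anova_table(sstr=20.0, sse=27.0, k=3, N=30))
```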

Heart Attack Risk Continued...

This is the ANOVA table for this data set:

| Source | df | SS | MS | F | P-value |
|---|---|---|---|---|---|
| Treatment | 3 | | | | < .001 |
| Error | 136 | | | | |
| Total | 139 | | | | |

Now we know that at least two of the means are different, but which two? To answer the question in this study, we need to know whether the mean MPV for the heart attack group is the one that differs.

What do we do now that we know that at least two of the population or treatment means are different? How can we tell which mean(s) differ? We need a multiple comparison procedure, which is a method for identifying differences among the $\mu$'s.

Tukey-Kramer (T-K) Multiple Comparison Procedure

This procedure is based on calculating confidence intervals for the difference between each possible pair of $\mu$'s. If an interval contains zero, there is no significant difference between the two means involved. If the interval does NOT contain zero, the two means are judged to be significantly different.

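Tukey-Kramer (T-K) Multiple Comparison Procedure

When there are $k$ populations or treatments being compared, the number of confidence intervals necessary is

$$\frac{k(k-1)}{2}$$

For $\mu_i - \mu_j$ the interval is

$$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{\text{MSE}}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

where $q$ is the relevant Studentized range critical value. The two means are judged to differ significantly if the interval does not contain 0. If the sample sizes are all the same ($n_1 = n_2 = \cdots = n_k = n$), the interval simplifies to

$$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{\text{MSE}}{n}}$$

T-K intervals are based on probability distributions called Studentized range distributions.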
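A minimal Python sketch of the T-K intervals for general sample sizes. The Studentized range critical value $q$ is passed in rather than computed, because the standard library has no Studentized range distribution; in practice it comes from a table for the given $k$ and error df. The function name and all inputs in the usage lines, including $q = 3.5$, are hypothetical placeholders.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer_intervals(means, ns, mse, q):
    """T-K intervals for every pair mu_i - mu_j, with populations labelled from 1.

    q must be the Studentized range critical value for k = len(means) and
    error df = N - k, taken from a table; it is not computed here.
    """
    intervals = {}
    # combinations yields the k(k - 1)/2 distinct pairs i < j.
    for i, j in combinations(range(len(means)), 2):
        half_width = q * sqrt(mse / 2 * (1 / ns[i] + 1 / ns[j]))
        diff = means[i] - means[j]
        intervals[(i + 1, j + 1)] = (diff - half_width, diff + half_width)
    return intervals


# Hypothetical inputs; q = 3.5 is a placeholder, not a looked-up table value.
cis = tukey_kramer_intervals(means=[5.0, 6.0, 7.0], ns=[10, 10, 10], mse=1.0, q=3.5)
for pair, (lo, hi) in cis.items():
    verdict = "no significant difference" if lo <= 0 <= hi else "significantly different"
    print(f"mu{pair[0]} - mu{pair[1]}: ({lo:.3f}, {hi:.3f})  {verdict}")
```

With equal sample sizes the general half-width reduces to $q\sqrt{\text{MSE}/n}$, so every interval has the same width.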

Heart Attack Risk Revisited...

How many confidence intervals will we need to compute? With $k = 4$, the number of confidence intervals is

$$\frac{k(k-1)}{2} = \frac{4(3)}{2} = 6$$

Sample sizes are the same in each treatment, so for $\mu_1 - \mu_2$ the interval is

$$(\bar{x}_1 - \bar{x}_2) \pm q\sqrt{\frac{\text{MSE}}{n}}$$

where $q = 3.68$ is the critical value for 95% confidence when $k = 4$ and df = 120 (the closest df in the table to 136). The resulting interval is (-0.898, 0.178). This interval contains 0, so there is not a significant difference in mean MPV between patients with noncardiac chest pain and patients with stable angina.

Heart Attack Risk Revisited...

The remaining confidence intervals are calculated in the same manner. They are:

| Difference | 95% confidence interval |
|---|---|
| $\mu_1 - \mu_2$ | (-0.898, 0.178) |
| $\mu_1 - \mu_3$ | (-1.018, 0.058) |
| $\mu_1 - \mu_4$ | (-1.398, -0.322) |
| $\mu_2 - \mu_3$ | (-0.658, 0.418) |
| $\mu_2 - \mu_4$ | (-1.038, 0.038) |
| $\mu_3 - \mu_4$ | (-0.918, 0.158) |

The only interval that does not contain 0 is the one for the difference in mean MPV between patients with noncardiac chest pain and patients with heart attacks.
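Because the four sample sizes are equal, every T-K interval here has the same width, $2q\sqrt{\text{MSE}/n}$, and checking each one for 0 is mechanical. The Python sketch below encodes the six intervals for this example; the upper limit -0.322 for $\mu_1 - \mu_4$ is obtained from that common width (1.076), not copied from a printed value.

```python
# 95% T-K intervals for mu_i - mu_j in the MPV example (groups labelled 1-4).
intervals = {
    (1, 2): (-0.898, 0.178),
    (1, 3): (-1.018, 0.058),
    (1, 4): (-1.398, -0.322),  # upper limit reconstructed from the common width 1.076
    (2, 3): (-0.658, 0.418),
    (2, 4): (-1.038, 0.038),
    (3, 4): (-0.918, 0.158),
}

# Equal sample sizes give every interval the same width.
widths = {pair: round(hi - lo, 3) for pair, (lo, hi) in intervals.items()}

# A pair differs significantly when its interval does not contain 0.
significant = [pair for pair, (lo, hi) in intervals.items() if not (lo <= 0 <= hi)]

print(widths)
print(significant)  # → [(1, 4)]
```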

Summarizing the Results of the Tukey-Kramer Procedure

1. List the sample means in increasing order, identifying the corresponding population just above each $\bar{x}$:

Population: 3 2 1 4 5
Sample mean: $\bar{x}_3$ $\bar{x}_2$ $\bar{x}_1$ $\bar{x}_4$ $\bar{x}_5$

2. Use the T-K intervals to determine the group of means that do not differ significantly from the first in the list. Draw a horizontal line extending from the smallest mean to the last mean in the group identified. For example, if the sample means for populations 3, 2, and 1 are not significantly different, draw a line under them.

Summarizing the Results of the Tukey-Kramer Procedure Continued...

3. Use the T-K intervals to determine the group of means that are not significantly different from the second smallest in the list. If this entire group of means is not already underscored by a single line, draw a horizontal line extending from the second smallest mean to the last mean in the new group. For example, if the sample mean for population 2 is not significantly different from those for populations 1 and 4 but is different from that for population 5, draw a line under 2, 1, and 4.
4. Continue considering the means in the order listed, adding new lines as needed.
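The underscoring steps can be sketched in Python. This minimal version assumes that "the last mean in the group" means the rightmost mean in the ordered list that does not differ significantly from the mean being considered, and it stores the significantly different pairs as a set; the function name is hypothetical. The first usage encodes the five-population illustration, where pairs the steps do not mention (for instance 3 versus 4) are assumed to differ; the second uses the MPV example, where only the $\mu_1 - \mu_4$ interval excluded 0.

```python
def underscore_lines(order, different):
    """Groups of populations joined by one line in a T-K underscoring display.

    order: population labels sorted by increasing sample mean.
    different: set of frozenset({i, j}) pairs whose T-K interval excludes 0.
    """
    spans = []
    for s in range(len(order)):
        # Rightmost mean that does not differ significantly from order[s].
        end = s
        for j in range(s + 1, len(order)):
            if frozenset((order[s], order[j])) not in different:
                end = j
        # Add a line only if no existing line already covers order[s..end].
        if end > s and not any(a <= s and end <= b for a, b in spans):
            spans.append((s, end))
    return [order[a:b + 1] for a, b in spans]


# Five-population illustration; pairs not mentioned in the steps are assumed to differ.
diff5 = {frozenset(p) for p in [(3, 4), (3, 5), (2, 5), (1, 5), (4, 5)]}
print(underscore_lines([3, 2, 1, 4, 5], diff5))  # → [[3, 2, 1], [2, 1, 4]]

# MPV example: only mu_1 - mu_4 had an interval excluding 0.
print(underscore_lines([1, 2, 3, 4], {frozenset((1, 4))}))  # → [[1, 2, 3], [2, 3, 4]]
```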

Heart Attack Risk Revisited...

Should mean MPV be used as a predictor of heart attacks? Let's summarize the T-K intervals. Listed in increasing order of sample mean, the populations are 1, 2, 3, 4. Population 1 does not differ significantly from populations 2 and 3 but does differ from population 4, so one line underscores 1, 2, and 3. Population 2 does not differ significantly from populations 3 and 4, so a second line underscores 2, 3, and 4.

Based on these data, we have evidence that the mean MPV is not the same for the noncardiac chest pain group and the heart attack group. But because the difference in means is small compared to the variability among individuals in each group, it would still be difficult to distinguish the two groups based on an individual MPV value. And we don't have evidence that the mean differs between the heart attack group and the two angina groups. So, MPV is probably not useful as a predictor of heart attack.