Tukey’s Honestly Significant Difference

Chapter 11 Tukey’s Honestly Significant Difference 1/12/2019

Descriptive Statistics
Ch 1. The Mean, the Number of Observations, and the Standard Deviation: N/population/parameters; measures of central tendency (median, mode, mean); measures of variability (range, sum of squares, variance, standard deviation)
Ch 2. Frequency Distributions and Histograms: frequency distributions; bar graphs/histograms; continuous vs. discrete variables
Ch 3. The Normal Curve: Z scores and percentiles; least squares; unbiased estimates
Ch 4. Translating To and From Z Scores: normal scores; scale scores; raw scores; percentiles

Inferential Statistics: Parametric
Ch 5. Inferential Statistics: random samples; estimating population statistics; correlation of two variables; experimental methods; standard error of the mean
Ch 6. T Scores / T Curves: estimates of Z scores; computing t scores; critical values; degrees of freedom; other stuff to come
Ch 7. Correlation: variable relationships (linearity, direction, strength); correlation coefficient; scatter plots; best-fitting lines
Ch 8. Regression: predicting using the regression equation; generalizing (the null hypothesis); degrees of freedom and statistical significance
Ch 9. Experimental Studies: independent and dependent variables; the experimental hypothesis; the F test and the t test
Ch 10. Two-Way Factorial Analysis of Variance: three null hypotheses; graphing the means; factorial designs
Ch 11. Tukey’s Honestly Significant Difference: testing differences in group means; alpha for the whole experiment; HSD (Honestly Significant Difference)
Ch 12. Power Analysis: Type 1 error and alpha; Type 2 error and beta; how many subjects do you need?
Ch 13. Assumptions Underlying Parametric Statistics: sample means form a normal curve; subjects are randomly selected from the population; homogeneity of variance; experimental error is random across samples

Inferential Statistics: Non Parametric
Ch 14. Chi Square: nominal data

Objectives
At the end of this chapter you will understand how to tell which group means are significantly different from each other. You will be able to:
- Calculate the number of pairwise comparisons for group means.
- Calculate the harmonic mean.
- Look up the q value in a table.
- Calculate Tukey’s HSD.
- Determine which means are different and which are not.

Comparing Experimental Group Means

Means and Significance
A significant F test could tell us that there is a main effect or an interaction effect. When that happens, we know that there is a difference among the means that is unlikely to occur just by chance. But which means? The only ones we can be sure are significantly different from each other are the highest and lowest means. That is not a problem with the t test, because there are only two means. But when there are more than two means, what can we say about the relationship between means other than the highest and lowest?

Why We Need Tukey’s HSD: Multiple Pairwise Comparisons

Multiple comparisons
When there are three groups in a study, comparing all the possible combinations of the three means, two means at a time, requires three comparisons. With four groups there are six comparisons, with five groups there are ten, and so on. We set alpha at .05 thinking we were only doing one t test. What are the odds of finding significance on one or more of 6, 10, or more t tests just because of sampling fluctuation? They quickly get much higher than .05.

Example: the (4)(3)/2 = 6 comparisons for 4 groups
Group 1 vs Group 2
Group 1 vs Group 3
Group 1 vs Group 4
Group 2 vs Group 3
Group 2 vs Group 4
Group 3 vs Group 4

Here is the formula for the number of possible pairwise comparisons
Number of possible pairwise comparisons = k(k - 1)/2: the number of groups times the number of groups minus one, divided by 2.
6 groups: (6)(6 - 1)/2 = 30/2 = 15
7 groups: (7)(7 - 1)/2 = 42/2 = 21
Etc.
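The formula can be checked with a short Python sketch. The helper name `num_pairwise_comparisons` is illustrative, not part of the course materials; the standard-library `itertools.combinations` lists the actual pairs, which is also how you would enumerate the comparisons to run.

```python
from itertools import combinations


def num_pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs of group means: k(k - 1) / 2."""
    return k * (k - 1) // 2


# Enumerate the actual pairs for 4 groups labeled 1..4.
pairs_for_4 = list(combinations(range(1, 5), 2))
print(pairs_for_4)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

for k in (3, 4, 5, 6, 7):
    print(f"{k} groups -> {num_pairwise_comparisons(k)} comparisons")
```

The count grows roughly with the square of the number of groups, which is why the problem of multiple comparisons gets worse so quickly.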

We need a fancy t test
The rule is "t for 2": when you want to compare two groups to determine whether they are significantly different, use the t test. But with more than two groups we need a t test designed for multiple comparisons, one with alpha adjusted so that experimentwise alpha = .05, and one that makes it easy to compare all the groups in any specific experiment, no matter how many there are.

Keeping experimentwise alpha at .05 is critical
Significance tests exist to control Type 1 error (rejecting a true null hypothesis). So we don’t want to say that the difference between two group means is significant (and therefore shows how the independent variable would affect the whole population) when the groups differ only because of sampling fluctuation. We must keep the odds of making that kind of mistake at 5 in 100, no matter how many comparisons we make.

What are the odds of failing to reject the null when H0 is true and you make three pairwise comparisons?
In an ordinary t test, the odds of obtaining a strange sample and getting statistical significance when H0 is true are set, by convention, at .05 (5 in 100) for each comparison. The odds of not rejecting a null hypothesis that is correct are therefore 95 in 100 (.95). But the odds of not rejecting 3 correct nulls in a row, using simple t tests, are $(.95)(.95)(.95) = (.95)^N = (.95)^3 = .8574$. The odds of at least one of the three comparisons being significant when all three nulls are true are $1 - .8574 = .1426$, almost exactly 1 out of 7, not 1 out of 20. In a three-group study there are 3 pairwise comparisons. If we do three comparisons and don’t adjust alpha for each one, the odds of sampling fluctuation (mis)leading us to a significant difference between at least one of the 3 pairs are over 14 in 100, not 5 in 100.

When there are more comparisons, the odds of Type 1 error get much higher!
Say we have four groups; that yields (4)(3)/2 = 6 pairwise comparisons. If we do 6 t tests with alpha at .05 for each, we have 95 chances in 100 of properly failing to reject the null (and retaining it) each time. But the odds of properly retaining it every one of the 6 times are $(.95)^6 = .95 \times .95 \times .95 \times .95 \times .95 \times .95 = .735$. So there would be a $1 - .735 = .265$ (26.5%) chance of committing at least one Type 1 error by rejecting the null hypothesis when the only reason two groups differ is random sampling fluctuation.

Five groups: 10 pairwise comparisons
Let’s say that we had 5 groups in a study. That gives us (5)(4)/2 = 10 possible pairwise comparisons. Assume H0 is true. With 10 comparisons, the chance that all 10 comparisons fail to be statistically significant is $(.95)^{10} = .5987$. So the odds of at least one significant finding (though H0 is true) are $1 - .5987 = .4013$, or over 40%.
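The arithmetic on the last three slides can be reproduced with a minimal Python sketch. It assumes, as the slides do, that the comparisons behave like independent tests; real pairwise comparisons share group means, so this is an approximation. The function name `experimentwise_error` is hypothetical.

```python
def experimentwise_error(m: int, alpha: float = 0.05) -> float:
    """P(at least one Type 1 error) in m tests at level alpha when every H0 is true.

    Assumes the m tests are independent, as the slides do.
    """
    p_no_error = (1 - alpha) ** m  # e.g., (.95)**3 for three comparisons
    return 1 - p_no_error


for groups, m in ((3, 3), (4, 6), (5, 10)):
    print(f"{groups} groups, {m} comparisons: "
          f"P(at least one Type 1 error) = {experimentwise_error(m):.4f}")
```

The three printed values correspond to the roughly 14%, 26.5%, and 40% figures above.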

We have to lower alpha for each comparison to keep experimentwise alpha at .05
To make alpha stay at .05 when all the comparisons are considered together, we must lower alpha for each comparison (quite a bit). For example, with 5 groups, we need to set alpha at a little more than .005 for each of the 10 comparisons. Then, taken together, they yield an experimentwise alpha of .05: $(1 - .0051)^{10} = (.9949)^{10} \approx .95$, leaving $1 - .95 = .05$ for the experiment as a whole.

Think of it this way. You must pass a true/false test about tensor calculus. You have a 5% chance of passing the test each time you take it just by choosing True or False randomly. Which would you prefer: taking the test once, or as many times as you wish? You know intuitively that you will pass the test eventually just by chance.

It’s the same thing with the t test
Compare two groups that differ only by chance and you have only a 5% chance of making a Type 1 error. Make lots of comparisons and sooner or later you will make a Type 1 error simply by chance. The solution: change the alpha level, and thus the critical value of t, on each test so that there is only a 5% chance of getting any Type 1 errors given the number of comparisons you have to make.

It’s like dividing .05 by the number of comparisons
When there are three groups (and 3 comparisons), it’s almost like you divide .05 by 3 and set the critical value of t so that a proportion of .05/3 = .0167 (1.67%) stays in the tails for each t test. That means you are creating a 1 - .0167 = .9833 (98.33%) confidence interval that is consistent with the null hypothesis for each t test. Then you have 5% altogether for the 3 tests. The actual values for the confidence interval are slightly different from that, but it’s close.

More than 3 groups? Divide by the number of comparisons.
Four groups = 6 comparisons: the critical value for t leaves about .05/6 ≈ .0083 in the tails, about 1 - .0083 = .9917 in the body.
Five groups = 10 comparisons: the critical value for t leaves about .05/10 = .0050 in the tails, about 1 - .0050 = .995 in the body.
(Actual values involve the nth root of .95, so they are very slightly different from the values above; e.g., the body for 5 groups and 10 comparisons is $.95^{1/10} \approx .9949$ instead of .995. But dividing .05 by the number of comparisons is a “good enough” way to think about it.)
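To see how close "divide by the number of comparisons" is to the nth-root version, this sketch computes both per-comparison alphas. The first is usually called the Bonferroni adjustment and the second the Šidák adjustment. Neither is what Tukey’s q table uses (q comes from the studentized range distribution), so treat them as the rough approximations described above. The function names are illustrative.

```python
def bonferroni_alpha(m: int, family_alpha: float = 0.05) -> float:
    """Per-comparison alpha from dividing by the number of comparisons."""
    return family_alpha / m


def sidak_alpha(m: int, family_alpha: float = 0.05) -> float:
    """Per-comparison alpha from the m-th root: (1 - a)**m == 1 - family_alpha."""
    return 1 - (1 - family_alpha) ** (1 / m)


for k in (3, 4, 5):
    m = k * (k - 1) // 2  # number of pairwise comparisons
    b, s = bonferroni_alpha(m), sidak_alpha(m)
    print(f"{k} groups, {m} comparisons: "
          f".05/m = {b:.4f} (body {1 - b:.4f}); "
          f"m-th root = {s:.4f} (body {1 - s:.4f})")
```

The two columns agree to about the third decimal place, which is why dividing by the number of comparisons is a serviceable mental model.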

Summary: We have to lower alpha for each individual t test to keep experimentwise alpha at .05
If we kept alpha for each individual t test at .05, then did 10, 20, or 30 comparisons between pairs of means, we would almost certainly get at least one, and possibly more, Type 1 errors. That is, we would get statistically significant findings that would force us to say that two treatments would differ in their effects in the population as a whole, when that isn’t true. So we must lower alpha for each comparison to get an experimentwise alpha of .05.

The q table and the Tukey test
We could find the correct critical values for t with a lot of work and a very lengthy t table. Fortunately, someone did it for us: a man named Tukey. He gave us the Tukey test and the q table. The q table is a fancy t table: each value of q equals the proper critical value for t, corrected for the number of comparisons to be made, and then multiplied by the square root of 2.00 (1.414) to make the equations simpler.
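A small check of the √2 relationship, using standard tabled critical values at α = .05 for dfW = 11. They are typed in by hand, so treat them as inputs rather than computed values: the two-tailed t of 2.201, q = 3.11 for k = 2, and q = 4.26 for k = 4. With only two means there is nothing to correct for, so q is just t × √2; with more means, q/√2 is the comparison-corrected critical t.

```python
import math

# Standard tabled critical values for alpha = .05 with dfW = 11, entered by hand.
T_CRIT_DF11 = 2.201  # two-tailed t
Q_K2_DF11 = 3.11     # q for k = 2 means
Q_K4_DF11 = 4.26     # q for k = 4 means (the ethanol example)

# Two means: no multiple-comparison correction, so q is simply t * sqrt(2).
q_from_t = T_CRIT_DF11 * math.sqrt(2)
print(round(q_from_t, 2))  # 3.11

# Four means (6 comparisons): q / sqrt(2) is the corrected critical t.
t_corrected = Q_K4_DF11 / math.sqrt(2)
print(round(t_corrected, 2))  # 3.01
```

The corrected critical t (about 3.01) is noticeably larger than the ordinary 2.201, which is exactly the "lower alpha for each comparison" idea in table form.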

HSD

An HSD is the minimum difference between two means that can be deemed statistically significant while keeping the experimentwise alpha at .05. Any two means separated by this amount or more are significantly different from each other. Any two means separated by less than this amount cannot be considered significantly different.

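Calculating HSD
$\mathrm{HSD} = q\sqrt{\mathrm{MSW} / n_h}$, where q is looked up in a table based on dfW and k, MSW is the within-groups mean square from the ANOVA, and $n_h$ is the harmonic mean of the number of subjects in each group.
The harmonic mean is a kind of average of the number of subjects in each group. Remember that this is a post hoc comparison: we have already computed the ANOVA, calculated MSW, and found a statistically significant F ratio.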

Calculating the Harmonic Mean
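$n_h = k / \sum_{i=1}^{k} (1/n_i)$: the number of groups divided by the sum of the reciprocals of the group sizes. Notice that this technique allows different numbers of subjects in each group. Oh no! My rat died! What is going to happen to my experiment?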

With same-size groups, the harmonic and ordinary mean numbers of participants are the same.
3 groups; 4 subjects each

When groups do not have equal numbers, the harmonic mean is smaller than the ordinary mean.
4 groups; 6, 4, 8, and 4 participants. Ordinary mean = 22/4 = 5.5 participants per group.

Calculating HSD: finding q
q: look up in a table based on dfW and k.

dfW q table for =.05

The table in the book has bad values. q table for =.05
dfW q table for =.05 Number of groups (means) across top. There is a whole other table for .01 The table in the book has bad values. dfW down left. (n-k)

Two examples: effects of alcohol; vitamin B in various teas

Ethanol and minutes of REM sleep
Means (minutes of REM sleep): 0 g/kg, 1 g/kg, 2 g/kg, 3 g/kg
MSW = 65, k = 4, n = 16 (4 in each group)
A rat in group 3 died! Now n = 15, so dfW = n - k = 15 - 4 = 11

From the q table for α = .05 with k = 4 and dfW = 11: q = 4.26

Harmonic Mean: $n_h = 4 / (1/4 + 1/4 + 1/3 + 1/4) \approx 3.69$

Ethanol and sleep
Means: 0 g/kg = 79.28 min; 1 g/kg = 61.54 min
MSW = 65, dfW = n - k = 15 - 4 = 11, q = 4.26
Means as far apart as 17.87 minutes or farther represent a significant difference and can be generalized.

Ethanol and Sleep – the six comparisons
HSD = 17.87
0 g/kg vs 1 g/kg: difference = 79.28 - 61.54 = 17.74, n.s.
0 g/kg vs 2 g/kg: p < .05
0 g/kg vs 3 g/kg: p < .05
1 g/kg vs 2 g/kg: n.s.
1 g/kg vs 3 g/kg: p < .05
2 g/kg vs 3 g/kg: n.s.
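The ethanol numbers can be put together in one Python sketch. One rat died, so the group sizes are 4, 4, 3, 4 (the order does not affect the harmonic mean). Only the 0 g/kg and 1 g/kg means survive in these slides, so only that comparison is checked here; `harmonic_mean_n` is an illustrative helper name.

```python
import math


def harmonic_mean_n(group_sizes):
    """k divided by the sum of the reciprocals of the group sizes."""
    return len(group_sizes) / sum(1 / n for n in group_sizes)


group_sizes = [4, 4, 3, 4]  # one rat died, so one group has only 3
msw = 65.0                  # MSW from the ANOVA
q = 4.26                    # q table, alpha = .05, k = 4, dfW = 11

n_h = harmonic_mean_n(group_sizes)
hsd = q * math.sqrt(msw / n_h)
print(f"n_h = {n_h:.2f}, HSD = {hsd:.2f}")  # n_h = 3.69, HSD = 17.87

# Only the 0 g/kg and 1 g/kg means appear in the slides.
diff_0_vs_1 = 79.28 - 61.54
verdict = "significant" if diff_0_vs_1 >= hsd else "n.s."
print(f"0 vs 1 g/kg: difference = {diff_0_vs_1:.2f} -> {verdict}")  # 17.74 -> n.s.
```

The 0 vs 1 g/kg difference misses the HSD by only about 0.13 minutes, a good reminder that the HSD is a hard cutoff.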

Ethanol and Sleep Conclusion
2 and 3 g/kg of ethanol interrupted sleep significantly more than no ethanol. Also, 3 g/kg of ethanol interrupted sleep significantly more than 1 g/kg of ethanol. No adjoining doses differed significantly (0 vs. 1, 1 vs. 2, 2 vs. 3: all n.s.).

Tea Example
The means are: Brand A: 8.27 ml; Brand B: 7.50 ml; Brand C: 6.15 ml; Brand D: 6.00 ml; Brand E: 5.82 ml
MSW = 1.51, k = 5, n = 50 (10 in each group), dfW = n - k = 50 - 5 = 45

dfW q table for =.05 k=5 dfW=45 Use smaller number of df for missing degrees of freedom (or interpolate). 4.04

Harmonic Mean - Tea: every group has 10 cups, so $n_h = 10$.

Tea Example – amount of vitamin B present in various cups of tea (10 cups in each group)
MSW = 1.51, k = 5, n = 50 (10 in each group), dfW = n - k = 50 - 5 = 45
The means are: Brand A: 8.27 ml; Brand B: 7.50 ml; Brand C: 6.15 ml; Brand D: 6.00 ml; Brand E: 5.82 ml
q = 4.04
Means as far apart as 1.57 or farther represent an honestly significant difference.

Tea Example – the ten comparisons
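HSD = 1.57
A vs B: difference = 0.77, n.s.
A vs C: difference = 2.12, p < .05
A vs D: difference = 2.27, p < .05
A vs E: difference = 2.45, p < .05
B vs C: difference = 1.35, n.s.
B vs D: difference = 1.50, n.s.
B vs E: difference = 1.68, p < .05
C vs D: difference = 0.15, n.s.
C vs E: difference = 0.33, n.s.
D vs E: difference = 0.18, n.s.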
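All ten tea comparisons can be run in a short Python sketch that applies the rule "significant if the difference is at least the HSD." Variable names are illustrative; the inputs are the means, MSW, and q from the slides.

```python
import math
from itertools import combinations

means = {"A": 8.27, "B": 7.50, "C": 6.15, "D": 6.00, "E": 5.82}  # ml of vitamin B
msw = 1.51
q = 4.04   # alpha = .05, k = 5, dfW = 45 (next smaller df row in the table)
n_h = 10   # 10 cups per brand, so the harmonic mean is 10

hsd = q * math.sqrt(msw / n_h)
print(f"HSD = {hsd:.2f}")

significant = []
for a, b in combinations(means, 2):  # all 10 pairs, in order A-B, A-C, ..., D-E
    diff = abs(means[a] - means[b])
    if diff >= hsd:
        significant.append((a, b))
    print(f"{a} vs {b}: {diff:.2f}  {'p < .05' if diff >= hsd else 'n.s.'}")
```

Because the same HSD applies to every pair, the whole post hoc analysis reduces to one subtraction and one comparison per pair.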

Tea Conclusion
Brand A has significantly more nutritional value, as measured by the amount of vitamin B, than Brands C, D, and E. Brand B has significantly more vitamin B than Brand E. No other brands differed significantly in nutritional value.
Significant at .05: A vs C, A vs D, A vs E, B vs E. All other pairs: n.s.
