2 Outline
Intro
Two one-sided tests (TOST) approach
Alternative: regular CI approach
Tryon approach with “inferential” confidence intervals
3 Tests of Equivalence
As has been mentioned, the typical NHST approach to looking for differences between groups does not technically allow us to conclude equivalence just because we fail to reject the null.
The observed p-value can only be used as a measure of evidence against the null, not for it.
  A small sample makes it easy to retain the null.
Often this conclusion is reached anyway.
Stated differently, absence of evidence does not imply evidence of absence (Altman & Bland, 1995).
Examples of usage:
  Generic drug vs. established drug
  Efficacy of counselling therapies vs. standards
4 Conceptual approach
With our regular t-tests, to conclude there is a substantial difference you must observe a difference large enough to conclude it is not due to sampling error.
The same approach applies with equivalence testing.
To conclude there is not a substantial difference, you must observe a difference small enough to reject the possibility that its closeness is due to sampling error from distributions centered on large effects.
If the difference between means falls in this range, we would conclude the means belong to equivalent groups.
5 Two one-sided tests (TOST)
One method is to test the joint null hypothesis that our mean difference is at least as large as the upper bound of a specified range of equivalence, or at or below the lower bound of that range.
H0a: μ1 - μ2 ≥ δ
OR
H0b: μ1 - μ2 ≤ -δ [1]
By rejecting both of these hypotheses, we can conclude that |μ1 - μ2| < δ, or that our difference falls within the range specified.
[1] Sometimes only this null is tested so as to claim ‘noninferiority’. Often used in clinical trials.
6 First we’d have to reject a null regarding a difference in which μ1 - μ2 ≤ -δ.
Then reject a null regarding a difference of the same size in the opposite direction (μ1 - μ2 ≥ δ).
7 Two one-sided tests (TOST)
Having rejected both, we can conclude that the small difference we see is unlikely to come from a distribution where the effect size is too big to ignore.
8 Tests of Equivalence
Specify a range? Isn’t that subjective?
Base it on:
  Previous research
  Practical considerations
  Your knowledge of the scale of measurement
9 Example
Scores from a life satisfaction scale given to groups from two different cultures of interest.
First specify the range of equivalence δ.
  Say, any score within 3 points of another.
Group 1: M = 75, s = 3.2, N = 20
Group 2: M = 76, s = 2.4, N = 20
10 Example
H01: μ1 - μ2 ≥ 3
H02: μ1 - μ2 ≤ -3
By rejecting H01 we conclude the difference is less than 3.
By rejecting H02 we conclude the difference is greater than -3.
11 Fuzzy yet?
Recall that the size of difference we are looking for is 3 units.
This would hold whether the first mean was 3 above the second mean or vice versa.
Hence we are looking for a μ1 - μ2 difference that lies in the interval (-3, 3), and that can be said to be unlikely to have fallen in that interval ‘by chance’.
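12 Worked out
H01 is rejected if t1 ≤ -tcv, and H02 is rejected if t2 ≥ tcv, where t1 measures the observed difference against the upper bound (δ) and t2 measures it against the lower bound (-δ).
df = n1 + n2 - 2 = 38
Here we reject in both cases (.05 level) [1] and conclude statistical equivalence.
[1] Note these are two one-sided tests, so the critical value is the one-tailed one: tcv = 1.69.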
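The arithmetic behind this slide can be sketched from the summary statistics alone. The sketch below assumes equal-variance (pooled) t-tests, which the slides do not state explicitly; the function name `tost_from_summary` and its return keys are illustrative, and the critical value 1.69 is the one given on the slide.

```python
import math

def tost_from_summary(m1, s1, n1, m2, s2, n2, delta, t_crit):
    """Two one-sided t-tests for equivalence from summary statistics.

    Assumption: a pooled (equal-variance) standard error is used.
    t_crit is the one-tailed critical value for df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = m1 - m2
    t_upper = (diff - delta) / se  # tests H01: mu1 - mu2 >= delta
    t_lower = (diff + delta) / se  # tests H02: mu1 - mu2 <= -delta
    reject_h01 = t_upper <= -t_crit
    reject_h02 = t_lower >= t_crit
    return {
        "df": df,
        "se": se,
        "t_upper": t_upper,
        "t_lower": t_lower,
        "equivalent": reject_h01 and reject_h02,
    }

# Life satisfaction example: delta = 3, one-tailed t_crit = 1.69 (df = 38)
result = tost_from_summary(75, 3.2, 20, 76, 2.4, 20, delta=3, t_crit=1.69)
print(result)
```

Under these assumptions the two statistics come out at roughly -4.47 and 2.24, so both nulls are rejected, in line with the slide's conclusion.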
13 The CI Approach
Another (and perhaps easier) method is to specify a range of values that would constitute equivalence among groups: -δ to δ.
Determine the appropriate confidence interval for the mean difference between the groups.
  A 100(1 - 2α)% CI matches a TOST at level α exactly (a 90% CI for α = .05); a 95% CI is more conservative.
See if the CI for the difference between means falls entirely within the range of equivalence [1].
If either the lower or the upper end falls beyond it, do not claim equivalence.
This is equivalent to the TOST outcome.
[1] In the previous example, the 95% CI for the difference between means is -.80 to 2.79.
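A minimal sketch of the CI check, again assuming a pooled standard error. The critical values 2.024 (two-tailed .05) and 1.686 (two-tailed .10) are standard t-table values for df = 38 supplied here as assumptions; the function names are illustrative. The difference is taken as Group 2 minus Group 1 so the sign matches the footnote.

```python
import math

def pooled_diff_ci(m1, s1, n1, m2, s2, n2, t_crit):
    """CI for (m2 - m1) using a pooled standard error (an assumption)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = m2 - m1
    return diff - t_crit * se, diff + t_crit * se

def within_range(ci, delta):
    """True if the whole interval lies inside (-delta, delta)."""
    lower, upper = ci
    return -delta < lower and upper < delta

# t-table values for df = 38 (assumed, not taken from the slides)
ci95 = pooled_diff_ci(75, 3.2, 20, 76, 2.4, 20, t_crit=2.024)
ci90 = pooled_diff_ci(75, 3.2, 20, 76, 2.4, 20, t_crit=1.686)

print("95% CI:", ci95, "inside (-3, 3):", within_range(ci95, 3))
print("90% CI:", ci90, "inside (-3, 3):", within_range(ci90, 3))
```

From the rounded summary statistics the 95% interval is about -0.81 to 2.81, close to the footnote's -.80 to 2.79; the small gap presumably reflects rounding in the reported means and standard deviations. Both intervals sit inside (-3, 3).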
14 Using Inferential Confidence Intervals
Decide on a ranged estimate that reflects your estimation of equivalence (δ).
  In other words, if my ranged estimate is smaller than this, I will conclude equivalence.
Establish inferential CIs for each variable’s mean.
Create a new range that runs from the lower bound of the ICI for the smaller mean to the upper bound of the ICI for the larger mean.
  This represents the maximum probable difference.
See if this CI range (Rg) is smaller than the specified maximum amount of difference allowed to still claim equivalence (δ).
16 Previous example
Scores on the life satisfaction scale.
First specify the range of equivalence δ.
  Say, any score within 3 points of another.
Group 1: M = 75, s = 3.2, N = 20
Group 2: M = 76, s = 2.4, N = 20
ICI95 Group 1 = … to 76.06
ICI95 Group 2 = … to 76.79
Rg = … = 2.84
17 Example
The range observed by our ICIs is not larger than the equivalence range (δ).
Conclude the two groups scored similarly [1].
[1] At this point you might be confused about how this parallels the other CI approach (the slide titled “The CI Approach”). Note that the previous CI dealt with the difference score. Here we are dealing with the original scale units. However, if the ICIs overlap, you know the CI for the difference score would include zero.
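The slides do not give the ICI formula, so the following is only a sketch under an assumption: that each group's ordinary CI half-width is shrunk by Tryon's reduction factor E = sqrt(SE1² + SE2²) / (SE1 + SE2). The function names are illustrative, and 2.093 is the two-tailed .05 t-table value for df = 19, supplied here as an assumption.

```python
import math

def inferential_ci(mean, se, reduction, t_crit):
    """ICI: the usual CI half-width multiplied by the reduction factor."""
    half_width = reduction * t_crit * se
    return mean - half_width, mean + half_width

def ici_range(m1, s1, n1, m2, s2, n2, t_crit):
    """Return both ICIs and Rg, the maximum probable difference."""
    se1 = s1 / math.sqrt(n1)
    se2 = s2 / math.sqrt(n2)
    # Assumed reduction factor (Tryon's E); not stated on the slides.
    reduction = math.sqrt(se1**2 + se2**2) / (se1 + se2)
    ici1 = inferential_ci(m1, se1, reduction, t_crit)
    ici2 = inferential_ci(m2, se2, reduction, t_crit)
    # Lower bound from the smaller mean, upper bound from the larger mean
    smaller, larger = (ici1, ici2) if m1 <= m2 else (ici2, ici1)
    rg = larger[1] - smaller[0]
    return ici1, ici2, rg

# t_crit = 2.093 is the table value for df = 19 (assumed)
ici1, ici2, rg = ici_range(75, 3.2, 20, 76, 2.4, 20, t_crit=2.093)
print("ICI group 1:", ici1)
print("ICI group 2:", ici2)
print("Rg:", rg, "smaller than delta = 3:", rg < 3)
```

With the rounded summary statistics this gives upper bounds of about 76.07 and 76.80 and Rg of about 2.87, within a few hundredths of the slide's 76.06, 76.79 and 2.84. The conclusion is the same: Rg is smaller than δ = 3.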
18 Another Example
Anxiety measures are taken from two groups of clients who’d been exposed to different types of therapies (A & B).
We’ll say the scale goes from 0 to 100.
First establish your range of equivalence.
20 Which method?
Tryon’s proposal using ICIs is perhaps preferable in that:
  NHST is implicit rather than explicit
  It retains respective group information
  It covers both tests of difference and equivalence simultaneously
  It allows for easy communication of either outcome
  It provides for a third outcome
    Statistical indeterminacy
Say what??
21 Indeterminacy
Neither statistically different nor equivalent
  Or perhaps both
With equivalence tests one may not be able to come to a solid conclusion.
Judgment must be suspended, as the evidence is insufficient to support either hypothesis.
May help in warding off interpretation of ‘marginally significant’ findings as trends.
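The possible outcomes can be sketched as a small decision rule on two ICIs. This assumes that a statistical difference is declared when the ICIs do not overlap and that equivalence is declared when Rg is smaller than δ; the function name and labels are illustrative. The intervals in the first two calls are approximate ICIs recomputed from the life satisfaction summary statistics, not values taken from the slides; the last two calls use made-up intervals.

```python
def classify_outcome(ici_smaller, ici_larger, delta):
    """Classify two inferential CIs as (lower, upper) tuples.

    ici_smaller belongs to the group with the smaller mean,
    ici_larger to the group with the larger mean.
    """
    lower_s, upper_s = ici_smaller
    lower_l, upper_l = ici_larger
    different = upper_s < lower_l   # ICIs do not overlap
    rg = upper_l - lower_s          # maximum probable difference
    equivalent = rg < delta
    if different and equivalent:
        return "different and equivalent"
    if different:
        return "different"
    if equivalent:
        return "equivalent"
    return "indeterminate"

# Approximate ICIs for the life satisfaction example
print(classify_outcome((73.93, 76.07), (75.20, 76.80), delta=3))
# Same intervals with a stricter equivalence range
print(classify_outcome((73.93, 76.07), (75.20, 76.80), delta=2))
# Hypothetical intervals illustrating the other two outcomes
print(classify_outcome((70.0, 71.0), (74.0, 75.0), delta=3))
print(classify_outcome((70.0, 70.5), (71.0, 71.5), delta=3))
```

The second call shows indeterminacy: the intervals overlap, so no difference can be claimed, yet Rg exceeds δ, so equivalence cannot be claimed either.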
22 Figure from Jones et al. (BMJ, 1996) showing the relationship between equivalence and confidence intervals. This is from the first approach.
23 Note on sample size
It was mentioned that we can’t conclude equivalence from a difference test, because a small sample can easily produce nonsignificance.
Power is not necessarily the same for tests of equivalence and difference.
However, the idea is the same: with larger samples we are more likely to conclude equivalence when the true difference lies within the equivalence range.
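To illustrate how sample size drives the chance of concluding equivalence, here is a rough sketch using a normal approximation to TOST power with a known common standard deviation. That simplification, the function name, and the choice of σ = √8 (the pooled SD implied by the earlier example) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def tost_power_normal(n_per_group, sigma, delta, true_diff=0.0, alpha=0.05):
    """Approximate TOST power, treating sigma as known (z in place of t)."""
    se = sigma * math.sqrt(2.0 / n_per_group)
    z = NormalDist().inv_cdf(1 - alpha)
    # Equivalence is declared when the observed difference lands here
    upper = delta - z * se
    lower = -delta + z * se
    if upper <= lower:
        return 0.0  # sample too small: equivalence can never be declared
    sampling_dist = NormalDist(mu=true_diff, sigma=se)
    return sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

# Power to conclude equivalence when the true difference is zero
for n in (5, 10, 20, 40):
    power = tost_power_normal(n, sigma=math.sqrt(8.0), delta=3.0)
    print(n, round(power, 3))
```

Under these assumptions, power rises steeply with n, and with very small groups the acceptance region vanishes entirely, so equivalence cannot be shown no matter how close the observed means are.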
24 Summary
Confidence intervals are an important component of statistical analysis and should always be reported.
Non-significance on a test of difference does not allow us to assume equivalence.
Methods exist to test group equivalence, and they should be used whenever that is the true goal of the research question.
Furthermore, using these methods forces you to think about what a meaningful difference is before you even start.