Lecture 14: Thur., Feb. 26 Multiple Comparisons (Sections 6.3-6.4) Next class: Inferences about Linear Combinations of Group Means (Section 6.2).

Discrimination against the Handicapped (Case Study 6.1) A study of how physical handicaps affect people’s perception of employment qualifications. Researchers prepared five videotaped job interviews, using the same two male actors for each. The tapes differed only in that the applicant appeared with a different handicap in each: (i) wheelchair; (ii) on crutches; (iii) hearing impaired; (iv) one leg amputated; (v) no handicap. Each tape was shown to 14 students from a U.S. university, who rated the candidate’s qualifications on a 0 to 10 point scale based on the tape. Questions of interest: Do subjects systematically evaluate qualifications differently according to the candidate’s handicap? If so, which handicaps produce different evaluations?

Multiple Comparisons Simulation In multiplecomp.JMP, 20 groups are compared, with a sample size of ten in each group. The observations for each group are simulated from a standard normal distribution. Thus, in fact, all 20 group means are equal, and any pair found to be significantly different is a Type I error.
Number of pairs (out of 190) found to have significantly different means using the t-test at level 0.05:
Iteration     1   2   3   4   5
# of Pairs    3
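
The JMP file itself is not reproduced here. As a rough stand-in, the following Python sketch (assuming NumPy and SciPy are installed) repeats the same experiment: 20 groups of ten standard normal observations, all 190 pairwise t-tests using the pooled SD from all groups (df = n - I = 180), counting how many pairs are declared significant at the individual 0.05 level. The function name count_significant_pairs is hypothetical. Because every true mean is 0, every flagged pair is a Type I error, and on average about 190 × 0.05 = 9.5 pairs will be flagged.

```python
import itertools
import math

import numpy as np
from scipy import stats


def count_significant_pairs(n_groups=20, n_per_group=10, alpha=0.05, seed=None):
    """Simulate groups that all have mean 0; count pairs 'significant' at the individual level."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_groups, n_per_group))  # row g holds group g's sample
    means = data.mean(axis=1)
    # Pooled SD from all groups, with residual df = n - I.
    df = n_groups * (n_per_group - 1)
    s_p = math.sqrt(((data - means[:, None]) ** 2).sum() / df)
    se = s_p * math.sqrt(2 / n_per_group)  # SE of a difference of two group means
    cutoff = stats.t.ppf(1 - alpha / 2, df)  # individual two-sided cutoff
    pairs = list(itertools.combinations(range(n_groups), 2))
    n_sig = sum(abs(means[i] - means[j]) / se > cutoff for i, j in pairs)
    return len(pairs), int(n_sig)


for iteration in range(1, 6):
    n_pairs, n_sig = count_significant_pairs(seed=iteration)
    print(f"iteration {iteration}: {n_sig} of {n_pairs} pairs significant")
```

Each run gives a different count; averaging many runs should hover near 9.5.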

Compound Uncertainty Compound uncertainty: when drawing more than one direct inference, there is an increased chance of making at least one mistake. Impact on tests: if a conventional criterion such as a p-value below 0.05 is used to reject each null hypothesis, the probability of falsely rejecting at least one null hypothesis will generally be greater than 0.05 when multiple tests are considered. Impact on confidence intervals: if multiple 95% confidence intervals are formed, the chance that all of them contain their true parameters is less than 95%.

Simultaneous Inferences When several tests are considered simultaneously, they constitute a family of tests. Individual Type I error rate: the probability, for a single test, that the null hypothesis will be rejected when the null hypothesis is true. Familywise Type I error rate: the probability, for a family of tests, that at least one null hypothesis will be rejected when all of the null hypotheses are true.

Individual vs. Family Confidence Levels If a family consists of k tests, each with individual Type I error rate 0.05, the familywise Type I error rate is at least 0.05 and no larger than $\min(1, 0.05k)$. The actual familywise Type I error rate depends on the degree of dependence between the tests. If the tests are independent, the familywise Type I error rate is $1-(0.95)^k$. If all the null hypotheses in a family are true, the mean number of Type I errors equals $0.05k$.

Familywise Type I error rates (each of the k tests at individual rate 0.05)
k       Upper bound    Familywise error rate if independent
3       0.15           0.14
5       0.25           0.23
20      1.00           0.64
100     1.00           0.99
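
A short Python sketch (standard library only) that reproduces the table from the two formulas for a family of k tests: the upper bound min(1, 0.05k) and the independent-tests rate 1-(0.95)^k. The helper name familywise_rates is hypothetical.

```python
def familywise_rates(k, alpha=0.05):
    """Return (upper bound, exact rate if the k tests are independent)."""
    upper = min(1.0, k * alpha)            # Bonferroni-style upper bound, capped at 1
    independent = 1 - (1 - alpha) ** k     # P(at least one false rejection) under independence
    return upper, independent


print(f"{'k':>4} {'upper bound':>12} {'if independent':>15}")
for k in (3, 5, 20, 100):
    upper, indep = familywise_rates(k)
    print(f"{k:>4} {upper:>12.2f} {indep:>15.2f}")
```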

Multiple Comparison Procedures Multiple comparison procedures are methods of carrying out tests so that the familywise type I error rate is controlled (at 0.05 for example). Key issue: What is the appropriate family to consider?

Planned vs. Unplanned Comparisons Consider a one-way layout with 20 groups. Planned comparisons: the researcher is specifically interested in comparing groups 1 and 4 because that comparison directly answers a research question. This is a planned comparison. In the mice diets example, the researchers had five planned comparisons. Unplanned comparisons: the researcher examines all possible pairs of groups (190 pairs) and finds that only groups 5 and 8 suggest actual differences. Only this pair is reported as significant. For the handicap example, the comparisons were unplanned.

Families in Planned/Unplanned Planned comparisons: the family of tests is the family of all planned comparisons (e.g., the five planned comparisons in the mice diet study). For a small number of planned comparisons, it is common practice to just use individual Type I error rates. Unplanned comparisons: the family of tests is the family of all possible pairwise comparisons, $k(k-1)/2$ of them for a k-group one-way layout. It is important to control the familywise Type I error rate for unplanned comparisons. The handicap study involves unplanned comparisons.

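Multiple Comparisons Procedures Consider testing $H_0: \mu_i = \mu_j$ vs. $H_a: \mu_i \neq \mu_j$ as part of a family of tests. t-statistic: $t = \dfrac{\bar{Y}_i - \bar{Y}_j}{s_p\sqrt{1/n_i + 1/n_j}}$, where $s_p$ is the pooled standard deviation from all $I$ groups. Test to control the individual Type I error rate for $H_0$ at level $\alpha$: reject $H_0$ if $|t| > t_{n-I}(1-\alpha/2)$. Multiple comparisons procedure to control the familywise Type I error rate at level $\alpha$: use a higher critical value for rejecting $H_0$. We consider two multiple comparison procedures: (i) Tukey-Kramer; (ii) Bonferroni.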
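
A minimal Python sketch (assuming SciPy is installed) of the individual-level test: the pooled-SD t-statistic for one pair and the two-sided cutoff $t_{n-I}(1-\alpha/2)$. The function names and the summary statistics below are hypothetical illustrations, not values from the handicap study; only the layout (I = 5 groups of 14, so n - I = 65) follows the case study.

```python
import math

from scipy import stats


def pooled_t(ybar_i, ybar_j, n_i, n_j, s_p):
    """t-statistic for H0: mu_i = mu_j, using the pooled SD s_p from all I groups."""
    se = s_p * math.sqrt(1 / n_i + 1 / n_j)
    return (ybar_i - ybar_j) / se


def individual_cutoff(alpha, df):
    """Two-sided critical value t_df(1 - alpha/2); controls only the individual rate."""
    return stats.t.ppf(1 - alpha / 2, df)


# Hypothetical summary statistics for two of I = 5 groups with 14 observations each.
t_stat = pooled_t(ybar_i=6.0, ybar_j=4.5, n_i=14, n_j=14, s_p=1.5)
cutoff = individual_cutoff(alpha=0.05, df=70 - 5)
print(f"|t| = {abs(t_stat):.2f}, individual cutoff = {cutoff:.3f}, reject: {abs(t_stat) > cutoff}")
```

A familywise procedure keeps the same t-statistic and only swaps in a larger cutoff.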

Tukey-Kramer Based on computing the distribution of the largest |t| statistic under the null hypothesis that all group means are equal. For testing $H_0: \mu_i = \mu_j$ vs. $H_a: \mu_i \neq \mu_j$, reject $H_0$ if $|t| > q^*$, where $q^*$ depends on the familywise Type I error rate, $I$, and $n-I$. For the handicap study ($I = 5$, $n = 70$), $q^* = 2.81$ for a familywise Type I error rate of 0.05, whereas the cutoff for an individual Type I error rate of 0.05 is 1.997.
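
A Python sketch of where the two handicap-study cutoffs could come from, assuming SciPy 1.7 or later (which provides scipy.stats.studentized_range). It assumes that JMP's q* is the upper studentized range quantile for I groups and n - I degrees of freedom divided by sqrt(2), which puts it on the same scale as |t|; with I = 5 and n = 70 the result should land near 2.81.

```python
import math

from scipy import stats

I, n = 5, 70      # handicap study: 5 tapes, 14 students each
df = n - I        # 65 residual degrees of freedom
alpha = 0.05

# Assumed Tukey-Kramer cutoff on the |t| scale: studentized range quantile / sqrt(2).
q_star = stats.studentized_range.ppf(1 - alpha, I, df) / math.sqrt(2)
# Cutoff for a single test at individual level 0.05.
t_cut = stats.t.ppf(1 - alpha / 2, df)
print(f"q* = {q_star:.2f}, individual t cutoff = {t_cut:.3f}")
```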

Tukey-Kramer in JMP To see which groups are significantly different (in the sense of statistical significance) at a familywise Type I error rate of 0.05, click Compare Means under Oneway Analysis (after Analyze, Fit Y by X) and click All Pairs, Tukey’s HSD. In the table “Comparison of All Pairs Using Tukey’s HSD,” two groups are significantly different if and only if the table entry for that pair of groups is positive. The cutoff value q* is listed in the output.
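
For readers working outside JMP, a rough equivalent is sketched below, assuming SciPy 1.8 or later, where scipy.stats.tukey_hsd returns a result whose pvalue[i, j] is the familywise-adjusted p-value for groups i and j. The data are simulated ratings for five hypothetical groups of 14, not the handicap study data.

```python
import numpy as np
from scipy import stats

# Simulated ratings for 5 hypothetical groups of 14 (NOT the case study data).
rng = np.random.default_rng(1)
groups = [rng.normal(loc=5.0, scale=1.6, size=14) for _ in range(5)]

result = stats.tukey_hsd(*groups)
# A pair is significant at familywise level 0.05 when its adjusted p-value is below 0.05.
significant = [(i, j) for i in range(5) for j in range(i + 1, 5)
               if result.pvalue[i, j] < 0.05]
print("pairs significant at familywise level 0.05:", significant)
```

Since all five simulated groups share the same mean, any pair listed here would be a familywise Type I error.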

Bonferroni Method A general method for doing multiple comparisons for any family of k tests. Denote the desired familywise Type I error rate by $p^*$. Compute the p-value $p_i$ for each individual test, and reject the null hypothesis for the ith test if $p_i < p^*/k$. This guarantees that the familywise Type I error rate is at most $p^*$.

Bonferroni for mice diets Five comparisons were planned. Suppose we want the familywise error rate for the five comparisons to be 0.05. Bonferroni method: we should consider two groups to be significantly different only if the p-value from the two-sided t-test is less than 0.05/5 = 0.01. Bonferroni in JMP: to do each test at a given level (here 0.01), after Fit Y by X, click the red triangle next to Oneway Analysis and click Set Alpha Level. Then click Compare Means and Each Pair, Student’s t. This shows the results of the tests at the chosen alpha level.
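
The Bonferroni rule is simple enough to sketch in a few lines of Python (standard library only). The function name bonferroni_reject and the five p-values are hypothetical, not the mice-diet results; with k = 5 and $p^* = 0.05$ the per-test threshold is 0.01, as above.

```python
def bonferroni_reject(p_values, familywise_rate=0.05):
    """Reject H0 for test i when p_i < p*/k, so the familywise Type I error rate is at most p*."""
    k = len(p_values)
    threshold = familywise_rate / k
    return threshold, [p < threshold for p in p_values]


# Hypothetical p-values for five planned comparisons.
threshold, reject = bonferroni_reject([0.003, 0.02, 0.008, 0.40, 0.011])
print(f"threshold = {threshold:.3f}, reject = {reject}")
```

Note that 0.011 would be significant at the individual 0.05 level but not after the Bonferroni adjustment.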

Multiple Comparisons and Confidence Intervals When several 95% confidence intervals are considered simultaneously, they constitute a family of confidence intervals. Individual confidence level: the success rate of a procedure for constructing a single confidence interval. Familywise confidence level: the success rate of a procedure for constructing a family of confidence intervals, where a “successful” usage is one in which all intervals in the family capture their parameters.

Multiple Comparison Procedures Confidence interval: Estimate ± Margin of Error, where Margin of Error = (Multiplier) × (Standard Error of Estimate). For an individual confidence level of 95%, the multiplier is about 2. For a familywise confidence level of 95%, the multiplier is greater than 2. The multiplier for a family of confidence intervals for all pairwise group mean differences with a 95% familywise confidence level can be found in Table A.5. For df = n-I, use the closest df > n-I on the chart.
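
A Python sketch (assuming SciPy 1.7 or later) comparing an individual 95% interval with a familywise 95% interval for one difference of group means. It assumes the familywise (Tukey-Kramer) multiplier is the studentized range quantile divided by sqrt(2), computed directly rather than read from Table A.5. The function name pairwise_ci, the estimate 1.5, and the SE 0.6 are hypothetical; I = 5 and df = 65 follow the handicap layout.

```python
import math

from scipy import stats


def pairwise_ci(estimate, se, n_groups, df, level=0.95, familywise=True):
    """Estimate ± multiplier × SE for a difference of two group means."""
    if familywise:
        # Assumed Tukey-Kramer multiplier: studentized range quantile / sqrt(2).
        mult = stats.studentized_range.ppf(level, n_groups, df) / math.sqrt(2)
    else:
        # Individual multiplier t_df(1 - (1 - level)/2), about 2 for moderate df.
        mult = stats.t.ppf(1 - (1 - level) / 2, df)
    return estimate - mult * se, estimate + mult * se


indiv = pairwise_ci(1.5, 0.6, n_groups=5, df=65, familywise=False)
family = pairwise_ci(1.5, 0.6, n_groups=5, df=65)
print("individual 95%:", indiv)
print("familywise 95%:", family)
```

Both intervals share the same center; the familywise one is wider because its multiplier exceeds 2.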

Multiplicity A news report says, “A 15-year study of more than 45,000 Swedish soldiers revealed that heavy users of marijuana were six times more likely than nonusers to develop schizophrenia.” Were the investigators looking only for a difference in schizophrenia between heavy users and nonusers of marijuana? Key question: what is their family of tests? If they were actually looking for a difference in any of 100 outcomes (e.g., blood pressure, lung cancer), Bonferroni should be used to control the familywise Type I error rate, i.e., only consider a difference significant if its p-value is less than 0.05/100 = 0.0005. The best way to deal with the multiple comparisons problem is to design a study to search specifically for a pattern that was suggested by an exploratory data analysis. In other words, we convert an “unplanned” comparison into a “planned” comparison by doing a new experiment.