Controlling the Experimentwise Type I Error Rate When Survival Analyses Are Planned for Subsets of the Sample

Greg Yothers, MA
National Surgical Adjuvant Breast and Bowel Project (NSABP)
University of Pittsburgh, Department of Statistics

Joint work with John Bryant, PhD
Director of the NSABP
University of Pittsburgh, Departments of Statistics and Biostatistics

This work concerns the design and analysis of clinical trials comparing treatment to control in which we wish to test the primary hypothesis in several subgroups in addition to performing the global test. Unless steps are taken to control for multiple comparisons, the Type I error rate is inflated in this situation. Controlling for multiple comparisons generally costs power, so subgroup analyses are often avoided. However, subgroup analyses often serve a legitimate scientific purpose and should not be avoided entirely.

To address this problem, we propose a method whereby a pre-specified experimentwise alpha is “spent”, or allocated, among the global (stratified) test and the constituent subset (stratum-level) tests. The method is efficient in terms of experimentwise power when the treatment effect in each stratum is in the same direction and the range of treatment effects across strata is not too great. The procedure can be used to make the design of a clinical trial robust against the presence of a treatment-by-strata interaction when a significant interaction is not anticipated.

Outline
- Motivating example: NSABP Protocol B-29.
- Definition of the experimentwise Type I error rate.
- Common methods of dealing with subgroup testing: how do they control the Type I error rate?
- Multiple testing approach: perform all tests at reduced nominal levels of significance so that the experimentwise Type I error rate is controlled.
- Exploration of how to spend alpha on the individual tests to achieve ‘good’ operating characteristics for the overall experiment.

NSABP B-29 Schema

Eligibility: T1, T2, or T3; pN0; M0; ER-positive.

Decision to use chemotherapy* (stratification within each arm: age, pathologic tumor size):
- No chemotherapy: Group 1, Tamoxifen; Group 2, Tamoxifen + Octreotide.
- Chemotherapy: Group 3, AC + Tamoxifen; Group 4, AC + Tamoxifen + Octreotide.

* The decision to use AC chemotherapy must be made prior to randomization.

Design Considerations
H0: Relative Risk = 1. Power ≥ .8 to detect Relative Risk ≤ .75, using a .05-level two-sided stratified log-rank test. Power requirements and assumptions about rates of accrual dictate the following:
i) Accrual of 3,000 patients over 5 years, with 3 years of additional follow-up.
ii) Final analysis following the 400th event.
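This event target is consistent with the standard Schoenfeld approximation for a two-sided log-rank test with 1:1 allocation (a back-of-the-envelope check added here, not part of the original design summary):

$$
d \;\approx\; \frac{(z_{1-\alpha/2}+z_{1-\beta})^{2}}{p_1 p_2\,(\log RR)^{2}}
\;=\; \frac{(1.960+0.842)^{2}}{0.25\,(\log 0.75)^{2}}
\;\approx\; \frac{7.85}{0.0207}
\;\approx\; 380 \text{ events},
$$

so triggering the final analysis at the 400th event provides somewhat more than 80% power against RR = 0.75.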

Physicians involved in the design of the trial thought the effect of Octreotide would be unlikely to materially interact with chemotherapy status. In planning the trial it was felt to be important to provide for individual tests for the effect of Octreotide in the presence of chemotherapy as well as in its absence. It was considered unacceptable to treat these subgroup analyses as post-hoc, or exploratory, so it was necessary to design an analysis plan that controlled for the experimentwise error rate.

Definition: Experimentwise Type I Error Rate
The probability of finding a significant difference between treatment and control on either the overall stratified test or any of the stratum-specific tests, given that no difference exists.

Common approaches to controlling the experimentwise Type I error rate
- Unprotected subgroup tests: perform the overall stratified test at level α; follow up with stratum-specific α-level tests.
- Protected subgroup tests (first form): perform the overall stratified test at level α; follow up with stratum-specific α-level tests only if the treatment-by-strata interaction is significant.
- Protected subgroup tests (second form): test for treatment-by-strata interaction at level α. If the interaction is significant, test for the treatment effect individually in each stratum at level α. If the interaction is not significant, test for the overall treatment effect at level α.

Equivalence of Protection Schemes
The two alternatives for protecting the stratum-specific tests are actually quite similar in operating characteristics, since if both the interaction test and the overall stratified test are significant, it is almost certain that at least one stratum-level test will also be significant. It can be shown that this holds with probability one in the case of k = 2 strata.

Experimentwise Level of Significance

Range of experimentwise Type I error rate for protected and unprotected schemes, all tests performed at α = .05.

Multiple testing approach
We now consider a multiple testing approach in which one performs an overall test for treatment effect based on the stratified log-rank statistic, followed by tests within each stratum. All tests are carried out at reduced levels of significance so that the experimentwise level of significance is maintained at a specified rate.

Let RR_i represent the relative risk in the i-th stratum.

Definition: Experimentwise Power
The probability of detecting at least one significant difference during the multiple testing procedure, given the true RR in each stratum. When the true RR in each stratum is 1, this power is the experimentwise Type I error rate.
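In symbols (notation introduced here, not taken from the slides):

$$
\pi(RR_1,\dots,RR_k)\;=\;\Pr\bigl(\text{at least one of the } k+1 \text{ tests rejects}\mid RR_1,\dots,RR_k\bigr),
\qquad
\alpha_{\text{exper}}\;=\;\pi(1,\dots,1).
$$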

The experimentwise power against a specific alternative hypothesis can be written as one minus an integral of the product of standard normal densities φ, where the integral is taken over the acceptance region defined by the critical values of the overall and stratum-level tests.
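A minimal sketch of this expression for k = 2 strata, assuming the standardized stratum-level log-rank statistics Z_1 and Z_2 are approximately independent unit-variance normals with means μ_1 and μ_2, the stratified statistic is Z_0 = √a Z_1 + √(1−a) Z_2 with a the proportion of events in stratum 1, and c_j = z_{1−α_j/2} with α_0, α_1, α_2 the nominal sizes of the overall and stratum-level tests (this notation is introduced here and is not taken from the slides):

$$
\text{Power} \;=\; 1-\iint_{A}\varphi(z_1-\mu_1)\,\varphi(z_2-\mu_2)\,dz_1\,dz_2,
$$
$$
A \;=\; \bigl\{(z_1,z_2)\,:\,\lvert\sqrt{a}\,z_1+\sqrt{1-a}\,z_2\rvert\le c_0,\;\lvert z_1\rvert\le c_1,\;\lvert z_2\rvert\le c_2\bigr\}.
$$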

Using the simplified region of integration, we can rewrite the power as a one-dimensional integral, where Φ denotes the CDF of the standard normal distribution.
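Under the same assumed notation, integrating out z_2 for each fixed z_1 yields an expression of this kind (again a sketch, not necessarily the exact formula from the slide):

$$
\text{Power} \;=\; 1-\int_{-c_1}^{\,c_1}\varphi(z_1-\mu_1)\,
\bigl[\Phi\bigl(u(z_1)-\mu_2\bigr)-\Phi\bigl(\ell(z_1)-\mu_2\bigr)\bigr]^{+}\,dz_1,
$$
$$
u(z_1)=\min\Bigl(c_2,\;\frac{c_0-\sqrt{a}\,z_1}{\sqrt{1-a}}\Bigr),
\qquad
\ell(z_1)=\max\Bigl(-c_2,\;\frac{-c_0-\sqrt{a}\,z_1}{\sqrt{1-a}}\Bigr),
\qquad
[x]^{+}=\max(x,0).
$$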

These results generalize to k strata as follows:
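Under the same assumptions, with a_i the proportion of events in stratum i (so that a_1 + ... + a_k = 1), the natural k-stratum analogue is (sketch):

$$
\text{Power} \;=\; 1-\int_{A}\prod_{i=1}^{k}\varphi(z_i-\mu_i)\,dz_1\cdots dz_k,
\qquad
A=\Bigl\{\,\Bigl|\sum_{i=1}^{k}\sqrt{a_i}\,z_i\Bigr|\le c_0,\;\lvert z_i\rvert\le c_i,\;i=1,\dots,k\Bigr\}.
$$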

The multiple integral in the previous equation can be difficult to evaluate when the number of strata goes beyond about 3 or 4. Fortunately there is a recursive representation of the power function that facilitates computation when there are many strata.

An S-Plus function implementing the recursive method of calculating power is available.
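The original S-Plus implementation is not reproduced here. As an illustration only, the following Python sketch evaluates the k = 2 power formula sketched above by direct numerical integration rather than by the recursive method; the function name and arguments are hypothetical.

```python
# Hypothetical sketch (not the authors' S-Plus code): evaluate the k = 2
# experimentwise power by numerically integrating the one-dimensional form above.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad


def experimentwise_power(mu1, mu2, a, alpha0, alpha1, alpha2):
    """Probability that the overall test or at least one stratum-level test rejects.

    mu1, mu2       : expected standardized log-rank statistics in strata 1 and 2
    a              : proportion of events in stratum 1
    alpha0         : nominal two-sided level of the overall stratified test
    alpha1, alpha2 : nominal two-sided levels of the stratum-level tests
    """
    c0, c1, c2 = (norm.ppf(1 - al / 2) for al in (alpha0, alpha1, alpha2))

    def accept_slice(z1):
        # For fixed z1 inside its acceptance band, the probability that z2 lies
        # in both the stratum-2 band and the band implied by the overall test.
        upper = min(c2, (c0 - np.sqrt(a) * z1) / np.sqrt(1 - a))
        lower = max(-c2, (-c0 - np.sqrt(a) * z1) / np.sqrt(1 - a))
        band = max(0.0, norm.cdf(upper - mu2) - norm.cdf(lower - mu2))
        return norm.pdf(z1 - mu1) * band

    accept_prob, _ = quad(accept_slice, -c1, c1)
    return 1 - accept_prob


# With mu1 = mu2 = 0 the result is the experimentwise Type I error rate;
# the alpha values below are arbitrary illustrative inputs.
print(experimentwise_power(0.0, 0.0, a=0.5, alpha0=0.045, alpha1=0.005, alpha2=0.005))
```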

How should we spend alpha?
The question arises as to how the Type I error rate should be divided between the overall and the stratum-specific tests, or rather, how much alpha should be spent on the stratum-specific tests. For k = 2 strata and α_exper = 0.05, the table and figure that follow show a variety of combinations of the nominal size of the overall test (α_0) and the nominal size of the within-stratum tests (α_1 and α_2). For simplicity, we consider only the case where α_1 = α_2. The possibilities form a continuum from (.05, 0) (no stratum-specific tests) to (0, .0253) (no overall test). Given α_exper, α_0, and the constraint α_1 = α_2, the common value of α_1 and α_2 is a function of a (the proportion of events in the first stratum); however, the effect of varying a is weak.
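As an illustration of how such a continuum of (α_0, α_1 = α_2) pairs could be tabulated, the hypothetical sketch below (reusing experimentwise_power from the previous sketch) solves for the common stratum-level size that holds the experimentwise Type I error at the target for a given α_0.

```python
# Hypothetical companion sketch: for a given alpha_0, find the common
# alpha_1 = alpha_2 that keeps the experimentwise Type I error at alpha_exper.
# Assumes experimentwise_power() from the previous sketch is already defined.
from scipy.optimize import brentq


def common_stratum_alpha(alpha_exper, alpha0, a=0.5):
    f = lambda al: experimentwise_power(0.0, 0.0, a, alpha0, al, al) - alpha_exper
    return brentq(f, 1e-10, alpha_exper)


# With essentially no overall test (alpha_0 -> 0), the solution should be close
# to 1 - sqrt(1 - 0.05) ~ 0.0253, matching the endpoint quoted in the text.
print(common_stratum_alpha(0.05, alpha0=1e-9))
```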

Conclusion
The alpha-spending approach described here is efficient and effective when the treatment effect is in the same direction in each stratum and any differences in the size of the effect between strata are small to moderate. The method is also sensitive to the balance of allocation of patients (events) between the two strata: when the sizes of the stratum-level tests are equal, the approach appears to be effective as long as the imbalance is no worse than about 3 to 1. When the numbers of patients (events) are out of balance, we suggest spending more alpha on the stratum with more patients (events).

Spending between ½ and 1 percent of alpha on the stratum-level tests (that is, setting α_0 to between .045 and .04) would seem to be a prudent choice for k = 2 strata and the range of circumstances explored in this paper, when substantial interaction is thought to be unlikely a priori. When there is no overall effect but there may be offsetting effects between the strata, the alpha-spending approach is not very powerful; designing the trial around a test for interaction would be much more effective in that situation. If one were to use the multiple testing procedure in that situation, most of the alpha should be spent on the within-stratum tests.

In the design of NSABP Protocol B-29, we expected little or no interaction and nearly equal accrual to the two stratum levels. Given our design assumptions in B-29, we spent about ½ percent of alpha on the stratum-level tests and set the sizes of the stratum-level tests equal. If we had anticipated unequal accrual to the strata or a significant interaction, we likely would have altered these choices. Our choice of alpha spending (α_0 ≈ 0.045, with α_1 = α_2) proved to preserve power in the presence of mild perturbations of the design assumptions.

The tools described in this paper can be adapted to the design of other potential trials. Given prior beliefs regarding the likelihood of significant treatment-strata interaction, balance of accrual to the stratum levels, and other factors, one can explore the sensitivity of power to design assumptions and parameters much as we have in the latter part of this paper.