
Assessing MCP-Mod relative to pairwise comparisons and trend tests in dose-ranging design and analysis
Fang Liu, Anran Wang, Meihua Wang, Man Jin, Akshita Chawla, Pranab Kumar Mitra, Sammy Yuan, Robin Mogg
Aug 1, 2018

Background
The importance of adequately understanding the dose-response relationship is well recognized in drug development.
Inadequate design and analysis of dose-finding studies continues to plague many development programs.
Consequences: limited understanding of the dose-response relationship, failed Phase 3 trials, and post-marketing dose adjustments.
Analysis of dose-response studies typically follows one of two primary strategies, each with shortcomings when applied separately:
- Multiple comparison procedures (MCP)
- Model-based approaches (Mod)
MCP-Mod combines principles of both under a single approach.

Motivations for MCP-Mod
Traditional methods:
Multiple comparison procedures, e.g., pairwise comparisons, trend tests, or other MCPs (Bretz et al. 2010)
- Treat dose as a qualitative factor
- Pros: few assumptions about the dose-response relationship
- Cons: inferences are restricted to the dose levels tested
Model-based approaches, e.g., Emax or sigmoid Emax models (Pinheiro et al. 2006a)
- Treat dose as quantitative
- Assume a parametric functional relationship between dose and response, giving a full profile over all possible dose levels
- Pros: more flexibility for target dose estimation
- Cons: pre-specifying a single dose-response model at the design stage is difficult and risky (model uncertainty); without rigid pre-specification of how models are selected, the data may be overfit

Motivations for MCP-Mod
Hybrid method: combine MCP and Mod into one procedure
- Keep the flexibility of modeling the dose-response relationship (from Mod) while preserving the family-wise error rate, FWER (from MCP)
The European Medicines Agency published a Qualification Opinion for the MCP-Mod approach in 2014, in which it wrote, "The MCP-Mod method is efficient in the sense that it uses the available data better than the commonly applied pairwise comparison."
The FDA Division of Pharmacometrics issued a Determination letter supporting use of MCP-Mod.

MCP-Mod workflow (diagram): design-stage steps D.1 to D.4, covering candidate models pre-specified from historical data and calculation of the optimal contrast coefficients, followed by analysis-stage step A.1.

MCP-Mod: Design Stage (I)
Response Y:
- Normal response: continuous
- Generalized cases: binary, count, time to event
Dose: k dose levels (quantitative), with placebo $d_1$ and active doses $d_2, d_3, \ldots, d_k$
Identify a set of $M$ parameterized candidate models
$$f_m(d_i, \boldsymbol{\theta}_m) = \theta_0 + \theta_1 f_m^0(d_i, \boldsymbol{\theta}_m^0), \qquad i = 1, \ldots, k; \; m = 1, \ldots, M,$$
where $\theta_0$ is a location parameter, $\theta_1$ is a scale parameter, and $f_m^0(d_i, \boldsymbol{\theta}_m^0)$ denotes the standardized model function; $\theta_0$, $\theta_1$, and $\boldsymbol{\theta}_m^0$ should be prespecified.
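To make the candidate-model step concrete, here is a minimal Python sketch (not part of the original slides; in practice dedicated software such as the DoseFinding R package is typically used). The dose grid, the choice of shapes, and the guesstimated standardized-model parameters (e.g. ED50 = 0.2 for Emax) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical 5-arm design: placebo d1 = 0 plus four active doses on a 0-1 range
doses = np.array([0.0, 0.05, 0.2, 0.6, 1.0])

# Standardized candidate model shapes f_m^0(d, theta_m^0); the parameter
# guesstimates below are purely illustrative placeholders.
candidate_shapes = {
    "linear":      lambda d: d,
    "linear-log":  lambda d: np.log(d + 1.0),
    "emax":        lambda d: d / (0.2 + d),          # ED50 = 0.2 (assumed)
    "exponential": lambda d: np.expm1(d / 0.3),      # rate parameter 0.3 (assumed)
    "quadratic":   lambda d: d - 0.85 * d**2,        # non-monotonic, peak inside range
}

# Standardized mean response vector mu_m^0 at the design doses, one per shape
mu0 = {name: f(doses) for name, f in candidate_shapes.items()}
```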

MCP-Mod: Design Stage (II)
For each model $f_m(d, \boldsymbol{\theta}_m)$, construct a contrast test to detect a positive dose effect using optimal coefficients, i.e., test the null hypothesis of no dose-response:
$$H_0^m: \mathbf{c}_m^T \boldsymbol{\mu} = 0, \qquad m = 1, 2, \ldots, M,$$
where $\boldsymbol{\mu} = (\mu_{d_1}, \ldots, \mu_{d_k})$ is the vector of mean responses at the dose levels and $\mathbf{c}_m = (c_{m1}, \ldots, c_{mk})$ is the contrast vector. The optimal contrast vector satisfies $\sum_{i=1}^{k} c_{mi} = 0$ and maximizes the power of the contrast test.
Test statistic for model $m$:
$$T_m = \frac{\sum_{i=1}^{k} c_{mi} \bar{Y}_i}{s \sqrt{\sum_{i=1}^{k} c_{mi}^2 / n_i}},$$
where $s^2$ is the pooled estimate of the variance $\sigma^2$.
To control the FWER when testing the $M$ models simultaneously, use the maximum contrast test statistic $T_{\max} = \max(T_1, \ldots, T_M)$, where $(T_1, \ldots, T_M)$ follows a central multivariate t distribution when all $M$ null hypotheses are true. Let $q_{1-\alpha}$ denote the multiplicity-adjusted critical value at level $1-\alpha$ derived from the multivariate t distribution (Tukey et al., 1985):
$$\Pr(T_{\max} > q_{1-\alpha}) = 1 - \Pr(T_1 \le q_{1-\alpha}, \ldots, T_M \le q_{1-\alpha}) = \alpha.$$
Each contrast test translates into a decision procedure that determines whether the given dose-response shape is statistically significant, based on the observed data.
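Continuing the Python sketch (again illustrative, not the authors' code): for a design with a common variance, the optimal contrast for each shape is proportional to the sample-size-weighted, centered standardized mean vector, and the multiplicity-adjusted critical value $q_{1-\alpha}$ of $T_{\max}$ can be approximated by Monte Carlo simulation from its null distribution rather than by evaluating the multivariate t quantile exactly. The per-arm sample sizes, $\alpha$, and the number of null simulations are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2018)
n = np.full(len(doses), 30)          # per-arm sample sizes (assumed)
alpha = 0.05

def optimal_contrast(mu0_m, n):
    """Optimal contrast for one candidate shape: proportional to
    n_i * (mu0_i - weighted mean of mu0), summing to 0 and scaled to unit length."""
    c = n * (mu0_m - np.sum(n * mu0_m) / np.sum(n))
    return c / np.sqrt(np.sum(c**2))

C = np.column_stack([optimal_contrast(m, n) for m in mu0.values()])   # k x M matrix

def contrast_stats(ybar, s2, n, C):
    """T_m = sum_i c_mi * ybar_i / (s * sqrt(sum_i c_mi^2 / n_i)), all models at once."""
    return (C.T @ ybar) / np.sqrt(s2 * (C**2 / n[:, None]).sum(axis=0))

# Monte Carlo stand-in for the multivariate-t critical value q_{1-alpha}:
# simulate group means and a pooled variance under the global null, take max_m T_m.
nu = n.sum() - len(n)
tmax_null = np.array([
    contrast_stats(rng.normal(0.0, 1.0 / np.sqrt(n)),   # null group means, sigma = 1
                   rng.chisquare(nu) / nu,               # pooled variance estimate
                   n, C).max()
    for _ in range(50_000)
])
q = np.quantile(tmax_null, 1 - alpha)
print(f"approximate multiplicity-adjusted critical value: {q:.3f}")
```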

MCP-Mod: Analysis Stage
Test the dose-response PoC under model uncertainty with the multiple contrast test:
- If the observed $T_{\max} > q_{1-\alpha}$, reject $H_0$ and establish PoC.
- Models with $T_m > q_{1-\alpha}$ are declared significant and kept for model selection, denoted $M^* = \{M_1, \ldots, M_L\}$.
- If no model is significant, PoC is not established.
Select the best model:
- Either select a single model from the significant models in $M^*$ based on AIC, BIC, or the maximum contrast test statistic,
- or apply model-averaging techniques, in which weighted estimates across all significant models are produced for the quantities of interest (Buckland et al., 1997).
Using dose-response modeling, the selected model is fitted to the observed data to estimate the model parameters $\boldsymbol{\theta}$ and the target dose, such as the minimum effective dose (MED):
$$\mathrm{MED} = \min \{\, d \in (d_1, d_k] : p(d) > p(d_1) + \Delta \,\},$$
where $\Delta$ is the clinically relevant threshold and $p(d) = f(d, \boldsymbol{\theta})$ denotes the predicted response at dose $d$.
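As an illustration of the analysis stage, the running Python sketch continues below. The simulated subject-level data, the residual SD, and the use of a single Emax fit in place of full model selection or averaging are all assumptions made for brevity: the selected model is fitted by nonlinear least squares and the MED is read off a dose grid.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical subject-level data; in a real analysis these come from the trial.
dose_obs = np.repeat(doses, n)
y_obs = rng.normal(0.35 + 0.55 * dose_obs / (0.2 + dose_obs), 0.3)   # assumed truth

def emax_model(d, e0, emax, ed50):
    """Full Emax model f(d, theta) = E0 + Emax * d / (ED50 + d)."""
    return e0 + emax * d / (ed50 + d)

theta, _ = curve_fit(emax_model, dose_obs, y_obs, p0=[0.3, 0.5, 0.2])

# MED = smallest dose in (d1, dk] whose predicted response exceeds placebo by Delta
delta = 0.2
d_grid = np.linspace(doses[0], doses[-1], 1001)[1:]
above = emax_model(d_grid, *theta) > emax_model(0.0, *theta) + delta
med_hat = d_grid[above][0] if above.any() else np.nan
print(f"estimated MED for Delta = {delta}: {med_hat:.3f}")
```

When several candidate models survive the MCP step, the same fit would be repeated for each significant model and the results combined by AIC/BIC selection or by model-averaging weights, as described above.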

Trend Test (I)
The trend test detects a linear trend in the response under three dose scalings. Suppose $d_1, \ldots, d_k$ are the doses to be examined, with $d_1 = 0$ being the control.
- Arithmetic scale: $X_i = d_i$, for $i = 1, \ldots, k$
- Ordinal scale: $X_i = i$
- Logarithmic scale: $X_i = \log(d_i)$ for $i > 1$, and $X_1 = \log d_2 - \frac{d_2 - d_1}{d_3 - d_2}(\log d_3 - \log d_2)$
For each scaling $X_i$, testing the slope of the regression line is equivalent to testing a contrast among the treatment means, $\sum_{i=1}^{k} m_i c_i \bar{y}_i = 0$, where $\bar{y}_i$ is the mean response at dose $d_i$, $m_i c_i$ is the contrast coefficient corresponding to $X_i$ with $c_i = X_i - \bar{X}$ and $\bar{X} = \sum_i m_i X_i / \sum_i m_i$, and $m_i$ is the sample size at $d_i$.
Multiplicity adjustment of the three test statistics for the three dose scalings uses the trivariate t distribution to control the FWER.
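A small sketch of the trend-test contrasts, continuing the Python example. The centering formula $c_i \propto m_i (X_i - \bar{X})$ is the standard slope-equivalent contrast; reusing the earlier Monte Carlo max-T machinery for the trivariate adjustment is an illustrative shortcut rather than the exact published procedure.

```python
import numpy as np

def trend_scores(d):
    """Dose scores X_i under the three scalings (arithmetic, ordinal, logarithmic),
    with the placebo log score extrapolated as on the slide."""
    logd = np.zeros_like(d)
    logd[1:] = np.log(d[1:])
    logd[0] = np.log(d[1]) - (d[1] - d[0]) / (d[2] - d[1]) * (np.log(d[2]) - np.log(d[1]))
    return {"arithmetic": d.astype(float),
            "ordinal": np.arange(1.0, len(d) + 1),
            "log": logd}

def slope_contrast(x, m):
    """Contrast equivalent to testing the regression slope on scores x:
    coefficients proportional to m_i * (x_i - weighted mean of x), unit-scaled."""
    c = m * (x - np.sum(m * x) / np.sum(m))
    return c / np.sqrt(np.sum(c**2))

C_trend = np.column_stack([slope_contrast(x, n) for x in trend_scores(doses).values()])
# The three statistics (arithmetic, ordinal, log) are then computed exactly as in
# contrast_stats() above, and their maximum is referred to a trivariate-t critical value.
```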

Simulation Objectives
The performance of three methods was evaluated through a simulation study:
- Pairwise comparisons
- Trend test: select the model based on AIC or Tmax, or using the model-averaging technique
- MCP-Mod: select the model based on AIC or Tmax, or using the model-averaging technique
Evaluation criteria:
- Power to detect PoC, and type I error
- Bias and MSE of the estimated MED
Thanks, Fang. So after a very exciting introduction of the background story and methods, we will now turn to some interesting numerical experiments. We first review our previous simulation study. We would like to compare the performance of the three methods just mentioned, pairwise comparisons, the trend test, and MCP-Mod, under different scenarios, and to investigate the differences between these methods in power to detect a dose-response, type I error, and the bias and mean squared error of the MED estimates.

Simulation Setup
- 4-arm, 5-arm, and 6-arm parallel designs
- Placebo response rate: 0.35; maximum response rate: 1.0; clinically relevant effect vs. placebo: Δ = (0.2, 0.4, 0.55)
- 5,000 simulations for each scenario
- 8 true dose-response profiles, including monotonic and non-monotonic curves, to evaluate power: Emax, exponential, linear, linear-log, quadratic, beta, logistic, and step-2 models
- The first 4 or the first 6 of these models are used as the candidate models for MCP-Mod
- A flat true model (no dose-response relationship) is used to evaluate type I error
We considered three designs with different numbers of dose levels: 4-arm, 5-arm, and 6-arm. For the 4-arm design we have 3 doses plus placebo, for the 5-arm design 4 doses plus placebo, and so on; the dose range is from 0 to 1. Six true models were considered in the previous simulation study, including 4 monotonic and 2 non-monotonic curves: the Emax, exponential, linear, linear-log, quadratic, and beta models. For MCP-Mod, these six models were the candidate models as well. We ran 5,000 simulations for each combination of design, true model, and Δ.
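To illustrate how power figures like those on the following slides could be produced, the Python sketch continues below. The normal-response approximation with σ = 0.3, the specific Emax truth, and the reuse of the earlier critical value q are assumptions for illustration only, not the authors' actual simulation code.

```python
import numpy as np

def mcp_power(true_mean, n, C, q, sigma=0.3, n_sim=5000):
    """Monte Carlo power of the MCP step for one true dose-response profile:
    the fraction of simulated trials whose maximum contrast statistic exceeds q."""
    nu = n.sum() - len(n)
    hits = 0
    for _ in range(n_sim):
        ybar = rng.normal(true_mean, sigma / np.sqrt(n))        # observed group means
        s2 = sigma**2 * rng.chisquare(nu) / nu                   # pooled variance
        hits += contrast_stats(ybar, s2, n, C).max() > q
    return hits / n_sim

# Example: power under a hypothetical Emax truth (placebo mean 0.35, max effect 0.55)
truth = 0.35 + 0.55 * doses / (0.2 + doses)
print("simulated power of the MCP step:", mcp_power(truth, n, C, q))

# Setting truth to a flat profile, e.g. np.full(len(doses), 0.35), gives the type I error.
```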

Dose Response Relationship
Six-arm design. In addition to the previous six true models, two more, the logistic model and the step-2 model, were added to the true-model settings in the simulation. Taking the six-arm design as an example, these plots show the mean functions of the true models: the y-axis is the response, the x-axis is the dose range, and the points mark the doses used in the design. The true MEDs are also marked; the red reference lines show the MEDs for Δ = 0.4 and the green lines show the MEDs for Δ = 0.2 for the different models.

Power to Detect Dose-Response Signal and Type I Error
We still used the previous six models as the candidate models for MCP-Mod. These are the simulation results for power and type I error. Looking first at the power to detect the dose-response: the rows are the different true models, the columns are the different methods under designs with different numbers of dose arms, and the best performance is highlighted in red.
MCP-Mod has higher power to detect the dose-response than pairwise comparisons (Dunnett's test) for all designs, regardless of the number of dose arms. MCP-Mod has the best power for the quadratic and beta models, for the Emax model in general, and for the linear-log model in the 4-arm and 5-arm designs. (In the six-arm design, the differences between the trend test and MCP-Mod of 0.004 and 0.006 for the Emax and linear-log models are very small compared with the Monte Carlo standard errors of these values, which are around 0.004 to 0.005. In summary, MCP-Mod's power advantage tends to grow with more curved mean functions.)
The trend test, because of the non-monotonic shapes, tends to have lower power for the quadratic and beta models, even worse than the pairwise comparison in the 4-arm design, but it has the best power for the linear and exponential models; it tends to do better when the response is monotonic. The relatively lower power of MCP-Mod compared with the trend test can be attributed to the number of candidate models in the MCP step: the trend test's multiplicity adjustment covers only three contrasts, whereas MCP-Mod adjusts for six models in this simulation. When we included only four candidate models for MCP-Mod, its power improved in general.
For MCP-Mod and the trend test, adding more dose arms increases power in general, whereas the power of pairwise comparisons is generally lower with more active arms because of the multiplicity adjustment by Dunnett's test.
Turning to type I error: all three methods control multiplicity well, with type I error around 0.05; values slightly above 0.05 can be attributed to simulation error, as the Monte Carlo standard error here is around 0.003.

MED Estimation (Δ = 0.2)
These tables show the MSE and bias of the MED estimates when Δ = 0.2. MCP-Mod has the best MSE and bias among all the methods in all scenarios.

MED Estimation (Δ = 0.55)
These tables show the MSE and bias of the MED estimates when Δ = 0.55. MCP-Mod has the best MSE and bias among all the methods in all scenarios.

Conclusions
- For MCP-Mod and the trend test, power to detect the dose-response relationship increases when more doses are included in the trial.
- The trend test is comparable to MCP-Mod in terms of power to detect the dose-response relationship; however, it has larger bias and MSE when estimating the MED.
- For MCP-Mod, the MED estimate from the model-averaging technique has smaller MSE and comparable bias relative to the Tmax and AIC model-selection methods.
- MCP-Mod with 4 candidate models is slightly more powerful than MCP-Mod with 6 candidate models when the true model is one of the four candidates; however, power is much lower when the true model is not among the 4 candidate models but is among the 6.
Our findings are that for MCP-Mod and the trend test, including more dose arms in the trial improves the power to detect the dose-response, whereas for the pairwise comparison the power goes down with more doses. Power also depends on the dose-response shape: when the true model is exponential or linear, the trend test has the best power among the three methods, but its performance is relatively worse for non-monotonic models. In terms of precision of the MED estimation, the model-based approach is generally better than the non-modeling approaches. MCP-Mod offers greater flexibility in identifying the MED and has the best MSE and bias of the MED estimates in general, while retaining reasonable power. (Dose selection is a more difficult problem than detecting PoC: sample sizes usually give sufficient power for PoC but not enough precision for estimating the MED.) This is our last slide; any questions? Thank you so much, everyone! Thanks so much to Fang, Robin, Frank, Sammy, and Wen for your guidance and suggestions on this project.

Appendix
CHMP (2014). Qualification Opinion of MCP-Mod as an efficient statistical methodology for model-based design and analysis of Phase II dose finding studies under model uncertainty.
ICH E4 Harmonised Tripartite Guideline (1994). Dose-Response Information to Support Drug Registration.
Tukey, J. W., Ciminera, J. L., and Heyse, J. F. (1985). Testing the statistical certainty of a response to increasing doses of a drug. Biometrics 41:295–301.
Zhang, D. (2001). Testing the Trend of a Response Curve to an Increasing Sequence of Doses: A SAS Macro to Automate the Analysis. SAS Proceedings from PharmaSUG.