Presentation transcript: "It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual"

1 It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual
- Changes in the way we do phase II trials
- Adaptive methods for the refinement and evaluation of predictive biomarkers in the pivotal trials in a non-exploratory manner
- Use of archived tissues in focused "prospective-retrospective" designs based on randomized pivotal trials

2 Biomarker Adaptive Threshold Design
Wenyu Jiang, Boris Freidlin & Richard Simon
JNCI 99, 2007

3 Biomarker Adaptive Threshold Design
- Randomized trial of T vs C
- Have identified a univariate biomarker index B thought to be predictive of patients likely to benefit from T relative to C
- Eligibility not restricted by biomarker
- No threshold for biomarker determined
- Biomarker value scaled to range (0,1)
- Time-to-event data

4 Procedure A: Fallback Procedure
- Compare T vs C for all patients
  - If results are significant at level .04, claim broad effectiveness of T
  - Otherwise proceed as follows

5 Procedure A
- Test T vs C restricted to patients with biomarker B > b; let S(b) be the log likelihood ratio statistic
- Repeat for all values of b
- Let S* = max{S(b)}
- Compute the null distribution of S* by permuting treatment labels
- If the data value of S* is significant at the 0.01 level, then claim effectiveness of T for a patient subset
- Compute point and interval estimates of the threshold b
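A minimal sketch of this subset-maximized permutation test, assuming a pandas DataFrame with columns time, event, treat (1 for T, 0 for C) and biomarker already scaled to (0,1). The log-rank chi-square statistic is used as a stand-in for the log likelihood ratio statistic S(b), and the threshold grid, column names and permutation count are illustrative choices rather than the published implementation. Interval estimates of the threshold could be obtained by repeating the maximization over bootstrap resamples.

```python
import numpy as np
import pandas as pd
from lifelines.statistics import logrank_test

def subset_stat(df, b):
    """Log-rank chi-square for T vs C restricted to biomarker > b (stand-in for S(b))."""
    sub = df[df["biomarker"] > b]
    if sub["treat"].nunique() < 2:          # subset must contain both treatment arms
        return 0.0
    t, c = sub[sub["treat"] == 1], sub[sub["treat"] == 0]
    return logrank_test(t["time"], c["time"], t["event"], c["event"]).test_statistic

def s_star(df, thresholds):
    """S* = max over candidate thresholds of the subset treatment-effect statistic."""
    return max(subset_stat(df, b) for b in thresholds)

def procedure_a(df, thresholds=np.arange(0.0, 1.0, 0.05), n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = s_star(df, thresholds)
    null = []
    for _ in range(n_perm):                 # permute treatment labels to build the null
        perm = df.assign(treat=rng.permutation(df["treat"].values))
        null.append(s_star(perm, thresholds))
    p_value = np.mean(np.array(null) >= observed)
    b_hat = max(thresholds, key=lambda b: subset_stat(df, b))   # point estimate of threshold
    return observed, p_value, b_hat
```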

6 Sample Size Planning (A)
- Standard broad eligibility trial is sized for 80% power to detect a reduction in hazard D at significance level 5%
- Biomarker adaptive threshold design is sized for 80% power to detect the same reduction in hazard D at significance level 4% for the overall analysis

7 Model | Hazard reduction for those who benefit | Overall power | Adaptive test power
Everyone benefits | 33% | |
% benefit | 60% | |
% benefit | 60% | |

8 Estimated Power of Broad Eligibility Design (n=386 events) vs Adaptive Design A (n=412 events); 80% power for 30% hazard reduction
Model | Broad Eligibility Design | Biomarker Adaptive Threshold A
40% reduction in 50% of patients (22% overall reduction) | |
% reduction in 25% of patients (20% overall reduction) | |
% reduction in 10% of patients (14% overall reduction) | .35 | .93

9 Estimation of Threshold

10 506 prostate cancer patients were randomly allocated to one of four arms:
- Placebo and 0.2 mg of diethylstilbestrol (DES) were combined as control arm C
- 1.0 mg DES and 5.0 mg DES were combined as E
The end-point was overall survival (death from any cause).
Covariates:
- Age: in years
- Performance status (pf): not bed-ridden at all vs other
- Tumor size (sz): size of the primary tumor (cm2)
- Index of a combination of tumor stage and histologic grade (sg)
- Serum prostatic acid phosphatase level (ap)

11 Prostate Cancer Data
Covariate | # patients with measured covariate | Overall test p value | Procedure A stage 2 p value | Procedure B p value
AP | | | |
SG | | | |

12 Prostate Cancer Data
Covariate | # patients with measured covariate | Estimated threshold | 95% CI | 80% CI
AP | 505 | 36 | (9, 170) | (25, 108)
SG | 494 | 11 | (10, 13) | (11, 11)

13

14

15 Procedure B
- S(b) = log likelihood ratio statistic for treatment effect in the subset of patients with B ≥ b
- T = max{S(0) + R, max{S(b)}}, where R is a prespecified constant that gives added weight to the overall (all-patients) comparison
- Compute the null distribution of T by permuting treatment labels
- If the data value of T is significant at the 0.05 level, then reject the null hypothesis that the treatment is ineffective
- Compute point and interval estimates of the threshold b
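Continuing the sketch above, Procedure B simply replaces S* with the combined statistic T and reuses the same permutation machinery; the value of R shown here is only an illustrative placeholder for the prespecified constant.

```python
def procedure_b_stat(df, thresholds, R=2.2):
    """T = max{ S(0) + R, max_b S(b) }, using the same subset statistic as procedure_a
    (slide 15 defines S(b) on B >= b; the sketch ignores that boundary detail)."""
    return max(subset_stat(df, 0.0) + R, s_star(df, thresholds))
```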

16 Sample Size Planning (B)
- Estimate the power of Procedure B relative to the standard broad eligibility trial from Table 1, using the row corresponding to the expected proportion of sensitive patients (γ) and the target hazard ratio for sensitive patients (δ)
  - e.g. γ = 25% and δ = .4 gives RE = .429/.641 = .67
- When B has power 80%, the overall test has power 80% × .67 ≈ 53%
- Use formula B.2 to determine the approximate number of events needed for the overall test to have power 53% for detecting δ = .4 limited to γ = 25% of patients

17 Example Sample Size Planning for Procedure B
- Design a trial to detect δ = 0.4 (60% reduction in hazard) limited to γ = 25% of patients
  - Relative efficiency from Table 1: .429/.641 = .67
- When Procedure B has power 80%, the standard test has power 80% × .67 ≈ 53%
- Formula B.2 gives D' = 230 events for the overall test to have 53% power, and thus for Procedure B to have approximately 80% power
- The overall test would need D = 472 events for 80% power to detect the diluted treatment effect

18 Events Needed to Detect Hazard Ratio δ With Proportional Hazards
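The numerical table from this slide did not survive in the transcript. As a reference point, a standard Schoenfeld-type approximation for a 1:1 randomized comparison under proportional hazards gives the number of events D needed to detect hazard ratio δ at two-sided level α with power 1 − β (a generic textbook formula, not necessarily the exact one behind the slide's table):

$$
D \;\approx\; \frac{4\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{(\log \delta)^{2}}
$$

For the diluted setting of slide 19, where only a fraction γ of patients has hazard ratio δ and the rest have hazard ratio 1, a crude approximation sometimes used is to apply the same formula with δ replaced by γδ + (1 − γ).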

19 Events (D') Needed for Overall Test to Detect Hazard Ratio δ Limited to Fraction γ

20

21

22 Multiple Biomarker Design: A Generalization of the Biomarker Adaptive Threshold Design
- Have identified K candidate binary classifiers B1, …, BK thought to be predictive of patients likely to benefit from T relative to C
- RCT comparing new treatment T to control C
- Eligibility not restricted by candidate classifiers
- Let the B0 classifier classify all patients positive

23 Test T vs C restricted to patients positive for Bk, for k = 0, 1, …, K
- Let S(Bk) be a measure of treatment effect in patients positive for Bk
- Let S* = max{S(Bk)}, k* = argmax{S(Bk)}
- S* is the largest treatment effect observed
- k* is the marker that identifies the patients in whom the largest treatment effect is observed

24 For a global test of significance:
- Randomly permute the treatment labels and repeat the process of computing S* for the shuffled data
- Repeat this to generate the distribution of S* under the null hypothesis that there is no treatment effect for any subset of patients
- The statistical significance level is the area in the tail of the null distribution beyond the value of S* obtained for the unshuffled data
- If the data value of S* is significant at the 0.05 level, then claim effectiveness of T for patients positive for marker k*

25 Repeating the analysis for bootstrap samples of cases provides an estimate of the stability of k* (the indication)
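A compact sketch combining the global permutation test of slides 23-24 with the bootstrap stability check of slide 25, assuming a binary response column resp, a treatment column treat, and 0/1 marker columns listed in markers (with "B0" meaning everyone positive). The difference in response rates stands in for the unspecified treatment-effect measure S(Bk), and all names and counts are illustrative.

```python
import numpy as np
import pandas as pd

def effect(df, marker):
    """Treatment-effect measure in marker-positive patients: difference in response rates."""
    pos = df if marker == "B0" else df[df[marker] == 1]   # B0 = all patients positive
    t, c = pos[pos["treat"] == 1], pos[pos["treat"] == 0]
    if len(t) == 0 or len(c) == 0:
        return -np.inf
    return t["resp"].mean() - c["resp"].mean()

def best_marker(df, markers):
    stats = {m: effect(df, m) for m in markers}
    k_star = max(stats, key=stats.get)
    return k_star, stats[k_star]

def global_test(df, markers, n_perm=2000, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    k_star, s_star = best_marker(df, markers)
    # Permutation null: shuffle treatment labels and re-maximize over markers each time
    null = []
    for _ in range(n_perm):
        perm = df.assign(treat=rng.permutation(df["treat"].values))
        null.append(best_marker(perm, markers)[1])
    p_value = np.mean(np.array(null) >= s_star)
    # Bootstrap cases to gauge how stable the selected marker k* is
    picks = []
    for _ in range(n_boot):
        boot = df.sample(len(df), replace=True, random_state=int(rng.integers(1 << 31)))
        picks.append(best_marker(boot, markers)[0])
    stability = pd.Series(picks).value_counts(normalize=True)
    return k_star, p_value, stability
```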

26 Adaptive Signature Design
An adaptive design for generating and prospectively testing a gene expression signature for sensitive patients
Boris Freidlin and Richard Simon
Clinical Cancer Research 11:7872-8, 2005

27 Adaptive Signature Design: End of Trial Analysis
- Compare E to C for all patients at significance level 0.04
  - If the overall H0 is rejected, then claim effectiveness of E for eligible patients
  - Otherwise:

28 Otherwise:
- Using only the first half of patients accrued during the trial, develop a binary classifier that predicts the subset of patients most likely to benefit from the new treatment E compared to control C
- Compare E to C for the patients accrued in the second stage who are predicted responsive to E based on the classifier
  - Perform the test at significance level 0.01
  - If H0 is rejected, claim effectiveness of E for the subset defined by the classifier

29 Treatment effect restricted to subset. 10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.
Test | Power
Overall .05 level test | 46.7
Overall .04 level test | 43.1
Sensitive subset .01 level test (performed only when overall .04 level test is negative) | 42.2
Overall adaptive signature design | 85.3

30 Overall treatment effect, no subset effect. 10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.
Test | Power
Overall .05 level test | 74.2
Overall .04 level test | 70.9
Sensitive subset .01 level test | 1.0
Overall adaptive signature design | 70.9

31 True Model

32 Classifier Development
- Using data from stage 1 patients, fit all single-gene logistic models of response that include treatment, gene expression, and a treatment-by-gene interaction term (j = 1, …, M)
- Select the genes whose treatment-by-gene interaction is significant at a pre-specified level

33 Classification of Stage 2 Patients
For the i'th stage 2 patient, selected gene j votes to classify the patient as preferentially sensitive to T if the model fitted for gene j predicts a sufficiently large benefit from T at that patient's expression level

34 Classification of Stage 2 Patients Classify i’th stage 2 patient as differentially sensitive to T relative to C if at least G selected genes vote for differential sensitivity of that patient
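A minimal sketch of the stage-1 gene selection and stage-2 voting rule of slides 32-34, assuming a binary response vector resp1, treatment indicator treat1, and expression matrices X1 (stage 1) and X2 (stage 2). Because the exact voting condition is not reproduced in the transcript, a simple "fitted model predicts a positive treatment benefit" rule and the cutoffs alpha and G are illustrative stand-ins for the published criteria.

```python
import numpy as np
import statsmodels.api as sm

def select_genes(X1, treat1, resp1, alpha=0.001):
    """Stage 1: fit a logistic model per gene with a treatment-by-gene interaction;
    keep genes whose interaction term is significant at level alpha."""
    selected = {}
    for j in range(X1.shape[1]):
        design = sm.add_constant(
            np.column_stack([treat1, X1[:, j], treat1 * X1[:, j]]))
        try:
            fit = sm.Logit(resp1, design).fit(disp=0)
        except Exception:            # skip genes where the model does not converge
            continue
        if fit.pvalues[3] < alpha:   # p-value of the interaction coefficient
            selected[j] = fit.params
    return selected                  # gene index -> (intercept, treat, gene, interaction)

def classify_stage2(X2, selected, G=3):
    """Stage 2: each selected gene votes 'sensitive' if its fitted model predicts a
    treatment benefit for the patient; classify as sensitive with at least G votes."""
    votes = np.zeros(X2.shape[0], dtype=int)
    for j, (b0, b_t, b_g, b_tg) in selected.items():
        benefit = b_t + b_tg * X2[:, j]     # log-odds difference, T minus C
        votes += (benefit > 0).astype(int)  # illustrative stand-in for the paper's rule
    return votes >= G
```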

35 Empirical Power (response rate for control patients: 25%)
Response rate in sensitive subset | Overall .05 | Overall .04 | Subset .01 | Overall adaptive
98% | | | |
% | | | |
% | | | |
% | | | |
% | | | |

36 Adaptive Signature Design for Clinical Trial of Advanced Prostate Cancer
Richard Simon, D.Sc.
Chief, Biometric Research Branch, National Cancer Institute
http://brb.nci.nih.gov

37 Cancers of a primary site often represent a heterogeneous group of diverse molecular diseases which vary fundamentally with regard to
- the oncogenic mutations that cause them
- their responsiveness to specific drugs

38 How can we develop new drugs in a manner more consistent with modern tumor biology and obtain reliable information about what regimens work for what kinds of patients?

39 Developing a drug with a companion test increases complexity and cost of development but should improve chance of success and has substantial benefits for patients and for the economics of medical care

40 Although the randomized clinical trial remains of fundamental importance for predictive genomic medicine, some of the conventional wisdom about how to design and analyze RCTs requires re-examination.
The concept of conducting an RCT of thousands of patients to answer a single question about the average treatment effect, for a target population presumed homogeneous with regard to the direction of treatment efficacy, in many cases no longer has an adequate scientific basis.

41 Predictive biomarkers
- Measured before treatment to identify who will benefit from a particular treatment

42 Prospective Co-Development of Drugs and Companion Diagnostics in Ideal Settings
1. Develop a completely specified classifier identifying the patients most likely to benefit from a new drug, based on biology, pre-clinical data and phase I-II studies
2. Establish analytical validity of the classifier
3. Design and analyze a focused clinical trial to evaluate effectiveness of the new treatment and how it relates to the classifier

43 Cancer biology is complex and it is not always possible to have the right single completely defined predictive classifier identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual
- Adaptive methods for the refinement and evaluation of predictive biomarkers in the pivotal trials in a non-exploratory manner
- Use of archived tissues in focused "prospective-retrospective" designs based on previously conducted randomized pivotal trials
Simon, Paik, Hayes; JNCI 101:1-7, 2009

44 Adaptive Signature Design
Boris Freidlin and Richard Simon
Clinical Cancer Research 11:7872-8, 2005

45 Adaptive Signature Design: End of Trial Analysis
- Compare X to C for all patients at significance level 0.01
  - If the overall H0 is rejected, then claim effectiveness of X for eligible patients
  - Otherwise compare X to C in an adaptively defined subset of patients using a threshold of statistical significance of 0.04

46 Divide the patients randomly into a training set T and a validation set V. The training set will contain one-third of the patients.
- Using the biomarker information, treatment and outcome for the patients in T, develop a binary classifier that identifies the subset of patients who appear most likely to benefit from the new treatment X compared to control C
  - f(B1,B2,B3,B4) = log hazard ratio of death for X relative to C as a function of biomarker values
  - If f(B1,B2,B3,B4)/se ≥ c then Classifier(B1,B2,B3,B4) = C; otherwise Classifier(B1,B2,B3,B4) = X
  - Cutpoint c optimized

47 Use the classifier developed in training set T to classify the patients in the validation set V. Let V_X denote the subset of patients in V who are classified as likely to benefit from X.
- Compare survival of patients who received X to survival of those who received C among the patients in V_X
  - If the difference in survival is significant at level 0.04, then the new treatment is more effective than the control for patients with biomarker values for which Classifier(B1,B2,B3,B4) = X.

48 This approach can also be used to identify the subset of patients who do not benefit from X in cases where X is superior to C overall at the 0.01 level. The patients in V_C = V − V_X are predicted not to benefit from X. Survival under X vs C can be examined for patients in that subset and a confidence interval for the hazard ratio calculated.

49
- This design has improved statistical power for identifying treatments that benefit a subset of patients in molecularly heterogeneous diseases
- It has greater specificity than the standard approach, which results in over-treatment of vast numbers of patients with approved drugs that do not benefit them

50 Sample Size Planning for Advanced Prostate Cancer Trial
- Survival endpoint
- Final analysis when there are 700 deaths total
  - 90% power for detecting a 25% overall reduction in hazard at two-sided 0.01 significance level (increase in median from 9 months to 12 months)
- Power for evaluating treatment in adaptively determined subset
  - 157 deaths required for 80% power to detect a 37% reduction in hazard at two-sided 0.04 significance level
  - If one-third of patients in the validation set are classifier positive, then to have 157 deaths in the subset we need 157 × 3 = 471 deaths in the validation set. Since the validation set is two-thirds of the total, we require about 707 total deaths.
  - To have 700 deaths at the final analysis, 935 patients will be accrued and followed until the event rate is 75%
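A quick check of these event targets using the standard Schoenfeld-type approximation for a 1:1 randomized log-rank comparison; the protocol's exact formula is not given, so small discrepancies from the slide's 700 are expected.

```python
from math import log
from scipy.stats import norm

def events_needed(hazard_ratio, alpha_two_sided, power):
    """Schoenfeld-type approximation: events for a 1:1 randomized log-rank comparison."""
    z = norm.ppf(1 - alpha_two_sided / 2) + norm.ppf(power)
    return 4 * z ** 2 / log(hazard_ratio) ** 2

# Subset analysis: 37% hazard reduction (HR 0.63), two-sided 0.04, 80% power
print(round(events_needed(0.63, 0.04, 0.80)))   # about 157 events
# Overall analysis: 25% hazard reduction (HR 0.75), two-sided 0.01, 90% power
print(round(events_needed(0.75, 0.01, 0.90)))   # about 720 events, in the ballpark of the 700 planned
```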

51 Sample Size Planning
For this example, the sample size is strongly dependent on having high statistical power for detecting a relatively modest treatment effect, both overall and in an adaptively defined subset consisting of only 33% of the patients. The number of required patients can be substantially reduced by
- Targeting larger treatment effects
- Targeting treatment benefits that apply to more than 33% of the patients
- Refining the simple interim analysis for futility described for this example

52
- Tumor specimen at entry as a condition for eligibility
- Specimen preserved for later assay
- Assays will be performed prior to analysis using analytically validated tests
  - Reproducible, robust and accurate for use with archived tissue
  - No cut-point required
  - Additional markers could be included prior to using specimens

53 Interim Futility Analysis
- Interim futility analysis conducted when approximately 340 patients have been followed for 6 months after randomization
- The analysis will use 6-month progression-free survival as an intermediate endpoint. If the difference between the X group and the C group is not significant at the one-sided 0.20 level, then accrual will be terminated
- Power 90% for detecting a 12 percentage point increase in the proportion free of recurrence at 6 months, from a baseline of 40%
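A rough check of the stated interim power, assuming roughly 170 patients per arm at the interim look and a simple one-sided two-proportion z-test; both the per-arm count and the test are illustrative assumptions, not the protocol's exact calculation.

```python
from math import sqrt
from scipy.stats import norm

def power_two_proportions(p_control, p_treat, n_per_arm, alpha_one_sided):
    """Approximate power of a one-sided two-sample test for proportions."""
    p_bar = (p_control + p_treat) / 2
    se0 = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)                # SE under the null
    se1 = sqrt(p_control * (1 - p_control) / n_per_arm
               + p_treat * (1 - p_treat) / n_per_arm)              # SE under the alternative
    z_crit = norm.ppf(1 - alpha_one_sided)
    return norm.sf((z_crit * se0 - (p_treat - p_control)) / se1)

# 40% vs 52% six-month PFS, ~170 per arm, one-sided 0.20: power comes out near 0.9
print(round(power_two_proportions(0.40, 0.52, 170, 0.20), 2))
```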

54 Interim Futility Analysis
- The interim futility analysis does not use any of the 5% type I error of the study
- Using 6-month PFS as the endpoint for the interim futility analysis does not assume that PFS is a valid surrogate for survival; only that it is plausible not to expect a survival benefit if there is no PFS benefit
- Using PFS enables the trial to be stopped earlier if there is no evidence of benefit for X
- The one-sided 0.20 significance level is used because the overall effect may be weak if the treatment benefits only a 33% subset of the patients

55 If the Markers Were Measured at Randomization
- Analytically validated tests would be required by the start of accrual
- The interim analysis could involve marker-defined subsets of patients
- Restricting accrual based on interim evaluation of marker-specific treatment effects could substantially reduce sample size but would introduce additional issues not addressed in the current design

56 Key Features
- Trial-wise type I error limited to 0.05
  - Chance of any false positive conclusion of treatment benefit limited to 0.05
- Randomized treatment assignment
- Regulatory endpoint
- Sample size sufficient for evaluating the treatment effect in a 33% subset
- Biomarkers measured using analytically validated tests
- Analysis algorithm pre-defined, and specific analysis plan defined prior to any assaying of tumors or data analysis

57 This approach is as sound statistically as the conventional one-treatment-fits-all design
- It provides strong evidence for evaluating the new treatment overall and within the classifier-positive subset, and for evaluating the classifier
- In settings where a single conventional "average effect" trial would be the basis for drug approval, this design should be the basis for approval either overall or for the identified subset

58 This approach is more science-based and more consistent with tumor biology than the standard approach of treating thousands of patients with a heterogeneous disease to answer one question of whether the average treatment effect is zero, and then treating everyone in a one-treatment-fits-all manner.

59 Cross-Validated Adaptive Signature Design
Wenyu Jiang, Boris Freidlin, Richard Simon
Clin Ca Res 16:691-8, 2010

60 Cross-Validated Adaptive Signature Design: End of Trial Analysis
- Compare T to C for all patients at significance level α_overall
  - If the overall H0 is rejected, then claim effectiveness of T for eligible patients
  - Otherwise:

61 Otherwise:
- Partition the full data set into K parts
- Form a training set by omitting one of the K parts; the omitted part is the test set
  - Using the training set, develop a predictive classifier of the subset of patients who benefit preferentially from the new treatment T compared to control C, using the methods developed for the ASD
  - Classify the patients in the test set as either sensitive or not sensitive to T relative to C
- Repeat this procedure K times, leaving out a different part each time
  - After this is completed, all patients in the full dataset are classified as sensitive or insensitive

62
- Compare T to C for the sensitive patients by computing a test statistic S, e.g. the difference in response proportions or the log-rank statistic (for survival)
- Generate the null distribution of S by permuting the treatment labels and repeating the entire K-fold cross-validation procedure
- Perform the test at significance level α_overall
- If H0 is rejected, claim effectiveness of T for the subset defined by the classifier
  - The sensitive subset is determined by developing a classifier using the full dataset
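A schematic of the cross-validated test, assuming a DataFrame with a treat column, a user-supplied develop_classifier(train) that returns a function mapping patients to a sensitive/insensitive label, and a stat(subset) function such as the T-minus-C difference in response rates. All of these interfaces are illustrative; the key point is that the full K-fold cross-validation is redone inside every permutation.

```python
import numpy as np
import pandas as pd

def cross_validated_sensitive(df, develop_classifier, k=10, rng=None):
    """Label every patient sensitive/insensitive by K-fold pre-validation:
    each patient is classified by a model built without his or her own data."""
    rng = rng or np.random.default_rng(0)
    folds = np.array_split(rng.permutation(df.index.values), k)
    sensitive = pd.Series(False, index=df.index)
    for fold in folds:
        train = df.drop(index=fold)
        clf = develop_classifier(train)        # e.g. the ASD gene-voting classifier
        sensitive.loc[fold] = clf(df.loc[fold])
    return sensitive

def cv_asd_test(df, develop_classifier, stat, k=10, n_perm=500, seed=1):
    """Permutation test: the whole cross-validation is repeated for each permuted dataset."""
    rng = np.random.default_rng(seed)
    sens = cross_validated_sensitive(df, develop_classifier, k, rng)
    observed = stat(df[sens])                  # e.g. difference in response rates, T vs C
    null = []
    for _ in range(n_perm):
        perm = df.assign(treat=rng.permutation(df["treat"].values))
        perm_sens = cross_validated_sensitive(perm, develop_classifier, k, rng)
        null.append(stat(perm[perm_sens]))
    return observed, np.mean(np.array(null) >= observed)
```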

63 Scenario: 80% response to T in sensitive patients; 25% response to T otherwise; 25% response to C; 10% of patients sensitive
Test | ASD | CV-ASD
Overall 0.05 test | |
Overall 0.04 test | |
Sensitive subset 0.01 test | |
Overall power | |

64 Scenario: 70% response to T in sensitive patients; 25% response to T otherwise; 25% response to C; 20% of patients sensitive
Test | ASD | CV-ASD
Overall 0.05 test | |
Overall 0.04 test | |
Sensitive subset 0.01 test | |
Overall power | |

65 Scenario: 70% response to T in sensitive patients; 25% response to T otherwise; 25% response to C; 30% of patients sensitive
Test | ASD | CV-ASD
Overall 0.05 test | |
Overall 0.04 test | |
Sensitive subset 0.01 test | |
Overall power | |

66 Scenario: 35% response to T; 25% response to C; no subset effect
Test | ASD | CV-ASD
Overall 0.05 test | |
Overall 0.04 test | |
Sensitive subset 0.01 test | |
Overall power | |

67 Scenario: 25% response to T; 25% response to C; no subset effect
Test | ASD | CV-ASD
Overall 0.05 test | |
Overall 0.04 test | |
Sensitive subset 0.01 test | |
Overall power | |

68 Predictive Analysis of Clinical Trials
Using cross-validation we can evaluate our methods for analysis of clinical trials, including complex subset analysis algorithms, in terms of their effect on improving patient outcome via informing therapeutic decision making.
R. Simon, Clinical trials for predictive medicine, Clinical Trials 2010:1-9

69
- Define an algorithm for predicting the optimal treatment as a function of covariate vector x using training dataset D
- For patients with covariate vector x, the algorithm predicts the preferred treatment R(x | D) = T or R(x | D) = C

70
- At the conclusion of the trial, randomly partition the patients into 10 equally sized sets P1, …, P10
- Let D-i denote the full dataset minus the data for patients in Pi
- Using 10-fold complete cross-validation, omit the patients in Pi
  - Apply the defined algorithm to analyze the trial using only the data in D-i
  - For each patient j in Pi, record the treatment recommendation based on fitting the algorithm to data D-i, i.e. Rj = T or Rj = C

71
- Repeat the above for all 10 loops of the cross-validation
- When all 10 loops are completed, every patient has been classified according to which treatment is predicted to be optimal for him or her

72 Test of Significance for Effectiveness of T vs C Using the New Algorithm
- Let A denote the set of patients for whom treatment T is predicted optimal with the new algorithm, i.e. Rj = T
- Compare outcomes for patients in A who actually received T to those in A who actually received C
  - Let z = standardized log-rank statistic
- Compute the statistical significance of z by randomly permuting treatment labels and repeating the entire procedure
  - Do this 1000 or more times to generate the permutation null distribution of the treatment effect for the patients predicted to be the best candidates for T

73
- The significance test based on comparing T vs C for the adaptively defined subset is the basis for demonstrating that T is more effective than C for some patients.
  - This test may be more powerful than the standard overall test in cases where the proportion of patients who benefit from T is limited.
- Although there is less certainty about which patients actually benefit, prediction accuracy may be substantially greater than for the standard method based on a single null hypothesis test, with greater specificity for identifying the right patients.

74 506 prostate cancer patients were randomly allocated to one of four arms:
- Placebo and 0.2 mg of diethylstilbestrol (DES) were combined as control arm C
- 1.0 mg DES and 5.0 mg DES were combined as E
The end-point was overall survival (death from any cause).
Covariates:
- Age: in years
- Performance status (pf): not bed-ridden at all vs other
- Tumor size (sz): size of the primary tumor (cm2)
- Index of a combination of tumor stage and histologic grade (sg)
- Serum prostatic acid phosphatase level (ap)

75
- After removing records with missing observations in any of the covariates, 485 observations remained.
- A proportional hazards regression model was developed using patients in both E and C groups. The main effect of treatment, main effects of covariates and treatment-by-covariate interactions were considered.
  - log[HR(z,x)] = a z + b'x + z c'x, where z = 0,1 is the treatment indicator (z = 0 for control) and x is the vector of covariates
  - log[HR(1,x)] − log[HR(0,x)] = a + c'x
- Define the classifier C(x) = 1 if a + c'x < c, and 0 otherwise
  - c was fixed to be the median of the a + c'x values in the training set.
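A minimal sketch of fitting this interaction model and forming the classifier with the lifelines package, assuming a DataFrame with columns time, event, treat and the covariates of slide 74 coded numerically; the column names, covariate coding and train/test handling are illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

COVARIATES = ["age", "pf", "sz", "sg", "ap"]

def fit_interaction_model(train):
    """Cox model with treatment main effect, covariate main effects and
    treatment-by-covariate interactions: log HR(z, x) = a z + b'x + z c'x."""
    df = train.copy()
    for v in COVARIATES:
        df[f"treat_x_{v}"] = df["treat"] * df[v]
    cph = CoxPHFitter()
    cph.fit(df, duration_col="time", event_col="event")
    return cph

def predicted_benefit(cph, patients):
    """a + c'x: estimated log hazard ratio for the new treatment relative to control."""
    params = cph.params_
    return params["treat"] + sum(params[f"treat_x_{v}"] * patients[v] for v in COVARIATES)

def build_classifier(cph, train):
    """C(x) = 1 (predicted to benefit) if a + c'x is below the training-set median."""
    cutoff = predicted_benefit(cph, train).median()
    return lambda patients: predicted_benefit(cph, patients) < cutoff
```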

76 Figure 1: Overall analysis. The value of the log-rank statistic is 2.9, which is not statistically significant; the new treatment thus shows no benefit overall at the 0.05 level.

77 Figure 2: Cross-validated survival curves for patients predicted to benefit from the new treatment. Log-rank statistic = 10.0; permutation p-value = .002.

78 Figure 3: Survival curves for cases predicted not to benefit from the new treatment. The value of the log-rank statistic is 0.54.

79 Proportional Hazards Model Fitted to Full Dataset
Term | coef | p-value
Treatment | |
age | |
pf(Normal.Activity) | |
sz | |
sg | |
ap | |
Treatment*age | |
Treatment*pf(Normal.Activity) | |
Treatment*sz | |
Treatment*sg | |
Treatment*ap | |

80 By applying the analysis algorithm to the full RCT dataset D, recommendations are developed for how future patients should be treated; i.e. R(x|D) for all x vectors. The stability of the recommendations can be evaluated based on the distribution of R(x|D(b)) for non-parametric bootstrap samples D(b) from the full dataset D.
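A small sketch of that bootstrap stability check, assuming analyze_trial(data) re-runs the full analysis algorithm and returns a recommendation rule R(x | data) that maps a table of covariate vectors to True when T is recommended; the interface and names are illustrative.

```python
import numpy as np
import pandas as pd

def recommendation_stability(df, analyze_trial, new_patients, n_boot=200, seed=0):
    """Estimate, for each covariate vector x, how often bootstrap-refitted analyses
    recommend the new treatment, i.e. the distribution of R(x | D(b))."""
    rng = np.random.default_rng(seed)
    favors_T = np.zeros(len(new_patients))
    for _ in range(n_boot):
        boot = df.sample(len(df), replace=True, random_state=int(rng.integers(1 << 31)))
        rule = analyze_trial(boot)             # rule(x-table) -> boolean array, True = recommend T
        favors_T += np.asarray(rule(new_patients), dtype=float)
    return pd.Series(favors_T / n_boot, index=new_patients.index,
                     name="prop_bootstrap_favoring_T")
```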

81

82

83

84

85

86 Characteristics of Patients for Whom Classifications are Stable
Covariate | <20% of classifiers favor X (Median, IQR) | >80% of classifiers favor X (Median, IQR)
Age | |
Size | 7817 |
Stage | 92114 |
Acid Phos | |
Perf Status | 144 good, 31 poor | 166 good, 2 poor

87 Standard Analysis Algorithm
- Test the overall H0
- If you reject H0, then treat all future patients with T; otherwise treat all future patients with C

88 Expected K-Year DFS Using Standard Analysis
- If the overall null hypothesis is not rejected, the expected K-year DFS is the observed K-year DFS in the control group
- If the overall null hypothesis is rejected, the expected K-year DFS is the observed K-year DFS in the T group

89 Expected K-Year DFS Using New Algorithm
- Let S(T) = observed K-year DFS for patients j for whom Rj = T and who received treatment T (m_T such patients)
- Let S(C) = observed K-year DFS for patients j for whom Rj = C and who received treatment C (m_C such patients)
- Expected K-year DFS using the new algorithm: {m_T S(T) + m_C S(C)} / {m_T + m_C}
- Confidence limits for this estimate can be obtained by bootstrapping the complete cross-validation procedure
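A compact illustration of that weighted estimate, assuming each patient carries a recommended column produced by the cross-validation and a dfs(subset) helper that returns the observed K-year DFS proportion; both names are illustrative.

```python
def expected_dfs_new_algorithm(df, dfs):
    """{m_T S(T) + m_C S(C)} / (m_T + m_C): average K-year DFS of patients who actually
    received the treatment that the cross-validated algorithm recommends for them."""
    concordant_T = df[(df["recommended"] == "T") & (df["treat"] == 1)]
    concordant_C = df[(df["recommended"] == "C") & (df["treat"] == 0)]
    m_T, m_C = len(concordant_T), len(concordant_C)
    return (m_T * dfs(concordant_T) + m_C * dfs(concordant_C)) / (m_T + m_C)
```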

90 Hence, alternative methods for analyzing RCTs can be evaluated in an unbiased manner, with regard to their value to patients, using the actual RCT data

