Use of Candidate Predictive Biomarkers in the Design of Phase III Clinical Trials
Richard Simon, D.Sc.
Chief, Biometric Research Branch
National Cancer Institute
http://brb.nci.nih.gov

Predictive biomarkers
- Measured before treatment to identify who is likely or unlikely to benefit from a particular treatment
- ER, HER2, KRAS, EGFR

Biomarker Validity
- Analytical validity
  - Measures what it's supposed to
  - Reproducible and robust
- Clinical validity (correlation)
  - It correlates with something clinically
- Medical utility
  - Actionable resulting in patient benefit

Developing a drug with a companion test increases the complexity and cost of development, but it should improve the chance of success and has substantial benefits for patients and for the economics of health care.
How can we do it in a way that provides the kind of reliable answers we expect from phase III trials?

When the Biology is Clear
1. Develop a completely specified classifier of the patients likely (or unlikely) to benefit from a new drug
   - Classifier is based on either a single gene/protein or composite score
2. Develop an analytically validated test
3. Design a focused clinical trial to evaluate effectiveness of the new treatment and how it relates to the test

Targeted (Enrichment) Design
- Using phase II data, develop predictor of response to new drug
- Patient predicted responsive: randomize to new drug vs control
- Patient predicted non-responsive: off study

Evaluating the Efficiency of Targeted Design
- Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006.
- Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.

Relative efficiency of targeted design depends on
- proportion of patients test positive
- effectiveness of new drug (compared to control) for test negative patients
When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients than the standard design in which the marker is not used.

Comparing T vs C on Survival or DFS
5% 2-sided Significance and 90% Power

% Reduction in Hazard    Number of Events Required
25%                      509
30%                      332
35%                      227
40%                      162
45%                      118
50%                      88
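
The event counts above are close to what the standard Schoenfeld approximation for a two-sided log-rank test with 1:1 randomization gives. The slides do not say which formula was used, so the sketch below is an assumption, and `events_required` is a hypothetical helper name. It matches the tabulated values for reductions of 35% and larger and comes within an event or two for 25% and 30%.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha=0.05, power=0.90):
    """Approximate number of events for a two-sided log-rank test, 1:1 allocation.

    Schoenfeld approximation: d = 4 * (z_{1-alpha/2} + z_{power})^2 / ln(HR)^2.
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    return ceil(4 * z_total ** 2 / log(hazard_ratio) ** 2)


for reduction in (25, 30, 35, 40, 45, 50):
    hr = 1 - reduction / 100  # e.g. a 40% reduction in hazard is HR = 0.60
    print(f"{reduction}% reduction in hazard: about {events_required(hr)} events")
```

Note that the required number depends only on events, not on patients; the number of patients needed follows from the event rate and follow-up time.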

- Hazard ratio 0.60 for test + patients
  - 40% reduction in hazard
- Hazard ratio 1.0 for test – patients
  - 0% reduction in hazard
- 33% of patients test positive
- Hazard ratio for unselected population is approximately
  - 0.33*0.60 + 0.67*1 = 0.87
  - 13% reduction in hazard

- To have 90% power for detecting 40% reduction in hazard within a biomarker positive subset
  - Number of events within subset = 162
- To have 90% power for detecting 13% reduction in hazard overall
  - Number of events = 2172
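
The dilution argument can be checked numerically. The sketch below assumes the Schoenfeld approximation (not stated on the slides) and, like the slide, takes the unselected hazard ratio to be the prevalence-weighted average of the two subset hazard ratios. All names are hypothetical. The approximation gives roughly 2,170 events for the unselected comparison, in line with the 2172 quoted above.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha=0.05, power=0.90):
    """Schoenfeld approximation for a two-sided log-rank test, 1:1 allocation."""
    z = NormalDist().inv_cdf
    return ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hazard_ratio) ** 2)


prevalence = 0.33                     # fraction of patients who test positive
hr_positive, hr_negative = 0.60, 1.0  # hazard ratios from the example above

# Weighted average of the subset hazard ratios, rounded as on the slide
hr_overall = round(prevalence * hr_positive + (1 - prevalence) * hr_negative, 2)

events_targeted = events_required(hr_positive)    # test-positive patients only
events_untargeted = events_required(hr_overall)   # unselected population

print(f"overall hazard ratio: {hr_overall}")
print(f"events needed, targeted design: {events_targeted}")
print(f"events needed, untargeted design: {events_untargeted}")
```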

Stratification Design
- Develop predictor of response to new Rx
- Predicted responsive to new Rx: randomize to new Rx vs control
- Predicted non-responsive to new Rx: randomize to new Rx vs control

- Develop prospective analysis plan for evaluation of treatment effect and how it relates to biomarker
  - Type I error should be protected for multiple comparisons
  - Trial sized for evaluating treatment effect overall and in subsets defined by test
- "Stratifying" (balancing) the randomization is useful to ensure that all randomized patients have the test performed, but it is not necessary for the validity of comparing treatments within marker defined subsets
- Post-stratification provides more time for development of analytically validated tests but risks validity of the results if adequate specimens are not collected in nearly 100% of cases

Fallback Analysis Plan
- Compare the new drug to the control overall for all patients ignoring the classifier.
  - If p overall ≤ 0.01 claim effectiveness for the eligible population as a whole
- Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients
  - If p subset ≤ 0.04 claim effectiveness for the classifier + patients.
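
The plan above is a two-step decision rule that splits the overall 5% type I error into 1% for the overall test and 4% for the subset test. A minimal sketch follows; the function name and the returned strings are hypothetical.

```python
def fallback_analysis(p_overall, p_subset, alpha_overall=0.01, alpha_subset=0.04):
    """Apply the fallback plan to two p-values and return the resulting claim."""
    if p_overall <= alpha_overall:
        return "effective for the eligible population as a whole"
    # The subset test is performed only if the overall test is not significant
    if p_subset <= alpha_subset:
        return "effective for classifier-positive patients"
    return "no claim of effectiveness"


print(fallback_analysis(p_overall=0.004, p_subset=0.20))
print(fallback_analysis(p_overall=0.09, p_subset=0.002))
print(fallback_analysis(p_overall=0.09, p_subset=0.06))
```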

Sample size for Analysis Plan
- To have 90% power for detecting uniform 33% reduction in overall hazard at 1% two-sided level requires 370 events.
- If 33% of patients are positive, then when there are 370 total events there will be approximately 123 events in positive patients
- 123 events provides 90% power for detecting a 45% reduction in hazard at a 4% two-sided significance level.
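
These power statements can be checked approximately by inverting the Schoenfeld formula to give power as a function of the number of events. This is a sketch under that assumed approximation, with a hypothetical function name; both computed values come out close to 0.90.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(events, hazard_ratio, alpha):
    """Approximate power of a two-sided log-rank test with 1:1 allocation."""
    nd = NormalDist()
    drift = sqrt(events) / 2 * abs(log(hazard_ratio))
    return nd.cdf(drift - nd.inv_cdf(1 - alpha / 2))


total_events = 370
positive_events = round(total_events / 3)  # 33% prevalence treated as one third

power_overall = logrank_power(total_events, hazard_ratio=0.67, alpha=0.01)
power_subset = logrank_power(positive_events, hazard_ratio=0.55, alpha=0.04)

print(f"overall test, {total_events} events: power = {power_overall:.3f}")
print(f"subset test, {positive_events} events: power = {power_subset:.3f}")
```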

- To detect a 40% reduction in hazard in an a-priori defined subset with 90% power and a 5% significance level requires 162 events in the subset.
- To detect a 40% reduction in hazard in an a-priori defined subset with 90% power and a 4% two-sided significance level requires 171 events in the subset.
- If the prevalence of the marker is 33%, then the trial might be sized for 3*171 = 513 total events.
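
The effect of moving the subset test from the 5% to the 4% level can be reproduced with the same assumed Schoenfeld approximation (the slides do not name the formula; `events_required` is a hypothetical helper).

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha, power=0.90):
    """Schoenfeld approximation for a two-sided log-rank test, 1:1 allocation."""
    z = NormalDist().inv_cdf
    return ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hazard_ratio) ** 2)


subset_events_5pct = events_required(0.60, alpha=0.05)
subset_events_4pct = events_required(0.60, alpha=0.04)

# With about one third of patients marker positive, roughly one third of all
# events fall in the subset, so the trial total is three times the subset need.
total_events = 3 * subset_events_4pct

print(subset_events_5pct, subset_events_4pct, total_events)
```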

- R Simon. Using genomics in clinical trial design, Clinical Cancer Research 14:5984-93, 2008
- R Simon. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics, Expert Opinion in Medical Diagnostics 2:721-29, 2008

Web Based Software for Planning Clinical Trials of Treatments with a Candidate Predictive Biomarker
http://brb.nci.nih.gov

The Biology is Often Not So Clear
- Cancer biology is complex and it is not always possible to have the right single completely defined predictive classifier identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual

K Candidate Biomarkers Design
Based on Adaptive Threshold Design
W Jiang, B Freidlin & R Simon. JNCI 99:1036-43, 2007

K Candidate Biomarkers Design
- Have identified K candidate binary classifiers B_1, …, B_K thought to be predictive of patients likely to benefit from T relative to C
- Eligibility not restricted by candidate markers

- Compare T vs C for all patients
  - If results are significant at level 0.01 claim broad effectiveness of T
  - Otherwise proceed as follows
- Compare T vs C for the subset of patients positive for marker 1; compute p_1
- Similarly compare T vs C for the subset of patients positive for marker 2 (p_2), positive for marker 3 (p_3), …, positive for marker K (p_K)
- Compute p* = min{p_1, p_2, …, p_K}
- Compute whether the value of p* is statistically significant when adjusted for multiple testing
- Adjust for multiple testing using permutation of treatment labels to adjust for correlation among tests
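
The permutation adjustment of p* can be sketched as follows. This is an illustration under loud assumptions, not the published procedure: a binary response and a two-proportion z-test stand in for the survival endpoint and log-rank test, the data are simulated, and every function name is hypothetical. The adjusted p-value is the fraction of label permutations whose minimum p-value is at least as small as the observed p*.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_sided_p(treated, control):
    """Two-proportion z-test p-value for two lists of 0/1 responses."""
    n1, n0 = len(treated), len(control)
    if n1 == 0 or n0 == 0:
        return 1.0
    pooled = (sum(treated) + sum(control)) / (n1 + n0)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0
    z = (sum(treated) / n1 - sum(control) / n0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def min_p(arms, responses, markers):
    """p* = smallest T vs C p-value over the K marker-positive subsets."""
    p_values = []
    for k in range(len(markers[0])):
        treated = [r for a, r, m in zip(arms, responses, markers) if m[k] and a == "T"]
        control = [r for a, r, m in zip(arms, responses, markers) if m[k] and a == "C"]
        p_values.append(two_sided_p(treated, control))
    return min(p_values)


def adjusted_min_p(arms, responses, markers, n_perm=500, seed=0):
    """Permutation-adjusted significance of p*; shuffles treatment labels only."""
    rng = random.Random(seed)
    observed = min_p(arms, responses, markers)
    shuffled = list(arms)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if min_p(shuffled, responses, markers) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Simulated trial: 4 candidate markers, only marker 0 is truly predictive
sim = random.Random(1)
arms, responses, markers = [], [], []
for _ in range(400):
    arm = sim.choice("TC")
    m = [sim.random() < 0.33 for _ in range(4)]
    p_response = 0.70 if (arm == "T" and m[0]) else 0.25
    arms.append(arm)
    markers.append(m)
    responses.append(int(sim.random() < p_response))

p_star, p_adjusted = adjusted_min_p(arms, responses, markers)
print(f"p* = {p_star:.2g}, permutation-adjusted p = {p_adjusted:.3f}")
```

Because marker status and outcomes stay fixed while only treatment labels are shuffled, the correlation among the K subset tests is preserved in the null distribution.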

- To detect a 40% reduction in hazard in an a-priori defined subset with 90% power and a 4% two-sided significance level requires 171 events in the subset.
- If the prevalence of the marker is 33%, then the trial might be sized for 3*171 = 513 total events.
- To adjust for multiplicity with 4 independent tests, the subset requirement rises from 171 to 224 events and the total from 513 to 672 events.

Designs When there are Many Candidate Markers and too Much Patient Heterogeneity for any Single Marker

Adaptive Signature Design
Boris Freidlin and Richard Simon. Clinical Cancer Research 11:7872-8, 2005

Biomarker Adaptive Signature Design
- Randomized trial of T vs C
- Large number of candidate predictive biomarkers available
- Eligibility not restricted by any biomarker
- This approach can be used with any set of candidate markers

End of Trial Analysis
Fallback Analysis
- Compare T to C for all patients at significance level α_0 (e.g. 0.01)
- If overall H_0 is rejected, then claim effectiveness of T for eligible patients
- Otherwise proceed as follows

- Using only a randomly selected subset of patients of pre-specified size (e.g. 1/3) as a training set T (not the treatment T), develop a binary classifier M of whether a patient is likely to benefit from T relative to C
- The classifier may use multiple markers
- The classifier classifies patients into only 2 subsets: those predicted to benefit from T and those for whom T is not predicted better than C

- Apply the classifier M to classify patients in the validation set V = D - T, where D is the full dataset and T is the training set
- Compare T vs C in the subset of V who are predicted to benefit from T using a threshold of significance of 0.04
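
The train/validate flow of the adaptive signature design can be sketched as below. Several simplifications are assumptions for illustration and are not the published method: a binary response and two-proportion z-test replace the survival analysis, the "classifier" is a hypothetical rule that picks the single marker with the largest training-set treatment difference, and the data are simulated.

```python
import random
from math import sqrt
from statistics import NormalDist


def response_rate(rows, arm):
    outcomes = [r["response"] for r in rows if r["arm"] == arm]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def treatment_p_value(rows):
    """Two-sided two-proportion z-test of T vs C response rates."""
    t = [r["response"] for r in rows if r["arm"] == "T"]
    c = [r["response"] for r in rows if r["arm"] == "C"]
    if not t or not c:
        return 1.0
    pooled = (sum(t) + sum(c)) / (len(t) + len(c))
    se = sqrt(pooled * (1 - pooled) * (1 / len(t) + 1 / len(c)))
    if se == 0:
        return 1.0
    z = (sum(t) / len(t) - sum(c) / len(c)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def train_classifier(training, n_markers):
    """Hypothetical rule: use the marker whose positives show the largest T - C gap."""
    def effect(k):
        positives = [r for r in training if r["markers"][k]]
        return response_rate(positives, "T") - response_rate(positives, "C")
    best = max(range(n_markers), key=effect)
    return lambda row: row["markers"][best]


def adaptive_signature_analysis(data, n_markers, train_fraction=1 / 3, seed=0):
    p_overall = treatment_p_value(data)
    if p_overall <= 0.01:                      # fallback test on all patients
        return "overall", p_overall
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    training, validation = shuffled[:n_train], shuffled[n_train:]
    classifier = train_classifier(training, n_markers)  # uses training set only
    predicted_benefit = [r for r in validation if classifier(r)]
    p_subset = treatment_p_value(predicted_benefit)
    return ("subset" if p_subset <= 0.04 else "none"), p_subset


# Simulated trial: 10 candidate markers, only marker 0 (10% prevalence) matters
sim = random.Random(3)
data = []
for _ in range(600):
    arm = sim.choice("TC")
    markers = [sim.random() < (0.10 if k == 0 else 0.30) for k in range(10)]
    p_response = 0.70 if (arm == "T" and markers[0]) else 0.25
    data.append({"arm": arm, "markers": markers, "response": int(sim.random() < p_response)})

decision, p_value = adaptive_signature_analysis(data, n_markers=10)
print(decision, round(p_value, 4))
```

The key property is that the subset test uses only validation patients, none of whom contributed to building the classifier.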

- This approach can also be used to identify the subset of patients who don't benefit from T in cases where T is superior to C overall at the 0.01 level.

Cross-Validated Adaptive Signature Design
Freidlin B, Jiang W, Simon R. Clinical Cancer Research 16(2), 2010

- At the conclusion of the trial randomly partition the patients into K approximately equally sized sets P_1, …, P_K
- Let D_{-i} denote the full dataset minus data for patients in P_i
- Omit patients in P_1
- Apply the defined algorithm to analyze the data in D_{-1} to obtain a classifier M_{-1}
- Classify each patient j in P_1 using model M_{-1}
- Record the treatment recommendation T or C

- Repeat the above for all K loops of the cross-validation
- All patients have been classified once as what their optimal treatment is predicted to be

- Let S_T denote the set of patients for whom treatment T is predicted optimal
- Compare outcomes for patients in S_T who actually received T to those in S_T who actually received C
- Compute Kaplan-Meier curves of those receiving T and those receiving C
- Let z_T = standardized log-rank statistic

Test of Significance for Effectiveness of T vs C
- Compute statistical significance of z_T by randomly permuting treatment labels and repeating the entire cross-validation procedure
- Do this 1000 or more times to generate the permutation null distribution of treatment effect for the patients in each subset
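
The cross-validation loop and its permutation test can be sketched together. As a loudly labeled simplification, a standardized difference in binary response rates stands in for the log-rank statistic z_T, the classifier-building algorithm is a hypothetical single-marker rule, the test is taken as one-sided in favor of T, and the data are simulated with fewer permutations than the 1000 or more recommended above.

```python
import random
from math import sqrt


def z_statistic(rows):
    """Standardized T - C difference in response rates (stand-in for z_T)."""
    t = [r["response"] for r in rows if r["arm"] == "T"]
    c = [r["response"] for r in rows if r["arm"] == "C"]
    if not t or not c:
        return 0.0
    pooled = (sum(t) + sum(c)) / (len(t) + len(c))
    se = sqrt(pooled * (1 - pooled) * (1 / len(t) + 1 / len(c)))
    return (sum(t) / len(t) - sum(c) / len(c)) / se if se > 0 else 0.0


def fit_classifier(training, n_markers):
    """Hypothetical algorithm: pick the marker with the largest z among its positives."""
    best = max(range(n_markers),
               key=lambda k: z_statistic([r for r in training if r["markers"][k]]))
    return lambda row: row["markers"][best]


def cross_validated_z(data, n_markers, n_folds, rng):
    """Classify each patient with a model fit without that patient's fold."""
    order = list(range(len(data)))
    rng.shuffle(order)
    s_t = []  # patients for whom T is predicted optimal
    for i in range(n_folds):
        held_out = set(order[i::n_folds])                       # P_i
        training = [data[j] for j in order if j not in held_out]  # D_{-i}
        model = fit_classifier(training, n_markers)               # M_{-i}
        s_t.extend(data[j] for j in held_out if model(data[j]))
    return z_statistic(s_t)


def permutation_p_value(data, n_markers, n_folds=5, n_perm=200, seed=0):
    """Repeat the entire cross-validation on permuted treatment labels."""
    rng = random.Random(seed)
    observed = cross_validated_z(data, n_markers, n_folds, rng)
    arms = [r["arm"] for r in data]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(arms)
        permuted = [dict(r, arm=a) for r, a in zip(data, arms)]
        if cross_validated_z(permuted, n_markers, n_folds, rng) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Simulated trial: 5 candidate markers, only marker 0 is truly predictive
sim = random.Random(2)
data = []
for _ in range(300):
    arm = sim.choice("TC")
    markers = [sim.random() < 0.30 for _ in range(5)]
    p_response = 0.70 if (arm == "T" and markers[0]) else 0.25
    data.append({"arm": arm, "markers": markers, "response": int(sim.random() < p_response)})

z_t, p_value = permutation_p_value(data, n_markers=5)
print(f"cross-validated z_T = {z_t:.2f}, permutation p = {p_value:.3f}")
```

Refitting the classifier inside every permutation is what keeps the test valid: the null distribution then reflects the same data-driven subset selection that produced the observed statistic.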

- By applying the analysis algorithm to the full RCT dataset D, recommendations are developed for how future patients should be treated

- The size of the T vs C treatment effect for the indicated population is (conservatively) estimated by the Kaplan-Meier survival curves of T and of C in S_T

- 70% response to T in sensitive patients
- 25% response to T otherwise
- 25% response to C
- 30% of patients sensitive

Test                          ASD      CV-ASD
Overall 0.05 test             0.830    0.838
Overall 0.04 test             0.794    0.808
Sensitive subset 0.01 test    0.306    0.723
Overall power                 0.825    0.918

506 prostate cancer patients were randomly allocated to one of four arms:
- Placebo and 0.2 mg of diethylstilbestrol (DES) were combined as control arm C
- 1.0 mg DES and 5.0 mg DES were combined as T
The end-point was overall survival (death from any cause).
Covariates:
- Age: in years
- Performance status (pf): not bed-ridden at all vs other
- Tumor size (sz): size of the primary tumor (cm²)
- Index of a combination of tumor stage and histologic grade (sg)
- Serum prostatic acid phosphatase levels (ap)

Figure 1: Overall analysis. The value of the log-rank statistic is 2.9 and the corresponding p-value is 0.09. The new treatment thus shows no statistically significant benefit overall at the 0.05 level.

Figure 2: Cross-validated survival curves for patients predicted to benefit from the new treatment. The value of the log-rank statistic is 10.0 and the permutation p-value is 0.002.

Figure 3: Survival curves for cases predicted not to benefit from the new treatment. The value of the log-rank statistic is 0.54.

Acknowledgements
- Boris Freidlin
- Yingdong Zhao
- Wenyu Jiang
- Aboubakar Maitournam
