1 Use of Genomics in Clinical Trial Design and How to Critically Evaluate Claims for Prognostic & Predictive Biomarkers Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov

2 BRB Website (brb.nci.nih.gov)
- PowerPoint presentations
- Reprints
- BRB-ArrayTools software
- Data archive
- Q/A message board
- Web-based sample size planning
  - Clinical trials
    - Optimal 2-stage phase II designs
    - Phase III designs using predictive biomarkers
    - Phase II/III designs
  - Development of gene expression based predictive classifiers

3 Different Kinds of Biomarkers
- Endpoint
  - Measured before, during and after treatment to monitor treatment effect
  - Surrogate of clinical endpoint
  - Pharmacodynamic
- Predictive biomarkers
  - Measured before treatment to identify who will benefit from a particular treatment
- Prognostic biomarkers
  - Measured before treatment to indicate long-term outcome for patients untreated or receiving standard treatment

4 Types of Validation for Prognostic and Predictive Biomarkers
- Analytical validation
  - Accuracy, reproducibility, robustness
- Clinical validation
  - Does the biomarker predict a clinical endpoint or phenotype?
- Clinical utility
  - Does use of the biomarker result in patient benefit?
    - By informing treatment decisions
    - Is it actionable?

5 Prognostic and Predictive Biomarkers in Oncology
- Single gene or protein measurement
- Scalar index or classifier that summarizes expression levels of multiple genes

6 Prognostic Factors in Oncology
- Many prognostic factors are not used because they are not actionable
- Most prognostic factor studies are not conducted with an intended use
  - They use a convenience sample of heterogeneous patients for whom tissue is available
- Retrospective studies of prognostic markers should be planned and analyzed with specific focus on intended use of the marker
- Design of prospective studies depends on context of use of the biomarker
  - Treatment options and practice guidelines
  - Other prognostic factors

7 Potential Uses of a Prognostic Biomarker
- Identify patients who have very good prognosis on standard treatment and do not require more intensive regimens
- Identify patients who have poor prognosis on standard chemotherapy who are good candidates for experimental regimens

9 Prospective Marker Strategy Design
- Patients are randomized to either
  - have the marker measured and treatment determined based on marker result and clinical features, or
  - not have the marker measured and receive standard-of-care treatment based on clinical features alone

10 [Diagram] Randomize patients to test or no test
- Test arm: Rx determined by test
- No-test arm: Rx determined by SOC

12 Marker Strategy Design
- Inefficient
  - Many patients get the same treatment regardless of which arm they are randomized to
- Uninformative
  - Since patients in the standard-of-care arm do not have the marker measured, it is not possible to compare outcome for patients whose treatment is changed based on the marker result

13 [Diagram] Using phase II data, develop predictor of response to new drug
- Apply test to all eligible patients
  - Test-determined Rx different from SOC: use test-determined Rx, or use SOC
  - Test-determined Rx same as SOC: off study

14 Prospective Evaluation of OncotypeDx (TAILORx)
- For patients with predicted low risk of recurrence
  - Withhold chemotherapy and observe long-term recurrence rate
  - If recurrence rate is very low, potential chemotherapy benefit must be very small

15 Predictive Biomarkers

18 Prospective Co-Development of Drugs and Companion Diagnostics
1. Develop a completely specified genomic classifier of the patients likely to benefit from a new drug
2. Establish analytical validity of the classifier
3. Use the completely specified classifier in the primary analysis plan of a phase III trial of the new drug

19 Guiding Principle
- The data used to develop the classifier should be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier
  - Developmental studies can be exploratory
  - Studies on which treatment effectiveness claims are to be based should not be exploratory

20 [Diagram] Using phase II data, develop predictor of response to new drug
- Patient predicted responsive: randomize to new drug or control
- Patient predicted non-responsive: off study

21 Evaluating the Efficiency of Enrichment Design
- Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006
- Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.
- Reprints and interactive sample size calculations at http://linus.nci.nih.gov

22 Relative efficiency of targeted design depends on
- proportion of patients who test positive
- effectiveness of new drug (compared to control) for test-negative patients
When less than half of patients are test positive and the drug has little or no benefit for test-negative patients, the targeted design requires dramatically fewer randomized patients
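The dependence described on this slide can be illustrated with a back-of-envelope sketch. It assumes that the required number of randomized patients scales as 1/δ², where δ is the treatment effect being detected, and that the overall effect in an untargeted trial is the prevalence-weighted average of the effects in test-positive and test-negative patients. The function and parameter names are hypothetical, and this is a simplification rather than the full analysis in the papers cited on slide 21.

```python
def randomized_ratio(prop_positive, delta_pos, delta_neg):
    """Approximate ratio of randomized patients: untargeted / targeted design.

    prop_positive: fraction of patients who test positive (assumed known)
    delta_pos:     treatment effect in test-positive patients
    delta_neg:     treatment effect in test-negative patients
    Assumes sample size is proportional to 1 / effect**2.
    """
    # Effect seen in an untargeted trial is diluted by test-negative patients.
    overall = prop_positive * delta_pos + (1 - prop_positive) * delta_neg
    return (delta_pos / overall) ** 2


def screened_ratio(prop_positive, delta_pos, delta_neg):
    """Same comparison, but counting patients screened for the targeted design.

    The targeted design must screen about 1 / prop_positive patients
    for every patient it randomizes.
    """
    return randomized_ratio(prop_positive, delta_pos, delta_neg) * prop_positive


if __name__ == "__main__":
    for gamma in (0.25, 0.5, 0.75):
        for d_neg in (0.0, 0.5):
            r = randomized_ratio(gamma, 1.0, d_neg)
            s = screened_ratio(gamma, 1.0, d_neg)
            print(f"positive={gamma:.2f} delta_neg={d_neg:.1f} "
                  f"randomized ratio={r:.2f} screened ratio={s:.2f}")
```

Under these assumptions, a marker that is positive in a quarter of patients, with no benefit in test-negative patients, gives a ratio of 16 in randomized patients; the advantage shrinks as the test-negative effect approaches the test-positive effect.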

23 [Diagram] Stratification Design
- Develop predictor of response to new Rx
  - Predicted responsive to new Rx: randomize to new Rx or control
  - Predicted non-responsive to new Rx: randomize to new Rx or control

24 Stratification Design
- Use the test to structure a prospectively specified primary analysis plan
- Having a prospective analysis plan is essential
- "Stratifying" (balancing) the randomization is useful to ensure that all randomized patients have tissue available, but it is not a substitute for a prospective analysis plan
- The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets, not to modify or refine the classifier
- The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier

25 References
- R Simon. Using genomics in clinical trial design, Clinical Cancer Research 14:5984-93, 2008
- R Simon. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics, Expert Opinion in Medical Diagnostics 2:721-29, 2008

29 Use of Archived Specimens in Evaluation of Prognostic and Predictive Biomarkers (Richard M. Simon, Soonmyung Paik and Daniel F. Hayes)
- Claims of medical utility for prognostic and predictive biomarkers based on analysis of archived tissues can be considered to have either a high or low level of evidence depending on several key factors.
- Studies using archived tissues, when conducted under ideal conditions and independently confirmed, can provide the highest level of evidence.
- Traditional analyses of prognostic or predictive factors, using non-analytically-validated assays on a convenience sample of tissues and conducted in an exploratory and unfocused manner, provide a very low level of evidence for clinical utility.

30 Use of Archived Specimens in Evaluation of Prognostic and Predictive Biomarkers (Richard M. Simon, Soonmyung Paik and Daniel F. Hayes)
For Level I evidence:
(i) Archived tissue adequate for a successful assay must be available on a sufficiently large number of patients from a phase III trial that the appropriate analyses have adequate statistical power and that the patients included in the evaluation are clearly representative of the patients in the trial.
(ii) The test should be analytically and pre-analytically validated for use with archived tissue.
(iii) The analysis plan for the biomarker evaluation should be completely specified in writing prior to the performance of the biomarker assays on archived tissue and should be focused on evaluation of a single completely defined classifier.
(iv) The results from archived specimens should be validated using specimens from a similar, but separate, study.

32 Publications Reviewed
- Original study on human cancer patients relating gene expression to clinical outcome
  - Survival or disease-free survival
  - Response to treatment
- Published in English before December 31, 2004
- Analyzed gene expression of more than 1000 probes

33 Review Process
- 90 publications identified that met criteria
  - Abstracted information for all 90
  - Performed detailed review of statistical analysis for the 42 papers published in 2004

34 Major Flaws Found in 40 Studies Published in 2004
- Inadequate control of multiple comparisons in gene finding
  - 9/23 studies had unclear or inadequate methods to deal with false positives
  - 10,000 genes x 0.05 significance level = 500 false positives
- Misleading report of prediction accuracy
  - 12/28 reports based on incomplete cross-validation
- Misleading use of cluster analysis
  - 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
- 50% of studies contained one or more major flaws

35 Control for Multiple Testing
- If each gene is tested for significance at level α and there are n genes, then the expected number of false discoveries is nα.
  - e.g. if n = 10,000 and α = 0.001, then 10 false "discoveries"
- Control the FDR (false discovery rate)
  - g = number of genes reported as having expression significantly correlated with a phenotype
  - FDR = number of false positives / g
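The arithmetic on this slide, and one common way of controlling the FDR, can be sketched as follows. The slide does not name a specific FDR procedure; the Benjamini-Hochberg step-up procedure is used here as an assumption, and the function names are hypothetical.

```python
def expected_false_discoveries(n_genes, alpha):
    """Expected number of false positives when every gene is null
    and each gene is tested at significance level alpha."""
    return n_genes * alpha


def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of the hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at target FDR level q."""
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Find the largest rank whose p-value is under its threshold.
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    print(expected_false_discoveries(10_000, 0.05))
    print(expected_false_discoveries(10_000, 0.001))
    print(benjamini_hochberg([0.001, 0.012, 0.02, 0.2, 0.9], q=0.05))
```

Testing each gene at a fixed level controls the per-gene error rate only; the step-up procedure instead adapts the threshold to the number of genes tested and to the observed p-values.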

36 Major Flaws Found in 40 Studies Published in 2004
- Inadequate control of multiple comparisons in gene finding
  - 9/23 studies had unclear or inadequate methods to deal with false positives
  - 10,000 genes x 0.05 significance level = 500 false positives
- Misleading report of prediction accuracy
  - 12/28 reports based on incomplete cross-validation
- Misleading use of cluster analysis
  - 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
- 50% of studies contained one or more major flaws

37 Evaluating a Classifier
- Fit of a model to the same data used to develop it is no evidence of prediction accuracy for independent data
  - Goodness of fit vs prediction accuracy

42 Split-Sample Evaluation
- Training set
  - Used to select features, select model type, determine parameters and cut-off thresholds
- Test set
  - Withheld until a single model is fully specified using the training set.
  - Fully specified model is applied to the expression profiles in the test set to predict class labels.
  - Number of errors is counted

43 Leave-one-out Cross Validation
- Leave-one-out cross-validation simulates the process of separately developing a model on one set of data and predicting for a test set of data not used in developing the model

44 Leave-one-out Cross Validation
- Omit sample 1
  - Develop multivariate classifier from scratch on training set with sample 1 omitted
  - Predict class for sample 1 and record whether prediction is correct

45 Leave-one-out Cross Validation
- Repeat analysis for training sets with each single sample omitted one at a time
- e = number of misclassifications determined by cross-validation
- Subdivide e for estimation of sensitivity and specificity

46 Requirements for Valid Cross-Validation
- Cross-validation is only valid if the test set is not used in any way in the development of the model. Using the complete set of samples to select genes violates this assumption and invalidates cross-validation.
- With proper cross-validation, the model must be developed from scratch for each leave-one-out training set. This means that feature selection must be repeated for each leave-one-out training set.
- The cross-validated estimate of misclassification error is an estimate of the prediction error for the model obtained by applying the specified algorithm to the full dataset

47 Prediction on Simulated Null Data
- Generation of gene expression profiles
  - 14 specimens (P_i is the expression profile for specimen i)
  - Log-ratio measurements on 6000 genes
  - P_i ~ MVN(0, I_6000)
  - Can we distinguish between the first 7 specimens (Class 1) and the last 7 (Class 2)?
- Prediction method
  - Compound covariate prediction
  - Compound covariate built from the log-ratios of the 10 most differentially expressed genes.
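The experiment on this slide can be sketched in plain Python. The version below is a minimal illustration under stated assumptions, not the original simulation code: the compound covariate is taken to be a t-statistic-weighted sum of the selected genes with a midpoint cut-off, classes are coded 0 and 1, and all function names are hypothetical. It compares the resubstitution error (model evaluated on the data used to build it) with leave-one-out cross-validation in which gene selection is repeated inside every fold.

```python
import random


def t_stat(a, b):
    """Pooled-variance two-sample t-statistic for two lists of values."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled = ss / (na + nb - 2)
    return (ma - mb) / (pooled * (1 / na + 1 / nb)) ** 0.5


def fit_ccp(rows, labels, n_top=10):
    """Select the n_top genes with largest |t| and return a predictor
    based on a t-weighted compound covariate with a midpoint cut-off."""
    n_genes = len(rows[0])
    t = []
    for j in range(n_genes):
        a = [r[j] for r, c in zip(rows, labels) if c == 0]
        b = [r[j] for r, c in zip(rows, labels) if c == 1]
        t.append(t_stat(a, b))
    genes = sorted(range(n_genes), key=lambda j: -abs(t[j]))[:n_top]
    weights = [t[j] for j in genes]

    def score(r):
        return sum(w * r[j] for w, j in zip(weights, genes))

    s0 = [score(r) for r, c in zip(rows, labels) if c == 0]
    s1 = [score(r) for r, c in zip(rows, labels) if c == 1]
    cut = (sum(s0) / len(s0) + sum(s1) / len(s1)) / 2
    # Class 0 has the higher mean score because weights follow the sign of t.
    return lambda r: 0 if score(r) > cut else 1


def resubstitution_errors(rows, labels):
    """Errors when the model is evaluated on its own training data."""
    predict = fit_ccp(rows, labels)
    return sum(predict(r) != c for r, c in zip(rows, labels))


def loocv_errors(rows, labels):
    """Leave-one-out errors with gene selection redone in every fold."""
    errors = 0
    for i in range(len(rows)):
        train = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        predict = fit_ccp(train, train_labels)
        errors += predict(rows[i]) != labels[i]
    return errors


def simulate(n_reps=5, n_genes=6000, seed=0):
    """Average error counts (out of 14) over n_reps null datasets."""
    rng = random.Random(seed)
    labels = [0] * 7 + [1] * 7
    resub = cv = 0
    for _ in range(n_reps):
        rows = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(14)]
        resub += resubstitution_errors(rows, labels)
        cv += loocv_errors(rows, labels)
    return resub / n_reps, cv / n_reps


if __name__ == "__main__":
    resub, cv = simulate()
    print(f"mean resubstitution errors: {resub:.1f} of 14")
    print(f"mean fully cross-validated errors: {cv:.1f} of 14")
```

Because the data contain no real class difference, an honest error estimate should be near chance. Selecting the 10 genes on all 14 specimens and then evaluating on those same specimens tends to report very few errors, which is the bias the preceding slides warn about.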

49 Major Flaws Found in 40 Studies Published in 2004
- Inadequate control of multiple comparisons in gene finding
  - 9/23 studies had unclear or inadequate methods to deal with false positives
  - 10,000 genes x 0.05 significance level = 500 false positives
- Misleading report of prediction accuracy
  - 12/28 reports based on incomplete cross-validation
- Misleading use of cluster analysis
  - 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
- 50% of studies contained one or more major flaws

51 Cluster Analysis is Subjective
- Cluster algorithms always produce clusters
- Different distance metrics and clustering algorithms may find different structure using the same data.
- Supervised clustering is misleading

53 Good Microarray Studies Have Clear Objectives
- Class Comparison (Gene Finding)
  - Find genes whose expression differs among predetermined classes, e.g. tissue or experimental condition
- Class Prediction
  - Prediction of predetermined class (e.g. treatment outcome) using information from gene expression profile
- Class Discovery
  - Discover clusters of specimens having similar expression profiles

54 Class Comparison and Class Prediction
- Not clustering problems
  - Global similarity measures generally used for clustering arrays may not distinguish classes
  - Clustering methods do not control multiplicity or distinguish data used for classifier development from data used for classifier evaluation
- Supervised methods

55 Acknowledgements
- NCI Biometric Research Branch
  - Alain Dupuy
  - Boris Freidlin
  - Wenyu Jiang
  - Aboubakar Maitournam
  - Yingdong Zhao
- Soonmyung Paik, NSABP
- Daniel Hayes, U. Michigan

