Challenges In Progressing Biomarkers To Clinical Use Proteomic Experiences Chris Harbron Technical Lead For High Dimensional Data AstraZeneca FDA Industry.


1 Challenges In Progressing Biomarkers To Clinical Use: Proteomic Experiences
Chris Harbron, Technical Lead For High Dimensional Data, AstraZeneca
FDA Industry Statistics Workshop, September 2006

2 Gap Between Published Biomarkers And Biomarkers Being Approved For Use

3 Why Might This Be? Challenges
Pressures from the contextual environment
High quality data is essential
– These are new technologies, not simple to use or analyse
– Robust study design including:
– Consistent sample collection and processing
– Need to understand reproducibility between & within labs & within subjects
Failure leads to poor data quality, frequently dominated by nuisance factors
Rigorous validation is also essential
– Occurs at many levels
– Avoid overfitting data
Omics may not do it alone
– Applications will require combining -omics with other data types

4 Example: Case-Control Study
Interest in identifying a peptidomic profile that could predict an adverse event
– Potential use as a personalised medicine predictive marker
Blood samples taken from subjects at start of treatment
Subjects monitored for adverse event using a rigorous definition
Subjects entered in cohorts
Samples processed in batches within cohorts
Analysed on an LC-MS/MS platform

5 LC-MS/MS Proteomics
[Figure: workflow labelled Clinical Plasma Samples, Preparation & Digestion, Peptides, Liquid Chromatography (Separation By Retention Time), Mass Spectrometry (Separation By Mass/Charge, Measurement Of Intensity), MS/MS, Protein Identification. Spectrum panels plot Ion Intensity and Fragment Ion Intensity against Mass/Charge Ratio, with Retention Time as a further axis.]

6 Distribution Of Average Intensities
[Figure: average intensity over Retention Time and Mass-Charge Ratio, on a scale from Low Intensity to High Intensity]
~5,500,000 RT / MZ / Intensity measurements per sample
~25,000 common peaks per sample
Pre-processing
– Alignment of retention times
– Scaling
– Binning
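The binning step listed on slide 6 could be sketched as follows. This is only an illustration, assuming each measurement is a (retention time, m/z, intensity) triple and that intensities landing in the same rectangular bin are summed. The function name `bin_measurements` and the bin widths are hypothetical and are not taken from the presentation.

```python
import math
from collections import defaultdict


def bin_measurements(measurements, rt_width=1.0, mz_width=0.5):
    """Sum intensities on a regular retention-time / m/z grid.

    measurements: iterable of (rt, mz, intensity) triples.
    rt_width, mz_width: bin sizes (illustrative values, not from the slides).
    Returns a dict mapping (rt_bin, mz_bin) index pairs to summed intensity.
    """
    grid = defaultdict(float)
    for rt, mz, intensity in measurements:
        key = (math.floor(rt / rt_width), math.floor(mz / mz_width))
        grid[key] += intensity
    return dict(grid)


# Three made-up readings; the first two fall into the same bin.
sample = [
    (10.2, 690.81, 120.0),
    (10.7, 690.95, 80.0),
    (11.3, 1027.87, 50.0),
]
binned = bin_measurements(sample)
print(binned)
```

Binning reduces millions of raw readings per sample to a fixed grid that can be compared across samples, at the cost of resolution set by the chosen bin widths.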

7 Proteomic Data Exploratory Analysis - PCA
Considerable batch to batch variation
[Figure: PCA plot. Legend: Cohort 1, Cohort 2, Cohort 3, Cohort 4; Control, Case, Non-Index Case]

8 Proteomic Data Exploratory Analysis - PCA
Within all batches with both cases and controls, there is separation of cases and controls

9 Univariate Analyses Within Batches: Histograms Of t-Test p-Values

10 Global Test Of Agreement Between Batches Using A Permutation Test
[Figure panels: Observed, Permuted]
Identify peaks where direction of effect agrees in all 3 batches
Summarise by maximum p-value
Global test, by permutation, against the level expected from multiple testing
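The procedure on slide 10 might look roughly like the sketch below. Several choices here are assumptions rather than the presenter's method: p-values come from a normal approximation to the Welch t statistic (to stay within the standard library), the global summary is the count of peaks that agree in direction in every batch with maximum p-value below a hypothetical `alpha`, and labels are permuted within each batch. The simulated data and all function names are illustrative only.

```python
import random
from statistics import NormalDist, mean, variance


def welch_t(cases, controls):
    """Welch t statistic for the case-minus-control difference in means."""
    se = (variance(cases) / len(cases) + variance(controls) / len(controls)) ** 0.5
    return (mean(cases) - mean(controls)) / se


def approx_p(t):
    """Two-sided p-value from a normal approximation (assumed stand-in
    for the t distribution)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def count_consistent(batches, alpha):
    """Count peaks whose effect has the same sign in every batch and whose
    maximum within-batch p-value is below alpha.

    batches: list of (data, labels); data[i][j] is peak j in sample i,
    labels[i] is 1 for a case and 0 for a control.
    """
    n_peaks = len(batches[0][0][0])
    count = 0
    for j in range(n_peaks):
        ts = []
        for data, labels in batches:
            cases = [row[j] for row, lab in zip(data, labels) if lab == 1]
            controls = [row[j] for row, lab in zip(data, labels) if lab == 0]
            ts.append(welch_t(cases, controls))
        same_sign = all(t > 0 for t in ts) or all(t < 0 for t in ts)
        if same_sign and max(approx_p(t) for t in ts) < alpha:
            count += 1
    return count


def global_permutation_test(batches, n_perm=200, alpha=0.05, seed=0):
    """Compare the observed count with counts under within-batch label shuffles."""
    rng = random.Random(seed)
    observed = count_consistent(batches, alpha)
    exceed = 0
    for _ in range(n_perm):
        permuted = []
        for data, labels in batches:
            shuffled = labels[:]
            rng.shuffle(shuffled)  # permute case/control labels inside the batch
            permuted.append((data, shuffled))
        if count_consistent(permuted, alpha) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


def simulate(seed=1, n_batches=3, n_per_group=6, n_peaks=20, effect=5.0):
    """Hypothetical data: peak 0 carries a case effect, the rest are noise,
    and each batch has its own baseline shift."""
    rng = random.Random(seed)
    batches = []
    for b in range(n_batches):
        labels = [1] * n_per_group + [0] * n_per_group
        data = []
        for lab in labels:
            row = [rng.gauss(10.0 * b, 1.0) for _ in range(n_peaks)]
            row[0] += effect * lab
            data.append(row)
        batches.append((data, labels))
    return batches


observed, p_global = global_permutation_test(simulate())
print(observed, p_global)
```

Shuffling labels within batches keeps the batch structure intact, so the permutation distribution reflects how many "consistent" peaks multiple testing alone would produce.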

11 Typical Highly Significant Peak
[Figure: Intensity by Batches for CASE, CONTROL and NIC samples]
Within each batch, cases are highly expressed compared to controls
Not possible to define a global cut-off between cases and controls

12 Multivariate Analyses
Identified consistent effect
BUT, may be difficult to use as a predictive biomarker in a clinical setting due to batch variation
Would a combination of markers, a peptidomic profile, work as a predictive biomarker?
Use Random Forests to generate multivariate predictive models
Assess predictive power using a nested cross-validation
– Within and between batch prediction

13 Modelling Process
[Flowchart labels: Data; Analyse Each Peak Within Each Batch; Take Maximum p-Value For Each Peak; Rank Peaks By p-Value; Build Model With Top n Peaks; Test Model In Test Set; Training Set; Test Set]
[Data split labels: Mixed Case-Control batches (Exclude Batches In Turn, Exclude Observations By LOO); Control Only batches; Batch excluded; Observation excluded]
[Plot labels: Number Of Peaks; Observation Excluded; Batch Excluded]
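The between-batch part of the modelling process on slide 13 can be sketched as below; the within-batch leave-one-out part is omitted for brevity. Two substitutions are made to keep the example dependency-free and should not be read as the presenter's method: a nearest-centroid classifier stands in for Random Forests, and peaks are ranked by the smallest absolute Welch t statistic across training batches as a stand-in for ranking by maximum p-value. All names and the simulated data are hypothetical. The point illustrated is that peak selection is repeated inside each training fold, so the held-out batch never influences which peaks are chosen.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for the difference in means between a and b."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def rank_peaks(train_batches, top_n):
    """Score each peak by its weakest within-batch evidence; keep the top n."""
    n_peaks = len(train_batches[0][0][0])
    scores = []
    for j in range(n_peaks):
        ts = []
        for data, labels in train_batches:
            cases = [r[j] for r, lab in zip(data, labels) if lab == 1]
            controls = [r[j] for r, lab in zip(data, labels) if lab == 0]
            ts.append(abs(welch_t(cases, controls)))
        scores.append((min(ts), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:top_n]]


def fit_centroids(rows, labels, peaks):
    """Class means over the selected peaks (stand-in for a Random Forest)."""
    centroids = {}
    for cls in (0, 1):
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        centroids[cls] = [mean([r[j] for r in members]) for j in peaks]
    return centroids


def predict(centroids, row, peaks):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(cls):
        return sum((row[j] - c) ** 2 for j, c in zip(peaks, centroids[cls]))
    return min(centroids, key=dist)


def leave_one_batch_out(batches, top_n=2):
    """Hold out each batch in turn; select peaks and fit on the rest."""
    correct = total = 0
    for held_out in range(len(batches)):
        train = [b for i, b in enumerate(batches) if i != held_out]
        peaks = rank_peaks(train, top_n)  # selection uses training batches only
        rows = [r for data, _ in train for r in data]
        labels = [lab for _, labs in train for lab in labs]
        centroids = fit_centroids(rows, labels, peaks)
        test_rows, test_labels = batches[held_out]
        for r, lab in zip(test_rows, test_labels):
            correct += predict(centroids, r, peaks) == lab
            total += 1
    return correct / total


def simulate(seed=2, n_batches=3, n_per_group=6, n_peaks=20, effect=5.0):
    """Hypothetical data: peaks 0 and 1 carry a case effect; each batch has
    a small baseline shift."""
    rng = random.Random(seed)
    batches = []
    for b in range(n_batches):
        labels = [1] * n_per_group + [0] * n_per_group
        data = []
        for lab in labels:
            row = [rng.gauss(0.3 * b, 1.0) for _ in range(n_peaks)]
            row[0] += effect * lab
            row[1] += effect * lab
            data.append(row)
        batches.append((data, labels))
    return batches


batches = simulate()
accuracy = leave_one_batch_out(batches)
print(accuracy)
```

With larger batch shifts than the mild ones simulated here, between-batch accuracy from a sketch like this would be expected to drop, which is the behaviour the later slides report.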

14 Leave One Out Cross Validation: Proteomic Model Predictions
[Figure legend: Leave One Out Training Set Batches - Cases; Leave One Out Training Set Batches - Controls; Other Mixed Batch - Cases; Other Mixed Batch - Controls; Other Batches - Controls]

15 Mask Data By Restricting To High Quality Regions Of Proteomic Space
[Figure axes: Retention Time, Mass Charge Ratio]
TECHNICALLY
– Region of focus for instrument
EMPIRICALLY
– Lowest residual variability
– Highest average intensity
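One way the masking criteria on slide 15 might be applied is sketched below. It assumes each peak record carries its retention time, m/z, per-sample intensities, and residuals from some previously fitted model. The record layout, the rectangular focus window, both thresholds, and the name `mask_peaks` are hypothetical; the slides do not state how the regions were actually defined.

```python
from statistics import mean, stdev


def mask_peaks(peaks, rt_range, mz_range, min_mean_intensity, max_residual_sd):
    """Keep peaks inside the instrument's focus window that also have high
    average intensity and low residual variability.

    peaks: list of dicts with keys "id", "rt", "mz", "intensities", "residuals".
    rt_range, mz_range: (low, high) bounds of the assumed focus window.
    """
    kept = []
    for peak in peaks:
        in_focus = (rt_range[0] <= peak["rt"] <= rt_range[1]
                    and mz_range[0] <= peak["mz"] <= mz_range[1])
        bright = mean(peak["intensities"]) >= min_mean_intensity
        stable = stdev(peak["residuals"]) <= max_residual_sd
        if in_focus and bright and stable:
            kept.append(peak["id"])
    return kept


# Made-up peaks: only A satisfies all three criteria.
peaks = [
    {"id": "A", "rt": 20.0, "mz": 600.0,
     "intensities": [100.0, 110.0, 90.0], "residuals": [1.0, -1.0, 0.0]},
    {"id": "B", "rt": 80.0, "mz": 600.0,   # outside the retention-time window
     "intensities": [100.0, 110.0, 90.0], "residuals": [1.0, -1.0, 0.0]},
    {"id": "C", "rt": 20.0, "mz": 600.0,   # too faint
     "intensities": [5.0, 6.0, 4.0], "residuals": [1.0, -1.0, 0.0]},
    {"id": "D", "rt": 20.0, "mz": 600.0,   # too variable
     "intensities": [100.0, 110.0, 90.0], "residuals": [30.0, -30.0, 0.0]},
]
kept = mask_peaks(peaks, rt_range=(10.0, 60.0), mz_range=(400.0, 1200.0),
                  min_mean_intensity=50.0, max_residual_sd=5.0)
print(kept)
```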

16 Analysis Of Unmasked Peaks
Batch Effects Still Dominate
Consistent Case-Control Effect
Can Identify Peaks Separating Cases & Controls Across Batches

17 Cross-Validation Predictions: Unmasked Peaks
[Figure legend: Leave One Out Same Batch - Cases; Leave One Out Same Batch - Controls; Other Mixed Batch - Cases; Other Mixed Batch - Controls; Other Batches - Controls]
Good Predictions Within Same Batch
Prediction Rate Falls When Extrapolated To Other Batches
Need To Prospectively Test In Another Set Of Patients

18 How To Combine Other Non-omic Information Into A Biomarker?
Combining different data types is challenging
The bigger data type will dominate the modelling
Greater signal in data, but doesn't extrapolate as well
Exploring options for turning the random part of random forests to our advantage
[Figure labels: Known Clinical Prognostic; Proteomic Peaks]

19 Proteomic Quality Control Consortium?
MAQC recently reported a reproducibility study for microarrays
– Wealth of valuable information
– Mammoth effort
Could we do the same for proteomics?
– Less mature technology
– Greater diversity of platforms
– Diversity of pre-processing methodologies
– Issues of identification making large scale comparisons challenging

20 Conclusions
Complicated new technologies
Many challenges
– Technical, Data Quality, Data Analysis, Practical
Essential role for statistics
Need to integrate statistical approaches with understanding of technologies and biology
Great potential
– Better treatments for patients
– Improved use of compounds
– Greater biological understanding

