Introduction to Challenge 2 The NIEHS - NCATS - UNC DREAM Toxicogenetics Challenge THE DATA Fred A. Wright, Ph.D. Professor and Director of the Bioinformatics.


1 Introduction to Challenge 2: The NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge. THE DATA. Fred A. Wright, Ph.D., Professor and Director of the Bioinformatics Research Center, Departments of Statistics and Biological Sciences, North Carolina State University. amateurbrainsurgery.com

2 In vitro cytotoxicity screening of human cell lines to characterize variability and map susceptibility loci
Many caveats are obvious, but bear repeating:
- limitations of the in vitro environment
- cell type
- sources of technical variation
On the other hand, we are working with the correct species, and there is much that can be done:
- heritability analysis
- identification of potential mechanisms underlying variability, mostly via genetic mapping
- characterization of average response and variation across agents/chemicals, to prioritize in vitro data used for predictive toxicity models

3 [Figure: timeline with milestones labeled 1983, 1996, 2007, 2009, 2010. Image courtesy of M. Andersen and D. Krewski]

4 Much of the previous work has been in pharmacogenomics, especially cytotoxicity screening of anticancer agents. However, most of the principles apply to any agent/chemical.
[Figure: cytotoxicity heritability estimates from 125 lymphoblastoid cell lines (LCLs), 29 chemotherapeutic agents]

5 CYTOTOXICITY PROFILING: BOILING DOWN TO A NUMBER(?)
- Experiments done in batches
- Challenge: estimation of cytotoxic response or other relevant phenotype per cell line in the presence of variation
- Solution: likelihood-based fitting of EC10 values, with outlier detection and batch correction
[Plot axes: x = log10(concentration); y = cytotoxicity (normalized % cell survival)]
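As a rough illustration of what "boiling down to a number" can look like, the sketch below fits a two-parameter log-logistic survival curve by least squares over a coarse grid and then reads off an EC10. This is not the challenge organizers' procedure: the curve shape, the grid, the function names, and the reading of EC10 as the concentration at which survival drops to 90% are all assumptions made here, and the outlier detection and batch correction mentioned on the slide are left out. Least squares coincides with maximum likelihood only under independent Gaussian errors of equal variance.

```python
import math

def survival(log10_conc, log10_ec50, slope):
    """Assumed two-parameter log-logistic curve: 100% survival at low dose, 0% at high dose."""
    return 100.0 / (1.0 + 10.0 ** (slope * (log10_conc - log10_ec50)))

def fit_curve(log10_concs, survivals):
    """Grid-search least squares over log10(EC50) in [-4, 4] and slope in [0.1, 5]."""
    best = None
    for i in range(161):                 # log10(EC50): -4.0 to 4.0 in steps of 0.05
        log10_ec50 = -4.0 + 0.05 * i
        for j in range(1, 51):           # slope: 0.1 to 5.0 in steps of 0.1
            slope = 0.1 * j
            sse = sum((y - survival(x, log10_ec50, slope)) ** 2
                      for x, y in zip(log10_concs, survivals))
            if best is None or sse < best[0]:
                best = (sse, log10_ec50, slope)
    return best[1], best[2]

def log10_ec10(log10_ec50, slope):
    """log10 concentration at which the assumed curve gives 90% survival.

    Solving 100 / (1 + 10**(slope * (x - e))) = 90 gives x = e - log10(9) / slope.
    """
    return log10_ec50 - math.log10(9.0) / slope

# Hypothetical noise-free example: six concentrations for one cell line.
doses = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
observed = [survival(x, 1.0, 1.5) for x in doses]
fitted_ec50, fitted_slope = fit_curve(doses, observed)
print(fitted_ec50, fitted_slope, log10_ec10(fitted_ec50, fitted_slope))
```

A real pipeline would replace the grid search with a proper optimizer and model replicate and batch structure explicitly.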

6 The concept of population toxicity involves means and true variability, obscured by technical variation. The measure of susceptibility/resistance (e.g. EC10) for one cell line has error.
[Figure, Chemical 1: observed data; true variation across population; measurement variation]

7 The concept of population toxicity involves means and true variability, obscured by technical variation.
[Figure label: a vulnerable subpopulation]

8 In the high-throughput screening toxicology literature, there is relatively little data to support these concepts across multiple populations.
- Prioritizing chemicals for vulnerable subpopulations depends on both means and variances
- Observed variability has the potential to provide finer-grained uncertainty factors in risk assessment
[Figure panels: Chemical 1, Chemical 2, Chemical 3, Chemical 4]

9 The Challenge Data

11 The data in context: previous cell line vs. chemical/drug studies
[Figure: heatmap of the EC10 values (axes to scale)]

12 Ranking chemicals by average cytotoxicity is of obvious interest. Even with this large sample size, there is some uncertainty in the ranking.

13 EC10 for each cell line. The 5th and 95th percentiles/quantiles are of interest from a risk assessment perspective. We call q95 - q05 the “fold-range”.
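A minimal sketch of the fold-range for one chemical, assuming the per-cell-line EC10 values are already on the scale on which the difference q95 - q05 is meant to be taken. The linear-interpolation quantile rule and the function names are choices made here; the slides do not say which quantile estimator was used.

```python
import math

def quantile(values, q):
    """Quantile by linear interpolation between order statistics, for 0 <= q <= 1."""
    if not values:
        raise ValueError("need at least one value")
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return xs[lo] * (1.0 - frac) + xs[hi] * frac

def fold_range(ec10_values):
    """q95 - q05 of the EC10 values across cell lines for one chemical."""
    return quantile(ec10_values, 0.95) - quantile(ec10_values, 0.05)

# Hypothetical EC10 values for 101 cell lines: 0.00, 0.01, ..., 1.00
example = [k / 100.0 for k in range(101)]
print(fold_range(example))
```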

14 884 lines that are “unrelated” (i.e. no first-degree relatives), split into Training, Test, and Validation
Subchallenge 1: predict EC10 from SNPs and RNA-Seq data
156 chemicals that are “predictable”: 106 training, 50 test
Subchallenge 2: predict average and fold-range from chemical descriptors

15 The NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge. OVERALL RESULTS. Federica Eduati, Ph.D., European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom

16 Subchallenge 1: Data
Predict interindividual variability in cytotoxicity based on genomic profiles
Training:
- EC10 data for 106 compounds and 487 cell lines
- Genotype data for 487 cell lines
- RNAseq data for 192 cell lines
Leaderboard (released Aug 31st):
- Genotype data for 133 cell lines
- RNAseq data for 48 cell lines
- Predict: EC10 data for 106 compounds and 133 cell lines
Final test:
- Genotype data for 264 cell lines
- RNAseq data for 97 cell lines
- Predict: EC10 data for 106 compounds and 264 cell lines
[Diagram: cytotoxicity (EC10) matrices with columns Comp 1 ... Comp 106 and rows for the 487 training, 133 leaderboard, and 264 final test cell lines]

17 Probabilistic C-index: accounts for the probabilistic nature of the gold standard
Experimental error: the exact order of the cell lines is variable if there is noise.
For each compound: to each pair of cell lines, it assigns a score given by the probability that the predicted ranking is supported by the noisy gold standard.
[Figure: Comp 1 in cell lines 1 to 4, exact measures vs. noisy measures. The example values 2.1, 1.0, 1.9, 0.1 give the ranking 4, 2, 3, 1]
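The pairwise idea can be sketched as follows for a single compound. The slide does not give the formula, so this version assumes independent Gaussian measurement noise with a known standard deviation per cell line; under that assumption the probability that cell line i truly exceeds cell line j is a normal CDF of the observed difference. The function names are hypothetical and the challenge's actual definition may differ in detail.

```python
import math
from itertools import combinations

def pair_support(x_i, x_j, s_i, s_j):
    """P(true value of i > true value of j), assuming independent Gaussian noise."""
    denom = math.sqrt(2.0 * (s_i ** 2 + s_j ** 2))
    if denom == 0.0:                       # noise-free gold standard
        if x_i == x_j:
            return 0.5
        return 1.0 if x_i > x_j else 0.0
    return 0.5 * (1.0 + math.erf((x_i - x_j) / denom))

def probabilistic_c_index(predicted, observed, sd):
    """Average, over all pairs of cell lines, of the probability that the
    predicted order of the pair agrees with the noisy gold standard."""
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two cell lines")
    total = 0.0
    for i, j in combinations(range(n), 2):
        p = pair_support(observed[i], observed[j], sd[i], sd[j])
        if predicted[i] > predicted[j]:
            total += p
        elif predicted[i] < predicted[j]:
            total += 1.0 - p
        else:                              # tied prediction: no information
            total += 0.5
    return total / (n * (n - 1) / 2)

# The four example measurements from the slide, with a hypothetical noise level.
gold = [2.1, 1.0, 1.9, 0.1]
noise = [0.5, 0.5, 0.5, 0.5]
print(probabilistic_c_index(gold, gold, noise))
```

With zero noise this reduces to an ordinary concordance index; as the noise grows, even a prediction that matches the observed order exactly scores closer to 0.5.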

18 Scoring metrics
Correlation between predicted and observed values:
- Pearson correlation
Ranking of cytotoxicity for different cell lines:
- Probabilistic C-index
- Spearman correlation
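For reference, the two correlation metrics can be written in a few lines. This is a plain textbook sketch, not the challenge's scoring code; Spearman correlation is computed here as the Pearson correlation of average ranks, and the function names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical predicted vs. observed EC10 values for four cell lines.
print(pearson([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]))
print(spearman([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]))
```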

19 Predictions vs null hypothesis

20 Scoring
1. For each submission, compute the following metrics compound by compound:
 a. Pearson correlation
 b. Probabilistic C-index
2. For each metric:
 a. Rank submissions for each compound
 b. Compute the mean ranking over all compounds
 c. Rank submissions according to the mean ranking
3. The final ranking is obtained by averaging the rankings obtained with the 2 different metrics
[Diagram: each of Submission 1 ... Submission M provides a matrix of predicted cytotoxicity (EC10) for Test cell line 1 ... N by Comp 1 ... Comp 106; per-compound ranks are collected in a submissions-by-compounds table with a mean ranking column]
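The three scoring steps can be sketched as below, taking the per-compound metric values as given. The data layout (`scores[s][c]` for submission `s` and compound `c`), the tie handling by average rank, and the function names are assumptions for illustration; the slide does not specify how ties were broken.

```python
def average_ranks(values, reverse=False):
    """Ranks starting at 1; ties share the average rank.
    reverse=True gives rank 1 to the largest value."""
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=reverse)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def metric_ranking(scores):
    """Step 2: rank submissions per compound, average over compounds,
    then rank submissions by that mean (lower mean rank is better)."""
    n_sub = len(scores)
    n_comp = len(scores[0])
    mean_rank = [0.0] * n_sub
    for c in range(n_comp):
        per_compound = average_ranks([scores[s][c] for s in range(n_sub)], reverse=True)
        for s in range(n_sub):
            mean_rank[s] += per_compound[s] / n_comp
    return average_ranks(mean_rank)

def final_ranking(pearson_scores, pci_scores):
    """Step 3: average the two per-metric rankings for each submission."""
    by_pearson = metric_ranking(pearson_scores)
    by_pci = metric_ranking(pci_scores)
    return [(a + b) / 2.0 for a, b in zip(by_pearson, by_pci)]

# Hypothetical metric values: 3 submissions, 2 compounds.
pearson_scores = [[0.9, 0.8], [0.5, 0.6], [0.1, 0.2]]
pci_scores = [[0.7, 0.6], [0.8, 0.9], [0.5, 0.4]]
print(final_ranking(pearson_scores, pci_scores))
```

In this toy example the first two submissions each win one metric, so they tie on the averaged rank.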

21 Robustness (sampling) analysis
Verify if the rank is robust with respect to the compounds. For 10000 times:
1. randomly mask data for 10% of the compounds
2. re-compute the score
* one-sided Wilcoxon signed-rank test, FDR < 10^-10
[Figure legend: significantly* different; not significantly* different]
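A minimal sketch of the masking loop, assuming the score being recomputed is a submission's mean per-compound rank. Because submissions are ranked within each compound, masking compounds leaves the remaining per-compound ranks unchanged, so only the mean needs recomputing. The function name, the seed, and the rounding of the 10% are illustrative choices; the Wilcoxon signed-rank comparison of the resulting distributions is not implemented here.

```python
import random

def resampled_mean_ranks(per_compound_ranks, n_iter=10000, mask_fraction=0.10, seed=0):
    """per_compound_ranks[s][c] is the rank of submission s on compound c.

    Returns out[s], a list of n_iter mean ranks for submission s, each computed
    after masking a random mask_fraction of the compounds.
    """
    rng = random.Random(seed)
    n_sub = len(per_compound_ranks)
    n_comp = len(per_compound_ranks[0])
    n_keep = n_comp - round(mask_fraction * n_comp)
    if n_keep < 1:
        raise ValueError("mask_fraction leaves no compounds")
    out = [[] for _ in range(n_sub)]
    for _ in range(n_iter):
        kept = rng.sample(range(n_comp), n_keep)   # same mask for every submission
        for s in range(n_sub):
            total = sum(per_compound_ranks[s][c] for c in kept)
            out[s].append(total / n_keep)
    return out

# Hypothetical ranks: 2 submissions, 10 compounds, 200 resamples.
ranks = [[1, 1, 1, 2, 1, 1, 2, 1, 1, 1],
         [2, 2, 2, 1, 2, 2, 1, 2, 2, 2]]
dist = resampled_mean_ranks(ranks, n_iter=200)
print(min(dist[0]), max(dist[0]))
```

Using the same mask for all submissions in each iteration keeps the resampled scores paired, which is what a signed-rank test between two submissions would need.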

22 Wisdom of crowds

23 Subchallenge 2: Data
Predict population-level parameters of cytotoxicity of chemicals based on structural attributes of compounds.
Training:
- EC10 data for 106 compounds and 620 cell lines
- Chemical attributes for 106 chemicals
Final test:
- Chemical attributes for 50 chemicals
- Predict: population-level parameters for 50 compounds
 - Median EC10 values
 - Interquantile distance (q95-q05)
[Diagram: DATA is the cytotoxicity (EC10) matrix of Cell line 1 ... Cell line 620 by Train Comp 1 ... Train Comp 106; PREDICTIONS are (a) median EC10 and (b) interquantile distance (q95-q05) for Test Comp 1 ... Test Comp 50]

24 Predictions vs null hypothesis

25 Scoring
1. For each submission, compute the following metrics for each predicted population parameter (median, q95-q05):
 a. Pearson correlation
 b. Spearman correlation
2. For each metric:
 a. Rank submissions for each population parameter
 b. Compute the mean ranking over the 2 population parameters
 c. Rank submissions according to the mean ranking
3. The final ranking is obtained by averaging the rankings obtained with the 2 different metrics
[Diagram: each of Submission 1 ... Submission M provides predicted Median EC10 and Q95-Q05 for Test Comp 1 ... Test Comp 50; per-parameter ranks are collected in a table with a mean ranking column]

26 Robustness (sampling) analysis
Verify if the rank is robust with respect to the compounds. For 10000 times:
1. randomly mask data for 10% of the compounds
2. re-compute the score
* one-sided Wilcoxon signed-rank test, FDR < 10^-10
[Figure legend: significantly* different; not significantly* different]

27 Wisdom of crowds

28 Conclusions
- Predictive models of toxicity were developed by participants, with a great response from the community:
 - Subchallenge 1: 99 submissions from 34 teams
 - Subchallenge 2: 85 submissions from 24 teams
- Predictions were scored against a hidden test set
- Top performing models provide significant predictions that could be useful to assess health risk
- Best performers are robustly ranked first, but there are other models which provide good predictions
 - wisdom of crowds: the aggregation of predictions can increase overall performance
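One simple way to aggregate predictions is an element-wise mean across submissions, sketched below. The slides do not say which aggregation rule was used for the wisdom-of-crowds analysis, so the unweighted mean and the function name here are assumptions.

```python
def aggregate_predictions(submissions):
    """Element-wise mean; submissions[s][k] is prediction k from submission s."""
    if not submissions:
        raise ValueError("need at least one submission")
    n = len(submissions[0])
    if any(len(s) != n for s in submissions):
        raise ValueError("all submissions must predict the same items")
    return [sum(s[k] for s in submissions) / len(submissions) for k in range(n)]

# Hypothetical predicted EC10 values from three teams for four cell lines.
teams = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 1.5, 3.5, 3.5],
    [0.5, 2.5, 2.5, 4.5],
]
print(aggregate_predictions(teams))
```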

29 Rebecca Boyles, Allen Dearry, Raymond Tice, Nour Abdo, Paul Gallins, Oksana Kosyk, Ivan Rusyn, Jessica Wignall, Fred Wright, Kai Xia, Yi-Hui Zhou, Christopher Austin, Ruili Huang, Anton Simeonov, Menghang Xia, Chris Bare, Stephen Friend, Mike Kellen, Lara Mangravite, Thea Norman, Federica Eduati, Michael Menden, Kely Norel, Julio Saez-Rodriguez, Gustavo Stolovitzky
213 participants

