A novel approach to analysis of primary HTS data: Compound Set Enrichment
Thibault Varin, Ansgar Schuffenhauer
Gubler, H., Parker, C., Zhang, JH., Raman, P., Ertl, P.
INTRODUCTION | Compound Set Enrichment | Thibault Varin | 10/07/14
Introduction: Active series identification
- Can relevant SAR be extracted from primary HTS data?
- Are activity data binary or continuous?
Introduction: Active series identification
Hypothesis 1: Within primary HTS data, structure-activity relationships (SAR) are apparent and can be used to help select active compound classes.
Introduction: Are the activity data binary or continuous?
[Figure: activity of the compounds of Scaffold 1 and Scaffold 2; legend: active compound (binary), inactive compound (binary)]
- Binary activity:
  - 1 active / 5 inactives
  - Scaffold 1 = Scaffold 2
- Continuous activity: Scaffold 1 > Scaffold 2
Introduction: Are the activity data binary or continuous?
[Figure: activity with Threshold 1 and with Threshold 2; legend: active compound (binary), inactive compound (binary)]
- Binary scaffold activity differs depending on the threshold
Hypothesis 2: Methods based on an activity cut-off distort the activity information, leading to the incorrect assignment of active series of compounds.
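The threshold effect behind Hypothesis 2 can be illustrated with a small sketch. The activity values, the two cut-offs, and the helper names below are hypothetical, chosen only for illustration; they are not data from the screens discussed here.

```python
# Hypothetical activity values (e.g. % inhibition) for two scaffolds.
# These numbers are invented for illustration only.
scaffold_1 = [85.0, 48.0, 45.0, 44.0, 42.0, 40.0]
scaffold_2 = [85.0, 5.0, 4.0, 3.0, 2.0, 1.0]


def active_count(activities, threshold):
    """Number of compounds called active under a binary cut-off."""
    return sum(1 for a in activities if a >= threshold)


def mean(values):
    """Arithmetic mean, used here as a simple continuous summary."""
    return sum(values) / len(values)


# Two hypothetical cut-offs: the binary picture changes with the threshold.
for threshold in (50.0, 41.0):
    print(
        "threshold", threshold,
        "actives in scaffold 1:", active_count(scaffold_1, threshold),
        "actives in scaffold 2:", active_count(scaffold_2, threshold),
    )

# The continuous data separate the two scaffolds without any cut-off.
print("mean activity:", mean(scaffold_1), mean(scaffold_2))
```

With the higher cut-off both scaffolds show one active out of six and look identical; with the lower cut-off they differ, although the underlying measurements have not changed.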
METHODS
Methods: The Scaffold Tree classification
Reference: A. Schuffenhauer, P. Ertl et al., "The Scaffold Tree – Visualization of the Scaffold Universe by Hierarchical Scaffold Classification", J. Chem. Inf. Model., 47, 47, 2007.
Methods: Datasets
- PubChem Annotation from CRC
- Simulation of the primary screening data
- 7 PubChem bioassays
  - Ranging from 9389 to compounds
  - Ranging from 0.03 to 26.29% of active compounds
- Hypothesis 1
Methods: Single hypothesis test, summary procedure
1. State the null and the alternative hypotheses
   - H0: "the scaffold is inactive"
   - H1: "the scaffold is active"
2. Specify a significance level: α = 0.01
3. Compute the test statistic and the p-value
   → p-value = probability, if the scaffold is inactive (H0), of observing data at least as extreme as those measured
4. Decision step:
   - p-value > α: H0 is not rejected
   - p-value < α: H0 is rejected and H1 is accepted: "the scaffold is active"
Methods: The KS and the Binomial hypothesis tests
[Figure: bioassay and scaffold activity distributions; actives and inactives]
- Continuous data: KS test
  H0: there is no difference between the activity distribution of the compounds having the scaffold S3-2 and the background distribution.
- Binary data: Binomial test
  H0: there is no difference between the proportion of active compounds among the compounds having the scaffold S3-2 and the proportion of active compounds in the full dataset.
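The two tests can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in this work. It assumes that higher values mean more activity, that the tests are one-sided (the slides do not say), and that the KS p-value is taken from the large-sample approximation exp(-2 D^2 nm/(n+m)). The function names and the example data are hypothetical.

```python
import math
from bisect import bisect_right


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k: actives observed among the n compounds sharing the scaffold.
    p: proportion of actives in the full dataset (the null proportion).
    """
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )


def ks_one_sided(sample, background):
    """One-sided two-sample KS test (assumption: higher value = more active).

    Returns (D, p_value), where D is the largest amount by which the
    background ECDF exceeds the scaffold ECDF, and p_value uses the
    large-sample approximation exp(-2 * D^2 * n*m / (n + m)).
    """
    s, b = sorted(sample), sorted(background)
    n, m = len(s), len(b)
    d = 0.0
    # Both ECDFs are step functions, so the supremum is reached at a data point.
    for x in s + b:
        f_sample = bisect_right(s, x) / n
        f_background = bisect_right(b, x) / m
        d = max(d, f_background - f_sample)
    p_value = math.exp(-2.0 * d * d * n * m / (n + m))
    return d, p_value


# Hypothetical data: 100 background compounds and 4 compounds of one scaffold.
background = [float(v) for v in range(100)]
scaffold = [60.0, 70.0, 80.0, 90.0]

d, p_ks = ks_one_sided(scaffold, background)
p_binom = binomial_upper_tail(k=2, n=4, p=0.10)
print("KS statistic:", d, "KS p-value:", p_ks)
print("binomial p-value:", p_binom)
```

The asymptotic KS p-value is only a rough guide for scaffolds with few compounds; an exact small-sample distribution would be preferable there.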
Methods: Multiple hypothesis tests, Bonferroni correction
- Problem of false positives
  - α = probability of identifying an inactive scaffold as active (for each test performed)
  - With 100 inactive scaffolds, the probability of identifying at least one "active" by chance is 63% (1 - (1 - 0.01)^100 ≈ 0.63)
- Bonferroni correction
  - Suggests testing each scaffold at a critical significance level of α = 0.01 / number of scaffolds
  - Makes the assumption that the individual tests are independent
- Each level of the Scaffold Tree was tested separately
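The arithmetic behind the 63% figure and the corrected significance level can be checked with a short sketch. The function names, scaffold identifiers, and p-values below are hypothetical; the correction is applied to one Scaffold Tree level at a time, as described above.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def significant_scaffolds(p_values, alpha=0.01):
    """Scaffolds of one tree level whose p-value passes the corrected level.

    p_values: dict mapping a scaffold identifier to its p-value.
    """
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [scaffold for scaffold, p in p_values.items() if p < cutoff]


# 100 inactive scaffolds tested at alpha = 0.01: about 0.63.
print(familywise_error(0.01, 100))

# Hypothetical p-values for three scaffolds at the same tree level.
level_p_values = {"S1": 1e-6, "S2": 0.004, "S3": 0.2}
print(significant_scaffolds(level_p_values))
```

In this example the corrected level is 0.01 / 3, so a scaffold with p = 0.004 would pass an uncorrected test at α = 0.01 but not the Bonferroni-corrected one.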
Methods: Determining the activity of classes
[Workflow diagram; labels: Hypo 1, Hypo 2, scaffold activity evaluation, multiple hypothesis test correction (Bonferroni), comparison of results]
RESULTS
Results: Comparison of KSP and BTP predictions
[Table: number of active classes per bioassay. Columns: Bioassay | Total (KSP, BTP, Δ) | BPCA significantly actives (BPCA, KSP, BTP, Δ) | BPCA non significantly actives (KSP, BTP, Δ). Rows: Hydroxysteroid dehydrogenase, Caspase, PK, Luciferase, Luciferase, CYP450 2C, CYP450 3A]
With:
- KSP: KS Prediction
- BTP: Binomial Threshold Prediction
- Δ: KSP - BTP
- BPCA: Binomial PubChem Annotation

- Both KSP and BTP retrieve the BPCA significantly active classes
- Number of active classes: KSP > BTP
- Most of the new KSP active classes are not BPCA significantly active
Results: KSP significantly active scaffolds that are in PubChem inactives
[Figure: compound activity (PubChem Annotation); legend: Active, Inconclusive, Inactive, WA; annotations on the plot: "Inconclusives?"]
Results: Prioritize nodes instead of individual scaffolds
[Figure: scaffold activity (KS Prediction / Bonferroni); legend: non significantly active, significantly active]
Results: Visualization tool (Peter Ertl)
CONCLUSION
Conclusion: Compound Set Enrichment
- Validation of initial hypotheses
- A method to mine HTS data and identify active series of compounds
  - Chemical classification: Scaffold Tree
  - Statistical analysis: Kolmogorov-Smirnov hypothesis test
  - Multiple hypothesis test correction: Bonferroni correction
- Use all primary data
- No activity cut-off
- Identification of new active scaffolds not necessarily represented by very active compounds (latent hits) during the primary screen
Acknowledgments: With many thanks to
- Primary mentor: Ansgar Schuffenhauer
- Scientific advisers: Christian Parker, Hanspeter Gubler, Ji-Hu Zhang, Peter Ertl, Edgar Jacoby
- Help: MLI group
- Fellowship: Education office
- Discussions: Martin Beibel, Sebastian Bergling, Meir Glick, Alain Dietrich, Marie-Cecile Didiot
Questions?