Machine Intelligence in the Design of Safer Chemicals
By Gilles Klopman, Chemistry Department, Case Western Reserve University, and MULTICASE Inc., Cleveland, Ohio 44122, U.S.A.
Wednesday, May 19, 2010. QSAR - Duluth
Scientific Problems in Computational Toxicology
– Molecules appear to have widely DIVERSE structures
– Difficulty of identifying the specific reason for toxic activity
– Target is unknown or poorly understood, e.g. carcinogenicity, reproductive toxicity, neurotoxicity, etc.
– Hard to identify the responsible functionality
– Leading or competing metabolism
– Solubility and transport properties
Structure-Activity Challenges
[Diagram: Learning set of experimental data -> Model builder -> Predicted values]
– Is the learning set domain well defined?
– Is the test set in the same domain as the learning set?
– Continuous descriptors, fragment descriptors, molecular geometry
– How good is the model?
– How good are the predictions?
– Congeneric or diverse?
– How reliable is the data?
META and CASE
– Fragment-based methodology
– Knowledge-based systems
MCASE
– Builds expert systems for each type of activity
– Use of biophores in mechanistic studies
– Activity of new chemicals can be predicted
[Diagram: set of biophores, MCASE, META]
BIOPHORES
Linear chains of 2 to 10 heavy atoms.
– May include a side chain.
– Example: -CH2-N-CH2-
– Remark: may combine to form larger groups.
Expanded fragments.
– Documented valid variation of the Biophore.
– Example: -CH2-N-CH -
2D distance fragments.
– Distance between heteroatoms [7.8 Å]
– Example: OH Cl
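The first biophore type, linear chains of 2 to 10 heavy atoms, can be illustrated with a small path enumeration. This is only a sketch and not the MULTICASE fragment generator: it assumes a toy heavy-atom graph given as atom labels plus bonds, ignores bond orders, side chains and ring handling, and the function name `linear_fragments` and the example molecule are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Fragment = Tuple[str, ...]


def linear_fragments(labels: List[str],
                     bonds: List[Tuple[int, int]],
                     min_len: int = 2,
                     max_len: int = 10) -> Set[Fragment]:
    """Enumerate unique linear chains of min_len..max_len heavy atoms."""
    # Build an adjacency list from the bond list.
    adjacency: Dict[int, List[int]] = {i: [] for i in range(len(labels))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    found: Set[Fragment] = set()

    def extend(path: List[int]) -> None:
        if len(path) >= min_len:
            chain = tuple(labels[i] for i in path)
            # A chain read forwards or backwards is the same fragment.
            found.add(min(chain, chain[::-1]))
        if len(path) == max_len:
            return
        for neighbour in adjacency[path[-1]]:
            if neighbour not in path:  # never revisit an atom
                extend(path + [neighbour])

    for start in adjacency:
        extend([start])
    return found


# Hypothetical toy molecule: CH3-CH2-NH-CH3 (heavy atoms only).
labels = ["CH3", "CH2", "NH", "CH3"]
bonds = [(0, 1), (1, 2), (2, 3)]
fragments = linear_fragments(labels, bonds)
for fragment in sorted(fragments, key=lambda f: (len(f), f)):
    print("-".join(fragment))
```

A real implementation would also have to encode bond types, the optional side chain and the expanded and distance fragments listed on the slide; none of that is attempted here.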
MODULATORS
1. Linear fragments similar to Biophores
2. Partition coefficient
3. Water solubility
4. HOMO/LUMO energies
5. Charge densities located on atoms of the Biophore
6. Location, with respect to the Biophore, of:
– Hydrogen donors
– Hydrogen acceptors
– Lipophilic centers
Test of o-Nitroaniline
The molecule contains the Biophore (nr.occ.= 1):
NH2 -C - \\ C -
*** 15 out of the known 19 molecules ( 79%) containing such Biophore are ChrAb active with an average activity of 30. (conf.level=100%)
Constant is 40.0
Log partition coeff.= 1.01 ; LogP contribution is -1.5
** The probability that this molecule is ChrAb active is 80.0% **
** The activity is predicted to be MODERATE, activity= 39
[Structure drawing of o-nitroaniline showing the NO2 and NH2 groups]
FDA Collaborators in Developing New Modules
FDA's Center for Drug Evaluation and Research (CDER), Office of Pharmaceutical Science (OPS)
Informatics and Computational Safety Analysis Staff (OPS / ICSAS)
– Joseph F. Contrera, Ph.D.
– Edwin J. Matthews, Ph.D.*
– R. Daniel Benz, Ph.D.
– Naomi L. Kruhlak, Ph.D.
Office of Testing and Research (OPS / OTR)
– James L. Weaver, Ph.D.
– Joseph P. Hanig, Ph.D.
– P. Scott Pine
[Slide footer: PhRMA, FDA / CDER / OPS / ICSAS, E. Matthews]
Example of MC4PC Inactivity Prediction
A2H- mutagenic - all classes, salmo.typh.overall assa
Now Processing... Linalool (Molecule # 1)
This molecule already exists as nr.3955 of activity 10 (CASE units), under the name: Linalool
MC calculated Water Solubility is: 0.83 [in log(mol/m**3)]
MC calculated Log(Octanol/Water) Partition Coef.is: 2.97
Molecule satisfies the rule of 5 (bioavailable)
MC Calculated Human Intestinal Absorption is: 90.4%
MCASE-3 Prediction
** The molecule does not contain any known Biophore **
** The probability that this molecule is mutagenic is 21% **
*** The molecule is known to be INACTIVE ***
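The "rule of 5" line in this output refers to Lipinski's bioavailability criteria. The sketch below shows one common reading of that rule (at most one violation allowed); how MC4PC itself evaluates it is not stated on the slide, so the function names, the `max_violations` default and the data class are assumptions. The logP value is the one printed above; the molecular weight and the donor and acceptor counts for linalool (C10H18O, one hydroxyl group) are supplied here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mol_weight: float   # g/mol
    log_p: float        # log octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def rule_of_five_violations(p: Properties) -> int:
    """Count how many of Lipinski's four thresholds are exceeded."""
    checks = [
        p.mol_weight > 500,
        p.log_p > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ]
    return sum(checks)


def satisfies_rule_of_five(p: Properties, max_violations: int = 1) -> bool:
    # Assumption: a molecule passes with at most one violation.
    return rule_of_five_violations(p) <= max_violations


# logP taken from the slide; the other values are illustrative inputs.
linalool = Properties(mol_weight=154.25, log_p=2.97, h_donors=1, h_acceptors=1)
print(rule_of_five_violations(linalool), satisfies_rule_of_five(linalool))
```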
MCASE Prediction
AF1- Carcinogenici- Male Rats (non-proprietary) #
MULTICASE-3 Prediction
The molecule contains the Biophore (nr.occ.= 1):
O -c. =cH -c =c. -
The ICSAS Alert Index for this Biophore is 325
*** 5 out of the known 5 molecules (100%) containing such Biophore are Carcinogenici with an average activity of 65. (conf.level= 97%)
*** QSAR Contribution : Constant is
** The following Modulator(s) is/are also present:
( 1) OH -c = Inactivating
( 2) CO -c. = Inactivating
Electronegativity = ; Its contribution is 1.82
Hard/Soft index is = 0.43 ; Its contribution is
** Total projected QSAR activity (in CASE units) is equal to
CONCLUSIONS:
** The projected Carcinogenici activity is 56.0 CASE units **
** The compound is predicted to be VERY active **
** The probability that this molecule is Carcinogenici is 85% **
The Molecules containing fragment : O -c. =cH -c =c. - are :
1 in molecule 20 (0.91) acronycine of activity 70
1 in molecule 27 (0.78) aflatoxin B1 of activity 75
1 in molecule 80 (0.61) aristolochic acid I of activity 69
1 in molecule 81 (0.61) Aristolochic acid II of activity 69
1 in molecule 1011 (0.82) sterigmatocystin of activity 43
List of molecules
Example of molecules seen as being outside the domain of validity of the module
Molecule: Acefurtiamine
AZ2- MUTAGENS - public domain and pharma #
MC calculated Water Solubility is: 0.07 [in log(mol/m**3)]
MC calculated Log(Water/Octanol) Partition Coef.is: 0.90
Molecule satisfies the rule of 5 (bioavailable)
MC Calculated Human Intestinal Absorption is: 93.7
The molecule is a detergent
** WARNING ** The following functionalities are UNKNOWN to me:
*** S -CO -c =
*** CO -S -C =
*** COH-N -C =
MULTICASE-3 Prediction
** The molecule does not contain any known Biophore
** The results are QUESTIONABLE due to the presence of UNKNOWN functionalities **
CONCLUSIONS:
*** The probability that this molecule is mutagenic is 21%
** The results are INCONCLUSIVE
ICSAS METHOD CONCLUSIONS:
ICSAS Method Expert Call: Negative
Coverage: 3w
Example of molecules falling outside the domain of applicability
MOLECULE 2. isopropyl-methyl-sulphonate
ADH- AMES test - #
MC calculated Water Solubility is: 2.24 [in log(mol/m**3)]
MC calculated Log(Water/Octanol) Partition Coef.is: 1.29
Molecule satisfies the rule of 5 (bioavailable)
MC Calculated Human Intestinal Absorption is: 90.3
** WARNING ** The following functionalities are UNKNOWN to me:
*** O -SO2-CH3
MULTICASE-3 Prediction
** The molecule does not contain any known Biophore
** The results are QUESTIONABLE due to the presence of UNKNOWN functionalities **
CONCLUSIONS:
*** The probability that this molecule is AMES test Pos is 20 % **
** The results are INCONCLUSIVE
The molecule now falls within the domain of applicability
MOLECULE 2, isopropyl-methyl-sulphonate
AGH- Ames test - ADH updated added 222 cmpds - #
MC calculated Water Solubility is: 2.24 [in log(mol/m**3)]
MC calculated Log(Water/Octanol) Partition Coef.is: 1.29
Molecule satisfies the rule of 5 (bioavailable)
MC Calculated Human Intestinal Absorption is: 90.3
MULTICASE-3 Prediction
The molecule contains the Biophore (nr.occ.= 1):
O -SO2-CH3
The ICSAS Alert Index for this Biophore is 156
*** 4 out of the known 4 molecules (100) containing such Biophore are Ames test Pos with an average activity of 39.
The probability that this fragment is linked to activity is: 75
** The conf.level in this biophore is not very good **
*** QSAR Contribution : Constant is
** Total projected QSAR activity x, (x = response )
CONCLUSIONS:
** The projected mutagenic activity is 39.0 CASE units **
** The activity is predicted to be MODERATE **
*** The probability that this molecule is Ames test Pos is 83% **
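The domain-of-applicability behaviour shown for isopropyl-methyl-sulphonate, flagged while O -SO2-CH3 was unknown and predicted once the module had been updated, can be sketched as a simple set comparison. This is an assumption about the general idea, not the MULTICASE algorithm: `unknown_functionalities` is a hypothetical helper, and the fragment `CH3-CH-` is an invented placeholder for whatever else the module already knows.

```python
from typing import Iterable, List, Set


def unknown_functionalities(test_fragments: Iterable[str],
                            known_fragments: Set[str]) -> List[str]:
    """Return the fragments of the test molecule never seen in the learning set."""
    return sorted(set(test_fragments) - known_fragments)


def report(test_fragments: Iterable[str], known_fragments: Set[str]) -> str:
    unknown = unknown_functionalities(test_fragments, known_fragments)
    if unknown:
        # Mirrors the WARNING / INCONCLUSIVE wording of the program output.
        return "INCONCLUSIVE, unknown functionalities: " + ", ".join(unknown)
    return "within domain of applicability"


# Fragments of the test molecule (the second one is taken from the slide).
test_fragments = ["CH3-CH-", "O-SO2-CH3"]

original_module = {"CH3-CH-"}                        # before the update
updated_module = original_module | {"O-SO2-CH3"}     # after adding compounds

print(report(test_fragments, original_module))
print(report(test_fragments, updated_module))
```

Keeping the check separate from the prediction step makes it possible to report a probability and, independently, whether that probability should be trusted.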
Improving AMES mutagenicity model
[Bar chart: % Concordance, % Sensitivity, % Specificity and % Coverage, axis running from 75.00% to 95.00%. Modules named in the legend: AZ2 (Ames Salmonella: public domain and proprietary pharmaceuticals), A2I, A2H; Ames Salmonella: GENETOX + NTP + FDA (original); Ames Salmonella: GENETOX + NTP + FDA + Zeiger (rebuilt). An accompanying table lists Actives, Inactives and Total; the values were not preserved in the transcript.]
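The four quantities named on this chart are usually computed from the counts of a validation run. The sketch below uses the standard textbook definitions, which the slide itself does not spell out; the function name and the example counts are hypothetical and are not the values behind the chart.

```python
from typing import Dict


def validation_metrics(tp: int, fn: int, tn: int, fp: int,
                       not_covered: int = 0) -> Dict[str, float]:
    """Concordance, sensitivity, specificity and coverage from raw counts.

    tp/fn: experimentally active molecules predicted active/inactive
    tn/fp: experimentally inactive molecules predicted inactive/active
    not_covered: molecules for which no conclusive prediction was made
    """
    predicted = tp + fn + tn + fp
    total = predicted + not_covered
    return {
        "concordance": (tp + tn) / predicted,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "coverage": predicted / total,
    }


# Hypothetical counts, for illustration only.
metrics = validation_metrics(tp=80, fn=20, tn=90, fp=10, not_covered=50)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```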
I-CASE, the Next Generation of the MULTICASE Program
Why do we need a new generation of the program?
– We need to be able to generate expert system modules from larger databases. Currently the largest database we can handle is 8000 molecules.
– Most computers will have multi-core processors in the future. The current MULTICASE program does not support multi-core computation, and supporting it can bring significant performance benefits.
– Biological properties are being reported for chemicals that contain elements outside the organic set (C, H, N, S, O, P, Cl, Br, F, I). We needed support for other elements as well.
– We needed enhancements for quantitative prediction of activity, mainly for databases with continuous activity.
– We needed enhancements so that the program can generate customized reports when chemicals are tested for activity.
What is I-CASE? A new program designed to replace MULTICASE
– Support for building models from larger databases (on the order of 50,000 molecules and larger), which enabled us to build models for anticancer activity from the NCI-60 cancer screenings.
– Support for computers with multi-core processors. The result is a significant increase in speed for building models, running cross-validations and testing molecules.
– A totally revamped user interface that supports multitasking. We can now build several models, run several validations and run several tests simultaneously. The interface has several other new features as well.
– The new program supports all the elements of the periodic table.
– Several new continuous descriptors for building local QSARs for each biophore, e.g. molar refractivity, vapor pressure and E-state descriptors. The result is an improvement in quantitative predictions.
– Improvements in the models for calculating water solubility, human intestinal absorption and logP.
– Addition of several pharmacokinetic models is in progress, e.g. blood-brain barrier permeation and plasma protein binding.
– New interface features for generating customized test reports for chemicals.
EPA Against FDA
– FDA data is expressed per unit molecule (per mole), while the majority of EPA data is expressed per unit of bulk (per milligram or per gram).
– Toxicity is generally considered to be the result of the presence of certain molecular structural features. It is not a bulk property; therefore molecules declared INACTIVE in EPA testing results can still be toxic at the molecular level, whereas ACTIVE molecules in EPA data can be considered active.
– FDA data is suitable for QSAR modeling, whereas a certain portion of EPA data cannot be used in developing QSAR models.
– Most of the time EPA data cannot be transformed to a per-mole basis, because the reported activity is of the ACTIVE/INACTIVE type.
– FDA data makes sense from a QSAR point of view, while EPA data makes sense from a practical point of view.
– Mixing FDA and EPA data in a single model results in deterioration of the model quality.
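The per-mole versus per-mass distinction is easy to see with a small worked conversion. The sketch below assumes a numeric dose in mg/kg is available, which, as noted above, is often not the case for EPA ACTIVE/INACTIVE calls; the function name and the two example compounds are hypothetical.

```python
def mg_per_kg_to_mmol_per_kg(dose_mg_per_kg: float,
                             mol_weight_g_per_mol: float) -> float:
    """Convert a mass-based dose to a molar dose.

    mg divided by g/mol gives mmol, so no extra scaling factor is needed.
    """
    if mol_weight_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return dose_mg_per_kg / mol_weight_g_per_mol


# Two hypothetical compounds tested at the same bulk dose of 100 mg/kg.
light = mg_per_kg_to_mmol_per_kg(100.0, 100.0)   # MW 100 g/mol
heavy = mg_per_kg_to_mmol_per_kg(100.0, 500.0)   # MW 500 g/mol
print(f"light compound: {light:.2f} mmol/kg")
print(f"heavy compound: {heavy:.2f} mmol/kg")
```

At the same bulk dose the heavier compound is delivered at one fifth of the molar dose, so an INACTIVE call at a fixed mg/kg limit says less about a heavy molecule than about a light one.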