
1 Data mining of toxic chemicals & database-based toxicity prediction
Jiansuo Wang & Luhua Lai
Institute of Physical Chemistry, Peking University, P. R. China

2 Our goal: to introduce risk assessment of chemicals in the early stage of drug design.
Candidates generated with computer aid → initial screening of chemical toxicity → leads that are a bit "safer"

3 Characteristics and difficulties of the problem in computer-aided drug design, beyond the inherent complexity of toxicity: the virtually generated molecules are numerous; molecules designed as drugs may be structurally diverse; and little or no information other than the chemical structure is available for them.

4 How can the bioactivity (toxicity) of a large number of molecules be evaluated from their structure alone? In terms of structure-activity rules: expert systems. In terms of statistical models: QSAR (Qualitative/Quantitative Structure-Activity Relationship).

5 How can rules or models of toxic chemicals be extracted from a database of toxic chemicals to aid toxicity assessment? Applied to the RTECS database (Registry of Toxic Effects of Chemical Substances): structural features of toxic chemicals (statistical analysis, similarity analysis, cluster analysis); QSAR models of toxic chemicals; QSAR combined with cluster analysis.

6 What characterizes toxic chemicals? Molecular weight, atomic composition, and the groups and rings present in the molecules. An initial database analysis shows no distinct difference between toxic chemicals and drugs in these basic molecular features.

7 Classification of toxic substances according to their modes of action: 1) substances that exhibit extremes of acidity, basicity, dehydrating ability, or oxidizing power; 2) reactive substances containing functional groups prone to react with biomolecules in a damaging way; 3) heavy metals; 4) lipid-soluble compounds; 5) binding species that bond reversibly or irreversibly to biomolecules and alter their normal function; and so on. (Manahan, S. E., Toxicological Chemistry)

8 Structure patterns. Considering the integrity of molecules and the specificity of the modes of action between molecules, a molecular structure pattern is defined as a template comprising a given framework and a set of given groups. It represents the common structural features shared by a series of molecules that are likely to act in a toxicologically similar manner.
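The template definition above can be made concrete as a small data structure. This is a minimal sketch, assuming each molecule has already been dissected into a framework label and a set of group labels; the labels (`F1`, `g1`, ...) and the "extra groups are tolerated" matching rule are hypothetical choices for illustration, not the authors' encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    framework: str              # label of the molecular framework (hypothetical)
    groups: frozenset           # labels of the attached groups (hypothetical)


@dataclass(frozen=True)
class StructurePattern:
    framework: str              # the given framework of the template
    required_groups: frozenset  # the given groups of the template

    def matches(self, mol: Molecule) -> bool:
        # Assumed rule: same framework and every required group present;
        # additional groups on the molecule are tolerated.
        return mol.framework == self.framework and self.required_groups <= mol.groups


# Hypothetical labels, for illustration only.
molecules = [
    Molecule("m1", "F1", frozenset({"g1", "g2", "g3"})),
    Molecule("m2", "F1", frozenset({"g1"})),
    Molecule("m3", "F2", frozenset({"g1", "g2"})),
]
pattern = StructurePattern("F1", frozenset({"g1", "g2"}))
members = [m.name for m in molecules if pattern.matches(m)]
print(members)  # ['m1']
```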

9 How are molecular structure patterns obtained? Dissect the molecules → similarity comparison → cluster analysis.
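The slide does not say which similarity measure or clustering method was used, so the following is only a sketch of the "similarity comparison → cluster analysis" step under stated assumptions: molecules are represented as fingerprint bit sets, similarity is the Tanimoto coefficient, and clustering is a greedy leader (sphere-exclusion style) scheme with a 0.6 threshold. The leader of each cluster plays the role of a representative molecule.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fps: dict, threshold: float = 0.6) -> list:
    """Greedy leader clustering: a molecule joins the first cluster whose
    leader it resembles at >= threshold, otherwise it founds a new cluster."""
    clusters = []   # lists of molecule names
    leaders = []    # fingerprint of each cluster's leader
    for name, fp in fps.items():
        for members, leader_fp in zip(clusters, leaders):
            if tanimoto(fp, leader_fp) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
            leaders.append(fp)
    return clusters


# Hypothetical fingerprints.
fps = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 4, 5}, "c": {7, 8, 9}, "d": {7, 8, 9, 10}}
print(leader_cluster(fps))  # [['a', 'b'], ['c', 'd']]
```

Leader clustering is order-dependent; it is chosen here only because it is short and makes one pass over the data, which suits the large, diverse sets the slides describe.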

10 Do structure patterns really exist in the database of toxic chemicals? The underlying idea of structure patterns: specificity of the modes of action, and structural correlation among molecules with similar modes of action. How structure patterns would manifest in the database: (a) structural similarity among the molecules converges as the database grows from small to large (parallel analysis); (b) a sufficiently large database has some predictive power for new toxic chemicals (cross analysis).

11 Curve of coverage rate vs. database size, with 0.6 as the similarity limit. The figure shows that, for a given prediction accuracy, the predictive ability of the database converges once the database is large enough. This indicates that structure patterns may exist in the database.
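A coverage-vs-size curve of this kind can be sketched as follows. The exact definition of "coverage rate" is not given on the slide; here it is assumed to be the fraction of query (test) molecules that have at least one database molecule with Tanimoto similarity >= 0.6, computed for growing prefixes of the database. The fingerprints are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def coverage_rate(database: list, queries: list, limit: float = 0.6) -> float:
    """Assumed definition: share of queries with a database neighbour at >= limit."""
    if not queries:
        return 0.0
    covered = sum(1 for q in queries if any(tanimoto(q, d) >= limit for d in database))
    return covered / len(queries)


def coverage_curve(database: list, queries: list, sizes: list, limit: float = 0.6) -> list:
    """Coverage rate for the first n database entries, for each n in sizes."""
    return [(n, coverage_rate(database[:n], queries, limit)) for n in sizes]


database = [{1, 2, 3, 4}, {10, 11, 12}, {20, 21, 22, 23}, {1, 2, 3, 5}]
queries = [{1, 2, 3, 4, 5}, {10, 11, 12, 13}, {30, 31}]
curve = coverage_curve(database, queries, sizes=[1, 2, 4])
print(curve)  # coverage rises from 1/3 to 2/3, then stays flat
```

A flattening curve, as in the toy output, is the behaviour the slide interprets as convergence of the database's predictive ability.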

12 The findings of the systematic analysis of the database indicate that structure patterns are not only likely to exist, but that it is also necessary and feasible to search for them.

13 Representative molecules of some structure patterns of toxic chemicals

14 Data mining of toxic chemicals: QSAR combined with structure patterns. A two-step strategy for exploring noncongeneric toxic chemicals in the database: screening of structure patterns, then derivation of a detailed relationship between structure and activity. First, an efficient similarity comparison is used to screen chemical patterns for further QSAR analysis. Then, a QSAR study of each structure pattern provides an estimate of the activity as well as the detailed relationship between activity and structure.

15 An example of the implementation. Select one structure pattern; its representative molecule is WLN: T6VMVMV FHJ F2Y&1 F2U1 (CAS number 115-44-6). By computing molecular similarity, we retrieve 189 chemicals from RTECS whose similarity to the representative molecule is higher than 0.6. By species observed and route of exposure, the chemicals fall mainly into five major categories. CoMFA models relating structure to LD50 values were built for three series of these chemicals.
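The screening step on this slide (similarity above 0.6 to a representative molecule, then grouping by species and route of exposure) can be sketched as below. This assumes Tanimoto similarity on fingerprint bit sets and a hypothetical `ToxRecord` layout; the records, identifiers, and LD50 values are invented for illustration and are not RTECS data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToxRecord:
    name: str          # hypothetical identifier
    fingerprint: set   # on-bits of a structural fingerprint (assumed representation)
    species: str       # species observed
    route: str         # route of exposure
    ld50: float        # reported LD50 value (hypothetical numbers below)


def tanimoto(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def screen_and_group(records, reference_fp, limit=0.6):
    """Keep records more similar than `limit` to the reference molecule and
    group them by (species, route), one series per group for QSAR."""
    groups = defaultdict(list)
    for rec in records:
        if tanimoto(rec.fingerprint, reference_fp) > limit:  # "higher than 0.6"
            groups[(rec.species, rec.route)].append(rec)
    return dict(groups)


reference = {1, 2, 3, 4, 5}
records = [
    ToxRecord("chem-1", {1, 2, 3, 4, 5, 6}, "rabbit", "intravenous", 50.0),
    ToxRecord("chem-2", {1, 2, 3, 4}, "rat", "oral", 300.0),
    ToxRecord("chem-3", {1, 2, 9}, "rabbit", "intravenous", 80.0),
    ToxRecord("chem-4", {1, 2, 3, 4, 5}, "rabbit", "intravenous", 40.0),
]
series = screen_and_group(records, reference)
for key, recs in series.items():
    print(key, [r.name for r in recs])
```

Each resulting series, such as the rabbit-intravenous one, would then be the input to a separate CoMFA model.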

16 Rabbit, intravenous: cross-validated and final-fit CoMFA analysis with five components; 37 chemicals, q² = 0.608, r² = 0.981, F = 323.
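The statistics on this slide follow the usual QSAR definitions: q² = 1 − PRESS/SS from leave-one-out cross-validation, r² = 1 − RSS/SS from the final fit, and F = (r²/k) / ((1 − r²)/(n − k − 1)) for k components and n compounds. The sketch below illustrates these formulas with a one-descriptor least-squares line standing in for the PLS regression that CoMFA actually uses; the example data are hypothetical. Plugging the reported values (r² = 0.981, n = 37, k = 5) into the F formula gives about 320, consistent with the reported F = 323 once the rounding of r² to three decimals is taken into account.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for CoMFA's PLS model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def total_ss(ys):
    my = sum(ys) / len(ys)
    return sum((y - my) ** 2 for y in ys)


def r2_fit(xs, ys):
    """r^2 of the final (non-cross-validated) fit: 1 - RSS / SS."""
    a, b = fit_line(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - rss / total_ss(ys)


def q2_loo(xs, ys):
    """Leave-one-out q^2 = 1 - PRESS / SS; each compound is predicted by a
    model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1.0 - press / total_ss(ys)


def f_statistic(r2, n, k):
    """F for a model with k components fitted to n compounds."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]   # hypothetical activities
print(round(r2_fit(xs, ys), 3), round(q2_loo(xs, ys), 3))
print(round(f_statistic(0.981, 37, 5)))  # 320
```

As in the slide, q² is lower than r²: for least squares, each leave-one-out residual is at least as large as the corresponding fitted residual.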

17 Rabbit, intravenous: contour map of the final CoMFA model. For steric effects, more bulk near green and less bulk near yellow increases activity; for electrostatic effects, more positive charge near blue and more negative charge near red makes molecules more active.

18 The performance of the overall procedure demonstrates that such a stepwise scheme is feasible and effective for mining a database of toxic chemicals. The scheme takes account of the structural diversity of toxic chemicals, and it is a compromise between speed and accuracy.

19 dbToxPre: database-based toxicity predictor of chemicals. Inquiry molecule + database of toxic chemicals → ShapeAnal → structure-related set → (a) field-based similarity analysis → close molecules & similarity-activity; (b) flexible CoMFA analysis → CoMFA model & activity prediction.

20 dbToxPre. The program consists of four main parts: 1) fast and efficient clustering selection of molecules based on molecular shape; 2) field-based similarity computation of molecular structures within a shape cluster; 3) flexible CoMFA analysis of molecules within a shape cluster; 4) a database of toxic chemicals suited to this procedure. Characteristics of the program: fast, efficient, and dynamically coupled with the database.

21 ShapeAnal: fast and efficient shape analysis of molecules. Inquiry molecule → marking of atoms in the molecule → structure description (dimension, ring systems, relative orientation of ring-system atoms) → alignment of molecular shapes → structure-related set.

22 Molecular field concept: continuous property fields around a molecule, produced by its atoms. Similarity analysis of molecular fields (Carbó index); Comparative Molecular Field Analysis (CoMFA).
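The Carbó index compares two fields P_A and P_B as C_AB = ∫P_A P_B dv / sqrt(∫P_A² dv · ∫P_B² dv). The sketch below evaluates it on a discrete grid, assuming a simple Coulomb-like potential Σ q/r from point charges as the field; real CoMFA-style fields use probe atoms and steric terms, and the toy charges and coordinates here are hypothetical. Both molecules are assumed to be already aligned in the same frame.

```python
import math
from itertools import product


def coulomb_field(atoms, grid, min_dist=0.5):
    """Potential sum(q / r) at each grid point; distances are floored at
    min_dist to avoid the singularity when a grid point sits on an atom."""
    values = []
    for point in grid:
        v = 0.0
        for x, y, z, q in atoms:
            r = max(math.dist(point, (x, y, z)), min_dist)
            v += q / r
        values.append(v)
    return values


def carbo_index(pa, pb):
    """Discrete Carbó index: sum(a*b) / sqrt(sum(a^2) * sum(b^2)), in [-1, 1]."""
    num = sum(a * b for a, b in zip(pa, pb))
    den = math.sqrt(sum(a * a for a in pa) * sum(b * b for b in pb))
    return num / den


grid = list(product(range(-3, 4), repeat=3))          # 7 x 7 x 7 points
mol_a = [(0.0, 0.0, 0.0, -0.4), (1.2, 0.0, 0.0, 0.4)]  # (x, y, z, charge)
mol_b = [(0.0, 0.0, 0.0, -0.4), (1.2, 0.0, 0.0, 0.2), (0.0, 1.2, 0.0, 0.2)]
fa = coulomb_field(mol_a, grid)
fb = coulomb_field(mol_b, grid)
print(round(carbo_index(fa, fb), 3))
```

Because the index is normalised, it is insensitive to the overall magnitude of the fields: scaling one field by a positive constant leaves C_AB unchanged.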

23 Evolutionary algorithm, accounting for molecular flexibility. Community/population: the structure-related set. Species/chromosome: a combination of settings of the rotatable single bonds in a molecule. Convergence: a steady state of the sorting. Procedure: parent generation → congeneric mutation → child generation.
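The evolutionary loop on this slide can be sketched as follows, under several assumptions: a chromosome is a list of torsion angles (degrees), one per rotatable single bond; "congeneric mutation" is approximated by perturbing one torsion at random (the slide does not define the operator); and the fitness is a hypothetical stand-in (closeness to a target set of torsions) for the field-based similarity the program actually scores. Convergence is taken as the ranking of the population staying unchanged for several generations, mirroring "steady state of sorting".

```python
import random


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fitness(torsions, target):
    # Hypothetical stand-in for similarity to a reference molecule:
    # 0 is a perfect match, more negative is worse.
    return -sum(angle_diff(t, g) for t, g in zip(torsions, target))


def mutate(torsions, rng, step=30.0):
    """Assumed mutation: perturb one randomly chosen torsion."""
    child = list(torsions)
    i = rng.randrange(len(child))
    child[i] = (child[i] + rng.uniform(-step, step)) % 360.0
    return child


def evolve(target, pop_size=20, patience=15, max_gen=500, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 360.0) for _ in target] for _ in range(pop_size)]
    previous_ranking, stable = None, 0
    for _ in range(max_gen):
        children = [mutate(p, rng) for p in population]   # parent -> child generation
        pool = sorted(population + children,
                      key=lambda c: fitness(c, target), reverse=True)
        population = pool[:pop_size]                      # keep the fittest
        ranking = [round(fitness(c, target), 1) for c in population]
        stable = stable + 1 if ranking == previous_ranking else 0
        previous_ranking = ranking
        if stable >= patience:                            # steady state of sorting
            break
    best = population[0]
    return best, fitness(best, target)


best, score = evolve([60.0, 180.0, 300.0])
print([round(t, 1) for t in best], round(score, 2))
```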

24 Fast field-based similarity analysis: structure-related set → molecular alignment based on framework shape → EA: conformation mutation & similarity comparison → similarity analysis & activity prediction.

25 Flexible CoMFA: the CoMFA procedure, extended to consider conformational flexibility and a hydrophobic field. Structure-related set → molecular alignment based on framework shape → EA: conformation mutation & CoMFA → CoMFA model & activity prediction.

26 Rebuilding the toxic-chemical database: selection of a DBMS, and a sketch of the design of Toxdb. Michael Stonebraker's classification:
simple data, no queries → file system;
complex data, no queries → object-oriented DBMS;
simple data with queries → relational DBMS;
complex data with queries → object-relational DBMS: PostgreSQL.

27 Database-based toxicity prediction of chemicals assesses the activity of the inquiry molecule using a series of related molecules from the database. The purposes: to make the best use of the available knowledge about related chemicals, and to offset the uncertainty of any single data point through mutual correction among a series of molecules.
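The idea of assessing an inquiry molecule from a series of related database molecules, with errors in single values partly cancelling, can be illustrated with a similarity-weighted average over the nearest neighbours. This is a swapped-in, simpler technique than the CoMFA route used by dbToxPre: it assumes Tanimoto similarity on fingerprint bit sets, a 0.6 limit, at most k = 5 neighbours, and hypothetical activity values (for example on a log scale).

```python
def tanimoto(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def predict_activity(query_fp, database, limit=0.6, k=5):
    """database: list of (fingerprint, activity) pairs.
    Returns (similarity-weighted mean activity, neighbours used), or
    (None, []) when no related molecule reaches the similarity limit."""
    scored = [(tanimoto(query_fp, fp), act) for fp, act in database]
    related = [pair for pair in scored if pair[0] >= limit]
    related.sort(key=lambda pair: pair[0], reverse=True)
    related = related[:k]
    if not related:
        return None, []
    total = sum(sim for sim, _ in related)
    prediction = sum(sim * act for sim, act in related) / total
    return prediction, related


# Hypothetical database entries: (fingerprint, activity).
db = [({1, 2, 3, 4}, 2.0), ({1, 2, 3, 5}, 2.4), ({1, 2, 3, 4, 5}, 2.2), ({9, 10}, 5.0)]
pred, neighbours = predict_activity({1, 2, 3, 4, 5}, db)
print(round(pred, 3), len(neighbours))  # 2.2 3
```

Returning no prediction when nothing passes the limit reflects the premise of the slide: the estimate is only as good as the related chemicals the database holds.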

28 Conclusion
Initial analysis of the toxic-chemical database confirms the concept of structure patterns of toxic chemicals. QSAR combined with structure patterns provides an alternative way to explore noncongeneric toxic chemicals in the database. Database-based toxicity prediction dynamically combines with the database to assist risk assessment of chemicals.
Data mining & toxicity prediction: visualization computation. Storage computation: effective computation integrated into reasonable data storage.
References & papers:
1. Jiansuo Wang, Luhua Lai, Youqi Tang. Data mining of toxic chemicals: structural patterns and QSAR. J. Mol. Modelling, 1999, 252-262.
2. Jiansuo Wang, Luhua Lai, Youqi Tang. Predictive toxicology of toxic chemicals and database mining. Chinese Science Bulletin, 2000, 45(12), 1093-1097.
3. Jiansuo Wang, Luhua Lai, Youqi Tang. Structural features of toxic chemicals for specific toxicity. J. Chem. Inf. Comput. Sci., 1999, 39(6), 1173-1189.

29 Acknowledgements: Prof. Luhua Lai, Prof. Youqi Tang, Mr. Alan Gelberg …

