Data mining of toxic chemicals & database-based toxicity prediction. Jiansuo Wang & Luhua Lai, Institute of Physical Chemistry, Peking University, P. R. China

Presentation transcript:

1 Data mining of toxic chemicals & database-based toxicity prediction. Jiansuo Wang & Luhua Lai, Institute of Physical Chemistry, Peking University, P. R. China

2 Our goal: to introduce risk assessment of chemicals at an early stage of drug design. Workflow: computer-generated candidates -> initial screening of chemical toxicity -> leads that are a bit "safer".

3 Characteristics and difficulties of the problem in computer-aided drug design, beyond the complexity of toxicity itself: The virtually generated molecules are numerous. The molecules designed as drugs may be structurally diverse. Little or no information other than the chemical structure is available for the molecules.

4 How can the bioactivity (toxicity) of a large number of molecules be evaluated from their structure alone? In terms of structure-activity rules: expert systems. In terms of statistical models: SAR/QSAR (qualitative/quantitative structure-activity relationships).

5 How can rules and models be extracted from a database of toxic chemicals (here, RTECS) to aid toxicity assessment? Structural features of toxic chemicals: statistical analysis, similarity analysis, cluster analysis. QSAR models of toxic chemicals: QSAR combined with cluster analysis.

6 What features characterize toxic chemicals? Molecular weight; atomic composition of the molecules; groups in the molecules; rings in the molecules. An initial database analysis shows no distinct difference between toxic chemicals and drugs in these basic molecular features.

7 Classification of toxic substances according to mode of action: 1) substances that exhibit extremes of acidity, basicity, dehydrating ability, or oxidizing power; 2) reactive substances that contain functional groups prone to react with biomolecules in a damaging way; 3) heavy metals; 4) lipid-soluble compounds; 5) species that bind, reversibly or irreversibly, to biomolecules and alter their normal function; and so on. (Manahan, S. E., Toxicological Chemistry)

8 Structure patterns. Considering the molecule as a whole and the specificity of the modes of action among molecules, a molecular structure pattern is defined as a template comprising a given framework and some given groups. It represents the common structural features shared by a series of molecules that are likely to act in a toxicologically similar manner.

9 How to obtain molecular structure patterns? Dissect the molecules. Similarity comparison: cluster analysis.
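
The slide does not say which similarity measure or clustering method was used. As a minimal sketch only, the idea of grouping dissected molecules by similarity can be illustrated with a leader-style clustering over fragment sets. The Tanimoto coefficient, the fragment names, and the function names below are assumptions chosen for illustration; the 0.6 limit is borrowed from the similarity limit quoted elsewhere in the talk.

```python
from typing import FrozenSet, List


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient between two sets of structural fragments."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(molecules: List[FrozenSet[str]],
                   limit: float = 0.6) -> List[List[int]]:
    """Put each molecule in the first cluster whose leader is similar enough.

    A molecule that matches no existing leader starts a new cluster and
    becomes its leader (a candidate 'representative molecule').
    """
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for i, mol in enumerate(molecules):
        for c, leader in enumerate(leaders):
            if tanimoto(mol, molecules[leader]) >= limit:
                clusters[c].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


# Hypothetical fragment sets obtained by dissecting four molecules.
molecules = [
    frozenset({"ring6", "C=O", "N-H", "ethyl"}),
    frozenset({"ring6", "C=O", "N-H", "allyl"}),
    frozenset({"phenyl", "NO2"}),
    frozenset({"phenyl", "NO2", "Cl"}),
]
clusters = leader_cluster(molecules)
```

Each resulting cluster is a candidate structure pattern; the result of a leader-style scheme depends on input order, which a production method would need to address.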

10 Do structure patterns really exist in the database of toxic chemicals? The underlying idea of structure patterns: specificity of the modes of action, and structural correlation among molecules with a similar mode of action. How structure patterns show up in the database: Parallel analysis: the structural similarity among the molecules in the database converges as the database grows from small to large. Cross analysis: a large enough database has, to a certain degree, predictive power for new toxic chemicals.

11 Curve of coverage rate vs. database size, with 0.6 as the similarity limit. The figure shows that, for a given prediction accuracy, the predictive ability of the database tends to converge once the database is large enough. This indicates that structure patterns may exist in the database.
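
The transcript does not define the coverage rate. One plausible reading, used here purely as an assumption, is the fraction of query molecules that have at least one database molecule with similarity at or above the limit. The sketch below computes that quantity for growing database sizes, using a Tanimoto coefficient over hypothetical fragment sets as a stand-in similarity.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient between two fragment sets (assumed similarity)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def coverage_rate(database: Sequence[FrozenSet[str]],
                  queries: Sequence[FrozenSet[str]],
                  limit: float = 0.6) -> float:
    """Fraction of queries with at least one database neighbor >= limit."""
    if not queries:
        return 0.0
    covered = sum(
        1 for q in queries
        if any(tanimoto(q, d) >= limit for d in database)
    )
    return covered / len(queries)


# Hypothetical data: a three-molecule database and four query molecules.
database = [
    frozenset({"a", "b", "c"}),
    frozenset({"d", "e", "f"}),
    frozenset({"g", "h", "i"}),
]
queries = [
    frozenset({"a", "b", "c", "x"}),  # similarity 0.75 to database[0]
    frozenset({"d", "e", "f"}),       # identical to database[1]
    frozenset({"g", "h"}),            # similarity 2/3 to database[2]
    frozenset({"y", "z"}),            # nothing similar
]

# Coverage as the database grows from one to three molecules.
curve: List[float] = [
    coverage_rate(database[:n], queries) for n in range(1, len(database) + 1)
]
```

Under this definition the curve can only rise or stay flat as molecules are added, so a plateau suggests that further additions mostly repeat patterns that are already present.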

12 The findings of the systematic analysis of the database indicate that structure patterns are likely to exist, and that searching for them is both necessary and feasible.

13 The representative molecules of some structure patterns of toxic chemicals

14 Data mining of toxic chemicals: QSAR combined with structure patterns. A two-step strategy for exploring noncongeneric toxic chemicals in the database: screening for structure patterns, then deriving a detailed relationship between structure and activity. First, an efficient similarity comparison screens chemical patterns for further QSAR analysis. Then, a QSAR study of each structure pattern provides an estimate of the activity as well as the detailed relationship between activity and structure.

15 An example of the implementation. Select one structure pattern; its representative molecule is (WLN: T6VMVMV FHJ F2Y&1 F2U1; CAS-number: ). Computing molecular similarity yields 189 chemicals from the RTECS database whose similarity to the representative molecule is higher than 0.6. According to the species observed and the route of exposure, the chemicals fall mainly into five major categories. CoMFA models relating structure to LD50 values are built for three series of chemicals.

16 Rabbit-intravenous: cross-validated and final-fit CoMFA analysis with five components; 37 chemicals, q² = 0.608, r² = 0.981, F = 323.
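
For readers unfamiliar with the two statistics: r² measures the fit of the final model, while the cross-validated q² is conventionally 1 - PRESS/SS, where PRESS sums the squared errors of predictions made for each compound by a model trained without it. The sketch below illustrates the difference with leave-one-out cross-validation. It is an assumption-laden stand-in: CoMFA uses partial least squares on field descriptors, whereas this example fits an ordinary least-squares line to a single made-up descriptor, and the numbers are toy data, not the 37 chemicals from the slide.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x: List[float], y: List[float]) -> float:
    """Conventional r^2 of the model fitted to all data."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(x: List[float], y: List[float]) -> float:
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Toy data: one hypothetical descriptor and a made-up activity.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
r2 = r_squared(descriptor, activity)
q2 = q_squared_loo(descriptor, activity)
```

For a least-squares fit, q² never exceeds r²; a large gap between the two, as in many CoMFA studies, is a reminder that the fitted r² overstates predictive power.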

17 Rabbit-intravenous: contour map of the final CoMFA model. For steric effects, more bulk near green and less bulk near yellow favors higher activity; for electrostatic effects, more positive charge near blue and more negative charge near red makes the molecules more active.

18 The performance of the overall procedure demonstrates that such a stepwise scheme is a feasible and effective way to mine a database of toxic chemicals. The scheme takes account of the structural diversity of toxic chemicals. The scheme is a compromise between speed and accuracy.

19 dbToxPre: database-based toxicity predictor of chemicals. Flowchart: database of toxic chemicals + inquiry molecule -> ShapeAnal -> structure-related set. From the structure-related set: field-based similarity analysis -> close molecules & similarity-activity; flexible CoMFA analysis -> CoMFA model & activity prediction.

20 dbToxPre. The program comprises four main parts: 1) a fast and efficient clustering selection of molecules based on molecular shape; 2) field-based similarity computation of molecular structure based on the shape cluster; 3) flexible CoMFA analysis of molecules based on the shape cluster; 4) a database of toxic chemicals suited to this procedure. Characteristics of the program: fast; efficient; dynamically combined with the database.

21 ShapeAnal: fast & efficient shape analysis of molecules. Inquiry molecule -> marking of atoms in the molecule -> structure description (dimension, ring systems, relative orientation of ring-system atoms) -> alignment of molecule shapes -> structure-related set.

22 Molecular field concept: continuous property fields around the molecule, produced by the atoms of the molecule. Similarity analysis of molecular fields: Carbó index. Comparative Molecular Field Analysis: CoMFA.
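
The Carbó index compares two property fields P_A and P_B as R_AB = ∫P_A P_B dv / sqrt(∫P_A² dv · ∫P_B² dv), which is a normalized overlap between the fields. As a minimal sketch, assuming both fields have already been sampled on the same grid after alignment (the grid values and function name below are hypothetical, not taken from dbToxPre), the integrals reduce to sums:

```python
import math
from typing import Sequence


def carbo_index(field_a: Sequence[float], field_b: Sequence[float]) -> float:
    """Carbó similarity index for two fields sampled on the same grid.

    The result lies in [-1, 1] and depends on the shape of the fields,
    not on their overall magnitude.
    """
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same grid")
    overlap = sum(a * b for a, b in zip(field_a, field_b))
    norm_a = math.sqrt(sum(a * a for a in field_a))
    norm_b = math.sqrt(sum(b * b for b in field_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a field that is zero everywhere has no defined index")
    return overlap / (norm_a * norm_b)


# Hypothetical electrostatic potentials at five shared grid points.
field_a = [0.8, 0.1, -0.4, -0.9, 0.2]
field_b = [0.7, 0.2, -0.5, -0.8, 0.1]
similarity = carbo_index(field_a, field_b)
```

Because the index is insensitive to overall magnitude, two fields with the same shape but different strength score 1.0, which is worth keeping in mind when relating similarity to activity.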

23 Evolutionary algorithm (EA), accounting for the flexibility of molecules. Community/population: the structure-related set. Species/chromosome: the combination of rotatable single bonds in a molecule. Convergence: steady state of sorting. Procedure: parent generation -> congeneric mutation -> child generation.
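
The slide gives only the outline of the evolutionary algorithm, so the following is a loose sketch under stated assumptions rather than the authors' procedure. A chromosome is taken to be a tuple of torsion angles, one per rotatable single bond; "steady state of sorting" is interpreted as the ranked population no longer changing for a number of generations; and the fitness function is a toy placeholder standing in for field similarity to a reference molecule. All names and parameter values are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Conformation = Tuple[float, ...]  # one torsion angle (degrees) per rotatable bond


def mutate(conf: Conformation, step: float, rng: random.Random) -> Conformation:
    """Perturb one randomly chosen torsion angle by up to +/- step degrees."""
    i = rng.randrange(len(conf))
    angles = list(conf)
    angles[i] = (angles[i] + rng.uniform(-step, step)) % 360.0
    return tuple(angles)


def evolve(n_bonds: int,
           fitness: Callable[[Conformation], float],
           pop_size: int = 20,
           step: float = 30.0,
           patience: int = 25,
           max_generations: int = 1000,
           seed: int = 0) -> Conformation:
    """Parent generation -> mutation -> child generation, keeping the best."""
    rng = random.Random(seed)
    parents: List[Conformation] = [
        tuple(rng.uniform(0.0, 360.0) for _ in range(n_bonds))
        for _ in range(pop_size)
    ]
    parents.sort(key=fitness, reverse=True)
    unchanged = 0
    for _ in range(max_generations):
        children = [mutate(p, step, rng) for p in parents]
        # Rank parents and children together and keep the fittest.
        ranked = sorted(parents + children, key=fitness, reverse=True)[:pop_size]
        unchanged = unchanged + 1 if ranked == parents else 0
        parents = ranked
        if unchanged >= patience:  # the sorted population has stopped changing
            break
    return parents[0]


# Toy fitness: closeness to a hypothetical target conformation.
TARGET: Conformation = (60.0, 180.0, 300.0)


def toy_fitness(conf: Conformation) -> float:
    def angular_diff(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return -sum(angular_diff(a, t) ** 2 for a, t in zip(conf, TARGET))


best = evolve(n_bonds=3, fitness=toy_fitness)
```

Keeping parents in the ranking makes the search elitist, so the best conformation found is never lost; in a real setting the fitness call would be the expensive field-similarity or CoMFA evaluation.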

24 Fast field-based similarity analysis. Structure-related set -> molecular alignment based on framework shape -> EA: conformation mutation & similarity comparison -> similarity analysis & activity prediction.

25 Flexible CoMFA. The CoMFA procedure: structure-related set -> molecular alignment based on framework shape -> EA: conformation mutation & CoMFA -> CoMFA model & activity prediction. Characteristics: considers conformational flexibility & the hydrophobic field.

26 Rebuilding of the toxic-chemical database. Selection of the DBMS; sketch map of the design of Toxdb. Michael Stonebraker's classification: simple data & no query: file system; complex data & no query: object-oriented DBMS; simple data & query: relational DBMS; complex data & query: object-relational DBMS (PostgreSQL).

27 Database-based toxicity prediction of chemicals assesses the activity of the inquiry molecule from a series of related molecules in the database. The purposes: to make the best possible use of the available knowledge about related chemicals, and to offset the uncertainty of any single data point through mutual correction among a series of molecules.
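
The slide does not state how the activities of the related molecules are combined. One simple possibility, shown here only as an assumption, is a similarity-weighted mean over the database molecules whose similarity to the inquiry molecule reaches the limit, so that no single measurement dominates the estimate. The function name, the 0.6 default, and the example values are illustrative.

```python
from typing import List, Sequence, Tuple


def predict_activity(neighbors: Sequence[Tuple[float, float]],
                     limit: float = 0.6) -> float:
    """Similarity-weighted mean activity of sufficiently similar molecules.

    neighbors holds (similarity to the inquiry molecule, measured activity)
    pairs; pairs below the similarity limit are ignored.
    """
    used: List[Tuple[float, float]] = [
        (s, a) for s, a in neighbors if s >= limit
    ]
    if not used:
        raise ValueError("no database molecule is similar enough to predict from")
    total_weight = sum(s for s, _ in used)
    return sum(s * a for s, a in used) / total_weight


# Hypothetical neighbors: (similarity, activity on some log scale).
neighbors = [(0.9, 2.0), (0.6, 3.0), (0.3, 10.0)]
estimate = predict_activity(neighbors)
```

Here the third neighbor is excluded by the limit, and the estimate falls between the two remaining activities, closer to that of the more similar molecule.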

28 Conclusion. Data mining & toxicity prediction: Initial analysis of the toxic-chemical database confirms the concept of structure patterns of toxic chemicals. QSAR combined with structure patterns provides an alternative way to explore noncongeneric toxic chemicals in the database. Database-based toxicity prediction dynamically draws on the database to assist risk assessment of chemicals. Visualization computation. Storage computation: effective computation integrated into reasonable data storage. References & papers: 1. Data mining of toxic chemicals: structural patterns and QSAR, Jiansuo Wang, Luhua Lai, Youqi Tang, J. Mol. Modelling, 1999. 2. Predictive toxicology of toxic chemicals and database mining, Jiansuo Wang, Luhua Lai, Youqi Tang, Chinese Science Bulletin, 2000, 45(12). 3. Structural features of toxic chemicals for specific toxicity, Jiansuo Wang, Luhua Lai, Youqi Tang, J. Chem. Inf. Comput. Sci., 1999, 39(6).

29 Acknowledgements: Prof. Luhua Lai, Prof. Youqi Tang, Mr. Alan Gelberg, ...