M.Pavan, P.Gramatica, F.Consolaro, V.Consonni, R.Todeschini

QSAR MODELLING OF THE AROMATIC AMINES MUTAGENICITY BY GENETIC ALGORITHM - VARIABLE SUBSET SELECTION

M.Pavan, P.Gramatica, F.Consolaro, V.Consonni, R.Todeschini
QSAR Research Unit, Dept. of Structural and Functional Biology, University of Insubria, Varese, Italy
e-mail: manuela.pavan@libero.it
Web: http://fisio.dipbsf.uninsubria.it/dbsf/qsar/QSAR.html

INTRODUCTION

Aromatic and heteroaromatic amines are widespread chemicals of considerable industrial and environmental relevance because they are carcinogenic to humans. QSAR studies have been used to develop models that estimate and predict mutagenicity by relating it to chemical structure. In mutagenicity QSAR applications, investigators focus either on the molecular determinants that discriminate between active and inactive chemicals, or on the modulators of the relative potency of the active chemicals.

The development of a model to predict mutagenicity requires a test system capable of providing reproducible and quantitative estimates of toxic activity; the most widely used is the bacterial test introduced by Ames, based on Salmonella typhimurium strains (TA98: frameshift mutation; TA100: base-substitution mutation).

The data set consists of 146 aromatic and heteroaromatic amines collected by Debnath et al. [1]; mutagenicity data are expressed as the mutation rate in log (revertants/nmol).

[1] A.K. Debnath et al., A QSAR Investigation of the Role of Hydrophobicity in Regulating Mutagenicity in the Ames Test: 1. Mutagenicity of Aromatic and Heteroaromatic Amines in Salmonella typhimurium TA98 and TA100, Environmental and Molecular Mutagenesis 19, 37-52 (1992).

MOLECULAR DESCRIPTORS

The molecular structure has been represented by a wide set of 657 molecular descriptors calculated by the software DRAGON [2]:

    constitutional descriptors (56)
    BCUT descriptors (7)
    walk counts (20)
    2D autocorrelation descriptors (96)
    Galvez index (21)
    aromaticity descriptors (4)
    charge descriptors (7)
    geometrical descriptors (18)
    molecular profiles (40)
    WHIM descriptors (99) [3]
    3D-MoRSE descriptors (160)
    empirical descriptors (3)
    topological descriptors (69)

[2] R.Todeschini and V.Consonni, DRAGON - Software for the calculation of molecular descriptors, version 1.0 for Windows (2000), Milano Chemometric and QSAR Research Group. Free download available at: http://www.disat.unimib.it/chm
[3] R.Todeschini and P.Gramatica, 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors, Quant.Struct.-Act.Relat., 16 (1997) 113-119.

TRAINING SET SELECTION

To assess the predictive capability of the models, both internal and external validations were performed. An experimental design based on the Todeschini-Marengo algorithm was used to select the most representative training set of amines: models were developed on the selected training set, and predictions were made for the molecules excluded from the model generation step (test set).

[Figure: validation scheme. Training set: models, internal validation (Q2LOO, Q2LMO, Zr, ERcv). Test set: predictions, external validation (Q2ext, ERext).]

[Figure: workflow. Data set (146 amines, 657 molecular descriptors, 2 experimental responses: TA98 and TA100); training set and test set; variable subset selection; regression models (OLS) and classification models (CART, K-NN, RDA, CP-ANN).]

REGRESSION MODELS

The mutagenic potency has been modelled by the Ordinary Least Squares (OLS) method on a subset of variables selected from the 657 molecular descriptors; the best subset of variables was selected by a Genetic Algorithm (GA-VSS).
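
As a rough illustration of how a candidate descriptor subset could be scored during variable subset selection, the sketch below fits an OLS model and computes its leave-one-out Q2 as a percentage. It is not the authors' GA-VSS implementation: the function names `q2_loo` and `subset_fitness`, the use of Q2LOO as the fitness value, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out Q2 (in percent) of an OLS model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares fit
    resid = y - A @ coef
    # Leverages = diagonal of the hat matrix H = A (A'A)^-1 A'
    h = np.diag(A @ np.linalg.pinv(A.T @ A) @ A.T)
    # For OLS, the leave-one-out residual equals resid / (1 - leverage)
    press = np.sum((resid / (1.0 - h)) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 100.0 * (1.0 - press / tss)

def subset_fitness(X, y, columns):
    """Score one candidate subset of descriptor columns (hypothetical fitness)."""
    return q2_loo(np.asarray(X, dtype=float)[:, list(columns)], y)

# Synthetic stand-in for a descriptor matrix: 60 compounds, 10 descriptors,
# with the response depending only on columns 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = 1.0 + 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=60)

print("informative subset:", round(subset_fitness(X, y, (0, 3)), 1))
print("uninformative subset:", round(subset_fitness(X, y, (1, 2)), 1))
```

A genetic algorithm would repeatedly propose subsets like `(0, 3)` and keep those with the highest score; the leverage shortcut avoids refitting the model once per left-out compound.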

The obtained models have been validated by leave-one-out (Q2LOO), leave-more-out (Q2LMO), y-scrambling (Zr) and an external test set (Q2ext), and show satisfactory predictive performance considering the uncertainty of the biological end-points.

LogTA98 = -3.98 + 2.40 MWC07 + 0.56 MATS7m + 2.44 Mor27u + 1.12 Mor15m
LogTA100 = 14.86 - 0.36 nN - 16.34 ATS2e + 12.43 ATS4p - 1.66 GATS4p

    Model    Training set    Test set    n.variables    Q2LOO    Q2LMO    Q2ext    R2
    TA98     60              39          4              76.6     75.9     69.0     80.3
    TA100    46              30          4              80.9     80.7     66.7     83.9

Descriptors in the TA98 model:
    MWC07 = molecular walk count of order 07
    MATS7m = Moran autocorrelation - lag 7 / weighted by atomic masses
    Mor27u = 3D-MoRSE - signal 27 / unweighted
    Mor15m = 3D-MoRSE - signal 15 / weighted by atomic masses

Descriptors in the TA100 model:
    nN = number of Nitrogen atoms
    ATS2e = Broto-Moreau autocorrelation of a topological structure - lag 2 / weighted by atomic Sanderson electronegativities
    ATS4p = Broto-Moreau autocorrelation of a topological structure - lag 4 / weighted by atomic polarizabilities
    GATS4p = Geary autocorrelation - lag 4 / weighted by atomic polarizabilities

CLASSIFICATION MODELS

Several classification methods (CART, K-NN, RDA and CP-ANN) have been applied to this data set in order to distinguish between activity classes. The selection of the best subset of variables has been carried out by the experimental design based on the Todeschini-Marengo algorithm. The models have been validated internally (ER) and externally (ERext). The classification models for TA100 showed worse predictive power than the TA98 ones and are therefore not presented here.

TA98 CART (classification and regression tree)
    Training set = 55 compounds; test set = 60 compounds
    Single split on BEPp1 < 3.77, with leaves labelled class 2 and class 1
    NOMER% = 14.5    ER% = 9.1    ERext% = 6.7
    1 = mutagenic compounds
    2 = non mutagenic compounds
    BEPp1 = positive eigenvalue n.1 / weighted by atomic polarizabilities

CONCLUSIONS

STRAIN TA98 (frameshift mutation)
    Relevant structural features: molecular dimension, molecular branching
    Aromatic amines: intercalary agents

STRAIN TA100 (base-substitution mutation)
    Relevant structural features: molecular dimension, electronic properties
    Aromatic amines: complex base-pair substitution mutation
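
The TA98 tree has a single split on BEPp1 at 3.77, so it can be written as a one-line rule, and the error rates quoted with it can be computed from class labels. The sketch below is only an illustration under stated assumptions: the transcript does not say which side of the split corresponds to which class, so the default mapping (values below the threshold go to class 2) is a guess exposed as parameters; NOMER is taken in its usual sense of the error made by assigning every compound to the largest class; and the descriptor values and labels are invented, not data from the poster.

```python
from collections import Counter

def stump_predict(bepp1, threshold=3.77, class_below=2, class_above=1):
    """One-split tree on BEPp1. The class on each side is an ASSUMPTION."""
    return class_below if bepp1 < threshold else class_above

def error_rate(true_classes, predicted_classes):
    """ER%: percentage of compounds assigned to the wrong class."""
    wrong = sum(t != p for t, p in zip(true_classes, predicted_classes))
    return 100.0 * wrong / len(true_classes)

def no_model_error_rate(true_classes):
    """NOMER%: error made by assigning every compound to the largest class."""
    largest = Counter(true_classes).most_common(1)[0][1]
    return 100.0 * (len(true_classes) - largest) / len(true_classes)

# Hypothetical BEPp1 values and classes (1 = mutagenic, 2 = non mutagenic).
bepp1_values = [3.2, 3.9, 4.1, 3.5, 3.8, 3.6]
true_classes = [2, 1, 1, 2, 1, 1]

predicted = [stump_predict(v) for v in bepp1_values]
print("predicted classes:", predicted)
print("ER%:", round(error_rate(true_classes, predicted), 1))
print("NOMER%:", round(no_model_error_rate(true_classes), 1))
```

Comparing ER% with NOMER% shows whether the rule does better than always predicting the most frequent class; the same `error_rate` function applied to test-set compounds would give ERext%.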