M.Pavan, P.Gramatica, F.Consolaro, V.Consonni, R.Todeschini


QSAR MODELLING OF THE AROMATIC AMINES MUTAGENICITY BY GENETIC ALGORITHM - VARIABLE SUBSET SELECTION
M.Pavan, P.Gramatica, F.Consolaro, V.Consonni, R.Todeschini
QSAR Research Unit, Dept. of Structural and Functional Biology, University of Insubria, Varese, Italy
Web:

INTRODUCTION
Aromatic and heteroaromatic amines are widespread chemicals of considerable industrial and environmental relevance, as they are carcinogenic to humans. QSAR studies have been used to develop models that estimate and predict mutagenicity by relating it to chemical structure. In mutagenicity QSAR applications, investigators focus either on the molecular determinants that discriminate between active and inactive chemicals, or on the modulators of the relative potency of the active chemicals.

Developing a model to predict mutagenicity requires a test system capable of providing reproducible and quantitative estimates of toxic activity. The most widely used is the bacterial test introduced by Ames, based on Salmonella typhimurium strains (TA98: frameshift mutation; TA100: base-substitution mutation).

The data set consists of 146 aromatic and heteroaromatic amines collected by Debnath et al. [1]; mutagenicity data are expressed as the mutation rate in log (revertants/nmol).

[1] A.K. Debnath et al., A QSAR Investigation of the Role of Hydrophobicity in Regulating Mutagenicity in the Ames Test: 1. Mutagenicity of Aromatic and Heteroaromatic Amines in Salmonella typhimurium TA98 and TA100, Environmental and Molecular Mutagenesis 19 (1992).
MOLECULAR DESCRIPTORS
The molecular structure has been represented by a wide set of 657 molecular descriptors calculated by the software DRAGON [2]:
- constitutional descriptors (56)
- BCUT descriptors (7)
- walk counts (20)
- 2D autocorrelation descriptors (96)
- Galvez index (21)
- aromaticity descriptors (4)
- charge descriptors (7)
- geometrical descriptors (18)
- molecular profiles (40)
- WHIM descriptors (99) [3]
- 3D-MoRSE descriptors (160)
- empirical descriptors (3)
- topological descriptors (69)

[2] R.Todeschini and V.Consonni, DRAGON - Software for the calculation of molecular descriptors, version 1.0 for Windows (2000), Milano Chemometric and QSAR Research Group. Free download available at:
[3] R.Todeschini and P.Gramatica, 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors, Quant.Struct.-Act.Relat., 16 (1997)

TRAINING SET SELECTION
To assess the predictive capability of the models, both internal and external validation were performed. An experimental design based on the Todeschini-Marengo algorithm was used to select the most representative training set of amines: models were developed on the selected training set, and predictions were made for the molecules excluded from the model generation step (test set).

[Diagram labels: Training set, Models, Internal validation (Q2LOO, Q2LMO, Zr, ERcv); Test set, Predictions, External validation (Q2ext, ERext)]

[Workflow diagram labels: DATASET: amines; Molecular descriptors (657 molecular descriptors); Experimental responses (2 responses: TA98, TA100); Training set, Test set; Variable subset selection; Regression models (OLS); Classification (CART, K-NN, RDA, CP-ANN)]

REGRESSION MODELS
The mutagenic potency has been modelled by the Ordinary Least Squares (OLS) method, using a subset of variables selected from the 657 molecular descriptors; the best subset of variables was selected by Genetic Algorithm - Variable Subset Selection (GA-VSS).
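The variable subset selection step can be illustrated with a minimal sketch. This is not the authors' GA-VSS implementation: the population size, the crossover and mutation rules, the use of leave-one-out Q2 (1 - PRESS/TSS) as the fitness function, and the synthetic data are all assumptions made for illustration, and the helper names (`ols_fit`, `q2_loo`, `ga_vss`) are hypothetical.

```python
import random

def ols_fit(X, y):
    """OLS coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("singular design matrix")
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def q2_loo(X, y):
    """Leave-one-out Q2 = 1 - PRESS / TSS."""
    n = len(y)
    mean_y = sum(y) / n
    press = 0.0
    for i in range(n):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / sum((t - mean_y) ** 2 for t in y)

def ga_vss(X, y, k, pop_size=20, generations=30, p_mut=0.2, seed=0):
    """Toy genetic algorithm searching for a k-variable subset (assumed design)."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    cache = {}

    def fitness(subset):
        if subset not in cache:
            cols = [[row[j] for j in subset] for row in X]
            try:
                cache[subset] = q2_loo(cols, y)
            except ValueError:
                cache[subset] = float("-inf")
        return cache[subset]

    pop = [tuple(sorted(rng.sample(range(n_vars), k))) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Elitism: keep the better half of the distinct subsets
        pop = sorted(set(pop), key=fitness, reverse=True)[:pop_size // 2]
        history.append(fitness(pop[0]))
        children = []
        while len(pop) + len(children) < pop_size:
            a, b = rng.sample(pop, 2) if len(pop) > 1 else (pop[0], pop[0])
            # Crossover: draw k variables from the parents' union
            child = set(rng.sample(sorted(set(a) | set(b)), k))
            if rng.random() < p_mut:
                # Mutation: swap one variable for a random one
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice([j for j in range(n_vars) if j not in child]))
            children.append(tuple(sorted(child)))
        pop = pop + children
    best = max(pop, key=fitness)
    return best, fitness(best), history

# Synthetic stand-in data: 30 "compounds", 8 "descriptors";
# the response depends on descriptors 0 and 3 plus noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(8)] for _ in range(30)]
y = [2.0 * r[0] - 3.0 * r[3] + data_rng.gauss(0, 0.1) for r in X]
best, q2, history = ga_vss(X, y, k=2)
print(best, round(q2, 3))
```

Because the better half of each generation is carried over unchanged, the best Q2 in `history` can never decrease. Scoring subsets by a cross-validated statistic rather than by R2 is a common way to discourage subsets that merely fit noise, though which statistic the original study optimised is not stated in this transcript.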
The obtained models have been validated by leave-one-out (Q2LOO), leave-more-out (Q2LMO), y-scrambling (Zr) and an external test set (Q2ext), and show satisfactory predictive performance, considering the uncertainty of the biological end-points.

TA98
LogTA98 = … MWC07 … MATS7m + 2.44 Mor27u + 1.12 Mor15m
Training set = 60 compounds; test set = 39 compounds

TA100
LogTA100 = … nN - 16.34 ATS2e + 12.43 ATS4p - 1.66 GATS4p
Training set = 46 compounds; test set = 30 compounds

Statistics reported for each model: n. variables, Q2LOO, Q2LMO, Q2ext, R2

nN = number of nitrogen atoms
ATS2e = Broto-Moreau autocorrelation of a topological structure - lag 2 / weighted by atomic Sanderson electronegativities
ATS4p = Broto-Moreau autocorrelation of a topological structure - lag 4 / weighted by atomic polarizabilities
GATS4p = Geary autocorrelation - lag 4 / weighted by atomic polarizabilities
MWC07 = molecular walk count of order 07
MATS7m = Moran autocorrelation - lag 7 / weighted by atomic masses
Mor27u = 3D-MoRSE - signal 27 / unweighted
Mor15m = 3D-MoRSE - signal 15 / weighted by atomic masses

CLASSIFICATION MODELS
Several classification methods (CART, K-NN, RDA and CP-ANN) have been applied to this data set in order to distinguish between activity classes. The best subset of variables was selected by the experimental design based on the Todeschini-Marengo algorithm. The models have been validated internally (ER) and externally (ERext). The classification models for TA100 showed worse predictive power than the TA98 ones and are therefore not presented here.

TA98 CART (classification and regression tree)
[Tree diagram: split on BEPp1 < 3.77; leaves labelled 2 and 1]
1 = mutagenic compounds
2 = non-mutagenic compounds
BEPp1 = positive eigenvalue n. 1 / weighted by atomic polarizabilities
Training set = 55 compounds; test set = 60 compounds
Statistics reported: NOMER%, ER%, ERext%

CONCLUSIONS
STRAIN TA98 (frameshift mutation): molecular dimension, molecular branching; aromatic amines, intercalating agents
STRAIN TA100 (base-substitution mutation): molecular dimension, electronic properties; aromatic amines complex, base-pair substitution mutation
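The error rates quoted for the TA98 classification tree can be made concrete with a small sketch. The split value 3.77 on BEPp1 comes from the slide; everything else is an assumption for illustration: which class sits on which branch, the BEPp1 values and class labels in the toy data, and the reading of NOMER% as the no-model error rate (the error obtained by assigning every compound to the largest class).

```python
from collections import Counter

THRESHOLD = 3.77  # split value on BEPp1 shown on the slide

def classify(bepp1, below_class=2, above_class=1):
    """One-split rule; the branch-to-class assignment is an assumption."""
    return below_class if bepp1 < THRESHOLD else above_class

def error_rate(true, pred):
    """Percentage of misclassified objects (ER%)."""
    wrong = sum(t != p for t, p in zip(true, pred))
    return 100.0 * wrong / len(true)

def no_model_error_rate(true):
    """Error rate when every object is assigned to the largest class."""
    largest = Counter(true).most_common(1)[0][1]
    return 100.0 * (len(true) - largest) / len(true)

# Hypothetical data, not taken from the study:
# 1 = mutagenic, 2 = non-mutagenic
bepp1 = [3.10, 3.50, 3.70, 3.80, 3.90, 4.10, 3.60, 4.00]
labels = [2, 2, 1, 1, 1, 1, 2, 2]

pred = [classify(v) for v in bepp1]
er = error_rate(labels, pred)
nomer = no_model_error_rate(labels)
print(f"ER% = {er:.1f}, NOMER% = {nomer:.1f}")
```

Applying `error_rate` to test-set compounds instead of training-set compounds would give the external counterpart (ERext%). A model is only useful if its error rate falls clearly below the no-model value.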

