P. Gramatica¹, H. Walter² and R. Altenburger²
¹ QSAR Research Unit - DBSF - University of Insubria - VARESE - ITALY
² UFZ Centre for Environmental Research - LEIPZIG - GERMANY

INTRODUCTION
Environmental exposure situations are often characterized by a multitude of heterogeneous chemicals with different mechanisms of action and types of effect. The EEC Priority List 1 (Council Directive 76/464/EEC) consists of heterogeneous environmental chemicals with mostly unknown or unspecific modes of action, so it was used to select components for the mixture experiments of the EEC PREDICT (Prediction and Assessment of the Aquatic Toxicity of Mixtures of Chemicals) project. A list of 202 compounds was studied for structural similarity in order to identify the most representative and most dissimilar chemicals, and to find an objective method of grouping them on the basis of their structural features.

STRUCTURAL DESCRIPTION OF COMPOUNDS
Molecular descriptors are the means by which the chemical information contained in a molecular structure is transformed and coded. Among the theoretical descriptors, the best known are those obtained simply from the molecular formula: molecular weight and count descriptors (1D descriptors, i.e. counts of bonds and of atoms of different kinds, presence or counts of functional groups and fragments, etc.). Graph-invariant descriptors (2D descriptors, including both topological and information indices) are obtained from the molecular topology. WHIM molecular descriptors [1] contain information about the whole 3D molecular structure in terms of size, symmetry and atom distribution. The WHIM indices are calculated from the (x, y, z) coordinates of a three-dimensional structure of the molecule, usually a minimum-energy conformation: 37 non-directional (global) and 66 directional WHIM descriptors are obtained. A complete set of about two hundred molecular descriptors was calculated [2].

[1] Todeschini R. and Gramatica P.; Quant. Struct.-Act. Relat. 1997, 16, …
[2] Todeschini R. and Consonni V.; DRAGON - Software for the calculation of molecular descriptors, Talete srl, Milan (Italy).

REGRESSION MODELS
QSAR models were developed by Ordinary Least Squares (OLS) regression. The best subset of variables for modelling the algal toxicity of the studied compounds was selected by a Genetic Algorithm (GA-VSS) approach, and the models were validated by leave-one-out (LOO) and leave-more-out (LMO) cross-validation and by scrambling of the responses.
R² = 78%  Q²LOO = 62.1%  Q²LMO = 61.7%  SDEP = …  SDEC = …

CONCLUSIONS
The chemometric analyses applied here proved very useful for ranking the studied chemicals according to their structural similarity or dissimilarity. For structurally heterogeneous compounds with unknown modes of action, the QSAR models obtained were not very satisfactory. Specific parameters, such as directional WHIMs, which can describe particular molecular features relevant to a specific mode of action, always play a relevant role in QSAR models for congeneric chemicals. As heterogeneity increases, so does the role of structural and topological descriptors, which account for general molecular features not related to a specific mode of action.

This work was supported by the Environment & Climate programme of the European Commission, Contract EV4-CT (PREDICT) and Contract EVK1-CT (BEAM).

CHEMOMETRIC METHODS
Several chemometric analyses were applied to the compounds (represented by their molecular descriptors) to group the most similar ones according to a multivariate structural approach, with the final aim of highlighting the structurally most dissimilar compounds.
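All of these analyses work on the same compounds × descriptors matrix. The sketch below is illustrative only: it assumes the descriptors are autoscaled first (the poster does not say how they were pre-treated) and computes the compound-compound distance matrix for the three metrics named for the hierarchical clustering (Euclidean, Manhattan, Pearson), taking the Pearson distance as 1 − r, which is one common convention. The function names `autoscale` and `distance_matrix` are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Centre each descriptor column and divide it by its standard deviation.

    Assumption: autoscaling is a common pre-treatment when descriptors have
    very different units; the poster does not state which one was used.
    Constant columns end up as all zeros instead of causing a division by zero.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    return (X - mean) / std


def distance_matrix(X, metric="euclidean"):
    """Pairwise distances between compounds (rows of X)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]      # shape (n, n, n_descriptors)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    if metric == "pearson":
        # 1 - r, with r the Pearson correlation between two descriptor profiles
        return 1.0 - np.corrcoef(X)
    raise ValueError(f"unknown metric: {metric}")
```

On the Euclidean matrix, `np.unravel_index(np.argmax(D), D.shape)` returns the indices of the most dissimilar pair of compounds.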
The analyses performed are:
- Hierarchical Cluster Analysis: hierarchical clustering was performed to find clusters of the studied compounds in the high-dimensional space defined by the molecular descriptors. Different distance metrics (Euclidean, Manhattan, Pearson) and different linkages (complete, average, single, etc.) were used and compared to find the best way to cluster these compounds.
- Principal Component Analysis (PCA): this analysis was used to condense the large number of descriptors into a few principal components. These components highlight the distribution of the compounds according to structure and show the similarity between compounds assigned to the same cluster.
- Kohonen Maps: an additional way of mapping similar compounds, using the so-called "self-organized topological feature maps", which project a multidimensional representation onto a toroidal two-dimensional map while preserving its topology. The position of the compounds in this map shows the degree of structural similarity among the EEC List 1 compounds.

The chemicals selected as the structurally most dissimilar compounds are:

N.   Substance                 Chemical class
1    atrazine                  Triazine
2    biphenyl                  Aromatic
3    chloral hydrate           Chlorinated aliphatics
4    2,4,5-trichlorophenol     Benzene derivative
5    fluoranthene              PAH
6    lindane                   HCH
7    naphthalene               PAH
8    parathion                 Organophosphate
9    phoxim                    Organophosphate
10   tributyltin chloride      Organotin
11   triphenyltin chloride     Organotin

R² = 93.9%  Q²LOO = 91.8%  Q²LMO = 87.5%  SDEP = …  SDEC = …
R² = 77%  Q²LOO = 69.7%  Q²LMO = 69.7%  SDEP = …  SDEC = …

nO is the number of O atoms and IDE is the mean information content on the distance equality.

HETEROGENEOUS COMPOUNDS

CONGENERIC COMPOUNDS (NITROBENZENES)
nOH is the number of OH groups, Sp is the sum of polarizabilities and Ds is the 3D-WHIM descriptor accounting for the global electrotopological distribution.
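The R², Q²LOO and Q²LMO figures reported for these models come from OLS fits with cross-validation. Below is a minimal numpy sketch of how such statistics are usually computed, assuming the common definitions Q² = 1 − PRESS/TSS, SDEC = √(RSS/n) and SDEP = √(PRESS/n); the poster reports values as percentages, while these functions return fractions. The leave-more-out settings (30% held out, averaged over 200 random splits), the function names and the requirement that `X` be a 2-D numpy array and `y` a 1-D numpy array are assumptions for illustration, not the authors' settings, and the genetic-algorithm variable selection is not shown.

```python
import numpy as np


def ols_fit(X, y):
    """OLS with intercept. X: (n_compounds, n_descriptors); returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]


def r2_and_sdec(X, y):
    """Fitting statistics: R^2 = 1 - RSS/TSS and SDEC = sqrt(RSS/n)."""
    resid = y - ols_predict(ols_fit(X, y), X)
    rss = np.sum(resid ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss, np.sqrt(rss / len(y))


def q2_loo(X, y):
    """Leave-one-out: Q^2 = 1 - PRESS/TSS and SDEP = sqrt(PRESS/n)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i              # drop compound i, refit, predict it
        coef = ols_fit(X[keep], y[keep])
        press += (y[i] - ols_predict(coef, X[i:i + 1])[0]) ** 2
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / tss, np.sqrt(press / n)


def q2_lmo(X, y, frac_out=0.3, n_iter=200, seed=0):
    """Leave-more-out: mean Q^2 over random splits (assumed settings)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = max(1, round(frac_out * n))
    scores = []
    for _ in range(n_iter):
        out = rng.choice(n, size=n_out, replace=False)
        keep = np.setdiff1d(np.arange(n), out)
        coef = ols_fit(X[keep], y[keep])
        press = np.sum((y[out] - ols_predict(coef, X[out])) ** 2)
        tss = np.sum((y[out] - y.mean()) ** 2)
        scores.append(1.0 - press / tss)
    return float(np.mean(scores))


def scrambled_r2(X, y, n_iter=100, seed=0):
    """Response scrambling: R^2 of models refitted on randomly permuted y.

    A meaningful model should score well above every scrambled one.
    """
    rng = np.random.default_rng(seed)
    return [r2_and_sdec(X, rng.permutation(y))[0] for _ in range(n_iter)]
```

For OLS the PRESS can never be smaller than the residual sum of squares, so Q² ≤ R² and SDEP ≥ SDEC always hold; a large gap between them, as in the heterogeneous-compound statistics, signals limited predictive power.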
HETEROGENEOUS + CONGENERIC COMPOUNDS
nO is the number of O atoms, IDDM is the mean information content on the distance degree magnitude, and E1e is a directional 3D-WHIM descriptor of atomic distribution weighted by electronegativity.

RANKING OF “EEC PRIORITY LIST 1” CHEMICALS FOR STRUCTURAL SIMILARITY AND MODELLING OF ALGAL TOXICITY D 12
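The hierarchical cluster analysis in this work compares several linkage rules on the same distances. The naive agglomerative sketch below (hypothetical function `agglomerate`, roughly cubic in the number of compounds and meant only to show what single, complete and average linkage mean) merges the two closest clusters until the requested number remains. For real data sets an optimized routine such as `scipy.cluster.hierarchy.linkage` with `method="single"`, `"complete"` or `"average"` would normally be used instead.

```python
import numpy as np


def agglomerate(D, n_clusters, linkage="average"):
    """Naive agglomerative hierarchical clustering on a distance matrix.

    D: symmetric (n x n) matrix of compound-compound distances.
    linkage: "single"   -> distance of the closest pair of members,
             "complete" -> distance of the farthest pair of members,
             "average"  -> mean of all between-cluster distances (UPGMA).
    Merges the two closest clusters until n_clusters remain and
    returns one integer cluster label per compound.
    """
    D = np.asarray(D, dtype=float)
    link = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    clusters = [[i] for i in range(len(D))]      # start: every compound alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(D[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    labels = np.empty(len(D), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

Single linkage tends to chain elongated groups together, complete linkage favours compact clusters and average linkage lies between the two, so comparing them is a reasonable check on how stable a grouping of the compounds is.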