Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Presentation transcript:

Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria decision making environmetrics experimental design artificial neural networks statistical process control Milano Chemometrics and QSAR Research Group Department of Environmental Sciences University of Milano - Bicocca P.za della Scienza, Milano (Italy) Website: michem.unimib.it/chm/

Roberto Todeschini Milano Chemometrics and QSAR Research Group An introduction to molecular descriptors and QSAR Iran - February 2009

The chemical data
• synthesis: chemistry produces the objects of its own study
• chemical composition: a unifying concept for all the experimental sciences
• molecular structure: one of the most fruitful scientific concepts of this century

Molecular structure: the concept of molecular structure is one of the richest of the last 140 years.

Molecular structure: the congenericity principle. The basic assumptions are that different molecular structures have different chemical properties and that similar molecular structures have similar properties.

Molecular structure: each molecular representation is a different way of looking at the molecular structure, and its chemical meaning is deeply embedded in the framework of chemical theories.

Some historical notes

Some historical notes, 1874, Wilhelm KÖRNER: "Studi sull'isomeria delle così dette sostanze aromatiche a sei atomi di carbonio" (Studies on the isomerism of the so-called aromatic substances with six carbon atoms), Gazzetta Chimica Italiana, vol. IV, p. 305.

Some historical notes, 1874, Wilhelm KÖRNER: to tell apart the observed di-substituted benzenes, he proposed classifying them as ortho-, meta-, and para-. These can be considered the first three molecular descriptors.

Some historical notes, 1964, Corwin HANSCH: building on these descriptors 90 years later, Corwin Hansch proposed the first QSAR approach, with lipophilic, electronic, and steric descriptors for ortho-, meta-, and para-substituents.

Molecular descriptors: definition. “The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment.” (R. Todeschini and V. Consonni)
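As a minimal sketch of this idea, assuming a plain molecular formula string (no parentheses, charges, or isotopes) as the symbolic representation, the snippet below applies a fixed algorithm to turn it into atom counts and a molecular weight, two simple 0D descriptors. The function names and the small atomic-weight table are illustrative choices, not part of any descriptor software.

```python
import re

# Approximate standard atomic weights (g/mol); only a few elements, for illustration
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}

def atom_counts(formula):
    """Parse a simple formula such as 'C6H5Cl' (no parentheses, charges or isotopes)."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts

def molecular_weight(formula):
    """0D descriptor: sum of the atomic weights of all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in atom_counts(formula).items())

print(atom_counts("C6H5Cl"))  # {'C': 6, 'H': 5, 'Cl': 1}
print(round(molecular_weight("C6H5Cl"), 2))
```

The same formula always yields the same numbers, whatever order the atoms were written in, which is the property that makes such a procedure a descriptor rather than a mere transcription.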

Molecular descriptors: some 3300 molecular descriptors

Molecular descriptors: [image of a composite creature: lion forefeet, eagle hind legs, scorpion tail, dragon head, bull body, unicorn, snake neck]

Molecular descriptors: several meanings in just one number (size, symmetry, branching, steric aspects, shape, cyclicity, hydrophobicity, H-bonding, electronic aspects, reactivity)

Molecular descriptors
• derived from: graph theory, discrete mathematics, physical chemistry, information theory, quantum chemistry, organic chemistry, differential topology, algebraic topology
• processed by: statistics, chemometrics, chemoinformatics
• applied in: QSAR/QSPR, medicinal chemistry, pharmacology, genomics, drug design, toxicology, proteomics, analytical chemistry, environmetrics, virtual screening, library searching

Molecular descriptors: molecule → molecular descriptors → physico-chemical properties, biological activities

Historical note: fragment approach. The biological activity of a molecule is the sum of its fragment properties: a common reference skeleton whose molecular properties are gradually modified by substituents. Congenericity principle: QSAR strategies can be applied ONLY to classes of similar compounds.

Historical note: Hansch approach (Corwin Hansch, 1964)
$$\text{Biological response} = f_1(L) + f_2(E) + f_3(S) + f_4(M)$$
where $L$ = lipophilic properties, $E$ = electronic properties, $S$ = steric properties, $M$ = other molecular properties.
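A minimal sketch of how such an equation can be fitted, assuming a linear form for each term and ordinary least squares with NumPy. Only the three L, E, S terms are used; the descriptor values and responses are synthetic, generated from known coefficients purely so that the fit can be checked, and all variable names are hypothetical. Hansch's original equations often also include a squared lipophilic term, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# Synthetic descriptor values for n hypothetical congeneric compounds:
# column 0 = lipophilic (L), column 1 = electronic (E), column 2 = steric (S)
X = rng.normal(size=(n, 3))
true_coef = np.array([0.8, -1.2, 0.5])
true_intercept = 2.0
# Synthetic response, e.g. log(1/C), with a little noise
y = X @ true_coef + true_intercept + rng.normal(scale=0.05, size=n)

# Linear additive model: y = a*L + b*E + c*S + d, solved by least squares
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b, c, d = coef
print(np.round(coef, 2))
```

The linear additive form is exactly what the Hansch scheme assumes; its limits (congenericity, additivity, no 3D information) are the points listed on the next slide.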

Historical note: Hansch approach
1. Congenericity approach
2. Linear additive scheme
3. Limited representation of global molecular properties
4. No 3D or conformational information

The role of the molecular descriptors: physico-chemical properties (boiling point, melting point, dipole moment, molar refractivity, parachor, octanol/water partition coefficient, vapor pressure, density, solubility)

The role of the molecular descriptors: biological activities (binding affinity, lethal dose, inhibition concentration, mutagenicity, carcinogenicity)

The role of the molecular descriptors: environmental properties (biodegradation, bioconcentration, BOD, COD, half-life, mobility, atmospheric persistence)

The role of the molecular descriptors: ... and more (conductivity, retention time, rheological behaviour)

Representations of a molecular structure: molecule (a real object) → molecular structure representation → molecular descriptors (numbers)

Representations of a molecular structure [structure drawings]: 0D: counts; 1D: fragment counts; 2D: topostructural; 2D: topochemical; 3D: geometrical

Representations of a molecular structure: 4D. An interaction energy value is computed at each grid point for each probe (steric, electronic, hydrophobic).

• 0D: atom list → counting, summing
• 1D: substructure list → counting, structural keys
• 2D: molecular graph → graph invariants: topostructural descriptors, topochemical descriptors, topological information indices; with x, y, z coordinates added, topographic descriptors
• 3D: molecular geometry (x, y, z coordinates) → geometrical descriptors, quantum-chemical descriptors, bulk descriptors, molecular surface descriptors
• 4D: grid-based QSAR techniques → interaction energy values

molecular graph → graph invariants
• topostructural descriptors: Wiener index, Hosoya Z index, Zagreb indices, Mohar indices, Randić connectivity index, Balaban distance connectivity index, Schultz molecular topological index, Kier shape descriptors, eigenvalues of the adjacency matrix, eigenvalues of the distance matrix, Kirchhoff number, detour index, topological charge indices
• topological information indices: total information content on ....., mean information content on .....
• topochemical descriptors: Kier-Hall valence connectivity indices, Burden eigenvalues, BCUT descriptors, Kier alpha-modified shape descriptors, 2D autocorrelation descriptors
• topographic descriptors (molecular graph plus x, y, z coordinates): 3D-Wiener index, 3D-Balaban index, D/D index
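The definitions of four of the listed topostructural indices are compact enough to sketch directly. A minimal Python version, assuming a connected, hydrogen-depleted molecular graph stored as an adjacency list: the Wiener index sums shortest-path distances over all vertex pairs, the first Zagreb index M1 sums squared vertex degrees, the second Zagreb index M2 sums degree products over edges, and the Randić index sums (δi·δj)^(-1/2) over edges. The function names and the butane/isobutane test graphs are illustrative.

```python
from collections import deque
from math import sqrt

def bfs_distances(graph, source):
    """Topological (shortest-path) distances from `source` to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def wiener_index(graph):
    # Each unordered pair is counted twice when summing over all sources
    return sum(sum(bfs_distances(graph, v).values()) for v in graph) // 2

def edges(graph):
    return [(u, v) for u in graph for v in graph[u] if u < v]

def zagreb_m1(graph):
    return sum(len(graph[v]) ** 2 for v in graph)

def zagreb_m2(graph):
    return sum(len(graph[u]) * len(graph[v]) for u, v in edges(graph))

def randic_index(graph):
    return sum(1 / sqrt(len(graph[u]) * len(graph[v])) for u, v in edges(graph))

# Hydrogen-depleted graphs: vertices are carbon atoms
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
for name, g in [("butane", butane), ("isobutane", isobutane)]:
    print(name, wiener_index(g), zagreb_m1(g), zagreb_m2(g), round(randic_index(g), 3))
```

For butane the sketch prints `10 10 8 1.914` and for isobutane `9 12 9 1.732`, so each of these indices already separates the two C4 isomers.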

molecular geometry (x, y, z coordinates)
• geometrical descriptors: gravitational indices, 3D-MoRSE descriptors, EVA descriptors, EEVA descriptors, WHIM descriptors, GETAWAY descriptors
• grid-based QSAR techniques (interaction energy values): CoMFA, GRID, G-WHIM descriptors
• volume descriptors: van der Waals volume, geometric volume
• quantum-chemical descriptors: charges, electronegativities, superdelocalizability, hardness, softness, E_LUMO, E_HOMO
• molecular surface descriptors: solvent-accessible surface area, CPSA descriptors, molecular shape analysis, Mezey 3D shape analysis
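Among the geometrical descriptors, the gravitational index has a short closed form, commonly written as G = Σ(i<j) mi·mj / rij² over all atom pairs, where m are atomic masses and rij interatomic distances. A minimal NumPy sketch, assuming coordinates in ångström; the two-atom input is hypothetical and chosen so the result can be checked by hand (12 · 16 / 2² = 48).

```python
import numpy as np

def gravitational_index(masses, coords):
    """Gravitational index over all atom pairs: sum_{i<j} m_i * m_j / r_ij**2."""
    m = np.asarray(masses, dtype=float)
    xyz = np.asarray(coords, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]  # pairwise coordinate differences
    r2 = np.sum(diff ** 2, axis=-1)           # squared interatomic distances
    i, j = np.triu_indices(len(m), k=1)       # each pair once, i < j
    return float(np.sum(m[i] * m[j] / r2[i, j]))

# Hypothetical two-atom example: masses 12 and 16 placed 2 Å apart
print(gravitational_index([12.0, 16.0], [[0, 0, 0], [2, 0, 0]]))  # 48.0
```

Because it depends only on masses and interatomic distances, the index is unchanged by atom renumbering and by roto-translation of the coordinates.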

Properties of a molecular descriptor. Many scientists are engaged in searching for new molecular descriptors able to capture new aspects of the molecular structure. This kind of research requires creativity and imagination, together with a solid theoretical basis, to obtain numbers with some structural chemical meaning. "There are no restrictions on the design of structural invariants, the limiting factor is one's own imagination." [1] M. Randić (1996), Molecular bonding profiles, J. Math. Chem., 19,

Properties of a molecular descriptor: a descriptor MUST have...
• invariance with respect to labeling and numbering of atoms
• invariance with respect to roto-translation
• an unambiguous, algorithmically computable definition
• values in a suitable numerical range for the set of molecules to which it is applied
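The two invariance requirements can be checked numerically for a candidate 3D descriptor. The sketch below assumes a descriptor is a Python function of atomic masses and coordinates; it renumbers the atoms with a random permutation, applies a random proper rotation plus a translation, and compares the values. The toy descriptor (sum of interatomic distances), the counterexample (x-coordinate of the first atom), and the input geometry are all hypothetical choices for illustration.

```python
import numpy as np

def sum_of_distances(masses, coords):
    """Toy 3D descriptor (hypothetical): sum of all interatomic distances."""
    xyz = np.asarray(coords, dtype=float)
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    return float(d.sum() / 2)  # each pair appears twice in the full matrix

def first_atom_x(masses, coords):
    """Counterexample: depends on atom numbering and on the reference frame."""
    return float(np.asarray(coords, dtype=float)[0, 0])

def random_rotation(rng):
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:  # flip one axis so that det = +1 (proper rotation)
        q[:, 0] *= -1
    return q

def check_invariance(descriptor, masses, coords, seed=0, tol=1e-9):
    rng = np.random.default_rng(seed)
    m = np.asarray(masses, dtype=float)
    xyz = np.asarray(coords, dtype=float)
    ref = descriptor(m, xyz)
    perm = rng.permutation(len(m))                             # renumber atoms
    moved = xyz @ random_rotation(rng).T + rng.normal(size=3)  # roto-translation
    return {
        "labeling": abs(descriptor(m[perm], xyz[perm]) - ref) < tol,
        "roto-translation": abs(descriptor(m, moved) - ref) < tol,
    }

masses = [12.0, 12.0, 16.0, 1.0]  # hypothetical atoms
coords = [[0, 0, 0], [1.5, 0, 0], [0, 1.2, 0.3], [0.4, -0.8, 1.1]]
print(check_invariance(sum_of_distances, masses, coords))  # {'labeling': True, 'roto-translation': True}
print(check_invariance(first_atom_x, masses, coords))
```

A numerical check like this can only expose a violation; passing it for one geometry is not a proof of invariance.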

Properties of a molecular descriptor: a descriptor should have...
• a structural interpretation
• a good correlation with at least one property
• no trivial correlation with other molecular descriptors
• gradual changes in its values with gradual changes in the molecular structure
• no experimental properties in its definition
• applicability beyond a too small class of molecular structures
• preferably, some discrimination power among isomers
• preferably, no trivial inclusion of other molecular descriptors in its definition
• preferably, reversible decoding (from the descriptor value back to the structure)

QSAR strategy  regression models (quantitative response)  classification models (qualitative response)  ranking models (ordered response)  regression models (quantitative response)  classification models (qualitative response)  ranking models (ordered response) models...

QSAR strategy - Regression

QSAR strategy - Classification

QSAR strategy - Ranking

QSAR strategy
• training set: set of molecules → molecular descriptors and experimental responses → fitting → MODEL: SRC (QSAR, QSPR, ...)
• test set: molecular descriptors and experimental responses → prediction power
• new molecules: molecular descriptors → predicted new responses; reversible decoding
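A minimal sketch of the training/test scheme, assuming a multiple linear regression model and synthetic data generated only for illustration: the model is fitted on the training set, and its prediction power is then estimated on molecules it never saw. Here the external Q² uses the training-set mean as reference, one of several variants in use (often labeled Q²_F1); the helper names are hypothetical.

```python
import numpy as np

def add_intercept(X):
    return np.column_stack([X, np.ones(len(X))])

def fit_ols(X, y):
    coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return coef

def predict(coef, X):
    return add_intercept(X) @ coef

def explained_fraction(y_obs, y_pred, y_ref_mean):
    """1 - (sum of squared errors) / (sum of squares around a reference mean)."""
    return 1.0 - np.sum((y_obs - y_pred) ** 2) / np.sum((y_obs - y_ref_mean) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))                                         # hypothetical descriptors
y = X @ np.array([1.0, 0.5, -0.7]) + rng.normal(scale=0.3, size=40)  # synthetic responses

idx = rng.permutation(40)
train, test = idx[:30], idx[30:]
coef = fit_ols(X[train], y[train])
r2_train = explained_fraction(y[train], predict(coef, X[train]), y[train].mean())
q2_ext = explained_fraction(y[test], predict(coef, X[test]), y[train].mean())
print(f"R2 (training) = {r2_train:.2f}, Q2_ext (test) = {q2_ext:.2f}")
```

The test molecules play no part in fitting, so Q²_ext measures prediction rather than fitting.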

QSAR strategy: the true interest is in the predictive power of the model (model validation, chemometrics).
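Model validation can also be done internally by leave-one-out cross-validation, with Q² = 1 - PRESS/TSS. A minimal sketch for an ordinary least-squares model, assuming NumPy; the pure-noise data set is synthetic and shows why fitting (R²) and prediction (Q²) must not be confused. For least squares each leave-one-out residual is at least as large as the corresponding fitted residual, so Q² ≤ R² always holds.

```python
import numpy as np

def _design(X):
    X = np.asarray(X, dtype=float)
    return np.column_stack([X, np.ones(len(X))])  # descriptors plus intercept column

def r2_fit(X, y):
    """Coefficient of determination of the OLS fit on all data (fitting power)."""
    A, y = _design(X), np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return 1.0 - rss / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / TSS (prediction power)."""
    A, y = _design(X), np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                   # leave molecule i out
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2         # predict the left-out molecule
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic example: 8 random "descriptors" that are pure noise for 15 molecules
rng = np.random.default_rng(42)
X_noise = rng.normal(size=(15, 8))
y_noise = rng.normal(size=15)
print(round(r2_fit(X_noise, y_noise), 2), round(q2_loo(X_noise, y_noise), 2))
```

With many descriptors and few molecules, R² can look respectable even for noise, while Q² reveals the lack of predictive power.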

… towards conclusions …

FAQ - Frequently Asked Questions
1. What is the meaning of that descriptor?
2. Why are there some models with the same prediction power but different molecular descriptors?
3. Why use a huge number of molecular descriptors?

FGA - our Frequently Given Answers
1. What is the meaning of that descriptor? A molecular descriptor is a number extracted by a well-defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that our difficulty in attributing a meaning to this number often ultimately flows from the lack of deeper chemical theories and higher-level languages, and not from esoteric approaches to the descriptor definition. (R. Todeschini and V. Consonni)

FGA - our Frequently Given Answers
2. Why are there some models with the same prediction power but different molecular descriptors? Molecular descriptors are often intercorrelated, so different molecular descriptors can take part in a model in turn. "Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view." (Hans Primas)
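A quick way to see this intercorrelation is to compute the correlation matrix of a descriptor table and list the nearly redundant pairs. A minimal sketch, assuming NumPy and a threshold of 0.9 chosen only for illustration; the descriptor values are not measured data but are computed from the known closed forms of four indices for unbranched alkanes C3 to C8 (path graphs), where all of them grow with chain length.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.9):
    """List descriptor pairs whose |Pearson r| is at least `threshold`."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(R[i, j]), 3)))
    return pairs

# Unbranched alkanes C3..C8: closed forms for a path graph with n vertices
n = np.arange(3, 9)
descriptors = np.column_stack([
    n,                          # nC: number of carbon atoms
    (n ** 3 - n) / 6,           # W: Wiener index
    4 * n - 6,                  # M1: first Zagreb index
    np.sqrt(2) + (n - 3) / 2,   # chi: Randić connectivity index
])
for pair in correlated_pairs(descriptors, ["nC", "W", "M1", "chi"]):
    print(pair)
```

All six pairs exceed the threshold here, so in a regression on this series any one of these descriptors could stand in for another.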

FGA - our Frequently Given Answers
3. Why use a huge number of molecular descriptors? Complexity is not an intrinsic property of systems, but rather arises from the number of ways in which we are able (or desire) to interact with a system. A molecule is undoubtedly a complex system.

Department of Environmental Sciences University of Milano - Bicocca P.za della Scienza, Milano (Italy) Website: michem.disat.unimib.it/chm/ THANK YOU Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria decision making environmetrics experimental design artificial neural networks statistical process control Milano Chemometrics and QSAR Research Group

coffee break

... since December
• news
• software
• books
• tutorials and a forum

FGA - our Frequently Given Answers
4. Is a model explaining the known facts of a system better than a model predicting the future events of that system? Fitting versus prediction. Don't forget your goal! An understanding of the behavior of a system does not always coincide with the prediction of the system's future behavior!

QSAR strategy - Regression

"GENTLEMEN, one might ask what is the most fruitful way to draw from a hypothesis the greatest benefit for the development of a given doctrine. Perhaps to many it may seem advisable, in this respect, to proceed with great caution, so as not to introduce into science hypothetical conceptions that are too bold and that then turn out not to agree with the reality of the facts. I believe, on the contrary, that the progress of science has been held back more by excessive caution than by excessive boldness. In science one must know how to dare at the right time, as in matters of love: to dare at once and go all the way; complaints and regrets afterwards are of no use." Giacomo Ciamician, from the inaugural lecture (Prolusione) on the scientific work of Wilhelm KÖRNER, Milan, 15 May 1910.

Fragment approach: the biological activity of a molecule is the sum of its fragment properties. Congeneric molecules, i.e. a common reference skeleton; substituent properties.

Fragment approach
• parametric approach (Hammett-Hansch, 1964)
• group approach (Free-Wilson, 1964; Fujita-Ban, 1971)
• DARC-PELCO approach (Dubois, 1966)
• Sterimol approach (Verloop, 1976)
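The group (Free-Wilson) approach treats each substituent at each position as an indicator variable, BA = μ + Σ aij·Xij, and estimates the group contributions aij by least squares. A minimal sketch using the Fujita-Ban parameterization (contributions relative to hydrogen, μ being the activity of the unsubstituted parent); the positions, substituents, contributions, and activities are hypothetical and noise-free, so the fit should recover them exactly.

```python
import itertools
import numpy as np

# Hypothetical substituents at two positions of a common reference skeleton
R1 = ["H", "Cl", "CH3"]
R2 = ["H", "OH"]
# Hypothetical group contributions relative to H (Fujita-Ban style)
contrib = {("R1", "Cl"): 0.6, ("R1", "CH3"): 0.3, ("R2", "OH"): -0.4}
parent = 5.0  # hypothetical activity of the (H, H) parent compound

columns = [("R1", "Cl"), ("R1", "CH3"), ("R2", "OH")]
compounds = list(itertools.product(R1, R2))  # all 6 substitution patterns

def indicator_row(r1, r2):
    present = {("R1", r1), ("R2", r2)}
    return [1.0 if col in present else 0.0 for col in columns]

X = np.array([indicator_row(r1, r2) for r1, r2 in compounds])
y = parent + X @ np.array([contrib[c] for c in columns])  # synthetic activities

A = np.column_stack([np.ones(len(X)), X])  # intercept = parent activity
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(dict(zip(["parent"] + [f"{p}={g}" for p, g in columns], np.round(coef, 2))))
```

Unlike the Hansch approach, no physico-chemical descriptor is needed; the price is that only substituents present in the training set can be predicted.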

Hansch approach: Hansch molecular descriptors
• lipophilic properties: partition coefficients (logP, logKow), chromatographic parameters (Rf, RT), solubility, ...
• electronic properties: Hammett constants, molar refraction, dipole moment, HOMO, LUMO, ionization potential, ...
• steric properties: molecular weight, VDW volume, molar volume, surface area, ...

The role of the molecular descriptors

Introduction

Conclusions: a molecular descriptor is a number extracted by a well-defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that our difficulty in attributing a meaning to this number often ultimately flows from the lack of deeper chemical theories and higher-level languages, and not from esoteric approaches to the descriptor definition. (R. Todeschini and V. Consonni)

Properties of a molecular descriptor

Conclusions: "Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view." (Hans Primas)



Representations of a molecular structure [structure drawings]: 0D, 1D, 2D, 3D

Just a question … molecular structure?

Some historical notes: “...: although some relationships between the chemical constitution (composition and structure) of substances and their physical properties can certainly already be glimpsed, the number of facts is certainly still far too small to draw from them conclusions that can claim, beyond the character of a mere hypothesis, also that of probability. In any case, these relationships are not as simple in nature as one might perhaps have expected a priori. Certainly, the physical properties of bodies are first of all a function of their composition and structure, a function whose form is still entirely unknown; a function that is probably very complex, and whose study will require an unforeseeable number of facts in order to narrow down sufficiently the range of possible representations.”