1
Molecular Descriptors
C371 Fall 2004
2
INTRODUCTION
Molecular descriptors are numerical values that characterize properties of molecules.
Examples:
  Physicochemical properties (empirical)
  Values from algorithms, such as 2D fingerprints
Descriptors vary in the complexity of the encoded information and in compute time.
3
Descriptors for Large Data Sets
Descriptors representing properties of complete molecules
  Examples: LogP, molar refractivity
Descriptors calculated from 2D graphs
  Examples: topological indexes, 2D fingerprints
Descriptors requiring 3D representations
  Example: pharmacophore descriptors
4
DESCRIPTORS CALCULATED FROM 2D STRUCTURES
Simple counts of features
  Lipinski Rule of Five (H bonds, MW, etc.)
  Number of ring systems
  Number of rotatable bonds
Not likely to discriminate sufficiently when used alone
Combined with other descriptors for best effect
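As an illustration of how simple counts can be turned into a filter, the sketch below counts Rule of Five violations using the commonly quoted thresholds (MW over 500, logP over 5, more than 5 H-bond donors, more than 10 H-bond acceptors). The `SimpleCounts` container and the two example compounds are hypothetical; the counts are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class SimpleCounts:
    """Hypothetical container for precomputed whole-molecule counts."""
    mol_weight: float
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: SimpleCounts) -> int:
    """Count how many of the four commonly quoted thresholds are exceeded."""
    checks = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Two made-up compounds, for illustration only.
small = SimpleCounts(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = SimpleCounts(mol_weight=720.0, logp=6.1, h_bond_donors=3, h_bond_acceptors=12)

print(rule_of_five_violations(small))  # 0
print(rule_of_five_violations(large))  # 3
```

A violation count like this is coarse, which is consistent with the point that such counts work best in combination with other descriptors.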
5
Physicochemical Properties
Hydrophobicity
  LogP – the logarithm of the partition coefficient between n-octanol and water
  ClogP (Leo and Hansch) – based on a small set of fragment values derived from a small set of simple molecules
  BioByte: Daylight’s MedChem Help page
  Isolating carbon: one not doubly or triply bonded to a heteroatom
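The definition of LogP can be written out directly. This minimal sketch assumes the two equilibrium concentrations are already known and expressed in the same units; the function name `log_p` is illustrative, and the sketch does not attempt the fragment-based ClogP estimate.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """log10 of the n-octanol/water partition coefficient."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# A solute 100 times more concentrated in octanol than in water.
print(log_p(100.0, 1.0))  # 2.0
```

Positive values indicate a preference for the octanol phase (more hydrophobic), negative values a preference for water.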
6
ACD Labs Calculated Properties
ACD Labs values are now incorporated into the CAS Registry File for millions of compounds
I-Lab:
  Name generation
  NMR prediction
  Physical property prediction
7
Molar Refractivity
$$MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d}$$
where n is the refractive index, d is density, and MW is molecular weight. Measures the steric bulk of a molecule.
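The formula can be evaluated directly. In the sketch below the function name is illustrative, and the inputs for water (refractive index about 1.333, density about 0.997 g/cm³, molecular weight 18.015 g/mol) are approximate room-temperature values used only as an example.

```python
def molar_refractivity(n: float, mol_weight: float, density: float) -> float:
    """MR = (n^2 - 1) / (n^2 + 2) * MW / d, in cm^3/mol for g/mol and g/cm^3."""
    n2 = n * n
    return (n2 - 1.0) / (n2 + 2.0) * mol_weight / density


# Approximate values for liquid water at room temperature.
mr_water = molar_refractivity(n=1.333, mol_weight=18.015, density=0.997)
print(round(mr_water, 2))  # roughly 3.7
```

Because MW/d is the molar volume, MR has units of volume per mole, which is why it is read as a measure of steric bulk.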
8
Topological Indexes
Single-valued descriptors calculated from the 2D graph of the molecule
Characterize structures according to size, degree of branching, and overall shape
Example: Wiener Index – counts the number of bonds between each pair of atoms and sums these distances over all pairs
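The Wiener index can be sketched with a breadth-first search over the hydrogen-suppressed graph. The adjacency-dictionary representation and the function names are assumptions made for this example.

```python
from collections import deque


def shortest_path_lengths(adjacency, start):
    """Bond-count distance from `start` to every reachable atom (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered atom pairs."""
    total = sum(
        sum(shortest_path_lengths(adjacency, atom).values())
        for atom in adjacency
    )
    return total // 2  # each pair was counted from both ends


# Hydrogen-suppressed graphs of two C4H10 isomers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The branched isomer gets the smaller value, which shows how a single number can reflect degree of branching.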
9
Topological Indexes: Others
Molecular Connectivity Indexes
  Randić (et al.) branching index
    Defines the “degree” of an atom as the number of adjacent non-hydrogen atoms
    Bond connectivity value is the reciprocal of the square root of the product of the degrees of the two atoms in the bond
    Branching index is the sum of the bond connectivities over all bonds in the molecule
  Chi indexes – introduce valence values to encode sigma, pi, and lone pair electrons
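The branching index follows directly from the three definitions above. This sketch assumes the molecule is supplied as a list of bonds between non-hydrogen atoms; the function name and input format are illustrative.

```python
import math


def randic_index(bonds):
    """Sum over bonds of 1 / sqrt(degree(a) * degree(b))."""
    degree = {}
    for a, b in bonds:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)


# Hydrogen-suppressed bond lists of two C4H10 isomers.
n_butane_bonds = [(0, 1), (1, 2), (2, 3)]
isobutane_bonds = [(0, 1), (0, 2), (0, 3)]

print(round(randic_index(n_butane_bonds), 3))   # 1.914
print(round(randic_index(isobutane_bonds), 3))  # 1.732
```

The valence-based chi indexes would replace the simple degree with a valence value per atom; that extension is not shown here.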
10
Kappa Shape Indexes
Characterize aspects of molecular shape
Compare the molecule with the “extreme shapes” possible for that number of atoms
Extreme shapes range from the linear molecule to the completely connected graph
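As a rough illustration of the comparison with extreme shapes, the sketch below uses the first-order form usually given for the kappa index, 2·P_max·P_min / P², where P is the number of bonds in the molecule, P_min = A − 1 is the bond count of the linear graph, and P_max = A(A − 1)/2 is the bond count of the completely connected graph on A atoms. The slide does not state this formula, so treat it as an assumption; higher-order kappa indexes and heteroatom corrections are not covered.

```python
def kappa1(num_atoms: int, num_bonds: int) -> float:
    """First-order kappa shape index (assumed form: 2 * P_max * P_min / P^2)."""
    if num_atoms < 2 or num_bonds < 1:
        raise ValueError("need at least two atoms and one bond")
    p_min = num_atoms - 1                    # linear (acyclic) extreme
    p_max = num_atoms * (num_atoms - 1) / 2  # completely connected extreme
    return 2 * p_max * p_min / num_bonds ** 2


# Four-atom hydrogen-suppressed graphs with increasing numbers of bonds.
print(kappa1(4, 3))  # 4.0   acyclic chain, e.g. n-butane
print(kappa1(4, 4))  # 2.25  one ring, e.g. cyclobutane
print(kappa1(4, 6))  # 1.0   completely connected graph
```

Under this form the value equals the atom count for an acyclic molecule and falls as rings are added.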
11
2D Fingerprints
Two types:
  One based on a fragment dictionary
    Each bit position corresponds to a specific substructure fragment
    Fragments that occur infrequently may be more useful
  Another based on hashed methods
    Not dependent on a pre-defined dictionary
    Any fragment can be encoded
Originally designed for substructure searching, not for use as molecular descriptors
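The two fingerprint types can be contrasted with a toy example. This is a simplified sketch, not any vendor's algorithm: fragments are limited to linear paths of up to three atoms, bond orders are ignored, and CRC32 stands in for the hash function. The function names, the 64-bit length, and the small fragment dictionary are all illustrative assumptions.

```python
import zlib


def linear_paths(atoms, adjacency, max_atoms=3):
    """Enumerate linear atom paths as strings such as 'C-C-O'."""
    paths = set()

    def extend(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        paths.add(min(forward, backward))  # same path read either way
        if len(path) < max_atoms:
            for neighbor in adjacency[path[-1]]:
                if neighbor not in path:
                    extend(path + [neighbor])

    for start in range(len(atoms)):
        extend([start])
    return paths


def dictionary_fingerprint(paths, dictionary):
    """One bit per dictionary fragment: 1 if the fragment is present."""
    return [1 if fragment in paths else 0 for fragment in dictionary]


def hashed_fingerprint(paths, n_bits=64):
    """Hash every fragment to a bit position; no dictionary is needed."""
    bits = [0] * n_bits
    for path in paths:
        bits[zlib.crc32(path.encode()) % n_bits] = 1
    return bits


# Ethanol, heavy atoms only: C-C-O
atoms = ["C", "C", "O"]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
paths = linear_paths(atoms, adjacency)

dictionary = ["C-C", "C-O", "C-N", "C-C-O"]  # hypothetical fragment dictionary
print(dictionary_fingerprint(paths, dictionary))  # [1, 1, 0, 1]
print(sum(hashed_fingerprint(paths)))  # number of bits set
```

In the hashed version two different fragments can collide on the same bit, which is the price paid for not needing a dictionary.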
12
Atom-Pair Descriptors
Encode all pairs of atoms in a molecule
Include the length of the shortest bond-by-bond path between them
Each atom is typed by its element, the number of non-hydrogen atoms bonded to it, and its number of π-bonding electrons
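A minimal sketch of atom-pair generation is shown below. It assumes a connected, hydrogen-suppressed graph with the π-electron count of each atom supplied by the caller, and it measures separation as a bond count. The type-string format (`CX2pi0` for a carbon with two heavy neighbors and no π electrons) is made up for this example.

```python
from collections import Counter, deque


def bond_distances(adjacency, start):
    """Shortest bond-by-bond path length from `start` to every atom (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def atom_pairs(elements, pi_electrons, adjacency):
    """Count (type, distance, type) triples over all unordered atom pairs."""
    n = len(elements)
    types = [
        f"{elements[i]}X{len(adjacency[i])}pi{pi_electrons[i]}" for i in range(n)
    ]
    pairs = Counter()
    for i in range(n):
        dist = bond_distances(adjacency, i)
        for j in range(i + 1, n):
            first, second = sorted((types[i], types[j]))
            pairs[(first, dist[j], second)] += 1
    return pairs


# Ethanol, heavy atoms only: C-C-O, no pi electrons.
elements = ["C", "C", "O"]
pi_electrons = [0, 0, 0]
adjacency = {0: [1], 1: [0, 2], 2: [1]}

for pair, count in sorted(atom_pairs(elements, pi_electrons, adjacency).items()):
    print(pair, count)
```

The resulting counts can be used as a sparse descriptor vector, one entry per distinct typed pair.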
13
BCUT Descriptors
Designed to encode atomic properties that govern intermolecular interactions
Used in diversity analysis
Encode atomic charge, atomic polarizability, and atomic hydrogen bonding ability
14
DESCRIPTORS BASED ON 3D REPRESENTATIONS
Require the generation of 3D conformations
  Can be computationally time consuming with large data sets
  Usually must take into account conformational flexibility
3D fragment screens encode spatial relationships between atoms, ring centroids, and planes
15
Pharmacophore Keys & Other 3D Descriptors
Based on atoms or substructures thought to be relevant for receptor binding
Typically include hydrogen bond donors and acceptors, charged centers, aromatic ring centers, and hydrophobic centers
Others: 3D topographical indexes, geometric atom pairs, quantum mechanical calculations for HOMO and LUMO
16
DATA VERIFICATION AND MANIPULATION
Data spread and distribution
  Coefficient of variation (standard deviation divided by the mean)
Scaling (standardization): making sure that each descriptor has an equal chance of contributing to the overall analysis
Correlations
Reducing the dimensionality of a data set: Principal Components Analysis
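The first three checks can be sketched with the Python standard library. The descriptor values below are made up for illustration, the function names are assumptions, and scaling is taken to mean autoscaling (subtract the mean, divide by the sample standard deviation). Principal components analysis is not shown because it needs a linear algebra library.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def autoscale(values):
    """Rescale a descriptor column to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two descriptor columns."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical descriptor columns for three compounds.
mol_weight = [100.0, 200.0, 300.0]
logp = [1.0, 2.0, 3.0]
rot_bonds = [3.0, 1.0, 2.0]

print(coefficient_of_variation(mol_weight))  # 0.5
print(autoscale(mol_weight))                 # [-1.0, 0.0, 1.0]
print(pearson(mol_weight, logp))             # 1.0 (up to rounding)
print(pearson(mol_weight, rot_bonds))        # -0.5 (up to rounding)
```

After autoscaling, molecular weight and logP sit on the same scale, so neither dominates a distance calculation merely because of its units. A correlation near 1, as between the first two columns, suggests one of the pair is redundant.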