AurQUEST: Query management software for AurSCOPE
- Web-based application integrating ChemAxon technology
- Powerful Query Builder
  - Biological and chemical queries
  - Structural search using ChemAxon tools
- Efficient navigation
- Different export formats (SDF, RDF, …)
Data Preprocessing
[Figure: AurSCOPE database, filtering steps 1 to 4, 2D unique structures]
Filters shown: Counterions; MW > 700; Inorg NAS; Stereo-duplicates; Identical mol. but different salts; …
AurSCOPE Ion Channels: Retrieving Active Molecules
- Protocols: Binding or Electrophysiology
- Target: All
- Target type: Wild
- Parameter filter: Ki, EC50, IC50 < 300 nM
Result: 11519 molecules (*) (9897 unique)
(*) November 2005
Encoding Chemical Space and Clustering
- Standardization of molecules.
- Generation of Chemical Fingerprints (CF).
- Optimization of the different CF parameters.
- CF-based Jarvis-Patrick clustering with various adjusted parameters.
Parameters for Generating Hashed Chemical Fingerprints
Fingerprint length
- The number of bits in the bit string.
- A longer fingerprint increases the capacity for storing information on molecules.
Maximum pattern length
- The maximum number of atoms in the linear paths that are considered during the fragmentation of the molecule. (The length of cyclic patterns is not limited.)
- Longer and more numerous patterns hold more information on the molecule.
Bits to be set for patterns
- After a pattern is detected, some bits of the bit string are set to "1". The number of bits used to code each pattern is constant.
- A higher number of bits increases the information coded from a pattern.
Darkness of the fingerprint
- The percentage of "1" digits in the bit string. Fingerprints with more ones are considered "darker" than those with fewer ones.
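As a rough illustration of how these four parameters interact, here is a toy sketch in Python. It is not ChemAxon's implementation. It assumes a molecule given as a list of atom symbols plus an adjacency map, ignores bond orders and cyclic patterns, counts path length in atoms, and uses SHA-256 to choose bit positions. All function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Set


def linear_paths(atoms: List[str], bonds: Dict[int, List[int]], max_len: int) -> Set[str]:
    """Enumerate linear atom paths containing at most max_len atoms."""
    patterns: Set[str] = set()

    def walk(path: List[int]) -> None:
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        # A path and its reverse are the same pattern: keep one canonical form.
        patterns.add(min(forward, backward))
        if len(path) == max_len:
            return
        for nxt in bonds[path[-1]]:
            if nxt not in path:  # never revisit an atom within one path
                walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return patterns


def hashed_fingerprint(patterns: Set[str], length: int, bits_per_pattern: int) -> List[int]:
    """Set a constant number of hashed bit positions for each pattern."""
    fp = [0] * length
    for pattern in patterns:
        for k in range(bits_per_pattern):
            # Assumption: SHA-256 of "pattern|k" picks the k-th bit position.
            digest = hashlib.sha256(f"{pattern}|{k}".encode()).digest()
            fp[int.from_bytes(digest[:8], "big") % length] = 1
    return fp


def darkness(fp: List[int]) -> float:
    """Fraction of bits set to 1."""
    return sum(fp) / len(fp)


# Heavy atoms of ethanol: C-C-O (illustrative input only)
atoms = ["C", "C", "O"]
bonds = {0: [1], 1: [0, 2], 2: [1]}

patterns = linear_paths(atoms, bonds, max_len=7)
fp = hashed_fingerprint(patterns, length=2048, bits_per_pattern=5)
print(sorted(patterns))
print(f"bits set: {sum(fp)}, darkness: {darkness(fp):.4f}")
```

Shortening the bit string or raising the number of bits per pattern makes the fingerprint darker, and a darker fingerprint means more patterns collide on the same bits.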
CF-based Jarvis-Patrick Clustering
For each structure, collect the set of nearest neighbors whose dissimilarity (distance) is less than a threshold value T.
Two structures cluster together if:
1. They are in each other's list of nearest neighbors.
2. They have at least R min of their nearest neighbors in common, where R min is a ratio of the length of the shorter list.
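The two clustering conditions above can be sketched in Python as follows. This is a minimal illustration, not JKlustor's code. It assumes a precomputed symmetric dissimilarity matrix, counts each structure as a member of its own neighbor list, and merges qualifying pairs with a union-find. The function name and the example distances are made up for illustration.

```python
from typing import Dict, List


def jarvis_patrick(dist: List[List[float]], t: float, r_min: float) -> List[List[int]]:
    """Variable-length Jarvis-Patrick clustering on a dissimilarity matrix."""
    n = len(dist)
    # Neighbor list: every structure closer than t (assumption: self included).
    neighbors = [{j for j in range(n) if j == i or dist[i][j] < t} for i in range(n)]

    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Condition 1: each structure is in the other's neighbor list.
            if j not in neighbors[i] or i not in neighbors[j]:
                continue
            # Condition 2: shared neighbors reach r_min of the shorter list.
            shorter = min(len(neighbors[i]), len(neighbors[j]))
            common = len(neighbors[i] & neighbors[j])
            if common >= r_min * shorter:
                parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Six hypothetical structures; pairs not listed are far apart (0.9).
n = 6
close = {(0, 1): 0.05, (0, 2): 0.10, (1, 2): 0.08, (2, 3): 0.12, (3, 4): 0.10}
dist = [[0.0 if i == j else 0.9 for j in range(n)] for i in range(n)]
for (i, j), d in close.items():
    dist[i][j] = dist[j][i] = d

print(jarvis_patrick(dist, t=0.15, r_min=0.5))  # [[0, 1, 2, 3, 4], [5]]
print(jarvis_patrick(dist, t=0.15, r_min=0.7))  # [[0, 1, 2], [3, 4], [5]]
```

In this toy data, structures 2 and 3 share two of the three entries in the shorter neighbor list, so raising R min from 0.5 to 0.7 splits the large cluster. Structure 5 has no neighbor within T and stays a singleton at any R min.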
CF-based Jarvis-Patrick Clustering
Chemical fingerprint length in bits: 2048
Maximum number of bonds in patterns: 7
Maximum number of bits to set for each pattern: 5

T      R min   # Clusters   # Singletons
0.15   0.2     932          1663
0.15   0.3     938          1663
0.15   0.4     945          1663
0.15   0.5     977          1663
0.16   0.3     865          1499
0.16   0.5     910          1500
0.17   0.3     819          1372
0.17   0.5     860          1373
0.18   0.3     787          1238
0.18   0.5     826          1238
0.19   0.3     752          1140
0.19   0.5     780          1141
0.20   0.3     722          1051
0.20   0.5     752          1051
Similarity threshold = 0.85 (*)
(*) Martin Y.C. et al. Do structurally similar molecules have similar biological activity? J. Med. Chem. 2002, 45, 4350-4358.
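For orientation, the short Python sketch below shows how a 0.85 similarity threshold can be applied to two fingerprints. It assumes the similarity is the Tanimoto coefficient on the sets of "on" bits and that dissimilarity is taken as 1 minus similarity, so that 0.85 corresponds to T = 0.15. The bit sets are invented for illustration.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 1.0  # assumption: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


SIMILARITY_THRESHOLD = 0.85

fp_a = set(range(0, 20))   # bits 0..19
fp_b = set(range(2, 21))   # bits 2..20: 18 shared bits, 21 in the union
fp_c = set(range(5, 25))   # bits 5..24: 15 shared bits, 25 in the union

for name, other in (("b", fp_b), ("c", fp_c)):
    sim = tanimoto(fp_a, other)
    verdict = "similar" if sim >= SIMILARITY_THRESHOLD else "not similar"
    print(f"a vs {name}: similarity {sim:.3f}, dissimilarity {1 - sim:.3f}, {verdict}")
```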
Conclusions
- JKlustor integrates computationally rapid and efficient clustering tools.
- Shortcomings still to be addressed: dealing with artificial singletons.
- Future work: combination with the Maximum Common Substructure approach (LibMCS).
- Other algorithms (Ward,…)