JKlustor clustering chemical libraries presented by … maintained by Miklós Vargyas Last update: 25 March 2010.


1 JKlustor clustering chemical libraries presented by … maintained by Miklós Vargyas Last update: 25 March 2010

2 JKlustor Chemical clustering by similarity and structure

3 Description of the product
JKlustor performs similarity- and structure-based clustering of compound libraries and focused sets, in both hierarchical and non-hierarchical fashion.
JKlustor availability:
- part of JChem
- IJC (in part)
- server version (accessible via API)
- batch application programs
- HTML user interface
- one desktop application with a GUI; the GUI is available as an applet

4 Summary of key features
- Wide range of methods: unsupervised, agglomerative clustering; hierarchical and non-hierarchical methods; similarity-based and structure-based techniques
- Flexible search options: Tanimoto and Euclidean metrics, with weighting; maximum common substructure identification; chemical property matching, including atom type, bond type, hybridization and charge
- Interactive display: interactive hierarchy browser (dendrogram viewer), SAR table, R-table
- Efficient: depending on the tool, running time scales between linearly and quadratically
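The two metrics named above can be illustrated on binary fingerprints. The Python sketch below assumes fingerprints are stored as integers used as bit sets; the function names `tanimoto`, `euclidean` and `weighted_euclidean` are hypothetical and not part of the JKlustor API, and the per-dimension weights are an assumption about what "weighting" means.

```python
import math


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return bin(a & b).count("1") / union


def euclidean(a: int, b: int) -> float:
    """Euclidean distance between bit vectors: sqrt of the number of differing bits."""
    return math.sqrt(bin(a ^ b).count("1"))


def weighted_euclidean(x, y, w):
    """Hypothetical weighted Euclidean distance on numeric descriptors (one weight per dimension)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


fp1, fp2 = 0b101101, 0b100111
print(tanimoto(fp1, fp2))   # 0.6  (3 common bits / 5 bits set in either)
print(euclidean(fp1, fp2))  # 1.4142135623730951  (2 differing bits)
```

Tanimoto is usually turned into a distance as 1 - similarity when a clustering method needs one.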

5 Benefits
- Versatile: choose the most appropriate method for the clustering problem, combine methods to achieve the best results, and use your trusted molecular descriptors in similarity calculations
- Easy integration into corporate discovery pipelines
- Clusters chemical files directly, with no need to import structures into a database
- Intuitive: cluster formation is self-explanatory

6 Similarity-based clustering
- Hierarchical: Ward
- Non-hierarchical: sphere exclusion, k-means, Jarvis-Patrick

7 Ward Clustering Features
- Ward's minimum variance method results in tight, well-separated clusters
- Murtagh's reciprocal nearest neighbor (RNN) algorithm is used to speed it up
- running time scales quadratically with the number of input structures
- memory consumption scales linearly
- best used with smaller sets (such as focused libraries); copes with fewer than 100K structures
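Ward's method repeatedly merges the two clusters whose union gives the smallest increase in within-cluster variance, and because the Ward distance is reducible, pairs of reciprocal nearest neighbours can be merged as soon as they are found. The Python sketch below uses the nearest-neighbour-chain formulation of this idea with the Lance-Williams update for Ward on squared Euclidean distances. It is an illustration under stated assumptions, not JKlustor's implementation: the name `ward_nn_chain` is hypothetical, and for brevity it stores the full distance matrix, so its memory grows quadratically rather than linearly as stated for JKlustor above.

```python
def ward_nn_chain(points):
    """Ward clustering via the nearest-neighbour chain (reciprocal nearest neighbours).

    Returns merges as (cluster_a, cluster_b, ward_distance, new_size); each merged
    cluster gets the next free id (n, n + 1, ...). Merges may come out of distance
    order, so sort (and relabel) them before drawing a dendrogram.
    """
    n = len(points)
    size = {i: 1 for i in range(n)}  # active cluster id -> number of members
    dist = {}  # full matrix of squared Euclidean distances (O(n^2) memory here)
    for i in range(n):
        for j in range(i + 1, n):
            dist[(i, j)] = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))

    def d(i, j):
        return dist[(i, j) if i < j else (j, i)]

    merges, chain, next_id = [], [], n
    while len(size) > 1:
        if not chain:
            chain.append(next(iter(size)))  # start a new chain at any active cluster
        a = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Nearest neighbour of a; ties favour prev so the chain always terminates.
        best = prev
        best_d = d(a, prev) if prev is not None else float("inf")
        for c in size:
            if c != a and d(a, c) < best_d:
                best, best_d = c, d(a, c)
        if best != prev:
            chain.append(best)
            continue
        # a and prev are reciprocal nearest neighbours: merge them.
        chain.pop()
        chain.pop()
        na, nb = size.pop(a), size.pop(prev)
        new = next_id
        next_id += 1
        for k, nk in size.items():
            # Lance-Williams update for Ward (k < new always holds).
            dist[(k, new)] = (
                (na + nk) * d(a, k) + (nb + nk) * d(prev, k) - nk * best_d
            ) / (na + nb + nk)
        size[new] = na + nb
        merges.append((min(a, prev), max(a, prev), best_d, na + nb))
    return merges


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
for merge in ward_nn_chain(pts):
    print(merge)  # the two tight pairs merge first (distance 1.0), then the pairs join
```

A linear-memory variant would keep only cluster centroids and sizes and compute Ward distances from them on demand; the pairwise-distance dictionary here only keeps the sketch short.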

8 Sphere Exclusion Clustering Features
- based on fingerprints and/or other numerical data
- running time scales linearly with the number of input structures
- memory scales sub-linearly
- can easily cope with millions of structures
- suitable for diverse subset selection
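Sphere exclusion can be sketched as a greedy pass: the first compound not yet assigned becomes a cluster centre, and every unassigned compound whose similarity to it reaches a threshold is excluded into its sphere. The Python sketch below assumes Tanimoto similarity on integer bit-set fingerprints, processes compounds in input order, and uses hypothetical names (`sphere_exclusion`, `threshold`); JKlustor's own centre selection and parameters may differ. In this naive form the cost is proportional to the number of compounds times the number of spheres.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    return 1.0 if union == 0 else bin(a & b).count("1") / union


def sphere_exclusion(fps, threshold):
    """Greedy sphere exclusion; returns a list of (centre_index, member_indices)."""
    assigned = [False] * len(fps)
    clusters = []
    for i, fp in enumerate(fps):
        if assigned[i]:
            continue  # already excluded by an earlier sphere
        assigned[i] = True
        members = [i]  # i becomes the centre of a new sphere
        for j in range(i + 1, len(fps)):
            if not assigned[j] and tanimoto(fp, fps[j]) >= threshold:
                assigned[j] = True
                members.append(j)
        clusters.append((i, members))
    return clusters


print(sphere_exclusion([0b1111, 0b1110, 0b0001, 0b0011], 0.5))
# [(0, [0, 1, 3]), (2, [2])]
```

Taking only the sphere centres gives a diverse subset, which is why the method suits diverse subset selection.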

9 k-means Clustering Features
- based on fingerprints and/or other numerical data
- minimises the variance within each cluster
- the number of clusters can be controlled directly
- finds the centres of natural clusters in the input data
- running time scales exponentially with the number of input structures
- can cope with up to hundreds of thousands of structures
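The k-means procedure can be sketched with Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. Neither step increases the within-cluster sum of squared distances, so it converges to a local (not necessarily global) minimum. The sketch treats descriptors as numeric tuples; the names `kmeans` and `sq_dist`, seeding with the first k points and the iteration cap are assumptions made for illustration and are not taken from JKlustor.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length numeric vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (assignment list, centroid list).

    Seeding with the first k points is a simplification for this sketch;
    practical codes use random or k-means++ seeding.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


print(kmeans([(0, 0), (10, 10), (0, 1), (10, 11)], 2))
# ([0, 1, 0, 1], [[0.0, 0.5], [10.0, 10.5]])
```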

10 Jarp Clustering Features
- variable-length Jarvis-Patrick clustering
- based on fingerprints and/or other numerical data
- takes structures/fingerprints and data values from either files or database tables
- running time scales better than quadratically but worse than linearly with the number of input structures
- memory scales linearly
- Jarp can cope with 100Ks of structures, depending on data and parameters
- may create a large number of singletons
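Jarvis-Patrick clustering joins two compounds when each appears in the other's nearest-neighbour list and the two lists share at least a minimum number of neighbours; clusters are the connected groups that result. The Python sketch below implements that rule with a union-find structure, assuming Tanimoto similarity on integer bit-set fingerprints. The names `jarvis_patrick`, `k`, `k_min` and `cutoff` are hypothetical, shared neighbours are counted excluding the two compounds themselves (conventions differ), and modelling the variable-length variant by truncating each list at a similarity cutoff is an assumption. The brute-force neighbour search here is quadratic.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    return 1.0 if union == 0 else bin(a & b).count("1") / union


def jarvis_patrick(fps, k, k_min, cutoff=0.0):
    """Jarvis-Patrick clustering; returns a sorted list of clusters (index lists)."""
    n = len(fps)
    neighbours = []
    for i in range(n):
        sims = sorted(((tanimoto(fps[i], fps[j]), j) for j in range(n) if j != i),
                      reverse=True)
        # Up to k nearest neighbours, dropping any below the similarity cutoff.
        neighbours.append({j for s, j in sims[:k] if s >= cutoff})

    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in neighbours[i]:
            # Mutual neighbours sharing at least k_min neighbours are joined.
            if i < j and i in neighbours[j] and len(neighbours[i] & neighbours[j]) >= k_min:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


fps = [0b11110000, 0b11100000, 0b01110000, 0b00001111, 0b00000111, 0b00001110]
print(jarvis_patrick(fps, k=2, k_min=1))  # two clusters of three
print(jarvis_patrick(fps, k=2, k_min=2))  # six singletons
```

Raising `k_min` from 1 to 2 in this example turns every compound into a singleton, which illustrates how strongly the result depends on the parameters.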

11 Ward Clustering Example
- 8 different sets of known active compounds were mixed together: 5-HT3 antagonists, ACE inhibitors, angiotensin 2 antagonists, D2 antagonists, delta antagonists, FTP antagonists, mGluR1 antagonists, thrombin inhibitors
- ChemAxon's 2D pharmacophore fingerprint was generated
- the fingerprints of the mixture were clustered by Ward
- 9 clusters were formed: 8 centroids (cluster representative elements) corresponded to the 8 activity classes, and 1 cluster was a singleton
- all 8 real clusters contained structures only from the activity class of their centroid (over 95% true positive classification)

12 Centroids

13 Ward Clustering Example: cluster of the D2 antagonists

14 Structure-based clustering
- Non-hierarchical: Bemis-Murcko frameworks
- Hierarchical: LibraryMCS

15 Bemis-Murcko frameworks

16

17 Bemis-Murcko Framework Features
- based on the structure of molecules
- cluster formation is apparent, visual, and meets human expectations
- running time scales linearly with the number of input structures
- memory scales sub-linearly
- can easily cope with millions of structures
- suitable for a quick overview of very large sets
- spots scaffold hops
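A Bemis-Murcko framework keeps the ring systems of a molecule and the linkers between them and discards the side chains. One way to obtain it from a molecular graph, sketched below in Python, is to repeatedly strip atoms with at most one neighbour; ring and linker atoms survive because each has two or more neighbours. This is a topology-only sketch under stated assumptions: the adjacency-dict representation and the names `graph` and `murcko_framework` are hypothetical, atom and bond types are ignored, exocyclic double-bonded atoms are not retained (some framework definitions keep them), and grouping molecules by framework would additionally need a canonical representation of each framework, which is not shown.

```python
def graph(edges):
    """Build an undirected adjacency dict {atom: set(neighbours)} from bond pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def murcko_framework(adjacency):
    """Strip side chains by repeatedly removing atoms with at most one neighbour."""
    adj = {a: set(nbrs) for a, nbrs in adjacency.items()}  # work on a copy
    terminal = [a for a, nbrs in adj.items() if len(nbrs) <= 1]
    while terminal:
        a = terminal.pop()
        if a not in adj:
            continue  # already removed
        for b in adj.pop(a):
            adj[b].discard(a)
            if len(adj[b]) <= 1:
                terminal.append(b)  # b has become a chain end
    return adj  # ring systems plus the linkers joining them (empty if acyclic)


# Two six-membered rings joined by a one-atom linker (6), plus a side chain (13, 14).
ring_a = [(i, (i + 1) % 6) for i in range(6)]
ring_b = [(7 + i, 7 + (i + 1) % 6) for i in range(6)]
mol = graph(ring_a + ring_b + [(0, 6), (6, 7), (10, 13), (13, 14)])
print(sorted(murcko_framework(mol)))  # atoms 0..12; the side chain is gone
```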

18 LibraryMCS
Identifies the largest subgraph shared by several molecular structures.
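LibraryMCS works hierarchically across a whole library, and its algorithm is not described here. To illustrate only the underlying notion of a maximum common substructure between two molecules, the hypothetical Python function `mcis` below runs an exhaustive branch-and-bound search for a maximum common induced subgraph with matching atom labels. It is a sketch with stated simplifications: bond types are ignored, the common subgraph is not required to be connected, and the running time is exponential, so it is usable only on very small graphs.

```python
def make_graph(edges):
    """Undirected adjacency dict {atom: set(neighbours)} from bond pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def mcis(g1, labels1, g2, labels2):
    """Largest atom mapping {v1: v2} with equal labels that preserves bonds and non-bonds."""
    nodes1 = sorted(g1)
    best = {}

    def extend(idx, mapping, used):
        nonlocal best
        # Bound: even mapping every remaining atom cannot beat the best so far.
        if len(mapping) + (len(nodes1) - idx) <= len(best):
            return
        if idx == len(nodes1):
            best = dict(mapping)
            return
        v = nodes1[idx]
        for w in g2:
            if w in used or labels1[v] != labels2[w]:
                continue
            # Induced check: v-u bonded in g1 exactly when w-mapping[u] bonded in g2.
            if all((u in g1[v]) == (mapping[u] in g2[w]) for u in mapping):
                mapping[v] = w
                used.add(w)
                extend(idx + 1, mapping, used)
                del mapping[v]
                used.discard(w)
        extend(idx + 1, mapping, used)  # also try leaving v unmapped

    extend(0, {}, set())
    return best


propanol = make_graph([(0, 1), (1, 2), (2, 3)])     # C-C-C-O heavy-atom skeleton
propylamine = make_graph([(0, 1), (1, 2), (2, 3)])  # C-C-C-N heavy-atom skeleton
print(mcis(propanol, {0: "C", 1: "C", 2: "C", 3: "O"},
           propylamine, {0: "C", 1: "C", 2: "C", 3: "N"}))  # {0: 0, 1: 1, 2: 2}
```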

19 LibraryMCS: Hierarchical MCS

20 SAR table view

21 R-group decomposition

22 LibraryMCS Features
- based on the structure of molecules
- cluster formation is apparent, visual, and meets human expectations
- running time scales near-linearly with the number of input structures
- can cope with 100K-200K structures
- suitable for very thorough analysis
- spots scaffold hops
- substituent-activity (property) analysis

23 LibraryMCS integration at Abbott: "Clustering for the masses…", presented by Derek Debe at ChemAxon's US UGM, Boston, 2008

24 Clustering performance comparison

25 JKlustor roadmap
- In the development pipeline: Bemis-Murcko generalisations, IJC integration, KNIME integration, new GUI, manual clustering, multiple class membership, disconnected MCS (MOS)
- Planned: PipelinePilot integration, Spotfire integration, JChemBase and JChemCartridge integration, JC4XLS integration
- Blue sky: multitouch gestures, LibraryMCS for 1M-compound libraries

