JKlustor clustering chemical libraries presented by … maintained by Miklós Vargyas Last update: 25 March 2010.



JKlustor Chemical clustering by similarity and structure

JKlustor

Description of the product
- JKlustor performs similarity-based and structure-based clustering of compound libraries and focused sets, in both hierarchical and non-hierarchical fashion.

Availability
- part of JChem
- IJC (parts)
- server version (accessible via API)
- batch application programs
- HTML user interface
- one desktop application with GUI
- GUI is available as an applet

Summary of key features

Wide range of methods
- Unsupervised, agglomerative clustering
- Hierarchical and non-hierarchical methods
- Similarity based and structure based techniques

Flexible search options
- Tanimoto and Euclidean metrics, weighting
- Maximum common substructure identification
- chemical property matching including atom type, bond type, hybridization, charge

Interactive display
- interactive hierarchy browser (dendrogram viewer)
- SAR-table
- R-table

Efficient
- performance of tools varies between linear and quadratic scale
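As a rough illustration of the two metrics named above, the following minimal sketch computes a Tanimoto similarity on binary fingerprints and a weighted Euclidean distance on numerical descriptors. The function names, the set-of-on-bits fingerprint representation, and the per-dimension weighting scheme are assumptions made for this example; the slides do not say how JKlustor represents fingerprints or applies weights.

```python
import math

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def weighted_euclidean(x, y, weights=None):
    """Euclidean distance with optional per-dimension weights (assumed scheme)."""
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y)))

fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))                  # 2 shared bits / 5 distinct bits -> 0.4
print(weighted_euclidean([0, 0], [3, 4]))  # -> 5.0
```

A dissimilarity for clustering can then be taken as 1 minus the Tanimoto similarity.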

Benefits

Versatile
- Choose the most appropriate method to the clustering problem
- Combine methods to achieve best results
- Use your trusted molecular descriptors in similarity calculation
- Easy integration in corporate discovery pipelines
- Cluster chemical files directly, no need to import structures in database

Intuitive
- Cluster formation is self-explanatory

Similarity based clustering

Hierarchical
- Ward

Non-hierarchical
- Sphere exclusion
- k-means
- Jarvis-Patrick

Ward Clustering

Features
- Ward's minimum variance method results in tight, well separated clusters
- Murtagh's reciprocal nearest neighbor (RNN) algorithm to speed it up
- quadratic scaling of running time (with respect to number of input structures)
- memory consumption scales linearly
- best used with smaller sets (like focused libraries), copes with < 100K structures
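The minimum variance criterion can be sketched as follows: at each step, merge the two clusters whose union increases the total within-cluster variance the least. This is a deliberately naive illustration that rescans all pairs at every step; it does not implement Murtagh's RNN speed-up mentioned above, and the function names and tuple-based point representation are assumptions for this example only.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward(points, n_clusters):
    """Naive agglomerative Ward clustering: repeatedly merge the cheapest pair."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

data = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(ward(data, 2))
```

Recording the sequence of merges instead of stopping at a fixed cluster count would give the hierarchy that a dendrogram viewer displays.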

Sphere Exclusion Clustering

Features
- based on fingerprints and/or other numerical data
- running time linear with respect to number of input structures
- memory scales sub-linearly
- can easily cope with 1Ms of structures
- suitable for diverse subset selection
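A minimal sketch of the general sphere exclusion idea, assuming Tanimoto dissimilarity on set-based fingerprints: pick an unassigned structure as a centre, assign everything within a fixed radius to it, exclude those structures, and repeat. How JKlustor chooses centres and which parameters it exposes is not stated in the slides, so the "first remaining structure" rule and the `radius` parameter here are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def sphere_exclusion(fps, radius):
    """Return a list of (centre_index, member_indices) pairs."""
    remaining = list(range(len(fps)))
    clusters = []
    while remaining:
        centre = remaining[0]  # assumed rule: first structure not yet excluded
        members = [i for i in remaining
                   if 1.0 - tanimoto(fps[centre], fps[i]) <= radius]
        clusters.append((centre, members))
        excluded = set(members)
        remaining = [i for i in remaining if i not in excluded]
    return clusters

fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 12, 13}, {20}]
for centre, members in sphere_exclusion(fps, radius=0.5):
    print(centre, members)
```

The centres alone (here the structures at indices 0, 2 and 4) form a set whose members are pairwise further apart than the radius, which is why the method suits diverse subset selection.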

k-means Clustering

Features
- based on fingerprints and/or other numerical data
- minimises variance within each cluster
- number of clusters can directly be controlled
- finds the centre of natural clusters in the input data
- running time scales exponentially with respect to number of input structures
- can cope with <100Ks of structures
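For illustration, here is a minimal sketch of the standard Lloyd iteration commonly used for k-means: assign each point to its nearest centre, recompute each centre as the mean of its points, and stop when the centres no longer move. It is not taken from JKlustor; the function names are hypothetical, and the initial centres are passed in explicitly so the example stays deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def kmeans(points, initial_centres, max_iter=100):
    """Lloyd's algorithm; the number of clusters equals len(initial_centres)."""
    centres = [list(c) for c in initial_centres]
    groups = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda k: sq_dist(p, centres[k]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group (keep it if empty).
        new_centres = [mean(g) if g else centres[k] for k, g in enumerate(groups)]
        if new_centres == centres:
            break  # converged: assignments can no longer change
        centres = new_centres
    return centres, groups

pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
centres, groups = kmeans(pts, initial_centres=[(0, 0), (10, 0)])
print(centres)  # -> [[0.0, 1.0], [10.0, 1.0]]
```

The result depends on the initial centres, so in practice the iteration is often restarted from several starting points.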

Jarp Clustering

Features
- variable-length Jarvis-Patrick clustering
- based on fingerprints and/or other numerical data
- takes structures/fingerprints and data values either from files or from database tables
- running time scales better than quadratic but worse than linear (with respect to number of input structures)
- memory scales linearly
- Jarp can cope with 100Ks of structures
- depending on data and parameters, may create large number of singletons
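The sketch below shows the classic fixed-length Jarvis-Patrick rule, not the variable-length variant that Jarp implements: two items fall into the same cluster if each appears in the other's list of `k` nearest neighbours and the two lists share at least `k_min` entries. The parameter names, the 1-D toy data, and the union-find bookkeeping are assumptions chosen for this example.

```python
def jarvis_patrick(points, k, k_min, dist):
    """Classic Jarvis-Patrick clustering; returns sorted lists of point indices."""
    n = len(points)
    # k nearest neighbours of every point (excluding the point itself).
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        neighbours.append(set(others[:k]))

    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbours[i] and i in neighbours[j]
            if mutual and len(neighbours[i] & neighbours[j]) >= k_min:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 50.0]
print(jarvis_patrick(values, k=2, k_min=1, dist=lambda a, b: abs(a - b)))
# -> [[0, 1, 2], [3, 4, 5], [6]]
```

The outlier at index 6 is in nobody's neighbour list and so stays a singleton, which illustrates the remark above about singletons.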

Ward Clustering Example
- 8 different sets of known active compounds were mixed together:
  - 5-HT3-antagonists
  - ACE inhibitors
  - angiotensin 2 antagonists
  - D2 antagonists
  - delta antagonists
  - FTP antagonists
  - mGluR1 antagonists
  - thrombin inhibitors
- ChemAxon’s 2D Pharmacophore fingerprint was generated
- Fingerprints of the mixture were clustered by Ward
- 9 clusters were formed
  - 8 centroids (cluster representative elements) corresponded to the 8 activity classes
  - 1 was a singleton
- All 8 real clusters contained structures only from the activity class of the centroid (over 95% true positive classification)

Ward Clustering Example Centroids

Ward Clustering Example Cluster of the D2 antagonists

Structure based clustering

Non-hierarchical
- Bemis-Murcko frameworks

Hierarchical
- LibraryMCS

Bemis-Murcko frameworks

Bemis-Murcko frameworks

Bemis-Murcko frameworks

Features
- based on structure of molecules
- cluster formation is apparent, visual, meets human expectations
- running time linear with respect to number of input structures
- memory scales sub-linearly
- can easily cope with 1Ms of structures
- suitable for quick overview of very large sets
- spots scaffold hops
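A Bemis-Murcko framework keeps the ring systems of a molecule and the linkers between them while discarding side chains. On a plain molecular graph this can be sketched as repeatedly deleting atoms that have at most one neighbour. The adjacency-dictionary representation and the helper names below are assumptions for illustration; a real toolkit would also handle atom and bond types, which this sketch ignores.

```python
def build_graph(edges):
    """Build an undirected adjacency dict {atom: set(neighbours)} from bond pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def murcko_framework(adjacency):
    """Prune terminal atoms until only rings and ring-connecting linkers remain."""
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    leaves = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while leaves:
        atom = leaves.pop()
        if atom not in graph:
            continue  # already removed via another path
        for nbr in graph.pop(atom):
            graph[nbr].discard(atom)
            if len(graph[nbr]) <= 1:
                leaves.append(nbr)  # neighbour has become terminal
    return graph

# Toy molecule: six-membered ring (0-5), linker atom 6, three-membered ring (7-9),
# a one-atom substituent (10) on the ring and a two-atom side chain (11, 12) on the linker.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7), (7, 8), (8, 9), (9, 7),
         (3, 10), (6, 11), (11, 12)]
framework = murcko_framework(build_graph(bonds))
print(sorted(framework))  # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each atom and bond is visited a bounded number of times, so the pruning is linear in molecule size. Grouping molecules that share a framework additionally needs a canonical form of the pruned graph, which is outside the scope of this sketch.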

LibraryMCS Identifies the largest subgraph shared by several molecular structures

LibraryMCS: Hierarchical MCS

SAR table view

R-group decomposition

LibraryMCS

Features
- based on structure of molecules
- cluster formation is apparent, visual, meets human expectations
- running time near-linear with respect to number of input structures
- can cope with 100K-200K structures
- suitable for very thorough analysis
- spots scaffold hops
- substituent-activity (property) analysis

LibraryMCS integration at Abbott “Clustering for the masses…”, presented by Derek Debe at ChemAxon’s US UGM, Boston, 2008

Clustering performance comparison

JKlustor roadmap

In the development pipeline
- Bemis-Murcko generalisations
- IJC integration
- KNIME integration
- New GUI
- Manual clustering
- Multiple class membership
- Disconnected MCS (MOS)

Planned
- PipelinePilot integration
- Spotfire integration
- JChemBase, JChemCartridge integration
- JC4XLS integration

Blue sky
- Multitouch gestures
- LibraryMCS for 1M compound libraries