UGM 2006 Miklós Vargyas Scientific Workshop Maximum Common Substructure.

Presentation transcript:

UGM 2006 Miklós Vargyas Scientific Workshop Maximum Common Substructure

UGM 2006 Slide 2 Workshop overview
Introduction, concepts, theory
Clustering, the role of MCS
Applications
Future plans

UGM 2006 Slide 3 Motivations Automated reaction mapping

UGM 2006 Slide 4 Mapping chemical reactions

UGM 2006 Slide 5 ChemAxon's automapper
Find parts common to both sides
Map common parts

UGM 2006 Slide 6 ChemAxon's automapper
Map the rest
–Score possible mappings
–Find the one that scores the highest

UGM 2006 Slide 7 Concepts and theory
MCS/MCES/MOS
MCS complexity: O(n^m)

UGM 2006 Slide 8 MCS search methods / Clique
Barrow and Burstall, 1976
Raymond and Willett, RASCAL, 2002
Details in brief
–Construct the product graph of G1 and G2 (node count: |V1| · |V2|)
–Find the maximum clique; it corresponds to the largest matching
Why is it good
–Very elegant, pure graph theory
–MCES can also be found
–Disconnected MCS/MCES can be found
–Node and edge coloring fits easily
What are the drawbacks
–Product graph is large and dense
Recent advances in clique detection
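The clique formulation on this slide can be illustrated with a minimal Python sketch. It is not the RASCAL algorithm or ChemAxon's code; the function names, the dict/set graph encoding, and the plain Bron-Kerbosch search are assumptions chosen for brevity. Product nodes are restricted to pairs with equal labels (the "node coloring" mentioned above), so the node count is at most |V1| · |V2|.

```python
from itertools import combinations

def modular_product(v1, e1, v2, e2):
    """Build the modular product of two labelled graphs (hypothetical helper).

    v1, v2: dict node -> label; e1, e2: sets of frozenset({a, b}) edges.
    Product nodes are label-compatible pairs (a, b). Two product nodes are
    adjacent when the two atom pairs are bonded in both graphs or
    non-bonded in both graphs.
    """
    nodes = [(a, b) for a in v1 for b in v2 if v1[a] == v2[b]]
    adj = {n: set() for n in nodes}
    for (a1, b1), (a2, b2) in combinations(nodes, 2):
        if a1 == a2 or b1 == b2:
            continue  # an atom may be used only once in a matching
        if (frozenset((a1, a2)) in e1) == (frozenset((b1, b2)) in e2):
            adj[(a1, b1)].add((a2, b2))
            adj[(a2, b2)].add((a1, b1))
    return adj

def max_clique(adj):
    """Bron-Kerbosch without pivoting; returns one maximum clique."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best

# Toy example: C-C-O against a carbon bonded to O and N.
v1 = {0: "C", 1: "C", 2: "O"}
e1 = {frozenset((0, 1)), frozenset((1, 2))}
v2 = {0: "C", 1: "O", 2: "N"}
e2 = {frozenset((0, 1)), frozenset((0, 2))}
matching = max_clique(modular_product(v1, e1, v2, e2))
```

Each clique member (a, b) maps atom a of the first graph to atom b of the second. Because non-bonded pairs also create product edges, the resulting common substructure may be disconnected, and the product graph becomes dense, which is the drawback the slide names.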

UGM 2006 Slide 9 MCS search methods / Backtrack
Crandell and Smith, 1983
Advantages
–Flexible, easy to add constraints, incorporate chemical knowledge, heuristics
–Dynamic programming
–Various search strategies
Recent algorithms
–Jun Xu, GMA, 1995
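To show why backtracking is easy to extend with constraints, here is a generic exhaustive sketch in Python. It is not the Crandell and Smith procedure, GMA, or ChemAxon's implementation; the function name, the graph encoding, and the `compatible` hook are assumptions. The hook is where chemical knowledge (atom types, charges, ring membership) could be plugged in.

```python
def mcs_backtrack(v1, e1, v2, e2, compatible=lambda la, lb: la == lb):
    """Exhaustive backtracking search for a maximum common induced subgraph.

    v1, v2: dict node -> label; e1, e2: sets of frozenset({a, b}) edges.
    Returns the largest atom mapping {node_in_g1: node_in_g2} found.
    """
    nodes1 = list(v1)
    best = {}

    def consistent(a, b, mapping):
        # Bonds to already mapped atoms must agree in both graphs.
        return all(
            (frozenset((a, a2)) in e1) == (frozenset((b, b2)) in e2)
            for a2, b2 in mapping.items()
        )

    def extend(i, mapping, used):
        nonlocal best
        # Bound: even mapping every remaining atom cannot beat the best.
        if len(mapping) + (len(nodes1) - i) <= len(best):
            return
        if i == len(nodes1):
            best = dict(mapping)
            return
        a = nodes1[i]
        for b in v2:
            if b not in used and compatible(v1[a], v2[b]) and consistent(a, b, mapping):
                mapping[a] = b
                extend(i + 1, mapping, used | {b})
                del mapping[a]
        extend(i + 1, mapping, used)  # leave atom a unmapped

    extend(0, {}, frozenset())
    return best

# Toy example: C-C-O against a carbon bonded to O and N.
v1 = {0: "C", 1: "C", 2: "O"}
e1 = {frozenset((0, 1)), frozenset((1, 2))}
v2 = {0: "C", 1: "O", 2: "N"}
e2 = {frozenset((0, 1)), frozenset((0, 2))}
mapping = mcs_backtrack(v1, e1, v2, e2)
```

The size bound at the top of `extend` is the simplest possible pruning rule; smarter atom orderings or connectivity constraints would slot into the same loop.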

UGM 2006 Slide 10 Comparison of methods
Brint and Willett, 1986: Clique based substantially faster
Recent publication, 2006: backtracking is superior
We tested both approaches
–Backtracking: 1.2 s (exhaustive search)
–Clique based was stopped after 2 hours!!!

UGM 2006 Slide 11 ChemAxon MCS search approach
Based on Wang and Zhou, EMCSS, 1996
Backtracking
–Divide and conquer strategy
–Create all spanning trees of the query graph

UGM 2006 Slide 12 ChemAxon MCS search approach –Use this as a route plan to traverse the target graph

UGM 2006 Slide 13 An application of MCS
Reaction automapping (live demonstration)
Average mapping time: 320 ms
Complex structures cannot be mapped efficiently

UGM 2006 Slide 14 Product development philosophy
Sophisticated technology
High performance (speed, accuracy, features)
Rounded, industry relevant functionality
Customizable
Extendable
Long term relevance
>300 active clients
Client driven development
Fast and reliable support
Comprehensive API
Platform independence (Java)

UGM 2006 Slide 15 LibMCS motivations
However, finding the MCS of a pair of molecules has limited use for our study. When we get hits from HTS, we cluster them into groups, and the chemists browse each group by eye to find scaffolds that are potentially good templates for later expansion. One main use of MCS will be to process multiple compounds of similar structure and automate what chemists now do by eye.
We expect to use MCS tools in two cases: 1) to analyze hits from HTS screens; 2) as a sorting tool for data retrieval, i.e., whenever people export data from our database (compounds across assays), we run MCS so that structurally similar compounds are grouped together. Chemists like this very much (we currently do this by clustering based on overall Tanimoto similarity).
The typical hits from screens range from (in few cases). In lead optimization phase, the compound list is around in a typical project. So if MCS tools can process 5000 compounds in under 5 seconds, they can be integrated with online web tools. Otherwise, if it takes several minutes, they will only be used to analyze hits off-line on user request. If it takes more than an hour, their usage will be very limited.

UGM 2006 Slide 16 LibMCS is a hard problem to solve
Exact solution
–Requires the pair-wise comparison of each structure: n (n - 1) / 2 MCS computations
Next problem is larger!!
–All CS (above a given size) have to be found: n (n - 1) / 2 CS computations
Partitioning O(n^3) CS

UGM 2006 Slide 17 Pair-wise MCS table

UGM 2006 Slide 18 Pair-wise MCS computation
Average MCS computation: 100 ms
First step: n (n - 1) / 2 MCS computations
–100 structures: ms = 8 min
–1000 structures: 14 hours
Second step: larger problem has to be solved
Practically not feasible approach
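The estimates on this slide follow directly from the pair count n (n - 1) / 2 and the stated 100 ms average. A short Python check, with hypothetical function names, reproduces the orders of magnitude (roughly 8 minutes for 100 structures and roughly 14 hours for 1000):

```python
def pairwise_count(n):
    """Number of unordered structure pairs, n (n - 1) / 2."""
    return n * (n - 1) // 2

def total_seconds(n, ms_per_mcs=100):
    """Total time for all pair-wise MCS computations, in seconds."""
    return pairwise_count(n) * ms_per_mcs / 1000

for n in (100, 1000):
    seconds = total_seconds(n)
    print(f"{n} structures: {pairwise_count(n)} pairs, "
          f"{seconds / 60:.1f} min, {seconds / 3600:.1f} h")
```

The quadratic growth is the point: a tenfold increase in library size costs about a hundredfold more MCS computations.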

UGM 2006 Slide 19 Known approaches / Products
Stahl and Mauser, 2004, 2005
–Cluster first (ES)
–Find an MCS for each cluster
Wilkens, Janes and Su, 2004
BioReason ClassPharmer
ChemTK
LeadScope
Tripos ?
Daylight ?

UGM 2006 Slide 20 ChemAxon's approach
Goal
–Reduce the number of MCS pair computations
Idea: guess which two structures give a significant MCS
–Similar compounds are likely to share a large MCS
–Similarity guided pair-wise MCS
Not clustering by similarity and then determining the MCS for the cluster
Which molecular descriptor gives the best correlation
–ChemAxon fingerprint
–BCUT (Burden matrix)
Consequence
–Approximate solution
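The similarity-guided idea can be sketched as a cheap pre-filter: rank structure pairs by fingerprint similarity and run the expensive MCS search only on the most similar ones. The Python below is an illustration under assumptions, not LibMCS itself: fingerprints are modelled as sets of on-bit indices, Tanimoto is used as the similarity measure, and the function names and threshold are hypothetical.

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def candidate_pairs(fingerprints, threshold):
    """Return (id_a, id_b, similarity) for pairs worth an MCS computation,
    most similar first. Pairs below the threshold are skipped entirely."""
    scored = [
        (tanimoto(fingerprints[a], fingerprints[b]), a, b)
        for a, b in combinations(sorted(fingerprints), 2)
    ]
    return [(a, b, s) for s, a, b in sorted(scored, reverse=True) if s >= threshold]

# Toy fingerprints: m1 and m2 share three of five distinct bits.
fps = {"m1": {1, 2, 3, 4}, "m2": {1, 2, 3, 5}, "m3": {7, 8}}
pairs = candidate_pairs(fps, threshold=0.5)
```

Only the m1/m2 pair survives the filter here, so one MCS computation replaces three. The price is the one the slide states: the result is an approximate solution, because a pair with low fingerprint similarity but a large MCS would be missed.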

UGM 2006 Slide 21 LibMCS algorithm
[Flowchart. Box labels: Read input structures; Generate fingerprint; Calculate similarity matrix; Make singletons; Compute MCS; MCS large; Create new cluster; Similarity above threshold; Get two most similar; More structures; SSS; Found; Add to cluster. Decision branches are labelled y/n.]

UGM 2006 Slide 22 Applications
Screen analysis
Data visualization and profiling
Combinatorial library partitioning
Buying new compounds ?
Suggest more!!!!

UGM 2006 Slide 23 Application 1 / Screen analysis

UGM 2006 Slide 24 Activity filtering

UGM 2006 Slide 25 Live demonstration
Partitioning mixed combinatorial library
–Effect of parameters
–Effect of modes
–Benchmarks
–Quality of clusters

UGM 2006 Slide 26 Combichem library scaffolds

UGM 2006 Slide 27 Combichem library scaffolds Turbo mode distorts clusters

UGM 2006 Slide 28 Combichem benchmark
Influence of normal/fast/turbo mode
Worth it: distortion is not significant

UGM 2006 Slide 29 Development roadmap
Soon
–R-Group decomposition
–Stereo care MCS
–Preserving rings
–Lower bound pre-filtering
–Disconnected MCS
–Multi cluster members
Mid term
–Integrate Ward/Jarvis-Patrick in the new GUI
Long term
–Integrate molecular descriptors, metrics
–Integrate virtual screening

UGM 2006 Slide 30 Coming soon – R-Group decomposition

UGM 2006 Slide 31 Coming soon – R-Group decomposition

UGM 2006 Slide 32 Coming soon – Multi cluster

UGM 2006 Slide 33 Summary
MCS developed for automatic reaction mapping
MCS based hierarchical clustering
Fast method
Chemical adequacy must be improved
Various uses, currently focusing on combinatorial library partitioning

UGM 2006 Slide 34 Acknowledgements
Developers
–Péter Vadász
–Nóra Máté
Ideas
–Szabolcs Csepregi, Ferenc Csizmadia
Special thanks to