Published by Leah Jacobs. Modified over 3 years ago.

1
The new JKlustor suite
Miklós Vargyas
Solutions for Cheminformatics

2
Why do we cluster?
To reduce the number of objects to deal with
– group subsets together
– represent each group by one of its members

3
What's the matter with that?
– tedious parameter tuning in a trial-and-error fashion
– lack of interpretability: the algorithm does not provide an explanation
– results often do not meet chemists' expectations

4
Why is clustering molecules hard?
Lack of innate spatial arrangement
– artificial arrangement
  infinite types of chemical spaces
  various distance metrics
  usually high dimensionality (hard to visualize)
– various approaches, no superior one
  the best method depends on the application area and on the actual data

5
What do we need?
little or no tuning
easy to understand
simple explanation
novel approach
– structure-based clustering
– Maximum Common Substructure
– molecular frameworks

6
Maximum Common Substructure
The largest substructure shared by two molecules.
Simple concept! More human, visual.
Yet hard (= expensive (= slow)) to compute.

7
MCS complexity
Substructure searching
– the query structure is known; it only has to be found as part of the target structure (subgraph isomorphism)
– graph isomorphism is a simpler problem, yet no polynomial-time algorithm is known for it; subgraph isomorphism is NP-complete
  finding the answer can take long in the worst case (scales exponentially with the number of graph vertices)
  validating an answer is fast
MCS
– the query structure is not known
– all possible substructures need to be checked
  even the number of substructures is exponential!
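The asymmetry between finding and validating an answer can be made concrete with a small sketch. It is not the JKlustor implementation: molecules are reduced to labelled graphs, and the names `make_graph`, `find_embedding` and `is_valid_embedding` are hypothetical helpers introduced only for illustration. The search backtracks over candidate atom assignments (exponential in the worst case), while the check walks the atoms and bonds of the query once.

```python
def make_graph(labels, bonds):
    """Hypothetical molecule model: atom id -> element, plus undirected bonds."""
    return {"labels": dict(labels), "bonds": {frozenset(b) for b in bonds}}

def is_valid_embedding(query, target, mapping):
    """Validate a proposed query->target atom mapping in polynomial time."""
    if set(mapping) != set(query["labels"]):
        return False                      # every query atom must be mapped
    if len(set(mapping.values())) != len(mapping):
        return False                      # no two query atoms on one target atom
    for q, t in mapping.items():
        if t not in target["labels"] or query["labels"][q] != target["labels"][t]:
            return False                  # element types must agree
    # every query bond must also exist between the mapped target atoms
    return all(frozenset(mapping[a] for a in bond) in target["bonds"]
               for bond in query["bonds"])

def find_embedding(query, target):
    """Backtracking substructure search; returns a mapping or None."""
    q_atoms = list(query["labels"])

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return dict(mapping)
        q = q_atoms[len(mapping)]
        for t, t_label in target["labels"].items():
            if t in mapping.values() or t_label != query["labels"][q]:
                continue
            # bonds from q to already mapped query atoms must be present in target
            consistent = all(frozenset((t, mapping[p])) in target["bonds"]
                             for p in mapping
                             if frozenset((q, p)) in query["bonds"])
            if consistent:
                mapping[q] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[q]            # undo and try the next candidate
        return None

    return extend({})

# Toy example: a C-O fragment searched in ethanol (C-C-O), hydrogens omitted.
fragment = make_graph({0: "C", 1: "O"}, [(0, 1)])
ethanol = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
embedding = find_embedding(fragment, ethanol)
```

Here `embedding` maps the fragment's carbon to the carbon that carries the oxygen; passing it to `is_valid_embedding` confirms the hit without repeating the search.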

8
MCS algorithms: two camps

backtracking | clique detection
ad hoc | high mathematical elegance
average complexity is better than worst case | average complexity is same as worst case
dynamic heuristics | static (initial) heuristics
coloring is easy | coloring is hard
fuzzy matching | fussy matching
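For readers unfamiliar with the clique-detection camp, the following is a minimal sketch of the textbook formulation, not of any JKlustor code. It assumes molecules are plain labelled graphs; `make_graph`, `modular_product` and `max_clique` are hypothetical names. Each vertex of the modular product pairs two atoms of the same element, two pairs are joined when they agree on being bonded or not bonded, and a maximum clique then corresponds to a maximum common induced substructure (which may be disconnected). The clique search is a simple branch and bound and is exponential in the worst case.

```python
from itertools import combinations, product

def make_graph(labels, bonds):
    """Hypothetical molecule model: atom id -> element, plus undirected bonds."""
    return {"labels": dict(labels), "bonds": {frozenset(b) for b in bonds}}

def modular_product(g1, g2):
    """Compatibility graph whose cliques are common induced substructures."""
    nodes = [(a, b) for a, b in product(g1["labels"], g2["labels"])
             if g1["labels"][a] == g2["labels"][b]]
    adj = {n: set() for n in nodes}
    for (a1, b1), (a2, b2) in combinations(nodes, 2):
        if a1 == a2 or b1 == b2:
            continue                      # an atom may be used only once
        bonded1 = frozenset((a1, a2)) in g1["bonds"]
        bonded2 = frozenset((b1, b2)) in g2["bonds"]
        if bonded1 == bonded2:            # both bonded or both not bonded
            adj[(a1, b1)].add((a2, b2))
            adj[(a2, b2)].add((a1, b1))
    return adj

def max_clique(adj):
    """Exhaustive branch-and-bound maximum clique search."""
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        if len(clique) + len(candidates) <= len(best):
            return                        # cannot beat the best clique so far
        for v in list(candidates):
            expand(clique + [v], candidates & adj[v])
            candidates = candidates - {v}

    expand([], set(adj))
    return best

# Toy example: ethanol (C-C-O) against 1-propanol (C-C-C-O), hydrogens omitted.
ethanol = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
propanol = make_graph({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)])
mcs_pairs = max_clique(modular_product(ethanol, propanol))
```

Each element of `mcs_pairs` is an (ethanol atom, propanol atom) pair; in this toy case all three ethanol atoms are matched.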

9
MCS of a structure set

10
LibraryMCS: Hierarchical MCS

11
Intuitive visualization

12
SAR table view

13
R-group decomposition

14
LibraryMCS scales linearly

15
Clustering performance comparison

16
Behind performance
MCS search
– exhaustive
– heuristics
  exact
  inexact
Predictive MCS coupling in clustering
– all pairs are not feasible
– rich fingerprinting

17
Live demonstration
Effect of using heuristics
– on average < 10% misclassifications
– useful for obtaining a bird's-eye view of larger/diverse sets

18
Libraries of more than 1M compounds
Molecular scaffolds
– rings, ring systems
– Bemis-Murcko frameworks
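As an illustration of why framework-based grouping is cheap, a Bemis-Murcko-style framework can be approximated on a bare molecular graph by repeatedly stripping terminal atoms until only ring systems and the linkers between them remain. This is a sketch under simplifying assumptions (hydrogen-suppressed graph, no special handling of exocyclic double bonds or element types); the names `adjacency_from_bonds` and `murcko_framework` are hypothetical and do not refer to any JKlustor API.

```python
def adjacency_from_bonds(bonds):
    """Build an atom -> set-of-neighbours map from a list of bonded atom pairs."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def murcko_framework(adjacency):
    """Strip side chains: iteratively delete atoms with at most one neighbour."""
    adj = {atom: set(nbrs) for atom, nbrs in adjacency.items()}  # work on a copy
    terminal = [atom for atom, nbrs in adj.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in adj:
            continue                      # already removed earlier
        for nb in adj.pop(atom):
            adj[nb].discard(atom)
            if len(adj[nb]) <= 1:
                terminal.append(nb)       # neighbour became a chain end
    return adj

# Toy molecule: ring (0,1,2), linker atom 3, ring (4,5,6), side chain 7-8 on atom 4.
molecule = adjacency_from_bonds([
    (0, 1), (1, 2), (2, 0),               # first ring
    (2, 3), (3, 4),                       # linker
    (4, 5), (5, 6), (6, 4),               # second ring
    (4, 7), (7, 8),                       # side chain
])
framework = murcko_framework(molecule)
chain_framework = murcko_framework(adjacency_from_bonds([(0, 1), (1, 2)]))
```

Each atom is removed at most once, so the pruning is linear in the size of the molecule; molecules can then be grouped by identical frameworks. An acyclic molecule, as in `chain_framework`, leaves an empty framework.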

19
Fast clustering methods
Sphere exclusion
– variants…
Linear scaling
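A minimal sketch of basic sphere exclusion follows, assuming each compound is described by a set of fingerprint feature ids and that similarity is measured with the Tanimoto coefficient. Both are illustrative assumptions, as are the names `tanimoto` and `sphere_exclusion` and the threshold value; the variants mentioned on the slide are not reproduced here. The first unassigned compound becomes a cluster centre, everything within the similarity threshold joins it, and the procedure repeats on the remainder.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets (1.0 for two empty sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def sphere_exclusion(fingerprints, threshold=0.6):
    """Return a list of (centre index, member indices) clusters."""
    unassigned = list(range(len(fingerprints)))
    clusters = []
    while unassigned:
        centre = unassigned[0]            # first remaining compound is the centre
        members = [i for i in unassigned
                   if tanimoto(fingerprints[centre], fingerprints[i]) >= threshold]
        clusters.append((centre, members))
        taken = set(members)
        unassigned = [i for i in unassigned if i not in taken]
    return clusters

# Toy fingerprints: sets of hypothetical feature ids.
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}, {7, 8, 9, 10}, {20}]
clusters = sphere_exclusion(fingerprints, threshold=0.5)
```

Each cluster is represented by its centre, which matches the idea of representing a group by one of its members. Every pass compares one centre against the remaining compounds, so the cost is roughly the number of clusters times the number of compounds.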

20
JKlustor roadmap
In the dev pipeline
– IJC integration
– Spotfire integration
– new dynamic viewer
Planned
– disconnected MCS
– multiple class members

21
Acknowledgements
Gábor Imre
Judit Vaskó-Szedlár
Péter Vadász
