What's new in JChem back-end and Markush storage, search and enumeration
Szabolcs Csepregi
Solutions for Cheminformatics

2 Contents
– ChemAxon chemical database tools
– Main features of JChem Base, Cartridge
– Example interfaces: JSP, ASP, AJAX examples
– Integration with other ChemAxon products
– Markush structure storage, search and enumeration
– Recent developments, plans

3 Chemical database products
JChem Base
  – A library for adding chemical structures to relational database systems. Available in Java, JSP and .NET.
  – Open-source web application example is available.
JChem Cartridge for Oracle
  – Extends Oracle SQL with chemical operators and index.
  – SQL interface for ChemAxon functionality.
Instant JChem
  – An all-in-one desktop chemical database application.
JChem Web Services
  – SOAP interface to JChem Base.
JC4XL
  – Excel integration (coming).

4 Compatibility and integration
Supported chemical file formats:
  – SMILES
  – MDL MOL/RXN/SDF/RDF (v2000 and v3000)
  – CML, MRV
  – IUPAC and traditional names
  – InChI, mol2, PDB, etc.
Database engines:
  – Oracle, MySQL, MS SQL Server, MS Access, PostgreSQL, IBM DB2, Derby, etc.
All operating systems through:
  – Java API (JChem Base)
  – .NET API (JChem Base + IKVM), for Windows
  – SQL (Cartridge)

5 Structure searching: features
– Search types: substructure, similarity, full, full fragment, etc.
– Wide range of query atoms
– Query properties
– R-group queries
– Full SMARTS support
– Coordination compounds
– Link nodes
– Pseudo atoms, lone pairs
– Relative stereo
– Reaction search features
– Polymers
– Position variation
– Hit coloring...

6 Structure searching: options
Some selected structure search options:
  – Chemical Terms filter constraint
  – Tautomer search
  – Stereo on/off
  – Ignore charge/isotope/radical/valence/polymers
  – Vague bond matching modes: "or aromatic"; ignore bond types
  – Inverse hit list
  – Maximum search time / number of hits
  – SQL SELECT statement for pre-filtering
  – Ordering of results
  – etc.

7 Structure search: performance
Test environment: JChem Base 5.2.0, Intel Quad Q6600 2.4 GHz, 8 GB RAM; Oracle
Compound registration (elapsed time):
  Number of compounds   Duplicates not checked   Duplicates checked
  10,000                21 s                     26 s
  100,000               2 min                    2 min 36 s
  200,000               3 min 45 s               5 min 5 s
Substructure search in PubChem (19.5 million compounds):
  [Query structure drawings not recoverable]
  Number of hits   Search time
  2                0.81 s
  93               0.79 s
  5,855            1.457 s
  142,950          11.076 s

8 Table types
Control allowed chemical structures and available operations:
  – Molecule
  – Reaction
  – Markush
  – Query
  – Any structure

9 Example web applications
Open-source JSP, ASP examples
  – Marvin applets are used for query drawing and structure visualization
AJAX example
  – Back-end is JChem Web Services
  – No Java is needed for browsing
Demo

10 Integration
Integration with other ChemAxon tools:
  – Custom, uniform chemical representation (Standardizer; see separate presentation today)
  – Automatically calculated properties by Chemical Terms
      Calculated columns (Calculator plugins)
  – Additional similarity calculations (Screen, JChem Base only)
  – Tautomer handling:
      Tautomer search
      Tautomer duplicate filter table/index option
      Custom tautomer transforms or canonical tautomer using Standardizer
  – Query drawing and structure visualization (Marvin)
Provides the most consistent interface and back-end.

11 Integration
Additional Cartridge functionality
  – JChem index (for non-JChem tables)
  – Communication with Oracle optimizer
  – Reaction-based enumeration (Reactor)
  – Format conversions, including image generation
  – Markush enumeration (Calculator plugins)
  – Property predictions through Chemical Terms (Calculator plugins)

12 Registration system
A new component for a registration system is under development (API only).
Main features:
  – Customizable business logic
      Multilevel duplication control
      Customizable corporate registration ID
      Handling of salts, batches, lots, samples, and mixtures
  – Identification, split and registration of salt and solvent structures
      Storage of input structures in original format
  – Mock registration (dry run)
  – Pre-registration through a transitory area
  – Basic, customizable implementation examples
      Separate examples for chemists and registrars
Web and Instant JChem interfaces will follow later.

13 Handling of Markush structures

14 Markush structures
Combinatorial Markush structure registration and search
Features handled in search and enumeration:
  – R-groups (nesting to any depth)
  – Atom lists, bond lists
  – Position variation bond
  – Link nodes
  – Repeating units
  – Homology groups (aryl, alkyl, etc.)
      Built-in
      User-defined
Compatible Markush enumeration plugin

15 Markush Enumeration
Markush enumeration plugin
  – Full enumeration
  – Selected parts only
  – Random enumeration
  – Calculate library size: exact size of huge Markush libraries (arbitrary precision) or magnitude only
  – Scaffold alignment and coloring
  – Markush code
  – Optional example homology group enumeration
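The library size calculation above can be illustrated with a minimal sketch. It does not use the plugin's API. It assumes a hypothetical toy model in which every R-group maps to a list of alternative definitions, and each definition lists the nested R-groups it references (an empty list for a plain fragment). The names `rgroup_size`, `library_size` and `magnitude` are invented for this illustration, and duplicate structures arising from symmetry are not taken into account.

```python
from math import prod

# Hypothetical toy model (not the plugin's data structures):
# definitions[name] is a list of alternatives; each alternative is the
# list of nested R-groups it contains ([] for a plain fragment).
def rgroup_size(name, definitions, cache=None):
    """Number of distinct expansions of one R-group, nested groups included."""
    if cache is None:
        cache = {}
    if name not in cache:
        cache[name] = sum(
            prod(rgroup_size(child, definitions, cache) for child in members)
            for members in definitions[name]
        )
    return cache[name]

def library_size(scaffold_rgroups, definitions):
    """Exact library size: product over the R-groups attached to the scaffold."""
    cache = {}
    return prod(rgroup_size(r, definitions, cache) for r in scaffold_rgroups)

def magnitude(n):
    """Order of magnitude of a positive integer as a power of ten."""
    return f"10^{len(str(n)) - 1}"

# R1 has two plain fragments plus one definition that nests R3 (4 fragments).
definitions = {"R1": [[], [], ["R3"]], "R2": [[], []], "R3": [[], [], [], []]}
print(library_size(["R1", "R2"], definitions))  # 12

# 20 independent R-groups with 100 plain fragments each.
big = {f"R{i}": [[]] * 100 for i in range(20)}
print(magnitude(library_size(list(big), big)))  # 10^40
```

Python integers have arbitrary precision, so the exact count stays correct even when the library is far too large to enumerate.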

16 Markush storage & search
– Available in JChem Base and Instant JChem
– No enumeration involved: can handle very complex Markush structures (tested up to 10^40, but no explicit limits were built in)
– Substructure and full structure search
– Basic query features supported
– Substructure hit visualization: Markush structure reduction

17 Markush demo

18 What's new

19 What's new: JChem Base
5.1
  – Position variation in queries
  – New fast & reliable tautomer duplicate search
5.2
  – .NET API
  – Polymer storage and search
  – New query options and features, including searching of attached data, group matching of undefined R-atoms, and repeating units
  – Improved substructure search performance
  – JChem Web Services
  – New metrics for similarity search (Tversky, etc.) (5.2.2)
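For readers unfamiliar with the Tversky metric mentioned above, here is a minimal sketch of the standard Tversky index on fingerprints represented as sets of on-bit positions. It is not JChem code. The function name `tversky`, the set representation, and the choice to return 1.0 for two empty fingerprints are assumptions made for this illustration; how JChem exposes the weights is not covered by the slides.

```python
def tversky(query_bits, target_bits, alpha, beta):
    """Standard Tversky index: c / (c + alpha*|Q - T| + beta*|T - Q|)."""
    q, t = set(query_bits), set(target_bits)
    common = len(q & t)
    denom = common + alpha * len(q - t) + beta * len(t - q)
    # Assumed convention: two empty fingerprints count as identical.
    return common / denom if denom else 1.0

query = {1, 2, 3}
target = {2, 3, 4, 5}

print(tversky(query, target, 1.0, 1.0))  # alpha = beta = 1 is Tanimoto: 0.4
print(tversky(query, target, 0.5, 0.5))  # alpha = beta = 0.5 is Dice
print(tversky(query, target, 1.0, 0.0))  # asymmetric: extra target bits ignored
```

With beta set to 0, a target that contains every bit of the query scores 1.0 regardless of how many additional bits it has, which is why the asymmetric form is useful for substructure-like similarity.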

20 What's new: JChem Base
Polymer support details
  – Polymer brackets and properties (type, connectivity, etc.) considered during search and registration
  – Attached data search (optional): data attached to atoms/bonds/brackets
  – Source- and structure-based representation equivalence is checked (but can be switched off)
      – Addition to a double bond, e.g. polystyrene
      – Polymerization through elimination of water or HCl, e.g. polyester, polyamide

21 What's new: JChem Base
Polymer support details (cont.)
  – Ladder-type polymers
  – Phase-shifting (for head-to-tail (ht) SRUs) (can be switched off)
  – End group matching:
      – * atoms: unspecified end groups
      – Search option to switch on/off end group matching
  – Copolymer types: co, alt, rnd, blk, grf, xl, mer, mod
  – Polymer mixtures
  – New search options
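The idea behind phase-shifting can be shown with a toy sketch: a head-to-tail repeating unit drawn with its brackets shifted along the chain describes the same polymer. The code below is not how JChem matches polymers (a real implementation works on molecular graphs). It assumes a hypothetical simplification in which a linear repeating unit is a list of backbone tokens, and the function name `same_ht_repeat_unit` is invented for this illustration.

```python
def same_ht_repeat_unit(unit_a, unit_b):
    """True if unit_b is a cyclic rotation (phase shift) of unit_a."""
    a, b = list(unit_a), list(unit_b)
    if len(a) != len(b):
        return False
    n = len(a)
    doubled = a + a  # every rotation of a is a length-n window of a + a
    return any(doubled[i:i + n] == b for i in range(max(n, 1)))

# Poly(ethylene oxide) drawn with two different bracket placements.
print(same_ht_repeat_unit(["O", "CH2", "CH2"], ["CH2", "CH2", "O"]))  # True
print(same_ht_repeat_unit(["O", "CH2", "CH2"], ["CH2", "O", "O"]))    # False
```

This toy comparison ignores chain direction, branching, and units written with a doubled period, all of which a real search would have to consider.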

22 What's new: Cartridge-specific
5.1
  – Tautomer duplicate filtering index option
  – Alter index option
  – Improved import speed (5.1.3)
  – Improved upgrade: no need to remove/recreate indices (5.1.4)
5.2
  – Interactive installer
  – Increased substructure search performance (5.2.2)
  – Tversky similarity search (5.2.2)

23 What's new: Markush
New features
  – Homology groups
      19 built-in groups
      Customizable:
        – Examples (for built-in groups, enumeration only)
        – Full user-defined homology groups defined by R-group definition
      Marvin templates for easier sketching
  – Import reagent files as R-groups
  – Position variation and repeating units

24 Plans

25 Plans: JChem Base & Cartridge
JChem Base
  – Further speed improvements (SSS, similarity)
  – New vague bond level options
  – R-group decomposition integration
  – Improved support for Screen molecular descriptors
Cartridge
  – Screen molecular descriptors (BCUT, pharmacophore similarity, chemical hashed fp, etc.) and metrics (Euclidean, Dice, etc.) for similarity search
  – User-defined descriptor fingerprints
  – Markush tables and search
JChem Server, JChem cluster

26 Plans: Markush
– .VMN import (format used by Merged Markush Service & Derwent World Patent Index)
– Multiple graphical attachment points of R-groups
– Homology variation queries
– Overlap analysis of Markush structures
– Homology group properties (# of atoms, branching points, # of heteroatoms, etc.)
– Conditions for Markush variables

27 Summary
– JChem Base and Cartridge are comprehensive and efficient
– Markush structure storage, search and enumeration are now reaching coverage of patent features
– Continuous development, improvements in the pipeline

28 Find out more
– Product descriptions & links
– Forum
– Presentations and posters
– Download

