Szilárd Dóránt, May 2006
Building on JChem Base

Contents
- Introduction
- Structural overview
- The Property Table
- JChem structure tables
- The log table
- Standardization
- Memory considerations
- The search process
- Performance tips
- Duplicate filtering
- Displaying hits
- API examples
- JSP example
- Upgrading JChem
- Future plans

Introduction
JChem Base provides high-performance, Java-based tools for the storage, search and retrieval of chemical structures and associated data. These components can be integrated into web-based or standalone applications, together with other ChemAxon tools.

Structural overview
[Diagram, labels listed from top to bottom]
- Web browser; Application; Web application
- JChem Base API: chemical logic, structure cache
- JDBC driver: standard interface to the RDBMS
- RDBMS (e.g. Oracle, MySQL, etc.): storage and security

The Property Table
The property table stores information about JChem structure tables, including:
- Fingerprint parameters
- Custom standardization rules
- Other table options and information
- Database-related license keys
More than one property table can be used. Each property table represents a particular JChem environment.

The structure of JChem tables

Column name     Explanation
cd_id           unique numeric identifier in the table
cd_structure    the imported structure in the original format, without modifications (except for the removal of data fields)
cd_smiles       the standardized structure in ChemAxon Extended SMILES (cxsmiles) format, used by the search process
cd_formula      the formula of the standardized structure
cd_molweight    the molecular weight of the standardized structure
cd_hash         hash code used for duplicate filtering (PERFECT search)
cd_flags        can store row-specific options, e.g. overriding the chiral flag
cd_timestamp    the date and time of the insertion of the row
cd_fp…          fingerprint columns
[user fields]   custom data fields can be added by the user

The log table
- For efficient cache updates it is essential to keep track of modifications to the table
- A log table ( _UL ) keeps track of the modifications (insert / update / delete) performed through the JChem API
- To limit the number of log entries, old rows are deleted right after the cache update (at the beginning of a database search)
- As a consequence, the database user needs the DELETE right to run searches
- The number of log rows to preserve can be set as an option
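The idea of refreshing a cache from a change log can be sketched in plain Java. This is only an illustrative in-memory model, not the JChem implementation: the class LogDrivenCache, the sequence numbers and the refresh method are all hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a log-driven cache refresh (not JChem code).
class LogDrivenCache {
    enum Op { INSERT, UPDATE, DELETE }

    // One change-log row: a sequence number, the affected row id and the operation.
    static final class LogEntry {
        final long seq;
        final int rowId;
        final Op op;
        final String value; // new structure value; ignored for DELETE

        LogEntry(long seq, int rowId, Op op, String value) {
            this.seq = seq;
            this.rowId = rowId;
            this.op = op;
            this.value = value;
        }
    }

    final Map<Integer, String> cache = new HashMap<>();
    long lastAppliedSeq = 0;

    // Applies log entries not seen yet, then trims the log so that at most
    // rowsToPreserve of the newest entries remain.
    void refresh(List<LogEntry> log, int rowsToPreserve) {
        for (LogEntry e : log) {
            if (e.seq <= lastAppliedSeq) {
                continue; // already applied during an earlier refresh
            }
            if (e.op == Op.DELETE) {
                cache.remove(e.rowId);
            } else {
                cache.put(e.rowId, e.value);
            }
            lastAppliedSeq = e.seq;
        }
        int excess = log.size() - rowsToPreserve;
        if (excess > 0) {
            log.subList(0, excess).clear(); // delete the oldest log rows
        }
    }

    public static void main(String[] args) {
        List<LogEntry> log = new ArrayList<>();
        log.add(new LogEntry(1, 1, Op.INSERT, "C"));
        log.add(new LogEntry(2, 1, Op.UPDATE, "CO"));
        LogDrivenCache c = new LogDrivenCache();
        c.refresh(log, 1);
        System.out.println(c.cache + ", log rows left: " + log.size());
    }
}
```

Trimming the log is the step that corresponds to the DELETE right mentioned on the slide: in this model a search-time refresh both reads and deletes log rows.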

Standardization
- Essential for the graph search algorithm
- A basic standardization is provided as default
- A custom standardization can be specified for each table. The setting is saved in the Property Table. Set once and forget.
- Automatically utilized during:
  - Import (chemaxon.jchem.db.Importer)
  - Insert (chemaxon.jchem.db.UpdateHandler)
  - Update (chemaxon.jchem.db.UpdateHandler)
  - Search (chemaxon.jchem.db.JChemSearch)
  - Regeneration (chemaxon.jchem.db.UpdateHandler)

Memory
Quick facts:
- The default JVM heap size is 64 MB
- On 32-bit systems no more than ~2 GB of memory can be allocated to a single process
- 2 GB can hold approximately 20 million small structures
- JChem caches only whole structure tables (not rows)
[Diagram: total memory need, made up of the Structure Cache (Table 1, Table 2, Table 3) and temporary allocations; labels: JChem Base, Application]

Memory: Structure Cache
- The Structure Cache stores structures and fingerprints in a highly optimized, compact form
- Memory need is dependent on:
  - The number of structures
  - The average size of the structures
  - The size of the fingerprint used
- Approximately 100 MB is needed for 1 million drug-like (small) structures (using 512-bit fingerprints)
- The exact cache size can be retrieved from the API via: chemaxon.jchem.db.JChemSearch.getCachedTables()
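The rule of thumb above (about 100 MB per million small structures) lends itself to a quick back-of-the-envelope check. The sketch below is plain Java and does not call JChem; the class and method names are hypothetical, and the linear scaling is an assumption based only on the figures quoted on the slides.

```java
// Hypothetical estimator built on the slide's rule of thumb (not JChem code).
class CacheMemoryEstimate {
    // ~100 MB per 1 million small structures with 512-bit fingerprints.
    static final long BYTES_PER_MILLION = 100L * 1024 * 1024;

    // Linear estimate of the Structure Cache size for a given row count.
    static long estimateCacheBytes(long structureCount) {
        return structureCount * BYTES_PER_MILLION / 1_000_000L;
    }

    // True if the estimated cache plus a reserve for temporary allocations
    // fits into the given maximum heap size.
    static boolean fitsInHeap(long structureCount, long reservedBytes, long maxHeapBytes) {
        return estimateCacheBytes(structureCount) + reservedBytes <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long structures = 5_000_000L;
        long reserve = 256L * 1024 * 1024; // assumed reserve for temporary allocations
        System.out.println("Estimated cache: "
                + estimateCacheBytes(structures) / (1024 * 1024) + " MB");
        System.out.println("Fits in current heap: "
                + fitsInHeap(structures, reserve, maxHeap));
    }
}
```

With these numbers 20 million structures come out at 2000 MB, which agrees with the "2 GB holds approximately 20 million small structures" figure. Real tables with larger structures or longer fingerprints will need more.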

Memory: temporary allocations
- Not directly related to the size of structure tables
- Increases with the number of parallel operations
- Also includes the memory needed for:
  - your application routines
  - the running environment (application server)
- Cannot be predicted exactly: performing a stress test is recommended
- Specify the amount of memory JChem should not use for the Structure Cache: chemaxon.jchem.db.JChemSearch.setMinNonCachedMemory()

The search process
A two-stage method provides optimal search performance:
1. Rapid pre-screening reduces the number of possible hit candidates
   - Chemical Hashed Fingerprints are used for substructure and superstructure searches
   - Hash code is used for duplicate filtering (usually during compound registration)
2. Graph search algorithm is used to determine the final hit list
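The two stages can be illustrated with a toy model in plain Java. This is a sketch under loud assumptions: strings stand in for molecules, hashed 1- and 2-character fragments stand in for Chemical Hashed Fingerprints, and String.contains stands in for the graph search. None of the names below belong to the JChem API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy two-stage search: bit screen first, exact check second (not JChem code).
class TwoStageSearch {
    static final int FP_BITS = 64;

    // Toy fingerprint: every 1- and 2-character fragment sets one hashed bit.
    static BitSet fingerprint(String s) {
        BitSet fp = new BitSet(FP_BITS);
        for (int i = 0; i < s.length(); i++) {
            fp.set(Math.floorMod(s.substring(i, i + 1).hashCode(), FP_BITS));
            if (i + 1 < s.length()) {
                fp.set(Math.floorMod(s.substring(i, i + 2).hashCode(), FP_BITS));
            }
        }
        return fp;
    }

    // Stage 1: a target can only contain the query if every query bit is set in it.
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    // Returns the indexes of targets that pass the screen and the exact check.
    static List<Integer> search(String query, List<String> targets) {
        BitSet queryFp = fingerprint(query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < targets.size(); i++) {
            if (!passesScreen(queryFp, fingerprint(targets.get(i)))) {
                continue; // rejected cheaply, no exact check needed
            }
            if (targets.get(i).contains(query)) { // stage 2 stand-in for graph search
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> targets = Arrays.asList("CCO", "c1ccccc1O", "CCN", "OCC");
        System.out.println(search("CO", targets));
    }
}
```

The screen can let false candidates through (bit collisions) but never rejects a true hit, because every fragment of the query is also a fragment of any string that contains it. A real system would store the target fingerprints instead of recomputing them per search.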

Performance: fingerprints
- The number of structures that pass screening should be close to the number of final hits
- Search statistics can be obtained via: JChemSearch.setInfoToStdError(true)
- Fingerprints should not be too dark (i.e. have too large a fraction of their bits set)
- Fingerprint statistics can be obtained by invoking: jcman s
- See:
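Darkness here means the fraction of bits that are set in a fingerprint. The following plain-Java sketch shows how that ratio could be computed for fingerprints held as java.util.BitSet values; it is an illustration with hypothetical names and does not reproduce the statistics printed by jcman.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative darkness calculation for bit-string fingerprints (not JChem code).
class FingerprintDarkness {
    // Fraction of set bits in a single fingerprint of the given length.
    static double darkness(BitSet fp, int lengthInBits) {
        return (double) fp.cardinality() / lengthInBits;
    }

    // Mean darkness over a collection of fingerprints of equal length.
    static double averageDarkness(List<BitSet> fps, int lengthInBits) {
        if (fps.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (BitSet fp : fps) {
            sum += darkness(fp, lengthInBits);
        }
        return sum / fps.size();
    }

    public static void main(String[] args) {
        BitSet a = new BitSet(512);
        a.set(0, 128); // 128 of 512 bits set
        BitSet b = new BitSet(512);
        b.set(0, 384); // 384 of 512 bits set
        System.out.println(averageDarkness(Arrays.asList(a, b), 512));
    }
}
```

A dark fingerprint has most bits set, so nearly every query passes the screen against it and the pre-screening stage stops filtering effectively.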

Performance: limiting the number of hits
- A large final result may be unnecessary for the chemist: the number of hits can be limited

Performance: multiple processors
- JChem automatically utilizes the number of processors available for the JVM
- Adding processors is a straightforward way to improve search performance
- The thread count per search can be limited to prevent the creation of too many threads on systems with a high number of parallel searches: JChemSearch.setNumberOfProcessingThreads()
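One simple way to pick a per-search thread limit is to divide the available processors among the expected number of concurrent searches. The helper below is a hypothetical sketch in plain Java; the slides name JChemSearch.setNumberOfProcessingThreads() but do not show its argument, so passing the computed value to it is an assumption.

```java
// Hypothetical helper for choosing a per-search thread limit (not JChem code).
class SearchThreadBudget {
    // Splits the processors evenly among parallel searches, never below one thread.
    static int threadsPerSearch(int availableProcessors, int expectedParallelSearches) {
        if (expectedParallelSearches < 1) {
            throw new IllegalArgumentException("expectedParallelSearches must be >= 1");
        }
        return Math.max(1, availableProcessors / expectedParallelSearches);
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        int limit = threadsPerSearch(processors, 4); // assuming about 4 concurrent searches
        System.out.println("Processors: " + processors + ", threads per search: " + limit);
    }
}
```

An even split is only a starting point; the right value depends on the real load pattern and is best confirmed with a stress test.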

Performance: server mode JVM
- The server mode (-server JVM option) increases the efficiency of run-time optimization
- Slower start-up speed
- Needs more processing time to reach optimum performance
- Higher final performance

Duplicate filtering
- Designated search mode for duplicate filtering: chemaxon.sss.search.SearchConstants.PERFECT
- A 32-bit hash code (cd_hash column) is used to rapidly find possible duplicates; these are then checked by graph search
- The structure cache is not used in PERFECT search mode
- Available in the following API classes:
  - chemaxon.jchem.db.Importer
  - chemaxon.jchem.db.UpdateHandler
  - chemaxon.jchem.db.JChemSearch
  - chemaxon.sss.search.MolSearch
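The hash-then-verify pattern can be shown in a few lines of plain Java. In this sketch strings stand in for standardized structures, String.hashCode() stands in for the 32-bit cd_hash value, and equals() stands in for the graph search; the class is hypothetical. The return convention (new id on insert, negated id of the existing row on a duplicate) is borrowed from the structure insertion example in this deck.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy registration with hash-based duplicate filtering (not JChem code).
class DuplicateFilter {
    private final List<String> rows = new ArrayList<>();                  // id = index + 1
    private final Map<Integer, List<Integer>> idsByHash = new HashMap<>();

    // Returns the new id if inserted, or the negated id of the existing duplicate.
    int register(String structure) {
        int hash = structure.hashCode(); // 32-bit pre-screen key
        List<Integer> candidates = idsByHash.computeIfAbsent(hash, h -> new ArrayList<>());
        for (int id : candidates) {
            // Equal hashes only mark possible duplicates; verify exactly.
            if (rows.get(id - 1).equals(structure)) {
                return -id;
            }
        }
        rows.add(structure);
        int newId = rows.size();
        candidates.add(newId);
        return newId;
    }

    public static void main(String[] args) {
        DuplicateFilter filter = new DuplicateFilter();
        System.out.println(filter.register("Aa")); // new row
        System.out.println(filter.register("BB")); // same hash as "Aa", but a different value
        System.out.println(filter.register("Aa")); // duplicate of the first row
    }
}
```

"Aa" and "BB" share the same Java hash code, which is why the exact check after the hash lookup cannot be skipped.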

Displaying hits (1)
- The structures are stored in their original (non-standardized) format in the cd_structure column of the JChem table
- Always use the cd_structure column to display structures
- If needed, the structure can be standardized for display on the fly via Standardizer
- Helper methods to retrieve structures (they support multiple column types of cd_structure):
  - DatabaseTools.readBytes(ResultSet rs, String columnName)
  - DatabaseTools.readBytes(ResultSet rs, int idx)

Displaying hits (2)
- Only molecule IDs are stored during DB search
- Search again with chemaxon.sss.search.MolSearch to obtain the hit atoms
- Helper method for hit alignment (rotation):
  - MolHandler.align(Molecule mol, int[] indexes)
- Since Marvin automatically colors the connecting half of the bonds, some bonds have to be explicitly set to neutral color:
  - MolHandler.getNonHitBondEndpoints()
  - MolHandler.getNonHitBonds()
- See: jchem\examples\java\HitAlignmentAndColoringExample.java

API example: connecting to a database

    import java.sql.Connection;
    import chemaxon.jchem.db.ConnectionHandler;

    ConnectionHandler ch = new ConnectionHandler();
    ch.setDriver("oracle.jdbc.driver.OracleDriver");
    ch.setPropertyTable("JChemProperties");
    ch.setLoginName("scott");
    ch.setPassword("tiger");
    // note: the slide does not show how the JDBC URL is set
    ch.connect();
    // the java.sql.Connection object is available if needed:
    Connection con = ch.getConnection();
    …
    // closing the connection:
    ch.close();

API example: database import

    import chemaxon.jchem.db.Importer;

    Importer importer = new Importer();
    importer.setConnectionHandler(conh);   // conh: a connected ConnectionHandler
    importer.setInput("sample.sdf");
    // importer.setInput(is);              // alternatively a stream can also be specified
    importer.setTableName("SCOTT.STRUCTURES");
    importer.setHaltOnError(false);
    importer.setDuplicateImportAllowed(false);   // can filter duplicates
    // specifying SDFile field - table field pairs:
    String fieldPairs = "DB_Field1=SDF_Field1; DB_Field2=SDF_Field2";
    importer.setFieldConnections(fieldPairs);
    int importedCount = importer.importMols();
    System.out.println("Imported " + importedCount + " structures");

API example: database export

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import chemaxon.jchem.db.Exporter;

    Exporter exporter = new Exporter();
    exporter.setConnectionHandler(conh);   // conh: a connected ConnectionHandler
    exporter.setTableName("structures");
    // data fields to be exported with the structure:
    exporter.setFieldList("cd_id cd_formula name comments");
    String fileName = "output.sdf";
    OutputStream os = new FileOutputStream(fileName);
    exporter.setOutputStream(os);
    exporter.setFormat("sdf");
    int exportedCount = exporter.writeAll();
    os.close();   // release the output file once the export is done
    System.out.println("Exported " + exportedCount + " structures");

API example: database search

    import chemaxon.jchem.db.JChemSearch;

    JChemSearch searcher = new JChemSearch();
    searcher.setConnectionHandler(ch);
    searcher.setSearchType(JChemSearch.SUBSTRUCTURE);
    searcher.setQueryStructure("c1ccccc1");
    searcher.setStructureTable("SCOTT.STRUCTURES");
    // a query that returns cd_id values can be used for prefiltering
    // (cd_id is qualified because both tables have a column of that name):
    searcher.setFilterQuery(
        "SELECT structures.cd_id FROM structures, biodata WHERE " +
        "structures.cd_id = biodata.cd_id AND biodata.toxicity < 0.3");
    searcher.setWaitingForResult(true);   // otherwise runs in a separate thread
    searcher.run();
    // getting the results as cd_id values:
    int[] results = searcher.getResults();

API example: inserting a structure

    import chemaxon.jchem.db.UpdateHandler;

    // ConnectionHandler, mode, table name and data field names:
    UpdateHandler uh = new UpdateHandler(
        ch, UpdateHandler.INSERT, "structures", "comment, stock");
    uh.setValueForFixColumns("c1ccccc1");   // the structure
    // specifying data field values:
    uh.setStructureValueForAdditionalColumn(1, "some text");
    uh.setStructureValueForAdditionalColumn(2, new Double(8.5));
    uh.setDuplicateFiltering(true);   // filtering duplicate structures
    int id = uh.execute(true);        // getting back the cd_id of the inserted structure
    if (id > 0) {
        System.out.println("Inserted, cd_id value: " + id);
    } else {
        System.out.println("Already exists with cd_id value: " + (-id));
    }
    // storing update information, the database connection remains open:
    uh.close();

JSP example application
- Some of the functions implemented:
  - Different search types
  - Hit alignment and coloring
  - Chemical Terms filtering
  - Import / Export
  - Insert / Modify / Delete
- Open source, customizable
- The source is available in the JChem package under: jchem\examples\jsp1_x

Upgrading JChem
- Tables of old versions have to be upgraded for two reasons:
  - Calculated values (e.g. fingerprints) may be out of date (they should be recalculated at every upgrade)
  - In some versions there is a change in the table structure
- Normally done by the JChemManager (jcman) GUI at startup, or from the command line by invoking: jcman u
- A general upgrade API will be available for integrators from version 3.2

Future plans
- Support for storing and searching Markush structures
- Database field access from Chemical Terms expressions
- Tautomer search
- Chemical Terms columns
- Tables storing query structures

Summary
ChemAxon's JChem Base API provides sophisticated tools for developers who work with chemical structures and associated data.
Building on the JChem API is convenient, because:
- Our various tools integrate seamlessly
- Both high-level and low-level API classes are available
- Developer-to-developer support is responsive

Links
- JChem home page:
- Live demos:
- API documentation:
- Brochure:

Thank you for your attention

Máramaros köz 3/a, Budapest, 1037 Hungary