Szilárd Dóránt May 2006 Building on JChem Base. Contents Introduction Structural overview The Property Table JChem structure tables The log table Standardization.


1 Szilárd Dóránt May 2006 Building on JChem Base

2 Contents
– Introduction
– Structural overview
– The Property Table
– JChem structure tables
– The log table
– Standardization
– Memory considerations
– The search process
– Performance tips
– Duplicate filtering
– Displaying hits
– API examples
– JSP example
– Upgrading JChem
– Future plans

3 Introduction
JChem Base provides high-performance, Java-based tools for the storage, search and retrieval of chemical structures and associated data. These components can be integrated into web-based or standalone applications in association with other ChemAxon tools.

4 Structural overview
[Diagram: layers of a JChem Base installation]
– Web browser / Application
– Web application
– JChem Base API: chemical logic, structure cache
– JDBC driver: standard interface to the RDBMS
– RDBMS (e.g. Oracle, MySQL, etc.): storage and security

5 The Property Table
The property table stores information about JChem structure tables, including:
– Fingerprint parameters
– Custom standardization rules
– Other table options and information
– Database-related license keys
More than one property table can be used; each property table represents a particular JChem environment.

6 The structure of JChem tables
Column name: Explanation
– cd_id: unique numeric identifier in the table
– cd_structure: the imported structure in the original format, without modifications (except for the removal of data fields)
– cd_smiles: the standardized structure in ChemAxon Extended SMILES (cxsmiles) format, used by the search process
– cd_formula: the formula of the standardized structure
– cd_molweight: the molecular weight of the standardized structure
– cd_hash: hash code used for duplicate filtering (PERFECT search)
– cd_flags: can store row-specific options, e.g. overriding the chiral flag
– cd_timestamp: the date and time of the insertion of the row
– cd_fp…: fingerprint columns
– [user fields]: custom data fields can be added by the user

7 The log table
– For efficient cache updates it is essential to keep track of modifications to the table.
– A log table ( _UL ) keeps track of the modifications (insert / update / delete) performed through the JChem API.
– To limit the number of log entries, old rows are deleted right after the cache update (at the beginning of a database search).
– The DELETE right is therefore needed for searches.
– An option sets the number of rows to preserve.
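
The idea of a log-driven cache update can be sketched in a few lines. This is a simplified illustration, not JChem code: the class LogDrivenCache, its LogEntry type and the in-memory list standing in for the log table are all hypothetical, and the log is assumed to be ordered by an increasing sequence number.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not JChem code; all names here are hypothetical. */
final class LogDrivenCache {
    enum Op { INSERT, UPDATE, DELETE }

    /** One row of a hypothetical modification log, ordered by seq. */
    static final class LogEntry {
        final long seq;
        final Op op;
        final int cdId;
        final String structure;

        LogEntry(long seq, Op op, int cdId, String structure) {
            this.seq = seq;
            this.op = op;
            this.cdId = cdId;
            this.structure = structure;
        }
    }

    private final Map<Integer, String> cache = new HashMap<>();
    private long lastAppliedSeq = 0;

    /** Applies log entries not yet seen, then prunes all but the newest rows. */
    void refresh(List<LogEntry> log, int rowsToPreserve) {
        for (LogEntry e : log) {
            if (e.seq <= lastAppliedSeq) {
                continue; // already reflected in the cache
            }
            if (e.op == Op.DELETE) {
                cache.remove(e.cdId);
            } else {
                cache.put(e.cdId, e.structure); // INSERT and UPDATE both overwrite
            }
            lastAppliedSeq = e.seq;
        }
        // Deleting old log rows is why a search needs the DELETE right.
        int excess = log.size() - rowsToPreserve;
        if (excess > 0) {
            log.subList(0, excess).clear();
        }
    }

    String get(int cdId) {
        return cache.get(cdId);
    }

    int size() {
        return cache.size();
    }
}
```

Only the entries newer than the last applied one are replayed, so the cost of a refresh depends on the number of recent modifications rather than on the table size.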

8 Standardization
– Essential for the graph search algorithm
– A basic standardization is provided as default
– A custom standardization can be specified for each table. The setting is saved in the Property Table.
– Set once and forget. Automatically utilized during:
  – Import (chemaxon.jchem.db.Importer)
  – Insert (chemaxon.jchem.db.UpdateHandler)
  – Update (chemaxon.jchem.db.UpdateHandler)
  – Search (chemaxon.jchem.db.JChemSearch)
  – Regeneration (chemaxon.jchem.db.UpdateHandler)

9 Memory
Quick facts:
– The default JVM heap size is 64 MB
– On 32-bit systems no more than ~2 GB of memory can be allocated to a single process
– 2 GB can hold approximately 20 million small structures
– JChem caches only whole structure tables (not rows)
[Diagram: total memory need = Structure Cache (Table 1, Table 2, Table 3) + temporary allocations (JChem Base, application)]

10 Memory: Structure Cache
– The Structure Cache stores structures and fingerprints in a highly optimized, compact form
– Memory need is dependent on:
  – The number of structures
  – The average size of the structures
  – The size of the fingerprint used
– Approximately 100 MB is needed for 1 million drug-like (small) structures (using 512-bit fingerprints)
– The exact cache size can be retrieved from the API via: chemaxon.jchem.db.JChemSearch.getCachedTables()
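
The rule of thumb above (about 100 MB per 1 million small structures with 512-bit fingerprints) lends itself to a back-of-the-envelope estimate. The following is a rough sketch only: the class CacheSizeEstimate and its methods are hypothetical helpers, not part of JChem, and real memory use depends on structure size and fingerprint length.

```java
/** Back-of-the-envelope estimate; not part of JChem, names are hypothetical. */
final class CacheSizeEstimate {
    /** Rule of thumb: 1,000,000 small structures per 100 MB, i.e. 10,000 per MB. */
    static final long STRUCTURES_PER_MB = 10_000L;

    /** Estimated cache size in MB for the given number of small structures (rounded up). */
    static long estimateCacheMb(long structureCount) {
        return (structureCount + STRUCTURES_PER_MB - 1) / STRUCTURES_PER_MB;
    }

    /** Approximate number of small structures that fit into the given number of MB. */
    static long structuresThatFit(long availableMb) {
        return availableMb * STRUCTURES_PER_MB;
    }
}
```

With these figures, 20 million structures come to about 2000 MB, which agrees with the statement that 2 GB can hold approximately 20 million small structures.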

11 Memory: temporary allocations
– Not directly related to the size of structure tables
– Increases with the number of parallel operations
– Also includes the memory needed for:
  – your application routines
  – the running environment (application server)
– Cannot be predicted exactly: performing a stress test is recommended
– Specify the amount of memory JChem should not use for the Structure Cache: chemaxon.jchem.db.JChemSearch.setMinNonCachedMemory()
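
One way to reason about the reserve is to subtract it from the maximum heap and check whether the estimated cache still fits. The sketch below uses only the standard Runtime API; the class CacheBudget and the 256 MB reserve are illustrative assumptions, and the exact argument expected by setMinNonCachedMemory() should be taken from the JChem API documentation.

```java
/** Illustrative arithmetic only; CacheBudget is a hypothetical helper, not a JChem class. */
final class CacheBudget {
    /** Heap left for a cache after reserving memory for everything else; never negative. */
    static long cacheBudgetBytes(long maxHeapBytes, long reservedBytes) {
        return Math.max(0L, maxHeapBytes - reservedBytes);
    }

    /** True if the estimated cache fits into the heap once the reserve is set aside. */
    static boolean fits(long estimatedCacheBytes, long maxHeapBytes, long reservedBytes) {
        return estimatedCacheBytes <= cacheBudgetBytes(maxHeapBytes, reservedBytes);
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // upper bound set by -Xmx
        long reserved = 256L * 1024 * 1024;              // assumed reserve, tune by stress test
        System.out.println("Cache budget (bytes): " + cacheBudgetBytes(maxHeap, reserved));
    }
}
```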

12 The search process
A two-stage method provides optimal search performance:
1. Rapid pre-screening reduces the number of possible hit candidates
   – Chemical Hashed Fingerprints are used for substructure and superstructure searches
   – Hash code is used for duplicate filtering (usually during compound registration)
2. Graph search algorithm is used to determine the final hit list
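
The two stages can be illustrated with a toy model. This is not the JChem algorithm: plain strings stand in for structures, substring matching stands in for the graph search, and the fingerprint is a simple hash of short substrings. The class TwoStageSearch and all of its methods are hypothetical. What the sketch does preserve is the key property of screening: a target that contains the query always passes the screen, so the cheap first stage never loses a real hit.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of screen-then-verify; not the JChem implementation. */
final class TwoStageSearch {
    static final int FP_BITS = 512;

    /** Toy hashed fingerprint: sets one bit for every substring of length 1 to 3. */
    static BitSet fingerprint(String s) {
        BitSet fp = new BitSet(FP_BITS);
        for (int len = 1; len <= 3; len++) {
            for (int i = 0; i + len <= s.length(); i++) {
                fp.set(Math.floorMod(s.substring(i, i + len).hashCode(), FP_BITS));
            }
        }
        return fp;
    }

    /** Stage 1: a target can contain the query only if it has every bit of the query. */
    static boolean passesScreen(BitSet queryFp, BitSet targetFp) {
        BitSet missing = (BitSet) queryFp.clone();
        missing.andNot(targetFp);
        return missing.isEmpty();
    }

    /** Returns the indexes of the targets that contain the query. */
    static List<Integer> search(String query, List<String> targets) {
        BitSet queryFp = fingerprint(query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < targets.size(); i++) {
            String target = targets.get(i);
            // A real system would store target fingerprints instead of recomputing them.
            if (!passesScreen(queryFp, fingerprint(target))) {
                continue; // cheap rejection, no exact check needed
            }
            if (target.contains(query)) { // Stage 2: exact check (stand-in for graph search)
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Because different substrings can hash to the same bit, the screen may let through false candidates; the second stage removes them.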

13 Performance: fingerprints
– The number of screened structures and the number of hits should be close
– Search statistics can be obtained via: JChemSearch.setInfoToStdError(true)
– Fingerprints should not be too dark (that is, they should not have too large a fraction of their bits set)
– Statistics can be obtained by invoking: jcman s
– See: http://www.chemaxon.com/jchem/doc/user/fingerprint.html
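
Both quantities mentioned above are simple ratios. The sketch below shows how they could be computed for a fingerprint held in a java.util.BitSet; the class FingerprintStats is a hypothetical helper and is not how jcman reports its statistics.

```java
import java.util.BitSet;

/** Hypothetical helper for the two ratios discussed on this slide; not JChem code. */
final class FingerprintStats {
    /** Fraction of bits set ("darkness") in a fingerprint of the given length. */
    static double darkness(BitSet fingerprint, int lengthInBits) {
        return (double) fingerprint.cardinality() / lengthInBits;
    }

    /** Fraction of screened candidates that were confirmed as hits; 1.0 is ideal. */
    static double screeningEfficiency(int hits, int screened) {
        return screened == 0 ? 1.0 : (double) hits / screened;
    }
}
```

A dark fingerprint passes the screen for almost any query, so the screening efficiency drops and more candidates reach the expensive graph search.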

14 Performance: limiting the number of hits A large final result may be unnecessary for the chemist: the number of hits can be limited

15 Performance: multiple processors
– JChem automatically utilizes all processors available to the JVM
– Adding processors is a straightforward way to improve search performance
– The thread count per search can be limited to prevent the creation of too many threads on systems with a high number of parallel searches: JChemSearch.setNumberOfProcessingThreads()
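
The effect of capping the thread count per search can be illustrated with the standard java.util.concurrent classes. This is a generic sketch, not JChem internals: the class BoundedParallelScan is hypothetical and substring matching again stands in for the real structure search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Generic illustration of a per-search thread cap; not JChem code. */
final class BoundedParallelScan {
    /** Uses all processors unless a lower cap is requested; always at least 1. */
    static int threadsPerSearch(int availableProcessors, int cap) {
        return Math.max(1, Math.min(availableProcessors, cap));
    }

    /** Counts the targets containing the query, splitting the work over a bounded pool. */
    static int countMatches(List<String> targets, String query, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int chunk = (targets.size() + threads - 1) / threads;
            for (int start = 0; start < targets.size(); start += chunk) {
                final List<String> slice =
                        targets.subList(start, Math.min(start + chunk, targets.size()));
                parts.add(pool.submit(() -> {
                    int n = 0;
                    for (String t : slice) {
                        if (t.contains(query)) {
                            n++;
                        }
                    }
                    return n;
                }));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A typical call would pass threadsPerSearch(Runtime.getRuntime().availableProcessors(), cap) as the thread count, so that many simultaneous searches do not each claim every processor.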

16 Performance: server mode JVM
The server mode (-server JVM option) increases the efficiency of run-time optimization:
– Slower start-up speed
– Needs more processing time to reach optimum performance
– Higher final performance

17 Duplicate filtering
– Designated search mode for duplicate filtering: chemaxon.sss.search.SearchConstants.PERFECT
– A 32-bit hash code (cd_hash column) is used to rapidly find possible duplicates; these are further checked by graph search
– The structure cache is not used in PERFECT search mode
– Available in the following API classes:
  – chemaxon.jchem.db.Importer
  – chemaxon.jchem.db.UpdateHandler
  – chemaxon.jchem.db.JChemSearch
  – chemaxon.sss.search.MolSearch
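
The hash-then-verify pattern can be shown with a toy registry. This sketch is not JChem code: the class HashDuplicateFilter is hypothetical, canonical strings stand in for standardized structures, String.hashCode() stands in for the cd_hash value, and an equality check stands in for the graph search. The return convention (a positive new id, or the negated id of the existing row) is borrowed from the structure insertion example in this deck.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy registry showing hash lookup followed by exact verification; not JChem code. */
final class HashDuplicateFilter {
    private final Map<Integer, List<Integer>> idsByHash = new HashMap<>();
    private final Map<Integer, String> structureById = new HashMap<>();
    private int nextId = 1;

    /** Returns the new id if inserted, or the negated id of the existing duplicate. */
    int insert(String canonicalStructure) {
        int hash = canonicalStructure.hashCode(); // 32-bit hash: collisions are possible
        List<Integer> candidates = idsByHash.computeIfAbsent(hash, h -> new ArrayList<>());
        for (int candidateId : candidates) {
            // Same hash is only a hint; the exact comparison decides.
            if (structureById.get(candidateId).equals(canonicalStructure)) {
                return -candidateId;
            }
        }
        int id = nextId++;
        candidates.add(id);
        structureById.put(id, canonicalStructure);
        return id;
    }
}
```

Two different inputs with the same hash are both accepted, because the exact check rejects the false candidate; only the hash lookup keeps the common case fast.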

18 Displaying hits (1)
– The structures are stored in their original (non-standardized) format in the cd_structure column of the JChem table.
– Always use the cd_structure column to display structures.
– If needed, the structure can be standardized for display on the fly via Standardizer.
– Helper methods to retrieve structures (they support multiple column types of cd_structure):
  DatabaseTools.readBytes(ResultSet rs, String columnName)
  DatabaseTools.readBytes(ResultSet rs, int idx)

19 Displaying hits (2)
– Only molecule IDs are stored during DB search.
– Search again with chemaxon.sss.search.MolSearch to obtain the hit atoms.
– Helper method for hit alignment (rotation): MolHandler.align(Molecule mol, int[] indexes)
– Since Marvin automatically colors the connecting half of the bonds, some bonds have to be explicitly set to neutral color:
  MolHandler.getNonHitBondEndpoints()
  MolHandler.getNonHitBonds()
– See: jchem\examples\java\HitAlignmentAndColoringExample.java

20 API example: connecting to a database

    ConnectionHandler ch = new chemaxon.jchem.db.ConnectionHandler();
    ch.setDriver("oracle.jdbc.driver.OracleDriver");
    ch.setUrl("jdbc:oracle:thin:@localhost:1521:mydb");
    ch.setPropertyTable("JChemProperties");
    ch.setLoginName("scott");
    ch.setPassword("tiger");
    ch.connect();
    // the java.sql.Connection object is available if needed:
    Connection con = ch.getConnection();
    …
    // closing the connection:
    ch.close();

21 API example: database import

    // conh is a connected ConnectionHandler
    Importer importer = new chemaxon.jchem.db.Importer();
    importer.setConnectionHandler(conh);
    importer.setInput("sample.sdf");
    // importer.setInput(is); // alternatively, a stream can also be specified
    importer.setTableName("SCOTT.STRUCTURES");
    importer.setHaltOnError(false);
    importer.setDuplicateImportAllowed(false); // can filter duplicates
    // specifying SDFile field - table field pairs:
    String fieldPairs = "DB_Field1=SDF_Field1; DB_Field2=SDF_Field2";
    importer.setFieldConnections(fieldPairs);
    int importedCount = importer.importMols();
    System.out.println("Imported " + importedCount + " structures");

22 API example: database export

    // conh is a connected ConnectionHandler
    Exporter exporter = new chemaxon.jchem.db.Exporter();
    exporter.setConnectionHandler(conh);
    exporter.setTableName("structures");
    // data fields to be exported with the structure:
    exporter.setFieldList("cd_id cd_formula name comments");
    String fileName = "output.sdf";
    OutputStream os = new FileOutputStream(fileName);
    exporter.setOutputStream(os);
    exporter.setFormat("sdf");
    int exportedCount = exporter.writeAll();
    System.out.println("Exported " + exportedCount + " structures");

23 API example: database search

    JChemSearch searcher = new chemaxon.jchem.db.JChemSearch();
    searcher.setConnectionHandler(ch);
    searcher.setSearchType(JChemSearch.SUBSTRUCTURE);
    searcher.setQueryStructure("c1ccccc1");
    searcher.setStructureTable("SCOTT.STRUCTURES");
    // a query that returns cd_id values can be used for prefiltering:
    searcher.setFilterQuery(
        "SELECT structures.cd_id FROM structures, biodata WHERE "
        + "structures.cd_id = biodata.cd_id AND biodata.toxicity < 0.3");
    searcher.setWaitingForResult(true); // otherwise runs in a separate thread
    searcher.run();
    // getting the results as cd_id values:
    int[] results = searcher.getResults();

24 API example: inserting a structure

    // ConnectionHandler, mode, table name and data field names:
    UpdateHandler uh = new chemaxon.jchem.db.UpdateHandler(
        ch, UpdateHandler.INSERT, "structures", "comment, stock");
    uh.setValueForFixColumns("c1ccccc1"); // the structure
    // specifying data field values:
    uh.setStructureValueForAdditionalColumn(1, "some text");
    uh.setStructureValueForAdditionalColumn(2, Double.valueOf(8.5));
    uh.setDuplicateFiltering(true); // filtering duplicate structures
    int id = uh.execute(true); // getting back the cd_id of the inserted structure
    if (id > 0) {
        System.out.println("Inserted, cd_id value: " + id);
    } else {
        System.out.println("Already exists with cd_id value: " + (-id));
    }
    // storing update information, the database connection remains open:
    uh.close();

25 JSP example application
Some of the functions implemented:
– Different search types
– Hit alignment and coloring
– Chemical Terms filtering
– Import / Export
– Insert / Modify / Delete
Open source, customizable.
The source is available in the JChem package under: jchem\examples\jsp1_x

26 Upgrading JChem
Tables of old versions have to be upgraded for two reasons:
– Calculated values (e.g. fingerprints) may be out of date (should be recalculated at every upgrade)
– In some versions there is a change in the table structure
Normally done by the JChemManager (jcman) GUI at startup, or from the command line by invoking: jcman u
A general upgrade API will be available for integrators from version 3.2.

27 Future plans
– Support for storing and searching Markush structures
– Database field access from Chemical Terms expressions
– Tautomer search
– Chemical Terms columns
– Tables storing query structures

28 Summary
ChemAxon's JChem Base API provides sophisticated tools for the developer to deal with chemical structures and associated data.
Building on the JChem API is convenient because:
– Our various tools integrate seamlessly
– Both high- and low-level API classes are available
– Responsive developer-to-developer support

29 Links
– JChem home page: www.jchem.com
– Live demos: www.jchem.com/examples
– API documentation: www.jchem.com/doc/api
– Brochure: www.chemaxon.com/brochures/JChemBase.pdf

30 Thank you for your attention
Máramaros köz 3/a, Budapest, 1037 Hungary
info@chemaxon.com
www.chemaxon.com

